From patchwork Thu Aug 24 03:13:11 2023
X-Patchwork-Submitter: Chenghui Pan
X-Patchwork-Id: 1825107
From: Chenghui Pan
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn
Subject: [PATCH v5 1/6] LoongArch: Add Loongson SX vector directive compilation framework.
Date: Thu, 24 Aug 2023 11:13:11 +0800
Message-Id: <20230824031316.16599-2-panchenghui@loongson.cn>
In-Reply-To: <20230824031316.16599-1-panchenghui@loongson.cn>
References: <20230824031316.16599-1-panchenghui@loongson.cn>

From: Lulu Cheng

gcc/ChangeLog: * config/loongarch/genopts/loongarch-strings: Add compilation framework. * config/loongarch/genopts/loongarch.opt.in: Ditto. * config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto. * config/loongarch/loongarch-def.c: Ditto. * config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto. (ISA_EXT_SIMD_LSX): Ditto. (N_SWITCH_TYPES): Ditto. (SW_LSX): Ditto. (struct loongarch_isa): Ditto. * config/loongarch/loongarch-driver.cc (APPEND_SWITCH): Ditto. (driver_get_normalized_m_opts): Ditto. * config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): Ditto. * config/loongarch/loongarch-opts.cc (loongarch_config_target): Ditto. (isa_str): Ditto. * config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto. * config/loongarch/loongarch-str.h (OPTSTR_LSX): Ditto. * config/loongarch/loongarch.opt: Ditto.
--- .../loongarch/genopts/loongarch-strings | 3 + gcc/config/loongarch/genopts/loongarch.opt.in | 8 +- gcc/config/loongarch/loongarch-c.cc | 7 ++ gcc/config/loongarch/loongarch-def.c | 4 + gcc/config/loongarch/loongarch-def.h | 7 +- gcc/config/loongarch/loongarch-driver.cc | 10 +++ gcc/config/loongarch/loongarch-driver.h | 1 + gcc/config/loongarch/loongarch-opts.cc | 82 ++++++++++++++++++- gcc/config/loongarch/loongarch-opts.h | 1 + gcc/config/loongarch/loongarch-str.h | 2 + gcc/config/loongarch/loongarch.opt | 8 +- 11 files changed, 128 insertions(+), 5 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch-strings b/gcc/config/loongarch/genopts/loongarch-strings index a40998ead97..24a5025061f 100644 --- a/gcc/config/loongarch/genopts/loongarch-strings +++ b/gcc/config/loongarch/genopts/loongarch-strings @@ -40,6 +40,9 @@ OPTSTR_SOFT_FLOAT soft-float OPTSTR_SINGLE_FLOAT single-float OPTSTR_DOUBLE_FLOAT double-float +# SIMD extensions +OPTSTR_LSX lsx + # -mabi= OPTSTR_ABI_BASE abi STR_ABI_BASE_LP64D lp64d diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 4b9b4ac273e..338d77a7e40 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -76,6 +76,9 @@ m@@OPTSTR_DOUBLE_FLOAT@@ Target Driver RejectNegative Var(la_opt_switches) Mask(FORCE_F64) Negative(m@@OPTSTR_SOFT_FLOAT@@) Allow hardware floating-point instructions to cover both 32-bit and 64-bit operations. +m@@OPTSTR_LSX@@ +Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@) +Enable LoongArch SIMD Extension (LSX). ;; Base target models (implies ISA & tune parameters) Enum @@ -125,11 +128,14 @@ Target RejectNegative Joined ToLower Enum(abi_base) Var(la_opt_abi_base) Init(M_ Variable int la_opt_abi_ext = M_OPTION_NOT_SEEN - mbranch-cost= Target RejectNegative Joined UInteger Var(loongarch_branch_cost) -mbranch-cost=COST Set the cost of branches to roughly COST instructions. +mmemvec-cost= +Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) IntegerRange(1, 5) +mmemvec-cost=COST Set the cost of vector memory access instructions. + mcheck-zero-division Target Mask(CHECK_ZERO_DIV) Trap on integer divide by zero. diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index 67911b78f28..b065921adc3 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -99,6 +99,13 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) else builtin_define ("__loongarch_frlen=0"); + if (ISA_HAS_LSX) + { + builtin_define ("__loongarch_simd"); + builtin_define ("__loongarch_sx"); + builtin_define ("__loongarch_sx_width=128"); + } + /* Native Data Sizes. 
*/ builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE); builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE); diff --git a/gcc/config/loongarch/loongarch-def.c b/gcc/config/loongarch/loongarch-def.c index 6729c857f7c..28e24c62249 100644 --- a/gcc/config/loongarch/loongarch-def.c +++ b/gcc/config/loongarch/loongarch-def.c @@ -49,10 +49,12 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = { [CPU_LOONGARCH64] = { .base = ISA_BASE_LA64V100, .fpu = ISA_EXT_FPU64, + .simd = 0, }, [CPU_LA464] = { .base = ISA_BASE_LA64V100, .fpu = ISA_EXT_FPU64, + .simd = ISA_EXT_SIMD_LSX, }, }; @@ -147,6 +149,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = { [ISA_EXT_FPU64] = STR_ISA_EXT_FPU64, [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32, [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU, + [ISA_EXT_SIMD_LSX] = OPTSTR_LSX, }; const char* @@ -176,6 +179,7 @@ loongarch_switch_strings[] = { [SW_SOFT_FLOAT] = OPTSTR_SOFT_FLOAT, [SW_SINGLE_FLOAT] = OPTSTR_SINGLE_FLOAT, [SW_DOUBLE_FLOAT] = OPTSTR_DOUBLE_FLOAT, + [SW_LSX] = OPTSTR_LSX, }; diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h index fb8bb88eb52..f34cffcfb9b 100644 --- a/gcc/config/loongarch/loongarch-def.h +++ b/gcc/config/loongarch/loongarch-def.h @@ -63,7 +63,8 @@ extern const char* loongarch_isa_ext_strings[]; #define ISA_EXT_FPU32 1 #define ISA_EXT_FPU64 2 #define N_ISA_EXT_FPU_TYPES 3 -#define N_ISA_EXT_TYPES 3 +#define ISA_EXT_SIMD_LSX 3 +#define N_ISA_EXT_TYPES 4 /* enum abi_base */ extern const char* loongarch_abi_base_strings[]; @@ -97,7 +98,8 @@ extern const char* loongarch_switch_strings[]; #define SW_SOFT_FLOAT 0 #define SW_SINGLE_FLOAT 1 #define SW_DOUBLE_FLOAT 2 -#define N_SWITCH_TYPES 3 +#define SW_LSX 3 +#define N_SWITCH_TYPES 4 /* The common default value for variables whose assignments are triggered by command-line options. */ @@ -111,6 +113,7 @@ struct loongarch_isa { unsigned char base; /* ISA_BASE_ */ unsigned char fpu; /* ISA_EXT_FPU_ */ + unsigned char simd; /* ISA_EXT_SIMD_ */ }; struct loongarch_abi diff --git a/gcc/config/loongarch/loongarch-driver.cc b/gcc/config/loongarch/loongarch-driver.cc index 11ce082417f..aa5011bd86a 100644 --- a/gcc/config/loongarch/loongarch-driver.cc +++ b/gcc/config/loongarch/loongarch-driver.cc @@ -160,6 +160,10 @@ driver_get_normalized_m_opts (int argc, const char **argv) APPEND_LTR (" % promotes %<%s%> to %<%s%s%>", + OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[t.isa.fpu], + OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[ISA_EXT_FPU64]); + + t.isa.fpu = ISA_EXT_FPU64; + } + else if (on (SOFT_FLOAT) || on (SINGLE_FLOAT)) + { + if (constrained.simd) + inform (UNKNOWN_LOCATION, + "%<-m%s%> is disabled by %<-m%s%>, because it requires %<%s%s%>", + loongarch_switch_strings[simd_switch], + loongarch_switch_strings[on_switch], + OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[ISA_EXT_FPU64]); + + /* SIMD that comes from arch default. */ + t.isa.simd = 0; + } + else + { + /* -mfpu=0 / -mfpu=32 is set. */ + if (constrained.simd) + fatal_error (UNKNOWN_LOCATION, + "%<-m%s=%s%> conflicts with %<-m%s%>," + "which requires %<%s%s%>", + OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[t.isa.fpu], + loongarch_switch_strings[simd_switch], + OPTSTR_ISA_EXT_FPU, + loongarch_isa_ext_strings[ISA_EXT_FPU64]); + + /* Same as above. */ + t.isa.simd = 0; + } + } /* 4. 
ABI-ISA compatibility */ /* Note: @@ -530,6 +599,17 @@ isa_str (const struct loongarch_isa *isa, char separator) APPEND_STRING (OPTSTR_ISA_EXT_FPU) APPEND_STRING (loongarch_isa_ext_strings[isa->fpu]) } + + switch (isa->simd) + { + case ISA_EXT_SIMD_LSX: + APPEND1 (separator); + APPEND_STRING (loongarch_isa_ext_strings[isa->simd]); + break; + + default: + gcc_assert (isa->simd == 0); + } APPEND1 ('\0') /* Add more here. */ diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h index b1ff54426e4..d067c05dfc9 100644 --- a/gcc/config/loongarch/loongarch-opts.h +++ b/gcc/config/loongarch/loongarch-opts.h @@ -66,6 +66,7 @@ loongarch_config_target (struct loongarch_target *target, || la_target.abi.base == ABI_BASE_LP64F \ || la_target.abi.base == ABI_BASE_LP64S) +#define ISA_HAS_LSX (la_target.isa.simd == ISA_EXT_SIMD_LSX) #define TARGET_ARCH_NATIVE (la_target.cpu_arch == CPU_NATIVE) #define LARCH_ACTUAL_ARCH (TARGET_ARCH_NATIVE \ ? (la_target.cpu_native < N_ARCH_TYPES \ diff --git a/gcc/config/loongarch/loongarch-str.h b/gcc/config/loongarch/loongarch-str.h index af2e82a321f..6fa1b1571c5 100644 --- a/gcc/config/loongarch/loongarch-str.h +++ b/gcc/config/loongarch/loongarch-str.h @@ -42,6 +42,8 @@ along with GCC; see the file COPYING3. If not see #define OPTSTR_SINGLE_FLOAT "single-float" #define OPTSTR_DOUBLE_FLOAT "double-float" +#define OPTSTR_LSX "lsx" + #define OPTSTR_ABI_BASE "abi" #define STR_ABI_BASE_LP64D "lp64d" #define STR_ABI_BASE_LP64F "lp64f" diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 68018ade73f..5c7e6d37220 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -83,6 +83,9 @@ mdouble-float Target Driver RejectNegative Var(la_opt_switches) Mask(FORCE_F64) Negative(msoft-float) Allow hardware floating-point instructions to cover both 32-bit and 64-bit operations. +mlsx +Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(mlsx) +Enable LoongArch SIMD Extension (LSX). ;; Base target models (implies ISA & tune parameters) Enum @@ -132,11 +135,14 @@ Target RejectNegative Joined ToLower Enum(abi_base) Var(la_opt_abi_base) Init(M_ Variable int la_opt_abi_ext = M_OPTION_NOT_SEEN - mbranch-cost= Target RejectNegative Joined UInteger Var(loongarch_branch_cost) -mbranch-cost=COST Set the cost of branches to roughly COST instructions. +mmemvec-cost= +Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) IntegerRange(1, 5) +mmemvec-cost=COST Set the cost of vector memory access instructions. + mcheck-zero-division Target Mask(CHECK_ZERO_DIV) Trap on integer divide by zero. 
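[Editorial example, not part of the patch series.] Patch 1/6 above adds the -mlsx switch and, in loongarch-c.cc, the __loongarch_simd, __loongarch_sx and __loongarch_sx_width=128 preprocessor macros. A minimal, illustrative C probe of those macros follows; the file name and messages are made up, and it assumes a compiler built with this series and invoked with -mlsx:

/* lsx_probe.c -- illustrative sketch only.
   Build with an LSX-enabled GCC:  gcc -mlsx lsx_probe.c  */
#include <stdio.h>

int
main (void)
{
#if defined (__loongarch_sx)
  /* Patch 1/6 defines __loongarch_sx_width to 128 when LSX is enabled.  */
  printf ("LSX enabled, SIMD width = %d bits\n", __loongarch_sx_width);
#else
  printf ("LSX not enabled; compile with -mlsx\n");
#endif
  return 0;
}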
From patchwork Thu Aug 24 03:13:12 2023
X-Patchwork-Submitter: Chenghui Pan
X-Patchwork-Id: 1825108
From: Chenghui Pan
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn
Subject: [PATCH v5 2/6] LoongArch: Add Loongson SX base instruction support.
Date: Thu, 24 Aug 2023 11:13:12 +0800
Message-Id: <20230824031316.16599-3-panchenghui@loongson.cn>
In-Reply-To: <20230824031316.16599-1-panchenghui@loongson.cn>
References: <20230824031316.16599-1-panchenghui@loongson.cn>

From: Lulu Cheng

gcc/ChangeLog: * config/loongarch/constraints.md (M): Add Loongson LSX base instruction support. (N): Ditto. (O): Ditto. (P): Ditto. (R): Ditto. (S): Ditto. (YG): Ditto. (YA): Ditto. (YB): Ditto. (Yb): Ditto. (Yh): Ditto. (Yw): Ditto. (YI): Ditto. (YC): Ditto. (YZ): Ditto. (Unv5): Ditto. (Uuv5): Ditto. (Usv5): Ditto. (Uuv6): Ditto. (Urv8): Ditto. * config/loongarch/loongarch-builtins.cc (loongarch_gen_const_int_vector): Ditto. * config/loongarch/loongarch-modes.def (VECTOR_MODES): Ditto. (VECTOR_MODE): Ditto. (INT_MODE): Ditto. * config/loongarch/loongarch-protos.h (loongarch_split_move_insn_p): Ditto. (loongarch_split_move_insn): Ditto. (loongarch_split_128bit_move): Ditto. (loongarch_split_128bit_move_p): Ditto. (loongarch_split_lsx_copy_d): Ditto. (loongarch_split_lsx_insert_d): Ditto. (loongarch_split_lsx_fill_d): Ditto. (loongarch_expand_vec_cmp): Ditto. (loongarch_const_vector_same_val_p): Ditto. (loongarch_const_vector_same_bytes_p): Ditto. (loongarch_const_vector_same_int_p): Ditto. (loongarch_const_vector_shuffle_set_p): Ditto. (loongarch_const_vector_bitimm_set_p): Ditto. (loongarch_const_vector_bitimm_clr_p): Ditto. (loongarch_lsx_vec_parallel_const_half): Ditto. (loongarch_gen_const_int_vector): Ditto. (loongarch_lsx_output_division): Ditto. (loongarch_expand_vector_init): Ditto. (loongarch_expand_vec_unpack): Ditto. (loongarch_expand_vec_perm): Ditto. (loongarch_expand_vector_extract): Ditto. (loongarch_expand_vector_reduc): Ditto.
(loongarch_ldst_scaled_shift): Ditto. (loongarch_expand_vec_cond_expr): Ditto. (loongarch_expand_vec_cond_mask_expr): Ditto. (loongarch_builtin_vectorized_function): Ditto. (loongarch_gen_const_int_vector_shuffle): Ditto. (loongarch_build_signbit_mask): Ditto. * config/loongarch/loongarch.cc (loongarch_pass_aggregate_num_fpr): Ditto. (loongarch_setup_incoming_varargs): Ditto. (loongarch_emit_move): Ditto. (loongarch_const_vector_bitimm_set_p): Ditto. (loongarch_const_vector_bitimm_clr_p): Ditto. (loongarch_const_vector_same_val_p): Ditto. (loongarch_const_vector_same_bytes_p): Ditto. (loongarch_const_vector_same_int_p): Ditto. (loongarch_const_vector_shuffle_set_p): Ditto. (loongarch_symbol_insns): Ditto. (loongarch_cannot_force_const_mem): Ditto. (loongarch_valid_offset_p): Ditto. (loongarch_valid_index_p): Ditto. (loongarch_classify_address): Ditto. (loongarch_address_insns): Ditto. (loongarch_ldst_scaled_shift): Ditto. (loongarch_const_insns): Ditto. (loongarch_split_move_insn_p): Ditto. (loongarch_subword_at_byte): Ditto. (loongarch_legitimize_move): Ditto. (loongarch_builtin_vectorization_cost): Ditto. (loongarch_split_move_p): Ditto. (loongarch_split_move): Ditto. (loongarch_split_move_insn): Ditto. (loongarch_output_move_index_float): Ditto. (loongarch_split_128bit_move_p): Ditto. (loongarch_split_128bit_move): Ditto. (loongarch_split_lsx_copy_d): Ditto. (loongarch_split_lsx_insert_d): Ditto. (loongarch_split_lsx_fill_d): Ditto. (loongarch_output_move): Ditto. (loongarch_extend_comparands): Ditto. (loongarch_print_operand_reloc): Ditto. (loongarch_print_operand): Ditto. (loongarch_hard_regno_mode_ok_uncached): Ditto. (loongarch_hard_regno_call_part_clobbered): Ditto. (loongarch_hard_regno_nregs): Ditto. (loongarch_class_max_nregs): Ditto. (loongarch_can_change_mode_class): Ditto. (loongarch_mode_ok_for_mov_fmt_p): Ditto. (loongarch_secondary_reload): Ditto. (loongarch_vector_mode_supported_p): Ditto. (loongarch_preferred_simd_mode): Ditto. (loongarch_autovectorize_vector_modes): Ditto. (loongarch_lsx_output_division): Ditto. (loongarch_option_override_internal): Ditto. (loongarch_hard_regno_caller_save_mode): Ditto. (MAX_VECT_LEN): Ditto. (loongarch_spill_class): Ditto. (struct expand_vec_perm_d): Ditto. (loongarch_promote_function_mode): Ditto. (loongarch_expand_vselect): Ditto. (loongarch_starting_frame_offset): Ditto. (loongarch_expand_vselect_vconcat): Ditto. (TARGET_ASM_ALIGNED_DI_OP): Ditto. (TARGET_OPTION_OVERRIDE): Ditto. (TARGET_LEGITIMIZE_ADDRESS): Ditto. (loongarch_expand_lsx_shuffle): Ditto. (TARGET_ASM_SELECT_RTX_SECTION): Ditto. (TARGET_ASM_FUNCTION_RODATA_SECTION): Ditto. (TARGET_SCHED_INIT): Ditto. (TARGET_SCHED_REORDER): Ditto. (TARGET_SCHED_REORDER2): Ditto. (TARGET_SCHED_VARIABLE_ISSUE): Ditto. (TARGET_SCHED_ADJUST_COST): Ditto. (TARGET_SCHED_ISSUE_RATE): Ditto. (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD): Ditto. (TARGET_FUNCTION_OK_FOR_SIBCALL): Ditto. (TARGET_VALID_POINTER_MODE): Ditto. (TARGET_REGISTER_MOVE_COST): Ditto. (TARGET_MEMORY_MOVE_COST): Ditto. (TARGET_RTX_COSTS): Ditto. (TARGET_ADDRESS_COST): Ditto. (TARGET_IN_SMALL_DATA_P): Ditto. (TARGET_PREFERRED_RELOAD_CLASS): Ditto. (TARGET_ASM_FILE_START_FILE_DIRECTIVE): Ditto. (loongarch_expand_vec_perm): Ditto. (TARGET_EXPAND_BUILTIN_VA_START): Ditto. (TARGET_PROMOTE_FUNCTION_MODE): Ditto. (TARGET_RETURN_IN_MEMORY): Ditto. (TARGET_FUNCTION_VALUE): Ditto. (TARGET_LIBCALL_VALUE): Ditto. (loongarch_try_expand_lsx_vshuf_const): Ditto. (TARGET_ASM_OUTPUT_MI_THUNK): Ditto. 
(TARGET_ASM_CAN_OUTPUT_MI_THUNK): Ditto. (TARGET_PRINT_OPERAND): Ditto. (TARGET_PRINT_OPERAND_ADDRESS): Ditto. (TARGET_PRINT_OPERAND_PUNCT_VALID_P): Ditto. (TARGET_SETUP_INCOMING_VARARGS): Ditto. (TARGET_STRICT_ARGUMENT_NAMING): Ditto. (TARGET_MUST_PASS_IN_STACK): Ditto. (TARGET_PASS_BY_REFERENCE): Ditto. (TARGET_ARG_PARTIAL_BYTES): Ditto. (TARGET_FUNCTION_ARG): Ditto. (TARGET_FUNCTION_ARG_ADVANCE): Ditto. (TARGET_FUNCTION_ARG_BOUNDARY): Ditto. (TARGET_SCALAR_MODE_SUPPORTED_P): Ditto. (TARGET_INIT_BUILTINS): Ditto. (loongarch_expand_vec_perm_const_1): Ditto. (loongarch_expand_vec_perm_const_2): Ditto. (loongarch_vectorize_vec_perm_const): Ditto. (loongarch_sched_reassociation_width): Ditto. (loongarch_expand_vector_extract): Ditto. (emit_reduc_half): Ditto. (loongarch_expand_vector_reduc): Ditto. (loongarch_expand_vec_unpack): Ditto. (loongarch_lsx_vec_parallel_const_half): Ditto. (loongarch_constant_elt_p): Ditto. (loongarch_gen_const_int_vector_shuffle): Ditto. (loongarch_expand_vector_init): Ditto. (loongarch_expand_lsx_cmp): Ditto. (loongarch_expand_vec_cond_expr): Ditto. (loongarch_expand_vec_cond_mask_expr): Ditto. (loongarch_expand_vec_cmp): Ditto. (loongarch_case_values_threshold): Ditto. (loongarch_build_const_vector): Ditto. (loongarch_build_signbit_mask): Ditto. (loongarch_builtin_support_vector_misalignment): Ditto. (TARGET_ASM_ALIGNED_HI_OP): Ditto. (TARGET_ASM_ALIGNED_SI_OP): Ditto. (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Ditto. (TARGET_VECTOR_MODE_SUPPORTED_P): Ditto. (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto. (TARGET_VECTORIZE_VEC_PERM_CONST): Ditto. (TARGET_SCHED_REASSOCIATION_WIDTH): Ditto. (TARGET_CASE_VALUES_THRESHOLD): Ditto. (TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Ditto. (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): Ditto. * config/loongarch/loongarch.h (TARGET_SUPPORTS_WIDE_INT): Ditto. (UNITS_PER_LSX_REG): Ditto. (BITS_PER_LSX_REG): Ditto. (BIGGEST_ALIGNMENT): Ditto. (LSX_REG_FIRST): Ditto. (LSX_REG_LAST): Ditto. (LSX_REG_NUM): Ditto. (LSX_REG_P): Ditto. (LSX_REG_RTX_P): Ditto. (IMM13_OPERAND): Ditto. (LSX_SUPPORTED_MODE_P): Ditto. * config/loongarch/loongarch.md (unknown,add,sub,not,nor,and,or,xor): Ditto. (unknown,add,sub,not,nor,and,or,xor,simd_add): Ditto. (unknown,none,QI,HI,SI,DI,TI,SF,DF,TF,FCC): Ditto. (mode" ): Ditto. (DF): Ditto. (SF): Ditto. (sf): Ditto. (DI): Ditto. (SI): Ditto. * config/loongarch/predicates.md (const_lsx_branch_operand): Ditto. (const_uimm3_operand): Ditto. (const_8_to_11_operand): Ditto. (const_12_to_15_operand): Ditto. (const_uimm4_operand): Ditto. (const_uimm6_operand): Ditto. (const_uimm7_operand): Ditto. (const_uimm8_operand): Ditto. (const_imm5_operand): Ditto. (const_imm10_operand): Ditto. (const_imm13_operand): Ditto. (reg_imm10_operand): Ditto. (aq8b_operand): Ditto. (aq8h_operand): Ditto. (aq8w_operand): Ditto. (aq8d_operand): Ditto. (aq10b_operand): Ditto. (aq10h_operand): Ditto. (aq10w_operand): Ditto. (aq10d_operand): Ditto. (aq12b_operand): Ditto. (aq12h_operand): Ditto. (aq12w_operand): Ditto. (aq12d_operand): Ditto. (const_m1_operand): Ditto. (reg_or_m1_operand): Ditto. (const_exp_2_operand): Ditto. (const_exp_4_operand): Ditto. (const_exp_8_operand): Ditto. (const_exp_16_operand): Ditto. (const_exp_32_operand): Ditto. (const_0_or_1_operand): Ditto. (const_0_to_3_operand): Ditto. (const_0_to_7_operand): Ditto. (const_2_or_3_operand): Ditto. (const_4_to_7_operand): Ditto. (const_8_to_15_operand): Ditto. (const_16_to_31_operand): Ditto. 
(qi_mask_operand): Ditto. (hi_mask_operand): Ditto. (si_mask_operand): Ditto. (d_operand): Ditto. (db4_operand): Ditto. (db7_operand): Ditto. (db8_operand): Ditto. (ib3_operand): Ditto. (sb4_operand): Ditto. (sb5_operand): Ditto. (sb8_operand): Ditto. (sd8_operand): Ditto. (ub4_operand): Ditto. (ub8_operand): Ditto. (uh4_operand): Ditto. (uw4_operand): Ditto. (uw5_operand): Ditto. (uw6_operand): Ditto. (uw8_operand): Ditto. (addiur2_operand): Ditto. (addiusp_operand): Ditto. (andi16_operand): Ditto. (movep_src_register): Ditto. (movep_src_operand): Ditto. (fcc_reload_operand): Ditto. (muldiv_target_operand): Ditto. (const_vector_same_val_operand): Ditto. (const_vector_same_simm5_operand): Ditto. (const_vector_same_uimm5_operand): Ditto. (const_vector_same_ximm5_operand): Ditto. (const_vector_same_uimm6_operand): Ditto. (par_const_vector_shf_set_operand): Ditto. (reg_or_vector_same_val_operand): Ditto. (reg_or_vector_same_simm5_operand): Ditto. (reg_or_vector_same_uimm5_operand): Ditto. (reg_or_vector_same_ximm5_operand): Ditto. (reg_or_vector_same_uimm6_operand): Ditto. * doc/md.texi: Ditto. * config/loongarch/lsx.md: New file. gcc/testsuite/ChangeLog: * g++.dg/torture/vshuf-v16qi.C: Skip loongarch*-*-* because of vshuf insn's undefined result when 6 or 7 bit of vector's element is set. * g++.dg/torture/vshuf-v2df.C: Ditto. * g++.dg/torture/vshuf-v2di.C: Ditto. * g++.dg/torture/vshuf-v4sf.C: Ditto. * g++.dg/torture/vshuf-v8hi.C: Ditto. --- gcc/config/loongarch/constraints.md | 131 +- gcc/config/loongarch/loongarch-builtins.cc | 10 + gcc/config/loongarch/loongarch-modes.def | 38 + gcc/config/loongarch/loongarch-protos.h | 31 + gcc/config/loongarch/loongarch.cc | 2214 +++++++++- gcc/config/loongarch/loongarch.h | 65 +- gcc/config/loongarch/loongarch.md | 44 +- gcc/config/loongarch/lsx.md | 4467 ++++++++++++++++++++ gcc/config/loongarch/predicates.md | 333 +- gcc/doc/md.texi | 11 + gcc/testsuite/g++.dg/torture/vshuf-v16qi.C | 1 + gcc/testsuite/g++.dg/torture/vshuf-v2df.C | 2 + gcc/testsuite/g++.dg/torture/vshuf-v2di.C | 1 + gcc/testsuite/g++.dg/torture/vshuf-v4sf.C | 2 +- gcc/testsuite/g++.dg/torture/vshuf-v8hi.C | 1 + 15 files changed, 7167 insertions(+), 184 deletions(-) create mode 100644 gcc/config/loongarch/lsx.md diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md index 7a38cd07ae9..39505e45efe 100644 --- a/gcc/config/loongarch/constraints.md +++ b/gcc/config/loongarch/constraints.md @@ -76,12 +76,13 @@ ;; "Le" ;; "A signed 32-bit constant can be expressed as Lb + I, but not a ;; single Lb or I." -;; "M" <-----unused -;; "N" <-----unused -;; "O" <-----unused -;; "P" <-----unused +;; "M" "A constant that cannot be loaded using @code{lui}, @code{addiu} +;; or @code{ori}." +;; "N" "A constant in the range -65535 to -1 (inclusive)." +;; "O" "A signed 15-bit constant." +;; "P" "A constant in the range 1 to 65535 (inclusive)." ;; "Q" <-----unused -;; "R" <-----unused +;; "R" "An address that can be used in a non-macro load or store." ;; "S" <-----unused ;; "T" <-----unused ;; "U" <-----unused @@ -214,6 +215,63 @@ (define_constraint "Le" (and (match_code "const_int") (match_test "loongarch_addu16i_imm12_operand_p (ival, SImode)"))) +(define_constraint "M" + "A constant that cannot be loaded using @code{lui}, @code{addiu} + or @code{ori}." 
+ (and (match_code "const_int") + (not (match_test "IMM12_OPERAND (ival)")) + (not (match_test "IMM12_OPERAND_UNSIGNED (ival)")) + (not (match_test "LU12I_OPERAND (ival)")))) + +(define_constraint "N" + "A constant in the range -65535 to -1 (inclusive)." + (and (match_code "const_int") + (match_test "ival >= -0xffff && ival < 0"))) + +(define_constraint "O" + "A signed 15-bit constant." + (and (match_code "const_int") + (match_test "ival >= -0x4000 && ival < 0x4000"))) + +(define_constraint "P" + "A constant in the range 1 to 65535 (inclusive)." + (and (match_code "const_int") + (match_test "ival > 0 && ival < 0x10000"))) + +;; General constraints + +(define_memory_constraint "R" + "An address that can be used in a non-macro load or store." + (and (match_code "mem") + (match_test "loongarch_address_insns (XEXP (op, 0), mode, false) == 1"))) +(define_constraint "S" + "@internal + A constant call address." + (and (match_operand 0 "call_insn_operand") + (match_test "CONSTANT_P (op)"))) + +(define_constraint "YG" + "@internal + A vector zero." + (and (match_code "const_vector") + (match_test "op == CONST0_RTX (mode)"))) + +(define_constraint "YA" + "@internal + An unsigned 6-bit constant." + (and (match_code "const_int") + (match_test "UIMM6_OPERAND (ival)"))) + +(define_constraint "YB" + "@internal + A signed 10-bit constant." + (and (match_code "const_int") + (match_test "IMM10_OPERAND (ival)"))) + +(define_constraint "Yb" + "@internal" + (match_operand 0 "qi_mask_operand")) + (define_constraint "Yd" "@internal A constant @code{move_operand} that can be safely loaded using @@ -221,10 +279,73 @@ (define_constraint "Yd" (and (match_operand 0 "move_operand") (match_test "CONSTANT_P (op)"))) +(define_constraint "Yh" + "@internal" + (match_operand 0 "hi_mask_operand")) + +(define_constraint "Yw" + "@internal" + (match_operand 0 "si_mask_operand")) + (define_constraint "Yx" "@internal" (match_operand 0 "low_bitmask_operand")) +(define_constraint "YI" + "@internal + A replicated vector const in which the replicated value is in the range + [-512,511]." + (and (match_code "const_vector") + (match_test "loongarch_const_vector_same_int_p (op, mode, -512, 511)"))) + +(define_constraint "YC" + "@internal + A replicated vector const in which the replicated value has a single + bit set." + (and (match_code "const_vector") + (match_test "loongarch_const_vector_bitimm_set_p (op, mode)"))) + +(define_constraint "YZ" + "@internal + A replicated vector const in which the replicated value has a single + bit clear." + (and (match_code "const_vector") + (match_test "loongarch_const_vector_bitimm_clr_p (op, mode)"))) + +(define_constraint "Unv5" + "@internal + A replicated vector const in which the replicated value is in the range + [-31,0]." + (and (match_code "const_vector") + (match_test "loongarch_const_vector_same_int_p (op, mode, -31, 0)"))) + +(define_constraint "Uuv5" + "@internal + A replicated vector const in which the replicated value is in the range + [0,31]." + (and (match_code "const_vector") + (match_test "loongarch_const_vector_same_int_p (op, mode, 0, 31)"))) + +(define_constraint "Usv5" + "@internal + A replicated vector const in which the replicated value is in the range + [-16,15]." + (and (match_code "const_vector") + (match_test "loongarch_const_vector_same_int_p (op, mode, -16, 15)"))) + +(define_constraint "Uuv6" + "@internal + A replicated vector const in which the replicated value is in the range + [0,63]." 
+ (and (match_code "const_vector") + (match_test "loongarch_const_vector_same_int_p (op, mode, 0, 63)"))) + +(define_constraint "Urv8" + "@internal + A replicated vector const with replicated byte values as well as elements" + (and (match_code "const_vector") + (match_test "loongarch_const_vector_same_bytes_p (op, mode)"))) + (define_memory_constraint "ZC" "A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index b929f224dfa..ebe70a986c3 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. If not see #include "fold-const.h" #include "expr.h" #include "langhooks.h" +#include "emit-rtl.h" /* Macros to create an enumeration identifier for a function prototype. */ #define LARCH_FTYPE_NAME1(A, B) LARCH_##A##_FTYPE_##B @@ -297,6 +298,15 @@ loongarch_prepare_builtin_arg (struct expand_operand *op, tree exp, create_input_operand (op, value, TYPE_MODE (TREE_TYPE (arg))); } +/* Return a const_int vector of VAL with mode MODE. */ + +rtx +loongarch_gen_const_int_vector (machine_mode mode, HOST_WIDE_INT val) +{ + rtx c = gen_int_mode (val, GET_MODE_INNER (mode)); + return gen_const_vec_duplicate (mode, c); +} + /* Expand instruction ICODE as part of a built-in function sequence. Use the first NOPS elements of OPS as the instruction's operands. HAS_TARGET_P is true if operand 0 is a target; it is false if the diff --git a/gcc/config/loongarch/loongarch-modes.def b/gcc/config/loongarch/loongarch-modes.def index 8082ce993a5..6f57b60525d 100644 --- a/gcc/config/loongarch/loongarch-modes.def +++ b/gcc/config/loongarch/loongarch-modes.def @@ -23,3 +23,41 @@ FLOAT_MODE (TF, 16, ieee_quad_format); /* For floating point conditions in FCC registers. */ CC_MODE (FCC); + +/* Vector modes. */ +VECTOR_MODES (INT, 4); /* V4QI V2HI */ +VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */ +VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ + +/* For LARCH LSX 128 bits. */ +VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */ +VECTOR_MODES (FLOAT, 16); /* V4SF V2DF */ + +VECTOR_MODES (INT, 32); /* V32QI V16HI V8SI V4DI */ +VECTOR_MODES (FLOAT, 32); /* V8SF V4DF */ + +/* Double-sized vector modes for vec_concat. */ +/* VECTOR_MODE (INT, QI, 32); V32QI */ +/* VECTOR_MODE (INT, HI, 16); V16HI */ +/* VECTOR_MODE (INT, SI, 8); V8SI */ +/* VECTOR_MODE (INT, DI, 4); V4DI */ +/* VECTOR_MODE (FLOAT, SF, 8); V8SF */ +/* VECTOR_MODE (FLOAT, DF, 4); V4DF */ + +VECTOR_MODE (INT, QI, 64); /* V64QI */ +VECTOR_MODE (INT, HI, 32); /* V32HI */ +VECTOR_MODE (INT, SI, 16); /* V16SI */ +VECTOR_MODE (INT, DI, 8); /* V8DI */ +VECTOR_MODE (FLOAT, SF, 16); /* V16SF */ +VECTOR_MODE (FLOAT, DF, 8); /* V8DF */ + +VECTOR_MODES (FRACT, 4); /* V4QQ V2HQ */ +VECTOR_MODES (UFRACT, 4); /* V4UQQ V2UHQ */ +VECTOR_MODES (ACCUM, 4); /* V2HA */ +VECTOR_MODES (UACCUM, 4); /* V2UHA */ + +INT_MODE (OI, 32); + +/* Keep the OI modes from confusing the compiler into thinking + that these modes could actually be used for computation. They are + only holders for vectors during data movement. 
*/ diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index b71b188507a..fc33527cdcf 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -85,10 +85,18 @@ extern bool loongarch_split_move_p (rtx, rtx); extern void loongarch_split_move (rtx, rtx, rtx); extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode); extern void loongarch_split_plus_constant (rtx *, machine_mode); +extern bool loongarch_split_move_insn_p (rtx, rtx); +extern void loongarch_split_move_insn (rtx, rtx, rtx); +extern void loongarch_split_128bit_move (rtx, rtx); +extern bool loongarch_split_128bit_move_p (rtx, rtx); +extern void loongarch_split_lsx_copy_d (rtx, rtx, rtx, rtx (*)(rtx, rtx, rtx)); +extern void loongarch_split_lsx_insert_d (rtx, rtx, rtx, rtx); +extern void loongarch_split_lsx_fill_d (rtx, rtx); extern const char *loongarch_output_move (rtx, rtx); extern bool loongarch_cfun_has_cprestore_slot_p (void); #ifdef RTX_CODE extern void loongarch_expand_scc (rtx *); +extern bool loongarch_expand_vec_cmp (rtx *); extern void loongarch_expand_conditional_branch (rtx *); extern void loongarch_expand_conditional_move (rtx *); extern void loongarch_expand_conditional_trap (rtx); @@ -110,6 +118,15 @@ extern bool loongarch_small_data_pattern_p (rtx); extern rtx loongarch_rewrite_small_data (rtx); extern rtx loongarch_return_addr (int, rtx); +extern bool loongarch_const_vector_same_val_p (rtx, machine_mode); +extern bool loongarch_const_vector_same_bytes_p (rtx, machine_mode); +extern bool loongarch_const_vector_same_int_p (rtx, machine_mode, HOST_WIDE_INT, + HOST_WIDE_INT); +extern bool loongarch_const_vector_shuffle_set_p (rtx, machine_mode); +extern bool loongarch_const_vector_bitimm_set_p (rtx, machine_mode); +extern bool loongarch_const_vector_bitimm_clr_p (rtx, machine_mode); +extern rtx loongarch_lsx_vec_parallel_const_half (machine_mode, bool); +extern rtx loongarch_gen_const_int_vector (machine_mode, HOST_WIDE_INT); extern enum reg_class loongarch_secondary_reload_class (enum reg_class, machine_mode, rtx, bool); @@ -129,6 +146,7 @@ extern const char *loongarch_output_equal_conditional_branch (rtx_insn *, rtx *, bool); extern const char *loongarch_output_division (const char *, rtx *); +extern const char *loongarch_lsx_output_division (const char *, rtx *); extern const char *loongarch_output_probe_stack_range (rtx, rtx, rtx); extern bool loongarch_hard_regno_rename_ok (unsigned int, unsigned int); extern int loongarch_dspalu_bypass_p (rtx, rtx); @@ -156,6 +174,13 @@ union loongarch_gen_fn_ptrs extern void loongarch_expand_atomic_qihi (union loongarch_gen_fn_ptrs, rtx, rtx, rtx, rtx, rtx); +extern void loongarch_expand_vector_init (rtx, rtx); +extern void loongarch_expand_vec_unpack (rtx op[2], bool, bool); +extern void loongarch_expand_vec_perm (rtx, rtx, rtx, rtx); +extern void loongarch_expand_vector_extract (rtx, rtx, int); +extern void loongarch_expand_vector_reduc (rtx (*)(rtx, rtx, rtx), rtx, rtx); + +extern int loongarch_ldst_scaled_shift (machine_mode); extern bool loongarch_signed_immediate_p (unsigned HOST_WIDE_INT, int, int); extern bool loongarch_unsigned_immediate_p (unsigned HOST_WIDE_INT, int, int); extern bool loongarch_12bit_offset_address_p (rtx, machine_mode); @@ -171,6 +196,9 @@ extern bool loongarch_split_symbol_type (enum loongarch_symbol_type); typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx); extern void loongarch_register_frame_header_opt (void); +extern void 
loongarch_expand_vec_cond_expr (machine_mode, machine_mode, rtx *); +extern void loongarch_expand_vec_cond_mask_expr (machine_mode, machine_mode, + rtx *); /* Routines implemented in loongarch-c.c. */ void loongarch_cpu_cpp_builtins (cpp_reader *); @@ -180,6 +208,9 @@ extern void loongarch_atomic_assign_expand_fenv (tree *, tree *, tree *); extern tree loongarch_builtin_decl (unsigned int, bool); extern rtx loongarch_expand_builtin (tree, rtx, rtx subtarget ATTRIBUTE_UNUSED, machine_mode, int); +extern tree loongarch_builtin_vectorized_function (unsigned int, tree, tree); +extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int); extern tree loongarch_build_builtin_va_list (void); +extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool); #endif /* ! GCC_LOONGARCH_PROTOS_H */ diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 86d58784113..7ffa6bbb73d 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -432,7 +432,7 @@ loongarch_flatten_aggregate_argument (const_tree type, static unsigned loongarch_pass_aggregate_num_fpr (const_tree type, - loongarch_aggregate_field fields[2]) + loongarch_aggregate_field fields[2]) { int n = loongarch_flatten_aggregate_argument (type, fields); @@ -773,7 +773,7 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum, { rtx ptr = plus_constant (Pmode, virtual_incoming_args_rtx, REG_PARM_STACK_SPACE (cfun->decl) - - gp_saved * UNITS_PER_WORD); + - gp_saved * UNITS_PER_WORD); rtx mem = gen_frame_mem (BLKmode, ptr); set_mem_alias_set (mem, get_varargs_alias_set ()); @@ -1049,7 +1049,7 @@ rtx loongarch_emit_move (rtx dest, rtx src) { return (can_create_pseudo_p () ? emit_move_insn (dest, src) - : emit_move_insn_1 (dest, src)); + : emit_move_insn_1 (dest, src)); } /* Save register REG to MEM. Make the instruction frame-related. */ @@ -1675,6 +1675,140 @@ loongarch_symbol_binds_local_p (const_rtx x) return false; } +/* Return true if OP is a constant vector with the number of units in MODE, + and each unit has the same bit set. */ + +bool +loongarch_const_vector_bitimm_set_p (rtx op, machine_mode mode) +{ + if (GET_CODE (op) == CONST_VECTOR && op != CONST0_RTX (mode)) + { + unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (op, 0)); + int vlog2 = exact_log2 (val & GET_MODE_MASK (GET_MODE_INNER (mode))); + + if (vlog2 != -1) + { + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT); + gcc_assert (vlog2 >= 0 && vlog2 <= GET_MODE_UNIT_BITSIZE (mode) - 1); + return loongarch_const_vector_same_val_p (op, mode); + } + } + + return false; +} + +/* Return true if OP is a constant vector with the number of units in MODE, + and each unit has the same bit clear. */ + +bool +loongarch_const_vector_bitimm_clr_p (rtx op, machine_mode mode) +{ + if (GET_CODE (op) == CONST_VECTOR && op != CONSTM1_RTX (mode)) + { + unsigned HOST_WIDE_INT val = ~UINTVAL (CONST_VECTOR_ELT (op, 0)); + int vlog2 = exact_log2 (val & GET_MODE_MASK (GET_MODE_INNER (mode))); + + if (vlog2 != -1) + { + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT); + gcc_assert (vlog2 >= 0 && vlog2 <= GET_MODE_UNIT_BITSIZE (mode) - 1); + return loongarch_const_vector_same_val_p (op, mode); + } + } + + return false; +} + +/* Return true if OP is a constant vector with the number of units in MODE, + and each unit has the same value. 
*/ + +bool +loongarch_const_vector_same_val_p (rtx op, machine_mode mode) +{ + int i, nunits = GET_MODE_NUNITS (mode); + rtx first; + + if (GET_CODE (op) != CONST_VECTOR || GET_MODE (op) != mode) + return false; + + first = CONST_VECTOR_ELT (op, 0); + for (i = 1; i < nunits; i++) + if (!rtx_equal_p (first, CONST_VECTOR_ELT (op, i))) + return false; + + return true; +} + +/* Return true if OP is a constant vector with the number of units in MODE, + and each unit has the same value as well as replicated bytes in the value. +*/ + +bool +loongarch_const_vector_same_bytes_p (rtx op, machine_mode mode) +{ + int i, bytes; + HOST_WIDE_INT val, first_byte; + rtx first; + + if (!loongarch_const_vector_same_val_p (op, mode)) + return false; + + first = CONST_VECTOR_ELT (op, 0); + bytes = GET_MODE_UNIT_SIZE (mode); + val = INTVAL (first); + first_byte = val & 0xff; + for (i = 1; i < bytes; i++) + { + val >>= 8; + if ((val & 0xff) != first_byte) + return false; + } + + return true; +} + +/* Return true if OP is a constant vector with the number of units in MODE, + and each unit has the same integer value in the range [LOW, HIGH]. */ + +bool +loongarch_const_vector_same_int_p (rtx op, machine_mode mode, HOST_WIDE_INT low, + HOST_WIDE_INT high) +{ + HOST_WIDE_INT value; + rtx elem0; + + if (!loongarch_const_vector_same_val_p (op, mode)) + return false; + + elem0 = CONST_VECTOR_ELT (op, 0); + if (!CONST_INT_P (elem0)) + return false; + + value = INTVAL (elem0); + return (value >= low && value <= high); +} + +/* Return true if OP is a constant vector with repeated 4-element sets + in mode MODE. */ + +bool +loongarch_const_vector_shuffle_set_p (rtx op, machine_mode mode) +{ + int nunits = GET_MODE_NUNITS (mode); + int nsets = nunits / 4; + int set = 0; + int i, j; + + /* Check if we have the same 4-element sets. */ + for (j = 0; j < nsets; j++, set = 4 * j) + for (i = 0; i < 4; i++) + if ((INTVAL (XVECEXP (op, 0, i)) + != (INTVAL (XVECEXP (op, 0, set + i)) - set)) + || !IN_RANGE (INTVAL (XVECEXP (op, 0, set + i)), 0, set + 3)) + return false; + return true; +} + /* Return true if rtx constants of mode MODE should be put into a small data section. */ @@ -1792,6 +1926,11 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type) static int loongarch_symbol_insns (enum loongarch_symbol_type type, machine_mode mode) { + /* LSX LD.* and ST.* cannot support loading symbols via an immediate + operand. */ + if (LSX_SUPPORTED_MODE_P (mode)) + return 0; + switch (type) { case SYMBOL_GOT_DISP: @@ -1838,7 +1977,8 @@ loongarch_cannot_force_const_mem (machine_mode mode, rtx x) references, reload will consider forcing C into memory and using one of the instruction's memory alternatives. Returning false here will force it to use an input reload instead. */ - if (CONST_INT_P (x) && loongarch_legitimate_constant_p (mode, x)) + if ((CONST_INT_P (x) || GET_CODE (x) == CONST_VECTOR) + && loongarch_legitimate_constant_p (mode, x)) return true; split_const (x, &base, &offset); @@ -1915,6 +2055,12 @@ loongarch_valid_offset_p (rtx x, machine_mode mode) && !IMM12_OPERAND (INTVAL (x) + GET_MODE_SIZE (mode) - UNITS_PER_WORD)) return false; + /* LSX LD.* and ST.* supports 10-bit signed offsets. 
*/ + if (LSX_SUPPORTED_MODE_P (mode) + && !loongarch_signed_immediate_p (INTVAL (x), 10, + loongarch_ldst_scaled_shift (mode))) + return false; + return true; } @@ -1999,7 +2145,7 @@ loongarch_valid_lo_sum_p (enum loongarch_symbol_type symbol_type, static bool loongarch_valid_index_p (struct loongarch_address_info *info, rtx x, - machine_mode mode, bool strict_p) + machine_mode mode, bool strict_p) { rtx index; @@ -2052,7 +2198,7 @@ loongarch_classify_address (struct loongarch_address_info *info, rtx x, } if (loongarch_valid_base_register_p (XEXP (x, 1), mode, strict_p) - && loongarch_valid_index_p (info, XEXP (x, 0), mode, strict_p)) + && loongarch_valid_index_p (info, XEXP (x, 0), mode, strict_p)) { info->reg = XEXP (x, 1); return true; @@ -2128,6 +2274,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p) { struct loongarch_address_info addr; int factor; + bool lsx_p = !might_split_p && LSX_SUPPORTED_MODE_P (mode); if (!loongarch_classify_address (&addr, x, mode, false)) return 0; @@ -2145,15 +2292,29 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p) switch (addr.type) { case ADDRESS_REG: + if (lsx_p) + { + /* LSX LD.* and ST.* supports 10-bit signed offsets. */ + if (loongarch_signed_immediate_p (INTVAL (addr.offset), 10, + loongarch_ldst_scaled_shift (mode))) + return 1; + else + return 0; + } + return factor; + case ADDRESS_REG_REG: - case ADDRESS_CONST_INT: return factor; + case ADDRESS_CONST_INT: + return lsx_p ? 0 : factor; + case ADDRESS_LO_SUM: return factor + 1; case ADDRESS_SYMBOLIC: - return factor * loongarch_symbol_insns (addr.symbol_type, mode); + return lsx_p ? 0 + : factor * loongarch_symbol_insns (addr.symbol_type, mode); } return 0; } @@ -2179,6 +2340,19 @@ loongarch_signed_immediate_p (unsigned HOST_WIDE_INT x, int bits, return loongarch_unsigned_immediate_p (x, bits, shift); } +/* Return the scale shift that applied to LSX LD/ST address offset. */ + +int +loongarch_ldst_scaled_shift (machine_mode mode) +{ + int shift = exact_log2 (GET_MODE_UNIT_SIZE (mode)); + + if (shift < 0 || shift > 8) + gcc_unreachable (); + + return shift; +} + /* Return true if X is a legitimate address with a 12-bit offset or addr.type is ADDRESS_LO_SUM. MODE is the mode of the value being accessed. */ @@ -2246,6 +2420,9 @@ loongarch_const_insns (rtx x) return loongarch_integer_cost (INTVAL (x)); case CONST_VECTOR: + if (LSX_SUPPORTED_MODE_P (GET_MODE (x)) + && loongarch_const_vector_same_int_p (x, GET_MODE (x), -512, 511)) + return 1; /* Fall through. */ case CONST_DOUBLE: return x == CONST0_RTX (GET_MODE (x)) ? 1 : 0; @@ -2280,7 +2457,7 @@ loongarch_const_insns (rtx x) case SYMBOL_REF: case LABEL_REF: return loongarch_symbol_insns ( - loongarch_classify_symbol (x), MAX_MACHINE_MODE); + loongarch_classify_symbol (x), MAX_MACHINE_MODE); default: return 0; @@ -2302,7 +2479,26 @@ loongarch_split_const_insns (rtx x) return low + high; } -static bool loongarch_split_move_insn_p (rtx dest, rtx src); +bool loongarch_split_move_insn_p (rtx dest, rtx src); +/* Return one word of 128-bit value OP, taking into account the fixed + endianness of certain registers. BYTE selects from the byte address. 
*/ + +rtx +loongarch_subword_at_byte (rtx op, unsigned int byte) +{ + machine_mode mode; + + mode = GET_MODE (op); + if (mode == VOIDmode) + mode = TImode; + + gcc_assert (!FP_REG_RTX_P (op)); + + if (MEM_P (op)) + return loongarch_rewrite_small_data (adjust_address (op, word_mode, byte)); + + return simplify_gen_subreg (word_mode, op, mode, byte); +} /* Return the number of instructions needed to implement INSN, given that it loads from or stores to MEM. */ @@ -3063,9 +3259,10 @@ loongarch_legitimize_move (machine_mode mode, rtx dest, rtx src) /* Both src and dest are non-registers; one special case is supported where the source is (const_int 0) and the store can source the zero register. - */ + LSX is never able to source the zero register directly in + memory operations. */ if (!register_operand (dest, mode) && !register_operand (src, mode) - && !const_0_operand (src, mode)) + && (!const_0_operand (src, mode) || LSX_SUPPORTED_MODE_P (mode))) { loongarch_emit_move (dest, force_reg (mode, src)); return true; @@ -3637,6 +3834,54 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, } } +/* Vectorizer cost model implementation. */ + +/* Implement targetm.vectorize.builtin_vectorization_cost. */ + +static int +loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, + tree vectype, + int misalign ATTRIBUTE_UNUSED) +{ + unsigned elements; + + switch (type_of_cost) + { + case scalar_stmt: + case scalar_load: + case vector_stmt: + case vector_load: + case vec_to_scalar: + case scalar_to_vec: + case cond_branch_not_taken: + case vec_promote_demote: + case scalar_store: + case vector_store: + return 1; + + case vec_perm: + return 1; + + case unaligned_load: + case vector_gather_load: + return 2; + + case unaligned_store: + case vector_scatter_store: + return 10; + + case cond_branch_taken: + return 3; + + case vec_construct: + elements = TYPE_VECTOR_SUBPARTS (vectype); + return elements / 2 + 1; + + default: + gcc_unreachable (); + } +} + /* Implement TARGET_ADDRESS_COST. */ static int @@ -3691,6 +3936,11 @@ loongarch_split_move_p (rtx dest, rtx src) if (FP_REG_RTX_P (src) && MEM_P (dest)) return false; } + + /* Check if LSX moves need splitting. */ + if (LSX_SUPPORTED_MODE_P (GET_MODE (dest))) + return loongarch_split_128bit_move_p (dest, src); + /* Otherwise split all multiword moves. */ return size > UNITS_PER_WORD; } @@ -3704,7 +3954,9 @@ loongarch_split_move (rtx dest, rtx src, rtx insn_) rtx low_dest; gcc_checking_assert (loongarch_split_move_p (dest, src)); - if (FP_REG_RTX_P (dest) || FP_REG_RTX_P (src)) + if (LSX_SUPPORTED_MODE_P (GET_MODE (dest))) + loongarch_split_128bit_move (dest, src); + else if (FP_REG_RTX_P (dest) || FP_REG_RTX_P (src)) { if (!TARGET_64BIT && GET_MODE (dest) == DImode) emit_insn (gen_move_doubleword_fprdi (dest, src)); @@ -3808,12 +4060,21 @@ loongarch_split_plus_constant (rtx *op, machine_mode mode) /* Return true if a move from SRC to DEST in INSN should be split. */ -static bool +bool loongarch_split_move_insn_p (rtx dest, rtx src) { return loongarch_split_move_p (dest, src); } +/* Split a move from SRC to DEST in INSN, given that + loongarch_split_move_insn_p holds. */ + +void +loongarch_split_move_insn (rtx dest, rtx src, rtx insn) +{ + loongarch_split_move (dest, src, insn); +} + /* Implement TARGET_CONSTANT_ALIGNMENT. 
*/ static HOST_WIDE_INT @@ -3860,7 +4121,7 @@ const char * loongarch_output_move_index_float (rtx x, machine_mode mode, bool ldr) { int index = exact_log2 (GET_MODE_SIZE (mode)); - if (!IN_RANGE (index, 2, 3)) + if (!IN_RANGE (index, 2, 4)) return NULL; struct loongarch_address_info info; @@ -3869,20 +4130,216 @@ loongarch_output_move_index_float (rtx x, machine_mode mode, bool ldr) || !loongarch_legitimate_address_p (mode, x, false)) return NULL; - const char *const insn[][2] = + const char *const insn[][3] = { { "fstx.s\t%1,%0", - "fstx.d\t%1,%0" + "fstx.d\t%1,%0", + "vstx\t%w1,%0" }, { "fldx.s\t%0,%1", - "fldx.d\t%0,%1" - }, + "fldx.d\t%0,%1", + "vldx\t%w0,%1" + } }; return insn[ldr][index-2]; } +/* Return true if a 128-bit move from SRC to DEST should be split. */ + +bool +loongarch_split_128bit_move_p (rtx dest, rtx src) +{ + /* LSX-to-LSX moves can be done in a single instruction. */ + if (FP_REG_RTX_P (src) && FP_REG_RTX_P (dest)) + return false; + + /* Check for LSX loads and stores. */ + if (FP_REG_RTX_P (dest) && MEM_P (src)) + return false; + if (FP_REG_RTX_P (src) && MEM_P (dest)) + return false; + + /* Check for LSX set to an immediate const vector with valid replicated + element. */ + if (FP_REG_RTX_P (dest) + && loongarch_const_vector_same_int_p (src, GET_MODE (src), -512, 511)) + return false; + + /* Check for LSX load zero immediate. */ + if (FP_REG_RTX_P (dest) && src == CONST0_RTX (GET_MODE (src))) + return false; + + return true; +} + +/* Split a 128-bit move from SRC to DEST. */ + +void +loongarch_split_128bit_move (rtx dest, rtx src) +{ + int byte, index; + rtx low_dest, low_src, d, s; + + if (FP_REG_RTX_P (dest)) + { + gcc_assert (!MEM_P (src)); + + rtx new_dest = dest; + if (!TARGET_64BIT) + { + if (GET_MODE (dest) != V4SImode) + new_dest = simplify_gen_subreg (V4SImode, dest, GET_MODE (dest), 0); + } + else + { + if (GET_MODE (dest) != V2DImode) + new_dest = simplify_gen_subreg (V2DImode, dest, GET_MODE (dest), 0); + } + + for (byte = 0, index = 0; byte < GET_MODE_SIZE (TImode); + byte += UNITS_PER_WORD, index++) + { + s = loongarch_subword_at_byte (src, byte); + if (!TARGET_64BIT) + emit_insn (gen_lsx_vinsgr2vr_w (new_dest, s, new_dest, + GEN_INT (1 << index))); + else + emit_insn (gen_lsx_vinsgr2vr_d (new_dest, s, new_dest, + GEN_INT (1 << index))); + } + } + else if (FP_REG_RTX_P (src)) + { + gcc_assert (!MEM_P (dest)); + + rtx new_src = src; + if (!TARGET_64BIT) + { + if (GET_MODE (src) != V4SImode) + new_src = simplify_gen_subreg (V4SImode, src, GET_MODE (src), 0); + } + else + { + if (GET_MODE (src) != V2DImode) + new_src = simplify_gen_subreg (V2DImode, src, GET_MODE (src), 0); + } + + for (byte = 0, index = 0; byte < GET_MODE_SIZE (TImode); + byte += UNITS_PER_WORD, index++) + { + d = loongarch_subword_at_byte (dest, byte); + if (!TARGET_64BIT) + emit_insn (gen_lsx_vpickve2gr_w (d, new_src, GEN_INT (index))); + else + emit_insn (gen_lsx_vpickve2gr_d (d, new_src, GEN_INT (index))); + } + } + else + { + low_dest = loongarch_subword_at_byte (dest, 0); + low_src = loongarch_subword_at_byte (src, 0); + gcc_assert (REG_P (low_dest) && REG_P (low_src)); + /* Make sure the source register is not written before reading. 
*/ + if (REGNO (low_dest) <= REGNO (low_src)) + { + for (byte = 0; byte < GET_MODE_SIZE (TImode); + byte += UNITS_PER_WORD) + { + d = loongarch_subword_at_byte (dest, byte); + s = loongarch_subword_at_byte (src, byte); + loongarch_emit_move (d, s); + } + } + else + { + for (byte = GET_MODE_SIZE (TImode) - UNITS_PER_WORD; byte >= 0; + byte -= UNITS_PER_WORD) + { + d = loongarch_subword_at_byte (dest, byte); + s = loongarch_subword_at_byte (src, byte); + loongarch_emit_move (d, s); + } + } + } +} + + +/* Split a COPY_S.D with operands DEST, SRC and INDEX. GEN is a function + used to generate subregs. */ + +void +loongarch_split_lsx_copy_d (rtx dest, rtx src, rtx index, + rtx (*gen_fn)(rtx, rtx, rtx)) +{ + gcc_assert ((GET_MODE (src) == V2DImode && GET_MODE (dest) == DImode) + || (GET_MODE (src) == V2DFmode && GET_MODE (dest) == DFmode)); + + /* Note that low is always from the lower index, and high is always + from the higher index. */ + rtx low = loongarch_subword (dest, false); + rtx high = loongarch_subword (dest, true); + rtx new_src = simplify_gen_subreg (V4SImode, src, GET_MODE (src), 0); + + emit_insn (gen_fn (low, new_src, GEN_INT (INTVAL (index) * 2))); + emit_insn (gen_fn (high, new_src, GEN_INT (INTVAL (index) * 2 + 1))); +} + +/* Split a INSERT.D with operand DEST, SRC1.INDEX and SRC2. */ + +void +loongarch_split_lsx_insert_d (rtx dest, rtx src1, rtx index, rtx src2) +{ + int i; + gcc_assert (GET_MODE (dest) == GET_MODE (src1)); + gcc_assert ((GET_MODE (dest) == V2DImode + && (GET_MODE (src2) == DImode || src2 == const0_rtx)) + || (GET_MODE (dest) == V2DFmode && GET_MODE (src2) == DFmode)); + + /* Note that low is always from the lower index, and high is always + from the higher index. */ + rtx low = loongarch_subword (src2, false); + rtx high = loongarch_subword (src2, true); + rtx new_dest = simplify_gen_subreg (V4SImode, dest, GET_MODE (dest), 0); + rtx new_src1 = simplify_gen_subreg (V4SImode, src1, GET_MODE (src1), 0); + i = exact_log2 (INTVAL (index)); + gcc_assert (i != -1); + + emit_insn (gen_lsx_vinsgr2vr_w (new_dest, low, new_src1, + GEN_INT (1 << (i * 2)))); + emit_insn (gen_lsx_vinsgr2vr_w (new_dest, high, new_dest, + GEN_INT (1 << (i * 2 + 1)))); +} + +/* Split FILL.D. */ + +void +loongarch_split_lsx_fill_d (rtx dest, rtx src) +{ + gcc_assert ((GET_MODE (dest) == V2DImode + && (GET_MODE (src) == DImode || src == const0_rtx)) + || (GET_MODE (dest) == V2DFmode && GET_MODE (src) == DFmode)); + + /* Note that low is always from the lower index, and high is always + from the higher index. */ + rtx low, high; + if (src == const0_rtx) + { + low = src; + high = src; + } + else + { + low = loongarch_subword (src, false); + high = loongarch_subword (src, true); + } + rtx new_dest = simplify_gen_subreg (V4SImode, dest, GET_MODE (dest), 0); + emit_insn (gen_lsx_vreplgr2vr_w (new_dest, low)); + emit_insn (gen_lsx_vinsgr2vr_w (new_dest, high, new_dest, GEN_INT (1 << 1))); + emit_insn (gen_lsx_vinsgr2vr_w (new_dest, high, new_dest, GEN_INT (1 << 3))); +} + /* Return the appropriate instructions to move SRC into DEST. Assume that SRC is operand 1 and DEST is operand 0. 
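The direction check above (compare the low destination and source register numbers, then copy ascending or descending) is the same overlap rule memmove uses; a minimal standalone sketch with made-up names:

  #include <stdio.h>

  static void
  copy_words (unsigned long *dst, unsigned long *src, int nwords)
  {
    int i;
    if (dst <= src)
      for (i = 0; i < nwords; i++)        /* low word first */
        dst[i] = src[i];
    else
      for (i = nwords - 1; i >= 0; i--)   /* high word first */
        dst[i] = src[i];
  }

  int
  main (void)
  {
    unsigned long regs[3] = { 1, 2, 3 };
    copy_words (&regs[1], &regs[0], 2);   /* overlapping, must copy downwards */
    printf ("%lu %lu %lu\n", regs[0], regs[1], regs[2]);   /* 1 1 2 */
    return 0;
  }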
*/ @@ -3894,10 +4351,25 @@ loongarch_output_move (rtx dest, rtx src) enum rtx_code src_code = GET_CODE (src); machine_mode mode = GET_MODE (dest); bool dbl_p = (GET_MODE_SIZE (mode) == 8); + bool lsx_p = LSX_SUPPORTED_MODE_P (mode); if (loongarch_split_move_p (dest, src)) return "#"; + if ((lsx_p) + && dest_code == REG && FP_REG_P (REGNO (dest)) + && src_code == CONST_VECTOR + && CONST_INT_P (CONST_VECTOR_ELT (src, 0))) + { + gcc_assert (loongarch_const_vector_same_int_p (src, mode, -512, 511)); + switch (GET_MODE_SIZE (mode)) + { + case 16: + return "vrepli.%v0\t%w0,%E1"; + default: gcc_unreachable (); + } + } + if ((src_code == REG && GP_REG_P (REGNO (src))) || (src == CONST0_RTX (mode))) { @@ -3907,7 +4379,21 @@ loongarch_output_move (rtx dest, rtx src) return "or\t%0,%z1,$r0"; if (FP_REG_P (REGNO (dest))) - return dbl_p ? "movgr2fr.d\t%0,%z1" : "movgr2fr.w\t%0,%z1"; + { + if (lsx_p) + { + gcc_assert (src == CONST0_RTX (GET_MODE (src))); + switch (GET_MODE_SIZE (mode)) + { + case 16: + return "vrepli.b\t%w0,0"; + default: + gcc_unreachable (); + } + } + + return dbl_p ? "movgr2fr.d\t%0,%z1" : "movgr2fr.w\t%0,%z1"; + } } if (dest_code == MEM) { @@ -3949,7 +4435,10 @@ loongarch_output_move (rtx dest, rtx src) { if (src_code == REG) if (FP_REG_P (REGNO (src))) - return dbl_p ? "movfr2gr.d\t%0,%1" : "movfr2gr.s\t%0,%1"; + { + gcc_assert (!lsx_p); + return dbl_p ? "movfr2gr.d\t%0,%1" : "movfr2gr.s\t%0,%1"; + } if (src_code == MEM) { @@ -3994,7 +4483,7 @@ loongarch_output_move (rtx dest, rtx src) enum loongarch_symbol_type type = SYMBOL_PCREL; if (UNSPEC_ADDRESS_P (x)) - type = UNSPEC_ADDRESS_TYPE (x); + type = UNSPEC_ADDRESS_TYPE (x); if (type == SYMBOL_TLS_LE) return "lu12i.w\t%0,%h1"; @@ -4029,7 +4518,20 @@ loongarch_output_move (rtx dest, rtx src) if (src_code == REG && FP_REG_P (REGNO (src))) { if (dest_code == REG && FP_REG_P (REGNO (dest))) - return dbl_p ? "fmov.d\t%0,%1" : "fmov.s\t%0,%1"; + { + if (lsx_p) + { + switch (GET_MODE_SIZE (mode)) + { + case 16: + return "vori.b\t%w0,%w1,0"; + default: + gcc_unreachable (); + } + } + + return dbl_p ? "fmov.d\t%0,%1" : "fmov.s\t%0,%1"; + } if (dest_code == MEM) { @@ -4040,6 +4542,17 @@ loongarch_output_move (rtx dest, rtx src) if (insn) return insn; + if (lsx_p) + { + switch (GET_MODE_SIZE (mode)) + { + case 16: + return "vst\t%w1,%0"; + default: + gcc_unreachable (); + } + } + return dbl_p ? "fst.d\t%1,%0" : "fst.s\t%1,%0"; } } @@ -4055,6 +4568,16 @@ loongarch_output_move (rtx dest, rtx src) if (insn) return insn; + if (lsx_p) + { + switch (GET_MODE_SIZE (mode)) + { + case 16: + return "vld\t%w0,%1"; + default: + gcc_unreachable (); + } + } return dbl_p ? "fld.d\t%0,%1" : "fld.s\t%0,%1"; } } @@ -4244,6 +4767,7 @@ loongarch_extend_comparands (rtx_code code, rtx *op0, rtx *op1) } } + /* Convert a comparison into something that can be used in a branch. On entry, *OP0 and *OP1 are the values being compared and *CODE is the code used to compare them. Update them to describe the final comparison. */ @@ -5003,9 +5527,12 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part, 'A' Print a _DB suffix if the memory model requires a release. 'b' Print the address of a memory operand, without offset. + 'B' Print CONST_INT OP element 0 of a replicated CONST_VECTOR + as an unsigned byte [0..255]. 'c' Print an integer. 'C' Print the integer branch condition for comparison OP. 'd' Print CONST_INT OP in decimal. + 'E' Print CONST_INT OP element 0 of a replicated CONST_VECTOR in decimal. 'F' Print the FPU branch condition for comparison OP. 
'G' Print a DBAR insn if the memory model requires a release. 'H' Print address 52-61bit relocation associated with OP. @@ -5021,13 +5548,16 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part, 't' Like 'T', but with the EQ/NE cases reversed 'V' Print exact log2 of CONST_INT OP element 0 of a replicated CONST_VECTOR in decimal. + 'v' Print the insn size suffix b, h, w or d for vector modes V16QI, V8HI, + V4SI, V2SI, and w, d for vector modes V4SF, V2DF respectively. 'W' Print the inverse of the FPU branch condition for comparison OP. + 'w' Print a LSX register. 'X' Print CONST_INT OP in hexadecimal format. 'x' Print the low 16 bits of CONST_INT OP in hexadecimal format. 'Y' Print loongarch_fp_conditions[INTVAL (OP)] 'y' Print exact log2 of CONST_INT OP in decimal. 'Z' Print OP and a comma for 8CC, otherwise print nothing. - 'z' Print $0 if OP is zero, otherwise print OP normally. */ + 'z' Print $r0 if OP is zero, otherwise print OP normally. */ static void loongarch_print_operand (FILE *file, rtx op, int letter) @@ -5049,6 +5579,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter) if (loongarch_memmodel_needs_rel_acq_fence ((enum memmodel) INTVAL (op))) fputs ("_db", file); break; + case 'E': + if (GET_CODE (op) == CONST_VECTOR) + { + gcc_assert (loongarch_const_vector_same_val_p (op, GET_MODE (op))); + op = CONST_VECTOR_ELT (op, 0); + gcc_assert (CONST_INT_P (op)); + fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (op)); + } + else + output_operand_lossage ("invalid use of '%%%c'", letter); + break; + case 'c': if (CONST_INT_P (op)) @@ -5099,6 +5641,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter) loongarch_print_operand_reloc (file, op, false /* hi64_part*/, false /* lo_reloc */); break; + case 'B': + if (GET_CODE (op) == CONST_VECTOR) + { + gcc_assert (loongarch_const_vector_same_val_p (op, GET_MODE (op))); + op = CONST_VECTOR_ELT (op, 0); + gcc_assert (CONST_INT_P (op)); + unsigned HOST_WIDE_INT val8 = UINTVAL (op) & GET_MODE_MASK (QImode); + fprintf (file, HOST_WIDE_INT_PRINT_UNSIGNED, val8); + } + else + output_operand_lossage ("invalid use of '%%%c'", letter); + break; case 'm': if (CONST_INT_P (op)) @@ -5145,10 +5699,45 @@ loongarch_print_operand (FILE *file, rtx op, int letter) output_operand_lossage ("invalid use of '%%%c'", letter); break; - case 'W': - loongarch_print_float_branch_condition (file, reverse_condition (code), - letter); - break; + case 'v': + switch (GET_MODE (op)) + { + case E_V16QImode: + case E_V32QImode: + fprintf (file, "b"); + break; + case E_V8HImode: + case E_V16HImode: + fprintf (file, "h"); + break; + case E_V4SImode: + case E_V4SFmode: + case E_V8SImode: + case E_V8SFmode: + fprintf (file, "w"); + break; + case E_V2DImode: + case E_V2DFmode: + case E_V4DImode: + case E_V4DFmode: + fprintf (file, "d"); + break; + default: + output_operand_lossage ("invalid use of '%%%c'", letter); + } + break; + + case 'W': + loongarch_print_float_branch_condition (file, reverse_condition (code), + letter); + break; + + case 'w': + if (code == REG && LSX_REG_P (REGNO (op))) + fprintf (file, "$vr%s", ®_names[REGNO (op)][2]); + else + output_operand_lossage ("invalid use of '%%%c'", letter); + break; case 'x': if (CONST_INT_P (op)) @@ -5521,9 +6110,13 @@ loongarch_hard_regno_mode_ok_uncached (unsigned int regno, machine_mode mode) size = GET_MODE_SIZE (mode); mclass = GET_MODE_CLASS (mode); - if (GP_REG_P (regno)) + if (GP_REG_P (regno) && !LSX_SUPPORTED_MODE_P (mode)) return ((regno - GP_REG_FIRST) & 1) == 0 || size <= 
UNITS_PER_WORD; + /* For LSX, allow TImode and 128-bit vector modes in all FPR. */ + if (FP_REG_P (regno) && LSX_SUPPORTED_MODE_P (mode)) + return true; + if (FP_REG_P (regno)) { if (mclass == MODE_FLOAT @@ -5550,6 +6143,17 @@ loongarch_hard_regno_mode_ok (unsigned int regno, machine_mode mode) return loongarch_hard_regno_mode_ok_p[mode][regno]; } + +static bool +loongarch_hard_regno_call_part_clobbered (unsigned int, + unsigned int regno, machine_mode mode) +{ + if (ISA_HAS_LSX && FP_REG_P (regno) && GET_MODE_SIZE (mode) > 8) + return true; + + return false; +} + /* Implement TARGET_HARD_REGNO_NREGS. */ static unsigned int @@ -5561,7 +6165,12 @@ loongarch_hard_regno_nregs (unsigned int regno, machine_mode mode) return (GET_MODE_SIZE (mode) + 3) / 4; if (FP_REG_P (regno)) - return (GET_MODE_SIZE (mode) + UNITS_PER_FPREG - 1) / UNITS_PER_FPREG; + { + if (LSX_SUPPORTED_MODE_P (mode)) + return 1; + + return (GET_MODE_SIZE (mode) + UNITS_PER_FPREG - 1) / UNITS_PER_FPREG; + } /* All other registers are word-sized. */ return (GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD; @@ -5588,8 +6197,12 @@ loongarch_class_max_nregs (enum reg_class rclass, machine_mode mode) if (hard_reg_set_intersect_p (left, reg_class_contents[(int) FP_REGS])) { if (loongarch_hard_regno_mode_ok (FP_REG_FIRST, mode)) - size = MIN (size, UNITS_PER_FPREG); - + { + if (LSX_SUPPORTED_MODE_P (mode)) + size = MIN (size, UNITS_PER_LSX_REG); + else + size = MIN (size, UNITS_PER_FPREG); + } left &= ~reg_class_contents[FP_REGS]; } if (!hard_reg_set_empty_p (left)) @@ -5600,9 +6213,13 @@ loongarch_class_max_nregs (enum reg_class rclass, machine_mode mode) /* Implement TARGET_CAN_CHANGE_MODE_CLASS. */ static bool -loongarch_can_change_mode_class (machine_mode, machine_mode, +loongarch_can_change_mode_class (machine_mode from, machine_mode to, reg_class_t rclass) { + /* Allow conversions between different LSX vector modes. */ + if (LSX_SUPPORTED_MODE_P (from) && LSX_SUPPORTED_MODE_P (to)) + return true; + return !reg_classes_intersect_p (FP_REGS, rclass); } @@ -5622,7 +6239,7 @@ loongarch_mode_ok_for_mov_fmt_p (machine_mode mode) return TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT; default: - return 0; + return LSX_SUPPORTED_MODE_P (mode); } } @@ -5779,7 +6396,12 @@ loongarch_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, if (regno < 0 || (MEM_P (x) && (GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8))) - /* In this case we can use fld.s, fst.s, fld.d or fst.d. */ + /* In this case we can use lwc1, swc1, ldc1 or sdc1. We'll use + pairs of lwc1s and swc1s if ldc1 and sdc1 are not supported. */ + return NO_REGS; + + if (MEM_P (x) && LSX_SUPPORTED_MODE_P (mode)) + /* In this case we can use LSX LD.* and ST.*. */ return NO_REGS; if (GP_REG_P (regno) || x == CONST0_RTX (mode)) @@ -5814,6 +6436,14 @@ loongarch_valid_pointer_mode (scalar_int_mode mode) return mode == SImode || (TARGET_64BIT && mode == DImode); } +/* Implement TARGET_VECTOR_MODE_SUPPORTED_P. */ + +static bool +loongarch_vector_mode_supported_p (machine_mode mode) +{ + return LSX_SUPPORTED_MODE_P (mode); +} + /* Implement TARGET_SCALAR_MODE_SUPPORTED_P. */ static bool @@ -5826,6 +6456,48 @@ loongarch_scalar_mode_supported_p (scalar_mode mode) return default_scalar_mode_supported_p (mode); } +/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE. 
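The new '%v' output modifier documented earlier only looks at the element width of the vector mode; a standalone restatement (illustrative helper name only):

  #include <stdio.h>

  static char
  lsx_suffix (int elem_bytes)
  {
    switch (elem_bytes)
      {
      case 1: return 'b';   /* V16QI */
      case 2: return 'h';   /* V8HI */
      case 4: return 'w';   /* V4SI, V4SF */
      case 8: return 'd';   /* V2DI, V2DF */
      default: return '?';
      }
  }

  int
  main (void)
  {
    printf ("vrepli.%c\n", lsx_suffix (4));   /* prints "vrepli.w" */
    return 0;
  }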
*/ + +static machine_mode +loongarch_preferred_simd_mode (scalar_mode mode) +{ + if (!ISA_HAS_LSX) + return word_mode; + + switch (mode) + { + case E_QImode: + return E_V16QImode; + case E_HImode: + return E_V8HImode; + case E_SImode: + return E_V4SImode; + case E_DImode: + return E_V2DImode; + + case E_SFmode: + return E_V4SFmode; + + case E_DFmode: + return E_V2DFmode; + + default: + break; + } + return word_mode; +} + +static unsigned int +loongarch_autovectorize_vector_modes (vector_modes *modes, bool) +{ + if (ISA_HAS_LSX) + { + modes->safe_push (V16QImode); + } + + return 0; +} + /* Return the assembly code for INSN, which has the operands given by OPERANDS, and which branches to OPERANDS[0] if some condition is true. BRANCH_IF_TRUE is the asm template that should be used if OPERANDS[0] @@ -5990,6 +6662,29 @@ loongarch_output_division (const char *division, rtx *operands) return s; } +/* Return the assembly code for LSX DIV_{S,U}.DF or MOD_{S,U}.DF instructions, + which has the operands given by OPERANDS. Add in a divide-by-zero check + if needed. */ + +const char * +loongarch_lsx_output_division (const char *division, rtx *operands) +{ + const char *s; + + s = division; + if (TARGET_CHECK_ZERO_DIV) + { + if (ISA_HAS_LSX) + { + output_asm_insn ("vsetallnez.%v0\t$fcc7,%w2",operands); + output_asm_insn (s, operands); + output_asm_insn ("bcnez\t$fcc7,1f", operands); + } + s = "break\t7\n1:"; + } + return s; +} + /* Implement TARGET_SCHED_ADJUST_COST. We assume that anti and output dependencies have no cost. */ @@ -6259,6 +6954,9 @@ loongarch_option_override_internal (struct gcc_options *opts) if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib) error ("%qs cannot be used for compiling a shared library", "-mdirect-extern-access"); + if (loongarch_vector_access_cost == 0) + loongarch_vector_access_cost = 5; + switch (la_target.cmodel) { @@ -6477,64 +7175,60 @@ loongarch_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value) emit_insn (gen_clear_cache (addr, end_addr)); } -/* Implement HARD_REGNO_CALLER_SAVE_MODE. */ - -machine_mode -loongarch_hard_regno_caller_save_mode (unsigned int regno, unsigned int nregs, - machine_mode mode) -{ - /* For performance, avoid saving/restoring upper parts of a register - by returning MODE as save mode when the mode is known. */ - if (mode == VOIDmode) - return choose_hard_reg_mode (regno, nregs, NULL); - else - return mode; -} +/* Generate or test for an insn that supports a constant permutation. */ -/* Implement TARGET_SPILL_CLASS. */ +#define MAX_VECT_LEN 32 -static reg_class_t -loongarch_spill_class (reg_class_t rclass ATTRIBUTE_UNUSED, - machine_mode mode ATTRIBUTE_UNUSED) +struct expand_vec_perm_d { - return NO_REGS; -} - -/* Implement TARGET_PROMOTE_FUNCTION_MODE. */ + rtx target, op0, op1; + unsigned char perm[MAX_VECT_LEN]; + machine_mode vmode; + unsigned char nelt; + bool one_vector_p; + bool testing_p; +}; -/* This function is equivalent to default_promote_function_mode_always_promote - except that it returns a promoted mode even if type is NULL_TREE. This is - needed by libcalls which have no type (only a mode) such as fixed conversion - routines that take a signed or unsigned char/short argument and convert it - to a fixed type. */ +/* Construct (set target (vec_select op0 (parallel perm))) and + return true if that's a valid instruction in the active ISA. 
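For the divide-by-zero check, the emitted sequence (vsetallnez.* on the divisor, the division itself, bcnez past a break 7) is conceptually the following, shown as a plain C sketch with invented names rather than the real expander:

  #include <stdio.h>

  static void
  checked_div_v4si (int *quot, const int *a, const int *b)
  {
    int all_nonzero = 1, i;

    for (i = 0; i < 4; i++)
      if (b[i] == 0)
        all_nonzero = 0;      /* vsetallnez.w leaves this flag in an FCC */

    if (!all_nonzero)
      __builtin_trap ();      /* break 7 when bcnez does not skip it */

    for (i = 0; i < 4; i++)   /* the vdiv.w itself */
      quot[i] = a[i] / b[i];
  }

  int
  main (void)
  {
    int a[4] = { 8, 9, 10, 11 }, b[4] = { 2, 3, 5, 11 }, q[4];
    checked_div_v4si (q, a, b);
    printf ("%d %d %d %d\n", q[0], q[1], q[2], q[3]);   /* 4 3 2 1 */
    return 0;
  }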
*/ -static machine_mode -loongarch_promote_function_mode (const_tree type ATTRIBUTE_UNUSED, - machine_mode mode, - int *punsignedp ATTRIBUTE_UNUSED, - const_tree fntype ATTRIBUTE_UNUSED, - int for_return ATTRIBUTE_UNUSED) +static bool +loongarch_expand_vselect (rtx target, rtx op0, + const unsigned char *perm, unsigned nelt) { - int unsignedp; + rtx rperm[MAX_VECT_LEN], x; + rtx_insn *insn; + unsigned i; - if (type != NULL_TREE) - return promote_mode (type, mode, punsignedp); + for (i = 0; i < nelt; ++i) + rperm[i] = GEN_INT (perm[i]); - unsignedp = *punsignedp; - PROMOTE_MODE (mode, unsignedp, type); - *punsignedp = unsignedp; - return mode; + x = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (nelt, rperm)); + x = gen_rtx_VEC_SELECT (GET_MODE (target), op0, x); + x = gen_rtx_SET (target, x); + + insn = emit_insn (x); + if (recog_memoized (insn) < 0) + { + remove_insn (insn); + return false; + } + return true; } -/* Implement TARGET_STARTING_FRAME_OFFSET. See loongarch_compute_frame_info - for details about the frame layout. */ +/* Similar, but generate a vec_concat from op0 and op1 as well. */ -static HOST_WIDE_INT -loongarch_starting_frame_offset (void) +static bool +loongarch_expand_vselect_vconcat (rtx target, rtx op0, rtx op1, + const unsigned char *perm, unsigned nelt) { - if (FRAME_GROWS_DOWNWARD) - return 0; - return crtl->outgoing_args_size; + machine_mode v2mode; + rtx x; + + if (!GET_MODE_2XWIDER_MODE (GET_MODE (op0)).exists (&v2mode)) + return false; + x = gen_rtx_VEC_CONCAT (v2mode, op0, op1); + return loongarch_expand_vselect (target, x, perm, nelt); } static tree @@ -6797,105 +7491,1279 @@ loongarch_set_handled_components (sbitmap components) #define TARGET_ASM_ALIGNED_SI_OP "\t.word\t" #undef TARGET_ASM_ALIGNED_DI_OP #define TARGET_ASM_ALIGNED_DI_OP "\t.dword\t" +/* Construct (set target (vec_select op0 (parallel selector))) and + return true if that's a valid instruction in the active ISA. 
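loongarch_expand_vselect and its vconcat variant lean on the generic vec_select/vec_concat semantics; as a standalone reminder of what those mean (plain C, no rtl), a selector index below nelt picks from the first operand and an index in [nelt, 2*nelt) picks from the second:

  #include <stdio.h>

  static void
  vec_select (int *target, const int *source,
              const unsigned char *perm, int nelt)
  {
    for (int i = 0; i < nelt; i++)
      target[i] = source[perm[i]];
  }

  int
  main (void)
  {
    int op0[4] = { 10, 11, 12, 13 }, op1[4] = { 20, 21, 22, 23 };
    int both[8], out[4];
    unsigned char perm[4] = { 0, 4, 1, 5 };   /* interleave the low halves */

    for (int i = 0; i < 8; i++)               /* vec_concat of op0 and op1 */
      both[i] = i < 4 ? op0[i] : op1[i - 4];

    vec_select (out, both, perm, 4);
    printf ("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  /* 10 20 11 21 */
    return 0;
  }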
*/ -#undef TARGET_OPTION_OVERRIDE -#define TARGET_OPTION_OVERRIDE loongarch_option_override - -#undef TARGET_LEGITIMIZE_ADDRESS -#define TARGET_LEGITIMIZE_ADDRESS loongarch_legitimize_address - -#undef TARGET_ASM_SELECT_RTX_SECTION -#define TARGET_ASM_SELECT_RTX_SECTION loongarch_select_rtx_section -#undef TARGET_ASM_FUNCTION_RODATA_SECTION -#define TARGET_ASM_FUNCTION_RODATA_SECTION loongarch_function_rodata_section +static bool +loongarch_expand_lsx_shuffle (struct expand_vec_perm_d *d) +{ + rtx x, elts[MAX_VECT_LEN]; + rtvec v; + rtx_insn *insn; + unsigned i; -#undef TARGET_SCHED_INIT -#define TARGET_SCHED_INIT loongarch_sched_init -#undef TARGET_SCHED_REORDER -#define TARGET_SCHED_REORDER loongarch_sched_reorder -#undef TARGET_SCHED_REORDER2 -#define TARGET_SCHED_REORDER2 loongarch_sched_reorder2 -#undef TARGET_SCHED_VARIABLE_ISSUE -#define TARGET_SCHED_VARIABLE_ISSUE loongarch_variable_issue -#undef TARGET_SCHED_ADJUST_COST -#define TARGET_SCHED_ADJUST_COST loongarch_adjust_cost -#undef TARGET_SCHED_ISSUE_RATE -#define TARGET_SCHED_ISSUE_RATE loongarch_issue_rate -#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD -#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \ - loongarch_multipass_dfa_lookahead + if (!ISA_HAS_LSX) + return false; -#undef TARGET_FUNCTION_OK_FOR_SIBCALL -#define TARGET_FUNCTION_OK_FOR_SIBCALL loongarch_function_ok_for_sibcall + for (i = 0; i < d->nelt; i++) + elts[i] = GEN_INT (d->perm[i]); -#undef TARGET_VALID_POINTER_MODE -#define TARGET_VALID_POINTER_MODE loongarch_valid_pointer_mode -#undef TARGET_REGISTER_MOVE_COST -#define TARGET_REGISTER_MOVE_COST loongarch_register_move_cost -#undef TARGET_MEMORY_MOVE_COST -#define TARGET_MEMORY_MOVE_COST loongarch_memory_move_cost -#undef TARGET_RTX_COSTS -#define TARGET_RTX_COSTS loongarch_rtx_costs -#undef TARGET_ADDRESS_COST -#define TARGET_ADDRESS_COST loongarch_address_cost + v = gen_rtvec_v (d->nelt, elts); + x = gen_rtx_PARALLEL (VOIDmode, v); -#undef TARGET_IN_SMALL_DATA_P -#define TARGET_IN_SMALL_DATA_P loongarch_in_small_data_p + if (!loongarch_const_vector_shuffle_set_p (x, d->vmode)) + return false; -#undef TARGET_PREFERRED_RELOAD_CLASS -#define TARGET_PREFERRED_RELOAD_CLASS loongarch_preferred_reload_class + x = gen_rtx_VEC_SELECT (d->vmode, d->op0, x); + x = gen_rtx_SET (d->target, x); -#undef TARGET_ASM_FILE_START_FILE_DIRECTIVE -#define TARGET_ASM_FILE_START_FILE_DIRECTIVE true + insn = emit_insn (x); + if (recog_memoized (insn) < 0) + { + remove_insn (insn); + return false; + } + return true; +} -#undef TARGET_EXPAND_BUILTIN_VA_START -#define TARGET_EXPAND_BUILTIN_VA_START loongarch_va_start +void +loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) +{ + machine_mode vmode = GET_MODE (target); -#undef TARGET_PROMOTE_FUNCTION_MODE -#define TARGET_PROMOTE_FUNCTION_MODE loongarch_promote_function_mode -#undef TARGET_RETURN_IN_MEMORY -#define TARGET_RETURN_IN_MEMORY loongarch_return_in_memory + switch (vmode) + { + case E_V16QImode: + emit_insn (gen_lsx_vshuf_b (target, op1, op0, sel)); + break; + case E_V2DFmode: + emit_insn (gen_lsx_vshuf_d_f (target, sel, op1, op0)); + break; + case E_V2DImode: + emit_insn (gen_lsx_vshuf_d (target, sel, op1, op0)); + break; + case E_V4SFmode: + emit_insn (gen_lsx_vshuf_w_f (target, sel, op1, op0)); + break; + case E_V4SImode: + emit_insn (gen_lsx_vshuf_w (target, sel, op1, op0)); + break; + case E_V8HImode: + emit_insn (gen_lsx_vshuf_h (target, sel, op1, op0)); + break; + default: + break; + } +} -#undef TARGET_FUNCTION_VALUE -#define 
TARGET_FUNCTION_VALUE loongarch_function_value -#undef TARGET_LIBCALL_VALUE -#define TARGET_LIBCALL_VALUE loongarch_libcall_value +static bool +loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d) +{ + int i; + rtx target, op0, op1, sel, tmp; + rtx rperm[MAX_VECT_LEN]; -#undef TARGET_ASM_OUTPUT_MI_THUNK -#define TARGET_ASM_OUTPUT_MI_THUNK loongarch_output_mi_thunk -#undef TARGET_ASM_CAN_OUTPUT_MI_THUNK -#define TARGET_ASM_CAN_OUTPUT_MI_THUNK \ - hook_bool_const_tree_hwi_hwi_const_tree_true + if (d->vmode == E_V2DImode || d->vmode == E_V2DFmode + || d->vmode == E_V4SImode || d->vmode == E_V4SFmode + || d->vmode == E_V8HImode || d->vmode == E_V16QImode) + { + target = d->target; + op0 = d->op0; + op1 = d->one_vector_p ? d->op0 : d->op1; -#undef TARGET_PRINT_OPERAND -#define TARGET_PRINT_OPERAND loongarch_print_operand -#undef TARGET_PRINT_OPERAND_ADDRESS -#define TARGET_PRINT_OPERAND_ADDRESS loongarch_print_operand_address -#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P -#define TARGET_PRINT_OPERAND_PUNCT_VALID_P \ - loongarch_print_operand_punct_valid_p + if (GET_MODE (op0) != GET_MODE (op1) + || GET_MODE (op0) != GET_MODE (target)) + return false; -#undef TARGET_SETUP_INCOMING_VARARGS -#define TARGET_SETUP_INCOMING_VARARGS loongarch_setup_incoming_varargs -#undef TARGET_STRICT_ARGUMENT_NAMING -#define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true -#undef TARGET_MUST_PASS_IN_STACK -#define TARGET_MUST_PASS_IN_STACK must_pass_in_stack_var_size -#undef TARGET_PASS_BY_REFERENCE -#define TARGET_PASS_BY_REFERENCE loongarch_pass_by_reference -#undef TARGET_ARG_PARTIAL_BYTES -#define TARGET_ARG_PARTIAL_BYTES loongarch_arg_partial_bytes -#undef TARGET_FUNCTION_ARG -#define TARGET_FUNCTION_ARG loongarch_function_arg -#undef TARGET_FUNCTION_ARG_ADVANCE -#define TARGET_FUNCTION_ARG_ADVANCE loongarch_function_arg_advance -#undef TARGET_FUNCTION_ARG_BOUNDARY -#define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary + if (d->testing_p) + return true; -#undef TARGET_SCALAR_MODE_SUPPORTED_P -#define TARGET_SCALAR_MODE_SUPPORTED_P loongarch_scalar_mode_supported_p + for (i = 0; i < d->nelt; i += 1) + { + rperm[i] = GEN_INT (d->perm[i]); + } -#undef TARGET_INIT_BUILTINS + if (d->vmode == E_V2DFmode) + { + sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm)); + tmp = gen_rtx_SUBREG (E_V2DImode, d->target, 0); + emit_move_insn (tmp, sel); + } + else if (d->vmode == E_V4SFmode) + { + sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm)); + tmp = gen_rtx_SUBREG (E_V4SImode, d->target, 0); + emit_move_insn (tmp, sel); + } + else + { + sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm)); + emit_move_insn (d->target, sel); + } + + switch (d->vmode) + { + case E_V2DFmode: + emit_insn (gen_lsx_vshuf_d_f (target, target, op1, op0)); + break; + case E_V2DImode: + emit_insn (gen_lsx_vshuf_d (target, target, op1, op0)); + break; + case E_V4SFmode: + emit_insn (gen_lsx_vshuf_w_f (target, target, op1, op0)); + break; + case E_V4SImode: + emit_insn (gen_lsx_vshuf_w (target, target, op1, op0)); + break; + case E_V8HImode: + emit_insn (gen_lsx_vshuf_h (target, target, op1, op0)); + break; + case E_V16QImode: + emit_insn (gen_lsx_vshuf_b (target, op1, op0, target)); + break; + default: + break; + } + + return true; + } + return false; +} + +static bool +loongarch_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) +{ + unsigned int i, nelt = d->nelt; + unsigned char perm2[MAX_VECT_LEN]; + + if (d->one_vector_p) + { + /* Try interleave with 
alternating operands. */ + memcpy (perm2, d->perm, sizeof (perm2)); + for (i = 1; i < nelt; i += 2) + perm2[i] += nelt; + if (loongarch_expand_vselect_vconcat (d->target, d->op0, d->op1, perm2, + nelt)) + return true; + } + else + { + if (loongarch_expand_vselect_vconcat (d->target, d->op0, d->op1, + d->perm, nelt)) + return true; + + /* Try again with swapped operands. */ + for (i = 0; i < nelt; ++i) + perm2[i] = (d->perm[i] + nelt) & (2 * nelt - 1); + if (loongarch_expand_vselect_vconcat (d->target, d->op1, d->op0, perm2, + nelt)) + return true; + } + + if (loongarch_expand_lsx_shuffle (d)) + return true; + return false; +} + +/* Implementation of constant vector permuatation. This function identifies + * recognized pattern of permuation selector argument, and use one or more + * instruction(s) to finish the permutation job correctly. For unsupported + * patterns, it will return false. */ + +static bool +loongarch_expand_vec_perm_const_2 (struct expand_vec_perm_d *d) +{ + /* Although we have the LSX vec_perm template, there's still some + 128bit vector permuatation operations send to vectorize_vec_perm_const. + In this case, we just simpliy wrap them by single vshuf.* instruction, + because LSX vshuf.* instruction just have the same behavior that GCC + expects. */ + return loongarch_try_expand_lsx_vshuf_const (d); +} + +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST. */ + +static bool +loongarch_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, + rtx target, rtx op0, rtx op1, + const vec_perm_indices &sel) +{ + if (vmode != op_mode) + return false; + + struct expand_vec_perm_d d; + int i, nelt, which; + unsigned char orig_perm[MAX_VECT_LEN]; + bool ok; + + d.target = target; + if (op0) + { + rtx nop0 = force_reg (vmode, op0); + if (op0 == op1) + op1 = nop0; + op0 = nop0; + } + if (op1) + op1 = force_reg (vmode, op1); + d.op0 = op0; + d.op1 = op1; + + d.vmode = vmode; + gcc_assert (VECTOR_MODE_P (vmode)); + d.nelt = nelt = GET_MODE_NUNITS (vmode); + d.testing_p = !target; + + /* This is overly conservative, but ensures we don't get an + uninitialized warning on ORIG_PERM. */ + memset (orig_perm, 0, MAX_VECT_LEN); + for (i = which = 0; i < nelt; ++i) + { + int ei = sel[i] & (2 * nelt - 1); + which |= (ei < nelt ? 1 : 2); + orig_perm[i] = ei; + } + memcpy (d.perm, orig_perm, MAX_VECT_LEN); + + switch (which) + { + default: + gcc_unreachable (); + + case 3: + d.one_vector_p = false; + if (d.testing_p || !rtx_equal_p (d.op0, d.op1)) + break; + /* FALLTHRU */ + + case 2: + for (i = 0; i < nelt; ++i) + d.perm[i] &= nelt - 1; + d.op0 = d.op1; + d.one_vector_p = true; + break; + + case 1: + d.op1 = d.op0; + d.one_vector_p = true; + break; + } + + if (d.testing_p) + { + d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1); + d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2); + if (!d.one_vector_p) + d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3); + + ok = loongarch_expand_vec_perm_const_2 (&d); + if (ok) + return ok; + + start_sequence (); + ok = loongarch_expand_vec_perm_const_1 (&d); + end_sequence (); + return ok; + } + + ok = loongarch_expand_vec_perm_const_2 (&d); + if (!ok) + ok = loongarch_expand_vec_perm_const_1 (&d); + + /* If we were given a two-vector permutation which just happened to + have both input vectors equal, we folded this into a one-vector + permutation. 
There are several loongson patterns that are matched + via direct vec_select+vec_concat expansion, but we do not have + support in loongarch_expand_vec_perm_const_1 to guess the adjustment + that should be made for a single operand. Just try again with + the original permutation. */ + if (!ok && which == 3) + { + d.op0 = op0; + d.op1 = op1; + d.one_vector_p = false; + memcpy (d.perm, orig_perm, MAX_VECT_LEN); + ok = loongarch_expand_vec_perm_const_1 (&d); + } + + return ok; +} + +/* Implement TARGET_SCHED_REASSOCIATION_WIDTH. */ + +static int +loongarch_sched_reassociation_width (unsigned int opc, machine_mode mode) +{ + switch (LARCH_ACTUAL_TUNE) + { + case CPU_LOONGARCH64: + case CPU_LA464: + /* Vector part. */ + if (LSX_SUPPORTED_MODE_P (mode)) + { + /* Integer vector instructions execute in FP unit. + The width of integer/float-point vector instructions is 3. */ + return 3; + } + + /* Scalar part. */ + else if (INTEGRAL_MODE_P (mode)) + return 1; + else if (FLOAT_MODE_P (mode)) + { + if (opc == PLUS_EXPR) + { + return 2; + } + return 4; + } + break; + default: + break; + } + return 1; +} + +/* Implement extract a scalar element from vecotr register */ + +void +loongarch_expand_vector_extract (rtx target, rtx vec, int elt) +{ + machine_mode mode = GET_MODE (vec); + machine_mode inner_mode = GET_MODE_INNER (mode); + rtx tmp; + + switch (mode) + { + case E_V8HImode: + case E_V16QImode: + break; + + default: + break; + } + + tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, GEN_INT (elt))); + tmp = gen_rtx_VEC_SELECT (inner_mode, vec, tmp); + + /* Let the rtl optimizers know about the zero extension performed. */ + if (inner_mode == QImode || inner_mode == HImode) + { + tmp = gen_rtx_ZERO_EXTEND (SImode, tmp); + target = gen_lowpart (SImode, target); + } + if (inner_mode == SImode || inner_mode == DImode) + { + tmp = gen_rtx_SIGN_EXTEND (inner_mode, tmp); + } + + emit_insn (gen_rtx_SET (target, tmp)); +} + +/* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC + to bits 0 ... i / 2 - 1 of vector DEST, which has the same mode. + The upper bits of DEST are undefined, though they shouldn't cause + exceptions (some bits from src or all zeros are ok). */ + +static void +emit_reduc_half (rtx dest, rtx src, int i) +{ + rtx tem, d = dest; + switch (GET_MODE (src)) + { + case E_V4SFmode: + tem = gen_lsx_vbsrl_w_f (dest, src, GEN_INT (i == 128 ? 8 : 4)); + break; + case E_V2DFmode: + tem = gen_lsx_vbsrl_d_f (dest, src, GEN_INT (8)); + break; + case E_V16QImode: + case E_V8HImode: + case E_V4SImode: + case E_V2DImode: + d = gen_reg_rtx (V2DImode); + tem = gen_lsx_vbsrl_d (d, gen_lowpart (V2DImode, src), GEN_INT (i/16)); + break; + default: + gcc_unreachable (); + } + emit_insn (tem); + if (d != dest) + emit_move_insn (dest, gen_lowpart (GET_MODE (dest), d)); +} + +/* Expand a vector reduction. FN is the binary pattern to reduce; + DEST is the destination; IN is the input vector. */ + +void +loongarch_expand_vector_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in) +{ + rtx half, dst, vec = in; + machine_mode mode = GET_MODE (in); + int i; + + for (i = GET_MODE_BITSIZE (mode); + i > GET_MODE_UNIT_BITSIZE (mode); + i >>= 1) + { + half = gen_reg_rtx (mode); + emit_reduc_half (half, vec, i); + if (i == GET_MODE_UNIT_BITSIZE (mode) * 2) + dst = dest; + else + dst = gen_reg_rtx (mode); + emit_insn (fn (dst, half, vec)); + vec = dst; + } +} + +/* Expand an integral vector unpack operation. 
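The reduction expander above halves the active width each round: emit_reduc_half shifts the upper half down (vbsrl.*) and the caller combines it with the lower half, so a sum over four lanes takes two combine steps. A standalone sketch of that ladder for an integer add (illustrative names only):

  #include <stdio.h>

  static void
  reduce_add_v4si (int *v)   /* four elements, clobbered in place */
  {
    for (int width = 4; width > 1; width /= 2)
      for (int i = 0; i < width / 2; i++)
        v[i] += v[i + width / 2];   /* shift upper half down, then vadd.w */
  }

  int
  main (void)
  {
    int v[4] = { 1, 2, 3, 4 };
    reduce_add_v4si (v);
    printf ("%d\n", v[0]);   /* 10 */
    return 0;
  }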
*/ + +void +loongarch_expand_vec_unpack (rtx operands[2], bool unsigned_p, bool high_p) +{ + machine_mode imode = GET_MODE (operands[1]); + rtx (*unpack) (rtx, rtx, rtx); + rtx (*cmpFunc) (rtx, rtx, rtx); + rtx tmp, dest; + + if (ISA_HAS_LSX) + { + switch (imode) + { + case E_V4SImode: + if (high_p != 0) + unpack = gen_lsx_vilvh_w; + else + unpack = gen_lsx_vilvl_w; + + cmpFunc = gen_lsx_vslt_w; + break; + + case E_V8HImode: + if (high_p != 0) + unpack = gen_lsx_vilvh_h; + else + unpack = gen_lsx_vilvl_h; + + cmpFunc = gen_lsx_vslt_h; + break; + + case E_V16QImode: + if (high_p != 0) + unpack = gen_lsx_vilvh_b; + else + unpack = gen_lsx_vilvl_b; + + cmpFunc = gen_lsx_vslt_b; + break; + + default: + gcc_unreachable (); + break; + } + + if (!unsigned_p) + { + /* Extract sign extention for each element comparing each element + with immediate zero. */ + tmp = gen_reg_rtx (imode); + emit_insn (cmpFunc (tmp, operands[1], CONST0_RTX (imode))); + } + else + tmp = force_reg (imode, CONST0_RTX (imode)); + + dest = gen_reg_rtx (imode); + + emit_insn (unpack (dest, operands[1], tmp)); + emit_move_insn (operands[0], gen_lowpart (GET_MODE (operands[0]), dest)); + return; + } + gcc_unreachable (); +} + +/* Construct and return PARALLEL RTX with CONST_INTs for HIGH (high_p == TRUE) + or LOW (high_p == FALSE) half of a vector for mode MODE. */ + +rtx +loongarch_lsx_vec_parallel_const_half (machine_mode mode, bool high_p) +{ + int nunits = GET_MODE_NUNITS (mode); + rtvec v = rtvec_alloc (nunits / 2); + int base; + int i; + + base = high_p ? nunits / 2 : 0; + + for (i = 0; i < nunits / 2; i++) + RTVEC_ELT (v, i) = GEN_INT (base + i); + + return gen_rtx_PARALLEL (VOIDmode, v); +} + +/* A subroutine of loongarch_expand_vec_init, match constant vector + elements. */ + +static inline bool +loongarch_constant_elt_p (rtx x) +{ + return CONST_INT_P (x) || GET_CODE (x) == CONST_DOUBLE; +} + +rtx +loongarch_gen_const_int_vector_shuffle (machine_mode mode, int val) +{ + int nunits = GET_MODE_NUNITS (mode); + int nsets = nunits / 4; + rtx elts[MAX_VECT_LEN]; + int set = 0; + int i, j; + + /* Generate a const_int vector replicating the same 4-element set + from an immediate. */ + for (j = 0; j < nsets; j++, set = 4 * j) + for (i = 0; i < 4; i++) + elts[set + i] = GEN_INT (set + ((val >> (2 * i)) & 0x3)); + + return gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (nunits, elts)); +} + +/* Expand a vector initialization. 
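The unpack expansion widens by interleaving the input with either zero lanes (unsigned) or with the result of comparing each element against zero (signed, vslt.*), so every narrow element ends up paired with its extension bits. A standalone little-endian model for signed V8HI to V4SI (made-up helper name, not the expander itself):

  #include <stdint.h>
  #include <stdio.h>

  static void
  unpack_low_s16_to_s32 (int32_t out[4], const int16_t in[8])
  {
    for (int i = 0; i < 4; i++)
      {
        int16_t ext = in[i] < 0 ? -1 : 0;   /* vslt.h against zero */
        /* vilvl.h pairs in[i] with ext; read back as one 32-bit lane
           on a little-endian target this is the sign-extended value.  */
        out[i] = (int32_t) ((uint32_t) (uint16_t) in[i]
                            | ((uint32_t) (uint16_t) ext << 16));
      }
  }

  int
  main (void)
  {
    int16_t in[8] = { -3, 7, -100, 30000, 0, 0, 0, 0 };
    int32_t out[4];
    unpack_low_s16_to_s32 (out, in);
    printf ("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  /* -3 7 -100 30000 */
    return 0;
  }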
*/ + +void +loongarch_expand_vector_init (rtx target, rtx vals) +{ + machine_mode vmode = GET_MODE (target); + machine_mode imode = GET_MODE_INNER (vmode); + unsigned i, nelt = GET_MODE_NUNITS (vmode); + unsigned nvar = 0; + bool all_same = true; + rtx x; + + for (i = 0; i < nelt; ++i) + { + x = XVECEXP (vals, 0, i); + if (!loongarch_constant_elt_p (x)) + nvar++; + if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0))) + all_same = false; + } + + if (ISA_HAS_LSX) + { + if (all_same) + { + rtx same = XVECEXP (vals, 0, 0); + rtx temp, temp2; + + if (CONST_INT_P (same) && nvar == 0 + && loongarch_signed_immediate_p (INTVAL (same), 10, 0)) + { + switch (vmode) + { + case E_V16QImode: + case E_V8HImode: + case E_V4SImode: + case E_V2DImode: + temp = gen_rtx_CONST_VECTOR (vmode, XVEC (vals, 0)); + emit_move_insn (target, temp); + return; + + default: + gcc_unreachable (); + } + } + temp = gen_reg_rtx (imode); + if (imode == GET_MODE (same)) + temp2 = same; + else if (GET_MODE_SIZE (imode) >= UNITS_PER_WORD) + { + if (GET_CODE (same) == MEM) + { + rtx reg_tmp = gen_reg_rtx (GET_MODE (same)); + loongarch_emit_move (reg_tmp, same); + temp2 = simplify_gen_subreg (imode, reg_tmp, + GET_MODE (reg_tmp), 0); + } + else + temp2 = simplify_gen_subreg (imode, same, GET_MODE (same), 0); + } + else + { + if (GET_CODE (same) == MEM) + { + rtx reg_tmp = gen_reg_rtx (GET_MODE (same)); + loongarch_emit_move (reg_tmp, same); + temp2 = lowpart_subreg (imode, reg_tmp, GET_MODE (reg_tmp)); + } + else + temp2 = lowpart_subreg (imode, same, GET_MODE (same)); + } + emit_move_insn (temp, temp2); + + switch (vmode) + { + case E_V16QImode: + case E_V8HImode: + case E_V4SImode: + case E_V2DImode: + loongarch_emit_move (target, gen_rtx_VEC_DUPLICATE (vmode, temp)); + break; + + case E_V4SFmode: + emit_insn (gen_lsx_vreplvei_w_f_scalar (target, temp)); + break; + + case E_V2DFmode: + emit_insn (gen_lsx_vreplvei_d_f_scalar (target, temp)); + break; + + default: + gcc_unreachable (); + } + } + else + { + emit_move_insn (target, CONST0_RTX (vmode)); + + for (i = 0; i < nelt; ++i) + { + rtx temp = gen_reg_rtx (imode); + emit_move_insn (temp, XVECEXP (vals, 0, i)); + switch (vmode) + { + case E_V16QImode: + if (i == 0) + emit_insn (gen_lsx_vreplvei_b_scalar (target, temp)); + else + emit_insn (gen_vec_setv16qi (target, temp, GEN_INT (i))); + break; + + case E_V8HImode: + if (i == 0) + emit_insn (gen_lsx_vreplvei_h_scalar (target, temp)); + else + emit_insn (gen_vec_setv8hi (target, temp, GEN_INT (i))); + break; + + case E_V4SImode: + if (i == 0) + emit_insn (gen_lsx_vreplvei_w_scalar (target, temp)); + else + emit_insn (gen_vec_setv4si (target, temp, GEN_INT (i))); + break; + + case E_V2DImode: + if (i == 0) + emit_insn (gen_lsx_vreplvei_d_scalar (target, temp)); + else + emit_insn (gen_vec_setv2di (target, temp, GEN_INT (i))); + break; + + case E_V4SFmode: + if (i == 0) + emit_insn (gen_lsx_vreplvei_w_f_scalar (target, temp)); + else + emit_insn (gen_vec_setv4sf (target, temp, GEN_INT (i))); + break; + + case E_V2DFmode: + if (i == 0) + emit_insn (gen_lsx_vreplvei_d_f_scalar (target, temp)); + else + emit_insn (gen_vec_setv2df (target, temp, GEN_INT (i))); + break; + + default: + gcc_unreachable (); + } + } + } + return; + } + + /* Load constants from the pool, or whatever's handy. */ + if (nvar == 0) + { + emit_move_insn (target, gen_rtx_CONST_VECTOR (vmode, XVEC (vals, 0))); + return; + } + + /* For two-part initialization, always use CONCAT. 
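loongarch_expand_vector_init above effectively chooses between three strategies; the standalone sketch below (hypothetical enum and helper, and a deliberately simplified view of the real conditions) captures the decision order: a duplicated constant that fits a signed 10-bit immediate becomes a single vrepli, any other duplicated value becomes a broadcast, and everything else is built element by element.

  #include <stdio.h>

  enum init_plan { PLAN_VREPLI, PLAN_BROADCAST, PLAN_INSERT_EACH };

  static enum init_plan
  choose_plan (const int *elts, int n, int all_constant)
  {
    int all_same = 1;
    for (int i = 1; i < n; i++)
      if (elts[i] != elts[0])
        all_same = 0;

    if (all_same && all_constant && elts[0] >= -512 && elts[0] <= 511)
      return PLAN_VREPLI;        /* signed 10-bit immediate splat */
    if (all_same)
      return PLAN_BROADCAST;     /* vreplgr2vr / vreplvei */
    return PLAN_INSERT_EACH;     /* one insert per element */
  }

  int
  main (void)
  {
    int a[4] = { 7, 7, 7, 7 }, b[4] = { 1, 2, 3, 4 };
    printf ("%d %d\n", choose_plan (a, 4, 1), choose_plan (b, 4, 1));  /* 0 2 */
    return 0;
  }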
*/ + if (nelt == 2) + { + rtx op0 = force_reg (imode, XVECEXP (vals, 0, 0)); + rtx op1 = force_reg (imode, XVECEXP (vals, 0, 1)); + x = gen_rtx_VEC_CONCAT (vmode, op0, op1); + emit_insn (gen_rtx_SET (target, x)); + return; + } + + /* Loongson is the only cpu with vectors with more elements. */ + gcc_assert (0); +} + +/* Implement HARD_REGNO_CALLER_SAVE_MODE. */ + +machine_mode +loongarch_hard_regno_caller_save_mode (unsigned int regno, unsigned int nregs, + machine_mode mode) +{ + /* For performance, avoid saving/restoring upper parts of a register + by returning MODE as save mode when the mode is known. */ + if (mode == VOIDmode) + return choose_hard_reg_mode (regno, nregs, NULL); + else + return mode; +} + +/* Generate RTL for comparing CMP_OP0 and CMP_OP1 using condition COND and + store the result -1 or 0 in DEST. */ + +static void +loongarch_expand_lsx_cmp (rtx dest, enum rtx_code cond, rtx op0, rtx op1) +{ + machine_mode cmp_mode = GET_MODE (op0); + int unspec = -1; + bool negate = false; + + switch (cmp_mode) + { + case E_V16QImode: + case E_V32QImode: + case E_V8HImode: + case E_V16HImode: + case E_V4SImode: + case E_V8SImode: + case E_V2DImode: + case E_V4DImode: + switch (cond) + { + case NE: + cond = reverse_condition (cond); + negate = true; + break; + case EQ: + case LT: + case LE: + case LTU: + case LEU: + break; + case GE: + case GT: + case GEU: + case GTU: + std::swap (op0, op1); + cond = swap_condition (cond); + break; + default: + gcc_unreachable (); + } + loongarch_emit_binary (cond, dest, op0, op1); + if (negate) + emit_move_insn (dest, gen_rtx_NOT (GET_MODE (dest), dest)); + break; + + case E_V4SFmode: + case E_V2DFmode: + switch (cond) + { + case UNORDERED: + case ORDERED: + case EQ: + case NE: + case UNEQ: + case UNLE: + case UNLT: + break; + case LTGT: cond = NE; break; + case UNGE: cond = UNLE; std::swap (op0, op1); break; + case UNGT: cond = UNLT; std::swap (op0, op1); break; + case LE: unspec = UNSPEC_LSX_VFCMP_SLE; break; + case LT: unspec = UNSPEC_LSX_VFCMP_SLT; break; + case GE: unspec = UNSPEC_LSX_VFCMP_SLE; std::swap (op0, op1); break; + case GT: unspec = UNSPEC_LSX_VFCMP_SLT; std::swap (op0, op1); break; + default: + gcc_unreachable (); + } + if (unspec < 0) + loongarch_emit_binary (cond, dest, op0, op1); + else + { + rtx x = gen_rtx_UNSPEC (GET_MODE (dest), + gen_rtvec (2, op0, op1), unspec); + emit_insn (gen_rtx_SET (dest, x)); + } + break; + + default: + gcc_unreachable (); + break; + } +} + +/* Expand VEC_COND_EXPR, where: + MODE is mode of the result + VIMODE equivalent integer mode + OPERANDS operands of VEC_COND_EXPR. */ + +void +loongarch_expand_vec_cond_expr (machine_mode mode, machine_mode vimode, + rtx *operands) +{ + rtx cond = operands[3]; + rtx cmp_op0 = operands[4]; + rtx cmp_op1 = operands[5]; + rtx cmp_res = gen_reg_rtx (vimode); + + loongarch_expand_lsx_cmp (cmp_res, GET_CODE (cond), cmp_op0, cmp_op1); + + /* We handle the following cases: + 1) r = a CMP b ? -1 : 0 + 2) r = a CMP b ? -1 : v + 3) r = a CMP b ? v : 0 + 4) r = a CMP b ? v1 : v2 */ + + /* Case (1) above. We only move the results. */ + if (operands[1] == CONSTM1_RTX (vimode) + && operands[2] == CONST0_RTX (vimode)) + emit_move_insn (operands[0], cmp_res); + else + { + rtx src1 = gen_reg_rtx (vimode); + rtx src2 = gen_reg_rtx (vimode); + rtx mask = gen_reg_rtx (vimode); + rtx bsel; + + /* Move the vector result to use it as a mask. 
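loongarch_expand_lsx_cmp canonicalizes integer comparisons onto the forms the hardware provides: NE becomes EQ followed by inverting the mask, and the greater-than family swaps its operands into less-than form. A standalone scalar model of one mask lane (invented helper, single-character condition codes purely for the demo):

  #include <stdio.h>

  static int
  cmp_lane (char op, int a, int b)   /* returns -1 or 0, like one mask lane */
  {
    switch (op)
      {
      case '<': return a < b ? -1 : 0;        /* vslt */
      case 'L': return a <= b ? -1 : 0;       /* vsle */
      case '=': return a == b ? -1 : 0;       /* vseq */
      case '>': return cmp_lane ('<', b, a);  /* swap operands */
      case 'G': return cmp_lane ('L', b, a);  /* swap operands */
      case '!': return ~cmp_lane ('=', a, b); /* EQ, then NOT the mask */
      default:  return 0;
      }
  }

  int
  main (void)
  {
    printf ("%d %d %d\n",
            cmp_lane ('>', 5, 3), cmp_lane ('!', 2, 2), cmp_lane ('G', 3, 3));
    /* -1 0 -1 */
    return 0;
  }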
*/ + emit_move_insn (mask, cmp_res); + + if (register_operand (operands[1], mode)) + { + rtx xop1 = operands[1]; + if (mode != vimode) + { + xop1 = gen_reg_rtx (vimode); + emit_move_insn (xop1, gen_rtx_SUBREG (vimode, operands[1], 0)); + } + emit_move_insn (src1, xop1); + } + else + { + gcc_assert (operands[1] == CONSTM1_RTX (vimode)); + /* Case (2) if the below doesn't move the mask to src2. */ + emit_move_insn (src1, mask); + } + + if (register_operand (operands[2], mode)) + { + rtx xop2 = operands[2]; + if (mode != vimode) + { + xop2 = gen_reg_rtx (vimode); + emit_move_insn (xop2, gen_rtx_SUBREG (vimode, operands[2], 0)); + } + emit_move_insn (src2, xop2); + } + else + { + gcc_assert (operands[2] == CONST0_RTX (mode)); + /* Case (3) if the above didn't move the mask to src1. */ + emit_move_insn (src2, mask); + } + + /* We deal with case (4) if the mask wasn't moved to either src1 or src2. + In any case, we eventually do vector mask-based copy. */ + bsel = gen_rtx_IOR (vimode, + gen_rtx_AND (vimode, + gen_rtx_NOT (vimode, mask), src2), + gen_rtx_AND (vimode, mask, src1)); + /* The result is placed back to a register with the mask. */ + emit_insn (gen_rtx_SET (mask, bsel)); + emit_move_insn (operands[0], gen_rtx_SUBREG (mode, mask, 0)); + } +} + +void +loongarch_expand_vec_cond_mask_expr (machine_mode mode, machine_mode vimode, + rtx *operands) +{ + rtx cmp_res = operands[3]; + + /* We handle the following cases: + 1) r = a CMP b ? -1 : 0 + 2) r = a CMP b ? -1 : v + 3) r = a CMP b ? v : 0 + 4) r = a CMP b ? v1 : v2 */ + + /* Case (1) above. We only move the results. */ + if (operands[1] == CONSTM1_RTX (vimode) + && operands[2] == CONST0_RTX (vimode)) + emit_move_insn (operands[0], cmp_res); + else + { + rtx src1 = gen_reg_rtx (vimode); + rtx src2 = gen_reg_rtx (vimode); + rtx mask = gen_reg_rtx (vimode); + rtx bsel; + + /* Move the vector result to use it as a mask. */ + emit_move_insn (mask, cmp_res); + + if (register_operand (operands[1], mode)) + { + rtx xop1 = operands[1]; + if (mode != vimode) + { + xop1 = gen_reg_rtx (vimode); + emit_move_insn (xop1, gen_rtx_SUBREG (vimode, operands[1], 0)); + } + emit_move_insn (src1, xop1); + } + else + { + gcc_assert (operands[1] == CONSTM1_RTX (vimode)); + /* Case (2) if the below doesn't move the mask to src2. */ + emit_move_insn (src1, mask); + } + + if (register_operand (operands[2], mode)) + { + rtx xop2 = operands[2]; + if (mode != vimode) + { + xop2 = gen_reg_rtx (vimode); + emit_move_insn (xop2, gen_rtx_SUBREG (vimode, operands[2], 0)); + } + emit_move_insn (src2, xop2); + } + else + { + gcc_assert (operands[2] == CONST0_RTX (mode)); + /* Case (3) if the above didn't move the mask to src1. */ + emit_move_insn (src2, mask); + } + + /* We deal with case (4) if the mask wasn't moved to either src1 or src2. + In any case, we eventually do vector mask-based copy. */ + bsel = gen_rtx_IOR (vimode, + gen_rtx_AND (vimode, + gen_rtx_NOT (vimode, mask), src2), + gen_rtx_AND (vimode, mask, src1)); + /* The result is placed back to a register with the mask. */ + emit_insn (gen_rtx_SET (mask, bsel)); + emit_move_insn (operands[0], gen_rtx_SUBREG (mode, mask, 0)); + } +} + +/* Expand integer vector comparison */ +bool +loongarch_expand_vec_cmp (rtx operands[]) +{ + + rtx_code code = GET_CODE (operands[1]); + loongarch_expand_lsx_cmp (operands[0], code, operands[2], operands[3]); + return true; +} + +/* Implement TARGET_CASE_VALUES_THRESHOLD. 
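Both VEC_COND expanders end in the same mask-select identity, result = (mask AND then) OR (NOT mask AND else), with each mask lane all-ones or all-zero. A one-lane standalone check:

  #include <stdint.h>
  #include <stdio.h>

  static uint32_t
  bsel (uint32_t mask, uint32_t if_true, uint32_t if_false)
  {
    return (mask & if_true) | (~mask & if_false);
  }

  int
  main (void)
  {
    printf ("%u %u\n",
            bsel (0xffffffffu, 111, 222),   /* lane where the compare held: 111 */
            bsel (0x00000000u, 111, 222));  /* lane where it failed: 222 */
    return 0;
  }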
*/ + +unsigned int +loongarch_case_values_threshold (void) +{ + return default_case_values_threshold (); +} + +/* Implement TARGET_SPILL_CLASS. */ + +static reg_class_t +loongarch_spill_class (reg_class_t rclass ATTRIBUTE_UNUSED, + machine_mode mode ATTRIBUTE_UNUSED) +{ + return NO_REGS; +} + +/* Implement TARGET_PROMOTE_FUNCTION_MODE. */ + +/* This function is equivalent to default_promote_function_mode_always_promote + except that it returns a promoted mode even if type is NULL_TREE. This is + needed by libcalls which have no type (only a mode) such as fixed conversion + routines that take a signed or unsigned char/short argument and convert it + to a fixed type. */ + +static machine_mode +loongarch_promote_function_mode (const_tree type ATTRIBUTE_UNUSED, + machine_mode mode, + int *punsignedp ATTRIBUTE_UNUSED, + const_tree fntype ATTRIBUTE_UNUSED, + int for_return ATTRIBUTE_UNUSED) +{ + int unsignedp; + + if (type != NULL_TREE) + return promote_mode (type, mode, punsignedp); + + unsignedp = *punsignedp; + PROMOTE_MODE (mode, unsignedp, type); + *punsignedp = unsignedp; + return mode; +} + +/* Implement TARGET_STARTING_FRAME_OFFSET. See loongarch_compute_frame_info + for details about the frame layout. */ + +static HOST_WIDE_INT +loongarch_starting_frame_offset (void) +{ + if (FRAME_GROWS_DOWNWARD) + return 0; + return crtl->outgoing_args_size; +} + +/* A subroutine of loongarch_build_signbit_mask. If VECT is true, + then replicate the value for all elements of the vector + register. */ + +rtx +loongarch_build_const_vector (machine_mode mode, bool vect, rtx value) +{ + int i, n_elt; + rtvec v; + machine_mode scalar_mode; + + switch (mode) + { + case E_V32QImode: + case E_V16QImode: + case E_V32HImode: + case E_V16HImode: + case E_V8HImode: + case E_V8SImode: + case E_V4SImode: + case E_V8DImode: + case E_V4DImode: + case E_V2DImode: + gcc_assert (vect); + /* FALLTHRU */ + case E_V8SFmode: + case E_V4SFmode: + case E_V8DFmode: + case E_V4DFmode: + case E_V2DFmode: + n_elt = GET_MODE_NUNITS (mode); + v = rtvec_alloc (n_elt); + scalar_mode = GET_MODE_INNER (mode); + + RTVEC_ELT (v, 0) = value; + + for (i = 1; i < n_elt; ++i) + RTVEC_ELT (v, i) = vect ? value : CONST0_RTX (scalar_mode); + + return gen_rtx_CONST_VECTOR (mode, v); + + default: + gcc_unreachable (); + } +} + +/* Create a mask for the sign bit in MODE + for an register. If VECT is true, then replicate the mask for + all elements of the vector register. If INVERT is true, then create + a mask excluding the sign bit. */ + +rtx +loongarch_build_signbit_mask (machine_mode mode, bool vect, bool invert) +{ + machine_mode vec_mode, imode; + wide_int w; + rtx mask, v; + + switch (mode) + { + case E_V16SImode: + case E_V16SFmode: + case E_V8SImode: + case E_V4SImode: + case E_V8SFmode: + case E_V4SFmode: + vec_mode = mode; + imode = SImode; + break; + + case E_V8DImode: + case E_V4DImode: + case E_V2DImode: + case E_V8DFmode: + case E_V4DFmode: + case E_V2DFmode: + vec_mode = mode; + imode = DImode; + break; + + case E_TImode: + case E_TFmode: + vec_mode = VOIDmode; + imode = TImode; + break; + + default: + gcc_unreachable (); + } + + machine_mode inner_mode = GET_MODE_INNER (mode); + w = wi::set_bit_in_zero (GET_MODE_BITSIZE (inner_mode) - 1, + GET_MODE_BITSIZE (inner_mode)); + if (invert) + w = wi::bit_not (w); + + /* Force this value into the low part of a fp vector constant. 
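loongarch_build_signbit_mask produces a constant with only the top bit of each element set (or everything except that bit when inverted); such masks are typically what lets float negation and fabs-style operations become a single XOR or AND. A scalar standalone sketch for 32-bit elements (illustrative names):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  static uint32_t
  signbit_mask_si (int invert)
  {
    uint32_t m = (uint32_t) 1 << 31;   /* only bit 31 of a 32-bit element */
    return invert ? ~m : m;
  }

  int
  main (void)
  {
    float x = -3.5f;
    uint32_t bits;

    memcpy (&bits, &x, sizeof bits);
    bits &= signbit_mask_si (1);       /* fabs: clear the sign bit */
    memcpy (&x, &bits, sizeof x);
    printf ("%g\n", x);                /* 3.5 */
    return 0;
  }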
*/ + mask = immed_wide_int_const (w, imode); + mask = gen_lowpart (inner_mode, mask); + + if (vec_mode == VOIDmode) + return force_reg (inner_mode, mask); + + v = loongarch_build_const_vector (vec_mode, vect, mask); + return force_reg (vec_mode, v); +} + +static bool +loongarch_builtin_support_vector_misalignment (machine_mode mode, + const_tree type, + int misalignment, + bool is_packed) +{ + if (ISA_HAS_LSX && STRICT_ALIGNMENT) + { + if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing) + return false; + if (misalignment == -1) + return false; + } + return default_builtin_support_vector_misalignment (mode, type, misalignment, + is_packed); +} + +/* Initialize the GCC target structure. */ +#undef TARGET_ASM_ALIGNED_HI_OP +#define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" +#undef TARGET_ASM_ALIGNED_SI_OP +#define TARGET_ASM_ALIGNED_SI_OP "\t.word\t" +#undef TARGET_ASM_ALIGNED_DI_OP +#define TARGET_ASM_ALIGNED_DI_OP "\t.dword\t" + +#undef TARGET_OPTION_OVERRIDE +#define TARGET_OPTION_OVERRIDE loongarch_option_override + +#undef TARGET_LEGITIMIZE_ADDRESS +#define TARGET_LEGITIMIZE_ADDRESS loongarch_legitimize_address + +#undef TARGET_ASM_SELECT_RTX_SECTION +#define TARGET_ASM_SELECT_RTX_SECTION loongarch_select_rtx_section +#undef TARGET_ASM_FUNCTION_RODATA_SECTION +#define TARGET_ASM_FUNCTION_RODATA_SECTION loongarch_function_rodata_section + +#undef TARGET_SCHED_INIT +#define TARGET_SCHED_INIT loongarch_sched_init +#undef TARGET_SCHED_REORDER +#define TARGET_SCHED_REORDER loongarch_sched_reorder +#undef TARGET_SCHED_REORDER2 +#define TARGET_SCHED_REORDER2 loongarch_sched_reorder2 +#undef TARGET_SCHED_VARIABLE_ISSUE +#define TARGET_SCHED_VARIABLE_ISSUE loongarch_variable_issue +#undef TARGET_SCHED_ADJUST_COST +#define TARGET_SCHED_ADJUST_COST loongarch_adjust_cost +#undef TARGET_SCHED_ISSUE_RATE +#define TARGET_SCHED_ISSUE_RATE loongarch_issue_rate +#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD +#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \ + loongarch_multipass_dfa_lookahead + +#undef TARGET_FUNCTION_OK_FOR_SIBCALL +#define TARGET_FUNCTION_OK_FOR_SIBCALL loongarch_function_ok_for_sibcall + +#undef TARGET_VALID_POINTER_MODE +#define TARGET_VALID_POINTER_MODE loongarch_valid_pointer_mode +#undef TARGET_REGISTER_MOVE_COST +#define TARGET_REGISTER_MOVE_COST loongarch_register_move_cost +#undef TARGET_MEMORY_MOVE_COST +#define TARGET_MEMORY_MOVE_COST loongarch_memory_move_cost +#undef TARGET_RTX_COSTS +#define TARGET_RTX_COSTS loongarch_rtx_costs +#undef TARGET_ADDRESS_COST +#define TARGET_ADDRESS_COST loongarch_address_cost +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST +#define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ + loongarch_builtin_vectorization_cost + + +#undef TARGET_IN_SMALL_DATA_P +#define TARGET_IN_SMALL_DATA_P loongarch_in_small_data_p + +#undef TARGET_PREFERRED_RELOAD_CLASS +#define TARGET_PREFERRED_RELOAD_CLASS loongarch_preferred_reload_class + +#undef TARGET_ASM_FILE_START_FILE_DIRECTIVE +#define TARGET_ASM_FILE_START_FILE_DIRECTIVE true + +#undef TARGET_EXPAND_BUILTIN_VA_START +#define TARGET_EXPAND_BUILTIN_VA_START loongarch_va_start + +#undef TARGET_PROMOTE_FUNCTION_MODE +#define TARGET_PROMOTE_FUNCTION_MODE loongarch_promote_function_mode +#undef TARGET_RETURN_IN_MEMORY +#define TARGET_RETURN_IN_MEMORY loongarch_return_in_memory + +#undef TARGET_FUNCTION_VALUE +#define TARGET_FUNCTION_VALUE loongarch_function_value +#undef TARGET_LIBCALL_VALUE +#define TARGET_LIBCALL_VALUE loongarch_libcall_value + +#undef TARGET_ASM_OUTPUT_MI_THUNK 
+#define TARGET_ASM_OUTPUT_MI_THUNK loongarch_output_mi_thunk +#undef TARGET_ASM_CAN_OUTPUT_MI_THUNK +#define TARGET_ASM_CAN_OUTPUT_MI_THUNK \ + hook_bool_const_tree_hwi_hwi_const_tree_true + +#undef TARGET_PRINT_OPERAND +#define TARGET_PRINT_OPERAND loongarch_print_operand +#undef TARGET_PRINT_OPERAND_ADDRESS +#define TARGET_PRINT_OPERAND_ADDRESS loongarch_print_operand_address +#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P +#define TARGET_PRINT_OPERAND_PUNCT_VALID_P \ + loongarch_print_operand_punct_valid_p + +#undef TARGET_SETUP_INCOMING_VARARGS +#define TARGET_SETUP_INCOMING_VARARGS loongarch_setup_incoming_varargs +#undef TARGET_STRICT_ARGUMENT_NAMING +#define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true +#undef TARGET_MUST_PASS_IN_STACK +#define TARGET_MUST_PASS_IN_STACK must_pass_in_stack_var_size +#undef TARGET_PASS_BY_REFERENCE +#define TARGET_PASS_BY_REFERENCE loongarch_pass_by_reference +#undef TARGET_ARG_PARTIAL_BYTES +#define TARGET_ARG_PARTIAL_BYTES loongarch_arg_partial_bytes +#undef TARGET_FUNCTION_ARG +#define TARGET_FUNCTION_ARG loongarch_function_arg +#undef TARGET_FUNCTION_ARG_ADVANCE +#define TARGET_FUNCTION_ARG_ADVANCE loongarch_function_arg_advance +#undef TARGET_FUNCTION_ARG_BOUNDARY +#define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary + +#undef TARGET_VECTOR_MODE_SUPPORTED_P +#define TARGET_VECTOR_MODE_SUPPORTED_P loongarch_vector_mode_supported_p + +#undef TARGET_SCALAR_MODE_SUPPORTED_P +#define TARGET_SCALAR_MODE_SUPPORTED_P loongarch_scalar_mode_supported_p + +#undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE +#define TARGET_VECTORIZE_PREFERRED_SIMD_MODE loongarch_preferred_simd_mode + +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \ + loongarch_autovectorize_vector_modes + +#undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS loongarch_init_builtins #undef TARGET_BUILTIN_DECL #define TARGET_BUILTIN_DECL loongarch_builtin_decl @@ -6942,6 +8810,14 @@ loongarch_set_handled_components (sbitmap components) #undef TARGET_MAX_ANCHOR_OFFSET #define TARGET_MAX_ANCHOR_OFFSET (IMM_REACH/2-1) +#undef TARGET_VECTORIZE_VEC_PERM_CONST +#define TARGET_VECTORIZE_VEC_PERM_CONST loongarch_vectorize_vec_perm_const + +#undef TARGET_SCHED_REASSOCIATION_WIDTH +#define TARGET_SCHED_REASSOCIATION_WIDTH loongarch_sched_reassociation_width + +#undef TARGET_CASE_VALUES_THRESHOLD +#define TARGET_CASE_VALUES_THRESHOLD loongarch_case_values_threshold #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV loongarch_atomic_assign_expand_fenv @@ -6960,6 +8836,10 @@ loongarch_set_handled_components (sbitmap components) #undef TARGET_MODES_TIEABLE_P #define TARGET_MODES_TIEABLE_P loongarch_modes_tieable_p +#undef TARGET_HARD_REGNO_CALL_PART_CLOBBERED +#define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \ + loongarch_hard_regno_call_part_clobbered + #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 2 @@ -7010,6 +8890,10 @@ loongarch_set_handled_components (sbitmap components) #define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS \ loongarch_set_handled_components +#undef TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT +#define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \ + loongarch_builtin_support_vector_misalignment + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-loongarch.h" diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index eca723293a1..e939dd826d1 100644 --- a/gcc/config/loongarch/loongarch.h +++ 
b/gcc/config/loongarch/loongarch.h @@ -23,6 +23,8 @@ along with GCC; see the file COPYING3. If not see #include "config/loongarch/loongarch-opts.h" +#define TARGET_SUPPORTS_WIDE_INT 1 + /* Macros to silence warnings about numbers being signed in traditional C and unsigned in ISO C when compiled on 32-bit hosts. */ @@ -179,6 +181,11 @@ along with GCC; see the file COPYING3. If not see #define MIN_UNITS_PER_WORD 4 #endif +/* Width of a LSX vector register in bytes. */ +#define UNITS_PER_LSX_REG 16 +/* Width of a LSX vector register in bits. */ +#define BITS_PER_LSX_REG (UNITS_PER_LSX_REG * BITS_PER_UNIT) + /* For LARCH, width of a floating point register. */ #define UNITS_PER_FPREG (TARGET_DOUBLE_FLOAT ? 8 : 4) @@ -241,8 +248,10 @@ along with GCC; see the file COPYING3. If not see #define STRUCTURE_SIZE_BOUNDARY 8 /* There is no point aligning anything to a rounder boundary than - LONG_DOUBLE_TYPE_SIZE. */ -#define BIGGEST_ALIGNMENT (LONG_DOUBLE_TYPE_SIZE) + LONG_DOUBLE_TYPE_SIZE, unless under LSX the bigggest alignment is + BITS_PER_LSX_REG/.. */ +#define BIGGEST_ALIGNMENT \ + (ISA_HAS_LSX ? BITS_PER_LSX_REG : LONG_DOUBLE_TYPE_SIZE) /* All accesses must be aligned. */ #define STRICT_ALIGNMENT (TARGET_STRICT_ALIGN) @@ -378,6 +387,9 @@ along with GCC; see the file COPYING3. If not see #define FP_REG_FIRST 32 #define FP_REG_LAST 63 #define FP_REG_NUM (FP_REG_LAST - FP_REG_FIRST + 1) +#define LSX_REG_FIRST FP_REG_FIRST +#define LSX_REG_LAST FP_REG_LAST +#define LSX_REG_NUM FP_REG_NUM /* The DWARF 2 CFA column which tracks the return address from a signal handler context. This means that to maintain backwards @@ -395,8 +407,11 @@ along with GCC; see the file COPYING3. If not see ((unsigned int) ((int) (REGNO) - FP_REG_FIRST) < FP_REG_NUM) #define FCC_REG_P(REGNO) \ ((unsigned int) ((int) (REGNO) - FCC_REG_FIRST) < FCC_REG_NUM) +#define LSX_REG_P(REGNO) \ + ((unsigned int) ((int) (REGNO) - LSX_REG_FIRST) < LSX_REG_NUM) #define FP_REG_RTX_P(X) (REG_P (X) && FP_REG_P (REGNO (X))) +#define LSX_REG_RTX_P(X) (REG_P (X) && LSX_REG_P (REGNO (X))) /* Select a register mode required for caller save of hard regno REGNO. */ #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \ @@ -577,6 +592,11 @@ enum reg_class #define IMM12_OPERAND(VALUE) \ ((unsigned HOST_WIDE_INT) (VALUE) + IMM_REACH / 2 < IMM_REACH) +/* True if VALUE is a signed 13-bit number. */ + +#define IMM13_OPERAND(VALUE) \ + ((unsigned HOST_WIDE_INT) (VALUE) + 0x1000 < 0x2000) + /* True if VALUE is a signed 16-bit number. */ #define IMM16_OPERAND(VALUE) \ @@ -706,6 +726,13 @@ enum reg_class #define FP_ARG_FIRST (FP_REG_FIRST + 0) #define FP_ARG_LAST (FP_ARG_FIRST + MAX_ARGS_IN_REGISTERS - 1) +/* True if MODE is vector and supported in a LSX vector register. */ +#define LSX_SUPPORTED_MODE_P(MODE) \ + (ISA_HAS_LSX \ + && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG \ + && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT \ + || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)) + /* 1 if N is a possible register number for function argument passing. We have no FP argument registers when soft-float. 
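The new IMM13_OPERAND macro uses the usual bias-and-compare idiom, which is an overflow-safe way of testing -4096 <= VALUE <= 4095; a standalone check:

  #include <stdio.h>

  #define IMM13_OPERAND(VALUE) \
    ((unsigned long long) (VALUE) + 0x1000 < 0x2000)

  int
  main (void)
  {
    printf ("%d %d %d %d\n",
            (int) IMM13_OPERAND (-4096), (int) IMM13_OPERAND (4095),
            (int) IMM13_OPERAND (-4097), (int) IMM13_OPERAND (4096));
    /* 1 1 0 0 */
    return 0;
  }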
*/ @@ -926,7 +953,39 @@ typedef struct { { "s7", 30 + GP_REG_FIRST }, \ { "s8", 31 + GP_REG_FIRST }, \ { "v0", 4 + GP_REG_FIRST }, \ - { "v1", 5 + GP_REG_FIRST } \ + { "v1", 5 + GP_REG_FIRST }, \ + { "vr0", 0 + FP_REG_FIRST }, \ + { "vr1", 1 + FP_REG_FIRST }, \ + { "vr2", 2 + FP_REG_FIRST }, \ + { "vr3", 3 + FP_REG_FIRST }, \ + { "vr4", 4 + FP_REG_FIRST }, \ + { "vr5", 5 + FP_REG_FIRST }, \ + { "vr6", 6 + FP_REG_FIRST }, \ + { "vr7", 7 + FP_REG_FIRST }, \ + { "vr8", 8 + FP_REG_FIRST }, \ + { "vr9", 9 + FP_REG_FIRST }, \ + { "vr10", 10 + FP_REG_FIRST }, \ + { "vr11", 11 + FP_REG_FIRST }, \ + { "vr12", 12 + FP_REG_FIRST }, \ + { "vr13", 13 + FP_REG_FIRST }, \ + { "vr14", 14 + FP_REG_FIRST }, \ + { "vr15", 15 + FP_REG_FIRST }, \ + { "vr16", 16 + FP_REG_FIRST }, \ + { "vr17", 17 + FP_REG_FIRST }, \ + { "vr18", 18 + FP_REG_FIRST }, \ + { "vr19", 19 + FP_REG_FIRST }, \ + { "vr20", 20 + FP_REG_FIRST }, \ + { "vr21", 21 + FP_REG_FIRST }, \ + { "vr22", 22 + FP_REG_FIRST }, \ + { "vr23", 23 + FP_REG_FIRST }, \ + { "vr24", 24 + FP_REG_FIRST }, \ + { "vr25", 25 + FP_REG_FIRST }, \ + { "vr26", 26 + FP_REG_FIRST }, \ + { "vr27", 27 + FP_REG_FIRST }, \ + { "vr28", 28 + FP_REG_FIRST }, \ + { "vr29", 29 + FP_REG_FIRST }, \ + { "vr30", 30 + FP_REG_FIRST }, \ + { "vr31", 31 + FP_REG_FIRST } \ } /* Globalizing directive for a label. */ diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index b37e070660f..7b8978e2533 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -158,11 +158,12 @@ (define_attr "move_type" const,signext,pick_ins,logical,arith,sll0,andi,shift_shift" (const_string "unknown")) -(define_attr "alu_type" "unknown,add,sub,not,nor,and,or,xor" +(define_attr "alu_type" "unknown,add,sub,not,nor,and,or,xor,simd_add" (const_string "unknown")) ;; Main data type used by the insn -(define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,SF,DF,TF,FCC" +(define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,SF,DF,TF,FCC, + V2DI,V4SI,V8HI,V16QI,V2DF,V4SF" (const_string "unknown")) ;; True if the main data type is twice the size of a word. @@ -234,7 +235,12 @@ (define_attr "type" prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical, shift,slt,signext,clz,trap,imul,idiv,move, fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,flogb,fneg,fcmp,fcopysign,fcvt, - fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost" + fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost, + simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd, + simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp, + simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill, + simd_permute,simd_shf,simd_sat,simd_pcnt,simd_copy,simd_branch,simd_clsx, + simd_fminmax,simd_logic,simd_move,simd_load,simd_store" (cond [(eq_attr "jirl" "!unset") (const_string "call") (eq_attr "got" "load") (const_string "load") @@ -414,11 +420,20 @@ (define_mode_attr ifmt [(SI "w") (DI "l")]) ;; This attribute gives the upper-case mode name for one unit of a ;; floating-point mode or vector mode. -(define_mode_attr UNITMODE [(SF "SF") (DF "DF")]) +(define_mode_attr UNITMODE [(SF "SF") (DF "DF") (V2SF "SF") (V4SF "SF") + (V16QI "QI") (V8HI "HI") (V4SI "SI") (V2DI "DI") + (V2DF "DF")]) + +;; As above, but in lower case. 
+(define_mode_attr unitmode [(SF "sf") (DF "df") (V2SF "sf") (V4SF "sf") + (V16QI "qi") (V8QI "qi") (V8HI "hi") (V4HI "hi") + (V4SI "si") (V2SI "si") (V2DI "di") (V2DF "df")]) ;; This attribute gives the integer mode that has half the size of ;; the controlling mode. -(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (TF "DI")]) +(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") + (V2SI "SI") (V4HI "SI") (V8QI "SI") + (TF "DI")]) ;; This attribute gives the integer mode that has the same size of a ;; floating-point mode. @@ -445,6 +460,18 @@ (define_code_iterator neg_bitwise [and ior]) ;; from the same template. (define_code_iterator any_div [div udiv mod umod]) +;; This code iterator allows addition and subtraction to be generated +;; from the same template. +(define_code_iterator addsub [plus minus]) + +;; This code iterator allows addition and multiplication to be generated +;; from the same template. +(define_code_iterator addmul [plus mult]) + +;; This code iterator allows addition subtraction and multiplication to be +;; generated from the same template +(define_code_iterator addsubmul [plus minus mult]) + ;; This code iterator allows all native floating-point comparisons to be ;; generated from the same template. (define_code_iterator fcond [unordered uneq unlt unle eq lt le @@ -684,7 +711,6 @@ (define_insn "sub3" [(set_attr "alu_type" "sub") (set_attr "mode" "")]) - (define_insn "*subsi3_extended" [(set (match_operand:DI 0 "register_operand" "= r") (sign_extend:DI @@ -1228,7 +1254,7 @@ (define_insn "smina3" "fmina.\t%0,%1,%2" [(set_attr "type" "fmove") (set_attr "mode" "")]) - + ;; ;; .................... ;; @@ -2541,7 +2567,6 @@ (define_insn "rotr3" [(set_attr "type" "shift,shift") (set_attr "mode" "")]) - ;; The following templates were added to generate "bstrpick.d + alsl.d" ;; instruction pairs. ;; It is required that the values of const_immalsl_operand and @@ -3606,6 +3631,9 @@ (define_insn "loongarch_crcc_w__w" (include "generic.md") (include "la464.md") +; The LoongArch SX Instructions. +(include "lsx.md") + (define_c_enum "unspec" [ UNSPEC_ADDRESS_FIRST ]) diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md new file mode 100644 index 00000000000..fb4d228ba84 --- /dev/null +++ b/gcc/config/loongarch/lsx.md @@ -0,0 +1,4467 @@ +;; Machine Description for LARCH Loongson SX ASE +;; +;; Copyright (C) 2018 Free Software Foundation, Inc. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; . 
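For a sense of what this new machine description covers: a plain C loop like the one below is the kind of code the auto-vectorizer can map onto these 128-bit patterns once the series is applied and the new -mlsx option is enabled. The snippet is only an illustrative sketch, not part of the patch.

  #include <stdint.h>

  /* Element-wise addition of int32_t arrays.  With LSX enabled the
     vectorizer can use the V4SI add pattern from this file (vadd.w),
     handling four elements per instruction.  */
  void
  add_i32 (int32_t *restrict dst, const int32_t *restrict a,
           const int32_t *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      dst[i] = a[i] + b[i];
  }
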
+;; + +(define_c_enum "unspec" [ + UNSPEC_LSX_ABSD_S + UNSPEC_LSX_VABSD_U + UNSPEC_LSX_VAVG_S + UNSPEC_LSX_VAVG_U + UNSPEC_LSX_VAVGR_S + UNSPEC_LSX_VAVGR_U + UNSPEC_LSX_VBITCLR + UNSPEC_LSX_VBITCLRI + UNSPEC_LSX_VBITREV + UNSPEC_LSX_VBITREVI + UNSPEC_LSX_VBITSET + UNSPEC_LSX_VBITSETI + UNSPEC_LSX_BRANCH_V + UNSPEC_LSX_BRANCH + UNSPEC_LSX_VFCMP_CAF + UNSPEC_LSX_VFCLASS + UNSPEC_LSX_VFCMP_CUNE + UNSPEC_LSX_VFCVT + UNSPEC_LSX_VFCVTH + UNSPEC_LSX_VFCVTL + UNSPEC_LSX_VFLOGB + UNSPEC_LSX_VFRECIP + UNSPEC_LSX_VFRINT + UNSPEC_LSX_VFRSQRT + UNSPEC_LSX_VFCMP_SAF + UNSPEC_LSX_VFCMP_SEQ + UNSPEC_LSX_VFCMP_SLE + UNSPEC_LSX_VFCMP_SLT + UNSPEC_LSX_VFCMP_SNE + UNSPEC_LSX_VFCMP_SOR + UNSPEC_LSX_VFCMP_SUEQ + UNSPEC_LSX_VFCMP_SULE + UNSPEC_LSX_VFCMP_SULT + UNSPEC_LSX_VFCMP_SUN + UNSPEC_LSX_VFCMP_SUNE + UNSPEC_LSX_VFTINT_S + UNSPEC_LSX_VFTINT_U + UNSPEC_LSX_VSAT_S + UNSPEC_LSX_VSAT_U + UNSPEC_LSX_VREPLVEI + UNSPEC_LSX_VSRAR + UNSPEC_LSX_VSRARI + UNSPEC_LSX_VSRLR + UNSPEC_LSX_VSRLRI + UNSPEC_LSX_VSHUF + UNSPEC_LSX_VMUH_S + UNSPEC_LSX_VMUH_U + UNSPEC_LSX_VEXTW_S + UNSPEC_LSX_VEXTW_U + UNSPEC_LSX_VSLLWIL_S + UNSPEC_LSX_VSLLWIL_U + UNSPEC_LSX_VSRAN + UNSPEC_LSX_VSSRAN_S + UNSPEC_LSX_VSSRAN_U + UNSPEC_LSX_VSRAIN + UNSPEC_LSX_VSRAINS_S + UNSPEC_LSX_VSRAINS_U + UNSPEC_LSX_VSRARN + UNSPEC_LSX_VSRLN + UNSPEC_LSX_VSRLRN + UNSPEC_LSX_VSSRLRN_U + UNSPEC_LSX_VFRSTPI + UNSPEC_LSX_VFRSTP + UNSPEC_LSX_VSHUF4I + UNSPEC_LSX_VBSRL_V + UNSPEC_LSX_VBSLL_V + UNSPEC_LSX_VEXTRINS + UNSPEC_LSX_VMSKLTZ + UNSPEC_LSX_VSIGNCOV + UNSPEC_LSX_VFTINTRNE + UNSPEC_LSX_VFTINTRP + UNSPEC_LSX_VFTINTRM + UNSPEC_LSX_VFTINT_W_D + UNSPEC_LSX_VFFINT_S_L + UNSPEC_LSX_VFTINTRZ_W_D + UNSPEC_LSX_VFTINTRP_W_D + UNSPEC_LSX_VFTINTRM_W_D + UNSPEC_LSX_VFTINTRNE_W_D + UNSPEC_LSX_VFTINTL_L_S + UNSPEC_LSX_VFFINTH_D_W + UNSPEC_LSX_VFFINTL_D_W + UNSPEC_LSX_VFTINTRZL_L_S + UNSPEC_LSX_VFTINTRZH_L_S + UNSPEC_LSX_VFTINTRPL_L_S + UNSPEC_LSX_VFTINTRPH_L_S + UNSPEC_LSX_VFTINTRMH_L_S + UNSPEC_LSX_VFTINTRML_L_S + UNSPEC_LSX_VFTINTRNEL_L_S + UNSPEC_LSX_VFTINTRNEH_L_S + UNSPEC_LSX_VFTINTH_L_H + UNSPEC_LSX_VFRINTRNE_S + UNSPEC_LSX_VFRINTRNE_D + UNSPEC_LSX_VFRINTRZ_S + UNSPEC_LSX_VFRINTRZ_D + UNSPEC_LSX_VFRINTRP_S + UNSPEC_LSX_VFRINTRP_D + UNSPEC_LSX_VFRINTRM_S + UNSPEC_LSX_VFRINTRM_D + UNSPEC_LSX_VSSRARN_S + UNSPEC_LSX_VSSRARN_U + UNSPEC_LSX_VSSRLN_U + UNSPEC_LSX_VSSRLN + UNSPEC_LSX_VSSRLRN + UNSPEC_LSX_VLDI + UNSPEC_LSX_VSHUF_B + UNSPEC_LSX_VLDX + UNSPEC_LSX_VSTX + UNSPEC_LSX_VEXTL_QU_DU + UNSPEC_LSX_VSETEQZ_V + UNSPEC_LSX_VADDWEV + UNSPEC_LSX_VADDWEV2 + UNSPEC_LSX_VADDWEV3 + UNSPEC_LSX_VADDWOD + UNSPEC_LSX_VADDWOD2 + UNSPEC_LSX_VADDWOD3 + UNSPEC_LSX_VSUBWEV + UNSPEC_LSX_VSUBWEV2 + UNSPEC_LSX_VSUBWOD + UNSPEC_LSX_VSUBWOD2 + UNSPEC_LSX_VMULWEV + UNSPEC_LSX_VMULWEV2 + UNSPEC_LSX_VMULWEV3 + UNSPEC_LSX_VMULWOD + UNSPEC_LSX_VMULWOD2 + UNSPEC_LSX_VMULWOD3 + UNSPEC_LSX_VHADDW_Q_D + UNSPEC_LSX_VHADDW_QU_DU + UNSPEC_LSX_VHSUBW_Q_D + UNSPEC_LSX_VHSUBW_QU_DU + UNSPEC_LSX_VMADDWEV + UNSPEC_LSX_VMADDWEV2 + UNSPEC_LSX_VMADDWEV3 + UNSPEC_LSX_VMADDWOD + UNSPEC_LSX_VMADDWOD2 + UNSPEC_LSX_VMADDWOD3 + UNSPEC_LSX_VROTR + UNSPEC_LSX_VADD_Q + UNSPEC_LSX_VSUB_Q + UNSPEC_LSX_VEXTH_Q_D + UNSPEC_LSX_VEXTH_QU_DU + UNSPEC_LSX_VMSKGEZ + UNSPEC_LSX_VMSKNZ + UNSPEC_LSX_VEXTL_Q_D + UNSPEC_LSX_VSRLNI + UNSPEC_LSX_VSRLRNI + UNSPEC_LSX_VSSRLNI + UNSPEC_LSX_VSSRLNI2 + UNSPEC_LSX_VSSRLRNI + UNSPEC_LSX_VSSRLRNI2 + UNSPEC_LSX_VSRANI + UNSPEC_LSX_VSRARNI + UNSPEC_LSX_VSSRANI + UNSPEC_LSX_VSSRANI2 + UNSPEC_LSX_VSSRARNI + UNSPEC_LSX_VSSRARNI2 + UNSPEC_LSX_VPERMI +]) + +;; This attribute gives suffix for integers in 
VHMODE. +(define_mode_attr dlsxfmt + [(V2DI "q") + (V4SI "d") + (V8HI "w") + (V16QI "h")]) + +(define_mode_attr dlsxfmt_u + [(V2DI "qu") + (V4SI "du") + (V8HI "wu") + (V16QI "hu")]) + +(define_mode_attr d2lsxfmt + [(V4SI "q") + (V8HI "d") + (V16QI "w")]) + +(define_mode_attr d2lsxfmt_u + [(V4SI "qu") + (V8HI "du") + (V16QI "wu")]) + +;; The attribute gives two double modes for vector modes. +(define_mode_attr VD2MODE + [(V4SI "V2DI") + (V8HI "V2DI") + (V16QI "V4SI")]) + +;; All vector modes with 128 bits. +(define_mode_iterator LSX [V2DF V4SF V2DI V4SI V8HI V16QI]) + +;; Same as LSX. Used by vcond to iterate two modes. +(define_mode_iterator LSX_2 [V2DF V4SF V2DI V4SI V8HI V16QI]) + +;; Only used for vilvh and splitting insert_d and copy_{u,s}.d. +(define_mode_iterator LSX_D [V2DI V2DF]) + +;; Only used for copy_{u,s}.w and vilvh. +(define_mode_iterator LSX_W [V4SI V4SF]) + +;; Only integer modes. +(define_mode_iterator ILSX [V2DI V4SI V8HI V16QI]) + +;; As ILSX but excludes V16QI. +(define_mode_iterator ILSX_DWH [V2DI V4SI V8HI]) + +;; As LSX but excludes V16QI. +(define_mode_iterator LSX_DWH [V2DF V4SF V2DI V4SI V8HI]) + +;; As ILSX but excludes V2DI. +(define_mode_iterator ILSX_WHB [V4SI V8HI V16QI]) + +;; Only integer modes equal or larger than a word. +(define_mode_iterator ILSX_DW [V2DI V4SI]) + +;; Only integer modes smaller than a word. +(define_mode_iterator ILSX_HB [V8HI V16QI]) + +;;;; Only integer modes for fixed-point madd_q/maddr_q. +;;(define_mode_iterator ILSX_WH [V4SI V8HI]) + +;; Only floating-point modes. +(define_mode_iterator FLSX [V2DF V4SF]) + +;; Only used for immediate set shuffle elements instruction. +(define_mode_iterator LSX_WHB_W [V4SI V8HI V16QI V4SF]) + +;; The attribute gives the integer vector mode with same size. +(define_mode_attr VIMODE + [(V2DF "V2DI") + (V4SF "V4SI") + (V2DI "V2DI") + (V4SI "V4SI") + (V8HI "V8HI") + (V16QI "V16QI")]) + +;; The attribute gives half modes for vector modes. +(define_mode_attr VHMODE + [(V8HI "V16QI") + (V4SI "V8HI") + (V2DI "V4SI")]) + +;; The attribute gives double modes for vector modes. +(define_mode_attr VDMODE + [(V2DI "V2DI") + (V4SI "V2DI") + (V8HI "V4SI") + (V16QI "V8HI")]) + +;; The attribute gives half modes with same number of elements for vector modes. +(define_mode_attr VTRUNCMODE + [(V8HI "V8QI") + (V4SI "V4HI") + (V2DI "V2SI")]) + +;; Double-sized Vector MODE with same elemet type. "Vector, Enlarged-MODE" +(define_mode_attr VEMODE + [(V4SF "V8SF") + (V4SI "V8SI") + (V2DI "V4DI") + (V2DF "V4DF")]) + +;; This attribute gives the mode of the result for "vpickve2gr_b, copy_u_b" etc. +(define_mode_attr VRES + [(V2DF "DF") + (V4SF "SF") + (V2DI "DI") + (V4SI "SI") + (V8HI "SI") + (V16QI "SI")]) + +;; Only used with LSX_D iterator. +(define_mode_attr lsx_d + [(V2DI "reg_or_0") + (V2DF "register")]) + +;; This attribute gives the integer vector mode with same size. +(define_mode_attr mode_i + [(V2DF "v2di") + (V4SF "v4si") + (V2DI "v2di") + (V4SI "v4si") + (V8HI "v8hi") + (V16QI "v16qi")]) + +;; This attribute gives suffix for LSX instructions. +(define_mode_attr lsxfmt + [(V2DF "d") + (V4SF "w") + (V2DI "d") + (V4SI "w") + (V8HI "h") + (V16QI "b")]) + +;; This attribute gives suffix for LSX instructions. +(define_mode_attr lsxfmt_u + [(V2DF "du") + (V4SF "wu") + (V2DI "du") + (V4SI "wu") + (V8HI "hu") + (V16QI "bu")]) + +;; This attribute gives suffix for integers in VHMODE. +(define_mode_attr hlsxfmt + [(V2DI "w") + (V4SI "h") + (V8HI "b")]) + +;; This attribute gives suffix for integers in VHMODE. 
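To make the mode names above concrete: every mode iterated over here is 128 bits wide and corresponds to a 16-byte GCC generic vector type, and the lsxfmt attribute supplies the element-size suffix used in the instruction mnemonics (b/h/w/d for V16QI/V8HI/V4SI/V2DI). A small illustrative example, not part of the patch:

  #include <stdint.h>

  /* The 128-bit modes map onto 16-byte GCC generic vector types:
     V16QI is sixteen int8_t lanes, V4SI four int32_t lanes, V2DF two
     doubles, and so on.  */
  typedef int32_t v4si __attribute__ ((vector_size (16)));
  typedef double  v2df __attribute__ ((vector_size (16)));

  v2df
  add_v2df (v2df a, v2df b)
  {
    /* With LSX this addition can be emitted as a single vfadd.d.  */
    return a + b;
  }
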
+(define_mode_attr hlsxfmt_u + [(V2DI "wu") + (V4SI "hu") + (V8HI "bu")]) + +;; This attribute gives define_insn suffix for LSX instructions that need +;; distinction between integer and floating point. +(define_mode_attr lsxfmt_f + [(V2DF "d_f") + (V4SF "w_f") + (V2DI "d") + (V4SI "w") + (V8HI "h") + (V16QI "b")]) + +(define_mode_attr flsxfmt_f + [(V2DF "d_f") + (V4SF "s_f") + (V2DI "d") + (V4SI "w") + (V8HI "h") + (V16QI "b")]) + +(define_mode_attr flsxfmt + [(V2DF "d") + (V4SF "s") + (V2DI "d") + (V4SI "s")]) + +(define_mode_attr flsxfrint + [(V2DF "d") + (V4SF "s")]) + +(define_mode_attr ilsxfmt + [(V2DF "l") + (V4SF "w")]) + +(define_mode_attr ilsxfmt_u + [(V2DF "lu") + (V4SF "wu")]) + +;; This is used to form an immediate operand constraint using +;; "const__operand". +(define_mode_attr indeximm + [(V2DF "0_or_1") + (V4SF "0_to_3") + (V2DI "0_or_1") + (V4SI "0_to_3") + (V8HI "uimm3") + (V16QI "uimm4")]) + +;; This attribute represents bitmask needed for vec_merge using +;; "const__operand". +(define_mode_attr bitmask + [(V2DF "exp_2") + (V4SF "exp_4") + (V2DI "exp_2") + (V4SI "exp_4") + (V8HI "exp_8") + (V16QI "exp_16")]) + +;; This attribute is used to form an immediate operand constraint using +;; "const__operand". +(define_mode_attr bitimm + [(V16QI "uimm3") + (V8HI "uimm4") + (V4SI "uimm5") + (V2DI "uimm6")]) + + +(define_int_iterator FRINT_S [UNSPEC_LSX_VFRINTRP_S + UNSPEC_LSX_VFRINTRZ_S + UNSPEC_LSX_VFRINT + UNSPEC_LSX_VFRINTRM_S]) + +(define_int_iterator FRINT_D [UNSPEC_LSX_VFRINTRP_D + UNSPEC_LSX_VFRINTRZ_D + UNSPEC_LSX_VFRINT + UNSPEC_LSX_VFRINTRM_D]) + +(define_int_attr frint_pattern_s + [(UNSPEC_LSX_VFRINTRP_S "ceil") + (UNSPEC_LSX_VFRINTRZ_S "btrunc") + (UNSPEC_LSX_VFRINT "rint") + (UNSPEC_LSX_VFRINTRM_S "floor")]) + +(define_int_attr frint_pattern_d + [(UNSPEC_LSX_VFRINTRP_D "ceil") + (UNSPEC_LSX_VFRINTRZ_D "btrunc") + (UNSPEC_LSX_VFRINT "rint") + (UNSPEC_LSX_VFRINTRM_D "floor")]) + +(define_int_attr frint_suffix + [(UNSPEC_LSX_VFRINTRP_S "rp") + (UNSPEC_LSX_VFRINTRP_D "rp") + (UNSPEC_LSX_VFRINTRZ_S "rz") + (UNSPEC_LSX_VFRINTRZ_D "rz") + (UNSPEC_LSX_VFRINT "") + (UNSPEC_LSX_VFRINTRM_S "rm") + (UNSPEC_LSX_VFRINTRM_D "rm")]) + +(define_expand "vec_init" + [(match_operand:LSX 0 "register_operand") + (match_operand:LSX 1 "")] + "ISA_HAS_LSX" +{ + loongarch_expand_vector_init (operands[0], operands[1]); + DONE; +}) + +;; vpickev pattern with implicit type conversion. 
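The vec_pack_trunc pattern that follows narrows each element of two source vectors and concatenates the results; truncation keeps the low (even-indexed) sub-element of each lane, which is why a single vpickev of the even elements implements it. Element-wise it behaves like this scalar model of the V4SI case (an illustrative sketch only):

  #include <stdint.h>

  /* Scalar model of vec_pack_trunc for V4SI -> V8HI: truncate every
     element of the two inputs; operand 1 supplies the low half of the
     result and operand 2 the high half.  */
  void
  pack_trunc_v4si (int16_t dst[8], const int32_t op1[4], const int32_t op2[4])
  {
    for (int i = 0; i < 4; i++)
      {
        dst[i] = (int16_t) op1[i];
        dst[i + 4] = (int16_t) op2[i];
      }
  }
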
+(define_insn "vec_pack_trunc_" + [(set (match_operand: 0 "register_operand" "=f") + (vec_concat: + (truncate: + (match_operand:ILSX_DWH 1 "register_operand" "f")) + (truncate: + (match_operand:ILSX_DWH 2 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vpickev.\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "")]) + +(define_expand "vec_unpacks_hi_v4sf" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (float_extend:V2DF + (vec_select:V2SF + (match_operand:V4SF 1 "register_operand" "f") + (match_dup 2))))] + "ISA_HAS_LSX" +{ + operands[2] = loongarch_lsx_vec_parallel_const_half (V4SFmode, + true/*high_p*/); +}) + +(define_expand "vec_unpacks_lo_v4sf" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (float_extend:V2DF + (vec_select:V2SF + (match_operand:V4SF 1 "register_operand" "f") + (match_dup 2))))] + "ISA_HAS_LSX" +{ + operands[2] = loongarch_lsx_vec_parallel_const_half (V4SFmode, + false/*high_p*/); +}) + +(define_expand "vec_unpacks_hi_" + [(match_operand: 0 "register_operand") + (match_operand:ILSX_WHB 1 "register_operand")] + "ISA_HAS_LSX" +{ + loongarch_expand_vec_unpack (operands, false/*unsigned_p*/, true/*high_p*/); + DONE; +}) + +(define_expand "vec_unpacks_lo_" + [(match_operand: 0 "register_operand") + (match_operand:ILSX_WHB 1 "register_operand")] + "ISA_HAS_LSX" +{ + loongarch_expand_vec_unpack (operands, false/*unsigned_p*/, false/*high_p*/); + DONE; +}) + +(define_expand "vec_unpacku_hi_" + [(match_operand: 0 "register_operand") + (match_operand:ILSX_WHB 1 "register_operand")] + "ISA_HAS_LSX" +{ + loongarch_expand_vec_unpack (operands, true/*unsigned_p*/, true/*high_p*/); + DONE; +}) + +(define_expand "vec_unpacku_lo_" + [(match_operand: 0 "register_operand") + (match_operand:ILSX_WHB 1 "register_operand")] + "ISA_HAS_LSX" +{ + loongarch_expand_vec_unpack (operands, true/*unsigned_p*/, false/*high_p*/); + DONE; +}) + +(define_expand "vec_extract" + [(match_operand: 0 "register_operand") + (match_operand:ILSX 1 "register_operand") + (match_operand 2 "const__operand")] + "ISA_HAS_LSX" +{ + if (mode == QImode || mode == HImode) + { + rtx dest1 = gen_reg_rtx (SImode); + emit_insn (gen_lsx_vpickve2gr_ (dest1, operands[1], operands[2])); + emit_move_insn (operands[0], + gen_lowpart (mode, dest1)); + } + else + emit_insn (gen_lsx_vpickve2gr_ (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_extract" + [(match_operand: 0 "register_operand") + (match_operand:FLSX 1 "register_operand") + (match_operand 2 "const__operand")] + "ISA_HAS_LSX" +{ + rtx temp; + HOST_WIDE_INT val = INTVAL (operands[2]); + + if (val == 0) + temp = operands[1]; + else + { + rtx n = GEN_INT (val * GET_MODE_SIZE (mode)); + temp = gen_reg_rtx (mode); + emit_insn (gen_lsx_vbsrl_ (temp, operands[1], n)); + } + emit_insn (gen_lsx_vec_extract_ (operands[0], temp)); + DONE; +}) + +(define_insn_and_split "lsx_vec_extract_" + [(set (match_operand: 0 "register_operand" "=f") + (vec_select: + (match_operand:FLSX 1 "register_operand" "f") + (parallel [(const_int 0)])))] + "ISA_HAS_LSX" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 1))] +{ + operands[1] = gen_rtx_REG (mode, REGNO (operands[1])); +} + [(set_attr "move_type" "fmove") + (set_attr "mode" "")]) + +(define_expand "vec_set" + [(match_operand:ILSX 0 "register_operand") + (match_operand: 1 "reg_or_0_operand") + (match_operand 2 "const__operand")] + "ISA_HAS_LSX" +{ + rtx index = GEN_INT (1 << INTVAL (operands[2])); + emit_insn (gen_lsx_vinsgr2vr_ (operands[0], operands[1], + 
operands[0], index)); + DONE; +}) + +(define_expand "vec_set" + [(match_operand:FLSX 0 "register_operand") + (match_operand: 1 "register_operand") + (match_operand 2 "const__operand")] + "ISA_HAS_LSX" +{ + rtx index = GEN_INT (1 << INTVAL (operands[2])); + emit_insn (gen_lsx_vextrins__scalar (operands[0], operands[1], + operands[0], index)); + DONE; +}) + +(define_expand "vec_cmp" + [(set (match_operand: 0 "register_operand") + (match_operator 1 "" + [(match_operand:LSX 2 "register_operand") + (match_operand:LSX 3 "register_operand")]))] + "ISA_HAS_LSX" +{ + bool ok = loongarch_expand_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpu" + [(set (match_operand: 0 "register_operand") + (match_operator 1 "" + [(match_operand:ILSX 2 "register_operand") + (match_operand:ILSX 3 "register_operand")]))] + "ISA_HAS_LSX" +{ + bool ok = loongarch_expand_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vcondu" + [(match_operand:LSX 0 "register_operand") + (match_operand:LSX 1 "reg_or_m1_operand") + (match_operand:LSX 2 "reg_or_0_operand") + (match_operator 3 "" + [(match_operand:ILSX 4 "register_operand") + (match_operand:ILSX 5 "register_operand")])] + "ISA_HAS_LSX + && (GET_MODE_NUNITS (mode) == GET_MODE_NUNITS (mode))" +{ + loongarch_expand_vec_cond_expr (mode, mode, operands); + DONE; +}) + +(define_expand "vcond" + [(match_operand:LSX 0 "register_operand") + (match_operand:LSX 1 "reg_or_m1_operand") + (match_operand:LSX 2 "reg_or_0_operand") + (match_operator 3 "" + [(match_operand:LSX_2 4 "register_operand") + (match_operand:LSX_2 5 "register_operand")])] + "ISA_HAS_LSX + && (GET_MODE_NUNITS (mode) == GET_MODE_NUNITS (mode))" +{ + loongarch_expand_vec_cond_expr (mode, mode, operands); + DONE; +}) + +(define_expand "vcond_mask_" + [(match_operand:ILSX 0 "register_operand") + (match_operand:ILSX 1 "reg_or_m1_operand") + (match_operand:ILSX 2 "reg_or_0_operand") + (match_operand:ILSX 3 "register_operand")] + "ISA_HAS_LSX" +{ + loongarch_expand_vec_cond_mask_expr (mode, + mode, operands); + DONE; +}) + +(define_insn "lsx_vinsgr2vr_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (vec_merge:ILSX + (vec_duplicate:ILSX + (match_operand: 1 "reg_or_0_operand" "rJ")) + (match_operand:ILSX 2 "register_operand" "0") + (match_operand 3 "const__operand" "")))] + "ISA_HAS_LSX" +{ + if (!TARGET_64BIT && (mode == V2DImode || mode == V2DFmode)) + return "#"; + else + return "vinsgr2vr.\t%w0,%z1,%y3"; +} + [(set_attr "type" "simd_insert") + (set_attr "mode" "")]) + +(define_split + [(set (match_operand:LSX_D 0 "register_operand") + (vec_merge:LSX_D + (vec_duplicate:LSX_D + (match_operand: 1 "_operand")) + (match_operand:LSX_D 2 "register_operand") + (match_operand 3 "const__operand")))] + "reload_completed && ISA_HAS_LSX && !TARGET_64BIT" + [(const_int 0)] +{ + loongarch_split_lsx_insert_d (operands[0], operands[2], operands[3], operands[1]); + DONE; +}) + +(define_insn "lsx_vextrins__internal" + [(set (match_operand:LSX 0 "register_operand" "=f") + (vec_merge:LSX + (vec_duplicate:LSX + (vec_select: + (match_operand:LSX 1 "register_operand" "f") + (parallel [(const_int 0)]))) + (match_operand:LSX 2 "register_operand" "0") + (match_operand 3 "const__operand" "")))] + "ISA_HAS_LSX" + "vextrins.\t%w0,%w1,%y3<<4" + [(set_attr "type" "simd_insert") + (set_attr "mode" "")]) + +;; Operand 3 is a scalar. 
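In the vec_merge form used by these insert patterns, the mask operand has one bit per lane, and the vec_set expanders above pass 1 << index for it, so exactly one element is replaced by the duplicated scalar while the rest of the vector is preserved. A scalar model of that merge for V4SI (an illustrative sketch only):

  #include <stdint.h>

  /* Scalar model of the vec_merge used by the insert patterns: lane I
     of DST takes the scalar VAL if bit I of MASK is set, otherwise it
     keeps the old vector's value.  vec_set passes MASK = 1 << index.  */
  void
  vec_merge_v4si (int32_t dst[4], const int32_t old_vec[4],
                  int32_t val, unsigned mask)
  {
    for (int i = 0; i < 4; i++)
      dst[i] = ((mask >> i) & 1) ? val : old_vec[i];
  }
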
+(define_insn "lsx_vextrins__scalar" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (vec_merge:FLSX + (vec_duplicate:FLSX + (match_operand: 1 "register_operand" "f")) + (match_operand:FLSX 2 "register_operand" "0") + (match_operand 3 "const__operand" "")))] + "ISA_HAS_LSX" + "vextrins.\t%w0,%w1,%y3<<4" + [(set_attr "type" "simd_insert") + (set_attr "mode" "")]) + +(define_insn "lsx_vpickve2gr_" + [(set (match_operand: 0 "register_operand" "=r") + (any_extend: + (vec_select: + (match_operand:ILSX_HB 1 "register_operand" "f") + (parallel [(match_operand 2 "const__operand" "")]))))] + "ISA_HAS_LSX" + "vpickve2gr.\t%0,%w1,%2" + [(set_attr "type" "simd_copy") + (set_attr "mode" "")]) + +(define_insn "lsx_vpickve2gr_" + [(set (match_operand: 0 "register_operand" "=r") + (any_extend: + (vec_select: + (match_operand:LSX_W 1 "register_operand" "f") + (parallel [(match_operand 2 "const__operand" "")]))))] + "ISA_HAS_LSX" + "vpickve2gr.\t%0,%w1,%2" + [(set_attr "type" "simd_copy") + (set_attr "mode" "")]) + +(define_insn_and_split "lsx_vpickve2gr_du" + [(set (match_operand:DI 0 "register_operand" "=r") + (vec_select:DI + (match_operand:V2DI 1 "register_operand" "f") + (parallel [(match_operand 2 "const_0_or_1_operand" "")])))] + "ISA_HAS_LSX" +{ + if (TARGET_64BIT) + return "vpickve2gr.du\t%0,%w1,%2"; + else + return "#"; +} + "reload_completed && ISA_HAS_LSX && !TARGET_64BIT" + [(const_int 0)] +{ + loongarch_split_lsx_copy_d (operands[0], operands[1], operands[2], + gen_lsx_vpickve2gr_wu); + DONE; +} + [(set_attr "type" "simd_copy") + (set_attr "mode" "V2DI")]) + +(define_insn_and_split "lsx_vpickve2gr_" + [(set (match_operand: 0 "register_operand" "=r") + (vec_select: + (match_operand:LSX_D 1 "register_operand" "f") + (parallel [(match_operand 2 "const__operand" "")])))] + "ISA_HAS_LSX" +{ + if (TARGET_64BIT) + return "vpickve2gr.\t%0,%w1,%2"; + else + return "#"; +} + "reload_completed && ISA_HAS_LSX && !TARGET_64BIT" + [(const_int 0)] +{ + loongarch_split_lsx_copy_d (operands[0], operands[1], operands[2], + gen_lsx_vpickve2gr_w); + DONE; +} + [(set_attr "type" "simd_copy") + (set_attr "mode" "")]) + + +(define_expand "abs2" + [(match_operand:ILSX 0 "register_operand" "=f") + (abs:ILSX (match_operand:ILSX 1 "register_operand" "f"))] + "ISA_HAS_LSX" +{ + if (ISA_HAS_LSX) + { + emit_insn (gen_vabs2 (operands[0], operands[1])); + DONE; + } + else + { + rtx reg = gen_reg_rtx (mode); + emit_move_insn (reg, CONST0_RTX (mode)); + emit_insn (gen_lsx_vadda_ (operands[0], operands[1], reg)); + DONE; + } +}) + +(define_expand "neg2" + [(set (match_operand:ILSX 0 "register_operand") + (neg:ILSX (match_operand:ILSX 1 "register_operand")))] + "ISA_HAS_LSX" +{ + emit_insn (gen_vneg2 (operands[0], operands[1])); + DONE; +}) + +(define_expand "neg2" + [(set (match_operand:FLSX 0 "register_operand") + (neg:FLSX (match_operand:FLSX 1 "register_operand")))] + "ISA_HAS_LSX" +{ + rtx reg = gen_reg_rtx (mode); + emit_move_insn (reg, CONST0_RTX (mode)); + emit_insn (gen_sub3 (operands[0], reg, operands[1])); + DONE; +}) + +(define_expand "lsx_vrepli" + [(match_operand:ILSX 0 "register_operand") + (match_operand 1 "const_imm10_operand")] + "ISA_HAS_LSX" +{ + if (mode == V16QImode) + operands[1] = GEN_INT (trunc_int_for_mode (INTVAL (operands[1]), + mode)); + emit_move_insn (operands[0], + loongarch_gen_const_int_vector (mode, INTVAL (operands[1]))); + DONE; +}) + +(define_expand "vec_perm" + [(match_operand:LSX 0 "register_operand") + (match_operand:LSX 1 "register_operand") + (match_operand:LSX 2 
"register_operand") + (match_operand:LSX 3 "register_operand")] + "ISA_HAS_LSX" +{ + loongarch_expand_vec_perm (operands[0], operands[1], + operands[2], operands[3]); + DONE; +}) + +(define_insn "lsx_vshuf_" + [(set (match_operand:LSX_DWH 0 "register_operand" "=f") + (unspec:LSX_DWH [(match_operand:LSX_DWH 1 "register_operand" "0") + (match_operand:LSX_DWH 2 "register_operand" "f") + (match_operand:LSX_DWH 3 "register_operand" "f")] + UNSPEC_LSX_VSHUF))] + "ISA_HAS_LSX" + "vshuf.\t%w0,%w2,%w3" + [(set_attr "type" "simd_sld") + (set_attr "mode" "")]) + +(define_expand "mov" + [(set (match_operand:LSX 0) + (match_operand:LSX 1))] + "ISA_HAS_LSX" +{ + if (loongarch_legitimize_move (mode, operands[0], operands[1])) + DONE; +}) + +(define_expand "movmisalign" + [(set (match_operand:LSX 0) + (match_operand:LSX 1))] + "ISA_HAS_LSX" +{ + if (loongarch_legitimize_move (mode, operands[0], operands[1])) + DONE; +}) + +(define_insn "mov_lsx" + [(set (match_operand:LSX 0 "nonimmediate_operand" "=f,f,R,*r,*f") + (match_operand:LSX 1 "move_operand" "fYGYI,R,f,*f,*r"))] + "ISA_HAS_LSX" +{ return loongarch_output_move (operands[0], operands[1]); } + [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert") + (set_attr "mode" "")]) + +(define_split + [(set (match_operand:LSX 0 "nonimmediate_operand") + (match_operand:LSX 1 "move_operand"))] + "reload_completed && ISA_HAS_LSX + && loongarch_split_move_insn_p (operands[0], operands[1])" + [(const_int 0)] +{ + loongarch_split_move_insn (operands[0], operands[1], curr_insn); + DONE; +}) + +;; Offset load +(define_expand "lsx_ld_" + [(match_operand:LSX 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq10_operand")] + "ISA_HAS_LSX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move (operands[0], gen_rtx_MEM (mode, addr)); + DONE; +}) + +;; Offset store +(define_expand "lsx_st_" + [(match_operand:LSX 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq10_operand")] + "ISA_HAS_LSX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move (gen_rtx_MEM (mode, addr), operands[0]); + DONE; +}) + +;; Integer operations +(define_insn "add3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f,f") + (plus:ILSX + (match_operand:ILSX 1 "register_operand" "f,f,f") + (match_operand:ILSX 2 "reg_or_vector_same_ximm5_operand" "f,Unv5,Uuv5")))] + "ISA_HAS_LSX" +{ + switch (which_alternative) + { + case 0: + return "vadd.\t%w0,%w1,%w2"; + case 1: + { + HOST_WIDE_INT val = INTVAL (CONST_VECTOR_ELT (operands[2], 0)); + + operands[2] = GEN_INT (-val); + return "vsubi.\t%w0,%w1,%d2"; + } + case 2: + return "vaddi.\t%w0,%w1,%E2"; + default: + gcc_unreachable (); + } +} + [(set_attr "alu_type" "simd_add") + (set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "sub3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (minus:ILSX + (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_uimm5_operand" "f,Uuv5")))] + "ISA_HAS_LSX" + "@ + vsub.\t%w0,%w1,%w2 + vsubi.\t%w0,%w1,%E2" + [(set_attr "alu_type" "simd_add") + (set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "mul3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (mult:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vmul.\t%w0,%w1,%w2" + [(set_attr 
"type" "simd_mul") + (set_attr "mode" "")]) + +(define_insn "lsx_vmadd_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (plus:ILSX (mult:ILSX (match_operand:ILSX 2 "register_operand" "f") + (match_operand:ILSX 3 "register_operand" "f")) + (match_operand:ILSX 1 "register_operand" "0")))] + "ISA_HAS_LSX" + "vmadd.\t%w0,%w2,%w3" + [(set_attr "type" "simd_mul") + (set_attr "mode" "")]) + +(define_insn "lsx_vmsub_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (minus:ILSX (match_operand:ILSX 1 "register_operand" "0") + (mult:ILSX (match_operand:ILSX 2 "register_operand" "f") + (match_operand:ILSX 3 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vmsub.\t%w0,%w2,%w3" + [(set_attr "type" "simd_mul") + (set_attr "mode" "")]) + +(define_insn "div3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (div:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" +{ return loongarch_lsx_output_division ("vdiv.\t%w0,%w1,%w2", operands); } + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "udiv3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (udiv:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" +{ return loongarch_lsx_output_division ("vdiv.\t%w0,%w1,%w2", operands); } + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "mod3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (mod:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" +{ return loongarch_lsx_output_division ("vmod.\t%w0,%w1,%w2", operands); } + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "umod3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (umod:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" +{ return loongarch_lsx_output_division ("vmod.\t%w0,%w1,%w2", operands); } + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "xor3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f,f") + (xor:ILSX + (match_operand:ILSX 1 "register_operand" "f,f,f") + (match_operand:ILSX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))] + "ISA_HAS_LSX" + "@ + vxor.v\t%w0,%w1,%w2 + vbitrevi.%v0\t%w0,%w1,%V2 + vxori.b\t%w0,%w1,%B2" + [(set_attr "type" "simd_logic,simd_bit,simd_logic") + (set_attr "mode" "")]) + +(define_insn "ior3" + [(set (match_operand:LSX 0 "register_operand" "=f,f,f") + (ior:LSX + (match_operand:LSX 1 "register_operand" "f,f,f") + (match_operand:LSX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))] + "ISA_HAS_LSX" + "@ + vor.v\t%w0,%w1,%w2 + vbitseti.%v0\t%w0,%w1,%V2 + vori.b\t%w0,%w1,%B2" + [(set_attr "type" "simd_logic,simd_bit,simd_logic") + (set_attr "mode" "")]) + +(define_insn "and3" + [(set (match_operand:LSX 0 "register_operand" "=f,f,f") + (and:LSX + (match_operand:LSX 1 "register_operand" "f,f,f") + (match_operand:LSX 2 "reg_or_vector_same_val_operand" "f,YZ,Urv8")))] + "ISA_HAS_LSX" +{ + switch (which_alternative) + { + case 0: + return "vand.v\t%w0,%w1,%w2"; + case 1: + { + rtx elt0 = CONST_VECTOR_ELT (operands[2], 0); + unsigned HOST_WIDE_INT val = ~UINTVAL (elt0); + operands[2] = loongarch_gen_const_int_vector (mode, val & (-val)); + return "vbitclri.%v0\t%w0,%w1,%V2"; + } + case 2: + return "vandi.b\t%w0,%w1,%B2"; + default: + gcc_unreachable (); + } +} + [(set_attr "type" "simd_logic,simd_bit,simd_logic") + (set_attr 
"mode" "")]) + +(define_insn "one_cmpl2" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (not:ILSX (match_operand:ILSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vnor.v\t%w0,%w1,%w1" + [(set_attr "type" "simd_logic") + (set_attr "mode" "TI")]) + +(define_insn "vlshr3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (lshiftrt:ILSX + (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + "ISA_HAS_LSX" + "@ + vsrl.\t%w0,%w1,%w2 + vsrli.\t%w0,%w1,%E2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "vashr3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (ashiftrt:ILSX + (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + "ISA_HAS_LSX" + "@ + vsra.\t%w0,%w1,%w2 + vsrai.\t%w0,%w1,%E2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "vashl3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (ashift:ILSX + (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + "ISA_HAS_LSX" + "@ + vsll.\t%w0,%w1,%w2 + vslli.\t%w0,%w1,%E2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +;; Floating-point operations +(define_insn "add3" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (plus:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfadd.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fadd") + (set_attr "mode" "")]) + +(define_insn "sub3" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (minus:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfsub.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fadd") + (set_attr "mode" "")]) + +(define_insn "mul3" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (mult:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfmul.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fmul") + (set_attr "mode" "")]) + +(define_insn "div3" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (div:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfdiv.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +(define_insn "fma4" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (fma:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f") + (match_operand:FLSX 3 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfmadd.\t%w0,%w1,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "fnma4" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (fma:FLSX (neg:FLSX (match_operand:FLSX 1 "register_operand" "f")) + (match_operand:FLSX 2 "register_operand" "f") + (match_operand:FLSX 3 "register_operand" "0")))] + "ISA_HAS_LSX" + "vfnmsub.\t%w0,%w1,%w2,%w0" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "sqrt2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (sqrt:FLSX (match_operand:FLSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfsqrt.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +;; Built-in functions +(define_insn "lsx_vadda_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (plus:ILSX (abs:ILSX (match_operand:ILSX 1 
"register_operand" "f")) + (abs:ILSX (match_operand:ILSX 2 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vadda.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "ssadd3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (ss_plus:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vsadd.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "usadd3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (us_plus:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vsadd.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vabsd_s_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_ABSD_S))] + "ISA_HAS_LSX" + "vabsd.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vabsd_u_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VABSD_U))] + "ISA_HAS_LSX" + "vabsd.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vavg_s_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VAVG_S))] + "ISA_HAS_LSX" + "vavg.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vavg_u_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VAVG_U))] + "ISA_HAS_LSX" + "vavg.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vavgr_s_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VAVGR_S))] + "ISA_HAS_LSX" + "vavgr.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vavgr_u_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VAVGR_U))] + "ISA_HAS_LSX" + "vavgr.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vbitclr_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VBITCLR))] + "ISA_HAS_LSX" + "vbitclr.\t%w0,%w1,%w2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lsx_vbitclri_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VBITCLRI))] + "ISA_HAS_LSX" + "vbitclri.\t%w0,%w1,%2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lsx_vbitrev_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + 
UNSPEC_LSX_VBITREV))] + "ISA_HAS_LSX" + "vbitrev.\t%w0,%w1,%w2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lsx_vbitrevi_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const_lsx_branch_operand" "")] + UNSPEC_LSX_VBITREVI))] + "ISA_HAS_LSX" + "vbitrevi.\t%w0,%w1,%2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lsx_vbitsel_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (ior:ILSX (and:ILSX (not:ILSX + (match_operand:ILSX 3 "register_operand" "f")) + (match_operand:ILSX 1 "register_operand" "f")) + (and:ILSX (match_dup 3) + (match_operand:ILSX 2 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vbitsel.v\t%w0,%w1,%w2,%w3" + [(set_attr "type" "simd_bitmov") + (set_attr "mode" "")]) + +(define_insn "lsx_vbitseli_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (ior:V16QI (and:V16QI (not:V16QI + (match_operand:V16QI 1 "register_operand" "0")) + (match_operand:V16QI 2 "register_operand" "f")) + (and:V16QI (match_dup 1) + (match_operand:V16QI 3 "const_vector_same_val_operand" "Urv8"))))] + "ISA_HAS_LSX" + "vbitseli.b\t%w0,%w2,%B3" + [(set_attr "type" "simd_bitmov") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vbitset_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VBITSET))] + "ISA_HAS_LSX" + "vbitset.\t%w0,%w1,%w2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lsx_vbitseti_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VBITSETI))] + "ISA_HAS_LSX" + "vbitseti.\t%w0,%w1,%2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_code_iterator ICC [eq le leu lt ltu]) + +(define_code_attr icc + [(eq "eq") + (le "le") + (leu "le") + (lt "lt") + (ltu "lt")]) + +(define_code_attr icci + [(eq "eqi") + (le "lei") + (leu "lei") + (lt "lti") + (ltu "lti")]) + +(define_code_attr cmpi + [(eq "s") + (le "s") + (leu "u") + (lt "s") + (ltu "u")]) + +(define_code_attr cmpi_1 + [(eq "") + (le "") + (leu "u") + (lt "") + (ltu "u")]) + +(define_insn "lsx_vs_" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (ICC:ILSX + (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_imm5_operand" "f,Uv5")))] + "ISA_HAS_LSX" + "@ + vs.\t%w0,%w1,%w2 + vs.\t%w0,%w1,%E2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vfclass_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFCLASS))] + "ISA_HAS_LSX" + "vfclass.\t%w0,%w1" + [(set_attr "type" "simd_fclass") + (set_attr "mode" "")]) + +(define_insn "lsx_vfcmp_caf_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")] + UNSPEC_LSX_VFCMP_CAF))] + "ISA_HAS_LSX" + "vfcmp.caf.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "")]) + +(define_insn "lsx_vfcmp_cune_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")] + UNSPEC_LSX_VFCMP_CUNE))] + "ISA_HAS_LSX" + "vfcmp.cune.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" 
"")]) + +(define_code_iterator vfcond [unordered ordered eq ne le lt uneq unle unlt]) + +(define_code_attr fcc + [(unordered "cun") + (ordered "cor") + (eq "ceq") + (ne "cne") + (uneq "cueq") + (unle "cule") + (unlt "cult") + (le "cle") + (lt "clt")]) + +(define_int_iterator FSC_UNS [UNSPEC_LSX_VFCMP_SAF UNSPEC_LSX_VFCMP_SUN UNSPEC_LSX_VFCMP_SOR + UNSPEC_LSX_VFCMP_SEQ UNSPEC_LSX_VFCMP_SNE UNSPEC_LSX_VFCMP_SUEQ + UNSPEC_LSX_VFCMP_SUNE UNSPEC_LSX_VFCMP_SULE UNSPEC_LSX_VFCMP_SULT + UNSPEC_LSX_VFCMP_SLE UNSPEC_LSX_VFCMP_SLT]) + +(define_int_attr fsc + [(UNSPEC_LSX_VFCMP_SAF "saf") + (UNSPEC_LSX_VFCMP_SUN "sun") + (UNSPEC_LSX_VFCMP_SOR "sor") + (UNSPEC_LSX_VFCMP_SEQ "seq") + (UNSPEC_LSX_VFCMP_SNE "sne") + (UNSPEC_LSX_VFCMP_SUEQ "sueq") + (UNSPEC_LSX_VFCMP_SUNE "sune") + (UNSPEC_LSX_VFCMP_SULE "sule") + (UNSPEC_LSX_VFCMP_SULT "sult") + (UNSPEC_LSX_VFCMP_SLE "sle") + (UNSPEC_LSX_VFCMP_SLT "slt")]) + +(define_insn "lsx_vfcmp__" + [(set (match_operand: 0 "register_operand" "=f") + (vfcond: (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfcmp..\t%w0,%w1,%w2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "")]) + +(define_insn "lsx_vfcmp__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")] + FSC_UNS))] + "ISA_HAS_LSX" + "vfcmp..\t%w0,%w1,%w2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "")]) + +(define_mode_attr fint + [(V4SF "v4si") + (V2DF "v2di")]) + +(define_mode_attr FINTCNV + [(V4SF "I2S") + (V2DF "I2D")]) + +(define_mode_attr FINTCNV_2 + [(V4SF "S2I") + (V2DF "D2I")]) + +(define_insn "float2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (float:FLSX (match_operand: 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vffint..\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "floatuns2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unsigned_float:FLSX + (match_operand: 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vffint..\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_mode_attr FFQ + [(V4SF "V8HI") + (V2DF "V4SI")]) + +(define_insn "lsx_vreplgr2vr_" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (vec_duplicate:ILSX + (match_operand: 1 "reg_or_0_operand" "r,J")))] + "ISA_HAS_LSX" +{ + if (which_alternative == 1) + return "ldi.\t%w0,0"; + + if (!TARGET_64BIT && (mode == V2DImode || mode == V2DFmode)) + return "#"; + else + return "vreplgr2vr.\t%w0,%z1"; +} + [(set_attr "type" "simd_fill") + (set_attr "mode" "")]) + +(define_split + [(set (match_operand:LSX_D 0 "register_operand") + (vec_duplicate:LSX_D + (match_operand: 1 "register_operand")))] + "reload_completed && ISA_HAS_LSX && !TARGET_64BIT" + [(const_int 0)] +{ + loongarch_split_lsx_fill_d (operands[0], operands[1]); + DONE; +}) + +(define_insn "logb2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFLOGB))] + "ISA_HAS_LSX" + "vflogb.\t%w0,%w1" + [(set_attr "type" "simd_flog2") + (set_attr "mode" "")]) + +(define_insn "smax3" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (smax:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfmax.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "lsx_vfmaxa_" + 
[(set (match_operand:FLSX 0 "register_operand" "=f") + (if_then_else:FLSX + (gt (abs:FLSX (match_operand:FLSX 1 "register_operand" "f")) + (abs:FLSX (match_operand:FLSX 2 "register_operand" "f"))) + (match_dup 1) + (match_dup 2)))] + "ISA_HAS_LSX" + "vfmaxa.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "smin3" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (smin:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfmin.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "lsx_vfmina_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (if_then_else:FLSX + (lt (abs:FLSX (match_operand:FLSX 1 "register_operand" "f")) + (abs:FLSX (match_operand:FLSX 2 "register_operand" "f"))) + (match_dup 1) + (match_dup 2)))] + "ISA_HAS_LSX" + "vfmina.\t%w0,%w1,%w2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "lsx_vfrecip_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRECIP))] + "ISA_HAS_LSX" + "vfrecip.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +(define_insn "lsx_vfrint_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRINT))] + "ISA_HAS_LSX" + "vfrint.\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "")]) + +(define_insn "lsx_vfrsqrt_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] + "ISA_HAS_LSX" + "vfrsqrt.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +(define_insn "lsx_vftint_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFTINT_S))] + "ISA_HAS_LSX" + "vftint..\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "lsx_vftint_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFTINT_U))] + "ISA_HAS_LSX" + "vftint..\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "fix_trunc2" + [(set (match_operand: 0 "register_operand" "=f") + (fix: (match_operand:FLSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vftintrz..\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "fixuns_trunc2" + [(set (match_operand: 0 "register_operand" "=f") + (unsigned_fix: (match_operand:FLSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vftintrz..\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "lsx_vhw_h_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (addsub:V8HI + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 1 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)])))))] + "ISA_HAS_LSX" + "vhw.h.b\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + 
(set_attr "mode" "V8HI")]) + +(define_insn "lsx_vhw_w_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (addsub:V4SI + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 1 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))))] + "ISA_HAS_LSX" + "vhw.w.h\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vhw_d_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (addsub:V2DI + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 1 "register_operand" "f") + (parallel [(const_int 1) (const_int 3)]))) + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2)])))))] + "ISA_HAS_LSX" + "vhw.d.w\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vpackev_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "f") + (match_operand:V16QI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 16) + (const_int 2) (const_int 18) + (const_int 4) (const_int 20) + (const_int 6) (const_int 22) + (const_int 8) (const_int 24) + (const_int 10) (const_int 26) + (const_int 12) (const_int 28) + (const_int 14) (const_int 30)])))] + "ISA_HAS_LSX" + "vpackev.b\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vpackev_h" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "f") + (match_operand:V8HI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 8) + (const_int 2) (const_int 10) + (const_int 4) (const_int 12) + (const_int 6) (const_int 14)])))] + "ISA_HAS_LSX" + "vpackev.h\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vpackev_w" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (vec_select:V4SI + (vec_concat:V8SI + (match_operand:V4SI 1 "register_operand" "f") + (match_operand:V4SI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 4) + (const_int 2) (const_int 6)])))] + "ISA_HAS_LSX" + "vpackev.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vpackev_w_f" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (vec_select:V4SF + (vec_concat:V8SF + (match_operand:V4SF 1 "register_operand" "f") + (match_operand:V4SF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 4) + (const_int 2) (const_int 6)])))] + "ISA_HAS_LSX" + "vpackev.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vilvh_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "f") + (match_operand:V16QI 2 "register_operand" "f")) + (parallel [(const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "ISA_HAS_LSX" + "vilvh.b\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vilvh_h" + 
[(set (match_operand:V8HI 0 "register_operand" "=f") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "f") + (match_operand:V8HI 2 "register_operand" "f")) + (parallel [(const_int 4) (const_int 12) + (const_int 5) (const_int 13) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "ISA_HAS_LSX" + "vilvh.h\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8HI")]) + +(define_mode_attr vilvh_suffix + [(V4SI "") (V4SF "_f") + (V2DI "") (V2DF "_f")]) + +(define_insn "lsx_vilvh_w" + [(set (match_operand:LSX_W 0 "register_operand" "=f") + (vec_select:LSX_W + (vec_concat: + (match_operand:LSX_W 1 "register_operand" "f") + (match_operand:LSX_W 2 "register_operand" "f")) + (parallel [(const_int 2) (const_int 6) + (const_int 3) (const_int 7)])))] + "ISA_HAS_LSX" + "vilvh.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "")]) + +(define_insn "lsx_vilvh_d" + [(set (match_operand:LSX_D 0 "register_operand" "=f") + (vec_select:LSX_D + (vec_concat: + (match_operand:LSX_D 1 "register_operand" "f") + (match_operand:LSX_D 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3)])))] + "ISA_HAS_LSX" + "vilvh.d\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "")]) + +(define_insn "lsx_vpackod_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "f") + (match_operand:V16QI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 17) + (const_int 3) (const_int 19) + (const_int 5) (const_int 21) + (const_int 7) (const_int 23) + (const_int 9) (const_int 25) + (const_int 11) (const_int 27) + (const_int 13) (const_int 29) + (const_int 15) (const_int 31)])))] + "ISA_HAS_LSX" + "vpackod.b\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vpackod_h" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "f") + (match_operand:V8HI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 9) + (const_int 3) (const_int 11) + (const_int 5) (const_int 13) + (const_int 7) (const_int 15)])))] + "ISA_HAS_LSX" + "vpackod.h\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vpackod_w" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (vec_select:V4SI + (vec_concat:V8SI + (match_operand:V4SI 1 "register_operand" "f") + (match_operand:V4SI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 5) + (const_int 3) (const_int 7)])))] + "ISA_HAS_LSX" + "vpackod.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vpackod_w_f" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (vec_select:V4SF + (vec_concat:V8SF + (match_operand:V4SF 1 "register_operand" "f") + (match_operand:V4SF 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 5) + (const_int 3) (const_int 7)])))] + "ISA_HAS_LSX" + "vpackod.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vilvl_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "f") + (match_operand:V16QI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 4) 
(const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23)])))] + "ISA_HAS_LSX" + "vilvl.b\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vilvl_h" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "f") + (match_operand:V8HI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 2) (const_int 10) + (const_int 3) (const_int 11)])))] + "ISA_HAS_LSX" + "vilvl.h\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vilvl_w" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (vec_select:V4SI + (vec_concat:V8SI + (match_operand:V4SI 1 "register_operand" "f") + (match_operand:V4SI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "ISA_HAS_LSX" + "vilvl.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vilvl_w_f" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (vec_select:V4SF + (vec_concat:V8SF + (match_operand:V4SF 1 "register_operand" "f") + (match_operand:V4SF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "ISA_HAS_LSX" + "vilvl.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vilvl_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (vec_select:V2DI + (vec_concat:V4DI + (match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2)])))] + "ISA_HAS_LSX" + "vilvl.d\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vilvl_d_f" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (vec_select:V2DF + (vec_concat:V4DF + (match_operand:V2DF 1 "register_operand" "f") + (match_operand:V2DF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2)])))] + "ISA_HAS_LSX" + "vilvl.d\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V2DF")]) + +(define_insn "smax3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (smax:ILSX (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_simm5_operand" "f,Usv5")))] + "ISA_HAS_LSX" + "@ + vmax.\t%w0,%w1,%w2 + vmaxi.\t%w0,%w1,%E2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "umax3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (umax:ILSX (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_uimm5_operand" "f,Uuv5")))] + "ISA_HAS_LSX" + "@ + vmax.\t%w0,%w1,%w2 + vmaxi.\t%w0,%w1,%B2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "smin3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (smin:ILSX (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_simm5_operand" "f,Usv5")))] + "ISA_HAS_LSX" + "@ + vmin.\t%w0,%w1,%w2 + vmini.\t%w0,%w1,%E2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "umin3" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (umin:ILSX (match_operand:ILSX 1 "register_operand" "f,f") + (match_operand:ILSX 2 "reg_or_vector_same_uimm5_operand" "f,Uuv5")))] + "ISA_HAS_LSX" + "@ + vmin.\t%w0,%w1,%w2 + vmini.\t%w0,%w1,%B2" + [(set_attr 
"type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vclo_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (clz:ILSX (not:ILSX (match_operand:ILSX 1 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vclo.\t%w0,%w1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "clz2" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (clz:ILSX (match_operand:ILSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vclz.\t%w0,%w1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lsx_nor_" + [(set (match_operand:ILSX 0 "register_operand" "=f,f") + (and:ILSX (not:ILSX (match_operand:ILSX 1 "register_operand" "f,f")) + (not:ILSX (match_operand:ILSX 2 "reg_or_vector_same_val_operand" "f,Urv8"))))] + "ISA_HAS_LSX" + "@ + vnor.v\t%w0,%w1,%w2 + vnori.b\t%w0,%w1,%B2" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "lsx_vpickev_b" +[(set (match_operand:V16QI 0 "register_operand" "=f") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "f") + (match_operand:V16QI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)])))] + "ISA_HAS_LSX" + "vpickev.b\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vpickev_h" +[(set (match_operand:V8HI 0 "register_operand" "=f") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "f") + (match_operand:V8HI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)])))] + "ISA_HAS_LSX" + "vpickev.h\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vpickev_w" +[(set (match_operand:V4SI 0 "register_operand" "=f") + (vec_select:V4SI + (vec_concat:V8SI + (match_operand:V4SI 1 "register_operand" "f") + (match_operand:V4SI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))] + "ISA_HAS_LSX" + "vpickev.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vpickev_w_f" +[(set (match_operand:V4SF 0 "register_operand" "=f") + (vec_select:V4SF + (vec_concat:V8SF + (match_operand:V4SF 1 "register_operand" "f") + (match_operand:V4SF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))] + "ISA_HAS_LSX" + "vpickev.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vpickod_b" +[(set (match_operand:V16QI 0 "register_operand" "=f") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "f") + (match_operand:V16QI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)])))] + "ISA_HAS_LSX" + "vpickod.b\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vpickod_h" +[(set (match_operand:V8HI 0 "register_operand" "=f") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "f") + 
(match_operand:V8HI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)])))] + "ISA_HAS_LSX" + "vpickod.h\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vpickod_w" +[(set (match_operand:V4SI 0 "register_operand" "=f") + (vec_select:V4SI + (vec_concat:V8SI + (match_operand:V4SI 1 "register_operand" "f") + (match_operand:V4SI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)])))] + "ISA_HAS_LSX" + "vpickod.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vpickod_w_f" +[(set (match_operand:V4SF 0 "register_operand" "=f") + (vec_select:V4SF + (vec_concat:V8SF + (match_operand:V4SF 1 "register_operand" "f") + (match_operand:V4SF 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)])))] + "ISA_HAS_LSX" + "vpickod.w\t%w0,%w2,%w1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4SF")]) + +(define_insn "popcount2" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (popcount:ILSX (match_operand:ILSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vpcnt.\t%w0,%w1" + [(set_attr "type" "simd_pcnt") + (set_attr "mode" "")]) + +(define_insn "lsx_vsat_s_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSAT_S))] + "ISA_HAS_LSX" + "vsat.\t%w0,%w1,%2" + [(set_attr "type" "simd_sat") + (set_attr "mode" "")]) + +(define_insn "lsx_vsat_u_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSAT_U))] + "ISA_HAS_LSX" + "vsat.\t%w0,%w1,%2" + [(set_attr "type" "simd_sat") + (set_attr "mode" "")]) + +(define_insn "lsx_vshuf4i_" + [(set (match_operand:LSX_WHB_W 0 "register_operand" "=f") + (vec_select:LSX_WHB_W + (match_operand:LSX_WHB_W 1 "register_operand" "f") + (match_operand 2 "par_const_vector_shf_set_operand" "")))] + "ISA_HAS_LSX" +{ + HOST_WIDE_INT val = 0; + unsigned int i; + + /* We convert the selection to an immediate. 
*/ + for (i = 0; i < 4; i++) + val |= INTVAL (XVECEXP (operands[2], 0, i)) << (2 * i); + + operands[2] = GEN_INT (val); + return "vshuf4i.\t%w0,%w1,%X2"; +} + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrar_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VSRAR))] + "ISA_HAS_LSX" + "vsrar.\t%w0,%w1,%w2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrari_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSRARI))] + "ISA_HAS_LSX" + "vsrari.\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrlr_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VSRLR))] + "ISA_HAS_LSX" + "vsrlr.\t%w0,%w1,%w2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrlri_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSRLRI))] + "ISA_HAS_LSX" + "vsrlri.\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssub_s_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (ss_minus:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vssub.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vssub_u_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (us_minus:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + "vssub.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vreplve_" + [(set (match_operand:LSX 0 "register_operand" "=f") + (vec_duplicate:LSX + (vec_select: + (match_operand:LSX 1 "register_operand" "f") + (parallel [(match_operand:SI 2 "register_operand" "r")]))))] + "ISA_HAS_LSX" + "vreplve.\t%w0,%w1,%z2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lsx_vreplvei_" + [(set (match_operand:LSX 0 "register_operand" "=f") + (vec_duplicate:LSX + (vec_select: + (match_operand:LSX 1 "register_operand" "f") + (parallel [(match_operand 2 "const__operand" "")]))))] + "ISA_HAS_LSX" + "vreplvei.\t%w0,%w1,%2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lsx_vreplvei__scalar" + [(set (match_operand:LSX 0 "register_operand" "=f") + (vec_duplicate:LSX + (match_operand: 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vreplvei.\t%w0,%w1,0" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lsx_vfcvt_h_s" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (unspec:V8HI [(match_operand:V4SF 1 "register_operand" "f") + (match_operand:V4SF 2 "register_operand" "f")] + UNSPEC_LSX_VFCVT))] + "ISA_HAS_LSX" + "vfcvt.h.s\t%w0,%w1,%w2" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vfcvt_s_d" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V2DF 1 "register_operand" "f") + (match_operand:V2DF 2 "register_operand" "f")] + UNSPEC_LSX_VFCVT))] + "ISA_HAS_LSX" + 
"vfcvt.s.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4SF")]) + +(define_insn "vec_pack_trunc_v2df" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (vec_concat:V4SF + (float_truncate:V2SF (match_operand:V2DF 1 "register_operand" "f")) + (float_truncate:V2SF (match_operand:V2DF 2 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vfcvt.s.d\t%w0,%w2,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfcvth_s_h" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "f")] + UNSPEC_LSX_VFCVTH))] + "ISA_HAS_LSX" + "vfcvth.s.h\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfcvth_d_s" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (float_extend:V2DF + (vec_select:V2SF + (match_operand:V4SF 1 "register_operand" "f") + (parallel [(const_int 2) (const_int 3)]))))] + "ISA_HAS_LSX" + "vfcvth.d.s\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vfcvtl_s_h" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "f")] + UNSPEC_LSX_VFCVTL))] + "ISA_HAS_LSX" + "vfcvtl.s.h\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfcvtl_d_s" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (float_extend:V2DF + (vec_select:V2SF + (match_operand:V4SF 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1)]))))] + "ISA_HAS_LSX" + "vfcvtl.d.s\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V2DF")]) + +(define_code_attr lsxbr + [(eq "bz") + (ne "bnz")]) + +(define_code_attr lsxeq_v + [(eq "eqz") + (ne "nez")]) + +(define_code_attr lsxne_v + [(eq "nez") + (ne "eqz")]) + +(define_code_attr lsxeq + [(eq "anyeqz") + (ne "allnez")]) + +(define_code_attr lsxne + [(eq "allnez") + (ne "anyeqz")]) + +(define_insn "lsx__" + [(set (pc) (if_then_else + (equality_op + (unspec:SI [(match_operand:LSX 1 "register_operand" "f")] + UNSPEC_LSX_BRANCH) + (match_operand:SI 2 "const_0_operand")) + (label_ref (match_operand 0)) + (pc))) + (clobber (match_scratch:FCC 3 "=z"))] + "ISA_HAS_LSX" +{ + return loongarch_output_conditional_branch (insn, operands, + "vset.\t%Z3%w1\n\tbcnez\t%Z3%0", + "vset.\t%Z3%w1\n\tbcnez\t%Z3%0"); +} + [(set_attr "type" "simd_branch") + (set_attr "mode" "")]) + +(define_insn "lsx__v_" + [(set (pc) (if_then_else + (equality_op + (unspec:SI [(match_operand:LSX 1 "register_operand" "f")] + UNSPEC_LSX_BRANCH_V) + (match_operand:SI 2 "const_0_operand")) + (label_ref (match_operand 0)) + (pc))) + (clobber (match_scratch:FCC 3 "=z"))] + "ISA_HAS_LSX" +{ + return loongarch_output_conditional_branch (insn, operands, + "vset.v\t%Z3%w1\n\tbcnez\t%Z3%0", + "vset.v\t%Z3%w1\n\tbcnez\t%Z3%0"); +} + [(set_attr "type" "simd_branch") + (set_attr "mode" "TI")]) + +;; vec_concate +(define_expand "vec_concatv2di" + [(set (match_operand:V2DI 0 "register_operand") + (vec_concat:V2DI + (match_operand:DI 1 "register_operand") + (match_operand:DI 2 "register_operand")))] + "ISA_HAS_LSX" +{ + emit_insn (gen_lsx_vinsgr2vr_d (operands[0], operands[1], + operands[0], GEN_INT (0))); + emit_insn (gen_lsx_vinsgr2vr_d (operands[0], operands[2], + operands[0], GEN_INT (1))); + DONE; +}) + + +(define_insn "vandn3" + [(set (match_operand:LSX 0 "register_operand" "=f") + (and:LSX (not:LSX (match_operand:LSX 1 "register_operand" "f")) + (match_operand:LSX 2 "register_operand" "f")))] + "ISA_HAS_LSX" + 
"vandn.v\t%w0,%w1,%w2" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "vabs2" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (abs:ILSX (match_operand:ILSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vsigncov.\t%w0,%w1,%w1" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "vneg2" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (neg:ILSX (match_operand:ILSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vneg.\t%w0,%w1" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "lsx_vmuh_s_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VMUH_S))] + "ISA_HAS_LSX" + "vmuh.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vmuh_u_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VMUH_U))] + "ISA_HAS_LSX" + "vmuh.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vextw_s_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "f")] + UNSPEC_LSX_VEXTW_S))] + "ISA_HAS_LSX" + "vextw_s.d\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vextw_u_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "f")] + UNSPEC_LSX_VEXTW_U))] + "ISA_HAS_LSX" + "vextw_u.d\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vsllwil_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_WHB 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSLLWIL_S))] + "ISA_HAS_LSX" + "vsllwil..\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsllwil_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_WHB 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSLLWIL_U))] + "ISA_HAS_LSX" + "vsllwil..\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsran__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSRAN))] + "ISA_HAS_LSX" + "vsran..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vssran_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRAN_S))] + "ISA_HAS_LSX" + "vssran..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vssran_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRAN_U))] + "ISA_HAS_LSX" + "vssran..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrain_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + 
UNSPEC_LSX_VSRAIN))] + "ISA_HAS_LSX" + "vsrain.\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +;; FIXME: bitimm +(define_insn "lsx_vsrains_s_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSRAINS_S))] + "ISA_HAS_LSX" + "vsrains_s.\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +;; FIXME: bitimm +(define_insn "lsx_vsrains_u_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LSX_VSRAINS_U))] + "ISA_HAS_LSX" + "vsrains_u.\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrarn__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSRARN))] + "ISA_HAS_LSX" + "vsrarn..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrarn_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRARN_S))] + "ISA_HAS_LSX" + "vssrarn..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrarn_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRARN_U))] + "ISA_HAS_LSX" + "vssrarn..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrln__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSRLN))] + "ISA_HAS_LSX" + "vsrln..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrln_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRLN_U))] + "ISA_HAS_LSX" + "vssrln..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrlrn__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSRLRN))] + "ISA_HAS_LSX" + "vsrlrn..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrlrn_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRLRN_U))] + "ISA_HAS_LSX" + "vssrlrn..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vfrstpi_" + [(set (match_operand:ILSX_HB 0 "register_operand" "=f") + (unspec:ILSX_HB [(match_operand:ILSX_HB 1 "register_operand" "0") + (match_operand:ILSX_HB 2 "register_operand" "f") + (match_operand 3 "const_uimm5_operand" "")] + UNSPEC_LSX_VFRSTPI))] + "ISA_HAS_LSX" + "vfrstpi.\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vfrstp_" + [(set (match_operand:ILSX_HB 0 "register_operand" "=f") + 
(unspec:ILSX_HB [(match_operand:ILSX_HB 1 "register_operand" "0") + (match_operand:ILSX_HB 2 "register_operand" "f") + (match_operand:ILSX_HB 3 "register_operand" "f")] + UNSPEC_LSX_VFRSTP))] + "ISA_HAS_LSX" + "vfrstp.\t%w0,%w2,%w3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vshuf4i_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand")] + UNSPEC_LSX_VSHUF4I))] + "ISA_HAS_LSX" + "vshuf4i.d\t%w0,%w2,%3" + [(set_attr "type" "simd_sld") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vbsrl_" + [(set (match_operand:LSX 0 "register_operand" "=f") + (unspec:LSX [(match_operand:LSX 1 "register_operand" "f") + (match_operand 2 "const_uimm5_operand" "")] + UNSPEC_LSX_VBSRL_V))] + "ISA_HAS_LSX" + "vbsrl.v\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vbsll_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const_uimm5_operand" "")] + UNSPEC_LSX_VBSLL_V))] + "ISA_HAS_LSX" + "vbsll.v\t%w0,%w1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vextrins_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VEXTRINS))] + "ISA_HAS_LSX" + "vextrins.\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vmskltz_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")] + UNSPEC_LSX_VMSKLTZ))] + "ISA_HAS_LSX" + "vmskltz.\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsigncov_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VSIGNCOV))] + "ISA_HAS_LSX" + "vsigncov.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_expand "copysign3" + [(set (match_dup 4) + (and:FLSX + (not:FLSX (match_dup 3)) + (match_operand:FLSX 1 "register_operand"))) + (set (match_dup 5) + (and:FLSX (match_dup 3) + (match_operand:FLSX 2 "register_operand"))) + (set (match_operand:FLSX 0 "register_operand") + (ior:FLSX (match_dup 4) (match_dup 5)))] + "ISA_HAS_LSX" +{ + operands[3] = loongarch_build_signbit_mask (mode, 1, 0); + + operands[4] = gen_reg_rtx (mode); + operands[5] = gen_reg_rtx (mode); +}) + +(define_insn "absv2df2" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (abs:V2DF (match_operand:V2DF 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vbitclri.d\t%w0,%w1,63" + [(set_attr "type" "simd_logic") + (set_attr "mode" "V2DF")]) + +(define_insn "absv4sf2" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (abs:V4SF (match_operand:V4SF 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vbitclri.w\t%w0,%w1,31" + [(set_attr "type" "simd_logic") + (set_attr "mode" "V4SF")]) + +(define_insn "vfmadd4" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (fma:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f") + (match_operand:FLSX 3 "register_operand" "f")))] + "ISA_HAS_LSX" + "vfmadd.\t%w0,%w1,%w2,%w3" + [(set_attr "type" "simd_fmadd") +
(set_attr "mode" "")]) + +(define_insn "fms4" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (fma:FLSX (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f") + (neg:FLSX (match_operand:FLSX 3 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vfmsub.\t%w0,%w1,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "vfnmsub4_nmsub4" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (neg:FLSX + (fma:FLSX + (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f") + (neg:FLSX (match_operand:FLSX 3 "register_operand" "f")))))] + "ISA_HAS_LSX" + "vfnmsub.\t%w0,%w1,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + + +(define_insn "vfnmadd4_nmadd4" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (neg:FLSX + (fma:FLSX + (match_operand:FLSX 1 "register_operand" "f") + (match_operand:FLSX 2 "register_operand" "f") + (match_operand:FLSX 3 "register_operand" "f"))))] + "ISA_HAS_LSX" + "vfnmadd.\t%w0,%w1,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "lsx_vftintrne_w_s" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRNE))] + "ISA_HAS_LSX" + "vftintrne.w.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrne_l_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRNE))] + "ISA_HAS_LSX" + "vftintrne.l.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vftintrp_w_s" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRP))] + "ISA_HAS_LSX" + "vftintrp.w.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrp_l_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRP))] + "ISA_HAS_LSX" + "vftintrp.l.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vftintrm_w_s" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRM))] + "ISA_HAS_LSX" + "vftintrm.w.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrm_l_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRM))] + "ISA_HAS_LSX" + "vftintrm.l.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vftint_w_d" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V2DF 1 "register_operand" "f") + (match_operand:V2DF 2 "register_operand" "f")] + UNSPEC_LSX_VFTINT_W_D))] + "ISA_HAS_LSX" + "vftint.w.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vffint_s_l" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VFFINT_S_L))] + "ISA_HAS_LSX" + "vffint.s.l\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vftintrz_w_d" + [(set 
(match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V2DF 1 "register_operand" "f") + (match_operand:V2DF 2 "register_operand" "f")] + UNSPEC_LSX_VFTINTRZ_W_D))] + "ISA_HAS_LSX" + "vftintrz.w.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vftintrp_w_d" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V2DF 1 "register_operand" "f") + (match_operand:V2DF 2 "register_operand" "f")] + UNSPEC_LSX_VFTINTRP_W_D))] + "ISA_HAS_LSX" + "vftintrp.w.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vftintrm_w_d" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V2DF 1 "register_operand" "f") + (match_operand:V2DF 2 "register_operand" "f")] + UNSPEC_LSX_VFTINTRM_W_D))] + "ISA_HAS_LSX" + "vftintrm.w.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vftintrne_w_d" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V2DF 1 "register_operand" "f") + (match_operand:V2DF 2 "register_operand" "f")] + UNSPEC_LSX_VFTINTRNE_W_D))] + "ISA_HAS_LSX" + "vftintrne.w.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vftinth_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTH_L_H))] + "ISA_HAS_LSX" + "vftinth.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintl_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTL_L_S))] + "ISA_HAS_LSX" + "vftintl.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vffinth_d_w" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (unspec:V2DF [(match_operand:V4SI 1 "register_operand" "f")] + UNSPEC_LSX_VFFINTH_D_W))] + "ISA_HAS_LSX" + "vffinth.d.w\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vffintl_d_w" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (unspec:V2DF [(match_operand:V4SI 1 "register_operand" "f")] + UNSPEC_LSX_VFFINTL_D_W))] + "ISA_HAS_LSX" + "vffintl.d.w\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vftintrzh_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRZH_L_S))] + "ISA_HAS_LSX" + "vftintrzh.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrzl_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRZL_L_S))] + "ISA_HAS_LSX" + "vftintrzl.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrph_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRPH_L_S))] + "ISA_HAS_LSX" + "vftintrph.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrpl_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRPL_L_S))] + "ISA_HAS_LSX" + 
"vftintrpl.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrmh_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRMH_L_S))] + "ISA_HAS_LSX" + "vftintrmh.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrml_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRML_L_S))] + "ISA_HAS_LSX" + "vftintrml.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrneh_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRNEH_L_S))] + "ISA_HAS_LSX" + "vftintrneh.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vftintrnel_l_s" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFTINTRNEL_L_S))] + "ISA_HAS_LSX" + "vftintrnel.l.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfrintrne_s" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRNE_S))] + "ISA_HAS_LSX" + "vfrintrne.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfrintrne_d" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRNE_D))] + "ISA_HAS_LSX" + "vfrintrne.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vfrintrz_s" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRZ_S))] + "ISA_HAS_LSX" + "vfrintrz.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfrintrz_d" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRZ_D))] + "ISA_HAS_LSX" + "vfrintrz.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vfrintrp_s" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRP_S))] + "ISA_HAS_LSX" + "vfrintrp.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfrintrp_d" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRP_D))] + "ISA_HAS_LSX" + "vfrintrp.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +(define_insn "lsx_vfrintrm_s" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRM_S))] + "ISA_HAS_LSX" + "vfrintrm.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lsx_vfrintrm_d" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")] + UNSPEC_LSX_VFRINTRM_D))] + "ISA_HAS_LSX" + "vfrintrm.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +;; Vector versions of the floating-point 
frint patterns. +;; Expands to btrunc, ceil, floor, rint. +(define_insn "v4sf2" + [(set (match_operand:V4SF 0 "register_operand" "=f") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")] + FRINT_S))] + "ISA_HAS_LSX" + "vfrint.s\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "v2df2" + [(set (match_operand:V2DF 0 "register_operand" "=f") + (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")] + FRINT_D))] + "ISA_HAS_LSX" + "vfrint.d\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V2DF")]) + +;; Expands to round. +(define_insn "round2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRINT))] + "ISA_HAS_LSX" + "vfrint.\t%w0,%w1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +;; Offset load and broadcast +(define_expand "lsx_vldrepl_" + [(match_operand:LSX 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq12_operand")] + "ISA_HAS_LSX" +{ + emit_insn (gen_lsx_vldrepl__insn + (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "lsx_vldrepl__insn" + [(set (match_operand:LSX 0 "register_operand" "=f") + (vec_duplicate:LSX + (mem: (plus:DI (match_operand:DI 1 "register_operand" "r") + (match_operand 2 "aq12_operand")))))] + "ISA_HAS_LSX" +{ + return "vldrepl.\t%w0,%1,%2"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "") + (set_attr "length" "4")]) + +(define_insn "lsx_vldrepl__insn_0" + [(set (match_operand:LSX 0 "register_operand" "=f") + (vec_duplicate:LSX + (mem: (match_operand:DI 1 "register_operand" "r"))))] + "ISA_HAS_LSX" +{ + return "vldrepl.\t%w0,%1,0"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "") + (set_attr "length" "4")]) + +;; Offset store by sel +(define_expand "lsx_vstelm_" + [(match_operand:LSX 0 "register_operand") + (match_operand 3 "const__operand") + (match_operand 2 "aq8_operand") + (match_operand 1 "pmode_register_operand")] + "ISA_HAS_LSX" +{ + emit_insn (gen_lsx_vstelm__insn + (operands[1], operands[2], operands[0], operands[3])); + DONE; +}) + +(define_insn "lsx_vstelm__insn" + [(set (mem: (plus:DI (match_operand:DI 0 "register_operand" "r") + (match_operand 1 "aq8_operand"))) + (vec_select: + (match_operand:LSX 2 "register_operand" "f") + (parallel [(match_operand 3 "const__operand" "")])))] + + "ISA_HAS_LSX" +{ + return "vstelm.\t%w2,%0,%1,%3"; +} + [(set_attr "type" "simd_store") + (set_attr "mode" "") + (set_attr "length" "4")]) + +;; Offset is "0" +(define_insn "lsx_vstelm__insn_0" + [(set (mem: (match_operand:DI 0 "register_operand" "r")) + (vec_select: + (match_operand:LSX 1 "register_operand" "f") + (parallel [(match_operand:SI 2 "const__operand")])))] + "ISA_HAS_LSX" +{ + return "vstelm.\t%w1,%0,0,%2"; +} + [(set_attr "type" "simd_store") + (set_attr "mode" "") + (set_attr "length" "4")]) + +(define_expand "lsx_vld" + [(match_operand:V16QI 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq12b_operand")] + "ISA_HAS_LSX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move (operands[0], gen_rtx_MEM (V16QImode, addr)); + DONE; +}) + +(define_expand "lsx_vst" + [(match_operand:V16QI 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq12b_operand")] + "ISA_HAS_LSX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move 
(gen_rtx_MEM (V16QImode, addr), operands[0]); + DONE; +}) + +(define_insn "lsx_vssrln__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRLN))] + "ISA_HAS_LSX" + "vssrln..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + + +(define_insn "lsx_vssrlrn__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILSX_DWH 1 "register_operand" "f") + (match_operand:ILSX_DWH 2 "register_operand" "f")] + UNSPEC_LSX_VSSRLRN))] + "ISA_HAS_LSX" + "vssrlrn..\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "vorn3" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (ior:ILSX (not:ILSX (match_operand:ILSX 2 "register_operand" "f")) + (match_operand:ILSX 1 "register_operand" "f")))] + "ISA_HAS_LSX" + "vorn.v\t%w0,%w1,%w2" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "lsx_vldi" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand 1 "const_imm13_operand")] + UNSPEC_LSX_VLDI))] + "ISA_HAS_LSX" +{ + HOST_WIDE_INT val = INTVAL (operands[1]); + if (val < 0) + { + HOST_WIDE_INT modeVal = (val & 0xf00) >> 8; + if (modeVal < 13) + return "vldi\t%w0,%1"; + else + sorry ("imm13 only supports 0000 ~ 1100 in bits 9 ~ 12 when bit 13 is 1"); + return "#"; + } + else + return "vldi\t%w0,%1"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vshuf_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f") + (match_operand:V16QI 2 "register_operand" "f") + (match_operand:V16QI 3 "register_operand" "f")] + UNSPEC_LSX_VSHUF_B))] + "ISA_HAS_LSX" + "vshuf.b\t%w0,%w1,%w2,%w3" + [(set_attr "type" "simd_shf") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vldx" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (unspec:V16QI [(match_operand:DI 1 "register_operand" "r") + (match_operand:DI 2 "reg_or_0_operand" "rJ")] + UNSPEC_LSX_VLDX))] + "ISA_HAS_LSX" +{ + return "vldx\t%w0,%1,%z2"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vstx" + [(set (mem:V16QI (plus:DI (match_operand:DI 1 "register_operand" "r") + (match_operand:DI 2 "reg_or_0_operand" "rJ"))) + (unspec: V16QI [(match_operand:V16QI 0 "register_operand" "f")] + UNSPEC_LSX_VSTX))] + + "ISA_HAS_LSX" +{ + return "vstx\t%w0,%1,%z2"; +} + [(set_attr "type" "simd_store") + (set_attr "mode" "DI")]) + +(define_insn "lsx_vextl_qu_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")] + UNSPEC_LSX_VEXTL_QU_DU))] + "ISA_HAS_LSX" + "vextl.qu.du\t%w0,%w1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vseteqz_v" + [(set (match_operand:FCC 0 "register_operand" "=z") + (eq:FCC + (unspec:SI [(match_operand:V16QI 1 "register_operand" "f")] + UNSPEC_LSX_VSETEQZ_V) + (match_operand:SI 2 "const_0_operand")))] + "ISA_HAS_LSX" +{ + return "vseteqz.v\t%0,%1"; +} + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "FCC")]) + +;; Vector reduction operation +(define_expand "reduc_plus_scal_v2di" + [(match_operand:DI 0 "register_operand") + (match_operand:V2DI 1 "register_operand")] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (V2DImode); + emit_insn (gen_lsx_vhaddw_q_d (tmp, operands[1], operands[1])); + emit_insn (gen_vec_extractv2didi (operands[0], tmp,
const0_rtx)); + DONE; +}) + +(define_expand "reduc_plus_scal_v4si" + [(match_operand:SI 0 "register_operand") + (match_operand:V4SI 1 "register_operand")] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (V2DImode); + rtx tmp1 = gen_reg_rtx (V2DImode); + emit_insn (gen_lsx_vhaddw_d_w (tmp, operands[1], operands[1])); + emit_insn (gen_lsx_vhaddw_q_d (tmp1, tmp, tmp)); + emit_insn (gen_vec_extractv4sisi (operands[0], gen_lowpart (V4SImode,tmp1), + const0_rtx)); + DONE; +}) + +(define_expand "reduc_plus_scal_" + [(match_operand: 0 "register_operand") + (match_operand:FLSX 1 "register_operand")] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_add3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc__scal_" + [(any_bitwise: + (match_operand: 0 "register_operand") + (match_operand:ILSX 1 "register_operand"))] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_smax_scal_" + [(match_operand: 0 "register_operand") + (match_operand:LSX 1 "register_operand")] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_smax3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_smin_scal_" + [(match_operand: 0 "register_operand") + (match_operand:LSX 1 "register_operand")] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_smin3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_umax_scal_" + [(match_operand: 0 "register_operand") + (match_operand:ILSX 1 "register_operand")] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_umax3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_umin_scal_" + [(match_operand: 0 "register_operand") + (match_operand:ILSX 1 "register_operand")] + "ISA_HAS_LSX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_umin3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_insn "lsx_vwev_d_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (addsubmul:V2DI + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2)]))) + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2)])))))] + "ISA_HAS_LSX" + "vwev.d.w\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vwev_w_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (addsubmul:V4SI + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))))] + "ISA_HAS_LSX" + "vwev.w.h\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vwev_h_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (addsubmul:V8HI + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 1 
"register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)])))))] + "ISA_HAS_LSX" + "vwev.h.b\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vwod_d_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (addsubmul:V2DI + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3)]))) + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3)])))))] + "ISA_HAS_LSX" + "vwod.d.w\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vwod_w_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (addsubmul:V4SI + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)])))))] + "ISA_HAS_LSX" + "vwod.w.h\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vwod_h_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (addsubmul:V8HI + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)])))))] + "ISA_HAS_LSX" + "vwod.h.b\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vwev_d_wu_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (addmul:V2DI + (zero_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2)]))) + (sign_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2)])))))] + "ISA_HAS_LSX" + "vwev.d.wu.w\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vwev_w_hu_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (addmul:V4SI + (zero_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (sign_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))))] + "ISA_HAS_LSX" + "vwev.w.hu.h\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vwev_h_bu_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (addmul:V8HI + (zero_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + 
(sign_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)])))))] + "ISA_HAS_LSX" + "vwev.h.bu.b\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vwod_d_wu_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (addmul:V2DI + (zero_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3)]))) + (sign_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3)])))))] + "ISA_HAS_LSX" + "vwod.d.wu.w\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vwod_w_hu_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (addmul:V4SI + (zero_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (sign_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)])))))] + "ISA_HAS_LSX" + "vwod.w.hu.h\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vwod_h_bu_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (addmul:V8HI + (zero_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (sign_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)])))))] + "ISA_HAS_LSX" + "vwod.h.bu.b\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vaddwev_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VADDWEV))] + "ISA_HAS_LSX" + "vaddwev.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vaddwev_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VADDWEV2))] + "ISA_HAS_LSX" + "vaddwev.q.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vaddwod_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VADDWOD))] + "ISA_HAS_LSX" + "vaddwod.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vaddwod_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VADDWOD2))] + "ISA_HAS_LSX" + "vaddwod.q.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vsubwev_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + 
(match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VSUBWEV))] + "ISA_HAS_LSX" + "vsubwev.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vsubwev_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VSUBWEV2))] + "ISA_HAS_LSX" + "vsubwev.q.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vsubwod_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VSUBWOD))] + "ISA_HAS_LSX" + "vsubwod.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vsubwod_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VSUBWOD2))] + "ISA_HAS_LSX" + "vsubwod.q.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vaddwev_q_du_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VADDWEV3))] + "ISA_HAS_LSX" + "vaddwev.q.du.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vaddwod_q_du_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VADDWOD3))] + "ISA_HAS_LSX" + "vaddwod.q.du.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmulwev_q_du_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VMULWEV3))] + "ISA_HAS_LSX" + "vmulwev.q.du.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmulwod_q_du_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VMULWOD3))] + "ISA_HAS_LSX" + "vmulwod.q.du.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmulwev_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VMULWEV))] + "ISA_HAS_LSX" + "vmulwev.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmulwev_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VMULWEV2))] + "ISA_HAS_LSX" + "vmulwev.q.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmulwod_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VMULWOD))] + "ISA_HAS_LSX" + "vmulwod.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + 
(set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmulwod_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VMULWOD2))] + "ISA_HAS_LSX" + "vmulwod.q.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vhaddw_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VHADDW_Q_D))] + "ISA_HAS_LSX" + "vhaddw.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vhaddw_qu_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VHADDW_QU_DU))] + "ISA_HAS_LSX" + "vhaddw.qu.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vhsubw_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VHSUBW_Q_D))] + "ISA_HAS_LSX" + "vhsubw.q.d\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vhsubw_qu_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VHSUBW_QU_DU))] + "ISA_HAS_LSX" + "vhsubw.qu.du\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwev_d_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (plus:V2DI + (match_operand:V2DI 1 "register_operand" "0") + (mult:V2DI + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2)]))) + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2)]))))))] + "ISA_HAS_LSX" + "vmaddwev.d.w\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwev_w_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (plus:V4SI + (match_operand:V4SI 1 "register_operand" "0") + (mult:V4SI + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))))))] + "ISA_HAS_LSX" + "vmaddwev.w.h\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vmaddwev_h_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (plus:V8HI + (match_operand:V8HI 1 "register_operand" "0") + (mult:V8HI + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))))))] + "ISA_HAS_LSX" + "vmaddwev.h.b\t%w0,%w2,%w3" + [(set_attr "type" 
"simd_fmadd") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vmaddwod_d_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (plus:V2DI + (match_operand:V2DI 1 "register_operand" "0") + (mult:V2DI + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3)]))) + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3)]))))))] + "ISA_HAS_LSX" + "vmaddwod.d.w\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwod_w_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (plus:V4SI + (match_operand:V4SI 1 "register_operand" "0") + (mult:V4SI + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))))))] + "ISA_HAS_LSX" + "vmaddwod.w.h\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vmaddwod_h_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (plus:V8HI + (match_operand:V8HI 1 "register_operand" "0") + (mult:V8HI + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))))))] + "ISA_HAS_LSX" + "vmaddwod.h.b\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vmaddwev_d_wu_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (plus:V2DI + (match_operand:V2DI 1 "register_operand" "0") + (mult:V2DI + (zero_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2)]))) + (sign_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2)]))))))] + "ISA_HAS_LSX" + "vmaddwev.d.wu.w\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwev_w_hu_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (plus:V4SI + (match_operand:V4SI 1 "register_operand" "0") + (mult:V4SI + (zero_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (sign_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))))))] + "ISA_HAS_LSX" + "vmaddwev.w.hu.h\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vmaddwev_h_bu_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (plus:V8HI + (match_operand:V8HI 1 "register_operand" "0") + (mult:V8HI + (zero_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + (sign_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 3 "register_operand" "f") + 
(parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))))))] + "ISA_HAS_LSX" + "vmaddwev.h.bu.b\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vmaddwod_d_wu_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (plus:V2DI + (match_operand:V2DI 1 "register_operand" "0") + (mult:V2DI + (zero_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3)]))) + (sign_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3)]))))))] + "ISA_HAS_LSX" + "vmaddwod.d.wu.w\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwod_w_hu_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (plus:V4SI + (match_operand:V4SI 1 "register_operand" "0") + (mult:V4SI + (zero_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (sign_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))))))] + "ISA_HAS_LSX" + "vmaddwod.w.hu.h\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vmaddwod_h_bu_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (plus:V8HI + (match_operand:V8HI 1 "register_operand" "0") + (mult:V8HI + (zero_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (sign_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))))))] + "ISA_HAS_LSX" + "vmaddwod.h.bu.b\t%w0,%w2,%w3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vmaddwev_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f") + (match_operand:V2DI 3 "register_operand" "f")] + UNSPEC_LSX_VMADDWEV))] + "ISA_HAS_LSX" + "vmaddwev.q.d\t%w0,%w2,%w3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwod_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f") + (match_operand:V2DI 3 "register_operand" "f")] + UNSPEC_LSX_VMADDWOD))] + "ISA_HAS_LSX" + "vmaddwod.q.d\t%w0,%w2,%w3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwev_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f") + (match_operand:V2DI 3 "register_operand" "f")] + UNSPEC_LSX_VMADDWEV2))] + "ISA_HAS_LSX" + "vmaddwev.q.du\t%w0,%w2,%w3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwod_q_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f") + (match_operand:V2DI 3 "register_operand" 
"f")] + UNSPEC_LSX_VMADDWOD2))] + "ISA_HAS_LSX" + "vmaddwod.q.du\t%w0,%w2,%w3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwev_q_du_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f") + (match_operand:V2DI 3 "register_operand" "f")] + UNSPEC_LSX_VMADDWEV3))] + "ISA_HAS_LSX" + "vmaddwev.q.du.d\t%w0,%w2,%w3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmaddwod_q_du_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f") + (match_operand:V2DI 3 "register_operand" "f")] + UNSPEC_LSX_VMADDWOD3))] + "ISA_HAS_LSX" + "vmaddwod.q.du.d\t%w0,%w2,%w3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vrotr_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f") + (match_operand:ILSX 2 "register_operand" "f")] + UNSPEC_LSX_VROTR))] + "ISA_HAS_LSX" + "vrotr.\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lsx_vadd_q" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VADD_Q))] + "ISA_HAS_LSX" + "vadd.q\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vsub_q" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f") + (match_operand:V2DI 2 "register_operand" "f")] + UNSPEC_LSX_VSUB_Q))] + "ISA_HAS_LSX" + "vsub.q\t%w0,%w1,%w2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vmskgez_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f")] + UNSPEC_LSX_VMSKGEZ))] + "ISA_HAS_LSX" + "vmskgez.b\t%w0,%w1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vmsknz_b" + [(set (match_operand:V16QI 0 "register_operand" "=f") + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f")] + UNSPEC_LSX_VMSKNZ))] + "ISA_HAS_LSX" + "vmsknz.b\t%w0,%w1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "V16QI")]) + +(define_insn "lsx_vexth_h_b" + [(set (match_operand:V8HI 0 "register_operand" "=f") + (any_extend:V8HI + (vec_select:V8QI + (match_operand:V16QI 1 "register_operand" "f") + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)]))))] + "ISA_HAS_LSX" + "vexth.h.b\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V8HI")]) + +(define_insn "lsx_vexth_w_h" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (any_extend:V4SI + (vec_select:V4HI + (match_operand:V8HI 1 "register_operand" "f") + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))))] + "ISA_HAS_LSX" + "vexth.w.h\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4SI")]) + +(define_insn "lsx_vexth_d_w" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (any_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 1 "register_operand" "f") + (parallel [(const_int 2) (const_int 3)]))))] + "ISA_HAS_LSX" + "vexth.d.w\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V2DI")]) + 
+(define_insn "lsx_vexth_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")] + UNSPEC_LSX_VEXTH_Q_D))] + "ISA_HAS_LSX" + "vexth.q.d\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vexth_qu_du" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")] + UNSPEC_LSX_VEXTH_QU_DU))] + "ISA_HAS_LSX" + "vexth.qu.du\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vrotri_" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (rotatert:ILSX (match_operand:ILSX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")))] + "ISA_HAS_LSX" + "vrotri.\t%w0,%w1,%2" + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lsx_vextl_q_d" + [(set (match_operand:V2DI 0 "register_operand" "=f") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")] + UNSPEC_LSX_VEXTL_Q_D))] + "ISA_HAS_LSX" + "vextl.q.d\t%w0,%w1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V2DI")]) + +(define_insn "lsx_vsrlni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSRLNI))] + "ISA_HAS_LSX" + "vsrlni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrlrni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSRLRNI))] + "ISA_HAS_LSX" + "vsrlrni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrlni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRLNI))] + "ISA_HAS_LSX" + "vssrlni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrlni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRLNI2))] + "ISA_HAS_LSX" + "vssrlni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrlrni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRLRNI))] + "ISA_HAS_LSX" + "vssrlrni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrlrni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRLRNI2))] + "ISA_HAS_LSX" + "vssrlrni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrani__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 
"const_uimm8_operand" "")] + UNSPEC_LSX_VSRANI))] + "ISA_HAS_LSX" + "vsrani..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vsrarni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSRARNI))] + "ISA_HAS_LSX" + "vsrarni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrani__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRANI))] + "ISA_HAS_LSX" + "vssrani..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrani__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRANI2))] + "ISA_HAS_LSX" + "vssrani..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrarni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRARNI))] + "ISA_HAS_LSX" + "vssrarni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vssrarni__" + [(set (match_operand:ILSX 0 "register_operand" "=f") + (unspec:ILSX [(match_operand:ILSX 1 "register_operand" "0") + (match_operand:ILSX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VSSRARNI2))] + "ISA_HAS_LSX" + "vssrarni..\t%w0,%w2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lsx_vpermi_w" + [(set (match_operand:V4SI 0 "register_operand" "=f") + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0") + (match_operand:V4SI 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LSX_VPERMI))] + "ISA_HAS_LSX" + "vpermi.w\t%w0,%w2,%3" + [(set_attr "type" "simd_bit") + (set_attr "mode" "V4SI")]) diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index 510973aa339..f430629825e 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -87,10 +87,42 @@ (define_predicate "const_immalsl_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 1, 4)"))) +(define_predicate "const_lsx_branch_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), -1024, 1023)"))) + +(define_predicate "const_uimm3_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 7)"))) + +(define_predicate "const_8_to_11_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 8, 11)"))) + +(define_predicate "const_12_to_15_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 12, 15)"))) + +(define_predicate "const_uimm4_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 15)"))) + (define_predicate "const_uimm5_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 0, 31)"))) +(define_predicate "const_uimm6_operand" + (and (match_code "const_int") + (match_test 
"UIMM6_OPERAND (INTVAL (op))"))) + +(define_predicate "const_uimm7_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 127)"))) + +(define_predicate "const_uimm8_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 255)"))) + (define_predicate "const_uimm14_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 0, 16383)"))) @@ -99,10 +131,74 @@ (define_predicate "const_uimm15_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 0, 32767)"))) +(define_predicate "const_imm5_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), -16, 15)"))) + +(define_predicate "const_imm10_operand" + (and (match_code "const_int") + (match_test "IMM10_OPERAND (INTVAL (op))"))) + (define_predicate "const_imm12_operand" (and (match_code "const_int") (match_test "IMM12_OPERAND (INTVAL (op))"))) +(define_predicate "const_imm13_operand" + (and (match_code "const_int") + (match_test "IMM13_OPERAND (INTVAL (op))"))) + +(define_predicate "reg_imm10_operand" + (ior (match_operand 0 "const_imm10_operand") + (match_operand 0 "register_operand"))) + +(define_predicate "aq8b_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 8, 0)"))) + +(define_predicate "aq8h_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 8, 1)"))) + +(define_predicate "aq8w_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 8, 2)"))) + +(define_predicate "aq8d_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 8, 3)"))) + +(define_predicate "aq10b_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 10, 0)"))) + +(define_predicate "aq10h_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 10, 1)"))) + +(define_predicate "aq10w_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 10, 2)"))) + +(define_predicate "aq10d_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 10, 3)"))) + +(define_predicate "aq12b_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 12, 0)"))) + +(define_predicate "aq12h_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 11, 1)"))) + +(define_predicate "aq12w_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 10, 2)"))) + +(define_predicate "aq12d_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 9, 3)"))) + (define_predicate "sle_operand" (and (match_code "const_int") (match_test "IMM12_OPERAND (INTVAL (op) + 1)"))) @@ -112,29 +208,206 @@ (define_predicate "sleu_operand" (match_test "INTVAL (op) + 1 != 0"))) (define_predicate "const_0_operand" - (and (match_code "const_int,const_double,const_vector") + (and (match_code "const_int,const_wide_int,const_double,const_vector") (match_test "op == CONST0_RTX (GET_MODE (op))"))) +(define_predicate "const_m1_operand" + (and (match_code "const_int,const_wide_int,const_double,const_vector") + (match_test "op == CONSTM1_RTX (GET_MODE (op))"))) + +(define_predicate "reg_or_m1_operand" + (ior (match_operand 0 "const_m1_operand") + (match_operand 0 "register_operand"))) + (define_predicate 
"reg_or_0_operand" (ior (match_operand 0 "const_0_operand") (match_operand 0 "register_operand"))) (define_predicate "const_1_operand" - (and (match_code "const_int,const_double,const_vector") + (and (match_code "const_int,const_wide_int,const_double,const_vector") (match_test "op == CONST1_RTX (GET_MODE (op))"))) (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) +;; These are used in vec_merge, hence accept bitmask as const_int. +(define_predicate "const_exp_2_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (exact_log2 (INTVAL (op)), 0, 1)"))) + +(define_predicate "const_exp_4_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (exact_log2 (INTVAL (op)), 0, 3)"))) + +(define_predicate "const_exp_8_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (exact_log2 (INTVAL (op)), 0, 7)"))) + +(define_predicate "const_exp_16_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (exact_log2 (INTVAL (op)), 0, 15)"))) + +(define_predicate "const_exp_32_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (exact_log2 (INTVAL (op)), 0, 31)"))) + +;; This is used for indexing into vectors, and hence only accepts const_int. +(define_predicate "const_0_or_1_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 1)"))) + +(define_predicate "const_0_to_3_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 3)"))) + +(define_predicate "const_0_to_7_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 7)"))) + +(define_predicate "const_2_or_3_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 2, 3)"))) + +(define_predicate "const_4_to_7_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 4, 7)"))) + +(define_predicate "const_8_to_15_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 7)"))) + +(define_predicate "const_16_to_31_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 7)"))) + +(define_predicate "qi_mask_operand" + (and (match_code "const_int") + (match_test "UINTVAL (op) == 0xff"))) + +(define_predicate "hi_mask_operand" + (and (match_code "const_int") + (match_test "UINTVAL (op) == 0xffff"))) + (define_predicate "lu52i_mask_operand" (and (match_code "const_int") (match_test "UINTVAL (op) == 0xfffffffffffff"))) +(define_predicate "si_mask_operand" + (and (match_code "const_int") + (match_test "UINTVAL (op) == 0xffffffff"))) + (define_predicate "low_bitmask_operand" (and (match_code "const_int") (match_test "low_bitmask_len (mode, INTVAL (op)) > 12"))) +(define_predicate "d_operand" + (and (match_code "reg") + (match_test "GP_REG_P (REGNO (op))"))) + +(define_predicate "db4_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op) + 1, 4, 0)"))) + +(define_predicate "db7_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op) + 1, 7, 0)"))) + +(define_predicate "db8_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op) + 1, 8, 0)"))) + +(define_predicate "ib3_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op) - 1, 3, 0)"))) + +(define_predicate "sb4_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 4, 0)"))) + +(define_predicate "sb5_operand" + 
(and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 5, 0)"))) + +(define_predicate "sb8_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 8, 0)"))) + +(define_predicate "sd8_operand" + (and (match_code "const_int") + (match_test "loongarch_signed_immediate_p (INTVAL (op), 8, 3)"))) + +(define_predicate "ub4_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op), 4, 0)"))) + +(define_predicate "ub8_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op), 8, 0)"))) + +(define_predicate "uh4_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op), 4, 1)"))) + +(define_predicate "uw4_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op), 4, 2)"))) + +(define_predicate "uw5_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op), 5, 2)"))) + +(define_predicate "uw6_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op), 6, 2)"))) + +(define_predicate "uw8_operand" + (and (match_code "const_int") + (match_test "loongarch_unsigned_immediate_p (INTVAL (op), 8, 2)"))) + +(define_predicate "addiur2_operand" + (and (match_code "const_int") + (ior (match_test "INTVAL (op) == -1") + (match_test "INTVAL (op) == 1") + (match_test "INTVAL (op) == 4") + (match_test "INTVAL (op) == 8") + (match_test "INTVAL (op) == 12") + (match_test "INTVAL (op) == 16") + (match_test "INTVAL (op) == 20") + (match_test "INTVAL (op) == 24")))) + +(define_predicate "addiusp_operand" + (and (match_code "const_int") + (ior (match_test "(IN_RANGE (INTVAL (op), 2, 257))") + (match_test "(IN_RANGE (INTVAL (op), -258, -3))")))) + +(define_predicate "andi16_operand" + (and (match_code "const_int") + (ior (match_test "IN_RANGE (INTVAL (op), 1, 4)") + (match_test "IN_RANGE (INTVAL (op), 7, 8)") + (match_test "IN_RANGE (INTVAL (op), 15, 16)") + (match_test "IN_RANGE (INTVAL (op), 31, 32)") + (match_test "IN_RANGE (INTVAL (op), 63, 64)") + (match_test "INTVAL (op) == 255") + (match_test "INTVAL (op) == 32768") + (match_test "INTVAL (op) == 65535")))) + +(define_predicate "movep_src_register" + (and (match_code "reg") + (ior (match_test ("IN_RANGE (REGNO (op), 2, 3)")) + (match_test ("IN_RANGE (REGNO (op), 16, 20)"))))) + +(define_predicate "movep_src_operand" + (ior (match_operand 0 "const_0_operand") + (match_operand 0 "movep_src_register"))) + +(define_predicate "fcc_reload_operand" + (and (match_code "reg,subreg") + (match_test "FCC_REG_P (true_regnum (op))"))) + +(define_predicate "muldiv_target_operand" + (match_operand 0 "register_operand")) + (define_predicate "const_call_insn_operand" (match_code "const,symbol_ref,label_ref") { @@ -303,3 +576,59 @@ (define_predicate "small_data_pattern" (define_predicate "non_volatile_mem_operand" (and (match_operand 0 "memory_operand") (not (match_test "MEM_VOLATILE_P (op)")))) + +(define_predicate "const_vector_same_val_operand" + (match_code "const_vector") +{ + return loongarch_const_vector_same_val_p (op, mode); +}) + +(define_predicate "const_vector_same_simm5_operand" + (match_code "const_vector") +{ + return loongarch_const_vector_same_int_p (op, mode, -16, 15); +}) + +(define_predicate "const_vector_same_uimm5_operand" + (match_code "const_vector") +{ + return loongarch_const_vector_same_int_p (op, mode, 0, 31); +}) + +(define_predicate 
"const_vector_same_ximm5_operand" + (match_code "const_vector") +{ + return loongarch_const_vector_same_int_p (op, mode, -31, 31); +}) + +(define_predicate "const_vector_same_uimm6_operand" + (match_code "const_vector") +{ + return loongarch_const_vector_same_int_p (op, mode, 0, 63); +}) + +(define_predicate "par_const_vector_shf_set_operand" + (match_code "parallel") +{ + return loongarch_const_vector_shuffle_set_p (op, mode); +}) + +(define_predicate "reg_or_vector_same_val_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "const_vector_same_val_operand"))) + +(define_predicate "reg_or_vector_same_simm5_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "const_vector_same_simm5_operand"))) + +(define_predicate "reg_or_vector_same_uimm5_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "const_vector_same_uimm5_operand"))) + +(define_predicate "reg_or_vector_same_ximm5_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "const_vector_same_ximm5_operand"))) + +(define_predicate "reg_or_vector_same_uimm6_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "const_vector_same_uimm6_operand"))) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 24453693d89..daa318ee3da 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -2892,6 +2892,17 @@ as @code{st.w} and @code{ld.w}. A signed 12-bit constant (for arithmetic instructions). @item K An unsigned 12-bit constant (for logic instructions). +@item M +A constant that cannot be loaded using @code{lui}, @code{addiu} +or @code{ori}. +@item N +A constant in the range -65535 to -1 (inclusive). +@item O +A signed 15-bit constant. +@item P +A constant in the range 1 to 65535 (inclusive). +@item R +An address that can be used in a non-macro load or store. @item ZB An address that is held in a general-purpose register. The offset is zero. diff --git a/gcc/testsuite/g++.dg/torture/vshuf-v16qi.C b/gcc/testsuite/g++.dg/torture/vshuf-v16qi.C index 56801177583..350b450d05f 100644 --- a/gcc/testsuite/g++.dg/torture/vshuf-v16qi.C +++ b/gcc/testsuite/g++.dg/torture/vshuf-v16qi.C @@ -1,5 +1,6 @@ // { dg-options "-std=c++11" } // { dg-do run } +// { dg-skip-if "LoongArch vshuf/xvshuf insn result is undefined when 6 or 7 bit of vector's element is set." { loongarch*-*-* } } typedef unsigned char V __attribute__((vector_size(16))); typedef V VI; diff --git a/gcc/testsuite/g++.dg/torture/vshuf-v2df.C b/gcc/testsuite/g++.dg/torture/vshuf-v2df.C index ba45078ea13..9a7f7c4188a 100644 --- a/gcc/testsuite/g++.dg/torture/vshuf-v2df.C +++ b/gcc/testsuite/g++.dg/torture/vshuf-v2df.C @@ -1,5 +1,7 @@ // { dg-options "-std=c++11" } // // { dg-do run } +// { dg-skip-if "LoongArch vshuf/xvshuf insn result is undefined when 6 or 7 bit of vector's element is set." { loongarch*-*-* } } + #if __SIZEOF_DOUBLE__ == 8 && __SIZEOF_LONG_LONG__ == 8 typedef double V __attribute__((vector_size(16))); typedef unsigned long long VI __attribute__((vector_size(16))); diff --git a/gcc/testsuite/g++.dg/torture/vshuf-v2di.C b/gcc/testsuite/g++.dg/torture/vshuf-v2di.C index a4272842a36..26ab98c09cc 100644 --- a/gcc/testsuite/g++.dg/torture/vshuf-v2di.C +++ b/gcc/testsuite/g++.dg/torture/vshuf-v2di.C @@ -1,5 +1,6 @@ // { dg-options "-std=c++11" } // // { dg-do run } +// { dg-skip-if "LoongArch vshuf/xvshuf insn result is undefined when 6 or 7 bit of vector's element is set." 
{ loongarch*-*-* } } #if __SIZEOF_LONG_LONG__ == 8 typedef unsigned long long V __attribute__((vector_size(16))); diff --git a/gcc/testsuite/g++.dg/torture/vshuf-v4sf.C b/gcc/testsuite/g++.dg/torture/vshuf-v4sf.C index c7d58434409..1617eba3827 100644 --- a/gcc/testsuite/g++.dg/torture/vshuf-v4sf.C +++ b/gcc/testsuite/g++.dg/torture/vshuf-v4sf.C @@ -1,6 +1,6 @@ // { dg-options "-std=c++11" } // { dg-do run } - +// { dg-skip-if "LoongArch vshuf/xvshuf insn result is undefined when 6 or 7 bit of vector's element is set." { loongarch*-*-* } } #if __SIZEOF_FLOAT__ == 4 typedef float V __attribute__((vector_size(16))); diff --git a/gcc/testsuite/g++.dg/torture/vshuf-v8hi.C b/gcc/testsuite/g++.dg/torture/vshuf-v8hi.C index 33b20c68a87..61a40f733f6 100644 --- a/gcc/testsuite/g++.dg/torture/vshuf-v8hi.C +++ b/gcc/testsuite/g++.dg/torture/vshuf-v8hi.C @@ -1,5 +1,6 @@ // { dg-options "-std=c++11" } // { dg-do run } +// { dg-skip-if "LoongArch vshuf/xvshuf insn result is undefined when 6 or 7 bit of vector's element is set." { loongarch*-*-* } } typedef unsigned short V __attribute__((vector_size(16))); typedef V VI;
From patchwork Thu Aug 24 03:13:13 2023
X-Patchwork-Submitter: Chenghui Pan
X-Patchwork-Id: 1825111
From: Chenghui Pan
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn
Subject: [PATCH v5 3/6] LoongArch: Add Loongson SX directive builtin function support.
Date: Thu, 24 Aug 2023 11:13:13 +0800
Message-Id: <20230824031316.16599-4-panchenghui@loongson.cn>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230824031316.16599-1-panchenghui@loongson.cn>
References: <20230824031316.16599-1-panchenghui@loongson.cn>
From: Lulu Cheng
gcc/ChangeLog: * config.gcc: Export the header file lsxintrin.h. * config/loongarch/loongarch-builtins.cc (LARCH_FTYPE_NAME4): Add builtin function support. (enum loongarch_builtin_type): Ditto. (AVAIL_ALL): Ditto. (LARCH_BUILTIN): Ditto. (LSX_BUILTIN): Ditto. (LSX_BUILTIN_TEST_BRANCH): Ditto. (LSX_NO_TARGET_BUILTIN): Ditto. (CODE_FOR_lsx_vsadd_b): Ditto. (CODE_FOR_lsx_vsadd_h): Ditto. (CODE_FOR_lsx_vsadd_w): Ditto. (CODE_FOR_lsx_vsadd_d): Ditto. (CODE_FOR_lsx_vsadd_bu): Ditto. (CODE_FOR_lsx_vsadd_hu): Ditto. (CODE_FOR_lsx_vsadd_wu): Ditto. (CODE_FOR_lsx_vsadd_du): Ditto. (CODE_FOR_lsx_vadd_b): Ditto. (CODE_FOR_lsx_vadd_h): Ditto. (CODE_FOR_lsx_vadd_w): Ditto. (CODE_FOR_lsx_vadd_d): Ditto. (CODE_FOR_lsx_vaddi_bu): Ditto. (CODE_FOR_lsx_vaddi_hu): Ditto. (CODE_FOR_lsx_vaddi_wu): Ditto. (CODE_FOR_lsx_vaddi_du): Ditto. (CODE_FOR_lsx_vand_v): Ditto. (CODE_FOR_lsx_vandi_b): Ditto. (CODE_FOR_lsx_bnz_v): Ditto. (CODE_FOR_lsx_bz_v): Ditto. (CODE_FOR_lsx_vbitsel_v): Ditto. (CODE_FOR_lsx_vseqi_b): Ditto. (CODE_FOR_lsx_vseqi_h): Ditto. (CODE_FOR_lsx_vseqi_w): Ditto. (CODE_FOR_lsx_vseqi_d): Ditto. (CODE_FOR_lsx_vslti_b): Ditto. 
(CODE_FOR_lsx_vslti_h): Ditto. (CODE_FOR_lsx_vslti_w): Ditto. (CODE_FOR_lsx_vslti_d): Ditto. (CODE_FOR_lsx_vslti_bu): Ditto. (CODE_FOR_lsx_vslti_hu): Ditto. (CODE_FOR_lsx_vslti_wu): Ditto. (CODE_FOR_lsx_vslti_du): Ditto. (CODE_FOR_lsx_vslei_b): Ditto. (CODE_FOR_lsx_vslei_h): Ditto. (CODE_FOR_lsx_vslei_w): Ditto. (CODE_FOR_lsx_vslei_d): Ditto. (CODE_FOR_lsx_vslei_bu): Ditto. (CODE_FOR_lsx_vslei_hu): Ditto. (CODE_FOR_lsx_vslei_wu): Ditto. (CODE_FOR_lsx_vslei_du): Ditto. (CODE_FOR_lsx_vdiv_b): Ditto. (CODE_FOR_lsx_vdiv_h): Ditto. (CODE_FOR_lsx_vdiv_w): Ditto. (CODE_FOR_lsx_vdiv_d): Ditto. (CODE_FOR_lsx_vdiv_bu): Ditto. (CODE_FOR_lsx_vdiv_hu): Ditto. (CODE_FOR_lsx_vdiv_wu): Ditto. (CODE_FOR_lsx_vdiv_du): Ditto. (CODE_FOR_lsx_vfadd_s): Ditto. (CODE_FOR_lsx_vfadd_d): Ditto. (CODE_FOR_lsx_vftintrz_w_s): Ditto. (CODE_FOR_lsx_vftintrz_l_d): Ditto. (CODE_FOR_lsx_vftintrz_wu_s): Ditto. (CODE_FOR_lsx_vftintrz_lu_d): Ditto. (CODE_FOR_lsx_vffint_s_w): Ditto. (CODE_FOR_lsx_vffint_d_l): Ditto. (CODE_FOR_lsx_vffint_s_wu): Ditto. (CODE_FOR_lsx_vffint_d_lu): Ditto. (CODE_FOR_lsx_vfsub_s): Ditto. (CODE_FOR_lsx_vfsub_d): Ditto. (CODE_FOR_lsx_vfmul_s): Ditto. (CODE_FOR_lsx_vfmul_d): Ditto. (CODE_FOR_lsx_vfdiv_s): Ditto. (CODE_FOR_lsx_vfdiv_d): Ditto. (CODE_FOR_lsx_vfmax_s): Ditto. (CODE_FOR_lsx_vfmax_d): Ditto. (CODE_FOR_lsx_vfmin_s): Ditto. (CODE_FOR_lsx_vfmin_d): Ditto. (CODE_FOR_lsx_vfsqrt_s): Ditto. (CODE_FOR_lsx_vfsqrt_d): Ditto. (CODE_FOR_lsx_vflogb_s): Ditto. (CODE_FOR_lsx_vflogb_d): Ditto. (CODE_FOR_lsx_vmax_b): Ditto. (CODE_FOR_lsx_vmax_h): Ditto. (CODE_FOR_lsx_vmax_w): Ditto. (CODE_FOR_lsx_vmax_d): Ditto. (CODE_FOR_lsx_vmaxi_b): Ditto. (CODE_FOR_lsx_vmaxi_h): Ditto. (CODE_FOR_lsx_vmaxi_w): Ditto. (CODE_FOR_lsx_vmaxi_d): Ditto. (CODE_FOR_lsx_vmax_bu): Ditto. (CODE_FOR_lsx_vmax_hu): Ditto. (CODE_FOR_lsx_vmax_wu): Ditto. (CODE_FOR_lsx_vmax_du): Ditto. (CODE_FOR_lsx_vmaxi_bu): Ditto. (CODE_FOR_lsx_vmaxi_hu): Ditto. (CODE_FOR_lsx_vmaxi_wu): Ditto. (CODE_FOR_lsx_vmaxi_du): Ditto. (CODE_FOR_lsx_vmin_b): Ditto. (CODE_FOR_lsx_vmin_h): Ditto. (CODE_FOR_lsx_vmin_w): Ditto. (CODE_FOR_lsx_vmin_d): Ditto. (CODE_FOR_lsx_vmini_b): Ditto. (CODE_FOR_lsx_vmini_h): Ditto. (CODE_FOR_lsx_vmini_w): Ditto. (CODE_FOR_lsx_vmini_d): Ditto. (CODE_FOR_lsx_vmin_bu): Ditto. (CODE_FOR_lsx_vmin_hu): Ditto. (CODE_FOR_lsx_vmin_wu): Ditto. (CODE_FOR_lsx_vmin_du): Ditto. (CODE_FOR_lsx_vmini_bu): Ditto. (CODE_FOR_lsx_vmini_hu): Ditto. (CODE_FOR_lsx_vmini_wu): Ditto. (CODE_FOR_lsx_vmini_du): Ditto. (CODE_FOR_lsx_vmod_b): Ditto. (CODE_FOR_lsx_vmod_h): Ditto. (CODE_FOR_lsx_vmod_w): Ditto. (CODE_FOR_lsx_vmod_d): Ditto. (CODE_FOR_lsx_vmod_bu): Ditto. (CODE_FOR_lsx_vmod_hu): Ditto. (CODE_FOR_lsx_vmod_wu): Ditto. (CODE_FOR_lsx_vmod_du): Ditto. (CODE_FOR_lsx_vmul_b): Ditto. (CODE_FOR_lsx_vmul_h): Ditto. (CODE_FOR_lsx_vmul_w): Ditto. (CODE_FOR_lsx_vmul_d): Ditto. (CODE_FOR_lsx_vclz_b): Ditto. (CODE_FOR_lsx_vclz_h): Ditto. (CODE_FOR_lsx_vclz_w): Ditto. (CODE_FOR_lsx_vclz_d): Ditto. (CODE_FOR_lsx_vnor_v): Ditto. (CODE_FOR_lsx_vor_v): Ditto. (CODE_FOR_lsx_vori_b): Ditto. (CODE_FOR_lsx_vnori_b): Ditto. (CODE_FOR_lsx_vpcnt_b): Ditto. (CODE_FOR_lsx_vpcnt_h): Ditto. (CODE_FOR_lsx_vpcnt_w): Ditto. (CODE_FOR_lsx_vpcnt_d): Ditto. (CODE_FOR_lsx_vxor_v): Ditto. (CODE_FOR_lsx_vxori_b): Ditto. (CODE_FOR_lsx_vsll_b): Ditto. (CODE_FOR_lsx_vsll_h): Ditto. (CODE_FOR_lsx_vsll_w): Ditto. (CODE_FOR_lsx_vsll_d): Ditto. (CODE_FOR_lsx_vslli_b): Ditto. (CODE_FOR_lsx_vslli_h): Ditto. (CODE_FOR_lsx_vslli_w): Ditto. (CODE_FOR_lsx_vslli_d): Ditto. 
(CODE_FOR_lsx_vsra_b): Ditto. (CODE_FOR_lsx_vsra_h): Ditto. (CODE_FOR_lsx_vsra_w): Ditto. (CODE_FOR_lsx_vsra_d): Ditto. (CODE_FOR_lsx_vsrai_b): Ditto. (CODE_FOR_lsx_vsrai_h): Ditto. (CODE_FOR_lsx_vsrai_w): Ditto. (CODE_FOR_lsx_vsrai_d): Ditto. (CODE_FOR_lsx_vsrl_b): Ditto. (CODE_FOR_lsx_vsrl_h): Ditto. (CODE_FOR_lsx_vsrl_w): Ditto. (CODE_FOR_lsx_vsrl_d): Ditto. (CODE_FOR_lsx_vsrli_b): Ditto. (CODE_FOR_lsx_vsrli_h): Ditto. (CODE_FOR_lsx_vsrli_w): Ditto. (CODE_FOR_lsx_vsrli_d): Ditto. (CODE_FOR_lsx_vsub_b): Ditto. (CODE_FOR_lsx_vsub_h): Ditto. (CODE_FOR_lsx_vsub_w): Ditto. (CODE_FOR_lsx_vsub_d): Ditto. (CODE_FOR_lsx_vsubi_bu): Ditto. (CODE_FOR_lsx_vsubi_hu): Ditto. (CODE_FOR_lsx_vsubi_wu): Ditto. (CODE_FOR_lsx_vsubi_du): Ditto. (CODE_FOR_lsx_vpackod_d): Ditto. (CODE_FOR_lsx_vpackev_d): Ditto. (CODE_FOR_lsx_vpickod_d): Ditto. (CODE_FOR_lsx_vpickev_d): Ditto. (CODE_FOR_lsx_vrepli_b): Ditto. (CODE_FOR_lsx_vrepli_h): Ditto. (CODE_FOR_lsx_vrepli_w): Ditto. (CODE_FOR_lsx_vrepli_d): Ditto. (CODE_FOR_lsx_vsat_b): Ditto. (CODE_FOR_lsx_vsat_h): Ditto. (CODE_FOR_lsx_vsat_w): Ditto. (CODE_FOR_lsx_vsat_d): Ditto. (CODE_FOR_lsx_vsat_bu): Ditto. (CODE_FOR_lsx_vsat_hu): Ditto. (CODE_FOR_lsx_vsat_wu): Ditto. (CODE_FOR_lsx_vsat_du): Ditto. (CODE_FOR_lsx_vavg_b): Ditto. (CODE_FOR_lsx_vavg_h): Ditto. (CODE_FOR_lsx_vavg_w): Ditto. (CODE_FOR_lsx_vavg_d): Ditto. (CODE_FOR_lsx_vavg_bu): Ditto. (CODE_FOR_lsx_vavg_hu): Ditto. (CODE_FOR_lsx_vavg_wu): Ditto. (CODE_FOR_lsx_vavg_du): Ditto. (CODE_FOR_lsx_vavgr_b): Ditto. (CODE_FOR_lsx_vavgr_h): Ditto. (CODE_FOR_lsx_vavgr_w): Ditto. (CODE_FOR_lsx_vavgr_d): Ditto. (CODE_FOR_lsx_vavgr_bu): Ditto. (CODE_FOR_lsx_vavgr_hu): Ditto. (CODE_FOR_lsx_vavgr_wu): Ditto. (CODE_FOR_lsx_vavgr_du): Ditto. (CODE_FOR_lsx_vssub_b): Ditto. (CODE_FOR_lsx_vssub_h): Ditto. (CODE_FOR_lsx_vssub_w): Ditto. (CODE_FOR_lsx_vssub_d): Ditto. (CODE_FOR_lsx_vssub_bu): Ditto. (CODE_FOR_lsx_vssub_hu): Ditto. (CODE_FOR_lsx_vssub_wu): Ditto. (CODE_FOR_lsx_vssub_du): Ditto. (CODE_FOR_lsx_vabsd_b): Ditto. (CODE_FOR_lsx_vabsd_h): Ditto. (CODE_FOR_lsx_vabsd_w): Ditto. (CODE_FOR_lsx_vabsd_d): Ditto. (CODE_FOR_lsx_vabsd_bu): Ditto. (CODE_FOR_lsx_vabsd_hu): Ditto. (CODE_FOR_lsx_vabsd_wu): Ditto. (CODE_FOR_lsx_vabsd_du): Ditto. (CODE_FOR_lsx_vftint_w_s): Ditto. (CODE_FOR_lsx_vftint_l_d): Ditto. (CODE_FOR_lsx_vftint_wu_s): Ditto. (CODE_FOR_lsx_vftint_lu_d): Ditto. (CODE_FOR_lsx_vandn_v): Ditto. (CODE_FOR_lsx_vorn_v): Ditto. (CODE_FOR_lsx_vneg_b): Ditto. (CODE_FOR_lsx_vneg_h): Ditto. (CODE_FOR_lsx_vneg_w): Ditto. (CODE_FOR_lsx_vneg_d): Ditto. (CODE_FOR_lsx_vshuf4i_d): Ditto. (CODE_FOR_lsx_vbsrl_v): Ditto. (CODE_FOR_lsx_vbsll_v): Ditto. (CODE_FOR_lsx_vfmadd_s): Ditto. (CODE_FOR_lsx_vfmadd_d): Ditto. (CODE_FOR_lsx_vfmsub_s): Ditto. (CODE_FOR_lsx_vfmsub_d): Ditto. (CODE_FOR_lsx_vfnmadd_s): Ditto. (CODE_FOR_lsx_vfnmadd_d): Ditto. (CODE_FOR_lsx_vfnmsub_s): Ditto. (CODE_FOR_lsx_vfnmsub_d): Ditto. (CODE_FOR_lsx_vmuh_b): Ditto. (CODE_FOR_lsx_vmuh_h): Ditto. (CODE_FOR_lsx_vmuh_w): Ditto. (CODE_FOR_lsx_vmuh_d): Ditto. (CODE_FOR_lsx_vmuh_bu): Ditto. (CODE_FOR_lsx_vmuh_hu): Ditto. (CODE_FOR_lsx_vmuh_wu): Ditto. (CODE_FOR_lsx_vmuh_du): Ditto. (CODE_FOR_lsx_vsllwil_h_b): Ditto. (CODE_FOR_lsx_vsllwil_w_h): Ditto. (CODE_FOR_lsx_vsllwil_d_w): Ditto. (CODE_FOR_lsx_vsllwil_hu_bu): Ditto. (CODE_FOR_lsx_vsllwil_wu_hu): Ditto. (CODE_FOR_lsx_vsllwil_du_wu): Ditto. (CODE_FOR_lsx_vssran_b_h): Ditto. (CODE_FOR_lsx_vssran_h_w): Ditto. (CODE_FOR_lsx_vssran_w_d): Ditto. (CODE_FOR_lsx_vssran_bu_h): Ditto. (CODE_FOR_lsx_vssran_hu_w): Ditto. 
(CODE_FOR_lsx_vssran_wu_d): Ditto. (CODE_FOR_lsx_vssrarn_b_h): Ditto. (CODE_FOR_lsx_vssrarn_h_w): Ditto. (CODE_FOR_lsx_vssrarn_w_d): Ditto. (CODE_FOR_lsx_vssrarn_bu_h): Ditto. (CODE_FOR_lsx_vssrarn_hu_w): Ditto. (CODE_FOR_lsx_vssrarn_wu_d): Ditto. (CODE_FOR_lsx_vssrln_bu_h): Ditto. (CODE_FOR_lsx_vssrln_hu_w): Ditto. (CODE_FOR_lsx_vssrln_wu_d): Ditto. (CODE_FOR_lsx_vssrlrn_bu_h): Ditto. (CODE_FOR_lsx_vssrlrn_hu_w): Ditto. (CODE_FOR_lsx_vssrlrn_wu_d): Ditto. (loongarch_builtin_vector_type): Ditto. (loongarch_build_cvpointer_type): Ditto. (LARCH_ATYPE_CVPOINTER): Ditto. (LARCH_ATYPE_BOOLEAN): Ditto. (LARCH_ATYPE_V2SF): Ditto. (LARCH_ATYPE_V2HI): Ditto. (LARCH_ATYPE_V2SI): Ditto. (LARCH_ATYPE_V4QI): Ditto. (LARCH_ATYPE_V4HI): Ditto. (LARCH_ATYPE_V8QI): Ditto. (LARCH_ATYPE_V2DI): Ditto. (LARCH_ATYPE_V4SI): Ditto. (LARCH_ATYPE_V8HI): Ditto. (LARCH_ATYPE_V16QI): Ditto. (LARCH_ATYPE_V2DF): Ditto. (LARCH_ATYPE_V4SF): Ditto. (LARCH_ATYPE_V4DI): Ditto. (LARCH_ATYPE_V8SI): Ditto. (LARCH_ATYPE_V16HI): Ditto. (LARCH_ATYPE_V32QI): Ditto. (LARCH_ATYPE_V4DF): Ditto. (LARCH_ATYPE_V8SF): Ditto. (LARCH_ATYPE_UV2DI): Ditto. (LARCH_ATYPE_UV4SI): Ditto. (LARCH_ATYPE_UV8HI): Ditto. (LARCH_ATYPE_UV16QI): Ditto. (LARCH_ATYPE_UV4DI): Ditto. (LARCH_ATYPE_UV8SI): Ditto. (LARCH_ATYPE_UV16HI): Ditto. (LARCH_ATYPE_UV32QI): Ditto. (LARCH_ATYPE_UV2SI): Ditto. (LARCH_ATYPE_UV4HI): Ditto. (LARCH_ATYPE_UV8QI): Ditto. (loongarch_builtin_vectorized_function): Ditto. (LARCH_GET_BUILTIN): Ditto. (loongarch_expand_builtin_insn): Ditto. (loongarch_expand_builtin_lsx_test_branch): Ditto. (loongarch_expand_builtin): Ditto. * config/loongarch/loongarch-ftypes.def (1): Ditto. (2): Ditto. (3): Ditto. (4): Ditto. * config/loongarch/lsxintrin.h: New file. --- gcc/config.gcc | 2 +- gcc/config/loongarch/loongarch-builtins.cc | 1498 +++++- gcc/config/loongarch/loongarch-ftypes.def | 397 +- gcc/config/loongarch/lsxintrin.h | 5181 ++++++++++++++++++++ 4 files changed, 7071 insertions(+), 7 deletions(-) create mode 100644 gcc/config/loongarch/lsxintrin.h diff --git a/gcc/config.gcc b/gcc/config.gcc index 415e0e1ebc5..d6b809cdb55 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -469,7 +469,7 @@ mips*-*-*) ;; loongarch*-*-*) cpu_type=loongarch - extra_headers="larchintrin.h" + extra_headers="larchintrin.h lsxintrin.h" extra_objs="loongarch-c.o loongarch-builtins.o loongarch-cpu.o loongarch-opts.o loongarch-def.o" extra_gcc_objs="loongarch-driver.o loongarch-cpu.o loongarch-opts.o loongarch-def.o" extra_options="${extra_options} g.opt fused-madd.opt" diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index ebe70a986c3..5958f5b7fbe 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -34,14 +34,18 @@ along with GCC; see the file COPYING3. If not see #include "recog.h" #include "diagnostic.h" #include "fold-const.h" +#include "explow.h" #include "expr.h" #include "langhooks.h" #include "emit-rtl.h" +#include "case-cfn-macros.h" /* Macros to create an enumeration identifier for a function prototype. */ #define LARCH_FTYPE_NAME1(A, B) LARCH_##A##_FTYPE_##B #define LARCH_FTYPE_NAME2(A, B, C) LARCH_##A##_FTYPE_##B##_##C #define LARCH_FTYPE_NAME3(A, B, C, D) LARCH_##A##_FTYPE_##B##_##C##_##D +#define LARCH_FTYPE_NAME4(A, B, C, D, E) \ + LARCH_##A##_FTYPE_##B##_##C##_##D##_##E /* Classifies the prototype of a built-in function. 
*/ enum loongarch_function_type @@ -64,6 +68,12 @@ enum loongarch_builtin_type value and the arguments are mapped to operands 0 and above. */ LARCH_BUILTIN_DIRECT_NO_TARGET, + /* For generating LoongArch LSX. */ + LARCH_BUILTIN_LSX, + + /* The function corresponds to an LSX conditional branch instruction + combined with a compare instruction. */ + LARCH_BUILTIN_LSX_TEST_BRANCH, }; /* Declare an availability predicate for built-in functions that require @@ -101,6 +111,7 @@ struct loongarch_builtin_description }; AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) +AVAIL_ALL (lsx, ISA_HAS_LSX) /* Construct a loongarch_builtin_description from the given arguments. @@ -120,8 +131,8 @@ AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) #define LARCH_BUILTIN(INSN, NAME, BUILTIN_TYPE, FUNCTION_TYPE, AVAIL) \ { \ CODE_FOR_loongarch_##INSN, "__builtin_loongarch_" NAME, \ - BUILTIN_TYPE, FUNCTION_TYPE, \ - loongarch_builtin_avail_##AVAIL \ + BUILTIN_TYPE, FUNCTION_TYPE, \ + loongarch_builtin_avail_##AVAIL \ } /* Define __builtin_loongarch_, which is a LARCH_BUILTIN_DIRECT function @@ -137,6 +148,300 @@ AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) LARCH_BUILTIN (INSN, #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \ FUNCTION_TYPE, AVAIL) +/* Define an LSX LARCH_BUILTIN_DIRECT function __builtin_lsx_ + for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description + field. */ +#define LSX_BUILTIN(INSN, FUNCTION_TYPE) \ + { CODE_FOR_lsx_ ## INSN, \ + "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT, \ + FUNCTION_TYPE, loongarch_builtin_avail_lsx } + + +/* Define an LSX LARCH_BUILTIN_LSX_TEST_BRANCH function __builtin_lsx_ + for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description + field. */ +#define LSX_BUILTIN_TEST_BRANCH(INSN, FUNCTION_TYPE) \ + { CODE_FOR_lsx_ ## INSN, \ + "__builtin_lsx_" #INSN, LARCH_BUILTIN_LSX_TEST_BRANCH, \ + FUNCTION_TYPE, loongarch_builtin_avail_lsx } + +/* Define an LSX LARCH_BUILTIN_DIRECT_NO_TARGET function __builtin_lsx_ + for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description + field. 
*/ +#define LSX_NO_TARGET_BUILTIN(INSN, FUNCTION_TYPE) \ + { CODE_FOR_lsx_ ## INSN, \ + "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \ + FUNCTION_TYPE, loongarch_builtin_avail_lsx } + +/* LoongArch SX define CODE_FOR_lsx_xxx */ +#define CODE_FOR_lsx_vsadd_b CODE_FOR_ssaddv16qi3 +#define CODE_FOR_lsx_vsadd_h CODE_FOR_ssaddv8hi3 +#define CODE_FOR_lsx_vsadd_w CODE_FOR_ssaddv4si3 +#define CODE_FOR_lsx_vsadd_d CODE_FOR_ssaddv2di3 +#define CODE_FOR_lsx_vsadd_bu CODE_FOR_usaddv16qi3 +#define CODE_FOR_lsx_vsadd_hu CODE_FOR_usaddv8hi3 +#define CODE_FOR_lsx_vsadd_wu CODE_FOR_usaddv4si3 +#define CODE_FOR_lsx_vsadd_du CODE_FOR_usaddv2di3 +#define CODE_FOR_lsx_vadd_b CODE_FOR_addv16qi3 +#define CODE_FOR_lsx_vadd_h CODE_FOR_addv8hi3 +#define CODE_FOR_lsx_vadd_w CODE_FOR_addv4si3 +#define CODE_FOR_lsx_vadd_d CODE_FOR_addv2di3 +#define CODE_FOR_lsx_vaddi_bu CODE_FOR_addv16qi3 +#define CODE_FOR_lsx_vaddi_hu CODE_FOR_addv8hi3 +#define CODE_FOR_lsx_vaddi_wu CODE_FOR_addv4si3 +#define CODE_FOR_lsx_vaddi_du CODE_FOR_addv2di3 +#define CODE_FOR_lsx_vand_v CODE_FOR_andv16qi3 +#define CODE_FOR_lsx_vandi_b CODE_FOR_andv16qi3 +#define CODE_FOR_lsx_bnz_v CODE_FOR_lsx_bnz_v_b +#define CODE_FOR_lsx_bz_v CODE_FOR_lsx_bz_v_b +#define CODE_FOR_lsx_vbitsel_v CODE_FOR_lsx_vbitsel_b +#define CODE_FOR_lsx_vseqi_b CODE_FOR_lsx_vseq_b +#define CODE_FOR_lsx_vseqi_h CODE_FOR_lsx_vseq_h +#define CODE_FOR_lsx_vseqi_w CODE_FOR_lsx_vseq_w +#define CODE_FOR_lsx_vseqi_d CODE_FOR_lsx_vseq_d +#define CODE_FOR_lsx_vslti_b CODE_FOR_lsx_vslt_b +#define CODE_FOR_lsx_vslti_h CODE_FOR_lsx_vslt_h +#define CODE_FOR_lsx_vslti_w CODE_FOR_lsx_vslt_w +#define CODE_FOR_lsx_vslti_d CODE_FOR_lsx_vslt_d +#define CODE_FOR_lsx_vslti_bu CODE_FOR_lsx_vslt_bu +#define CODE_FOR_lsx_vslti_hu CODE_FOR_lsx_vslt_hu +#define CODE_FOR_lsx_vslti_wu CODE_FOR_lsx_vslt_wu +#define CODE_FOR_lsx_vslti_du CODE_FOR_lsx_vslt_du +#define CODE_FOR_lsx_vslei_b CODE_FOR_lsx_vsle_b +#define CODE_FOR_lsx_vslei_h CODE_FOR_lsx_vsle_h +#define CODE_FOR_lsx_vslei_w CODE_FOR_lsx_vsle_w +#define CODE_FOR_lsx_vslei_d CODE_FOR_lsx_vsle_d +#define CODE_FOR_lsx_vslei_bu CODE_FOR_lsx_vsle_bu +#define CODE_FOR_lsx_vslei_hu CODE_FOR_lsx_vsle_hu +#define CODE_FOR_lsx_vslei_wu CODE_FOR_lsx_vsle_wu +#define CODE_FOR_lsx_vslei_du CODE_FOR_lsx_vsle_du +#define CODE_FOR_lsx_vdiv_b CODE_FOR_divv16qi3 +#define CODE_FOR_lsx_vdiv_h CODE_FOR_divv8hi3 +#define CODE_FOR_lsx_vdiv_w CODE_FOR_divv4si3 +#define CODE_FOR_lsx_vdiv_d CODE_FOR_divv2di3 +#define CODE_FOR_lsx_vdiv_bu CODE_FOR_udivv16qi3 +#define CODE_FOR_lsx_vdiv_hu CODE_FOR_udivv8hi3 +#define CODE_FOR_lsx_vdiv_wu CODE_FOR_udivv4si3 +#define CODE_FOR_lsx_vdiv_du CODE_FOR_udivv2di3 +#define CODE_FOR_lsx_vfadd_s CODE_FOR_addv4sf3 +#define CODE_FOR_lsx_vfadd_d CODE_FOR_addv2df3 +#define CODE_FOR_lsx_vftintrz_w_s CODE_FOR_fix_truncv4sfv4si2 +#define CODE_FOR_lsx_vftintrz_l_d CODE_FOR_fix_truncv2dfv2di2 +#define CODE_FOR_lsx_vftintrz_wu_s CODE_FOR_fixuns_truncv4sfv4si2 +#define CODE_FOR_lsx_vftintrz_lu_d CODE_FOR_fixuns_truncv2dfv2di2 +#define CODE_FOR_lsx_vffint_s_w CODE_FOR_floatv4siv4sf2 +#define CODE_FOR_lsx_vffint_d_l CODE_FOR_floatv2div2df2 +#define CODE_FOR_lsx_vffint_s_wu CODE_FOR_floatunsv4siv4sf2 +#define CODE_FOR_lsx_vffint_d_lu CODE_FOR_floatunsv2div2df2 +#define CODE_FOR_lsx_vfsub_s CODE_FOR_subv4sf3 +#define CODE_FOR_lsx_vfsub_d CODE_FOR_subv2df3 +#define CODE_FOR_lsx_vfmul_s CODE_FOR_mulv4sf3 +#define CODE_FOR_lsx_vfmul_d CODE_FOR_mulv2df3 +#define CODE_FOR_lsx_vfdiv_s CODE_FOR_divv4sf3 +#define CODE_FOR_lsx_vfdiv_d 
CODE_FOR_divv2df3 +#define CODE_FOR_lsx_vfmax_s CODE_FOR_smaxv4sf3 +#define CODE_FOR_lsx_vfmax_d CODE_FOR_smaxv2df3 +#define CODE_FOR_lsx_vfmin_s CODE_FOR_sminv4sf3 +#define CODE_FOR_lsx_vfmin_d CODE_FOR_sminv2df3 +#define CODE_FOR_lsx_vfsqrt_s CODE_FOR_sqrtv4sf2 +#define CODE_FOR_lsx_vfsqrt_d CODE_FOR_sqrtv2df2 +#define CODE_FOR_lsx_vflogb_s CODE_FOR_logbv4sf2 +#define CODE_FOR_lsx_vflogb_d CODE_FOR_logbv2df2 +#define CODE_FOR_lsx_vmax_b CODE_FOR_smaxv16qi3 +#define CODE_FOR_lsx_vmax_h CODE_FOR_smaxv8hi3 +#define CODE_FOR_lsx_vmax_w CODE_FOR_smaxv4si3 +#define CODE_FOR_lsx_vmax_d CODE_FOR_smaxv2di3 +#define CODE_FOR_lsx_vmaxi_b CODE_FOR_smaxv16qi3 +#define CODE_FOR_lsx_vmaxi_h CODE_FOR_smaxv8hi3 +#define CODE_FOR_lsx_vmaxi_w CODE_FOR_smaxv4si3 +#define CODE_FOR_lsx_vmaxi_d CODE_FOR_smaxv2di3 +#define CODE_FOR_lsx_vmax_bu CODE_FOR_umaxv16qi3 +#define CODE_FOR_lsx_vmax_hu CODE_FOR_umaxv8hi3 +#define CODE_FOR_lsx_vmax_wu CODE_FOR_umaxv4si3 +#define CODE_FOR_lsx_vmax_du CODE_FOR_umaxv2di3 +#define CODE_FOR_lsx_vmaxi_bu CODE_FOR_umaxv16qi3 +#define CODE_FOR_lsx_vmaxi_hu CODE_FOR_umaxv8hi3 +#define CODE_FOR_lsx_vmaxi_wu CODE_FOR_umaxv4si3 +#define CODE_FOR_lsx_vmaxi_du CODE_FOR_umaxv2di3 +#define CODE_FOR_lsx_vmin_b CODE_FOR_sminv16qi3 +#define CODE_FOR_lsx_vmin_h CODE_FOR_sminv8hi3 +#define CODE_FOR_lsx_vmin_w CODE_FOR_sminv4si3 +#define CODE_FOR_lsx_vmin_d CODE_FOR_sminv2di3 +#define CODE_FOR_lsx_vmini_b CODE_FOR_sminv16qi3 +#define CODE_FOR_lsx_vmini_h CODE_FOR_sminv8hi3 +#define CODE_FOR_lsx_vmini_w CODE_FOR_sminv4si3 +#define CODE_FOR_lsx_vmini_d CODE_FOR_sminv2di3 +#define CODE_FOR_lsx_vmin_bu CODE_FOR_uminv16qi3 +#define CODE_FOR_lsx_vmin_hu CODE_FOR_uminv8hi3 +#define CODE_FOR_lsx_vmin_wu CODE_FOR_uminv4si3 +#define CODE_FOR_lsx_vmin_du CODE_FOR_uminv2di3 +#define CODE_FOR_lsx_vmini_bu CODE_FOR_uminv16qi3 +#define CODE_FOR_lsx_vmini_hu CODE_FOR_uminv8hi3 +#define CODE_FOR_lsx_vmini_wu CODE_FOR_uminv4si3 +#define CODE_FOR_lsx_vmini_du CODE_FOR_uminv2di3 +#define CODE_FOR_lsx_vmod_b CODE_FOR_modv16qi3 +#define CODE_FOR_lsx_vmod_h CODE_FOR_modv8hi3 +#define CODE_FOR_lsx_vmod_w CODE_FOR_modv4si3 +#define CODE_FOR_lsx_vmod_d CODE_FOR_modv2di3 +#define CODE_FOR_lsx_vmod_bu CODE_FOR_umodv16qi3 +#define CODE_FOR_lsx_vmod_hu CODE_FOR_umodv8hi3 +#define CODE_FOR_lsx_vmod_wu CODE_FOR_umodv4si3 +#define CODE_FOR_lsx_vmod_du CODE_FOR_umodv2di3 +#define CODE_FOR_lsx_vmul_b CODE_FOR_mulv16qi3 +#define CODE_FOR_lsx_vmul_h CODE_FOR_mulv8hi3 +#define CODE_FOR_lsx_vmul_w CODE_FOR_mulv4si3 +#define CODE_FOR_lsx_vmul_d CODE_FOR_mulv2di3 +#define CODE_FOR_lsx_vclz_b CODE_FOR_clzv16qi2 +#define CODE_FOR_lsx_vclz_h CODE_FOR_clzv8hi2 +#define CODE_FOR_lsx_vclz_w CODE_FOR_clzv4si2 +#define CODE_FOR_lsx_vclz_d CODE_FOR_clzv2di2 +#define CODE_FOR_lsx_vnor_v CODE_FOR_lsx_nor_b +#define CODE_FOR_lsx_vor_v CODE_FOR_iorv16qi3 +#define CODE_FOR_lsx_vori_b CODE_FOR_iorv16qi3 +#define CODE_FOR_lsx_vnori_b CODE_FOR_lsx_nor_b +#define CODE_FOR_lsx_vpcnt_b CODE_FOR_popcountv16qi2 +#define CODE_FOR_lsx_vpcnt_h CODE_FOR_popcountv8hi2 +#define CODE_FOR_lsx_vpcnt_w CODE_FOR_popcountv4si2 +#define CODE_FOR_lsx_vpcnt_d CODE_FOR_popcountv2di2 +#define CODE_FOR_lsx_vxor_v CODE_FOR_xorv16qi3 +#define CODE_FOR_lsx_vxori_b CODE_FOR_xorv16qi3 +#define CODE_FOR_lsx_vsll_b CODE_FOR_vashlv16qi3 +#define CODE_FOR_lsx_vsll_h CODE_FOR_vashlv8hi3 +#define CODE_FOR_lsx_vsll_w CODE_FOR_vashlv4si3 +#define CODE_FOR_lsx_vsll_d CODE_FOR_vashlv2di3 +#define CODE_FOR_lsx_vslli_b CODE_FOR_vashlv16qi3 +#define CODE_FOR_lsx_vslli_h CODE_FOR_vashlv8hi3 
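(Editorial usage sketch, not part of the patch.)  The CODE_FOR_lsx_* aliases in this block map many LSX builtin insn names onto GCC's generic standard pattern names (add, ssadd, smax, vashl and so on), so the LSX_BUILTIN table later in the file can reuse the existing expanders instead of LSX-specific patterns.  Assuming the series is applied and the translation unit is built with the new LSX option enabled, a user-level call goes through these aliases; the v4i32 typedef and the wrapper name below are illustrative assumptions, not something the patch provides.

  /* Illustrative only: the typedef is an assumption; the builtins and the
     CODE_FOR_* mappings (addv4si3, ssaddv4si3) come from this patch.  */
  typedef int v4i32 __attribute__ ((vector_size (16)));

  v4i32
  add_then_saturate (v4i32 a, v4i32 b)
  {
    v4i32 t = __builtin_lsx_vadd_w (a, b);   /* via CODE_FOR_addv4si3  */
    return __builtin_lsx_vsadd_w (t, b);     /* via CODE_FOR_ssaddv4si3  */
  }
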
+#define CODE_FOR_lsx_vslli_w CODE_FOR_vashlv4si3 +#define CODE_FOR_lsx_vslli_d CODE_FOR_vashlv2di3 +#define CODE_FOR_lsx_vsra_b CODE_FOR_vashrv16qi3 +#define CODE_FOR_lsx_vsra_h CODE_FOR_vashrv8hi3 +#define CODE_FOR_lsx_vsra_w CODE_FOR_vashrv4si3 +#define CODE_FOR_lsx_vsra_d CODE_FOR_vashrv2di3 +#define CODE_FOR_lsx_vsrai_b CODE_FOR_vashrv16qi3 +#define CODE_FOR_lsx_vsrai_h CODE_FOR_vashrv8hi3 +#define CODE_FOR_lsx_vsrai_w CODE_FOR_vashrv4si3 +#define CODE_FOR_lsx_vsrai_d CODE_FOR_vashrv2di3 +#define CODE_FOR_lsx_vsrl_b CODE_FOR_vlshrv16qi3 +#define CODE_FOR_lsx_vsrl_h CODE_FOR_vlshrv8hi3 +#define CODE_FOR_lsx_vsrl_w CODE_FOR_vlshrv4si3 +#define CODE_FOR_lsx_vsrl_d CODE_FOR_vlshrv2di3 +#define CODE_FOR_lsx_vsrli_b CODE_FOR_vlshrv16qi3 +#define CODE_FOR_lsx_vsrli_h CODE_FOR_vlshrv8hi3 +#define CODE_FOR_lsx_vsrli_w CODE_FOR_vlshrv4si3 +#define CODE_FOR_lsx_vsrli_d CODE_FOR_vlshrv2di3 +#define CODE_FOR_lsx_vsub_b CODE_FOR_subv16qi3 +#define CODE_FOR_lsx_vsub_h CODE_FOR_subv8hi3 +#define CODE_FOR_lsx_vsub_w CODE_FOR_subv4si3 +#define CODE_FOR_lsx_vsub_d CODE_FOR_subv2di3 +#define CODE_FOR_lsx_vsubi_bu CODE_FOR_subv16qi3 +#define CODE_FOR_lsx_vsubi_hu CODE_FOR_subv8hi3 +#define CODE_FOR_lsx_vsubi_wu CODE_FOR_subv4si3 +#define CODE_FOR_lsx_vsubi_du CODE_FOR_subv2di3 + +#define CODE_FOR_lsx_vpackod_d CODE_FOR_lsx_vilvh_d +#define CODE_FOR_lsx_vpackev_d CODE_FOR_lsx_vilvl_d +#define CODE_FOR_lsx_vpickod_d CODE_FOR_lsx_vilvh_d +#define CODE_FOR_lsx_vpickev_d CODE_FOR_lsx_vilvl_d + +#define CODE_FOR_lsx_vrepli_b CODE_FOR_lsx_vrepliv16qi +#define CODE_FOR_lsx_vrepli_h CODE_FOR_lsx_vrepliv8hi +#define CODE_FOR_lsx_vrepli_w CODE_FOR_lsx_vrepliv4si +#define CODE_FOR_lsx_vrepli_d CODE_FOR_lsx_vrepliv2di +#define CODE_FOR_lsx_vsat_b CODE_FOR_lsx_vsat_s_b +#define CODE_FOR_lsx_vsat_h CODE_FOR_lsx_vsat_s_h +#define CODE_FOR_lsx_vsat_w CODE_FOR_lsx_vsat_s_w +#define CODE_FOR_lsx_vsat_d CODE_FOR_lsx_vsat_s_d +#define CODE_FOR_lsx_vsat_bu CODE_FOR_lsx_vsat_u_bu +#define CODE_FOR_lsx_vsat_hu CODE_FOR_lsx_vsat_u_hu +#define CODE_FOR_lsx_vsat_wu CODE_FOR_lsx_vsat_u_wu +#define CODE_FOR_lsx_vsat_du CODE_FOR_lsx_vsat_u_du +#define CODE_FOR_lsx_vavg_b CODE_FOR_lsx_vavg_s_b +#define CODE_FOR_lsx_vavg_h CODE_FOR_lsx_vavg_s_h +#define CODE_FOR_lsx_vavg_w CODE_FOR_lsx_vavg_s_w +#define CODE_FOR_lsx_vavg_d CODE_FOR_lsx_vavg_s_d +#define CODE_FOR_lsx_vavg_bu CODE_FOR_lsx_vavg_u_bu +#define CODE_FOR_lsx_vavg_hu CODE_FOR_lsx_vavg_u_hu +#define CODE_FOR_lsx_vavg_wu CODE_FOR_lsx_vavg_u_wu +#define CODE_FOR_lsx_vavg_du CODE_FOR_lsx_vavg_u_du +#define CODE_FOR_lsx_vavgr_b CODE_FOR_lsx_vavgr_s_b +#define CODE_FOR_lsx_vavgr_h CODE_FOR_lsx_vavgr_s_h +#define CODE_FOR_lsx_vavgr_w CODE_FOR_lsx_vavgr_s_w +#define CODE_FOR_lsx_vavgr_d CODE_FOR_lsx_vavgr_s_d +#define CODE_FOR_lsx_vavgr_bu CODE_FOR_lsx_vavgr_u_bu +#define CODE_FOR_lsx_vavgr_hu CODE_FOR_lsx_vavgr_u_hu +#define CODE_FOR_lsx_vavgr_wu CODE_FOR_lsx_vavgr_u_wu +#define CODE_FOR_lsx_vavgr_du CODE_FOR_lsx_vavgr_u_du +#define CODE_FOR_lsx_vssub_b CODE_FOR_lsx_vssub_s_b +#define CODE_FOR_lsx_vssub_h CODE_FOR_lsx_vssub_s_h +#define CODE_FOR_lsx_vssub_w CODE_FOR_lsx_vssub_s_w +#define CODE_FOR_lsx_vssub_d CODE_FOR_lsx_vssub_s_d +#define CODE_FOR_lsx_vssub_bu CODE_FOR_lsx_vssub_u_bu +#define CODE_FOR_lsx_vssub_hu CODE_FOR_lsx_vssub_u_hu +#define CODE_FOR_lsx_vssub_wu CODE_FOR_lsx_vssub_u_wu +#define CODE_FOR_lsx_vssub_du CODE_FOR_lsx_vssub_u_du +#define CODE_FOR_lsx_vabsd_b CODE_FOR_lsx_vabsd_s_b +#define CODE_FOR_lsx_vabsd_h CODE_FOR_lsx_vabsd_s_h +#define CODE_FOR_lsx_vabsd_w 
CODE_FOR_lsx_vabsd_s_w +#define CODE_FOR_lsx_vabsd_d CODE_FOR_lsx_vabsd_s_d +#define CODE_FOR_lsx_vabsd_bu CODE_FOR_lsx_vabsd_u_bu +#define CODE_FOR_lsx_vabsd_hu CODE_FOR_lsx_vabsd_u_hu +#define CODE_FOR_lsx_vabsd_wu CODE_FOR_lsx_vabsd_u_wu +#define CODE_FOR_lsx_vabsd_du CODE_FOR_lsx_vabsd_u_du +#define CODE_FOR_lsx_vftint_w_s CODE_FOR_lsx_vftint_s_w_s +#define CODE_FOR_lsx_vftint_l_d CODE_FOR_lsx_vftint_s_l_d +#define CODE_FOR_lsx_vftint_wu_s CODE_FOR_lsx_vftint_u_wu_s +#define CODE_FOR_lsx_vftint_lu_d CODE_FOR_lsx_vftint_u_lu_d +#define CODE_FOR_lsx_vandn_v CODE_FOR_vandnv16qi3 +#define CODE_FOR_lsx_vorn_v CODE_FOR_vornv16qi3 +#define CODE_FOR_lsx_vneg_b CODE_FOR_vnegv16qi2 +#define CODE_FOR_lsx_vneg_h CODE_FOR_vnegv8hi2 +#define CODE_FOR_lsx_vneg_w CODE_FOR_vnegv4si2 +#define CODE_FOR_lsx_vneg_d CODE_FOR_vnegv2di2 +#define CODE_FOR_lsx_vshuf4i_d CODE_FOR_lsx_vshuf4i_d +#define CODE_FOR_lsx_vbsrl_v CODE_FOR_lsx_vbsrl_b +#define CODE_FOR_lsx_vbsll_v CODE_FOR_lsx_vbsll_b +#define CODE_FOR_lsx_vfmadd_s CODE_FOR_fmav4sf4 +#define CODE_FOR_lsx_vfmadd_d CODE_FOR_fmav2df4 +#define CODE_FOR_lsx_vfmsub_s CODE_FOR_fmsv4sf4 +#define CODE_FOR_lsx_vfmsub_d CODE_FOR_fmsv2df4 +#define CODE_FOR_lsx_vfnmadd_s CODE_FOR_vfnmaddv4sf4_nmadd4 +#define CODE_FOR_lsx_vfnmadd_d CODE_FOR_vfnmaddv2df4_nmadd4 +#define CODE_FOR_lsx_vfnmsub_s CODE_FOR_vfnmsubv4sf4_nmsub4 +#define CODE_FOR_lsx_vfnmsub_d CODE_FOR_vfnmsubv2df4_nmsub4 + +#define CODE_FOR_lsx_vmuh_b CODE_FOR_lsx_vmuh_s_b +#define CODE_FOR_lsx_vmuh_h CODE_FOR_lsx_vmuh_s_h +#define CODE_FOR_lsx_vmuh_w CODE_FOR_lsx_vmuh_s_w +#define CODE_FOR_lsx_vmuh_d CODE_FOR_lsx_vmuh_s_d +#define CODE_FOR_lsx_vmuh_bu CODE_FOR_lsx_vmuh_u_bu +#define CODE_FOR_lsx_vmuh_hu CODE_FOR_lsx_vmuh_u_hu +#define CODE_FOR_lsx_vmuh_wu CODE_FOR_lsx_vmuh_u_wu +#define CODE_FOR_lsx_vmuh_du CODE_FOR_lsx_vmuh_u_du +#define CODE_FOR_lsx_vsllwil_h_b CODE_FOR_lsx_vsllwil_s_h_b +#define CODE_FOR_lsx_vsllwil_w_h CODE_FOR_lsx_vsllwil_s_w_h +#define CODE_FOR_lsx_vsllwil_d_w CODE_FOR_lsx_vsllwil_s_d_w +#define CODE_FOR_lsx_vsllwil_hu_bu CODE_FOR_lsx_vsllwil_u_hu_bu +#define CODE_FOR_lsx_vsllwil_wu_hu CODE_FOR_lsx_vsllwil_u_wu_hu +#define CODE_FOR_lsx_vsllwil_du_wu CODE_FOR_lsx_vsllwil_u_du_wu +#define CODE_FOR_lsx_vssran_b_h CODE_FOR_lsx_vssran_s_b_h +#define CODE_FOR_lsx_vssran_h_w CODE_FOR_lsx_vssran_s_h_w +#define CODE_FOR_lsx_vssran_w_d CODE_FOR_lsx_vssran_s_w_d +#define CODE_FOR_lsx_vssran_bu_h CODE_FOR_lsx_vssran_u_bu_h +#define CODE_FOR_lsx_vssran_hu_w CODE_FOR_lsx_vssran_u_hu_w +#define CODE_FOR_lsx_vssran_wu_d CODE_FOR_lsx_vssran_u_wu_d +#define CODE_FOR_lsx_vssrarn_b_h CODE_FOR_lsx_vssrarn_s_b_h +#define CODE_FOR_lsx_vssrarn_h_w CODE_FOR_lsx_vssrarn_s_h_w +#define CODE_FOR_lsx_vssrarn_w_d CODE_FOR_lsx_vssrarn_s_w_d +#define CODE_FOR_lsx_vssrarn_bu_h CODE_FOR_lsx_vssrarn_u_bu_h +#define CODE_FOR_lsx_vssrarn_hu_w CODE_FOR_lsx_vssrarn_u_hu_w +#define CODE_FOR_lsx_vssrarn_wu_d CODE_FOR_lsx_vssrarn_u_wu_d +#define CODE_FOR_lsx_vssrln_bu_h CODE_FOR_lsx_vssrln_u_bu_h +#define CODE_FOR_lsx_vssrln_hu_w CODE_FOR_lsx_vssrln_u_hu_w +#define CODE_FOR_lsx_vssrln_wu_d CODE_FOR_lsx_vssrln_u_wu_d +#define CODE_FOR_lsx_vssrlrn_bu_h CODE_FOR_lsx_vssrlrn_u_bu_h +#define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w +#define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d + static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 DIRECT_BUILTIN (movfcsr2gr, LARCH_USI_FTYPE_UQI, hard_float), @@ -184,6 +489,727 @@ static const struct 
loongarch_builtin_description loongarch_builtins[] = { DIRECT_NO_TARGET_BUILTIN (asrtgt_d, LARCH_VOID_FTYPE_DI_DI, default), DIRECT_NO_TARGET_BUILTIN (syscall, LARCH_VOID_FTYPE_USI, default), DIRECT_NO_TARGET_BUILTIN (break, LARCH_VOID_FTYPE_USI, default), + + /* Built-in functions for LSX. */ + LSX_BUILTIN (vsll_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsll_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsll_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsll_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vslli_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vslli_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vslli_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vslli_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vsra_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsra_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsra_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsra_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsrai_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vsrai_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vsrai_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vsrai_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vsrar_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsrar_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsrar_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsrar_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsrari_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vsrari_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vsrari_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vsrari_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vsrl_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsrl_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsrl_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsrl_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsrli_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vsrli_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vsrli_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vsrli_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vsrlr_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsrlr_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsrlr_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsrlr_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsrlri_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vsrlri_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vsrlri_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vsrlri_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vbitclr_b, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vbitclr_h, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vbitclr_w, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vbitclr_d, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vbitclri_b, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vbitclri_h, LARCH_UV8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vbitclri_w, LARCH_UV4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vbitclri_d, LARCH_UV2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vbitset_b, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vbitset_h, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vbitset_w, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vbitset_d, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vbitseti_b, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vbitseti_h, LARCH_UV8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vbitseti_w, LARCH_UV4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vbitseti_d, LARCH_UV2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vbitrev_b, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vbitrev_h, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vbitrev_w, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vbitrev_d, 
LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vbitrevi_b, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vbitrevi_h, LARCH_UV8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vbitrevi_w, LARCH_UV4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vbitrevi_d, LARCH_UV2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vadd_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vadd_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vadd_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vadd_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vaddi_bu, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vaddi_hu, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vaddi_wu, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vaddi_du, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vsub_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsub_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsub_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsub_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsubi_bu, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vsubi_hu, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vsubi_wu, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vsubi_du, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vmax_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vmax_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vmax_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vmax_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vmaxi_b, LARCH_V16QI_FTYPE_V16QI_QI), + LSX_BUILTIN (vmaxi_h, LARCH_V8HI_FTYPE_V8HI_QI), + LSX_BUILTIN (vmaxi_w, LARCH_V4SI_FTYPE_V4SI_QI), + LSX_BUILTIN (vmaxi_d, LARCH_V2DI_FTYPE_V2DI_QI), + LSX_BUILTIN (vmax_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vmax_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vmax_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vmax_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vmaxi_bu, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vmaxi_hu, LARCH_UV8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vmaxi_wu, LARCH_UV4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vmaxi_du, LARCH_UV2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vmin_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vmin_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vmin_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vmin_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vmini_b, LARCH_V16QI_FTYPE_V16QI_QI), + LSX_BUILTIN (vmini_h, LARCH_V8HI_FTYPE_V8HI_QI), + LSX_BUILTIN (vmini_w, LARCH_V4SI_FTYPE_V4SI_QI), + LSX_BUILTIN (vmini_d, LARCH_V2DI_FTYPE_V2DI_QI), + LSX_BUILTIN (vmin_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vmin_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vmin_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vmin_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vmini_bu, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vmini_hu, LARCH_UV8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vmini_wu, LARCH_UV4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vmini_du, LARCH_UV2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vseq_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vseq_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vseq_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vseq_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vseqi_b, LARCH_V16QI_FTYPE_V16QI_QI), + LSX_BUILTIN (vseqi_h, LARCH_V8HI_FTYPE_V8HI_QI), + LSX_BUILTIN (vseqi_w, LARCH_V4SI_FTYPE_V4SI_QI), + LSX_BUILTIN (vseqi_d, LARCH_V2DI_FTYPE_V2DI_QI), + LSX_BUILTIN (vslti_b, LARCH_V16QI_FTYPE_V16QI_QI), + LSX_BUILTIN (vslt_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vslt_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vslt_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vslt_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vslti_h, 
LARCH_V8HI_FTYPE_V8HI_QI), + LSX_BUILTIN (vslti_w, LARCH_V4SI_FTYPE_V4SI_QI), + LSX_BUILTIN (vslti_d, LARCH_V2DI_FTYPE_V2DI_QI), + LSX_BUILTIN (vslt_bu, LARCH_V16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vslt_hu, LARCH_V8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vslt_wu, LARCH_V4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vslt_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vslti_bu, LARCH_V16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vslti_hu, LARCH_V8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vslti_wu, LARCH_V4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vslti_du, LARCH_V2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vsle_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsle_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsle_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsle_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vslei_b, LARCH_V16QI_FTYPE_V16QI_QI), + LSX_BUILTIN (vslei_h, LARCH_V8HI_FTYPE_V8HI_QI), + LSX_BUILTIN (vslei_w, LARCH_V4SI_FTYPE_V4SI_QI), + LSX_BUILTIN (vslei_d, LARCH_V2DI_FTYPE_V2DI_QI), + LSX_BUILTIN (vsle_bu, LARCH_V16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vsle_hu, LARCH_V8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vsle_wu, LARCH_V4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vsle_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vslei_bu, LARCH_V16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vslei_hu, LARCH_V8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vslei_wu, LARCH_V4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vslei_du, LARCH_V2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vsat_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vsat_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vsat_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vsat_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vsat_bu, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vsat_hu, LARCH_UV8HI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vsat_wu, LARCH_UV4SI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vsat_du, LARCH_UV2DI_FTYPE_UV2DI_UQI), + LSX_BUILTIN (vadda_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vadda_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vadda_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vadda_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsadd_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsadd_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsadd_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsadd_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsadd_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vsadd_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vsadd_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vsadd_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vavg_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vavg_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vavg_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vavg_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vavg_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vavg_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vavg_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vavg_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vavgr_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vavgr_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vavgr_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vavgr_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vavgr_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vavgr_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vavgr_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vavgr_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vssub_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vssub_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vssub_w, 
LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vssub_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssub_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vssub_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vssub_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vssub_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vabsd_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vabsd_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vabsd_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vabsd_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vabsd_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vabsd_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vabsd_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vabsd_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vmul_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vmul_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vmul_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vmul_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vmadd_b, LARCH_V16QI_FTYPE_V16QI_V16QI_V16QI), + LSX_BUILTIN (vmadd_h, LARCH_V8HI_FTYPE_V8HI_V8HI_V8HI), + LSX_BUILTIN (vmadd_w, LARCH_V4SI_FTYPE_V4SI_V4SI_V4SI), + LSX_BUILTIN (vmadd_d, LARCH_V2DI_FTYPE_V2DI_V2DI_V2DI), + LSX_BUILTIN (vmsub_b, LARCH_V16QI_FTYPE_V16QI_V16QI_V16QI), + LSX_BUILTIN (vmsub_h, LARCH_V8HI_FTYPE_V8HI_V8HI_V8HI), + LSX_BUILTIN (vmsub_w, LARCH_V4SI_FTYPE_V4SI_V4SI_V4SI), + LSX_BUILTIN (vmsub_d, LARCH_V2DI_FTYPE_V2DI_V2DI_V2DI), + LSX_BUILTIN (vdiv_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vdiv_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vdiv_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vdiv_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vdiv_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vdiv_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vdiv_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vdiv_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vhaddw_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vhaddw_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vhaddw_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vhaddw_hu_bu, LARCH_UV8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vhaddw_wu_hu, LARCH_UV4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vhaddw_du_wu, LARCH_UV2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vhsubw_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vhsubw_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vhsubw_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vhsubw_hu_bu, LARCH_V8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vhsubw_wu_hu, LARCH_V4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vhsubw_du_wu, LARCH_V2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vmod_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vmod_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vmod_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vmod_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vmod_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vmod_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vmod_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vmod_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vreplve_b, LARCH_V16QI_FTYPE_V16QI_SI), + LSX_BUILTIN (vreplve_h, LARCH_V8HI_FTYPE_V8HI_SI), + LSX_BUILTIN (vreplve_w, LARCH_V4SI_FTYPE_V4SI_SI), + LSX_BUILTIN (vreplve_d, LARCH_V2DI_FTYPE_V2DI_SI), + LSX_BUILTIN (vreplvei_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vreplvei_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vreplvei_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vreplvei_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vpickev_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vpickev_h, 
LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vpickev_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vpickev_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vpickod_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vpickod_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vpickod_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vpickod_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vilvh_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vilvh_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vilvh_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vilvh_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vilvl_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vilvl_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vilvl_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vilvl_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vpackev_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vpackev_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vpackev_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vpackev_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vpackod_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vpackod_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vpackod_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vpackod_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vshuf_h, LARCH_V8HI_FTYPE_V8HI_V8HI_V8HI), + LSX_BUILTIN (vshuf_w, LARCH_V4SI_FTYPE_V4SI_V4SI_V4SI), + LSX_BUILTIN (vshuf_d, LARCH_V2DI_FTYPE_V2DI_V2DI_V2DI), + LSX_BUILTIN (vand_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vandi_b, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vor_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vori_b, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vnor_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vnori_b, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vxor_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vxori_b, LARCH_UV16QI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vbitsel_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI_UV16QI), + LSX_BUILTIN (vbitseli_b, LARCH_UV16QI_FTYPE_UV16QI_UV16QI_USI), + LSX_BUILTIN (vshuf4i_b, LARCH_V16QI_FTYPE_V16QI_USI), + LSX_BUILTIN (vshuf4i_h, LARCH_V8HI_FTYPE_V8HI_USI), + LSX_BUILTIN (vshuf4i_w, LARCH_V4SI_FTYPE_V4SI_USI), + LSX_BUILTIN (vreplgr2vr_b, LARCH_V16QI_FTYPE_SI), + LSX_BUILTIN (vreplgr2vr_h, LARCH_V8HI_FTYPE_SI), + LSX_BUILTIN (vreplgr2vr_w, LARCH_V4SI_FTYPE_SI), + LSX_BUILTIN (vreplgr2vr_d, LARCH_V2DI_FTYPE_DI), + LSX_BUILTIN (vpcnt_b, LARCH_V16QI_FTYPE_V16QI), + LSX_BUILTIN (vpcnt_h, LARCH_V8HI_FTYPE_V8HI), + LSX_BUILTIN (vpcnt_w, LARCH_V4SI_FTYPE_V4SI), + LSX_BUILTIN (vpcnt_d, LARCH_V2DI_FTYPE_V2DI), + LSX_BUILTIN (vclo_b, LARCH_V16QI_FTYPE_V16QI), + LSX_BUILTIN (vclo_h, LARCH_V8HI_FTYPE_V8HI), + LSX_BUILTIN (vclo_w, LARCH_V4SI_FTYPE_V4SI), + LSX_BUILTIN (vclo_d, LARCH_V2DI_FTYPE_V2DI), + LSX_BUILTIN (vclz_b, LARCH_V16QI_FTYPE_V16QI), + LSX_BUILTIN (vclz_h, LARCH_V8HI_FTYPE_V8HI), + LSX_BUILTIN (vclz_w, LARCH_V4SI_FTYPE_V4SI), + LSX_BUILTIN (vclz_d, LARCH_V2DI_FTYPE_V2DI), + LSX_BUILTIN (vpickve2gr_b, LARCH_SI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vpickve2gr_h, LARCH_SI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vpickve2gr_w, LARCH_SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vpickve2gr_d, LARCH_DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vpickve2gr_bu, LARCH_USI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vpickve2gr_hu, LARCH_USI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vpickve2gr_wu, LARCH_USI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vpickve2gr_du, LARCH_UDI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vinsgr2vr_b, LARCH_V16QI_FTYPE_V16QI_SI_UQI), + LSX_BUILTIN (vinsgr2vr_h, LARCH_V8HI_FTYPE_V8HI_SI_UQI), + LSX_BUILTIN (vinsgr2vr_w, 
LARCH_V4SI_FTYPE_V4SI_SI_UQI), + LSX_BUILTIN (vinsgr2vr_d, LARCH_V2DI_FTYPE_V2DI_DI_UQI), + LSX_BUILTIN_TEST_BRANCH (bnz_b, LARCH_SI_FTYPE_UV16QI), + LSX_BUILTIN_TEST_BRANCH (bnz_h, LARCH_SI_FTYPE_UV8HI), + LSX_BUILTIN_TEST_BRANCH (bnz_w, LARCH_SI_FTYPE_UV4SI), + LSX_BUILTIN_TEST_BRANCH (bnz_d, LARCH_SI_FTYPE_UV2DI), + LSX_BUILTIN_TEST_BRANCH (bz_b, LARCH_SI_FTYPE_UV16QI), + LSX_BUILTIN_TEST_BRANCH (bz_h, LARCH_SI_FTYPE_UV8HI), + LSX_BUILTIN_TEST_BRANCH (bz_w, LARCH_SI_FTYPE_UV4SI), + LSX_BUILTIN_TEST_BRANCH (bz_d, LARCH_SI_FTYPE_UV2DI), + LSX_BUILTIN_TEST_BRANCH (bz_v, LARCH_SI_FTYPE_UV16QI), + LSX_BUILTIN_TEST_BRANCH (bnz_v, LARCH_SI_FTYPE_UV16QI), + LSX_BUILTIN (vrepli_b, LARCH_V16QI_FTYPE_HI), + LSX_BUILTIN (vrepli_h, LARCH_V8HI_FTYPE_HI), + LSX_BUILTIN (vrepli_w, LARCH_V4SI_FTYPE_HI), + LSX_BUILTIN (vrepli_d, LARCH_V2DI_FTYPE_HI), + LSX_BUILTIN (vfcmp_caf_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_caf_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cor_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cor_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cun_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cun_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cune_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cune_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cueq_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cueq_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_ceq_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_ceq_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cne_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cne_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_clt_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_clt_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cult_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cult_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cle_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cle_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_cule_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_cule_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_saf_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_saf_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sor_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sor_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sun_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sun_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sune_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sune_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sueq_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sueq_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_seq_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_seq_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sne_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sne_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_slt_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_slt_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sult_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sult_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sle_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sle_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcmp_sule_s, LARCH_V4SI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcmp_sule_d, LARCH_V2DI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfadd_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfadd_d, LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfsub_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfsub_d, 
LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfmul_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfmul_d, LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfdiv_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfdiv_d, LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfcvt_h_s, LARCH_V8HI_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfcvt_s_d, LARCH_V4SF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfmin_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfmin_d, LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfmina_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfmina_d, LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfmax_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfmax_d, LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfmaxa_s, LARCH_V4SF_FTYPE_V4SF_V4SF), + LSX_BUILTIN (vfmaxa_d, LARCH_V2DF_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vfclass_s, LARCH_V4SI_FTYPE_V4SF), + LSX_BUILTIN (vfclass_d, LARCH_V2DI_FTYPE_V2DF), + LSX_BUILTIN (vfsqrt_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfsqrt_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrecip_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrecip_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrint_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrint_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrsqrt_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrsqrt_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vflogb_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vflogb_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfcvth_s_h, LARCH_V4SF_FTYPE_V8HI), + LSX_BUILTIN (vfcvth_d_s, LARCH_V2DF_FTYPE_V4SF), + LSX_BUILTIN (vfcvtl_s_h, LARCH_V4SF_FTYPE_V8HI), + LSX_BUILTIN (vfcvtl_d_s, LARCH_V2DF_FTYPE_V4SF), + LSX_BUILTIN (vftint_w_s, LARCH_V4SI_FTYPE_V4SF), + LSX_BUILTIN (vftint_l_d, LARCH_V2DI_FTYPE_V2DF), + LSX_BUILTIN (vftint_wu_s, LARCH_UV4SI_FTYPE_V4SF), + LSX_BUILTIN (vftint_lu_d, LARCH_UV2DI_FTYPE_V2DF), + LSX_BUILTIN (vftintrz_w_s, LARCH_V4SI_FTYPE_V4SF), + LSX_BUILTIN (vftintrz_l_d, LARCH_V2DI_FTYPE_V2DF), + LSX_BUILTIN (vftintrz_wu_s, LARCH_UV4SI_FTYPE_V4SF), + LSX_BUILTIN (vftintrz_lu_d, LARCH_UV2DI_FTYPE_V2DF), + LSX_BUILTIN (vffint_s_w, LARCH_V4SF_FTYPE_V4SI), + LSX_BUILTIN (vffint_d_l, LARCH_V2DF_FTYPE_V2DI), + LSX_BUILTIN (vffint_s_wu, LARCH_V4SF_FTYPE_UV4SI), + LSX_BUILTIN (vffint_d_lu, LARCH_V2DF_FTYPE_UV2DI), + + LSX_BUILTIN (vandn_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vneg_b, LARCH_V16QI_FTYPE_V16QI), + LSX_BUILTIN (vneg_h, LARCH_V8HI_FTYPE_V8HI), + LSX_BUILTIN (vneg_w, LARCH_V4SI_FTYPE_V4SI), + LSX_BUILTIN (vneg_d, LARCH_V2DI_FTYPE_V2DI), + LSX_BUILTIN (vmuh_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vmuh_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vmuh_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vmuh_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vmuh_bu, LARCH_UV16QI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vmuh_hu, LARCH_UV8HI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vmuh_wu, LARCH_UV4SI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vmuh_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vsllwil_h_b, LARCH_V8HI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vsllwil_w_h, LARCH_V4SI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vsllwil_d_w, LARCH_V2DI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vsllwil_hu_bu, LARCH_UV8HI_FTYPE_UV16QI_UQI), + LSX_BUILTIN (vsllwil_wu_hu, LARCH_UV4SI_FTYPE_UV8HI_UQI), + LSX_BUILTIN (vsllwil_du_wu, LARCH_UV2DI_FTYPE_UV4SI_UQI), + LSX_BUILTIN (vsran_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsran_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsran_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssran_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vssran_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vssran_w_d, 
LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssran_bu_h, LARCH_UV16QI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vssran_hu_w, LARCH_UV8HI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vssran_wu_d, LARCH_UV4SI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vsrarn_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsrarn_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsrarn_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssrarn_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vssrarn_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vssrarn_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssrarn_bu_h, LARCH_UV16QI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vssrarn_hu_w, LARCH_UV8HI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vssrarn_wu_d, LARCH_UV4SI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vsrln_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsrln_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsrln_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssrln_bu_h, LARCH_UV16QI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vssrln_hu_w, LARCH_UV8HI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vssrln_wu_d, LARCH_UV4SI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vsrlrn_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsrlrn_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsrlrn_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssrlrn_bu_h, LARCH_UV16QI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vssrlrn_hu_w, LARCH_UV8HI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vssrlrn_wu_d, LARCH_UV4SI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vfrstpi_b, LARCH_V16QI_FTYPE_V16QI_V16QI_UQI), + LSX_BUILTIN (vfrstpi_h, LARCH_V8HI_FTYPE_V8HI_V8HI_UQI), + LSX_BUILTIN (vfrstp_b, LARCH_V16QI_FTYPE_V16QI_V16QI_V16QI), + LSX_BUILTIN (vfrstp_h, LARCH_V8HI_FTYPE_V8HI_V8HI_V8HI), + LSX_BUILTIN (vshuf4i_d, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vbsrl_v, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vbsll_v, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vextrins_b, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vextrins_h, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vextrins_w, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vextrins_d, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vmskltz_b, LARCH_V16QI_FTYPE_V16QI), + LSX_BUILTIN (vmskltz_h, LARCH_V8HI_FTYPE_V8HI), + LSX_BUILTIN (vmskltz_w, LARCH_V4SI_FTYPE_V4SI), + LSX_BUILTIN (vmskltz_d, LARCH_V2DI_FTYPE_V2DI), + LSX_BUILTIN (vsigncov_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsigncov_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsigncov_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsigncov_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vfmadd_s, LARCH_V4SF_FTYPE_V4SF_V4SF_V4SF), + LSX_BUILTIN (vfmadd_d, LARCH_V2DF_FTYPE_V2DF_V2DF_V2DF), + LSX_BUILTIN (vfmsub_s, LARCH_V4SF_FTYPE_V4SF_V4SF_V4SF), + LSX_BUILTIN (vfmsub_d, LARCH_V2DF_FTYPE_V2DF_V2DF_V2DF), + LSX_BUILTIN (vfnmadd_s, LARCH_V4SF_FTYPE_V4SF_V4SF_V4SF), + LSX_BUILTIN (vfnmadd_d, LARCH_V2DF_FTYPE_V2DF_V2DF_V2DF), + LSX_BUILTIN (vfnmsub_s, LARCH_V4SF_FTYPE_V4SF_V4SF_V4SF), + LSX_BUILTIN (vfnmsub_d, LARCH_V2DF_FTYPE_V2DF_V2DF_V2DF), + LSX_BUILTIN (vftintrne_w_s, LARCH_V4SI_FTYPE_V4SF), + LSX_BUILTIN (vftintrne_l_d, LARCH_V2DI_FTYPE_V2DF), + LSX_BUILTIN (vftintrp_w_s, LARCH_V4SI_FTYPE_V4SF), + LSX_BUILTIN (vftintrp_l_d, LARCH_V2DI_FTYPE_V2DF), + LSX_BUILTIN (vftintrm_w_s, LARCH_V4SI_FTYPE_V4SF), + LSX_BUILTIN (vftintrm_l_d, LARCH_V2DI_FTYPE_V2DF), + LSX_BUILTIN (vftint_w_d, LARCH_V4SI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vffint_s_l, LARCH_V4SF_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vftintrz_w_d, LARCH_V4SI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vftintrp_w_d, LARCH_V4SI_FTYPE_V2DF_V2DF), + LSX_BUILTIN 
(vftintrm_w_d, LARCH_V4SI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vftintrne_w_d, LARCH_V4SI_FTYPE_V2DF_V2DF), + LSX_BUILTIN (vftintl_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftinth_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vffinth_d_w, LARCH_V2DF_FTYPE_V4SI), + LSX_BUILTIN (vffintl_d_w, LARCH_V2DF_FTYPE_V4SI), + LSX_BUILTIN (vftintrzl_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftintrzh_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftintrpl_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftintrph_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftintrml_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftintrmh_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftintrnel_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vftintrneh_l_s, LARCH_V2DI_FTYPE_V4SF), + LSX_BUILTIN (vfrintrne_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrintrne_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrintrz_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrintrz_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrintrp_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrintrp_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrintrm_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrintrm_d, LARCH_V2DF_FTYPE_V2DF), + LSX_NO_TARGET_BUILTIN (vstelm_b, LARCH_VOID_FTYPE_V16QI_CVPOINTER_SI_UQI), + LSX_NO_TARGET_BUILTIN (vstelm_h, LARCH_VOID_FTYPE_V8HI_CVPOINTER_SI_UQI), + LSX_NO_TARGET_BUILTIN (vstelm_w, LARCH_VOID_FTYPE_V4SI_CVPOINTER_SI_UQI), + LSX_NO_TARGET_BUILTIN (vstelm_d, LARCH_VOID_FTYPE_V2DI_CVPOINTER_SI_UQI), + LSX_BUILTIN (vaddwev_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vaddwev_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vaddwev_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vaddwod_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vaddwod_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vaddwod_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vaddwev_d_wu, LARCH_V2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vaddwev_w_hu, LARCH_V4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vaddwev_h_bu, LARCH_V8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vaddwod_d_wu, LARCH_V2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vaddwod_w_hu, LARCH_V4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vaddwod_h_bu, LARCH_V8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vaddwev_d_wu_w, LARCH_V2DI_FTYPE_UV4SI_V4SI), + LSX_BUILTIN (vaddwev_w_hu_h, LARCH_V4SI_FTYPE_UV8HI_V8HI), + LSX_BUILTIN (vaddwev_h_bu_b, LARCH_V8HI_FTYPE_UV16QI_V16QI), + LSX_BUILTIN (vaddwod_d_wu_w, LARCH_V2DI_FTYPE_UV4SI_V4SI), + LSX_BUILTIN (vaddwod_w_hu_h, LARCH_V4SI_FTYPE_UV8HI_V8HI), + LSX_BUILTIN (vaddwod_h_bu_b, LARCH_V8HI_FTYPE_UV16QI_V16QI), + LSX_BUILTIN (vsubwev_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsubwev_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsubwev_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsubwod_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vsubwod_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vsubwod_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vsubwev_d_wu, LARCH_V2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vsubwev_w_hu, LARCH_V4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vsubwev_h_bu, LARCH_V8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vsubwod_d_wu, LARCH_V2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vsubwod_w_hu, LARCH_V4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vsubwod_h_bu, LARCH_V8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vaddwev_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vaddwod_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vaddwev_q_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vaddwod_q_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vsubwev_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN 
(vsubwod_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsubwev_q_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vsubwod_q_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vaddwev_q_du_d, LARCH_V2DI_FTYPE_UV2DI_V2DI), + LSX_BUILTIN (vaddwod_q_du_d, LARCH_V2DI_FTYPE_UV2DI_V2DI), + + LSX_BUILTIN (vmulwev_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vmulwev_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vmulwev_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vmulwod_d_w, LARCH_V2DI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vmulwod_w_h, LARCH_V4SI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vmulwod_h_b, LARCH_V8HI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vmulwev_d_wu, LARCH_V2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vmulwev_w_hu, LARCH_V4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vmulwev_h_bu, LARCH_V8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vmulwod_d_wu, LARCH_V2DI_FTYPE_UV4SI_UV4SI), + LSX_BUILTIN (vmulwod_w_hu, LARCH_V4SI_FTYPE_UV8HI_UV8HI), + LSX_BUILTIN (vmulwod_h_bu, LARCH_V8HI_FTYPE_UV16QI_UV16QI), + LSX_BUILTIN (vmulwev_d_wu_w, LARCH_V2DI_FTYPE_UV4SI_V4SI), + LSX_BUILTIN (vmulwev_w_hu_h, LARCH_V4SI_FTYPE_UV8HI_V8HI), + LSX_BUILTIN (vmulwev_h_bu_b, LARCH_V8HI_FTYPE_UV16QI_V16QI), + LSX_BUILTIN (vmulwod_d_wu_w, LARCH_V2DI_FTYPE_UV4SI_V4SI), + LSX_BUILTIN (vmulwod_w_hu_h, LARCH_V4SI_FTYPE_UV8HI_V8HI), + LSX_BUILTIN (vmulwod_h_bu_b, LARCH_V8HI_FTYPE_UV16QI_V16QI), + LSX_BUILTIN (vmulwev_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vmulwod_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vmulwev_q_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vmulwod_q_du, LARCH_V2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vmulwev_q_du_d, LARCH_V2DI_FTYPE_UV2DI_V2DI), + LSX_BUILTIN (vmulwod_q_du_d, LARCH_V2DI_FTYPE_UV2DI_V2DI), + LSX_BUILTIN (vhaddw_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vhaddw_qu_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vhsubw_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vhsubw_qu_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI), + LSX_BUILTIN (vmaddwev_d_w, LARCH_V2DI_FTYPE_V2DI_V4SI_V4SI), + LSX_BUILTIN (vmaddwev_w_h, LARCH_V4SI_FTYPE_V4SI_V8HI_V8HI), + LSX_BUILTIN (vmaddwev_h_b, LARCH_V8HI_FTYPE_V8HI_V16QI_V16QI), + LSX_BUILTIN (vmaddwev_d_wu, LARCH_UV2DI_FTYPE_UV2DI_UV4SI_UV4SI), + LSX_BUILTIN (vmaddwev_w_hu, LARCH_UV4SI_FTYPE_UV4SI_UV8HI_UV8HI), + LSX_BUILTIN (vmaddwev_h_bu, LARCH_UV8HI_FTYPE_UV8HI_UV16QI_UV16QI), + LSX_BUILTIN (vmaddwod_d_w, LARCH_V2DI_FTYPE_V2DI_V4SI_V4SI), + LSX_BUILTIN (vmaddwod_w_h, LARCH_V4SI_FTYPE_V4SI_V8HI_V8HI), + LSX_BUILTIN (vmaddwod_h_b, LARCH_V8HI_FTYPE_V8HI_V16QI_V16QI), + LSX_BUILTIN (vmaddwod_d_wu, LARCH_UV2DI_FTYPE_UV2DI_UV4SI_UV4SI), + LSX_BUILTIN (vmaddwod_w_hu, LARCH_UV4SI_FTYPE_UV4SI_UV8HI_UV8HI), + LSX_BUILTIN (vmaddwod_h_bu, LARCH_UV8HI_FTYPE_UV8HI_UV16QI_UV16QI), + LSX_BUILTIN (vmaddwev_d_wu_w, LARCH_V2DI_FTYPE_V2DI_UV4SI_V4SI), + LSX_BUILTIN (vmaddwev_w_hu_h, LARCH_V4SI_FTYPE_V4SI_UV8HI_V8HI), + LSX_BUILTIN (vmaddwev_h_bu_b, LARCH_V8HI_FTYPE_V8HI_UV16QI_V16QI), + LSX_BUILTIN (vmaddwod_d_wu_w, LARCH_V2DI_FTYPE_V2DI_UV4SI_V4SI), + LSX_BUILTIN (vmaddwod_w_hu_h, LARCH_V4SI_FTYPE_V4SI_UV8HI_V8HI), + LSX_BUILTIN (vmaddwod_h_bu_b, LARCH_V8HI_FTYPE_V8HI_UV16QI_V16QI), + LSX_BUILTIN (vmaddwev_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI_V2DI), + LSX_BUILTIN (vmaddwod_q_d, LARCH_V2DI_FTYPE_V2DI_V2DI_V2DI), + LSX_BUILTIN (vmaddwev_q_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI_UV2DI), + LSX_BUILTIN (vmaddwod_q_du, LARCH_UV2DI_FTYPE_UV2DI_UV2DI_UV2DI), + LSX_BUILTIN (vmaddwev_q_du_d, LARCH_V2DI_FTYPE_V2DI_UV2DI_V2DI), + LSX_BUILTIN (vmaddwod_q_du_d, LARCH_V2DI_FTYPE_V2DI_UV2DI_V2DI), + 
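(Editorial note, not part of the patch.)  Each LARCH_*_FTYPE_* code in this table spells the return type followed by the argument types, so LARCH_V2DI_FTYPE_V2DI_UV2DI_V2DI just above describes a builtin returning a signed v2i64 that takes the accumulator first and then an unsigned/signed source pair, matching the _du_d suffix.  A hedged sketch of what that means at the C level; the typedefs and the wrapper name are illustrative assumptions:

  /* Illustrative only: the typedefs are assumptions; the builtin and its
     LARCH_V2DI_FTYPE_V2DI_UV2DI_V2DI prototype come from this patch.  */
  typedef long long v2i64 __attribute__ ((vector_size (16)));
  typedef unsigned long long v2u64 __attribute__ ((vector_size (16)));

  v2i64
  maddwev_example (v2i64 acc, v2u64 a, v2i64 b)
  {
    /* Widening multiply of the even doubleword elements, accumulated
       into ACC (unsigned * signed, per the _du_d suffix).  */
    return __builtin_lsx_vmaddwev_q_du_d (acc, a, b);
  }
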
LSX_BUILTIN (vrotr_b, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vrotr_h, LARCH_V8HI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vrotr_w, LARCH_V4SI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vrotr_d, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vadd_q, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vsub_q, LARCH_V2DI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vldrepl_b, LARCH_V16QI_FTYPE_CVPOINTER_SI), + LSX_BUILTIN (vldrepl_h, LARCH_V8HI_FTYPE_CVPOINTER_SI), + LSX_BUILTIN (vldrepl_w, LARCH_V4SI_FTYPE_CVPOINTER_SI), + LSX_BUILTIN (vldrepl_d, LARCH_V2DI_FTYPE_CVPOINTER_SI), + + LSX_BUILTIN (vmskgez_b, LARCH_V16QI_FTYPE_V16QI), + LSX_BUILTIN (vmsknz_b, LARCH_V16QI_FTYPE_V16QI), + LSX_BUILTIN (vexth_h_b, LARCH_V8HI_FTYPE_V16QI), + LSX_BUILTIN (vexth_w_h, LARCH_V4SI_FTYPE_V8HI), + LSX_BUILTIN (vexth_d_w, LARCH_V2DI_FTYPE_V4SI), + LSX_BUILTIN (vexth_q_d, LARCH_V2DI_FTYPE_V2DI), + LSX_BUILTIN (vexth_hu_bu, LARCH_UV8HI_FTYPE_UV16QI), + LSX_BUILTIN (vexth_wu_hu, LARCH_UV4SI_FTYPE_UV8HI), + LSX_BUILTIN (vexth_du_wu, LARCH_UV2DI_FTYPE_UV4SI), + LSX_BUILTIN (vexth_qu_du, LARCH_UV2DI_FTYPE_UV2DI), + LSX_BUILTIN (vrotri_b, LARCH_V16QI_FTYPE_V16QI_UQI), + LSX_BUILTIN (vrotri_h, LARCH_V8HI_FTYPE_V8HI_UQI), + LSX_BUILTIN (vrotri_w, LARCH_V4SI_FTYPE_V4SI_UQI), + LSX_BUILTIN (vrotri_d, LARCH_V2DI_FTYPE_V2DI_UQI), + LSX_BUILTIN (vextl_q_d, LARCH_V2DI_FTYPE_V2DI), + LSX_BUILTIN (vsrlni_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vsrlni_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vsrlni_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vsrlni_d_q, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vsrlrni_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vsrlrni_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vsrlrni_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vsrlrni_d_q, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vssrlni_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vssrlni_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vssrlni_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vssrlni_d_q, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vssrlni_bu_h, LARCH_UV16QI_FTYPE_UV16QI_V16QI_USI), + LSX_BUILTIN (vssrlni_hu_w, LARCH_UV8HI_FTYPE_UV8HI_V8HI_USI), + LSX_BUILTIN (vssrlni_wu_d, LARCH_UV4SI_FTYPE_UV4SI_V4SI_USI), + LSX_BUILTIN (vssrlni_du_q, LARCH_UV2DI_FTYPE_UV2DI_V2DI_USI), + LSX_BUILTIN (vssrlrni_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vssrlrni_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vssrlrni_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vssrlrni_d_q, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vssrlrni_bu_h, LARCH_UV16QI_FTYPE_UV16QI_V16QI_USI), + LSX_BUILTIN (vssrlrni_hu_w, LARCH_UV8HI_FTYPE_UV8HI_V8HI_USI), + LSX_BUILTIN (vssrlrni_wu_d, LARCH_UV4SI_FTYPE_UV4SI_V4SI_USI), + LSX_BUILTIN (vssrlrni_du_q, LARCH_UV2DI_FTYPE_UV2DI_V2DI_USI), + LSX_BUILTIN (vsrani_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vsrani_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vsrani_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vsrani_d_q, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vsrarni_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vsrarni_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vsrarni_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vsrarni_d_q, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vssrani_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vssrani_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vssrani_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vssrani_d_q, 
LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vssrani_bu_h, LARCH_UV16QI_FTYPE_UV16QI_V16QI_USI), + LSX_BUILTIN (vssrani_hu_w, LARCH_UV8HI_FTYPE_UV8HI_V8HI_USI), + LSX_BUILTIN (vssrani_wu_d, LARCH_UV4SI_FTYPE_UV4SI_V4SI_USI), + LSX_BUILTIN (vssrani_du_q, LARCH_UV2DI_FTYPE_UV2DI_V2DI_USI), + LSX_BUILTIN (vssrarni_b_h, LARCH_V16QI_FTYPE_V16QI_V16QI_USI), + LSX_BUILTIN (vssrarni_h_w, LARCH_V8HI_FTYPE_V8HI_V8HI_USI), + LSX_BUILTIN (vssrarni_w_d, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vssrarni_d_q, LARCH_V2DI_FTYPE_V2DI_V2DI_USI), + LSX_BUILTIN (vssrarni_bu_h, LARCH_UV16QI_FTYPE_UV16QI_V16QI_USI), + LSX_BUILTIN (vssrarni_hu_w, LARCH_UV8HI_FTYPE_UV8HI_V8HI_USI), + LSX_BUILTIN (vssrarni_wu_d, LARCH_UV4SI_FTYPE_UV4SI_V4SI_USI), + LSX_BUILTIN (vssrarni_du_q, LARCH_UV2DI_FTYPE_UV2DI_V2DI_USI), + LSX_BUILTIN (vpermi_w, LARCH_V4SI_FTYPE_V4SI_V4SI_USI), + LSX_BUILTIN (vld, LARCH_V16QI_FTYPE_CVPOINTER_SI), + LSX_NO_TARGET_BUILTIN (vst, LARCH_VOID_FTYPE_V16QI_CVPOINTER_SI), + LSX_BUILTIN (vssrlrn_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vssrlrn_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vssrlrn_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vssrln_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI), + LSX_BUILTIN (vssrln_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI), + LSX_BUILTIN (vssrln_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI), + LSX_BUILTIN (vorn_v, LARCH_V16QI_FTYPE_V16QI_V16QI), + LSX_BUILTIN (vldi, LARCH_V2DI_FTYPE_HI), + LSX_BUILTIN (vshuf_b, LARCH_V16QI_FTYPE_V16QI_V16QI_V16QI), + LSX_BUILTIN (vldx, LARCH_V16QI_FTYPE_CVPOINTER_DI), + LSX_NO_TARGET_BUILTIN (vstx, LARCH_VOID_FTYPE_V16QI_CVPOINTER_DI), + LSX_BUILTIN (vextl_qu_du, LARCH_UV2DI_FTYPE_UV2DI) }; /* Index I is the function declaration for loongarch_builtins[I], or null if @@ -193,11 +1219,46 @@ static GTY (()) tree loongarch_builtin_decls[ARRAY_SIZE (loongarch_builtins)]; using the instruction code or return null if not defined for the target. */ static GTY (()) int loongarch_get_builtin_decl_index[NUM_INSN_CODES]; + +/* MODE is a vector mode whose elements have type TYPE. Return the type + of the vector itself. */ + +static tree +loongarch_builtin_vector_type (tree type, machine_mode mode) +{ + static tree types[2 * (int) MAX_MACHINE_MODE]; + int mode_index; + + mode_index = (int) mode; + + if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type)) + mode_index += MAX_MACHINE_MODE; + + if (types[mode_index] == NULL_TREE) + types[mode_index] = build_vector_type_for_mode (type, mode); + return types[mode_index]; +} + +/* Return a type for 'const volatile void *'. */ + +static tree +loongarch_build_cvpointer_type (void) +{ + static tree cache; + + if (cache == NULL_TREE) + cache = build_pointer_type (build_qualified_type (void_type_node, + TYPE_QUAL_CONST + | TYPE_QUAL_VOLATILE)); + return cache; +} + /* Source-level argument types. */ #define LARCH_ATYPE_VOID void_type_node #define LARCH_ATYPE_INT integer_type_node #define LARCH_ATYPE_POINTER ptr_type_node - +#define LARCH_ATYPE_CVPOINTER loongarch_build_cvpointer_type () +#define LARCH_ATYPE_BOOLEAN boolean_type_node /* Standard mode-based argument types. */ #define LARCH_ATYPE_QI intQI_type_node #define LARCH_ATYPE_UQI unsigned_intQI_type_node @@ -210,6 +1271,72 @@ static GTY (()) int loongarch_get_builtin_decl_index[NUM_INSN_CODES]; #define LARCH_ATYPE_SF float_type_node #define LARCH_ATYPE_DF double_type_node +/* Vector argument types. 
*/ +#define LARCH_ATYPE_V2SF \ + loongarch_builtin_vector_type (float_type_node, V2SFmode) +#define LARCH_ATYPE_V2HI \ + loongarch_builtin_vector_type (intHI_type_node, V2HImode) +#define LARCH_ATYPE_V2SI \ + loongarch_builtin_vector_type (intSI_type_node, V2SImode) +#define LARCH_ATYPE_V4QI \ + loongarch_builtin_vector_type (intQI_type_node, V4QImode) +#define LARCH_ATYPE_V4HI \ + loongarch_builtin_vector_type (intHI_type_node, V4HImode) +#define LARCH_ATYPE_V8QI \ + loongarch_builtin_vector_type (intQI_type_node, V8QImode) + +#define LARCH_ATYPE_V2DI \ + loongarch_builtin_vector_type (long_long_integer_type_node, V2DImode) +#define LARCH_ATYPE_V4SI \ + loongarch_builtin_vector_type (intSI_type_node, V4SImode) +#define LARCH_ATYPE_V8HI \ + loongarch_builtin_vector_type (intHI_type_node, V8HImode) +#define LARCH_ATYPE_V16QI \ + loongarch_builtin_vector_type (intQI_type_node, V16QImode) +#define LARCH_ATYPE_V2DF \ + loongarch_builtin_vector_type (double_type_node, V2DFmode) +#define LARCH_ATYPE_V4SF \ + loongarch_builtin_vector_type (float_type_node, V4SFmode) + +/* LoongArch ASX. */ +#define LARCH_ATYPE_V4DI \ + loongarch_builtin_vector_type (long_long_integer_type_node, V4DImode) +#define LARCH_ATYPE_V8SI \ + loongarch_builtin_vector_type (intSI_type_node, V8SImode) +#define LARCH_ATYPE_V16HI \ + loongarch_builtin_vector_type (intHI_type_node, V16HImode) +#define LARCH_ATYPE_V32QI \ + loongarch_builtin_vector_type (intQI_type_node, V32QImode) +#define LARCH_ATYPE_V4DF \ + loongarch_builtin_vector_type (double_type_node, V4DFmode) +#define LARCH_ATYPE_V8SF \ + loongarch_builtin_vector_type (float_type_node, V8SFmode) + +#define LARCH_ATYPE_UV2DI \ + loongarch_builtin_vector_type (long_long_unsigned_type_node, V2DImode) +#define LARCH_ATYPE_UV4SI \ + loongarch_builtin_vector_type (unsigned_intSI_type_node, V4SImode) +#define LARCH_ATYPE_UV8HI \ + loongarch_builtin_vector_type (unsigned_intHI_type_node, V8HImode) +#define LARCH_ATYPE_UV16QI \ + loongarch_builtin_vector_type (unsigned_intQI_type_node, V16QImode) + +#define LARCH_ATYPE_UV4DI \ + loongarch_builtin_vector_type (long_long_unsigned_type_node, V4DImode) +#define LARCH_ATYPE_UV8SI \ + loongarch_builtin_vector_type (unsigned_intSI_type_node, V8SImode) +#define LARCH_ATYPE_UV16HI \ + loongarch_builtin_vector_type (unsigned_intHI_type_node, V16HImode) +#define LARCH_ATYPE_UV32QI \ + loongarch_builtin_vector_type (unsigned_intQI_type_node, V32QImode) + +#define LARCH_ATYPE_UV2SI \ + loongarch_builtin_vector_type (unsigned_intSI_type_node, V2SImode) +#define LARCH_ATYPE_UV4HI \ + loongarch_builtin_vector_type (unsigned_intHI_type_node, V4HImode) +#define LARCH_ATYPE_UV8QI \ + loongarch_builtin_vector_type (unsigned_intQI_type_node, V8QImode) + /* LARCH_FTYPE_ATYPESN takes N LARCH_FTYPES-like type codes and lists their associated LARCH_ATYPEs. */ #define LARCH_FTYPE_ATYPES1(A, B) LARCH_ATYPE_##A, LARCH_ATYPE_##B @@ -283,6 +1410,92 @@ loongarch_builtin_decl (unsigned int code, bool initialize_p ATTRIBUTE_UNUSED) return loongarch_builtin_decls[code]; } +/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION. 
*/ + +tree +loongarch_builtin_vectorized_function (unsigned int fn, tree type_out, + tree type_in) +{ + machine_mode in_mode, out_mode; + int in_n, out_n; + + if (TREE_CODE (type_out) != VECTOR_TYPE + || TREE_CODE (type_in) != VECTOR_TYPE + || !ISA_HAS_LSX) + return NULL_TREE; + + out_mode = TYPE_MODE (TREE_TYPE (type_out)); + out_n = TYPE_VECTOR_SUBPARTS (type_out); + in_mode = TYPE_MODE (TREE_TYPE (type_in)); + in_n = TYPE_VECTOR_SUBPARTS (type_in); + + /* INSN is the name of the associated instruction pattern, without + the leading CODE_FOR_. */ +#define LARCH_GET_BUILTIN(INSN) \ + loongarch_builtin_decls[loongarch_get_builtin_decl_index[CODE_FOR_##INSN]] + + switch (fn) + { + CASE_CFN_CEIL: + if (out_mode == DFmode && in_mode == DFmode) + { + if (out_n == 2 && in_n == 2) + return LARCH_GET_BUILTIN (lsx_vfrintrp_d); + } + if (out_mode == SFmode && in_mode == SFmode) + { + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lsx_vfrintrp_s); + } + break; + + CASE_CFN_TRUNC: + if (out_mode == DFmode && in_mode == DFmode) + { + if (out_n == 2 && in_n == 2) + return LARCH_GET_BUILTIN (lsx_vfrintrz_d); + } + if (out_mode == SFmode && in_mode == SFmode) + { + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lsx_vfrintrz_s); + } + break; + + CASE_CFN_RINT: + CASE_CFN_ROUND: + if (out_mode == DFmode && in_mode == DFmode) + { + if (out_n == 2 && in_n == 2) + return LARCH_GET_BUILTIN (lsx_vfrint_d); + } + if (out_mode == SFmode && in_mode == SFmode) + { + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lsx_vfrint_s); + } + break; + + CASE_CFN_FLOOR: + if (out_mode == DFmode && in_mode == DFmode) + { + if (out_n == 2 && in_n == 2) + return LARCH_GET_BUILTIN (lsx_vfrintrm_d); + } + if (out_mode == SFmode && in_mode == SFmode) + { + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lsx_vfrintrm_s); + } + break; + + default: + break; + } + + return NULL_TREE; +} + /* Take argument ARGNO from EXP's argument list and convert it into an expand operand. Store the operand in *OP. */ @@ -318,7 +1531,236 @@ static rtx loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, struct expand_operand *ops, bool has_target_p) { - if (!maybe_expand_insn (icode, nops, ops)) + machine_mode imode; + int rangelo = 0, rangehi = 0, error_opno = 0; + + switch (icode) + { + case CODE_FOR_lsx_vaddi_bu: + case CODE_FOR_lsx_vaddi_hu: + case CODE_FOR_lsx_vaddi_wu: + case CODE_FOR_lsx_vaddi_du: + case CODE_FOR_lsx_vslti_bu: + case CODE_FOR_lsx_vslti_hu: + case CODE_FOR_lsx_vslti_wu: + case CODE_FOR_lsx_vslti_du: + case CODE_FOR_lsx_vslei_bu: + case CODE_FOR_lsx_vslei_hu: + case CODE_FOR_lsx_vslei_wu: + case CODE_FOR_lsx_vslei_du: + case CODE_FOR_lsx_vmaxi_bu: + case CODE_FOR_lsx_vmaxi_hu: + case CODE_FOR_lsx_vmaxi_wu: + case CODE_FOR_lsx_vmaxi_du: + case CODE_FOR_lsx_vmini_bu: + case CODE_FOR_lsx_vmini_hu: + case CODE_FOR_lsx_vmini_wu: + case CODE_FOR_lsx_vmini_du: + case CODE_FOR_lsx_vsubi_bu: + case CODE_FOR_lsx_vsubi_hu: + case CODE_FOR_lsx_vsubi_wu: + case CODE_FOR_lsx_vsubi_du: + gcc_assert (has_target_p && nops == 3); + /* We only generate a vector of constants iff the second argument + is an immediate. We also validate the range of the immediate. 
*/ + if (CONST_INT_P (ops[2].value)) + { + rangelo = 0; + rangehi = 31; + if (IN_RANGE (INTVAL (ops[2].value), rangelo, rangehi)) + { + ops[2].mode = ops[0].mode; + ops[2].value = loongarch_gen_const_int_vector (ops[2].mode, + INTVAL (ops[2].value)); + } + else + error_opno = 2; + } + break; + + case CODE_FOR_lsx_vseqi_b: + case CODE_FOR_lsx_vseqi_h: + case CODE_FOR_lsx_vseqi_w: + case CODE_FOR_lsx_vseqi_d: + case CODE_FOR_lsx_vslti_b: + case CODE_FOR_lsx_vslti_h: + case CODE_FOR_lsx_vslti_w: + case CODE_FOR_lsx_vslti_d: + case CODE_FOR_lsx_vslei_b: + case CODE_FOR_lsx_vslei_h: + case CODE_FOR_lsx_vslei_w: + case CODE_FOR_lsx_vslei_d: + case CODE_FOR_lsx_vmaxi_b: + case CODE_FOR_lsx_vmaxi_h: + case CODE_FOR_lsx_vmaxi_w: + case CODE_FOR_lsx_vmaxi_d: + case CODE_FOR_lsx_vmini_b: + case CODE_FOR_lsx_vmini_h: + case CODE_FOR_lsx_vmini_w: + case CODE_FOR_lsx_vmini_d: + gcc_assert (has_target_p && nops == 3); + /* We only generate a vector of constants iff the second argument + is an immediate. We also validate the range of the immediate. */ + if (CONST_INT_P (ops[2].value)) + { + rangelo = -16; + rangehi = 15; + if (IN_RANGE (INTVAL (ops[2].value), rangelo, rangehi)) + { + ops[2].mode = ops[0].mode; + ops[2].value = loongarch_gen_const_int_vector (ops[2].mode, + INTVAL (ops[2].value)); + } + else + error_opno = 2; + } + break; + + case CODE_FOR_lsx_vandi_b: + case CODE_FOR_lsx_vori_b: + case CODE_FOR_lsx_vnori_b: + case CODE_FOR_lsx_vxori_b: + gcc_assert (has_target_p && nops == 3); + if (!CONST_INT_P (ops[2].value)) + break; + ops[2].mode = ops[0].mode; + ops[2].value = loongarch_gen_const_int_vector (ops[2].mode, + INTVAL (ops[2].value)); + break; + + case CODE_FOR_lsx_vbitseli_b: + gcc_assert (has_target_p && nops == 4); + if (!CONST_INT_P (ops[3].value)) + break; + ops[3].mode = ops[0].mode; + ops[3].value = loongarch_gen_const_int_vector (ops[3].mode, + INTVAL (ops[3].value)); + break; + + case CODE_FOR_lsx_vreplgr2vr_b: + case CODE_FOR_lsx_vreplgr2vr_h: + case CODE_FOR_lsx_vreplgr2vr_w: + case CODE_FOR_lsx_vreplgr2vr_d: + /* Map the built-ins to vector fill operations. We need fix up the mode + for the element being inserted. */ + gcc_assert (has_target_p && nops == 2); + imode = GET_MODE_INNER (ops[0].mode); + ops[1].value = lowpart_subreg (imode, ops[1].value, ops[1].mode); + ops[1].mode = imode; + break; + + case CODE_FOR_lsx_vilvh_b: + case CODE_FOR_lsx_vilvh_h: + case CODE_FOR_lsx_vilvh_w: + case CODE_FOR_lsx_vilvh_d: + case CODE_FOR_lsx_vilvl_b: + case CODE_FOR_lsx_vilvl_h: + case CODE_FOR_lsx_vilvl_w: + case CODE_FOR_lsx_vilvl_d: + case CODE_FOR_lsx_vpackev_b: + case CODE_FOR_lsx_vpackev_h: + case CODE_FOR_lsx_vpackev_w: + case CODE_FOR_lsx_vpackod_b: + case CODE_FOR_lsx_vpackod_h: + case CODE_FOR_lsx_vpackod_w: + case CODE_FOR_lsx_vpickev_b: + case CODE_FOR_lsx_vpickev_h: + case CODE_FOR_lsx_vpickev_w: + case CODE_FOR_lsx_vpickod_b: + case CODE_FOR_lsx_vpickod_h: + case CODE_FOR_lsx_vpickod_w: + /* Swap the operands 1 and 2 for interleave operations. Built-ins follow + convention of ISA, which have op1 as higher component and op2 as lower + component. However, the VEC_PERM op in tree and vec_concat in RTL + expects first operand to be lower component, because of which this + swap is needed for builtins. 
*/ + gcc_assert (has_target_p && nops == 3); + std::swap (ops[1], ops[2]); + break; + + case CODE_FOR_lsx_vslli_b: + case CODE_FOR_lsx_vslli_h: + case CODE_FOR_lsx_vslli_w: + case CODE_FOR_lsx_vslli_d: + case CODE_FOR_lsx_vsrai_b: + case CODE_FOR_lsx_vsrai_h: + case CODE_FOR_lsx_vsrai_w: + case CODE_FOR_lsx_vsrai_d: + case CODE_FOR_lsx_vsrli_b: + case CODE_FOR_lsx_vsrli_h: + case CODE_FOR_lsx_vsrli_w: + case CODE_FOR_lsx_vsrli_d: + gcc_assert (has_target_p && nops == 3); + if (CONST_INT_P (ops[2].value)) + { + rangelo = 0; + rangehi = GET_MODE_UNIT_BITSIZE (ops[0].mode) - 1; + if (IN_RANGE (INTVAL (ops[2].value), rangelo, rangehi)) + { + ops[2].mode = ops[0].mode; + ops[2].value = loongarch_gen_const_int_vector (ops[2].mode, + INTVAL (ops[2].value)); + } + else + error_opno = 2; + } + break; + + case CODE_FOR_lsx_vinsgr2vr_b: + case CODE_FOR_lsx_vinsgr2vr_h: + case CODE_FOR_lsx_vinsgr2vr_w: + case CODE_FOR_lsx_vinsgr2vr_d: + /* Map the built-ins to insert operations. We need to swap operands, + fix up the mode for the element being inserted, and generate + a bit mask for vec_merge. */ + gcc_assert (has_target_p && nops == 4); + std::swap (ops[1], ops[2]); + imode = GET_MODE_INNER (ops[0].mode); + ops[1].value = lowpart_subreg (imode, ops[1].value, ops[1].mode); + ops[1].mode = imode; + rangelo = 0; + rangehi = GET_MODE_NUNITS (ops[0].mode) - 1; + if (CONST_INT_P (ops[3].value) + && IN_RANGE (INTVAL (ops[3].value), rangelo, rangehi)) + ops[3].value = GEN_INT (1 << INTVAL (ops[3].value)); + else + error_opno = 2; + break; + + /* Map the built-ins to element insert operations. We need to swap + operands and generate a bit mask. */ + gcc_assert (has_target_p && nops == 4); + std::swap (ops[1], ops[2]); + std::swap (ops[1], ops[3]); + rangelo = 0; + rangehi = GET_MODE_NUNITS (ops[0].mode) - 1; + if (CONST_INT_P (ops[3].value) + && IN_RANGE (INTVAL (ops[3].value), rangelo, rangehi)) + ops[3].value = GEN_INT (1 << INTVAL (ops[3].value)); + else + error_opno = 2; + break; + + case CODE_FOR_lsx_vshuf4i_b: + case CODE_FOR_lsx_vshuf4i_h: + case CODE_FOR_lsx_vshuf4i_w: + case CODE_FOR_lsx_vshuf4i_w_f: + gcc_assert (has_target_p && nops == 3); + ops[2].value = loongarch_gen_const_int_vector_shuffle (ops[0].mode, + INTVAL (ops[2].value)); + break; + + default: + break; + } + + if (error_opno != 0) + { + error ("argument %d to the built-in must be a constant" + " in range %d to %d", error_opno, rangelo, rangehi); + return has_target_p ? gen_reg_rtx (ops[0].mode) : const0_rtx; + } + else if (!maybe_expand_insn (icode, nops, ops)) { error ("invalid argument to built-in function"); return has_target_p ? gen_reg_rtx (ops[0].mode) : const0_rtx; @@ -352,6 +1794,50 @@ loongarch_expand_builtin_direct (enum insn_code icode, rtx target, tree exp, return loongarch_expand_builtin_insn (icode, opno, ops, has_target_p); } +/* Expand an LSX built-in for a compare and branch instruction specified by + ICODE, set a general-purpose register to 1 if the branch was taken, + 0 otherwise. */ + +static rtx +loongarch_expand_builtin_lsx_test_branch (enum insn_code icode, tree exp) +{ + struct expand_operand ops[3]; + rtx_insn *cbranch; + rtx_code_label *true_label, *done_label; + rtx cmp_result; + + true_label = gen_label_rtx (); + done_label = gen_label_rtx (); + + create_input_operand (&ops[0], true_label, TYPE_MODE (TREE_TYPE (exp))); + loongarch_prepare_builtin_arg (&ops[1], exp, 0); + create_fixed_operand (&ops[2], const0_rtx); + + /* Make sure that the operand 1 is a REG. 
*/ + if (GET_CODE (ops[1].value) != REG) + ops[1].value = force_reg (ops[1].mode, ops[1].value); + + if ((cbranch = maybe_gen_insn (icode, 3, ops)) == NULL_RTX) + error ("failed to expand built-in function"); + + cmp_result = gen_reg_rtx (SImode); + + /* First assume that CMP_RESULT is false. */ + loongarch_emit_move (cmp_result, const0_rtx); + + /* Branch to TRUE_LABEL if CBRANCH is taken and DONE_LABEL otherwise. */ + emit_jump_insn (cbranch); + emit_jump_insn (gen_jump (done_label)); + emit_barrier (); + + /* Set CMP_RESULT to true if the branch was taken. */ + emit_label (true_label); + loongarch_emit_move (cmp_result, const1_rtx); + + emit_label (done_label); + return cmp_result; +} + /* Implement TARGET_EXPAND_BUILTIN. */ rtx @@ -372,10 +1858,14 @@ loongarch_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED, switch (d->builtin_type) { case LARCH_BUILTIN_DIRECT: + case LARCH_BUILTIN_LSX: return loongarch_expand_builtin_direct (d->icode, target, exp, true); case LARCH_BUILTIN_DIRECT_NO_TARGET: return loongarch_expand_builtin_direct (d->icode, target, exp, false); + + case LARCH_BUILTIN_LSX_TEST_BRANCH: + return loongarch_expand_builtin_lsx_test_branch (d->icode, exp); } gcc_unreachable (); } diff --git a/gcc/config/loongarch/loongarch-ftypes.def b/gcc/config/loongarch/loongarch-ftypes.def index 06d2e0519f7..1ce9d83ccab 100644 --- a/gcc/config/loongarch/loongarch-ftypes.def +++ b/gcc/config/loongarch/loongarch-ftypes.def @@ -1,7 +1,7 @@ /* Definitions of prototypes for LoongArch built-in functions. Copyright (C) 2021-2023 Free Software Foundation, Inc. Contributed by Loongson Ltd. - Based on MIPS target for GNU ckompiler. + Based on MIPS target for GNU compiler. This file is part of GCC. @@ -32,7 +32,7 @@ along with GCC; see the file COPYING3. If not see INT for integer_type_node POINTER for ptr_type_node - (we don't use PTR because that's a ANSI-compatibillity macro). + (we don't use PTR because that's a ANSI-compatibility macro). Please keep this list lexicographically sorted by the LIST argument. 
*/ @@ -63,3 +63,396 @@ DEF_LARCH_FTYPE (3, (VOID, USI, USI, SI)) DEF_LARCH_FTYPE (3, (VOID, USI, UDI, SI)) DEF_LARCH_FTYPE (3, (USI, USI, USI, USI)) DEF_LARCH_FTYPE (3, (UDI, UDI, UDI, USI)) + +DEF_LARCH_FTYPE (1, (DF, DF)) +DEF_LARCH_FTYPE (2, (DF, DF, DF)) +DEF_LARCH_FTYPE (1, (DF, V2DF)) + +DEF_LARCH_FTYPE (1, (DI, DI)) +DEF_LARCH_FTYPE (1, (DI, SI)) +DEF_LARCH_FTYPE (1, (DI, UQI)) +DEF_LARCH_FTYPE (2, (DI, DI, DI)) +DEF_LARCH_FTYPE (2, (DI, DI, SI)) +DEF_LARCH_FTYPE (3, (DI, DI, SI, SI)) +DEF_LARCH_FTYPE (3, (DI, DI, USI, USI)) +DEF_LARCH_FTYPE (3, (DI, DI, DI, QI)) +DEF_LARCH_FTYPE (3, (DI, DI, V2HI, V2HI)) +DEF_LARCH_FTYPE (3, (DI, DI, V4QI, V4QI)) +DEF_LARCH_FTYPE (2, (DI, POINTER, SI)) +DEF_LARCH_FTYPE (2, (DI, SI, SI)) +DEF_LARCH_FTYPE (2, (DI, USI, USI)) + +DEF_LARCH_FTYPE (2, (DI, V2DI, UQI)) + +DEF_LARCH_FTYPE (2, (INT, DF, DF)) +DEF_LARCH_FTYPE (2, (INT, SF, SF)) + +DEF_LARCH_FTYPE (2, (INT, V2SF, V2SF)) +DEF_LARCH_FTYPE (4, (INT, V2SF, V2SF, V2SF, V2SF)) + +DEF_LARCH_FTYPE (1, (SF, SF)) +DEF_LARCH_FTYPE (2, (SF, SF, SF)) +DEF_LARCH_FTYPE (1, (SF, V2SF)) +DEF_LARCH_FTYPE (1, (SF, V4SF)) + +DEF_LARCH_FTYPE (2, (SI, POINTER, SI)) +DEF_LARCH_FTYPE (1, (SI, SI)) +DEF_LARCH_FTYPE (1, (SI, UDI)) +DEF_LARCH_FTYPE (2, (QI, QI, QI)) +DEF_LARCH_FTYPE (2, (HI, HI, HI)) +DEF_LARCH_FTYPE (3, (SI, SI, SI, SI)) +DEF_LARCH_FTYPE (3, (SI, SI, SI, QI)) +DEF_LARCH_FTYPE (1, (SI, UQI)) +DEF_LARCH_FTYPE (1, (SI, UV16QI)) +DEF_LARCH_FTYPE (1, (SI, UV2DI)) +DEF_LARCH_FTYPE (1, (SI, UV4SI)) +DEF_LARCH_FTYPE (1, (SI, UV8HI)) +DEF_LARCH_FTYPE (2, (SI, V16QI, UQI)) +DEF_LARCH_FTYPE (1, (SI, V2HI)) +DEF_LARCH_FTYPE (2, (SI, V2HI, V2HI)) +DEF_LARCH_FTYPE (1, (SI, V4QI)) +DEF_LARCH_FTYPE (2, (SI, V4QI, V4QI)) +DEF_LARCH_FTYPE (2, (SI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (SI, V8HI, UQI)) +DEF_LARCH_FTYPE (1, (SI, VOID)) + +DEF_LARCH_FTYPE (2, (UDI, UDI, UDI)) +DEF_LARCH_FTYPE (2, (UDI, UV2SI, UV2SI)) +DEF_LARCH_FTYPE (2, (UDI, V2DI, UQI)) + +DEF_LARCH_FTYPE (2, (USI, V16QI, UQI)) +DEF_LARCH_FTYPE (2, (USI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (USI, V8HI, UQI)) +DEF_LARCH_FTYPE (1, (USI, VOID)) + +DEF_LARCH_FTYPE (2, (UV16QI, UV16QI, UQI)) +DEF_LARCH_FTYPE (2, (UV16QI, UV16QI, USI)) +DEF_LARCH_FTYPE (2, (UV16QI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (3, (UV16QI, UV16QI, UV16QI, UQI)) +DEF_LARCH_FTYPE (3, (UV16QI, UV16QI, UV16QI, USI)) +DEF_LARCH_FTYPE (3, (UV16QI, UV16QI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (2, (UV16QI, UV16QI, V16QI)) + +DEF_LARCH_FTYPE (2, (UV2DI, UV2DI, UQI)) +DEF_LARCH_FTYPE (2, (UV2DI, UV2DI, UV2DI)) +DEF_LARCH_FTYPE (3, (UV2DI, UV2DI, UV2DI, UQI)) +DEF_LARCH_FTYPE (3, (UV2DI, UV2DI, UV2DI, UV2DI)) +DEF_LARCH_FTYPE (3, (UV2DI, UV2DI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (2, (UV2DI, UV2DI, V2DI)) +DEF_LARCH_FTYPE (2, (UV2DI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (1, (UV2DI, V2DF)) + +DEF_LARCH_FTYPE (2, (UV2SI, UV2SI, UQI)) +DEF_LARCH_FTYPE (2, (UV2SI, UV2SI, UV2SI)) + +DEF_LARCH_FTYPE (2, (UV4HI, UV4HI, UQI)) +DEF_LARCH_FTYPE (2, (UV4HI, UV4HI, USI)) +DEF_LARCH_FTYPE (2, (UV4HI, UV4HI, UV4HI)) +DEF_LARCH_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI)) +DEF_LARCH_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI)) +DEF_LARCH_FTYPE (1, (UV4HI, UV8QI)) +DEF_LARCH_FTYPE (2, (UV4HI, UV8QI, UV8QI)) + +DEF_LARCH_FTYPE (2, (UV4SI, UV4SI, UQI)) +DEF_LARCH_FTYPE (2, (UV4SI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (3, (UV4SI, UV4SI, UV4SI, UQI)) +DEF_LARCH_FTYPE (3, (UV4SI, UV4SI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (3, (UV4SI, UV4SI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (UV4SI, UV4SI, V4SI)) +DEF_LARCH_FTYPE (2, (UV4SI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (1, 
(UV4SI, V4SF)) + +DEF_LARCH_FTYPE (2, (UV8HI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (2, (UV8HI, UV8HI, UQI)) +DEF_LARCH_FTYPE (3, (UV8HI, UV8HI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (2, (UV8HI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (3, (UV8HI, UV8HI, UV8HI, UQI)) +DEF_LARCH_FTYPE (3, (UV8HI, UV8HI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (UV8HI, UV8HI, V8HI)) + + + +DEF_LARCH_FTYPE (2, (UV8QI, UV4HI, UV4HI)) +DEF_LARCH_FTYPE (1, (UV8QI, UV8QI)) +DEF_LARCH_FTYPE (2, (UV8QI, UV8QI, UV8QI)) + +DEF_LARCH_FTYPE (2, (V16QI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (2, (V16QI, CVPOINTER, DI)) +DEF_LARCH_FTYPE (1, (V16QI, HI)) +DEF_LARCH_FTYPE (1, (V16QI, SI)) +DEF_LARCH_FTYPE (2, (V16QI, UV16QI, UQI)) +DEF_LARCH_FTYPE (2, (V16QI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (1, (V16QI, V16QI)) +DEF_LARCH_FTYPE (2, (V16QI, V16QI, QI)) +DEF_LARCH_FTYPE (2, (V16QI, V16QI, SI)) +DEF_LARCH_FTYPE (2, (V16QI, V16QI, USI)) +DEF_LARCH_FTYPE (2, (V16QI, V16QI, UQI)) +DEF_LARCH_FTYPE (3, (V16QI, V16QI, UQI, SI)) +DEF_LARCH_FTYPE (3, (V16QI, V16QI, UQI, V16QI)) +DEF_LARCH_FTYPE (2, (V16QI, V16QI, V16QI)) +DEF_LARCH_FTYPE (3, (V16QI, V16QI, V16QI, SI)) +DEF_LARCH_FTYPE (3, (V16QI, V16QI, V16QI, UQI)) +DEF_LARCH_FTYPE (4, (V16QI, V16QI, V16QI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V16QI, V16QI, V16QI, USI)) +DEF_LARCH_FTYPE (3, (V16QI, V16QI, V16QI, V16QI)) + + +DEF_LARCH_FTYPE (1, (V2DF, DF)) +DEF_LARCH_FTYPE (1, (V2DF, UV2DI)) +DEF_LARCH_FTYPE (1, (V2DF, V2DF)) +DEF_LARCH_FTYPE (2, (V2DF, V2DF, V2DF)) +DEF_LARCH_FTYPE (3, (V2DF, V2DF, V2DF, V2DF)) +DEF_LARCH_FTYPE (2, (V2DF, V2DF, V2DI)) +DEF_LARCH_FTYPE (1, (V2DF, V2DI)) +DEF_LARCH_FTYPE (1, (V2DF, V4SF)) +DEF_LARCH_FTYPE (1, (V2DF, V4SI)) + +DEF_LARCH_FTYPE (2, (V2DI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (1, (V2DI, DI)) +DEF_LARCH_FTYPE (1, (V2DI, HI)) +DEF_LARCH_FTYPE (2, (V2DI, UV2DI, UQI)) +DEF_LARCH_FTYPE (2, (V2DI, UV2DI, UV2DI)) +DEF_LARCH_FTYPE (2, (V2DI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (1, (V2DI, V2DF)) +DEF_LARCH_FTYPE (2, (V2DI, V2DF, V2DF)) +DEF_LARCH_FTYPE (1, (V2DI, V2DI)) +DEF_LARCH_FTYPE (1, (UV2DI, UV2DI)) +DEF_LARCH_FTYPE (2, (V2DI, V2DI, QI)) +DEF_LARCH_FTYPE (2, (V2DI, V2DI, SI)) +DEF_LARCH_FTYPE (2, (V2DI, V2DI, UQI)) +DEF_LARCH_FTYPE (2, (V2DI, V2DI, USI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, UQI, DI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, UQI, V2DI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (2, (V2DI, V2DI, V2DI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, V2DI, SI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, V2DI, UQI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, V2DI, USI)) +DEF_LARCH_FTYPE (4, (V2DI, V2DI, V2DI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, V2DI, V2DI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, V4SI, V4SI)) +DEF_LARCH_FTYPE (2, (V2DI, V4SI, V4SI)) + +DEF_LARCH_FTYPE (1, (V2HI, SI)) +DEF_LARCH_FTYPE (2, (V2HI, SI, SI)) +DEF_LARCH_FTYPE (3, (V2HI, SI, SI, SI)) +DEF_LARCH_FTYPE (1, (V2HI, V2HI)) +DEF_LARCH_FTYPE (2, (V2HI, V2HI, SI)) +DEF_LARCH_FTYPE (2, (V2HI, V2HI, V2HI)) +DEF_LARCH_FTYPE (1, (V2HI, V4QI)) +DEF_LARCH_FTYPE (2, (V2HI, V4QI, V2HI)) + +DEF_LARCH_FTYPE (2, (V2SF, SF, SF)) +DEF_LARCH_FTYPE (1, (V2SF, V2SF)) +DEF_LARCH_FTYPE (2, (V2SF, V2SF, V2SF)) +DEF_LARCH_FTYPE (3, (V2SF, V2SF, V2SF, INT)) +DEF_LARCH_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF)) + +DEF_LARCH_FTYPE (2, (V2SI, V2SI, UQI)) +DEF_LARCH_FTYPE (2, (V2SI, V2SI, V2SI)) +DEF_LARCH_FTYPE (2, (V2SI, V4HI, V4HI)) + +DEF_LARCH_FTYPE (2, (V4HI, V2SI, V2SI)) +DEF_LARCH_FTYPE (2, (V4HI, V4HI, UQI)) +DEF_LARCH_FTYPE (2, (V4HI, V4HI, USI)) +DEF_LARCH_FTYPE (2, (V4HI, V4HI, V4HI)) +DEF_LARCH_FTYPE (3, (V4HI, V4HI, V4HI, 
UQI)) +DEF_LARCH_FTYPE (3, (V4HI, V4HI, V4HI, USI)) + +DEF_LARCH_FTYPE (1, (V4QI, SI)) +DEF_LARCH_FTYPE (2, (V4QI, V2HI, V2HI)) +DEF_LARCH_FTYPE (1, (V4QI, V4QI)) +DEF_LARCH_FTYPE (2, (V4QI, V4QI, SI)) +DEF_LARCH_FTYPE (2, (V4QI, V4QI, V4QI)) + +DEF_LARCH_FTYPE (1, (V4SF, SF)) +DEF_LARCH_FTYPE (1, (V4SF, UV4SI)) +DEF_LARCH_FTYPE (2, (V4SF, V2DF, V2DF)) +DEF_LARCH_FTYPE (1, (V4SF, V4SF)) +DEF_LARCH_FTYPE (2, (V4SF, V4SF, V4SF)) +DEF_LARCH_FTYPE (3, (V4SF, V4SF, V4SF, V4SF)) +DEF_LARCH_FTYPE (2, (V4SF, V4SF, V4SI)) +DEF_LARCH_FTYPE (1, (V4SF, V4SI)) +DEF_LARCH_FTYPE (1, (V4SF, V8HI)) + +DEF_LARCH_FTYPE (2, (V4SI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (1, (V4SI, HI)) +DEF_LARCH_FTYPE (1, (V4SI, SI)) +DEF_LARCH_FTYPE (2, (V4SI, UV4SI, UQI)) +DEF_LARCH_FTYPE (2, (V4SI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (2, (V4SI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (V4SI, V2DF, V2DF)) +DEF_LARCH_FTYPE (1, (V4SI, V4SF)) +DEF_LARCH_FTYPE (2, (V4SI, V4SF, V4SF)) +DEF_LARCH_FTYPE (1, (V4SI, V4SI)) +DEF_LARCH_FTYPE (2, (V4SI, V4SI, QI)) +DEF_LARCH_FTYPE (2, (V4SI, V4SI, SI)) +DEF_LARCH_FTYPE (2, (V4SI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (V4SI, V4SI, USI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, UQI, SI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, UQI, V4SI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (V4SI, V4SI, V4SI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, V4SI, SI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, V4SI, UQI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, V4SI, USI)) +DEF_LARCH_FTYPE (4, (V4SI, V4SI, V4SI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, V4SI, V4SI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, V8HI, V8HI)) +DEF_LARCH_FTYPE (2, (V4SI, V8HI, V8HI)) + +DEF_LARCH_FTYPE (2, (V8HI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (1, (V8HI, HI)) +DEF_LARCH_FTYPE (1, (V8HI, SI)) +DEF_LARCH_FTYPE (2, (V8HI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (2, (V8HI, UV8HI, UQI)) +DEF_LARCH_FTYPE (2, (V8HI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (V8HI, V16QI, V16QI)) +DEF_LARCH_FTYPE (2, (V8HI, V4SF, V4SF)) +DEF_LARCH_FTYPE (1, (V8HI, V8HI)) +DEF_LARCH_FTYPE (2, (V8HI, V8HI, QI)) +DEF_LARCH_FTYPE (2, (V8HI, V8HI, SI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, SI, UQI)) +DEF_LARCH_FTYPE (2, (V8HI, V8HI, UQI)) +DEF_LARCH_FTYPE (2, (V8HI, V8HI, USI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, UQI, SI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, UQI, V8HI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, V16QI, V16QI)) +DEF_LARCH_FTYPE (2, (V8HI, V8HI, V8HI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, V8HI, SI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, V8HI, UQI)) +DEF_LARCH_FTYPE (4, (V8HI, V8HI, V8HI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, V8HI, USI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, V8HI, V8HI)) + +DEF_LARCH_FTYPE (2, (V8QI, V4HI, V4HI)) +DEF_LARCH_FTYPE (1, (V8QI, V8QI)) +DEF_LARCH_FTYPE (2, (V8QI, V8QI, V8QI)) + +DEF_LARCH_FTYPE (2, (VOID, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (VOID, SI, SI)) +DEF_LARCH_FTYPE (2, (VOID, UQI, SI)) +DEF_LARCH_FTYPE (2, (VOID, USI, UQI)) +DEF_LARCH_FTYPE (1, (VOID, UHI)) +DEF_LARCH_FTYPE (3, (VOID, V16QI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V16QI, CVPOINTER, DI)) +DEF_LARCH_FTYPE (3, (VOID, V2DF, POINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V2DI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (2, (VOID, V2HI, V2HI)) +DEF_LARCH_FTYPE (2, (VOID, V4QI, V4QI)) +DEF_LARCH_FTYPE (3, (VOID, V4SF, POINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V4SI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V8HI, CVPOINTER, SI)) + +DEF_LARCH_FTYPE (1, (V8HI, V16QI)) +DEF_LARCH_FTYPE (1, (V4SI, V16QI)) +DEF_LARCH_FTYPE (1, (V2DI, V16QI)) +DEF_LARCH_FTYPE (1, (V4SI, 
V8HI)) +DEF_LARCH_FTYPE (1, (V2DI, V8HI)) +DEF_LARCH_FTYPE (1, (V2DI, V4SI)) +DEF_LARCH_FTYPE (1, (UV8HI, V16QI)) +DEF_LARCH_FTYPE (1, (UV4SI, V16QI)) +DEF_LARCH_FTYPE (1, (UV2DI, V16QI)) +DEF_LARCH_FTYPE (1, (UV4SI, V8HI)) +DEF_LARCH_FTYPE (1, (UV2DI, V8HI)) +DEF_LARCH_FTYPE (1, (UV2DI, V4SI)) +DEF_LARCH_FTYPE (1, (UV8HI, UV16QI)) +DEF_LARCH_FTYPE (1, (UV4SI, UV16QI)) +DEF_LARCH_FTYPE (1, (UV2DI, UV16QI)) +DEF_LARCH_FTYPE (1, (UV4SI, UV8HI)) +DEF_LARCH_FTYPE (1, (UV2DI, UV8HI)) +DEF_LARCH_FTYPE (1, (UV2DI, UV4SI)) +DEF_LARCH_FTYPE (2, (UV8HI, V16QI, V16QI)) +DEF_LARCH_FTYPE (2, (UV4SI, V8HI, V8HI)) +DEF_LARCH_FTYPE (2, (UV2DI, V4SI, V4SI)) +DEF_LARCH_FTYPE (2, (V8HI, V16QI, UQI)) +DEF_LARCH_FTYPE (2, (V4SI, V8HI, UQI)) +DEF_LARCH_FTYPE (2, (V2DI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (UV8HI, UV16QI, UQI)) +DEF_LARCH_FTYPE (2, (UV4SI, UV8HI, UQI)) +DEF_LARCH_FTYPE (2, (UV2DI, UV4SI, UQI)) +DEF_LARCH_FTYPE (2, (V16QI, V8HI, V8HI)) +DEF_LARCH_FTYPE (2, (V8HI, V4SI, V4SI)) +DEF_LARCH_FTYPE (2, (V4SI, V2DI, V2DI)) +DEF_LARCH_FTYPE (2, (UV16QI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (UV8HI, UV4SI, UV4SI)) +DEF_LARCH_FTYPE (2, (UV4SI, UV2DI, UV2DI)) +DEF_LARCH_FTYPE (2, (V16QI, V8HI, UQI)) +DEF_LARCH_FTYPE (2, (V8HI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (V4SI, V2DI, UQI)) +DEF_LARCH_FTYPE (2, (UV16QI, UV8HI, UQI)) +DEF_LARCH_FTYPE (2, (UV8HI, UV4SI, UQI)) +DEF_LARCH_FTYPE (2, (UV4SI, UV2DI, UQI)) +DEF_LARCH_FTYPE (2, (V16QI, V16QI, DI)) +DEF_LARCH_FTYPE (2, (V16QI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V16QI, V16QI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, UQI, UQI)) +DEF_LARCH_FTYPE (2, (V4SF, V2DI, V2DI)) +DEF_LARCH_FTYPE (1, (V2DI, V4SF)) +DEF_LARCH_FTYPE (2, (V2DI, UQI, USI)) +DEF_LARCH_FTYPE (2, (V2DI, UQI, UQI)) +DEF_LARCH_FTYPE (4, (VOID, SI, UQI, V16QI, CVPOINTER)) +DEF_LARCH_FTYPE (4, (VOID, SI, UQI, V8HI, CVPOINTER)) +DEF_LARCH_FTYPE (4, (VOID, SI, UQI, V4SI, CVPOINTER)) +DEF_LARCH_FTYPE (4, (VOID, SI, UQI, V2DI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V16QI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V8HI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V4SI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V2DI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V8HI, UV16QI, V16QI)) +DEF_LARCH_FTYPE (2, (V16QI, V16QI, UV16QI)) +DEF_LARCH_FTYPE (2, (UV16QI, V16QI, UV16QI)) +DEF_LARCH_FTYPE (2, (V8HI, V8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (UV8HI, V8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (V4SI, V4SI, UV4SI)) +DEF_LARCH_FTYPE (2, (UV4SI, V4SI, UV4SI)) +DEF_LARCH_FTYPE (2, (V4SI, V16QI, V16QI)) +DEF_LARCH_FTYPE (2, (V4SI, UV16QI, V16QI)) +DEF_LARCH_FTYPE (2, (UV4SI, UV16QI, UV16QI)) +DEF_LARCH_FTYPE (2, (V2DI, V2DI, UV2DI)) +DEF_LARCH_FTYPE (2, (UV2DI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (2, (V4SI, UV8HI, V8HI)) +DEF_LARCH_FTYPE (2, (V2DI, UV4SI, V4SI)) +DEF_LARCH_FTYPE (2, (V2DI, UV2DI, V2DI)) +DEF_LARCH_FTYPE (2, (V2DI, V8HI, V8HI)) +DEF_LARCH_FTYPE (2, (V2DI, UV8HI, V8HI)) +DEF_LARCH_FTYPE (2, (UV2DI, V2DI, UV2DI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, UV8HI, V8HI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, UV2DI, V2DI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, UV4SI, V4SI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, V8HI, V8HI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, UV8HI, V8HI)) +DEF_LARCH_FTYPE (3, (UV2DI, UV2DI, UV8HI, UV8HI)) +DEF_LARCH_FTYPE (3, (V8HI, V8HI, UV16QI, V16QI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, V16QI, V16QI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, UV16QI, V16QI)) +DEF_LARCH_FTYPE (3, (UV4SI, UV4SI, UV16QI, UV16QI)) + +DEF_LARCH_FTYPE(4,(VOID,V16QI,CVPOINTER,SI,UQI)) 
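[Illustration only, not part of the diff: each DEF_LARCH_FTYPE (N, (RET, ARG1, ..., ARGN)) entry in this file declares the prototype of one built-in function, where N is the argument count and the remaining codes name the return and argument types using the abbreviations listed in the header comment of this file. As a minimal sketch, the entry DEF_LARCH_FTYPE (2, (V4SI, V4SI, V4SI)) is the shape used by e.g. __builtin_lsx_vadd_w, which (assuming LSX is enabled for the translation unit) can be called as:

    /* 4 x int32 vector type, matching the V4SI code above.  */
    typedef int v4i32 __attribute__ ((vector_size (16), aligned (16)));

    v4i32
    add_w (v4i32 a, v4i32 b)
    {
      /* V4SI result from two V4SI operands -- (2, (V4SI, V4SI, V4SI)).  */
      return __builtin_lsx_vadd_w (a, b);
    }
]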
+DEF_LARCH_FTYPE(4,(VOID,V8HI,CVPOINTER,SI,UQI)) +DEF_LARCH_FTYPE(4,(VOID,V4SI,CVPOINTER,SI,UQI)) +DEF_LARCH_FTYPE(4,(VOID,V2DI,CVPOINTER,SI,UQI)) + +DEF_LARCH_FTYPE (2, (DI, V16QI, UQI)) +DEF_LARCH_FTYPE (2, (DI, V8HI, UQI)) +DEF_LARCH_FTYPE (2, (DI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (UDI, V16QI, UQI)) +DEF_LARCH_FTYPE (2, (UDI, V8HI, UQI)) +DEF_LARCH_FTYPE (2, (UDI, V4SI, UQI)) + +DEF_LARCH_FTYPE (3, (UV16QI, UV16QI, V16QI, USI)) +DEF_LARCH_FTYPE (3, (UV8HI, UV8HI, V8HI, USI)) +DEF_LARCH_FTYPE (3, (UV4SI, UV4SI, V4SI, USI)) +DEF_LARCH_FTYPE (3, (UV2DI, UV2DI, V2DI, USI)) + +DEF_LARCH_FTYPE (1, (BOOLEAN,V16QI)) +DEF_LARCH_FTYPE(2,(V16QI,CVPOINTER,CVPOINTER)) +DEF_LARCH_FTYPE(3,(VOID,V16QI,CVPOINTER,CVPOINTER)) + +DEF_LARCH_FTYPE (3, (V16QI, V16QI, SI, UQI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, SI, UQI)) +DEF_LARCH_FTYPE (3, (V2DI, V2DI, DI, UQI)) +DEF_LARCH_FTYPE (3, (V4SI, V4SI, SI, UQI)) diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h new file mode 100644 index 00000000000..ec42069904d --- /dev/null +++ b/gcc/config/loongarch/lsxintrin.h @@ -0,0 +1,5181 @@ +/* LARCH Loongson SX intrinsics include file. + + Copyright (C) 2018 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . 
*/ + +#ifndef _GCC_LOONGSON_SXINTRIN_H +#define _GCC_LOONGSON_SXINTRIN_H 1 + +#if defined(__loongarch_sx) +typedef signed char v16i8 __attribute__ ((vector_size(16), aligned(16))); +typedef signed char v16i8_b __attribute__ ((vector_size(16), aligned(1))); +typedef unsigned char v16u8 __attribute__ ((vector_size(16), aligned(16))); +typedef unsigned char v16u8_b __attribute__ ((vector_size(16), aligned(1))); +typedef short v8i16 __attribute__ ((vector_size(16), aligned(16))); +typedef short v8i16_h __attribute__ ((vector_size(16), aligned(2))); +typedef unsigned short v8u16 __attribute__ ((vector_size(16), aligned(16))); +typedef unsigned short v8u16_h __attribute__ ((vector_size(16), aligned(2))); +typedef int v4i32 __attribute__ ((vector_size(16), aligned(16))); +typedef int v4i32_w __attribute__ ((vector_size(16), aligned(4))); +typedef unsigned int v4u32 __attribute__ ((vector_size(16), aligned(16))); +typedef unsigned int v4u32_w __attribute__ ((vector_size(16), aligned(4))); +typedef long long v2i64 __attribute__ ((vector_size(16), aligned(16))); +typedef long long v2i64_d __attribute__ ((vector_size(16), aligned(8))); +typedef unsigned long long v2u64 __attribute__ ((vector_size(16), aligned(16))); +typedef unsigned long long v2u64_d __attribute__ ((vector_size(16), aligned(8))); +typedef float v4f32 __attribute__ ((vector_size(16), aligned(16))); +typedef float v4f32_w __attribute__ ((vector_size(16), aligned(4))); +typedef double v2f64 __attribute__ ((vector_size(16), aligned(16))); +typedef double v2f64_d __attribute__ ((vector_size(16), aligned(8))); + +typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__)); +typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__)); +typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__)); + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsll_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsll_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsll_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsll_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsll_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsll_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsll_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsll_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vslli_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vslli_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. 
*/ +#define __lsx_vslli_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vslli_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vslli_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslli_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vslli_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vslli_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsra_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsra_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsra_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsra_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsra_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsra_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsra_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsra_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vsrai_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsrai_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vsrai_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsrai_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vsrai_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsrai_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vsrai_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vsrai_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrar_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrar_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrar_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrar_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrar_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrar_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrar_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrar_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vsrari_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsrari_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vsrari_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsrari_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vsrari_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsrari_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vsrari_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vsrari_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrl_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrl_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrl_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrl_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrl_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrl_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrl_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrl_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vsrli_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsrli_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vsrli_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsrli_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vsrli_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsrli_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. 
*/ +#define __lsx_vsrli_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vsrli_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrlr_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrlr_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrlr_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrlr_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrlr_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrlr_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrlr_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrlr_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vsrlri_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsrlri_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vsrlri_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsrlri_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vsrlri_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsrlri_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vsrlri_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vsrlri_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitclr_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitclr_b ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitclr_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitclr_h ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitclr_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitclr_w ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitclr_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitclr_d ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vbitclri_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vbitclri_b ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV8HI, UV8HI, UQI. */ +#define __lsx_vbitclri_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vbitclri_h ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV4SI, UV4SI, UQI. */ +#define __lsx_vbitclri_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vbitclri_w ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV2DI, UV2DI, UQI. */ +#define __lsx_vbitclri_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vbitclri_d ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitset_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitset_b ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitset_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitset_h ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitset_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitset_w ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitset_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitset_d ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vbitseti_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vbitseti_b ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV8HI, UV8HI, UQI. */ +#define __lsx_vbitseti_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vbitseti_h ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV4SI, UV4SI, UQI. */ +#define __lsx_vbitseti_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vbitseti_w ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV2DI, UV2DI, UQI. */ +#define __lsx_vbitseti_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vbitseti_d ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitrev_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitrev_b ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitrev_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitrev_h ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitrev_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitrev_w ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitrev_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vbitrev_d ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vbitrevi_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vbitrevi_b ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV8HI, UV8HI, UQI. */ +#define __lsx_vbitrevi_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vbitrevi_h ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV4SI, UV4SI, UQI. */ +#define __lsx_vbitrevi_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vbitrevi_w ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV2DI, UV2DI, UQI. */ +#define __lsx_vbitrevi_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vbitrevi_d ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadd_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadd_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadd_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadd_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadd_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadd_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadd_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadd_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. 
*/ +#define __lsx_vaddi_bu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vaddi_bu ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vaddi_hu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vaddi_hu ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vaddi_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vaddi_wu ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vaddi_du(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vaddi_du ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsub_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsub_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsub_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsub_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsub_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsub_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsub_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsub_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vsubi_bu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsubi_bu ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vsubi_hu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsubi_hu ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vsubi_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsubi_wu ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vsubi_du(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsubi_du ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V16QI, V16QI, QI. */ +#define __lsx_vmaxi_b(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V8HI, V8HI, QI. */ +#define __lsx_vmaxi_h(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V4SI, V4SI, QI. */ +#define __lsx_vmaxi_w(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V2DI, V2DI, QI. */ +#define __lsx_vmaxi_d(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmax_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmax_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vmaxi_bu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_bu ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV8HI, UV8HI, UQI. */ +#define __lsx_vmaxi_hu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_hu ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV4SI, UV4SI, UQI. 
*/ +#define __lsx_vmaxi_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_wu ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV2DI, UV2DI, UQI. */ +#define __lsx_vmaxi_du(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmaxi_du ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V16QI, V16QI, QI. */ +#define __lsx_vmini_b(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V8HI, V8HI, QI. */ +#define __lsx_vmini_h(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V4SI, V4SI, QI. */ +#define __lsx_vmini_w(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V2DI, V2DI, QI. */ +#define __lsx_vmini_d(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmin_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmin_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vmini_bu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_bu ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV8HI, UV8HI, UQI. */ +#define __lsx_vmini_hu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_hu ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV4SI, UV4SI, UQI. */ +#define __lsx_vmini_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_wu ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV2DI, UV2DI, UQI. */ +#define __lsx_vmini_du(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vmini_du ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vseq_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vseq_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vseq_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vseq_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vseq_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vseq_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vseq_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vseq_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V16QI, V16QI, QI. */ +#define __lsx_vseqi_b(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vseqi_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V8HI, V8HI, QI. */ +#define __lsx_vseqi_h(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vseqi_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V4SI, V4SI, QI. */ +#define __lsx_vseqi_w(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vseqi_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V2DI, V2DI, QI. */ +#define __lsx_vseqi_d(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vseqi_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V16QI, V16QI, QI. */ +#define __lsx_vslti_b(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. 
*/ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V8HI, V8HI, QI. */ +#define __lsx_vslti_h(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V4SI, V4SI, QI. */ +#define __lsx_vslti_w(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V2DI, V2DI, QI. */ +#define __lsx_vslti_d(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vslt_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vslt_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V16QI, UV16QI, UQI. */ +#define __lsx_vslti_bu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_bu ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, UV8HI, UQI. 
*/ +#define __lsx_vslti_hu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_hu ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, UV4SI, UQI. */ +#define __lsx_vslti_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_wu ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V2DI, UV2DI, UQI. */ +#define __lsx_vslti_du(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslti_du ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V16QI, V16QI, QI. */ +#define __lsx_vslei_b(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V8HI, V8HI, QI. */ +#define __lsx_vslei_h(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V4SI, V4SI, QI. */ +#define __lsx_vslei_w(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, si5. */ +/* Data types in instruction templates: V2DI, V2DI, QI. */ +#define __lsx_vslei_d(/*__m128i*/ _1, /*si5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV4SI, UV4SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsle_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsle_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V16QI, UV16QI, UQI. */ +#define __lsx_vslei_bu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_bu ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, UV8HI, UQI. */ +#define __lsx_vslei_hu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_hu ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, UV4SI, UQI. */ +#define __lsx_vslei_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_wu ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V2DI, UV2DI, UQI. */ +#define __lsx_vslei_du(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vslei_du ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vsat_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsat_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vsat_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsat_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vsat_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsat_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vsat_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vsat_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vsat_bu(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsat_bu ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV8HI, UV8HI, UQI. */ +#define __lsx_vsat_hu(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsat_hu ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV4SI, UV4SI, UQI. */ +#define __lsx_vsat_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsat_wu ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV2DI, UV2DI, UQI. */ +#define __lsx_vsat_du(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vsat_du ((v2u64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadda_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadda_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. 
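A similar sketch for the unsigned compare wrappers (again assuming an <lsxintrin.h> include; __lsx_vreplgr2vr_b and __lsx_vand_v appear later in this same header): flag bytes whose unsigned value falls in a small closed range.

  #include <lsxintrin.h>   /* assumed header name */

  /* All-ones bytes where 1 <= v <= 9 (unsigned), zero elsewhere.  */
  __m128i in_range_1_9 (__m128i v)
  {
    __m128i ge1 = __lsx_vslt_bu (__lsx_vreplgr2vr_b (0), v);  /* 0 < v  */
    __m128i le9 = __lsx_vslei_bu (v, 9);                      /* v <= 9 */
    return __lsx_vand_v (ge1, le9);                           /* both   */
  }
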
*/ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadda_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadda_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadda_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadda_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadda_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadda_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsadd_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsadd_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavg_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavg_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vavgr_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vavgr_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssub_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssub_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vabsd_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vabsd_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmul_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmul_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. 
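The saturating, average and absolute-difference wrappers above lend themselves to byte-pixel arithmetic; a small sketch with the same assumed <lsxintrin.h> include:

  #include <lsxintrin.h>   /* assumed header name */

  /* Rounding average and absolute difference of unsigned bytes,
     both free of overflow by construction.  */
  void avg_and_diff (__m128i a, __m128i b, __m128i *avg, __m128i *diff)
  {
    *avg  = __lsx_vavgr_bu (a, b);   /* (a + b + 1) >> 1 per byte */
    *diff = __lsx_vabsd_bu (a, b);   /* |a - b| per byte          */
  }
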
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmul_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmul_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmul_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmul_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmul_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmul_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmadd_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmadd_b ((v16i8)_1, (v16i8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmadd_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmadd_h ((v8i16)_1, (v8i16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmadd_w (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmadd_w ((v4i32)_1, (v4i32)_2, (v4i32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmadd_d (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmadd_d ((v2i64)_1, (v2i64)_2, (v2i64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmsub_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmsub_b ((v16i8)_1, (v16i8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmsub_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmsub_h ((v8i16)_1, (v8i16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmsub_w (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmsub_w ((v4i32)_1, (v4i32)_2, (v4i32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, V2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmsub_d (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmsub_d ((v2i64)_1, (v2i64)_2, (v2i64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vdiv_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vdiv_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. 
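For the multiply-accumulate wrappers, the "vd, vj, vk" format quoted above suggests the first argument is the accumulator; a minimal sketch under that reading (assumed <lsxintrin.h> include):

  #include <lsxintrin.h>   /* assumed header name */

  /* acc + a * b per 32-bit lane; the first operand is taken to map to vd,
     which is both source and destination of vmadd.w (an assumption drawn
     from the format comment, not stated in this hunk).  */
  __m128i mac_w (__m128i acc, __m128i a, __m128i b)
  {
    return __lsx_vmadd_w (acc, a, b);
  }
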
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_hu_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_hu_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_wu_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_wu_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_du_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_du_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_hu_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_hu_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_wu_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_wu_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_du_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_du_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. 
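The widening horizontal adds can produce pairwise sums by passing the same vector twice; a sketch, assuming vhaddw.w.h adds each odd 16-bit lane of its first operand to the even lane of its second (assumed <lsxintrin.h> include):

  #include <lsxintrin.h>   /* assumed header name */

  /* Four 32-bit sums of adjacent 16-bit lanes (pairing as assumed above).  */
  __m128i pairwise_add_h (__m128i v)
  {
    return __lsx_vhaddw_w_h (v, v);
  }
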
*/ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmod_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmod_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, rk. */ +/* Data types in instruction templates: V16QI, V16QI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplve_b (__m128i _1, int _2) +{ + return (__m128i)__builtin_lsx_vreplve_b ((v16i8)_1, (int)_2); +} + +/* Assembly instruction format: vd, vj, rk. */ +/* Data types in instruction templates: V8HI, V8HI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplve_h (__m128i _1, int _2) +{ + return (__m128i)__builtin_lsx_vreplve_h ((v8i16)_1, (int)_2); +} + +/* Assembly instruction format: vd, vj, rk. */ +/* Data types in instruction templates: V4SI, V4SI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplve_w (__m128i _1, int _2) +{ + return (__m128i)__builtin_lsx_vreplve_w ((v4i32)_1, (int)_2); +} + +/* Assembly instruction format: vd, vj, rk. */ +/* Data types in instruction templates: V2DI, V2DI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplve_d (__m128i _1, int _2) +{ + return (__m128i)__builtin_lsx_vreplve_d ((v2i64)_1, (int)_2); +} + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. 
*/ +#define __lsx_vreplvei_b(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vreplvei_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vreplvei_h(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vreplvei_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui2. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vreplvei_w(/*__m128i*/ _1, /*ui2*/ _2) \ + ((__m128i)__builtin_lsx_vreplvei_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui1. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vreplvei_d(/*__m128i*/ _1, /*ui1*/ _2) \ + ((__m128i)__builtin_lsx_vreplvei_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickev_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickev_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickev_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickev_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickev_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickev_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickev_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickev_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickod_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickod_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickod_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickod_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickod_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickod_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpickod_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpickod_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. 
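The pick-even/pick-odd wrappers are the usual de-interleave building block; a sketch (assumed <lsxintrin.h> include) that splits two vectors into their even-indexed and odd-indexed bytes:

  #include <lsxintrin.h>   /* assumed header name */

  /* Gather the even-indexed bytes of the two sources into *even and the
     odd-indexed bytes into *odd (de-interleaving two-channel data).  */
  void split_even_odd (__m128i a, __m128i b, __m128i *even, __m128i *odd)
  {
    *even = __lsx_vpickev_b (a, b);
    *odd  = __lsx_vpickod_b (a, b);
  }
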
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvh_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvh_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvh_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvh_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvh_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvh_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvh_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvh_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvl_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvl_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvl_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvl_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvl_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvl_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vilvl_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vilvl_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackev_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackev_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackev_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackev_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackev_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackev_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackev_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackev_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackod_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackod_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackod_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackod_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackod_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackod_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpackod_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vpackod_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vshuf_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vshuf_h ((v8i16)_1, (v8i16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vshuf_w (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vshuf_w ((v4i32)_1, (v4i32)_2, (v4i32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vshuf_d (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vshuf_d ((v2i64)_1, (v2i64)_2, (v2i64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vand_v (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vand_v ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vandi_b(/*__m128i*/ _1, /*ui8*/ _2) \ + ((__m128i)__builtin_lsx_vandi_b ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vor_v (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vor_v ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. 
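The interleave wrappers go the other way; a short sketch (assumed <lsxintrin.h> include):

  #include <lsxintrin.h>   /* assumed header name */

  /* Interleave the low halves of the two operands byte-by-byte, as is
     commonly done before widening arithmetic.  */
  __m128i zip_lo_b (__m128i a, __m128i b)
  {
    return __lsx_vilvl_b (a, b);
  }
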
*/ +#define __lsx_vori_b(/*__m128i*/ _1, /*ui8*/ _2) \ + ((__m128i)__builtin_lsx_vori_b ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vnor_v (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vnor_v ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vnori_b(/*__m128i*/ _1, /*ui8*/ _2) \ + ((__m128i)__builtin_lsx_vnori_b ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vxor_v (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vxor_v ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: UV16QI, UV16QI, UQI. */ +#define __lsx_vxori_b(/*__m128i*/ _1, /*ui8*/ _2) \ + ((__m128i)__builtin_lsx_vxori_b ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vbitsel_v (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vbitsel_v ((v16u8)_1, (v16u8)_2, (v16u8)_3); +} + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI, USI. */ +#define __lsx_vbitseli_b(/*__m128i*/ _1, /*__m128i*/ _2, /*ui8*/ _3) \ + ((__m128i)__builtin_lsx_vbitseli_b ((v16u8)(_1), (v16u8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V16QI, V16QI, USI. */ +#define __lsx_vshuf4i_b(/*__m128i*/ _1, /*ui8*/ _2) \ + ((__m128i)__builtin_lsx_vshuf4i_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V8HI, V8HI, USI. */ +#define __lsx_vshuf4i_h(/*__m128i*/ _1, /*ui8*/ _2) \ + ((__m128i)__builtin_lsx_vshuf4i_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V4SI, V4SI, USI. */ +#define __lsx_vshuf4i_w(/*__m128i*/ _1, /*ui8*/ _2) \ + ((__m128i)__builtin_lsx_vshuf4i_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, rj. */ +/* Data types in instruction templates: V16QI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplgr2vr_b (int _1) +{ + return (__m128i)__builtin_lsx_vreplgr2vr_b ((int)_1); +} + +/* Assembly instruction format: vd, rj. */ +/* Data types in instruction templates: V8HI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplgr2vr_h (int _1) +{ + return (__m128i)__builtin_lsx_vreplgr2vr_h ((int)_1); +} + +/* Assembly instruction format: vd, rj. */ +/* Data types in instruction templates: V4SI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplgr2vr_w (int _1) +{ + return (__m128i)__builtin_lsx_vreplgr2vr_w ((int)_1); +} + +/* Assembly instruction format: vd, rj. */ +/* Data types in instruction templates: V2DI, DI. 
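Bit-select plus the general-register broadcasts cover the common mask-blend pattern; a sketch (assumed <lsxintrin.h> include; the third __lsx_vbitsel_v argument is read as the va mask, per the "vd, vj, vk, va" format comment above):

  #include <lsxintrin.h>   /* assumed header name */

  /* Take bits from b where mask bits are set and from a elsewhere, then
     broadcast a scalar into every 32-bit lane.  */
  __m128i blend_v (__m128i a, __m128i b, __m128i mask)
  {
    return __lsx_vbitsel_v (a, b, mask);
  }

  __m128i splat_w (int x)
  {
    return __lsx_vreplgr2vr_w (x);
  }
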
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vreplgr2vr_d (long int _1) +{ + return (__m128i)__builtin_lsx_vreplgr2vr_d ((long int)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpcnt_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vpcnt_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpcnt_h (__m128i _1) +{ + return (__m128i)__builtin_lsx_vpcnt_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpcnt_w (__m128i _1) +{ + return (__m128i)__builtin_lsx_vpcnt_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vpcnt_d (__m128i _1) +{ + return (__m128i)__builtin_lsx_vpcnt_d ((v2i64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclo_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclo_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclo_h (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclo_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclo_w (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclo_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclo_d (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclo_d ((v2i64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclz_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclz_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclz_h (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclz_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclz_w (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclz_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vclz_d (__m128i _1) +{ + return (__m128i)__builtin_lsx_vclz_d ((v2i64)_1); +} + +/* Assembly instruction format: rd, vj, ui4. */ +/* Data types in instruction templates: SI, V16QI, UQI. */ +#define __lsx_vpickve2gr_b(/*__m128i*/ _1, /*ui4*/ _2) \ + ((int)__builtin_lsx_vpickve2gr_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: rd, vj, ui3. */ +/* Data types in instruction templates: SI, V8HI, UQI. */ +#define __lsx_vpickve2gr_h(/*__m128i*/ _1, /*ui3*/ _2) \ + ((int)__builtin_lsx_vpickve2gr_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: rd, vj, ui2. */ +/* Data types in instruction templates: SI, V4SI, UQI. */ +#define __lsx_vpickve2gr_w(/*__m128i*/ _1, /*ui2*/ _2) \ + ((int)__builtin_lsx_vpickve2gr_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: rd, vj, ui1. */ +/* Data types in instruction templates: DI, V2DI, UQI. */ +#define __lsx_vpickve2gr_d(/*__m128i*/ _1, /*ui1*/ _2) \ + ((long int)__builtin_lsx_vpickve2gr_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: rd, vj, ui4. */ +/* Data types in instruction templates: USI, V16QI, UQI. */ +#define __lsx_vpickve2gr_bu(/*__m128i*/ _1, /*ui4*/ _2) \ + ((unsigned int)__builtin_lsx_vpickve2gr_bu ((v16i8)(_1), (_2))) + +/* Assembly instruction format: rd, vj, ui3. */ +/* Data types in instruction templates: USI, V8HI, UQI. */ +#define __lsx_vpickve2gr_hu(/*__m128i*/ _1, /*ui3*/ _2) \ + ((unsigned int)__builtin_lsx_vpickve2gr_hu ((v8i16)(_1), (_2))) + +/* Assembly instruction format: rd, vj, ui2. */ +/* Data types in instruction templates: USI, V4SI, UQI. */ +#define __lsx_vpickve2gr_wu(/*__m128i*/ _1, /*ui2*/ _2) \ + ((unsigned int)__builtin_lsx_vpickve2gr_wu ((v4i32)(_1), (_2))) + +/* Assembly instruction format: rd, vj, ui1. */ +/* Data types in instruction templates: UDI, V2DI, UQI. */ +#define __lsx_vpickve2gr_du(/*__m128i*/ _1, /*ui1*/ _2) \ + ((unsigned long int)__builtin_lsx_vpickve2gr_du ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, rj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, SI, UQI. */ +#define __lsx_vinsgr2vr_b(/*__m128i*/ _1, /*int*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vinsgr2vr_b ((v16i8)(_1), (int)(_2), (_3))) + +/* Assembly instruction format: vd, rj, ui3. */ +/* Data types in instruction templates: V8HI, V8HI, SI, UQI. */ +#define __lsx_vinsgr2vr_h(/*__m128i*/ _1, /*int*/ _2, /*ui3*/ _3) \ + ((__m128i)__builtin_lsx_vinsgr2vr_h ((v8i16)(_1), (int)(_2), (_3))) + +/* Assembly instruction format: vd, rj, ui2. */ +/* Data types in instruction templates: V4SI, V4SI, SI, UQI. */ +#define __lsx_vinsgr2vr_w(/*__m128i*/ _1, /*int*/ _2, /*ui2*/ _3) \ + ((__m128i)__builtin_lsx_vinsgr2vr_w ((v4i32)(_1), (int)(_2), (_3))) + +/* Assembly instruction format: vd, rj, ui1. */ +/* Data types in instruction templates: V2DI, V2DI, DI, UQI. */ +#define __lsx_vinsgr2vr_d(/*__m128i*/ _1, /*long int*/ _2, /*ui1*/ _3) \ + ((__m128i)__builtin_lsx_vinsgr2vr_d ((v2i64)(_1), (long int)(_2), (_3))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfadd_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfadd_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. 
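Element extract/insert round-trips through a general register with compile-time lane indices; a sketch (assumed <lsxintrin.h> include):

  #include <lsxintrin.h>   /* assumed header name */

  /* Read lane 0 as a scalar and write it back doubled; the lane index
     must be a literal because it feeds an immediate field.  */
  __m128i double_lane0 (__m128i v)
  {
    int x = __lsx_vpickve2gr_w (v, 0);
    return __lsx_vinsgr2vr_w (v, x * 2, 0);
  }
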
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfadd_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfadd_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfsub_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfsub_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfsub_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfsub_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfmul_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfmul_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfmul_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfmul_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfdiv_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfdiv_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfdiv_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfdiv_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcvt_h_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcvt_h_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfcvt_s_d (__m128d _1, __m128d _2) +{ + return (__m128)__builtin_lsx_vfcvt_s_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfmin_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfmin_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfmin_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfmin_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. 
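A sketch of the single-precision arithmetic wrappers (illustrative only): each intrinsic maps one-to-one onto a lane-wise instruction, so ordinary expressions compose naturally.

  #include <lsxintrin.h>

  /* Lane-wise a * b + c on four floats, without fusion; the fused
     __lsx_vfmadd_s form appears further down in this header.  */
  static inline __m128 mul_add_s (__m128 a, __m128 b, __m128 c)
  {
    return __lsx_vfadd_s (__lsx_vfmul_s (a, b), c);
  }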
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfmina_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfmina_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfmina_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfmina_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfmax_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfmax_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfmax_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfmax_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfmaxa_s (__m128 _1, __m128 _2) +{ + return (__m128)__builtin_lsx_vfmaxa_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfmaxa_d (__m128d _1, __m128d _2) +{ + return (__m128d)__builtin_lsx_vfmaxa_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfclass_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vfclass_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfclass_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vfclass_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfsqrt_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfsqrt_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfsqrt_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfsqrt_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrecip_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrecip_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrecip_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrecip_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. 
*/ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrint_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrint_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrint_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrint_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrsqrt_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrsqrt_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrsqrt_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrsqrt_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vflogb_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vflogb_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vflogb_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vflogb_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfcvth_s_h (__m128i _1) +{ + return (__m128)__builtin_lsx_vfcvth_s_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfcvth_d_s (__m128 _1) +{ + return (__m128d)__builtin_lsx_vfcvth_d_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfcvtl_s_h (__m128i _1) +{ + return (__m128)__builtin_lsx_vfcvtl_s_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfcvtl_d_s (__m128 _1) +{ + return (__m128d)__builtin_lsx_vfcvtl_d_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftint_w_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftint_w_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftint_l_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vftint_l_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV4SI, V4SF. 
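The two conversion families above differ in their result type: vfrint* keeps the rounded values in floating-point format, while vftint* produces integer lanes. A small sketch (illustrative only):

  #include <lsxintrin.h>

  /* Round four floats to integral values, once staying in FP format
     and once converting to 32-bit integers.  */
  static inline __m128i round_then_convert (__m128 x)
  {
    __m128 r = __lsx_vfrint_s (x);     /* still four floats */
    return __lsx_vftint_w_s (r);       /* now four int32 lanes */
  }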
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftint_wu_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftint_wu_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftint_lu_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vftint_lu_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrz_w_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrz_w_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrz_l_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vftintrz_l_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrz_wu_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrz_wu_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrz_lu_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vftintrz_lu_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vffint_s_w (__m128i _1) +{ + return (__m128)__builtin_lsx_vffint_s_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vffint_d_l (__m128i _1) +{ + return (__m128d)__builtin_lsx_vffint_d_l ((v2i64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vffint_s_wu (__m128i _1) +{ + return (__m128)__builtin_lsx_vffint_s_wu ((v4u32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vffint_d_lu (__m128i _1) +{ + return (__m128d)__builtin_lsx_vffint_d_lu ((v2u64)_1); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vandn_v (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vandn_v ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vneg_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vneg_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V8HI, V8HI. 
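A sketch of a float-to-int-to-float round trip using the truncating variant (illustrative only):

  #include <lsxintrin.h>

  /* Truncate four floats toward zero, then convert back; for values
     in int32 range the result is x with its fractional part dropped.  */
  static inline __m128 truncate_s (__m128 x)
  {
    __m128i i = __lsx_vftintrz_w_s (x);   /* round toward zero */
    return __lsx_vffint_s_w (i);          /* back to float lanes */
  }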
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vneg_h (__m128i _1) +{ + return (__m128i)__builtin_lsx_vneg_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vneg_w (__m128i _1) +{ + return (__m128i)__builtin_lsx_vneg_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vneg_d (__m128i _1) +{ + return (__m128i)__builtin_lsx_vneg_d ((v2i64)_1); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmuh_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmuh_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V8HI, V16QI, UQI. */ +#define __lsx_vsllwil_h_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsllwil_h_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. 
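vmuh* returns the high half of the widened per-lane product, which is convenient for fixed-point scaling; a sketch (illustrative only):

  #include <lsxintrin.h>

  /* High 32 bits of each signed 32 x 32 -> 64 bit lane product.  */
  static inline __m128i mul_high_w (__m128i a, __m128i b)
  {
    return __lsx_vmuh_w (a, b);
  }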
*/ +/* Data types in instruction templates: V4SI, V8HI, UQI. */ +#define __lsx_vsllwil_w_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsllwil_w_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V2DI, V4SI, UQI. */ +#define __lsx_vsllwil_d_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsllwil_d_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: UV8HI, UV16QI, UQI. */ +#define __lsx_vsllwil_hu_bu(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vsllwil_hu_bu ((v16u8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV4SI, UV8HI, UQI. */ +#define __lsx_vsllwil_wu_hu(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vsllwil_wu_hu ((v8u16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV2DI, UV4SI, UQI. */ +#define __lsx_vsllwil_du_wu(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vsllwil_du_wu ((v4u32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsran_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsran_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsran_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsran_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsran_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsran_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssran_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssran_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssran_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssran_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssran_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssran_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssran_bu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssran_bu_h ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV4SI, UV4SI. 
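The widening shifts are immediate-form macros, while the narrowing shifts take per-lane counts in a second vector; a sketch (illustrative only):

  #include <lsxintrin.h>

  /* Widen halfword lanes to word lanes with a left shift by a
     constant amount.  */
  static inline __m128i widen_shift (__m128i h)
  {
    return __lsx_vsllwil_w_h (h, 4);      /* ui4 immediate */
  }

  /* Narrow word lanes back to halfwords with per-lane arithmetic
     right shifts taken from the second operand.  */
  static inline __m128i narrow_shift (__m128i w, __m128i counts)
  {
    return __lsx_vsran_h_w (w, counts);
  }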
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssran_hu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssran_hu_w ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssran_wu_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssran_wu_d ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrarn_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrarn_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrarn_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrarn_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrarn_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrarn_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrarn_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrarn_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrarn_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrarn_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrarn_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrarn_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrarn_bu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrarn_bu_h ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrarn_hu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrarn_hu_w ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrarn_wu_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrarn_wu_d ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. 
*/ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrln_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrln_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrln_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrln_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrln_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrln_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrln_bu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrln_bu_h ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrln_hu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrln_hu_w ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrln_wu_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrln_wu_d ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrlrn_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrlrn_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrlrn_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrlrn_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsrlrn_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsrlrn_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV16QI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrlrn_bu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrlrn_bu_h ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrlrn_hu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrlrn_hu_w ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. 
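The s/r prefixes stack in these names: vssrlrn_bu_h shifts right logically, rounds, narrows halfwords to bytes and saturates to the unsigned range. A sketch (illustrative only):

  #include <lsxintrin.h>

  /* Shift halfword lanes right by per-lane counts, round, then pack
     them into unsigned bytes with saturation.  */
  static inline __m128i pack_u8_rounding (__m128i h, __m128i counts)
  {
    return __lsx_vssrlrn_bu_h (h, counts);
  }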
*/ +/* Data types in instruction templates: UV4SI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrlrn_wu_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrlrn_wu_d ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, UQI. */ +#define __lsx_vfrstpi_b(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vfrstpi_b ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, UQI. */ +#define __lsx_vfrstpi_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vfrstpi_h ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfrstp_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vfrstp_b ((v16i8)_1, (v16i8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfrstp_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vfrstp_h ((v8i16)_1, (v8i16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vshuf4i_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui8*/ _3) \ + ((__m128i)__builtin_lsx_vshuf4i_d ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vbsrl_v(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vbsrl_v ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vbsll_v(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vbsll_v ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vextrins_b(/*__m128i*/ _1, /*__m128i*/ _2, /*ui8*/ _3) \ + ((__m128i)__builtin_lsx_vextrins_b ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vextrins_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui8*/ _3) \ + ((__m128i)__builtin_lsx_vextrins_h ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vextrins_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui8*/ _3) \ + ((__m128i)__builtin_lsx_vextrins_w ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vextrins_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui8*/ _3) \ + ((__m128i)__builtin_lsx_vextrins_d ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V16QI, V16QI. 
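These are again immediate-form macros: vbsll/vbsrl shift the whole 128-bit register by a byte count, and vextrins copies a single lane between two vectors with the lanes encoded in the 8-bit immediate. A sketch (illustrative only):

  #include <lsxintrin.h>

  /* Shift a vector left by 4 bytes, then patch one word lane of the
     result from another vector; both immediates are constants.  */
  static inline __m128i shift_and_patch (__m128i a, __m128i b)
  {
    __m128i t = __lsx_vbsll_v (a, 4);        /* byte shift left */
    return __lsx_vextrins_w (t, b, 0x00);    /* ui8 lane selector */
  }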
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmskltz_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vmskltz_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmskltz_h (__m128i _1) +{ + return (__m128i)__builtin_lsx_vmskltz_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmskltz_w (__m128i _1) +{ + return (__m128i)__builtin_lsx_vmskltz_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmskltz_d (__m128i _1) +{ + return (__m128i)__builtin_lsx_vmskltz_d ((v2i64)_1); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsigncov_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsigncov_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsigncov_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsigncov_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsigncov_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsigncov_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsigncov_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsigncov_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfmadd_s (__m128 _1, __m128 _2, __m128 _3) +{ + return (__m128)__builtin_lsx_vfmadd_s ((v4f32)_1, (v4f32)_2, (v4f32)_3); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfmadd_d (__m128d _1, __m128d _2, __m128d _3) +{ + return (__m128d)__builtin_lsx_vfmadd_d ((v2f64)_1, (v2f64)_2, (v2f64)_3); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfmsub_s (__m128 _1, __m128 _2, __m128 _3) +{ + return (__m128)__builtin_lsx_vfmsub_s ((v4f32)_1, (v4f32)_2, (v4f32)_3); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF, V2DF. 
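A sketch of the fused forms (illustrative only); following the vd, vj, vk, va order in the format comment, the first two operands are multiplied and the third is added:

  #include <lsxintrin.h>

  /* Single-rounding a * b + c on four float lanes.  */
  static inline __m128 fma_s (__m128 a, __m128 b, __m128 c)
  {
    return __lsx_vfmadd_s (a, b, c);
  }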
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfmsub_d (__m128d _1, __m128d _2, __m128d _3) +{ + return (__m128d)__builtin_lsx_vfmsub_d ((v2f64)_1, (v2f64)_2, (v2f64)_3); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfnmadd_s (__m128 _1, __m128 _2, __m128 _3) +{ + return (__m128)__builtin_lsx_vfnmadd_s ((v4f32)_1, (v4f32)_2, (v4f32)_3); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfnmadd_d (__m128d _1, __m128d _2, __m128d _3) +{ + return (__m128d)__builtin_lsx_vfnmadd_d ((v2f64)_1, (v2f64)_2, (v2f64)_3); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V4SF, V4SF, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfnmsub_s (__m128 _1, __m128 _2, __m128 _3) +{ + return (__m128)__builtin_lsx_vfnmsub_s ((v4f32)_1, (v4f32)_2, (v4f32)_3); +} + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V2DF, V2DF, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfnmsub_d (__m128d _1, __m128d _2, __m128d _3) +{ + return (__m128d)__builtin_lsx_vfnmsub_d ((v2f64)_1, (v2f64)_2, (v2f64)_3); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrne_w_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrne_w_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrne_l_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vftintrne_l_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrp_w_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrp_w_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrp_l_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vftintrp_l_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrm_w_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrm_w_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrm_l_d (__m128d _1) +{ + return (__m128i)__builtin_lsx_vftintrm_l_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DF, V2DF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftint_w_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vftint_w_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SF, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vffint_s_l (__m128i _1, __m128i _2) +{ + return (__m128)__builtin_lsx_vffint_s_l ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrz_w_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vftintrz_w_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrp_w_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vftintrp_w_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrm_w_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vftintrm_w_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrne_w_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vftintrne_w_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintl_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintl_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftinth_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftinth_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vffinth_d_w (__m128i _1) +{ + return (__m128d)__builtin_lsx_vffinth_d_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vffintl_d_w (__m128i _1) +{ + return (__m128d)__builtin_lsx_vffintl_d_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrzl_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrzl_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. 
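The two-operand conversions above combine a pair of double-precision vectors into one int32 or single-precision vector, each source supplying two result lanes; a sketch (illustrative only):

  #include <lsxintrin.h>

  /* Four doubles (held in two vectors) to four int32 lanes.  */
  static inline __m128i doubles_to_int32 (__m128d a, __m128d b)
  {
    return __lsx_vftint_w_d (a, b);
  }

  /* Four int64 lanes (two vectors) to four float lanes.  */
  static inline __m128 int64_to_float (__m128i a, __m128i b)
  {
    return __lsx_vffint_s_l (a, b);
  }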
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrzh_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrzh_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrpl_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrpl_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrph_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrph_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrml_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrml_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrmh_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrmh_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrnel_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrnel_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vftintrneh_l_s (__m128 _1) +{ + return (__m128i)__builtin_lsx_vftintrneh_l_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrintrne_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrintrne_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrintrne_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrintrne_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrintrz_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrintrz_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrintrz_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrintrz_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrintrp_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrintrp_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrintrp_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrintrp_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrintrm_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrintrm_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrintrm_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrintrm_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V16QI, CVPOINTER, SI, UQI. */ +#define __lsx_vstelm_b(/*__m128i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lsx_vstelm_b ((v16i8)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: vd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V8HI, CVPOINTER, SI, UQI. */ +#define __lsx_vstelm_h(/*__m128i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lsx_vstelm_h ((v8i16)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: vd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V4SI, CVPOINTER, SI, UQI. */ +#define __lsx_vstelm_w(/*__m128i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lsx_vstelm_w ((v4i32)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: vd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V2DI, CVPOINTER, SI, UQI. */ +#define __lsx_vstelm_d(/*__m128i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lsx_vstelm_d ((v2i64)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. 
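vstelm stores a single element, so it takes both a byte offset and a lane index, each of which must be a constant; a sketch that also uses the even-lane widening add defined just above (illustrative only):

  #include <lsxintrin.h>

  /* Store lane 2 of a to *p (offset 0), then form 64-bit sums of the
     even-numbered int32 lanes of a and b.  */
  static inline __m128i store_and_widen (__m128i a, __m128i b, int *p)
  {
    __lsx_vstelm_w (a, p, 0, 2);          /* single-element store */
    return __lsx_vaddwev_d_w (a, b);      /* even lanes, widened sums */
  }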
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_d_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_d_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_w_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_w_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_h_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_h_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_d_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_d_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_w_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_w_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_h_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_h_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_d_wu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_d_wu_w ((v4u32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_w_hu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_w_hu_h ((v8u16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_h_bu_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_h_bu_b ((v16u8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. 
*/ +/* Data types in instruction templates: V2DI, UV4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_d_wu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_d_wu_w ((v4u32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_w_hu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_w_hu_h ((v8u16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_h_bu_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_h_bu_b ((v16u8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_d_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_d_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, UV8HI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_w_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_w_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_h_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_h_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_d_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_d_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_w_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_w_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_h_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_h_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_q_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_q_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_q_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_q_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. 
*/ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwev_q_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwev_q_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsubwod_q_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsubwod_q_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwev_q_du_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwev_q_du_d ((v2u64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vaddwod_q_du_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vaddwod_q_du_d ((v2u64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_d_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_d_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_w_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_w_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_h_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_h_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, UV4SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_d_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_d_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_w_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_w_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_h_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_h_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_d_wu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_d_wu ((v4u32)_1, (v4u32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_w_hu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_w_hu ((v8u16)_1, (v8u16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_h_bu (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_h_bu ((v16u8)_1, (v16u8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_d_wu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_d_wu_w ((v4u32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_w_hu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_w_hu_h ((v8u16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, UV16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_h_bu_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_h_bu_b ((v16u8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_d_wu_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_d_wu_w ((v4u32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, UV8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_w_hu_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_w_hu_h ((v8u16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. 
*/ +/* Data types in instruction templates: V8HI, UV16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_h_bu_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_h_bu_b ((v16u8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_q_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_q_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_q_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_q_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwev_q_du_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwev_q_du_d ((v2u64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, UV2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmulwod_q_du_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vmulwod_q_du_d ((v2u64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhaddw_qu_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhaddw_qu_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_q_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_q_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vhsubw_qu_du (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vhsubw_qu_du ((v2u64)_1, (v2u64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_d_w (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_d_w ((v2i64)_1, (v4i32)_2, (v4i32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_w_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_w_h ((v4i32)_1, (v8i16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_h_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_h_b ((v8i16)_1, (v16i8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV4SI, UV4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_d_wu (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_d_wu ((v2u64)_1, (v4u32)_2, (v4u32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_w_hu (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_w_hu ((v4u32)_1, (v8u16)_2, (v8u16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_h_bu (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_h_bu ((v8u16)_1, (v16u8)_2, (v16u8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_d_w (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_d_w ((v2i64)_1, (v4i32)_2, (v4i32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_w_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_w_h ((v4i32)_1, (v8i16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_h_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_h_b ((v8i16)_1, (v16i8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV4SI, UV4SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_d_wu (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_d_wu ((v2u64)_1, (v4u32)_2, (v4u32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV4SI, UV4SI, UV8HI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_w_hu (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_w_hu ((v4u32)_1, (v8u16)_2, (v8u16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV8HI, UV8HI, UV16QI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_h_bu (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_h_bu ((v8u16)_1, (v16u8)_2, (v16u8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, UV4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_d_wu_w (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_d_wu_w ((v2i64)_1, (v4u32)_2, (v4i32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, UV8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_w_hu_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_w_hu_h ((v4i32)_1, (v8u16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, UV16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_h_bu_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_h_bu_b ((v8i16)_1, (v16u8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, UV4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_d_wu_w (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_d_wu_w ((v2i64)_1, (v4u32)_2, (v4i32)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, UV8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_w_hu_h (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_w_hu_h ((v4i32)_1, (v8u16)_2, (v8i16)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, UV16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_h_bu_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_h_bu_b ((v8i16)_1, (v16u8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_q_d (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_q_d ((v2i64)_1, (v2i64)_2, (v2i64)_3); +} + +/* Assembly instruction format: vd, vj, vk. 
*/ +/* Data types in instruction templates: V2DI, V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_q_d (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_q_d ((v2i64)_1, (v2i64)_2, (v2i64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_q_du (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_q_du ((v2u64)_1, (v2u64)_2, (v2u64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: UV2DI, UV2DI, UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_q_du (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_q_du ((v2u64)_1, (v2u64)_2, (v2u64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, UV2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwev_q_du_d (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwev_q_du_d ((v2i64)_1, (v2u64)_2, (v2i64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, UV2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmaddwod_q_du_d (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vmaddwod_q_du_d ((v2i64)_1, (v2u64)_2, (v2i64)_3); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vrotr_b (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vrotr_b ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vrotr_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vrotr_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vrotr_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vrotr_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vrotr_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vrotr_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vadd_q (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vadd_q ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vsub_q (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vsub_q ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, rj, si12. */ +/* Data types in instruction templates: V16QI, CVPOINTER, SI. */ +#define __lsx_vldrepl_b(/*void **/ _1, /*si12*/ _2) \ + ((__m128i)__builtin_lsx_vldrepl_b ((void *)(_1), (_2))) + +/* Assembly instruction format: vd, rj, si11. */ +/* Data types in instruction templates: V8HI, CVPOINTER, SI. */ +#define __lsx_vldrepl_h(/*void **/ _1, /*si11*/ _2) \ + ((__m128i)__builtin_lsx_vldrepl_h ((void *)(_1), (_2))) + +/* Assembly instruction format: vd, rj, si10. */ +/* Data types in instruction templates: V4SI, CVPOINTER, SI. */ +#define __lsx_vldrepl_w(/*void **/ _1, /*si10*/ _2) \ + ((__m128i)__builtin_lsx_vldrepl_w ((void *)(_1), (_2))) + +/* Assembly instruction format: vd, rj, si9. */ +/* Data types in instruction templates: V2DI, CVPOINTER, SI. */ +#define __lsx_vldrepl_d(/*void **/ _1, /*si9*/ _2) \ + ((__m128i)__builtin_lsx_vldrepl_d ((void *)(_1), (_2))) + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmskgez_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vmskgez_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vmsknz_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vmsknz_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V8HI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_h_b (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_h_b ((v16i8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_w_h (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_w_h ((v8i16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_d_w (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_d_w ((v4i32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_q_d (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_q_d ((v2i64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV8HI, UV16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_hu_bu (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_hu_bu ((v16u8)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV4SI, UV8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_wu_hu (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_wu_hu ((v8u16)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV2DI, UV4SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_du_wu (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_du_wu ((v4u32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vexth_qu_du (__m128i _1) +{ + return (__m128i)__builtin_lsx_vexth_qu_du ((v2u64)_1); +} + +/* Assembly instruction format: vd, vj, ui3. */ +/* Data types in instruction templates: V16QI, V16QI, UQI. */ +#define __lsx_vrotri_b(/*__m128i*/ _1, /*ui3*/ _2) \ + ((__m128i)__builtin_lsx_vrotri_b ((v16i8)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V8HI, V8HI, UQI. */ +#define __lsx_vrotri_h(/*__m128i*/ _1, /*ui4*/ _2) \ + ((__m128i)__builtin_lsx_vrotri_h ((v8i16)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V4SI, V4SI, UQI. */ +#define __lsx_vrotri_w(/*__m128i*/ _1, /*ui5*/ _2) \ + ((__m128i)__builtin_lsx_vrotri_w ((v4i32)(_1), (_2))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V2DI, V2DI, UQI. */ +#define __lsx_vrotri_d(/*__m128i*/ _1, /*ui6*/ _2) \ + ((__m128i)__builtin_lsx_vrotri_d ((v2i64)(_1), (_2))) + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vextl_q_d (__m128i _1) +{ + return (__m128i)__builtin_lsx_vextl_q_d ((v2i64)_1); +} + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vsrlni_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vsrlni_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vsrlni_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vsrlni_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vsrlni_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vsrlni_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vsrlni_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vsrlni_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vsrlrni_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vsrlrni_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vsrlrni_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vsrlrni_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vsrlrni_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vsrlrni_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. 
*/ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vsrlrni_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vsrlrni_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vssrlni_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vssrlni_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vssrlni_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vssrlni_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV16QI, UV16QI, V16QI, USI. */ +#define __lsx_vssrlni_bu_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_bu_h ((v16u8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV8HI, UV8HI, V8HI, USI. */ +#define __lsx_vssrlni_hu_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_hu_w ((v8u16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV4SI, UV4SI, V4SI, USI. */ +#define __lsx_vssrlni_wu_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_wu_d ((v4u32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: UV2DI, UV2DI, V2DI, USI. */ +#define __lsx_vssrlni_du_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrlni_du_q ((v2u64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vssrlrni_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vssrlrni_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vssrlrni_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vssrlrni_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV16QI, UV16QI, V16QI, USI. 
*/ +#define __lsx_vssrlrni_bu_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_bu_h ((v16u8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV8HI, UV8HI, V8HI, USI. */ +#define __lsx_vssrlrni_hu_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_hu_w ((v8u16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV4SI, UV4SI, V4SI, USI. */ +#define __lsx_vssrlrni_wu_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_wu_d ((v4u32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: UV2DI, UV2DI, V2DI, USI. */ +#define __lsx_vssrlrni_du_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrlrni_du_q ((v2u64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vsrani_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vsrani_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vsrani_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vsrani_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vsrani_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vsrani_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vsrani_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vsrani_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vsrarni_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vsrarni_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vsrarni_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vsrarni_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vsrarni_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vsrarni_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vsrarni_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vsrarni_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vssrani_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. 
*/ +#define __lsx_vssrani_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vssrani_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vssrani_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV16QI, UV16QI, V16QI, USI. */ +#define __lsx_vssrani_bu_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_bu_h ((v16u8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV8HI, UV8HI, V8HI, USI. */ +#define __lsx_vssrani_hu_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_hu_w ((v8u16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV4SI, UV4SI, V4SI, USI. */ +#define __lsx_vssrani_wu_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_wu_d ((v4u32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: UV2DI, UV2DI, V2DI, USI. */ +#define __lsx_vssrani_du_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrani_du_q ((v2u64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, USI. */ +#define __lsx_vssrarni_b_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_b_h ((v16i8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: V8HI, V8HI, V8HI, USI. */ +#define __lsx_vssrarni_h_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_h_w ((v8i16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vssrarni_w_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_w_d ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: V2DI, V2DI, V2DI, USI. */ +#define __lsx_vssrarni_d_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_d_q ((v2i64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui4. */ +/* Data types in instruction templates: UV16QI, UV16QI, V16QI, USI. */ +#define __lsx_vssrarni_bu_h(/*__m128i*/ _1, /*__m128i*/ _2, /*ui4*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_bu_h ((v16u8)(_1), (v16i8)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui5. */ +/* Data types in instruction templates: UV8HI, UV8HI, V8HI, USI. */ +#define __lsx_vssrarni_hu_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui5*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_hu_w ((v8u16)(_1), (v8i16)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui6. */ +/* Data types in instruction templates: UV4SI, UV4SI, V4SI, USI. 
*/ +#define __lsx_vssrarni_wu_d(/*__m128i*/ _1, /*__m128i*/ _2, /*ui6*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_wu_d ((v4u32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui7. */ +/* Data types in instruction templates: UV2DI, UV2DI, V2DI, USI. */ +#define __lsx_vssrarni_du_q(/*__m128i*/ _1, /*__m128i*/ _2, /*ui7*/ _3) \ + ((__m128i)__builtin_lsx_vssrarni_du_q ((v2u64)(_1), (v2i64)(_2), (_3))) + +/* Assembly instruction format: vd, vj, ui8. */ +/* Data types in instruction templates: V4SI, V4SI, V4SI, USI. */ +#define __lsx_vpermi_w(/*__m128i*/ _1, /*__m128i*/ _2, /*ui8*/ _3) \ + ((__m128i)__builtin_lsx_vpermi_w ((v4i32)(_1), (v4i32)(_2), (_3))) + +/* Assembly instruction format: vd, rj, si12. */ +/* Data types in instruction templates: V16QI, CVPOINTER, SI. */ +#define __lsx_vld(/*void **/ _1, /*si12*/ _2) \ + ((__m128i)__builtin_lsx_vld ((void *)(_1), (_2))) + +/* Assembly instruction format: vd, rj, si12. */ +/* Data types in instruction templates: VOID, V16QI, CVPOINTER, SI. */ +#define __lsx_vst(/*__m128i*/ _1, /*void **/ _2, /*si12*/ _3) \ + ((void)__builtin_lsx_vst ((v16i8)(_1), (void *)(_2), (_3))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrlrn_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrlrn_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrlrn_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrlrn_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrlrn_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrlrn_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V8HI, V8HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrln_b_h (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrln_b_h ((v8i16)_1, (v8i16)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V8HI, V4SI, V4SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrln_h_w (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrln_h_w ((v4i32)_1, (v4i32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V2DI, V2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vssrln_w_d (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vssrln_w_d ((v2i64)_1, (v2i64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vorn_v (__m128i _1, __m128i _2) +{ + return (__m128i)__builtin_lsx_vorn_v ((v16i8)_1, (v16i8)_2); +} + +/* Assembly instruction format: vd, i13. */ +/* Data types in instruction templates: V2DI, HI. 
*/ +#define __lsx_vldi(/*i13*/ _1) \ + ((__m128i)__builtin_lsx_vldi ((_1))) + +/* Assembly instruction format: vd, vj, vk, va. */ +/* Data types in instruction templates: V16QI, V16QI, V16QI, V16QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vshuf_b (__m128i _1, __m128i _2, __m128i _3) +{ + return (__m128i)__builtin_lsx_vshuf_b ((v16i8)_1, (v16i8)_2, (v16i8)_3); +} + +/* Assembly instruction format: vd, rj, rk. */ +/* Data types in instruction templates: V16QI, CVPOINTER, DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vldx (void * _1, long int _2) +{ + return (__m128i)__builtin_lsx_vldx ((void *)_1, (long int)_2); +} + +/* Assembly instruction format: vd, rj, rk. */ +/* Data types in instruction templates: VOID, V16QI, CVPOINTER, DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +void __lsx_vstx (__m128i _1, void * _2, long int _3) +{ + return (void)__builtin_lsx_vstx ((v16i8)_1, (void *)_2, (long int)_3); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: UV2DI, UV2DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vextl_qu_du (__m128i _1) +{ + return (__m128i)__builtin_lsx_vextl_qu_du ((v2u64)_1); +} + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV16QI. */ +#define __lsx_bnz_b(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bnz_b ((v16u8)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV2DI. */ +#define __lsx_bnz_d(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bnz_d ((v2u64)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV8HI. */ +#define __lsx_bnz_h(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bnz_h ((v8u16)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV16QI. */ +#define __lsx_bnz_v(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bnz_v ((v16u8)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV4SI. */ +#define __lsx_bnz_w(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bnz_w ((v4u32)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV16QI. */ +#define __lsx_bz_b(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bz_b ((v16u8)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV2DI. */ +#define __lsx_bz_d(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bz_d ((v2u64)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV8HI. */ +#define __lsx_bz_h(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bz_h ((v8u16)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV16QI. */ +#define __lsx_bz_v(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bz_v ((v16u8)(_1))) + +/* Assembly instruction format: cd, vj. */ +/* Data types in instruction templates: SI, UV4SI. */ +#define __lsx_bz_w(/*__m128i*/ _1) \ + ((int)__builtin_lsx_bz_w ((v4u32)(_1))) + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_caf_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_caf_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_caf_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_caf_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_ceq_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_ceq_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_ceq_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_ceq_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cle_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cle_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cle_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cle_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_clt_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_clt_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_clt_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_clt_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cne_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cne_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cne_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cne_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cor_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cor_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cor_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cor_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cueq_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cueq_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cueq_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cueq_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cule_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cule_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cule_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cule_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cult_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cult_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cult_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cult_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cun_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cun_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cune_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cune_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cune_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cune_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_cun_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_cun_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_saf_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_saf_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_saf_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_saf_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_seq_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_seq_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_seq_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_seq_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sle_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sle_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sle_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sle_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_slt_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_slt_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_slt_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_slt_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sne_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sne_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sne_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sne_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sor_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sor_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sor_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sor_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sueq_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sueq_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sueq_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sueq_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sule_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sule_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sule_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sule_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sult_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sult_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sult_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sult_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sun_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sun_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V2DI, V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sune_d (__m128d _1, __m128d _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sune_d ((v2f64)_1, (v2f64)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sune_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sune_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, vj, vk. */ +/* Data types in instruction templates: V4SI, V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128i __lsx_vfcmp_sun_s (__m128 _1, __m128 _2) +{ + return (__m128i)__builtin_lsx_vfcmp_sun_s ((v4f32)_1, (v4f32)_2); +} + +/* Assembly instruction format: vd, si10. */ +/* Data types in instruction templates: V16QI, HI. 
*/ +#define __lsx_vrepli_b(/*si10*/ _1) \ + ((__m128i)__builtin_lsx_vrepli_b ((_1))) + +/* Assembly instruction format: vd, si10. */ +/* Data types in instruction templates: V2DI, HI. */ +#define __lsx_vrepli_d(/*si10*/ _1) \ + ((__m128i)__builtin_lsx_vrepli_d ((_1))) + +/* Assembly instruction format: vd, si10. */ +/* Data types in instruction templates: V8HI, HI. */ +#define __lsx_vrepli_h(/*si10*/ _1) \ + ((__m128i)__builtin_lsx_vrepli_h ((_1))) + +/* Assembly instruction format: vd, si10. */ +/* Data types in instruction templates: V4SI, HI. */ +#define __lsx_vrepli_w(/*si10*/ _1) \ + ((__m128i)__builtin_lsx_vrepli_w ((_1))) + +#endif /* defined(__loongarch_sx) */ +#endif /* _GCC_LOONGSON_SXINTRIN_H */
From patchwork Thu Aug 24 03:13:14 2023
X-Patchwork-Submitter: Chenghui Pan
X-Patchwork-Id: 1825105
From: Chenghui Pan
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn
Subject: [PATCH v5 4/6] LoongArch: Add Loongson ASX vector directive compilation framework.
Date: Thu, 24 Aug 2023 11:13:14 +0800
Message-Id: <20230824031316.16599-5-panchenghui@loongson.cn>
In-Reply-To: <20230824031316.16599-1-panchenghui@loongson.cn>
References: <20230824031316.16599-1-panchenghui@loongson.cn>
From: Lulu Cheng

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add compilation framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LASX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LASX): Ditto.
* config/loongarch/loongarch-driver.cc (driver_get_normalized_m_opts): Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): Ditto.
* config/loongarch/loongarch-opts.cc (isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
(ISA_HAS_LASX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LASX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
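For illustration, a minimal sketch of how target code might consume the predefined macros set up in the loongarch-c.cc hunk below; the function name simd_width is invented for this example and is not part of the patch:

/* Sketch only: report which LoongArch SIMD width was selected at
   compile time via -mlsx / -mlasx, using the predefined macros
   __loongarch_sx_width and __loongarch_asx_width.  */
int
simd_width (void)
{
#if defined (__loongarch_asx)
  return __loongarch_asx_width;   /* 256: LASX (and hence LSX) enabled.  */
#elif defined (__loongarch_sx)
  return __loongarch_sx_width;    /* 128: only LSX enabled.  */
#else
  return 0;                       /* No LoongArch SIMD extension.  */
#endif
}

Built with -mlasx this returns 256, and with -mlsx alone 128, matching the __loongarch_simd_width values defined in loongarch_cpu_cpp_builtins below.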
--- gcc/config/loongarch/genopts/loongarch-strings | 1 + gcc/config/loongarch/genopts/loongarch.opt.in | 4 ++++ gcc/config/loongarch/loongarch-c.cc | 11 +++++++++++ gcc/config/loongarch/loongarch-def.c | 4 +++- gcc/config/loongarch/loongarch-def.h | 6 ++++-- gcc/config/loongarch/loongarch-driver.cc | 2 +- gcc/config/loongarch/loongarch-driver.h | 1 + gcc/config/loongarch/loongarch-opts.cc | 9 ++++++++- gcc/config/loongarch/loongarch-opts.h | 4 +++- gcc/config/loongarch/loongarch-str.h | 1 + gcc/config/loongarch/loongarch.opt | 4 ++++ 11 files changed, 41 insertions(+), 6 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch-strings b/gcc/config/loongarch/genopts/loongarch-strings index 24a5025061f..35d08f5967d 100644 --- a/gcc/config/loongarch/genopts/loongarch-strings +++ b/gcc/config/loongarch/genopts/loongarch-strings @@ -42,6 +42,7 @@ OPTSTR_DOUBLE_FLOAT double-float # SIMD extensions OPTSTR_LSX lsx +OPTSTR_LASX lasx # -mabi= OPTSTR_ABI_BASE abi diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 338d77a7e40..afde23c9661 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -80,6 +80,10 @@ m@@OPTSTR_LSX@@ Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@) Enable LoongArch SIMD Extension (LSX). +m@@OPTSTR_LASX@@ +Target RejectNegative Var(la_opt_switches) Mask(LASX) Negative(m@@OPTSTR_LASX@@) +Enable LoongArch Advanced SIMD Extension (LASX). + ;; Base target models (implies ISA & tune parameters) Enum Name(cpu_type) Type(int) diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index b065921adc3..2747fb9e472 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -104,8 +104,19 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) builtin_define ("__loongarch_simd"); builtin_define ("__loongarch_sx"); builtin_define ("__loongarch_sx_width=128"); + + if (!ISA_HAS_LASX) + builtin_define ("__loongarch_simd_width=128"); } + if (ISA_HAS_LASX) + { + builtin_define ("__loongarch_asx"); + builtin_define ("__loongarch_asx_width=256"); + builtin_define ("__loongarch_simd_width=256"); + } + + /* Native Data Sizes. 
*/ builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE); builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE); diff --git a/gcc/config/loongarch/loongarch-def.c b/gcc/config/loongarch/loongarch-def.c index 28e24c62249..bff92c86532 100644 --- a/gcc/config/loongarch/loongarch-def.c +++ b/gcc/config/loongarch/loongarch-def.c @@ -54,7 +54,7 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = { [CPU_LA464] = { .base = ISA_BASE_LA64V100, .fpu = ISA_EXT_FPU64, - .simd = ISA_EXT_SIMD_LSX, + .simd = ISA_EXT_SIMD_LASX, }, }; @@ -150,6 +150,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = { [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32, [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU, [ISA_EXT_SIMD_LSX] = OPTSTR_LSX, + [ISA_EXT_SIMD_LASX] = OPTSTR_LASX, }; const char* @@ -180,6 +181,7 @@ loongarch_switch_strings[] = { [SW_SINGLE_FLOAT] = OPTSTR_SINGLE_FLOAT, [SW_DOUBLE_FLOAT] = OPTSTR_DOUBLE_FLOAT, [SW_LSX] = OPTSTR_LSX, + [SW_LASX] = OPTSTR_LASX, }; diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h index f34cffcfb9b..0bbcdb03d22 100644 --- a/gcc/config/loongarch/loongarch-def.h +++ b/gcc/config/loongarch/loongarch-def.h @@ -64,7 +64,8 @@ extern const char* loongarch_isa_ext_strings[]; #define ISA_EXT_FPU64 2 #define N_ISA_EXT_FPU_TYPES 3 #define ISA_EXT_SIMD_LSX 3 -#define N_ISA_EXT_TYPES 4 +#define ISA_EXT_SIMD_LASX 4 +#define N_ISA_EXT_TYPES 5 /* enum abi_base */ extern const char* loongarch_abi_base_strings[]; @@ -99,7 +100,8 @@ extern const char* loongarch_switch_strings[]; #define SW_SINGLE_FLOAT 1 #define SW_DOUBLE_FLOAT 2 #define SW_LSX 3 -#define N_SWITCH_TYPES 4 +#define SW_LASX 4 +#define N_SWITCH_TYPES 5 /* The common default value for variables whose assignments are triggered by command-line options. */ diff --git a/gcc/config/loongarch/loongarch-driver.cc b/gcc/config/loongarch/loongarch-driver.cc index aa5011bd86a..3b9605de35f 100644 --- a/gcc/config/loongarch/loongarch-driver.cc +++ b/gcc/config/loongarch/loongarch-driver.cc @@ -181,7 +181,7 @@ driver_get_normalized_m_opts (int argc, const char **argv) if (la_target.isa.simd) { - APPEND_LTR (" %simd) { case ISA_EXT_SIMD_LSX: + case ISA_EXT_SIMD_LASX: APPEND1 (separator); APPEND_STRING (loongarch_isa_ext_strings[isa->simd]); break; diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h index d067c05dfc9..59a383ec5ca 100644 --- a/gcc/config/loongarch/loongarch-opts.h +++ b/gcc/config/loongarch/loongarch-opts.h @@ -66,7 +66,9 @@ loongarch_config_target (struct loongarch_target *target, || la_target.abi.base == ABI_BASE_LP64F \ || la_target.abi.base == ABI_BASE_LP64S) -#define ISA_HAS_LSX (la_target.isa.simd == ISA_EXT_SIMD_LSX) +#define ISA_HAS_LSX (la_target.isa.simd == ISA_EXT_SIMD_LSX \ + || la_target.isa.simd == ISA_EXT_SIMD_LASX) +#define ISA_HAS_LASX (la_target.isa.simd == ISA_EXT_SIMD_LASX) #define TARGET_ARCH_NATIVE (la_target.cpu_arch == CPU_NATIVE) #define LARCH_ACTUAL_ARCH (TARGET_ARCH_NATIVE \ ? (la_target.cpu_native < N_ARCH_TYPES \ diff --git a/gcc/config/loongarch/loongarch-str.h b/gcc/config/loongarch/loongarch-str.h index 6fa1b1571c5..951f35a3c24 100644 --- a/gcc/config/loongarch/loongarch-str.h +++ b/gcc/config/loongarch/loongarch-str.h @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. 
If not see #define OPTSTR_DOUBLE_FLOAT "double-float" #define OPTSTR_LSX "lsx" +#define OPTSTR_LASX "lasx" #define OPTSTR_ABI_BASE "abi" #define STR_ABI_BASE_LP64D "lp64d" diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 5c7e6d37220..611629b4203 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -87,6 +87,10 @@ mlsx Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(mlsx) Enable LoongArch SIMD Extension (LSX). +mlasx +Target RejectNegative Var(la_opt_switches) Mask(LASX) Negative(mlasx) +Enable LoongArch Advanced SIMD Extension (LASX). + ;; Base target models (implies ISA & tune parameters) Enum Name(cpu_type) Type(int)
From patchwork Thu Aug 24 03:13:15 2023
X-Patchwork-Submitter: Chenghui Pan
X-Patchwork-Id: 1825112
From: Chenghui Pan
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn
Subject: [PATCH v5 5/6] LoongArch: Add Loongson ASX base instruction support.
Date: Thu, 24 Aug 2023 11:13:15 +0800
Message-Id: <20230824031316.16599-6-panchenghui@loongson.cn>
In-Reply-To: <20230824031316.16599-1-panchenghui@loongson.cn>
References: <20230824031316.16599-1-panchenghui@loongson.cn>
From: Lulu Cheng

gcc/ChangeLog:

* config/loongarch/loongarch-modes.def (VECTOR_MODES): Add Loongson ASX instruction support.
* config/loongarch/loongarch-protos.h (loongarch_split_256bit_move): Ditto.
(loongarch_split_256bit_move_p): Ditto.
(loongarch_expand_vector_group_init): Ditto.
(loongarch_expand_vec_perm_1): Ditto.
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Ditto.
(loongarch_valid_offset_p): Ditto.
(loongarch_valid_index_p): Ditto.
(loongarch_address_insns): Ditto.
(loongarch_const_insns): Ditto.
(loongarch_legitimize_move): Ditto.
(loongarch_builtin_vectorization_cost): Ditto.
(loongarch_split_move_p): Ditto.
(loongarch_split_move): Ditto.
(loongarch_output_move_index): Ditto.
(loongarch_output_move_index_float): Ditto.
(loongarch_split_256bit_move_p): Ditto.
(loongarch_split_256bit_move): Ditto.
(loongarch_output_move): Ditto.
(loongarch_print_operand_reloc): Ditto.
(loongarch_print_operand): Ditto.
(loongarch_hard_regno_mode_ok_uncached): Ditto.
(loongarch_hard_regno_nregs): Ditto.
(loongarch_class_max_nregs): Ditto.
(loongarch_can_change_mode_class): Ditto.
(loongarch_mode_ok_for_mov_fmt_p): Ditto.
(loongarch_vector_mode_supported_p): Ditto.
(loongarch_preferred_simd_mode): Ditto.
(loongarch_autovectorize_vector_modes): Ditto.
(loongarch_lsx_output_division): Ditto.
(loongarch_expand_lsx_shuffle): Ditto.
(loongarch_expand_vec_perm): Ditto.
(loongarch_expand_vec_perm_interleave): Ditto.
(loongarch_try_expand_lsx_vshuf_const): Ditto.
(loongarch_expand_vec_perm_even_odd_1): Ditto. (loongarch_expand_vec_perm_even_odd): Ditto. (loongarch_expand_vec_perm_1): Ditto. (loongarch_expand_vec_perm_const_1): Ditto. (loongarch_is_quad_duplicate): Ditto. (loongarch_is_double_duplicate): Ditto. (loongarch_is_odd_extraction): Ditto. (loongarch_is_even_extraction): Ditto. (loongarch_is_extraction_permutation): Ditto. (loongarch_is_center_extraction): Ditto. (loongarch_is_reversing_permutation): Ditto. (loongarch_is_di_misalign_extract): Ditto. (loongarch_is_si_misalign_extract): Ditto. (loongarch_is_lasx_lowpart_interleave): Ditto. (loongarch_is_lasx_lowpart_interleave_2): Ditto. (COMPARE_SELECTOR): Ditto. (loongarch_is_lasx_lowpart_extract): Ditto. (loongarch_is_lasx_highpart_interleave): Ditto. (loongarch_is_lasx_highpart_interleave_2): Ditto. (loongarch_is_elem_duplicate): Ditto. (loongarch_is_op_reverse_perm): Ditto. (loongarch_is_single_op_perm): Ditto. (loongarch_is_divisible_perm): Ditto. (loongarch_is_triple_stride_extract): Ditto. (loongarch_expand_vec_perm_const_2): Ditto. (loongarch_sched_reassociation_width): Ditto. (loongarch_expand_vector_extract): Ditto. (emit_reduc_half): Ditto. (loongarch_expand_vec_unpack): Ditto. (loongarch_expand_vector_group_init): Ditto. (loongarch_expand_vector_init): Ditto. (loongarch_expand_lsx_cmp): Ditto. (loongarch_builtin_support_vector_misalignment): Ditto. * config/loongarch/loongarch.h (UNITS_PER_LASX_REG): Ditto. (BITS_PER_LASX_REG): Ditto. (STRUCTURE_SIZE_BOUNDARY): Ditto. (LASX_REG_FIRST): Ditto. (LASX_REG_LAST): Ditto. (LASX_REG_NUM): Ditto. (LASX_REG_P): Ditto. (LASX_REG_RTX_P): Ditto. (LASX_SUPPORTED_MODE_P): Ditto. * config/loongarch/loongarch.md: Ditto. * config/loongarch/lasx.md: New file. gcc/testsuite/ChangeLog: * g++.dg/torture/vshuf-v16hi.C: Skip loongarch*-*-* because of xvshuf insn's undefined result when 6 or 7 bit of vector's element is set. --- gcc/config/loongarch/lasx.md | 5104 ++++++++++++++++++++ gcc/config/loongarch/loongarch-modes.def | 1 + gcc/config/loongarch/loongarch-protos.h | 4 + gcc/config/loongarch/loongarch.cc | 2490 +++++++++- gcc/config/loongarch/loongarch.h | 60 +- gcc/config/loongarch/loongarch.md | 20 +- gcc/testsuite/g++.dg/torture/vshuf-v16hi.C | 1 + 7 files changed, 7561 insertions(+), 119 deletions(-) create mode 100644 gcc/config/loongarch/lasx.md diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md new file mode 100644 index 00000000000..8111c8bb79a --- /dev/null +++ b/gcc/config/loongarch/lasx.md @@ -0,0 +1,5104 @@ +;; Machine Description for LARCH Loongson ASX ASE +;; +;; Copyright (C) 2018 Free Software Foundation, Inc. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; . 
+;; + +(define_c_enum "unspec" [ + UNSPEC_LASX_XVABSD_S + UNSPEC_LASX_XVABSD_U + UNSPEC_LASX_XVAVG_S + UNSPEC_LASX_XVAVG_U + UNSPEC_LASX_XVAVGR_S + UNSPEC_LASX_XVAVGR_U + UNSPEC_LASX_XVBITCLR + UNSPEC_LASX_XVBITCLRI + UNSPEC_LASX_XVBITREV + UNSPEC_LASX_XVBITREVI + UNSPEC_LASX_XVBITSET + UNSPEC_LASX_XVBITSETI + UNSPEC_LASX_XVFCMP_CAF + UNSPEC_LASX_XVFCLASS + UNSPEC_LASX_XVFCMP_CUNE + UNSPEC_LASX_XVFCVT + UNSPEC_LASX_XVFCVTH + UNSPEC_LASX_XVFCVTL + UNSPEC_LASX_XVFLOGB + UNSPEC_LASX_XVFRECIP + UNSPEC_LASX_XVFRINT + UNSPEC_LASX_XVFRSQRT + UNSPEC_LASX_XVFCMP_SAF + UNSPEC_LASX_XVFCMP_SEQ + UNSPEC_LASX_XVFCMP_SLE + UNSPEC_LASX_XVFCMP_SLT + UNSPEC_LASX_XVFCMP_SNE + UNSPEC_LASX_XVFCMP_SOR + UNSPEC_LASX_XVFCMP_SUEQ + UNSPEC_LASX_XVFCMP_SULE + UNSPEC_LASX_XVFCMP_SULT + UNSPEC_LASX_XVFCMP_SUN + UNSPEC_LASX_XVFCMP_SUNE + UNSPEC_LASX_XVFTINT_S + UNSPEC_LASX_XVFTINT_U + UNSPEC_LASX_XVCLO + UNSPEC_LASX_XVSAT_S + UNSPEC_LASX_XVSAT_U + UNSPEC_LASX_XVREPLVE0 + UNSPEC_LASX_XVREPL128VEI + UNSPEC_LASX_XVSRAR + UNSPEC_LASX_XVSRARI + UNSPEC_LASX_XVSRLR + UNSPEC_LASX_XVSRLRI + UNSPEC_LASX_XVSHUF + UNSPEC_LASX_XVSHUF_B + UNSPEC_LASX_BRANCH + UNSPEC_LASX_BRANCH_V + + UNSPEC_LASX_XVMUH_S + UNSPEC_LASX_XVMUH_U + UNSPEC_LASX_MXVEXTW_U + UNSPEC_LASX_XVSLLWIL_S + UNSPEC_LASX_XVSLLWIL_U + UNSPEC_LASX_XVSRAN + UNSPEC_LASX_XVSSRAN_S + UNSPEC_LASX_XVSSRAN_U + UNSPEC_LASX_XVSRARN + UNSPEC_LASX_XVSSRARN_S + UNSPEC_LASX_XVSSRARN_U + UNSPEC_LASX_XVSRLN + UNSPEC_LASX_XVSSRLN_U + UNSPEC_LASX_XVSRLRN + UNSPEC_LASX_XVSSRLRN_U + UNSPEC_LASX_XVFRSTPI + UNSPEC_LASX_XVFRSTP + UNSPEC_LASX_XVSHUF4I + UNSPEC_LASX_XVBSRL_V + UNSPEC_LASX_XVBSLL_V + UNSPEC_LASX_XVEXTRINS + UNSPEC_LASX_XVMSKLTZ + UNSPEC_LASX_XVSIGNCOV + UNSPEC_LASX_XVFTINTRNE_W_S + UNSPEC_LASX_XVFTINTRNE_L_D + UNSPEC_LASX_XVFTINTRP_W_S + UNSPEC_LASX_XVFTINTRP_L_D + UNSPEC_LASX_XVFTINTRM_W_S + UNSPEC_LASX_XVFTINTRM_L_D + UNSPEC_LASX_XVFTINT_W_D + UNSPEC_LASX_XVFFINT_S_L + UNSPEC_LASX_XVFTINTRZ_W_D + UNSPEC_LASX_XVFTINTRP_W_D + UNSPEC_LASX_XVFTINTRM_W_D + UNSPEC_LASX_XVFTINTRNE_W_D + UNSPEC_LASX_XVFTINTH_L_S + UNSPEC_LASX_XVFTINTL_L_S + UNSPEC_LASX_XVFFINTH_D_W + UNSPEC_LASX_XVFFINTL_D_W + UNSPEC_LASX_XVFTINTRZH_L_S + UNSPEC_LASX_XVFTINTRZL_L_S + UNSPEC_LASX_XVFTINTRPH_L_S + UNSPEC_LASX_XVFTINTRPL_L_S + UNSPEC_LASX_XVFTINTRMH_L_S + UNSPEC_LASX_XVFTINTRML_L_S + UNSPEC_LASX_XVFTINTRNEL_L_S + UNSPEC_LASX_XVFTINTRNEH_L_S + UNSPEC_LASX_XVFRINTRNE_S + UNSPEC_LASX_XVFRINTRNE_D + UNSPEC_LASX_XVFRINTRZ_S + UNSPEC_LASX_XVFRINTRZ_D + UNSPEC_LASX_XVFRINTRP_S + UNSPEC_LASX_XVFRINTRP_D + UNSPEC_LASX_XVFRINTRM_S + UNSPEC_LASX_XVFRINTRM_D + UNSPEC_LASX_XVREPLVE0_Q + UNSPEC_LASX_XVPERM_W + UNSPEC_LASX_XVPERMI_Q + UNSPEC_LASX_XVPERMI_D + + UNSPEC_LASX_XVADDWEV + UNSPEC_LASX_XVADDWEV2 + UNSPEC_LASX_XVADDWEV3 + UNSPEC_LASX_XVSUBWEV + UNSPEC_LASX_XVSUBWEV2 + UNSPEC_LASX_XVMULWEV + UNSPEC_LASX_XVMULWEV2 + UNSPEC_LASX_XVMULWEV3 + UNSPEC_LASX_XVADDWOD + UNSPEC_LASX_XVADDWOD2 + UNSPEC_LASX_XVADDWOD3 + UNSPEC_LASX_XVSUBWOD + UNSPEC_LASX_XVSUBWOD2 + UNSPEC_LASX_XVMULWOD + UNSPEC_LASX_XVMULWOD2 + UNSPEC_LASX_XVMULWOD3 + UNSPEC_LASX_XVMADDWEV + UNSPEC_LASX_XVMADDWEV2 + UNSPEC_LASX_XVMADDWEV3 + UNSPEC_LASX_XVMADDWOD + UNSPEC_LASX_XVMADDWOD2 + UNSPEC_LASX_XVMADDWOD3 + UNSPEC_LASX_XVHADDW_Q_D + UNSPEC_LASX_XVHSUBW_Q_D + UNSPEC_LASX_XVHADDW_QU_DU + UNSPEC_LASX_XVHSUBW_QU_DU + UNSPEC_LASX_XVROTR + UNSPEC_LASX_XVADD_Q + UNSPEC_LASX_XVSUB_Q + UNSPEC_LASX_XVREPLVE + UNSPEC_LASX_XVSHUF4 + UNSPEC_LASX_XVMSKGEZ + UNSPEC_LASX_XVMSKNZ + UNSPEC_LASX_XVEXTH_Q_D + UNSPEC_LASX_XVEXTH_QU_DU + UNSPEC_LASX_XVEXTL_Q_D + 
UNSPEC_LASX_XVSRLNI + UNSPEC_LASX_XVSRLRNI + UNSPEC_LASX_XVSSRLNI + UNSPEC_LASX_XVSSRLNI2 + UNSPEC_LASX_XVSSRLRNI + UNSPEC_LASX_XVSSRLRNI2 + UNSPEC_LASX_XVSRANI + UNSPEC_LASX_XVSRARNI + UNSPEC_LASX_XVSSRANI + UNSPEC_LASX_XVSSRANI2 + UNSPEC_LASX_XVSSRARNI + UNSPEC_LASX_XVSSRARNI2 + UNSPEC_LASX_XVPERMI + UNSPEC_LASX_XVINSVE0 + UNSPEC_LASX_XVPICKVE + UNSPEC_LASX_XVSSRLN + UNSPEC_LASX_XVSSRLRN + UNSPEC_LASX_XVEXTL_QU_DU + UNSPEC_LASX_XVLDI + UNSPEC_LASX_XVLDX + UNSPEC_LASX_XVSTX +]) + +;; All vector modes with 256 bits. +(define_mode_iterator LASX [V4DF V8SF V4DI V8SI V16HI V32QI]) + +;; Same as LASX. Used by vcond to iterate two modes. +(define_mode_iterator LASX_2 [V4DF V8SF V4DI V8SI V16HI V32QI]) + +;; Only used for splitting insert_d and copy_{u,s}.d. +(define_mode_iterator LASX_D [V4DI V4DF]) + +;; Only used for splitting insert_d and copy_{u,s}.d. +(define_mode_iterator LASX_WD [V4DI V4DF V8SI V8SF]) + +;; Only used for copy256_{u,s}.w. +(define_mode_iterator LASX_W [V8SI V8SF]) + +;; Only integer modes in LASX. +(define_mode_iterator ILASX [V4DI V8SI V16HI V32QI]) + +;; As ILASX but excludes V32QI. +(define_mode_iterator ILASX_DWH [V4DI V8SI V16HI]) + +;; As LASX but excludes V32QI. +(define_mode_iterator LASX_DWH [V4DF V8SF V4DI V8SI V16HI]) + +;; As ILASX but excludes V4DI. +(define_mode_iterator ILASX_WHB [V8SI V16HI V32QI]) + +;; Only integer modes equal or larger than a word. +(define_mode_iterator ILASX_DW [V4DI V8SI]) + +;; Only integer modes smaller than a word. +(define_mode_iterator ILASX_HB [V16HI V32QI]) + +;; Only floating-point modes in LASX. +(define_mode_iterator FLASX [V4DF V8SF]) + +;; Only used for immediate set shuffle elements instruction. +(define_mode_iterator LASX_WHB_W [V8SI V16HI V32QI V8SF]) + +;; The attribute gives the integer vector mode with same size in Loongson ASX. +(define_mode_attr VIMODE256 + [(V4DF "V4DI") + (V8SF "V8SI") + (V4DI "V4DI") + (V8SI "V8SI") + (V16HI "V16HI") + (V32QI "V32QI")]) + +;;attribute gives half modes for vector modes. +;;attribute gives half modes (Same Size) for vector modes. +(define_mode_attr VHSMODE256 + [(V16HI "V32QI") + (V8SI "V16HI") + (V4DI "V8SI")]) + +;;attribute gives half modes for vector modes. +(define_mode_attr VHMODE256 + [(V32QI "V16QI") + (V16HI "V8HI") + (V8SI "V4SI") + (V4DI "V2DI")]) + +;;attribute gives half float modes for vector modes. +(define_mode_attr VFHMODE256 + [(V8SF "V4SF") + (V4DF "V2DF")]) + +;; The attribute gives double modes for vector modes in LASX. +(define_mode_attr VDMODE256 + [(V8SI "V4DI") + (V16HI "V8SI") + (V32QI "V16HI")]) + +;; extended from VDMODE256 +(define_mode_attr VDMODEEXD256 + [(V4DI "V4DI") + (V8SI "V4DI") + (V16HI "V8SI") + (V32QI "V16HI")]) + +;; The attribute gives half modes with same number of elements for vector modes. +(define_mode_attr VTRUNCMODE256 + [(V16HI "V16QI") + (V8SI "V8HI") + (V4DI "V4SI")]) + +;; Double-sized Vector MODE with same elemet type. "Vector, Enlarged-MODE" +(define_mode_attr VEMODE256 + [(V8SF "V16SF") + (V8SI "V16SI") + (V4DF "V8DF") + (V4DI "V8DI")]) + +;; This attribute gives the mode of the result for "copy_s_b, copy_u_b" etc. +(define_mode_attr VRES256 + [(V4DF "DF") + (V8SF "SF") + (V4DI "DI") + (V8SI "SI") + (V16HI "SI") + (V32QI "SI")]) + +;; Only used with LASX_D iterator. +(define_mode_attr lasx_d + [(V4DI "reg_or_0") + (V4DF "register")]) + +;; This attribute gives the 256 bit integer vector mode with same size. 
+(define_mode_attr mode256_i + [(V4DF "v4di") + (V8SF "v8si") + (V4DI "v4di") + (V8SI "v8si") + (V16HI "v16hi") + (V32QI "v32qi")]) + + +;; This attribute gives the 256 bit float vector mode with same size. +(define_mode_attr mode256_f + [(V4DF "v4df") + (V8SF "v8sf") + (V4DI "v4df") + (V8SI "v8sf")]) + + ;; This attribute gives suffix for LASX instructions. HOW? +(define_mode_attr lasxfmt + [(V4DF "d") + (V8SF "w") + (V4DI "d") + (V8SI "w") + (V16HI "h") + (V32QI "b")]) + +(define_mode_attr flasxfmt + [(V4DF "d") + (V8SF "s")]) + +(define_mode_attr lasxfmt_u + [(V4DF "du") + (V8SF "wu") + (V4DI "du") + (V8SI "wu") + (V16HI "hu") + (V32QI "bu")]) + +(define_mode_attr ilasxfmt + [(V4DF "l") + (V8SF "w")]) + +(define_mode_attr ilasxfmt_u + [(V4DF "lu") + (V8SF "wu")]) + +;; This attribute gives suffix for integers in VHMODE256. +(define_mode_attr hlasxfmt + [(V4DI "w") + (V8SI "h") + (V16HI "b")]) + +(define_mode_attr hlasxfmt_u + [(V4DI "wu") + (V8SI "hu") + (V16HI "bu")]) + +;; This attribute gives suffix for integers in VHSMODE256. +(define_mode_attr hslasxfmt + [(V4DI "w") + (V8SI "h") + (V16HI "b")]) + +;; This attribute gives define_insn suffix for LASX instructions that need +;; distinction between integer and floating point. +(define_mode_attr lasxfmt_f + [(V4DF "d_f") + (V8SF "w_f") + (V4DI "d") + (V8SI "w") + (V16HI "h") + (V32QI "b")]) + +(define_mode_attr flasxfmt_f + [(V4DF "d_f") + (V8SF "s_f") + (V4DI "d") + (V8SI "w") + (V16HI "h") + (V32QI "b")]) + +;; This attribute gives define_insn suffix for LASX instructions that need +;; distinction between integer and floating point. +(define_mode_attr lasxfmt_f_wd + [(V4DF "d_f") + (V8SF "w_f") + (V4DI "d") + (V8SI "w")]) + +;; This attribute gives suffix for integers in VHMODE256. +(define_mode_attr dlasxfmt + [(V8SI "d") + (V16HI "w") + (V32QI "h")]) + +(define_mode_attr dlasxfmt_u + [(V8SI "du") + (V16HI "wu") + (V32QI "hu")]) + +;; for VDMODEEXD256 +(define_mode_attr dlasxqfmt + [(V4DI "q") + (V8SI "d") + (V16HI "w") + (V32QI "h")]) + +;; This is used to form an immediate operand constraint using +;; "const__operand". +(define_mode_attr indeximm256 + [(V4DF "0_to_3") + (V8SF "0_to_7") + (V4DI "0_to_3") + (V8SI "0_to_7") + (V16HI "uimm4") + (V32QI "uimm5")]) + +;; This is used to form an immediate operand constraint using to ref high half +;; "const__operand". +(define_mode_attr indeximm_hi + [(V4DF "2_or_3") + (V8SF "4_to_7") + (V4DI "2_or_3") + (V8SI "4_to_7") + (V16HI "8_to_15") + (V32QI "16_to_31")]) + +;; This is used to form an immediate operand constraint using to ref low half +;; "const__operand". +(define_mode_attr indeximm_lo + [(V4DF "0_or_1") + (V8SF "0_to_3") + (V4DI "0_or_1") + (V8SI "0_to_3") + (V16HI "uimm3") + (V32QI "uimm4")]) + +;; This attribute represents bitmask needed for vec_merge using in lasx +;; "const__operand". +(define_mode_attr bitmask256 + [(V4DF "exp_4") + (V8SF "exp_8") + (V4DI "exp_4") + (V8SI "exp_8") + (V16HI "exp_16") + (V32QI "exp_32")]) + +;; This attribute represents bitmask needed for vec_merge using to ref low half +;; "const__operand". +(define_mode_attr bitmask_lo + [(V4DF "exp_2") + (V8SF "exp_4") + (V4DI "exp_2") + (V8SI "exp_4") + (V16HI "exp_8") + (V32QI "exp_16")]) + + +;; This attribute is used to form an immediate operand constraint using +;; "const__operand". 
+(define_mode_attr bitimm256 + [(V32QI "uimm3") + (V16HI "uimm4") + (V8SI "uimm5") + (V4DI "uimm6")]) + + +(define_mode_attr d2lasxfmt + [(V8SI "q") + (V16HI "d") + (V32QI "w")]) + +(define_mode_attr d2lasxfmt_u + [(V8SI "qu") + (V16HI "du") + (V32QI "wu")]) + +(define_mode_attr VD2MODE256 + [(V8SI "V4DI") + (V16HI "V4DI") + (V32QI "V8SI")]) + +(define_mode_attr lasxfmt_wd + [(V4DI "d") + (V8SI "w") + (V16HI "w") + (V32QI "w")]) + +(define_int_iterator FRINT256_S [UNSPEC_LASX_XVFRINTRP_S + UNSPEC_LASX_XVFRINTRZ_S + UNSPEC_LASX_XVFRINT + UNSPEC_LASX_XVFRINTRM_S]) + +(define_int_iterator FRINT256_D [UNSPEC_LASX_XVFRINTRP_D + UNSPEC_LASX_XVFRINTRZ_D + UNSPEC_LASX_XVFRINT + UNSPEC_LASX_XVFRINTRM_D]) + +(define_int_attr frint256_pattern_s + [(UNSPEC_LASX_XVFRINTRP_S "ceil") + (UNSPEC_LASX_XVFRINTRZ_S "btrunc") + (UNSPEC_LASX_XVFRINT "rint") + (UNSPEC_LASX_XVFRINTRM_S "floor")]) + +(define_int_attr frint256_pattern_d + [(UNSPEC_LASX_XVFRINTRP_D "ceil") + (UNSPEC_LASX_XVFRINTRZ_D "btrunc") + (UNSPEC_LASX_XVFRINT "rint") + (UNSPEC_LASX_XVFRINTRM_D "floor")]) + +(define_int_attr frint256_suffix + [(UNSPEC_LASX_XVFRINTRP_S "rp") + (UNSPEC_LASX_XVFRINTRP_D "rp") + (UNSPEC_LASX_XVFRINTRZ_S "rz") + (UNSPEC_LASX_XVFRINTRZ_D "rz") + (UNSPEC_LASX_XVFRINT "") + (UNSPEC_LASX_XVFRINTRM_S "rm") + (UNSPEC_LASX_XVFRINTRM_D "rm")]) + +(define_expand "vec_init" + [(match_operand:LASX 0 "register_operand") + (match_operand:LASX 1 "")] + "ISA_HAS_LASX" +{ + loongarch_expand_vector_init (operands[0], operands[1]); + DONE; +}) + +(define_expand "vec_initv32qiv16qi" + [(match_operand:V32QI 0 "register_operand") + (match_operand:V16QI 1 "")] + "ISA_HAS_LASX" +{ + loongarch_expand_vector_group_init (operands[0], operands[1]); + DONE; +}) + +;; FIXME: Delete. +(define_insn "vec_pack_trunc_" + [(set (match_operand: 0 "register_operand" "=f") + (vec_concat: + (truncate: + (match_operand:ILASX_DWH 1 "register_operand" "f")) + (truncate: + (match_operand:ILASX_DWH 2 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvpickev.\t%u0,%u2,%u1\n\txvpermi.d\t%u0,%u0,0xd8" + [(set_attr "type" "simd_permute") + (set_attr "mode" "") + (set_attr "length" "8")]) + +(define_expand "vec_unpacks_hi_v8sf" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (float_extend:V4DF + (vec_select:V4SF + (match_operand:V8SF 1 "register_operand" "f") + (match_dup 2))))] + "ISA_HAS_LASX" +{ + operands[2] = loongarch_lsx_vec_parallel_const_half (V8SFmode, + true/*high_p*/); +}) + +(define_expand "vec_unpacks_lo_v8sf" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (float_extend:V4DF + (vec_select:V4SF + (match_operand:V8SF 1 "register_operand" "f") + (match_dup 2))))] + "ISA_HAS_LASX" +{ + operands[2] = loongarch_lsx_vec_parallel_const_half (V8SFmode, + false/*high_p*/); +}) + +(define_expand "vec_unpacks_hi_" + [(match_operand: 0 "register_operand") + (match_operand:ILASX_WHB 1 "register_operand")] + "ISA_HAS_LASX" +{ + loongarch_expand_vec_unpack (operands, false/*unsigned_p*/, + true/*high_p*/); + DONE; +}) + +(define_expand "vec_unpacks_lo_" + [(match_operand: 0 "register_operand") + (match_operand:ILASX_WHB 1 "register_operand")] + "ISA_HAS_LASX" +{ + loongarch_expand_vec_unpack (operands, false/*unsigned_p*/, false/*high_p*/); + DONE; +}) + +(define_expand "vec_unpacku_hi_" + [(match_operand: 0 "register_operand") + (match_operand:ILASX_WHB 1 "register_operand")] + "ISA_HAS_LASX" +{ + loongarch_expand_vec_unpack (operands, true/*unsigned_p*/, true/*high_p*/); + DONE; +}) + +(define_expand "vec_unpacku_lo_" + [(match_operand: 0 
"register_operand") + (match_operand:ILASX_WHB 1 "register_operand")] + "ISA_HAS_LASX" +{ + loongarch_expand_vec_unpack (operands, true/*unsigned_p*/, false/*high_p*/); + DONE; +}) + +(define_insn "lasx_xvinsgr2vr_" + [(set (match_operand:ILASX_DW 0 "register_operand" "=f") + (vec_merge:ILASX_DW + (vec_duplicate:ILASX_DW + (match_operand: 1 "reg_or_0_operand" "rJ")) + (match_operand:ILASX_DW 2 "register_operand" "0") + (match_operand 3 "const__operand" "")))] + "ISA_HAS_LASX" +{ +#if 0 + if (!TARGET_64BIT && (mode == V4DImode || mode == V4DFmode)) + return "#"; + else +#endif + return "xvinsgr2vr.\t%u0,%z1,%y3"; +} + [(set_attr "type" "simd_insert") + (set_attr "mode" "")]) + +(define_insn "vec_concatv4di" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (vec_concat:V4DI + (match_operand:V2DI 1 "register_operand" "0") + (match_operand:V2DI 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return "xvpermi.q\t%u0,%u2,0x20"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "V4DI")]) + +(define_insn "vec_concatv8si" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (vec_concat:V8SI + (match_operand:V4SI 1 "register_operand" "0") + (match_operand:V4SI 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return "xvpermi.q\t%u0,%u2,0x20"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "V4DI")]) + +(define_insn "vec_concatv16hi" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "0") + (match_operand:V8HI 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return "xvpermi.q\t%u0,%u2,0x20"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "V4DI")]) + +(define_insn "vec_concatv32qi" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "0") + (match_operand:V16QI 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return "xvpermi.q\t%u0,%u2,0x20"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "V4DI")]) + +(define_insn "vec_concatv4df" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (vec_concat:V4DF + (match_operand:V2DF 1 "register_operand" "0") + (match_operand:V2DF 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return "xvpermi.q\t%u0,%u2,0x20"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "V4DF")]) + +(define_insn "vec_concatv8sf" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (vec_concat:V8SF + (match_operand:V4SF 1 "register_operand" "0") + (match_operand:V4SF 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return "xvpermi.q\t%u0,%u2,0x20"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "V4DI")]) + +;; xshuf.w +(define_insn "lasx_xvperm_" + [(set (match_operand:LASX_W 0 "register_operand" "=f") + (unspec:LASX_W + [(match_operand:LASX_W 1 "nonimmediate_operand" "f") + (match_operand:V8SI 2 "register_operand" "f")] + UNSPEC_LASX_XVPERM_W))] + "ISA_HAS_LASX" + "xvperm.w\t%u0,%u1,%u2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +;; xvpermi.d +(define_insn "lasx_xvpermi_d_" + [(set (match_operand:LASX 0 "register_operand" "=f") + (unspec:LASX + [(match_operand:LASX 1 "register_operand" "f") + (match_operand:SI 2 "const_uimm8_operand")] + UNSPEC_LASX_XVPERMI_D))] + "ISA_HAS_LASX" + "xvpermi.d\t%u0,%u1,%2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lasx_xvpermi_d__1" + [(set (match_operand:LASX_D 0 "register_operand" "=f") + (vec_select:LASX_D + (match_operand:LASX_D 1 "register_operand" "f") + (parallel [(match_operand 2 
"const_0_to_3_operand") + (match_operand 3 "const_0_to_3_operand") + (match_operand 4 "const_0_to_3_operand") + (match_operand 5 "const_0_to_3_operand")])))] + "ISA_HAS_LASX" +{ + int mask = 0; + mask |= INTVAL (operands[2]) << 0; + mask |= INTVAL (operands[3]) << 2; + mask |= INTVAL (operands[4]) << 4; + mask |= INTVAL (operands[5]) << 6; + operands[2] = GEN_INT (mask); + return "xvpermi.d\t%u0,%u1,%2"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +;; xvpermi.q +(define_insn "lasx_xvpermi_q_" + [(set (match_operand:LASX 0 "register_operand" "=f") + (unspec:LASX + [(match_operand:LASX 1 "register_operand" "0") + (match_operand:LASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand")] + UNSPEC_LASX_XVPERMI_Q))] + "ISA_HAS_LASX" + "xvpermi.q\t%u0,%u2,%3" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lasx_xvpickve2gr_d" + [(set (match_operand:DI 0 "register_operand" "=r") + (any_extend:DI + (vec_select:DI + (match_operand:V4DI 1 "register_operand" "f") + (parallel [(match_operand 2 "const_0_to_3_operand" "")]))))] + "ISA_HAS_LASX" + "xvpickve2gr.d\t%0,%u1,%2" + [(set_attr "type" "simd_copy") + (set_attr "mode" "V4DI")]) + +(define_expand "vec_set" + [(match_operand:ILASX_DW 0 "register_operand") + (match_operand: 1 "reg_or_0_operand") + (match_operand 2 "const__operand")] + "ISA_HAS_LASX" +{ + rtx index = GEN_INT (1 << INTVAL (operands[2])); + emit_insn (gen_lasx_xvinsgr2vr_ (operands[0], operands[1], + operands[0], index)); + DONE; +}) + +(define_expand "vec_set" + [(match_operand:FLASX 0 "register_operand") + (match_operand: 1 "reg_or_0_operand") + (match_operand 2 "const__operand")] + "ISA_HAS_LASX" +{ + rtx index = GEN_INT (1 << INTVAL (operands[2])); + emit_insn (gen_lasx_xvinsve0__scalar (operands[0], operands[1], + operands[0], index)); + DONE; +}) + +(define_expand "vec_extract" + [(match_operand: 0 "register_operand") + (match_operand:LASX 1 "register_operand") + (match_operand 2 "const__operand")] + "ISA_HAS_LASX" +{ + loongarch_expand_vector_extract (operands[0], operands[1], + INTVAL (operands[2])); + DONE; +}) + +(define_expand "vec_perm" + [(match_operand:LASX 0 "register_operand") + (match_operand:LASX 1 "register_operand") + (match_operand:LASX 2 "register_operand") + (match_operand: 3 "register_operand")] + "ISA_HAS_LASX" +{ + loongarch_expand_vec_perm_1 (operands); + DONE; +}) + +;; FIXME: 256?? +(define_expand "vcondu" + [(match_operand:LASX 0 "register_operand") + (match_operand:LASX 1 "reg_or_m1_operand") + (match_operand:LASX 2 "reg_or_0_operand") + (match_operator 3 "" + [(match_operand:ILASX 4 "register_operand") + (match_operand:ILASX 5 "register_operand")])] + "ISA_HAS_LASX + && (GET_MODE_NUNITS (mode) + == GET_MODE_NUNITS (mode))" +{ + loongarch_expand_vec_cond_expr (mode, mode, + operands); + DONE; +}) + +;; FIXME: 256?? 
+(define_expand "vcond" + [(match_operand:LASX 0 "register_operand") + (match_operand:LASX 1 "reg_or_m1_operand") + (match_operand:LASX 2 "reg_or_0_operand") + (match_operator 3 "" + [(match_operand:LASX_2 4 "register_operand") + (match_operand:LASX_2 5 "register_operand")])] + "ISA_HAS_LASX + && (GET_MODE_NUNITS (mode) + == GET_MODE_NUNITS (mode))" +{ + loongarch_expand_vec_cond_expr (mode, mode, + operands); + DONE; +}) + +;; Same as vcond_ +(define_expand "vcond_mask_" + [(match_operand:ILASX 0 "register_operand") + (match_operand:ILASX 1 "reg_or_m1_operand") + (match_operand:ILASX 2 "reg_or_0_operand") + (match_operand:ILASX 3 "register_operand")] + "ISA_HAS_LASX" +{ + loongarch_expand_vec_cond_mask_expr (mode, + mode, operands); + DONE; +}) + +(define_expand "lasx_xvrepli" + [(match_operand:ILASX 0 "register_operand") + (match_operand 1 "const_imm10_operand")] + "ISA_HAS_LASX" +{ + if (mode == V32QImode) + operands[1] = GEN_INT (trunc_int_for_mode (INTVAL (operands[1]), + mode)); + emit_move_insn (operands[0], + loongarch_gen_const_int_vector (mode, INTVAL (operands[1]))); + DONE; +}) + +(define_expand "mov" + [(set (match_operand:LASX 0) + (match_operand:LASX 1))] + "ISA_HAS_LASX" +{ + if (loongarch_legitimize_move (mode, operands[0], operands[1])) + DONE; +}) + + +(define_expand "movmisalign" + [(set (match_operand:LASX 0) + (match_operand:LASX 1))] + "ISA_HAS_LASX" +{ + if (loongarch_legitimize_move (mode, operands[0], operands[1])) + DONE; +}) + +;; 256-bit LASX modes can only exist in LASX registers or memory. +(define_insn "mov_lasx" + [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f") + (match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))] + "ISA_HAS_LASX" + { return loongarch_output_move (operands[0], operands[1]); } + [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert") + (set_attr "mode" "") + (set_attr "length" "8,4,4,4,4")]) + + +(define_split + [(set (match_operand:LASX 0 "nonimmediate_operand") + (match_operand:LASX 1 "move_operand"))] + "reload_completed && ISA_HAS_LASX + && loongarch_split_move_insn_p (operands[0], operands[1])" + [(const_int 0)] +{ + loongarch_split_move_insn (operands[0], operands[1], curr_insn); + DONE; +}) + +;; Offset load +(define_expand "lasx_mxld_" + [(match_operand:LASX 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq10_operand")] + "ISA_HAS_LASX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move (operands[0], gen_rtx_MEM (mode, addr)); + DONE; +}) + +;; Offset store +(define_expand "lasx_mxst_" + [(match_operand:LASX 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq10_operand")] + "ISA_HAS_LASX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move (gen_rtx_MEM (mode, addr), operands[0]); + DONE; +}) + +;; LASX +(define_insn "add3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f,f") + (plus:ILASX + (match_operand:ILASX 1 "register_operand" "f,f,f") + (match_operand:ILASX 2 "reg_or_vector_same_ximm5_operand" "f,Unv5,Uuv5")))] + "ISA_HAS_LASX" +{ + switch (which_alternative) + { + case 0: + return "xvadd.\t%u0,%u1,%u2"; + case 1: + { + HOST_WIDE_INT val = INTVAL (CONST_VECTOR_ELT (operands[2], 0)); + + operands[2] = GEN_INT (-val); + return "xvsubi.\t%u0,%u1,%d2"; + } + case 2: + return "xvaddi.\t%u0,%u1,%E2"; + default: + gcc_unreachable (); + } +} + [(set_attr "alu_type" "simd_add") + 
(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "sub3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (minus:ILASX + (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_uimm5_operand" "f,Uuv5")))] + "ISA_HAS_LASX" + "@ + xvsub.\t%u0,%u1,%u2 + xvsubi.\t%u0,%u1,%E2" + [(set_attr "alu_type" "simd_add") + (set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "mul3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (mult:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvmul.\t%u0,%u1,%u2" + [(set_attr "type" "simd_mul") + (set_attr "mode" "")]) + +(define_insn "lasx_xvmadd_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (plus:ILASX (mult:ILASX (match_operand:ILASX 2 "register_operand" "f") + (match_operand:ILASX 3 "register_operand" "f")) + (match_operand:ILASX 1 "register_operand" "0")))] + "ISA_HAS_LASX" + "xvmadd.\t%u0,%u2,%u3" + [(set_attr "type" "simd_mul") + (set_attr "mode" "")]) + + + +(define_insn "lasx_xvmsub_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (minus:ILASX (match_operand:ILASX 1 "register_operand" "0") + (mult:ILASX (match_operand:ILASX 2 "register_operand" "f") + (match_operand:ILASX 3 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvmsub.\t%u0,%u2,%u3" + [(set_attr "type" "simd_mul") + (set_attr "mode" "")]) + +(define_insn "div3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (div:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return loongarch_lsx_output_division ("xvdiv.\t%u0,%u1,%u2", + operands); +} + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "udiv3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (udiv:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return loongarch_lsx_output_division ("xvdiv.\t%u0,%u1,%u2", + operands); +} + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "mod3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (mod:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return loongarch_lsx_output_division ("xvmod.\t%u0,%u1,%u2", + operands); +} + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "umod3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (umod:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" +{ + return loongarch_lsx_output_division ("xvmod.\t%u0,%u1,%u2", + operands); +} + [(set_attr "type" "simd_div") + (set_attr "mode" "")]) + +(define_insn "xor3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f,f") + (xor:ILASX + (match_operand:ILASX 1 "register_operand" "f,f,f") + (match_operand:ILASX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))] + "ISA_HAS_LASX" + "@ + xvxor.v\t%u0,%u1,%u2 + xvbitrevi.%v0\t%u0,%u1,%V2 + xvxori.b\t%u0,%u1,%B2" + [(set_attr "type" "simd_logic,simd_bit,simd_logic") + (set_attr "mode" "")]) + +(define_insn "ior3" + [(set (match_operand:LASX 0 "register_operand" "=f,f,f") + (ior:LASX + (match_operand:LASX 1 "register_operand" "f,f,f") + (match_operand:LASX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))] + "ISA_HAS_LASX" + "@ + xvor.v\t%u0,%u1,%u2 + xvbitseti.%v0\t%u0,%u1,%V2 + 
xvori.b\t%u0,%u1,%B2" + [(set_attr "type" "simd_logic,simd_bit,simd_logic") + (set_attr "mode" "")]) + +(define_insn "and3" + [(set (match_operand:LASX 0 "register_operand" "=f,f,f") + (and:LASX + (match_operand:LASX 1 "register_operand" "f,f,f") + (match_operand:LASX 2 "reg_or_vector_same_val_operand" "f,YZ,Urv8")))] + "ISA_HAS_LASX" +{ + switch (which_alternative) + { + case 0: + return "xvand.v\t%u0,%u1,%u2"; + case 1: + { + rtx elt0 = CONST_VECTOR_ELT (operands[2], 0); + unsigned HOST_WIDE_INT val = ~UINTVAL (elt0); + operands[2] = loongarch_gen_const_int_vector (mode, val & (-val)); + return "xvbitclri.%v0\t%u0,%u1,%V2"; + } + case 2: + return "xvandi.b\t%u0,%u1,%B2"; + default: + gcc_unreachable (); + } +} + [(set_attr "type" "simd_logic,simd_bit,simd_logic") + (set_attr "mode" "")]) + +(define_insn "one_cmpl2" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (not:ILASX (match_operand:ILASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvnor.v\t%u0,%u1,%u1" + [(set_attr "type" "simd_logic") + (set_attr "mode" "V32QI")]) + +;; LASX +(define_insn "vlshr3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (lshiftrt:ILASX + (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + "ISA_HAS_LASX" + "@ + xvsrl.\t%u0,%u1,%u2 + xvsrli.\t%u0,%u1,%E2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +;; LASX ">>" +(define_insn "vashr3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (ashiftrt:ILASX + (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + "ISA_HAS_LASX" + "@ + xvsra.\t%u0,%u1,%u2 + xvsrai.\t%u0,%u1,%E2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +;; LASX "<<" +(define_insn "vashl3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (ashift:ILASX + (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + "ISA_HAS_LASX" + "@ + xvsll.\t%u0,%u1,%u2 + xvslli.\t%u0,%u1,%E2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + + +(define_insn "add3" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (plus:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfadd.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fadd") + (set_attr "mode" "")]) + +(define_insn "sub3" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (minus:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfsub.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fadd") + (set_attr "mode" "")]) + +(define_insn "mul3" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (mult:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfmul.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fmul") + (set_attr "mode" "")]) + +(define_insn "div3" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (div:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfdiv.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +(define_insn "fma4" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (fma:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f") + (match_operand:FLASX 3 "register_operand" "f")))] 
+ "ISA_HAS_LASX" + "xvfmadd.\t%u0,%u1,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "fnma4" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (fma:FLASX (neg:FLASX (match_operand:FLASX 1 "register_operand" "f")) + (match_operand:FLASX 2 "register_operand" "f") + (match_operand:FLASX 3 "register_operand" "0")))] + "ISA_HAS_LASX" + "xvfnmsub.\t%u0,%u1,%u2,%u0" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "sqrt2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (sqrt:FLASX (match_operand:FLASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfsqrt.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +(define_insn "lasx_xvadda_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (plus:ILASX (abs:ILASX (match_operand:ILASX 1 "register_operand" "f")) + (abs:ILASX (match_operand:ILASX 2 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvadda.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "ssadd3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (ss_plus:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvsadd.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "usadd3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (us_plus:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvsadd.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvabsd_s_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVABSD_S))] + "ISA_HAS_LASX" + "xvabsd.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvabsd_u_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVABSD_U))] + "ISA_HAS_LASX" + "xvabsd.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvavg_s_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVAVG_S))] + "ISA_HAS_LASX" + "xvavg.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvavg_u_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVAVG_U))] + "ISA_HAS_LASX" + "xvavg.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvavgr_s_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVAVGR_S))] + "ISA_HAS_LASX" + "xvavgr.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvavgr_u_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVAVGR_U))] + "ISA_HAS_LASX" + 
"xvavgr.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbitclr_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVBITCLR))] + "ISA_HAS_LASX" + "xvbitclr.\t%u0,%u1,%u2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbitclri_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVBITCLRI))] + "ISA_HAS_LASX" + "xvbitclri.\t%u0,%u1,%2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbitrev_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVBITREV))] + "ISA_HAS_LASX" + "xvbitrev.\t%u0,%u1,%u2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbitrevi_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVBITREVI))] + "ISA_HAS_LASX" + "xvbitrevi.\t%u0,%u1,%2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbitsel_" + [(set (match_operand:LASX 0 "register_operand" "=f") + (ior:LASX (and:LASX (not:LASX + (match_operand:LASX 3 "register_operand" "f")) + (match_operand:LASX 1 "register_operand" "f")) + (and:LASX (match_dup 3) + (match_operand:LASX 2 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvbitsel.v\t%u0,%u1,%u2,%u3" + [(set_attr "type" "simd_bitmov") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbitseli_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (ior:V32QI (and:V32QI (not:V32QI + (match_operand:V32QI 1 "register_operand" "0")) + (match_operand:V32QI 2 "register_operand" "f")) + (and:V32QI (match_dup 1) + (match_operand:V32QI 3 "const_vector_same_val_operand" "Urv8"))))] + "ISA_HAS_LASX" + "xvbitseli.b\t%u0,%u2,%B3" + [(set_attr "type" "simd_bitmov") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvbitset_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVBITSET))] + "ISA_HAS_LASX" + "xvbitset.\t%u0,%u1,%u2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbitseti_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVBITSETI))] + "ISA_HAS_LASX" + "xvbitseti.\t%u0,%u1,%2" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvs_" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (ICC:ILASX + (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_imm5_operand" "f,Uv5")))] + "ISA_HAS_LASX" + "@ + xvs.\t%u0,%u1,%u2 + xvs.\t%u0,%u1,%E2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_expand "vec_cmp" + [(set (match_operand: 0 "register_operand") + (match_operator 1 "" + [(match_operand:LASX 2 "register_operand") + (match_operand:LASX 3 "register_operand")]))] + "ISA_HAS_LASX" +{ + bool ok = loongarch_expand_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpu" + 
[(set (match_operand: 0 "register_operand") + (match_operator 1 "" + [(match_operand:ILASX 2 "register_operand") + (match_operand:ILASX 3 "register_operand")]))] + "ISA_HAS_LASX" +{ + bool ok = loongarch_expand_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_insn "lasx_xvfclass_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFCLASS))] + "ISA_HAS_LASX" + "xvfclass.\t%u0,%u1" + [(set_attr "type" "simd_fclass") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfcmp_caf_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")] + UNSPEC_LASX_XVFCMP_CAF))] + "ISA_HAS_LASX" + "xvfcmp.caf.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfcmp_cune_" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")] + UNSPEC_LASX_XVFCMP_CUNE))] + "ISA_HAS_LASX" + "xvfcmp.cune.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "")]) + + + +(define_int_iterator FSC256_UNS [UNSPEC_LASX_XVFCMP_SAF UNSPEC_LASX_XVFCMP_SUN + UNSPEC_LASX_XVFCMP_SOR UNSPEC_LASX_XVFCMP_SEQ + UNSPEC_LASX_XVFCMP_SNE UNSPEC_LASX_XVFCMP_SUEQ + UNSPEC_LASX_XVFCMP_SUNE UNSPEC_LASX_XVFCMP_SULE + UNSPEC_LASX_XVFCMP_SULT UNSPEC_LASX_XVFCMP_SLE + UNSPEC_LASX_XVFCMP_SLT]) + +(define_int_attr fsc256 + [(UNSPEC_LASX_XVFCMP_SAF "saf") + (UNSPEC_LASX_XVFCMP_SUN "sun") + (UNSPEC_LASX_XVFCMP_SOR "sor") + (UNSPEC_LASX_XVFCMP_SEQ "seq") + (UNSPEC_LASX_XVFCMP_SNE "sne") + (UNSPEC_LASX_XVFCMP_SUEQ "sueq") + (UNSPEC_LASX_XVFCMP_SUNE "sune") + (UNSPEC_LASX_XVFCMP_SULE "sule") + (UNSPEC_LASX_XVFCMP_SULT "sult") + (UNSPEC_LASX_XVFCMP_SLE "sle") + (UNSPEC_LASX_XVFCMP_SLT "slt")]) + +(define_insn "lasx_xvfcmp__" + [(set (match_operand: 0 "register_operand" "=f") + (vfcond: (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfcmp..\t%u0,%u1,%u2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "")]) + + +(define_insn "lasx_xvfcmp__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")] + FSC256_UNS))] + "ISA_HAS_LASX" + "xvfcmp..\t%u0,%u1,%u2" + [(set_attr "type" "simd_fcmp") + (set_attr "mode" "")]) + + +(define_mode_attr fint256 + [(V8SF "v8si") + (V4DF "v4di")]) + +(define_mode_attr FINTCNV256 + [(V8SF "I2S") + (V4DF "I2D")]) + +(define_mode_attr FINTCNV256_2 + [(V8SF "S2I") + (V4DF "D2I")]) + +(define_insn "float2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (float:FLASX (match_operand: 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvffint..\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "floatuns2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unsigned_float:FLASX + (match_operand: 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvffint..\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_mode_attr FFQ256 + [(V4SF "V16HI") + (V2DF "V8SI")]) + +(define_insn "lasx_xvreplgr2vr_" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (vec_duplicate:ILASX + (match_operand: 1 "reg_or_0_operand" "r,J")))] + "ISA_HAS_LASX" +{ + if (which_alternative == 1) + return "xvldi.b\t%u0,0" 
; + + if (!TARGET_64BIT && (mode == V4DImode || mode == V4DFmode)) + return "#"; + else + return "xvreplgr2vr.\t%u0,%z1"; +} + [(set_attr "type" "simd_fill") + (set_attr "mode" "") + (set_attr "length" "8")]) + +(define_insn "logb2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFLOGB))] + "ISA_HAS_LASX" + "xvflogb.\t%u0,%u1" + [(set_attr "type" "simd_flog2") + (set_attr "mode" "")]) + + +(define_insn "smax3" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (smax:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfmax.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfmaxa_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (if_then_else:FLASX + (gt (abs:FLASX (match_operand:FLASX 1 "register_operand" "f")) + (abs:FLASX (match_operand:FLASX 2 "register_operand" "f"))) + (match_dup 1) + (match_dup 2)))] + "ISA_HAS_LASX" + "xvfmaxa.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "smin3" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (smin:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvfmin.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfmina_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (if_then_else:FLASX + (lt (abs:FLASX (match_operand:FLASX 1 "register_operand" "f")) + (abs:FLASX (match_operand:FLASX 2 "register_operand" "f"))) + (match_dup 1) + (match_dup 2)))] + "ISA_HAS_LASX" + "xvfmina.\t%u0,%u1,%u2" + [(set_attr "type" "simd_fminmax") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfrecip_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRECIP))] + "ISA_HAS_LASX" + "xvfrecip.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfrint_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINT))] + "ISA_HAS_LASX" + "xvfrint.\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfrsqrt_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] + "ISA_HAS_LASX" + "xvfrsqrt.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + +(define_insn "lasx_xvftint_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINT_S))] + "ISA_HAS_LASX" + "xvftint..\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "lasx_xvftint_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINT_U))] + "ISA_HAS_LASX" + "xvftint..\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + + + +(define_insn "fix_trunc2" + [(set (match_operand: 0 "register_operand" "=f") + (fix: (match_operand:FLASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvftintrz..\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + + +(define_insn "fixuns_trunc2" + 
[(set (match_operand: 0 "register_operand" "=f") + (unsigned_fix: (match_operand:FLASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvftintrz..\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "cnv_mode" "") + (set_attr "mode" "")]) + +(define_insn "lasx_xvhw_h_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (addsub:V16HI + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 1 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)]))) + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)])))))] + "ISA_HAS_LASX" + "xvhw.h.b\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvhw_w_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (addsub:V8SI + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 1 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)])))))] + "ISA_HAS_LASX" + "xvhw.w.h\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvhw_d_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (addsub:V4DI + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))))] + "ISA_HAS_LASX" + "xvhw.d.w\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvpackev_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_select:V32QI + (vec_concat:V64QI + (match_operand:V32QI 1 "register_operand" "f") + (match_operand:V32QI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 32) + (const_int 2) (const_int 34) + (const_int 4) (const_int 36) + (const_int 6) (const_int 38) + (const_int 8) (const_int 40) + (const_int 10) (const_int 42) + (const_int 12) (const_int 44) + (const_int 14) (const_int 46) + (const_int 16) (const_int 48) + (const_int 18) (const_int 50) + (const_int 20) (const_int 52) + (const_int 22) (const_int 54) + (const_int 24) (const_int 56) + (const_int 26) (const_int 58) + (const_int 28) (const_int 60) + (const_int 30) (const_int 62)])))] + "ISA_HAS_LASX" + "xvpackev.b\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V32QI")]) + + +(define_insn "lasx_xvpackev_h" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_select:V16HI + (vec_concat:V32HI + (match_operand:V16HI 1 "register_operand" "f") + (match_operand:V16HI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 16) + (const_int 2) (const_int 18) + (const_int 
4) (const_int 20) + (const_int 6) (const_int 22) + (const_int 8) (const_int 24) + (const_int 10) (const_int 26) + (const_int 12) (const_int 28) + (const_int 14) (const_int 30)])))] + "ISA_HAS_LASX" + "xvpackev.h\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvpackev_w" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (vec_select:V8SI + (vec_concat:V16SI + (match_operand:V8SI 1 "register_operand" "f") + (match_operand:V8SI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 8) + (const_int 2) (const_int 10) + (const_int 4) (const_int 12) + (const_int 6) (const_int 14)])))] + "ISA_HAS_LASX" + "xvpackev.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvpackev_w_f" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (vec_select:V8SF + (vec_concat:V16SF + (match_operand:V8SF 1 "register_operand" "f") + (match_operand:V8SF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 8) + (const_int 2) (const_int 10) + (const_int 4) (const_int 12) + (const_int 6) (const_int 14)])))] + "ISA_HAS_LASX" + "xvpackev.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvilvh_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_select:V32QI + (vec_concat:V64QI + (match_operand:V32QI 1 "register_operand" "f") + (match_operand:V32QI 2 "register_operand" "f")) + (parallel [(const_int 8) (const_int 40) + (const_int 9) (const_int 41) + (const_int 10) (const_int 42) + (const_int 11) (const_int 43) + (const_int 12) (const_int 44) + (const_int 13) (const_int 45) + (const_int 14) (const_int 46) + (const_int 15) (const_int 47) + (const_int 24) (const_int 56) + (const_int 25) (const_int 57) + (const_int 26) (const_int 58) + (const_int 27) (const_int 59) + (const_int 28) (const_int 60) + (const_int 29) (const_int 61) + (const_int 30) (const_int 62) + (const_int 31) (const_int 63)])))] + "ISA_HAS_LASX" + "xvilvh.b\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvilvh_h" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_select:V16HI + (vec_concat:V32HI + (match_operand:V16HI 1 "register_operand" "f") + (match_operand:V16HI 2 "register_operand" "f")) + (parallel [(const_int 4) (const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "ISA_HAS_LASX" + "xvilvh.h\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16HI")]) + +(define_mode_attr xvilvh_suffix + [(V8SI "") (V8SF "_f") + (V4DI "") (V4DF "_f")]) + +(define_insn "lasx_xvilvh_w" + [(set (match_operand:LASX_W 0 "register_operand" "=f") + (vec_select:LASX_W + (vec_concat: + (match_operand:LASX_W 1 "register_operand" "f") + (match_operand:LASX_W 2 "register_operand" "f")) + (parallel [(const_int 2) (const_int 10) + (const_int 3) (const_int 11) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "ISA_HAS_LASX" + "xvilvh.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "")]) + +(define_insn "lasx_xvilvh_d" + [(set (match_operand:LASX_D 0 "register_operand" "=f") + (vec_select:LASX_D + (vec_concat: + (match_operand:LASX_D 1 "register_operand" "f") + (match_operand:LASX_D 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 5) + (const_int 3) 
(const_int 7)])))] + "ISA_HAS_LASX" + "xvilvh.d\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "")]) + +(define_insn "lasx_xvpackod_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_select:V32QI + (vec_concat:V64QI + (match_operand:V32QI 1 "register_operand" "f") + (match_operand:V32QI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 33) + (const_int 3) (const_int 35) + (const_int 5) (const_int 37) + (const_int 7) (const_int 39) + (const_int 9) (const_int 41) + (const_int 11) (const_int 43) + (const_int 13) (const_int 45) + (const_int 15) (const_int 47) + (const_int 17) (const_int 49) + (const_int 19) (const_int 51) + (const_int 21) (const_int 53) + (const_int 23) (const_int 55) + (const_int 25) (const_int 57) + (const_int 27) (const_int 59) + (const_int 29) (const_int 61) + (const_int 31) (const_int 63)])))] + "ISA_HAS_LASX" + "xvpackod.b\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V32QI")]) + + +(define_insn "lasx_xvpackod_h" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_select:V16HI + (vec_concat:V32HI + (match_operand:V16HI 1 "register_operand" "f") + (match_operand:V16HI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 17) + (const_int 3) (const_int 19) + (const_int 5) (const_int 21) + (const_int 7) (const_int 23) + (const_int 9) (const_int 25) + (const_int 11) (const_int 27) + (const_int 13) (const_int 29) + (const_int 15) (const_int 31)])))] + "ISA_HAS_LASX" + "xvpackod.h\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16HI")]) + + +(define_insn "lasx_xvpackod_w" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (vec_select:V8SI + (vec_concat:V16SI + (match_operand:V8SI 1 "register_operand" "f") + (match_operand:V8SI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 9) + (const_int 3) (const_int 11) + (const_int 5) (const_int 13) + (const_int 7) (const_int 15)])))] + "ISA_HAS_LASX" + "xvpackod.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SI")]) + + +(define_insn "lasx_xvpackod_w_f" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (vec_select:V8SF + (vec_concat:V16SF + (match_operand:V8SF 1 "register_operand" "f") + (match_operand:V8SF 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 9) + (const_int 3) (const_int 11) + (const_int 5) (const_int 13) + (const_int 7) (const_int 15)])))] + "ISA_HAS_LASX" + "xvpackod.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvilvl_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_select:V32QI + (vec_concat:V64QI + (match_operand:V32QI 1 "register_operand" "f") + (match_operand:V32QI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 32) + (const_int 1) (const_int 33) + (const_int 2) (const_int 34) + (const_int 3) (const_int 35) + (const_int 4) (const_int 36) + (const_int 5) (const_int 37) + (const_int 6) (const_int 38) + (const_int 7) (const_int 39) + (const_int 16) (const_int 48) + (const_int 17) (const_int 49) + (const_int 18) (const_int 50) + (const_int 19) (const_int 51) + (const_int 20) (const_int 52) + (const_int 21) (const_int 53) + (const_int 22) (const_int 54) + (const_int 23) (const_int 55)])))] + "ISA_HAS_LASX" + "xvilvl.b\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvilvl_h" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_select:V16HI + (vec_concat:V32HI + 
(match_operand:V16HI 1 "register_operand" "f") + (match_operand:V16HI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27)])))] + "ISA_HAS_LASX" + "xvilvl.h\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvilvl_w" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (vec_select:V8SI + (vec_concat:V16SI + (match_operand:V8SI 1 "register_operand" "f") + (match_operand:V8SI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 4) (const_int 12) + (const_int 5) (const_int 13)])))] + "ISA_HAS_LASX" + "xvilvl.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvilvl_w_f" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (vec_select:V8SF + (vec_concat:V16SF + (match_operand:V8SF 1 "register_operand" "f") + (match_operand:V8SF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 4) (const_int 12) + (const_int 5) (const_int 13)])))] + "ISA_HAS_LASX" + "xvilvl.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvilvl_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (vec_select:V4DI + (vec_concat:V8DI + (match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 4) + (const_int 2) (const_int 6)])))] + "ISA_HAS_LASX" + "xvilvl.d\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvilvl_d_f" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (vec_select:V4DF + (vec_concat:V8DF + (match_operand:V4DF 1 "register_operand" "f") + (match_operand:V4DF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 4) + (const_int 2) (const_int 6)])))] + "ISA_HAS_LASX" + "xvilvl.d\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V4DF")]) + +(define_insn "smax3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (smax:ILASX (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_simm5_operand" "f,Usv5")))] + "ISA_HAS_LASX" + "@ + xvmax.\t%u0,%u1,%u2 + xvmaxi.\t%u0,%u1,%E2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "umax3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (umax:ILASX (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_uimm5_operand" "f,Uuv5")))] + "ISA_HAS_LASX" + "@ + xvmax.\t%u0,%u1,%u2 + xvmaxi.\t%u0,%u1,%B2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "smin3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (smin:ILASX (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_simm5_operand" "f,Usv5")))] + "ISA_HAS_LASX" + "@ + xvmin.\t%u0,%u1,%u2 + xvmini.\t%u0,%u1,%E2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "umin3" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (umin:ILASX (match_operand:ILASX 1 "register_operand" "f,f") + (match_operand:ILASX 2 "reg_or_vector_same_uimm5_operand" "f,Uuv5")))] + "ISA_HAS_LASX" + "@ + xvmin.\t%u0,%u1,%u2 + 
xvmini.\t%u0,%u1,%B2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvclo_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (clz:ILASX (not:ILASX (match_operand:ILASX 1 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvclo.\t%u0,%u1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "clz2" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (clz:ILASX (match_operand:ILASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvclz.\t%u0,%u1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvnor_" + [(set (match_operand:ILASX 0 "register_operand" "=f,f") + (and:ILASX (not:ILASX (match_operand:ILASX 1 "register_operand" "f,f")) + (not:ILASX (match_operand:ILASX 2 "reg_or_vector_same_val_operand" "f,Urv8"))))] + "ISA_HAS_LASX" + "@ + xvnor.v\t%u0,%u1,%u2 + xvnori.b\t%u0,%u1,%B2" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "lasx_xvpickev_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_select:V32QI + (vec_concat:V64QI + (match_operand:V32QI 1 "register_operand" "f") + (match_operand:V32QI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 32) (const_int 34) + (const_int 36) (const_int 38) + (const_int 40) (const_int 42) + (const_int 44) (const_int 46) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30) + (const_int 48) (const_int 50) + (const_int 52) (const_int 54) + (const_int 56) (const_int 58) + (const_int 60) (const_int 62)])))] + "ISA_HAS_LASX" + "xvpickev.b\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvpickev_h" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_select:V16HI + (vec_concat:V32HI + (match_operand:V16HI 1 "register_operand" "f") + (match_operand:V16HI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)])))] + "ISA_HAS_LASX" + "xvpickev.h\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvpickev_w" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (vec_select:V8SI + (vec_concat:V16SI + (match_operand:V8SI 1 "register_operand" "f") + (match_operand:V8SI 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 8) (const_int 10) + (const_int 4) (const_int 6) + (const_int 12) (const_int 14)])))] + "ISA_HAS_LASX" + "xvpickev.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvpickev_w_f" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (vec_select:V8SF + (vec_concat:V16SF + (match_operand:V8SF 1 "register_operand" "f") + (match_operand:V8SF 2 "register_operand" "f")) + (parallel [(const_int 0) (const_int 2) + (const_int 8) (const_int 10) + (const_int 4) (const_int 6) + (const_int 12) (const_int 14)])))] + "ISA_HAS_LASX" + "xvpickev.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvpickod_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_select:V32QI + (vec_concat:V64QI + (match_operand:V32QI 1 "register_operand" "f") + 
(match_operand:V32QI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 33) (const_int 35) + (const_int 37) (const_int 39) + (const_int 41) (const_int 43) + (const_int 45) (const_int 47) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31) + (const_int 49) (const_int 51) + (const_int 53) (const_int 55) + (const_int 57) (const_int 59) + (const_int 61) (const_int 63)])))] + "ISA_HAS_LASX" + "xvpickod.b\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvpickod_h" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_select:V16HI + (vec_concat:V32HI + (match_operand:V16HI 1 "register_operand" "f") + (match_operand:V16HI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)])))] + "ISA_HAS_LASX" + "xvpickod.h\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvpickod_w" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (vec_select:V8SI + (vec_concat:V16SI + (match_operand:V8SI 1 "register_operand" "f") + (match_operand:V8SI 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 9) (const_int 11) + (const_int 5) (const_int 7) + (const_int 13) (const_int 15)])))] + "ISA_HAS_LASX" + "xvpickod.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvpickod_w_f" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (vec_select:V8SF + (vec_concat:V16SF + (match_operand:V8SF 1 "register_operand" "f") + (match_operand:V8SF 2 "register_operand" "f")) + (parallel [(const_int 1) (const_int 3) + (const_int 9) (const_int 11) + (const_int 5) (const_int 7) + (const_int 13) (const_int 15)])))] + "ISA_HAS_LASX" + "xvpickod.w\t%u0,%u2,%u1" + [(set_attr "type" "simd_permute") + (set_attr "mode" "V8SF")]) + +(define_insn "popcount2" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (popcount:ILASX (match_operand:ILASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvpcnt.\t%u0,%u1" + [(set_attr "type" "simd_pcnt") + (set_attr "mode" "")]) + + +(define_insn "lasx_xvsat_s_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVSAT_S))] + "ISA_HAS_LASX" + "xvsat.\t%u0,%u1,%2" + [(set_attr "type" "simd_sat") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsat_u_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVSAT_U))] + "ISA_HAS_LASX" + "xvsat.\t%u0,%u1,%2" + [(set_attr "type" "simd_sat") + (set_attr "mode" "")]) + +(define_insn "lasx_xvshuf4i_" + [(set (match_operand:LASX_WHB_W 0 "register_operand" "=f") + (unspec:LASX_WHB_W [(match_operand:LASX_WHB_W 1 "register_operand" "f") + (match_operand 2 "const_uimm8_operand")] + UNSPEC_LASX_XVSHUF4I))] + "ISA_HAS_LASX" + "xvshuf4i.\t%u0,%u1,%2" + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lasx_xvshuf4i__1" + [(set (match_operand:LASX_W 0 "register_operand" "=f") + 
(vec_select:LASX_W + (match_operand:LASX_W 1 "nonimmediate_operand" "f") + (parallel [(match_operand 2 "const_0_to_3_operand") + (match_operand 3 "const_0_to_3_operand") + (match_operand 4 "const_0_to_3_operand") + (match_operand 5 "const_0_to_3_operand") + (match_operand 6 "const_4_to_7_operand") + (match_operand 7 "const_4_to_7_operand") + (match_operand 8 "const_4_to_7_operand") + (match_operand 9 "const_4_to_7_operand")])))] + "ISA_HAS_LASX + && INTVAL (operands[2]) + 4 == INTVAL (operands[6]) + && INTVAL (operands[3]) + 4 == INTVAL (operands[7]) + && INTVAL (operands[4]) + 4 == INTVAL (operands[8]) + && INTVAL (operands[5]) + 4 == INTVAL (operands[9])" +{ + int mask = 0; + mask |= INTVAL (operands[2]) << 0; + mask |= INTVAL (operands[3]) << 2; + mask |= INTVAL (operands[4]) << 4; + mask |= INTVAL (operands[5]) << 6; + operands[2] = GEN_INT (mask); + + return "xvshuf4i.w\t%u0,%u1,%2"; +} + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrar_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVSRAR))] + "ISA_HAS_LASX" + "xvsrar.\t%u0,%u1,%u2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrari_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVSRARI))] + "ISA_HAS_LASX" + "xvsrari.\t%u0,%u1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrlr_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVSRLR))] + "ISA_HAS_LASX" + "xvsrlr.\t%u0,%u1,%u2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrlri_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVSRLRI))] + "ISA_HAS_LASX" + "xvsrlri.\t%u0,%u1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssub_s_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (ss_minus:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvssub.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssub_u_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (us_minus:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvssub.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvshuf_" + [(set (match_operand:LASX_DWH 0 "register_operand" "=f") + (unspec:LASX_DWH [(match_operand:LASX_DWH 1 "register_operand" "0") + (match_operand:LASX_DWH 2 "register_operand" "f") + (match_operand:LASX_DWH 3 "register_operand" "f")] + UNSPEC_LASX_XVSHUF))] + "ISA_HAS_LASX" + "xvshuf.\t%u0,%u2,%u3" + [(set_attr "type" "simd_sld") + (set_attr "mode" "")]) + +(define_insn "lasx_xvshuf_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (unspec:V32QI [(match_operand:V32QI 1 "register_operand" "f") + (match_operand:V32QI 2 "register_operand" "f") + (match_operand:V32QI 3 "register_operand" "f")] + UNSPEC_LASX_XVSHUF_B))] + 
"ISA_HAS_LASX" + "xvshuf.b\t%u0,%u1,%u2,%u3" + [(set_attr "type" "simd_sld") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvreplve0_" + [(set (match_operand:LASX 0 "register_operand" "=f") + (vec_duplicate:LASX + (vec_select: + (match_operand:LASX 1 "register_operand" "f") + (parallel [(const_int 0)]))))] + "ISA_HAS_LASX" + "xvreplve0.\t%u0,%u1" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lasx_xvrepl128vei_b_internal" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (vec_duplicate:V32QI + (vec_select:V32QI + (match_operand:V32QI 1 "register_operand" "f") + (parallel [(match_operand 2 "const_uimm4_operand" "") + (match_dup 2) (match_dup 2) (match_dup 2) + (match_dup 2) (match_dup 2) (match_dup 2) + (match_dup 2) (match_dup 2) (match_dup 2) + (match_dup 2) (match_dup 2) (match_dup 2) + (match_dup 2) (match_dup 2) (match_dup 2) + (match_operand 3 "const_16_to_31_operand" "") + (match_dup 3) (match_dup 3) (match_dup 3) + (match_dup 3) (match_dup 3) (match_dup 3) + (match_dup 3) (match_dup 3) (match_dup 3) + (match_dup 3) (match_dup 3) (match_dup 3) + (match_dup 3) (match_dup 3) (match_dup 3)]))))] + "ISA_HAS_LASX && ((INTVAL (operands[3]) - INTVAL (operands[2])) == 16)" + "xvrepl128vei.b\t%u0,%u1,%2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvrepl128vei_h_internal" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (vec_duplicate:V16HI + (vec_select:V16HI + (match_operand:V16HI 1 "register_operand" "f") + (parallel [(match_operand 2 "const_uimm3_operand" "") + (match_dup 2) (match_dup 2) (match_dup 2) + (match_dup 2) (match_dup 2) (match_dup 2) + (match_dup 2) + (match_operand 3 "const_8_to_15_operand" "") + (match_dup 3) (match_dup 3) (match_dup 3) + (match_dup 3) (match_dup 3) (match_dup 3) + (match_dup 3)]))))] + "ISA_HAS_LASX && ((INTVAL (operands[3]) - INTVAL (operands[2])) == 8)" + "xvrepl128vei.h\t%u0,%u1,%2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvrepl128vei_w_internal" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (vec_duplicate:V8SI + (vec_select:V8SI + (match_operand:V8SI 1 "register_operand" "f") + (parallel [(match_operand 2 "const_0_to_3_operand" "") + (match_dup 2) (match_dup 2) (match_dup 2) + (match_operand 3 "const_4_to_7_operand" "") + (match_dup 3) (match_dup 3) (match_dup 3)]))))] + "ISA_HAS_LASX && ((INTVAL (operands[3]) - INTVAL (operands[2])) == 4)" + "xvrepl128vei.w\t%u0,%u1,%2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvrepl128vei_d_internal" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (vec_duplicate:V4DI + (vec_select:V4DI + (match_operand:V4DI 1 "register_operand" "f") + (parallel [(match_operand 2 "const_0_or_1_operand" "") + (match_dup 2) + (match_operand 3 "const_2_or_3_operand" "") + (match_dup 3)]))))] + "ISA_HAS_LASX && ((INTVAL (operands[3]) - INTVAL (operands[2])) == 2)" + "xvrepl128vei.d\t%u0,%u1,%2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvrepl128vei_" + [(set (match_operand:LASX 0 "register_operand" "=f") + (unspec:LASX [(match_operand:LASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVREPL128VEI))] + "ISA_HAS_LASX" + "xvrepl128vei.\t%u0,%u1,%2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lasx_xvreplve0__scalar" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (vec_duplicate:FLASX + (match_operand: 1 "register_operand" 
"f")))] + "ISA_HAS_LASX" + "xvreplve0.\t%u0,%u1" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +(define_insn "lasx_xvreplve0_q" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (unspec:V32QI [(match_operand:V32QI 1 "register_operand" "f")] + UNSPEC_LASX_XVREPLVE0_Q))] + "ISA_HAS_LASX" + "xvreplve0.q\t%u0,%u1" + [(set_attr "type" "simd_splat") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvfcvt_h_s" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (unspec:V16HI [(match_operand:V8SF 1 "register_operand" "f") + (match_operand:V8SF 2 "register_operand" "f")] + UNSPEC_LASX_XVFCVT))] + "ISA_HAS_LASX" + "xvfcvt.h.s\t%u0,%u1,%u2" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvfcvt_s_d" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V4DF 1 "register_operand" "f") + (match_operand:V4DF 2 "register_operand" "f")] + UNSPEC_LASX_XVFCVT))] + "ISA_HAS_LASX" + "xvfcvt.s.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V8SF")]) + +(define_insn "vec_pack_trunc_v4df" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (vec_concat:V8SF + (float_truncate:V4SF (match_operand:V4DF 1 "register_operand" "f")) + (float_truncate:V4SF (match_operand:V4DF 2 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvfcvt.s.d\t%u0,%u2,%u1\n\txvpermi.d\t%u0,%u0,0xd8" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V8SF") + (set_attr "length" "8")]) + +;; Define for builtin function. +(define_insn "lasx_xvfcvth_s_h" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V16HI 1 "register_operand" "f")] + UNSPEC_LASX_XVFCVTH))] + "ISA_HAS_LASX" + "xvfcvth.s.h\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V8SF")]) + +;; Define for builtin function. +(define_insn "lasx_xvfcvth_d_s" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (float_extend:V4DF + (vec_select:V4SF + (match_operand:V8SF 1 "register_operand" "f") + (parallel [(const_int 2) (const_int 3) + (const_int 6) (const_int 7)]))))] + "ISA_HAS_LASX" + "xvfcvth.d.s\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4DF") + (set_attr "length" "12")]) + +;; Define for gen insn. +(define_insn "lasx_xvfcvth_d_insn" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (float_extend:V4DF + (vec_select:V4SF + (match_operand:V8SF 1 "register_operand" "f") + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))))] + "ISA_HAS_LASX" + "xvpermi.d\t%u0,%u1,0xfa\n\txvfcvtl.d.s\t%u0,%u0" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4DF") + (set_attr "length" "12")]) + +;; Define for builtin function. +(define_insn "lasx_xvfcvtl_s_h" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V16HI 1 "register_operand" "f")] + UNSPEC_LASX_XVFCVTL))] + "ISA_HAS_LASX" + "xvfcvtl.s.h\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V8SF")]) + +;; Define for builtin function. +(define_insn "lasx_xvfcvtl_d_s" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (float_extend:V4DF + (vec_select:V4SF + (match_operand:V8SF 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 4) (const_int 5)]))))] + "ISA_HAS_LASX" + "xvfcvtl.d.s\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4DF") + (set_attr "length" "8")]) + +;; Define for gen insn. 
+(define_insn "lasx_xvfcvtl_d_insn" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (float_extend:V4DF + (vec_select:V4SF + (match_operand:V8SF 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))))] + "ISA_HAS_LASX" + "xvpermi.d\t%u0,%u1,0x50\n\txvfcvtl.d.s\t%u0,%u0" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4DF") + (set_attr "length" "8")]) + +(define_code_attr lasxbr + [(eq "xbz") + (ne "xbnz")]) + +(define_code_attr lasxeq_v + [(eq "eqz") + (ne "nez")]) + +(define_code_attr lasxne_v + [(eq "nez") + (ne "eqz")]) + +(define_code_attr lasxeq + [(eq "anyeqz") + (ne "allnez")]) + +(define_code_attr lasxne + [(eq "allnez") + (ne "anyeqz")]) + +(define_insn "lasx__" + [(set (pc) + (if_then_else + (equality_op + (unspec:SI [(match_operand:LASX 1 "register_operand" "f")] + UNSPEC_LASX_BRANCH) + (match_operand:SI 2 "const_0_operand")) + (label_ref (match_operand 0)) + (pc))) + (clobber (match_scratch:FCC 3 "=z"))] + "ISA_HAS_LASX" +{ + return loongarch_output_conditional_branch (insn, operands, + "xvset.\t%Z3%u1\n\tbcnez\t%Z3%0", + "xvset.\t%z3%u1\n\tbcnez\t%Z3%0"); +} + [(set_attr "type" "simd_branch") + (set_attr "mode" "")]) + +(define_insn "lasx__v_" + [(set (pc) + (if_then_else + (equality_op + (unspec:SI [(match_operand:LASX 1 "register_operand" "f")] + UNSPEC_LASX_BRANCH_V) + (match_operand:SI 2 "const_0_operand")) + (label_ref (match_operand 0)) + (pc))) + (clobber (match_scratch:FCC 3 "=z"))] + "ISA_HAS_LASX" +{ + return loongarch_output_conditional_branch (insn, operands, + "xvset.v\t%Z3%u1\n\tbcnez\t%Z3%0", + "xvset.v\t%Z3%u1\n\tbcnez\t%Z3%0"); +} + [(set_attr "type" "simd_branch") + (set_attr "mode" "")]) + +;; loongson-asx. +(define_insn "lasx_vext2xv_h_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)]))))] + "ISA_HAS_LASX" + "vext2xv.h.b\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_vext2xv_w_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))))] + "ISA_HAS_LASX" + "vext2xv.w.h\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_vext2xv_d_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))))] + "ISA_HAS_LASX" + "vext2xv.d.w\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_vext2xv_w_b" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (any_extend:V8SI + (vec_select:V8QI + (match_operand:V32QI 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))))] + "ISA_HAS_LASX" + "vext2xv.w.b\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_vext2xv_d_h" + [(set (match_operand:V4DI 0 "register_operand" 
"=f") + (any_extend:V4DI + (vec_select:V4HI + (match_operand:V16HI 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))))] + "ISA_HAS_LASX" + "vext2xv.d.h\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_vext2xv_d_b" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (any_extend:V4DI + (vec_select:V4QI + (match_operand:V32QI 1 "register_operand" "f") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))))] + "ISA_HAS_LASX" + "vext2xv.d.b\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DI")]) + +;; Extend loongson-sx to loongson-asx. +(define_insn "xvandn3" + [(set (match_operand:LASX 0 "register_operand" "=f") + (and:LASX (not:LASX (match_operand:LASX 1 "register_operand" "f")) + (match_operand:LASX 2 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvandn.v\t%u0,%u1,%u2" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "abs2" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (abs:ILASX (match_operand:ILASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvsigncov.\t%u0,%u1,%u1" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "neg2" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (neg:ILASX (match_operand:ILASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvneg.\t%u0,%u1" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "lasx_xvmuh_s_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVMUH_S))] + "ISA_HAS_LASX" + "xvmuh.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvmuh_u_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVMUH_U))] + "ISA_HAS_LASX" + "xvmuh.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsllwil_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_WHB 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVSLLWIL_S))] + "ISA_HAS_LASX" + "xvsllwil..\t%u0,%u1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsllwil_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_WHB 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVSLLWIL_U))] + "ISA_HAS_LASX" + "xvsllwil..\t%u0,%u1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsran__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSRAN))] + "ISA_HAS_LASX" + "xvsran..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssran_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRAN_S))] + "ISA_HAS_LASX" + "xvssran..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssran_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: 
[(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRAN_U))] + "ISA_HAS_LASX" + "xvssran..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrarn__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSRARN))] + "ISA_HAS_LASX" + "xvsrarn..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrarn_s__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRARN_S))] + "ISA_HAS_LASX" + "xvssrarn..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrarn_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRARN_U))] + "ISA_HAS_LASX" + "xvssrarn..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrln__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSRLN))] + "ISA_HAS_LASX" + "xvsrln..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrln_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRLN_U))] + "ISA_HAS_LASX" + "xvssrln..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrlrn__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSRLRN))] + "ISA_HAS_LASX" + "xvsrlrn..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrlrn_u__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRLRN_U))] + "ISA_HAS_LASX" + "xvssrlrn..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfrstpi_" + [(set (match_operand:ILASX_HB 0 "register_operand" "=f") + (unspec:ILASX_HB [(match_operand:ILASX_HB 1 "register_operand" "0") + (match_operand:ILASX_HB 2 "register_operand" "f") + (match_operand 3 "const_uimm5_operand" "")] + UNSPEC_LASX_XVFRSTPI))] + "ISA_HAS_LASX" + "xvfrstpi.\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvfrstp_" + [(set (match_operand:ILASX_HB 0 "register_operand" "=f") + (unspec:ILASX_HB [(match_operand:ILASX_HB 1 "register_operand" "0") + (match_operand:ILASX_HB 2 "register_operand" "f") + (match_operand:ILASX_HB 3 "register_operand" "f")] + UNSPEC_LASX_XVFRSTP))] + "ISA_HAS_LASX" + "xvfrstp.\t%u0,%u2,%u3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "lasx_xvshuf4i_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "0") + (match_operand:V4DI 2 
"register_operand" "f") + (match_operand 3 "const_uimm8_operand")] + UNSPEC_LASX_XVSHUF4I))] + "ISA_HAS_LASX" + "xvshuf4i.d\t%u0,%u2,%3" + [(set_attr "type" "simd_sld") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvbsrl_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const_uimm5_operand" "")] + UNSPEC_LASX_XVBSRL_V))] + "ISA_HAS_LASX" + "xvbsrl.v\t%u0,%u1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvbsll_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const_uimm5_operand" "")] + UNSPEC_LASX_XVBSLL_V))] + "ISA_HAS_LASX" + "xvbsll.v\t%u0,%u1,%2" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvextrins_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVEXTRINS))] + "ISA_HAS_LASX" + "xvextrins.\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvmskltz_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")] + UNSPEC_LASX_XVMSKLTZ))] + "ISA_HAS_LASX" + "xvmskltz.\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsigncov_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVSIGNCOV))] + "ISA_HAS_LASX" + "xvsigncov.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_expand "copysign3" + [(set (match_dup 4) + (and:FLASX + (not:FLASX (match_dup 3)) + (match_operand:FLASX 1 "register_operand"))) + (set (match_dup 5) + (and:FLASX (match_dup 3) + (match_operand:FLASX 2 "register_operand"))) + (set (match_operand:FLASX 0 "register_operand") + (ior:FLASX (match_dup 4) (match_dup 5)))] + "ISA_HAS_LASX" +{ + operands[3] = loongarch_build_signbit_mask (mode, 1, 0); + + operands[4] = gen_reg_rtx (mode); + operands[5] = gen_reg_rtx (mode); +}) + + +(define_insn "absv4df2" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (abs:V4DF (match_operand:V4DF 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvbitclri.d\t%u0,%u1,63" + [(set_attr "type" "simd_logic") + (set_attr "mode" "V4DF")]) + +(define_insn "absv8sf2" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (abs:V8SF (match_operand:V8SF 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvbitclri.w\t%u0,%u1,31" + [(set_attr "type" "simd_logic") + (set_attr "mode" "V8SF")]) + +(define_insn "negv4df2" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (neg:V4DF (match_operand:V4DF 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvbitrevi.d\t%u0,%u1,63" + [(set_attr "type" "simd_logic") + (set_attr "mode" "V4DF")]) + +(define_insn "negv8sf2" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (neg:V8SF (match_operand:V8SF 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvbitrevi.w\t%u0,%u1,31" + [(set_attr "type" "simd_logic") + (set_attr "mode" "V8SF")]) + +(define_insn "xvfmadd4" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (fma:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f") + (match_operand:FLASX 3 
"register_operand" "f")))] + "ISA_HAS_LASX" + "xvfmadd.\t%u0,%u1,$u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "fms4" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (fma:FLASX (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f") + (neg:FLASX (match_operand:FLASX 3 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvfmsub.\t%u0,%u1,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "xvfnmsub4_nmsub4" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (neg:FLASX + (fma:FLASX + (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f") + (neg:FLASX (match_operand:FLASX 3 "register_operand" "f")))))] + "ISA_HAS_LASX" + "xvfnmsub.\t%u0,%u1,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + + +(define_insn "xvfnmadd4_nmadd4" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (neg:FLASX + (fma:FLASX + (match_operand:FLASX 1 "register_operand" "f") + (match_operand:FLASX 2 "register_operand" "f") + (match_operand:FLASX 3 "register_operand" "f"))))] + "ISA_HAS_LASX" + "xvfnmadd.\t%u0,%u1,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "")]) + +(define_insn "lasx_xvftintrne_w_s" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRNE_W_S))] + "ISA_HAS_LASX" + "xvftintrne.w.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrne_l_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRNE_L_D))] + "ISA_HAS_LASX" + "xvftintrne.l.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvftintrp_w_s" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRP_W_S))] + "ISA_HAS_LASX" + "xvftintrp.w.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrp_l_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRP_L_D))] + "ISA_HAS_LASX" + "xvftintrp.l.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvftintrm_w_s" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRM_W_S))] + "ISA_HAS_LASX" + "xvftintrm.w.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrm_l_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRM_L_D))] + "ISA_HAS_LASX" + "xvftintrm.l.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvftint_w_d" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V4DF 1 "register_operand" "f") + (match_operand:V4DF 2 "register_operand" "f")] + UNSPEC_LASX_XVFTINT_W_D))] + "ISA_HAS_LASX" + "xvftint.w.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvffint_s_l" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 
"register_operand" "f")] + UNSPEC_LASX_XVFFINT_S_L))] + "ISA_HAS_LASX" + "xvffint.s.l\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvftintrz_w_d" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V4DF 1 "register_operand" "f") + (match_operand:V4DF 2 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRZ_W_D))] + "ISA_HAS_LASX" + "xvftintrz.w.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvftintrp_w_d" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V4DF 1 "register_operand" "f") + (match_operand:V4DF 2 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRP_W_D))] + "ISA_HAS_LASX" + "xvftintrp.w.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvftintrm_w_d" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V4DF 1 "register_operand" "f") + (match_operand:V4DF 2 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRM_W_D))] + "ISA_HAS_LASX" + "xvftintrm.w.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvftintrne_w_d" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (unspec:V8SI [(match_operand:V4DF 1 "register_operand" "f") + (match_operand:V4DF 2 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRNE_W_D))] + "ISA_HAS_LASX" + "xvftintrne.w.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvftinth_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTH_L_S))] + "ISA_HAS_LASX" + "xvftinth.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintl_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTL_L_S))] + "ISA_HAS_LASX" + "xvftintl.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvffinth_d_w" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (unspec:V4DF [(match_operand:V8SI 1 "register_operand" "f")] + UNSPEC_LASX_XVFFINTH_D_W))] + "ISA_HAS_LASX" + "xvffinth.d.w\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvffintl_d_w" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (unspec:V4DF [(match_operand:V8SI 1 "register_operand" "f")] + UNSPEC_LASX_XVFFINTL_D_W))] + "ISA_HAS_LASX" + "xvffintl.d.w\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvftintrzh_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRZH_L_S))] + "ISA_HAS_LASX" + "xvftintrzh.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrzl_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRZL_L_S))] + "ISA_HAS_LASX" + "xvftintrzl.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lasx_xvftintrph_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRPH_L_S))] + "ISA_HAS_LASX" + "xvftintrph.l.s\t%u0,%u1" + 
[(set_attr "type" "simd_shift") + (set_attr "mode" "V4SF")]) + +(define_insn "lasx_xvftintrpl_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRPL_L_S))] + "ISA_HAS_LASX" + "xvftintrpl.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrmh_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRMH_L_S))] + "ISA_HAS_LASX" + "xvftintrmh.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrml_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRML_L_S))] + "ISA_HAS_LASX" + "xvftintrml.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrneh_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRNEH_L_S))] + "ISA_HAS_LASX" + "xvftintrneh.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvftintrnel_l_s" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFTINTRNEL_L_S))] + "ISA_HAS_LASX" + "xvftintrnel.l.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvfrintrne_s" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRNE_S))] + "ISA_HAS_LASX" + "xvfrintrne.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvfrintrne_d" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRNE_D))] + "ISA_HAS_LASX" + "xvfrintrne.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvfrintrz_s" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRZ_S))] + "ISA_HAS_LASX" + "xvfrintrz.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvfrintrz_d" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRZ_D))] + "ISA_HAS_LASX" + "xvfrintrz.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvfrintrp_s" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRP_S))] + "ISA_HAS_LASX" + "xvfrintrp.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "lasx_xvfrintrp_d" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRP_D))] + "ISA_HAS_LASX" + "xvfrintrp.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +(define_insn "lasx_xvfrintrm_s" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRM_S))] + "ISA_HAS_LASX" + "xvfrintrm.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" 
"V8SF")]) + +(define_insn "lasx_xvfrintrm_d" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINTRM_D))] + "ISA_HAS_LASX" + "xvfrintrm.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +;; Vector versions of the floating-point frint patterns. +;; Expands to btrunc, ceil, floor, rint. +(define_insn "v8sf2" + [(set (match_operand:V8SF 0 "register_operand" "=f") + (unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")] + FRINT256_S))] + "ISA_HAS_LASX" + "xvfrint.s\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V8SF")]) + +(define_insn "v4df2" + [(set (match_operand:V4DF 0 "register_operand" "=f") + (unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")] + FRINT256_D))] + "ISA_HAS_LASX" + "xvfrint.d\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "V4DF")]) + +;; Expands to round. +(define_insn "round2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRINT))] + "ISA_HAS_LASX" + "xvfrint.\t%u0,%u1" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +;; Offset load and broadcast +(define_expand "lasx_xvldrepl_" + [(match_operand:LASX 0 "register_operand") + (match_operand 2 "aq12_operand") + (match_operand 1 "pmode_register_operand")] + "ISA_HAS_LASX" +{ + emit_insn (gen_lasx_xvldrepl__insn + (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "lasx_xvldrepl__insn" + [(set (match_operand:LASX 0 "register_operand" "=f") + (vec_duplicate:LASX + (mem: (plus:DI (match_operand:DI 1 "register_operand" "r") + (match_operand 2 "aq12_operand")))))] + "ISA_HAS_LASX" +{ + return "xvldrepl.\t%u0,%1,%2"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "") + (set_attr "length" "4")]) + +;; Offset is "0" +(define_insn "lasx_xvldrepl__insn_0" + [(set (match_operand:LASX 0 "register_operand" "=f") + (vec_duplicate:LASX + (mem: (match_operand:DI 1 "register_operand" "r"))))] + "ISA_HAS_LASX" +{ + return "xvldrepl.\t%u0,%1,0"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "") + (set_attr "length" "4")]) + +;;XVADDWEV.H.B XVSUBWEV.H.B XVMULWEV.H.B +;;XVADDWEV.H.BU XVSUBWEV.H.BU XVMULWEV.H.BU +(define_insn "lasx_xvwev_h_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (addsubmul:V16HI + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)]))) + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)])))))] + "ISA_HAS_LASX" + "xvwev.h.b\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V16HI")]) + +;;XVADDWEV.W.H XVSUBWEV.W.H XVMULWEV.W.H +;;XVADDWEV.W.HU XVSUBWEV.W.HU XVMULWEV.W.HU +(define_insn "lasx_xvwev_w_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (addsubmul:V8SI + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + 
(const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)])))))] + "ISA_HAS_LASX" + "xvwev.w.h\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8SI")]) + +;;XVADDWEV.D.W XVSUBWEV.D.W XVMULWEV.D.W +;;XVADDWEV.D.WU XVSUBWEV.D.WU XVMULWEV.D.WU +(define_insn "lasx_xvwev_d_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (addsubmul:V4DI + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))))] + "ISA_HAS_LASX" + "xvwev.d.w\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVADDWEV.Q.D +;;TODO2 +(define_insn "lasx_xvaddwev_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVADDWEV))] + "ISA_HAS_LASX" + "xvaddwev.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVSUBWEV.Q.D +;;TODO2 +(define_insn "lasx_xvsubwev_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVSUBWEV))] + "ISA_HAS_LASX" + "xvsubwev.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMULWEV.Q.D +;;TODO2 +(define_insn "lasx_xvmulwev_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVMULWEV))] + "ISA_HAS_LASX" + "xvmulwev.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + + +;;XVADDWOD.H.B XVSUBWOD.H.B XVMULWOD.H.B +;;XVADDWOD.H.BU XVSUBWOD.H.BU XVMULWOD.H.BU +(define_insn "lasx_xvwod_h_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (addsubmul:V16HI + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)]))) + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)])))))] + "ISA_HAS_LASX" + "xvwod.h.b\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V16HI")]) + +;;XVADDWOD.W.H XVSUBWOD.W.H XVMULWOD.W.H +;;XVADDWOD.W.HU XVSUBWOD.W.HU XVMULWOD.W.HU +(define_insn "lasx_xvwod_w_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (addsubmul:V8SI + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 
9) (const_int 11) + (const_int 13) (const_int 15)]))) + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)])))))] + "ISA_HAS_LASX" + "xvwod.w.h\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8SI")]) + + +;;XVADDWOD.D.W XVSUBWOD.D.W XVMULWOD.D.W +;;XVADDWOD.D.WU XVSUBWOD.D.WU XVMULWOD.D.WU +(define_insn "lasx_xvwod_d_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (addsubmul:V4DI + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)])))))] + "ISA_HAS_LASX" + "xvwod.d.w\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVADDWOD.Q.D +;;TODO2 +(define_insn "lasx_xvaddwod_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVADDWOD))] + "ISA_HAS_LASX" + "xvaddwod.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVSUBWOD.Q.D +;;TODO2 +(define_insn "lasx_xvsubwod_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVSUBWOD))] + "ISA_HAS_LASX" + "xvsubwod.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMULWOD.Q.D +;;TODO2 +(define_insn "lasx_xvmulwod_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVMULWOD))] + "ISA_HAS_LASX" + "xvmulwod.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVADDWEV.Q.DU +;;TODO2 +(define_insn "lasx_xvaddwev_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVADDWEV2))] + "ISA_HAS_LASX" + "xvaddwev.q.du\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVSUBWEV.Q.DU +;;TODO2 +(define_insn "lasx_xvsubwev_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVSUBWEV2))] + "ISA_HAS_LASX" + "xvsubwev.q.du\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMULWEV.Q.DU +;;TODO2 +(define_insn "lasx_xvmulwev_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVMULWEV2))] + "ISA_HAS_LASX" + "xvmulwev.q.du\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVADDWOD.Q.DU +;;TODO2 +(define_insn "lasx_xvaddwod_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVADDWOD2))] + "ISA_HAS_LASX" + "xvaddwod.q.du\t%u0,%u1,%u2" + 
[(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVSUBWOD.Q.DU +;;TODO2 +(define_insn "lasx_xvsubwod_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVSUBWOD2))] + "ISA_HAS_LASX" + "xvsubwod.q.du\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMULWOD.Q.DU +;;TODO2 +(define_insn "lasx_xvmulwod_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVMULWOD2))] + "ISA_HAS_LASX" + "xvmulwod.q.du\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVADDWEV.H.BU.B XVMULWEV.H.BU.B +(define_insn "lasx_xvwev_h_bu_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (addmul:V16HI + (zero_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)]))) + (sign_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)])))))] + "ISA_HAS_LASX" + "xvwev.h.bu.b\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V16HI")]) + +;;XVADDWEV.W.HU.H XVMULWEV.W.HU.H +(define_insn "lasx_xvwev_w_hu_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (addmul:V8SI + (zero_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + (sign_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)])))))] + "ISA_HAS_LASX" + "xvwev.w.hu.h\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8SI")]) + +;;XVADDWEV.D.WU.W XVMULWEV.D.WU.W +(define_insn "lasx_xvwev_d_wu_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (addmul:V4DI + (zero_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (sign_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))))] + "ISA_HAS_LASX" + "xvwev.d.wu.w\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVADDWOD.H.BU.B XVMULWOD.H.BU.B +(define_insn "lasx_xvwod_h_bu_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (addmul:V16HI + (zero_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 
29) (const_int 31)]))) + (sign_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)])))))] + "ISA_HAS_LASX" + "xvwod.h.bu.b\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V16HI")]) + +;;XVADDWOD.W.HU.H XVMULWOD.W.HU.H +(define_insn "lasx_xvwod_w_hu_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (addmul:V8SI + (zero_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (sign_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)])))))] + "ISA_HAS_LASX" + "xvwod.w.hu.h\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V8SI")]) + +;;XVADDWOD.D.WU.W XVMULWOD.D.WU.W +(define_insn "lasx_xvwod_d_wu_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (addmul:V4DI + (zero_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (sign_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)])))))] + "ISA_HAS_LASX" + "xvwod.d.wu.w\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMADDWEV.H.B XVMADDWEV.H.BU +(define_insn "lasx_xvmaddwev_h_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (plus:V16HI + (match_operand:V16HI 1 "register_operand" "0") + (mult:V16HI + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)]))) + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)]))))))] + "ISA_HAS_LASX" + "xvmaddwev.h.b\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V16HI")]) + +;;XVMADDWEV.W.H XVMADDWEV.W.HU +(define_insn "lasx_xvmaddwev_w_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (plus:V8SI + (match_operand:V8SI 1 "register_operand" "0") + (mult:V8SI + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))))))] + "ISA_HAS_LASX" + "xvmaddwev.w.h\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") 
+ (set_attr "mode" "V8SI")]) + +;;XVMADDWEV.D.W XVMADDWEV.D.WU +(define_insn "lasx_xvmaddwev_d_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (plus:V4DI + (match_operand:V4DI 1 "register_operand" "0") + (mult:V4DI + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))))))] + "ISA_HAS_LASX" + "xvmaddwev.d.w\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4DI")]) + +;;XVMADDWEV.Q.D +;;TODO2 +(define_insn "lasx_xvmaddwev_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "0") + (match_operand:V4DI 2 "register_operand" "f") + (match_operand:V4DI 3 "register_operand" "f")] + UNSPEC_LASX_XVMADDWEV))] + "ISA_HAS_LASX" + "xvmaddwev.q.d\t%u0,%u2,%u3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMADDWOD.H.B XVMADDWOD.H.BU +(define_insn "lasx_xvmaddwod_h_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (plus:V16HI + (match_operand:V16HI 1 "register_operand" "0") + (mult:V16HI + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)]))) + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)]))))))] + "ISA_HAS_LASX" + "xvmaddwod.h.b\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V16HI")]) + +;;XVMADDWOD.W.H XVMADDWOD.W.HU +(define_insn "lasx_xvmaddwod_w_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (plus:V8SI + (match_operand:V8SI 1 "register_operand" "0") + (mult:V8SI + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))))))] + "ISA_HAS_LASX" + "xvmaddwod.w.h\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V8SI")]) + +;;XVMADDWOD.D.W XVMADDWOD.D.WU +(define_insn "lasx_xvmaddwod_d_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (plus:V4DI + (match_operand:V4DI 1 "register_operand" "0") + (mult:V4DI + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))))))] + "ISA_HAS_LASX" + "xvmaddwod.d.w\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4DI")]) + +;;XVMADDWOD.Q.D +;;TODO2 +(define_insn 
"lasx_xvmaddwod_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "0") + (match_operand:V4DI 2 "register_operand" "f") + (match_operand:V4DI 3 "register_operand" "f")] + UNSPEC_LASX_XVMADDWOD))] + "ISA_HAS_LASX" + "xvmaddwod.q.d\t%u0,%u2,%u3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMADDWEV.Q.DU +;;TODO2 +(define_insn "lasx_xvmaddwev_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "0") + (match_operand:V4DI 2 "register_operand" "f") + (match_operand:V4DI 3 "register_operand" "f")] + UNSPEC_LASX_XVMADDWEV2))] + "ISA_HAS_LASX" + "xvmaddwev.q.du\t%u0,%u2,%u3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMADDWOD.Q.DU +;;TODO2 +(define_insn "lasx_xvmaddwod_q_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "0") + (match_operand:V4DI 2 "register_operand" "f") + (match_operand:V4DI 3 "register_operand" "f")] + UNSPEC_LASX_XVMADDWOD2))] + "ISA_HAS_LASX" + "xvmaddwod.q.du\t%u0,%u2,%u3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMADDWEV.H.BU.B +(define_insn "lasx_xvmaddwev_h_bu_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (plus:V16HI + (match_operand:V16HI 1 "register_operand" "0") + (mult:V16HI + (zero_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)]))) + (sign_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14) + (const_int 16) (const_int 18) + (const_int 20) (const_int 22) + (const_int 24) (const_int 26) + (const_int 28) (const_int 30)]))))))] + "ISA_HAS_LASX" + "xvmaddwev.h.bu.b\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V16HI")]) + +;;XVMADDWEV.W.HU.H +(define_insn "lasx_xvmaddwev_w_hu_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (plus:V8SI + (match_operand:V8SI 1 "register_operand" "0") + (mult:V8SI + (zero_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))) + (sign_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6) + (const_int 8) (const_int 10) + (const_int 12) (const_int 14)]))))))] + "ISA_HAS_LASX" + "xvmaddwev.w.hu.h\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V8SI")]) + +;;XVMADDWEV.D.WU.W +(define_insn "lasx_xvmaddwev_d_wu_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (plus:V4DI + (match_operand:V4DI 1 "register_operand" "0") + (mult:V4DI + (zero_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (sign_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 3 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))))))] + "ISA_HAS_LASX" + 
"xvmaddwev.d.wu.w\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4DI")]) + +;;XVMADDWEV.Q.DU.D +;;TODO2 +(define_insn "lasx_xvmaddwev_q_du_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "0") + (match_operand:V4DI 2 "register_operand" "f") + (match_operand:V4DI 3 "register_operand" "f")] + UNSPEC_LASX_XVMADDWEV3))] + "ISA_HAS_LASX" + "xvmaddwev.q.du.d\t%u0,%u2,%u3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMADDWOD.H.BU.B +(define_insn "lasx_xvmaddwod_h_bu_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (plus:V16HI + (match_operand:V16HI 1 "register_operand" "0") + (mult:V16HI + (zero_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)]))) + (sign_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15) + (const_int 17) (const_int 19) + (const_int 21) (const_int 23) + (const_int 25) (const_int 27) + (const_int 29) (const_int 31)]))))))] + "ISA_HAS_LASX" + "xvmaddwod.h.bu.b\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V16HI")]) + +;;XVMADDWOD.W.HU.H +(define_insn "lasx_xvmaddwod_w_hu_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (plus:V8SI + (match_operand:V8SI 1 "register_operand" "0") + (mult:V8SI + (zero_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))) + (sign_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7) + (const_int 9) (const_int 11) + (const_int 13) (const_int 15)]))))))] + "ISA_HAS_LASX" + "xvmaddwod.w.hu.h\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V8SI")]) + +;;XVMADDWOD.D.WU.W +(define_insn "lasx_xvmaddwod_d_wu_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (plus:V4DI + (match_operand:V4DI 1 "register_operand" "0") + (mult:V4DI + (zero_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "%f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))) + (sign_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 3 "register_operand" "f") + (parallel [(const_int 1) (const_int 3) + (const_int 5) (const_int 7)]))))))] + "ISA_HAS_LASX" + "xvmaddwod.d.wu.w\t%u0,%u2,%u3" + [(set_attr "type" "simd_fmadd") + (set_attr "mode" "V4DI")]) + +;;XVMADDWOD.Q.DU.D +;;TODO2 +(define_insn "lasx_xvmaddwod_q_du_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "0") + (match_operand:V4DI 2 "register_operand" "f") + (match_operand:V4DI 3 "register_operand" "f")] + UNSPEC_LASX_XVMADDWOD3))] + "ISA_HAS_LASX" + "xvmaddwod.q.du.d\t%u0,%u2,%u3" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVHADDW.Q.D +;;TODO2 +(define_insn "lasx_xvhaddw_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + 
(match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVHADDW_Q_D))] + "ISA_HAS_LASX" + "xvhaddw.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVHSUBW.Q.D +;;TODO2 +(define_insn "lasx_xvhsubw_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVHSUBW_Q_D))] + "ISA_HAS_LASX" + "xvhsubw.q.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVHADDW.QU.DU +;;TODO2 +(define_insn "lasx_xvhaddw_qu_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVHADDW_QU_DU))] + "ISA_HAS_LASX" + "xvhaddw.qu.du\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVHSUBW.QU.DU +;;TODO2 +(define_insn "lasx_xvhsubw_qu_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVHSUBW_QU_DU))] + "ISA_HAS_LASX" + "xvhsubw.qu.du\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVROTR.B XVROTR.H XVROTR.W XVROTR.D +;;TODO-478 +(define_insn "lasx_xvrotr_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f") + (match_operand:ILASX 2 "register_operand" "f")] + UNSPEC_LASX_XVROTR))] + "ISA_HAS_LASX" + "xvrotr.\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +;;XVADD.Q +;;TODO2 +(define_insn "lasx_xvadd_q" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVADD_Q))] + "ISA_HAS_LASX" + "xvadd.q\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVSUB.Q +;;TODO2 +(define_insn "lasx_xvsub_q" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVSUB_Q))] + "ISA_HAS_LASX" + "xvsub.q\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVSSRLN.B.H XVSSRLN.H.W XVSSRLN.W.D +(define_insn "lasx_xvssrln__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRLN))] + "ISA_HAS_LASX" + "xvssrln..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +;;XVREPLVE.B XVREPLVE.H XVREPLVE.W XVREPLVE.D +(define_insn "lasx_xvreplve_" + [(set (match_operand:LASX 0 "register_operand" "=f") + (unspec:LASX [(match_operand:LASX 1 "register_operand" "f") + (match_operand:SI 2 "register_operand" "r")] + UNSPEC_LASX_XVREPLVE))] + "ISA_HAS_LASX" + "xvreplve.\t%u0,%u1,%z2" + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + +;;XVADDWEV.Q.DU.D +(define_insn "lasx_xvaddwev_q_du_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVADDWEV3))] + "ISA_HAS_LASX" + "xvaddwev.q.du.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVADDWOD.Q.DU.D +(define_insn 
"lasx_xvaddwod_q_du_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVADDWOD3))] + "ISA_HAS_LASX" + "xvaddwod.q.du.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMULWEV.Q.DU.D +(define_insn "lasx_xvmulwev_q_du_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVMULWEV3))] + "ISA_HAS_LASX" + "xvmulwev.q.du.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;;XVMULWOD.Q.DU.D +(define_insn "lasx_xvmulwod_q_du_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f") + (match_operand:V4DI 2 "register_operand" "f")] + UNSPEC_LASX_XVMULWOD3))] + "ISA_HAS_LASX" + "xvmulwod.q.du.d\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvpickve2gr_w" + [(set (match_operand:SI 0 "register_operand" "=r") + (any_extend:SI + (vec_select:SI + (match_operand:V8SI 1 "register_operand" "f") + (parallel [(match_operand 2 "const_0_to_7_operand" "")]))))] + "ISA_HAS_LASX" + "xvpickve2gr.w\t%0,%u1,%2" + [(set_attr "type" "simd_copy") + (set_attr "mode" "V8SI")]) + + +(define_insn "lasx_xvmskgez_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (unspec:V32QI [(match_operand:V32QI 1 "register_operand" "f")] + UNSPEC_LASX_XVMSKGEZ))] + "ISA_HAS_LASX" + "xvmskgez.b\t%u0,%u1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvmsknz_b" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (unspec:V32QI [(match_operand:V32QI 1 "register_operand" "f")] + UNSPEC_LASX_XVMSKNZ))] + "ISA_HAS_LASX" + "xvmsknz.b\t%u0,%u1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvexth_h_b" + [(set (match_operand:V16HI 0 "register_operand" "=f") + (any_extend:V16HI + (vec_select:V16QI + (match_operand:V32QI 1 "register_operand" "f") + (parallel [(const_int 16) (const_int 17) + (const_int 18) (const_int 19) + (const_int 20) (const_int 21) + (const_int 22) (const_int 23) + (const_int 24) (const_int 25) + (const_int 26) (const_int 27) + (const_int 28) (const_int 29) + (const_int 30) (const_int 31)]))))] + "ISA_HAS_LASX" + "xvexth.h.b\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V16HI")]) + +(define_insn "lasx_xvexth_w_h" + [(set (match_operand:V8SI 0 "register_operand" "=f") + (any_extend:V8SI + (vec_select:V8HI + (match_operand:V16HI 1 "register_operand" "f") + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)]))))] + "ISA_HAS_LASX" + "xvexth.w.h\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V8SI")]) + +(define_insn "lasx_xvexth_d_w" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "f") + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))))] + "ISA_HAS_LASX" + "xvexth.d.w\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvexth_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")] + UNSPEC_LASX_XVEXTH_Q_D))] + "ISA_HAS_LASX" + "xvexth.q.d\t%u0,%u1" + [(set_attr "type" 
"simd_fcvt") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvexth_qu_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")] + UNSPEC_LASX_XVEXTH_QU_DU))] + "ISA_HAS_LASX" + "xvexth.qu.du\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvrotri_" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (rotatert:ILASX (match_operand:ILASX 1 "register_operand" "f") + (match_operand 2 "const__operand" "")))] + "ISA_HAS_LASX" + "xvrotri.\t%u0,%u1,%2" + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lasx_xvextl_q_d" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")] + UNSPEC_LASX_XVEXTL_Q_D))] + "ISA_HAS_LASX" + "xvextl.q.d\t%u0,%u1" + [(set_attr "type" "simd_fcvt") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvsrlni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSRLNI))] + "ISA_HAS_LASX" + "xvsrlni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrlrni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSRLRNI))] + "ISA_HAS_LASX" + "xvsrlrni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrlni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRLNI))] + "ISA_HAS_LASX" + "xvssrlni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrlni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRLNI2))] + "ISA_HAS_LASX" + "xvssrlni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrlrni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRLRNI))] + "ISA_HAS_LASX" + "xvssrlrni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrlrni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRLRNI2))] + "ISA_HAS_LASX" + "xvssrlrni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvsrani__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSRANI))] + "ISA_HAS_LASX" + "xvsrani..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" 
"")]) + +(define_insn "lasx_xvsrarni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSRARNI))] + "ISA_HAS_LASX" + "xvsrarni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrani__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRANI))] + "ISA_HAS_LASX" + "xvssrani..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrani__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRANI2))] + "ISA_HAS_LASX" + "xvssrani..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrarni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRARNI))] + "ISA_HAS_LASX" + "xvssrarni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrarni__" + [(set (match_operand:ILASX 0 "register_operand" "=f") + (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "0") + (match_operand:ILASX 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVSSRARNI2))] + "ISA_HAS_LASX" + "xvssrarni..\t%u0,%u2,%3" + [(set_attr "type" "simd_shift") + (set_attr "mode" "")]) + +(define_mode_attr VDOUBLEMODEW256 + [(V8SI "V16SI") + (V8SF "V16SF")]) + +(define_insn "lasx_xvpermi_" + [(set (match_operand:LASX_W 0 "register_operand" "=f") + (unspec:LASX_W [(match_operand:LASX_W 1 "register_operand" "0") + (match_operand:LASX_W 2 "register_operand" "f") + (match_operand 3 "const_uimm8_operand" "")] + UNSPEC_LASX_XVPERMI))] + "ISA_HAS_LASX" + "xvpermi.w\t%u0,%u2,%3" + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_insn "lasx_xvpermi__1" + [(set (match_operand:LASX_W 0 "register_operand" "=f") + (vec_select:LASX_W + (vec_concat: + (match_operand:LASX_W 1 "register_operand" "f") + (match_operand:LASX_W 2 "register_operand" "0")) + (parallel [(match_operand 3 "const_0_to_3_operand") + (match_operand 4 "const_0_to_3_operand" ) + (match_operand 5 "const_8_to_11_operand" ) + (match_operand 6 "const_8_to_11_operand" ) + (match_operand 7 "const_4_to_7_operand" ) + (match_operand 8 "const_4_to_7_operand" ) + (match_operand 9 "const_12_to_15_operand") + (match_operand 10 "const_12_to_15_operand")])))] + "ISA_HAS_LASX + && INTVAL (operands[3]) + 4 == INTVAL (operands[7]) + && INTVAL (operands[4]) + 4 == INTVAL (operands[8]) + && INTVAL (operands[5]) + 4 == INTVAL (operands[9]) + && INTVAL (operands[6]) + 4 == INTVAL (operands[10])" +{ + int mask = 0; + mask |= INTVAL (operands[3]) << 0; + mask |= INTVAL (operands[4]) << 2; + mask |= (INTVAL (operands[5]) - 8) << 4; + mask |= (INTVAL (operands[6]) - 8) << 6; + operands[3] = GEN_INT (mask); + + return "xvpermi.w\t%u0,%u1,%3"; +} + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + +(define_expand "lasx_xvld" + [(match_operand:V32QI 0 
"register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq12b_operand")] + "ISA_HAS_LASX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move (operands[0], gen_rtx_MEM (V32QImode, addr)); + DONE; +}) + +(define_expand "lasx_xvst" + [(match_operand:V32QI 0 "register_operand") + (match_operand 1 "pmode_register_operand") + (match_operand 2 "aq12b_operand")] + "ISA_HAS_LASX" +{ + rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], + INTVAL (operands[2])); + loongarch_emit_move (gen_rtx_MEM (V32QImode, addr), operands[0]); + DONE; +}) + +(define_expand "lasx_xvstelm_" + [(match_operand:LASX 0 "register_operand") + (match_operand 3 "const__operand") + (match_operand 2 "aq8_operand") + (match_operand 1 "pmode_register_operand")] + "ISA_HAS_LASX" +{ + emit_insn (gen_lasx_xvstelm__insn + (operands[1], operands[2], operands[0], operands[3])); + DONE; +}) + +(define_insn "lasx_xvstelm__insn" + [(set (mem: (plus:DI (match_operand:DI 0 "register_operand" "r") + (match_operand 1 "aq8_operand"))) + (vec_select: + (match_operand:LASX 2 "register_operand" "f") + (parallel [(match_operand 3 "const__operand" "")])))] + "ISA_HAS_LASX" +{ + return "xvstelm.\t%u2,%0,%1,%3"; +} + [(set_attr "type" "simd_store") + (set_attr "mode" "") + (set_attr "length" "4")]) + +;; Offset is "0" +(define_insn "lasx_xvstelm__insn_0" + [(set (mem: (match_operand:DI 0 "register_operand" "r")) + (vec_select: + (match_operand:LASX_WD 1 "register_operand" "f") + (parallel [(match_operand:SI 2 "const__operand")])))] + "ISA_HAS_LASX" +{ + return "xvstelm.\t%u1,%0,0,%2"; +} + [(set_attr "type" "simd_store") + (set_attr "mode" "") + (set_attr "length" "4")]) + +(define_insn "lasx_xvinsve0_" + [(set (match_operand:LASX_WD 0 "register_operand" "=f") + (unspec:LASX_WD [(match_operand:LASX_WD 1 "register_operand" "0") + (match_operand:LASX_WD 2 "register_operand" "f") + (match_operand 3 "const__operand" "")] + UNSPEC_LASX_XVINSVE0))] + "ISA_HAS_LASX" + "xvinsve0.\t%u0,%u2,%3" + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lasx_xvinsve0__scalar" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (vec_merge:FLASX + (vec_duplicate:FLASX + (match_operand: 1 "register_operand" "f")) + (match_operand:FLASX 2 "register_operand" "0") + (match_operand 3 "const__operand" "")))] + "ISA_HAS_LASX" + "xvinsve0.\t%u0,%u1,%y3" + [(set_attr "type" "simd_insert") + (set_attr "mode" "")]) + +(define_insn "lasx_xvpickve_" + [(set (match_operand:LASX_WD 0 "register_operand" "=f") + (unspec:LASX_WD [(match_operand:LASX_WD 1 "register_operand" "f") + (match_operand 2 "const__operand" "")] + UNSPEC_LASX_XVPICKVE))] + "ISA_HAS_LASX" + "xvpickve.\t%u0,%u1,%2" + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lasx_xvpickve__scalar" + [(set (match_operand: 0 "register_operand" "=f") + (vec_select: + (match_operand:FLASX 1 "register_operand" "f") + (parallel [(match_operand 2 "const__operand" "")])))] + "ISA_HAS_LASX" + "xvpickve.\t%u0,%u1,%2" + [(set_attr "type" "simd_shf") + (set_attr "mode" "")]) + +(define_insn "lasx_xvssrlrn__" + [(set (match_operand: 0 "register_operand" "=f") + (unspec: [(match_operand:ILASX_DWH 1 "register_operand" "f") + (match_operand:ILASX_DWH 2 "register_operand" "f")] + UNSPEC_LASX_XVSSRLRN))] + "ISA_HAS_LASX" + "xvssrlrn..\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "")]) + +(define_insn "xvorn3" + [(set (match_operand:ILASX 0 "register_operand" "=f") + 
(ior:ILASX (not:ILASX (match_operand:ILASX 2 "register_operand" "f")) + (match_operand:ILASX 1 "register_operand" "f")))] + "ISA_HAS_LASX" + "xvorn.v\t%u0,%u1,%u2" + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + +(define_insn "lasx_xvextl_qu_du" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")] + UNSPEC_LASX_XVEXTL_QU_DU))] + "ISA_HAS_LASX" + "xvextl.qu.du\t%u0,%u1" + [(set_attr "type" "simd_bit") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvldi" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (unspec:V4DI[(match_operand 1 "const_imm13_operand")] + UNSPEC_LASX_XVLDI))] + "ISA_HAS_LASX" +{ + HOST_WIDE_INT val = INTVAL (operands[1]); + if (val < 0) + { + HOST_WIDE_INT modeVal = (val & 0xf00) >> 8; + if (modeVal < 13) + return "xvldi\t%u0,%1"; + else + { + sorry ("imm13 only support 0000 ~ 1100 in bits '12 ~ 9' when bit '13' is 1"); + return "#"; + } + } + else + return "xvldi\t%u0,%1"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "V4DI")]) + +(define_insn "lasx_xvldx" + [(set (match_operand:V32QI 0 "register_operand" "=f") + (unspec:V32QI [(match_operand:DI 1 "register_operand" "r") + (match_operand:DI 2 "reg_or_0_operand" "rJ")] + UNSPEC_LASX_XVLDX))] + "ISA_HAS_LASX" +{ + return "xvldx\t%u0,%1,%z2"; +} + [(set_attr "type" "simd_load") + (set_attr "mode" "V32QI")]) + +(define_insn "lasx_xvstx" + [(set (mem:V32QI (plus:DI (match_operand:DI 1 "register_operand" "r") + (match_operand:DI 2 "reg_or_0_operand" "rJ"))) + (unspec: V32QI[(match_operand:V32QI 0 "register_operand" "f")] + UNSPEC_LASX_XVSTX))] + + "ISA_HAS_LASX" +{ + return "xvstx\t%u0,%1,%z2"; +} + [(set_attr "type" "simd_store") + (set_attr "mode" "DI")]) + +(define_insn "vec_widen_mult_even_v8si" + [(set (match_operand:V4DI 0 "register_operand" "=f") + (mult:V4DI + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 1 "register_operand" "%f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)]))) + (any_extend:V4DI + (vec_select:V4SI + (match_operand:V8SI 2 "register_operand" "f") + (parallel [(const_int 0) (const_int 2) + (const_int 4) (const_int 6)])))))] + "ISA_HAS_LASX" + "xvmulwev.d.w\t%u0,%u1,%u2" + [(set_attr "type" "simd_int_arith") + (set_attr "mode" "V4DI")]) + +;; Vector reduction operation +(define_expand "reduc_plus_scal_v4di" + [(match_operand:DI 0 "register_operand") + (match_operand:V4DI 1 "register_operand")] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (V4DImode); + rtx tmp1 = gen_reg_rtx (V4DImode); + rtx vec_res = gen_reg_rtx (V4DImode); + emit_insn (gen_lasx_xvhaddw_q_d (tmp, operands[1], operands[1])); + emit_insn (gen_lasx_xvpermi_d_v4di (tmp1, tmp, GEN_INT (2))); + emit_insn (gen_addv4di3 (vec_res, tmp, tmp1)); + emit_insn (gen_vec_extractv4didi (operands[0], vec_res, const0_rtx)); + DONE; +}) + +(define_expand "reduc_plus_scal_v8si" + [(match_operand:SI 0 "register_operand") + (match_operand:V8SI 1 "register_operand")] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (V4DImode); + rtx tmp1 = gen_reg_rtx (V4DImode); + rtx vec_res = gen_reg_rtx (V4DImode); + emit_insn (gen_lasx_xvhaddw_d_w (tmp, operands[1], operands[1])); + emit_insn (gen_lasx_xvhaddw_q_d (tmp1, tmp, tmp)); + emit_insn (gen_lasx_xvpermi_d_v4di (tmp, tmp1, GEN_INT (2))); + emit_insn (gen_addv4di3 (vec_res, tmp, tmp1)); + emit_insn (gen_vec_extractv8sisi (operands[0], gen_lowpart (V8SImode,vec_res), + const0_rtx)); + DONE; +}) + +(define_expand "reduc_plus_scal_" + [(match_operand: 0 "register_operand") + 
(match_operand:FLASX 1 "register_operand")] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_add3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc__scal_" + [(any_bitwise: + (match_operand: 0 "register_operand") + (match_operand:ILASX 1 "register_operand"))] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_smax_scal_" + [(match_operand: 0 "register_operand") + (match_operand:LASX 1 "register_operand")] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_smax3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_smin_scal_" + [(match_operand: 0 "register_operand") + (match_operand:LASX 1 "register_operand")] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_smin3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_umax_scal_" + [(match_operand: 0 "register_operand") + (match_operand:ILASX 1 "register_operand")] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_umax3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) + +(define_expand "reduc_umin_scal_" + [(match_operand: 0 "register_operand") + (match_operand:ILASX 1 "register_operand")] + "ISA_HAS_LASX" +{ + rtx tmp = gen_reg_rtx (mode); + loongarch_expand_vector_reduc (gen_umin3, tmp, operands[1]); + emit_insn (gen_vec_extract (operands[0], tmp, + const0_rtx)); + DONE; +}) diff --git a/gcc/config/loongarch/loongarch-modes.def b/gcc/config/loongarch/loongarch-modes.def index 6f57b60525d..68a829316f4 100644 --- a/gcc/config/loongarch/loongarch-modes.def +++ b/gcc/config/loongarch/loongarch-modes.def @@ -33,6 +33,7 @@ VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */ VECTOR_MODES (FLOAT, 16); /* V4SF V2DF */ +/* For LARCH LASX 256 bits. 
*/ VECTOR_MODES (INT, 32); /* V32QI V16HI V8SI V4DI */ VECTOR_MODES (FLOAT, 32); /* V8SF V4DF */ diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index fc33527cdcf..f4430d0d418 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -89,6 +89,8 @@ extern bool loongarch_split_move_insn_p (rtx, rtx); extern void loongarch_split_move_insn (rtx, rtx, rtx); extern void loongarch_split_128bit_move (rtx, rtx); extern bool loongarch_split_128bit_move_p (rtx, rtx); +extern void loongarch_split_256bit_move (rtx, rtx); +extern bool loongarch_split_256bit_move_p (rtx, rtx); extern void loongarch_split_lsx_copy_d (rtx, rtx, rtx, rtx (*)(rtx, rtx, rtx)); extern void loongarch_split_lsx_insert_d (rtx, rtx, rtx, rtx); extern void loongarch_split_lsx_fill_d (rtx, rtx); @@ -174,9 +176,11 @@ union loongarch_gen_fn_ptrs extern void loongarch_expand_atomic_qihi (union loongarch_gen_fn_ptrs, rtx, rtx, rtx, rtx, rtx); +extern void loongarch_expand_vector_group_init (rtx, rtx); extern void loongarch_expand_vector_init (rtx, rtx); extern void loongarch_expand_vec_unpack (rtx op[2], bool, bool); extern void loongarch_expand_vec_perm (rtx, rtx, rtx, rtx); +extern void loongarch_expand_vec_perm_1 (rtx[]); extern void loongarch_expand_vector_extract (rtx, rtx, int); extern void loongarch_expand_vector_reduc (rtx (*)(rtx, rtx, rtx), rtx, rtx); diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 7ffa6bbb73d..8406bf6228b 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1928,7 +1928,7 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, machine_mode mode) { /* LSX LD.* and ST.* cannot support loading symbols via an immediate operand. */ - if (LSX_SUPPORTED_MODE_P (mode)) + if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)) return 0; switch (type) @@ -2061,6 +2061,11 @@ loongarch_valid_offset_p (rtx x, machine_mode mode) loongarch_ldst_scaled_shift (mode))) return false; + /* LASX XVLD.B and XVST.B supports 10-bit signed offsets without shift. */ + if (LASX_SUPPORTED_MODE_P (mode) + && !loongarch_signed_immediate_p (INTVAL (x), 10, 0)) + return false; + return true; } @@ -2274,7 +2279,9 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p) { struct loongarch_address_info addr; int factor; - bool lsx_p = !might_split_p && LSX_SUPPORTED_MODE_P (mode); + bool lsx_p = (!might_split_p + && (LSX_SUPPORTED_MODE_P (mode) + || LASX_SUPPORTED_MODE_P (mode))); if (!loongarch_classify_address (&addr, x, mode, false)) return 0; @@ -2420,7 +2427,8 @@ loongarch_const_insns (rtx x) return loongarch_integer_cost (INTVAL (x)); case CONST_VECTOR: - if (LSX_SUPPORTED_MODE_P (GET_MODE (x)) + if ((LSX_SUPPORTED_MODE_P (GET_MODE (x)) + || LASX_SUPPORTED_MODE_P (GET_MODE (x))) && loongarch_const_vector_same_int_p (x, GET_MODE (x), -512, 511)) return 1; /* Fall through. */ @@ -3259,10 +3267,11 @@ loongarch_legitimize_move (machine_mode mode, rtx dest, rtx src) /* Both src and dest are non-registers; one special case is supported where the source is (const_int 0) and the store can source the zero register. - LSX is never able to source the zero register directly in + LSX and LASX are never able to source the zero register directly in memory operations. 
*/ if (!register_operand (dest, mode) && !register_operand (src, mode) - && (!const_0_operand (src, mode) || LSX_SUPPORTED_MODE_P (mode))) + && (!const_0_operand (src, mode) + || LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))) { loongarch_emit_move (dest, force_reg (mode, src)); return true; @@ -3844,6 +3853,7 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, int misalign ATTRIBUTE_UNUSED) { unsigned elements; + machine_mode mode = vectype != NULL ? TYPE_MODE (vectype) : DImode; switch (type_of_cost) { @@ -3860,7 +3870,8 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, return 1; case vec_perm: - return 1; + return LASX_SUPPORTED_MODE_P (mode) + && !LSX_SUPPORTED_MODE_P (mode) ? 2 : 1; case unaligned_load: case vector_gather_load: @@ -3941,6 +3952,10 @@ loongarch_split_move_p (rtx dest, rtx src) if (LSX_SUPPORTED_MODE_P (GET_MODE (dest))) return loongarch_split_128bit_move_p (dest, src); + /* Check if LASX moves need splitting. */ + if (LASX_SUPPORTED_MODE_P (GET_MODE (dest))) + return loongarch_split_256bit_move_p (dest, src); + /* Otherwise split all multiword moves. */ return size > UNITS_PER_WORD; } @@ -3956,6 +3971,8 @@ loongarch_split_move (rtx dest, rtx src, rtx insn_) gcc_checking_assert (loongarch_split_move_p (dest, src)); if (LSX_SUPPORTED_MODE_P (GET_MODE (dest))) loongarch_split_128bit_move (dest, src); + else if (LASX_SUPPORTED_MODE_P (GET_MODE (dest))) + loongarch_split_256bit_move (dest, src); else if (FP_REG_RTX_P (dest) || FP_REG_RTX_P (src)) { if (!TARGET_64BIT && GET_MODE (dest) == DImode) @@ -4121,7 +4138,7 @@ const char * loongarch_output_move_index_float (rtx x, machine_mode mode, bool ldr) { int index = exact_log2 (GET_MODE_SIZE (mode)); - if (!IN_RANGE (index, 2, 4)) + if (!IN_RANGE (index, 2, 5)) return NULL; struct loongarch_address_info info; @@ -4130,17 +4147,19 @@ loongarch_output_move_index_float (rtx x, machine_mode mode, bool ldr) || !loongarch_legitimate_address_p (mode, x, false)) return NULL; - const char *const insn[][3] = + const char *const insn[][4] = { { "fstx.s\t%1,%0", "fstx.d\t%1,%0", - "vstx\t%w1,%0" + "vstx\t%w1,%0", + "xvstx\t%u1,%0" }, { "fldx.s\t%0,%1", "fldx.d\t%0,%1", - "vldx\t%w0,%1" + "vldx\t%w0,%1", + "xvldx\t%u0,%1" } }; @@ -4174,6 +4193,34 @@ loongarch_split_128bit_move_p (rtx dest, rtx src) return true; } +/* Return true if a 256-bit move from SRC to DEST should be split. */ + +bool +loongarch_split_256bit_move_p (rtx dest, rtx src) +{ + /* LSX-to-LSX moves can be done in a single instruction. */ + if (FP_REG_RTX_P (src) && FP_REG_RTX_P (dest)) + return false; + + /* Check for LSX loads and stores. */ + if (FP_REG_RTX_P (dest) && MEM_P (src)) + return false; + if (FP_REG_RTX_P (src) && MEM_P (dest)) + return false; + + /* Check for LSX set to an immediate const vector with valid replicated + element. */ + if (FP_REG_RTX_P (dest) + && loongarch_const_vector_same_int_p (src, GET_MODE (src), -512, 511)) + return false; + + /* Check for LSX load zero immediate. */ + if (FP_REG_RTX_P (dest) && src == CONST0_RTX (GET_MODE (src))) + return false; + + return true; +} + /* Split a 128-bit move from SRC to DEST. */ void @@ -4265,6 +4312,97 @@ loongarch_split_128bit_move (rtx dest, rtx src) } } +/* Split a 256-bit move from SRC to DEST. 
*/ + +void +loongarch_split_256bit_move (rtx dest, rtx src) +{ + int byte, index; + rtx low_dest, low_src, d, s; + + if (FP_REG_RTX_P (dest)) + { + gcc_assert (!MEM_P (src)); + + rtx new_dest = dest; + if (!TARGET_64BIT) + { + if (GET_MODE (dest) != V8SImode) + new_dest = simplify_gen_subreg (V8SImode, dest, GET_MODE (dest), 0); + } + else + { + if (GET_MODE (dest) != V4DImode) + new_dest = simplify_gen_subreg (V4DImode, dest, GET_MODE (dest), 0); + } + + for (byte = 0, index = 0; byte < GET_MODE_SIZE (GET_MODE (dest)); + byte += UNITS_PER_WORD, index++) + { + s = loongarch_subword_at_byte (src, byte); + if (!TARGET_64BIT) + emit_insn (gen_lasx_xvinsgr2vr_w (new_dest, s, new_dest, + GEN_INT (1 << index))); + else + emit_insn (gen_lasx_xvinsgr2vr_d (new_dest, s, new_dest, + GEN_INT (1 << index))); + } + } + else if (FP_REG_RTX_P (src)) + { + gcc_assert (!MEM_P (dest)); + + rtx new_src = src; + if (!TARGET_64BIT) + { + if (GET_MODE (src) != V8SImode) + new_src = simplify_gen_subreg (V8SImode, src, GET_MODE (src), 0); + } + else + { + if (GET_MODE (src) != V4DImode) + new_src = simplify_gen_subreg (V4DImode, src, GET_MODE (src), 0); + } + + for (byte = 0, index = 0; byte < GET_MODE_SIZE (GET_MODE (src)); + byte += UNITS_PER_WORD, index++) + { + d = loongarch_subword_at_byte (dest, byte); + if (!TARGET_64BIT) + emit_insn (gen_lsx_vpickve2gr_w (d, new_src, GEN_INT (index))); + else + emit_insn (gen_lsx_vpickve2gr_d (d, new_src, GEN_INT (index))); + } + } + else + { + low_dest = loongarch_subword_at_byte (dest, 0); + low_src = loongarch_subword_at_byte (src, 0); + gcc_assert (REG_P (low_dest) && REG_P (low_src)); + /* Make sure the source register is not written before reading. */ + if (REGNO (low_dest) <= REGNO (low_src)) + { + for (byte = 0; byte < GET_MODE_SIZE (TImode); + byte += UNITS_PER_WORD) + { + d = loongarch_subword_at_byte (dest, byte); + s = loongarch_subword_at_byte (src, byte); + loongarch_emit_move (d, s); + } + } + else + { + for (byte = GET_MODE_SIZE (TImode) - UNITS_PER_WORD; byte >= 0; + byte -= UNITS_PER_WORD) + { + d = loongarch_subword_at_byte (dest, byte); + s = loongarch_subword_at_byte (src, byte); + loongarch_emit_move (d, s); + } + } + } +} + /* Split a COPY_S.D with operands DEST, SRC and INDEX. GEN is a function used to generate subregs. 
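
The GPR-to-GPR fallback of the 256-bit split above copies the value one machine
word at a time and picks the copy direction from the register numbers, so that
no source word is overwritten before it has been read.  A minimal stand-alone
sketch of that overlap-safe ordering, using plain C arrays in place of RTL
subwords (the helper name is illustrative, not part of the patch):

#include <stddef.h>

/* Copy NWORDS machine words from SRC to DEST where the two ranges may
   overlap: go from the low word up when DEST starts at or below SRC,
   otherwise from the high word down.  */
static void
copy_words_overlap_safe (unsigned long *dest, const unsigned long *src,
                         size_t nwords)
{
  if (dest <= src)
    for (size_t i = 0; i < nwords; i++)
      dest[i] = src[i];
  else
    for (size_t i = nwords; i-- > 0;)
      dest[i] = src[i];
}
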
*/ @@ -4352,11 +4490,12 @@ loongarch_output_move (rtx dest, rtx src) machine_mode mode = GET_MODE (dest); bool dbl_p = (GET_MODE_SIZE (mode) == 8); bool lsx_p = LSX_SUPPORTED_MODE_P (mode); + bool lasx_p = LASX_SUPPORTED_MODE_P (mode); if (loongarch_split_move_p (dest, src)) return "#"; - if ((lsx_p) + if ((lsx_p || lasx_p) && dest_code == REG && FP_REG_P (REGNO (dest)) && src_code == CONST_VECTOR && CONST_INT_P (CONST_VECTOR_ELT (src, 0))) @@ -4366,6 +4505,8 @@ loongarch_output_move (rtx dest, rtx src) { case 16: return "vrepli.%v0\t%w0,%E1"; + case 32: + return "xvrepli.%v0\t%u0,%E1"; default: gcc_unreachable (); } } @@ -4380,13 +4521,15 @@ loongarch_output_move (rtx dest, rtx src) if (FP_REG_P (REGNO (dest))) { - if (lsx_p) + if (lsx_p || lasx_p) { gcc_assert (src == CONST0_RTX (GET_MODE (src))); switch (GET_MODE_SIZE (mode)) { case 16: return "vrepli.b\t%w0,0"; + case 32: + return "xvrepli.b\t%u0,0"; default: gcc_unreachable (); } @@ -4519,12 +4662,14 @@ loongarch_output_move (rtx dest, rtx src) { if (dest_code == REG && FP_REG_P (REGNO (dest))) { - if (lsx_p) + if (lsx_p || lasx_p) { switch (GET_MODE_SIZE (mode)) { case 16: return "vori.b\t%w0,%w1,0"; + case 32: + return "xvori.b\t%u0,%u1,0"; default: gcc_unreachable (); } @@ -4542,12 +4687,14 @@ loongarch_output_move (rtx dest, rtx src) if (insn) return insn; - if (lsx_p) + if (lsx_p || lasx_p) { switch (GET_MODE_SIZE (mode)) { case 16: return "vst\t%w1,%0"; + case 32: + return "xvst\t%u1,%0"; default: gcc_unreachable (); } @@ -4568,12 +4715,14 @@ loongarch_output_move (rtx dest, rtx src) if (insn) return insn; - if (lsx_p) + if (lsx_p || lasx_p) { switch (GET_MODE_SIZE (mode)) { case 16: return "vld\t%w0,%1"; + case 32: + return "xvld\t%u0,%1"; default: gcc_unreachable (); } @@ -5546,18 +5695,27 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part, 'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...), 'z' for (eq:?I ...), 'n' for (ne:?I ...). 't' Like 'T', but with the EQ/NE cases reversed - 'V' Print exact log2 of CONST_INT OP element 0 of a replicated - CONST_VECTOR in decimal. + 'F' Print the FPU branch condition for comparison OP. + 'W' Print the inverse of the FPU branch condition for comparison OP. + 'w' Print a LSX register. + 'u' Print a LASX register. + 'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...), + 'z' for (eq:?I ...), 'n' for (ne:?I ...). + 't' Like 'T', but with the EQ/NE cases reversed + 'Y' Print loongarch_fp_conditions[INTVAL (OP)] + 'Z' Print OP and a comma for 8CC, otherwise print nothing. + 'z' Print $0 if OP is zero, otherwise print OP normally. 'v' Print the insn size suffix b, h, w or d for vector modes V16QI, V8HI, V4SI, V2SI, and w, d for vector modes V4SF, V2DF respectively. + 'V' Print exact log2 of CONST_INT OP element 0 of a replicated + CONST_VECTOR in decimal. 'W' Print the inverse of the FPU branch condition for comparison OP. - 'w' Print a LSX register. 'X' Print CONST_INT OP in hexadecimal format. 'x' Print the low 16 bits of CONST_INT OP in hexadecimal format. 'Y' Print loongarch_fp_conditions[INTVAL (OP)] 'y' Print exact log2 of CONST_INT OP in decimal. 'Z' Print OP and a comma for 8CC, otherwise print nothing. - 'z' Print $r0 if OP is zero, otherwise print OP normally. */ + 'z' Print $0 if OP is zero, otherwise print OP normally. 
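
The new 'u' modifier mirrors the existing 'w' one: both reuse GCC's reg_names[]
entry for the FPR and keep only its numeric suffix, printing "$xr<N>" for LASX
and "$vr<N>" for LSX.  A tiny illustration of that string handling, assuming
FPR names of the form "$f<N>" (the helper below is not part of the patch):

#include <stdio.h>

/* Print the LSX/LASX register name derived from an FPR name such as
   "$f12" by skipping the two-character "$f" prefix.  */
static void
print_simd_reg_name (FILE *f, const char *fpr_name, int lasx)
{
  fprintf (f, "%s%s", lasx ? "$xr" : "$vr", fpr_name + 2);
}

int
main (void)
{
  print_simd_reg_name (stdout, "$f12", 0);  /* prints $vr12 */
  fputc ('\n', stdout);
  print_simd_reg_name (stdout, "$f12", 1);  /* prints $xr12 */
  fputc ('\n', stdout);
  return 0;
}
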
*/ static void loongarch_print_operand (FILE *file, rtx op, int letter) @@ -5699,46 +5857,11 @@ loongarch_print_operand (FILE *file, rtx op, int letter) output_operand_lossage ("invalid use of '%%%c'", letter); break; - case 'v': - switch (GET_MODE (op)) - { - case E_V16QImode: - case E_V32QImode: - fprintf (file, "b"); - break; - case E_V8HImode: - case E_V16HImode: - fprintf (file, "h"); - break; - case E_V4SImode: - case E_V4SFmode: - case E_V8SImode: - case E_V8SFmode: - fprintf (file, "w"); - break; - case E_V2DImode: - case E_V2DFmode: - case E_V4DImode: - case E_V4DFmode: - fprintf (file, "d"); - break; - default: - output_operand_lossage ("invalid use of '%%%c'", letter); - } - break; - case 'W': loongarch_print_float_branch_condition (file, reverse_condition (code), letter); break; - case 'w': - if (code == REG && LSX_REG_P (REGNO (op))) - fprintf (file, "$vr%s", ®_names[REGNO (op)][2]); - else - output_operand_lossage ("invalid use of '%%%c'", letter); - break; - case 'x': if (CONST_INT_P (op)) fprintf (file, HOST_WIDE_INT_PRINT_HEX, INTVAL (op) & 0xffff); @@ -5780,6 +5903,48 @@ loongarch_print_operand (FILE *file, rtx op, int letter) fputc (',', file); break; + case 'w': + if (code == REG && LSX_REG_P (REGNO (op))) + fprintf (file, "$vr%s", ®_names[REGNO (op)][2]); + else + output_operand_lossage ("invalid use of '%%%c'", letter); + break; + + case 'u': + if (code == REG && LASX_REG_P (REGNO (op))) + fprintf (file, "$xr%s", ®_names[REGNO (op)][2]); + else + output_operand_lossage ("invalid use of '%%%c'", letter); + break; + + case 'v': + switch (GET_MODE (op)) + { + case E_V16QImode: + case E_V32QImode: + fprintf (file, "b"); + break; + case E_V8HImode: + case E_V16HImode: + fprintf (file, "h"); + break; + case E_V4SImode: + case E_V4SFmode: + case E_V8SImode: + case E_V8SFmode: + fprintf (file, "w"); + break; + case E_V2DImode: + case E_V2DFmode: + case E_V4DImode: + case E_V4DFmode: + fprintf (file, "d"); + break; + default: + output_operand_lossage ("invalid use of '%%%c'", letter); + } + break; + default: switch (code) { @@ -6110,13 +6275,18 @@ loongarch_hard_regno_mode_ok_uncached (unsigned int regno, machine_mode mode) size = GET_MODE_SIZE (mode); mclass = GET_MODE_CLASS (mode); - if (GP_REG_P (regno) && !LSX_SUPPORTED_MODE_P (mode)) + if (GP_REG_P (regno) && !LSX_SUPPORTED_MODE_P (mode) + && !LASX_SUPPORTED_MODE_P (mode)) return ((regno - GP_REG_FIRST) & 1) == 0 || size <= UNITS_PER_WORD; /* For LSX, allow TImode and 128-bit vector modes in all FPR. */ if (FP_REG_P (regno) && LSX_SUPPORTED_MODE_P (mode)) return true; + /* FIXED ME: For LASX, allow TImode and 256-bit vector modes in all FPR. */ + if (FP_REG_P (regno) && LASX_SUPPORTED_MODE_P (mode)) + return true; + if (FP_REG_P (regno)) { if (mclass == MODE_FLOAT @@ -6169,6 +6339,9 @@ loongarch_hard_regno_nregs (unsigned int regno, machine_mode mode) if (LSX_SUPPORTED_MODE_P (mode)) return 1; + if (LASX_SUPPORTED_MODE_P (mode)) + return 1; + return (GET_MODE_SIZE (mode) + UNITS_PER_FPREG - 1) / UNITS_PER_FPREG; } @@ -6198,7 +6371,10 @@ loongarch_class_max_nregs (enum reg_class rclass, machine_mode mode) { if (loongarch_hard_regno_mode_ok (FP_REG_FIRST, mode)) { - if (LSX_SUPPORTED_MODE_P (mode)) + /* Fixed me. 
*/ + if (LASX_SUPPORTED_MODE_P (mode)) + size = MIN (size, UNITS_PER_LASX_REG); + else if (LSX_SUPPORTED_MODE_P (mode)) size = MIN (size, UNITS_PER_LSX_REG); else size = MIN (size, UNITS_PER_FPREG); @@ -6216,6 +6392,10 @@ static bool loongarch_can_change_mode_class (machine_mode from, machine_mode to, reg_class_t rclass) { + /* Allow conversions between different LSX/LASX vector modes. */ + if (LASX_SUPPORTED_MODE_P (from) && LASX_SUPPORTED_MODE_P (to)) + return true; + /* Allow conversions between different LSX vector modes. */ if (LSX_SUPPORTED_MODE_P (from) && LSX_SUPPORTED_MODE_P (to)) return true; @@ -6239,7 +6419,8 @@ loongarch_mode_ok_for_mov_fmt_p (machine_mode mode) return TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT; default: - return LSX_SUPPORTED_MODE_P (mode); + return ISA_HAS_LASX ? LASX_SUPPORTED_MODE_P (mode) + : LSX_SUPPORTED_MODE_P (mode); } } @@ -6441,7 +6622,8 @@ loongarch_valid_pointer_mode (scalar_int_mode mode) static bool loongarch_vector_mode_supported_p (machine_mode mode) { - return LSX_SUPPORTED_MODE_P (mode); + return ISA_HAS_LASX ? LASX_SUPPORTED_MODE_P (mode) + : LSX_SUPPORTED_MODE_P (mode); } /* Implement TARGET_SCALAR_MODE_SUPPORTED_P. */ @@ -6467,19 +6649,19 @@ loongarch_preferred_simd_mode (scalar_mode mode) switch (mode) { case E_QImode: - return E_V16QImode; + return ISA_HAS_LASX ? E_V32QImode : E_V16QImode; case E_HImode: - return E_V8HImode; + return ISA_HAS_LASX ? E_V16HImode : E_V8HImode; case E_SImode: - return E_V4SImode; + return ISA_HAS_LASX ? E_V8SImode : E_V4SImode; case E_DImode: - return E_V2DImode; + return ISA_HAS_LASX ? E_V4DImode : E_V2DImode; case E_SFmode: - return E_V4SFmode; + return ISA_HAS_LASX ? E_V8SFmode : E_V4SFmode; case E_DFmode: - return E_V2DFmode; + return ISA_HAS_LASX ? E_V4DFmode : E_V2DFmode; default: break; @@ -6490,7 +6672,12 @@ loongarch_preferred_simd_mode (scalar_mode mode) static unsigned int loongarch_autovectorize_vector_modes (vector_modes *modes, bool) { - if (ISA_HAS_LSX) + if (ISA_HAS_LASX) + { + modes->safe_push (V32QImode); + modes->safe_push (V16QImode); + } + else if (ISA_HAS_LSX) { modes->safe_push (V16QImode); } @@ -6670,11 +6857,18 @@ const char * loongarch_lsx_output_division (const char *division, rtx *operands) { const char *s; + machine_mode mode = GET_MODE (*operands); s = division; if (TARGET_CHECK_ZERO_DIV) { - if (ISA_HAS_LSX) + if (ISA_HAS_LASX && GET_MODE_SIZE (mode) == 32) + { + output_asm_insn ("xvsetallnez.%v0\t$fcc7,%u2",operands); + output_asm_insn (s, operands); + output_asm_insn ("bcnez\t$fcc7,1f", operands); + } + else if (ISA_HAS_LSX) { output_asm_insn ("vsetallnez.%v0\t$fcc7,%w2",operands); output_asm_insn (s, operands); @@ -7502,7 +7696,7 @@ loongarch_expand_lsx_shuffle (struct expand_vec_perm_d *d) rtx_insn *insn; unsigned i; - if (!ISA_HAS_LSX) + if (!ISA_HAS_LSX && !ISA_HAS_LASX) return false; for (i = 0; i < d->nelt; i++) @@ -7526,40 +7720,413 @@ loongarch_expand_lsx_shuffle (struct expand_vec_perm_d *d) return true; } -void -loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) +/* Try to simplify a two vector permutation using 2 intra-lane interleave + insns and cross-lane shuffle for 32-byte vectors. 
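
The routine introduced below only fires for one specific selector shape:
consecutive pairs { base + i/2, base + i/2 + nelt } where base is 0 (low
halves) or nelt/2 (high halves), with indices 0..nelt-1 naming op0 and
nelt..2*nelt-1 naming op1.  A scalar sketch of that shape test (plain C, the
helper name is mine, not part of the patch):

#include <stdbool.h>

/* Return true if PERM of length NELT interleaves the low or high halves of
   two vectors, i.e. { base, base+nelt, base+1, base+nelt+1, ... } with
   base == 0 or base == nelt/2.  */
static bool
is_interleave_perm (const unsigned char *perm, unsigned nelt)
{
  unsigned base = perm[0];
  if (base != 0 && base != nelt / 2)
    return false;
  for (unsigned i = 0; i < nelt; i += 2)
    if (perm[i] != base + i / 2 || perm[i + 1] != base + i / 2 + nelt)
      return false;
  return true;
}
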
*/ + +static bool +loongarch_expand_vec_perm_interleave (struct expand_vec_perm_d *d) { - machine_mode vmode = GET_MODE (target); + unsigned i, nelt; + rtx t1,t2,t3; + rtx (*gen_high) (rtx, rtx, rtx); + rtx (*gen_low) (rtx, rtx, rtx); + machine_mode mode = GET_MODE (d->target); - switch (vmode) + if (d->one_vector_p) + return false; + if (TARGET_LASX && GET_MODE_SIZE (d->vmode) == 32) + ; + else + return false; + + nelt = d->nelt; + if (d->perm[0] != 0 && d->perm[0] != nelt / 2) + return false; + for (i = 0; i < nelt; i += 2) + if (d->perm[i] != d->perm[0] + i / 2 + || d->perm[i + 1] != d->perm[0] + i / 2 + nelt) + return false; + + if (d->testing_p) + return true; + + switch (d->vmode) { - case E_V16QImode: - emit_insn (gen_lsx_vshuf_b (target, op1, op0, sel)); + case E_V32QImode: + gen_high = gen_lasx_xvilvh_b; + gen_low = gen_lasx_xvilvl_b; break; - case E_V2DFmode: - emit_insn (gen_lsx_vshuf_d_f (target, sel, op1, op0)); + case E_V16HImode: + gen_high = gen_lasx_xvilvh_h; + gen_low = gen_lasx_xvilvl_h; break; - case E_V2DImode: - emit_insn (gen_lsx_vshuf_d (target, sel, op1, op0)); + case E_V8SImode: + gen_high = gen_lasx_xvilvh_w; + gen_low = gen_lasx_xvilvl_w; break; - case E_V4SFmode: - emit_insn (gen_lsx_vshuf_w_f (target, sel, op1, op0)); + case E_V4DImode: + gen_high = gen_lasx_xvilvh_d; + gen_low = gen_lasx_xvilvl_d; break; - case E_V4SImode: - emit_insn (gen_lsx_vshuf_w (target, sel, op1, op0)); + case E_V8SFmode: + gen_high = gen_lasx_xvilvh_w_f; + gen_low = gen_lasx_xvilvl_w_f; break; - case E_V8HImode: - emit_insn (gen_lsx_vshuf_h (target, sel, op1, op0)); + case E_V4DFmode: + gen_high = gen_lasx_xvilvh_d_f; + gen_low = gen_lasx_xvilvl_d_f; break; default: - break; + gcc_unreachable (); + } + + t1 = gen_reg_rtx (mode); + t2 = gen_reg_rtx (mode); + emit_insn (gen_high (t1, d->op0, d->op1)); + emit_insn (gen_low (t2, d->op0, d->op1)); + if (mode == V4DFmode || mode == V8SFmode) + { + t3 = gen_reg_rtx (V4DFmode); + if (d->perm[0]) + emit_insn (gen_lasx_xvpermi_q_v4df (t3, gen_lowpart (V4DFmode, t1), + gen_lowpart (V4DFmode, t2), + GEN_INT (0x31))); + else + emit_insn (gen_lasx_xvpermi_q_v4df (t3, gen_lowpart (V4DFmode, t1), + gen_lowpart (V4DFmode, t2), + GEN_INT (0x20))); } + else + { + t3 = gen_reg_rtx (V4DImode); + if (d->perm[0]) + emit_insn (gen_lasx_xvpermi_q_v4di (t3, gen_lowpart (V4DImode, t1), + gen_lowpart (V4DImode, t2), + GEN_INT (0x31))); + else + emit_insn (gen_lasx_xvpermi_q_v4di (t3, gen_lowpart (V4DImode, t1), + gen_lowpart (V4DImode, t2), + GEN_INT (0x20))); + } + emit_move_insn (d->target, gen_lowpart (mode, t3)); + return true; } +/* Implement extract-even and extract-odd permutations. */ + static bool -loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d) +loongarch_expand_vec_perm_even_odd_1 (struct expand_vec_perm_d *d, unsigned odd) { - int i; + rtx t1; + machine_mode mode = GET_MODE (d->target); + t1 = gen_reg_rtx (mode); + + if (d->testing_p) + return true; + + switch (d->vmode) + { + case E_V4DFmode: + /* Shuffle the lanes around into { 0 4 2 6 } and { 1 5 3 7 }. */ + if (odd) + emit_insn (gen_lasx_xvilvh_d_f (t1, d->op0, d->op1)); + else + emit_insn (gen_lasx_xvilvl_d_f (t1, d->op0, d->op1)); + + /* Shuffle within the 256-bit lanes to produce the result required. + { 0 2 4 6 } | { 1 3 5 7 }. 
*/ + emit_insn (gen_lasx_xvpermi_d_v4df (d->target, t1, GEN_INT (0xd8))); + break; + + case E_V4DImode: + if (odd) + emit_insn (gen_lasx_xvilvh_d (t1, d->op0, d->op1)); + else + emit_insn (gen_lasx_xvilvl_d (t1, d->op0, d->op1)); + + emit_insn (gen_lasx_xvpermi_d_v4di (d->target, t1, GEN_INT (0xd8))); + break; + + case E_V8SFmode: + /* Shuffle the lanes around into: + { 0 2 8 a 4 6 c e } | { 1 3 9 b 5 7 d f }. */ + if (odd) + emit_insn (gen_lasx_xvpickod_w_f (t1, d->op0, d->op1)); + else + emit_insn (gen_lasx_xvpickev_w_f (t1, d->op0, d->op1)); + + /* Shuffle within the 256-bit lanes to produce the result required. + { 0 2 4 6 8 a c e } | { 1 3 5 7 9 b d f }. */ + emit_insn (gen_lasx_xvpermi_d_v8sf (d->target, t1, GEN_INT (0xd8))); + break; + + case E_V8SImode: + if (odd) + emit_insn (gen_lasx_xvpickod_w (t1, d->op0, d->op1)); + else + emit_insn (gen_lasx_xvpickev_w (t1, d->op0, d->op1)); + + emit_insn (gen_lasx_xvpermi_d_v8si (d->target, t1, GEN_INT (0xd8))); + break; + + case E_V16HImode: + if (odd) + emit_insn (gen_lasx_xvpickod_h (t1, d->op0, d->op1)); + else + emit_insn (gen_lasx_xvpickev_h (t1, d->op0, d->op1)); + + emit_insn (gen_lasx_xvpermi_d_v16hi (d->target, t1, GEN_INT (0xd8))); + break; + + case E_V32QImode: + if (odd) + emit_insn (gen_lasx_xvpickod_b (t1, d->op0, d->op1)); + else + emit_insn (gen_lasx_xvpickev_b (t1, d->op0, d->op1)); + + emit_insn (gen_lasx_xvpermi_d_v32qi (d->target, t1, GEN_INT (0xd8))); + break; + + default: + gcc_unreachable (); + } + + return true; +} + +/* Pattern match extract-even and extract-odd permutations. */ + +static bool +loongarch_expand_vec_perm_even_odd (struct expand_vec_perm_d *d) +{ + unsigned i, odd, nelt = d->nelt; + if (!TARGET_LASX) + return false; + + odd = d->perm[0]; + if (odd != 0 && odd != 1) + return false; + + for (i = 1; i < nelt; ++i) + if (d->perm[i] != 2 * i + odd) + return false; + + return loongarch_expand_vec_perm_even_odd_1 (d, odd); +} + +/* Expand a variable vector permutation for LASX. */ + +void +loongarch_expand_vec_perm_1 (rtx operands[]) +{ + rtx target = operands[0]; + rtx op0 = operands[1]; + rtx op1 = operands[2]; + rtx mask = operands[3]; + + bool one_operand_shuffle = rtx_equal_p (op0, op1); + rtx t1 = NULL; + rtx t2 = NULL; + rtx t3, t4, t5, t6, vt = NULL; + rtx vec[32] = {NULL}; + machine_mode mode = GET_MODE (op0); + machine_mode maskmode = GET_MODE (mask); + int w, i; + + /* Number of elements in the vector. */ + w = GET_MODE_NUNITS (mode); + + if (mode == V4DImode || mode == V4DFmode) + { + maskmode = mode = V8SImode; + w = 8; + t1 = gen_reg_rtx (maskmode); + + /* Replicate the low bits of the V4DImode mask into V8SImode: + mask = { A B C D } + t1 = { A A B B C C D D }. */ + for (i = 0; i < w / 2; ++i) + vec[i*2 + 1] = vec[i*2] = GEN_INT (i * 2); + vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec)); + vt = force_reg (maskmode, vt); + mask = gen_lowpart (maskmode, mask); + emit_insn (gen_lasx_xvperm_w (t1, mask, vt)); + + /* Multiply the shuffle indicies by two. */ + t1 = expand_simple_binop (maskmode, PLUS, t1, t1, t1, 1, + OPTAB_DIRECT); + + /* Add one to the odd shuffle indicies: + t1 = { A*2, A*2+1, B*2, B*2+1, ... }. */ + for (i = 0; i < w / 2; ++i) + { + vec[i * 2] = const0_rtx; + vec[i * 2 + 1] = const1_rtx; + } + vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec)); + vt = validize_mem (force_const_mem (maskmode, vt)); + t1 = expand_simple_binop (maskmode, PLUS, t1, vt, t1, 1, + OPTAB_DIRECT); + + /* Continue as if V8SImode (resp. V32QImode) was used initially. 
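
The three steps above (replicate, double, add one to the odd slots) just
rewrite a 4-element 64-bit-lane selector as an 8-element 32-bit-lane selector,
so the rest of the expander can work in V8SImode.  A scalar model of that index
widening (illustrative helper, not part of the patch):

/* Each 64-bit lane index A becomes the 32-bit lane index pair
   { 2*A, 2*A + 1 }.  */
static void
widen_di_selector_to_si (const unsigned char sel_di[4],
                         unsigned char sel_si[8])
{
  for (int i = 0; i < 4; i++)
    {
      sel_si[2 * i] = 2 * sel_di[i];
      sel_si[2 * i + 1] = 2 * sel_di[i] + 1;
    }
}
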
*/ + operands[3] = mask = t1; + target = gen_reg_rtx (mode); + op0 = gen_lowpart (mode, op0); + op1 = gen_lowpart (mode, op1); + } + + switch (mode) + { + case E_V8SImode: + if (one_operand_shuffle) + { + emit_insn (gen_lasx_xvperm_w (target, op0, mask)); + if (target != operands[0]) + emit_move_insn (operands[0], + gen_lowpart (GET_MODE (operands[0]), target)); + } + else + { + t1 = gen_reg_rtx (V8SImode); + t2 = gen_reg_rtx (V8SImode); + emit_insn (gen_lasx_xvperm_w (t1, op0, mask)); + emit_insn (gen_lasx_xvperm_w (t2, op1, mask)); + goto merge_two; + } + return; + + case E_V8SFmode: + mask = gen_lowpart (V8SImode, mask); + if (one_operand_shuffle) + emit_insn (gen_lasx_xvperm_w_f (target, op0, mask)); + else + { + t1 = gen_reg_rtx (V8SFmode); + t2 = gen_reg_rtx (V8SFmode); + emit_insn (gen_lasx_xvperm_w_f (t1, op0, mask)); + emit_insn (gen_lasx_xvperm_w_f (t2, op1, mask)); + goto merge_two; + } + return; + + case E_V16HImode: + if (one_operand_shuffle) + { + t1 = gen_reg_rtx (V16HImode); + t2 = gen_reg_rtx (V16HImode); + emit_insn (gen_lasx_xvpermi_d_v16hi (t1, op0, GEN_INT (0x44))); + emit_insn (gen_lasx_xvpermi_d_v16hi (t2, op0, GEN_INT (0xee))); + emit_insn (gen_lasx_xvshuf_h (target, mask, t2, t1)); + } + else + { + t1 = gen_reg_rtx (V16HImode); + t2 = gen_reg_rtx (V16HImode); + t3 = gen_reg_rtx (V16HImode); + t4 = gen_reg_rtx (V16HImode); + t5 = gen_reg_rtx (V16HImode); + t6 = gen_reg_rtx (V16HImode); + emit_insn (gen_lasx_xvpermi_d_v16hi (t3, op0, GEN_INT (0x44))); + emit_insn (gen_lasx_xvpermi_d_v16hi (t4, op0, GEN_INT (0xee))); + emit_insn (gen_lasx_xvshuf_h (t1, mask, t4, t3)); + emit_insn (gen_lasx_xvpermi_d_v16hi (t5, op1, GEN_INT (0x44))); + emit_insn (gen_lasx_xvpermi_d_v16hi (t6, op1, GEN_INT (0xee))); + emit_insn (gen_lasx_xvshuf_h (t2, mask, t6, t5)); + goto merge_two; + } + return; + + case E_V32QImode: + if (one_operand_shuffle) + { + t1 = gen_reg_rtx (V32QImode); + t2 = gen_reg_rtx (V32QImode); + emit_insn (gen_lasx_xvpermi_d_v32qi (t1, op0, GEN_INT (0x44))); + emit_insn (gen_lasx_xvpermi_d_v32qi (t2, op0, GEN_INT (0xee))); + emit_insn (gen_lasx_xvshuf_b (target, t2, t1, mask)); + } + else + { + t1 = gen_reg_rtx (V32QImode); + t2 = gen_reg_rtx (V32QImode); + t3 = gen_reg_rtx (V32QImode); + t4 = gen_reg_rtx (V32QImode); + t5 = gen_reg_rtx (V32QImode); + t6 = gen_reg_rtx (V32QImode); + emit_insn (gen_lasx_xvpermi_d_v32qi (t3, op0, GEN_INT (0x44))); + emit_insn (gen_lasx_xvpermi_d_v32qi (t4, op0, GEN_INT (0xee))); + emit_insn (gen_lasx_xvshuf_b (t1, t4, t3, mask)); + emit_insn (gen_lasx_xvpermi_d_v32qi (t5, op1, GEN_INT (0x44))); + emit_insn (gen_lasx_xvpermi_d_v32qi (t6, op1, GEN_INT (0xee))); + emit_insn (gen_lasx_xvshuf_b (t2, t6, t5, mask)); + goto merge_two; + } + return; + + default: + gcc_assert (GET_MODE_SIZE (mode) == 32); + break; + } + +merge_two: + /* Then merge them together. The key is whether any given control + element contained a bit set that indicates the second word. 
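
The merge_two tail below relies on the fact that, after both operands have been
permuted with the same selector (indices taken modulo nelt in this model), the
bit of each original selector element that addresses the second vector
(sel[i] & nelt) tells which of the two intermediate results to keep.  A scalar
reference model of that final select, with illustrative names:

/* FROM_OP0/FROM_OP1 hold the results of permuting op0 and op1 with the
   selector taken modulo NELT; choose per element by the "second vector"
   bit of the original selector.  */
static void
merge_two_results (const int *from_op0, const int *from_op1,
                   const unsigned char *sel, unsigned nelt, int *out)
{
  for (unsigned i = 0; i < nelt; i++)
    out[i] = (sel[i] & nelt) ? from_op1[i] : from_op0[i];
}
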
*/ + rtx xops[6]; + mask = operands[3]; + vt = GEN_INT (w); + vt = gen_const_vec_duplicate (maskmode, vt); + vt = force_reg (maskmode, vt); + mask = expand_simple_binop (maskmode, AND, mask, vt, + NULL_RTX, 0, OPTAB_DIRECT); + if (GET_MODE (target) != mode) + target = gen_reg_rtx (mode); + xops[0] = target; + xops[1] = gen_lowpart (mode, t2); + xops[2] = gen_lowpart (mode, t1); + xops[3] = gen_rtx_EQ (maskmode, mask, vt); + xops[4] = mask; + xops[5] = vt; + + loongarch_expand_vec_cond_expr (mode, maskmode, xops); + if (target != operands[0]) + emit_move_insn (operands[0], + gen_lowpart (GET_MODE (operands[0]), target)); +} + +void +loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) +{ + machine_mode vmode = GET_MODE (target); + + switch (vmode) + { + case E_V16QImode: + emit_insn (gen_lsx_vshuf_b (target, op1, op0, sel)); + break; + case E_V2DFmode: + emit_insn (gen_lsx_vshuf_d_f (target, sel, op1, op0)); + break; + case E_V2DImode: + emit_insn (gen_lsx_vshuf_d (target, sel, op1, op0)); + break; + case E_V4SFmode: + emit_insn (gen_lsx_vshuf_w_f (target, sel, op1, op0)); + break; + case E_V4SImode: + emit_insn (gen_lsx_vshuf_w (target, sel, op1, op0)); + break; + case E_V8HImode: + emit_insn (gen_lsx_vshuf_h (target, sel, op1, op0)); + break; + default: + break; + } +} + +static bool +loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d) +{ + int i; rtx target, op0, op1, sel, tmp; rtx rperm[MAX_VECT_LEN]; @@ -7660,25 +8227,1302 @@ loongarch_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) return true; } - if (loongarch_expand_lsx_shuffle (d)) - return true; - return false; -} - -/* Implementation of constant vector permuatation. This function identifies - * recognized pattern of permuation selector argument, and use one or more - * instruction(s) to finish the permutation job correctly. For unsupported - * patterns, it will return false. */ - -static bool -loongarch_expand_vec_perm_const_2 (struct expand_vec_perm_d *d) -{ - /* Although we have the LSX vec_perm template, there's still some - 128bit vector permuatation operations send to vectorize_vec_perm_const. - In this case, we just simpliy wrap them by single vshuf.* instruction, - because LSX vshuf.* instruction just have the same behavior that GCC - expects. */ - return loongarch_try_expand_lsx_vshuf_const (d); + if (loongarch_expand_lsx_shuffle (d)) + return true; + if (loongarch_expand_vec_perm_even_odd (d)) + return true; + if (loongarch_expand_vec_perm_interleave (d)) + return true; + return false; +} + +/* Following are the assist function for const vector permutation support. 
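
Each of the helper predicates that follow simply matches the selector against
one fixed index pattern on the op0 || op1 concatenation.  As a reading aid,
here is a stand-alone version of the "odd extraction" shape ({ 1, 3, 5, ... }),
equivalent to the loongarch_is_odd_extraction check further down (the function
name below is mine):

#include <stdbool.h>

/* True if PERM selects the odd-numbered elements of op0 || op1.  */
static bool
is_odd_extraction (const unsigned char *perm, unsigned nelt)
{
  for (unsigned i = 0; i < nelt; i++)
    if (perm[i] != 2 * i + 1)
      return false;
  return true;
}
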
*/ +static bool +loongarch_is_quad_duplicate (struct expand_vec_perm_d *d) +{ + if (d->perm[0] >= d->nelt / 2) + return false; + + bool result = true; + unsigned char lhs = d->perm[0]; + unsigned char rhs = d->perm[d->nelt / 2]; + + if ((rhs - lhs) != d->nelt / 2) + return false; + + for (int i = 1; i < d->nelt; i += 1) + { + if ((i < d->nelt / 2) && (d->perm[i] != lhs)) + { + result = false; + break; + } + if ((i > d->nelt / 2) && (d->perm[i] != rhs)) + { + result = false; + break; + } + } + + return result; +} + +static bool +loongarch_is_double_duplicate (struct expand_vec_perm_d *d) +{ + if (!d->one_vector_p) + return false; + + if (d->nelt < 8) + return false; + + bool result = true; + unsigned char buf = d->perm[0]; + + for (int i = 1; i < d->nelt; i += 2) + { + if (d->perm[i] != buf) + { + result = false; + break; + } + if (d->perm[i - 1] != d->perm[i]) + { + result = false; + break; + } + buf += d->nelt / 4; + } + + return result; +} + +static bool +loongarch_is_odd_extraction (struct expand_vec_perm_d *d) +{ + bool result = true; + unsigned char buf = 1; + + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 2; + } + + return result; +} + +static bool +loongarch_is_even_extraction (struct expand_vec_perm_d *d) +{ + bool result = true; + unsigned char buf = 0; + + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + + return result; +} + +static bool +loongarch_is_extraction_permutation (struct expand_vec_perm_d *d) +{ + bool result = true; + unsigned char buf = d->perm[0]; + + if (buf != 0 || buf != d->nelt) + return false; + + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 2; + } + + return result; +} + +static bool +loongarch_is_center_extraction (struct expand_vec_perm_d *d) +{ + bool result = true; + unsigned buf = d->nelt / 2; + + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + + return result; +} + +static bool +loongarch_is_reversing_permutation (struct expand_vec_perm_d *d) +{ + if (!d->one_vector_p) + return false; + + bool result = true; + unsigned char buf = d->nelt - 1; + + for (int i = 0; i < d->nelt; i += 1) + { + if (d->perm[i] != buf) + { + result = false; + break; + } + + buf -= 1; + } + + return result; +} + +static bool +loongarch_is_di_misalign_extract (struct expand_vec_perm_d *d) +{ + if (d->nelt != 4 && d->nelt != 8) + return false; + + bool result = true; + unsigned char buf; + + if (d->nelt == 4) + { + buf = 1; + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + + buf += 1; + } + } + else if (d->nelt == 8) + { + buf = 2; + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + + buf += 1; + } + } + + return result; +} + +static bool +loongarch_is_si_misalign_extract (struct expand_vec_perm_d *d) +{ + if (d->vmode != E_V8SImode && d->vmode != E_V8SFmode) + return false; + bool result = true; + unsigned char buf = 1; + + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + + return result; +} + +static bool +loongarch_is_lasx_lowpart_interleave (struct expand_vec_perm_d *d) +{ + bool result = true; + unsigned char buf = 0; + + for (int i = 0;i < d->nelt; i += 2) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } 
+ + if (result) + { + buf = d->nelt; + for (int i = 1; i < d->nelt; i += 2) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + } + + return result; +} + +static bool +loongarch_is_lasx_lowpart_interleave_2 (struct expand_vec_perm_d *d) +{ + if (d->vmode != E_V32QImode) + return false; + bool result = true; + unsigned char buf = 0; + +#define COMPARE_SELECTOR(INIT, BEGIN, END) \ + buf = INIT; \ + for (int i = BEGIN; i < END && result; i += 1) \ + { \ + if (buf != d->perm[i]) \ + { \ + result = false; \ + break; \ + } \ + buf += 1; \ + } + + COMPARE_SELECTOR (0, 0, 8); + COMPARE_SELECTOR (32, 8, 16); + COMPARE_SELECTOR (8, 16, 24); + COMPARE_SELECTOR (40, 24, 32); + +#undef COMPARE_SELECTOR + return result; +} + +static bool +loongarch_is_lasx_lowpart_extract (struct expand_vec_perm_d *d) +{ + bool result = true; + unsigned char buf = 0; + + for (int i = 0; i < d->nelt / 2; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + + if (result) + { + buf = d->nelt; + for (int i = d->nelt / 2; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + } + + return result; +} + +static bool +loongarch_is_lasx_highpart_interleave (expand_vec_perm_d *d) +{ + bool result = true; + unsigned char buf = d->nelt / 2; + + for (int i = 0; i < d->nelt; i += 2) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + + if (result) + { + buf = d->nelt + d->nelt / 2; + for (int i = 1; i < d->nelt;i += 2) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + buf += 1; + } + } + + return result; +} + +static bool +loongarch_is_lasx_highpart_interleave_2 (struct expand_vec_perm_d *d) +{ + if (d->vmode != E_V32QImode) + return false; + + bool result = true; + unsigned char buf = 0; + +#define COMPARE_SELECTOR(INIT, BEGIN, END) \ + buf = INIT; \ + for (int i = BEGIN; i < END && result; i += 1) \ + { \ + if (buf != d->perm[i]) \ + { \ + result = false; \ + break; \ + } \ + buf += 1; \ + } + + COMPARE_SELECTOR (16, 0, 8); + COMPARE_SELECTOR (48, 8, 16); + COMPARE_SELECTOR (24, 16, 24); + COMPARE_SELECTOR (56, 24, 32); + +#undef COMPARE_SELECTOR + return result; +} + +static bool +loongarch_is_elem_duplicate (struct expand_vec_perm_d *d) +{ + bool result = true; + unsigned char buf = d->perm[0]; + + for (int i = 0; i < d->nelt; i += 1) + { + if (buf != d->perm[i]) + { + result = false; + break; + } + } + + return result; +} + +inline bool +loongarch_is_op_reverse_perm (struct expand_vec_perm_d *d) +{ + return (d->vmode == E_V4DFmode) + && d->perm[0] == 2 && d->perm[1] == 3 + && d->perm[2] == 0 && d->perm[3] == 1; +} + +static bool +loongarch_is_single_op_perm (struct expand_vec_perm_d *d) +{ + bool result = true; + + for (int i = 0; i < d->nelt; i += 1) + { + if (d->perm[i] >= d->nelt) + { + result = false; + break; + } + } + + return result; +} + +static bool +loongarch_is_divisible_perm (struct expand_vec_perm_d *d) +{ + bool result = true; + + for (int i = 0; i < d->nelt / 2; i += 1) + { + if (d->perm[i] >= d->nelt) + { + result = false; + break; + } + } + + if (result) + { + for (int i = d->nelt / 2; i < d->nelt; i += 1) + { + if (d->perm[i] < d->nelt) + { + result = false; + break; + } + } + } + + return result; +} + +inline bool +loongarch_is_triple_stride_extract (struct expand_vec_perm_d *d) +{ + return (d->vmode == E_V4DImode || d->vmode == E_V4DFmode) + && d->perm[0] == 1 && d->perm[1] == 4 + && d->perm[2] == 7 && d->perm[3] == 0; +} + +/* In LASX, some 
permutation insns do not have the behavior that GCC expects when the
+ * compiler wants to emit a vector permutation.
+ *
+ * 1. What GCC provides via vectorize_vec_perm_const ()'s parameters:
+ * When GCC wants to perform a vector permutation, it provides two op
+ * registers, one target register, and a selector.
+ * In the const vector permutation case, GCC provides the selector as a
+ * char array that contains the original values; in the variable vector
+ * permutation case (performed via the vec_perm insn template), it provides
+ * a vector register.
+ * We assume that nelt is the number of elements inside a single vector in
+ * the current 256-bit vector mode.
+ *
+ * 2. What GCC expects to be performed:
+ * The two op registers (op0, op1) "combine" into a 512-bit temporary
+ * vector storage that has 2*nelt elements inside it; the low 256 bits are
+ * op0 and the high 256 bits are op1, and the elements are indexed as
+ * below:
+ *        0 ~ nelt - 1           nelt ~ 2 * nelt - 1
+ * |-------------------------|-------------------------|
+ *      Low 256bit (op0)         High 256bit (op1)
+ * For example, the second element of op1 (V8SImode) is indexed with 9.
+ * The selector is a vector that has the same mode and number of elements
+ * as op0, op1 and the target; it looks like this:
+ *        0 ~ nelt - 1
+ * |-------------------------|
+ *      256bit (selector)
+ * It describes which element of the 512-bit temporary vector storage goes
+ * into each element slot of the target.
+ * GCC expects that every element of the selector can be ANY index into the
+ * 512-bit vector storage (the selector can pick literally any element from
+ * op0 and op1 and place it anywhere in the target register).  This is also
+ * what the LSX 128-bit vshuf.* instructions do, so we can easily handle a
+ * 128-bit vector permutation with a single instruction.
+ *
+ * 3. What the LASX permutation instructions do:
+ * In short, they execute two independent 128-bit vector permutations,
+ * which is the reason we need to do the jobs below.  We will explain it.
+ * op0, op1, the target, and the selector are each split into a high
+ * 128-bit part and a low 128-bit part, and the permutation is performed as
+ * described below:
+ *
+ * a) op0's low 128 bits and op1's low 128 bits "combine" into a 256-bit
+ * temporary vector storage (TVS1), whose elements are indexed as below:
+ *    0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
+ * |---------------------|---------------------| TVS1
+ *    op0's low 128bit       op1's low 128bit
+ * op0's high 128 bits and op1's high 128 bits are "combined" into TVS2 in
+ * the same way.
+ *    0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
+ * |---------------------|---------------------| TVS2
+ *    op0's high 128bit      op1's high 128bit
+ * b) The selector's low 128 bits describe which elements from TVS1 go into
+ * the target vector's low 128 bits.  No TVS2 elements are allowed.
+ * c) The selector's high 128 bits describe which elements from TVS2 go
+ * into the target vector's high 128 bits.  No TVS1 elements are allowed.
+ *
+ * As we can see, if we want to handle a vector permutation correctly, we
+ * can achieve it in three ways:
+ * a) Modify the selector's elements, to make sure that every element
+ * selects the correct value to put into the target vector.
+ * b) Generate extra instructions before/after the permutation instruction
+ * to adjust the op vectors or the target vector, so that the target
+ * vector's value is what GCC expects.
+ * c) Use other instructions to process the ops and put the correct result
+ * into the target.
+ */
+
+/* Implementation of constant vector permutation. 
This function identifies + * recognized pattern of permuation selector argument, and use one or more + * instruction(s) to finish the permutation job correctly. For unsupported + * patterns, it will return false. */ + +static bool +loongarch_expand_vec_perm_const_2 (struct expand_vec_perm_d *d) +{ + /* Although we have the LSX vec_perm template, there's still some + 128bit vector permuatation operations send to vectorize_vec_perm_const. + In this case, we just simpliy wrap them by single vshuf.* instruction, + because LSX vshuf.* instruction just have the same behavior that GCC + expects. */ + if (GET_MODE_SIZE (d->vmode) == 16) + return loongarch_try_expand_lsx_vshuf_const (d); + else + return false; + + bool ok = false, reverse_hi_lo = false, extract_ev_od = false, + use_alt_op = false; + unsigned char idx; + int i; + rtx target, op0, op1, sel, tmp; + rtx op0_alt = NULL_RTX, op1_alt = NULL_RTX; + rtx rperm[MAX_VECT_LEN]; + unsigned int remapped[MAX_VECT_LEN]; + + /* Try to figure out whether is a recognized permutation selector pattern, if + yes, we will reassign some elements with new value in selector argument, + and in some cases we will generate some assist insn to complete the + permutation. (Even in some cases, we use other insn to impl permutation + instead of xvshuf!) + + Make sure to check d->testing_p is false everytime if you want to emit new + insn, unless you want to crash into ICE directly. */ + if (loongarch_is_quad_duplicate (d)) + { + /* Selector example: E_V8SImode, { 0, 0, 0, 0, 4, 4, 4, 4 } + copy first elem from original selector to all elem in new selector. */ + idx = d->perm[0]; + for (i = 0; i < d->nelt; i += 1) + { + remapped[i] = idx; + } + /* Selector after: { 0, 0, 0, 0, 0, 0, 0, 0 }. */ + } + else if (loongarch_is_double_duplicate (d)) + { + /* Selector example: E_V8SImode, { 1, 1, 3, 3, 5, 5, 7, 7 } + one_vector_p == true. */ + for (i = 0; i < d->nelt / 2; i += 1) + { + idx = d->perm[i]; + remapped[i] = idx; + remapped[i + d->nelt / 2] = idx; + } + /* Selector after: { 1, 1, 3, 3, 1, 1, 3, 3 }. */ + } + else if (loongarch_is_odd_extraction (d) + || loongarch_is_even_extraction (d)) + { + /* Odd extraction selector sample: E_V4DImode, { 1, 3, 5, 7 } + Selector after: { 1, 3, 1, 3 }. + Even extraction selector sample: E_V4DImode, { 0, 2, 4, 6 } + Selector after: { 0, 2, 0, 2 }. */ + for (i = 0; i < d->nelt / 2; i += 1) + { + idx = d->perm[i]; + remapped[i] = idx; + remapped[i + d->nelt / 2] = idx; + } + /* Additional insn is required for correct result. See codes below. */ + extract_ev_od = true; + } + else if (loongarch_is_extraction_permutation (d)) + { + /* Selector sample: E_V8SImode, { 0, 1, 2, 3, 4, 5, 6, 7 }. */ + if (d->perm[0] == 0) + { + for (i = 0; i < d->nelt / 2; i += 1) + { + remapped[i] = i; + remapped[i + d->nelt / 2] = i; + } + } + else + { + /* { 8, 9, 10, 11, 12, 13, 14, 15 }. */ + for (i = 0; i < d->nelt / 2; i += 1) + { + idx = i + d->nelt / 2; + remapped[i] = idx; + remapped[i + d->nelt / 2] = idx; + } + } + /* Selector after: { 0, 1, 2, 3, 0, 1, 2, 3 } + { 8, 9, 10, 11, 8, 9, 10, 11 } */ + } + else if (loongarch_is_center_extraction (d)) + { + /* sample: E_V4DImode, { 2, 3, 4, 5 } + In this condition, we can just copy high 128bit of op0 and low 128bit + of op1 to the target register by using xvpermi.q insn. 
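
In the op0 || op1 concatenation view, the "center extraction" selector
{ nelt/2, ..., nelt + nelt/2 - 1 } is just op0's high half followed by op1's
low half, which is why one xvpermi.q is enough in the branch below.  A scalar
model of the element movement (illustrative only, not part of the patch):

/* OUT gets op0's high NELT/2 elements followed by op1's low NELT/2
   elements.  */
static void
center_extract (const long *op0, const long *op1, unsigned nelt, long *out)
{
  for (unsigned i = 0; i < nelt / 2; i++)
    {
      out[i] = op0[nelt / 2 + i];
      out[nelt / 2 + i] = op1[i];
    }
}
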
*/ + if (!d->testing_p) + { + emit_move_insn (d->target, d->op1); + switch (d->vmode) + { + case E_V4DImode: + emit_insn (gen_lasx_xvpermi_q_v4di (d->target, d->target, + d->op0, GEN_INT (0x21))); + break; + case E_V4DFmode: + emit_insn (gen_lasx_xvpermi_q_v4df (d->target, d->target, + d->op0, GEN_INT (0x21))); + break; + case E_V8SImode: + emit_insn (gen_lasx_xvpermi_q_v8si (d->target, d->target, + d->op0, GEN_INT (0x21))); + break; + case E_V8SFmode: + emit_insn (gen_lasx_xvpermi_q_v8sf (d->target, d->target, + d->op0, GEN_INT (0x21))); + break; + case E_V16HImode: + emit_insn (gen_lasx_xvpermi_q_v16hi (d->target, d->target, + d->op0, GEN_INT (0x21))); + break; + case E_V32QImode: + emit_insn (gen_lasx_xvpermi_q_v32qi (d->target, d->target, + d->op0, GEN_INT (0x21))); + break; + default: + break; + } + } + ok = true; + /* Finish the funtion directly. */ + goto expand_perm_const_2_end; + } + else if (loongarch_is_reversing_permutation (d)) + { + /* Selector sample: E_V8SImode, { 7, 6, 5, 4, 3, 2, 1, 0 } + one_vector_p == true */ + idx = d->nelt / 2 - 1; + for (i = 0; i < d->nelt / 2; i += 1) + { + remapped[i] = idx; + remapped[i + d->nelt / 2] = idx; + idx -= 1; + } + /* Selector after: { 3, 2, 1, 0, 3, 2, 1, 0 } + Additional insn will be generated to swap hi and lo 128bit of target + register. */ + reverse_hi_lo = true; + } + else if (loongarch_is_di_misalign_extract (d) + || loongarch_is_si_misalign_extract (d)) + { + /* Selector Sample: + DI misalign: E_V4DImode, { 1, 2, 3, 4 } + SI misalign: E_V8SImode, { 1, 2, 3, 4, 5, 6, 7, 8 } */ + if (!d->testing_p) + { + /* Copy original op0/op1 value to new temp register. + In some cases, operand register may be used in multiple place, so + we need new regiter instead modify original one, to avoid runtime + crashing or wrong value after execution. */ + use_alt_op = true; + op1_alt = gen_reg_rtx (d->vmode); + emit_move_insn (op1_alt, d->op1); + + /* Adjust op1 for selecting correct value in high 128bit of target + register. + op1: E_V4DImode, { 4, 5, 6, 7 } -> { 2, 3, 4, 5 }. */ + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0); + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1, + conv_op0, GEN_INT (0x21))); + + for (i = 0; i < d->nelt / 2; i += 1) + { + remapped[i] = d->perm[i]; + remapped[i + d->nelt / 2] = d->perm[i]; + } + /* Selector after: + DI misalign: { 1, 2, 1, 2 } + SI misalign: { 1, 2, 3, 4, 1, 2, 3, 4 } */ + } + } + else if (loongarch_is_lasx_lowpart_interleave (d)) + { + /* Elements from op0's low 18bit and op1's 128bit are inserted into + target register alternately. + sample: E_V4DImode, { 0, 4, 1, 5 } */ + if (!d->testing_p) + { + /* Prepare temp register instead of modify original op. */ + use_alt_op = true; + op1_alt = gen_reg_rtx (d->vmode); + op0_alt = gen_reg_rtx (d->vmode); + emit_move_insn (op1_alt, d->op1); + emit_move_insn (op0_alt, d->op0); + + /* Generate subreg for fitting into insn gen function. */ + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0); + + /* Adjust op value in temp register. + op0 = {0,1,2,3}, op1 = {4,5,0,1} */ + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1, + conv_op0, GEN_INT (0x02))); + /* op0 = {0,1,4,5}, op1 = {4,5,0,1} */ + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op0, conv_op0, + conv_op1, GEN_INT (0x01))); + + /* Remap indices in selector based on the location of index inside + selector, and vector element numbers in current vector mode. 
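
The remapping that follows can be summarised as: odd-indexed slots in the
selector's low half are reduced by nelt/2, and even-indexed slots in its high
half are raised by 3*nelt/2, so that every index stays reachable by xvshuf.*
once the operands have been rearranged by the xvpermi.q pair above.  A compact
scalar version of those two loops (the helper name is mine, not part of the
patch):

static void
remap_lowpart_interleave_sel (const unsigned char *perm, unsigned nelt,
                              unsigned char *remapped)
{
  for (unsigned i = 0; i < nelt / 2; i++)
    remapped[i] = (i % 2 != 0) ? perm[i] - nelt / 2 : perm[i];
  for (unsigned i = nelt / 2; i < nelt; i++)
    remapped[i] = (i % 2 == 0) ? perm[i] + (nelt / 2) * 3 : perm[i];
}
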
*/ + + /* Filling low 128bit of new selector. */ + for (i = 0; i < d->nelt / 2; i += 1) + { + /* value in odd-indexed slot of low 128bit part of selector + vector. */ + remapped[i] = i % 2 != 0 ? d->perm[i] - d->nelt / 2 : d->perm[i]; + } + /* Then filling the high 128bit. */ + for (i = d->nelt / 2; i < d->nelt; i += 1) + { + /* value in even-indexed slot of high 128bit part of + selector vector. */ + remapped[i] = i % 2 == 0 + ? d->perm[i] + (d->nelt / 2) * 3 : d->perm[i]; + } + } + } + else if (loongarch_is_lasx_lowpart_interleave_2 (d)) + { + /* Special lowpart interleave case in V32QI vector mode. It does the same + thing as we can see in if branch that above this line. + Selector sample: E_V32QImode, + {0, 1, 2, 3, 4, 5, 6, 7, 32, 33, 34, 35, 36, 37, 38, 39, 8, + 9, 10, 11, 12, 13, 14, 15, 40, 41, 42, 43, 44, 45, 46, 47} */ + if (!d->testing_p) + { + /* Solution for this case in very simple - covert op into V4DI mode, + and do same thing as previous if branch. */ + op1_alt = gen_reg_rtx (d->vmode); + op0_alt = gen_reg_rtx (d->vmode); + emit_move_insn (op1_alt, d->op1); + emit_move_insn (op0_alt, d->op0); + + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0); + rtx conv_target = gen_rtx_SUBREG (E_V4DImode, d->target, 0); + + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1, + conv_op0, GEN_INT (0x02))); + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op0, conv_op0, + conv_op1, GEN_INT (0x01))); + remapped[0] = 0; + remapped[1] = 4; + remapped[2] = 1; + remapped[3] = 5; + + for (i = 0; i < d->nelt; i += 1) + { + rperm[i] = GEN_INT (remapped[i]); + } + + sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (4, rperm)); + sel = force_reg (E_V4DImode, sel); + emit_insn (gen_lasx_xvshuf_d (conv_target, sel, + conv_op1, conv_op0)); + } + + ok = true; + goto expand_perm_const_2_end; + } + else if (loongarch_is_lasx_lowpart_extract (d)) + { + /* Copy op0's low 128bit to target's low 128bit, and copy op1's low + 128bit to target's high 128bit. + Selector sample: E_V4DImode, { 0, 1, 4 ,5 } */ + if (!d->testing_p) + { + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0); + rtx conv_target = gen_rtx_SUBREG (E_V4DImode, d->target, 0); + + /* We can achieve the expectation by using sinple xvpermi.q insn. */ + emit_move_insn (conv_target, conv_op1); + emit_insn (gen_lasx_xvpermi_q_v4di (conv_target, conv_target, + conv_op0, GEN_INT (0x20))); + } + + ok = true; + goto expand_perm_const_2_end; + } + else if (loongarch_is_lasx_highpart_interleave (d)) + { + /* Similar to lowpart interleave, elements from op0's high 128bit and + op1's high 128bit are inserted into target regiter alternately. + Selector sample: E_V8SImode, { 4, 12, 5, 13, 6, 14, 7, 15 } */ + if (!d->testing_p) + { + /* Prepare temp op register. */ + use_alt_op = true; + op1_alt = gen_reg_rtx (d->vmode); + op0_alt = gen_reg_rtx (d->vmode); + emit_move_insn (op1_alt, d->op1); + emit_move_insn (op0_alt, d->op0); + + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0); + /* Adjust op value in temp regiter. 
+ op0 = { 0, 1, 2, 3 }, op1 = { 6, 7, 2, 3 } */ + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1, + conv_op0, GEN_INT (0x13))); + /* op0 = { 2, 3, 6, 7 }, op1 = { 6, 7, 2, 3 } */ + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op0, conv_op0, + conv_op1, GEN_INT (0x01))); + /* Remap indices in selector based on the location of index inside + selector, and vector element numbers in current vector mode. */ + + /* Filling low 128bit of new selector. */ + for (i = 0; i < d->nelt / 2; i += 1) + { + /* value in even-indexed slot of low 128bit part of selector + vector. */ + remapped[i] = i % 2 == 0 ? d->perm[i] - d->nelt / 2 : d->perm[i]; + } + /* Then filling the high 128bit. */ + for (i = d->nelt / 2; i < d->nelt; i += 1) + { + /* value in odd-indexed slot of high 128bit part of selector + vector. */ + remapped[i] = i % 2 != 0 + ? d->perm[i] - (d->nelt / 2) * 3 : d->perm[i]; + } + } + } + else if (loongarch_is_lasx_highpart_interleave_2 (d)) + { + /* Special highpart interleave case in V32QI vector mode. It does the + same thing as the normal version above. + Selector sample: E_V32QImode, + {16, 17, 18, 19, 20, 21, 22, 23, 48, 49, 50, 51, 52, 53, 54, 55, + 24, 25, 26, 27, 28, 29, 30, 31, 56, 57, 58, 59, 60, 61, 62, 63} + */ + if (!d->testing_p) + { + /* Convert op into V4DImode and do the things. */ + op1_alt = gen_reg_rtx (d->vmode); + op0_alt = gen_reg_rtx (d->vmode); + emit_move_insn (op1_alt, d->op1); + emit_move_insn (op0_alt, d->op0); + + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0); + rtx conv_target = gen_rtx_SUBREG (E_V4DImode, d->target, 0); + + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1, + conv_op0, GEN_INT (0x13))); + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op0, conv_op0, + conv_op1, GEN_INT (0x01))); + remapped[0] = 2; + remapped[1] = 6; + remapped[2] = 3; + remapped[3] = 7; + + for (i = 0; i < d->nelt; i += 1) + { + rperm[i] = GEN_INT (remapped[i]); + } + + sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (4, rperm)); + sel = force_reg (E_V4DImode, sel); + emit_insn (gen_lasx_xvshuf_d (conv_target, sel, + conv_op1, conv_op0)); + } + + ok = true; + goto expand_perm_const_2_end; + } + else if (loongarch_is_elem_duplicate (d)) + { + /* Brocast single element (from op0 or op1) to all slot of target + register. + Selector sample:E_V8SImode, { 2, 2, 2, 2, 2, 2, 2, 2 } */ + if (!d->testing_p) + { + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0); + rtx temp_reg = gen_reg_rtx (d->vmode); + rtx conv_temp = gen_rtx_SUBREG (E_V4DImode, temp_reg, 0); + + emit_move_insn (temp_reg, d->op0); + + idx = d->perm[0]; + /* We will use xvrepl128vei.* insn to achieve the result, but we need + to make the high/low 128bit has the same contents that contain the + value that we need to broardcast, because xvrepl128vei does the + broardcast job from every 128bit of source register to + corresponded part of target register! (A deep sigh.) 
*/ + if (/*idx >= 0 &&*/ idx < d->nelt / 2) + { + emit_insn (gen_lasx_xvpermi_q_v4di (conv_temp, conv_temp, + conv_op0, GEN_INT (0x0))); + } + else if (idx >= d->nelt / 2 && idx < d->nelt) + { + emit_insn (gen_lasx_xvpermi_q_v4di (conv_temp, conv_temp, + conv_op0, GEN_INT (0x11))); + idx -= d->nelt / 2; + } + else if (idx >= d->nelt && idx < (d->nelt + d->nelt / 2)) + { + emit_insn (gen_lasx_xvpermi_q_v4di (conv_temp, conv_temp, + conv_op1, GEN_INT (0x0))); + } + else if (idx >= (d->nelt + d->nelt / 2) && idx < d->nelt * 2) + { + emit_insn (gen_lasx_xvpermi_q_v4di (conv_temp, conv_temp, + conv_op1, GEN_INT (0x11))); + idx -= d->nelt / 2; + } + + /* Then we can finally generate this insn. */ + switch (d->vmode) + { + case E_V4DImode: + emit_insn (gen_lasx_xvrepl128vei_d (d->target, temp_reg, + GEN_INT (idx))); + break; + case E_V4DFmode: + emit_insn (gen_lasx_xvrepl128vei_d_f (d->target, temp_reg, + GEN_INT (idx))); + break; + case E_V8SImode: + emit_insn (gen_lasx_xvrepl128vei_w (d->target, temp_reg, + GEN_INT (idx))); + break; + case E_V8SFmode: + emit_insn (gen_lasx_xvrepl128vei_w_f (d->target, temp_reg, + GEN_INT (idx))); + break; + case E_V16HImode: + emit_insn (gen_lasx_xvrepl128vei_h (d->target, temp_reg, + GEN_INT (idx))); + break; + case E_V32QImode: + emit_insn (gen_lasx_xvrepl128vei_b (d->target, temp_reg, + GEN_INT (idx))); + break; + default: + gcc_unreachable (); + break; + } + + /* finish func directly. */ + ok = true; + goto expand_perm_const_2_end; + } + } + else if (loongarch_is_op_reverse_perm (d)) + { + /* reverse high 128bit and low 128bit in op0. + Selector sample: E_V4DFmode, { 2, 3, 0, 1 } + Use xvpermi.q for doing this job. */ + if (!d->testing_p) + { + if (d->vmode == E_V4DImode) + { + emit_insn (gen_lasx_xvpermi_q_v4di (d->target, d->target, d->op0, + GEN_INT (0x01))); + } + else if (d->vmode == E_V4DFmode) + { + emit_insn (gen_lasx_xvpermi_q_v4df (d->target, d->target, d->op0, + GEN_INT (0x01))); + } + else + { + gcc_unreachable (); + } + } + + ok = true; + goto expand_perm_const_2_end; + } + else if (loongarch_is_single_op_perm (d)) + { + /* Permutation that only select elements from op0. */ + if (!d->testing_p) + { + /* Prepare temp register instead of modify original op. */ + use_alt_op = true; + op0_alt = gen_reg_rtx (d->vmode); + op1_alt = gen_reg_rtx (d->vmode); + + emit_move_insn (op0_alt, d->op0); + emit_move_insn (op1_alt, d->op1); + + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0); + rtx conv_op0a = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0); + rtx conv_op1a = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0); + + /* Duplicate op0's low 128bit in op0, then duplicate high 128bit + in op1. After this, xvshuf.* insn's selector argument can + access all elements we need for correct permutation result. */ + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op0a, conv_op0a, conv_op0, + GEN_INT (0x00))); + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1a, conv_op1a, conv_op0, + GEN_INT (0x11))); + + /* In this case, there's no need to remap selector's indices. */ + for (i = 0; i < d->nelt; i += 1) + { + remapped[i] = d->perm[i]; + } + } + } + else if (loongarch_is_divisible_perm (d)) + { + /* Divisible perm: + Low 128bit of selector only selects elements of op0, + and high 128bit of selector only selects elements of op1. */ + + if (!d->testing_p) + { + /* Prepare temp register instead of modify original op. 
*/ + use_alt_op = true; + op0_alt = gen_reg_rtx (d->vmode); + op1_alt = gen_reg_rtx (d->vmode); + + emit_move_insn (op0_alt, d->op0); + emit_move_insn (op1_alt, d->op1); + + rtx conv_op0a = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0); + rtx conv_op1a = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0); + rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0); + rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0); + + /* Reorganize op0's hi/lo 128bit and op1's hi/lo 128bit, to make sure + that selector's low 128bit can access all op0's elements, and + selector's high 128bit can access all op1's elements. */ + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op0a, conv_op0a, conv_op1, + GEN_INT (0x02))); + emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1a, conv_op1a, conv_op0, + GEN_INT (0x31))); + + /* No need to modify indices. */ + for (i = 0; i < d->nelt;i += 1) + { + remapped[i] = d->perm[i]; + } + } + } + else if (loongarch_is_triple_stride_extract (d)) + { + /* Selector sample: E_V4DFmode, { 1, 4, 7, 0 }. */ + if (!d->testing_p) + { + /* Resolve it with brute force modification. */ + remapped[0] = 1; + remapped[1] = 2; + remapped[2] = 3; + remapped[3] = 0; + } + } + else + { + /* When all of the detections above are failed, we will try last + strategy. + The for loop tries to detect following rules based on indices' value, + its position inside of selector vector ,and strange behavior of + xvshuf.* insn; Then we take corresponding action. (Replace with new + value, or give up whole permutation expansion.) */ + for (i = 0; i < d->nelt; i += 1) + { + /* % (2 * d->nelt) */ + idx = d->perm[i]; + + /* if index is located in low 128bit of selector vector. */ + if (i < d->nelt / 2) + { + /* Fail case 1: index tries to reach element that located in op0's + high 128bit. */ + if (idx >= d->nelt / 2 && idx < d->nelt) + { + goto expand_perm_const_2_end; + } + /* Fail case 2: index tries to reach element that located in + op1's high 128bit. */ + if (idx >= (d->nelt + d->nelt / 2)) + { + goto expand_perm_const_2_end; + } + + /* Success case: index tries to reach elements that located in + op1's low 128bit. Apply - (nelt / 2) offset to original + value. */ + if (idx >= d->nelt && idx < (d->nelt + d->nelt / 2)) + { + idx -= d->nelt / 2; + } + } + /* if index is located in high 128bit of selector vector. */ + else + { + /* Fail case 1: index tries to reach element that located in + op1's low 128bit. */ + if (idx >= d->nelt && idx < (d->nelt + d->nelt / 2)) + { + goto expand_perm_const_2_end; + } + /* Fail case 2: index tries to reach element that located in + op0's low 128bit. */ + if (idx < (d->nelt / 2)) + { + goto expand_perm_const_2_end; + } + /* Success case: index tries to reach element that located in + op0's high 128bit. */ + if (idx >= d->nelt / 2 && idx < d->nelt) + { + idx -= d->nelt / 2; + } + } + /* No need to process other case that we did not mentioned. */ + + /* Assign with original or processed value. */ + remapped[i] = idx; + } + } + + ok = true; + /* If testing_p is true, compiler is trying to figure out that backend can + handle this permutation, but doesn't want to generate actual insn. So + if true, exit directly. */ + if (d->testing_p) + { + goto expand_perm_const_2_end; + } + + /* Convert remapped selector array to RTL array. */ + for (i = 0; i < d->nelt; i += 1) + { + rperm[i] = GEN_INT (remapped[i]); + } + + /* Copy selector vector from memory to vector regiter for later insn gen + function. 
+ If the vector's elements are floating-point values, we cannot fit the selector + argument into the insn gen function directly, because of the insn template + definition. As a solution, generate an integral mode subreg of the target, + then copy the selector vector (that is in integral mode) to this subreg. */ + switch (d->vmode) + { + case E_V4DFmode: + sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (d->nelt, rperm)); + tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0); + emit_move_insn (tmp, sel); + break; + case E_V8SFmode: + sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d->nelt, rperm)); + tmp = gen_rtx_SUBREG (E_V8SImode, d->target, 0); + emit_move_insn (tmp, sel); + break; + default: + sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm)); + emit_move_insn (d->target, sel); + break; + } + + target = d->target; + /* If temp op registers are requested in a previous if branch, then use the temp + registers instead of the original ones. */ + if (use_alt_op) + { + op0 = op0_alt != NULL_RTX ? op0_alt : d->op0; + op1 = op1_alt != NULL_RTX ? op1_alt : d->op1; + } + else + { + op0 = d->op0; + op1 = d->one_vector_p ? d->op0 : d->op1; + } + + /* We can FINALLY generate the xvshuf.* insn. */ + switch (d->vmode) + { + case E_V4DFmode: + emit_insn (gen_lasx_xvshuf_d_f (target, target, op1, op0)); + break; + case E_V4DImode: + emit_insn (gen_lasx_xvshuf_d (target, target, op1, op0)); + break; + case E_V8SFmode: + emit_insn (gen_lasx_xvshuf_w_f (target, target, op1, op0)); + break; + case E_V8SImode: + emit_insn (gen_lasx_xvshuf_w (target, target, op1, op0)); + break; + case E_V16HImode: + emit_insn (gen_lasx_xvshuf_h (target, target, op1, op0)); + break; + case E_V32QImode: + emit_insn (gen_lasx_xvshuf_b (target, op1, op0, target)); + break; + default: + gcc_unreachable (); + break; + } + + /* Extra insn for swapping the hi/lo 128bit of target vector register. */ + if (reverse_hi_lo) + { + switch (d->vmode) + { + case E_V4DFmode: + emit_insn (gen_lasx_xvpermi_q_v4df (d->target, d->target, + d->target, GEN_INT (0x1))); + break; + case E_V4DImode: + emit_insn (gen_lasx_xvpermi_q_v4di (d->target, d->target, + d->target, GEN_INT (0x1))); + break; + case E_V8SFmode: + emit_insn (gen_lasx_xvpermi_q_v8sf (d->target, d->target, + d->target, GEN_INT (0x1))); + break; + case E_V8SImode: + emit_insn (gen_lasx_xvpermi_q_v8si (d->target, d->target, + d->target, GEN_INT (0x1))); + break; + case E_V16HImode: + emit_insn (gen_lasx_xvpermi_q_v16hi (d->target, d->target, + d->target, GEN_INT (0x1))); + break; + case E_V32QImode: + emit_insn (gen_lasx_xvpermi_q_v32qi (d->target, d->target, + d->target, GEN_INT (0x1))); + break; + default: + break; + } + } + /* Extra insn required by odd/even extraction. Swapping the second and third + 64bit in the target vector register. */ + else if (extract_ev_od) + { + rtx converted = gen_rtx_SUBREG (E_V4DImode, d->target, 0); + emit_insn (gen_lasx_xvpermi_d_v4di (converted, converted, + GEN_INT (0xD8))); + } + +expand_perm_const_2_end: + return ok; } /* Implement TARGET_VECTORIZE_VEC_PERM_CONST. */ @@ -7799,7 +9643,7 @@ loongarch_sched_reassociation_width (unsigned int opc, machine_mode mode) case CPU_LOONGARCH64: case CPU_LA464: /* Vector part. */ - if (LSX_SUPPORTED_MODE_P (mode)) + if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)) { /* Integer vector instructions execute in FP unit. The width of integer/float-point vector instructions is 3.
*/ @@ -7839,6 +9683,44 @@ loongarch_expand_vector_extract (rtx target, rtx vec, int elt) case E_V16QImode: break; + case E_V32QImode: + if (TARGET_LASX) + { + if (elt >= 16) + { + tmp = gen_reg_rtx (V32QImode); + emit_insn (gen_lasx_xvpermi_d_v32qi (tmp, vec, GEN_INT (0xe))); + loongarch_expand_vector_extract (target, + gen_lowpart (V16QImode, tmp), + elt & 15); + } + else + loongarch_expand_vector_extract (target, + gen_lowpart (V16QImode, vec), + elt & 15); + return; + } + break; + + case E_V16HImode: + if (TARGET_LASX) + { + if (elt >= 8) + { + tmp = gen_reg_rtx (V16HImode); + emit_insn (gen_lasx_xvpermi_d_v16hi (tmp, vec, GEN_INT (0xe))); + loongarch_expand_vector_extract (target, + gen_lowpart (V8HImode, tmp), + elt & 7); + } + else + loongarch_expand_vector_extract (target, + gen_lowpart (V8HImode, vec), + elt & 7); + return; + } + break; + default: break; } @@ -7877,6 +9759,31 @@ emit_reduc_half (rtx dest, rtx src, int i) case E_V2DFmode: tem = gen_lsx_vbsrl_d_f (dest, src, GEN_INT (8)); break; + case E_V8SFmode: + if (i == 256) + tem = gen_lasx_xvpermi_d_v8sf (dest, src, GEN_INT (0xe)); + else + tem = gen_lasx_xvshuf4i_w_f (dest, src, + GEN_INT (i == 128 ? 2 + (3 << 2) : 1)); + break; + case E_V4DFmode: + if (i == 256) + tem = gen_lasx_xvpermi_d_v4df (dest, src, GEN_INT (0xe)); + else + tem = gen_lasx_xvpermi_d_v4df (dest, src, const1_rtx); + break; + case E_V32QImode: + case E_V16HImode: + case E_V8SImode: + case E_V4DImode: + d = gen_reg_rtx (V4DImode); + if (i == 256) + tem = gen_lasx_xvpermi_d_v4di (d, gen_lowpart (V4DImode, src), + GEN_INT (0xe)); + else + tem = gen_lasx_xvbsrl_d (d, gen_lowpart (V4DImode, src), + GEN_INT (i/16)); + break; case E_V16QImode: case E_V8HImode: case E_V4SImode: @@ -7924,10 +9831,57 @@ loongarch_expand_vec_unpack (rtx operands[2], bool unsigned_p, bool high_p) { machine_mode imode = GET_MODE (operands[1]); rtx (*unpack) (rtx, rtx, rtx); + rtx (*extend) (rtx, rtx); rtx (*cmpFunc) (rtx, rtx, rtx); + rtx (*swap_hi_lo) (rtx, rtx, rtx, rtx); rtx tmp, dest; - if (ISA_HAS_LSX) + if (ISA_HAS_LASX && GET_MODE_SIZE (imode) == 32) + { + switch (imode) + { + case E_V8SImode: + if (unsigned_p) + extend = gen_lasx_vext2xv_du_wu; + else + extend = gen_lasx_vext2xv_d_w; + swap_hi_lo = gen_lasx_xvpermi_q_v8si; + break; + + case E_V16HImode: + if (unsigned_p) + extend = gen_lasx_vext2xv_wu_hu; + else + extend = gen_lasx_vext2xv_w_h; + swap_hi_lo = gen_lasx_xvpermi_q_v16hi; + break; + + case E_V32QImode: + if (unsigned_p) + extend = gen_lasx_vext2xv_hu_bu; + else + extend = gen_lasx_vext2xv_h_b; + swap_hi_lo = gen_lasx_xvpermi_q_v32qi; + break; + + default: + gcc_unreachable (); + break; + } + + if (high_p) + { + tmp = gen_reg_rtx (imode); + emit_insn (swap_hi_lo (tmp, tmp, operands[1], const1_rtx)); + emit_insn (extend (operands[0], tmp)); + return; + } + + emit_insn (extend (operands[0], operands[1])); + return; + + } + else if (ISA_HAS_LSX) { switch (imode) { @@ -8028,8 +9982,17 @@ loongarch_gen_const_int_vector_shuffle (machine_mode mode, int val) return gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (nunits, elts)); } + /* Expand a vector initialization. 
*/ +void +loongarch_expand_vector_group_init (rtx target, rtx vals) +{ + rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) }; + emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (E_V32QImode, ops[0], + ops[1]))); +} + void loongarch_expand_vector_init (rtx target, rtx vals) { @@ -8049,6 +10012,285 @@ loongarch_expand_vector_init (rtx target, rtx vals) all_same = false; } + if (ISA_HAS_LASX && GET_MODE_SIZE (vmode) == 32) + { + if (all_same) + { + rtx same = XVECEXP (vals, 0, 0); + rtx temp, temp2; + + if (CONST_INT_P (same) && nvar == 0 + && loongarch_signed_immediate_p (INTVAL (same), 10, 0)) + { + switch (vmode) + { + case E_V32QImode: + case E_V16HImode: + case E_V8SImode: + case E_V4DImode: + temp = gen_rtx_CONST_VECTOR (vmode, XVEC (vals, 0)); + emit_move_insn (target, temp); + return; + + default: + gcc_unreachable (); + } + } + + temp = gen_reg_rtx (imode); + if (imode == GET_MODE (same)) + temp2 = same; + else if (GET_MODE_SIZE (imode) >= UNITS_PER_WORD) + { + if (GET_CODE (same) == MEM) + { + rtx reg_tmp = gen_reg_rtx (GET_MODE (same)); + loongarch_emit_move (reg_tmp, same); + temp2 = simplify_gen_subreg (imode, reg_tmp, + GET_MODE (reg_tmp), 0); + } + else + temp2 = simplify_gen_subreg (imode, same, + GET_MODE (same), 0); + } + else + { + if (GET_CODE (same) == MEM) + { + rtx reg_tmp = gen_reg_rtx (GET_MODE (same)); + loongarch_emit_move (reg_tmp, same); + temp2 = lowpart_subreg (imode, reg_tmp, + GET_MODE (reg_tmp)); + } + else + temp2 = lowpart_subreg (imode, same, GET_MODE (same)); + } + emit_move_insn (temp, temp2); + + switch (vmode) + { + case E_V32QImode: + case E_V16HImode: + case E_V8SImode: + case E_V4DImode: + loongarch_emit_move (target, + gen_rtx_VEC_DUPLICATE (vmode, temp)); + break; + + case E_V8SFmode: + emit_insn (gen_lasx_xvreplve0_w_f_scalar (target, temp)); + break; + + case E_V4DFmode: + emit_insn (gen_lasx_xvreplve0_d_f_scalar (target, temp)); + break; + + default: + gcc_unreachable (); + } + } + else + { + rtvec vec = shallow_copy_rtvec (XVEC (vals, 0)); + + for (i = 0; i < nelt; ++i) + RTVEC_ELT (vec, i) = CONST0_RTX (imode); + + emit_move_insn (target, gen_rtx_CONST_VECTOR (vmode, vec)); + + machine_mode half_mode = VOIDmode; + rtx target_hi, target_lo; + + switch (vmode) + { + case E_V32QImode: + half_mode=E_V16QImode; + target_hi = gen_reg_rtx (half_mode); + target_lo = gen_reg_rtx (half_mode); + for (i = 0; i < nelt/2; ++i) + { + rtx temp_hi = gen_reg_rtx (imode); + rtx temp_lo = gen_reg_rtx (imode); + emit_move_insn (temp_hi, XVECEXP (vals, 0, i+nelt/2)); + emit_move_insn (temp_lo, XVECEXP (vals, 0, i)); + if (i == 0) + { + emit_insn (gen_lsx_vreplvei_b_scalar (target_hi, + temp_hi)); + emit_insn (gen_lsx_vreplvei_b_scalar (target_lo, + temp_lo)); + } + else + { + emit_insn (gen_vec_setv16qi (target_hi, temp_hi, + GEN_INT (i))); + emit_insn (gen_vec_setv16qi (target_lo, temp_lo, + GEN_INT (i))); + } + } + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (vmode, target_hi, + target_lo))); + break; + + case E_V16HImode: + half_mode=E_V8HImode; + target_hi = gen_reg_rtx (half_mode); + target_lo = gen_reg_rtx (half_mode); + for (i = 0; i < nelt/2; ++i) + { + rtx temp_hi = gen_reg_rtx (imode); + rtx temp_lo = gen_reg_rtx (imode); + emit_move_insn (temp_hi, XVECEXP (vals, 0, i+nelt/2)); + emit_move_insn (temp_lo, XVECEXP (vals, 0, i)); + if (i == 0) + { + emit_insn (gen_lsx_vreplvei_h_scalar (target_hi, + temp_hi)); + emit_insn (gen_lsx_vreplvei_h_scalar (target_lo, + temp_lo)); + } + else + { + emit_insn (gen_vec_setv8hi (target_hi, temp_hi, + 
GEN_INT (i))); + emit_insn (gen_vec_setv8hi (target_lo, temp_lo, + GEN_INT (i))); + } + } + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (vmode, target_hi, + target_lo))); + break; + + case E_V8SImode: + half_mode=V4SImode; + target_hi = gen_reg_rtx (half_mode); + target_lo = gen_reg_rtx (half_mode); + for (i = 0; i < nelt/2; ++i) + { + rtx temp_hi = gen_reg_rtx (imode); + rtx temp_lo = gen_reg_rtx (imode); + emit_move_insn (temp_hi, XVECEXP (vals, 0, i+nelt/2)); + emit_move_insn (temp_lo, XVECEXP (vals, 0, i)); + if (i == 0) + { + emit_insn (gen_lsx_vreplvei_w_scalar (target_hi, + temp_hi)); + emit_insn (gen_lsx_vreplvei_w_scalar (target_lo, + temp_lo)); + } + else + { + emit_insn (gen_vec_setv4si (target_hi, temp_hi, + GEN_INT (i))); + emit_insn (gen_vec_setv4si (target_lo, temp_lo, + GEN_INT (i))); + } + } + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (vmode, target_hi, + target_lo))); + break; + + case E_V4DImode: + half_mode=E_V2DImode; + target_hi = gen_reg_rtx (half_mode); + target_lo = gen_reg_rtx (half_mode); + for (i = 0; i < nelt/2; ++i) + { + rtx temp_hi = gen_reg_rtx (imode); + rtx temp_lo = gen_reg_rtx (imode); + emit_move_insn (temp_hi, XVECEXP (vals, 0, i+nelt/2)); + emit_move_insn (temp_lo, XVECEXP (vals, 0, i)); + if (i == 0) + { + emit_insn (gen_lsx_vreplvei_d_scalar (target_hi, + temp_hi)); + emit_insn (gen_lsx_vreplvei_d_scalar (target_lo, + temp_lo)); + } + else + { + emit_insn (gen_vec_setv2di (target_hi, temp_hi, + GEN_INT (i))); + emit_insn (gen_vec_setv2di (target_lo, temp_lo, + GEN_INT (i))); + } + } + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (vmode, target_hi, + target_lo))); + break; + + case E_V8SFmode: + half_mode=E_V4SFmode; + target_hi = gen_reg_rtx (half_mode); + target_lo = gen_reg_rtx (half_mode); + for (i = 0; i < nelt/2; ++i) + { + rtx temp_hi = gen_reg_rtx (imode); + rtx temp_lo = gen_reg_rtx (imode); + emit_move_insn (temp_hi, XVECEXP (vals, 0, i+nelt/2)); + emit_move_insn (temp_lo, XVECEXP (vals, 0, i)); + if (i == 0) + { + emit_insn (gen_lsx_vreplvei_w_f_scalar (target_hi, + temp_hi)); + emit_insn (gen_lsx_vreplvei_w_f_scalar (target_lo, + temp_lo)); + } + else + { + emit_insn (gen_vec_setv4sf (target_hi, temp_hi, + GEN_INT (i))); + emit_insn (gen_vec_setv4sf (target_lo, temp_lo, + GEN_INT (i))); + } + } + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (vmode, target_hi, + target_lo))); + break; + + case E_V4DFmode: + half_mode=E_V2DFmode; + target_hi = gen_reg_rtx (half_mode); + target_lo = gen_reg_rtx (half_mode); + for (i = 0; i < nelt/2; ++i) + { + rtx temp_hi = gen_reg_rtx (imode); + rtx temp_lo = gen_reg_rtx (imode); + emit_move_insn (temp_hi, XVECEXP (vals, 0, i+nelt/2)); + emit_move_insn (temp_lo, XVECEXP (vals, 0, i)); + if (i == 0) + { + emit_insn (gen_lsx_vreplvei_d_f_scalar (target_hi, + temp_hi)); + emit_insn (gen_lsx_vreplvei_d_f_scalar (target_lo, + temp_lo)); + } + else + { + emit_insn (gen_vec_setv2df (target_hi, temp_hi, + GEN_INT (i))); + emit_insn (gen_vec_setv2df (target_lo, temp_lo, + GEN_INT (i))); + } + } + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (vmode, target_hi, + target_lo))); + break; + + default: + gcc_unreachable (); + } + + } + return; + } + if (ISA_HAS_LSX) { if (all_same) @@ -8296,6 +10538,38 @@ loongarch_expand_lsx_cmp (rtx dest, enum rtx_code cond, rtx op0, rtx op1) } break; + case E_V8SFmode: + case E_V4DFmode: + switch (cond) + { + case UNORDERED: + case ORDERED: + case EQ: + case NE: + case UNEQ: + case UNLE: + case UNLT: + break; + case LTGT: cond = NE; break; + 
case UNGE: cond = UNLE; std::swap (op0, op1); break; + case UNGT: cond = UNLT; std::swap (op0, op1); break; + case LE: unspec = UNSPEC_LASX_XVFCMP_SLE; break; + case LT: unspec = UNSPEC_LASX_XVFCMP_SLT; break; + case GE: unspec = UNSPEC_LASX_XVFCMP_SLE; std::swap (op0, op1); break; + case GT: unspec = UNSPEC_LASX_XVFCMP_SLT; std::swap (op0, op1); break; + default: + gcc_unreachable (); + } + if (unspec < 0) + loongarch_emit_binary (cond, dest, op0, op1); + else + { + rtx x = gen_rtx_UNSPEC (GET_MODE (dest), + gen_rtvec (2, op0, op1), unspec); + emit_insn (gen_rtx_SET (dest, x)); + } + break; + default: gcc_unreachable (); break; @@ -8633,7 +10907,7 @@ loongarch_builtin_support_vector_misalignment (machine_mode mode, int misalignment, bool is_packed) { - if (ISA_HAS_LSX && STRICT_ALIGNMENT) + if ((ISA_HAS_LSX || ISA_HAS_LASX) && STRICT_ALIGNMENT) { if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing) return false; diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index e939dd826d1..39852d2bb12 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -186,6 +186,11 @@ along with GCC; see the file COPYING3. If not see /* Width of a LSX vector register in bits. */ #define BITS_PER_LSX_REG (UNITS_PER_LSX_REG * BITS_PER_UNIT) +/* Width of a LASX vector register in bytes. */ +#define UNITS_PER_LASX_REG 32 +/* Width of a LASX vector register in bits. */ +#define BITS_PER_LASX_REG (UNITS_PER_LASX_REG * BITS_PER_UNIT) + /* For LARCH, width of a floating point register. */ #define UNITS_PER_FPREG (TARGET_DOUBLE_FLOAT ? 8 : 4) @@ -248,10 +253,11 @@ along with GCC; see the file COPYING3. If not see #define STRUCTURE_SIZE_BOUNDARY 8 /* There is no point aligning anything to a rounder boundary than - LONG_DOUBLE_TYPE_SIZE, unless under LSX the bigggest alignment is - BITS_PER_LSX_REG/.. */ + LONG_DOUBLE_TYPE_SIZE, unless under LSX/LASX the bigggest alignment is + BITS_PER_LSX_REG/BITS_PER_LASX_REG/.. */ #define BIGGEST_ALIGNMENT \ - (ISA_HAS_LSX ? BITS_PER_LSX_REG : LONG_DOUBLE_TYPE_SIZE) + (ISA_HAS_LASX? BITS_PER_LASX_REG \ + : (ISA_HAS_LSX ? BITS_PER_LSX_REG : LONG_DOUBLE_TYPE_SIZE)) /* All accesses must be aligned. */ #define STRICT_ALIGNMENT (TARGET_STRICT_ALIGN) @@ -391,6 +397,10 @@ along with GCC; see the file COPYING3. If not see #define LSX_REG_LAST FP_REG_LAST #define LSX_REG_NUM FP_REG_NUM +#define LASX_REG_FIRST FP_REG_FIRST +#define LASX_REG_LAST FP_REG_LAST +#define LASX_REG_NUM FP_REG_NUM + /* The DWARF 2 CFA column which tracks the return address from a signal handler context. This means that to maintain backwards compatibility, no hard register can be assigned this column if it @@ -409,9 +419,12 @@ along with GCC; see the file COPYING3. If not see ((unsigned int) ((int) (REGNO) - FCC_REG_FIRST) < FCC_REG_NUM) #define LSX_REG_P(REGNO) \ ((unsigned int) ((int) (REGNO) - LSX_REG_FIRST) < LSX_REG_NUM) +#define LASX_REG_P(REGNO) \ + ((unsigned int) ((int) (REGNO) - LASX_REG_FIRST) < LASX_REG_NUM) #define FP_REG_RTX_P(X) (REG_P (X) && FP_REG_P (REGNO (X))) #define LSX_REG_RTX_P(X) (REG_P (X) && LSX_REG_P (REGNO (X))) +#define LASX_REG_RTX_P(X) (REG_P (X) && LASX_REG_P (REGNO (X))) /* Select a register mode required for caller save of hard regno REGNO. 
*/ #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \ @@ -733,6 +746,13 @@ enum reg_class && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT \ || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)) +#define LASX_SUPPORTED_MODE_P(MODE) \ + (ISA_HAS_LASX \ + && (GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG \ + ||GET_MODE_SIZE (MODE) == UNITS_PER_LASX_REG) \ + && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT \ + || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)) + /* 1 if N is a possible register number for function argument passing. We have no FP argument registers when soft-float. */ @@ -985,7 +1005,39 @@ typedef struct { { "vr28", 28 + FP_REG_FIRST }, \ { "vr29", 29 + FP_REG_FIRST }, \ { "vr30", 30 + FP_REG_FIRST }, \ - { "vr31", 31 + FP_REG_FIRST } \ + { "vr31", 31 + FP_REG_FIRST }, \ + { "xr0", 0 + FP_REG_FIRST }, \ + { "xr1", 1 + FP_REG_FIRST }, \ + { "xr2", 2 + FP_REG_FIRST }, \ + { "xr3", 3 + FP_REG_FIRST }, \ + { "xr4", 4 + FP_REG_FIRST }, \ + { "xr5", 5 + FP_REG_FIRST }, \ + { "xr6", 6 + FP_REG_FIRST }, \ + { "xr7", 7 + FP_REG_FIRST }, \ + { "xr8", 8 + FP_REG_FIRST }, \ + { "xr9", 9 + FP_REG_FIRST }, \ + { "xr10", 10 + FP_REG_FIRST }, \ + { "xr11", 11 + FP_REG_FIRST }, \ + { "xr12", 12 + FP_REG_FIRST }, \ + { "xr13", 13 + FP_REG_FIRST }, \ + { "xr14", 14 + FP_REG_FIRST }, \ + { "xr15", 15 + FP_REG_FIRST }, \ + { "xr16", 16 + FP_REG_FIRST }, \ + { "xr17", 17 + FP_REG_FIRST }, \ + { "xr18", 18 + FP_REG_FIRST }, \ + { "xr19", 19 + FP_REG_FIRST }, \ + { "xr20", 20 + FP_REG_FIRST }, \ + { "xr21", 21 + FP_REG_FIRST }, \ + { "xr22", 22 + FP_REG_FIRST }, \ + { "xr23", 23 + FP_REG_FIRST }, \ + { "xr24", 24 + FP_REG_FIRST }, \ + { "xr25", 25 + FP_REG_FIRST }, \ + { "xr26", 26 + FP_REG_FIRST }, \ + { "xr27", 27 + FP_REG_FIRST }, \ + { "xr28", 28 + FP_REG_FIRST }, \ + { "xr29", 29 + FP_REG_FIRST }, \ + { "xr30", 30 + FP_REG_FIRST }, \ + { "xr31", 31 + FP_REG_FIRST } \ } /* Globalizing directive for a label. */ diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 7b8978e2533..30b2cb91e9a 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -163,7 +163,7 @@ (define_attr "alu_type" "unknown,add,sub,not,nor,and,or,xor,simd_add" ;; Main data type used by the insn (define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,SF,DF,TF,FCC, - V2DI,V4SI,V8HI,V16QI,V2DF,V4SF" + V2DI,V4SI,V8HI,V16QI,V2DF,V4SF,V4DI,V8SI,V16HI,V32QI,V4DF,V8SF" (const_string "unknown")) ;; True if the main data type is twice the size of a word. @@ -422,12 +422,14 @@ (define_mode_attr ifmt [(SI "w") (DI "l")]) ;; floating-point mode or vector mode. (define_mode_attr UNITMODE [(SF "SF") (DF "DF") (V2SF "SF") (V4SF "SF") (V16QI "QI") (V8HI "HI") (V4SI "SI") (V2DI "DI") - (V2DF "DF")]) + (V2DF "DF")(V8SF "SF")(V32QI "QI")(V16HI "HI")(V8SI "SI")(V4DI "DI")(V4DF "DF")]) ;; As above, but in lower case. (define_mode_attr unitmode [(SF "sf") (DF "df") (V2SF "sf") (V4SF "sf") (V16QI "qi") (V8QI "qi") (V8HI "hi") (V4HI "hi") - (V4SI "si") (V2SI "si") (V2DI "di") (V2DF "df")]) + (V4SI "si") (V2SI "si") (V2DI "di") (V2DF "df") + (V8SI "si") (V4DI "di") (V32QI "qi") (V16HI "hi") + (V8SF "sf") (V4DF "df")]) ;; This attribute gives the integer mode that has half the size of ;; the controlling mode. 
@@ -711,16 +713,17 @@ (define_insn "sub3" [(set_attr "alu_type" "sub") (set_attr "mode" "")]) + (define_insn "*subsi3_extended" - [(set (match_operand:DI 0 "register_operand" "= r") + [(set (match_operand:DI 0 "register_operand" "=r") (sign_extend:DI - (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ") - (match_operand:SI 2 "register_operand" " r"))))] + (minus:SI (match_operand:SI 1 "reg_or_0_operand" "rJ") + (match_operand:SI 2 "register_operand" "r"))))] "TARGET_64BIT" "sub.w\t%0,%z1,%2" [(set_attr "type" "arith") (set_attr "mode" "SI")]) - + ;; ;; .................... ;; @@ -3634,6 +3637,9 @@ (define_insn "loongarch_crcc_w__w" ; The LoongArch SX Instructions. (include "lsx.md") +; The LoongArch ASX Instructions. +(include "lasx.md") + (define_c_enum "unspec" [ UNSPEC_ADDRESS_FIRST ]) diff --git a/gcc/testsuite/g++.dg/torture/vshuf-v16hi.C b/gcc/testsuite/g++.dg/torture/vshuf-v16hi.C index 6277068b859..252de0ab97a 100644 --- a/gcc/testsuite/g++.dg/torture/vshuf-v16hi.C +++ b/gcc/testsuite/g++.dg/torture/vshuf-v16hi.C @@ -1,5 +1,6 @@ // { dg-options "-std=c++11" } // { dg-do run } +// { dg-skip-if "LoongArch vshuf/xvshuf insn result is undefined when 6 or 7 bit of vector's element is set." { loongarch*-*-* } } typedef unsigned short V __attribute__((vector_size(32))); typedef V VI; From patchwork Thu Aug 24 03:13:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenghui Pan X-Patchwork-Id: 1825110 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RWSs60M52z1yNm for ; Thu, 24 Aug 2023 13:15:54 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F3CE43853D00 for ; Thu, 24 Aug 2023 03:15:51 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id F258D3858296 for ; Thu, 24 Aug 2023 03:14:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F258D3858296 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.20.4.45]) by gateway (Coremail) with SMTP id _____8Dxfev7yuZk_GEbAA--.53702S3; Thu, 24 Aug 2023 11:14:03 +0800 (CST) Received: from loongson-pc.loongson.cn (unknown [10.20.4.45]) by localhost.localdomain (Coremail) with SMTP id AQAAf8DxviPdyuZkzvJhAA--.583S10; Thu, 24 Aug 2023 11:14:02 +0800 (CST) From: Chenghui Pan To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn Subject: [PATCH v5 6/6] LoongArch: Add Loongson ASX directive builtin function support. 
Date: Thu, 24 Aug 2023 11:13:16 +0800 Message-Id: <20230824031316.16599-7-panchenghui@loongson.cn> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230824031316.16599-1-panchenghui@loongson.cn> References: <20230824031316.16599-1-panchenghui@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8DxviPdyuZkzvJhAA--.583S10 X-CM-SenderInfo: psdquxxhqjx33l6o00pqjv00gofq/1tbiAQANBGTlhzMLuAAFsL X-Coremail-Antispam: 1Uk129KBj9kXoW8JFWkAry7AFyUtFyDCw18Aw18p5X_Xw13ur WrpFy8CF15Cay7ZF9rCFW8JFWavr4a9r4fC3W7XryY9wnIkayYvasFqF1vyr45AFnxCrWj yw17Xa1YqF95KF17A3gCm3ZEXasCq-sJn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7KY7 ZEXasCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29K BjDU0xBIdaVrnRJUUUk2b4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26c xKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1Y6r17M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vE j48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Xr0_Ar1l84ACjcxK6xIIjxv20xvEc7CjxV AFwI0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x02 67AKxVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6x ACxx1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E 87Iv67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82 IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC2 0s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMI IF0xvE2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4j6F4UMIIF 0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87 Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjxUcCD7UUUUU X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" From: Lulu Cheng gcc/ChangeLog: * config.gcc: Export the header file lasxintrin.h. * config/loongarch/loongarch-builtins.cc (enum loongarch_builtin_type): Add Loongson ASX builtin functions support. (AVAIL_ALL): Ditto. (LASX_BUILTIN): Ditto. (LASX_NO_TARGET_BUILTIN): Ditto. (LASX_BUILTIN_TEST_BRANCH): Ditto. (CODE_FOR_lasx_xvsadd_b): Ditto. (CODE_FOR_lasx_xvsadd_h): Ditto. (CODE_FOR_lasx_xvsadd_w): Ditto. (CODE_FOR_lasx_xvsadd_d): Ditto. (CODE_FOR_lasx_xvsadd_bu): Ditto. (CODE_FOR_lasx_xvsadd_hu): Ditto. (CODE_FOR_lasx_xvsadd_wu): Ditto. (CODE_FOR_lasx_xvsadd_du): Ditto. (CODE_FOR_lasx_xvadd_b): Ditto. (CODE_FOR_lasx_xvadd_h): Ditto. (CODE_FOR_lasx_xvadd_w): Ditto. (CODE_FOR_lasx_xvadd_d): Ditto. (CODE_FOR_lasx_xvaddi_bu): Ditto. (CODE_FOR_lasx_xvaddi_hu): Ditto. (CODE_FOR_lasx_xvaddi_wu): Ditto. (CODE_FOR_lasx_xvaddi_du): Ditto. (CODE_FOR_lasx_xvand_v): Ditto. (CODE_FOR_lasx_xvandi_b): Ditto. (CODE_FOR_lasx_xvbitsel_v): Ditto. (CODE_FOR_lasx_xvseqi_b): Ditto. (CODE_FOR_lasx_xvseqi_h): Ditto. (CODE_FOR_lasx_xvseqi_w): Ditto. (CODE_FOR_lasx_xvseqi_d): Ditto. (CODE_FOR_lasx_xvslti_b): Ditto. (CODE_FOR_lasx_xvslti_h): Ditto. (CODE_FOR_lasx_xvslti_w): Ditto. (CODE_FOR_lasx_xvslti_d): Ditto. (CODE_FOR_lasx_xvslti_bu): Ditto. (CODE_FOR_lasx_xvslti_hu): Ditto. (CODE_FOR_lasx_xvslti_wu): Ditto. (CODE_FOR_lasx_xvslti_du): Ditto. (CODE_FOR_lasx_xvslei_b): Ditto. (CODE_FOR_lasx_xvslei_h): Ditto. (CODE_FOR_lasx_xvslei_w): Ditto. (CODE_FOR_lasx_xvslei_d): Ditto. 
(CODE_FOR_lasx_xvslei_bu): Ditto. (CODE_FOR_lasx_xvslei_hu): Ditto. (CODE_FOR_lasx_xvslei_wu): Ditto. (CODE_FOR_lasx_xvslei_du): Ditto. (CODE_FOR_lasx_xvdiv_b): Ditto. (CODE_FOR_lasx_xvdiv_h): Ditto. (CODE_FOR_lasx_xvdiv_w): Ditto. (CODE_FOR_lasx_xvdiv_d): Ditto. (CODE_FOR_lasx_xvdiv_bu): Ditto. (CODE_FOR_lasx_xvdiv_hu): Ditto. (CODE_FOR_lasx_xvdiv_wu): Ditto. (CODE_FOR_lasx_xvdiv_du): Ditto. (CODE_FOR_lasx_xvfadd_s): Ditto. (CODE_FOR_lasx_xvfadd_d): Ditto. (CODE_FOR_lasx_xvftintrz_w_s): Ditto. (CODE_FOR_lasx_xvftintrz_l_d): Ditto. (CODE_FOR_lasx_xvftintrz_wu_s): Ditto. (CODE_FOR_lasx_xvftintrz_lu_d): Ditto. (CODE_FOR_lasx_xvffint_s_w): Ditto. (CODE_FOR_lasx_xvffint_d_l): Ditto. (CODE_FOR_lasx_xvffint_s_wu): Ditto. (CODE_FOR_lasx_xvffint_d_lu): Ditto. (CODE_FOR_lasx_xvfsub_s): Ditto. (CODE_FOR_lasx_xvfsub_d): Ditto. (CODE_FOR_lasx_xvfmul_s): Ditto. (CODE_FOR_lasx_xvfmul_d): Ditto. (CODE_FOR_lasx_xvfdiv_s): Ditto. (CODE_FOR_lasx_xvfdiv_d): Ditto. (CODE_FOR_lasx_xvfmax_s): Ditto. (CODE_FOR_lasx_xvfmax_d): Ditto. (CODE_FOR_lasx_xvfmin_s): Ditto. (CODE_FOR_lasx_xvfmin_d): Ditto. (CODE_FOR_lasx_xvfsqrt_s): Ditto. (CODE_FOR_lasx_xvfsqrt_d): Ditto. (CODE_FOR_lasx_xvflogb_s): Ditto. (CODE_FOR_lasx_xvflogb_d): Ditto. (CODE_FOR_lasx_xvmax_b): Ditto. (CODE_FOR_lasx_xvmax_h): Ditto. (CODE_FOR_lasx_xvmax_w): Ditto. (CODE_FOR_lasx_xvmax_d): Ditto. (CODE_FOR_lasx_xvmaxi_b): Ditto. (CODE_FOR_lasx_xvmaxi_h): Ditto. (CODE_FOR_lasx_xvmaxi_w): Ditto. (CODE_FOR_lasx_xvmaxi_d): Ditto. (CODE_FOR_lasx_xvmax_bu): Ditto. (CODE_FOR_lasx_xvmax_hu): Ditto. (CODE_FOR_lasx_xvmax_wu): Ditto. (CODE_FOR_lasx_xvmax_du): Ditto. (CODE_FOR_lasx_xvmaxi_bu): Ditto. (CODE_FOR_lasx_xvmaxi_hu): Ditto. (CODE_FOR_lasx_xvmaxi_wu): Ditto. (CODE_FOR_lasx_xvmaxi_du): Ditto. (CODE_FOR_lasx_xvmin_b): Ditto. (CODE_FOR_lasx_xvmin_h): Ditto. (CODE_FOR_lasx_xvmin_w): Ditto. (CODE_FOR_lasx_xvmin_d): Ditto. (CODE_FOR_lasx_xvmini_b): Ditto. (CODE_FOR_lasx_xvmini_h): Ditto. (CODE_FOR_lasx_xvmini_w): Ditto. (CODE_FOR_lasx_xvmini_d): Ditto. (CODE_FOR_lasx_xvmin_bu): Ditto. (CODE_FOR_lasx_xvmin_hu): Ditto. (CODE_FOR_lasx_xvmin_wu): Ditto. (CODE_FOR_lasx_xvmin_du): Ditto. (CODE_FOR_lasx_xvmini_bu): Ditto. (CODE_FOR_lasx_xvmini_hu): Ditto. (CODE_FOR_lasx_xvmini_wu): Ditto. (CODE_FOR_lasx_xvmini_du): Ditto. (CODE_FOR_lasx_xvmod_b): Ditto. (CODE_FOR_lasx_xvmod_h): Ditto. (CODE_FOR_lasx_xvmod_w): Ditto. (CODE_FOR_lasx_xvmod_d): Ditto. (CODE_FOR_lasx_xvmod_bu): Ditto. (CODE_FOR_lasx_xvmod_hu): Ditto. (CODE_FOR_lasx_xvmod_wu): Ditto. (CODE_FOR_lasx_xvmod_du): Ditto. (CODE_FOR_lasx_xvmul_b): Ditto. (CODE_FOR_lasx_xvmul_h): Ditto. (CODE_FOR_lasx_xvmul_w): Ditto. (CODE_FOR_lasx_xvmul_d): Ditto. (CODE_FOR_lasx_xvclz_b): Ditto. (CODE_FOR_lasx_xvclz_h): Ditto. (CODE_FOR_lasx_xvclz_w): Ditto. (CODE_FOR_lasx_xvclz_d): Ditto. (CODE_FOR_lasx_xvnor_v): Ditto. (CODE_FOR_lasx_xvor_v): Ditto. (CODE_FOR_lasx_xvori_b): Ditto. (CODE_FOR_lasx_xvnori_b): Ditto. (CODE_FOR_lasx_xvpcnt_b): Ditto. (CODE_FOR_lasx_xvpcnt_h): Ditto. (CODE_FOR_lasx_xvpcnt_w): Ditto. (CODE_FOR_lasx_xvpcnt_d): Ditto. (CODE_FOR_lasx_xvxor_v): Ditto. (CODE_FOR_lasx_xvxori_b): Ditto. (CODE_FOR_lasx_xvsll_b): Ditto. (CODE_FOR_lasx_xvsll_h): Ditto. (CODE_FOR_lasx_xvsll_w): Ditto. (CODE_FOR_lasx_xvsll_d): Ditto. (CODE_FOR_lasx_xvslli_b): Ditto. (CODE_FOR_lasx_xvslli_h): Ditto. (CODE_FOR_lasx_xvslli_w): Ditto. (CODE_FOR_lasx_xvslli_d): Ditto. (CODE_FOR_lasx_xvsra_b): Ditto. (CODE_FOR_lasx_xvsra_h): Ditto. (CODE_FOR_lasx_xvsra_w): Ditto. (CODE_FOR_lasx_xvsra_d): Ditto. (CODE_FOR_lasx_xvsrai_b): Ditto. 
(CODE_FOR_lasx_xvsrai_h): Ditto. (CODE_FOR_lasx_xvsrai_w): Ditto. (CODE_FOR_lasx_xvsrai_d): Ditto. (CODE_FOR_lasx_xvsrl_b): Ditto. (CODE_FOR_lasx_xvsrl_h): Ditto. (CODE_FOR_lasx_xvsrl_w): Ditto. (CODE_FOR_lasx_xvsrl_d): Ditto. (CODE_FOR_lasx_xvsrli_b): Ditto. (CODE_FOR_lasx_xvsrli_h): Ditto. (CODE_FOR_lasx_xvsrli_w): Ditto. (CODE_FOR_lasx_xvsrli_d): Ditto. (CODE_FOR_lasx_xvsub_b): Ditto. (CODE_FOR_lasx_xvsub_h): Ditto. (CODE_FOR_lasx_xvsub_w): Ditto. (CODE_FOR_lasx_xvsub_d): Ditto. (CODE_FOR_lasx_xvsubi_bu): Ditto. (CODE_FOR_lasx_xvsubi_hu): Ditto. (CODE_FOR_lasx_xvsubi_wu): Ditto. (CODE_FOR_lasx_xvsubi_du): Ditto. (CODE_FOR_lasx_xvpackod_d): Ditto. (CODE_FOR_lasx_xvpackev_d): Ditto. (CODE_FOR_lasx_xvpickod_d): Ditto. (CODE_FOR_lasx_xvpickev_d): Ditto. (CODE_FOR_lasx_xvrepli_b): Ditto. (CODE_FOR_lasx_xvrepli_h): Ditto. (CODE_FOR_lasx_xvrepli_w): Ditto. (CODE_FOR_lasx_xvrepli_d): Ditto. (CODE_FOR_lasx_xvandn_v): Ditto. (CODE_FOR_lasx_xvorn_v): Ditto. (CODE_FOR_lasx_xvneg_b): Ditto. (CODE_FOR_lasx_xvneg_h): Ditto. (CODE_FOR_lasx_xvneg_w): Ditto. (CODE_FOR_lasx_xvneg_d): Ditto. (CODE_FOR_lasx_xvbsrl_v): Ditto. (CODE_FOR_lasx_xvbsll_v): Ditto. (CODE_FOR_lasx_xvfmadd_s): Ditto. (CODE_FOR_lasx_xvfmadd_d): Ditto. (CODE_FOR_lasx_xvfmsub_s): Ditto. (CODE_FOR_lasx_xvfmsub_d): Ditto. (CODE_FOR_lasx_xvfnmadd_s): Ditto. (CODE_FOR_lasx_xvfnmadd_d): Ditto. (CODE_FOR_lasx_xvfnmsub_s): Ditto. (CODE_FOR_lasx_xvfnmsub_d): Ditto. (CODE_FOR_lasx_xvpermi_q): Ditto. (CODE_FOR_lasx_xvpermi_d): Ditto. (CODE_FOR_lasx_xbnz_v): Ditto. (CODE_FOR_lasx_xbz_v): Ditto. (CODE_FOR_lasx_xvssub_b): Ditto. (CODE_FOR_lasx_xvssub_h): Ditto. (CODE_FOR_lasx_xvssub_w): Ditto. (CODE_FOR_lasx_xvssub_d): Ditto. (CODE_FOR_lasx_xvssub_bu): Ditto. (CODE_FOR_lasx_xvssub_hu): Ditto. (CODE_FOR_lasx_xvssub_wu): Ditto. (CODE_FOR_lasx_xvssub_du): Ditto. (CODE_FOR_lasx_xvabsd_b): Ditto. (CODE_FOR_lasx_xvabsd_h): Ditto. (CODE_FOR_lasx_xvabsd_w): Ditto. (CODE_FOR_lasx_xvabsd_d): Ditto. (CODE_FOR_lasx_xvabsd_bu): Ditto. (CODE_FOR_lasx_xvabsd_hu): Ditto. (CODE_FOR_lasx_xvabsd_wu): Ditto. (CODE_FOR_lasx_xvabsd_du): Ditto. (CODE_FOR_lasx_xvavg_b): Ditto. (CODE_FOR_lasx_xvavg_h): Ditto. (CODE_FOR_lasx_xvavg_w): Ditto. (CODE_FOR_lasx_xvavg_d): Ditto. (CODE_FOR_lasx_xvavg_bu): Ditto. (CODE_FOR_lasx_xvavg_hu): Ditto. (CODE_FOR_lasx_xvavg_wu): Ditto. (CODE_FOR_lasx_xvavg_du): Ditto. (CODE_FOR_lasx_xvavgr_b): Ditto. (CODE_FOR_lasx_xvavgr_h): Ditto. (CODE_FOR_lasx_xvavgr_w): Ditto. (CODE_FOR_lasx_xvavgr_d): Ditto. (CODE_FOR_lasx_xvavgr_bu): Ditto. (CODE_FOR_lasx_xvavgr_hu): Ditto. (CODE_FOR_lasx_xvavgr_wu): Ditto. (CODE_FOR_lasx_xvavgr_du): Ditto. (CODE_FOR_lasx_xvmuh_b): Ditto. (CODE_FOR_lasx_xvmuh_h): Ditto. (CODE_FOR_lasx_xvmuh_w): Ditto. (CODE_FOR_lasx_xvmuh_d): Ditto. (CODE_FOR_lasx_xvmuh_bu): Ditto. (CODE_FOR_lasx_xvmuh_hu): Ditto. (CODE_FOR_lasx_xvmuh_wu): Ditto. (CODE_FOR_lasx_xvmuh_du): Ditto. (CODE_FOR_lasx_xvssran_b_h): Ditto. (CODE_FOR_lasx_xvssran_h_w): Ditto. (CODE_FOR_lasx_xvssran_w_d): Ditto. (CODE_FOR_lasx_xvssran_bu_h): Ditto. (CODE_FOR_lasx_xvssran_hu_w): Ditto. (CODE_FOR_lasx_xvssran_wu_d): Ditto. (CODE_FOR_lasx_xvssrarn_b_h): Ditto. (CODE_FOR_lasx_xvssrarn_h_w): Ditto. (CODE_FOR_lasx_xvssrarn_w_d): Ditto. (CODE_FOR_lasx_xvssrarn_bu_h): Ditto. (CODE_FOR_lasx_xvssrarn_hu_w): Ditto. (CODE_FOR_lasx_xvssrarn_wu_d): Ditto. (CODE_FOR_lasx_xvssrln_bu_h): Ditto. (CODE_FOR_lasx_xvssrln_hu_w): Ditto. (CODE_FOR_lasx_xvssrln_wu_d): Ditto. (CODE_FOR_lasx_xvssrlrn_bu_h): Ditto. (CODE_FOR_lasx_xvssrlrn_hu_w): Ditto. 
(CODE_FOR_lasx_xvssrlrn_wu_d): Ditto. (CODE_FOR_lasx_xvftint_w_s): Ditto. (CODE_FOR_lasx_xvftint_l_d): Ditto. (CODE_FOR_lasx_xvftint_wu_s): Ditto. (CODE_FOR_lasx_xvftint_lu_d): Ditto. (CODE_FOR_lasx_xvsllwil_h_b): Ditto. (CODE_FOR_lasx_xvsllwil_w_h): Ditto. (CODE_FOR_lasx_xvsllwil_d_w): Ditto. (CODE_FOR_lasx_xvsllwil_hu_bu): Ditto. (CODE_FOR_lasx_xvsllwil_wu_hu): Ditto. (CODE_FOR_lasx_xvsllwil_du_wu): Ditto. (CODE_FOR_lasx_xvsat_b): Ditto. (CODE_FOR_lasx_xvsat_h): Ditto. (CODE_FOR_lasx_xvsat_w): Ditto. (CODE_FOR_lasx_xvsat_d): Ditto. (CODE_FOR_lasx_xvsat_bu): Ditto. (CODE_FOR_lasx_xvsat_hu): Ditto. (CODE_FOR_lasx_xvsat_wu): Ditto. (CODE_FOR_lasx_xvsat_du): Ditto. (loongarch_builtin_vectorized_function): Ditto. (loongarch_expand_builtin_insn): Ditto. (loongarch_expand_builtin): Ditto. * config/loongarch/loongarch-ftypes.def (1): Ditto. (2): Ditto. (3): Ditto. (4): Ditto. * config/loongarch/lasxintrin.h: New file. --- gcc/config.gcc | 2 +- gcc/config/loongarch/lasxintrin.h | 5338 ++++++++++++++++++++ gcc/config/loongarch/loongarch-builtins.cc | 1180 ++++- gcc/config/loongarch/loongarch-ftypes.def | 271 +- 4 files changed, 6788 insertions(+), 3 deletions(-) create mode 100644 gcc/config/loongarch/lasxintrin.h diff --git a/gcc/config.gcc b/gcc/config.gcc index d6b809cdb55..1cc04a8757e 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -469,7 +469,7 @@ mips*-*-*) ;; loongarch*-*-*) cpu_type=loongarch - extra_headers="larchintrin.h lsxintrin.h" + extra_headers="larchintrin.h lsxintrin.h lasxintrin.h" extra_objs="loongarch-c.o loongarch-builtins.o loongarch-cpu.o loongarch-opts.o loongarch-def.o" extra_gcc_objs="loongarch-driver.o loongarch-cpu.o loongarch-opts.o loongarch-def.o" extra_options="${extra_options} g.opt fused-madd.opt" diff --git a/gcc/config/loongarch/lasxintrin.h b/gcc/config/loongarch/lasxintrin.h new file mode 100644 index 00000000000..d3937992746 --- /dev/null +++ b/gcc/config/loongarch/lasxintrin.h @@ -0,0 +1,5338 @@ +/* LARCH Loongson ASX intrinsics include file. + + Copyright (C) 2018 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . 
*/ + +#ifndef _GCC_LOONGSON_ASXINTRIN_H +#define _GCC_LOONGSON_ASXINTRIN_H 1 + +#if defined(__loongarch_asx) + +typedef signed char v32i8 __attribute__ ((vector_size(32), aligned(32))); +typedef signed char v32i8_b __attribute__ ((vector_size(32), aligned(1))); +typedef unsigned char v32u8 __attribute__ ((vector_size(32), aligned(32))); +typedef unsigned char v32u8_b __attribute__ ((vector_size(32), aligned(1))); +typedef short v16i16 __attribute__ ((vector_size(32), aligned(32))); +typedef short v16i16_h __attribute__ ((vector_size(32), aligned(2))); +typedef unsigned short v16u16 __attribute__ ((vector_size(32), aligned(32))); +typedef unsigned short v16u16_h __attribute__ ((vector_size(32), aligned(2))); +typedef int v8i32 __attribute__ ((vector_size(32), aligned(32))); +typedef int v8i32_w __attribute__ ((vector_size(32), aligned(4))); +typedef unsigned int v8u32 __attribute__ ((vector_size(32), aligned(32))); +typedef unsigned int v8u32_w __attribute__ ((vector_size(32), aligned(4))); +typedef long long v4i64 __attribute__ ((vector_size(32), aligned(32))); +typedef long long v4i64_d __attribute__ ((vector_size(32), aligned(8))); +typedef unsigned long long v4u64 __attribute__ ((vector_size(32), aligned(32))); +typedef unsigned long long v4u64_d __attribute__ ((vector_size(32), aligned(8))); +typedef float v8f32 __attribute__ ((vector_size(32), aligned(32))); +typedef float v8f32_w __attribute__ ((vector_size(32), aligned(4))); +typedef double v4f64 __attribute__ ((vector_size(32), aligned(32))); +typedef double v4f64_d __attribute__ ((vector_size(32), aligned(8))); +typedef float __m256 __attribute__ ((__vector_size__ (32), + __may_alias__)); +typedef long long __m256i __attribute__ ((__vector_size__ (32), + __may_alias__)); +typedef double __m256d __attribute__ ((__vector_size__ (32), + __may_alias__)); + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsll_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsll_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsll_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsll_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsll_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsll_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsll_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsll_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvslli_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvslli_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. 
*/ +#define __lasx_xvslli_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvslli_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvslli_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslli_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvslli_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvslli_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsra_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsra_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsra_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsra_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsra_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsra_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsra_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsra_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvsrai_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsrai_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvsrai_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsrai_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvsrai_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsrai_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvsrai_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvsrai_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrar_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrar_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrar_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrar_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrar_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrar_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrar_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrar_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvsrari_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsrari_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvsrari_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsrari_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvsrari_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsrari_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvsrari_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvsrari_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrl_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrl_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrl_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrl_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrl_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrl_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrl_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrl_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvsrli_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsrli_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvsrli_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsrli_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvsrli_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsrli_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. 
*/ +#define __lasx_xvsrli_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvsrli_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrlr_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrlr_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrlr_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrlr_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrlr_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrlr_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrlr_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrlr_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvsrlri_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsrlri_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvsrlri_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsrlri_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvsrlri_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsrlri_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvsrlri_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvsrlri_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitclr_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitclr_b ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitclr_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitclr_h ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitclr_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitclr_w ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitclr_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitclr_d ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvbitclri_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvbitclri_b ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV16HI, UV16HI, UQI. */ +#define __lasx_xvbitclri_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvbitclri_h ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV8SI, UV8SI, UQI. */ +#define __lasx_xvbitclri_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvbitclri_w ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV4DI, UV4DI, UQI. */ +#define __lasx_xvbitclri_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvbitclri_d ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitset_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitset_b ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitset_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitset_h ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitset_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitset_w ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitset_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitset_d ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvbitseti_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvbitseti_b ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV16HI, UV16HI, UQI. */ +#define __lasx_xvbitseti_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvbitseti_h ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV8SI, UV8SI, UQI. */ +#define __lasx_xvbitseti_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvbitseti_w ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV4DI, UV4DI, UQI. */ +#define __lasx_xvbitseti_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvbitseti_d ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitrev_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitrev_b ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitrev_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitrev_h ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitrev_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitrev_w ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitrev_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvbitrev_d ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvbitrevi_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvbitrevi_b ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV16HI, UV16HI, UQI. */ +#define __lasx_xvbitrevi_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvbitrevi_h ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV8SI, UV8SI, UQI. */ +#define __lasx_xvbitrevi_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvbitrevi_w ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV4DI, UV4DI, UQI. */ +#define __lasx_xvbitrevi_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvbitrevi_d ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadd_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadd_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadd_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadd_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadd_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadd_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadd_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadd_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. 
*/ +#define __lasx_xvaddi_bu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvaddi_bu ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvaddi_hu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvaddi_hu ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvaddi_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvaddi_wu ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvaddi_du(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvaddi_du ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsub_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsub_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsub_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsub_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsub_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsub_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsub_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsub_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvsubi_bu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsubi_bu ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvsubi_hu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsubi_hu ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvsubi_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsubi_wu ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvsubi_du(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsubi_du ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. 
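
   The plain and immediate add/subtract intrinsics above compose in the
   obvious way; a small sketch (illustration only, not from the patch,
   assuming a LoongArch64 target built with -mlasx and an invented
   function name):

     #include <lasxintrin.h>

     // d[i] = a[i] + b[i] - 1 for each of the eight 32-bit lanes.
     __m256i
     add_then_decrement (__m256i a, __m256i b)
     {
       __m256i sum = __lasx_xvadd_w (a, b);
       return __lasx_xvsubi_wu (sum, 1);   // ui5 immediate
     }
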
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V32QI, V32QI, QI. */ +#define __lasx_xvmaxi_b(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V16HI, V16HI, QI. */ +#define __lasx_xvmaxi_h(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V8SI, V8SI, QI. */ +#define __lasx_xvmaxi_w(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V4DI, V4DI, QI. */ +#define __lasx_xvmaxi_d(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmax_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmax_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvmaxi_bu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_bu ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV16HI, UV16HI, UQI. */ +#define __lasx_xvmaxi_hu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_hu ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. 
*/ +/* Data types in instruction templates: UV8SI, UV8SI, UQI. */ +#define __lasx_xvmaxi_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_wu ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV4DI, UV4DI, UQI. */ +#define __lasx_xvmaxi_du(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmaxi_du ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V32QI, V32QI, QI. */ +#define __lasx_xvmini_b(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V16HI, V16HI, QI. */ +#define __lasx_xvmini_h(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V8SI, V8SI, QI. */ +#define __lasx_xvmini_w(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V4DI, V4DI, QI. */ +#define __lasx_xvmini_d(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmin_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmin_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvmini_bu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_bu ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV16HI, UV16HI, UQI. */ +#define __lasx_xvmini_hu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_hu ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV8SI, UV8SI, UQI. */ +#define __lasx_xvmini_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_wu ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV4DI, UV4DI, UQI. */ +#define __lasx_xvmini_du(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvmini_du ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvseq_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvseq_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvseq_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvseq_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvseq_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvseq_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvseq_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvseq_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V32QI, V32QI, QI. */ +#define __lasx_xvseqi_b(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvseqi_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V16HI, V16HI, QI. */ +#define __lasx_xvseqi_h(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvseqi_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V8SI, V8SI, QI. */ +#define __lasx_xvseqi_w(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvseqi_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V4DI, V4DI, QI. 
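
   The __lasx_xvseq_b/h/w/d intrinsics above produce all-ones lanes where
   the operands are equal and all-zero lanes elsewhere, so the result can
   be used directly as a bit mask. Sketch (illustration only, assuming
   -mlasx; __lasx_xvand_v is declared further down in this header):

     #include <lasxintrin.h>

     // Keep payload lanes only where a and b match, zero the rest.
     __m256i
     keep_where_equal (__m256i a, __m256i b, __m256i payload)
     {
       __m256i eq = __lasx_xvseq_w (a, b);
       return __lasx_xvand_v (payload, eq);
     }
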
*/ +#define __lasx_xvseqi_d(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvseqi_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V32QI, V32QI, QI. */ +#define __lasx_xvslti_b(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V16HI, V16HI, QI. */ +#define __lasx_xvslti_h(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V8SI, V8SI, QI. */ +#define __lasx_xvslti_w(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V4DI, V4DI, QI. */ +#define __lasx_xvslti_d(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvslt_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvslt_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V32QI, UV32QI, UQI. */ +#define __lasx_xvslti_bu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_bu ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, UV16HI, UQI. */ +#define __lasx_xvslti_hu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_hu ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, UV8SI, UQI. */ +#define __lasx_xvslti_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_wu ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V4DI, UV4DI, UQI. */ +#define __lasx_xvslti_du(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslti_du ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V32QI, V32QI, QI. */ +#define __lasx_xvslei_b(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V16HI, V16HI, QI. */ +#define __lasx_xvslei_h(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V8SI, V8SI, QI. */ +#define __lasx_xvslei_w(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, si5. */ +/* Data types in instruction templates: V4DI, V4DI, QI. */ +#define __lasx_xvslei_d(/*__m256i*/ _1, /*si5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, UV32QI, UV32QI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsle_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsle_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V32QI, UV32QI, UQI. */ +#define __lasx_xvslei_bu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_bu ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, UV16HI, UQI. */ +#define __lasx_xvslei_hu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_hu ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, UV8SI, UQI. */ +#define __lasx_xvslei_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_wu ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V4DI, UV4DI, UQI. */ +#define __lasx_xvslei_du(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvslei_du ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvsat_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvsat_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvsat_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvsat_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvsat_bu(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_bu ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV16HI, UV16HI, UQI. */ +#define __lasx_xvsat_hu(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_hu ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV8SI, UV8SI, UQI. 
*/ +#define __lasx_xvsat_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_wu ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV4DI, UV4DI, UQI. */ +#define __lasx_xvsat_du(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvsat_du ((v4u64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadda_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadda_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadda_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadda_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadda_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadda_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadda_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadda_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. 
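
   The __lasx_xvsadd_b/h/w/d intrinsics above saturate on overflow instead
   of wrapping, which is usually what pixel or audio arithmetic wants.
   Sketch (illustration only, not part of the patch, assuming -mlasx):

     #include <lasxintrin.h>

     // Saturating signed byte add: results clamp to [-128, 127]
     // rather than wrapping as __lasx_xvadd_b would.
     __m256i
     mix_signed_bytes (__m256i a, __m256i b)
     {
       return __lasx_xvsadd_b (a, b);
     }
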
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsadd_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsadd_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavg_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavg_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvavgr_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvavgr_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. 
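
   __lasx_xvavg_bu truncates the halved sum while __lasx_xvavgr_bu rounds
   it by adding one before the shift. Sketch (illustration only, assuming
   -mlasx and an invented function name):

     #include <lasxintrin.h>

     // Rounded average of unsigned bytes: (a[i] + b[i] + 1) >> 1.
     __m256i
     blend_pixels (__m256i a, __m256i b)
     {
       return __lasx_xvavgr_bu (a, b);
     }
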
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssub_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssub_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvabsd_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvabsd_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmul_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmul_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmul_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmul_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmul_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmul_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmul_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmul_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmadd_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmadd_b ((v32i8)_1, (v32i8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmadd_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmadd_h ((v16i16)_1, (v16i16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmadd_w (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmadd_w ((v8i32)_1, (v8i32)_2, (v8i32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmadd_d (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmadd_d ((v4i64)_1, (v4i64)_2, (v4i64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, V32QI. 
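
   In the __lasx_xvmadd_b/h/w/d intrinsics above the first operand is the
   accumulator (xd is both source and destination), so each lane computes
   _1 + _2 * _3, keeping the product's low bits. Sketch (illustration
   only, assuming -mlasx):

     #include <lasxintrin.h>

     // acc[i] += a[i] * b[i] across the eight 32-bit lanes.
     __m256i
     fused_accumulate (__m256i acc, __m256i a, __m256i b)
     {
       return __lasx_xvmadd_w (acc, a, b);
     }
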
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmsub_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmsub_b ((v32i8)_1, (v32i8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmsub_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmsub_h ((v16i16)_1, (v16i16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmsub_w (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmsub_w ((v8i32)_1, (v8i32)_2, (v8i32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmsub_d (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmsub_d ((v4i64)_1, (v4i64)_2, (v4i64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvdiv_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvdiv_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_hu_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_hu_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_wu_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_wu_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_du_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_du_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. 
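
   The widening horizontal adds above (__lasx_xvhaddw_h_b and friends) sum
   adjacent element pairs into lanes of twice the width, which makes them
   convenient for reductions. Sketch (illustration only, assuming -mlasx):

     #include <lasxintrin.h>

     // Three widening steps turn 32 bytes into four 64-bit partial sums.
     __m256i
     partial_byte_sums (__m256i v)
     {
       __m256i h = __lasx_xvhaddw_h_b (v, v);
       __m256i w = __lasx_xvhaddw_w_h (h, h);
       return __lasx_xvhaddw_d_w (w, w);
     }
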
*/ +/* Data types in instruction templates: V16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_hu_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_hu_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_wu_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_wu_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_du_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_du_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmod_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmod_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvrepl128vei_b(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvrepl128vei_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvrepl128vei_h(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvrepl128vei_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui2. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvrepl128vei_w(/*__m256i*/ _1, /*ui2*/ _2) \ + ((__m256i)__builtin_lasx_xvrepl128vei_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui1. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvrepl128vei_d(/*__m256i*/ _1, /*ui1*/ _2) \ + ((__m256i)__builtin_lasx_xvrepl128vei_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickev_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickev_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickev_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickev_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickev_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickev_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickev_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickev_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickod_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickod_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickod_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickod_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickod_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickod_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpickod_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpickod_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvh_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvh_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvh_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvh_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvh_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvh_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvh_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvh_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvl_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvl_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvl_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvl_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvl_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvl_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvilvl_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvilvl_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackev_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackev_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackev_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackev_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackev_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackev_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackev_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackev_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackod_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackod_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackod_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackod_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackod_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackod_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpackod_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvpackod_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvshuf_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvshuf_b ((v32i8)_1, (v32i8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvshuf_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvshuf_h ((v16i16)_1, (v16i16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvshuf_w (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvshuf_w ((v8i32)_1, (v8i32)_2, (v8i32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvshuf_d (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvshuf_d ((v4i64)_1, (v4i64)_2, (v4i64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvand_v (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvand_v ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvandi_b(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvandi_b ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvor_v (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvor_v ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvori_b(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvori_b ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvnor_v (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvnor_v ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvnori_b(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvnori_b ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvxor_v (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvxor_v ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: UV32QI, UV32QI, UQI. */ +#define __lasx_xvxori_b(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvxori_b ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvbitsel_v (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvbitsel_v ((v32u8)_1, (v32u8)_2, (v32u8)_3); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI, USI. */ +#define __lasx_xvbitseli_b(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvbitseli_b ((v32u8)(_1), (v32u8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V32QI, V32QI, USI. */ +#define __lasx_xvshuf4i_b(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvshuf4i_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V16HI, V16HI, USI. */ +#define __lasx_xvshuf4i_h(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvshuf4i_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V8SI, V8SI, USI. */ +#define __lasx_xvshuf4i_w(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvshuf4i_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, rj. */ +/* Data types in instruction templates: V32QI, SI. 
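The whole-register logic group and the bit-select form compose into small helpers such as the following sketch (assuming -mlasx; the summary of the xvbitsel.v operand roles in the comment is drawn from the manual and should be treated as an assumption here):

/* Sketch only; assumes -mlasx.  */
#include <lasxintrin.h>

/* Bitwise blend: where MASK has 0 bits take A, where it has 1 bits take B
   (check the manual for the exact operand roles of xvbitsel.v).  */
__m256i
blend_bits (__m256i a, __m256i b, __m256i mask)
{
  return __lasx_xvbitsel_v (a, b, mask);
}

/* xvandi.b applies an 8-bit immediate to every byte.  */
__m256i
clear_low_nibbles (__m256i v)
{
  return __lasx_xvandi_b (v, 0xf0);
}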
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplgr2vr_b (int _1) +{ + return (__m256i)__builtin_lasx_xvreplgr2vr_b ((int)_1); +} + +/* Assembly instruction format: xd, rj. */ +/* Data types in instruction templates: V16HI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplgr2vr_h (int _1) +{ + return (__m256i)__builtin_lasx_xvreplgr2vr_h ((int)_1); +} + +/* Assembly instruction format: xd, rj. */ +/* Data types in instruction templates: V8SI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplgr2vr_w (int _1) +{ + return (__m256i)__builtin_lasx_xvreplgr2vr_w ((int)_1); +} + +/* Assembly instruction format: xd, rj. */ +/* Data types in instruction templates: V4DI, DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplgr2vr_d (long int _1) +{ + return (__m256i)__builtin_lasx_xvreplgr2vr_d ((long int)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpcnt_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvpcnt_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpcnt_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvpcnt_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpcnt_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvpcnt_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvpcnt_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvpcnt_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclo_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclo_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclo_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclo_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclo_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclo_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclo_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclo_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. 
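For the broadcast and population-count groups, a minimal sketch (same -mlasx assumption; the function names are illustrative):

/* Sketch only; assumes -mlasx.  */
#include <lasxintrin.h>
#include <stdint.h>

/* Broadcast one GPR value into all eight 32-bit elements.  */
__m256i
splat_w (int x)
{
  return __lasx_xvreplgr2vr_w (x);
}

/* Per-element popcount of eight 32-bit words.  */
void
popcount_words (uint32_t *in, uint32_t *out)
{
  __lasx_xvst (__lasx_xvpcnt_w (__lasx_xvld (in, 0)), out, 0);
}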
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclz_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclz_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclz_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclz_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclz_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclz_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvclz_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvclz_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfadd_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfadd_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfadd_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfadd_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfsub_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfsub_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfsub_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfsub_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfmul_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfmul_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfmul_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfmul_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfdiv_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfdiv_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. 
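The single-precision arithmetic group maps directly onto element-wise loops; a sketch (assuming -mlasx and n a multiple of 8, tail handling omitted):

/* Sketch only; assumes -mlasx and n % 8 == 0.  */
#include <lasxintrin.h>

void
vadd_f32 (float *x, float *y, float *out, int n)
{
  for (int i = 0; i < n; i += 8)
    {
      __m256 vx = (__m256) __lasx_xvld (x + i, 0);
      __m256 vy = (__m256) __lasx_xvld (y + i, 0);
      __lasx_xvst ((__m256i) __lasx_xvfadd_s (vx, vy), out + i, 0);
    }
}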
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfdiv_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfdiv_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcvt_h_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcvt_h_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfcvt_s_d (__m256d _1, __m256d _2) +{ + return (__m256)__builtin_lasx_xvfcvt_s_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfmin_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfmin_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfmin_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfmin_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfmina_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfmina_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfmina_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfmina_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfmax_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfmax_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfmax_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfmax_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfmaxa_s (__m256 _1, __m256 _2) +{ + return (__m256)__builtin_lasx_xvfmaxa_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfmaxa_d (__m256d _1, __m256d _2) +{ + return (__m256d)__builtin_lasx_xvfmaxa_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. 
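xvfmax.s/xvfmin.s compose into the usual clamp idiom, as in this sketch (NaN propagation follows the LASX rules and is not addressed here):

/* Sketch only; assumes -mlasx.  */
#include <lasxintrin.h>

__m256
clamp_f32 (__m256 v, __m256 lo, __m256 hi)
{
  return __lasx_xvfmin_s (__lasx_xvfmax_s (v, lo), hi);
}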
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfclass_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvfclass_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfclass_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvfclass_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfsqrt_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfsqrt_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfsqrt_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfsqrt_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrecip_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrecip_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrecip_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrecip_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrint_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrint_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrint_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrint_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrsqrt_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrsqrt_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrsqrt_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrsqrt_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvflogb_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvflogb_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvflogb_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvflogb_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V16HI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfcvth_s_h (__m256i _1) +{ + return (__m256)__builtin_lasx_xvfcvth_s_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfcvth_d_s (__m256 _1) +{ + return (__m256d)__builtin_lasx_xvfcvth_d_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfcvtl_s_h (__m256i _1) +{ + return (__m256)__builtin_lasx_xvfcvtl_s_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfcvtl_d_s (__m256 _1) +{ + return (__m256d)__builtin_lasx_xvfcvtl_d_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftint_w_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftint_w_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftint_l_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvftint_l_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftint_wu_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftint_wu_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftint_lu_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvftint_lu_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrz_w_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrz_w_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrz_l_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvftintrz_l_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrz_wu_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrz_wu_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrz_lu_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvftintrz_lu_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SI. 
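A sketch for the float-to-integer conversions, using the explicit round-toward-zero form (same -mlasx assumption as the earlier sketches):

/* Sketch only; assumes -mlasx.  */
#include <lasxintrin.h>
#include <stdint.h>

void
f32_to_i32_trunc (float *in, int32_t *out)
{
  __m256 v = (__m256) __lasx_xvld (in, 0);
  __lasx_xvst (__lasx_xvftintrz_w_s (v), out, 0);
}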
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvffint_s_w (__m256i _1) +{ + return (__m256)__builtin_lasx_xvffint_s_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvffint_d_l (__m256i _1) +{ + return (__m256d)__builtin_lasx_xvffint_d_l ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvffint_s_wu (__m256i _1) +{ + return (__m256)__builtin_lasx_xvffint_s_wu ((v8u32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvffint_d_lu (__m256i _1) +{ + return (__m256d)__builtin_lasx_xvffint_d_lu ((v4u64)_1); +} + +/* Assembly instruction format: xd, xj, rk. */ +/* Data types in instruction templates: V32QI, V32QI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve_b (__m256i _1, int _2) +{ + return (__m256i)__builtin_lasx_xvreplve_b ((v32i8)_1, (int)_2); +} + +/* Assembly instruction format: xd, xj, rk. */ +/* Data types in instruction templates: V16HI, V16HI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve_h (__m256i _1, int _2) +{ + return (__m256i)__builtin_lasx_xvreplve_h ((v16i16)_1, (int)_2); +} + +/* Assembly instruction format: xd, xj, rk. */ +/* Data types in instruction templates: V8SI, V8SI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve_w (__m256i _1, int _2) +{ + return (__m256i)__builtin_lasx_xvreplve_w ((v8i32)_1, (int)_2); +} + +/* Assembly instruction format: xd, xj, rk. */ +/* Data types in instruction templates: V4DI, V4DI, SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve_d (__m256i _1, int _2) +{ + return (__m256i)__builtin_lasx_xvreplve_d ((v4i64)_1, (int)_2); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvpermi_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvpermi_w ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvandn_v (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvandn_v ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvneg_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvneg_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvneg_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvneg_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. 
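And the reverse direction plus the register-indexed replicate, as a sketch (the index reduction behaviour of xvreplve.b is taken from the manual and should be treated as an assumption here):

/* Sketch only; assumes -mlasx.  */
#include <lasxintrin.h>
#include <stdint.h>

void
i32_to_f32 (int32_t *in, float *out)
{
  __lasx_xvst ((__m256i) __lasx_xvffint_s_w (__lasx_xvld (in, 0)), out, 0);
}

/* Broadcast the byte element selected by a runtime index.  */
__m256i
broadcast_byte (__m256i v, int idx)
{
  return __lasx_xvreplve_b (v, idx);
}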
*/ +/* Data types in instruction templates: V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvneg_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvneg_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvneg_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvneg_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmuh_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmuh_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V16HI, V32QI, UQI. */ +#define __lasx_xvsllwil_h_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsllwil_h_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V8SI, V16HI, UQI. 
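The high-multiply and shift-left-widen groups are handy for fixed-point code; a sketch (the element selection of xvsllwil within each 128-bit half is an assumption drawn from the manual and should be double-checked):

/* Sketch only; assumes -mlasx.  */
#include <lasxintrin.h>

/* High 32 bits of each signed 32x32->64 product.  */
__m256i
mulhi_i32 (__m256i a, __m256i b)
{
  return __lasx_xvmuh_w (a, b);
}

/* Sign-extend the low byte elements of each 128-bit half to 16 bits
   (xvsllwil.h.b with a zero shift).  */
__m256i
widen_lo_i8 (__m256i v)
{
  return __lasx_xvsllwil_h_b (v, 0);
}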
*/ +#define __lasx_xvsllwil_w_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsllwil_w_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V4DI, V8SI, UQI. */ +#define __lasx_xvsllwil_d_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsllwil_d_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: UV16HI, UV32QI, UQI. */ +#define __lasx_xvsllwil_hu_bu(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvsllwil_hu_bu ((v32u8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV8SI, UV16HI, UQI. */ +#define __lasx_xvsllwil_wu_hu(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvsllwil_wu_hu ((v16u16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV4DI, UV8SI, UQI. */ +#define __lasx_xvsllwil_du_wu(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvsllwil_du_wu ((v8u32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsran_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsran_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsran_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsran_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsran_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsran_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssran_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssran_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssran_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssran_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssran_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssran_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssran_bu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssran_bu_h ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV8SI, UV8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssran_hu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssran_hu_w ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssran_wu_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssran_wu_d ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrarn_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrarn_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrarn_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrarn_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrarn_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrarn_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrarn_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrarn_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrarn_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrarn_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrarn_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrarn_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrarn_bu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrarn_bu_h ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrarn_hu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrarn_hu_w ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrarn_wu_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrarn_wu_d ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. 
*/ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrln_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrln_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrln_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrln_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrln_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrln_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrln_bu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrln_bu_h ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrln_hu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrln_hu_w ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrln_wu_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrln_wu_d ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrlrn_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrlrn_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrlrn_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrlrn_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsrlrn_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsrlrn_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV32QI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrlrn_bu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrlrn_bu_h ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV8SI, UV8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrlrn_hu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrlrn_hu_w ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrlrn_wu_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrlrn_wu_d ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, UQI. */ +#define __lasx_xvfrstpi_b(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvfrstpi_b ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, UQI. */ +#define __lasx_xvfrstpi_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvfrstpi_h ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfrstp_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvfrstp_b ((v32i8)_1, (v32i8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfrstp_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvfrstp_h ((v16i16)_1, (v16i16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvshuf4i_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvshuf4i_d ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvbsrl_v(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvbsrl_v ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvbsll_v(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvbsll_v ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvextrins_b(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvextrins_b ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvextrins_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvextrins_h ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvextrins_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvextrins_w ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. 
*/ +#define __lasx_xvextrins_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvextrins_d ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmskltz_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvmskltz_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmskltz_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvmskltz_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmskltz_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvmskltz_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmskltz_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvmskltz_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsigncov_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsigncov_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsigncov_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsigncov_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsigncov_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsigncov_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsigncov_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsigncov_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfmadd_s (__m256 _1, __m256 _2, __m256 _3) +{ + return (__m256)__builtin_lasx_xvfmadd_s ((v8f32)_1, (v8f32)_2, (v8f32)_3); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfmadd_d (__m256d _1, __m256d _2, __m256d _3) +{ + return (__m256d)__builtin_lasx_xvfmadd_d ((v4f64)_1, (v4f64)_2, (v4f64)_3); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF, V8SF. 
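The four-operand FP forms follow the usual a*b +/- c pattern; a sketch of the plain fused multiply-add (assuming -mlasx):

/* Sketch only; assumes -mlasx; computes a * b + c per element.  */
#include <lasxintrin.h>

__m256
fma_f32 (__m256 a, __m256 b, __m256 c)
{
  return __lasx_xvfmadd_s (a, b, c);
}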
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfmsub_s (__m256 _1, __m256 _2, __m256 _3) +{ + return (__m256)__builtin_lasx_xvfmsub_s ((v8f32)_1, (v8f32)_2, (v8f32)_3); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfmsub_d (__m256d _1, __m256d _2, __m256d _3) +{ + return (__m256d)__builtin_lasx_xvfmsub_d ((v4f64)_1, (v4f64)_2, (v4f64)_3); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfnmadd_s (__m256 _1, __m256 _2, __m256 _3) +{ + return (__m256)__builtin_lasx_xvfnmadd_s ((v8f32)_1, (v8f32)_2, (v8f32)_3); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfnmadd_d (__m256d _1, __m256d _2, __m256d _3) +{ + return (__m256d)__builtin_lasx_xvfnmadd_d ((v4f64)_1, (v4f64)_2, (v4f64)_3); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V8SF, V8SF, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfnmsub_s (__m256 _1, __m256 _2, __m256 _3) +{ + return (__m256)__builtin_lasx_xvfnmsub_s ((v8f32)_1, (v8f32)_2, (v8f32)_3); +} + +/* Assembly instruction format: xd, xj, xk, xa. */ +/* Data types in instruction templates: V4DF, V4DF, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfnmsub_d (__m256d _1, __m256d _2, __m256d _3) +{ + return (__m256d)__builtin_lasx_xvfnmsub_d ((v4f64)_1, (v4f64)_2, (v4f64)_3); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrne_w_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrne_w_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrne_l_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvftintrne_l_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrp_w_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrp_w_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrp_l_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvftintrp_l_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrm_w_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrm_w_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. 
*/ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrm_l_d (__m256d _1) +{ + return (__m256i)__builtin_lasx_xvftintrm_l_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftint_w_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvftint_w_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SF, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvffint_s_l (__m256i _1, __m256i _2) +{ + return (__m256)__builtin_lasx_xvffint_s_l ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrz_w_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvftintrz_w_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrp_w_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvftintrp_w_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrm_w_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvftintrm_w_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrne_w_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvftintrne_w_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftinth_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftinth_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintl_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintl_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvffinth_d_w (__m256i _1) +{ + return (__m256d)__builtin_lasx_xvffinth_d_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvffintl_d_w (__m256i _1) +{ + return (__m256d)__builtin_lasx_xvffintl_d_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrzh_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrzh_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrzl_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrzl_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrph_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrph_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrpl_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrpl_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrmh_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrmh_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrml_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrml_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrneh_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrneh_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvftintrnel_l_s (__m256 _1) +{ + return (__m256i)__builtin_lasx_xvftintrnel_l_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrintrne_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrintrne_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrintrne_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrintrne_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrintrz_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrintrz_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrintrz_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrintrz_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrintrp_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrintrp_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrintrp_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrintrp_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrintrm_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrintrm_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrintrm_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrintrm_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, rj, si12. */ +/* Data types in instruction templates: V32QI, CVPOINTER, SI. */ +#define __lasx_xvld(/*void **/ _1, /*si12*/ _2) \ + ((__m256i)__builtin_lasx_xvld ((void *)(_1), (_2))) + +/* Assembly instruction format: xd, rj, si12. */ +/* Data types in instruction templates: VOID, V32QI, CVPOINTER, SI. */ +#define __lasx_xvst(/*__m256i*/ _1, /*void **/ _2, /*si12*/ _3) \ + ((void)__builtin_lasx_xvst ((v32i8)(_1), (void *)(_2), (_3))) + +/* Assembly instruction format: xd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V32QI, CVPOINTER, SI, UQI. */ +#define __lasx_xvstelm_b(/*__m256i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lasx_xvstelm_b ((v32i8)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: xd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V16HI, CVPOINTER, SI, UQI. */ +#define __lasx_xvstelm_h(/*__m256i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lasx_xvstelm_h ((v16i16)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: xd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V8SI, CVPOINTER, SI, UQI. */ +#define __lasx_xvstelm_w(/*__m256i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lasx_xvstelm_w ((v8i32)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: xd, rj, si8, idx. */ +/* Data types in instruction templates: VOID, V4DI, CVPOINTER, SI, UQI. */ +#define __lasx_xvstelm_d(/*__m256i*/ _1, /*void **/ _2, /*si8*/ _3, /*idx*/ _4) \ + ((void)__builtin_lasx_xvstelm_d ((v4i64)(_1), (void *)(_2), (_3), (_4))) + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, UQI. */ +#define __lasx_xvinsve0_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui3*/ _3) \ + ((__m256i)__builtin_lasx_xvinsve0_w ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui2. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, UQI. */ +#define __lasx_xvinsve0_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui2*/ _3) \ + ((__m256i)__builtin_lasx_xvinsve0_d ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvpickve_w(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvpickve_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui2. 
*/ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvpickve_d(/*__m256i*/ _1, /*ui2*/ _2) \ + ((__m256i)__builtin_lasx_xvpickve_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrlrn_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrlrn_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrlrn_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrlrn_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrlrn_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrlrn_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrln_b_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrln_b_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrln_h_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrln_h_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvssrln_w_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvssrln_w_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvorn_v (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvorn_v ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, i13. */ +/* Data types in instruction templates: V4DI, HI. */ +#define __lasx_xvldi(/*i13*/ _1) \ + ((__m256i)__builtin_lasx_xvldi ((_1))) + +/* Assembly instruction format: xd, rj, rk. */ +/* Data types in instruction templates: V32QI, CVPOINTER, DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvldx (void * _1, long int _2) +{ + return (__m256i)__builtin_lasx_xvldx ((void *)_1, (long int)_2); +} + +/* Assembly instruction format: xd, rj, rk. */ +/* Data types in instruction templates: VOID, V32QI, CVPOINTER, DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +void __lasx_xvstx (__m256i _1, void * _2, long int _3) +{ + return (void)__builtin_lasx_xvstx ((v32i8)_1, (void *)_2, (long int)_3); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV4DI, UV4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvextl_qu_du (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvextl_qu_du ((v4u64)_1); +} + +/* Assembly instruction format: xd, rj, ui3. */ +/* Data types in instruction templates: V8SI, V8SI, SI, UQI. */ +#define __lasx_xvinsgr2vr_w(/*__m256i*/ _1, /*int*/ _2, /*ui3*/ _3) \ + ((__m256i)__builtin_lasx_xvinsgr2vr_w ((v8i32)(_1), (int)(_2), (_3))) + +/* Assembly instruction format: xd, rj, ui2. */ +/* Data types in instruction templates: V4DI, V4DI, DI, UQI. */ +#define __lasx_xvinsgr2vr_d(/*__m256i*/ _1, /*long int*/ _2, /*ui2*/ _3) \ + ((__m256i)__builtin_lasx_xvinsgr2vr_d ((v4i64)(_1), (long int)(_2), (_3))) + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve0_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvreplve0_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve0_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvreplve0_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve0_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvreplve0_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve0_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvreplve0_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvreplve0_q (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvreplve0_q ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_h_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_h_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_w_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_w_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_d_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_d_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_w_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_w_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V16HI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_d_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_d_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_d_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_d_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_hu_bu (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_hu_bu ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_wu_hu (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_wu_hu ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_du_wu (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_du_wu ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_wu_bu (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_wu_bu ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_du_hu (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_du_hu ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_vext2xv_du_bu (__m256i _1) +{ + return (__m256i)__builtin_lasx_vext2xv_du_bu ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvpermi_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui8*/ _3) \ + ((__m256i)__builtin_lasx_xvpermi_q ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui8. */ +/* Data types in instruction templates: V4DI, V4DI, USI. */ +#define __lasx_xvpermi_d(/*__m256i*/ _1, /*ui8*/ _2) \ + ((__m256i)__builtin_lasx_xvpermi_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvperm_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvperm_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, rj, si12. */ +/* Data types in instruction templates: V32QI, CVPOINTER, SI. */ +#define __lasx_xvldrepl_b(/*void **/ _1, /*si12*/ _2) \ + ((__m256i)__builtin_lasx_xvldrepl_b ((void *)(_1), (_2))) + +/* Assembly instruction format: xd, rj, si11. */ +/* Data types in instruction templates: V16HI, CVPOINTER, SI. 
*/ +#define __lasx_xvldrepl_h(/*void **/ _1, /*si11*/ _2) \ + ((__m256i)__builtin_lasx_xvldrepl_h ((void *)(_1), (_2))) + +/* Assembly instruction format: xd, rj, si10. */ +/* Data types in instruction templates: V8SI, CVPOINTER, SI. */ +#define __lasx_xvldrepl_w(/*void **/ _1, /*si10*/ _2) \ + ((__m256i)__builtin_lasx_xvldrepl_w ((void *)(_1), (_2))) + +/* Assembly instruction format: xd, rj, si9. */ +/* Data types in instruction templates: V4DI, CVPOINTER, SI. */ +#define __lasx_xvldrepl_d(/*void **/ _1, /*si9*/ _2) \ + ((__m256i)__builtin_lasx_xvldrepl_d ((void *)(_1), (_2))) + +/* Assembly instruction format: rd, xj, ui3. */ +/* Data types in instruction templates: SI, V8SI, UQI. */ +#define __lasx_xvpickve2gr_w(/*__m256i*/ _1, /*ui3*/ _2) \ + ((int)__builtin_lasx_xvpickve2gr_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: rd, xj, ui3. */ +/* Data types in instruction templates: USI, V8SI, UQI. */ +#define __lasx_xvpickve2gr_wu(/*__m256i*/ _1, /*ui3*/ _2) \ + ((unsigned int)__builtin_lasx_xvpickve2gr_wu ((v8i32)(_1), (_2))) + +/* Assembly instruction format: rd, xj, ui2. */ +/* Data types in instruction templates: DI, V4DI, UQI. */ +#define __lasx_xvpickve2gr_d(/*__m256i*/ _1, /*ui2*/ _2) \ + ((long int)__builtin_lasx_xvpickve2gr_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: rd, xj, ui2. */ +/* Data types in instruction templates: UDI, V4DI, UQI. */ +#define __lasx_xvpickve2gr_du(/*__m256i*/ _1, /*ui2*/ _2) \ + ((unsigned long int)__builtin_lasx_xvpickve2gr_du ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_q_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_q_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_d_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_d_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. 
*/ +/* Data types in instruction templates: V8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_w_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_w_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_h_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_h_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_q_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_q_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_d_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_d_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_w_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_w_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwev_h_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwev_h_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_q_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_q_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_d_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_d_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_w_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_w_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_h_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_h_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_q_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_q_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_d_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_d_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_w_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_w_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_h_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_h_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_q_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_q_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_d_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_d_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_w_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_w_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsubwod_h_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsubwod_h_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_d_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_d_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_w_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_w_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_h_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_h_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_q_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_q_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_d_wu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_d_wu ((v8u32)_1, (v8u32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, UV16HI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_w_hu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_w_hu ((v16u16)_1, (v16u16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_h_bu (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_h_bu ((v32u8)_1, (v32u8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_d_wu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_d_wu_w ((v8u32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_w_hu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_w_hu_h ((v16u16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_h_bu_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_h_bu_b ((v32u8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_d_wu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_d_wu_w ((v8u32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_w_hu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_w_hu_h ((v16u16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_h_bu_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_h_bu_b ((v32u8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_d_wu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_d_wu_w ((v8u32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_w_hu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_w_hu_h ((v16u16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, V32QI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_h_bu_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_h_bu_b ((v32u8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_d_wu_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_d_wu_w ((v8u32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, UV16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_w_hu_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_w_hu_h ((v16u16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, UV32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_h_bu_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_h_bu_b ((v32u8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhaddw_qu_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhaddw_qu_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_q_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_q_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvhsubw_qu_du (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvhsubw_qu_du ((v4u64)_1, (v4u64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_q_d (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_q_d ((v4i64)_1, (v4i64)_2, (v4i64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_d_w (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_d_w ((v4i64)_1, (v8i32)_2, (v8i32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V16HI, V16HI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_w_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_w_h ((v8i32)_1, (v16i16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_h_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_h_b ((v16i16)_1, (v32i8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_q_du (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_q_du ((v4u64)_1, (v4u64)_2, (v4u64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_d_wu (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_d_wu ((v4u64)_1, (v8u32)_2, (v8u32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_w_hu (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_w_hu ((v8u32)_1, (v16u16)_2, (v16u16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_h_bu (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_h_bu ((v16u16)_1, (v32u8)_2, (v32u8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_q_d (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_q_d ((v4i64)_1, (v4i64)_2, (v4i64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_d_w (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_d_w ((v4i64)_1, (v8i32)_2, (v8i32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_w_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_w_h ((v8i32)_1, (v16i16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_h_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_h_b ((v16i16)_1, (v32i8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. 
*/ +/* Data types in instruction templates: UV4DI, UV4DI, UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_q_du (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_q_du ((v4u64)_1, (v4u64)_2, (v4u64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV4DI, UV4DI, UV8SI, UV8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_d_wu (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_d_wu ((v4u64)_1, (v8u32)_2, (v8u32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV8SI, UV8SI, UV16HI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_w_hu (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_w_hu ((v8u32)_1, (v16u16)_2, (v16u16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: UV16HI, UV16HI, UV32QI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_h_bu (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_h_bu ((v16u16)_1, (v32u8)_2, (v32u8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, UV4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_q_du_d (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_q_du_d ((v4i64)_1, (v4u64)_2, (v4i64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, UV8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_d_wu_w (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_d_wu_w ((v4i64)_1, (v8u32)_2, (v8i32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, UV16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_w_hu_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_w_hu_h ((v8i32)_1, (v16u16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, UV32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwev_h_bu_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwev_h_bu_b ((v16i16)_1, (v32u8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, UV4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_q_du_d (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_q_du_d ((v4i64)_1, (v4u64)_2, (v4i64)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, UV8SI, V8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_d_wu_w (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_d_wu_w ((v4i64)_1, (v8u32)_2, (v8i32)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, UV16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_w_hu_h (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_w_hu_h ((v8i32)_1, (v16u16)_2, (v16i16)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, UV32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmaddwod_h_bu_b (__m256i _1, __m256i _2, __m256i _3) +{ + return (__m256i)__builtin_lasx_xvmaddwod_h_bu_b ((v16i16)_1, (v32u8)_2, (v32i8)_3); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvrotr_b (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvrotr_b ((v32i8)_1, (v32i8)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvrotr_h (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvrotr_h ((v16i16)_1, (v16i16)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvrotr_w (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvrotr_w ((v8i32)_1, (v8i32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvrotr_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvrotr_d ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvadd_q (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvadd_q ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvsub_q (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvsub_q ((v4i64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwev_q_du_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwev_q_du_d ((v4u64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, V4DI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvaddwod_q_du_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvaddwod_q_du_d ((v4u64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwev_q_du_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwev_q_du_d ((v4u64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, UV4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmulwod_q_du_d (__m256i _1, __m256i _2) +{ + return (__m256i)__builtin_lasx_xvmulwod_q_du_d ((v4u64)_1, (v4i64)_2); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmskgez_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvmskgez_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V32QI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvmsknz_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvmsknz_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V16HI, V32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_h_b (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_h_b ((v32i8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SI, V16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_w_h (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_w_h ((v16i16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V8SI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_d_w (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_d_w ((v8i32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_q_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_q_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV16HI, UV32QI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_hu_bu (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_hu_bu ((v32u8)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV8SI, UV16HI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_wu_hu (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_wu_hu ((v16u16)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV4DI, UV8SI. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_du_wu (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_du_wu ((v8u32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: UV4DI, UV4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvexth_qu_du (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvexth_qu_du ((v4u64)_1); +} + +/* Assembly instruction format: xd, xj, ui3. */ +/* Data types in instruction templates: V32QI, V32QI, UQI. */ +#define __lasx_xvrotri_b(/*__m256i*/ _1, /*ui3*/ _2) \ + ((__m256i)__builtin_lasx_xvrotri_b ((v32i8)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V16HI, V16HI, UQI. */ +#define __lasx_xvrotri_h(/*__m256i*/ _1, /*ui4*/ _2) \ + ((__m256i)__builtin_lasx_xvrotri_h ((v16i16)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V8SI, V8SI, UQI. */ +#define __lasx_xvrotri_w(/*__m256i*/ _1, /*ui5*/ _2) \ + ((__m256i)__builtin_lasx_xvrotri_w ((v8i32)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V4DI, V4DI, UQI. */ +#define __lasx_xvrotri_d(/*__m256i*/ _1, /*ui6*/ _2) \ + ((__m256i)__builtin_lasx_xvrotri_d ((v4i64)(_1), (_2))) + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DI, V4DI. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvextl_q_d (__m256i _1) +{ + return (__m256i)__builtin_lasx_xvextl_q_d ((v4i64)_1); +} + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvsrlni_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlni_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvsrlni_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlni_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvsrlni_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlni_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvsrlni_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlni_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvsrlrni_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlrni_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvsrlrni_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlrni_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. 
*/ +#define __lasx_xvsrlrni_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlrni_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvsrlrni_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvsrlrni_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvssrlni_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvssrlni_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvssrlni_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvssrlni_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV32QI, UV32QI, V32QI, USI. */ +#define __lasx_xvssrlni_bu_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_bu_h ((v32u8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV16HI, UV16HI, V16HI, USI. */ +#define __lasx_xvssrlni_hu_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_hu_w ((v16u16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV8SI, UV8SI, V8SI, USI. */ +#define __lasx_xvssrlni_wu_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_wu_d ((v8u32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: UV4DI, UV4DI, V4DI, USI. */ +#define __lasx_xvssrlni_du_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlni_du_q ((v4u64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvssrlrni_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvssrlrni_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvssrlrni_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. 
*/ +#define __lasx_xvssrlrni_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV32QI, UV32QI, V32QI, USI. */ +#define __lasx_xvssrlrni_bu_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_bu_h ((v32u8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV16HI, UV16HI, V16HI, USI. */ +#define __lasx_xvssrlrni_hu_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_hu_w ((v16u16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV8SI, UV8SI, V8SI, USI. */ +#define __lasx_xvssrlrni_wu_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_wu_d ((v8u32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: UV4DI, UV4DI, V4DI, USI. */ +#define __lasx_xvssrlrni_du_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrlrni_du_q ((v4u64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvsrani_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvsrani_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvsrani_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvsrani_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvsrani_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvsrani_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvsrani_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvsrani_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvsrarni_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvsrarni_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvsrarni_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvsrarni_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvsrarni_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvsrarni_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvsrarni_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvsrarni_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. 
*/ +#define __lasx_xvssrani_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvssrani_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvssrani_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvssrani_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV32QI, UV32QI, V32QI, USI. */ +#define __lasx_xvssrani_bu_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_bu_h ((v32u8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV16HI, UV16HI, V16HI, USI. */ +#define __lasx_xvssrani_hu_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_hu_w ((v16u16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV8SI, UV8SI, V8SI, USI. */ +#define __lasx_xvssrani_wu_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_wu_d ((v8u32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: UV4DI, UV4DI, V4DI, USI. */ +#define __lasx_xvssrani_du_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrani_du_q ((v4u64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: V32QI, V32QI, V32QI, USI. */ +#define __lasx_xvssrarni_b_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_b_h ((v32i8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: V16HI, V16HI, V16HI, USI. */ +#define __lasx_xvssrarni_h_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_h_w ((v16i16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: V8SI, V8SI, V8SI, USI. */ +#define __lasx_xvssrarni_w_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_w_d ((v8i32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: V4DI, V4DI, V4DI, USI. */ +#define __lasx_xvssrarni_d_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_d_q ((v4i64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui4. */ +/* Data types in instruction templates: UV32QI, UV32QI, V32QI, USI. */ +#define __lasx_xvssrarni_bu_h(/*__m256i*/ _1, /*__m256i*/ _2, /*ui4*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_bu_h ((v32u8)(_1), (v32i8)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui5. */ +/* Data types in instruction templates: UV16HI, UV16HI, V16HI, USI. 
*/ +#define __lasx_xvssrarni_hu_w(/*__m256i*/ _1, /*__m256i*/ _2, /*ui5*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_hu_w ((v16u16)(_1), (v16i16)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui6. */ +/* Data types in instruction templates: UV8SI, UV8SI, V8SI, USI. */ +#define __lasx_xvssrarni_wu_d(/*__m256i*/ _1, /*__m256i*/ _2, /*ui6*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_wu_d ((v8u32)(_1), (v8i32)(_2), (_3))) + +/* Assembly instruction format: xd, xj, ui7. */ +/* Data types in instruction templates: UV4DI, UV4DI, V4DI, USI. */ +#define __lasx_xvssrarni_du_q(/*__m256i*/ _1, /*__m256i*/ _2, /*ui7*/ _3) \ + ((__m256i)__builtin_lasx_xvssrarni_du_q ((v4u64)(_1), (v4i64)(_2), (_3))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV32QI. */ +#define __lasx_xbnz_b(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbnz_b ((v32u8)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV4DI. */ +#define __lasx_xbnz_d(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbnz_d ((v4u64)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV16HI. */ +#define __lasx_xbnz_h(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbnz_h ((v16u16)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV32QI. */ +#define __lasx_xbnz_v(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbnz_v ((v32u8)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV8SI. */ +#define __lasx_xbnz_w(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbnz_w ((v8u32)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV32QI. */ +#define __lasx_xbz_b(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbz_b ((v32u8)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV4DI. */ +#define __lasx_xbz_d(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbz_d ((v4u64)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV16HI. */ +#define __lasx_xbz_h(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbz_h ((v16u16)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV32QI. */ +#define __lasx_xbz_v(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbz_v ((v32u8)(_1))) + +/* Assembly instruction format: cd, xj. */ +/* Data types in instruction templates: SI, UV8SI. */ +#define __lasx_xbz_w(/*__m256i*/ _1) \ + ((int)__builtin_lasx_xbz_w ((v8u32)(_1))) + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_caf_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_caf_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_caf_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_caf_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_ceq_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_ceq_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_ceq_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_ceq_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cle_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cle_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cle_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cle_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_clt_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_clt_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_clt_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_clt_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cne_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cne_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cne_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cne_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cor_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cor_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cor_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cor_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cueq_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cueq_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. 
*/ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cueq_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cueq_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cule_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cule_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cule_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cule_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cult_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cult_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cult_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cult_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cun_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cun_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cune_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cune_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cune_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cune_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_cun_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_cun_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_saf_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_saf_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. 
*/ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_saf_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_saf_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_seq_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_seq_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_seq_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_seq_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sle_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sle_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sle_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sle_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_slt_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_slt_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_slt_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_slt_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sne_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sne_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sne_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sne_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sor_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sor_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sor_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sor_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. 
*/ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sueq_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sueq_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sueq_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sueq_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sule_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sule_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sule_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sule_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sult_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sult_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sult_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sult_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sun_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sun_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V4DI, V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sune_d (__m256d _1, __m256d _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sune_d ((v4f64)_1, (v4f64)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sune_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sune_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, xk. */ +/* Data types in instruction templates: V8SI, V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256i __lasx_xvfcmp_sun_s (__m256 _1, __m256 _2) +{ + return (__m256i)__builtin_lasx_xvfcmp_sun_s ((v8f32)_1, (v8f32)_2); +} + +/* Assembly instruction format: xd, xj, ui2. */ +/* Data types in instruction templates: V4DF, V4DF, UQI. */ +#define __lasx_xvpickve_d_f(/*__m256d*/ _1, /*ui2*/ _2) \ + ((__m256d)__builtin_lasx_xvpickve_d_f ((v4f64)(_1), (_2))) + +/* Assembly instruction format: xd, xj, ui3. 
*/ +/* Data types in instruction templates: V8SF, V8SF, UQI. */ +#define __lasx_xvpickve_w_f(/*__m256*/ _1, /*ui3*/ _2) \ + ((__m256)__builtin_lasx_xvpickve_w_f ((v8f32)(_1), (_2))) + +/* Assembly instruction format: xd, si10. */ +/* Data types in instruction templates: V32QI, HI. */ +#define __lasx_xvrepli_b(/*si10*/ _1) \ + ((__m256i)__builtin_lasx_xvrepli_b ((_1))) + +/* Assembly instruction format: xd, si10. */ +/* Data types in instruction templates: V4DI, HI. */ +#define __lasx_xvrepli_d(/*si10*/ _1) \ + ((__m256i)__builtin_lasx_xvrepli_d ((_1))) + +/* Assembly instruction format: xd, si10. */ +/* Data types in instruction templates: V16HI, HI. */ +#define __lasx_xvrepli_h(/*si10*/ _1) \ + ((__m256i)__builtin_lasx_xvrepli_h ((_1))) + +/* Assembly instruction format: xd, si10. */ +/* Data types in instruction templates: V8SI, HI. */ +#define __lasx_xvrepli_w(/*si10*/ _1) \ + ((__m256i)__builtin_lasx_xvrepli_w ((_1))) + +#endif /* defined(__loongarch_asx). */ +#endif /* _GCC_LOONGSON_ASXINTRIN_H. */ diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 5958f5b7fbe..064fee7dfa2 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -74,6 +74,13 @@ enum loongarch_builtin_type /* The function corresponds to an LSX conditional branch instruction combined with a compare instruction. */ LARCH_BUILTIN_LSX_TEST_BRANCH, + + /* For generating LoongArch LASX. */ + LARCH_BUILTIN_LASX, + + /* The function corresponds to an LASX conditional branch instruction + combined with a compare instruction. */ + LARCH_BUILTIN_LASX_TEST_BRANCH, }; /* Declare an availability predicate for built-in functions that require @@ -112,6 +119,7 @@ struct loongarch_builtin_description AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) AVAIL_ALL (lsx, ISA_HAS_LSX) +AVAIL_ALL (lasx, ISA_HAS_LASX) /* Construct a loongarch_builtin_description from the given arguments. @@ -173,6 +181,30 @@ AVAIL_ALL (lsx, ISA_HAS_LSX) "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \ FUNCTION_TYPE, loongarch_builtin_avail_lsx } +/* Define an LASX LARCH_BUILTIN_DIRECT function __builtin_lasx_ + for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description + field. */ +#define LASX_BUILTIN(INSN, FUNCTION_TYPE) \ + { CODE_FOR_lasx_ ## INSN, \ + "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX, \ + FUNCTION_TYPE, loongarch_builtin_avail_lasx } + +/* Define an LASX LARCH_BUILTIN_DIRECT_NO_TARGET function __builtin_lasx_ + for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description + field. */ +#define LASX_NO_TARGET_BUILTIN(INSN, FUNCTION_TYPE) \ + { CODE_FOR_lasx_ ## INSN, \ + "__builtin_lasx_" #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \ + FUNCTION_TYPE, loongarch_builtin_avail_lasx } + +/* Define an LASX LARCH_BUILTIN_LASX_TEST_BRANCH function __builtin_lasx_ + for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description + field. 
*/ +#define LASX_BUILTIN_TEST_BRANCH(INSN, FUNCTION_TYPE) \ + { CODE_FOR_lasx_ ## INSN, \ + "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX_TEST_BRANCH, \ + FUNCTION_TYPE, loongarch_builtin_avail_lasx } + /* LoongArch SX define CODE_FOR_lsx_xxx */ #define CODE_FOR_lsx_vsadd_b CODE_FOR_ssaddv16qi3 #define CODE_FOR_lsx_vsadd_h CODE_FOR_ssaddv8hi3 @@ -442,6 +474,276 @@ AVAIL_ALL (lsx, ISA_HAS_LSX) #define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d +/* LoongArch ASX define CODE_FOR_lasx_mxxx */ +#define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3 +#define CODE_FOR_lasx_xvsadd_h CODE_FOR_ssaddv16hi3 +#define CODE_FOR_lasx_xvsadd_w CODE_FOR_ssaddv8si3 +#define CODE_FOR_lasx_xvsadd_d CODE_FOR_ssaddv4di3 +#define CODE_FOR_lasx_xvsadd_bu CODE_FOR_usaddv32qi3 +#define CODE_FOR_lasx_xvsadd_hu CODE_FOR_usaddv16hi3 +#define CODE_FOR_lasx_xvsadd_wu CODE_FOR_usaddv8si3 +#define CODE_FOR_lasx_xvsadd_du CODE_FOR_usaddv4di3 +#define CODE_FOR_lasx_xvadd_b CODE_FOR_addv32qi3 +#define CODE_FOR_lasx_xvadd_h CODE_FOR_addv16hi3 +#define CODE_FOR_lasx_xvadd_w CODE_FOR_addv8si3 +#define CODE_FOR_lasx_xvadd_d CODE_FOR_addv4di3 +#define CODE_FOR_lasx_xvaddi_bu CODE_FOR_addv32qi3 +#define CODE_FOR_lasx_xvaddi_hu CODE_FOR_addv16hi3 +#define CODE_FOR_lasx_xvaddi_wu CODE_FOR_addv8si3 +#define CODE_FOR_lasx_xvaddi_du CODE_FOR_addv4di3 +#define CODE_FOR_lasx_xvand_v CODE_FOR_andv32qi3 +#define CODE_FOR_lasx_xvandi_b CODE_FOR_andv32qi3 +#define CODE_FOR_lasx_xvbitsel_v CODE_FOR_lasx_xvbitsel_b +#define CODE_FOR_lasx_xvseqi_b CODE_FOR_lasx_xvseq_b +#define CODE_FOR_lasx_xvseqi_h CODE_FOR_lasx_xvseq_h +#define CODE_FOR_lasx_xvseqi_w CODE_FOR_lasx_xvseq_w +#define CODE_FOR_lasx_xvseqi_d CODE_FOR_lasx_xvseq_d +#define CODE_FOR_lasx_xvslti_b CODE_FOR_lasx_xvslt_b +#define CODE_FOR_lasx_xvslti_h CODE_FOR_lasx_xvslt_h +#define CODE_FOR_lasx_xvslti_w CODE_FOR_lasx_xvslt_w +#define CODE_FOR_lasx_xvslti_d CODE_FOR_lasx_xvslt_d +#define CODE_FOR_lasx_xvslti_bu CODE_FOR_lasx_xvslt_bu +#define CODE_FOR_lasx_xvslti_hu CODE_FOR_lasx_xvslt_hu +#define CODE_FOR_lasx_xvslti_wu CODE_FOR_lasx_xvslt_wu +#define CODE_FOR_lasx_xvslti_du CODE_FOR_lasx_xvslt_du +#define CODE_FOR_lasx_xvslei_b CODE_FOR_lasx_xvsle_b +#define CODE_FOR_lasx_xvslei_h CODE_FOR_lasx_xvsle_h +#define CODE_FOR_lasx_xvslei_w CODE_FOR_lasx_xvsle_w +#define CODE_FOR_lasx_xvslei_d CODE_FOR_lasx_xvsle_d +#define CODE_FOR_lasx_xvslei_bu CODE_FOR_lasx_xvsle_bu +#define CODE_FOR_lasx_xvslei_hu CODE_FOR_lasx_xvsle_hu +#define CODE_FOR_lasx_xvslei_wu CODE_FOR_lasx_xvsle_wu +#define CODE_FOR_lasx_xvslei_du CODE_FOR_lasx_xvsle_du +#define CODE_FOR_lasx_xvdiv_b CODE_FOR_divv32qi3 +#define CODE_FOR_lasx_xvdiv_h CODE_FOR_divv16hi3 +#define CODE_FOR_lasx_xvdiv_w CODE_FOR_divv8si3 +#define CODE_FOR_lasx_xvdiv_d CODE_FOR_divv4di3 +#define CODE_FOR_lasx_xvdiv_bu CODE_FOR_udivv32qi3 +#define CODE_FOR_lasx_xvdiv_hu CODE_FOR_udivv16hi3 +#define CODE_FOR_lasx_xvdiv_wu CODE_FOR_udivv8si3 +#define CODE_FOR_lasx_xvdiv_du CODE_FOR_udivv4di3 +#define CODE_FOR_lasx_xvfadd_s CODE_FOR_addv8sf3 +#define CODE_FOR_lasx_xvfadd_d CODE_FOR_addv4df3 +#define CODE_FOR_lasx_xvftintrz_w_s CODE_FOR_fix_truncv8sfv8si2 +#define CODE_FOR_lasx_xvftintrz_l_d CODE_FOR_fix_truncv4dfv4di2 +#define CODE_FOR_lasx_xvftintrz_wu_s CODE_FOR_fixuns_truncv8sfv8si2 +#define CODE_FOR_lasx_xvftintrz_lu_d CODE_FOR_fixuns_truncv4dfv4di2 +#define CODE_FOR_lasx_xvffint_s_w CODE_FOR_floatv8siv8sf2 +#define CODE_FOR_lasx_xvffint_d_l CODE_FOR_floatv4div4df2 +#define 
CODE_FOR_lasx_xvffint_s_wu CODE_FOR_floatunsv8siv8sf2 +#define CODE_FOR_lasx_xvffint_d_lu CODE_FOR_floatunsv4div4df2 +#define CODE_FOR_lasx_xvfsub_s CODE_FOR_subv8sf3 +#define CODE_FOR_lasx_xvfsub_d CODE_FOR_subv4df3 +#define CODE_FOR_lasx_xvfmul_s CODE_FOR_mulv8sf3 +#define CODE_FOR_lasx_xvfmul_d CODE_FOR_mulv4df3 +#define CODE_FOR_lasx_xvfdiv_s CODE_FOR_divv8sf3 +#define CODE_FOR_lasx_xvfdiv_d CODE_FOR_divv4df3 +#define CODE_FOR_lasx_xvfmax_s CODE_FOR_smaxv8sf3 +#define CODE_FOR_lasx_xvfmax_d CODE_FOR_smaxv4df3 +#define CODE_FOR_lasx_xvfmin_s CODE_FOR_sminv8sf3 +#define CODE_FOR_lasx_xvfmin_d CODE_FOR_sminv4df3 +#define CODE_FOR_lasx_xvfsqrt_s CODE_FOR_sqrtv8sf2 +#define CODE_FOR_lasx_xvfsqrt_d CODE_FOR_sqrtv4df2 +#define CODE_FOR_lasx_xvflogb_s CODE_FOR_logbv8sf2 +#define CODE_FOR_lasx_xvflogb_d CODE_FOR_logbv4df2 +#define CODE_FOR_lasx_xvmax_b CODE_FOR_smaxv32qi3 +#define CODE_FOR_lasx_xvmax_h CODE_FOR_smaxv16hi3 +#define CODE_FOR_lasx_xvmax_w CODE_FOR_smaxv8si3 +#define CODE_FOR_lasx_xvmax_d CODE_FOR_smaxv4di3 +#define CODE_FOR_lasx_xvmaxi_b CODE_FOR_smaxv32qi3 +#define CODE_FOR_lasx_xvmaxi_h CODE_FOR_smaxv16hi3 +#define CODE_FOR_lasx_xvmaxi_w CODE_FOR_smaxv8si3 +#define CODE_FOR_lasx_xvmaxi_d CODE_FOR_smaxv4di3 +#define CODE_FOR_lasx_xvmax_bu CODE_FOR_umaxv32qi3 +#define CODE_FOR_lasx_xvmax_hu CODE_FOR_umaxv16hi3 +#define CODE_FOR_lasx_xvmax_wu CODE_FOR_umaxv8si3 +#define CODE_FOR_lasx_xvmax_du CODE_FOR_umaxv4di3 +#define CODE_FOR_lasx_xvmaxi_bu CODE_FOR_umaxv32qi3 +#define CODE_FOR_lasx_xvmaxi_hu CODE_FOR_umaxv16hi3 +#define CODE_FOR_lasx_xvmaxi_wu CODE_FOR_umaxv8si3 +#define CODE_FOR_lasx_xvmaxi_du CODE_FOR_umaxv4di3 +#define CODE_FOR_lasx_xvmin_b CODE_FOR_sminv32qi3 +#define CODE_FOR_lasx_xvmin_h CODE_FOR_sminv16hi3 +#define CODE_FOR_lasx_xvmin_w CODE_FOR_sminv8si3 +#define CODE_FOR_lasx_xvmin_d CODE_FOR_sminv4di3 +#define CODE_FOR_lasx_xvmini_b CODE_FOR_sminv32qi3 +#define CODE_FOR_lasx_xvmini_h CODE_FOR_sminv16hi3 +#define CODE_FOR_lasx_xvmini_w CODE_FOR_sminv8si3 +#define CODE_FOR_lasx_xvmini_d CODE_FOR_sminv4di3 +#define CODE_FOR_lasx_xvmin_bu CODE_FOR_uminv32qi3 +#define CODE_FOR_lasx_xvmin_hu CODE_FOR_uminv16hi3 +#define CODE_FOR_lasx_xvmin_wu CODE_FOR_uminv8si3 +#define CODE_FOR_lasx_xvmin_du CODE_FOR_uminv4di3 +#define CODE_FOR_lasx_xvmini_bu CODE_FOR_uminv32qi3 +#define CODE_FOR_lasx_xvmini_hu CODE_FOR_uminv16hi3 +#define CODE_FOR_lasx_xvmini_wu CODE_FOR_uminv8si3 +#define CODE_FOR_lasx_xvmini_du CODE_FOR_uminv4di3 +#define CODE_FOR_lasx_xvmod_b CODE_FOR_modv32qi3 +#define CODE_FOR_lasx_xvmod_h CODE_FOR_modv16hi3 +#define CODE_FOR_lasx_xvmod_w CODE_FOR_modv8si3 +#define CODE_FOR_lasx_xvmod_d CODE_FOR_modv4di3 +#define CODE_FOR_lasx_xvmod_bu CODE_FOR_umodv32qi3 +#define CODE_FOR_lasx_xvmod_hu CODE_FOR_umodv16hi3 +#define CODE_FOR_lasx_xvmod_wu CODE_FOR_umodv8si3 +#define CODE_FOR_lasx_xvmod_du CODE_FOR_umodv4di3 +#define CODE_FOR_lasx_xvmul_b CODE_FOR_mulv32qi3 +#define CODE_FOR_lasx_xvmul_h CODE_FOR_mulv16hi3 +#define CODE_FOR_lasx_xvmul_w CODE_FOR_mulv8si3 +#define CODE_FOR_lasx_xvmul_d CODE_FOR_mulv4di3 +#define CODE_FOR_lasx_xvclz_b CODE_FOR_clzv32qi2 +#define CODE_FOR_lasx_xvclz_h CODE_FOR_clzv16hi2 +#define CODE_FOR_lasx_xvclz_w CODE_FOR_clzv8si2 +#define CODE_FOR_lasx_xvclz_d CODE_FOR_clzv4di2 +#define CODE_FOR_lasx_xvnor_v CODE_FOR_lasx_xvnor_b +#define CODE_FOR_lasx_xvor_v CODE_FOR_iorv32qi3 +#define CODE_FOR_lasx_xvori_b CODE_FOR_iorv32qi3 +#define CODE_FOR_lasx_xvnori_b CODE_FOR_lasx_xvnor_b +#define CODE_FOR_lasx_xvpcnt_b CODE_FOR_popcountv32qi2 +#define 
CODE_FOR_lasx_xvpcnt_h CODE_FOR_popcountv16hi2 +#define CODE_FOR_lasx_xvpcnt_w CODE_FOR_popcountv8si2 +#define CODE_FOR_lasx_xvpcnt_d CODE_FOR_popcountv4di2 +#define CODE_FOR_lasx_xvxor_v CODE_FOR_xorv32qi3 +#define CODE_FOR_lasx_xvxori_b CODE_FOR_xorv32qi3 +#define CODE_FOR_lasx_xvsll_b CODE_FOR_vashlv32qi3 +#define CODE_FOR_lasx_xvsll_h CODE_FOR_vashlv16hi3 +#define CODE_FOR_lasx_xvsll_w CODE_FOR_vashlv8si3 +#define CODE_FOR_lasx_xvsll_d CODE_FOR_vashlv4di3 +#define CODE_FOR_lasx_xvslli_b CODE_FOR_vashlv32qi3 +#define CODE_FOR_lasx_xvslli_h CODE_FOR_vashlv16hi3 +#define CODE_FOR_lasx_xvslli_w CODE_FOR_vashlv8si3 +#define CODE_FOR_lasx_xvslli_d CODE_FOR_vashlv4di3 +#define CODE_FOR_lasx_xvsra_b CODE_FOR_vashrv32qi3 +#define CODE_FOR_lasx_xvsra_h CODE_FOR_vashrv16hi3 +#define CODE_FOR_lasx_xvsra_w CODE_FOR_vashrv8si3 +#define CODE_FOR_lasx_xvsra_d CODE_FOR_vashrv4di3 +#define CODE_FOR_lasx_xvsrai_b CODE_FOR_vashrv32qi3 +#define CODE_FOR_lasx_xvsrai_h CODE_FOR_vashrv16hi3 +#define CODE_FOR_lasx_xvsrai_w CODE_FOR_vashrv8si3 +#define CODE_FOR_lasx_xvsrai_d CODE_FOR_vashrv4di3 +#define CODE_FOR_lasx_xvsrl_b CODE_FOR_vlshrv32qi3 +#define CODE_FOR_lasx_xvsrl_h CODE_FOR_vlshrv16hi3 +#define CODE_FOR_lasx_xvsrl_w CODE_FOR_vlshrv8si3 +#define CODE_FOR_lasx_xvsrl_d CODE_FOR_vlshrv4di3 +#define CODE_FOR_lasx_xvsrli_b CODE_FOR_vlshrv32qi3 +#define CODE_FOR_lasx_xvsrli_h CODE_FOR_vlshrv16hi3 +#define CODE_FOR_lasx_xvsrli_w CODE_FOR_vlshrv8si3 +#define CODE_FOR_lasx_xvsrli_d CODE_FOR_vlshrv4di3 +#define CODE_FOR_lasx_xvsub_b CODE_FOR_subv32qi3 +#define CODE_FOR_lasx_xvsub_h CODE_FOR_subv16hi3 +#define CODE_FOR_lasx_xvsub_w CODE_FOR_subv8si3 +#define CODE_FOR_lasx_xvsub_d CODE_FOR_subv4di3 +#define CODE_FOR_lasx_xvsubi_bu CODE_FOR_subv32qi3 +#define CODE_FOR_lasx_xvsubi_hu CODE_FOR_subv16hi3 +#define CODE_FOR_lasx_xvsubi_wu CODE_FOR_subv8si3 +#define CODE_FOR_lasx_xvsubi_du CODE_FOR_subv4di3 +#define CODE_FOR_lasx_xvpackod_d CODE_FOR_lasx_xvilvh_d +#define CODE_FOR_lasx_xvpackev_d CODE_FOR_lasx_xvilvl_d +#define CODE_FOR_lasx_xvpickod_d CODE_FOR_lasx_xvilvh_d +#define CODE_FOR_lasx_xvpickev_d CODE_FOR_lasx_xvilvl_d +#define CODE_FOR_lasx_xvrepli_b CODE_FOR_lasx_xvrepliv32qi +#define CODE_FOR_lasx_xvrepli_h CODE_FOR_lasx_xvrepliv16hi +#define CODE_FOR_lasx_xvrepli_w CODE_FOR_lasx_xvrepliv8si +#define CODE_FOR_lasx_xvrepli_d CODE_FOR_lasx_xvrepliv4di + +#define CODE_FOR_lasx_xvandn_v CODE_FOR_xvandnv32qi3 +#define CODE_FOR_lasx_xvorn_v CODE_FOR_xvornv32qi3 +#define CODE_FOR_lasx_xvneg_b CODE_FOR_negv32qi2 +#define CODE_FOR_lasx_xvneg_h CODE_FOR_negv16hi2 +#define CODE_FOR_lasx_xvneg_w CODE_FOR_negv8si2 +#define CODE_FOR_lasx_xvneg_d CODE_FOR_negv4di2 +#define CODE_FOR_lasx_xvbsrl_v CODE_FOR_lasx_xvbsrl_b +#define CODE_FOR_lasx_xvbsll_v CODE_FOR_lasx_xvbsll_b +#define CODE_FOR_lasx_xvfmadd_s CODE_FOR_fmav8sf4 +#define CODE_FOR_lasx_xvfmadd_d CODE_FOR_fmav4df4 +#define CODE_FOR_lasx_xvfmsub_s CODE_FOR_fmsv8sf4 +#define CODE_FOR_lasx_xvfmsub_d CODE_FOR_fmsv4df4 +#define CODE_FOR_lasx_xvfnmadd_s CODE_FOR_xvfnmaddv8sf4_nmadd4 +#define CODE_FOR_lasx_xvfnmadd_d CODE_FOR_xvfnmaddv4df4_nmadd4 +#define CODE_FOR_lasx_xvfnmsub_s CODE_FOR_xvfnmsubv8sf4_nmsub4 +#define CODE_FOR_lasx_xvfnmsub_d CODE_FOR_xvfnmsubv4df4_nmsub4 + +#define CODE_FOR_lasx_xvpermi_q CODE_FOR_lasx_xvpermi_q_v32qi +#define CODE_FOR_lasx_xvpermi_d CODE_FOR_lasx_xvpermi_d_v4di +#define CODE_FOR_lasx_xbnz_v CODE_FOR_lasx_xbnz_v_b +#define CODE_FOR_lasx_xbz_v CODE_FOR_lasx_xbz_v_b + +#define CODE_FOR_lasx_xvssub_b CODE_FOR_lasx_xvssub_s_b +#define 
CODE_FOR_lasx_xvssub_h CODE_FOR_lasx_xvssub_s_h +#define CODE_FOR_lasx_xvssub_w CODE_FOR_lasx_xvssub_s_w +#define CODE_FOR_lasx_xvssub_d CODE_FOR_lasx_xvssub_s_d +#define CODE_FOR_lasx_xvssub_bu CODE_FOR_lasx_xvssub_u_bu +#define CODE_FOR_lasx_xvssub_hu CODE_FOR_lasx_xvssub_u_hu +#define CODE_FOR_lasx_xvssub_wu CODE_FOR_lasx_xvssub_u_wu +#define CODE_FOR_lasx_xvssub_du CODE_FOR_lasx_xvssub_u_du +#define CODE_FOR_lasx_xvabsd_b CODE_FOR_lasx_xvabsd_s_b +#define CODE_FOR_lasx_xvabsd_h CODE_FOR_lasx_xvabsd_s_h +#define CODE_FOR_lasx_xvabsd_w CODE_FOR_lasx_xvabsd_s_w +#define CODE_FOR_lasx_xvabsd_d CODE_FOR_lasx_xvabsd_s_d +#define CODE_FOR_lasx_xvabsd_bu CODE_FOR_lasx_xvabsd_u_bu +#define CODE_FOR_lasx_xvabsd_hu CODE_FOR_lasx_xvabsd_u_hu +#define CODE_FOR_lasx_xvabsd_wu CODE_FOR_lasx_xvabsd_u_wu +#define CODE_FOR_lasx_xvabsd_du CODE_FOR_lasx_xvabsd_u_du +#define CODE_FOR_lasx_xvavg_b CODE_FOR_lasx_xvavg_s_b +#define CODE_FOR_lasx_xvavg_h CODE_FOR_lasx_xvavg_s_h +#define CODE_FOR_lasx_xvavg_w CODE_FOR_lasx_xvavg_s_w +#define CODE_FOR_lasx_xvavg_d CODE_FOR_lasx_xvavg_s_d +#define CODE_FOR_lasx_xvavg_bu CODE_FOR_lasx_xvavg_u_bu +#define CODE_FOR_lasx_xvavg_hu CODE_FOR_lasx_xvavg_u_hu +#define CODE_FOR_lasx_xvavg_wu CODE_FOR_lasx_xvavg_u_wu +#define CODE_FOR_lasx_xvavg_du CODE_FOR_lasx_xvavg_u_du +#define CODE_FOR_lasx_xvavgr_b CODE_FOR_lasx_xvavgr_s_b +#define CODE_FOR_lasx_xvavgr_h CODE_FOR_lasx_xvavgr_s_h +#define CODE_FOR_lasx_xvavgr_w CODE_FOR_lasx_xvavgr_s_w +#define CODE_FOR_lasx_xvavgr_d CODE_FOR_lasx_xvavgr_s_d +#define CODE_FOR_lasx_xvavgr_bu CODE_FOR_lasx_xvavgr_u_bu +#define CODE_FOR_lasx_xvavgr_hu CODE_FOR_lasx_xvavgr_u_hu +#define CODE_FOR_lasx_xvavgr_wu CODE_FOR_lasx_xvavgr_u_wu +#define CODE_FOR_lasx_xvavgr_du CODE_FOR_lasx_xvavgr_u_du +#define CODE_FOR_lasx_xvmuh_b CODE_FOR_lasx_xvmuh_s_b +#define CODE_FOR_lasx_xvmuh_h CODE_FOR_lasx_xvmuh_s_h +#define CODE_FOR_lasx_xvmuh_w CODE_FOR_lasx_xvmuh_s_w +#define CODE_FOR_lasx_xvmuh_d CODE_FOR_lasx_xvmuh_s_d +#define CODE_FOR_lasx_xvmuh_bu CODE_FOR_lasx_xvmuh_u_bu +#define CODE_FOR_lasx_xvmuh_hu CODE_FOR_lasx_xvmuh_u_hu +#define CODE_FOR_lasx_xvmuh_wu CODE_FOR_lasx_xvmuh_u_wu +#define CODE_FOR_lasx_xvmuh_du CODE_FOR_lasx_xvmuh_u_du +#define CODE_FOR_lasx_xvssran_b_h CODE_FOR_lasx_xvssran_s_b_h +#define CODE_FOR_lasx_xvssran_h_w CODE_FOR_lasx_xvssran_s_h_w +#define CODE_FOR_lasx_xvssran_w_d CODE_FOR_lasx_xvssran_s_w_d +#define CODE_FOR_lasx_xvssran_bu_h CODE_FOR_lasx_xvssran_u_bu_h +#define CODE_FOR_lasx_xvssran_hu_w CODE_FOR_lasx_xvssran_u_hu_w +#define CODE_FOR_lasx_xvssran_wu_d CODE_FOR_lasx_xvssran_u_wu_d +#define CODE_FOR_lasx_xvssrarn_b_h CODE_FOR_lasx_xvssrarn_s_b_h +#define CODE_FOR_lasx_xvssrarn_h_w CODE_FOR_lasx_xvssrarn_s_h_w +#define CODE_FOR_lasx_xvssrarn_w_d CODE_FOR_lasx_xvssrarn_s_w_d +#define CODE_FOR_lasx_xvssrarn_bu_h CODE_FOR_lasx_xvssrarn_u_bu_h +#define CODE_FOR_lasx_xvssrarn_hu_w CODE_FOR_lasx_xvssrarn_u_hu_w +#define CODE_FOR_lasx_xvssrarn_wu_d CODE_FOR_lasx_xvssrarn_u_wu_d +#define CODE_FOR_lasx_xvssrln_bu_h CODE_FOR_lasx_xvssrln_u_bu_h +#define CODE_FOR_lasx_xvssrln_hu_w CODE_FOR_lasx_xvssrln_u_hu_w +#define CODE_FOR_lasx_xvssrln_wu_d CODE_FOR_lasx_xvssrln_u_wu_d +#define CODE_FOR_lasx_xvssrlrn_bu_h CODE_FOR_lasx_xvssrlrn_u_bu_h +#define CODE_FOR_lasx_xvssrlrn_hu_w CODE_FOR_lasx_xvssrlrn_u_hu_w +#define CODE_FOR_lasx_xvssrlrn_wu_d CODE_FOR_lasx_xvssrlrn_u_wu_d +#define CODE_FOR_lasx_xvftint_w_s CODE_FOR_lasx_xvftint_s_w_s +#define CODE_FOR_lasx_xvftint_l_d CODE_FOR_lasx_xvftint_s_l_d +#define 
CODE_FOR_lasx_xvftint_wu_s CODE_FOR_lasx_xvftint_u_wu_s +#define CODE_FOR_lasx_xvftint_lu_d CODE_FOR_lasx_xvftint_u_lu_d +#define CODE_FOR_lasx_xvsllwil_h_b CODE_FOR_lasx_xvsllwil_s_h_b +#define CODE_FOR_lasx_xvsllwil_w_h CODE_FOR_lasx_xvsllwil_s_w_h +#define CODE_FOR_lasx_xvsllwil_d_w CODE_FOR_lasx_xvsllwil_s_d_w +#define CODE_FOR_lasx_xvsllwil_hu_bu CODE_FOR_lasx_xvsllwil_u_hu_bu +#define CODE_FOR_lasx_xvsllwil_wu_hu CODE_FOR_lasx_xvsllwil_u_wu_hu +#define CODE_FOR_lasx_xvsllwil_du_wu CODE_FOR_lasx_xvsllwil_u_du_wu +#define CODE_FOR_lasx_xvsat_b CODE_FOR_lasx_xvsat_s_b +#define CODE_FOR_lasx_xvsat_h CODE_FOR_lasx_xvsat_s_h +#define CODE_FOR_lasx_xvsat_w CODE_FOR_lasx_xvsat_s_w +#define CODE_FOR_lasx_xvsat_d CODE_FOR_lasx_xvsat_s_d +#define CODE_FOR_lasx_xvsat_bu CODE_FOR_lasx_xvsat_u_bu +#define CODE_FOR_lasx_xvsat_hu CODE_FOR_lasx_xvsat_u_hu +#define CODE_FOR_lasx_xvsat_wu CODE_FOR_lasx_xvsat_u_wu +#define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du + static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 DIRECT_BUILTIN (movfcsr2gr, LARCH_USI_FTYPE_UQI, hard_float), @@ -1209,7 +1511,761 @@ static const struct loongarch_builtin_description loongarch_builtins[] = { LSX_BUILTIN (vshuf_b, LARCH_V16QI_FTYPE_V16QI_V16QI_V16QI), LSX_BUILTIN (vldx, LARCH_V16QI_FTYPE_CVPOINTER_DI), LSX_NO_TARGET_BUILTIN (vstx, LARCH_VOID_FTYPE_V16QI_CVPOINTER_DI), - LSX_BUILTIN (vextl_qu_du, LARCH_UV2DI_FTYPE_UV2DI) + LSX_BUILTIN (vextl_qu_du, LARCH_UV2DI_FTYPE_UV2DI), + + /* Built-in functions for LASX */ + LASX_BUILTIN (xvsll_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsll_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsll_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsll_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvslli_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvslli_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvslli_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvslli_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvsra_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsra_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsra_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsra_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsrai_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvsrai_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvsrai_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvsrai_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvsrar_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsrar_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsrar_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsrar_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsrari_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvsrari_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvsrari_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvsrari_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvsrl_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsrl_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsrl_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsrl_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsrli_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvsrli_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvsrli_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvsrli_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvsrlr_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsrlr_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsrlr_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsrlr_d, 
LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsrlri_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvsrlri_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvsrlri_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvsrlri_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvbitclr_b, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvbitclr_h, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvbitclr_w, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvbitclr_d, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvbitclri_b, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvbitclri_h, LARCH_UV16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvbitclri_w, LARCH_UV8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvbitclri_d, LARCH_UV4DI_FTYPE_UV4DI_UQI), + LASX_BUILTIN (xvbitset_b, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvbitset_h, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvbitset_w, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvbitset_d, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvbitseti_b, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvbitseti_h, LARCH_UV16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvbitseti_w, LARCH_UV8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvbitseti_d, LARCH_UV4DI_FTYPE_UV4DI_UQI), + LASX_BUILTIN (xvbitrev_b, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvbitrev_h, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvbitrev_w, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvbitrev_d, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvbitrevi_b, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvbitrevi_h, LARCH_UV16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvbitrevi_w, LARCH_UV8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvbitrevi_d, LARCH_UV4DI_FTYPE_UV4DI_UQI), + LASX_BUILTIN (xvadd_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvadd_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvadd_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvadd_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvaddi_bu, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvaddi_hu, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvaddi_wu, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvaddi_du, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvsub_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsub_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsub_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsub_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsubi_bu, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvsubi_hu, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvsubi_wu, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvsubi_du, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvmax_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvmax_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvmax_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvmax_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvmaxi_b, LARCH_V32QI_FTYPE_V32QI_QI), + LASX_BUILTIN (xvmaxi_h, LARCH_V16HI_FTYPE_V16HI_QI), + LASX_BUILTIN (xvmaxi_w, LARCH_V8SI_FTYPE_V8SI_QI), + LASX_BUILTIN (xvmaxi_d, LARCH_V4DI_FTYPE_V4DI_QI), + LASX_BUILTIN (xvmax_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvmax_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvmax_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvmax_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvmaxi_bu, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvmaxi_hu, LARCH_UV16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvmaxi_wu, LARCH_UV8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvmaxi_du, LARCH_UV4DI_FTYPE_UV4DI_UQI), + LASX_BUILTIN (xvmin_b, 
LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvmin_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvmin_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvmin_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvmini_b, LARCH_V32QI_FTYPE_V32QI_QI), + LASX_BUILTIN (xvmini_h, LARCH_V16HI_FTYPE_V16HI_QI), + LASX_BUILTIN (xvmini_w, LARCH_V8SI_FTYPE_V8SI_QI), + LASX_BUILTIN (xvmini_d, LARCH_V4DI_FTYPE_V4DI_QI), + LASX_BUILTIN (xvmin_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvmin_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvmin_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvmin_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvmini_bu, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvmini_hu, LARCH_UV16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvmini_wu, LARCH_UV8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvmini_du, LARCH_UV4DI_FTYPE_UV4DI_UQI), + LASX_BUILTIN (xvseq_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvseq_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvseq_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvseq_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvseqi_b, LARCH_V32QI_FTYPE_V32QI_QI), + LASX_BUILTIN (xvseqi_h, LARCH_V16HI_FTYPE_V16HI_QI), + LASX_BUILTIN (xvseqi_w, LARCH_V8SI_FTYPE_V8SI_QI), + LASX_BUILTIN (xvseqi_d, LARCH_V4DI_FTYPE_V4DI_QI), + LASX_BUILTIN (xvslt_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvslt_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvslt_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvslt_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvslti_b, LARCH_V32QI_FTYPE_V32QI_QI), + LASX_BUILTIN (xvslti_h, LARCH_V16HI_FTYPE_V16HI_QI), + LASX_BUILTIN (xvslti_w, LARCH_V8SI_FTYPE_V8SI_QI), + LASX_BUILTIN (xvslti_d, LARCH_V4DI_FTYPE_V4DI_QI), + LASX_BUILTIN (xvslt_bu, LARCH_V32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvslt_hu, LARCH_V16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvslt_wu, LARCH_V8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvslt_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvslti_bu, LARCH_V32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvslti_hu, LARCH_V16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvslti_wu, LARCH_V8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvslti_du, LARCH_V4DI_FTYPE_UV4DI_UQI), + LASX_BUILTIN (xvsle_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsle_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsle_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsle_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvslei_b, LARCH_V32QI_FTYPE_V32QI_QI), + LASX_BUILTIN (xvslei_h, LARCH_V16HI_FTYPE_V16HI_QI), + LASX_BUILTIN (xvslei_w, LARCH_V8SI_FTYPE_V8SI_QI), + LASX_BUILTIN (xvslei_d, LARCH_V4DI_FTYPE_V4DI_QI), + LASX_BUILTIN (xvsle_bu, LARCH_V32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvsle_hu, LARCH_V16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvsle_wu, LARCH_V8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvsle_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvslei_bu, LARCH_V32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvslei_hu, LARCH_V16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvslei_wu, LARCH_V8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvslei_du, LARCH_V4DI_FTYPE_UV4DI_UQI), + + LASX_BUILTIN (xvsat_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvsat_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvsat_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvsat_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvsat_bu, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvsat_hu, LARCH_UV16HI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvsat_wu, LARCH_UV8SI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvsat_du, 
LARCH_UV4DI_FTYPE_UV4DI_UQI), + + LASX_BUILTIN (xvadda_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvadda_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvadda_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvadda_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsadd_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsadd_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsadd_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsadd_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsadd_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvsadd_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvsadd_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvsadd_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + + LASX_BUILTIN (xvavg_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvavg_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvavg_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvavg_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvavg_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvavg_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvavg_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvavg_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + + LASX_BUILTIN (xvavgr_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvavgr_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvavgr_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvavgr_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvavgr_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvavgr_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvavgr_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvavgr_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + + LASX_BUILTIN (xvssub_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvssub_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvssub_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvssub_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssub_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvssub_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvssub_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvssub_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvabsd_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvabsd_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvabsd_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvabsd_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvabsd_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvabsd_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvabsd_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvabsd_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + + LASX_BUILTIN (xvmul_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvmul_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvmul_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvmul_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvmadd_b, LARCH_V32QI_FTYPE_V32QI_V32QI_V32QI), + LASX_BUILTIN (xvmadd_h, LARCH_V16HI_FTYPE_V16HI_V16HI_V16HI), + LASX_BUILTIN (xvmadd_w, LARCH_V8SI_FTYPE_V8SI_V8SI_V8SI), + LASX_BUILTIN (xvmadd_d, LARCH_V4DI_FTYPE_V4DI_V4DI_V4DI), + LASX_BUILTIN (xvmsub_b, LARCH_V32QI_FTYPE_V32QI_V32QI_V32QI), + LASX_BUILTIN (xvmsub_h, LARCH_V16HI_FTYPE_V16HI_V16HI_V16HI), + LASX_BUILTIN (xvmsub_w, LARCH_V8SI_FTYPE_V8SI_V8SI_V8SI), + LASX_BUILTIN (xvmsub_d, LARCH_V4DI_FTYPE_V4DI_V4DI_V4DI), + LASX_BUILTIN (xvdiv_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvdiv_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvdiv_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvdiv_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN 
(xvdiv_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvdiv_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvdiv_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvdiv_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvhaddw_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvhaddw_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvhaddw_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvhaddw_hu_bu, LARCH_UV16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvhaddw_wu_hu, LARCH_UV8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvhaddw_du_wu, LARCH_UV4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvhsubw_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvhsubw_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvhsubw_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvhsubw_hu_bu, LARCH_V16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvhsubw_wu_hu, LARCH_V8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvhsubw_du_wu, LARCH_V4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvmod_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvmod_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvmod_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvmod_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvmod_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvmod_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvmod_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvmod_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + + LASX_BUILTIN (xvrepl128vei_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvrepl128vei_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvrepl128vei_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvrepl128vei_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvpickev_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvpickev_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvpickev_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvpickev_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvpickod_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvpickod_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvpickod_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvpickod_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvilvh_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvilvh_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvilvh_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvilvh_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvilvl_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvilvl_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvilvl_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvilvl_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvpackev_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvpackev_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvpackev_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvpackev_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvpackod_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvpackod_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvpackod_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvpackod_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvshuf_b, LARCH_V32QI_FTYPE_V32QI_V32QI_V32QI), + LASX_BUILTIN (xvshuf_h, LARCH_V16HI_FTYPE_V16HI_V16HI_V16HI), + LASX_BUILTIN (xvshuf_w, LARCH_V8SI_FTYPE_V8SI_V8SI_V8SI), + LASX_BUILTIN (xvshuf_d, LARCH_V4DI_FTYPE_V4DI_V4DI_V4DI), + LASX_BUILTIN (xvand_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvandi_b, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvor_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvori_b, LARCH_UV32QI_FTYPE_UV32QI_UQI), + 
LASX_BUILTIN (xvnor_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvnori_b, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvxor_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvxori_b, LARCH_UV32QI_FTYPE_UV32QI_UQI), + LASX_BUILTIN (xvbitsel_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI_UV32QI), + LASX_BUILTIN (xvbitseli_b, LARCH_UV32QI_FTYPE_UV32QI_UV32QI_USI), + + LASX_BUILTIN (xvshuf4i_b, LARCH_V32QI_FTYPE_V32QI_USI), + LASX_BUILTIN (xvshuf4i_h, LARCH_V16HI_FTYPE_V16HI_USI), + LASX_BUILTIN (xvshuf4i_w, LARCH_V8SI_FTYPE_V8SI_USI), + + LASX_BUILTIN (xvreplgr2vr_b, LARCH_V32QI_FTYPE_SI), + LASX_BUILTIN (xvreplgr2vr_h, LARCH_V16HI_FTYPE_SI), + LASX_BUILTIN (xvreplgr2vr_w, LARCH_V8SI_FTYPE_SI), + LASX_BUILTIN (xvreplgr2vr_d, LARCH_V4DI_FTYPE_DI), + LASX_BUILTIN (xvpcnt_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (xvpcnt_h, LARCH_V16HI_FTYPE_V16HI), + LASX_BUILTIN (xvpcnt_w, LARCH_V8SI_FTYPE_V8SI), + LASX_BUILTIN (xvpcnt_d, LARCH_V4DI_FTYPE_V4DI), + LASX_BUILTIN (xvclo_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (xvclo_h, LARCH_V16HI_FTYPE_V16HI), + LASX_BUILTIN (xvclo_w, LARCH_V8SI_FTYPE_V8SI), + LASX_BUILTIN (xvclo_d, LARCH_V4DI_FTYPE_V4DI), + LASX_BUILTIN (xvclz_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (xvclz_h, LARCH_V16HI_FTYPE_V16HI), + LASX_BUILTIN (xvclz_w, LARCH_V8SI_FTYPE_V8SI), + LASX_BUILTIN (xvclz_d, LARCH_V4DI_FTYPE_V4DI), + + LASX_BUILTIN (xvrepli_b, LARCH_V32QI_FTYPE_HI), + LASX_BUILTIN (xvrepli_h, LARCH_V16HI_FTYPE_HI), + LASX_BUILTIN (xvrepli_w, LARCH_V8SI_FTYPE_HI), + LASX_BUILTIN (xvrepli_d, LARCH_V4DI_FTYPE_HI), + LASX_BUILTIN (xvfcmp_caf_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_caf_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cor_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cor_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cun_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cun_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cune_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cune_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cueq_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cueq_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_ceq_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_ceq_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cne_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cne_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_clt_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_clt_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cult_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cult_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cle_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cle_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_cule_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_cule_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_saf_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_saf_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sor_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sor_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sun_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sun_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sune_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sune_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sueq_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sueq_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_seq_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_seq_d, 
LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sne_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sne_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_slt_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_slt_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sult_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sult_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sle_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sle_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcmp_sule_s, LARCH_V8SI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcmp_sule_d, LARCH_V4DI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfadd_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfadd_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfsub_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfsub_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfmul_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfmul_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfdiv_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfdiv_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfcvt_h_s, LARCH_V16HI_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfcvt_s_d, LARCH_V8SF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfmin_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfmin_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfmina_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfmina_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfmax_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfmax_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfmaxa_s, LARCH_V8SF_FTYPE_V8SF_V8SF), + LASX_BUILTIN (xvfmaxa_d, LARCH_V4DF_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvfclass_s, LARCH_V8SI_FTYPE_V8SF), + LASX_BUILTIN (xvfclass_d, LARCH_V4DI_FTYPE_V4DF), + LASX_BUILTIN (xvfsqrt_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfsqrt_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrecip_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrecip_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrint_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrint_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrsqrt_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrsqrt_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvflogb_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvflogb_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfcvth_s_h, LARCH_V8SF_FTYPE_V16HI), + LASX_BUILTIN (xvfcvth_d_s, LARCH_V4DF_FTYPE_V8SF), + LASX_BUILTIN (xvfcvtl_s_h, LARCH_V8SF_FTYPE_V16HI), + LASX_BUILTIN (xvfcvtl_d_s, LARCH_V4DF_FTYPE_V8SF), + LASX_BUILTIN (xvftint_w_s, LARCH_V8SI_FTYPE_V8SF), + LASX_BUILTIN (xvftint_l_d, LARCH_V4DI_FTYPE_V4DF), + LASX_BUILTIN (xvftint_wu_s, LARCH_UV8SI_FTYPE_V8SF), + LASX_BUILTIN (xvftint_lu_d, LARCH_UV4DI_FTYPE_V4DF), + LASX_BUILTIN (xvftintrz_w_s, LARCH_V8SI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrz_l_d, LARCH_V4DI_FTYPE_V4DF), + LASX_BUILTIN (xvftintrz_wu_s, LARCH_UV8SI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrz_lu_d, LARCH_UV4DI_FTYPE_V4DF), + LASX_BUILTIN (xvffint_s_w, LARCH_V8SF_FTYPE_V8SI), + LASX_BUILTIN (xvffint_d_l, LARCH_V4DF_FTYPE_V4DI), + LASX_BUILTIN (xvffint_s_wu, LARCH_V8SF_FTYPE_UV8SI), + LASX_BUILTIN (xvffint_d_lu, LARCH_V4DF_FTYPE_UV4DI), + + LASX_BUILTIN (xvreplve_b, LARCH_V32QI_FTYPE_V32QI_SI), + LASX_BUILTIN (xvreplve_h, LARCH_V16HI_FTYPE_V16HI_SI), + LASX_BUILTIN (xvreplve_w, LARCH_V8SI_FTYPE_V8SI_SI), + LASX_BUILTIN (xvreplve_d, LARCH_V4DI_FTYPE_V4DI_SI), + LASX_BUILTIN (xvpermi_w, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + + LASX_BUILTIN (xvandn_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvneg_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (xvneg_h, 
LARCH_V16HI_FTYPE_V16HI), + LASX_BUILTIN (xvneg_w, LARCH_V8SI_FTYPE_V8SI), + LASX_BUILTIN (xvneg_d, LARCH_V4DI_FTYPE_V4DI), + LASX_BUILTIN (xvmuh_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvmuh_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvmuh_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvmuh_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvmuh_bu, LARCH_UV32QI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvmuh_hu, LARCH_UV16HI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvmuh_wu, LARCH_UV8SI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvmuh_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvsllwil_h_b, LARCH_V16HI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvsllwil_w_h, LARCH_V8SI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvsllwil_d_w, LARCH_V4DI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvsllwil_hu_bu, LARCH_UV16HI_FTYPE_UV32QI_UQI), /* FIXME: U? */ + LASX_BUILTIN (xvsllwil_wu_hu, LARCH_UV8SI_FTYPE_UV16HI_UQI), + LASX_BUILTIN (xvsllwil_du_wu, LARCH_UV4DI_FTYPE_UV8SI_UQI), + LASX_BUILTIN (xvsran_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsran_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsran_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssran_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvssran_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvssran_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssran_bu_h, LARCH_UV32QI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvssran_hu_w, LARCH_UV16HI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvssran_wu_d, LARCH_UV8SI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvsrarn_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsrarn_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsrarn_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssrarn_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvssrarn_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvssrarn_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssrarn_bu_h, LARCH_UV32QI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvssrarn_hu_w, LARCH_UV16HI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvssrarn_wu_d, LARCH_UV8SI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvsrln_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsrln_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsrln_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssrln_bu_h, LARCH_UV32QI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvssrln_hu_w, LARCH_UV16HI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvssrln_wu_d, LARCH_UV8SI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvsrlrn_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsrlrn_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsrlrn_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssrlrn_bu_h, LARCH_UV32QI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvssrlrn_hu_w, LARCH_UV16HI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvssrlrn_wu_d, LARCH_UV8SI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvfrstpi_b, LARCH_V32QI_FTYPE_V32QI_V32QI_UQI), + LASX_BUILTIN (xvfrstpi_h, LARCH_V16HI_FTYPE_V16HI_V16HI_UQI), + LASX_BUILTIN (xvfrstp_b, LARCH_V32QI_FTYPE_V32QI_V32QI_V32QI), + LASX_BUILTIN (xvfrstp_h, LARCH_V16HI_FTYPE_V16HI_V16HI_V16HI), + LASX_BUILTIN (xvshuf4i_d, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvbsrl_v, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvbsll_v, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvextrins_b, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvextrins_h, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvextrins_w, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvextrins_d, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvmskltz_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN 
(xvmskltz_h, LARCH_V16HI_FTYPE_V16HI), + LASX_BUILTIN (xvmskltz_w, LARCH_V8SI_FTYPE_V8SI), + LASX_BUILTIN (xvmskltz_d, LARCH_V4DI_FTYPE_V4DI), + LASX_BUILTIN (xvsigncov_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsigncov_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsigncov_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsigncov_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvfmadd_s, LARCH_V8SF_FTYPE_V8SF_V8SF_V8SF), + LASX_BUILTIN (xvfmadd_d, LARCH_V4DF_FTYPE_V4DF_V4DF_V4DF), + LASX_BUILTIN (xvfmsub_s, LARCH_V8SF_FTYPE_V8SF_V8SF_V8SF), + LASX_BUILTIN (xvfmsub_d, LARCH_V4DF_FTYPE_V4DF_V4DF_V4DF), + LASX_BUILTIN (xvfnmadd_s, LARCH_V8SF_FTYPE_V8SF_V8SF_V8SF), + LASX_BUILTIN (xvfnmadd_d, LARCH_V4DF_FTYPE_V4DF_V4DF_V4DF), + LASX_BUILTIN (xvfnmsub_s, LARCH_V8SF_FTYPE_V8SF_V8SF_V8SF), + LASX_BUILTIN (xvfnmsub_d, LARCH_V4DF_FTYPE_V4DF_V4DF_V4DF), + LASX_BUILTIN (xvftintrne_w_s, LARCH_V8SI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrne_l_d, LARCH_V4DI_FTYPE_V4DF), + LASX_BUILTIN (xvftintrp_w_s, LARCH_V8SI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrp_l_d, LARCH_V4DI_FTYPE_V4DF), + LASX_BUILTIN (xvftintrm_w_s, LARCH_V8SI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrm_l_d, LARCH_V4DI_FTYPE_V4DF), + LASX_BUILTIN (xvftint_w_d, LARCH_V8SI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvffint_s_l, LARCH_V8SF_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvftintrz_w_d, LARCH_V8SI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvftintrp_w_d, LARCH_V8SI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvftintrm_w_d, LARCH_V8SI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvftintrne_w_d, LARCH_V8SI_FTYPE_V4DF_V4DF), + LASX_BUILTIN (xvftinth_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintl_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvffinth_d_w, LARCH_V4DF_FTYPE_V8SI), + LASX_BUILTIN (xvffintl_d_w, LARCH_V4DF_FTYPE_V8SI), + LASX_BUILTIN (xvftintrzh_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrzl_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrph_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrpl_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrmh_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrml_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrneh_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvftintrnel_l_s, LARCH_V4DI_FTYPE_V8SF), + LASX_BUILTIN (xvfrintrne_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrintrne_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrintrz_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrintrz_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrintrp_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrintrp_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrintrm_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrintrm_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvld, LARCH_V32QI_FTYPE_CVPOINTER_SI), + LASX_NO_TARGET_BUILTIN (xvst, LARCH_VOID_FTYPE_V32QI_CVPOINTER_SI), + LASX_NO_TARGET_BUILTIN (xvstelm_b, LARCH_VOID_FTYPE_V32QI_CVPOINTER_SI_UQI), + LASX_NO_TARGET_BUILTIN (xvstelm_h, LARCH_VOID_FTYPE_V16HI_CVPOINTER_SI_UQI), + LASX_NO_TARGET_BUILTIN (xvstelm_w, LARCH_VOID_FTYPE_V8SI_CVPOINTER_SI_UQI), + LASX_NO_TARGET_BUILTIN (xvstelm_d, LARCH_VOID_FTYPE_V4DI_CVPOINTER_SI_UQI), + LASX_BUILTIN (xvinsve0_w, LARCH_V8SI_FTYPE_V8SI_V8SI_UQI), + LASX_BUILTIN (xvinsve0_d, LARCH_V4DI_FTYPE_V4DI_V4DI_UQI), + LASX_BUILTIN (xvpickve_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvpickve_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvpickve_w_f, LARCH_V8SF_FTYPE_V8SF_UQI), + LASX_BUILTIN (xvpickve_d_f, LARCH_V4DF_FTYPE_V4DF_UQI), + LASX_BUILTIN (xvssrlrn_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvssrlrn_h_w, 
LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvssrlrn_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvssrln_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvssrln_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvssrln_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvorn_v, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvldi, LARCH_V4DI_FTYPE_HI), + LASX_BUILTIN (xvldx, LARCH_V32QI_FTYPE_CVPOINTER_DI), + LASX_NO_TARGET_BUILTIN (xvstx, LARCH_VOID_FTYPE_V32QI_CVPOINTER_DI), + LASX_BUILTIN (xvextl_qu_du, LARCH_UV4DI_FTYPE_UV4DI), + + /* LASX */ + LASX_BUILTIN (xvinsgr2vr_w, LARCH_V8SI_FTYPE_V8SI_SI_UQI), + LASX_BUILTIN (xvinsgr2vr_d, LARCH_V4DI_FTYPE_V4DI_DI_UQI), + + LASX_BUILTIN (xvreplve0_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (xvreplve0_h, LARCH_V16HI_FTYPE_V16HI), + LASX_BUILTIN (xvreplve0_w, LARCH_V8SI_FTYPE_V8SI), + LASX_BUILTIN (xvreplve0_d, LARCH_V4DI_FTYPE_V4DI), + LASX_BUILTIN (xvreplve0_q, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (vext2xv_h_b, LARCH_V16HI_FTYPE_V32QI), + LASX_BUILTIN (vext2xv_w_h, LARCH_V8SI_FTYPE_V16HI), + LASX_BUILTIN (vext2xv_d_w, LARCH_V4DI_FTYPE_V8SI), + LASX_BUILTIN (vext2xv_w_b, LARCH_V8SI_FTYPE_V32QI), + LASX_BUILTIN (vext2xv_d_h, LARCH_V4DI_FTYPE_V16HI), + LASX_BUILTIN (vext2xv_d_b, LARCH_V4DI_FTYPE_V32QI), + LASX_BUILTIN (vext2xv_hu_bu, LARCH_V16HI_FTYPE_V32QI), + LASX_BUILTIN (vext2xv_wu_hu, LARCH_V8SI_FTYPE_V16HI), + LASX_BUILTIN (vext2xv_du_wu, LARCH_V4DI_FTYPE_V8SI), + LASX_BUILTIN (vext2xv_wu_bu, LARCH_V8SI_FTYPE_V32QI), + LASX_BUILTIN (vext2xv_du_hu, LARCH_V4DI_FTYPE_V16HI), + LASX_BUILTIN (vext2xv_du_bu, LARCH_V4DI_FTYPE_V32QI), + LASX_BUILTIN (xvpermi_q, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvpermi_d, LARCH_V4DI_FTYPE_V4DI_USI), + LASX_BUILTIN (xvperm_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN_TEST_BRANCH (xbz_b, LARCH_SI_FTYPE_UV32QI), + LASX_BUILTIN_TEST_BRANCH (xbz_h, LARCH_SI_FTYPE_UV16HI), + LASX_BUILTIN_TEST_BRANCH (xbz_w, LARCH_SI_FTYPE_UV8SI), + LASX_BUILTIN_TEST_BRANCH (xbz_d, LARCH_SI_FTYPE_UV4DI), + LASX_BUILTIN_TEST_BRANCH (xbnz_b, LARCH_SI_FTYPE_UV32QI), + LASX_BUILTIN_TEST_BRANCH (xbnz_h, LARCH_SI_FTYPE_UV16HI), + LASX_BUILTIN_TEST_BRANCH (xbnz_w, LARCH_SI_FTYPE_UV8SI), + LASX_BUILTIN_TEST_BRANCH (xbnz_d, LARCH_SI_FTYPE_UV4DI), + LASX_BUILTIN_TEST_BRANCH (xbz_v, LARCH_SI_FTYPE_UV32QI), + LASX_BUILTIN_TEST_BRANCH (xbnz_v, LARCH_SI_FTYPE_UV32QI), + LASX_BUILTIN (xvldrepl_b, LARCH_V32QI_FTYPE_CVPOINTER_SI), + LASX_BUILTIN (xvldrepl_h, LARCH_V16HI_FTYPE_CVPOINTER_SI), + LASX_BUILTIN (xvldrepl_w, LARCH_V8SI_FTYPE_CVPOINTER_SI), + LASX_BUILTIN (xvldrepl_d, LARCH_V4DI_FTYPE_CVPOINTER_SI), + LASX_BUILTIN (xvpickve2gr_w, LARCH_SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvpickve2gr_wu, LARCH_USI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvpickve2gr_d, LARCH_DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvpickve2gr_du, LARCH_UDI_FTYPE_V4DI_UQI), + + LASX_BUILTIN (xvaddwev_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvaddwev_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvaddwev_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvaddwev_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvaddwev_q_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvaddwev_d_wu, LARCH_V4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvaddwev_w_hu, LARCH_V8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvaddwev_h_bu, LARCH_V16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvsubwev_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsubwev_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsubwev_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN 
(xvsubwev_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsubwev_q_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvsubwev_d_wu, LARCH_V4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvsubwev_w_hu, LARCH_V8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvsubwev_h_bu, LARCH_V16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvmulwev_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvmulwev_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvmulwev_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvmulwev_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvmulwev_q_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvmulwev_d_wu, LARCH_V4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvmulwev_w_hu, LARCH_V8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvmulwev_h_bu, LARCH_V16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvaddwod_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvaddwod_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvaddwod_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvaddwod_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvaddwod_q_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvaddwod_d_wu, LARCH_V4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvaddwod_w_hu, LARCH_V8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvaddwod_h_bu, LARCH_V16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvsubwod_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsubwod_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvsubwod_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvsubwod_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvsubwod_q_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvsubwod_d_wu, LARCH_V4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvsubwod_w_hu, LARCH_V8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvsubwod_h_bu, LARCH_V16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvmulwod_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvmulwod_d_w, LARCH_V4DI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvmulwod_w_h, LARCH_V8SI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvmulwod_h_b, LARCH_V16HI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvmulwod_q_du, LARCH_V4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvmulwod_d_wu, LARCH_V4DI_FTYPE_UV8SI_UV8SI), + LASX_BUILTIN (xvmulwod_w_hu, LARCH_V8SI_FTYPE_UV16HI_UV16HI), + LASX_BUILTIN (xvmulwod_h_bu, LARCH_V16HI_FTYPE_UV32QI_UV32QI), + LASX_BUILTIN (xvaddwev_d_wu_w, LARCH_V4DI_FTYPE_UV8SI_V8SI), + LASX_BUILTIN (xvaddwev_w_hu_h, LARCH_V8SI_FTYPE_UV16HI_V16HI), + LASX_BUILTIN (xvaddwev_h_bu_b, LARCH_V16HI_FTYPE_UV32QI_V32QI), + LASX_BUILTIN (xvmulwev_d_wu_w, LARCH_V4DI_FTYPE_UV8SI_V8SI), + LASX_BUILTIN (xvmulwev_w_hu_h, LARCH_V8SI_FTYPE_UV16HI_V16HI), + LASX_BUILTIN (xvmulwev_h_bu_b, LARCH_V16HI_FTYPE_UV32QI_V32QI), + LASX_BUILTIN (xvaddwod_d_wu_w, LARCH_V4DI_FTYPE_UV8SI_V8SI), + LASX_BUILTIN (xvaddwod_w_hu_h, LARCH_V8SI_FTYPE_UV16HI_V16HI), + LASX_BUILTIN (xvaddwod_h_bu_b, LARCH_V16HI_FTYPE_UV32QI_V32QI), + LASX_BUILTIN (xvmulwod_d_wu_w, LARCH_V4DI_FTYPE_UV8SI_V8SI), + LASX_BUILTIN (xvmulwod_w_hu_h, LARCH_V8SI_FTYPE_UV16HI_V16HI), + LASX_BUILTIN (xvmulwod_h_bu_b, LARCH_V16HI_FTYPE_UV32QI_V32QI), + LASX_BUILTIN (xvhaddw_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvhaddw_qu_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvhsubw_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvhsubw_qu_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI), + LASX_BUILTIN (xvmaddwev_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI_V4DI), + LASX_BUILTIN (xvmaddwev_d_w, LARCH_V4DI_FTYPE_V4DI_V8SI_V8SI), + LASX_BUILTIN (xvmaddwev_w_h, LARCH_V8SI_FTYPE_V8SI_V16HI_V16HI), + LASX_BUILTIN (xvmaddwev_h_b, 
LARCH_V16HI_FTYPE_V16HI_V32QI_V32QI), + LASX_BUILTIN (xvmaddwev_q_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI_UV4DI), + LASX_BUILTIN (xvmaddwev_d_wu, LARCH_UV4DI_FTYPE_UV4DI_UV8SI_UV8SI), + LASX_BUILTIN (xvmaddwev_w_hu, LARCH_UV8SI_FTYPE_UV8SI_UV16HI_UV16HI), + LASX_BUILTIN (xvmaddwev_h_bu, LARCH_UV16HI_FTYPE_UV16HI_UV32QI_UV32QI), + LASX_BUILTIN (xvmaddwod_q_d, LARCH_V4DI_FTYPE_V4DI_V4DI_V4DI), + LASX_BUILTIN (xvmaddwod_d_w, LARCH_V4DI_FTYPE_V4DI_V8SI_V8SI), + LASX_BUILTIN (xvmaddwod_w_h, LARCH_V8SI_FTYPE_V8SI_V16HI_V16HI), + LASX_BUILTIN (xvmaddwod_h_b, LARCH_V16HI_FTYPE_V16HI_V32QI_V32QI), + LASX_BUILTIN (xvmaddwod_q_du, LARCH_UV4DI_FTYPE_UV4DI_UV4DI_UV4DI), + LASX_BUILTIN (xvmaddwod_d_wu, LARCH_UV4DI_FTYPE_UV4DI_UV8SI_UV8SI), + LASX_BUILTIN (xvmaddwod_w_hu, LARCH_UV8SI_FTYPE_UV8SI_UV16HI_UV16HI), + LASX_BUILTIN (xvmaddwod_h_bu, LARCH_UV16HI_FTYPE_UV16HI_UV32QI_UV32QI), + LASX_BUILTIN (xvmaddwev_q_du_d, LARCH_V4DI_FTYPE_V4DI_UV4DI_V4DI), + LASX_BUILTIN (xvmaddwev_d_wu_w, LARCH_V4DI_FTYPE_V4DI_UV8SI_V8SI), + LASX_BUILTIN (xvmaddwev_w_hu_h, LARCH_V8SI_FTYPE_V8SI_UV16HI_V16HI), + LASX_BUILTIN (xvmaddwev_h_bu_b, LARCH_V16HI_FTYPE_V16HI_UV32QI_V32QI), + LASX_BUILTIN (xvmaddwod_q_du_d, LARCH_V4DI_FTYPE_V4DI_UV4DI_V4DI), + LASX_BUILTIN (xvmaddwod_d_wu_w, LARCH_V4DI_FTYPE_V4DI_UV8SI_V8SI), + LASX_BUILTIN (xvmaddwod_w_hu_h, LARCH_V8SI_FTYPE_V8SI_UV16HI_V16HI), + LASX_BUILTIN (xvmaddwod_h_bu_b, LARCH_V16HI_FTYPE_V16HI_UV32QI_V32QI), + LASX_BUILTIN (xvrotr_b, LARCH_V32QI_FTYPE_V32QI_V32QI), + LASX_BUILTIN (xvrotr_h, LARCH_V16HI_FTYPE_V16HI_V16HI), + LASX_BUILTIN (xvrotr_w, LARCH_V8SI_FTYPE_V8SI_V8SI), + LASX_BUILTIN (xvrotr_d, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvadd_q, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvsub_q, LARCH_V4DI_FTYPE_V4DI_V4DI), + LASX_BUILTIN (xvaddwev_q_du_d, LARCH_V4DI_FTYPE_UV4DI_V4DI), + LASX_BUILTIN (xvaddwod_q_du_d, LARCH_V4DI_FTYPE_UV4DI_V4DI), + LASX_BUILTIN (xvmulwev_q_du_d, LARCH_V4DI_FTYPE_UV4DI_V4DI), + LASX_BUILTIN (xvmulwod_q_du_d, LARCH_V4DI_FTYPE_UV4DI_V4DI), + LASX_BUILTIN (xvmskgez_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (xvmsknz_b, LARCH_V32QI_FTYPE_V32QI), + LASX_BUILTIN (xvexth_h_b, LARCH_V16HI_FTYPE_V32QI), + LASX_BUILTIN (xvexth_w_h, LARCH_V8SI_FTYPE_V16HI), + LASX_BUILTIN (xvexth_d_w, LARCH_V4DI_FTYPE_V8SI), + LASX_BUILTIN (xvexth_q_d, LARCH_V4DI_FTYPE_V4DI), + LASX_BUILTIN (xvexth_hu_bu, LARCH_UV16HI_FTYPE_UV32QI), + LASX_BUILTIN (xvexth_wu_hu, LARCH_UV8SI_FTYPE_UV16HI), + LASX_BUILTIN (xvexth_du_wu, LARCH_UV4DI_FTYPE_UV8SI), + LASX_BUILTIN (xvexth_qu_du, LARCH_UV4DI_FTYPE_UV4DI), + LASX_BUILTIN (xvrotri_b, LARCH_V32QI_FTYPE_V32QI_UQI), + LASX_BUILTIN (xvrotri_h, LARCH_V16HI_FTYPE_V16HI_UQI), + LASX_BUILTIN (xvrotri_w, LARCH_V8SI_FTYPE_V8SI_UQI), + LASX_BUILTIN (xvrotri_d, LARCH_V4DI_FTYPE_V4DI_UQI), + LASX_BUILTIN (xvextl_q_d, LARCH_V4DI_FTYPE_V4DI), + LASX_BUILTIN (xvsrlni_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvsrlni_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvsrlni_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvsrlni_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvsrlrni_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvsrlrni_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvsrlrni_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvsrlrni_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvssrlni_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvssrlni_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvssrlni_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), 
+ LASX_BUILTIN (xvssrlni_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvssrlni_bu_h, LARCH_UV32QI_FTYPE_UV32QI_V32QI_USI), + LASX_BUILTIN (xvssrlni_hu_w, LARCH_UV16HI_FTYPE_UV16HI_V16HI_USI), + LASX_BUILTIN (xvssrlni_wu_d, LARCH_UV8SI_FTYPE_UV8SI_V8SI_USI), + LASX_BUILTIN (xvssrlni_du_q, LARCH_UV4DI_FTYPE_UV4DI_V4DI_USI), + LASX_BUILTIN (xvssrlrni_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvssrlrni_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvssrlrni_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvssrlrni_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvssrlrni_bu_h, LARCH_UV32QI_FTYPE_UV32QI_V32QI_USI), + LASX_BUILTIN (xvssrlrni_hu_w, LARCH_UV16HI_FTYPE_UV16HI_V16HI_USI), + LASX_BUILTIN (xvssrlrni_wu_d, LARCH_UV8SI_FTYPE_UV8SI_V8SI_USI), + LASX_BUILTIN (xvssrlrni_du_q, LARCH_UV4DI_FTYPE_UV4DI_V4DI_USI), + LASX_BUILTIN (xvsrani_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvsrani_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvsrani_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvsrani_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvsrarni_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvsrarni_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvsrarni_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvsrarni_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvssrani_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvssrani_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvssrani_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvssrani_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvssrani_bu_h, LARCH_UV32QI_FTYPE_UV32QI_V32QI_USI), + LASX_BUILTIN (xvssrani_hu_w, LARCH_UV16HI_FTYPE_UV16HI_V16HI_USI), + LASX_BUILTIN (xvssrani_wu_d, LARCH_UV8SI_FTYPE_UV8SI_V8SI_USI), + LASX_BUILTIN (xvssrani_du_q, LARCH_UV4DI_FTYPE_UV4DI_V4DI_USI), + LASX_BUILTIN (xvssrarni_b_h, LARCH_V32QI_FTYPE_V32QI_V32QI_USI), + LASX_BUILTIN (xvssrarni_h_w, LARCH_V16HI_FTYPE_V16HI_V16HI_USI), + LASX_BUILTIN (xvssrarni_w_d, LARCH_V8SI_FTYPE_V8SI_V8SI_USI), + LASX_BUILTIN (xvssrarni_d_q, LARCH_V4DI_FTYPE_V4DI_V4DI_USI), + LASX_BUILTIN (xvssrarni_bu_h, LARCH_UV32QI_FTYPE_UV32QI_V32QI_USI), + LASX_BUILTIN (xvssrarni_hu_w, LARCH_UV16HI_FTYPE_UV16HI_V16HI_USI), + LASX_BUILTIN (xvssrarni_wu_d, LARCH_UV8SI_FTYPE_UV8SI_V8SI_USI), + LASX_BUILTIN (xvssrarni_du_q, LARCH_UV4DI_FTYPE_UV4DI_V4DI_USI) }; /* Index I is the function declaration for loongarch_builtins[I], or null if @@ -1441,11 +2497,15 @@ loongarch_builtin_vectorized_function (unsigned int fn, tree type_out, { if (out_n == 2 && in_n == 2) return LARCH_GET_BUILTIN (lsx_vfrintrp_d); + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lasx_xvfrintrp_d); } if (out_mode == SFmode && in_mode == SFmode) { if (out_n == 4 && in_n == 4) return LARCH_GET_BUILTIN (lsx_vfrintrp_s); + if (out_n == 8 && in_n == 8) + return LARCH_GET_BUILTIN (lasx_xvfrintrp_s); } break; @@ -1454,11 +2514,15 @@ loongarch_builtin_vectorized_function (unsigned int fn, tree type_out, { if (out_n == 2 && in_n == 2) return LARCH_GET_BUILTIN (lsx_vfrintrz_d); + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lasx_xvfrintrz_d); } if (out_mode == SFmode && in_mode == SFmode) { if (out_n == 4 && in_n == 4) return LARCH_GET_BUILTIN (lsx_vfrintrz_s); + if (out_n == 8 && in_n == 8) + return LARCH_GET_BUILTIN (lasx_xvfrintrz_s); } break; @@ -1468,11 +2532,15 @@ loongarch_builtin_vectorized_function (unsigned int fn, tree type_out, { if (out_n == 2 && in_n 
== 2) return LARCH_GET_BUILTIN (lsx_vfrint_d); + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lasx_xvfrint_d); } if (out_mode == SFmode && in_mode == SFmode) { if (out_n == 4 && in_n == 4) return LARCH_GET_BUILTIN (lsx_vfrint_s); + if (out_n == 8 && in_n == 8) + return LARCH_GET_BUILTIN (lasx_xvfrint_s); } break; @@ -1481,11 +2549,15 @@ loongarch_builtin_vectorized_function (unsigned int fn, tree type_out, { if (out_n == 2 && in_n == 2) return LARCH_GET_BUILTIN (lsx_vfrintrm_d); + if (out_n == 4 && in_n == 4) + return LARCH_GET_BUILTIN (lasx_xvfrintrm_d); } if (out_mode == SFmode && in_mode == SFmode) { if (out_n == 4 && in_n == 4) return LARCH_GET_BUILTIN (lsx_vfrintrm_s); + if (out_n == 8 && in_n == 8) + return LARCH_GET_BUILTIN (lasx_xvfrintrm_s); } break; @@ -1560,6 +2632,30 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, case CODE_FOR_lsx_vsubi_hu: case CODE_FOR_lsx_vsubi_wu: case CODE_FOR_lsx_vsubi_du: + case CODE_FOR_lasx_xvaddi_bu: + case CODE_FOR_lasx_xvaddi_hu: + case CODE_FOR_lasx_xvaddi_wu: + case CODE_FOR_lasx_xvaddi_du: + case CODE_FOR_lasx_xvslti_bu: + case CODE_FOR_lasx_xvslti_hu: + case CODE_FOR_lasx_xvslti_wu: + case CODE_FOR_lasx_xvslti_du: + case CODE_FOR_lasx_xvslei_bu: + case CODE_FOR_lasx_xvslei_hu: + case CODE_FOR_lasx_xvslei_wu: + case CODE_FOR_lasx_xvslei_du: + case CODE_FOR_lasx_xvmaxi_bu: + case CODE_FOR_lasx_xvmaxi_hu: + case CODE_FOR_lasx_xvmaxi_wu: + case CODE_FOR_lasx_xvmaxi_du: + case CODE_FOR_lasx_xvmini_bu: + case CODE_FOR_lasx_xvmini_hu: + case CODE_FOR_lasx_xvmini_wu: + case CODE_FOR_lasx_xvmini_du: + case CODE_FOR_lasx_xvsubi_bu: + case CODE_FOR_lasx_xvsubi_hu: + case CODE_FOR_lasx_xvsubi_wu: + case CODE_FOR_lasx_xvsubi_du: gcc_assert (has_target_p && nops == 3); /* We only generate a vector of constants iff the second argument is an immediate. We also validate the range of the immediate. */ @@ -1598,6 +2694,26 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, case CODE_FOR_lsx_vmini_h: case CODE_FOR_lsx_vmini_w: case CODE_FOR_lsx_vmini_d: + case CODE_FOR_lasx_xvseqi_b: + case CODE_FOR_lasx_xvseqi_h: + case CODE_FOR_lasx_xvseqi_w: + case CODE_FOR_lasx_xvseqi_d: + case CODE_FOR_lasx_xvslti_b: + case CODE_FOR_lasx_xvslti_h: + case CODE_FOR_lasx_xvslti_w: + case CODE_FOR_lasx_xvslti_d: + case CODE_FOR_lasx_xvslei_b: + case CODE_FOR_lasx_xvslei_h: + case CODE_FOR_lasx_xvslei_w: + case CODE_FOR_lasx_xvslei_d: + case CODE_FOR_lasx_xvmaxi_b: + case CODE_FOR_lasx_xvmaxi_h: + case CODE_FOR_lasx_xvmaxi_w: + case CODE_FOR_lasx_xvmaxi_d: + case CODE_FOR_lasx_xvmini_b: + case CODE_FOR_lasx_xvmini_h: + case CODE_FOR_lasx_xvmini_w: + case CODE_FOR_lasx_xvmini_d: gcc_assert (has_target_p && nops == 3); /* We only generate a vector of constants iff the second argument is an immediate. We also validate the range of the immediate. 
*/ @@ -1620,6 +2736,10 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, case CODE_FOR_lsx_vori_b: case CODE_FOR_lsx_vnori_b: case CODE_FOR_lsx_vxori_b: + case CODE_FOR_lasx_xvandi_b: + case CODE_FOR_lasx_xvori_b: + case CODE_FOR_lasx_xvnori_b: + case CODE_FOR_lasx_xvxori_b: gcc_assert (has_target_p && nops == 3); if (!CONST_INT_P (ops[2].value)) break; @@ -1629,6 +2749,7 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, break; case CODE_FOR_lsx_vbitseli_b: + case CODE_FOR_lasx_xvbitseli_b: gcc_assert (has_target_p && nops == 4); if (!CONST_INT_P (ops[3].value)) break; @@ -1641,6 +2762,10 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, case CODE_FOR_lsx_vreplgr2vr_h: case CODE_FOR_lsx_vreplgr2vr_w: case CODE_FOR_lsx_vreplgr2vr_d: + case CODE_FOR_lasx_xvreplgr2vr_b: + case CODE_FOR_lasx_xvreplgr2vr_h: + case CODE_FOR_lasx_xvreplgr2vr_w: + case CODE_FOR_lasx_xvreplgr2vr_d: /* Map the built-ins to vector fill operations. We need fix up the mode for the element being inserted. */ gcc_assert (has_target_p && nops == 2); @@ -1669,6 +2794,26 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, case CODE_FOR_lsx_vpickod_b: case CODE_FOR_lsx_vpickod_h: case CODE_FOR_lsx_vpickod_w: + case CODE_FOR_lasx_xvilvh_b: + case CODE_FOR_lasx_xvilvh_h: + case CODE_FOR_lasx_xvilvh_w: + case CODE_FOR_lasx_xvilvh_d: + case CODE_FOR_lasx_xvilvl_b: + case CODE_FOR_lasx_xvilvl_h: + case CODE_FOR_lasx_xvilvl_w: + case CODE_FOR_lasx_xvilvl_d: + case CODE_FOR_lasx_xvpackev_b: + case CODE_FOR_lasx_xvpackev_h: + case CODE_FOR_lasx_xvpackev_w: + case CODE_FOR_lasx_xvpackod_b: + case CODE_FOR_lasx_xvpackod_h: + case CODE_FOR_lasx_xvpackod_w: + case CODE_FOR_lasx_xvpickev_b: + case CODE_FOR_lasx_xvpickev_h: + case CODE_FOR_lasx_xvpickev_w: + case CODE_FOR_lasx_xvpickod_b: + case CODE_FOR_lasx_xvpickod_h: + case CODE_FOR_lasx_xvpickod_w: /* Swap the operands 1 and 2 for interleave operations. Built-ins follow convention of ISA, which have op1 as higher component and op2 as lower component. However, the VEC_PERM op in tree and vec_concat in RTL @@ -1690,6 +2835,18 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, case CODE_FOR_lsx_vsrli_h: case CODE_FOR_lsx_vsrli_w: case CODE_FOR_lsx_vsrli_d: + case CODE_FOR_lasx_xvslli_b: + case CODE_FOR_lasx_xvslli_h: + case CODE_FOR_lasx_xvslli_w: + case CODE_FOR_lasx_xvslli_d: + case CODE_FOR_lasx_xvsrai_b: + case CODE_FOR_lasx_xvsrai_h: + case CODE_FOR_lasx_xvsrai_w: + case CODE_FOR_lasx_xvsrai_d: + case CODE_FOR_lasx_xvsrli_b: + case CODE_FOR_lasx_xvsrli_h: + case CODE_FOR_lasx_xvsrli_w: + case CODE_FOR_lasx_xvsrli_d: gcc_assert (has_target_p && nops == 3); if (CONST_INT_P (ops[2].value)) { @@ -1750,6 +2907,25 @@ loongarch_expand_builtin_insn (enum insn_code icode, unsigned int nops, INTVAL (ops[2].value)); break; + case CODE_FOR_lasx_xvinsgr2vr_w: + case CODE_FOR_lasx_xvinsgr2vr_d: + /* Map the built-ins to insert operations. We need to swap operands, + fix up the mode for the element being inserted, and generate + a bit mask for vec_merge. 
*/ + gcc_assert (has_target_p && nops == 4); + std::swap (ops[1], ops[2]); + imode = GET_MODE_INNER (ops[0].mode); + ops[1].value = lowpart_subreg (imode, ops[1].value, ops[1].mode); + ops[1].mode = imode; + rangelo = 0; + rangehi = GET_MODE_NUNITS (ops[0].mode) - 1; + if (CONST_INT_P (ops[3].value) + && IN_RANGE (INTVAL (ops[3].value), rangelo, rangehi)) + ops[3].value = GEN_INT (1 << INTVAL (ops[3].value)); + else + error_opno = 2; + break; + default: break; } @@ -1859,12 +3035,14 @@ loongarch_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED, { case LARCH_BUILTIN_DIRECT: case LARCH_BUILTIN_LSX: + case LARCH_BUILTIN_LASX: return loongarch_expand_builtin_direct (d->icode, target, exp, true); case LARCH_BUILTIN_DIRECT_NO_TARGET: return loongarch_expand_builtin_direct (d->icode, target, exp, false); case LARCH_BUILTIN_LSX_TEST_BRANCH: + case LARCH_BUILTIN_LASX_TEST_BRANCH: return loongarch_expand_builtin_lsx_test_branch (d->icode, exp); } gcc_unreachable (); diff --git a/gcc/config/loongarch/loongarch-ftypes.def b/gcc/config/loongarch/loongarch-ftypes.def index 1ce9d83ccab..72d96878038 100644 --- a/gcc/config/loongarch/loongarch-ftypes.def +++ b/gcc/config/loongarch/loongarch-ftypes.def @@ -67,6 +67,7 @@ DEF_LARCH_FTYPE (3, (UDI, UDI, UDI, USI)) DEF_LARCH_FTYPE (1, (DF, DF)) DEF_LARCH_FTYPE (2, (DF, DF, DF)) DEF_LARCH_FTYPE (1, (DF, V2DF)) +DEF_LARCH_FTYPE (1, (DF, V4DF)) DEF_LARCH_FTYPE (1, (DI, DI)) DEF_LARCH_FTYPE (1, (DI, SI)) @@ -83,6 +84,7 @@ DEF_LARCH_FTYPE (2, (DI, SI, SI)) DEF_LARCH_FTYPE (2, (DI, USI, USI)) DEF_LARCH_FTYPE (2, (DI, V2DI, UQI)) +DEF_LARCH_FTYPE (2, (DI, V4DI, UQI)) DEF_LARCH_FTYPE (2, (INT, DF, DF)) DEF_LARCH_FTYPE (2, (INT, SF, SF)) @@ -104,21 +106,31 @@ DEF_LARCH_FTYPE (3, (SI, SI, SI, SI)) DEF_LARCH_FTYPE (3, (SI, SI, SI, QI)) DEF_LARCH_FTYPE (1, (SI, UQI)) DEF_LARCH_FTYPE (1, (SI, UV16QI)) +DEF_LARCH_FTYPE (1, (SI, UV32QI)) DEF_LARCH_FTYPE (1, (SI, UV2DI)) +DEF_LARCH_FTYPE (1, (SI, UV4DI)) DEF_LARCH_FTYPE (1, (SI, UV4SI)) +DEF_LARCH_FTYPE (1, (SI, UV8SI)) DEF_LARCH_FTYPE (1, (SI, UV8HI)) +DEF_LARCH_FTYPE (1, (SI, UV16HI)) DEF_LARCH_FTYPE (2, (SI, V16QI, UQI)) +DEF_LARCH_FTYPE (2, (SI, V32QI, UQI)) DEF_LARCH_FTYPE (1, (SI, V2HI)) DEF_LARCH_FTYPE (2, (SI, V2HI, V2HI)) DEF_LARCH_FTYPE (1, (SI, V4QI)) DEF_LARCH_FTYPE (2, (SI, V4QI, V4QI)) DEF_LARCH_FTYPE (2, (SI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (SI, V8SI, UQI)) DEF_LARCH_FTYPE (2, (SI, V8HI, UQI)) DEF_LARCH_FTYPE (1, (SI, VOID)) DEF_LARCH_FTYPE (2, (UDI, UDI, UDI)) +DEF_LARCH_FTYPE (2, (USI, V32QI, UQI)) DEF_LARCH_FTYPE (2, (UDI, UV2SI, UV2SI)) +DEF_LARCH_FTYPE (2, (USI, V8SI, UQI)) DEF_LARCH_FTYPE (2, (UDI, V2DI, UQI)) +DEF_LARCH_FTYPE (2, (USI, V16HI, UQI)) +DEF_LARCH_FTYPE (2, (UDI, V4DI, UQI)) DEF_LARCH_FTYPE (2, (USI, V16QI, UQI)) DEF_LARCH_FTYPE (2, (USI, V4SI, UQI)) @@ -142,6 +154,23 @@ DEF_LARCH_FTYPE (2, (UV2DI, UV2DI, V2DI)) DEF_LARCH_FTYPE (2, (UV2DI, UV4SI, UV4SI)) DEF_LARCH_FTYPE (1, (UV2DI, V2DF)) +DEF_LARCH_FTYPE (2, (UV32QI, UV32QI, UQI)) +DEF_LARCH_FTYPE (2, (UV32QI, UV32QI, USI)) +DEF_LARCH_FTYPE (2, (UV32QI, UV32QI, UV32QI)) +DEF_LARCH_FTYPE (3, (UV32QI, UV32QI, UV32QI, UQI)) +DEF_LARCH_FTYPE (3, (UV32QI, UV32QI, UV32QI, USI)) +DEF_LARCH_FTYPE (3, (UV32QI, UV32QI, UV32QI, UV32QI)) +DEF_LARCH_FTYPE (2, (UV32QI, UV32QI, V32QI)) + +DEF_LARCH_FTYPE (2, (UV4DI, UV4DI, UQI)) +DEF_LARCH_FTYPE (2, (UV4DI, UV4DI, UV4DI)) +DEF_LARCH_FTYPE (3, (UV4DI, UV4DI, UV4DI, UQI)) +DEF_LARCH_FTYPE (3, (UV4DI, UV4DI, UV4DI, UV4DI)) +DEF_LARCH_FTYPE (3, (UV4DI, UV4DI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE 
(2, (UV4DI, UV4DI, V4DI)) +DEF_LARCH_FTYPE (2, (UV4DI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE (1, (UV4DI, V4DF)) + DEF_LARCH_FTYPE (2, (UV2SI, UV2SI, UQI)) DEF_LARCH_FTYPE (2, (UV2SI, UV2SI, UV2SI)) @@ -170,7 +199,22 @@ DEF_LARCH_FTYPE (3, (UV8HI, UV8HI, UV8HI, UQI)) DEF_LARCH_FTYPE (3, (UV8HI, UV8HI, UV8HI, UV8HI)) DEF_LARCH_FTYPE (2, (UV8HI, UV8HI, V8HI)) - +DEF_LARCH_FTYPE (2, (UV8SI, UV8SI, UQI)) +DEF_LARCH_FTYPE (2, (UV8SI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE (3, (UV8SI, UV8SI, UV8SI, UQI)) +DEF_LARCH_FTYPE (3, (UV8SI, UV8SI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE (3, (UV8SI, UV8SI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (2, (UV8SI, UV8SI, V8SI)) +DEF_LARCH_FTYPE (2, (UV8SI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (1, (UV8SI, V8SF)) + +DEF_LARCH_FTYPE (2, (UV16HI, UV32QI, UV32QI)) +DEF_LARCH_FTYPE (2, (UV16HI, UV16HI, UQI)) +DEF_LARCH_FTYPE (3, (UV16HI, UV16HI, UV32QI, UV32QI)) +DEF_LARCH_FTYPE (2, (UV16HI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (3, (UV16HI, UV16HI, UV16HI, UQI)) +DEF_LARCH_FTYPE (3, (UV16HI, UV16HI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (2, (UV16HI, UV16HI, V16HI)) DEF_LARCH_FTYPE (2, (UV8QI, UV4HI, UV4HI)) DEF_LARCH_FTYPE (1, (UV8QI, UV8QI)) @@ -196,6 +240,25 @@ DEF_LARCH_FTYPE (4, (V16QI, V16QI, V16QI, UQI, UQI)) DEF_LARCH_FTYPE (3, (V16QI, V16QI, V16QI, USI)) DEF_LARCH_FTYPE (3, (V16QI, V16QI, V16QI, V16QI)) +DEF_LARCH_FTYPE (2, (V32QI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (2, (V32QI, CVPOINTER, DI)) +DEF_LARCH_FTYPE (1, (V32QI, HI)) +DEF_LARCH_FTYPE (1, (V32QI, SI)) +DEF_LARCH_FTYPE (2, (V32QI, UV32QI, UQI)) +DEF_LARCH_FTYPE (2, (V32QI, UV32QI, UV32QI)) +DEF_LARCH_FTYPE (1, (V32QI, V32QI)) +DEF_LARCH_FTYPE (2, (V32QI, V32QI, QI)) +DEF_LARCH_FTYPE (2, (V32QI, V32QI, SI)) +DEF_LARCH_FTYPE (2, (V32QI, V32QI, UQI)) +DEF_LARCH_FTYPE (2, (V32QI, V32QI, USI)) +DEF_LARCH_FTYPE (3, (V32QI, V32QI, SI, UQI)) +DEF_LARCH_FTYPE (3, (V32QI, V32QI, UQI, V32QI)) +DEF_LARCH_FTYPE (2, (V32QI, V32QI, V32QI)) +DEF_LARCH_FTYPE (3, (V32QI, V32QI, V32QI, SI)) +DEF_LARCH_FTYPE (3, (V32QI, V32QI, V32QI, UQI)) +DEF_LARCH_FTYPE (4, (V32QI, V32QI, V32QI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V32QI, V32QI, V32QI, USI)) +DEF_LARCH_FTYPE (3, (V32QI, V32QI, V32QI, V32QI)) DEF_LARCH_FTYPE (1, (V2DF, DF)) DEF_LARCH_FTYPE (1, (V2DF, UV2DI)) @@ -207,6 +270,16 @@ DEF_LARCH_FTYPE (1, (V2DF, V2DI)) DEF_LARCH_FTYPE (1, (V2DF, V4SF)) DEF_LARCH_FTYPE (1, (V2DF, V4SI)) +DEF_LARCH_FTYPE (1, (V4DF, DF)) +DEF_LARCH_FTYPE (1, (V4DF, UV4DI)) +DEF_LARCH_FTYPE (1, (V4DF, V4DF)) +DEF_LARCH_FTYPE (2, (V4DF, V4DF, V4DF)) +DEF_LARCH_FTYPE (3, (V4DF, V4DF, V4DF, V4DF)) +DEF_LARCH_FTYPE (2, (V4DF, V4DF, V4DI)) +DEF_LARCH_FTYPE (1, (V4DF, V4DI)) +DEF_LARCH_FTYPE (1, (V4DF, V8SF)) +DEF_LARCH_FTYPE (1, (V4DF, V8SI)) + DEF_LARCH_FTYPE (2, (V2DI, CVPOINTER, SI)) DEF_LARCH_FTYPE (1, (V2DI, DI)) DEF_LARCH_FTYPE (1, (V2DI, HI)) @@ -233,6 +306,32 @@ DEF_LARCH_FTYPE (3, (V2DI, V2DI, V2DI, V2DI)) DEF_LARCH_FTYPE (3, (V2DI, V2DI, V4SI, V4SI)) DEF_LARCH_FTYPE (2, (V2DI, V4SI, V4SI)) +DEF_LARCH_FTYPE (2, (V4DI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (1, (V4DI, DI)) +DEF_LARCH_FTYPE (1, (V4DI, HI)) +DEF_LARCH_FTYPE (2, (V4DI, UV4DI, UQI)) +DEF_LARCH_FTYPE (2, (V4DI, UV4DI, UV4DI)) +DEF_LARCH_FTYPE (2, (V4DI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE (1, (V4DI, V4DF)) +DEF_LARCH_FTYPE (2, (V4DI, V4DF, V4DF)) +DEF_LARCH_FTYPE (1, (V4DI, V4DI)) +DEF_LARCH_FTYPE (1, (UV4DI, UV4DI)) +DEF_LARCH_FTYPE (2, (V4DI, V4DI, QI)) +DEF_LARCH_FTYPE (2, (V4DI, V4DI, SI)) +DEF_LARCH_FTYPE (2, (V4DI, V4DI, UQI)) +DEF_LARCH_FTYPE (2, (V4DI, V4DI, USI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, DI, UQI)) +DEF_LARCH_FTYPE 
(3, (V4DI, V4DI, UQI, V4DI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE (2, (V4DI, V4DI, V4DI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, V4DI, SI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, V4DI, USI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, V4DI, UQI)) +DEF_LARCH_FTYPE (4, (V4DI, V4DI, V4DI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, V4DI, V4DI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, V8SI, V8SI)) +DEF_LARCH_FTYPE (2, (V4DI, V8SI, V8SI)) + DEF_LARCH_FTYPE (1, (V2HI, SI)) DEF_LARCH_FTYPE (2, (V2HI, SI, SI)) DEF_LARCH_FTYPE (3, (V2HI, SI, SI, SI)) @@ -274,6 +373,17 @@ DEF_LARCH_FTYPE (3, (V4SF, V4SF, V4SF, V4SF)) DEF_LARCH_FTYPE (2, (V4SF, V4SF, V4SI)) DEF_LARCH_FTYPE (1, (V4SF, V4SI)) DEF_LARCH_FTYPE (1, (V4SF, V8HI)) +DEF_LARCH_FTYPE (1, (V8SF, V16HI)) + +DEF_LARCH_FTYPE (1, (V8SF, SF)) +DEF_LARCH_FTYPE (1, (V8SF, UV8SI)) +DEF_LARCH_FTYPE (2, (V8SF, V4DF, V4DF)) +DEF_LARCH_FTYPE (1, (V8SF, V8SF)) +DEF_LARCH_FTYPE (2, (V8SF, V8SF, V8SF)) +DEF_LARCH_FTYPE (3, (V8SF, V8SF, V8SF, V8SF)) +DEF_LARCH_FTYPE (2, (V8SF, V8SF, V8SI)) +DEF_LARCH_FTYPE (1, (V8SF, V8SI)) +DEF_LARCH_FTYPE (1, (V8SF, V8HI)) DEF_LARCH_FTYPE (2, (V4SI, CVPOINTER, SI)) DEF_LARCH_FTYPE (1, (V4SI, HI)) @@ -282,6 +392,7 @@ DEF_LARCH_FTYPE (2, (V4SI, UV4SI, UQI)) DEF_LARCH_FTYPE (2, (V4SI, UV4SI, UV4SI)) DEF_LARCH_FTYPE (2, (V4SI, UV8HI, UV8HI)) DEF_LARCH_FTYPE (2, (V4SI, V2DF, V2DF)) +DEF_LARCH_FTYPE (2, (V8SI, V4DF, V4DF)) DEF_LARCH_FTYPE (1, (V4SI, V4SF)) DEF_LARCH_FTYPE (2, (V4SI, V4SF, V4SF)) DEF_LARCH_FTYPE (1, (V4SI, V4SI)) @@ -301,6 +412,32 @@ DEF_LARCH_FTYPE (3, (V4SI, V4SI, V4SI, V4SI)) DEF_LARCH_FTYPE (3, (V4SI, V4SI, V8HI, V8HI)) DEF_LARCH_FTYPE (2, (V4SI, V8HI, V8HI)) +DEF_LARCH_FTYPE (2, (V8SI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (1, (V8SI, HI)) +DEF_LARCH_FTYPE (1, (V8SI, SI)) +DEF_LARCH_FTYPE (2, (V8SI, UV8SI, UQI)) +DEF_LARCH_FTYPE (2, (V8SI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE (2, (V8SI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (2, (V8SI, V2DF, V2DF)) +DEF_LARCH_FTYPE (1, (V8SI, V8SF)) +DEF_LARCH_FTYPE (2, (V8SI, V8SF, V8SF)) +DEF_LARCH_FTYPE (1, (V8SI, V8SI)) +DEF_LARCH_FTYPE (2, (V8SI, V8SI, QI)) +DEF_LARCH_FTYPE (2, (V8SI, V8SI, SI)) +DEF_LARCH_FTYPE (2, (V8SI, V8SI, UQI)) +DEF_LARCH_FTYPE (2, (V8SI, V8SI, USI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, SI, UQI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, UQI, V8SI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (2, (V8SI, V8SI, V8SI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, V8SI, SI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, V8SI, UQI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, V8SI, USI)) +DEF_LARCH_FTYPE (4, (V8SI, V8SI, V8SI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, V8SI, V8SI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, V16HI, V16HI)) +DEF_LARCH_FTYPE (2, (V8SI, V16HI, V16HI)) + DEF_LARCH_FTYPE (2, (V8HI, CVPOINTER, SI)) DEF_LARCH_FTYPE (1, (V8HI, HI)) DEF_LARCH_FTYPE (1, (V8HI, SI)) @@ -326,6 +463,31 @@ DEF_LARCH_FTYPE (4, (V8HI, V8HI, V8HI, UQI, UQI)) DEF_LARCH_FTYPE (3, (V8HI, V8HI, V8HI, USI)) DEF_LARCH_FTYPE (3, (V8HI, V8HI, V8HI, V8HI)) +DEF_LARCH_FTYPE (2, (V16HI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (1, (V16HI, HI)) +DEF_LARCH_FTYPE (1, (V16HI, SI)) +DEF_LARCH_FTYPE (2, (V16HI, UV32QI, UV32QI)) +DEF_LARCH_FTYPE (2, (V16HI, UV16HI, UQI)) +DEF_LARCH_FTYPE (2, (V16HI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (2, (V16HI, V32QI, V32QI)) +DEF_LARCH_FTYPE (2, (V16HI, V8SF, V8SF)) +DEF_LARCH_FTYPE (1, (V16HI, V16HI)) +DEF_LARCH_FTYPE (2, (V16HI, V16HI, QI)) +DEF_LARCH_FTYPE (2, (V16HI, V16HI, SI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, SI, UQI)) +DEF_LARCH_FTYPE (2, (V16HI, V16HI, UQI)) 
+DEF_LARCH_FTYPE (2, (V16HI, V16HI, USI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, UQI, SI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, UQI, V16HI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, UV32QI, UV32QI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, V32QI, V32QI)) +DEF_LARCH_FTYPE (2, (V16HI, V16HI, V16HI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, V16HI, SI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, V16HI, UQI)) +DEF_LARCH_FTYPE (4, (V16HI, V16HI, V16HI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, V16HI, USI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, V16HI, V16HI)) + DEF_LARCH_FTYPE (2, (V8QI, V4HI, V4HI)) DEF_LARCH_FTYPE (1, (V8QI, V8QI)) DEF_LARCH_FTYPE (2, (V8QI, V8QI, V8QI)) @@ -337,62 +499,113 @@ DEF_LARCH_FTYPE (2, (VOID, USI, UQI)) DEF_LARCH_FTYPE (1, (VOID, UHI)) DEF_LARCH_FTYPE (3, (VOID, V16QI, CVPOINTER, SI)) DEF_LARCH_FTYPE (3, (VOID, V16QI, CVPOINTER, DI)) +DEF_LARCH_FTYPE (3, (VOID, V32QI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V32QI, CVPOINTER, DI)) +DEF_LARCH_FTYPE (3, (VOID, V4DF, POINTER, SI)) DEF_LARCH_FTYPE (3, (VOID, V2DF, POINTER, SI)) DEF_LARCH_FTYPE (3, (VOID, V2DI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V4DI, CVPOINTER, SI)) DEF_LARCH_FTYPE (2, (VOID, V2HI, V2HI)) DEF_LARCH_FTYPE (2, (VOID, V4QI, V4QI)) DEF_LARCH_FTYPE (3, (VOID, V4SF, POINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V8SF, POINTER, SI)) DEF_LARCH_FTYPE (3, (VOID, V4SI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V8SI, CVPOINTER, SI)) DEF_LARCH_FTYPE (3, (VOID, V8HI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (3, (VOID, V16HI, CVPOINTER, SI)) +DEF_LARCH_FTYPE (1, (V16HI, V32QI)) +DEF_LARCH_FTYPE (1, (UV16HI, UV32QI)) +DEF_LARCH_FTYPE (1, (V8SI, V32QI)) +DEF_LARCH_FTYPE (1, (V4DI, V32QI)) DEF_LARCH_FTYPE (1, (V8HI, V16QI)) DEF_LARCH_FTYPE (1, (V4SI, V16QI)) DEF_LARCH_FTYPE (1, (V2DI, V16QI)) +DEF_LARCH_FTYPE (1, (UV8SI, UV16HI)) +DEF_LARCH_FTYPE (1, (V8SI, V16HI)) +DEF_LARCH_FTYPE (1, (V4DI, V16HI)) DEF_LARCH_FTYPE (1, (V4SI, V8HI)) DEF_LARCH_FTYPE (1, (V2DI, V8HI)) DEF_LARCH_FTYPE (1, (V2DI, V4SI)) +DEF_LARCH_FTYPE (1, (V4DI, V8SI)) +DEF_LARCH_FTYPE (1, (UV4DI, UV8SI)) +DEF_LARCH_FTYPE (1, (UV16HI, V32QI)) +DEF_LARCH_FTYPE (1, (UV8SI, V32QI)) +DEF_LARCH_FTYPE (1, (UV4DI, V32QI)) DEF_LARCH_FTYPE (1, (UV8HI, V16QI)) DEF_LARCH_FTYPE (1, (UV4SI, V16QI)) DEF_LARCH_FTYPE (1, (UV2DI, V16QI)) +DEF_LARCH_FTYPE (1, (UV8SI, V16HI)) +DEF_LARCH_FTYPE (1, (UV4DI, V16HI)) DEF_LARCH_FTYPE (1, (UV4SI, V8HI)) DEF_LARCH_FTYPE (1, (UV2DI, V8HI)) DEF_LARCH_FTYPE (1, (UV2DI, V4SI)) +DEF_LARCH_FTYPE (1, (UV4DI, V8SI)) DEF_LARCH_FTYPE (1, (UV8HI, UV16QI)) DEF_LARCH_FTYPE (1, (UV4SI, UV16QI)) DEF_LARCH_FTYPE (1, (UV2DI, UV16QI)) +DEF_LARCH_FTYPE (1, (UV4DI, UV32QI)) DEF_LARCH_FTYPE (1, (UV4SI, UV8HI)) DEF_LARCH_FTYPE (1, (UV2DI, UV8HI)) DEF_LARCH_FTYPE (1, (UV2DI, UV4SI)) DEF_LARCH_FTYPE (2, (UV8HI, V16QI, V16QI)) DEF_LARCH_FTYPE (2, (UV4SI, V8HI, V8HI)) DEF_LARCH_FTYPE (2, (UV2DI, V4SI, V4SI)) +DEF_LARCH_FTYPE (2, (V16HI, V32QI, UQI)) +DEF_LARCH_FTYPE (2, (V8SI, V16HI, UQI)) +DEF_LARCH_FTYPE (2, (V4DI, V8SI, UQI)) DEF_LARCH_FTYPE (2, (V8HI, V16QI, UQI)) DEF_LARCH_FTYPE (2, (V4SI, V8HI, UQI)) DEF_LARCH_FTYPE (2, (V2DI, V4SI, UQI)) +DEF_LARCH_FTYPE (2, (UV16HI, UV32QI, UQI)) +DEF_LARCH_FTYPE (2, (UV8SI, UV16HI, UQI)) +DEF_LARCH_FTYPE (2, (UV4DI, UV8SI, UQI)) DEF_LARCH_FTYPE (2, (UV8HI, UV16QI, UQI)) DEF_LARCH_FTYPE (2, (UV4SI, UV8HI, UQI)) DEF_LARCH_FTYPE (2, (UV2DI, UV4SI, UQI)) +DEF_LARCH_FTYPE (2, (V32QI, V16HI, V16HI)) +DEF_LARCH_FTYPE (2, (V16HI, V8SI, V8SI)) +DEF_LARCH_FTYPE (2, (V8SI, V4DI, V4DI)) DEF_LARCH_FTYPE (2, (V16QI, V8HI, V8HI)) DEF_LARCH_FTYPE (2, 
(V8HI, V4SI, V4SI)) DEF_LARCH_FTYPE (2, (V4SI, V2DI, V2DI)) +DEF_LARCH_FTYPE (2, (UV32QI, UV16HI, UV16HI)) +DEF_LARCH_FTYPE (2, (UV16HI, UV8SI, UV8SI)) +DEF_LARCH_FTYPE (2, (UV8SI, UV4DI, UV4DI)) DEF_LARCH_FTYPE (2, (UV16QI, UV8HI, UV8HI)) DEF_LARCH_FTYPE (2, (UV8HI, UV4SI, UV4SI)) DEF_LARCH_FTYPE (2, (UV4SI, UV2DI, UV2DI)) +DEF_LARCH_FTYPE (2, (V32QI, V16HI, UQI)) +DEF_LARCH_FTYPE (2, (V16HI, V8SI, UQI)) +DEF_LARCH_FTYPE (2, (V8SI, V4DI, UQI)) DEF_LARCH_FTYPE (2, (V16QI, V8HI, UQI)) DEF_LARCH_FTYPE (2, (V8HI, V4SI, UQI)) DEF_LARCH_FTYPE (2, (V4SI, V2DI, UQI)) +DEF_LARCH_FTYPE (2, (UV32QI, UV16HI, UQI)) +DEF_LARCH_FTYPE (2, (UV16HI, UV8SI, UQI)) +DEF_LARCH_FTYPE (2, (UV8SI, UV4DI, UQI)) DEF_LARCH_FTYPE (2, (UV16QI, UV8HI, UQI)) DEF_LARCH_FTYPE (2, (UV8HI, UV4SI, UQI)) DEF_LARCH_FTYPE (2, (UV4SI, UV2DI, UQI)) +DEF_LARCH_FTYPE (2, (V32QI, V32QI, DI)) DEF_LARCH_FTYPE (2, (V16QI, V16QI, DI)) +DEF_LARCH_FTYPE (2, (V32QI, UQI, UQI)) DEF_LARCH_FTYPE (2, (V16QI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V32QI, V32QI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V16HI, V16HI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V8SI, V8SI, UQI, UQI)) +DEF_LARCH_FTYPE (3, (V4DI, V4DI, UQI, UQI)) DEF_LARCH_FTYPE (3, (V16QI, V16QI, UQI, UQI)) DEF_LARCH_FTYPE (3, (V8HI, V8HI, UQI, UQI)) DEF_LARCH_FTYPE (3, (V4SI, V4SI, UQI, UQI)) DEF_LARCH_FTYPE (3, (V2DI, V2DI, UQI, UQI)) +DEF_LARCH_FTYPE (2, (V8SF, V4DI, V4DI)) DEF_LARCH_FTYPE (2, (V4SF, V2DI, V2DI)) +DEF_LARCH_FTYPE (1, (V4DI, V8SF)) DEF_LARCH_FTYPE (1, (V2DI, V4SF)) +DEF_LARCH_FTYPE (2, (V4DI, UQI, USI)) DEF_LARCH_FTYPE (2, (V2DI, UQI, USI)) +DEF_LARCH_FTYPE (2, (V4DI, UQI, UQI)) DEF_LARCH_FTYPE (2, (V2DI, UQI, UQI)) DEF_LARCH_FTYPE (4, (VOID, SI, UQI, V16QI, CVPOINTER)) DEF_LARCH_FTYPE (4, (VOID, SI, UQI, V8HI, CVPOINTER)) @@ -402,6 +615,17 @@ DEF_LARCH_FTYPE (2, (V16QI, SI, CVPOINTER)) DEF_LARCH_FTYPE (2, (V8HI, SI, CVPOINTER)) DEF_LARCH_FTYPE (2, (V4SI, SI, CVPOINTER)) DEF_LARCH_FTYPE (2, (V2DI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (4, (VOID, V32QI, UQI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (4, (VOID, V16HI, UQI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (4, (VOID, V8SI, UQI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (4, (VOID, V4DI, UQI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (3, (VOID, V32QI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V32QI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V16HI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V8SI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (2, (V4DI, SI, CVPOINTER)) +DEF_LARCH_FTYPE (1, (V32QI, POINTER)) +DEF_LARCH_FTYPE (2, (VOID, V32QI, POINTER)) DEF_LARCH_FTYPE (2, (V8HI, UV16QI, V16QI)) DEF_LARCH_FTYPE (2, (V16QI, V16QI, UV16QI)) DEF_LARCH_FTYPE (2, (UV16QI, V16QI, UV16QI)) @@ -431,6 +655,33 @@ DEF_LARCH_FTYPE (3, (V4SI, V4SI, V16QI, V16QI)) DEF_LARCH_FTYPE (3, (V4SI, V4SI, UV16QI, V16QI)) DEF_LARCH_FTYPE (3, (UV4SI, UV4SI, UV16QI, UV16QI)) + +DEF_LARCH_FTYPE(2,(V4DI,V16HI,V16HI)) +DEF_LARCH_FTYPE(2,(V4DI,UV4SI,V4SI)) +DEF_LARCH_FTYPE(2,(V8SI,UV16HI,V16HI)) +DEF_LARCH_FTYPE(2,(V16HI,UV32QI,V32QI)) +DEF_LARCH_FTYPE(2,(V4DI,UV8SI,V8SI)) +DEF_LARCH_FTYPE(3,(V4DI,V4DI,V16HI,V16HI)) +DEF_LARCH_FTYPE(2,(UV32QI,V32QI,UV32QI)) +DEF_LARCH_FTYPE(2,(UV16HI,V16HI,UV16HI)) +DEF_LARCH_FTYPE(2,(UV8SI,V8SI,UV8SI)) +DEF_LARCH_FTYPE(2,(UV4DI,V4DI,UV4DI)) +DEF_LARCH_FTYPE(3,(V4DI,V4DI,UV4DI,V4DI)) +DEF_LARCH_FTYPE(3,(V4DI,V4DI,UV8SI,V8SI)) +DEF_LARCH_FTYPE(3,(V8SI,V8SI,UV16HI,V16HI)) +DEF_LARCH_FTYPE(3,(V16HI,V16HI,UV32QI,V32QI)) +DEF_LARCH_FTYPE(2,(V4DI,UV4DI,V4DI)) +DEF_LARCH_FTYPE(2,(V8SI,V32QI,V32QI)) +DEF_LARCH_FTYPE(2,(UV4DI,UV16HI,UV16HI)) +DEF_LARCH_FTYPE(2,(V4DI,UV16HI,V16HI)) 
+DEF_LARCH_FTYPE(3,(V8SI,V8SI,V32QI,V32QI)) +DEF_LARCH_FTYPE(3,(UV8SI,UV8SI,UV32QI,UV32QI)) +DEF_LARCH_FTYPE(3,(UV4DI,UV4DI,UV16HI,UV16HI)) +DEF_LARCH_FTYPE(3,(V8SI,V8SI,UV32QI,V32QI)) +DEF_LARCH_FTYPE(3,(V4DI,V4DI,UV16HI,V16HI)) +DEF_LARCH_FTYPE(2,(UV8SI,UV32QI,UV32QI)) +DEF_LARCH_FTYPE(2,(V8SI,UV32QI,V32QI)) + DEF_LARCH_FTYPE(4,(VOID,V16QI,CVPOINTER,SI,UQI)) DEF_LARCH_FTYPE(4,(VOID,V8HI,CVPOINTER,SI,UQI)) DEF_LARCH_FTYPE(4,(VOID,V4SI,CVPOINTER,SI,UQI)) @@ -448,11 +699,29 @@ DEF_LARCH_FTYPE (3, (UV8HI, UV8HI, V8HI, USI)) DEF_LARCH_FTYPE (3, (UV4SI, UV4SI, V4SI, USI)) DEF_LARCH_FTYPE (3, (UV2DI, UV2DI, V2DI, USI)) +DEF_LARCH_FTYPE (2, (DI, V8SI, UQI)) +DEF_LARCH_FTYPE (2, (UDI, V8SI, UQI)) + +DEF_LARCH_FTYPE (3, (UV32QI, UV32QI, V32QI, USI)) +DEF_LARCH_FTYPE (3, (UV16HI, UV16HI, V16HI, USI)) +DEF_LARCH_FTYPE (3, (UV8SI, UV8SI, V8SI, USI)) +DEF_LARCH_FTYPE (3, (UV4DI, UV4DI, V4DI, USI)) + +DEF_LARCH_FTYPE(4,(VOID,V32QI,CVPOINTER,SI,UQI)) +DEF_LARCH_FTYPE(4,(VOID,V16HI,CVPOINTER,SI,UQI)) +DEF_LARCH_FTYPE(4,(VOID,V8SI,CVPOINTER,SI,UQI)) +DEF_LARCH_FTYPE(4,(VOID,V4DI,CVPOINTER,SI,UQI)) + DEF_LARCH_FTYPE (1, (BOOLEAN,V16QI)) DEF_LARCH_FTYPE(2,(V16QI,CVPOINTER,CVPOINTER)) DEF_LARCH_FTYPE(3,(VOID,V16QI,CVPOINTER,CVPOINTER)) +DEF_LARCH_FTYPE(2,(V32QI,CVPOINTER,CVPOINTER)) +DEF_LARCH_FTYPE(3,(VOID,V32QI,CVPOINTER,CVPOINTER)) DEF_LARCH_FTYPE (3, (V16QI, V16QI, SI, UQI)) DEF_LARCH_FTYPE (3, (V2DI, V2DI, SI, UQI)) DEF_LARCH_FTYPE (3, (V2DI, V2DI, DI, UQI)) DEF_LARCH_FTYPE (3, (V4SI, V4SI, SI, UQI)) + +DEF_LARCH_FTYPE (2, (V8SF, V8SF, UQI)) +DEF_LARCH_FTYPE (2, (V4DF, V4DF, UQI))
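
[Reviewer aid, not part of the patch.]  To make the tables above easier to audit: each LARCH_<ret>_FTYPE_<args> tag spells out a builtin's prototype (V32QI is a 32 x i8 vector, UV32QI its unsigned counterpart, UQI an unsigned 8-bit immediate operand, and so on), and each LASX_BUILTIN entry registers that prototype so the builtin can be called directly from C.  The snippet below is a minimal sketch under a few assumptions not shown in this hunk: that the builtins are exposed under the __builtin_lasx_ prefix (mirroring the existing __builtin_lsx_ naming), that GNU vector_size(32) types map onto the 256-bit vector modes, and that the code is built with the -mlasx option introduced by this series; the canonical user-facing spellings live in lasxintrin.h rather than here.

/* Illustrative only -- exercises two entries from the table above:
   xvsadd_b (LARCH_V32QI_FTYPE_V32QI_V32QI) and
   xvandi_b (LARCH_UV32QI_FTYPE_UV32QI_UQI).  */

typedef signed char v32i8 __attribute__ ((vector_size (32)));
typedef unsigned char v32u8 __attribute__ ((vector_size (32)));

v32i8
sat_add_bytes (v32i8 a, v32i8 b)
{
  /* Per-lane saturating add of 32 signed bytes.  */
  return __builtin_lasx_xvsadd_b (a, b);
}

v32u8
keep_low_nibbles (v32u8 a)
{
  /* The second operand must be a compile-time constant: the
     CODE_FOR_lasx_xvandi_b arm of loongarch_expand_builtin_insn above
     only performs its special handling for an immediate.  */
  return __builtin_lasx_xvandi_b (a, 0x0f);
}

The immediate-operand builtins (xvandi_b, xvaddi_*, xvslti_*, xvmaxi_*, ...) are the ones routed through the new CODE_FOR_lasx_* cases added to loongarch_expand_builtin_insn, which is where the last argument is required to be a constant and, for the arithmetic/compare forms, checked against its immediate range before a constant vector is generated.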