[{"id":1769547,"web_url":"http://patchwork.ozlabs.org/comment/1769547/","msgid":"<b9d0cfa7-f5a4-3c96-8592-52ee380c381c@linaro.org>","list_archive_url":null,"date":"2017-09-16T02:35:38","subject":"Re: [Qemu-devel] [PATCH v3 0/6] TCG vectorization and example\n\tconversion","submitter":{"id":72104,"url":"http://patchwork.ozlabs.org/api/people/72104/","name":"Richard Henderson","email":"richard.henderson@linaro.org"},"content":"On 09/15/2017 07:34 PM, Richard Henderson wrote:\n> Now addressing the complex vector op issue.  I now expose TCGv_vec\n> to target front-ends, but opaque wrt the vector size.  One can thus\n> compose vector operations, as demonstrated in target/arm/.\n> \n> The actual host vector length now becomes an argument to the *_vec\n> opcodes.  It's a little awkward, but does prevent an explosion of\n> opcode values.\n> \n> All R-b dropped because all patches rewritten or heavily modified.\n\nBah.  Forgot to mention that this depends on tcg-next.  Full tree at\n\n  git://github.com/rth7680/qemu.git native-vector-registers-3\n\n\nr~","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=linaro.org header.i=@linaro.org\n\theader.b=\"UrujEZbu\"; dkim-atps=neutral"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xvGh75RM5z9s3T\n\tfor <incoming@patchwork.ozlabs.org>;\n\tSat, 16 Sep 2017 12:39:51 +1000 (AEST)","from localhost ([::1]:55712 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dt31J-0005QF-SY\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 22:39:49 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:36125)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <richard.henderson@linaro.org>) id 1dt2xO-0002XZ-NO\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 22:35:49 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <richard.henderson@linaro.org>) id 1dt2xK-0008JP-HY\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 22:35:46 -0400","from mail-pg0-x234.google.com ([2607:f8b0:400e:c05::234]:51309)\n\tby eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)\n\t(Exim 4.71) (envelope-from <richard.henderson@linaro.org>)\n\tid 1dt2xK-0008J8-Bd\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 22:35:42 -0400","by mail-pg0-x234.google.com with SMTP id k193so2451933pgc.8\n\tfor <qemu-devel@nongnu.org>; Fri, 15 Sep 2017 19:35:42 -0700 (PDT)","from bigtime.twiddle.net (97-126-103-167.tukw.qwest.net.\n\t[97.126.103.167]) by smtp.gmail.com with ESMTPSA id\n\td80sm116001pfj.170.2017.09.15.19.35.40\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tFri, 15 Sep 2017 19:35:40 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;\n\th=subject:from:to:cc:references:message-id:date:user-agent\n\t:mime-version:in-reply-to:content-language:content-transfer-encoding; \n\tbh=Yt8DvUyIWNQr40yvbipNx4v+C13hLfXHTMpanS4ghUc=;\n\tb=UrujEZbuTx/8ZDhabBNLs4OgV6A+7DAxM/sITSaCdAGAz5PsFaShtT0SpqztlyN3mh\n\tdooBn0nG78SPLOJKWcm2opHULJHRBKD6m1MtCxiuCdKt/+ZIwfNqTs36pf6mlm2CuaEx\n\tBlDrE17JHocXrsw6yoezza1K7CiCGirTOzawM=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:subject:from:to:cc:references:message-id:date\n\t:user-agent:mime-version:in-reply-to:content-language\n\t:content-transfer-encoding;\n\tbh=Yt8DvUyIWNQr40yvbipNx4v+C13hLfXHTMpanS4ghUc=;\n\tb=WPf6XiRrB2ekoVOFMJWrOf0/GPqN1daMAFimAo204zDrr+vXxKd9+b358UF2elV7MC\n\tjlagVBguCa/RgSD9xgwPP4ataNfFXRoExbq5Y5EjXB25tpGFcNhrI1jen4vq+yjkS8Le\n\tquLQWtuBsR1mJal+Oy5czoToyL70UQ+TM8XyaqAyOWJy5bxIMvOmNpx2mPmrdcWlnX2K\n\tlImVJ/04kES6tw37n8UlFvNHcRjL5rqsst6gNb3MHV1FgeWwKSjWXCJJdsRDeIlYMgxA\n\t4So/eKWJirEh1vW18gjyruekc/0wEr1+rH9lw0hyZkhDY6ixbX2p3pTZd45s0VGOzsa7\n\tZ/JQ==","X-Gm-Message-State":"AHPjjUiP0FgSUdoO3v0eDfHHFWgL9onxaic9D1jdoggQnO/2d9q1CJ92\n\toYjjUzHk+G56RIbeRfJVKg==","X-Google-Smtp-Source":"ADKCNb5JAkxxcttJxKtZBZ1Sq++vPC/8dFI+iXiRCQWs/ICn6AFr9iHGFGFy/QgOloWSrRIeLMeaNA==","X-Received":"by 10.99.175.14 with SMTP id w14mr26655653pge.365.1505529341401; \n\tFri, 15 Sep 2017 19:35:41 -0700 (PDT)","From":"Richard Henderson <richard.henderson@linaro.org>","To":"qemu-devel@nongnu.org","References":"<20170916023417.14599-1-richard.henderson@linaro.org>","Message-ID":"<b9d0cfa7-f5a4-3c96-8592-52ee380c381c@linaro.org>","Date":"Fri, 15 Sep 2017 19:35:38 -0700","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170916023417.14599-1-richard.henderson@linaro.org>","Content-Type":"text/plain; charset=utf-8","Content-Language":"en-US","Content-Transfer-Encoding":"7bit","X-detected-operating-system":"by eggs.gnu.org: Genre and OS details not\n\trecognized.","X-Received-From":"2607:f8b0:400e:c05::234","Subject":"Re: [Qemu-devel] [PATCH v3 0/6] TCG vectorization and example\n\tconversion","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"alex.bennee@linaro.org, f4bug@amsat.org","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1775806,"web_url":"http://patchwork.ozlabs.org/comment/1775806/","msgid":"<87shf9cl9r.fsf@linaro.org>","list_archive_url":null,"date":"2017-09-26T19:28:16","subject":"Re: [Qemu-devel] [PATCH v3 1/6] tcg: Add types and operations for\n\thost vectors","submitter":{"id":39532,"url":"http://patchwork.ozlabs.org/api/people/39532/","name":"Alex Bennée","email":"alex.bennee@linaro.org"},"content":"Richard Henderson <richard.henderson@linaro.org> writes:\n\n> Nothing uses or enables them yet.\n>\n> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>\n> ---\n>  tcg/tcg-op.h  |  26 +++++++\n>  tcg/tcg-opc.h |  37 ++++++++++\n>  tcg/tcg.h     |  34 +++++++++\n>  tcg/tcg-op.c  | 234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n>  tcg/tcg.c     |  77 ++++++++++++++++++-\n>  tcg/README    |  46 ++++++++++++\n>  6 files changed, 453 insertions(+), 1 deletion(-)\n>\n> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h\n> index 5d3278f243..b9b0b9f46f 100644\n> --- a/tcg/tcg-op.h\n> +++ b/tcg/tcg-op.h\n> @@ -915,6 +915,32 @@ void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);\n>  void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);\n>  void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);\n>\n> +void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);\n> +void tcg_gen_movi_vec(TCGv_vec, tcg_target_long);\n> +void tcg_gen_add8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_add16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_add32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_add64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_sub8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_sub16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_sub32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_sub64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_and_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_or_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_xor_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_andc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_orc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);\n> +void tcg_gen_not_vec(TCGv_vec r, TCGv_vec a);\n> +void tcg_gen_neg8_vec(TCGv_vec r, TCGv_vec a);\n> +void tcg_gen_neg16_vec(TCGv_vec r, TCGv_vec a);\n> +void tcg_gen_neg32_vec(TCGv_vec r, TCGv_vec a);\n> +void tcg_gen_neg64_vec(TCGv_vec r, TCGv_vec a);\n> +\n> +void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);\n> +void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);\n> +void tcg_gen_ldz_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType sz);\n> +void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType sz);\n> +\n>  #if TARGET_LONG_BITS == 64\n>  #define tcg_gen_movi_tl tcg_gen_movi_i64\n>  #define tcg_gen_mov_tl tcg_gen_mov_i64\n> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h\n> index 956fb1e9f3..8200184fa9 100644\n> --- a/tcg/tcg-opc.h\n> +++ b/tcg/tcg-opc.h\n> @@ -204,8 +204,45 @@ DEF(qemu_ld_i64, DATA64_ARGS, TLADDR_ARGS, 1,\n>  DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,\n>      TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT)\n>\n> +/* Host vector support.  */\n> +\n> +#define IMPLVEC  \\\n> +    IMPL(TCG_TARGET_HAS_v64 | TCG_TARGET_HAS_v128 | TCG_TARGET_HAS_v256)\n> +\n> +DEF(mov_vec, 1, 1, 1, TCG_OPF_NOT_PRESENT)\n> +\n> +/* ??? Simple, but perhaps dupiN would be more descriptive.  */\n> +DEF(movi_vec, 1, 0, 2, TCG_OPF_NOT_PRESENT)\n> +\n> +DEF(ld_vec, 1, 1, 2, IMPLVEC)\n> +DEF(ldz_vec, 1, 1, 3, IMPLVEC)\n> +DEF(st_vec, 0, 2, 2, IMPLVEC)\n> +\n> +DEF(add8_vec, 1, 2, 1, IMPLVEC)\n> +DEF(add16_vec, 1, 2, 1, IMPLVEC)\n> +DEF(add32_vec, 1, 2, 1, IMPLVEC)\n> +DEF(add64_vec, 1, 2, 1, IMPLVEC)\n> +\n> +DEF(sub8_vec, 1, 2, 1, IMPLVEC)\n> +DEF(sub16_vec, 1, 2, 1, IMPLVEC)\n> +DEF(sub32_vec, 1, 2, 1, IMPLVEC)\n> +DEF(sub64_vec, 1, 2, 1, IMPLVEC)\n> +\n> +DEF(neg8_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))\n> +DEF(neg16_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))\n> +DEF(neg32_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))\n> +DEF(neg64_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))\n> +\n> +DEF(and_vec, 1, 2, 1, IMPLVEC)\n> +DEF(or_vec, 1, 2, 1, IMPLVEC)\n> +DEF(xor_vec, 1, 2, 1, IMPLVEC)\n> +DEF(andc_vec, 1, 2, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec))\n> +DEF(orc_vec, 1, 2, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec))\n> +DEF(not_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec))\n> +\n>  #undef TLADDR_ARGS\n>  #undef DATA64_ARGS\n>  #undef IMPL\n>  #undef IMPL64\n> +#undef IMPLVEC\n>  #undef DEF\n> diff --git a/tcg/tcg.h b/tcg/tcg.h\n> index 25662c36d4..7cd356e87f 100644\n> --- a/tcg/tcg.h\n> +++ b/tcg/tcg.h\n> @@ -173,6 +173,16 @@ typedef uint64_t TCGRegSet;\n>  # error \"Missing unsigned widening multiply\"\n>  #endif\n>\n> +#ifndef TCG_TARGET_HAS_v64\n> +#define TCG_TARGET_HAS_v64              0\n> +#define TCG_TARGET_HAS_v128             0\n> +#define TCG_TARGET_HAS_v256             0\n> +#define TCG_TARGET_HAS_neg_vec          0\n> +#define TCG_TARGET_HAS_not_vec          0\n> +#define TCG_TARGET_HAS_andc_vec         0\n> +#define TCG_TARGET_HAS_orc_vec          0\n> +#endif\n> +\n>  #ifndef TARGET_INSN_START_EXTRA_WORDS\n>  # define TARGET_INSN_START_WORDS 1\n>  #else\n> @@ -249,6 +259,11 @@ typedef struct TCGPool {\n>  typedef enum TCGType {\n>      TCG_TYPE_I32,\n>      TCG_TYPE_I64,\n> +\n> +    TCG_TYPE_V64,\n> +    TCG_TYPE_V128,\n> +    TCG_TYPE_V256,\n> +\n>      TCG_TYPE_COUNT, /* number of different types */\n>\n>      /* An alias for the size of the host register.  */\n> @@ -399,6 +414,8 @@ typedef tcg_target_ulong TCGArg;\n>      * TCGv_i32 : 32 bit integer type\n>      * TCGv_i64 : 64 bit integer type\n>      * TCGv_ptr : a host pointer type\n> +    * TCGv_vec : a host vector type; the exact size is not exposed\n> +                 to the CPU front-end code.\n\nIsn't this a guest vector type (which is pointed to by a host pointer)?\n\n>      * TCGv : an integer type the same size as target_ulong\n>               (an alias for either TCGv_i32 or TCGv_i64)\n>     The compiler's type checking will complain if you mix them\n> @@ -424,6 +441,7 @@ typedef tcg_target_ulong TCGArg;\n>  typedef struct TCGv_i32_d *TCGv_i32;\n>  typedef struct TCGv_i64_d *TCGv_i64;\n>  typedef struct TCGv_ptr_d *TCGv_ptr;\n> +typedef struct TCGv_vec_d *TCGv_vec;\n>  typedef TCGv_ptr TCGv_env;\n>  #if TARGET_LONG_BITS == 32\n>  #define TCGv TCGv_i32\n> @@ -448,6 +466,11 @@ static inline TCGv_ptr QEMU_ARTIFICIAL MAKE_TCGV_PTR(intptr_t i)\n>      return (TCGv_ptr)i;\n>  }\n>\n> +static inline TCGv_vec QEMU_ARTIFICIAL MAKE_TCGV_VEC(intptr_t i)\n> +{\n> +    return (TCGv_vec)i;\n> +}\n> +\n>  static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_I32(TCGv_i32 t)\n>  {\n>      return (intptr_t)t;\n> @@ -463,6 +486,11 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)\n>      return (intptr_t)t;\n>  }\n>\n> +static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_VEC(TCGv_vec t)\n> +{\n> +    return (intptr_t)t;\n> +}\n> +\n>  #if TCG_TARGET_REG_BITS == 32\n>  #define TCGV_LOW(t) MAKE_TCGV_I32(GET_TCGV_I64(t))\n>  #define TCGV_HIGH(t) MAKE_TCGV_I32(GET_TCGV_I64(t) + 1)\n> @@ -471,15 +499,18 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)\n>  #define TCGV_EQUAL_I32(a, b) (GET_TCGV_I32(a) == GET_TCGV_I32(b))\n>  #define TCGV_EQUAL_I64(a, b) (GET_TCGV_I64(a) == GET_TCGV_I64(b))\n>  #define TCGV_EQUAL_PTR(a, b) (GET_TCGV_PTR(a) == GET_TCGV_PTR(b))\n> +#define TCGV_EQUAL_VEC(a, b) (GET_TCGV_VEC(a) == GET_TCGV_VEC(b))\n>\n>  /* Dummy definition to avoid compiler warnings.  */\n>  #define TCGV_UNUSED_I32(x) x = MAKE_TCGV_I32(-1)\n>  #define TCGV_UNUSED_I64(x) x = MAKE_TCGV_I64(-1)\n>  #define TCGV_UNUSED_PTR(x) x = MAKE_TCGV_PTR(-1)\n> +#define TCGV_UNUSED_VEC(x) x = MAKE_TCGV_VEC(-1)\n>\n>  #define TCGV_IS_UNUSED_I32(x) (GET_TCGV_I32(x) == -1)\n>  #define TCGV_IS_UNUSED_I64(x) (GET_TCGV_I64(x) == -1)\n>  #define TCGV_IS_UNUSED_PTR(x) (GET_TCGV_PTR(x) == -1)\n> +#define TCGV_IS_UNUSED_VEC(x) (GET_TCGV_VEC(x) == -1)\n>\n>  /* call flags */\n>  /* Helper does not read globals (either directly or through an exception). It\n> @@ -790,9 +821,12 @@ TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name);\n>\n>  TCGv_i32 tcg_temp_new_internal_i32(int temp_local);\n>  TCGv_i64 tcg_temp_new_internal_i64(int temp_local);\n> +TCGv_vec tcg_temp_new_vec(TCGType type);\n> +TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match);\n>\n>  void tcg_temp_free_i32(TCGv_i32 arg);\n>  void tcg_temp_free_i64(TCGv_i64 arg);\n> +void tcg_temp_free_vec(TCGv_vec arg);\n>\n>  static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,\n>                                                const char *name)\n> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c\n> index 688d91755b..50b3177e5f 100644\n> --- a/tcg/tcg-op.c\n> +++ b/tcg/tcg-op.c\n> @@ -3072,3 +3072,237 @@ static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)\n>  GEN_ATOMIC_HELPER(xchg, mov2, 0)\n>\n>  #undef GEN_ATOMIC_HELPER\n> +\n> +static void tcg_gen_op2_vec(TCGOpcode opc, TCGv_vec r, TCGv_vec a)\n> +{\n> +    TCGArg ri = GET_TCGV_VEC(r);\n> +    TCGArg ai = GET_TCGV_VEC(a);\n> +    TCGTemp *rt = &tcg_ctx.temps[ri];\n> +    TCGTemp *at = &tcg_ctx.temps[ai];\n> +    TCGType type = rt->base_type;\n> +\n> +    tcg_debug_assert(at->base_type == type);\n> +    tcg_gen_op3(&tcg_ctx, opc, ri, ai, type - TCG_TYPE_V64);\n> +}\n> +\n> +static void tcg_gen_op3_vec(TCGOpcode opc, TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    TCGArg ri = GET_TCGV_VEC(r);\n> +    TCGArg ai = GET_TCGV_VEC(a);\n> +    TCGArg bi = GET_TCGV_VEC(b);\n> +    TCGTemp *rt = &tcg_ctx.temps[ri];\n> +    TCGTemp *at = &tcg_ctx.temps[ai];\n> +    TCGTemp *bt = &tcg_ctx.temps[bi];\n> +    TCGType type = rt->base_type;\n> +\n> +    tcg_debug_assert(at->base_type == type);\n> +    tcg_debug_assert(bt->base_type == type);\n> +    tcg_gen_op4(&tcg_ctx, opc, ri, ai, bi, type - TCG_TYPE_V64);\n> +}\n> +\n> +void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)\n> +{\n> +    if (!TCGV_EQUAL_VEC(r, a)) {\n> +        tcg_gen_op2_vec(INDEX_op_mov_vec, r, a);\n> +    }\n> +}\n> +\n> +void tcg_gen_movi_vec(TCGv_vec r, tcg_target_long a)\n> +{\n> +    TCGArg ri = GET_TCGV_VEC(r);\n> +    TCGTemp *rt = &tcg_ctx.temps[ri];\n> +    TCGType type = rt->base_type;\n> +\n> +    tcg_debug_assert(a == 0 || a == -1);\n> +    tcg_gen_op3(&tcg_ctx, INDEX_op_movi_vec, ri, a, type - TCG_TYPE_V64);\n> +}\n> +\n> +void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr b, TCGArg o)\n> +{\n> +    TCGArg ri = GET_TCGV_VEC(r);\n> +    TCGArg bi = GET_TCGV_PTR(b);\n> +    TCGTemp *rt = &tcg_ctx.temps[ri];\n> +    TCGType type = rt->base_type;\n> +\n> +    tcg_gen_op4(&tcg_ctx, INDEX_op_ld_vec, ri, bi, o, type - TCG_TYPE_V64);\n> +}\n> +\n> +void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr b, TCGArg o)\n> +{\n> +    TCGArg ri = GET_TCGV_VEC(r);\n> +    TCGArg bi = GET_TCGV_PTR(b);\n> +    TCGTemp *rt = &tcg_ctx.temps[ri];\n> +    TCGType type = rt->base_type;\n> +\n> +    tcg_gen_op4(&tcg_ctx, INDEX_op_st_vec, ri, bi, o, type - TCG_TYPE_V64);\n> +}\n> +\n> +/* Load data into a vector R from B+O using TYPE.  If R is wider than TYPE,\n> +   fill the high bits with zeros.  */\n> +void tcg_gen_ldz_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType type)\n> +{\n> +    TCGArg ri = GET_TCGV_VEC(r);\n> +    TCGArg bi = GET_TCGV_PTR(b);\n> +    TCGTemp *rt = &tcg_ctx.temps[ri];\n> +    TCGType btype = rt->base_type;\n> +\n> +    if (type < btype) {\n> +        tcg_gen_op5(&tcg_ctx, INDEX_op_ldz_vec, ri, bi, o,\n> +                    type - TCG_TYPE_V64, btype - TCG_TYPE_V64);\n> +    } else {\n> +        tcg_debug_assert(type == btype);\n> +        tcg_gen_op4(&tcg_ctx, INDEX_op_ld_vec, ri, bi, o, type - TCG_TYPE_V64);\n> +    }\n> +}\n> +\n> +/* Store data from vector R into B+O using TYPE.  If R is wider than TYPE,\n> +   store only the low bits.  */\n> +void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType type)\n> +{\n> +    TCGArg ri = GET_TCGV_VEC(r);\n> +    TCGArg bi = GET_TCGV_PTR(b);\n> +    TCGTemp *rt = &tcg_ctx.temps[ri];\n> +    TCGType btype = rt->base_type;\n> +\n> +    tcg_debug_assert(type <= btype);\n> +    tcg_gen_op4(&tcg_ctx, INDEX_op_st_vec, ri, bi, o, type - TCG_TYPE_V64);\n> +}\n> +\n> +void tcg_gen_add8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_add8_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_add16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_add16_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_add32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_add32_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_add64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_add64_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_sub8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_sub8_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_sub16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_sub16_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_sub32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_sub32_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_sub64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_sub64_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_and_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_and_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_or_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_or_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_xor_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    tcg_gen_op3_vec(INDEX_op_xor_vec, r, a, b);\n> +}\n> +\n> +void tcg_gen_andc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    if (TCG_TARGET_HAS_andc_vec) {\n> +        tcg_gen_op3_vec(INDEX_op_andc_vec, r, a, b);\n> +    } else {\n> +        TCGv_vec t = tcg_temp_new_vec_matching(r);\n> +        tcg_gen_not_vec(t, b);\n> +        tcg_gen_and_vec(r, a, t);\n> +        tcg_temp_free_vec(t);\n> +    }\n> +}\n> +\n> +void tcg_gen_orc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)\n> +{\n> +    if (TCG_TARGET_HAS_orc_vec) {\n> +        tcg_gen_op3_vec(INDEX_op_orc_vec, r, a, b);\n> +    } else {\n> +        TCGv_vec t = tcg_temp_new_vec_matching(r);\n> +        tcg_gen_not_vec(t, b);\n> +        tcg_gen_or_vec(r, a, t);\n> +        tcg_temp_free_vec(t);\n> +    }\n> +}\n> +\n> +void tcg_gen_not_vec(TCGv_vec r, TCGv_vec a)\n> +{\n> +    if (TCG_TARGET_HAS_not_vec) {\n> +        tcg_gen_op2_vec(INDEX_op_orc_vec, r, a);\n> +    } else {\n> +        TCGv_vec t = tcg_temp_new_vec_matching(r);\n> +        tcg_gen_movi_vec(t, -1);\n> +        tcg_gen_xor_vec(r, a, t);\n> +        tcg_temp_free_vec(t);\n> +    }\n> +}\n> +\n> +void tcg_gen_neg8_vec(TCGv_vec r, TCGv_vec a)\n> +{\n> +    if (TCG_TARGET_HAS_neg_vec) {\n> +        tcg_gen_op2_vec(INDEX_op_neg8_vec, r, a);\n> +    } else {\n> +        TCGv_vec t = tcg_temp_new_vec_matching(r);\n> +        tcg_gen_movi_vec(t, 0);\n> +        tcg_gen_sub8_vec(r, t, a);\n> +        tcg_temp_free_vec(t);\n> +    }\n> +}\n> +\n> +void tcg_gen_neg16_vec(TCGv_vec r, TCGv_vec a)\n> +{\n> +    if (TCG_TARGET_HAS_neg_vec) {\n> +        tcg_gen_op2_vec(INDEX_op_neg16_vec, r, a);\n> +    } else {\n> +        TCGv_vec t = tcg_temp_new_vec_matching(r);\n> +        tcg_gen_movi_vec(t, 0);\n> +        tcg_gen_sub16_vec(r, t, a);\n> +        tcg_temp_free_vec(t);\n> +    }\n> +}\n> +\n> +void tcg_gen_neg32_vec(TCGv_vec r, TCGv_vec a)\n> +{\n> +    if (TCG_TARGET_HAS_neg_vec) {\n> +        tcg_gen_op2_vec(INDEX_op_neg32_vec, r, a);\n> +    } else {\n> +        TCGv_vec t = tcg_temp_new_vec_matching(r);\n> +        tcg_gen_movi_vec(t, 0);\n> +        tcg_gen_sub32_vec(r, t, a);\n> +        tcg_temp_free_vec(t);\n> +    }\n> +}\n> +\n> +void tcg_gen_neg64_vec(TCGv_vec r, TCGv_vec a)\n> +{\n> +    if (TCG_TARGET_HAS_neg_vec) {\n> +        tcg_gen_op2_vec(INDEX_op_neg64_vec, r, a);\n> +    } else {\n> +        TCGv_vec t = tcg_temp_new_vec_matching(r);\n> +        tcg_gen_movi_vec(t, 0);\n> +        tcg_gen_sub64_vec(r, t, a);\n> +        tcg_temp_free_vec(t);\n> +    }\n> +}\n> diff --git a/tcg/tcg.c b/tcg/tcg.c\n> index dff9999bc6..a4d55efdf0 100644\n> --- a/tcg/tcg.c\n> +++ b/tcg/tcg.c\n> @@ -116,7 +116,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,\n>  static bool tcg_out_ldst_finalize(TCGContext *s);\n>  #endif\n>\n> -static TCGRegSet tcg_target_available_regs[2];\n> +static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT];\n>  static TCGRegSet tcg_target_call_clobber_regs;\n>\n>  #if TCG_TARGET_INSN_UNIT_SIZE == 1\n> @@ -664,6 +664,44 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local)\n>      return MAKE_TCGV_I64(idx);\n>  }\n>\n> +TCGv_vec tcg_temp_new_vec(TCGType type)\n> +{\n> +    int idx;\n> +\n> +#ifdef CONFIG_DEBUG_TCG\n> +    switch (type) {\n> +    case TCG_TYPE_V64:\n> +        assert(TCG_TARGET_HAS_v64);\n> +        break;\n> +    case TCG_TYPE_V128:\n> +        assert(TCG_TARGET_HAS_v128);\n> +        break;\n> +    case TCG_TYPE_V256:\n> +        assert(TCG_TARGET_HAS_v256);\n> +        break;\n> +    default:\n> +        g_assert_not_reached();\n> +    }\n> +#endif\n> +\n> +    idx = tcg_temp_new_internal(type, 0);\n> +    return MAKE_TCGV_VEC(idx);\n> +}\n> +\n\nA one line comment wouldn't go amiss here. This looks like we are\nallocating a new temp of the same type as an existing temp?\n\n> +TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match)\n> +{\n> +    TCGContext *s = &tcg_ctx;\n> +    int idx = GET_TCGV_VEC(match);\n> +    TCGTemp *ts;\n> +\n> +    tcg_debug_assert(idx >= s->nb_globals && idx < s->nb_temps);\n> +    ts = &s->temps[idx];\n> +    tcg_debug_assert(ts->temp_allocated != 0);\n> +\n> +    idx = tcg_temp_new_internal(ts->base_type, 0);\n> +    return MAKE_TCGV_VEC(idx);\n> +}\n> +\n>  static void tcg_temp_free_internal(int idx)\n>  {\n>      TCGContext *s = &tcg_ctx;\n> @@ -696,6 +734,11 @@ void tcg_temp_free_i64(TCGv_i64 arg)\n>      tcg_temp_free_internal(GET_TCGV_I64(arg));\n>  }\n>\n> +void tcg_temp_free_vec(TCGv_vec arg)\n> +{\n> +    tcg_temp_free_internal(GET_TCGV_VEC(arg));\n> +}\n> +\n>  TCGv_i32 tcg_const_i32(int32_t val)\n>  {\n>      TCGv_i32 t0;\n> @@ -753,6 +796,9 @@ int tcg_check_temp_count(void)\n>     Test the runtime variable that controls each opcode.  */\n>  bool tcg_op_supported(TCGOpcode op)\n>  {\n> +    const bool have_vec\n> +        = TCG_TARGET_HAS_v64 | TCG_TARGET_HAS_v128 | TCG_TARGET_HAS_v256;\n> +\n>      switch (op) {\n>      case INDEX_op_discard:\n>      case INDEX_op_set_label:\n> @@ -966,6 +1012,35 @@ bool tcg_op_supported(TCGOpcode op)\n>      case INDEX_op_mulsh_i64:\n>          return TCG_TARGET_HAS_mulsh_i64;\n>\n> +    case INDEX_op_mov_vec:\n> +    case INDEX_op_movi_vec:\n> +    case INDEX_op_ld_vec:\n> +    case INDEX_op_ldz_vec:\n> +    case INDEX_op_st_vec:\n> +    case INDEX_op_add8_vec:\n> +    case INDEX_op_add16_vec:\n> +    case INDEX_op_add32_vec:\n> +    case INDEX_op_add64_vec:\n> +    case INDEX_op_sub8_vec:\n> +    case INDEX_op_sub16_vec:\n> +    case INDEX_op_sub32_vec:\n> +    case INDEX_op_sub64_vec:\n> +    case INDEX_op_and_vec:\n> +    case INDEX_op_or_vec:\n> +    case INDEX_op_xor_vec:\n> +        return have_vec;\n> +    case INDEX_op_not_vec:\n> +        return have_vec && TCG_TARGET_HAS_not_vec;\n> +    case INDEX_op_neg8_vec:\n> +    case INDEX_op_neg16_vec:\n> +    case INDEX_op_neg32_vec:\n> +    case INDEX_op_neg64_vec:\n> +        return have_vec && TCG_TARGET_HAS_neg_vec;\n> +    case INDEX_op_andc_vec:\n> +        return have_vec && TCG_TARGET_HAS_andc_vec;\n> +    case INDEX_op_orc_vec:\n> +        return have_vec && TCG_TARGET_HAS_orc_vec;\n> +\n>      case NB_OPS:\n>          break;\n>      }\n> diff --git a/tcg/README b/tcg/README\n> index 03bfb6acd4..3bf3af67db 100644\n> --- a/tcg/README\n> +++ b/tcg/README\n> @@ -503,6 +503,52 @@ of the memory access.\n>  For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a\n>  64-bit memory access specified in flags.\n>\n> +********* Host vector operations\n> +\n> +All of the vector ops have a final constant argument that specifies the\n> +length of the vector operation LEN as 64 << LEN bits.\n\nThat doesn't scan well. So would a 4 lane operation be encoded as 64 <<\n4? Is this because we are using the bottom bits for something?\n\n> +\n> +* mov_vec   v0, v1, len\n> +* ld_vec    v0, t1, len\n> +* st_vec    v0, t1, len\n> +\n> +  Move, load and store.\n> +\n> +* movi_vec  v0, c, len\n> +\n> +  Copy C across the entire vector.\n> +  At present the only supported values for C are 0 and -1.\n\nI guess this is why the size in unimportant? This is for clearing or\nsetting the whole of the vector? What does len mean in this case?\n\n> +\n> +* add8_vec    v0, v1, v2, len\n> +* add16_vec   v0, v1, v2, len\n> +* add32_vec   v0, v1, v2, len\n> +* add64_vec   v0, v1, v2, len\n> +\n> +  v0 = v1 + v2, in elements of 8/16/32/64 bits, across len.\n> +\n> +* sub8_vec    v0, v1, v2, len\n> +* sub16_vec   v0, v1, v2, len\n> +* sub32_vec   v0, v1, v2, len\n> +* sub64_vec   v0, v1, v2, len\n> +\n> +  Similarly, v0 = v1 - v2.\n> +\n> +* neg8_vec    v0, v1, len\n> +* neg16_vec   v0, v1, len\n> +* neg32_vec   v0, v1, len\n> +* neg64_vec   v0, v1, len\n> +\n> +  Similarly, v0 = -v1.\n> +\n> +* and_vec     v0, v1, v2, len\n> +* or_vec      v0, v1, v2, len\n> +* xor_vec     v0, v1, v2, len\n> +* andc_vec    v0, v1, v2, len\n> +* orc_vec     v0, v1, v2, len\n> +* not_vec     v0, v1, len\n> +\n> +  Similarly, logical operations.\n\nSimilarly, logical operations with and without compliment?\n\n> +\n>  *********\n>\n>  Note 1: Some shortcuts are defined when the last operand is known to be\n\n\n--\nAlex Bennée","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=linaro.org header.i=@linaro.org\n\theader.b=\"KN1yzDPH\"; dkim-atps=neutral"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3y1rbx0LfZz9t3m\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 27 Sep 2017 05:29:00 +1000 (AEST)","from localhost ([::1]:50869 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dwvXN-0000l3-E5\n\tfor incoming@patchwork.ozlabs.org; Tue, 26 Sep 2017 15:28:57 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:33127)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <alex.bennee@linaro.org>) id 1dwvWq-0000hR-N3\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 15:28:27 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <alex.bennee@linaro.org>) id 1dwvWm-00040x-Jw\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 15:28:24 -0400","from mail-wm0-x235.google.com ([2a00:1450:400c:c09::235]:46577)\n\tby eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)\n\t(Exim 4.71) (envelope-from <alex.bennee@linaro.org>)\n\tid 1dwvWm-00040i-7c\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 15:28:20 -0400","by mail-wm0-x235.google.com with SMTP id m72so11289328wmc.1\n\tfor <qemu-devel@nongnu.org>; Tue, 26 Sep 2017 12:28:19 -0700 (PDT)","from zen.linaro.local ([81.128.185.34])\n\tby smtp.gmail.com with ESMTPSA id\n\tp59sm14260904wrc.75.2017.09.26.12.28.17\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tTue, 26 Sep 2017 12:28:17 -0700 (PDT)","from zen (localhost [127.0.0.1])\n\tby zen.linaro.local (Postfix) with ESMTPS id E16FC3E0137;\n\tTue, 26 Sep 2017 20:28:16 +0100 (BST)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;\n\th=references:user-agent:from:to:cc:subject:in-reply-to:date\n\t:message-id:mime-version:content-transfer-encoding;\n\tbh=HbB+JlWMraOfvvC7UdPAjijajZXsFTD+MJsAPEhyio8=;\n\tb=KN1yzDPHbpO214id4NuOvhgRj66jTK/OayqxwHy69GDKjA7G56wW5h2L7Eh+0DFENZ\n\t/nGnwekc4iOP7S2WmXRNWZViUA1pfth6WwqeFCBSnjEp8a9kp9obYep4GCv8JSDY3Umv\n\t1Q1VDmxu5MD0WW6KjxW/0HACsUnZp2DlLL1rs=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:references:user-agent:from:to:cc:subject\n\t:in-reply-to:date:message-id:mime-version:content-transfer-encoding; \n\tbh=HbB+JlWMraOfvvC7UdPAjijajZXsFTD+MJsAPEhyio8=;\n\tb=HqoBHuJo463baozNDSnZBBDTvxDHz6pqHIaXrnkB9P4nR/Zwz0Qjj5OMiFAo/Dpsqa\n\tOkzKI1Dp9HBNZRiCPJw7Ik7+gx1psr7P6MjcL92KR3V7B2gfdoPfDVA+Un9Iowl7przm\n\telWtGL336cvoguYh7ghtTqglyK1jWxScTLTx/wmnhSe0l0uaVz0Kf+No6BESFJUvJ6X/\n\tVs0NXdN8Tv7bjZ23iYbyXu42+DZLn6n4SUbZFepuOhA1b4KT9zxULDjfBT7RoWiBAHk5\n\tW0abqaufqyaPuxf1zjiiiR6afBI7Bp9B6PyKJIz4HJM4h9m1b0D1sTko9RBk3xujhk5D\n\tXW6w==","X-Gm-Message-State":"AHPjjUgL6KcnLHIyFATcjbf43jqiBN/NbCPnp37M6b0AnXgko7iou4bX\n\tLIeULyoN/tjvbJaK4NGvNZOYcJeaZeM=","X-Google-Smtp-Source":"AOwi7QAbg3GfUca2/mAozHyyve2kbSLZtyaqImf25XerSnIwyAoWgcwKL2l+P9D+6xkJEZ3jd4IjEw==","X-Received":"by 10.28.227.68 with SMTP id a65mr4699661wmh.88.1506454098416;\n\tTue, 26 Sep 2017 12:28:18 -0700 (PDT)","References":"<20170916023417.14599-1-richard.henderson@linaro.org>\n\t<20170916023417.14599-2-richard.henderson@linaro.org>","User-agent":"mu4e 0.9.19; emacs 26.0.60","From":"Alex =?utf-8?q?Benn=C3=A9e?= <alex.bennee@linaro.org>","To":"Richard Henderson <richard.henderson@linaro.org>","In-reply-to":"<20170916023417.14599-2-richard.henderson@linaro.org>","Date":"Tue, 26 Sep 2017 20:28:16 +0100","Message-ID":"<87shf9cl9r.fsf@linaro.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Transfer-Encoding":"8bit","X-detected-operating-system":"by eggs.gnu.org: Genre and OS details not\n\trecognized.","X-Received-From":"2a00:1450:400c:c09::235","Subject":"Re: [Qemu-devel] [PATCH v3 1/6] tcg: Add types and operations for\n\thost vectors","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, f4bug@amsat.org","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1775924,"web_url":"http://patchwork.ozlabs.org/comment/1775924/","msgid":"<87r2utccs6.fsf@linaro.org>","list_archive_url":null,"date":"2017-09-26T22:31:37","subject":"Re: [Qemu-devel] [PATCH v3 2/6] tcg: Add vector expanders","submitter":{"id":39532,"url":"http://patchwork.ozlabs.org/api/people/39532/","name":"Alex Bennée","email":"alex.bennee@linaro.org"},"content":"Richard Henderson <richard.henderson@linaro.org> writes:\n\n> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>\n\nOther than live comments:\n\nReviewed-by: Alex Bennée <alex.bennee@linaro.org>\n\n> ---\n>  Makefile.target              |   2 +-\n>  accel/tcg/tcg-runtime.h      |  24 ++\n>  tcg/tcg-gvec-desc.h          |  49 +++\n>  tcg/tcg-op-gvec.h            | 143 ++++++++\n>  accel/tcg/tcg-runtime-gvec.c | 255 +++++++++++++\n>  tcg/tcg-op-gvec.c            | 853 +++++++++++++++++++++++++++++++++++++++++++\n>  accel/tcg/Makefile.objs      |   2 +-\n>  7 files changed, 1326 insertions(+), 2 deletions(-)\n>  create mode 100644 tcg/tcg-gvec-desc.h\n>  create mode 100644 tcg/tcg-op-gvec.h\n>  create mode 100644 accel/tcg/tcg-runtime-gvec.c\n>  create mode 100644 tcg/tcg-op-gvec.c\n>\n> diff --git a/Makefile.target b/Makefile.target\n> index 6361f957fb..f9967feef5 100644\n> --- a/Makefile.target\n> +++ b/Makefile.target\n> @@ -94,7 +94,7 @@ all: $(PROGS) stap\n>  obj-y += exec.o\n>  obj-y += accel/\n>  obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o\n> -obj-$(CONFIG_TCG) += tcg/tcg-common.o\n> +obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/tcg-op-gvec.o\n>  obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o\n>  obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o\n>  obj-y += fpu/softfloat.o\n> diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h\n> index c41d38a557..61c0ce39d3 100644\n> --- a/accel/tcg/tcg-runtime.h\n> +++ b/accel/tcg/tcg-runtime.h\n> @@ -134,3 +134,27 @@ GEN_ATOMIC_HELPERS(xor_fetch)\n>  GEN_ATOMIC_HELPERS(xchg)\n>\n>  #undef GEN_ATOMIC_HELPERS\n> +\n> +DEF_HELPER_FLAGS_3(gvec_mov, TCG_CALL_NO_RWG, void, ptr, ptr, i32)\n> +\n> +DEF_HELPER_FLAGS_4(gvec_add8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_add16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_add32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_add64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +\n> +DEF_HELPER_FLAGS_4(gvec_sub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +\n> +DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_3(gvec_neg64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)\n> +\n> +DEF_HELPER_FLAGS_3(gvec_not, TCG_CALL_NO_RWG, void, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_and, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> +DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)\n> diff --git a/tcg/tcg-gvec-desc.h b/tcg/tcg-gvec-desc.h\n> new file mode 100644\n> index 0000000000..8ba9a8168d\n> --- /dev/null\n> +++ b/tcg/tcg-gvec-desc.h\n> @@ -0,0 +1,49 @@\n> +/*\n> + *  Generic vector operation descriptor\n> + *\n> + *  Copyright (c) 2017 Linaro\n> + *\n> + * This library is free software; you can redistribute it and/or\n> + * modify it under the terms of the GNU Lesser General Public\n> + * License as published by the Free Software Foundation; either\n> + * version 2 of the License, or (at your option) any later version.\n> + *\n> + * This library is distributed in the hope that it will be useful,\n> + * but WITHOUT ANY WARRANTY; without even the implied warranty of\n> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n> + * Lesser General Public License for more details.\n> + *\n> + * You should have received a copy of the GNU Lesser General Public\n> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.\n> + */\n> +\n> +/* ??? These bit widths are set for ARM SVE, maxing out at 256 byte vectors. */\n> +#define SIMD_OPRSZ_SHIFT   0\n> +#define SIMD_OPRSZ_BITS    5\n> +\n> +#define SIMD_MAXSZ_SHIFT   (SIMD_OPRSZ_SHIFT + SIMD_OPRSZ_BITS)\n> +#define SIMD_MAXSZ_BITS    5\n> +\n> +#define SIMD_DATA_SHIFT    (SIMD_MAXSZ_SHIFT + SIMD_MAXSZ_BITS)\n> +#define SIMD_DATA_BITS     (32 - SIMD_DATA_SHIFT)\n> +\n> +/* Create a descriptor from components.  */\n> +uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data);\n> +\n> +/* Extract the operation size from a descriptor.  */\n> +static inline intptr_t simd_oprsz(uint32_t desc)\n> +{\n> +    return (extract32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS) + 1) * 8;\n> +}\n> +\n> +/* Extract the max vector size from a descriptor.  */\n> +static inline intptr_t simd_maxsz(uint32_t desc)\n> +{\n> +    return (extract32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS) + 1) * 8;\n> +}\n> +\n> +/* Extract the operation-specific data from a descriptor.  */\n> +static inline int32_t simd_data(uint32_t desc)\n> +{\n> +    return sextract32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS);\n> +}\n> diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h\n> new file mode 100644\n> index 0000000000..28bd77f1dc\n> --- /dev/null\n> +++ b/tcg/tcg-op-gvec.h\n> @@ -0,0 +1,143 @@\n> +/*\n> + *  Generic vector operation expansion\n> + *\n> + *  Copyright (c) 2017 Linaro\n> + *\n> + * This library is free software; you can redistribute it and/or\n> + * modify it under the terms of the GNU Lesser General Public\n> + * License as published by the Free Software Foundation; either\n> + * version 2 of the License, or (at your option) any later version.\n> + *\n> + * This library is distributed in the hope that it will be useful,\n> + * but WITHOUT ANY WARRANTY; without even the implied warranty of\n> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n> + * Lesser General Public License for more details.\n> + *\n> + * You should have received a copy of the GNU Lesser General Public\n> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.\n> + */\n> +\n> +/*\n> + * \"Generic\" vectors.  All operands are given as offsets from ENV,\n> + * and therefore cannot also be allocated via tcg_global_mem_new_*.\n> + * OPRSZ is the byte size of the vector upon which the operation is performed.\n> + * MAXSZ is the byte size of the full vector; bytes beyond OPSZ are cleared.\n> + *\n> + * All sizes must be 8 or any multiple of 16.\n> + * When OPRSZ is 8, the alignment may be 8, otherwise must be 16.\n> + * Operands may completely, but not partially, overlap.\n> + */\n> +\n> +/* Expand a call to a gvec-style helper, with pointers to two vector\n> +   operands, and a descriptor (see tcg-gvec-desc.h).  */\n> +typedef void (gen_helper_gvec_2)(TCGv_ptr, TCGv_ptr, TCGv_i32);\n> +void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,\n> +                        uint32_t oprsz, uint32_t maxsz, int32_t data,\n> +                        gen_helper_gvec_2 *fn);\n> +\n> +/* Similarly, passing an extra pointer (e.g. env or float_status).  */\n> +typedef void (gen_helper_gvec_2_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);\n> +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs,\n> +                        TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz,\n> +                        int32_t data, gen_helper_gvec_2_ptr *fn);\n> +\n> +/* Similarly, with three vector operands.  */\n> +typedef void (gen_helper_gvec_3)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);\n> +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t oprsz, uint32_t maxsz, int32_t data,\n> +                        gen_helper_gvec_3 *fn);\n> +\n> +typedef void (gen_helper_gvec_3_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr,\n> +                                     TCGv_ptr, TCGv_i32);\n> +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz,\n> +                        int32_t data, gen_helper_gvec_3_ptr *fn);\n> +\n> +/* Expand a gvec operation.  Either inline or out-of-line depending on\n> +   the actual vector size and the operations supported by the host.  */\n> +typedef struct {\n> +    /* Expand inline as a 64-bit or 32-bit integer.\n> +       Only one of these will be non-NULL.  */\n> +    void (*fni8)(TCGv_i64, TCGv_i64);\n> +    void (*fni4)(TCGv_i32, TCGv_i32);\n> +    /* Expand inline with a host vector type.  */\n> +    void (*fniv)(TCGv_vec, TCGv_vec);\n> +    /* Expand out-of-line helper w/descriptor.  */\n> +    gen_helper_gvec_2 *fno;\n> +    /* Prefer i64 to v64.  */\n> +    bool prefer_i64;\n> +} GVecGen2;\n> +\n> +typedef struct {\n> +    /* Expand inline as a 64-bit or 32-bit integer.\n> +       Only one of these will be non-NULL.  */\n> +    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64);\n> +    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32);\n> +    /* Expand inline with a host vector type.  */\n> +    void (*fniv)(TCGv_vec, TCGv_vec, TCGv_vec);\n> +    /* Expand out-of-line helper w/descriptor.  */\n> +    gen_helper_gvec_3 *fno;\n> +    /* Prefer i64 to v64.  */\n> +    bool prefer_i64;\n> +    /* Load dest as a 3rd source operand.  */\n> +    bool load_dest;\n> +} GVecGen3;\n> +\n> +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,\n> +                    uint32_t opsz, uint32_t clsz, const GVecGen2 *);\n> +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                    uint32_t opsz, uint32_t clsz, const GVecGen3 *);\n> +\n> +/* Expand a specific vector operation.  */\n> +\n> +#define DEF(X) \\\n> +    void tcg_gen_gvec_##X(uint32_t dofs, uint32_t aofs, \\\n> +                          uint32_t opsz, uint32_t clsz)\n> +\n> +DEF(mov);\n> +DEF(not);\n> +DEF(neg8);\n> +DEF(neg16);\n> +DEF(neg32);\n> +DEF(neg64);\n> +\n> +#undef DEF\n> +#define DEF(X) \\\n> +    void tcg_gen_gvec_##X(uint32_t dofs, uint32_t aofs, uint32_t bofs, \\\n> +                          uint32_t opsz, uint32_t clsz)\n> +\n> +DEF(add8);\n> +DEF(add16);\n> +DEF(add32);\n> +DEF(add64);\n> +\n> +DEF(sub8);\n> +DEF(sub16);\n> +DEF(sub32);\n> +DEF(sub64);\n> +\n> +DEF(and);\n> +DEF(or);\n> +DEF(xor);\n> +DEF(andc);\n> +DEF(orc);\n> +\n> +#undef DEF\n> +\n> +/*\n> + * 64-bit vector operations.  Use these when the register has been allocated\n> + * with tcg_global_mem_new_i64, and so we cannot also address it via pointer.\n> + * OPRSZ = MAXSZ = 8.\n> + */\n> +\n> +void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 a);\n> +void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 a);\n> +void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 a);\n> +\n> +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);\n> +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);\n> +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);\n> +\n> +void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);\n> +void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);\n> +void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);\n> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c\n> new file mode 100644\n> index 0000000000..c75e76367c\n> --- /dev/null\n> +++ b/accel/tcg/tcg-runtime-gvec.c\n> @@ -0,0 +1,255 @@\n> +/*\n> + *  Generic vectorized operation runtime\n> + *\n> + *  Copyright (c) 2017 Linaro\n> + *\n> + * This library is free software; you can redistribute it and/or\n> + * modify it under the terms of the GNU Lesser General Public\n> + * License as published by the Free Software Foundation; either\n> + * version 2 of the License, or (at your option) any later version.\n> + *\n> + * This library is distributed in the hope that it will be useful,\n> + * but WITHOUT ANY WARRANTY; without even the implied warranty of\n> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n> + * Lesser General Public License for more details.\n> + *\n> + * You should have received a copy of the GNU Lesser General Public\n> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.\n> + */\n> +\n> +#include \"qemu/osdep.h\"\n> +#include \"qemu/host-utils.h\"\n> +#include \"cpu.h\"\n> +#include \"exec/helper-proto.h\"\n> +#include \"tcg-gvec-desc.h\"\n> +\n> +\n> +/* Virtually all hosts support 16-byte vectors.  Those that don't can emulate\n> +   them via GCC's generic vector extension.  This turns out to be simpler and\n> +   more reliable than getting the compiler to autovectorize.\n> +\n> +   In tcg-op-gvec.c, we asserted that both the size and alignment\n> +   of the data are multiples of 16.  */\n> +\n> +typedef uint8_t vec8 __attribute__((vector_size(16)));\n> +typedef uint16_t vec16 __attribute__((vector_size(16)));\n> +typedef uint32_t vec32 __attribute__((vector_size(16)));\n> +typedef uint64_t vec64 __attribute__((vector_size(16)));\n> +\n> +static inline void clear_high(void *d, intptr_t oprsz, uint32_t desc)\n> +{\n> +    intptr_t maxsz = simd_maxsz(desc);\n> +    intptr_t i;\n> +\n> +    if (unlikely(maxsz > oprsz)) {\n> +        for (i = oprsz; i < maxsz; i += sizeof(vec64)) {\n> +            *(vec64 *)(d + i) = (vec64){ 0 };\n> +        }\n> +    }\n> +}\n> +\n> +void HELPER(gvec_add8)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec8)) {\n> +        *(vec8 *)(d + i) = *(vec8 *)(a + i) + *(vec8 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_add16)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec16)) {\n> +        *(vec16 *)(d + i) = *(vec16 *)(a + i) + *(vec16 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_add32)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec32)) {\n> +        *(vec32 *)(d + i) = *(vec32 *)(a + i) + *(vec32 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_add64)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = *(vec64 *)(a + i) + *(vec64 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_sub8)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec8)) {\n> +        *(vec8 *)(d + i) = *(vec8 *)(a + i) - *(vec8 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_sub16)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec16)) {\n> +        *(vec16 *)(d + i) = *(vec16 *)(a + i) - *(vec16 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_sub32)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec32)) {\n> +        *(vec32 *)(d + i) = *(vec32 *)(a + i) - *(vec32 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = *(vec64 *)(a + i) - *(vec64 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_neg8)(void *d, void *a, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec8)) {\n> +        *(vec8 *)(d + i) = -*(vec8 *)(a + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_neg16)(void *d, void *a, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec16)) {\n> +        *(vec16 *)(d + i) = -*(vec16 *)(a + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_neg32)(void *d, void *a, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec32)) {\n> +        *(vec32 *)(d + i) = -*(vec32 *)(a + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_neg64)(void *d, void *a, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = -*(vec64 *)(a + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_mov)(void *d, void *a, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +\n> +    memcpy(d, a, oprsz);\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_not)(void *d, void *a, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = ~*(vec64 *)(a + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_and)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = *(vec64 *)(a + i) & *(vec64 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_or)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = *(vec64 *)(a + i) | *(vec64 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_xor)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = *(vec64 *)(a + i) ^ *(vec64 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_andc)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> +\n> +void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc)\n> +{\n> +    intptr_t oprsz = simd_oprsz(desc);\n> +    intptr_t i;\n> +\n> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {\n> +        *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i);\n> +    }\n> +    clear_high(d, oprsz, desc);\n> +}\n> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c\n> new file mode 100644\n> index 0000000000..7464321eba\n> --- /dev/null\n> +++ b/tcg/tcg-op-gvec.c\n> @@ -0,0 +1,853 @@\n> +/*\n> + *  Generic vector operation expansion\n> + *\n> + *  Copyright (c) 2017 Linaro\n> + *\n> + * This library is free software; you can redistribute it and/or\n> + * modify it under the terms of the GNU Lesser General Public\n> + * License as published by the Free Software Foundation; either\n> + * version 2 of the License, or (at your option) any later version.\n> + *\n> + * This library is distributed in the hope that it will be useful,\n> + * but WITHOUT ANY WARRANTY; without even the implied warranty of\n> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n> + * Lesser General Public License for more details.\n> + *\n> + * You should have received a copy of the GNU Lesser General Public\n> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.\n> + */\n> +\n> +#include \"qemu/osdep.h\"\n> +#include \"qemu-common.h\"\n> +#include \"tcg.h\"\n> +#include \"tcg-op.h\"\n> +#include \"tcg-op-gvec.h\"\n> +#include \"tcg-gvec-desc.h\"\n> +\n> +#define REP8(x)    ((x) * 0x0101010101010101ull)\n> +#define REP16(x)   ((x) * 0x0001000100010001ull)\n> +\n> +#define MAX_UNROLL  4\n> +\n> +/* Verify vector size and alignment rules.  OFS should be the OR of all\n> +   of the operand offsets so that we can check them all at once.  */\n> +static void check_size_align(uint32_t oprsz, uint32_t maxsz, uint32_t ofs)\n> +{\n> +    uint32_t align = maxsz > 16 || oprsz >= 16 ? 15 : 7;\n> +    tcg_debug_assert(oprsz > 0);\n> +    tcg_debug_assert(oprsz <= maxsz);\n> +    tcg_debug_assert((oprsz & align) == 0);\n> +    tcg_debug_assert((maxsz & align) == 0);\n> +    tcg_debug_assert((ofs & align) == 0);\n> +}\n> +\n> +/* Verify vector overlap rules for two operands.  */\n> +static void check_overlap_2(uint32_t d, uint32_t a, uint32_t s)\n> +{\n> +    tcg_debug_assert(d == a || d + s <= a || a + s <= d);\n> +}\n> +\n> +/* Verify vector overlap rules for three operands.  */\n> +static void check_overlap_3(uint32_t d, uint32_t a, uint32_t b, uint32_t s)\n> +{\n> +    check_overlap_2(d, a, s);\n> +    check_overlap_2(d, b, s);\n> +    check_overlap_2(a, b, s);\n> +}\n> +\n> +/* Create a descriptor from components.  */\n> +uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data)\n> +{\n> +    uint32_t desc = 0;\n> +\n> +    assert(oprsz % 8 == 0 && oprsz <= (8 << SIMD_OPRSZ_BITS));\n> +    assert(maxsz % 8 == 0 && maxsz <= (8 << SIMD_MAXSZ_BITS));\n> +    assert(data == sextract32(data, 0, SIMD_DATA_BITS));\n> +\n> +    oprsz = (oprsz / 8) - 1;\n> +    maxsz = (maxsz / 8) - 1;\n> +    desc = deposit32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS, oprsz);\n> +    desc = deposit32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS, maxsz);\n> +    desc = deposit32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS, data);\n> +\n> +    return desc;\n> +}\n> +\n> +/* Generate a call to a gvec-style helper with two vector operands.  */\n> +void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,\n> +                        uint32_t oprsz, uint32_t maxsz, int32_t data,\n> +                        gen_helper_gvec_2 *fn)\n> +{\n> +    TCGv_ptr a0, a1;\n> +    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));\n> +\n> +    a0 = tcg_temp_new_ptr();\n> +    a1 = tcg_temp_new_ptr();\n> +\n> +    tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs);\n> +    tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs);\n> +\n> +    fn(a0, a1, desc);\n> +\n> +    tcg_temp_free_ptr(a0);\n> +    tcg_temp_free_ptr(a1);\n> +    tcg_temp_free_i32(desc);\n> +}\n> +\n> +/* Generate a call to a gvec-style helper with three vector operands.  */\n> +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t oprsz, uint32_t maxsz, int32_t data,\n> +                        gen_helper_gvec_3 *fn)\n> +{\n> +    TCGv_ptr a0, a1, a2;\n> +    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));\n> +\n> +    a0 = tcg_temp_new_ptr();\n> +    a1 = tcg_temp_new_ptr();\n> +    a2 = tcg_temp_new_ptr();\n> +\n> +    tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs);\n> +    tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs);\n> +    tcg_gen_addi_ptr(a2, tcg_ctx.tcg_env, bofs);\n> +\n> +    fn(a0, a1, a2, desc);\n> +\n> +    tcg_temp_free_ptr(a0);\n> +    tcg_temp_free_ptr(a1);\n> +    tcg_temp_free_ptr(a2);\n> +    tcg_temp_free_i32(desc);\n> +}\n> +\n> +/* Generate a call to a gvec-style helper with three vector operands\n> +   and an extra pointer operand.  */\n> +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs,\n> +                        TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz,\n> +                        int32_t data, gen_helper_gvec_2_ptr *fn)\n> +{\n> +    TCGv_ptr a0, a1;\n> +    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));\n> +\n> +    a0 = tcg_temp_new_ptr();\n> +    a1 = tcg_temp_new_ptr();\n> +\n> +    tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs);\n> +    tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs);\n> +\n> +    fn(a0, a1, ptr, desc);\n> +\n> +    tcg_temp_free_ptr(a0);\n> +    tcg_temp_free_ptr(a1);\n> +    tcg_temp_free_i32(desc);\n> +}\n> +\n> +/* Generate a call to a gvec-style helper with three vector operands\n> +   and an extra pointer operand.  */\n> +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz,\n> +                        int32_t data, gen_helper_gvec_3_ptr *fn)\n> +{\n> +    TCGv_ptr a0, a1, a2;\n> +    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));\n> +\n> +    a0 = tcg_temp_new_ptr();\n> +    a1 = tcg_temp_new_ptr();\n> +    a2 = tcg_temp_new_ptr();\n> +\n> +    tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs);\n> +    tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs);\n> +    tcg_gen_addi_ptr(a2, tcg_ctx.tcg_env, bofs);\n> +\n> +    fn(a0, a1, a2, ptr, desc);\n> +\n> +    tcg_temp_free_ptr(a0);\n> +    tcg_temp_free_ptr(a1);\n> +    tcg_temp_free_ptr(a2);\n> +    tcg_temp_free_i32(desc);\n> +}\n> +\n> +/* Return true if we want to implement something of OPRSZ bytes\n> +   in units of LNSZ.  This limits the expansion of inline code.  */\n> +static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)\n> +{\n> +    uint32_t lnct = oprsz / lnsz;\n> +    return lnct >= 1 && lnct <= MAX_UNROLL;\n> +}\n> +\n> +/* Clear MAXSZ bytes at DOFS.  */\n> +static void expand_clr(uint32_t dofs, uint32_t maxsz)\n> +{\n> +    if (maxsz >= 16 && TCG_TARGET_HAS_v128) {\n> +        TCGv_vec zero;\n> +\n> +        if (maxsz >= 32 && TCG_TARGET_HAS_v256) {\n> +            zero = tcg_temp_new_vec(TCG_TYPE_V256);\n> +            tcg_gen_movi_vec(zero, 0);\n> +\n> +            for (; maxsz >= 32; dofs += 32, maxsz -= 32) {\n> +                tcg_gen_stl_vec(zero, tcg_ctx.tcg_env, dofs, TCG_TYPE_V256);\n> +            }\n> +        } else {\n> +            zero = tcg_temp_new_vec(TCG_TYPE_V128);\n> +            tcg_gen_movi_vec(zero, 0);\n> +        }\n> +        for (; maxsz >= 16; dofs += 16, maxsz -= 16) {\n> +            tcg_gen_stl_vec(zero, tcg_ctx.tcg_env, dofs, TCG_TYPE_V128);\n> +        }\n> +\n> +        tcg_temp_free_vec(zero);\n> +    } if (TCG_TARGET_REG_BITS == 64) {\n> +        TCGv_i64 zero = tcg_const_i64(0);\n> +\n> +        for (; maxsz >= 8; dofs += 8, maxsz -= 8) {\n> +            tcg_gen_st_i64(zero, tcg_ctx.tcg_env, dofs);\n> +        }\n> +\n> +        tcg_temp_free_i64(zero);\n> +    } else if (TCG_TARGET_HAS_v64) {\n> +        TCGv_vec zero = tcg_temp_new_vec(TCG_TYPE_V64);\n> +\n> +        tcg_gen_movi_vec(zero, 0);\n> +        for (; maxsz >= 8; dofs += 8, maxsz -= 8) {\n> +            tcg_gen_st_vec(zero, tcg_ctx.tcg_env, dofs);\n> +        }\n> +\n> +        tcg_temp_free_vec(zero);\n> +    } else {\n> +        TCGv_i32 zero = tcg_const_i32(0);\n> +\n> +        for (; maxsz >= 4; dofs += 4, maxsz -= 4) {\n> +            tcg_gen_st_i32(zero, tcg_ctx.tcg_env, dofs);\n> +        }\n> +\n> +        tcg_temp_free_i32(zero);\n> +    }\n> +}\n> +\n> +/* Expand OPSZ bytes worth of two-operand operations using i32 elements.  */\n> +static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t opsz,\n> +                         void (*fni)(TCGv_i32, TCGv_i32))\n> +{\n> +    TCGv_i32 t0 = tcg_temp_new_i32();\n> +    uint32_t i;\n> +\n> +    for (i = 0; i < opsz; i += 4) {\n> +        tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i);\n> +        fni(t0, t0);\n> +        tcg_gen_st_i32(t0, tcg_ctx.tcg_env, dofs + i);\n> +    }\n> +    tcg_temp_free_i32(t0);\n> +}\n> +\n> +/* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */\n> +static void expand_3_i32(uint32_t dofs, uint32_t aofs,\n> +                         uint32_t bofs, uint32_t opsz, bool load_dest,\n> +                         void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32))\n> +{\n> +    TCGv_i32 t0 = tcg_temp_new_i32();\n> +    TCGv_i32 t1 = tcg_temp_new_i32();\n> +    TCGv_i32 t2 = tcg_temp_new_i32();\n> +    uint32_t i;\n> +\n> +    for (i = 0; i < opsz; i += 4) {\n> +        tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i);\n> +        tcg_gen_ld_i32(t1, tcg_ctx.tcg_env, bofs + i);\n> +        if (load_dest) {\n> +            tcg_gen_ld_i32(t2, tcg_ctx.tcg_env, dofs + i);\n> +        }\n> +        fni(t2, t0, t1);\n> +        tcg_gen_st_i32(t2, tcg_ctx.tcg_env, dofs + i);\n> +    }\n> +    tcg_temp_free_i32(t2);\n> +    tcg_temp_free_i32(t1);\n> +    tcg_temp_free_i32(t0);\n> +}\n> +\n> +/* Expand OPSZ bytes worth of two-operand operations using i64 elements.  */\n> +static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz,\n> +                         void (*fni)(TCGv_i64, TCGv_i64))\n> +{\n> +    TCGv_i64 t0 = tcg_temp_new_i64();\n> +    uint32_t i;\n> +\n> +    for (i = 0; i < opsz; i += 8) {\n> +        tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);\n> +        fni(t0, t0);\n> +        tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i);\n> +    }\n> +    tcg_temp_free_i64(t0);\n> +}\n> +\n> +/* Expand OPSZ bytes worth of three-operand operations using i64 elements.  */\n> +static void expand_3_i64(uint32_t dofs, uint32_t aofs,\n> +                         uint32_t bofs, uint32_t opsz, bool load_dest,\n> +                         void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64))\n> +{\n> +    TCGv_i64 t0 = tcg_temp_new_i64();\n> +    TCGv_i64 t1 = tcg_temp_new_i64();\n> +    TCGv_i64 t2 = tcg_temp_new_i64();\n> +    uint32_t i;\n> +\n> +    for (i = 0; i < opsz; i += 8) {\n> +        tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i);\n> +        tcg_gen_ld_i64(t1, tcg_ctx.tcg_env, bofs + i);\n> +        if (load_dest) {\n> +            tcg_gen_ld_i64(t2, tcg_ctx.tcg_env, dofs + i);\n> +        }\n> +        fni(t2, t0, t1);\n> +        tcg_gen_st_i64(t2, tcg_ctx.tcg_env, dofs + i);\n> +    }\n> +    tcg_temp_free_i64(t2);\n> +    tcg_temp_free_i64(t1);\n> +    tcg_temp_free_i64(t0);\n> +}\n> +\n> +/* Expand OPSZ bytes worth of two-operand operations using host vectors.  */\n> +static void expand_2_vec(uint32_t dofs, uint32_t aofs,\n> +                         uint32_t opsz, uint32_t tysz, TCGType type,\n> +                         void (*fni)(TCGv_vec, TCGv_vec))\n> +{\n> +    TCGv_vec t0 = tcg_temp_new_vec(type);\n> +    uint32_t i;\n> +\n> +    for (i = 0; i < opsz; i += tysz) {\n> +        tcg_gen_ld_vec(t0, tcg_ctx.tcg_env, aofs + i);\n> +        fni(t0, t0);\n> +        tcg_gen_st_vec(t0, tcg_ctx.tcg_env, dofs + i);\n> +    }\n> +    tcg_temp_free_vec(t0);\n> +}\n> +\n> +/* Expand OPSZ bytes worth of three-operand operations using host vectors.  */\n> +static void expand_3_vec(uint32_t dofs, uint32_t aofs,\n> +                         uint32_t bofs, uint32_t opsz,\n> +                         uint32_t tysz, TCGType type, bool load_dest,\n> +                         void (*fni)(TCGv_vec, TCGv_vec, TCGv_vec))\n> +{\n> +    TCGv_vec t0 = tcg_temp_new_vec(type);\n> +    TCGv_vec t1 = tcg_temp_new_vec(type);\n> +    TCGv_vec t2 = tcg_temp_new_vec(type);\n> +    uint32_t i;\n> +\n> +    for (i = 0; i < opsz; i += tysz) {\n> +        tcg_gen_ld_vec(t0, tcg_ctx.tcg_env, aofs + i);\n> +        tcg_gen_ld_vec(t1, tcg_ctx.tcg_env, bofs + i);\n> +        if (load_dest) {\n> +            tcg_gen_ld_vec(t2, tcg_ctx.tcg_env, dofs + i);\n> +        }\n> +        fni(t2, t0, t1);\n> +        tcg_gen_st_vec(t2, tcg_ctx.tcg_env, dofs + i);\n> +    }\n> +    tcg_temp_free_vec(t2);\n> +    tcg_temp_free_vec(t1);\n> +    tcg_temp_free_vec(t0);\n> +}\n> +\n> +/* Expand a vector two-operand operation.  */\n> +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,\n> +                    uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g)\n> +{\n> +    check_size_align(oprsz, maxsz, dofs | aofs);\n> +    check_overlap_2(dofs, aofs, maxsz);\n> +\n> +    /* Quick check for sizes we won't support inline.  */\n> +    if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) {\n> +        goto do_ool;\n> +    }\n> +\n> +    /* Recall that ARM SVE allows vector sizes that are not a power of 2.\n> +       Expand with successively smaller host vector sizes.  The intent is\n> +       that e.g. oprsz == 80 would be expanded with 2x32 + 1x16.  */\n> +    /* ??? For maxsz > oprsz, the host may be able to use an op-sized\n> +       operation, zeroing the balance of the register.  We can then\n> +       use a cl-sized store to implement the clearing without an extra\n> +       store operation.  This is true for aarch64 and x86_64 hosts.  */\n> +\n> +    if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32);\n> +        expand_2_vec(dofs, aofs, done, 32, TCG_TYPE_V256, g->fniv);\n> +        dofs += done;\n> +        aofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16);\n> +        expand_2_vec(dofs, aofs, done, 16, TCG_TYPE_V128, g->fniv);\n> +        dofs += done;\n> +        aofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (check_size_impl(oprsz, 8)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8);\n> +        if (TCG_TARGET_HAS_v64 && !g->prefer_i64) {\n> +            expand_2_vec(dofs, aofs, done, 8, TCG_TYPE_V64, g->fniv);\n> +        } else if (g->fni8) {\n> +            expand_2_i64(dofs, aofs, done, g->fni8);\n> +        } else {\n> +            done = 0;\n> +        }\n> +        dofs += done;\n> +        aofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (check_size_impl(oprsz, 4)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4);\n> +        expand_2_i32(dofs, aofs, done, g->fni4);\n> +        dofs += done;\n> +        aofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (oprsz == 0) {\n> +        if (maxsz != 0) {\n> +            expand_clr(dofs, maxsz);\n> +        }\n> +        return;\n> +    }\n> +\n> + do_ool:\n> +    tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, g->fno);\n> +}\n> +\n> +/* Expand a vector three-operand operation.  */\n> +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                    uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g)\n> +{\n> +    check_size_align(oprsz, maxsz, dofs | aofs | bofs);\n> +    check_overlap_3(dofs, aofs, bofs, maxsz);\n> +\n> +    /* Quick check for sizes we won't support inline.  */\n> +    if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) {\n> +        goto do_ool;\n> +    }\n> +\n> +    /* Recall that ARM SVE allows vector sizes that are not a power of 2.\n> +       Expand with successively smaller host vector sizes.  The intent is\n> +       that e.g. oprsz == 80 would be expanded with 2x32 + 1x16.  */\n> +    /* ??? For maxsz > oprsz, the host may be able to use an op-sized\n> +       operation, zeroing the balance of the register.  We can then\n> +       use a cl-sized store to implement the clearing without an extra\n> +       store operation.  This is true for aarch64 and x86_64 hosts.  */\n> +\n> +    if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32);\n> +        expand_3_vec(dofs, aofs, bofs, done, 32, TCG_TYPE_V256,\n> +                     g->load_dest, g->fniv);\n> +        dofs += done;\n> +        aofs += done;\n> +        bofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16);\n> +        expand_3_vec(dofs, aofs, bofs, done, 16, TCG_TYPE_V128,\n> +                     g->load_dest, g->fniv);\n> +        dofs += done;\n> +        aofs += done;\n> +        bofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (check_size_impl(oprsz, 8)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8);\n> +        if (TCG_TARGET_HAS_v64 && !g->prefer_i64) {\n> +            expand_3_vec(dofs, aofs, bofs, done, 8, TCG_TYPE_V64,\n> +                         g->load_dest, g->fniv);\n> +        } else if (g->fni8) {\n> +            expand_3_i64(dofs, aofs, bofs, done, g->load_dest, g->fni8);\n> +        } else {\n> +            done = 0;\n> +        }\n> +        dofs += done;\n> +        aofs += done;\n> +        bofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (check_size_impl(oprsz, 4)) {\n> +        uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4);\n> +        expand_3_i32(dofs, aofs, bofs, done, g->load_dest, g->fni4);\n> +        dofs += done;\n> +        aofs += done;\n> +        bofs += done;\n> +        oprsz -= done;\n> +        maxsz -= done;\n> +    }\n> +\n> +    if (oprsz == 0) {\n> +        if (maxsz != 0) {\n> +            expand_clr(dofs, maxsz);\n> +        }\n> +        return;\n> +    }\n> +\n> + do_ool:\n> +    tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, g->fno);\n> +}\n> +\n> +/*\n> + * Expand specific vector operations.\n> + */\n> +\n> +void tcg_gen_gvec_mov(uint32_t dofs, uint32_t aofs,\n> +                      uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen2 g = {\n> +        .fni8 = tcg_gen_mov_i64,\n> +        .fniv = tcg_gen_mov_vec,\n> +        .fno = gen_helper_gvec_mov,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_not(uint32_t dofs, uint32_t aofs,\n> +                      uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen2 g = {\n> +        .fni8 = tcg_gen_not_i64,\n> +        .fniv = tcg_gen_not_vec,\n> +        .fno = gen_helper_gvec_not,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);\n> +}\n> +\n> +static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)\n> +{\n> +    TCGv_i64 t1 = tcg_temp_new_i64();\n> +    TCGv_i64 t2 = tcg_temp_new_i64();\n> +    TCGv_i64 t3 = tcg_temp_new_i64();\n> +\n> +    tcg_gen_andc_i64(t1, a, m);\n> +    tcg_gen_andc_i64(t2, b, m);\n> +    tcg_gen_xor_i64(t3, a, b);\n> +    tcg_gen_add_i64(d, t1, t2);\n> +    tcg_gen_and_i64(t3, t3, m);\n> +    tcg_gen_xor_i64(d, d, t3);\n> +\n> +    tcg_temp_free_i64(t1);\n> +    tcg_temp_free_i64(t2);\n> +    tcg_temp_free_i64(t3);\n> +}\n> +\n> +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)\n> +{\n> +    TCGv_i64 m = tcg_const_i64(REP8(0x80));\n> +    gen_addv_mask(d, a, b, m);\n> +    tcg_temp_free_i64(m);\n> +}\n> +\n> +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)\n> +{\n> +    TCGv_i64 m = tcg_const_i64(REP16(0x8000));\n> +    gen_addv_mask(d, a, b, m);\n> +    tcg_temp_free_i64(m);\n> +}\n> +\n> +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)\n> +{\n> +    TCGv_i64 t1 = tcg_temp_new_i64();\n> +    TCGv_i64 t2 = tcg_temp_new_i64();\n> +\n> +    tcg_gen_andi_i64(t1, a, ~0xffffffffull);\n> +    tcg_gen_add_i64(t2, a, b);\n> +    tcg_gen_add_i64(t1, t1, b);\n> +    tcg_gen_deposit_i64(d, t1, t2, 0, 32);\n> +\n> +    tcg_temp_free_i64(t1);\n> +    tcg_temp_free_i64(t2);\n> +}\n> +\n> +void tcg_gen_gvec_add8(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                       uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_vec_add8_i64,\n> +        .fniv = tcg_gen_add8_vec,\n> +        .fno = gen_helper_gvec_add8,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_add16(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_vec_add16_i64,\n> +        .fniv = tcg_gen_add16_vec,\n> +        .fno = gen_helper_gvec_add16,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_add32(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni4 = tcg_gen_add_i32,\n> +        .fniv = tcg_gen_add32_vec,\n> +        .fno = gen_helper_gvec_add32,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_add64(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_add_i64,\n> +        .fniv = tcg_gen_add64_vec,\n> +        .fno = gen_helper_gvec_add64,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)\n> +{\n> +    TCGv_i64 t1 = tcg_temp_new_i64();\n> +    TCGv_i64 t2 = tcg_temp_new_i64();\n> +    TCGv_i64 t3 = tcg_temp_new_i64();\n> +\n> +    tcg_gen_or_i64(t1, a, m);\n> +    tcg_gen_andc_i64(t2, b, m);\n> +    tcg_gen_eqv_i64(t3, a, b);\n> +    tcg_gen_sub_i64(d, t1, t2);\n> +    tcg_gen_and_i64(t3, t3, m);\n> +    tcg_gen_xor_i64(d, d, t3);\n> +\n> +    tcg_temp_free_i64(t1);\n> +    tcg_temp_free_i64(t2);\n> +    tcg_temp_free_i64(t3);\n> +}\n> +\n> +void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)\n> +{\n> +    TCGv_i64 m = tcg_const_i64(REP8(0x80));\n> +    gen_subv_mask(d, a, b, m);\n> +    tcg_temp_free_i64(m);\n> +}\n> +\n> +void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)\n> +{\n> +    TCGv_i64 m = tcg_const_i64(REP16(0x8000));\n> +    gen_subv_mask(d, a, b, m);\n> +    tcg_temp_free_i64(m);\n> +}\n> +\n> +void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)\n> +{\n> +    TCGv_i64 t1 = tcg_temp_new_i64();\n> +    TCGv_i64 t2 = tcg_temp_new_i64();\n> +\n> +    tcg_gen_andi_i64(t1, b, ~0xffffffffull);\n> +    tcg_gen_sub_i64(t2, a, b);\n> +    tcg_gen_sub_i64(t1, a, t1);\n> +    tcg_gen_deposit_i64(d, t1, t2, 0, 32);\n> +\n> +    tcg_temp_free_i64(t1);\n> +    tcg_temp_free_i64(t2);\n> +}\n> +\n> +void tcg_gen_gvec_sub8(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                       uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_vec_sub8_i64,\n> +        .fniv = tcg_gen_sub8_vec,\n> +        .fno = gen_helper_gvec_sub8,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_sub16(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_vec_sub16_i64,\n> +        .fniv = tcg_gen_sub16_vec,\n> +        .fno = gen_helper_gvec_sub16,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_sub32(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni4 = tcg_gen_sub_i32,\n> +        .fniv = tcg_gen_sub32_vec,\n> +        .fno = gen_helper_gvec_sub32,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_sub64(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_sub_i64,\n> +        .fniv = tcg_gen_sub64_vec,\n> +        .fno = gen_helper_gvec_sub64,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m)\n> +{\n> +    TCGv_i64 t2 = tcg_temp_new_i64();\n> +    TCGv_i64 t3 = tcg_temp_new_i64();\n> +\n> +    tcg_gen_andc_i64(t3, m, b);\n> +    tcg_gen_andc_i64(t2, b, m);\n> +    tcg_gen_sub_i64(d, m, t2);\n> +    tcg_gen_xor_i64(d, d, t3);\n> +\n> +    tcg_temp_free_i64(t2);\n> +    tcg_temp_free_i64(t3);\n> +}\n> +\n> +void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 b)\n> +{\n> +    TCGv_i64 m = tcg_const_i64(REP8(0x80));\n> +    gen_negv_mask(d, b, m);\n> +    tcg_temp_free_i64(m);\n> +}\n> +\n> +void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 b)\n> +{\n> +    TCGv_i64 m = tcg_const_i64(REP16(0x8000));\n> +    gen_negv_mask(d, b, m);\n> +    tcg_temp_free_i64(m);\n> +}\n> +\n> +void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 b)\n> +{\n> +    TCGv_i64 t1 = tcg_temp_new_i64();\n> +    TCGv_i64 t2 = tcg_temp_new_i64();\n> +\n> +    tcg_gen_andi_i64(t1, b, ~0xffffffffull);\n> +    tcg_gen_neg_i64(t2, b);\n> +    tcg_gen_neg_i64(t1, t1);\n> +    tcg_gen_deposit_i64(d, t1, t2, 0, 32);\n> +\n> +    tcg_temp_free_i64(t1);\n> +    tcg_temp_free_i64(t2);\n> +}\n> +\n> +void tcg_gen_gvec_neg8(uint32_t dofs, uint32_t aofs,\n> +                       uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen2 g = {\n> +        .fni8 = tcg_gen_vec_neg8_i64,\n> +        .fniv = tcg_gen_neg8_vec,\n> +        .fno = gen_helper_gvec_neg8,\n> +    };\n> +    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_neg16(uint32_t dofs, uint32_t aofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen2 g = {\n> +        .fni8 = tcg_gen_vec_neg16_i64,\n> +        .fniv = tcg_gen_neg16_vec,\n> +        .fno = gen_helper_gvec_neg16,\n> +    };\n> +    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_neg32(uint32_t dofs, uint32_t aofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen2 g = {\n> +        .fni4 = tcg_gen_neg_i32,\n> +        .fniv = tcg_gen_neg32_vec,\n> +        .fno = gen_helper_gvec_neg32,\n> +    };\n> +    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_neg64(uint32_t dofs, uint32_t aofs,\n> +                        uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen2 g = {\n> +        .fni8 = tcg_gen_neg_i64,\n> +        .fniv = tcg_gen_neg64_vec,\n> +        .fno = gen_helper_gvec_neg64,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_and(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                      uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_and_i64,\n> +        .fniv = tcg_gen_and_vec,\n> +        .fno = gen_helper_gvec_and,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_or(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                     uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_or_i64,\n> +        .fniv = tcg_gen_or_vec,\n> +        .fno = gen_helper_gvec_or,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_xor(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                      uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_xor_i64,\n> +        .fniv = tcg_gen_xor_vec,\n> +        .fno = gen_helper_gvec_xor,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_andc(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                       uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_andc_i64,\n> +        .fniv = tcg_gen_andc_vec,\n> +        .fno = gen_helper_gvec_andc,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> +\n> +void tcg_gen_gvec_orc(uint32_t dofs, uint32_t aofs, uint32_t bofs,\n> +                      uint32_t opsz, uint32_t clsz)\n> +{\n> +    static const GVecGen3 g = {\n> +        .fni8 = tcg_gen_orc_i64,\n> +        .fniv = tcg_gen_orc_vec,\n> +        .fno = gen_helper_gvec_orc,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +    };\n> +    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);\n> +}\n> diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs\n> index 228cd84fa4..d381a02f34 100644\n> --- a/accel/tcg/Makefile.objs\n> +++ b/accel/tcg/Makefile.objs\n> @@ -1,6 +1,6 @@\n>  obj-$(CONFIG_SOFTMMU) += tcg-all.o\n>  obj-$(CONFIG_SOFTMMU) += cputlb.o\n> -obj-y += tcg-runtime.o\n> +obj-y += tcg-runtime.o tcg-runtime-gvec.o\n>  obj-y += cpu-exec.o cpu-exec-common.o translate-all.o\n>  obj-y += translator.o\n\n\n--\nAlex Bennée","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=linaro.org header.i=@linaro.org\n\theader.b=\"BgmH4JID\"; dkim-atps=neutral"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3y1wgg2NbTz9t3B\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 27 Sep 2017 08:32:28 +1000 (AEST)","from localhost ([::1]:51331 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dwyOt-0002V9-M8\n\tfor incoming@patchwork.ozlabs.org; Tue, 26 Sep 2017 18:32:23 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:44538)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <alex.bennee@linaro.org>) id 1dwyOI-0002V0-AT\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 18:31:53 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <alex.bennee@linaro.org>) id 1dwyOD-0001cg-LP\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 18:31:46 -0400","from mail-wm0-x22c.google.com ([2a00:1450:400c:c09::22c]:49059)\n\tby eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)\n\t(Exim 4.71) (envelope-from <alex.bennee@linaro.org>)\n\tid 1dwyOD-0001bk-6i\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 18:31:41 -0400","by mail-wm0-x22c.google.com with SMTP id m127so12347780wmm.3\n\tfor <qemu-devel@nongnu.org>; Tue, 26 Sep 2017 15:31:41 -0700 (PDT)","from zen.linaro.local ([81.128.185.34])\n\tby smtp.gmail.com with ESMTPSA id\n\tb89sm18515958wrd.42.2017.09.26.15.31.37\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tTue, 26 Sep 2017 15:31:37 -0700 (PDT)","from zen (localhost [127.0.0.1])\n\tby zen.linaro.local (Postfix) with ESMTPS id 514CA3E019F;\n\tTue, 26 Sep 2017 23:31:37 +0100 (BST)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;\n\th=references:user-agent:from:to:cc:subject:in-reply-to:date\n\t:message-id:mime-version:content-transfer-encoding;\n\tbh=qZ/Yncvb4pCW8cCeLfQTSaGoYHDiH4kMAdIaD0aPJnc=;\n\tb=BgmH4JIDuP+9wdGRKX9QUgDtVc/24Ca8Q9eWw4d1ENXG4hoHtjoZBQXfbQkbNWbKrb\n\tuTc6JB8jvY3pnKAYwZtkKoJoCKF7jGPVjtJxrvVw9bhz9nrSCeWIqbiXAqsvwKQ9qCBN\n\tF/RaE/wa3hPWskgWI+rsyLadZ86G00vqidXJo=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:references:user-agent:from:to:cc:subject\n\t:in-reply-to:date:message-id:mime-version:content-transfer-encoding; \n\tbh=qZ/Yncvb4pCW8cCeLfQTSaGoYHDiH4kMAdIaD0aPJnc=;\n\tb=MHJrj3fKnFgVwFhYTZpkiA/cSk9S7Z68tQQW/TvlUmS97o/Hzl07z/57mEc74ssZEw\n\tXCMW8pfkwNyRxoRkpN3obv6IYsz6vKLDXecDLpilMI2TFVq3aeUEUKo5ZcMYnFP/raZn\n\t3FzfC5D9n5/voFK79BLQeJ8tP2gcPLU+EMF8s0/eFaWrbVgdFnULCpOlHiu97/HP6iJm\n\tl6DJerO5GZgTiSA3llNjAoWQkU5sxczlJlY1TsSwfiC0SK8GBI/kLo8+NVCAqrOeas6t\n\tHT0QurEmCmoPsLt0QM6ReX9TZyHnTYzw7dpadd2D/0fC+tkhs3+A8ZGiVfe7AVW2yFdD\n\t9hxA==","X-Gm-Message-State":"AHPjjUjkhRdvGgeK+A0Y2XqOWtOCRckRnBcQZDVceqaVnjrIaZn/wqct\n\tmWqRqx7Ewr3DK3Ah+BLPVdpBiA==","X-Google-Smtp-Source":"AOwi7QC7JepvwqgMRZLJEkjbjajEm20Yb/w673jt0uklNnE2XcCSR9TUFBKOgCKbM1dyedmlf2bfhw==","X-Received":"by 10.28.203.4 with SMTP id b4mr4362364wmg.17.1506465099144;\n\tTue, 26 Sep 2017 15:31:39 -0700 (PDT)","References":"<20170916023417.14599-1-richard.henderson@linaro.org>\n\t<20170916023417.14599-3-richard.henderson@linaro.org>","User-agent":"mu4e 0.9.19; emacs 26.0.60","From":"Alex =?utf-8?q?Benn=C3=A9e?= <alex.bennee@linaro.org>","To":"Richard Henderson <richard.henderson@linaro.org>","In-reply-to":"<20170916023417.14599-3-richard.henderson@linaro.org>","Date":"Tue, 26 Sep 2017 23:31:37 +0100","Message-ID":"<87r2utccs6.fsf@linaro.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Transfer-Encoding":"8bit","X-detected-operating-system":"by eggs.gnu.org: Genre and OS details not\n\trecognized.","X-Received-From":"2a00:1450:400c:c09::22c","Subject":"Re: [Qemu-devel] [PATCH v3 2/6] tcg: Add vector expanders","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, f4bug@amsat.org","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1775952,"web_url":"http://patchwork.ozlabs.org/comment/1775952/","msgid":"<87o9pxcav8.fsf@linaro.org>","list_archive_url":null,"date":"2017-09-26T23:12:59","subject":"Re: [Qemu-devel] [PATCH v3 4/6] target/arm: Use vector\n\tinfrastructure for aa64 add/sub/logic","submitter":{"id":39532,"url":"http://patchwork.ozlabs.org/api/people/39532/","name":"Alex Bennée","email":"alex.bennee@linaro.org"},"content":"Richard Henderson <richard.henderson@linaro.org> writes:\n\n> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>\n\nReviewed-by: Alex Bennée <alex.bennee@linaro.org>\n\n> ---\n>  target/arm/translate-a64.c | 216 ++++++++++++++++++++++++++++++---------------\n>  1 file changed, 143 insertions(+), 73 deletions(-)\n>\n> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c\n> index a3984c9a0d..4759cc9829 100644\n> --- a/target/arm/translate-a64.c\n> +++ b/target/arm/translate-a64.c\n> @@ -21,6 +21,7 @@\n>  #include \"cpu.h\"\n>  #include \"exec/exec-all.h\"\n>  #include \"tcg-op.h\"\n> +#include \"tcg-op-gvec.h\"\n>  #include \"qemu/log.h\"\n>  #include \"arm_ldst.h\"\n>  #include \"translate.h\"\n> @@ -82,6 +83,7 @@ typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);\n>  typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);\n>  typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32);\n>  typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);\n> +typedef void GVecGenTwoFn(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);\n>\n>  /* initialize TCG globals.  */\n>  void a64_translate_init(void)\n> @@ -537,6 +539,21 @@ static inline int vec_reg_offset(DisasContext *s, int regno,\n>      return offs;\n>  }\n>\n> +/* Return the offset info CPUARMState of the \"whole\" vector register Qn.  */\n> +static inline int vec_full_reg_offset(DisasContext *s, int regno)\n> +{\n> +    assert_fp_access_checked(s);\n> +    return offsetof(CPUARMState, vfp.regs[regno * 2]);\n> +}\n> +\n> +/* Return the byte size of the \"whole\" vector register, VL / 8.  */\n> +static inline int vec_full_reg_size(DisasContext *s)\n> +{\n> +    /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags.\n> +       In the meantime this is just the AdvSIMD length of 128.  */\n> +    return 128 / 8;\n> +}\n> +\n>  /* Return the offset into CPUARMState of a slice (from\n>   * the least significant end) of FP register Qn (ie\n>   * Dn, Sn, Hn or Bn).\n> @@ -9036,85 +9053,125 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)\n>      }\n>  }\n>\n> +static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)\n> +{\n> +    tcg_gen_xor_i64(rn, rn, rm);\n> +    tcg_gen_and_i64(rn, rn, rd);\n> +    tcg_gen_xor_i64(rd, rm, rn);\n> +}\n> +\n> +static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)\n> +{\n> +    tcg_gen_xor_i64(rn, rn, rd);\n> +    tcg_gen_and_i64(rn, rn, rm);\n> +    tcg_gen_xor_i64(rd, rd, rn);\n> +}\n> +\n> +static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)\n> +{\n> +    tcg_gen_xor_i64(rn, rn, rd);\n> +    tcg_gen_andc_i64(rn, rn, rm);\n> +    tcg_gen_xor_i64(rd, rd, rn);\n> +}\n> +\n> +static void gen_bsl_vec(TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)\n> +{\n> +    tcg_gen_xor_vec(rn, rn, rm);\n> +    tcg_gen_and_vec(rn, rn, rd);\n> +    tcg_gen_xor_vec(rd, rm, rn);\n> +}\n> +\n> +static void gen_bit_vec(TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)\n> +{\n> +    tcg_gen_xor_vec(rn, rn, rd);\n> +    tcg_gen_and_vec(rn, rn, rm);\n> +    tcg_gen_xor_vec(rd, rd, rn);\n> +}\n> +\n> +static void gen_bif_vec(TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)\n> +{\n> +    tcg_gen_xor_vec(rn, rn, rd);\n> +    tcg_gen_andc_vec(rn, rn, rm);\n> +    tcg_gen_xor_vec(rd, rd, rn);\n> +}\n> +\n>  /* Logic op (opcode == 3) subgroup of C3.6.16. */\n>  static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)\n>  {\n> +    static const GVecGen3 bsl_op = {\n> +        .fni8 = gen_bsl_i64,\n> +        .fniv = gen_bsl_vec,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +        .load_dest = true\n> +    };\n> +    static const GVecGen3 bit_op = {\n> +        .fni8 = gen_bit_i64,\n> +        .fniv = gen_bit_vec,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +        .load_dest = true\n> +    };\n> +    static const GVecGen3 bif_op = {\n> +        .fni8 = gen_bif_i64,\n> +        .fniv = gen_bif_vec,\n> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,\n> +        .load_dest = true\n> +    };\n> +\n>      int rd = extract32(insn, 0, 5);\n>      int rn = extract32(insn, 5, 5);\n>      int rm = extract32(insn, 16, 5);\n>      int size = extract32(insn, 22, 2);\n>      bool is_u = extract32(insn, 29, 1);\n>      bool is_q = extract32(insn, 30, 1);\n> -    TCGv_i64 tcg_op1, tcg_op2, tcg_res[2];\n> -    int pass;\n> +    GVecGenTwoFn *gvec_fn;\n> +    const GVecGen3 *gvec_op;\n>\n>      if (!fp_access_check(s)) {\n>          return;\n>      }\n>\n> -    tcg_op1 = tcg_temp_new_i64();\n> -    tcg_op2 = tcg_temp_new_i64();\n> -    tcg_res[0] = tcg_temp_new_i64();\n> -    tcg_res[1] = tcg_temp_new_i64();\n> -\n> -    for (pass = 0; pass < (is_q ? 2 : 1); pass++) {\n> -        read_vec_element(s, tcg_op1, rn, pass, MO_64);\n> -        read_vec_element(s, tcg_op2, rm, pass, MO_64);\n> -\n> -        if (!is_u) {\n> -            switch (size) {\n> -            case 0: /* AND */\n> -                tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2);\n> -                break;\n> -            case 1: /* BIC */\n> -                tcg_gen_andc_i64(tcg_res[pass], tcg_op1, tcg_op2);\n> -                break;\n> -            case 2: /* ORR */\n> -                tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2);\n> -                break;\n> -            case 3: /* ORN */\n> -                tcg_gen_orc_i64(tcg_res[pass], tcg_op1, tcg_op2);\n> -                break;\n> -            }\n> -        } else {\n> -            if (size != 0) {\n> -                /* B* ops need res loaded to operate on */\n> -                read_vec_element(s, tcg_res[pass], rd, pass, MO_64);\n> -            }\n> -\n> -            switch (size) {\n> -            case 0: /* EOR */\n> -                tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2);\n> -                break;\n> -            case 1: /* BSL bitwise select */\n> -                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);\n> -                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);\n> -                tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);\n> -                break;\n> -            case 2: /* BIT, bitwise insert if true */\n> -                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);\n> -                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);\n> -                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);\n> -                break;\n> -            case 3: /* BIF, bitwise insert if false */\n> -                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);\n> -                tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);\n> -                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);\n> -                break;\n> -            }\n> -        }\n> -    }\n> +    switch (size + 4 * is_u) {\n> +    case 0: /* AND */\n> +        gvec_fn = tcg_gen_gvec_and;\n> +        goto do_fn;\n> +    case 1: /* BIC */\n> +        gvec_fn = tcg_gen_gvec_andc;\n> +        goto do_fn;\n> +    case 2: /* ORR */\n> +        gvec_fn = tcg_gen_gvec_or;\n> +        goto do_fn;\n> +    case 3: /* ORN */\n> +        gvec_fn = tcg_gen_gvec_orc;\n> +        goto do_fn;\n> +    case 4: /* EOR */\n> +        gvec_fn = tcg_gen_gvec_xor;\n> +        goto do_fn;\n> +    do_fn:\n> +        gvec_fn(vec_full_reg_offset(s, rd),\n> +                vec_full_reg_offset(s, rn),\n> +                vec_full_reg_offset(s, rm),\n> +                is_q ? 16 : 8, vec_full_reg_size(s));\n> +        return;\n> +\n> +    case 5: /* BSL bitwise select */\n> +        gvec_op = &bsl_op;\n> +        goto do_op;\n> +    case 6: /* BIT, bitwise insert if true */\n> +        gvec_op = &bit_op;\n> +        goto do_op;\n> +    case 7: /* BIF, bitwise insert if false */\n> +        gvec_op = &bif_op;\n> +        goto do_op;\n> +    do_op:\n> +        tcg_gen_gvec_3(vec_full_reg_offset(s, rd),\n> +                       vec_full_reg_offset(s, rn),\n> +                       vec_full_reg_offset(s, rm),\n> +                       is_q ? 16 : 8, vec_full_reg_size(s), gvec_op);\n> +        return;\n>\n> -    write_vec_element(s, tcg_res[0], rd, 0, MO_64);\n> -    if (!is_q) {\n> -        tcg_gen_movi_i64(tcg_res[1], 0);\n> +    default:\n> +        g_assert_not_reached();\n>      }\n> -    write_vec_element(s, tcg_res[1], rd, 1, MO_64);\n> -\n> -    tcg_temp_free_i64(tcg_op1);\n> -    tcg_temp_free_i64(tcg_op2);\n> -    tcg_temp_free_i64(tcg_res[0]);\n> -    tcg_temp_free_i64(tcg_res[1]);\n>  }\n>\n>  /* Helper functions for 32 bit comparisons */\n> @@ -9375,6 +9432,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)\n>      int rn = extract32(insn, 5, 5);\n>      int rd = extract32(insn, 0, 5);\n>      int pass;\n> +    GVecGenTwoFn *gvec_op;\n>\n>      switch (opcode) {\n>      case 0x13: /* MUL, PMUL */\n> @@ -9414,6 +9472,28 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)\n>          return;\n>      }\n>\n> +    switch (opcode) {\n> +    case 0x10: /* ADD, SUB */\n> +        {\n> +            static GVecGenTwoFn * const fns[4][2] = {\n> +                { tcg_gen_gvec_add8, tcg_gen_gvec_sub8 },\n> +                { tcg_gen_gvec_add16, tcg_gen_gvec_sub16 },\n> +                { tcg_gen_gvec_add32, tcg_gen_gvec_sub32 },\n> +                { tcg_gen_gvec_add64, tcg_gen_gvec_sub64 },\n> +            };\n> +            gvec_op = fns[size][u];\n> +            goto do_gvec;\n> +        }\n> +        break;\n> +\n> +    do_gvec:\n> +        gvec_op(vec_full_reg_offset(s, rd),\n> +                vec_full_reg_offset(s, rn),\n> +                vec_full_reg_offset(s, rm),\n> +                is_q ? 16 : 8, vec_full_reg_size(s));\n> +        return;\n> +    }\n> +\n>      if (size == 3) {\n>          assert(is_q);\n>          for (pass = 0; pass < 2; pass++) {\n> @@ -9586,16 +9666,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)\n>                  genfn = fns[size][u];\n>                  break;\n>              }\n> -            case 0x10: /* ADD, SUB */\n> -            {\n> -                static NeonGenTwoOpFn * const fns[3][2] = {\n> -                    { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },\n> -                    { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },\n> -                    { tcg_gen_add_i32, tcg_gen_sub_i32 },\n> -                };\n> -                genfn = fns[size][u];\n> -                break;\n> -            }\n>              case 0x11: /* CMTST, CMEQ */\n>              {\n>                  static NeonGenTwoOpFn * const fns[3][2] = {\n\n\n--\nAlex Bennée","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=linaro.org header.i=@linaro.org\n\theader.b=\"aCxUer2A\"; dkim-atps=neutral"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3y1xb73ytWz9sPr\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 27 Sep 2017 09:13:38 +1000 (AEST)","from localhost ([::1]:51424 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dwz2m-0004Tl-Fn\n\tfor incoming@patchwork.ozlabs.org; Tue, 26 Sep 2017 19:13:36 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:52475)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <alex.bennee@linaro.org>) id 1dwz2H-0004Rb-Vk\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 19:13:07 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <alex.bennee@linaro.org>) id 1dwz2E-00009u-LQ\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 19:13:05 -0400","from mail-wm0-x235.google.com ([2a00:1450:400c:c09::235]:46335)\n\tby eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)\n\t(Exim 4.71) (envelope-from <alex.bennee@linaro.org>)\n\tid 1dwz2E-00009Y-Ah\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 19:13:02 -0400","by mail-wm0-x235.google.com with SMTP id m72so12614039wmc.1\n\tfor <qemu-devel@nongnu.org>; Tue, 26 Sep 2017 16:13:02 -0700 (PDT)","from zen.linaro.local ([81.128.185.34])\n\tby smtp.gmail.com with ESMTPSA id\n\tx5sm5939069wre.18.2017.09.26.16.13.00\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tTue, 26 Sep 2017 16:13:00 -0700 (PDT)","from zen (localhost [127.0.0.1])\n\tby zen.linaro.local (Postfix) with ESMTPS id 9C3453E0363;\n\tWed, 27 Sep 2017 00:12:59 +0100 (BST)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;\n\th=references:user-agent:from:to:cc:subject:in-reply-to:date\n\t:message-id:mime-version:content-transfer-encoding;\n\tbh=i8NATmBfymfoCjaifL0zVjM0Xt6gFNmF+r22ERA1Id8=;\n\tb=aCxUer2AgAj9ZlD4bgRN75bjJLpLoNaAMDYHm5o5irOK/H5seQgt4BSQdquZ5ckkxk\n\tdRL8b5nRhBmVJrcs+AiBwTMZHtUBTGu1RNNw1/jVyS6OsUhwel1CX2NW+IR52AnPK3o+\n\t67F6IOp+3uu6X4T9EOlnfA/FqWuX/Qqltv28I=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:references:user-agent:from:to:cc:subject\n\t:in-reply-to:date:message-id:mime-version:content-transfer-encoding; \n\tbh=i8NATmBfymfoCjaifL0zVjM0Xt6gFNmF+r22ERA1Id8=;\n\tb=Tg0Ef5Ym8lkFnqQ9jrZt1AbwaHWKujN4sVjS+E9P81GdwriSbONsKp2UFTNUFgH+Wt\n\tWuIVrydoZybBZq0reglA+6L3emy8O5eZYvUDpyj6tv/qk5fq5dgZi4fKJeye3U88+wha\n\tVqrQx2qglf1ZcsEiCM+XjIo6o/J5PwhVXHEGSWcPXkbMHfSceXjm1b/2gUst1OqSNa4D\n\tNHHuu9ZHJslmmPFLIpc31J9uGXpbuf1m8XTVZcoh9ZkcQeGDVwhQSFls6jqsj93dwv0V\n\tVLsToVDhAFpGtaVTjI7G64oz87PGtHyMgQ54N94RTYJMvWSs32XsxeV6zP23PlCtXdnv\n\t2/DQ==","X-Gm-Message-State":"AHPjjUhV2UXWNdBoexl0CgIwzs0Bb7u+8Wb3lALtszEX+5bnGUIFgMxA\n\tNe1sj1ZgEBbYIFbI87EM6PU2Rw==","X-Google-Smtp-Source":"AOwi7QA33hV89cUP7Oq7eGw0yoAoQJMecx2avVvm/eINlsIGBQFNzt59JI/e1Eb+QCOqXeR2v/IabQ==","X-Received":"by 10.28.152.23 with SMTP id a23mr4158988wme.45.1506467581036;\n\tTue, 26 Sep 2017 16:13:01 -0700 (PDT)","References":"<20170916023417.14599-1-richard.henderson@linaro.org>\n\t<20170916023417.14599-5-richard.henderson@linaro.org>","User-agent":"mu4e 0.9.19; emacs 26.0.60","From":"Alex =?utf-8?q?Benn=C3=A9e?= <alex.bennee@linaro.org>","To":"Richard Henderson <richard.henderson@linaro.org>","In-reply-to":"<20170916023417.14599-5-richard.henderson@linaro.org>","Date":"Wed, 27 Sep 2017 00:12:59 +0100","Message-ID":"<87o9pxcav8.fsf@linaro.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Transfer-Encoding":"8bit","X-detected-operating-system":"by eggs.gnu.org: Genre and OS details not\n\trecognized.","X-Received-From":"2a00:1450:400c:c09::235","Subject":"Re: [Qemu-devel] [PATCH v3 4/6] target/arm: Use vector\n\tinfrastructure for aa64 add/sub/logic","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, f4bug@amsat.org","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1775967,"web_url":"http://patchwork.ozlabs.org/comment/1775967/","msgid":"<150646669584.63.10090914967646614472@b58463cdfd5f>","list_archive_url":null,"date":"2017-09-26T22:58:16","subject":"Re: [Qemu-devel] [PATCH v3 0/6] TCG vectorization and example\n\tconversion","submitter":{"id":69632,"url":"http://patchwork.ozlabs.org/api/people/69632/","name":null,"email":"no-reply@patchew.org"},"content":"Hi,\n\nThis series seems to have some coding style problems. See output below for\nmore information:\n\nType: series\nMessage-id: 20170916023417.14599-1-richard.henderson@linaro.org\nSubject: [Qemu-devel] [PATCH v3 0/6] TCG vectorization and example conversion\n\n=== TEST SCRIPT BEGIN ===\n#!/bin/bash\n\nBASE=base\nn=1\ntotal=$(git log --oneline $BASE.. | wc -l)\nfailed=0\n\ngit config --local diff.renamelimit 0\ngit config --local diff.renames True\n\ncommits=\"$(git log --format=%H --reverse $BASE..)\"\nfor c in $commits; do\n    echo \"Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)...\"\n    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then\n        failed=1\n        echo\n    fi\n    n=$((n+1))\ndone\n\nexit $failed\n=== TEST SCRIPT END ===\n\nUpdating 3c8cf5a9c21ff8782164d1def7f44bd888713384\nSwitched to a new branch 'test'\n7f8bff3639 tcg/aarch64: Add vector operations\n107700b998 tcg/i386: Add vector operations\n63c5d729cd target/arm: Use vector infrastructure for aa64 add/sub/logic\n66bd1ba117 target/arm: Align vector registers\nbcf88636c0 tcg: Add vector expanders\n00e32ea5b2 tcg: Add types and operations for host vectors\n\n=== OUTPUT BEGIN ===\nChecking PATCH 1/6: tcg: Add types and operations for host vectors...\nChecking PATCH 2/6: tcg: Add vector expanders...\nERROR: spaces required around that '&' (ctx:WxO)\n#284: FILE: accel/tcg/tcg-runtime-gvec.c:241:\n+        *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i);\n                                               ^\n\nERROR: space prohibited after that '~' (ctx:OxW)\n#284: FILE: accel/tcg/tcg-runtime-gvec.c:241:\n+        *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i);\n                                                ^\n\nERROR: spaces required around that '|' (ctx:WxO)\n#295: FILE: accel/tcg/tcg-runtime-gvec.c:252:\n+        *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i);\n                                               ^\n\nERROR: space prohibited after that '~' (ctx:OxW)\n#295: FILE: accel/tcg/tcg-runtime-gvec.c:252:\n+        *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i);\n                                                ^\n\nERROR: trailing statements should be on next line\n#589: FILE: tcg/tcg-op-gvec.c:198:\n+    } if (TCG_TARGET_REG_BITS == 64) {\n\ntotal: 5 errors, 0 warnings, 1342 lines checked\n\nYour patch has style problems, please review.  If any of these errors\nare false positives report them to the maintainer, see\nCHECKPATCH in MAINTAINERS.\n\nChecking PATCH 3/6: target/arm: Align vector registers...\nChecking PATCH 4/6: target/arm: Use vector infrastructure for aa64 add/sub/logic...\nChecking PATCH 5/6: tcg/i386: Add vector operations...\nWARNING: architecture specific defines should be avoided\n#50: FILE: tcg/i386/tcg-target.h:93:\n+#ifdef __SSE2__\n\nWARNING: architecture specific defines should be avoided\n#55: FILE: tcg/i386/tcg-target.h:98:\n+#ifdef __AVX2__\n\ntotal: 0 errors, 2 warnings, 722 lines checked\n\nYour patch has style problems, please review.  If any of these errors\nare false positives report them to the maintainer, see\nCHECKPATCH in MAINTAINERS.\nChecking PATCH 6/6: tcg/aarch64: Add vector operations...\n=== OUTPUT END ===\n\nTest command exited with code: 1\n\n\n---\nEmail generated automatically by Patchew [http://patchew.org/].\nPlease send your feedback to patchew-devel@freelists.org","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3y1zcT0Ylpz9s0Z\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 27 Sep 2017 10:44:55 +1000 (AEST)","from localhost ([::1]:51627 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dx0T5-0008WL-Gc\n\tfor incoming@patchwork.ozlabs.org; Tue, 26 Sep 2017 20:44:51 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:41378)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <no-reply@patchew.org>) id 1dx0Sj-0008WB-Az\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 20:44:30 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <no-reply@patchew.org>) id 1dx0Sg-00078H-8A\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 20:44:29 -0400","from sender-of-o52.zoho.com ([135.84.80.217]:21433)\n\tby eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <no-reply@patchew.org>)\n\tid 1dx0Sg-00076U-1V\n\tfor qemu-devel@nongnu.org; Tue, 26 Sep 2017 20:44:26 -0400","from [172.17.0.2] (23.253.156.214 [23.253.156.214]) by\n\tmx.zohomail.com with SMTPS id 1506466696340101.18228952158245;\n\tTue, 26 Sep 2017 15:58:16 -0700 (PDT)"],"Resent-Date":"Tue, 26 Sep 2017 20:44:29 -0400","Resent-Message-Id":"<E1dx0Sg-00078H-8A@eggs.gnu.org>","Message-ID":"<150646669584.63.10090914967646614472@b58463cdfd5f>","In-Reply-To":"<20170916023417.14599-1-richard.henderson@linaro.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=\"utf-8\"","Content-Transfer-Encoding":"base64","Resent-From":"","From":"no-reply@patchew.org","To":"richard.henderson@linaro.org","Date":"Tue, 26 Sep 2017 15:58:16 -0700 (PDT)","X-ZohoMailClient":"External","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 3.x [fuzzy]","X-Received-From":"135.84.80.217","Subject":"Re: [Qemu-devel] [PATCH v3 0/6] TCG vectorization and example\n\tconversion","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Reply-To":"qemu-devel@nongnu.org","Cc":"alex.bennee@linaro.org, famz@redhat.com, qemu-devel@nongnu.org,\n\tf4bug@amsat.org","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1776464,"web_url":"http://patchwork.ozlabs.org/comment/1776464/","msgid":"<e1ce7b9e-e8a8-41c8-4851-726377e63f89@linaro.org>","list_archive_url":null,"date":"2017-09-27T16:18:50","subject":"Re: [Qemu-devel] [PATCH v3 1/6] tcg: Add types and operations for\n\thost vectors","submitter":{"id":72104,"url":"http://patchwork.ozlabs.org/api/people/72104/","name":"Richard Henderson","email":"richard.henderson@linaro.org"},"content":"On 09/26/2017 12:28 PM, Alex Bennée wrote:\n>>      * TCGv_ptr : a host pointer type\n>> +    * TCGv_vec : a host vector type; the exact size is not exposed\n>> +                 to the CPU front-end code.\n> \n> Isn't this a guest vector type (which is pointed to by a host pointer)?\n\nNo, it's a host vector, which we have created in response to expanding a guest\nvector operation.\n\n> A one line comment wouldn't go amiss here. This looks like we are\n> allocating a new temp of the same type as an existing temp?\n> \n>> +TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match)\n\nYes.\n\n>> +All of the vector ops have a final constant argument that specifies the\n>> +length of the vector operation LEN as 64 << LEN bits.\n> \n> That doesn't scan well. So would a 4 lane operation be encoded as 64 <<\n> 4? Is this because we are using the bottom bits for something?\n\n64 << 0 = 64\n64 << 1 = 128\n64 << 2 = 256.\n\nI've fixed up the wording a bit.\n\n>> +  Copy C across the entire vector.\n>> +  At present the only supported values for C are 0 and -1.\n> \n> I guess this is why the size in unimportant? This is for clearing or\n> setting the whole of the vector? What does len mean in this case?\n\nYes.  Len still means the length of the whole vector.\n\nElsewhere there's a comment about maybe using dupi{8,16,32,64}_vec instead.\nHowever I wanted to put that off until we do some more conversions and see\nexactly what's going to be needed.\n\n\n>> +* and_vec     v0, v1, v2, len\n>> +* or_vec      v0, v1, v2, len\n>> +* xor_vec     v0, v1, v2, len\n>> +* andc_vec    v0, v1, v2, len\n>> +* orc_vec     v0, v1, v2, len\n>> +* not_vec     v0, v1, len\n>> +\n>> +  Similarly, logical operations.\n> \n> Similarly, logical operations with and without compliment?\n\nSure.\n\n\nr~","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=208.118.235.17; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=linaro.org header.i=@linaro.org\n\theader.b=\"ViNDLLfJ\"; dkim-atps=neutral"],"Received":["from lists.gnu.org (lists.gnu.org [208.118.235.17])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3y2NNM697mz9t5l\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 28 Sep 2017 02:20:51 +1000 (AEST)","from localhost ([::1]:55515 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dxF3b-0005bk-MX\n\tfor incoming@patchwork.ozlabs.org; Wed, 27 Sep 2017 12:19:31 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:49209)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <richard.henderson@linaro.org>) id 1dxF36-0005aO-57\n\tfor qemu-devel@nongnu.org; Wed, 27 Sep 2017 12:19:01 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <richard.henderson@linaro.org>) id 1dxF32-0007ji-5c\n\tfor qemu-devel@nongnu.org; Wed, 27 Sep 2017 12:19:00 -0400","from mail-pg0-x22b.google.com ([2607:f8b0:400e:c05::22b]:51274)\n\tby eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)\n\t(Exim 4.71) (envelope-from <richard.henderson@linaro.org>)\n\tid 1dxF31-0007iT-V1\n\tfor qemu-devel@nongnu.org; Wed, 27 Sep 2017 12:18:56 -0400","by mail-pg0-x22b.google.com with SMTP id k193so8037968pgc.8\n\tfor <qemu-devel@nongnu.org>; Wed, 27 Sep 2017 09:18:55 -0700 (PDT)","from bigtime.twiddle.net ([70.35.39.2])\n\tby smtp.gmail.com with ESMTPSA id\n\tl74sm22475754pfi.9.2017.09.27.09.18.52\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tWed, 27 Sep 2017 09:18:52 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;\n\th=subject:to:cc:references:from:message-id:date:user-agent\n\t:mime-version:in-reply-to:content-language:content-transfer-encoding; \n\tbh=ZFJSGV72ZuPhtVbTGFZoZrbJG4TyAdRFMYZtGQKui7w=;\n\tb=ViNDLLfJA3iiaQCsXrRJi0DH7swFqZof4FKeNWOi++TQyWIqs1CzDa32Auvq2vLQVz\n\t5cOJrH6Z4gxzuuSE+xkwe6fTa+yw9goo/GUu1glw+feLuHQH0Tzy/IOBxLELrW3IROKG\n\tzRyjmV/xcQW9xmup7gSbLjL/G+wb30ruL/ZkE=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:subject:to:cc:references:from:message-id:date\n\t:user-agent:mime-version:in-reply-to:content-language\n\t:content-transfer-encoding;\n\tbh=ZFJSGV72ZuPhtVbTGFZoZrbJG4TyAdRFMYZtGQKui7w=;\n\tb=FNswGq8/3/ZWWHIU6K+Tt4J3FtZuF64EnQwSDvJvNAaxOJi8RNvZW5i2rCp12RgB6G\n\tTmmpQFaqwH4vUNfJcdP7heZNNPPkZOKA/139hb1mIRAE8IIHIpLYY9ibJ8/1PHC29R4X\n\tZdxPZ4ZxDQM7KJwhnl/+hC0ffP+BCY9Mqmq6pTJu/P49hN0nuKJitl4XekrBK5rY0eox\n\tMydq4Wto4CbrttQSTsWeGgq0c7J2dGPHzvkdc/7MhTswp8z2nrGE7VNXj27h1C+qfqjh\n\tsEdd5CBNQ42Y7yR3nabftPOupXu4ieJ8G/bovFRNIGVwuwNfTG4B3OtK8IeMburHbe7N\n\teq4A==","X-Gm-Message-State":"AHPjjUgboxxHNFeVN8w256pyaXyyoIczbBeNVxWGmsH+azIA/CdpJN0/\n\t+reBrjz9soTkxniUidYkNneg9Q==","X-Google-Smtp-Source":"AOwi7QC2AEFNLqfGrSKIix6F78RDdW2AhjLo9OMiKovi8WI0/KRJT84+7CTrH9j7D6cWro47533SSQ==","X-Received":"by 10.101.86.196 with SMTP id w4mr1784049pgs.341.1506529133789; \n\tWed, 27 Sep 2017 09:18:53 -0700 (PDT)","To":"=?utf-8?q?Alex_Benn=C3=A9e?= <alex.bennee@linaro.org>","References":"<20170916023417.14599-1-richard.henderson@linaro.org>\n\t<20170916023417.14599-2-richard.henderson@linaro.org>\n\t<87shf9cl9r.fsf@linaro.org>","From":"Richard Henderson <richard.henderson@linaro.org>","Message-ID":"<e1ce7b9e-e8a8-41c8-4851-726377e63f89@linaro.org>","Date":"Wed, 27 Sep 2017 09:18:50 -0700","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<87shf9cl9r.fsf@linaro.org>","Content-Type":"text/plain; charset=utf-8","Content-Language":"en-US","Content-Transfer-Encoding":"8bit","X-detected-operating-system":"by eggs.gnu.org: Genre and OS details not\n\trecognized.","X-Received-From":"2607:f8b0:400e:c05::22b","Subject":"Re: [Qemu-devel] [PATCH v3 1/6] tcg: Add types and operations for\n\thost vectors","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, f4bug@amsat.org","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}}]