From patchwork Sat Apr 13 14:03:44 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 236355 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "localhost", Issuer "www.qmailtoaster.com" (not verified)) by ozlabs.org (Postfix) with ESMTPS id 4B29C2C00B4 for ; Sun, 14 Apr 2013 00:04:06 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-type; q=dns; s=default; b=DwPQFgjAPy2B6mjW wKJWPHNmaAjfjHvFA9OPNDpFWlkixKb8o30PsNYBZZjS6kzbr+fkP+Pl3uHLLoZ1 ocqPbXrpfjQrZVHZUOcvM77cMOf7xWm0wmNZCqXCGXK2f7ESqaJREKo+upltEWIy V3+HXYqhqDQ6jMjcexdJN7mlz9s= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-type; s=default; bh=vQ6H/DY35WEoJZKaMlrj0u P3aH4=; b=sESlVZBNIXCvDHfEDhL7HPlw6wi8nKOllgcJwzV6J2G0CCNVTwS9eX jvJJjLLfrAADGXmuf1Bl3wkBoi0d0zRnNK7c2FF9GFZVgNTRyVDHd3i2eyfAs2v8 OaVKOQfbf4Kc5ljxHyTxYPYwhXWCHiIChR0Sd+njfkbl9Gx4Zp+OE= Received: (qmail 23022 invoked by alias); 13 Apr 2013 14:04:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 23012 invoked by uid 89); 13 Apr 2013 14:03:59 -0000 X-Spam-SWARE-Status: No, score=-4.3 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, KHOP_THREADED, NORMAL_HTTP_TO_IP, RCVD_IN_HOSTKARMA_W, RCVD_IN_HOSTKARMA_WL, TW_VC autolearn=ham version=3.3.1 Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Sat, 13 Apr 2013 14:03:58 +0000 Received: from svr-orw-exc-10.mgc.mentorg.com ([147.34.98.58]) by relay1.mentorg.com with esmtp id 1UR13a-0004oh-Ch from Julian_Brown@mentor.com ; Sat, 13 Apr 2013 07:03:54 -0700 Received: from SVR-IES-FEM-01.mgc.mentorg.com ([137.202.0.104]) by SVR-ORW-EXC-10.mgc.mentorg.com with Microsoft SMTPSVC(6.0.3790.4675); Sat, 13 Apr 2013 07:03:52 -0700 Received: from octopus (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.1.289.1; Sat, 13 Apr 2013 15:03:51 +0100 Date: Sat, 13 Apr 2013 15:03:44 +0100 From: Julian Brown To: Julian Brown CC: Kyrylo Tkachov , , Richard Earnshaw , Ramana Radhakrishnan Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics Message-ID: <20130413150344.2a80da8e@octopus> In-Reply-To: <20130412200939.13515c69@octopus> References: <020601ce3788$b88c85a0$29a590e0$@tkachov@arm.com> <20130412200939.13515c69@octopus> MIME-Version: 1.0 X-Virus-Found: No On Fri, 12 Apr 2013 20:09:39 +0100 Julian Brown wrote: > On Fri, 12 Apr 2013 15:19:18 +0100 > Kyrylo Tkachov wrote: > > > Hi all, > > > > This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic > > to arm_neon.h through the generator ML scripts and also adds the > > built-ins to which the intrinsics will map to. The generator ML > > scripts are updated and used to generate the relevant .texi > > documentation, arm_neon.h and the tests in gcc.target/arm/neon . > > FWIW, some of the changes to neon*.ml can be simplified somewhat -- my > attempt at an improved version of those bits is attached. I'm still > not too happy with mode_suffix, but these new instructions require > adding semantics to parts of the generator program which weren't > really very well-defined to start with :-). I appreciate that it's a > bit of a tangle... I thought of an improvement to the mode_suffix part from the last version of the patch, so here it is. I'm done fiddling with this now, so back to you! Cheers, Julian Index: neon-gen.ml =================================================================== --- neon-gen.ml (revision 197804) +++ neon-gen.ml (working copy) @@ -121,6 +121,7 @@ let rec signed_ctype = function | T_uint16 | T_int16 -> T_intHI | T_uint32 | T_int32 -> T_intSI | T_uint64 | T_int64 -> T_intDI + | T_float16 -> T_floatHF | T_float32 -> T_floatSF | T_poly8 -> T_intQI | T_poly16 -> T_intHI @@ -275,8 +276,8 @@ let rec mode_suffix elttype shape = let mode = mode_of_elt elttype shape in string_of_mode mode with MixedMode (dst, src) -> - let dstmode = mode_of_elt dst shape - and srcmode = mode_of_elt src shape in + let dstmode = mode_of_elt ~argpos:0 dst shape + and srcmode = mode_of_elt ~argpos:1 src shape in string_of_mode dstmode ^ string_of_mode srcmode let get_shuffle features = @@ -291,19 +292,24 @@ let print_feature_test_start features = match List.find (fun feature -> match feature with Requires_feature _ -> true | Requires_arch _ -> true + | Requires_FP_bit _ -> true | _ -> false) features with - Requires_feature feature -> + Requires_feature feature -> Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature | Requires_arch arch -> Format.printf "#if __ARM_ARCH >= %d@\n" arch + | Requires_FP_bit bit -> + Format.printf "#if ((__ARM_FP & 0x%X) != 0)@\n" + (1 lsl bit) | _ -> assert false with Not_found -> assert true let print_feature_test_end features = let feature = - List.exists (function Requires_feature x -> true - | Requires_arch x -> true + List.exists (function Requires_feature _ -> true + | Requires_arch _ -> true + | Requires_FP_bit _ -> true | _ -> false) features in if feature then Format.printf "#endif@\n" @@ -365,6 +371,7 @@ let deftypes () = "__builtin_neon_hi", "int", 16, 4; "__builtin_neon_si", "int", 32, 2; "__builtin_neon_di", "int", 64, 1; + "__builtin_neon_hf", "float", 16, 4; "__builtin_neon_sf", "float", 32, 2; "__builtin_neon_poly8", "poly", 8, 8; "__builtin_neon_poly16", "poly", 16, 4; Index: neon.ml =================================================================== --- neon.ml (revision 197804) +++ neon.ml (working copy) @@ -21,7 +21,7 @@ . *) (* Shorthand types for vector elements. *) -type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16 +type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16 | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts | Cast of elts * elts | NoElts @@ -37,6 +37,7 @@ type vectype = T_int8x8 | T_int8x16 | T_uint16x4 | T_uint16x8 | T_uint32x2 | T_uint32x4 | T_uint64x1 | T_uint64x2 + | T_float16x4 | T_float32x2 | T_float32x4 | T_poly8x8 | T_poly8x16 | T_poly16x4 | T_poly16x8 @@ -46,11 +47,13 @@ type vectype = T_int8x8 | T_int8x16 | T_uint8 | T_uint16 | T_uint32 | T_uint64 | T_poly8 | T_poly16 - | T_float32 | T_arrayof of int * vectype + | T_float16 | T_float32 + | T_arrayof of int * vectype | T_ptrto of vectype | T_const of vectype | T_void | T_intQI | T_intHI | T_intSI - | T_intDI | T_floatSF + | T_intDI | T_floatHF + | T_floatSF (* The meanings of the following are: TImode : "Tetra", two registers (four words). @@ -93,7 +96,7 @@ type arity = Arity0 of vectype | Arity4 of vectype * vectype * vectype * vectype * vectype type vecmode = V8QI | V4HI | V2SI | V2SF | DI - | V16QI | V8HI | V4SI | V4SF | V2DI + | V16QI | V8HI | V4SI | V4SF | V4HF | V2DI | QI | HI | SI | SF type opcode = @@ -284,18 +287,22 @@ type features = | Fixed_core_reg (* Mark that the intrinsic requires __ARM_FEATURE_string to be defined. *) | Requires_feature of string + (* Mark that the intrinsic requires a particular architecture version. *) | Requires_arch of int + (* Mark that the intrinsic requires a particular bit in __ARM_FP to + be set. *) + | Requires_FP_bit of int exception MixedMode of elts * elts let rec elt_width = function S8 | U8 | P8 | I8 | B8 -> 8 - | S16 | U16 | P16 | I16 | B16 -> 16 + | S16 | U16 | P16 | I16 | B16 | F16 -> 16 | S32 | F32 | U32 | I32 | B32 -> 32 | S64 | U64 | I64 | B64 -> 64 | Conv (a, b) -> let wa = elt_width a and wb = elt_width b in - if wa = wb then wa else failwith "element width?" + if wa = wb then wa else raise (MixedMode (a, b)) | Cast (a, b) -> raise (MixedMode (a, b)) | NoElts -> failwith "No elts" @@ -303,7 +310,7 @@ let rec elt_class = function S8 | S16 | S32 | S64 -> Signed | U8 | U16 | U32 | U64 -> Unsigned | P8 | P16 -> Poly - | F32 -> Float + | F16 | F32 -> Float | I8 | I16 | I32 | I64 -> Int | B8 | B16 | B32 | B64 -> Bits | Conv (a, b) | Cast (a, b) -> ConvClass (elt_class a, elt_class b) @@ -315,6 +322,7 @@ let elt_of_class_width c w = | Signed, 16 -> S16 | Signed, 32 -> S32 | Signed, 64 -> S64 + | Float, 16 -> F16 | Float, 32 -> F32 | Unsigned, 8 -> U8 | Unsigned, 16 -> U16 @@ -384,7 +392,12 @@ let find_key_operand operands = in scan ((Array.length operands) - 1) -let rec mode_of_elt elt shape = +(* Find a vecmode from a shape_elt ELT for an instruction with shape_form + SHAPE. For a Use_operands shape, if ARGPOS is passed then return the mode + for the given argument position, else determine which argument to return a + mode for automatically. *) + +let rec mode_of_elt ?argpos elt shape = let flt = match elt_class elt with Float | ConvClass(_, Float) -> true | _ -> false in let idx = @@ -394,7 +407,10 @@ let rec mode_of_elt elt shape = in match shape with All (_, Dreg) | By_scalar Dreg | Pair_result Dreg | Unary_scalar Dreg | Binary_imm Dreg | Long_noreg Dreg | Wide_noreg Dreg -> - [| V8QI; V4HI; if flt then V2SF else V2SI; DI |].(idx) + if flt then + [| V8QI; V4HF; V2SF; DI |].(idx) + else + [| V8QI; V4HI; V2SI; DI |].(idx) | All (_, Qreg) | By_scalar Qreg | Pair_result Qreg | Unary_scalar Qreg | Binary_imm Qreg | Long_noreg Qreg | Wide_noreg Qreg -> [| V16QI; V8HI; if flt then V4SF else V4SI; V2DI |].(idx) @@ -404,7 +420,11 @@ let rec mode_of_elt elt shape = | Long_imm -> [| V8QI; V4HI; V2SI; DI |].(idx) | Narrow | Narrow_imm -> [| V16QI; V8HI; V4SI; V2DI |].(idx) - | Use_operands ops -> mode_of_elt elt (All (0, (find_key_operand ops))) + | Use_operands ops -> + begin match argpos with + None -> mode_of_elt ?argpos elt (All (0, (find_key_operand ops))) + | Some pos -> mode_of_elt ?argpos elt (All (0, ops.(pos))) + end | _ -> failwith "invalid shape" (* Modify an element type dependent on the shape of the instruction and the @@ -454,10 +474,11 @@ let type_for_elt shape elt no = | U16 -> T_uint16x4 | U32 -> T_uint32x2 | U64 -> T_uint64x1 + | F16 -> T_float16x4 | F32 -> T_float32x2 | P8 -> T_poly8x8 | P16 -> T_poly16x4 - | _ -> failwith "Bad elt type" + | _ -> failwith "Bad elt type for Dreg" end | Qreg -> begin match elt with @@ -472,7 +493,7 @@ let type_for_elt shape elt no = | F32 -> T_float32x4 | P8 -> T_poly8x16 | P16 -> T_poly16x8 - | _ -> failwith "Bad elt type" + | _ -> failwith "Bad elt type for Qreg" end | Corereg -> begin match elt with @@ -487,7 +508,7 @@ let type_for_elt shape elt no = | P8 -> T_poly8 | P16 -> T_poly16 | F32 -> T_float32 - | _ -> failwith "Bad elt type" + | _ -> failwith "Bad elt type for Corereg" end | Immed -> T_immediate (0, 0) @@ -506,7 +527,7 @@ let type_for_elt shape elt no = let vectype_size = function T_int8x8 | T_int16x4 | T_int32x2 | T_int64x1 | T_uint8x8 | T_uint16x4 | T_uint32x2 | T_uint64x1 - | T_float32x2 | T_poly8x8 | T_poly16x4 -> 64 + | T_float32x2 | T_poly8x8 | T_poly16x4 | T_float16x4 -> 64 | T_int8x16 | T_int16x8 | T_int32x4 | T_int64x2 | T_uint8x16 | T_uint16x8 | T_uint32x4 | T_uint64x2 | T_float32x4 | T_poly8x16 | T_poly16x8 -> 128 @@ -1217,6 +1238,10 @@ let ops = [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)]; Vcvt, [InfoWord], All (2, Qreg), "vcvtQ", conv_1, [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)]; + Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1], + Use_operands [| Dreg; Qreg; |], "vcvt", conv_1, [Conv (F16, F32)]; + Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1], + Use_operands [| Qreg; Dreg; |], "vcvt", conv_1, [Conv (F32, F16)]; Vcvt_n, [InfoWord], Use_operands [| Dreg; Dreg; Immed |], "vcvt_n", conv_2, [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)]; Vcvt_n, [InfoWord], Use_operands [| Qreg; Qreg; Immed |], "vcvtQ_n", conv_2, @@ -1782,7 +1807,7 @@ let rec string_of_elt = function | U8 -> "u8" | U16 -> "u16" | U32 -> "u32" | U64 -> "u64" | I8 -> "i8" | I16 -> "i16" | I32 -> "i32" | I64 -> "i64" | B8 -> "8" | B16 -> "16" | B32 -> "32" | B64 -> "64" - | F32 -> "f32" | P8 -> "p8" | P16 -> "p16" + | F32 -> "f32" | P8 -> "p8" | P16 -> "p16" | F16 -> "f16" | Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "_" ^ string_of_elt b | NoElts -> failwith "No elts" @@ -1809,6 +1834,7 @@ let string_of_vectype vt = | T_uint32x4 -> affix "uint32x4" | T_uint64x1 -> affix "uint64x1" | T_uint64x2 -> affix "uint64x2" + | T_float16x4 -> affix "float16x4" | T_float32x2 -> affix "float32x2" | T_float32x4 -> affix "float32x4" | T_poly8x8 -> affix "poly8x8" @@ -1825,6 +1851,7 @@ let string_of_vectype vt = | T_uint64 -> affix "uint64" | T_poly8 -> affix "poly8" | T_poly16 -> affix "poly16" + | T_float16 -> affix "float16" | T_float32 -> affix "float32" | T_immediate _ -> "const int" | T_void -> "void" @@ -1832,6 +1859,7 @@ let string_of_vectype vt = | T_intHI -> "__builtin_neon_hi" | T_intSI -> "__builtin_neon_si" | T_intDI -> "__builtin_neon_di" + | T_floatHF -> "__builtin_neon_hf" | T_floatSF -> "__builtin_neon_sf" | T_arrayof (num, base) -> let basename = name (fun x -> x) base in @@ -1853,10 +1881,10 @@ let string_of_inttype = function | B_XImode -> "__builtin_neon_xi" let string_of_mode = function - V8QI -> "v8qi" | V4HI -> "v4hi" | V2SI -> "v2si" | V2SF -> "v2sf" - | DI -> "di" | V16QI -> "v16qi" | V8HI -> "v8hi" | V4SI -> "v4si" - | V4SF -> "v4sf" | V2DI -> "v2di" | QI -> "qi" | HI -> "hi" | SI -> "si" - | SF -> "sf" + V8QI -> "v8qi" | V4HI -> "v4hi" | V4HF -> "v4hf" | V2SI -> "v2si" + | V2SF -> "v2sf" | DI -> "di" | V16QI -> "v16qi" | V8HI -> "v8hi" + | V4SI -> "v4si" | V4SF -> "v4sf" | V2DI -> "v2di" | QI -> "qi" + | HI -> "hi" | SI -> "si" | SF -> "sf" (* Use uppercase chars for letters which form part of the intrinsic name, but should be omitted from the builtin name (the info is passed in an extra Index: neon-testgen.ml =================================================================== --- neon-testgen.ml (revision 197804) +++ neon-testgen.ml (working copy) @@ -163,10 +163,12 @@ let effective_target features = match List.find (fun feature -> match feature with Requires_feature _ -> true | Requires_arch _ -> true + | Requires_FP_bit 1 -> true | _ -> false) features with Requires_feature "FMA" -> "arm_neonv2" | Requires_arch 8 -> "arm_v8_neon" + | Requires_FP_bit 1 -> "arm_neon_fp16" | _ -> assert false with Not_found -> "arm_neon"