From patchwork Thu Dec 8 19:50:06 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bernd Edlinger
X-Patchwork-Id: 704221
From: Bernd Edlinger
To: Wilco Dijkstra, Ramana Radhakrishnan
CC: GCC Patches, Kyrill Tkachov, Richard Earnshaw, nd
Subject: Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Thu, 8 Dec 2016 19:50:06 +0000

Hi Wilco,

On 11/30/16 18:01, Bernd Edlinger wrote:
> I attached the completely untested follow-up patch now, but I would
> like to post that one again for review, after I applied my current
> patch, which is still waiting for final review (please feel pinged!).
>
>
> This is really exciting...
>
>

When testing the follow-up patch I discovered a single regression
in gcc.dg/fixed-point/convert-sat.c that was caused by a mis-compilation
of the libgcc function __gnu_satfractdasq.

I think it triggered a latent bug in the carryin_compare patterns.

Everything is as expected until reload.  First, here is what is left over
of a split cmpdi_insn, followed by a former cmpdi_unsigned if the
branch is not taken:

(insn 109 10 110 2 (set (reg:CC 100 cc)
         (compare:CC (reg:SI 0 r0 [orig:124 _10 ] [124])
             (const_int 0 [0]))) "../../../gcc-trunk/libgcc/fixed-bit.c":785 196 {*arm_cmpsi_insn}
      (nil))
(insn 110 109 13 2 (parallel [
             (set (reg:CC 100 cc)
                 (compare:CC (reg:SI 1 r1 [orig:125 _10+4 ] [125])
                     (const_int -1 [0xffffffffffffffff])))
             (set (reg:SI 3 r3 [123])
                 (minus:SI (plus:SI (reg:SI 1 r1 [orig:125 _10+4 ] [125])
                         (const_int -1 [0xffffffffffffffff]))
                     (ltu:SI (reg:CC_C 100 cc)
                         (const_int 0 [0]))))
         ]) "../../../gcc-trunk/libgcc/fixed-bit.c":785 32 {*subsi3_carryin_compare_const}
      (nil))
(jump_insn 13 110 31 2 (set (pc)
         (if_then_else (ge (reg:CC_NCV 100 cc)
                 (const_int 0 [0]))
             (label_ref:SI 102)
             (pc))) "../../../gcc-trunk/libgcc/fixed-bit.c":785 204 {arm_cond_branch}
      (int_list:REG_BR_PROB 6400 (nil))
(note 31 13 97 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(note 97 31 114 3 NOTE_INSN_DELETED)
(insn 114 97 113 3 (set (reg:SI 2 r2 [orig:127+4 ] [127])
         (const_int -1 [0xffffffffffffffff])) "../../../gcc-trunk/libgcc/fixed-bit.c":831 630 {*arm_movsi_vfp}
      (expr_list:REG_EQUIV (const_int -1 [0xffffffffffffffff])
         (nil)))
(insn 113 114 107 3 (set (reg:SI 3 r3 [126])
         (const_int 2147483647 [0x7fffffff])) "../../../gcc-trunk/libgcc/fixed-bit.c":831 630 {*arm_movsi_vfp}
      (expr_list:REG_EQUIV (const_int 2147483647 [0x7fffffff])
         (nil)))
(insn 107 113 108 3 (set (reg:CC 100 cc)
         (compare:CC (reg:SI 1 r1 [orig:125 _10+4 ] [125])
             (reg:SI 2 r2 [orig:127+4 ] [127]))) "../../../gcc-trunk/libgcc/fixed-bit.c":831 196 {*arm_cmpsi_insn}
      (nil))

Note that the CC register is not really set as implied by insn 110,
because the C flag depends on the comparison of r1 with 0xffffffff
and on the carry flag from insn 109.  Therefore, in the postreload pass,
insn 107 appears to be unnecessary: according to the RTL it computes
exactly the same CC value as insn 110, i.e. one that does not depend on
the previous CC flag.  I think all carryin_compare patterns are wrong
because they do not describe the true value of the CC reg.

I think the CC reg actually depends on the left operand, the right
operand and the incoming CC value; it must be described as a computation
in DI mode without overflow, as in my new version of the patch.

I attached an update of the follow-up patch, which is not yet
adjusted to your pending negdi patch.  Reg-testing is not yet
done, but the mis-compilation in libgcc is fixed at least.

What do you think?


Thanks
Bernd.

2016-12-08  Bernd Edlinger

	PR target/77308
	* config/arm/arm.md (subdi3_compare1, subsi3_carryin_compare,
	subsi3_carryin_compare_const, negdi2_compare): Fix the CC reg dataflow.
	(*arm_negdi2, *arm_cmpdi_unsigned): Split early except for
	TARGET_NEON and TARGET_IWMMXT.
	(*arm_cmpdi_insn): Split early except for
	TARGET_NEON and TARGET_IWMMXT.  Fix the CC reg dataflow.
	* config/arm/thumb2.md (*thumb2_negdi2): Split early except for
	TARGET_NEON and TARGET_IWMMXT.

testsuite:
2016-12-08  Bernd Edlinger

	PR target/77308
	* gcc.target/arm/pr77308-2.c: New test.

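To make the point about the carry-in compare patterns concrete, here is a
small C model of the C flag after cmp and after sbcs.  It is only a sketch
and not part of the patch: the helper names cmp_carry, sbcs_carry and
geu64_split are made up for illustration, and the model assumes the usual
ARM convention that C is set when a subtraction does not borrow.

```c
#include <stdint.h>

/* Illustrative model only; these helpers are hypothetical and are not
   part of GCC or of the patch.  */

/* C flag after "cmp a, b": set when the subtraction does not borrow.  */
static int
cmp_carry (uint32_t a, uint32_t b)
{
  return a >= b;
}

/* C flag after "sbcs rd, a, b" with incoming carry C_IN.  A clear
   incoming carry means "borrow one more".  The comparison is done in
   64 bits so that b + borrow cannot wrap around, which mirrors the
   DI-mode compare used in the new patterns.  */
static int
sbcs_carry (uint32_t a, uint32_t b, int c_in)
{
  uint64_t borrow = c_in ? 0 : 1;
  return (uint64_t) a >= (uint64_t) b + borrow;
}

/* Unsigned 64-bit "a >= b" computed the way a split cmpdi does it:
   cmp on the low words, then sbcs on the high words.  */
static int
geu64_split (uint64_t a, uint64_t b)
{
  int c = cmp_carry ((uint32_t) a, (uint32_t) b);
  return sbcs_carry ((uint32_t) (a >> 32), (uint32_t) (b >> 32), c);
}
```

With equal high words and a borrow coming from the low words,
sbcs_carry (1, 1, 0) is 0 while cmp_carry (1, 1) is 1.  That difference is
what a plain (compare:CC op1 op2) on the high words does not express.
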
--- gcc/config/arm/arm.md.orig 2016-12-08 16:01:43.290595127 +0100
+++ gcc/config/arm/arm.md 2016-12-08 19:04:22.251065848 +0100
@@ -1086,8 +1086,8 @@
   })
 
 (define_insn_and_split "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+        (compare:CC_NCV
           (match_operand:DI 1 "register_operand" "r")
           (match_operand:DI 2 "register_operand" "r")))
    (set (match_operand:DI 0 "register_operand" "=&r")
@@ -1098,10 +1098,15 @@
   [(parallel [(set (reg:CC CC_REGNUM)
                    (compare:CC (match_dup 1) (match_dup 2)))
               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 4) (match_dup 5)))
-              (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
-                               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+   (parallel [(set (reg:CC_C CC_REGNUM)
+                   (compare:CC_C
+                     (zero_extend:DI (match_dup 4))
+                     (plus:DI
+                       (zero_extend:DI (match_dup 5))
+                       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+              (set (match_dup 3)
+                   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+                             (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -1156,13 +1161,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
-                    (match_operand:SI 2 "s_register_operand" "r")))
+  [(set (reg:CC_C CC_REGNUM)
+        (compare:CC_C
+          (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+          (plus:DI
+            (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
+            (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (match_dup 1)
-                            (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+        (minus:SI (minus:SI (match_dup 1) (match_dup 2))
+                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1170,12 +1177,14 @@
 )
 
 (define_insn "*subsi3_carryin_compare_const"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
-                    (match_operand:SI 2 "arm_not_operand" "K")))
+  [(set (reg:CC_C CC_REGNUM)
+        (compare:CC_C
+          (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
+          (plus:DI
+            (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
+            (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (plus:SI (match_dup 1)
-                           (match_dup 2))
+        (minus:SI (plus:SI (match_dup 1) (match_dup 2))
                   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, #%B2"
@@ -4684,8 +4693,8 @@
 
 
 (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+        (compare:CC_NCV
           (const_int 0)
           (match_operand:DI 1 "register_operand" "0,r")))
    (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4697,8 +4706,12 @@
                    (compare:CC (const_int 0) (match_dup 1)))
               (set (match_dup 0) (minus:SI (const_int 0)
                                            (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_C CC_REGNUM)
+                   (compare:CC_C
+                     (const_int 0)
+                     (plus:DI
+                       (zero_extend:DI (match_dup 3))
+                       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
              (set (match_dup 2)
                   (minus:SI
                    (minus:SI (const_int 0) (match_dup 3))
@@ -4738,7 +4751,7 @@
    (clobber (reg:CC CC_REGNUM))]
   "TARGET_ARM"
   "#" ; "rsbs\\t%Q0, %Q1, #0\;rsc\\t%R0, %R1, #0"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(parallel [(set (reg:CC CC_REGNUM)
                    (compare:CC (const_int 0) (match_dup 1)))
               (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
@@ -4756,12 +4769,14 @@
 )
 
 (define_insn "*negsi2_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (const_int 0)
-                    (match_operand:SI 1 "s_register_operand" "r")))
+  [(set (reg:CC_C CC_REGNUM)
+        (compare:CC_C
+          (const_int 0)
+          (plus:DI
+            (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+            (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (const_int 0)
-                            (match_dup 1))
+        (minus:SI (minus:SI (const_int 0) (match_dup 1))
                   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_ARM"
   "rscs\\t%0, %1, #0"
@@ -7432,14 +7447,17 @@
    (clobber (match_scratch:SI 2 "=r"))]
   "TARGET_32BIT"
   "#" ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_dup 0) (match_dup 1)))
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 3) (match_dup 4)))
-              (set (match_dup 2)
-                   (minus:SI (match_dup 5)
-                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+        (compare:CC (match_dup 0) (match_dup 1)))
+   (parallel [(set (reg:CC_C CC_REGNUM)
+                   (compare:CC_C
+                     (zero_extend:DI (match_dup 3))
+                     (plus:DI (zero_extend:DI (match_dup 4))
+                              (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+              (set (match_dup 2)
+                   (minus:SI (match_dup 5)
+                             (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -7456,7 +7474,10 @@
         operands[5] = gen_rtx_MINUS (SImode, operands[3], operands[4]);
       }
     operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
+    if (can_create_pseudo_p ())
+      operands[2] = gen_reg_rtx (SImode);
+    else
+      operands[2] = gen_lowpart (SImode, operands[2]);
   }
   [(set_attr "conds" "set")
    (set_attr "length" "8")
@@ -7470,7 +7491,7 @@
 
   "TARGET_32BIT"
   "#" ; "cmp\\t%R0, %R1\;it eq\;cmpeq\\t%Q0, %Q1"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(set (reg:CC CC_REGNUM)
         (compare:CC (match_dup 2) (match_dup 3)))
    (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
--- gcc/config/arm/thumb2.md.orig 2016-12-08 16:00:59.017597265 +0100
+++ gcc/config/arm/thumb2.md 2016-12-08 16:02:38.591592456 +0100
@@ -132,7 +132,7 @@
    (clobber (reg:CC CC_REGNUM))]
   "TARGET_THUMB2"
   "#" ; negs\\t%Q0, %Q1\;sbc\\t%R0, %R1, %R1, lsl #1
-  "&& reload_completed"
+  "&& (!TARGET_NEON || reload_completed)"
   [(parallel [(set (reg:CC CC_REGNUM)
                    (compare:CC (const_int 0) (match_dup 1)))
               (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
--- /dev/null 2016-12-08 15:50:45.426271450 +0100
+++ gcc/testsuite/gcc.target/arm/pr77308-2.c 2016-12-08 16:02:38.591592456 +0100
@@ -0,0 +1,169 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -Wstack-usage=2500" } */
+
+/* This is a modified algorithm with 64bit cmp and neg at the Sigma-blocks.
+   It improves the test coverage of cmpdi and negdi2 patterns.
+   Unlike the original test case these insns can reach the reload pass,
+   which may result in large stack usage.  */
+
+#define SHA_LONG64 unsigned long long
+#define U64(C) C##ULL
+
+#define SHA_LBLOCK 16
+#define SHA512_CBLOCK (SHA_LBLOCK*8)
+
+typedef struct SHA512state_st {
+    SHA_LONG64 h[8];
+    SHA_LONG64 Nl, Nh;
+    union {
+        SHA_LONG64 d[SHA_LBLOCK];
+        unsigned char p[SHA512_CBLOCK];
+    } u;
+    unsigned int num, md_len;
+} SHA512_CTX;
+
+static const SHA_LONG64 K512[80] = {
+    U64(0x428a2f98d728ae22), U64(0x7137449123ef65cd),
+    U64(0xb5c0fbcfec4d3b2f), U64(0xe9b5dba58189dbbc),
+    U64(0x3956c25bf348b538), U64(0x59f111f1b605d019),
+    U64(0x923f82a4af194f9b), U64(0xab1c5ed5da6d8118),
+    U64(0xd807aa98a3030242), U64(0x12835b0145706fbe),
+    U64(0x243185be4ee4b28c), U64(0x550c7dc3d5ffb4e2),
+    U64(0x72be5d74f27b896f), U64(0x80deb1fe3b1696b1),
+    U64(0x9bdc06a725c71235), U64(0xc19bf174cf692694),
+    U64(0xe49b69c19ef14ad2), U64(0xefbe4786384f25e3),
+    U64(0x0fc19dc68b8cd5b5), U64(0x240ca1cc77ac9c65),
+    U64(0x2de92c6f592b0275), U64(0x4a7484aa6ea6e483),
+    U64(0x5cb0a9dcbd41fbd4), U64(0x76f988da831153b5),
+    U64(0x983e5152ee66dfab), U64(0xa831c66d2db43210),
+    U64(0xb00327c898fb213f), U64(0xbf597fc7beef0ee4),
+    U64(0xc6e00bf33da88fc2), U64(0xd5a79147930aa725),
+    U64(0x06ca6351e003826f), U64(0x142929670a0e6e70),
+    U64(0x27b70a8546d22ffc), U64(0x2e1b21385c26c926),
+    U64(0x4d2c6dfc5ac42aed), U64(0x53380d139d95b3df),
+    U64(0x650a73548baf63de), U64(0x766a0abb3c77b2a8),
+    U64(0x81c2c92e47edaee6), U64(0x92722c851482353b),
+    U64(0xa2bfe8a14cf10364), U64(0xa81a664bbc423001),
+    U64(0xc24b8b70d0f89791), U64(0xc76c51a30654be30),
+    U64(0xd192e819d6ef5218), U64(0xd69906245565a910),
+    U64(0xf40e35855771202a), U64(0x106aa07032bbd1b8),
+    U64(0x19a4c116b8d2d0c8), U64(0x1e376c085141ab53),
+    U64(0x2748774cdf8eeb99), U64(0x34b0bcb5e19b48a8),
+    U64(0x391c0cb3c5c95a63), U64(0x4ed8aa4ae3418acb),
+    U64(0x5b9cca4f7763e373), U64(0x682e6ff3d6b2b8a3),
+    U64(0x748f82ee5defb2fc), U64(0x78a5636f43172f60),
+    U64(0x84c87814a1f0ab72), U64(0x8cc702081a6439ec),
+    U64(0x90befffa23631e28), U64(0xa4506cebde82bde9),
+    U64(0xbef9a3f7b2c67915), U64(0xc67178f2e372532b),
+    U64(0xca273eceea26619c), U64(0xd186b8c721c0c207),
+    U64(0xeada7dd6cde0eb1e), U64(0xf57d4f7fee6ed178),
+    U64(0x06f067aa72176fba), U64(0x0a637dc5a2c898a6),
+    U64(0x113f9804bef90dae), U64(0x1b710b35131c471b),
+    U64(0x28db77f523047d84), U64(0x32caab7b40c72493),
+    U64(0x3c9ebe0a15c9bebc), U64(0x431d67c49c100d4c),
+    U64(0x4cc5d4becb3e42b6), U64(0x597f299cfc657e2a),
+    U64(0x5fcb6fab3ad6faec), U64(0x6c44198c4a475817)
+};
+
+#define B(x,j) (((SHA_LONG64)(*(((const unsigned char *)(&x))+j)))<<((7-j)*8))
+#define PULL64(x) (B(x,0)|B(x,1)|B(x,2)|B(x,3)|B(x,4)|B(x,5)|B(x,6)|B(x,7))
+#define ROTR(x,s) (((x)>>s) | (x)<<(64-s))
+#define Sigma0(x) (ROTR((x),28) ^ ROTR((x),34) ^ (ROTR((x),39) == (x)) ? -(x) : (x))
+#define Sigma1(x) (ROTR((x),14) ^ ROTR(-(x),18) ^ ((long long)ROTR((x),41) < (long long)(x)) ? -(x) : (x))
+#define sigma0(x) (ROTR((x),1) ^ ROTR((x),8) ^ (((x)>>7) > (x)) ? -(x) : (x))
+#define sigma1(x) (ROTR((x),19) ^ ROTR((x),61) ^ ((long long)((x)>>6) < (long long)(x)) ? -(x) : (x))
+#define Ch(x,y,z) (((x) & (y)) ^ ((~(x)) & (z)))
+#define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
+
+#define ROUND_00_15(i,a,b,c,d,e,f,g,h) do { \
+        T1 += h + Sigma1(e) + Ch(e,f,g) + K512[i]; \
+        h = Sigma0(a) + Maj(a,b,c); \
+        d += T1; h += T1; } while (0)
+#define ROUND_16_80(i,j,a,b,c,d,e,f,g,h,X) do { \
+        s0 = X[(j+1)&0x0f]; s0 = sigma0(s0); \
+        s1 = X[(j+14)&0x0f]; s1 = sigma1(s1); \
+        T1 = X[(j)&0x0f] += s0 + s1 + X[(j+9)&0x0f]; \
+        ROUND_00_15(i+j,a,b,c,d,e,f,g,h); } while (0)
+void sha512_block_data_order(SHA512_CTX *ctx, const void *in,
+                             unsigned int num)
+{
+    const SHA_LONG64 *W = in;
+    SHA_LONG64 a, b, c, d, e, f, g, h, s0, s1, T1;
+    SHA_LONG64 X[16];
+    int i;
+
+    while (num--) {
+
+        a = ctx->h[0];
+        b = ctx->h[1];
+        c = ctx->h[2];
+        d = ctx->h[3];
+        e = ctx->h[4];
+        f = ctx->h[5];
+        g = ctx->h[6];
+        h = ctx->h[7];
+
+        T1 = X[0] = PULL64(W[0]);
+        ROUND_00_15(0, a, b, c, d, e, f, g, h);
+        T1 = X[1] = PULL64(W[1]);
+        ROUND_00_15(1, h, a, b, c, d, e, f, g);
+        T1 = X[2] = PULL64(W[2]);
+        ROUND_00_15(2, g, h, a, b, c, d, e, f);
+        T1 = X[3] = PULL64(W[3]);
+        ROUND_00_15(3, f, g, h, a, b, c, d, e);
+        T1 = X[4] = PULL64(W[4]);
+        ROUND_00_15(4, e, f, g, h, a, b, c, d);
+        T1 = X[5] = PULL64(W[5]);
+        ROUND_00_15(5, d, e, f, g, h, a, b, c);
+        T1 = X[6] = PULL64(W[6]);
+        ROUND_00_15(6, c, d, e, f, g, h, a, b);
+        T1 = X[7] = PULL64(W[7]);
+        ROUND_00_15(7, b, c, d, e, f, g, h, a);
+        T1 = X[8] = PULL64(W[8]);
+        ROUND_00_15(8, a, b, c, d, e, f, g, h);
+        T1 = X[9] = PULL64(W[9]);
+        ROUND_00_15(9, h, a, b, c, d, e, f, g);
+        T1 = X[10] = PULL64(W[10]);
+        ROUND_00_15(10, g, h, a, b, c, d, e, f);
+        T1 = X[11] = PULL64(W[11]);
+        ROUND_00_15(11, f, g, h, a, b, c, d, e);
+        T1 = X[12] = PULL64(W[12]);
+        ROUND_00_15(12, e, f, g, h, a, b, c, d);
+        T1 = X[13] = PULL64(W[13]);
+        ROUND_00_15(13, d, e, f, g, h, a, b, c);
+        T1 = X[14] = PULL64(W[14]);
+        ROUND_00_15(14, c, d, e, f, g, h, a, b);
+        T1 = X[15] = PULL64(W[15]);
+        ROUND_00_15(15, b, c, d, e, f, g, h, a);
+
+        for (i = 16; i < 80; i += 16) {
+            ROUND_16_80(i, 0, a, b, c, d, e, f, g, h, X);
+            ROUND_16_80(i, 1, h, a, b, c, d, e, f, g, X);
+            ROUND_16_80(i, 2, g, h, a, b, c, d, e, f, X);
+            ROUND_16_80(i, 3, f, g, h, a, b, c, d, e, X);
+            ROUND_16_80(i, 4, e, f, g, h, a, b, c, d, X);
+            ROUND_16_80(i, 5, d, e, f, g, h, a, b, c, X);
+            ROUND_16_80(i, 6, c, d, e, f, g, h, a, b, X);
+            ROUND_16_80(i, 7, b, c, d, e, f, g, h, a, X);
+            ROUND_16_80(i, 8, a, b, c, d, e, f, g, h, X);
+            ROUND_16_80(i, 9, h, a, b, c, d, e, f, g, X);
+            ROUND_16_80(i, 10, g, h, a, b, c, d, e, f, X);
+            ROUND_16_80(i, 11, f, g, h, a, b, c, d, e, X);
+            ROUND_16_80(i, 12, e, f, g, h, a, b, c, d, X);
+            ROUND_16_80(i, 13, d, e, f, g, h, a, b, c, X);
+            ROUND_16_80(i, 14, c, d, e, f, g, h, a, b, X);
+            ROUND_16_80(i, 15, b, c, d, e, f, g, h, a, X);
+        }
+
+        ctx->h[0] += a;
+        ctx->h[1] += b;
+        ctx->h[2] += c;
+        ctx->h[3] += d;
+        ctx->h[4] += e;
+        ctx->h[5] += f;
+        ctx->h[6] += g;
+        ctx->h[7] += h;
+
+        W += SHA_LBLOCK;
+    }
+}