From patchwork Wed May 18 19:14:18 2022
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1632924
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: [PATCH v6 04/10] aarch64: Add optimized chacha20
Date: Wed, 18 May 2022 16:14:18 -0300
Message-Id: <20220518191424.3630729-5-adhemerval.zanella@linaro.org>
In-Reply-To: <20220518191424.3630729-1-adhemerval.zanella@linaro.org>
References: <20220518191424.3630729-1-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list

This adds a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-aarch64.S.  It is used by default, and only little-endian
is supported (BE uses the generic fallback code).  As in the generic
implementation, the final step that XORs the keystream with the input is
omitted.

On a Neoverse-N1 it shows the following improvements (using formatted
bench-arc4random data):

GENERIC
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               136.85
arc4random_buf(16) [single-thread]       272.21
arc4random_buf(32) [single-thread]       335.59
arc4random_buf(48) [single-thread]       373.36
arc4random_buf(64) [single-thread]       394.02
arc4random_buf(80) [single-thread]       401.40
arc4random_buf(96) [single-thread]       411.80
arc4random_buf(112) [single-thread]      416.15
arc4random_buf(128) [single-thread]      421.16
--------------------------------------------------

OPTIMIZED
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               154.98
arc4random_buf(16) [single-thread]       342.63
arc4random_buf(32) [single-thread]       485.91
arc4random_buf(48) [single-thread]       539.95
arc4random_buf(64) [single-thread]       593.38
arc4random_buf(80) [single-thread]       629.45
arc4random_buf(96) [single-thread]       655.78
arc4random_buf(112) [single-thread]      670.54
arc4random_buf(128) [single-thread]      681.65
--------------------------------------------------

Checked on aarch64-linux-gnu.
---
 LICENSES                        |  20 ++
 stdlib/chacha20.c               |   8 +-
 sysdeps/aarch64/Makefile        |   4 +
 sysdeps/aarch64/chacha20-neon.S | 323 ++++++++++++++++++++++++++++++++
 sysdeps/aarch64/chacha20_arch.h |  40 ++++
 sysdeps/generic/chacha20_arch.h |  24 +++
 6 files changed, 417 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/aarch64/chacha20-neon.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h
 create mode 100644 sysdeps/generic/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 530893b1dc..7288d281dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,3 +389,23 @@ Copyright 2001 by Stephen L. Moshier
     You should have received a copy of the GNU Lesser General Public
     License along with this library; if not, see
     .  */
+
+sysdeps/aarch64/chacha20-neon.S imports code from libgcrypt, with the
+following notices:
+
+Copyright (C) 2017-2019 Jussi Kivilinna
+
+This file is part of Libgcrypt.
+
+Libgcrypt is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation; either version 2.1 of
+the License, or (at your option) any later version.
+
+Libgcrypt is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this program; if not, see .
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
index 28630121f9..a18bf60a55 100644
--- a/stdlib/chacha20.c
+++ b/stdlib/chacha20.c
@@ -166,8 +166,9 @@ chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
 }
 
 static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
+__attribute_maybe_unused__
+chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
+			size_t bytes)
 {
   while (bytes >= CHACHA20_BLOCK_SIZE)
     {
@@ -186,3 +187,6 @@ chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
       explicit_bzero (stream, sizeof stream);
     }
 }
+
+/* Get the architecture optimized version.  */
+#include
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 17fb1c5b72..d92e3e0cb1 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,6 +51,10 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-neon
+endif
+
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-neon.S b/sysdeps/aarch64/chacha20-neon.S
new file mode 100644
index 0000000000..f5652d5062
--- /dev/null
+++ b/sysdeps/aarch64/chacha20-neon.S
@@ -0,0 +1,323 @@
+/* Optimized AArch64 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+
+/* Only LE is supported.  */
+#ifdef __AARCH64EL__
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain.  */
+
+#define GET_DATA_POINTER(reg, name) \
+	adrp    reg, name ; \
+	add     reg, reg, :lo12:name
+
+/* 'ret' instruction replacement for straight-line speculation mitigation */
+#define ret_spec_stop \
+	ret; dsb sy; isb;
+
+.cpu generic+simd
+
+.text
+
+/* register macros */
+#define INPUT     x0
+#define DST       x1
+#define SRC       x2
+#define NBLKS     x3
+#define ROUND     x4
+#define INPUT_CTR x5
+#define INPUT_POS x6
+#define CTR       x7
+
+/* vector registers */
+#define X0 v16
+#define X4 v17
+#define X8 v18
+#define X12 v19
+
+#define X1 v20
+#define X5 v21
+
+#define X9 v22
+#define X13 v23
+#define X2 v24
+#define X6 v25
+
+#define X3 v26
+#define X7 v27
+#define X11 v28
+#define X15 v29
+
+#define X10 v30
+#define X14 v31
+
+#define VCTR    v0
+#define VTMP0   v1
+#define VTMP1   v2
+#define VTMP2   v3
+#define VTMP3   v4
+#define X12_TMP v5
+#define X13_TMP v6
+#define ROT8    v7
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+#define _(...) __VA_ARGS__
+
+#define vpunpckldq(s1, s2, dst) \
+	zip1 dst.4s, s2.4s, s1.4s;
+
+#define vpunpckhdq(s1, s2, dst) \
+	zip2 dst.4s, s2.4s, s1.4s;
+
+#define vpunpcklqdq(s1, s2, dst) \
+	zip1 dst.2d, s2.2d, s1.2d;
+
+#define vpunpckhqdq(s1, s2, dst) \
+	zip2 dst.2d, s2.2d, s1.2d;
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
+	vpunpckhdq(x1, x0, t2); \
+	vpunpckldq(x1, x0, x0); \
+	\
+	vpunpckldq(x3, x2, t1); \
+	vpunpckhdq(x3, x2, x2); \
+	\
+	vpunpckhqdq(t1, x0, x1); \
+	vpunpcklqdq(t1, x0, x0); \
+	\
+	vpunpckhqdq(x2, t2, x3); \
+	vpunpcklqdq(x2, t2, x2);
+
+#define clear(x) \
+	movi x.16b, #0;
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define XOR(d,s1,s2) \
+	eor d.16b, s2.16b, s1.16b;
+
+#define PLUS(ds,s) \
+	add ds.4s, ds.4s, s.4s;
+
+#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
+	shl dst1.4s, src1.4s, #(c); \
+	shl dst2.4s, src2.4s, #(c); \
+	shl dst3.4s, src3.4s, #(c); \
+	shl dst4.4s, src4.4s, #(c); \
+	sri dst1.4s, src1.4s, #(32 - (c)); \
+	sri dst2.4s, src2.4s, #(32 - (c)); \
+	sri dst3.4s, src3.4s, #(32 - (c)); \
+	sri dst4.4s, src4.4s, #(32 - (c));
+
+#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
+	tbl dst1.16b, {src1.16b}, ROT8.16b; \
+	tbl dst2.16b, {src2.16b}, ROT8.16b; \
+	tbl dst3.16b, {src3.16b}, ROT8.16b; \
+	tbl dst4.16b, {src4.16b}, ROT8.16b;
+
+#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
+	rev32 dst1.8h, src1.8h; \
+	rev32 dst2.8h, src2.8h; \
+	rev32 dst3.8h, src3.8h; \
+	rev32 dst4.8h, src4.8h;
+
+#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
+	PLUS(a1,b1); PLUS(a2,b2); \
+	PLUS(a3,b3); PLUS(a4,b4); \
+	XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
+	XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
+	ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \
+	PLUS(c1,d1); PLUS(c2,d2); \
+	PLUS(c3,d3); PLUS(c4,d4); \
+	XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
+	XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
+	ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \
+	PLUS(a1,b1); PLUS(a2,b2); \
+	PLUS(a3,b3); PLUS(a4,b4); \
+	XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
+	XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
+	ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \
+	PLUS(c1,d1); PLUS(c2,d2); \
+	PLUS(c3,d3); PLUS(c4,d4); \
+	XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
+	XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
+	ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \
+
+.align 4
+L(__chacha20_blocks4_data_inc_counter):
+	.long 0,1,2,3
+
+.align 4
+L(__chacha20_blocks4_data_rot8):
+	.byte 3,0,1,2
+	.byte 7,4,5,6
+	.byte 11,8,9,10
+	.byte 15,12,13,14
+
+.hidden __chacha20_neon_blocks4
+ENTRY (__chacha20_neon_blocks4)
+	/* input:
+	 *	x0: input
+	 *	x1: dst
+	 *	x2: src
+	 *	x3: nblks (multiple of 4)
+	 */
+
+	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
+	add INPUT_CTR, INPUT, #(12*4);
+	ld1 {ROT8.16b}, [CTR];
+	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
+	mov INPUT_POS, INPUT;
+	ld1 {VCTR.16b}, [CTR];
+
+L(loop4):
+	/* Construct counter vectors X12 and X13 */
+
+	ld1 {X15.16b}, [INPUT_CTR];
+	mov ROUND, #20;
+	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
+
+	dup X12.4s, X15.s[0];
+	dup X13.4s, X15.s[1];
+	ldr CTR, [INPUT_CTR];
+	add X12.4s, X12.4s, VCTR.4s;
+	dup X0.4s, VTMP1.s[0];
+	dup X1.4s, VTMP1.s[1];
+	dup X2.4s, VTMP1.s[2];
+	dup X3.4s, VTMP1.s[3];
+	dup X14.4s, X15.s[2];
+	cmhi VTMP0.4s, VCTR.4s, X12.4s;
+	dup X15.4s, X15.s[3];
+	add CTR, CTR, #4; /* Update counter */
+	dup X4.4s, VTMP2.s[0];
+	dup X5.4s, VTMP2.s[1];
+	dup X6.4s, VTMP2.s[2];
+	dup X7.4s, VTMP2.s[3];
+	sub X13.4s, X13.4s, VTMP0.4s;
+	dup X8.4s, VTMP3.s[0];
+	dup X9.4s, VTMP3.s[1];
+	dup X10.4s, VTMP3.s[2];
+	dup X11.4s, VTMP3.s[3];
+	mov X12_TMP.16b, X12.16b;
+	mov X13_TMP.16b, X13.16b;
+	str CTR, [INPUT_CTR];
+
+L(round2):
+	subs ROUND, ROUND, #2
+	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
+		      X2, X6, X10, X14,   X3, X7, X11, X15,
+		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
+	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
+		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
+		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
+	b.ne L(round2);
+
+	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
+
+	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
+	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
+
+	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
+	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
+	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
+	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
+	PLUS(X0, VTMP2);
+	PLUS(X1, VTMP3);
+	PLUS(X2, X12_TMP);
+	PLUS(X3, X13_TMP);
+
+	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
+	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
+	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
+	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
+	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
+	mov INPUT_POS, INPUT;
+	PLUS(X4, VTMP2);
+	PLUS(X5, VTMP3);
+	PLUS(X6, X12_TMP);
+	PLUS(X7, X13_TMP);
+
+	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
+	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
+	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
+	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
+	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
+	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
+	PLUS(X8, VTMP2);
+	PLUS(X9, VTMP3);
+	PLUS(X10, X12_TMP);
+	PLUS(X11, X13_TMP);
+	PLUS(X14, VTMP0);
+	PLUS(X15, VTMP1);
+
+	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
+
+	subs NBLKS, NBLKS, #4;
+
+	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
+	st1 {X1.16b,X5.16b}, [DST], #32;
+	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
+	st1 {X10.16b,X14.16b}, [DST], #32;
+	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
+
+	b.ne L(loop4);
+
+	/* clear the used vector registers and stack */
+	clear(VTMP0);
+	clear(VTMP1);
+	clear(VTMP2);
+	clear(VTMP3);
+	clear(X12_TMP);
+	clear(X13_TMP);
+	clear(X0);
+	clear(X1);
+	clear(X2);
+	clear(X3);
+	clear(X4);
+	clear(X5);
+	clear(X6);
+	clear(X7);
+	clear(X8);
+	clear(X9);
+	clear(X10);
+	clear(X11);
+	clear(X12);
+	clear(X13);
+	clear(X14);
+	clear(X15);
+
+	eor x0, x0, x0
+	ret_spec_stop
+END (__chacha20_neon_blocks4)
+
+#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
new file mode 100644
index 0000000000..9febee7bb6
--- /dev/null
+++ b/sysdeps/aarch64/chacha20_arch.h
@@ -0,0 +1,40 @@
+/* Chacha20 implementation, used on arc4random.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+
+unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
+				      const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+		size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+		  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
+		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
+#ifdef __AARCH64EL__
+  __chacha20_neon_blocks4 (state, dst, src,
+			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+#else
+  chacha20_crypt_generic (state, dst, src, bytes);
+#endif
+}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
new file mode 100644
index 0000000000..efad41d034
--- /dev/null
+++ b/sysdeps/generic/chacha20_arch.h
@@ -0,0 +1,24 @@
+/* Chacha20 implementation, generic interface for encrypt.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+static inline void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+		size_t bytes)
+{
+  chacha20_crypt_generic (state, dst, src, bytes);
+}
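
Note for readers following the NEON macros: below is a minimal scalar
sketch in C of the quarter round that QUARTERROUND4 applies to four blocks
at once.  It is not part of the patch, and the names rotl32, quarter_round
and quarter_round_selftest are made up for illustration.  The rotation
amounts (16, 12, 8, 7) correspond to the ROTATE4_16, ROTATE4 (c = 12),
ROTATE4_8 and ROTATE4 (c = 7) steps, and the self-check uses the
quarter-round test vector from RFC 8439, section 2.1.1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit word left by C bits (0 < C < 32).
   The NEON code computes the same thing per lane with shl + sri, with
   rev32 on halfwords when C == 16, or with a tbl byte shuffle when
   C == 8.  */
static inline uint32_t
rotl32 (uint32_t x, unsigned int c)
{
  return (x << c) | (x >> (32 - c));
}

/* Scalar ChaCha quarter round on four state words.  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d = rotl32 (*d ^ *a, 16);
  *c += *d; *b = rotl32 (*b ^ *c, 12);
  *a += *b; *d = rotl32 (*d ^ *a, 8);
  *c += *d; *b = rotl32 (*b ^ *c, 7);
}

/* Check the quarter round against the RFC 8439 section 2.1.1 vector.  */
static void
quarter_round_selftest (void)
{
  uint32_t a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567;

  quarter_round (&a, &b, &c, &d);

  assert (a == 0xea2a92f4);
  assert (b == 0xcb1cf8ce);
  assert (c == 0x4581472e);
  assert (d == 0x5881c4bb);
}
```

The rotations by 16 and 8 are byte-aligned, which is presumably why the
assembly can replace the two-instruction shl/sri pair with a single rev32
or a tbl lookup through the ROT8 table for those two steps.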