From patchwork Thu May 13 08:46:17 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bin Meng <bmeng.cn@gmail.com>
X-Patchwork-Id: 1477961
X-Patchwork-Delegate: uboot@andestech.com
Return-Path: <u-boot-bounces@lists.denx.de>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=lists.denx.de
 (client-ip=2a01:238:438b:c500:173d:9f52:ddab:ee01; helo=phobos.denx.de;
 envelope-from=u-boot-bounces@lists.denx.de; receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dkim=pass (2048-bit key;
 unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256
 header.s=20161025 header.b=nvd6fMAg;
	dkim-atps=neutral
Received: from phobos.denx.de (phobos.denx.de
 [IPv6:2a01:238:438b:c500:173d:9f52:ddab:ee01])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest
 SHA256)
	(No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 4Fgld700QVz9sWX
	for <incoming@patchwork.ozlabs.org>; Thu, 13 May 2021 18:46:33 +1000 (AEST)
Received: from h2850616.stratoserver.net (localhost [IPv6:::1])
	by phobos.denx.de (Postfix) with ESMTP id 9C26081CDE;
	Thu, 13 May 2021 10:46:23 +0200 (CEST)
Authentication-Results: phobos.denx.de;
 dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: phobos.denx.de;
 spf=pass smtp.mailfrom=u-boot-bounces@lists.denx.de
Authentication-Results: phobos.denx.de;
	dkim=pass (2048-bit key;
 unprotected) header.d=gmail.com header.i=@gmail.com header.b="nvd6fMAg";
	dkim-atps=neutral
Received: by phobos.denx.de (Postfix, from userid 109)
 id 4752181D32; Thu, 13 May 2021 10:46:21 +0200 (CEST)
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on phobos.denx.de
X-Spam-Level: 
X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,
 DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,SPF_HELO_NONE autolearn=ham
 autolearn_force=no version=3.4.2
Received: from mail-ed1-x52a.google.com (mail-ed1-x52a.google.com
 [IPv6:2a00:1450:4864:20::52a])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits))
 (No client certificate requested)
 by phobos.denx.de (Postfix) with ESMTPS id 3793681B55
 for <u-boot@lists.denx.de>; Thu, 13 May 2021 10:46:17 +0200 (CEST)
Authentication-Results: phobos.denx.de;
 dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: phobos.denx.de;
 spf=pass smtp.mailfrom=bmeng.cn@gmail.com
Received: by mail-ed1-x52a.google.com with SMTP id g14so30152490edy.6
 for <u-boot@lists.denx.de>; Thu, 13 May 2021 01:46:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=from:to:cc:subject:date:message-id:mime-version
 :content-transfer-encoding;
 bh=37lMZL0T6ctXPlfsIzwEMDxDmCo0tpwTNkg00iQO6hY=;
 b=nvd6fMAger7BTdTUxCFPoxA9GJXD+pr2pe2pZJwkH+1ARcraIYDxxLHC2CJFk2/T51
 Mowpijx9oYQKJJs508yjMfuhCfjEs/j3mSdEaYpaLodGQIlmimgMak6rE2vCxc0iXwRc
 jkwh7rZYt1c+nKQUYqBxYmLJlwCnUWC+ZhQOq9wgdas29Y9Y5ghWBOGG3hM+XpvDMw/r
 YqM+K24X9zEf52Ziqg/L73HI4Rj2umRqyzrnL8V3JN6s43k5CVHpclgyFAb1Wcy+bxjW
 8PQ30cAYCjUEZcK+JxJLpHWQZO6fJ9O2kBUEol+Eoeu35863FZbf+Q5ADCsz5FTZ96Ny
 8xJA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version
 :content-transfer-encoding;
 bh=37lMZL0T6ctXPlfsIzwEMDxDmCo0tpwTNkg00iQO6hY=;
 b=t2RO43Gsxyuzsu4IBr6QYm+x+GWMx9DkRxEpPgkOkgwAGgkscvtp0R0c7NmtUPeKKk
 bcrRXcPuUBLaZr9zNKzRyAxdZCbRD0nJW0LNc+XFv61AzqprP10nJTQ8pwuss+YCk0hO
 9vfTeqd5D83RrbGWlONkAJHHPLUjoAztS4LAaV4ii8AE1IWbmmzDfeQgkKihRyhgTme0
 Ez/5TcxPJF45a1FtWTTXeW0g20v4eqCf5S/HYrKlPIYjuVXnAYGcCP5S12g/YnlSf9a6
 NjLqJYk6ZyyiZPQjGsOadyxzUjchm1pfOxT1m6Vz+1gJ1wX6N+l46v3OvrhzPDPvLae7
 JumQ==
X-Gm-Message-State: AOAM531xF4IsgH72TzHsuT2+rDX1/GTmUjjdgbDKKzFDOGTdDLrkbipE
 LHHQfBq/MYVYLIzutKMUhdDNLsNxre1vpg==
X-Google-Smtp-Source: 
 ABdhPJxwsNNXIWSLzM41CecIC8f8tAKBlY1P8GvakEQaDtqnTw3TlKed8uUvsSxloTqM1E483EWTeA==
X-Received: by 2002:aa7:c787:: with SMTP id n7mr13514388eds.309.1620895576784;
 Thu, 13 May 2021 01:46:16 -0700 (PDT)
Received: from pek-vx-bsp2.wrs.com
 (ec2-44-242-66-180.us-west-2.compute.amazonaws.com. [44.242.66.180])
 by smtp.gmail.com with ESMTPSA id p14sm1895141eds.28.2021.05.13.01.46.13
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Thu, 13 May 2021 01:46:16 -0700 (PDT)
From: Bin Meng <bmeng.cn@gmail.com>
To: Rick Chen <rick@andestech.com>,
	u-boot@lists.denx.de
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Subject: [PATCH 1/2] riscv: Fix memmove and optimise memcpy when misalign
Date: Thu, 13 May 2021 16:46:17 +0800
Message-Id: <20210513084618.2161331-1-bmeng.cn@gmail.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
X-BeenThere: u-boot@lists.denx.de
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: U-Boot discussion <u-boot.lists.denx.de>
List-Unsubscribe: <https://lists.denx.de/options/u-boot>,
 <mailto:u-boot-request@lists.denx.de?subject=unsubscribe>
List-Archive: <https://lists.denx.de/pipermail/u-boot/>
List-Post: <mailto:u-boot@lists.denx.de>
List-Help: <mailto:u-boot-request@lists.denx.de?subject=help>
List-Subscribe: <https://lists.denx.de/listinfo/u-boot>,
 <mailto:u-boot-request@lists.denx.de?subject=subscribe>
Errors-To: u-boot-bounces@lists.denx.de
Sender: "U-Boot" <u-boot-bounces@lists.denx.de>
X-Virus-Scanned: clamav-milter 0.102.4 at phobos.denx.de
X-Virus-Status: Clean

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@cs.nctu.edu.tw>
---

 arch/riscv/lib/memcpy.S  | 223 ++++++++++++++++++++++++---------------
 arch/riscv/lib/memmove.S | 176 ++++++++++++++++++++----------
 2 files changed, 257 insertions(+), 142 deletions(-)
diff --git a/arch/riscv/lib/memcpy.S b/arch/riscv/lib/memcpy.S
index 51ab716253..00672c19ad 100644
--- a/arch/riscv/lib/memcpy.S
+++ b/arch/riscv/lib/memcpy.S
@@ -9,100 +9,151 @@
 /* void *memcpy(void *, const void *, size_t) */
 ENTRY(__memcpy)
 WEAK(memcpy)
-	move t6, a0  /* Preserve return value */
+	/* Save for return value */
+	mv	t6, a0
 
-	/* Defer to byte-oriented copy for small sizes */
-	sltiu a3, a2, 128
-	bnez a3, 4f
-	/* Use word-oriented copy only if low-order bits match */
-	andi a3, t6, SZREG-1
-	andi a4, a1, SZREG-1
-	bne a3, a4, 4f
+	/*
+	 * Register allocation for code below:
+	 * a0 - start of uncopied dst
+	 * a1 - start of uncopied src
+	 * t0 - end of uncopied dst
+	 */
+	add	t0, a0, a2
 
-	beqz a3, 2f  /* Skip if already aligned */
 	/*
-	 * Round to nearest double word-aligned address
-	 * greater than or equal to start address
+	 * Use bytewise copy if too small.
+	 *
+	 * This threshold must be at least 2*SZREG to ensure at least one
+	 * wordwise copy is performed. It is chosen to be 16 because it will
+	 * save at least 7 iterations of bytewise copy, which pays off the
+	 * fixed overhead.
 	 */
-	andi a3, a1, ~(SZREG-1)
-	addi a3, a3, SZREG
-	/* Handle initial misalignment */
-	sub a4, a3, a1
+	li	a3, 16
+	bltu	a2, a3, .Lbyte_copy_tail
+
+	/*
+	 * Bytewise copy first to align a0 to word boundary.
+	 */
+	addi	a2, a0, SZREG-1
+	andi	a2, a2, ~(SZREG-1)
+	beq	a0, a2, 2f
 1:
-	lb a5, 0(a1)
-	addi a1, a1, 1
-	sb a5, 0(t6)
-	addi t6, t6, 1
-	bltu a1, a3, 1b
-	sub a2, a2, a4  /* Update count */
+	lb	a5, 0(a1)
+	addi	a1, a1, 1
+	sb	a5, 0(a0)
+	addi	a0, a0, 1
+	bne	a0, a2, 1b
+2:
+
+	/*
+	 * Now a0 is word-aligned. If a1 is also word aligned, we could perform
+	 * aligned word-wise copy. Otherwise we need to perform misaligned
+	 * word-wise copy.
+	 */
+	andi	a3, a1, SZREG-1
+	bnez	a3, .Lmisaligned_word_copy
 
+	/* Unrolled wordwise copy */
+	addi	t0, t0, -(16*SZREG-1)
+	bgeu	a0, t0, 2f
+1:
+	REG_L	a2,        0(a1)
+	REG_L	a3,    SZREG(a1)
+	REG_L	a4,  2*SZREG(a1)
+	REG_L	a5,  3*SZREG(a1)
+	REG_L	a6,  4*SZREG(a1)
+	REG_L	a7,  5*SZREG(a1)
+	REG_L	t1,  6*SZREG(a1)
+	REG_L	t2,  7*SZREG(a1)
+	REG_L	t3,  8*SZREG(a1)
+	REG_L	t4,  9*SZREG(a1)
+	REG_L	t5, 10*SZREG(a1)
+	REG_S	a2,        0(a0)
+	REG_S	a3,    SZREG(a0)
+	REG_S	a4,  2*SZREG(a0)
+	REG_S	a5,  3*SZREG(a0)
+	REG_S	a6,  4*SZREG(a0)
+	REG_S	a7,  5*SZREG(a0)
+	REG_S	t1,  6*SZREG(a0)
+	REG_S	t2,  7*SZREG(a0)
+	REG_S	t3,  8*SZREG(a0)
+	REG_S	t4,  9*SZREG(a0)
+	REG_S	t5, 10*SZREG(a0)
+	REG_L	a2, 11*SZREG(a1)
+	REG_L	a3, 12*SZREG(a1)
+	REG_L	a4, 13*SZREG(a1)
+	REG_L	a5, 14*SZREG(a1)
+	REG_L	a6, 15*SZREG(a1)
+	addi	a1, a1, 16*SZREG
+	REG_S	a2, 11*SZREG(a0)
+	REG_S	a3, 12*SZREG(a0)
+	REG_S	a4, 13*SZREG(a0)
+	REG_S	a5, 14*SZREG(a0)
+	REG_S	a6, 15*SZREG(a0)
+	addi	a0, a0, 16*SZREG
+	bltu	a0, t0, 1b
 2:
-	andi a4, a2, ~((16*SZREG)-1)
-	beqz a4, 4f
-	add a3, a1, a4
-3:
-	REG_L a4,       0(a1)
-	REG_L a5,   SZREG(a1)
-	REG_L a6, 2*SZREG(a1)
-	REG_L a7, 3*SZREG(a1)
-	REG_L t0, 4*SZREG(a1)
-	REG_L t1, 5*SZREG(a1)
-	REG_L t2, 6*SZREG(a1)
-	REG_L t3, 7*SZREG(a1)
-	REG_L t4, 8*SZREG(a1)
-	REG_L t5, 9*SZREG(a1)
-	REG_S a4,       0(t6)
-	REG_S a5,   SZREG(t6)
-	REG_S a6, 2*SZREG(t6)
-	REG_S a7, 3*SZREG(t6)
-	REG_S t0, 4*SZREG(t6)
-	REG_S t1, 5*SZREG(t6)
-	REG_S t2, 6*SZREG(t6)
-	REG_S t3, 7*SZREG(t6)
-	REG_S t4, 8*SZREG(t6)
-	REG_S t5, 9*SZREG(t6)
-	REG_L a4, 10*SZREG(a1)
-	REG_L a5, 11*SZREG(a1)
-	REG_L a6, 12*SZREG(a1)
-	REG_L a7, 13*SZREG(a1)
-	REG_L t0, 14*SZREG(a1)
-	REG_L t1, 15*SZREG(a1)
-	addi a1, a1, 16*SZREG
-	REG_S a4, 10*SZREG(t6)
-	REG_S a5, 11*SZREG(t6)
-	REG_S a6, 12*SZREG(t6)
-	REG_S a7, 13*SZREG(t6)
-	REG_S t0, 14*SZREG(t6)
-	REG_S t1, 15*SZREG(t6)
-	addi t6, t6, 16*SZREG
-	bltu a1, a3, 3b
-	andi a2, a2, (16*SZREG)-1  /* Update count */
-
-4:
-	/* Handle trailing misalignment */
-	beqz a2, 6f
-	add a3, a1, a2
-
-	/* Use word-oriented copy if co-aligned to word boundary */
-	or a5, a1, t6
-	or a5, a5, a3
-	andi a5, a5, 3
-	bnez a5, 5f
-7:
-	lw a4, 0(a1)
-	addi a1, a1, 4
-	sw a4, 0(t6)
-	addi t6, t6, 4
-	bltu a1, a3, 7b
+	/* Post-loop increment by 16*SZREG-1 and pre-loop decrement by SZREG-1 */
+	addi	t0, t0, 15*SZREG
 
-	ret
+	/* Wordwise copy */
+	bgeu	a0, t0, 2f
+1:
+	REG_L	a5, 0(a1)
+	addi	a1, a1, SZREG
+	REG_S	a5, 0(a0)
+	addi	a0, a0, SZREG
+	bltu	a0, t0, 1b
+2:
+	addi	t0, t0, SZREG-1
 
-5:
-	lb a4, 0(a1)
-	addi a1, a1, 1
-	sb a4, 0(t6)
-	addi t6, t6, 1
-	bltu a1, a3, 5b
-6:
+.Lbyte_copy_tail:
+	/*
+	 * Bytewise copy anything left.
+	 */
+	beq	a0, t0, 2f
+1:
+	lb	a5, 0(a1)
+	addi	a1, a1, 1
+	sb	a5, 0(a0)
+	addi	a0, a0, 1
+	bne	a0, t0, 1b
+2:
+
+	mv	a0, t6
 	ret
+
+.Lmisaligned_word_copy:
+	/*
+	 * Misaligned word-wise copy.
+	 * For misaligned copy we still perform word-wise copy, but we need to
+	 * use the value fetched from the previous iteration and do some shifts.
+	 * This is safe because we wouldn't access more words than necessary.
+	 */
+
+	/* Calculate shifts */
+	slli	t3, a3, 3
+	sub	t4, x0, t3 /* negate is okay as shift will only look at LSBs */
+
+	/* Load the initial value and align a1 */
+	andi	a1, a1, ~(SZREG-1)
+	REG_L	a5, 0(a1)
+
+	addi	t0, t0, -(SZREG-1)
+	/* At least one iteration will be executed here, no check */
+1:
+	srl	a4, a5, t3
+	REG_L	a5, SZREG(a1)
+	addi	a1, a1, SZREG
+	sll	a2, a5, t4
+	or	a2, a2, a4
+	REG_S	a2, 0(a0)
+	addi	a0, a0, SZREG
+	bltu	a0, t0, 1b
+
+	/* Update pointers to correct value */
+	addi	t0, t0, SZREG-1
+	add	a1, a1, a3
+
+	j	.Lbyte_copy_tail
 END(__memcpy)
diff --git a/arch/riscv/lib/memmove.S b/arch/riscv/lib/memmove.S
index 07d1d2152b..fbe6701dbe 100644
--- a/arch/riscv/lib/memmove.S
+++ b/arch/riscv/lib/memmove.S
@@ -5,60 +5,124 @@
 
 ENTRY(__memmove)
 WEAK(memmove)
-        move    t0, a0
-        move    t1, a1
-
-        beq     a0, a1, exit_memcpy
-        beqz    a2, exit_memcpy
-        srli    t2, a2, 0x2
-
-        slt     t3, a0, a1
-        beqz    t3, do_reverse
-
-        andi    a2, a2, 0x3
-        li      t4, 1
-        beqz    t2, byte_copy
-
-word_copy:
-        lw      t3, 0(a1)
-        addi    t2, t2, -1
-        addi    a1, a1, 4
-        sw      t3, 0(a0)
-        addi    a0, a0, 4
-        bnez    t2, word_copy
-        beqz    a2, exit_memcpy
-        j       byte_copy
-
-do_reverse:
-        add     a0, a0, a2
-        add     a1, a1, a2
-        andi    a2, a2, 0x3
-        li      t4, -1
-        beqz    t2, reverse_byte_copy
-
-reverse_word_copy:
-        addi    a1, a1, -4
-        addi    t2, t2, -1
-        lw      t3, 0(a1)
-        addi    a0, a0, -4
-        sw      t3, 0(a0)
-        bnez    t2, reverse_word_copy
-        beqz    a2, exit_memcpy
-
-reverse_byte_copy:
-        addi    a0, a0, -1
-        addi    a1, a1, -1
-
-byte_copy:
-        lb      t3, 0(a1)
-        addi    a2, a2, -1
-        sb      t3, 0(a0)
-        add     a1, a1, t4
-        add     a0, a0, t4
-        bnez    a2, byte_copy
-
-exit_memcpy:
-        move a0, t0
-        move a1, t1
-        ret
+	/*
+	 * Here we determine if forward copy is possible. Forward copy is
+	 * preferred to backward copy as it is more cache friendly.
+	 *
+	 * If a0 >= a1, t0 gives their distance, if t0 >= a2 then we can
+	 *   copy forward.
+	 * If a0 < a1, we can always copy forward. This will make t0 negative,
+	 *   so a *unsigned* comparison will always have t0 >= a2.
+	 *
+	 * For forward copy we just delegate the task to memcpy.
+	 */
+	sub	t0, a0, a1
+	bltu	t0, a2, 1f
+	tail	__memcpy
+1:
+
+	/*
+	 * Register allocation for code below:
+	 * a0 - end of uncopied dst
+	 * a1 - end of uncopied src
+	 * t0 - start of uncopied dst
+	 */
+	mv	t0, a0
+	add	a0, a0, a2
+	add	a1, a1, a2
+
+	/*
+	 * Use bytewise copy if too small.
+	 *
+	 * This threshold must be at least 2*SZREG to ensure at least one
+	 * wordwise copy is performed. It is chosen to be 16 because it will
+	 * save at least 7 iterations of bytewise copy, which pays off the
+	 * fixed overhead.
+	 */
+	li	a3, 16
+	bltu	a2, a3, .Lbyte_copy_tail
+
+	/*
+	 * Bytewise copy first to align t0 to word boundary.
+	 */
+	andi	a2, a0, ~(SZREG-1)
+	beq	a0, a2, 2f
+1:
+	addi	a1, a1, -1
+	lb	a5, 0(a1)
+	addi	a0, a0, -1
+	sb	a5, 0(a0)
+	bne	a0, a2, 1b
+2:
+
+	/*
+	 * Now a0 is word-aligned. If a1 is also word aligned, we could perform
+	 * aligned word-wise copy. Otherwise we need to perform misaligned
+	 * word-wise copy.
+	 */
+	andi	a3, a1, SZREG-1
+	bnez	a3, .Lmisaligned_word_copy
+
+	/* Wordwise copy */
+	addi	t0, t0, SZREG-1
+	bleu	a0, t0, 2f
+1:
+	addi	a1, a1, -SZREG
+	REG_L	a5, 0(a1)
+	addi	a0, a0, -SZREG
+	REG_S	a5, 0(a0)
+	bgtu	a0, t0, 1b
+2:
+	addi	t0, t0, -(SZREG-1)
+
+.Lbyte_copy_tail:
+	/*
+	 * Bytewise copy anything left.
+	 */
+	beq	a0, t0, 2f
+1:
+	addi	a1, a1, -1
+	lb	a5, 0(a1)
+	addi	a0, a0, -1
+	sb	a5, 0(a0)
+	bne	a0, t0, 1b
+2:
+
+	mv	a0, t0
+	ret
+
+.Lmisaligned_word_copy:
+	/*
+	 * Misaligned word-wise copy.
+	 * For misaligned copy we still perform word-wise copy, but we need to
+	 * use the value fetched from the previous iteration and do some shifts.
+	 * This is safe because we wouldn't access more words than necessary.
+	 */
+
+	/* Calculate shifts */
+	slli	t3, a3, 3
+	sub	t4, x0, t3 /* negate is okay as shift will only look at LSBs */
+
+	/* Load the initial value and align a1 */
+	andi	a1, a1, ~(SZREG-1)
+	REG_L	a5, 0(a1)
+
+	addi	t0, t0, SZREG-1
+	/* At least one iteration will be executed here, no check */
+1:
+	sll	a4, a5, t4
+	addi	a1, a1, -SZREG
+	REG_L	a5, 0(a1)
+	srl	a2, a5, t3
+	or	a2, a2, a4
+	addi	a0, a0, -SZREG
+	REG_S	a2, 0(a0)
+	bgtu	a0, t0, 1b
+
+	/* Update pointers to correct value */
+	addi	t0, t0, -(SZREG-1)
+	add	a1, a1, a3
+
+	j	.Lbyte_copy_tail
+
 END(__memmove)

From patchwork Thu May 13 08:46:18 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bin Meng <bmeng.cn@gmail.com>
X-Patchwork-Id: 1477962
X-Patchwork-Delegate: uboot@andestech.com
Return-Path: <u-boot-bounces@lists.denx.de>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=lists.denx.de
 (client-ip=85.214.62.61; helo=phobos.denx.de;
 envelope-from=u-boot-bounces@lists.denx.de; receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dkim=pass (2048-bit key;
 unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256
 header.s=20161025 header.b=Bk4+qKJg;
	dkim-atps=neutral
Received: from phobos.denx.de (phobos.denx.de [85.214.62.61])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits))
	(No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 4FgldB5tfbz9sWX
	for <incoming@patchwork.ozlabs.org>; Thu, 13 May 2021 18:46:38 +1000 (AEST)
Received: from h2850616.stratoserver.net (localhost [IPv6:::1])
	by phobos.denx.de (Postfix) with ESMTP id ED02B81D48;
	Thu, 13 May 2021 10:46:28 +0200 (CEST)
Authentication-Results: phobos.denx.de;
 dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: phobos.denx.de;
 spf=pass smtp.mailfrom=u-boot-bounces@lists.denx.de
Authentication-Results: phobos.denx.de;
	dkim=pass (2048-bit key;
 unprotected) header.d=gmail.com header.i=@gmail.com header.b="Bk4+qKJg";
	dkim-atps=neutral
Received: by phobos.denx.de (Postfix, from userid 109)
 id 0AD0F81D1B; Thu, 13 May 2021 10:46:23 +0200 (CEST)
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on phobos.denx.de
X-Spam-Level: 
X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,
 DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,SPF_HELO_NONE autolearn=ham
 autolearn_force=no version=3.4.2
Received: from mail-ej1-x635.google.com (mail-ej1-x635.google.com
 [IPv6:2a00:1450:4864:20::635])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits))
 (No client certificate requested)
 by phobos.denx.de (Postfix) with ESMTPS id 6854081CDE
 for <u-boot@lists.denx.de>; Thu, 13 May 2021 10:46:20 +0200 (CEST)
Authentication-Results: phobos.denx.de;
 dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: phobos.denx.de;
 spf=pass smtp.mailfrom=bmeng.cn@gmail.com
Received: by mail-ej1-x635.google.com with SMTP id l1so2823859ejb.6
 for <u-boot@lists.denx.de>; Thu, 13 May 2021 01:46:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=NKuBgTHKxqNm9VS74707zoFRPzNJe8IcP8wZt24zjeQ=;
 b=Bk4+qKJgeYPzqkpQkK3uuYutvMbsJ1j960ZZHIM3rrXDit+ESetIC1ww181KZxW5v2
 nWEbV/X7qISXm/xOzzzq7DPsfBmuSm/8Ott6dmcobZ0NwRgskxOm1WdFpjdgUoQTfE7z
 rgBPSM8WRN3OBLPUy/vMz7lXe7s7KFh2nPM/308Va/v26YMAppqpPirQ406+52xRAas4
 K0PeKwXP+C4/cf0qA1m5eJCjw25i5t+bzuNy9S3icdT5SDBcoNW4eLUx33mX47DaiMSz
 ckeDsAOEDD2HHMO9JUupv6PssBR3mg/0/E0LlxNBnY27I/vlTUMy2uV2aMSUKOEPwTdq
 UFmQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=NKuBgTHKxqNm9VS74707zoFRPzNJe8IcP8wZt24zjeQ=;
 b=DcbWjZn1bXzrpJTs1yFyi7+kPoCRzzXwmmIbW1R0a2QspuyZAsr/3L7s3RED4GKryI
 hvm/mIjJZWKuum8Iy36KEn8f8XgDzu08/XIuMIs8C51TVUlcQ0qVS8KJuYFIFyl+Hxp4
 +L5fedFG8WKRBKtCMj5MNwXqHopmlBH0z4iq8BD1C/vTfHU6M3IJbm+lDBMKpAMZ7JvM
 B5G2qw7gejRQDR7jL/eZ/zmuoD+XDYTq/o3KSS+c++q7Zp+Wiqbs9xYYjnvMm22uSHFF
 v9J7NBsEWynI18CloJZ8ceMGoczu3L89rmAIiDvqjVaUxmi+RiRK3Lf9GDN4WS1BlbV9
 5jrQ==
X-Gm-Message-State: AOAM530GMktOnRiaCsY25CrMd9wtBuAVu3rZ86e6sQSSsA0aOG5OmOIl
 URrZ1pH45CH+27zuVwb6+gc=
X-Google-Smtp-Source: 
 ABdhPJz2xEt+4JL2/4yo046gxmKhaf/yqNdoQvvWy7SGWk6qmf2RIkWnZTFTCne/IbBQwZm9PeVbDw==
X-Received: by 2002:a17:906:ae8f:: with SMTP id
 md15mr42823741ejb.244.1620895580112;
 Thu, 13 May 2021 01:46:20 -0700 (PDT)
Received: from pek-vx-bsp2.wrs.com
 (ec2-44-242-66-180.us-west-2.compute.amazonaws.com. [44.242.66.180])
 by smtp.gmail.com with ESMTPSA id p14sm1895141eds.28.2021.05.13.01.46.17
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Thu, 13 May 2021 01:46:19 -0700 (PDT)
From: Bin Meng <bmeng.cn@gmail.com>
To: Rick Chen <rick@andestech.com>,
	u-boot@lists.denx.de
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Subject: [PATCH 2/2] riscv: Group assembly optimized implementation of memory
 routines into a submenu
Date: Thu, 13 May 2021 16:46:18 +0800
Message-Id: <20210513084618.2161331-2-bmeng.cn@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210513084618.2161331-1-bmeng.cn@gmail.com>
References: <20210513084618.2161331-1-bmeng.cn@gmail.com>
MIME-Version: 1.0
X-BeenThere: u-boot@lists.denx.de
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: U-Boot discussion <u-boot.lists.denx.de>
List-Unsubscribe: <https://lists.denx.de/options/u-boot>,
 <mailto:u-boot-request@lists.denx.de?subject=unsubscribe>
List-Archive: <https://lists.denx.de/pipermail/u-boot/>
List-Post: <mailto:u-boot@lists.denx.de>
List-Help: <mailto:u-boot-request@lists.denx.de?subject=help>
List-Subscribe: <https://lists.denx.de/listinfo/u-boot>,
 <mailto:u-boot-request@lists.denx.de?subject=subscribe>
Errors-To: u-boot-bounces@lists.denx.de
Sender: "U-Boot" <u-boot-bounces@lists.denx.de>
X-Virus-Scanned: clamav-milter 0.102.4 at phobos.denx.de
X-Virus-Status: Clean

Currently all assembly optimized implementation of memory routines
show up at the top level of the RISC-V architecture Kconfig menu.
Let's group them together into a submenu.

Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>
---

 arch/riscv/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 82e10da11e..63665d210c 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -278,6 +278,8 @@ config STACK_SIZE_SHIFT
 config OF_BOARD_FIXUP
 	default y if OF_SEPARATE && RISCV_SMODE
 
+menu "Use assembly optimized implementation of memory routines"
+
 config USE_ARCH_MEMCPY
 	bool "Use an assembly optimized implementation of memcpy"
 	default y
@@ -357,3 +359,5 @@ config TPL_USE_ARCH_MEMSET
 	  but may increase the binary size.
 
 endmenu
+
+endmenu