Patchwork [U-Boot,v3,1/2] Optimized nand_read_buf for kirkwood

Submitter: Phil Sutter
Date: June 26, 2013, 6:25 p.m.
Message ID: <1372271126-2642-2-git-send-email-phil.sutter@viprinet.com>
Permalink: /patch/254834/
State: Changes Requested
Delegated to: Scott Wood

Comments

Phil Sutter - June 26, 2013, 6:25 p.m.
The basic idea is taken from the Linux kernel, but the inner loop is optimized further.

First read single bytes until the buffer is 8-byte aligned, then use
ldrd/strd to read and store 8 bytes at a time, and finally read the
remaining bytes one at a time.
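
The three phases can be modelled on a host as a minimal sketch. It assumes the NAND data register behaves like a FIFO, so every access returns the next bytes of the page. `struct fake_nand`, `read8()`, `read64()` and `model_read_buf()` are hypothetical stand-ins for `chip->IO_ADDR_R`, `readb()`, the `ldrd` load and the patched `kw_nand_read_buf()`; they are not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the NAND data register: each access
 * returns the next bytes of the page being read. */
struct fake_nand {
	const uint8_t *data;		/* bytes the "chip" returns, in order */
	size_t pos;			/* bytes consumed so far */
	unsigned int wide_reads;	/* number of 64-bit accesses */
};

/* Stand-in for readb(chip->IO_ADDR_R). */
static uint8_t read8(struct fake_nand *nf)
{
	return nf->data[nf->pos++];
}

/* Stand-in for one ldrd from chip->IO_ADDR_R: 8 bytes in one access. */
static uint64_t read64(struct fake_nand *nf)
{
	uint64_t v;

	memcpy(&v, nf->data + nf->pos, sizeof(v));
	nf->pos += sizeof(v);
	nf->wide_reads++;
	return v;
}

static void model_read_buf(struct fake_nand *nf, uint8_t *buf, int len)
{
	uint64_t v;

	/* Phase 1: single bytes until buf is 8-byte aligned. */
	while (len && ((uintptr_t)buf & 7)) {
		*buf++ = read8(nf);
		len--;
	}

	/* Phase 2: 8 bytes per round (ldrd/strd in the patch). */
	while (len >= 8) {
		v = read64(nf);
		memcpy(buf, &v, sizeof(v));	/* aligned 64-bit store */
		buf += 8;
		len -= 8;
	}

	/* Phase 3: the remaining 0..7 bytes. */
	while (len--)
		*buf++ = read8(nf);
}
```

With an 8-byte-aligned destination, a read of `len` bytes issues exactly `len / 8` wide accesses; a misaligned destination first spends up to 7 byte reads on alignment.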

Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'.
Without this patch, a raw NAND read of 132 MB took 49 s (~2.69 MB/s).
With this patch applied, the same read took 27 s (~4.89 MB/s), an
increase in read throughput of roughly 80%.
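
The figures above can be checked with a short calculation. `throughput()` and `speedup_percent()` are illustrative helpers, and the speed-up is taken as the ratio of the two run times minus one.

```c
/* Throughput in MB/s for a transfer of mb megabytes taking secs seconds. */
static double throughput(double mb, double secs)
{
	return mb / secs;
}

/* Relative speed-up in percent: (old_time / new_time - 1) * 100. */
static double speedup_percent(double old_secs, double new_secs)
{
	return (old_secs / new_secs - 1.0) * 100.0;
}
```

`throughput(132, 49)` is about 2.69, `throughput(132, 27)` about 4.89, and `speedup_percent(49, 27)` about 81.5, consistent with the roughly 80% quoted.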

Signed-off-by: Nico Erfurth <ne@erfurth.eu>
Tested-by: Phil Sutter <phil.sutter@viprinet.com>
Cc: Prafulla Wadaskar <prafulla@marvell.com>
---
 drivers/mtd/nand/kirkwood_nand.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
Albert ARIBAUD - June 27, 2013, 10:02 a.m.
Hi Phil,

On Wed, 26 Jun 2013 20:25:25 +0200, Phil Sutter
<phil.sutter@viprinet.com> wrote:

> The basic idea is taken from the linux-kernel, but further optimized.
> 
> First align the buffer to 8 bytes, then use ldrd/strd to read and store
> in 8 byte quantities, then do the final bytes.
> 
> Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'.
> Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this
> patch in place, reading the same amount of data was done in 27s
> (~4.89MB/s). So read performance is increased by ~80%!
> 
> Signed-off-by: Nico Erfurth <ne@erfurth.eu>
> Tested-by: Phil Sutter <phil.sutter@viprinet.com>
> Cc: Prafulla Wadaskar <prafulla@marvell.com>
> ---

Patch history missing: a v3 patch should list what changed since the
earlier versions below the '---' line.

>  drivers/mtd/nand/kirkwood_nand.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/drivers/mtd/nand/kirkwood_nand.c b/drivers/mtd/nand/kirkwood_nand.c
> index 0a99a10..85ea5d2 100644
> --- a/drivers/mtd/nand/kirkwood_nand.c
> +++ b/drivers/mtd/nand/kirkwood_nand.c
> @@ -38,6 +38,37 @@ struct kwnandf_registers {
>  static struct kwnandf_registers *nf_reg =
>  	(struct kwnandf_registers *)KW_NANDF_BASE;
>  
> +
> +/*
> + * The basic idea is stolen from the linux kernel, but the inner loop is
> + * optimized a bit more.
> + */
> +static void kw_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
> +{
> +	struct nand_chip *chip = mtd->priv;
> +
> +	while (len && (unsigned long)buf & 7) {
> +		*buf++ = readb(chip->IO_ADDR_R);
> +		len--;
> +	}
> +
> +	/* This loop reads and writes 64bit per round. */
> +	asm volatile (
> +		"1:\n"
> +		"  subs   %0, #8\n"
> +		"  ldrpld r2, [%2]\n"
> +		"  strpld r2, [%1], #8\n"
> +		"  bhi    1b\n"
> +		"  addne  %0, #8\n"
> +		: "+&r" (len), "+&r" (buf)
> +		: "r" (chip->IO_ADDR_R)
> +		: "r2", "r3", "memory", "cc"
> +	);

Are assembler instructions *really* required? IOW, can you not get
enough performance simply with a cleverly written C loop?

> +	while (len--)
> +		*buf++ = readb(chip->IO_ADDR_R);
> +}
> +
>  /*
>   * hardware specific access to control-lines/bits
>   */
> @@ -80,6 +111,7 @@ int board_nand_init(struct nand_chip *nand)
>  	nand->ecc.mode = NAND_ECC_SOFT;
>  #endif
>  	nand->cmd_ctrl = kw_nand_hwcontrol;
> +	nand->read_buf = kw_nand_read_buf;
>  	nand->chip_delay = 40;
>  	nand->select_chip = kw_nand_select_chip;
>  	return 0;


Amicalement,
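
As one possible answer to the question about a C loop, here is a sketch of the same three-phase read in plain C, with a 64-bit volatile load in place of the inline assembly. `c_read_buf()` is a hypothetical name, `io` stands for `chip->IO_ADDR_R`, and on the target the byte accesses would normally go through `readb()`. Whether GCC turns the 64-bit volatile load into a single `ldrd` depends on the compiler version and `-march` flags, so the disassembly and the read benchmark would need to be rechecked on Kirkwood hardware.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical plain-C body for kw_nand_read_buf(); io stands for
 * chip->IO_ADDR_R and is assumed to accept 64-bit loads. */
static void c_read_buf(const volatile void *io, uint8_t *buf, int len)
{
	const volatile uint8_t *io8 = io;
	const volatile uint64_t *io64 = io;

	/* Byte reads until buf is 8-byte aligned. */
	while (len && ((uintptr_t)buf & 7)) {
		*buf++ = *io8;
		len--;
	}

	/* One 64-bit load and one aligned 64-bit store per round. */
	for (; len >= 8; len -= 8, buf += 8) {
		uint64_t v = *io64;

		memcpy(buf, &v, sizeof(v));	/* avoids strict-aliasing issues */
	}

	/* Remaining 0..7 bytes. */
	while (len--)
		*buf++ = *io8;
}
```

Using `memcpy()` for the store keeps the code free of type-punning through `uint64_t *`; compilers normally emit a plain store for an aligned 8-byte copy.
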
Scott Wood - Aug. 19, 2013, 11:29 p.m.
On Wed, Jun 26, 2013 at 08:25:25PM +0200, Phil Sutter wrote:
> The basic idea is taken from the linux-kernel, but further optimized.
> 
> First align the buffer to 8 bytes, then use ldrd/strd to read and store
> in 8 byte quantities, then do the final bytes.
> 
> Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'.
> Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this
> patch in place, reading the same amount of data was done in 27s
> (~4.89MB/s). So read performance is increased by ~80%!
> 
> Signed-off-by: Nico Erfurth <ne@erfurth.eu>
> Tested-by: Phil Sutter <phil.sutter@viprinet.com>
> Cc: Prafulla Wadaskar <prafulla@marvell.com>

Missing your signoff, and if Nico was the main author then there should
be a From: line indicating that.

-Scott

Patch

diff --git a/drivers/mtd/nand/kirkwood_nand.c b/drivers/mtd/nand/kirkwood_nand.c
index 0a99a10..85ea5d2 100644
--- a/drivers/mtd/nand/kirkwood_nand.c
+++ b/drivers/mtd/nand/kirkwood_nand.c
@@ -38,6 +38,37 @@  struct kwnandf_registers {
 static struct kwnandf_registers *nf_reg =
 	(struct kwnandf_registers *)KW_NANDF_BASE;
 
+
+/*
+ * The basic idea is stolen from the linux kernel, but the inner loop is
+ * optimized a bit more.
+ */
+static void kw_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
+{
+	struct nand_chip *chip = mtd->priv;
+
+	while (len && (unsigned long)buf & 7) {
+		*buf++ = readb(chip->IO_ADDR_R);
+		len--;
+	}
+
+	/* This loop reads and writes 64bit per round. */
+	asm volatile (
+		"1:\n"
+		"  subs   %0, #8\n"
+		"  ldrpld r2, [%2]\n"
+		"  strpld r2, [%1], #8\n"
+		"  bhi    1b\n"
+		"  addne  %0, #8\n"
+		: "+&r" (len), "+&r" (buf)
+		: "r" (chip->IO_ADDR_R)
+		: "r2", "r3", "memory", "cc"
+	);
+
+	while (len--)
+		*buf++ = readb(chip->IO_ADDR_R);
+}
+
 /*
  * hardware specific access to control-lines/bits
  */
@@ -80,6 +111,7 @@  int board_nand_init(struct nand_chip *nand)
 	nand->ecc.mode = NAND_ECC_SOFT;
 #endif
 	nand->cmd_ctrl = kw_nand_hwcontrol;
+	nand->read_buf = kw_nand_read_buf;
 	nand->chip_delay = 40;
 	nand->select_chip = kw_nand_select_chip;
 	return 0;
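
The inline assembly packs the length bookkeeping into condition flags: `subs` sets N, Z and C; the PL-conditional `ldrd`/`strd` only execute while at least 8 bytes remained; `bhi` (C set and Z clear) repeats while more than 8 bytes remained; and `addne` restores the last subtraction unless the count reached exactly zero. The sketch below, with the hypothetical name `model_asm_loop()`, mirrors only that bookkeeping in C and shows that the loop performs `len / 8` 64-bit transfers and leaves `len % 8` bytes for the trailing `readb()` loop.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical C model of the flag logic in the inline-asm loop:
 *   1: subs   len, #8        ; len -= 8, set N, Z, C
 *      ldrpld r2, [io]       ; only if N clear (PL)
 *      strpld r2, [buf], #8  ; only if N clear (PL)
 *      bhi    1b             ; loop while C set and Z clear (HI)
 *      addne  len, #8        ; undo the last subs unless len is 0
 * Returns the length left for the trailing readb() loop and stores the
 * number of 8-byte ldrd/strd pairs in *transfers.
 */
static int32_t model_asm_loop(int32_t len, int *transfers)
{
	int32_t before;
	bool n, z, c;

	*transfers = 0;
	do {
		before = len;
		len = before - 8;			/* subs */
		n = len < 0;
		z = len == 0;
		c = (uint32_t)before >= 8u;		/* ARM: C set means no borrow */
		if (!n)
			(*transfers)++;			/* ldrpld + strpld */
	} while (c && !z);				/* bhi */

	if (!z)
		len += 8;				/* addne */
	return len;
}
```

For example, `len = 13` gives one transfer and leaves 5 bytes, while `len = 16` gives two transfers and leaves 0, so `addne` is skipped.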