[2/2] mtd: orion-nand: fix build error with ARMv4

Message ID 20140509184505.GA30330@arch.cereza
State Deferred

Commit Message

Ezequiel Garcia May 9, 2014, 6:45 p.m. UTC
On 08 May 04:56 PM, Arnd Bergmann wrote:
> orion_nand_read_buf uses an inline assembly with the "ldrd"
> instruction, which is only available from ARMv5 upwards. This
> used to be fine, since all users have an ARMv5 or ARMv7 CPU,
> but now we can also build a multiplatform kernel with ARMv4
> support enabled in addition to the "kirkwood" (mvebu) platform.
> 
> This provides an alternative to call the readsl() function that
> is supposed to have the same effect and is also optimized for
> performance.
> 
> This patch is untested, and it would be worthwhile to check
> if there is any performance impact, especially in case the readsl
> version is actually faster.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Brian Norris <computersforpeace@gmail.com>
> Cc: Jingoo Han <jg1.han@samsung.com>
> Cc: linux-mtd@lists.infradead.org
> ---
>  drivers/mtd/nand/orion_nand.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/mtd/nand/orion_nand.c b/drivers/mtd/nand/orion_nand.c
> index dd7fe81..c7b5e8a 100644
> --- a/drivers/mtd/nand/orion_nand.c
> +++ b/drivers/mtd/nand/orion_nand.c
> @@ -56,6 +56,7 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
>  		*buf++ = readb(io_base);
>  		len--;
>  	}
> +#if __LINUX_ARM_ARCH__ >= 5
>  	buf64 = (uint64_t *)buf;
>  	while (i < len/8) {
>  		/*
> @@ -68,6 +69,10 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
>  		asm volatile ("ldrd\t%0, [%1]" : "=&r" (x) : "r" (io_base));
>  		buf64[i++] = x;
>  	}
> +#else
> +	readsl(io_base, buf, len/8);

I gave this a try in order to answer Arnd's performance question. First of all,
the patch seems wrong. I guess it's because readsl reads 4-byte pieces instead of
8-byte ones.
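
To illustrate the unit mismatch: readsl() takes its count in 32-bit words, so a count of len/8 moves only half of the requested bytes. The sketch below is a user-space model, not driver code; `fake_fifo`, `fake_readsl` and `bytes_transferred` are hypothetical names that only mimic the readsl() contract of reading `count` words from one fixed address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the NAND data register: each read pops from a FIFO. */
struct fake_fifo {
	const uint8_t *data;
	size_t pos;
};

/* Mimics the readsl() contract: store `count` 32-bit words read from one address. */
static void fake_readsl(struct fake_fifo *f, void *buf, size_t count)
{
	uint8_t *out = buf;

	for (size_t i = 0; i < count * 4; i++)
		out[i] = f->data[f->pos++];
}

/* Bytes actually moved when fake_readsl() is handed len/divisor as its count. */
static size_t bytes_transferred(size_t len, size_t divisor)
{
	static const uint8_t src[64];
	uint8_t dst[64];
	struct fake_fifo f = { src, 0 };

	assert(divisor != 0 && (len / divisor) * 4 <= sizeof(src));
	fake_readsl(&f, dst, len / divisor);
	return f.pos;
}
```

In this model, a 64-byte request with a divisor of 8 transfers 32 bytes, while a divisor of 4 transfers all 64.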

This patch below is tested (but not completely, see below) and works:


However, all the reads are nicely aligned (in both the buffer and the
length), which means the only read actually performed is the readsl() one.

In other words, the patch is still half-untested. Therefore, and given
this is meant only to coerce a build, maybe we'd rather just loop over
readb and stay on the safe side?

And now, answering Arnd's question:

# Using ldrd
# time nanddump /dev/mtd5 -f /dev/null -q
real	0m 5.90s
user	0m 0.22s
sys	0m 5.67s

# Using readsl
# time nanddump /dev/mtd5 -f /dev/null -q
real	0m 6.39s
user	0m 0.17s
sys	0m 6.20s

So I'd say, let's stick to the ldrd magic.
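
For scale, the gap between the two runs can be expressed as a relative slowdown. The helper below is a hypothetical illustration that uses only the timings quoted above; it is not part of the patch.

```c
#include <assert.h>

/* Relative slowdown of `slow` versus `fast`, as a percentage of `fast`. */
static double slowdown_percent(double fast, double slow)
{
	assert(fast > 0.0);
	return (slow - fast) / fast * 100.0;
}
```

Fed the figures above, this gives roughly 8% for the real time (5.90s vs 6.39s) and roughly 9% for the sys time (5.67s vs 6.20s).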

Comments

Geert Uytterhoeven May 9, 2014, 7:29 p.m. UTC | #1
On Fri, May 9, 2014 at 8:45 PM, Ezequiel Garcia
<ezequiel.garcia@free-electrons.com> wrote:
> --- a/drivers/mtd/nand/orion_nand.c
> +++ b/drivers/mtd/nand/orion_nand.c
> @@ -52,6 +52,7 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
>         uint64_t *buf64;
>         int i = 0;
>
> +#if __LINUX_ARM_ARCH__ >= 5
>         while (len && (unsigned long)buf & 7) {
>                 *buf++ = readb(io_base);
>                 len--;
> @@ -69,6 +70,14 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
>                 buf64[i++] = x;
>         }
>         i *= 8;
> +#else
> +       while (len && (unsigned long)buf & 3) {
> +               *buf++ = readb(io_base);
> +               len--;
> +       }
> +       readsl(io_base, buf, len/4);
> +       i = (len / 4 * 4) * 4;

Why multiply by 4 twice? "i" is supposed to be the number of bytes read,
right?

BTW, Arnd's version should just need s/8/4/g to make it work.

> +#endif
>         while (i < len)
>                 buf[i++] = readb(io_base);
>  }
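
The byte count questioned above can be checked with a small user-space model. This is a sketch under stated assumptions, not the driver: `fake_fifo`, `fake_readb`, `fake_readsl` and `read_buf_words` are hypothetical stand-ins for the I/O accessors and for the #else branch. It sets `i` to the number of bytes the word transfer stored, i.e. len / 4 * 4. With the posted expression and len = 11, `i` would become 32, the trailing loop would never run, and 3 bytes would be left unread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the NAND data register: each read pops from a FIFO. */
struct fake_fifo {
	const uint8_t *data;
	size_t pos;
};

static uint8_t fake_readb(struct fake_fifo *f)
{
	return f->data[f->pos++];
}

/* Mimics the readsl() contract: `count` is a number of 32-bit words. */
static void fake_readsl(struct fake_fifo *f, void *buf, size_t count)
{
	uint8_t *out = buf;

	for (size_t i = 0; i < count * 4; i++)
		out[i] = fake_readb(f);
}

/* Model of the #else branch; returns the total bytes consumed from the FIFO. */
static size_t read_buf_words(struct fake_fifo *f, uint8_t *buf, int len)
{
	int i;

	assert(len >= 0);

	/* Head: single bytes until buf is 4-byte aligned. */
	while (len && ((uintptr_t)buf & 3)) {
		*buf++ = fake_readb(f);
		len--;
	}

	/* Bulk: whole 32-bit words. */
	fake_readsl(f, buf, (size_t)len / 4);
	i = len / 4 * 4;	/* bytes already stored by the word transfer */

	/* Tail: the remaining 0..3 bytes. */
	while (i < len)
		buf[i++] = fake_readb(f);

	return f->pos;
}
```

Reading 14 bytes into a buffer that starts one byte past a 4-byte boundary exercises all three phases: 3 head bytes, 2 words, 3 tail bytes.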

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Arnd Bergmann May 9, 2014, 8:12 p.m. UTC | #2
On Friday 09 May 2014 15:45:05 Ezequiel Garcia wrote:
> On 08 May 04:56 PM, Arnd Bergmann wrote:
> 
> I gave this a try in order to answer Arnd's performance question.

Thanks a lot for testing!

> First of all,
> the patch seems wrong. I guess it's because readsl reads 4-byte pieces instead of
> 8-byte ones.

Oops. I guess I was thinking of a 64-bit system and didn't even notice
the difference between 4 and 8 byte accesses here. I wonder where I have
my mind sometimes.

> In other words, the patch is still half-untested. Therefore, and given
> this is meant only to coerce a build, maybe we'd rather just loop over
> readb and stay on the safe side?

I guess that would be equal to calling memcpy_fromio().

> And now, answering Arnd's question:
> 
> # Using ldrd
> # time nanddump /dev/mtd5 -f /dev/null -q
> real	0m 5.90s
> user	0m 0.22s
> sys	0m 5.67s
> 
> # Using readsl
> # time nanddump /dev/mtd5 -f /dev/null -q
> real	0m 6.39s
> user	0m 0.17s
> sys	0m 6.20s
> 
> So I'd say, let's stick to the ldrd magic.

Ok, that is a noticeable difference. For scale, what is the size of that partition?
If this is something that actually affects people, it might be worth also trying
memcpy(), which should be better at saturating the bus, but might be wrong here
(if the alignment requirements on the external bus are stricter than what memcpy
does), or it might not make a difference at all if the code is already ideal.

	Arnd

Patch

diff --git a/drivers/mtd/nand/orion_nand.c b/drivers/mtd/nand/orion_nand.c
index dd7fe81..7a78cc5 100644
--- a/drivers/mtd/nand/orion_nand.c
+++ b/drivers/mtd/nand/orion_nand.c
@@ -52,6 +52,7 @@  static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
 	uint64_t *buf64;
 	int i = 0;
 
+#if __LINUX_ARM_ARCH__ >= 5
 	while (len && (unsigned long)buf & 7) {
 		*buf++ = readb(io_base);
 		len--;
@@ -69,6 +70,14 @@  static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
 		buf64[i++] = x;
 	}
 	i *= 8;
+#else
+	while (len && (unsigned long)buf & 3) {
+		*buf++ = readb(io_base);
+		len--;
+	}
+	readsl(io_base, buf, len/4);
+	i = (len / 4 * 4) * 4;
+#endif
 	while (i < len)
 		buf[i++] = readb(io_base);
 }