Patchwork mtd/nand: use string library

login
register
mail settings
Submitter Akinobu Mita
Date Jan. 27, 2012, 2:24 p.m.
Message ID <1327674295-3700-2-git-send-email-akinobu.mita@gmail.com>
Download mbox | patch
Permalink /patch/138231/
State New
Headers show

Comments

Akinobu Mita - Jan. 27, 2012, 2:24 p.m.
- Use memchr_inv to check if the data contains all 0xFF bytes.
  It is faster than looping for each byte.

- Use memcmp to compare memory areas

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jiandong Zheng <jdzheng@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
---
 drivers/mtd/nand/davinci_nand.c |    7 +++----
 drivers/mtd/nand/denali.c       |    7 +++----
 drivers/mtd/nand/diskonchip.c   |   18 +++---------------
 drivers/mtd/nand/nand_bbt.c     |   16 +++++-----------
 drivers/mtd/nand/nand_bcm_umi.c |   10 +++-------
 5 files changed, 17 insertions(+), 41 deletions(-)
Joe Perches - Jan. 27, 2012, 5:16 p.m.
On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote:
> - Use memchr_inv to check if the data contains all 0xFF bytes.
>   It is faster than looping for each byte.

Stupid question:

Are there any mtd devices modified that are slower
at 64 bit accesses than repeated 8 bit accesses?
Brian Norris - Jan. 27, 2012, 6:52 p.m.
On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote:
> On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote:
>> - Use memchr_inv to check if the data contains all 0xFF bytes.
>>   It is faster than looping for each byte.
>
> Stupid question:
>
> Are there any mtd devices modified that are slower
> at 64 bit accesses than repeated 8 bit accesses?

I believe this patch deals with kernel buffers, not any kind of direct
access to the MTD, so the question (which is not stupid IMO) should be
regarding CPU architectures. And my educated guess is that 64-bit
access should not be any slower. I do know that 8-bit access *is*
slower for some relevant architectures.

Brian
Akinobu Mita - Jan. 27, 2012, 11:52 p.m.
2012/1/28 Brian Norris <computersforpeace@gmail.com>:
> On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote:
>> On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote:
>>> - Use memchr_inv to check if the data contains all 0xFF bytes.
>>>   It is faster than looping for each byte.
>>
>> Stupid question:
>>
>> Are there any mtd devices modified that are slower
>> at 64 bit accesses than repeated 8 bit accesses?
>
> I believe this patch deals with kernel buffers, not any kind of direct
> access to the MTD, so the question (which is not stupid IMO) should be
> regarding CPU architectures. And my educated guess is that 64-bit
> access should not be any slower. I do know that 8-bit access *is*
> slower for some relevant architectures.

It could be slower when the number of bytes scanned is very small
(found a unmatched character immediately, or the size of the area
is very small), because memchr_inv() needs to generate a 64bit pattern
to compare before starting the loop.  I recalled that Eric Dumazet
pointed out it could generate the 64bit pattern more efficiently.
(https://lkml.org/lkml/2011/8/8/480)

Even if that small scanning is slower, this change can be assumed cleanup
patch that simplifies the code.
Brian Norris - Jan. 31, 2012, 5:57 p.m.
On Fri, Jan 27, 2012 at 3:52 PM, Akinobu Mita <akinobu.mita@gmail.com> wrote:
> 2012/1/28 Brian Norris <computersforpeace@gmail.com>:
>> On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote:
>>> On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote:
>>>> - Use memchr_inv to check if the data contains all 0xFF bytes.
>>>>   It is faster than looping for each byte.
>>>
>>> Stupid question:
>>>
>>> Are there any mtd devices modified that are slower
>>> at 64 bit accesses than repeated 8 bit accesses?
>>
>> I believe this patch deals with kernel buffers, not any kind of direct
>> access to the MTD, so the question (which is not stupid IMO) should be
>> regarding CPU architectures. And my educated guess is that 64-bit
>> access should not be any slower. I do know that 8-bit access *is*
>> slower for some relevant architectures.
>
> It could be slower when the number of bytes scanned is very small
> (found a unmatched character immediately, or the size of the area
> is very small), because memchr_inv() needs to generate a 64bit pattern
> to compare before starting the loop.  I recalled that Eric Dumazet
> pointed out it could generate the 64bit pattern more efficiently.
> (https://lkml.org/lkml/2011/8/8/480)
>
> Even if that small scanning is slower, this change can be assumed cleanup
> patch that simplifies the code.

Well, I agree that it qualifies as cleanup as well, but we should at
least make an attempt not to cause performance regression...

So by my understanding, the use of memchr_inv() is on buffers of
minimum length of 10 in this patch, so we're likely to have decent
results. And memcmp() usage looks fine to me.

So unless other concerns arise:

Acked-by: Brian Norris <computersforpeace@gmail.com>
Akinobu Mita - Feb. 1, 2012, 1:11 p.m.
2012/2/1 Brian Norris <computersforpeace@gmail.com>:
> On Fri, Jan 27, 2012 at 3:52 PM, Akinobu Mita <akinobu.mita@gmail.com> wrote:
>> 2012/1/28 Brian Norris <computersforpeace@gmail.com>:
>>> On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote:
>>>> On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote:
>>>>> - Use memchr_inv to check if the data contains all 0xFF bytes.
>>>>>   It is faster than looping for each byte.
>>>>
>>>> Stupid question:
>>>>
>>>> Are there any mtd devices modified that are slower
>>>> at 64 bit accesses than repeated 8 bit accesses?
>>>
>>> I believe this patch deals with kernel buffers, not any kind of direct
>>> access to the MTD, so the question (which is not stupid IMO) should be
>>> regarding CPU architectures. And my educated guess is that 64-bit
>>> access should not be any slower. I do know that 8-bit access *is*
>>> slower for some relevant architectures.
>>
>> It could be slower when the number of bytes scanned is very small
>> (found a unmatched character immediately, or the size of the area
>> is very small), because memchr_inv() needs to generate a 64bit pattern
>> to compare before starting the loop.  I recalled that Eric Dumazet
>> pointed out it could generate the 64bit pattern more efficiently.
>> (https://lkml.org/lkml/2011/8/8/480)
>>
>> Even if that small scanning is slower, this change can be assumed cleanup
>> patch that simplifies the code.
>
> Well, I agree that it qualifies as cleanup as well, but we should at
> least make an attempt not to cause performance regression...
>
> So by my understanding, the use of memchr_inv() is on buffers of
> minimum length of 10 in this patch, so we're likely to have decent
> results. And memcmp() usage looks fine to me.

Sorry, I answered without checking memchr_inv() carefully.  If the size
of buffer is less than 16 bytes, memchr_inv() scans for each byte as the
original code did.  So it is unlikely to be slower in the most cases.

But I mentioned in the previous email, there are some problems in
memchr_inv().  I'll send the patch in a few days.

> So unless other concerns arise:
>
> Acked-by: Brian Norris <computersforpeace@gmail.com>

Thanks.

Patch

diff --git a/drivers/mtd/nand/davinci_nand.c b/drivers/mtd/nand/davinci_nand.c
index 6e56615..423a7cc 100644
--- a/drivers/mtd/nand/davinci_nand.c
+++ b/drivers/mtd/nand/davinci_nand.c
@@ -315,10 +315,9 @@  static int nand_davinci_correct_4bit(struct mtd_info *mtd,
 	unsigned long timeo;
 
 	/* All bytes 0xff?  It's an erased page; ignore its ECC. */
-	for (i = 0; i < 10; i++) {
-		if (ecc_code[i] != 0xff)
-			goto compare;
-	}
+	if (memchr_inv(ecc_code, 0xff, 10))
+		goto compare;
+
 	return 0;
 
 compare:
diff --git a/drivers/mtd/nand/denali.c b/drivers/mtd/nand/denali.c
index 3984d48..a87e3a4 100644
--- a/drivers/mtd/nand/denali.c
+++ b/drivers/mtd/nand/denali.c
@@ -908,10 +908,9 @@  static void read_oob_data(struct mtd_info *mtd, uint8_t *buf, int page)
  */
 bool is_erased(uint8_t *buf, int len)
 {
-	int i = 0;
-	for (i = 0; i < len; i++)
-		if (buf[i] != 0xFF)
-			return false;
+	if (memchr_inv(buf, 0xFF, len))
+		return false;
+
 	return true;
 }
 #define ECC_SECTOR_SIZE 512
diff --git a/drivers/mtd/nand/diskonchip.c b/drivers/mtd/nand/diskonchip.c
index df921e7..bf0602e 100644
--- a/drivers/mtd/nand/diskonchip.c
+++ b/drivers/mtd/nand/diskonchip.c
@@ -945,12 +945,8 @@  static int doc200x_calculate_ecc(struct mtd_info *mtd, const u_char *dat, unsign
 		/* Note: this somewhat expensive test should not be triggered
 		   often.  It could be optimized away by examining the data in
 		   the writebuf routine, and remembering the result. */
-		for (i = 0; i < 512; i++) {
-			if (dat[i] == 0xff)
-				continue;
+		if (memchr_inv(dat, 0xff, 512))
 			emptymatch = 0;
-			break;
-		}
 	}
 	/* If emptymatch still =1, we do have an all-0xff data buffer.
 	   Return all-0xff ecc value instead of the computed one, so
@@ -1000,24 +996,16 @@  static int doc200x_correct_data(struct mtd_info *mtd, u_char *dat,
 		/* If emptymatch=1, the read syndrome is consistent with an
 		   all-0xff data and stored ecc block.  Check the stored ecc. */
 		if (emptymatch) {
-			for (i = 0; i < 6; i++) {
-				if (read_ecc[i] == 0xff)
-					continue;
+			if (memchr_inv(read_ecc, 0xff, 6))
 				emptymatch = 0;
-				break;
-			}
 		}
 		/* If emptymatch still =1, check the data block. */
 		if (emptymatch) {
 			/* Note: this somewhat expensive test should not be triggered
 			   often.  It could be optimized away by examining the data in
 			   the readbuf routine, and remembering the result. */
-			for (i = 0; i < 512; i++) {
-				if (dat[i] == 0xff)
-					continue;
+			if (memchr_inv(dat, 0xff, 512))
 				emptymatch = 0;
-				break;
-			}
 		}
 		/* If emptymatch still =1, this is almost certainly a freshly-
 		   erased block, in which case the ECC will not come out right.
diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 20a112f..a1b2dd5 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -100,10 +100,8 @@  static int check_pattern(uint8_t *buf, int len, int paglen, struct nand_bbt_desc
 
 	end = paglen + td->offs;
 	if (td->options & NAND_BBT_SCANEMPTY) {
-		for (i = 0; i < end; i++) {
-			if (p[i] != 0xff)
-				return -1;
-		}
+		if (memchr_inv(p, 0xff, end))
+			return -1;
 	}
 	p += end;
 
@@ -133,14 +131,10 @@  static int check_pattern(uint8_t *buf, int len, int paglen, struct nand_bbt_desc
  */
 static int check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td)
 {
-	int i;
-	uint8_t *p = buf;
-
 	/* Compare the pattern */
-	for (i = 0; i < td->len; i++) {
-		if (p[td->offs + i] != td->pattern[i])
-			return -1;
-	}
+	if (memcmp(buf + td->offs, td->pattern, td->len))
+		return -1;
+
 	return 0;
 }
 
diff --git a/drivers/mtd/nand/nand_bcm_umi.c b/drivers/mtd/nand/nand_bcm_umi.c
index 46a6bc9..ad51dad 100644
--- a/drivers/mtd/nand/nand_bcm_umi.c
+++ b/drivers/mtd/nand/nand_bcm_umi.c
@@ -109,13 +109,9 @@  int nand_bcm_umi_bch_correct_page(uint8_t *datap, uint8_t *readEccData,
 	 * see if errors are correctible
 	 */
 	if ((regValue & REG_UMI_BCH_CTRL_STATUS_UNCORR_ERR) > 0) {
-		int i;
-
-		for (i = 0; i < numEccBytes; i++) {
-			if (readEccData[i] != 0xff) {
-				/* errors cannot be fixed, return -1 */
-				return -1;
-			}
+		if (memchr_inv(readEccData, 0xff, numEccBytes)) {
+			/* errors cannot be fixed, return -1 */
+			return -1;
 		}
 		/* If ECC is unprogrammed then we can't correct,
 		 * assume everything OK */