Patchwork nand_update_bbt fix

login
register
mail settings
Submitter Andrew McKay
Date Aug. 10, 2009, 11:04 p.m.
Message ID <4A80A760.20901@iders.ca>
Download mbox | patch
Permalink /patch/31111/
State New, archived
Headers show

Comments

Andrew McKay - Aug. 10, 2009, 11:04 p.m.
Hello,

For NAND parts with 2K pages or larger, kmalloc of one erase block will exceed 
128K and fail.  A vmalloc is used in nand_scan_bbt to allocate enough memory for 
one erase block.  This should likely be the case for nand_update_bbt as well.

Not sure the proper way to submit a patch, but here's a diff of the change:

Andrew McKay
Iders Inc.
Artem Bityutskiy - Aug. 11, 2009, 3:39 p.m.
On Mon, 2009-08-10 at 18:04 -0500, Andrew McKay wrote:
> Hello,
> 
> For NAND parts with 2K pages or larger, kmalloc of one erase block will exceed 
> 128K and fail.  A vmalloc is used in nand_scan_bbt to allocate enough memory for 
> one erase block.  This should likely be the case for nand_update_bbt as well.

I think this is not about parts with 2K pages, but about parts with
256KiB eraseblocks.

> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
> index 55c23e5..43b8d08 100644
> --- a/drivers/mtd/nand/nand_bbt.c
> +++ b/drivers/mtd/nand/nand_bbt.c
> @@ -1030,7 +1030,7 @@ int nand_update_bbt(struct mtd_info *mtd, loff_t offs)
>    /* Allocate a temporary buffer for one eraseblock incl. oob */
>    len = (1 << this->bbt_erase_shift);
>    len += (len >> this->page_shift) * mtd->oobsize;
> - buf = kmalloc(len, GFP_KERNEL);
> + buf = vmalloc(len);
>    if (!buf) {
>      printk(KERN_ERR "nand_update_bbt: Out of memory\n");
>      return -ENOMEM;
> @@ -1063,7 +1063,7 @@ int nand_update_bbt(struct mtd_info *mtd, loff_t offs)
>    }
> 
>    out:
> - kfree(buf);
> + vfree(buf);
>    return res;
>   }

I would disagree with this patch. Other people are fighting against
vmalloc, because they want to do DMA in their drivers, but they cannot.
I've CCed Zhang who is doing the opposite to what you do.

I think you should instead split the array on several smaller parts
and work with those parts. And since this is a very common task in
MTD, it is better to create a helper library.

You may also take a look at what I suggested to Zhang:
http://lists.infradead.org/pipermail/linux-mtd/2009-August/026845.html
Andrew McKay - Aug. 11, 2009, 4:03 p.m.
Artem Bityutskiy wrote:
> On Mon, 2009-08-10 at 18:04 -0500, Andrew McKay wrote:
>> Hello,
>>
>> For NAND parts with 2K pages or larger, kmalloc of one erase block will exceed 
>> 128K and fail.  A vmalloc is used in nand_scan_bbt to allocate enough memory for 
>> one erase block.  This should likely be the case for nand_update_bbt as well.
> 
> I think this is not about parts with 2K pages, but about parts with
> 256KiB eraseblocks.

Yes, you're correct, any part that has an erase block, including OOB, that 
exceeds 128K bytes.

> 
>> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
>> index 55c23e5..43b8d08 100644
>> --- a/drivers/mtd/nand/nand_bbt.c
>> +++ b/drivers/mtd/nand/nand_bbt.c
>> @@ -1030,7 +1030,7 @@ int nand_update_bbt(struct mtd_info *mtd, loff_t offs)
>>    /* Allocate a temporary buffer for one eraseblock incl. oob */
>>    len = (1 << this->bbt_erase_shift);
>>    len += (len >> this->page_shift) * mtd->oobsize;
>> - buf = kmalloc(len, GFP_KERNEL);
>> + buf = vmalloc(len);
>>    if (!buf) {
>>      printk(KERN_ERR "nand_update_bbt: Out of memory\n");
>>      return -ENOMEM;
>> @@ -1063,7 +1063,7 @@ int nand_update_bbt(struct mtd_info *mtd, loff_t offs)
>>    }
>>
>>    out:
>> - kfree(buf);
>> + vfree(buf);
>>    return res;
>>   }
> 
> I would disagree with this patch. Other people are fighting against
> vmalloc, because they want to do DMA in their drivers, but they cannot.
> I've CCed Zhang who is doing the opposite to what you do.

I understand vmalloc is frowned upon in general, but then why is it already 
being used in nand_scan_bbt?  Any fix that avoids vmalloc for nand_update_bbt 
should be used in nand_scan_bbt as well.

> I think you should instead split the array on several smaller parts
> and work with those parts. And since this is a very common task in
> MTD, it is better to create a helper library.
> 
> You may also take a look at what I suggested to Zhang:
> http://lists.infradead.org/pipermail/linux-mtd/2009-August/026845.html

Thanks, I will take a look.

Andrew
Artem Bityutskiy - Aug. 12, 2009, 5:51 a.m.
On 08/11/2009 07:03 PM, Andrew McKay wrote:
>>> out:
>>> - kfree(buf);
>>> + vfree(buf);
>>> return res;
>>> }
>>
>> I would disagree with this patch. Other people are fighting against
>> vmalloc, because they want to do DMA in their drivers, but they cannot.
>> I've CCed Zhang who is doing the opposite to what you do.
>
> I understand vmalloc is frowned upon in general, but then why is it
> already being used in nand_scan_bbt? Any fix that avoids vmalloc for
> nand_update_bbt should be used in nand_scan_bbt as well.

My position is:
   * vmalloc is a problem because it prevents DMA
   * kmalloc is a problem because large allocations of contiguous memory
     are impossible

Thus, I think people should invent some nice solution for the whole issue
instead of turning vmalloc's into kmallocks and back and forth. I'm CCing
David Brownell because AFAIR he was discussing similar things on lkml some
time ago.
David Brownell - Aug. 12, 2009, 5:37 p.m.
On Tuesday 11 August 2009, Artem Bityutskiy wrote:
> My position is:
>    * vmalloc is a problem because it prevents DMA
>    * kmalloc is a problem because large allocations of contiguous memory
>      are impossible
> 
> Thus, I think people should invent some nice solution for the whole issue
> instead of turning vmalloc's into kmallocks and back and forth.

BBT is a constrained sub-problem, but not the only one.

(Another BBT issue:  I've thought that with MLC chips and their
small limits on number-of-erases, the current waste of BBT pages
deserves more attention.  On a 2 GByte chip with 4KB pages and
blocks at 256KB, each block could hold 64 BBT versions, with
newer ones after older ones, even at one-per-page.  But today's
BBT code is dumb:  one-per-block.  That's a lot of needless and
extra erasures for BBT blocks...)


> I'm CCing 
> David Brownell because AFAIR he was discussing similar things on lkml some
> time ago.

The MTD stack is DMA-unfriendly today.

The issue I saw was with SPI flash chips, where the underlying
SPI master controller often uses DMA ... causing trouble for
certain code paths through MTD (or was it just JFFS2?).

I'm not sure it's well understood which things are DMA-unsafe.
Using vmalloc is one problem.  There's also code making more
subtle assumptions, which dma-incoherent caches will break...

There are two issues I see.  One is "does it work at all";
that was the SPI issue.  Another is performance ... it's
easy for DMA overheads to be excessive:  DMA setup/teardown
can cost more than the PIO code would, even ignoring cache
operation costs.  Fixing performance will likely require
interface changes. 

Either issue will need care to fix.  Given that NOR is now
becoming less prevalent, I'd suggest that "people" doing
such work focus on large-page NAND.

- Dave
Artem Bityutskiy - Aug. 13, 2009, 5:48 a.m.
On Wed, 2009-08-12 at 10:37 -0700, David Brownell wrote:
> On Tuesday 11 August 2009, Artem Bityutskiy wrote:
> > My position is:
> >    * vmalloc is a problem because it prevents DMA
> >    * kmalloc is a problem because large allocations of contiguous memory
> >      are impossible
> > 
> > Thus, I think people should invent some nice solution for the whole issue
> > instead of turning vmalloc's into kmallocks and back and forth.
> 
> BBT is a constrained sub-problem, but not the only one.
> 
> (Another BBT issue:  I've thought that with MLC chips and their
> small limits on number-of-erases, the current waste of BBT pages
> deserves more attention.  On a 2 GByte chip with 4KB pages and
> blocks at 256KB, each block could hold 64 BBT versions, with
> newer ones after older ones, even at one-per-page.  But today's
> BBT code is dumb:  one-per-block.  That's a lot of needless and
> extra erasures for BBT blocks...)

Agree. IMO, today's MTD is fits the "ancient crap" classification, and
badly needs some brave knight who would improve it.

> > I'm CCing 
> > David Brownell because AFAIR he was discussing similar things on lkml some
> > time ago.
> 
> The MTD stack is DMA-unfriendly today.
> 
> The issue I saw was with SPI flash chips, where the underlying
> SPI master controller often uses DMA ... causing trouble for
> certain code paths through MTD (or was it just JFFS2?).

UBI / UBIFS also send vmalloc'ed buffers :-)

Patch

diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 55c23e5..43b8d08 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -1030,7 +1030,7 @@  int nand_update_bbt(struct mtd_info *mtd, loff_t offs)
   /* Allocate a temporary buffer for one eraseblock incl. oob */
   len = (1 << this->bbt_erase_shift);
   len += (len >> this->page_shift) * mtd->oobsize;
- buf = kmalloc(len, GFP_KERNEL);
+ buf = vmalloc(len);
   if (!buf) {
     printk(KERN_ERR "nand_update_bbt: Out of memory\n");
     return -ENOMEM;
@@ -1063,7 +1063,7 @@  int nand_update_bbt(struct mtd_info *mtd, loff_t offs)
   }

   out:
- kfree(buf);
+ vfree(buf);
   return res;
  }