Patchwork [v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

Submitter b35362@freescale.com
Date Aug. 18, 2011, 2:33 a.m.
Message ID <1313634783-8855-1-git-send-email-b35362@freescale.com>
Permalink /patch/110496/
State New

Comments

b35362@freescale.com - Aug. 18, 2011, 2:33 a.m.
From: Liu Shuo <b35362@freescale.com>

The buffer RAM of the Freescale FCM controller is limited to 2K bytes. To
support NAND flash chips whose page size is larger than 2K bytes, we divide
each page into multiple 2K pages for the MTD layer and force the reported
page size to 2K bytes. In the fsl_elbc driver, we convert the page address
used by the MTD layer into a real page address in the flash chip plus a
column index. Any column address can be issued with the UA instruction of
the eLBC controller.

NOTE: Because of the limit on the 'Number of Partial Program Cycles in the
Same Page (NOP)', a flash chip supported by this workaround has to meet the
following conditions:
	1. The page size is not greater than 4KB.
	2.	1) If the main area and spare area have independent NOPs:
			  main  area NOP    :    >=3
			  spare area NOP    :    >=2
		2) If the main area and spare area have a common NOP:
			  NOP               :    >=4
Signed-off-by: Liu Shuo <b35362@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
---
 drivers/mtd/nand/fsl_elbc_nand.c |   66 ++++++++++++++++++++++++++++++-------
 1 files changed, 53 insertions(+), 13 deletions(-)
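
The address conversion described in the changelog can be illustrated with a minimal sketch. It is not the code from the patch: emu_to_real() and struct fcm_addr are hypothetical names, and it assumes the on-flash layout "data, oob, data, oob" with the chip's spare area split evenly between the 2K sub-pages (for example 4096 + 128 bytes giving two sub-pages of 2048 + 64 bytes).

```c
#include <assert.h>
#include <stdint.h>

#define EMU_PAGE_SIZE 2048u	/* page size reported to the MTD layer */

/* Hypothetical result type: where a 2K MTD page lives on the real chip. */
struct fcm_addr {
	uint32_t real_page;	/* page address issued to the flash chip */
	uint32_t column;	/* column of the sub-page inside that page */
};

/*
 * Hypothetical helper, not taken from the patch: map a 2K page index seen
 * by MTD onto a real page and a column. Assumes the spare area is divided
 * evenly between the sub-pages and that each sub-page's OOB directly
 * follows its data.
 */
static struct fcm_addr emu_to_real(uint32_t mtd_page,
				   uint32_t real_page_size,
				   uint32_t real_oob_size)
{
	uint32_t subpages = real_page_size / EMU_PAGE_SIZE;
	struct fcm_addr a;

	/* Only whole multiples of 2K make sense for this scheme. */
	assert(subpages >= 1 && real_page_size % EMU_PAGE_SIZE == 0);

	a.real_page = mtd_page / subpages;
	a.column = (mtd_page % subpages) *
		   (EMU_PAGE_SIZE + real_oob_size / subpages);
	return a;
}
```

Under these assumptions, MTD page 5 on a 4096 + 128 chip maps to real page 2, column 2112; the column is what would be issued through the UA instruction.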
Scott Wood - Aug. 18, 2011, 4:25 p.m.
On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
> From: Liu Shuo <b35362@freescale.com>
> 
> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
> to support the Nand flash chip whose page size is larger than 2K bytes,
> we divide a page into multi-2K pages for MTD layer driver. In that case,
> we force to set the page size to 2K bytes. We convert the page address of
> MTD layer driver to a real page address in flash chips and a column index
> in fsl_elbc driver. We can issue any column address by UA instruction of
> elbc controller.
> 
> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
> the Same Page (NOP)', the flash chip which is supported by this workaround 
> have to meet below conditions.
> 	1. page size is not greater than 4KB 
> 	2.	1) if main area and spare area have independent NOPs:
> 			  main  area NOP    :    >=3
> 			  spare area NOP    :    >=2

How often are the NOPs split like this?

> 		2) if main area and spare area have a common NOP: 
> 			  NOP               :    >=4

This depends on how the flash is used.  If you treat it as a NOP1 flash
(e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
real 2K chip, you'll need NOP8 for a 4K chip.

The NOP restrictions should be documented in the code itself, not just
in the git changelog.  Maybe print it to the console when this hack is
used, along with the NOP value read from the ID.  If it's less than 4
for 4K or 8 for 8K, also print a message saying not to use jffs2 (does
yaffs2 do similar things?).  If it's less than 2 for 4K or 4 for 8K, the
probe should fail.

-Scott
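
The thresholds suggested above could be expressed roughly as follows. This is only a sketch: check_nop() and enum nop_verdict are hypothetical names, and the formula is simply one that reproduces the four numbers quoted in the message (reject below 2 for 4K or 4 for 8K, warn about jffs2 below 4 for 4K or 8 for 8K). It also assumes the NOP value is known, which the ID may not provide on every chip.

```c
#include <assert.h>

/* Hypothetical verdicts for a chip driven through the 2K emulation. */
enum nop_verdict {
	NOP_OK,			/* no restriction to report */
	NOP_WARN_NO_JFFS2,	/* usable, but only as a NOP1-style flash */
	NOP_REJECT		/* probe should fail */
};

/*
 * Hypothetical check, assuming a known NOP value for the whole page.
 * page_size is the real page size of the chip (4096 or 8192).
 */
static enum nop_verdict check_nop(unsigned int page_size, unsigned int nop)
{
	unsigned int subpages = page_size / 2048u;

	/* The emulation is only in use for pages larger than 2K. */
	assert(subpages >= 2);

	if (nop < subpages)		/* < 2 for 4K, < 4 for 8K */
		return NOP_REJECT;
	if (nop < 2 * subpages)		/* < 4 for 4K, < 8 for 8K */
		return NOP_WARN_NO_JFFS2;
	return NOP_OK;
}
```

A probe routine could print the restriction and the NOP value it found for the two non-OK verdicts.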
Matthieu CASTET - Aug. 18, 2011, 5 p.m.
b35362@freescale.com wrote:
> From: Liu Shuo <b35362@freescale.com>
> 
> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
> to support the Nand flash chip whose page size is larger than 2K bytes,
> we divide a page into multi-2K pages for MTD layer driver. In that case,
> we force to set the page size to 2K bytes. We convert the page address of
> MTD layer driver to a real page address in flash chips and a column index
> in fsl_elbc driver. We can issue any column address by UA instruction of
> elbc controller.
> 
Why do you need to do that?

When MTD sends you a 4K page, why can't you write it as 2 * 2K page writes?

Even better, send the first 2K and then, if your controller allows it, send
the remaining 2K without a command/address phase.


Matthieu
Scott Wood - Aug. 18, 2011, 6:24 p.m.
On 08/18/2011 12:00 PM, Matthieu CASTET wrote:
> b35362@freescale.com a écrit :
>> From: Liu Shuo <b35362@freescale.com>
>>
>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>> to support the Nand flash chip whose page size is larger than 2K bytes,
>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>> we force to set the page size to 2K bytes. We convert the page address of
>> MTD layer driver to a real page address in flash chips and a column index
>> in fsl_elbc driver. We can issue any column address by UA instruction of
>> elbc controller.
>>
> Why do you need to do that ?
> 
> When mtd send you a 4k page, why can't you write it by 2*2k pages write ?

That would be more complicated given the statefulness of the interface,
for no real benefit.

> Even better send the first 2K and then if your controller allow it send the
> remaining 2K without command/address phase.

IIRC Shuo tried this first and couldn't make it work.

-Scott
Scott Wood - Aug. 18, 2011, 6:27 p.m.
On 08/18/2011 11:25 AM, Scott Wood wrote:
> The NOP restrictions should be documented in the code itself, not just
> in the git changelog.  Maybe print it to the console when this hack is
> used, along with the NOP value read from the ID.  If it's less than 4
> for 4K or 8 for 8K, also print a message saying not to use jffs2 (does
> yaffs2 do similar things?).  If it's less than 2 for 4K or 4 for 8K, the
> probe should fail.

We should also warn the user about the need to prepare the NAND chip by
copying the bad block markers to the OOB of the new layout.

-Scott
b35362@freescale.com - Aug. 19, 2011, 3:20 a.m.
On 2011-08-19 01:00, Matthieu CASTET wrote:
> b35362@freescale.com wrote:
>> From: Liu Shuo<b35362@freescale.com>
>>
>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>> to support the Nand flash chip whose page size is larger than 2K bytes,
>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>> we force to set the page size to 2K bytes. We convert the page address of
>> MTD layer driver to a real page address in flash chips and a column index
>> in fsl_elbc driver. We can issue any column address by UA instruction of
>> elbc controller.
>>
> Why do you need to do that ?
>
> When mtd send you a 4k page, why can't you write it by 2*2k pages write ?
1. It's easy to implement.
2. We don't need to move the data in the buffer additional times, because
we want to use HW_ECC.

In flash chip per Page:
----------------------------------------------------------------
| first data | first oob | second data | second oob |
----------------------------------------------------------------

> Even better send the first 2K and then if your controller allow it send the
> remaining 2K without command/address phase.
The eLBC controller doesn't allow that.
I have to issue the Program command twice to write the whole 4KB of data.

>
> Matthieu
>
Matthieu CASTET - Aug. 19, 2011, 8:57 a.m.
LiuShuo wrote:
> On 2011-08-19 01:00, Matthieu CASTET wrote:
>> b35362@freescale.com wrote:
>>> From: Liu Shuo<b35362@freescale.com>
>>>
>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>>> to support the Nand flash chip whose page size is larger than 2K bytes,
>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>>> we force to set the page size to 2K bytes. We convert the page address of
>>> MTD layer driver to a real page address in flash chips and a column index
>>> in fsl_elbc driver. We can issue any column address by UA instruction of
>>> elbc controller.
>>>
>> Why do you need to do that ?
>>
>> When mtd send you a 4k page, why can't you write it by 2*2k pages write ?
> 1. It's easy to implement.
> 2. We don't need to move the data in buffer more times, because we
> want to use the HW_ECC.
> 
> In flash chip per Page:
> ----------------------------------------------------------------
> | first data | first oob | second data | second oob |
> ----------------------------------------------------------------
How are the bad block markers handled with this remapping?

MTD will search in the first OOB, but that will be in the data zone of the
NAND, not where the manufacturer put the marker.

Matthieu
Scott Wood - Aug. 19, 2011, 6:10 p.m.
On 08/19/2011 03:57 AM, Matthieu CASTET wrote:
> LiuShuo a écrit :
>> 于 2011年08月19日 01:00, Matthieu CASTET 写道:
>>> b35362@freescale.com a écrit :
>>>> From: Liu Shuo<b35362@freescale.com>
>>>>
>>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>>>> to support the Nand flash chip whose page size is larger than 2K bytes,
>>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>>>> we force to set the page size to 2K bytes. We convert the page address of
>>>> MTD layer driver to a real page address in flash chips and a column index
>>>> in fsl_elbc driver. We can issue any column address by UA instruction of
>>>> elbc controller.
>>>>
>>> Why do you need to do that ?
>>>
>>> When mtd send you a 4k page, why can't you write it by 2*2k pages write ?
>> 1. It's easy to implement.
>> 2. We don't need to move the data in buffer more times, because we
>> want to use the HW_ECC.
>>
>> In flash chip per Page:
>> ----------------------------------------------------------------
>> | first data | first oob | second data | second oob |
>> ----------------------------------------------------------------
> How the bad block marker are handled with this remapping ?

It has to be migrated prior to first use (this needs to be documented,
and ideally a U-Boot command provided to do this), or else special
handling would be needed when building the BBT.  The only way around
this would be to do ECC in software, and do the buffering needed to let
MTD treat it as a 4K chip.

-Scott
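
To make the marker problem concrete, here is a small sketch that classifies a column of the real page under the remapped layout. locate() and struct emu_loc are hypothetical names; the sketch assumes a 4096 + 128 chip split into two 2048 + 64 sub-pages, and that the factory marker sits in the first spare byte (column 4096), which is common but should be checked against the chip's datasheet.

```c
#include <assert.h>

#define EMU_DATA_SIZE 2048u	/* data bytes per emulated page */

/* Hypothetical description of where a real-page column ends up. */
struct emu_loc {
	unsigned int sub;	/* index of the 2K sub-page */
	int in_oob;		/* 1 if the column is in that sub-page's OOB */
	unsigned int offset;	/* offset inside the data or OOB area */
};

/*
 * Hypothetical helper: classify a column of the real page, assuming each
 * sub-page is laid out as EMU_DATA_SIZE data bytes followed by sub_oob
 * OOB bytes.
 */
static struct emu_loc locate(unsigned int column, unsigned int sub_oob)
{
	unsigned int stride = EMU_DATA_SIZE + sub_oob;
	unsigned int rest = column % stride;
	struct emu_loc l;

	assert(sub_oob > 0);

	l.sub = column / stride;
	l.in_oob = rest >= EMU_DATA_SIZE;
	l.offset = l.in_oob ? rest - EMU_DATA_SIZE : rest;
	return l;
}
```

With these assumptions, locate(4096, 64) reports sub-page 1, data area, offset 1984: the factory marker lands inside emulated data. Conversely, the first emulated OOB starts at column 2048, which is ordinary main-area data in the chip's native layout. That is why the markers have to be migrated before first use.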
Artem Bityutskiy - Aug. 22, 2011, 10:53 a.m.
On Thu, 2011-08-18 at 10:33 +0800, b35362@freescale.com wrote:
> From: Liu Shuo <b35362@freescale.com>
> 
> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
> to support the Nand flash chip whose page size is larger than 2K bytes,
> we divide a page into multi-2K pages for MTD layer driver. In that case,
> we force to set the page size to 2K bytes. We convert the page address of
> MTD layer driver to a real page address in flash chips and a column index
> in fsl_elbc driver. We can issue any column address by UA instruction of
> elbc controller.
> 
> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
> the Same Page (NOP)', the flash chip which is supported by this workaround 
> have to meet below conditions.
> 	1. page size is not greater than 4KB 
> 	2.	1) if main area and spare area have independent NOPs:
> 			  main  area NOP    :    >=3
> 			  spare area NOP    :    >=2
> 		2) if main area and spare area have a common NOP: 
> 			  NOP               :    >=4

Could you please also add this kind of info to the driver code comments?

Does it also make sense to print a message if you do the emulation,
like:

	pr_info("attention! emulating 2KiB NAND pages!");
Artem Bityutskiy - Aug. 22, 2011, 10:58 a.m.
On Fri, 2011-08-19 at 13:10 -0500, Scott Wood wrote:
> On 08/19/2011 03:57 AM, Matthieu CASTET wrote:
> > LiuShuo a écrit :
> >> 于 2011年08月19日 01:00, Matthieu CASTET 写道:
> >>> b35362@freescale.com a écrit :
> >>>> From: Liu Shuo<b35362@freescale.com>
> >>>>
> >>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
> >>>> to support the Nand flash chip whose page size is larger than 2K bytes,
> >>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
> >>>> we force to set the page size to 2K bytes. We convert the page address of
> >>>> MTD layer driver to a real page address in flash chips and a column index
> >>>> in fsl_elbc driver. We can issue any column address by UA instruction of
> >>>> elbc controller.
> >>>>
> >>> Why do you need to do that ?
> >>>
> >>> When mtd send you a 4k page, why can't you write it by 2*2k pages write ?
> >> 1. It's easy to implement.
> >> 2. We don't need to move the data in buffer more times, because we
> >> want to use the HW_ECC.
> >>
> >> In flash chip per Page:
> >> ----------------------------------------------------------------
> >> | first data | first oob | second data | second oob |
> >> ----------------------------------------------------------------
> > How the bad block marker are handled with this remapping ?
> 
> It has to be migrated prior to first use (this needs to be documented,
> and ideally a U-Boot command provided do do this), or else special
> handling would be needed when building the BBT.  The only way around
> this would be to do ECC in software, and do the buffering needed to let
> MTD treat it as a 4K chip.

It really feels like a special hack which would better not go to
mainline - am I the only one with such feeling? If yes, probably I am
wrong...
Ivan Djelic - Aug. 22, 2011, 3:25 p.m.
On Mon, Aug 22, 2011 at 11:58:33AM +0100, Artem Bityutskiy wrote:
> On Fri, 2011-08-19 at 13:10 -0500, Scott Wood wrote:
> > On 08/19/2011 03:57 AM, Matthieu CASTET wrote:
> > > LiuShuo wrote:
> > >> On 2011-08-19 01:00, Matthieu CASTET wrote:
> > >>> b35362@freescale.com wrote:
> > >>>> From: Liu Shuo<b35362@freescale.com>
> > >>>>
> > >>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
> > >>>> to support the Nand flash chip whose page size is larger than 2K bytes,
> > >>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
> > >>>> we force to set the page size to 2K bytes. We convert the page address of
> > >>>> MTD layer driver to a real page address in flash chips and a column index
> > >>>> in fsl_elbc driver. We can issue any column address by UA instruction of
> > >>>> elbc controller.
> > >>>>
> > >>> Why do you need to do that ?
> > >>>
> > >>> When mtd send you a 4k page, why can't you write it by 2*2k pages write ?
> > >> 1. It's easy to implement.
> > >> 2. We don't need to move the data in buffer more times, because we
> > >> want to use the HW_ECC.
> > >>
> > >> In flash chip per Page:
> > >> ----------------------------------------------------------------
> > >> | first data | first oob | second data | second oob |
> > >> ----------------------------------------------------------------
> > > How the bad block marker are handled with this remapping ?
> > 
> > It has to be migrated prior to first use (this needs to be documented,
> > and ideally a U-Boot command provided do do this), or else special
> > handling would be needed when building the BBT.  The only way around
> > this would be to do ECC in software, and do the buffering needed to let
> > MTD treat it as a 4K chip.

Did you take into account the fact that, because MTD thinks this is a 2K chip,
you will have to wait twice for the NAND busy read time (typically 25 us) for
each 4K read? In other words, to read 4 kBytes you will do:

1. send read0 (00), send address, send read1 (30)
2. wait tRB
3. transfer 2 kBytes
4. send read0 (00), send address, send read1 (30)
5. wait tRB
6. transfer 2 kBytes

Same problem for writes (but rather 100 us instead of 25 us).

How does that compare with hw ecc gain in terms of performance ?

--
Best Regards,

Ivan
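
The cost described above can be put into numbers with a rough sketch. Only the 25 us busy time comes from the message; read_ns() is a hypothetical helper, and the 30 ns per byte transfer cost and the 4224-byte (4096 + 128) transfer size are placeholder assumptions, not measurements of the eLBC.

```c
#include <assert.h>

/*
 * Hypothetical model of a page read: each command costs one busy period,
 * and every byte costs a fixed transfer time. All times in nanoseconds.
 */
static unsigned long read_ns(unsigned int commands, unsigned long busy_ns,
			     unsigned long bytes, unsigned long ns_per_byte)
{
	assert(commands >= 1);

	return commands * busy_ns + bytes * ns_per_byte;
}
```

With those placeholder values, read_ns(2, 25000, 4224, 30) gives 176720 ns and read_ns(1, 25000, 4224, 30) gives 151720 ns. The difference is exactly one busy period (25 us) per 4K read, whatever the transfer speed; how much that matters depends on the real bus timing and on the cost of software ECC.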
Scott Wood - Aug. 22, 2011, 3:58 p.m.
On 08/22/2011 05:58 AM, Artem Bityutskiy wrote:
> On Fri, 2011-08-19 at 13:10 -0500, Scott Wood wrote:
>> On 08/19/2011 03:57 AM, Matthieu CASTET wrote:
>>> How the bad block marker are handled with this remapping ?
>>
>> It has to be migrated prior to first use (this needs to be documented,
>> and ideally a U-Boot command provided do do this), or else special
>> handling would be needed when building the BBT.  The only way around
>> this would be to do ECC in software, and do the buffering needed to let
>> MTD treat it as a 4K chip.
> 
> It really feels like a special hack which would better not go to
> mainline - am I the only one with such feeling? If yes, probably I am
> wrong...

While the implementation is (of necessity) a hack, the feature is
something that multiple people have been asking for (it's not a special
case for a specific user).  They say 2K chips are getting more difficult
to obtain.  It doesn't change anything for people using 512/2K chips,
and (in its current form) doesn't introduce significant complexity to
the driver.  I'm not sure how maintaining it out of tree would be a
better situation for anyone.

-Scott
Scott Wood - Aug. 22, 2011, 4:04 p.m.
On 08/22/2011 10:25 AM, Ivan Djelic wrote:
> Did you take into account the fact that because MTD thinks this a 2K chip,
> you will have to wait twice for the nand busy read time (typically 25 us) per
> each 4K read. In other words, to read 4 kBytes you will do:
> 
> 1. send read0 (00), send address, send read1 (30)
> 2. wait tRB
> 3. transfer 2 kBytes
> 4. send read0 (00), send address, send read1 (30)
> 5. wait tRB
> 6. transfer 2 kBytes
> 
> Same problem for writes (but rather 100 us instead of 25 us).
> 
> How does that compare with hw ecc gain in terms of performance ?

We'd have the double-delay with the sw ecc plus buffering approach as well.

To eliminate it we'd need to do an extra data transfer without reissuing
the command, which Shuo was unable to get to work.

And it's not worse than having an actual 2K chip. :-)

-Scott
Matthieu CASTET - Aug. 22, 2011, 4:13 p.m.
Scott Wood wrote:
> On 08/22/2011 10:25 AM, Ivan Djelic wrote:
>> Did you take into account the fact that because MTD thinks this a 2K chip,
>> you will have to wait twice for the nand busy read time (typically 25 us) per
>> each 4K read. In other words, to read 4 kBytes you will do:
>>
>> 1. send read0 (00), send address, send read1 (30)
>> 2. wait tRB
>> 3. transfer 2 kBytes
>> 4. send read0 (00), send address, send read1 (30)
>> 5. wait tRB
>> 6. transfer 2 kBytes
>>
>> Same problem for writes (but rather 100 us instead of 25 us).
>>
>> How does that compare with hw ecc gain in terms of performance ?
> 
> We'd have the double-delay with the sw ecc plus buffering approach as well.
> 
> To eliminate it we'd need to do an extra data transfer without reissuing
> the command, which Shuo was unable to get to work.
> 
That's weird because our controller seems quite flexible [1].

Something like that should work ?

            out_be32(&lbc->fir,
                     (FIR_OP_CM2 << FIR_OP0_SHIFT) |
                     (FIR_OP_CA  << FIR_OP1_SHIFT) |
                     (FIR_OP_PA  << FIR_OP2_SHIFT) |
                     (FIR_OP_WB  << FIR_OP3_SHIFT));
            /* refill FCM buffer with next 2k data */

            out_be32(&lbc->fir,
                     (FIR_OP_WB  << FIR_OP3_SHIFT) |
                     (FIR_OP_CM3 << FIR_OP4_SHIFT) |
                     (FIR_OP_CW1 << FIR_OP5_SHIFT) |
                     (FIR_OP_RS  << FIR_OP6_SHIFT));



[1]
    __be32 fir;             /**< Flash Instruction Register */
#define FIR_OP0      0xF0000000
#define FIR_OP0_SHIFT        28
#define FIR_OP1      0x0F000000
#define FIR_OP1_SHIFT        24
#define FIR_OP2      0x00F00000
#define FIR_OP2_SHIFT        20
#define FIR_OP3      0x000F0000
#define FIR_OP3_SHIFT        16
#define FIR_OP4      0x0000F000
#define FIR_OP4_SHIFT        12
#define FIR_OP5      0x00000F00
#define FIR_OP5_SHIFT         8
#define FIR_OP6      0x000000F0
#define FIR_OP6_SHIFT         4
#define FIR_OP7      0x0000000F
#define FIR_OP7_SHIFT         0
#define FIR_OP_NOP   0x0    /* No operation and end of sequence */
#define FIR_OP_CA    0x1        /* Issue current column address */
#define FIR_OP_PA    0x2        /* Issue current block+page address */
#define FIR_OP_UA    0x3        /* Issue user defined address */
#define FIR_OP_CM0   0x4        /* Issue command from FCR[CMD0] */
#define FIR_OP_CM1   0x5        /* Issue command from FCR[CMD1] */
#define FIR_OP_CM2   0x6        /* Issue command from FCR[CMD2] */
#define FIR_OP_CM3   0x7        /* Issue command from FCR[CMD3] */
#define FIR_OP_WB    0x8        /* Write FBCR bytes from FCM buffer */
#define FIR_OP_WS    0x9        /* Write 1 or 2 bytes from MDR[AS] */
#define FIR_OP_RB    0xA        /* Read FBCR bytes to FCM buffer */
#define FIR_OP_RS    0xB        /* Read 1 or 2 bytes to MDR[AS] */
#define FIR_OP_CW0   0xC        /* Wait then issue FCR[CMD0] */
#define FIR_OP_CW1   0xD        /* Wait then issue FCR[CMD1] */
#define FIR_OP_RBW   0xE        /* Wait then read FBCR bytes */
#define FIR_OP_RSW   0xE        /* Wait then read 1 or 2 bytes */
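
For readers unfamiliar with the register, the definitions above describe eight 4-bit opcode slots, OP0 in the top nibble down to OP7 in the bottom one. A minimal sketch of packing a sequence into one FIR value follows; fir_pack() is a hypothetical helper, and only the opcode values are taken from the header excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values copied from the header excerpt quoted in this thread. */
#define FIR_OP_CA	0x1	/* Issue current column address */
#define FIR_OP_PA	0x2	/* Issue current block+page address */
#define FIR_OP_CM0	0x4	/* Issue command from FCR[CMD0] */
#define FIR_OP_CM1	0x5	/* Issue command from FCR[CMD1] */
#define FIR_OP_RBW	0xE	/* Wait then read FBCR bytes */

/*
 * Hypothetical helper: pack up to eight opcodes into a FIR value, with
 * ops[0] placed in OP0 (bits 31..28). Unused slots stay 0, which is
 * FIR_OP_NOP and ends the sequence.
 */
static uint32_t fir_pack(const uint8_t *ops, size_t n)
{
	uint32_t fir = 0;
	size_t i;

	assert(n <= 8);
	for (i = 0; i < n; i++) {
		assert(ops[i] <= 0xF);
		fir |= (uint32_t)ops[i] << (28 - 4 * i);
	}
	return fir;
}
```

Packing the sequence CM0, CA, PA, CM1, RBW this way yields 0x4125E000. Note that a sequence whose OP0 slot is left empty starts with FIR_OP_NOP.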
Scott Wood - Aug. 22, 2011, 4:19 p.m.
On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
> Scott Wood a écrit :
>> To eliminate it we'd need to do an extra data transfer without reissuing
>> the command, which Shuo was unable to get to work.
>>
> That's weird because our controller seems quite flexible [1].
> 
> Something like that should work ?
> 
>             out_be32(&lbc->fir,
>                      (FIR_OP_CM2 << FIR_OP0_SHIFT) |
>                      (FIR_OP_CA  << FIR_OP1_SHIFT) |
>                      (FIR_OP_PA  << FIR_OP2_SHIFT) |
>                      (FIR_OP_WB  << FIR_OP3_SHIFT));
> refill FCM buffer with next 2k data
> 
>             out_be32(&lbc->fir,
>                      (FIR_OP_WB  << FIR_OP3_SHIFT) |
>                      (FIR_OP_CM3 << FIR_OP4_SHIFT) |
>                      (FIR_OP_CW1 << FIR_OP5_SHIFT) |
>                      (FIR_OP_RS  << FIR_OP6_SHIFT));

Something like that is what I originally suggested, but Shuo said it
didn't work (even in theory, it requires a CE-don't-care NAND chip,
since bus atomicity is broken).

Shuo, what specifically did you try, and what did you see happen?

-Scott
Matthieu CASTET - Aug. 22, 2011, 5:05 p.m.
Scott Wood wrote:
> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>> Scott Wood a écrit :
>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>> the command, which Shuo was unable to get to work.
>>>
>> That's weird because our controller seems quite flexible [1].
>>
>> Something like that should work ?
>>
>>             out_be32(&lbc->fir,
>>                      (FIR_OP_CM2 << FIR_OP0_SHIFT) |
>>                      (FIR_OP_CA  << FIR_OP1_SHIFT) |
>>                      (FIR_OP_PA  << FIR_OP2_SHIFT) |
>>                      (FIR_OP_WB  << FIR_OP3_SHIFT));
>> refill FCM buffer with next 2k data
>>
>>             out_be32(&lbc->fir,
>>                      (FIR_OP_WB  << FIR_OP3_SHIFT) |
>>                      (FIR_OP_CM3 << FIR_OP4_SHIFT) |
>>                      (FIR_OP_CW1 << FIR_OP5_SHIFT) |
>>                      (FIR_OP_RS  << FIR_OP6_SHIFT));
> 
> Something like that is what I originally suggested, but Shuo said it
> didn't work (even in theory, it requires a CE-don't-care NAND chip,
> since bus atomicity is broken).
Are there 4K chips that are not CE-don't-care?

Also, I think it depends on how the bus is connected (shared with other
devices) and on the controller.

Matthieu
b35362@freescale.com - Aug. 23, 2011, 3:09 a.m.
On 2011-08-23 00:19, Scott Wood wrote:
> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>> Scott Wood a écrit :
>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>> the command, which Shuo was unable to get to work.
>>>
>> That's weird because our controller seems quite flexible [1].
>>
>> Something like that should work ?
>>
>>              out_be32(&lbc->fir,
>>                       (FIR_OP_CM2<<  FIR_OP0_SHIFT) |
>>                       (FIR_OP_CA<<  FIR_OP1_SHIFT) |
>>                       (FIR_OP_PA<<  FIR_OP2_SHIFT) |
>>                       (FIR_OP_WB<<  FIR_OP3_SHIFT));
>> refill FCM buffer with next 2k data
>>
>>              out_be32(&lbc->fir,
>>                       (FIR_OP_WB<<  FIR_OP3_SHIFT) |
>>                       (FIR_OP_CM3<<  FIR_OP4_SHIFT) |
>>                       (FIR_OP_CW1<<  FIR_OP5_SHIFT) |
>>                       (FIR_OP_RS<<  FIR_OP6_SHIFT));
> Something like that is what I originally suggested, but Shuo said it
> didn't work (even in theory, it requires a CE-don't-care NAND chip,
> since bus atomicity is broken).
>
> Shuo, what specifically did you try, and what did you see happen?
>
> -Scott
First, if we want to read 4K of data while issuing the command only once,
we can't use HW_ECC.
Even if we use SW_ECC, we always get a run of unexpected 0xFF bytes between
the 1st 2K and the 2nd 2K of data.
They overwrite the data at the head of the 2nd 2K.
  -------------------------------------------------------------------------------------
| xxxxxx ... 1st 2k xxxxxxx ... | ff ff ff ... ff xxxxxx 2nd 2k xxxxxxx |
  -------------------------------------------------------------------------------------

Writing 4K of data while issuing the command only once is worse: the 2nd 2K
of data is not written correctly.

-Liu Shuo
Matthieu CASTET - Aug. 23, 2011, 8:14 a.m.
LiuShuo wrote:
> On 2011-08-23 00:19, Scott Wood wrote:
>> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>>> Scott Wood a écrit :
>>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>>> the command, which Shuo was unable to get to work.
>>>>
>>> That's weird because our controller seems quite flexible [1].
>>>
>>> Something like that should work ?
>>>
>>>              out_be32(&lbc->fir,
>>>                       (FIR_OP_CM2<<  FIR_OP0_SHIFT) |
>>>                       (FIR_OP_CA<<  FIR_OP1_SHIFT) |
>>>                       (FIR_OP_PA<<  FIR_OP2_SHIFT) |
>>>                       (FIR_OP_WB<<  FIR_OP3_SHIFT));
>>> refill FCM buffer with next 2k data
>>>
>>>              out_be32(&lbc->fir,
>>>                       (FIR_OP_WB<<  FIR_OP3_SHIFT) |
>>>                       (FIR_OP_CM3<<  FIR_OP4_SHIFT) |
>>>                       (FIR_OP_CW1<<  FIR_OP5_SHIFT) |
>>>                       (FIR_OP_RS<<  FIR_OP6_SHIFT));
>> Something like that is what I originally suggested, but Shuo said it
>> didn't work (even in theory, it requires a CE-don't-care NAND chip,
>> since bus atomicity is broken).
>>
>> Shuo, what specifically did you try, and what did you see happen?
>>
>> -Scott
> First, if we want to read 4K data with once command issuing, we can't 
> use HW_ECC.
Yes, but as Ivan said, isn't the cost of 2 reads bigger than that of software ECC?

> Even if we use SW_ECC, we always get lots of weird '0xFF's between 1st 
> 2k and 2nd 2k data.
Do you understand where those 0xFF bytes come from? What is their size? Isn't
the controller trying to insert the spare area?

Could you detail the sequence you used?

Matthieu
b35362@freescale.com - Aug. 23, 2011, 8:37 a.m.
On 2011-08-19 00:25, Scott Wood wrote:
> On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
>> From: Liu Shuo<b35362@freescale.com>
>>
>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>> to support the Nand flash chip whose page size is larger than 2K bytes,
>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>> we force to set the page size to 2K bytes. We convert the page address of
>> MTD layer driver to a real page address in flash chips and a column index
>> in fsl_elbc driver. We can issue any column address by UA instruction of
>> elbc controller.
>>
>> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
>> the Same Page (NOP)', the flash chip which is supported by this workaround
>> have to meet below conditions.
>> 	1. page size is not greater than 4KB
>> 	2.	1) if main area and spare area have independent NOPs:
>> 			  main  area NOP    :>=3
>> 			  spare area NOP    :>=2
> How often are the NOPs split like this?
>
>> 		2) if main area and spare area have a common NOP:
>> 			  NOP               :>=4
> This depends on how the flash is used.  If you treat it as a NOP1 flash
> (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
> NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
> real 2K chip, you'll need NOP8 for a 4K chip.
>
> The NOP restrictions should be documented in the code itself, not just
> in the git changelog.  Maybe print it to the console when this hack is
> used, along with the NOP value read from the ID.

We can't read the NOP from the ID on every chip. Some chips don't
provide this information (e.g. Micron MT29F4G08BAC).

So it is hard to decide in the code whether probe() should fail.
Maybe we should always print the NOP restrictions when this hack is used
and let customers decide how to use the flash on their board.

-LiuShuo
> If it's less than 4
> for 4K or 8 for 8K, also print a message saying not to use jffs2 (does
> yaffs2 do similar things?).  If it's less than 2 for 4K or 4 for 8K, the
> probe should fail.
>
> -Scott
b35362@freescale.com - Aug. 23, 2011, 9:57 a.m.
On 2011-08-23 16:14, Matthieu CASTET wrote:
> LiuShuo wrote:
>> On 2011-08-23 00:19, Scott Wood wrote:
>>> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>>>> Scott Wood a écrit :
>>>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>>>> the command, which Shuo was unable to get to work.
>>>>>
>>>> That's weird because our controller seems quite flexible [1].
>>>>
>>>> Something like that should work ?
>>>>
>>>>               out_be32(&lbc->fir,
>>>>                        (FIR_OP_CM2<<   FIR_OP0_SHIFT) |
>>>>                        (FIR_OP_CA<<   FIR_OP1_SHIFT) |
>>>>                        (FIR_OP_PA<<   FIR_OP2_SHIFT) |
>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT));
>>>> refill FCM buffer with next 2k data
>>>>
>>>>               out_be32(&lbc->fir,
>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT) |
>>>>                        (FIR_OP_CM3<<   FIR_OP4_SHIFT) |
>>>>                        (FIR_OP_CW1<<   FIR_OP5_SHIFT) |
>>>>                        (FIR_OP_RS<<   FIR_OP6_SHIFT));
>>> Something like that is what I originally suggested, but Shuo said it
>>> didn't work (even in theory, it requires a CE-don't-care NAND chip,
>>> since bus atomicity is broken).
>>>
>>> Shuo, what specifically did you try, and what did you see happen?
>>>
>>> -Scott
>> First, if we want to read 4K data with once command issuing, we can't
>> use HW_ECC.
> Yes, but as ivan said doesn't the cost of 2 read isn't bigger than software ecc ?
>
>> Even if we use SW_ECC, we always get lots of weird '0xFF's between 1st
>> 2k and 2nd 2k data.
> Did you understand where those 0xff comes (what's the size of them. Doesn't the
> controller try to insert spare aera ?)
I don't understand it. I set FBCR to 2048, so the controller should read the
main area without the spare area.
But the size of the 0xFF run is close to the spare area size (give or take a
few bytes).
I couldn't work out what the controller was doing, so I chose another way.

Could you try it and explain where those 0xFF bytes come from?
> Could you detail the sequence you used ?
>
First half :
                   out_be32(&lbc->fbcr, 2048);
                   out_be32(&lbc->fir,
                            (FIR_OP_CM0 << FIR_OP0_SHIFT) |
                            (FIR_OP_CA << FIR_OP1_SHIFT) |
                            (FIR_OP_PA << FIR_OP2_SHIFT) |
                            (FIR_OP_CM1 << FIR_OP3_SHIFT) |
                            (FIR_OP_RBW << FIR_OP4_SHIFT));


Second half :
                 out_be32(&lbc->fbcr, 2048);
                 out_be32(&lbc->fir,
                            (FIR_OP_RB << FIR_OP0_SHIFT) |
                            (FIR_OP_RBW << FIR_OP1_SHIFT));


-Liu Shuo

> Matthieu
>
>
Matthieu CASTET - Aug. 23, 2011, 10:02 a.m.
LiuShuo wrote:
> On 2011-08-19 00:25, Scott Wood wrote:
>> On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
>>> From: Liu Shuo<b35362@freescale.com>
>>>
>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>>> to support the Nand flash chip whose page size is larger than 2K bytes,
>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>>> we force to set the page size to 2K bytes. We convert the page address of
>>> MTD layer driver to a real page address in flash chips and a column index
>>> in fsl_elbc driver. We can issue any column address by UA instruction of
>>> elbc controller.
>>>
>>> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
>>> the Same Page (NOP)', the flash chip which is supported by this workaround
>>> have to meet below conditions.
>>> 	1. page size is not greater than 4KB
>>> 	2.	1) if main area and spare area have independent NOPs:
>>> 			  main  area NOP    :>=3
>>> 			  spare area NOP    :>=2?
>> How often are the NOPs split like this?
>>
>>> 		2) if main area and spare area have a common NOP:
>>> 			  NOP               :>=4
>> This depends on how the flash is used.  If you treat it as a NOP1 flash
>> (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
>> NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
>> real 2K chip, you'll need NOP8 for a 4K chip.
>>
>> The NOP restrictions should be documented in the code itself, not just
>> in the git changelog.  Maybe print it to the console when this hack is
>> used, along with the NOP value read from the ID.
> 
> We can't read the NOP from the ID on any chip. Some chips don't
> give this infomation.(e.g. Micron MT29F4G08BAC)
Doesn't the Micron chip provide it in its ONFI info?

Matthieu
Matthieu CASTET - Aug. 23, 2011, 10:13 a.m.
LiuShuo wrote:
> On 2011-08-23 16:14, Matthieu CASTET wrote:
>> LiuShuo wrote:
>>> On 2011-08-23 00:19, Scott Wood wrote:
>>>> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>>>>> Scott Wood wrote:
>>>>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>>>>> the command, which Shuo was unable to get to work.
>>>>>>
>>>>> That's weird because our controller seems quite flexible [1].
>>>>>
>>>>> Something like that should work ?
>>>>>
>>>>>               out_be32(&lbc->fir,
>>>>>                        (FIR_OP_CM2<<   FIR_OP0_SHIFT) |
>>>>>                        (FIR_OP_CA<<   FIR_OP1_SHIFT) |
>>>>>                        (FIR_OP_PA<<   FIR_OP2_SHIFT) |
>>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT));
>>>>> refill FCM buffer with next 2k data
>>>>>
>>>>>               out_be32(&lbc->fir,
>>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT) |
>>>>>                        (FIR_OP_CM3<<   FIR_OP4_SHIFT) |
>>>>>                        (FIR_OP_CW1<<   FIR_OP5_SHIFT) |
>>>>>                        (FIR_OP_RS<<   FIR_OP6_SHIFT));
>>>> Something like that is what I originally suggested, but Shuo said it
>>>> didn't work (even in theory, it requires a CE-don't-care NAND chip,
>>>> since bus atomicity is broken).
>>>>
>>>> Shuo, what specifically did you try, and what did you see happen?
>>>>
>>>> -Scott
>>> First, if we want to read 4K data with once command issuing, we can't
>>> use HW_ECC.
>> Yes, but as ivan said doesn't the cost of 2 read isn't bigger than software ecc ?
>>
>>> Even if we use SW_ECC, we always get lots of weird '0xFF's between 1st
>>> 2k and 2nd 2k data.
>> Did you understand where those 0xff comes (what's the size of them. Doesn't the
>> controller try to insert spare aera ?)
> I don't understand. I set FBCR to 2048, the controller will read the 
> main area without spare area.
> But the size of them is nearly spare area size( more or less a few bytes)..
> I can't guess the behavior of the controller then, so I select another way.
> 
> Could you try to do it and explain how those 0xff comes ?
>> Could you detail the sequence you used ?
>>
> First half :
>                    out_be32(&lbc->fbcr, 2048);
Shouldn't you read 2k+64 here? In the end you want 4k plus the spare area (128)?

>                    out_be32(&lbc->fir,
>                             (FIR_OP_CM0 << FIR_OP0_SHIFT) |
>                             (FIR_OP_CA << FIR_OP1_SHIFT) |
>                             (FIR_OP_PA << FIR_OP2_SHIFT) |
>                             (FIR_OP_CM1 << FIR_OP3_SHIFT) |
>                             (FIR_OP_RBW << FIR_OP4_SHIFT));
> 
> 
> Sencond half :
>                  out_be32(&lbc->fbcr, 2048);
>                  out_be32(&lbc->fir,
>                             (FIR_OP_RB << FIR_OP0_SHIFT) |
>                             (FIR_OP_RBW << FIR_OP1_SHIFT));
Why do you issue FIR_OP_RBW?
FIR_OP_RB already fetches the data.

Matthieu
Scott Wood - Aug. 23, 2011, 4:12 p.m.
On 08/23/2011 05:02 AM, Matthieu CASTET wrote:
> LiuShuo wrote:
>> We can't read the NOP from the ID on any chip. Some chips don't
>> give this infomation.(e.g. Micron MT29F4G08BAC)

Are there any 4K+ chips (especially ones with insufficient NOP) that
don't have the info?

This chip is 2K and NOP8.

Is there an easy way (without needing to have every datasheet for every
chip ever made) to determine at runtime which chips supply this information?

> Doesn't the micron chip provide it with onfi info ?

This chip doesn't appear to be ONFI.

-Scott
b35362@freescale.com - Aug. 24, 2011, 2:48 a.m.
On 2011-08-23 18:02, Matthieu CASTET wrote:
> LiuShuo wrote:
>> On 2011-08-19 00:25, Scott Wood wrote:
>>> On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
>>>> From: Liu Shuo<b35362@freescale.com>
>>>>
>>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>>>> to support the Nand flash chip whose page size is larger than 2K bytes,
>>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>>>> we force to set the page size to 2K bytes. We convert the page address of
>>>> MTD layer driver to a real page address in flash chips and a column index
>>>> in fsl_elbc driver. We can issue any column address by UA instruction of
>>>> elbc controller.
>>>>
>>>> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
>>>> the Same Page (NOP)', the flash chip which is supported by this workaround
>>>> have to meet below conditions.
>>>> 	1. page size is not greater than 4KB
>>>> 	2.	1) if main area and spare area have independent NOPs:
>>>> 			  main  area NOP    :>=3
>>>> 			  spare area NOP    :>=2?
>>> How often are the NOPs split like this?
>>>
>>>> 		2) if main area and spare area have a common NOP:
>>>> 			  NOP               :>=4
>>> This depends on how the flash is used.  If you treat it as a NOP1 flash
>>> (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
>>> NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
>>> real 2K chip, you'll need NOP8 for a 4K chip.
>>>
>>> The NOP restrictions should be documented in the code itself, not just
>>> in the git changelog.  Maybe print it to the console when this hack is
>>> used, along with the NOP value read from the ID.
>> We can't read the NOP from the ID on any chip. Some chips don't
>> give this infomation.(e.g. Micron MT29F4G08BAC)
> Doesn't the micron chip provide it with onfi info ?
Sorry, I expressed that badly.
We can get the NOP value from the datasheet, but the driver can't get it with the
READID command.

-LiuShuo
> Matthieu
>
Artem Bityutskiy - Aug. 25, 2011, 11:06 a.m.
On Mon, 2011-08-22 at 10:58 -0500, Scott Wood wrote:
> On 08/22/2011 05:58 AM, Artem Bityutskiy wrote:
> > On Fri, 2011-08-19 at 13:10 -0500, Scott Wood wrote:
> >> On 08/19/2011 03:57 AM, Matthieu CASTET wrote:
> >>> How the bad block marker are handled with this remapping ?
> >>
> >> It has to be migrated prior to first use (this needs to be documented,
> >> and ideally a U-Boot command provided do do this), or else special
> >> handling would be needed when building the BBT.  The only way around
> >> this would be to do ECC in software, and do the buffering needed to let
> >> MTD treat it as a 4K chip.
> > 
> > It really feels like a special hack which would better not go to
> > mainline - am I the only one with such feeling? If yes, probably I am
> > wrong...
> 
> While the implementation is (of necessity) a hack, the feature is
> something that multiple people have been asking for (it's not a special
> case for a specific user).  They say 2K chips are getting more difficult
> to obtain.  It doesn't change anything for people using 512/2K chips,
> and (in its current form) doesn't introduce significant complexity to
> the driver.  I'm not sure how maintaining it out of tree would be a
> better situation for anyone.

I am just afraid that (a) other drivers will do this and (b) this will start
causing various weird bug reports...
Artem Bityutskiy - Aug. 25, 2011, 11:18 a.m.
On Tue, 2011-08-23 at 11:12 -0500, Scott Wood wrote:
> On 08/23/2011 05:02 AM, Matthieu CASTET wrote:
> > LiuShuo wrote:
> >> We can't read the NOP from the ID on any chip. Some chips don't
> >> give this infomation.(e.g. Micron MT29F4G08BAC)
> 
> Are there any 4K+ chips (especially ones with insufficient NOP) that
> don't have the info?
> 
> This chip is 2K and NOP8.
> 
> Is there an easy way (without needing to have every datasheet for every
> chip ever made) to determine at runtime which chips supply this information?
> 
> > Doesn't the micron chip provide it with onfi info ?
> 
> This chip doesn't appear to be ONFI.

Few quick thoughts.

1. I think that if the driver is able to detect the flash NOP parameter and
refuse flashes whose NOP is too low, then your change is OK.
2. For ONFI flashes, can we take NOP from the ONFI info?
3. For non-ONFI chips, is it fair to conclude that MLCs _all_ have NOP 1?
Can we distinguish between MLC and SLC? If not, can this table help:
http://www.linux-mtd.infradead.org/nand-data/nanddata.html ? If needed,
can we put "bits-per-cell" data into 'struct nand_flash_dev
nand_flash_ids'?
4. Can we add a NOP field to the 'struct nand_flash_dev nand_flash_ids'
array?
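
As a rough illustration of point 1, the check could follow the rule Scott gave earlier in the thread (NOP2 for a 4K chip and NOP4 for an 8K chip when the flash is used as NOP1; NOP8 for a 4K chip when NOP4 would be fully used on a real 2K chip). This is only a sketch: `required_nop()` and `chip_nop_sufficient()` are hypothetical helpers, not existing MTD functions, and the chip's NOP value is assumed to come from somewhere (ONFI data or a table entry).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: NOP the physical page must tolerate when it is
 * split into 2K sub-pages and each sub-page is programmed up to
 * nop_per_subpage times by the layers above.
 */
static unsigned int required_nop(uint32_t writesize,
				 unsigned int nop_per_subpage)
{
	/* Only whole multiples of the 2K FCM buffer make sense here. */
	assert(writesize >= 2048 && writesize % 2048 == 0);
	assert(nop_per_subpage >= 1);

	return nop_per_subpage * (writesize / 2048);
}

/* Hypothetical check a driver could run before enabling the workaround. */
static int chip_nop_sufficient(uint32_t writesize,
			       unsigned int nop_per_subpage,
			       unsigned int chip_nop)
{
	return chip_nop >= required_nop(writesize, nop_per_subpage);
}
```

For example, `chip_nop_sufficient(4096, 1, 1)` is false, so a 4K chip with NOP 1 would be refused even under a UBIFS-style NOP1 workload.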
Matthieu CASTET - Aug. 25, 2011, 11:25 a.m.
Hi,

LiuShuo wrote:
> On 2011-08-23 18:02, Matthieu CASTET wrote:
>> LiuShuo wrote:
>>> On 2011-08-19 00:25, Scott Wood wrote:
>>>> On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
>>>>> From: Liu Shuo<b35362@freescale.com>
>>>>>
>>>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>>>>> to support the Nand flash chip whose page size is larger than 2K bytes,
>>>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>>>>> we force to set the page size to 2K bytes. We convert the page address of
>>>>> MTD layer driver to a real page address in flash chips and a column index
>>>>> in fsl_elbc driver. We can issue any column address by UA instruction of
>>>>> elbc controller.
>>>>>
>>>>> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
>>>>> the Same Page (NOP)', the flash chip which is supported by this workaround
>>>>> have to meet below conditions.
>>>>> 	1. page size is not greater than 4KB
>>>>> 	2.	1) if main area and spare area have independent NOPs:
>>>>> 			  main  area NOP    :>=3
>>>>> 			  spare area NOP    :>=2?
>>>> How often are the NOPs split like this?
>>>>
>>>>> 		2) if main area and spare area have a common NOP:
>>>>> 			  NOP               :>=4
>>>> This depends on how the flash is used.  If you treat it as a NOP1 flash
>>>> (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
>>>> NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
>>>> real 2K chip, you'll need NOP8 for a 4K chip.
>>>>
>>>> The NOP restrictions should be documented in the code itself, not just
>>>> in the git changelog.  Maybe print it to the console when this hack is
>>>> used, along with the NOP value read from the ID.
>>> We can't read the NOP from the ID on any chip. Some chips don't
>>> give this infomation.(e.g. Micron MT29F4G08BAC)
>> Doesn't the micron chip provide it with onfi info ?
> Sorry, there is something wrong with my expression.
> We can get the NOP info from datasheet, but can't get it by READID 
> command in code.
> 
OK, I thought the Micron chip was a 4K NAND, but it is an old 2K part. Why do you
need the NOP from it?


Also, could you answer my question about the sequence you used when trying to read 4k
with one command?


Thanks


Matthieu
b35362@freescale.com - Sept. 1, 2011, 9:41 a.m.
On 2011-08-25 19:25, Matthieu CASTET wrote:
> Hi,
>
> LiuShuo wrote:
>> On 2011-08-23 18:02, Matthieu CASTET wrote:
>>> LiuShuo wrote:
>>>> On 2011-08-19 00:25, Scott Wood wrote:
>>>>> On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
>>>>>> From: Liu Shuo<b35362@freescale.com>
>>>>>>
>>>>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>>>>>> to support the Nand flash chip whose page size is larger than 2K bytes,
>>>>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>>>>>> we force to set the page size to 2K bytes. We convert the page address of
>>>>>> MTD layer driver to a real page address in flash chips and a column index
>>>>>> in fsl_elbc driver. We can issue any column address by UA instruction of
>>>>>> elbc controller.
>>>>>>
>>>>>> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
>>>>>> the Same Page (NOP)', the flash chip which is supported by this workaround
>>>>>> have to meet below conditions.
>>>>>> 	1. page size is not greater than 4KB
>>>>>> 	2.	1) if main area and spare area have independent NOPs:
>>>>>> 			  main  area NOP    :>=3
>>>>>> 			  spare area NOP    :>=2?
>>>>> How often are the NOPs split like this?
>>>>>
>>>>>> 		2) if main area and spare area have a common NOP:
>>>>>> 			  NOP               :>=4
>>>>> This depends on how the flash is used.  If you treat it as a NOP1 flash
>>>>> (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
>>>>> NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
>>>>> real 2K chip, you'll need NOP8 for a 4K chip.
>>>>>
>>>>> The NOP restrictions should be documented in the code itself, not just
>>>>> in the git changelog.  Maybe print it to the console when this hack is
>>>>> used, along with the NOP value read from the ID.
>>>> We can't read the NOP from the ID on any chip. Some chips don't
>>>> give this infomation.(e.g. Micron MT29F4G08BAC)
>>> Doesn't the micron chip provide it with onfi info ?
>> Sorry, there is something wrong with my expression.
>> We can get the NOP info from datasheet, but can't get it by READID
>> command in code.
>>
> ok I was thinking the micron chip was a 4K nand. But it is an old 2K. Why do you
> want NOP from it ?
>
>
> Also can you reply my question about the sequence you use when trying to read 4k
> with one command.
>
>
> Thanks
>
>
> Matthieu
>
Sorry for the late reply.

After some tests, I found that the eLBC controller can read/write 4k with one
command if we insert a FIR_OP_NOP between reading/writing the first half and
reading/writing the second half. (Perhaps it acts as a delay for something?)

Read sequence :
-----------------------------------------------------------------------------------------------------------------------
first half :
                out_be32(&lbc->fir,
                          (FIR_OP_CM0 << FIR_OP0_SHIFT) |
                          (FIR_OP_CA << FIR_OP1_SHIFT) |
                          (FIR_OP_PA << FIR_OP2_SHIFT) |
                          (FIR_OP_CM1 << FIR_OP3_SHIFT) |
                          (FIR_OP_RBW << FIR_OP4_SHIFT));

                 out_be32(&lbc->fcr, (NAND_CMD_READ0 << FCR_CMD0_SHIFT) |
                                     (NAND_CMD_READSTART << FCR_CMD1_SHIFT));

second half :
                 out_be32(&lbc->fir,
                                 (FIR_OP_RB << FIR_OP1_SHIFT));  /* OP0 is left as FIR_OP_NOP */
-----------------------------------------------------------------------------------------------------------------------


Write sequence :
-----------------------------------------------------------------------------------------------------------------------

first half:
                 fcr = (NAND_CMD_STATUS << FCR_CMD1_SHIFT) |
                       (NAND_CMD_SEQIN << FCR_CMD2_SHIFT) |
                       (NAND_CMD_PAGEPROG << FCR_CMD3_SHIFT);

                 out_be32(&lbc->fir,
                                  (FIR_OP_CM2 << FIR_OP0_SHIFT) |
                                  (FIR_OP_CA << FIR_OP1_SHIFT) |
                                  (FIR_OP_PA << FIR_OP2_SHIFT) |
                                  (FIR_OP_WB << FIR_OP3_SHIFT));

second half:
                 out_be32(&lbc->fir,
                         (FIR_OP_WB << FIR_OP1_SHIFT) |
                         (FIR_OP_CM3 << FIR_OP2_SHIFT) |
                         (FIR_OP_CW1 << FIR_OP3_SHIFT) |
                         (FIR_OP_RS << FIR_OP4_SHIFT));
-----------------------------------------------------------------------------------------------------------------------
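
To make the "second half" words above easier to read, here is a small sketch of how the FIR slots decode. The shift and opcode values are assumptions taken from memory of the kernel's fsl_lbc.h (4-bit opcodes packed from the top nibble, FIR_OP_NOP equal to 0) and should be checked against the header; `fir_op()` and `fir_second_half_read()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, believed to match fsl_lbc.h; verify before relying on them. */
#define FIR_OP0_SHIFT	28
#define FIR_OP1_SHIFT	24
#define FIR_OP_NOP	0x0	/* no operation */
#define FIR_OP_RB	0xA	/* read FBCR bytes into the FCM buffer */

/* Hypothetical helper: extract the 4-bit opcode in slot 0..7 of a FIR word. */
static unsigned int fir_op(uint32_t fir, unsigned int slot)
{
	assert(slot < 8);
	return (fir >> (FIR_OP0_SHIFT - 4 * slot)) & 0xF;
}

/* The FIR word written for the second half of the read sequence. */
static uint32_t fir_second_half_read(void)
{
	/* Slot 0 is never written, so it stays 0, i.e. FIR_OP_NOP. */
	return (uint32_t)FIR_OP_RB << FIR_OP1_SHIFT;
}
```

Under these assumptions, slot 0 of the second-half word decodes as FIR_OP_NOP and slot 1 as FIR_OP_RB, which is the "NOP, then read" ordering described above.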

I am going to try to finish it and send a new patch.


-LiuShuo
Scott Wood - Sept. 1, 2011, 10:30 p.m.
On 09/01/2011 04:41 AM, LiuShuo wrote:
> After doing some tests, I found that the elbc controller can read/write
> 4k with one command
> if we insert a FIR_OP_NOP between first half reading/wring and second
> half reading/writing.(delay for something ?)

From the docs:

> A NOP instruction that appears in FIR ahead of the last instruction
> is executed with the timing of a regular command instruction, but
> neither LFCLE nor LFWE are asserted.  Thus a NOP instruction may be
> used to insert a pause matching the time taken for a regular command
> write.

So the NOP does generate a delay.  Would be nice to know exactly why
it's required.

Have you tried doing this under load with parallel NOR activity?  With
CE-don't-care operation, during the times when CE is not asserted, does
it matter what happens with CLE/ALE/RE?  These signals could be driven
for another chipselect during that time.

-Scott
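
For readers following the patch, the address translation done in set_addr() and fsl_elbc_chip_init_tail() can be sketched as standalone C. This is a simplified illustration, not the driver code: `struct real_addr`, `subpage_shift_for()` and `remap()` are hypothetical names, and the constants assume the 2048 + 64 byte sub-page layout that the patch hard-codes as 2112.

```c
#include <assert.h>
#include <stdint.h>

#define FCM_BUF_MAIN	2048u	/* main-area bytes per 2K sub-page */
#define FCM_BUF_OOB	64u	/* spare bytes exposed per sub-page */

struct real_addr {
	uint32_t page;		/* page address sent to the chip */
	uint32_t column;	/* column address issued via FIR_OP_UA */
};

/* log2(writesize / 2048), e.g. 1 for a 4K chip (cf. subpage_shift). */
static unsigned int subpage_shift_for(uint32_t writesize)
{
	unsigned int shift = 0;

	/* Expect a power of two that is at least the FCM buffer size. */
	assert(writesize >= FCM_BUF_MAIN);
	assert((writesize & (writesize - 1)) == 0);

	while ((FCM_BUF_MAIN << shift) < writesize)
		shift++;
	return shift;
}

/* Map an MTD-visible 2K page number to a physical page and column. */
static struct real_addr remap(uint32_t mtd_page, unsigned int shift, int oob)
{
	uint32_t mask = (1u << shift) - 1;	/* cf. subpage_mask */
	struct real_addr r;

	/* Each sub-page occupies 2112 consecutive bytes of the real page. */
	r.column = (mtd_page & mask) * (FCM_BUF_MAIN + FCM_BUF_OOB);
	r.page = mtd_page >> shift;
	if (oob)
		r.column += FCM_BUF_MAIN;	/* skip to the sub-page's spare bytes */
	return r;
}
```

On a 4K chip, MTD page 5 maps to physical page 2 at column 2112, and its OOB starts at column 4160. Because the second sub-page's data lands where the factory layout keeps main-area bytes, this is also why the bad block marker has to be migrated, as discussed above.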

Patch

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index a212116..884a9f1 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -76,6 +76,13 @@  struct fsl_elbc_fcm_ctrl {
 	unsigned int oob;        /* Non zero if operating on OOB data     */
 	unsigned int counter;	 /* counter for the initializations	  */
 	char *oob_poi;           /* Place to write ECC after read back    */
+
+	/*
+	 * If writesize > 2048, these two members are used to calculate
+	 * the real page address and real column address.
+	 */
+	int subpage_shift;
+	int subpage_mask;
 };
 
 /* These map to the positions used by the FCM hardware ECC generator */
@@ -164,18 +171,27 @@  static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
 	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
 	struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = ctrl->nand;
 	int buf_num;
+	u32 real_ca = column;
 
-	elbc_fcm_ctrl->page = page_addr;
+	if (priv->page_size && elbc_fcm_ctrl->subpage_shift) {
+		real_ca = (page_addr & elbc_fcm_ctrl->subpage_mask) * 2112;
+		page_addr >>= elbc_fcm_ctrl->subpage_shift;
+	}
 
-	out_be32(&lbc->fbar,
-	         page_addr >> (chip->phys_erase_shift - chip->page_shift));
+	elbc_fcm_ctrl->page = page_addr;
 
 	if (priv->page_size) {
+		real_ca += (oob ? 2048 : 0);
+		elbc_fcm_ctrl->use_mdr = 1;
+		elbc_fcm_ctrl->mdr = real_ca;
+
+		out_be32(&lbc->fbar, page_addr >> 6);
 		out_be32(&lbc->fpar,
 		         ((page_addr << FPAR_LP_PI_SHIFT) & FPAR_LP_PI) |
 		         (oob ? FPAR_LP_MS : 0) | column);
 		buf_num = (page_addr & 1) << 2;
 	} else {
+		out_be32(&lbc->fbar, page_addr >> 5);
 		out_be32(&lbc->fpar,
 		         ((page_addr << FPAR_SP_PI_SHIFT) & FPAR_SP_PI) |
 		         (oob ? FPAR_SP_MS : 0) | column);
@@ -256,10 +272,11 @@  static void fsl_elbc_do_read(struct nand_chip *chip, int oob)
 	if (priv->page_size) {
 		out_be32(&lbc->fir,
 		         (FIR_OP_CM0 << FIR_OP0_SHIFT) |
-		         (FIR_OP_CA  << FIR_OP1_SHIFT) |
-		         (FIR_OP_PA  << FIR_OP2_SHIFT) |
-		         (FIR_OP_CM1 << FIR_OP3_SHIFT) |
-		         (FIR_OP_RBW << FIR_OP4_SHIFT));
+		         (FIR_OP_UA  << FIR_OP1_SHIFT) |
+		         (FIR_OP_UA  << FIR_OP2_SHIFT) |
+		         (FIR_OP_PA  << FIR_OP3_SHIFT) |
+		         (FIR_OP_CM1 << FIR_OP4_SHIFT) |
+		         (FIR_OP_RBW << FIR_OP5_SHIFT));
 
 		out_be32(&lbc->fcr, (NAND_CMD_READ0 << FCR_CMD0_SHIFT) |
 		                    (NAND_CMD_READSTART << FCR_CMD1_SHIFT));
@@ -399,12 +416,13 @@  static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 		if (priv->page_size) {
 			out_be32(&lbc->fir,
 			         (FIR_OP_CM2 << FIR_OP0_SHIFT) |
-			         (FIR_OP_CA  << FIR_OP1_SHIFT) |
-			         (FIR_OP_PA  << FIR_OP2_SHIFT) |
-			         (FIR_OP_WB  << FIR_OP3_SHIFT) |
-			         (FIR_OP_CM3 << FIR_OP4_SHIFT) |
-			         (FIR_OP_CW1 << FIR_OP5_SHIFT) |
-			         (FIR_OP_RS  << FIR_OP6_SHIFT));
+			         (FIR_OP_UA  << FIR_OP1_SHIFT) |
+			         (FIR_OP_UA  << FIR_OP2_SHIFT) |
+			         (FIR_OP_PA  << FIR_OP3_SHIFT) |
+			         (FIR_OP_WB  << FIR_OP4_SHIFT) |
+			         (FIR_OP_CM3 << FIR_OP5_SHIFT) |
+			         (FIR_OP_CW1 << FIR_OP6_SHIFT) |
+			         (FIR_OP_RS  << FIR_OP7_SHIFT));
 		} else {
 			out_be32(&lbc->fir,
 			         (FIR_OP_CM0 << FIR_OP0_SHIFT) |
@@ -453,6 +471,9 @@  static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 			full_page = 1;
 		}
 
+		if (priv->page_size)
+			elbc_fcm_ctrl->use_mdr = 1;
+
 		fsl_elbc_run_command(mtd);
 
 		/* Read back the page in order to fill in the ECC for the
@@ -654,9 +675,28 @@  static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
 	struct nand_chip *chip = mtd->priv;
 	struct fsl_elbc_mtd *priv = chip->priv;
 	struct fsl_lbc_ctrl *ctrl = priv->ctrl;
+	struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = ctrl->nand;
 	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
 	unsigned int al;
 
+	/*
+	 * Hack for supporting the flash chip whose writesize is
+	 * larger than 2K bytes.
+	 */
+	if (mtd->writesize > 2048) {
+		elbc_fcm_ctrl->subpage_shift = ffs(mtd->writesize >> 11) - 1;
+		elbc_fcm_ctrl->subpage_mask =
+			(1 << elbc_fcm_ctrl->subpage_shift) - 1;
+		/*
+		 * Rewrite mtd->writesize, mtd->oobsize, chip->page_shift
+		 * and chip->pagemask.
+		 */
+		mtd->writesize = 2048;
+		mtd->oobsize = 64;
+		chip->page_shift = ffs(mtd->writesize) - 1;
+		chip->pagemask = (chip->chipsize >> chip->page_shift) - 1;
+	}
+
 	/* calculate FMR Address Length field */
 	al = 0;
 	if (chip->pagemask & 0xffff0000)