mbox series

[0/3] mtd: rawnand: More continuous read fixes

Message ID 20240223115545.354541-1-miquel.raynal@bootlin.com
Headers show
Series mtd: rawnand: More continuous read fixes | expand

Message

Miquel Raynal Feb. 23, 2024, 11:55 a.m. UTC
Hello,

Following Christophe report I manually inserted many different
conditions to test the logic enablingand configuring continuous reads in
the core, trying to clarify the core and hopefully fix it for real. I am
pretty confident regarding the first patch but a bit more in the fog for
the second/third. Even though I'm pretty sure they improve the situation
there might still be corner cases I don't have in mind.

Link: https://lore.kernel.org/linux-mtd/20240221175327.42f7076d@xps-13/T/#m399bacb10db8f58f6b1f0149a1df867ec086bb0a

Cheers,
Miquèl

Miquel Raynal (3):
  mtd: rawnand: Fix and simplify again the continuous read derivations
  mtd: rawnand: Add a helper for calculating a page index
  mtd: rawnand: Ensure all continuous terms are always in sync

 drivers/mtd/nand/raw/nand_base.c | 77 +++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 30 deletions(-)

Comments

Christophe Kerello Feb. 26, 2024, 2:28 p.m. UTC | #1
Hi Miquel,

On 2/23/24 12:55, Miquel Raynal wrote:
> Hello,
> 
> Following Christophe report I manually inserted many different
> conditions to test the logic enablingand configuring continuous reads in
> the core, trying to clarify the core and hopefully fix it for real. I am
> pretty confident regarding the first patch but a bit more in the fog for
> the second/third. Even though I'm pretty sure they improve the situation
> there might still be corner cases I don't have in mind.

I have tested the patchset and the issue is fixed, so I will send a 
tested-by on patch 1.

But, I think that there is still an issue using FMC2 and probably others 
drivers.

FMC2 driver has 2 modes: polling mode that will called 
nand_read_page_op and the sequencer mode that is defining its own HW 
read algorithm.

The FMC2 sequencer do not support continuous read feature.

I have added basic logs in nand_do_read_ops. My understanding is that 
the continuous read feature should be disabled at the end of this function.

FMC2 polling mode:
root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex
[   41.083132] nand_do_read_ops starts: cont_read.ongoing=1
[   41.086410] nand_do_read_ops starts: cont_read.first_page=0, 
cont_read.last_page=1
[   41.094797] nand_do_read_ops ends: cont_read.ongoing=0
[   41.098111] nand_do_read_ops ends: cont_read.first_page=0, 
cont_read.last_page=1
Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex

It is OK. In polling mode, con_read.ongoing is set to false before 
leaving nand_do_read_ops function.

FMC2 sequencer:
root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex
[   57.143059] nand_do_read_ops starts: cont_read.ongoing=1
[   57.146370] nand_do_read_ops starts: cont_read.first_page=0, 
cont_read.last_page=1
[   57.154469] nand_do_read_ops ends: cont_read.ongoing=1
[   57.158020] nand_do_read_ops ends: cont_read.first_page=0, 
cont_read.last_page=1
Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex

KO, con_read.ongoing is set to true before leaving nand_do_read_ops 
function. That means that read_oob can returned wrong data (similar 
issue as the initial reported issue).

So, I see 2 ways to fix this issue.

On framework side by adding a callback (rawnand_disable_cont_reads) that 
will set to false con_read.ongoing before leaving nand_do_read_ops function.

Or

On FMC2 driver side by disabling the continuous read feature in case of 
the sequencer is used like it is done in nandsim.c driver.
Something like:
if (check_only) {
		for (op_id = 0; op_id < op->ninstrs; op_id++) {
			instr = &op->instrs[op_id];
			if (instr->type == NAND_OP_CMD_INSTR &&
			    (instr->ctx.cmd.opcode == NAND_CMD_READCACHEEND ||
			     instr->ctx.cmd.opcode == NAND_CMD_READCACHESEQ))
				return -EOPNOTSUPP;
		}

		return 0;
	}

So, should it be fixed on driver side or framework side?

Regards,
Christophe Kerello.

> 
> Link: https://lore.kernel.org/linux-mtd/20240221175327.42f7076d@xps-13/T/#m399bacb10db8f58f6b1f0149a1df867ec086bb0a
> 
> Cheers,
> Miquèl
> 
> Miquel Raynal (3):
>    mtd: rawnand: Fix and simplify again the continuous read derivations
>    mtd: rawnand: Add a helper for calculating a page index
>    mtd: rawnand: Ensure all continuous terms are always in sync
> 
>   drivers/mtd/nand/raw/nand_base.c | 77 +++++++++++++++++++-------------
>   1 file changed, 47 insertions(+), 30 deletions(-)
>
Miquel Raynal Feb. 29, 2024, 10:46 a.m. UTC | #2
Hi Christophe,

christophe.kerello@foss.st.com wrote on Mon, 26 Feb 2024 15:28:59 +0100:

> Hi Miquel,
> 
> On 2/23/24 12:55, Miquel Raynal wrote:
> > Hello,
> > 
> > Following Christophe report I manually inserted many different
> > conditions to test the logic enablingand configuring continuous reads in
> > the core, trying to clarify the core and hopefully fix it for real. I am
> > pretty confident regarding the first patch but a bit more in the fog for
> > the second/third. Even though I'm pretty sure they improve the situation
> > there might still be corner cases I don't have in mind.  
> 
> I have tested the patchset and the issue is fixed, so I will send a tested-by on patch 1.

Great! Thanks!

For now I could not get my hands on a chip with more than one LUN. If
by chance yours has two LUNs (or more), could you please run some
experiments when crossing the LUN boundary?

Anyhow, your Tested-by will be welcome.

> But, I think that there is still an issue using FMC2 and probably others drivers.
> 
> FMC2 driver has 2 modes: polling mode that will called nand_read_page_op and the sequencer mode that is defining its own HW read algorithm.
> 
> The FMC2 sequencer do not support continuous read feature.
> 
> I have added basic logs in nand_do_read_ops. My understanding is that the continuous read feature should be disabled at the end of this function.
> 
> FMC2 polling mode:
> root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex
> [   41.083132] nand_do_read_ops starts: cont_read.ongoing=1
> [   41.086410] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1
> [   41.094797] nand_do_read_ops ends: cont_read.ongoing=0
> [   41.098111] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1
> Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex
> 
> It is OK. In polling mode, con_read.ongoing is set to false before leaving nand_do_read_ops function.
> 
> FMC2 sequencer:
> root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex
> [   57.143059] nand_do_read_ops starts: cont_read.ongoing=1
> [   57.146370] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1
> [   57.154469] nand_do_read_ops ends: cont_read.ongoing=1
> [   57.158020] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1
> Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex
> 
> KO, con_read.ongoing is set to true before leaving nand_do_read_ops function. That means that read_oob can returned wrong data (similar issue as the initial reported issue).
> 
> So, I see 2 ways to fix this issue.
> 
> On framework side by adding a callback (rawnand_disable_cont_reads) that will set to false con_read.ongoing before leaving nand_do_read_ops function.

Can you please trace what happens here? Upon what specific condition
nand_do_read_ops does return with ongoing set to true? This is probably
what needs fixing, but I don't feel like a driver hook is the right
approach.

I was expecting the ongoing boolean to always be reset at the end of
nand_do_read_ops(), I probably missed a scenario. In any case what is
probably needed for the sequencer to work is:

+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -3726,6 +3726,7 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
                }
        }
        nand_deselect_target(chip);
+       chip->cont_read.ongoing = false;
 
        ops->retlen = ops->len - (size_t) readlen;
        if (oob)

> 
> Or
> 
> On FMC2 driver side by disabling the continuous read feature in case of the sequencer is used like it is done in nandsim.c driver.
> Something like:
> if (check_only) {
> 		for (op_id = 0; op_id < op->ninstrs; op_id++) {
> 			instr = &op->instrs[op_id];
> 			if (instr->type == NAND_OP_CMD_INSTR &&
> 			    (instr->ctx.cmd.opcode == NAND_CMD_READCACHEEND ||
> 			     instr->ctx.cmd.opcode == NAND_CMD_READCACHESEQ))
> 				return -EOPNOTSUPP;
> 		}
> 
> 		return 0;
> 	}

This is a second possible choice if the first one does not give
interesting results. Not my favorite in your case.

Thanks for your help,
Miquèl
Christophe Kerello March 4, 2024, 10:44 a.m. UTC | #3
Hi Miquel,

On 2/29/24 11:46, Miquel Raynal wrote:
> Hi Christophe,
> 
> christophe.kerello@foss.st.com wrote on Mon, 26 Feb 2024 15:28:59 +0100:
> 
>> Hi Miquel,
>>
>> On 2/23/24 12:55, Miquel Raynal wrote:
>>> Hello,
>>>
>>> Following Christophe report I manually inserted many different
>>> conditions to test the logic enablingand configuring continuous reads in
>>> the core, trying to clarify the core and hopefully fix it for real. I am
>>> pretty confident regarding the first patch but a bit more in the fog for
>>> the second/third. Even though I'm pretty sure they improve the situation
>>> there might still be corner cases I don't have in mind.
>>
>> I have tested the patchset and the issue is fixed, so I will send a tested-by on patch 1.
> 
> Great! Thanks!
> 
> For now I could not get my hands on a chip with more than one LUN. If
> by chance yours has two LUNs (or more), could you please run some
> experiments when crossing the LUN boundary?

Sorry, I have one LUN.

> 
> Anyhow, your Tested-by will be welcome.
> 
>> But, I think that there is still an issue using FMC2 and probably others drivers.
>>
>> FMC2 driver has 2 modes: polling mode that will called nand_read_page_op and the sequencer mode that is defining its own HW read algorithm.
>>
>> The FMC2 sequencer do not support continuous read feature.
>>
>> I have added basic logs in nand_do_read_ops. My understanding is that the continuous read feature should be disabled at the end of this function.
>>
>> FMC2 polling mode:
>> root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex
>> [   41.083132] nand_do_read_ops starts: cont_read.ongoing=1
>> [   41.086410] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1
>> [   41.094797] nand_do_read_ops ends: cont_read.ongoing=0
>> [   41.098111] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1
>> Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex
>>
>> It is OK. In polling mode, con_read.ongoing is set to false before leaving nand_do_read_ops function.
>>
>> FMC2 sequencer:
>> root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex
>> [   57.143059] nand_do_read_ops starts: cont_read.ongoing=1
>> [   57.146370] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1
>> [   57.154469] nand_do_read_ops ends: cont_read.ongoing=1
>> [   57.158020] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1
>> Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex
>>
>> KO, con_read.ongoing is set to true before leaving nand_do_read_ops function. That means that read_oob can returned wrong data (similar issue as the initial reported issue).
>>
>> So, I see 2 ways to fix this issue.
>>
>> On framework side by adding a callback (rawnand_disable_cont_reads) that will set to false con_read.ongoing before leaving nand_do_read_ops function.
> 
> Can you please trace what happens here? Upon what specific condition
> nand_do_read_ops does return with ongoing set to true? This is probably
> what needs fixing, but I don't feel like a driver hook is the right
> approach.

The continuous read feature is disabled in
nand_lp_exec_cont_read_page_op function. In case of the sequencer is
used, this function is never called as the HW will read the page using
its own algorithm.

I have done the traces in polling and in sequencer mode.

sequencer:
root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x3000 /tmp/read.hex
[   70.867763] rawnand_enable_cont_reads
[   70.870033] rawnand_cap_cont_reads
[   70.873463] rawnand_cap_cont_reads: ppl=262144 luns_per_target=1
[   70.879649] nand_do_read_ops before while loop: cont_read.ongoing=1
[   70.885705] nand_do_read_ops before while loop: 
cont_read.first_page=0, cont_read.last_page=2
[   70.894335] stm32_fmc2_nfc_seq_read_page
[   70.898637] stm32_fmc2_nfc_seq_read_page
[   70.902440] stm32_fmc2_nfc_seq_read_page
[   70.906471] nand_do_read_ops after while loop: cont_read.ongoing=1
[   70.912293] nand_do_read_ops after while loop: 
cont_read.first_page=0, cont_read.last_page=2
Copied 12288 bytes from address 0x00000000 in flash to /tmp/read.hex

polling:
root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x3000 /tmp/read.hex
[   59.712206] rawnand_enable_cont_reads
[   59.714478] rawnand_cap_cont_reads
[   59.718060] rawnand_cap_cont_reads: ppl=262144 luns_per_target=1
[   59.723951] nand_do_read_ops before while loop: cont_read.ongoing=1
[   59.730273] nand_do_read_ops before while loop: 
cont_read.first_page=0, cont_read.last_page=2
[   59.738793] stm32_fmc2_nfc_read_page
[   59.742335] nand_read_page_op
[   59.745264] rawnand_cont_read_ongoing
[   59.748942] nand_lp_exec_cont_read_page_op
[   59.753596] stm32_fmc2_nfc_read_page
[   59.756716] nand_read_page_op
[   59.759549] rawnand_cont_read_ongoing
[   59.763184] nand_lp_exec_cont_read_page_op
[   59.768329] stm32_fmc2_nfc_read_page
[   59.770879] nand_read_page_op
[   59.773907] rawnand_cont_read_ongoing
[   59.777615] nand_lp_exec_cont_read_page_op
[   59.782217] nand_do_read_ops after while loop: cont_read.ongoing=0
[   59.787818] nand_do_read_ops after while loop: 
cont_read.first_page=0, cont_read.last_page=2
Copied 12288 bytes from address 0x00000000 in flash to /tmp/read.hex

> 
> I was expecting the ongoing boolean to always be reset at the end of
> nand_do_read_ops(), I probably missed a scenario. In any case what is
> probably needed for the sequencer to work is:
> 
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -3726,6 +3726,7 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
>                  }
>          }
>          nand_deselect_target(chip);
> +       chip->cont_read.ongoing = false;

With this patch, it is OK, the continuous read feature will be disabled
as expected in all cases.

Regards,
Christophe Kerello.

>   
>          ops->retlen = ops->len - (size_t) readlen;
>          if (oob)
> 
>>
>> Or
>>
>> On FMC2 driver side by disabling the continuous read feature in case of the sequencer is used like it is done in nandsim.c driver.
>> Something like:
>> if (check_only) {
>> 		for (op_id = 0; op_id < op->ninstrs; op_id++) {
>> 			instr = &op->instrs[op_id];
>> 			if (instr->type == NAND_OP_CMD_INSTR &&
>> 			    (instr->ctx.cmd.opcode == NAND_CMD_READCACHEEND ||
>> 			     instr->ctx.cmd.opcode == NAND_CMD_READCACHESEQ))
>> 				return -EOPNOTSUPP;
>> 		}
>>
>> 		return 0;
>> 	}
> 
> This is a second possible choice if the first one does not give
> interesting results. Not my favorite in your case.
> 
> Thanks for your help,
> Miquèl
Miquel Raynal March 7, 2024, 11:52 a.m. UTC | #4
Hi Christophe,

> > I was expecting the ongoing boolean to always be reset at the end of
> > nand_do_read_ops(), I probably missed a scenario. In any case what is
> > probably needed for the sequencer to work is:
> > 
> > +++ b/drivers/mtd/nand/raw/nand_base.c
> > @@ -3726,6 +3726,7 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
> >                  }
> >          }
> >          nand_deselect_target(chip);
> > +       chip->cont_read.ongoing = false;  
> 
> With this patch, it is OK, the continuous read feature will be disabled
> as expected in all cases.

Thanks to your feedback, I've thought again about this case. The fix
presented here will most likely work but does not sound ideal as we
still enable the continuous read internal flag without using it. So I
am drafting something else that should ensure we are always enabling
continuous read only when we can use it.

As a second step, I will reset the flag like above, but with a
WARN_ON_ONCE(). This way:
- If it happens that we missed another path where this is not disabled,
  there will be no behavioral bug (hopefully).
- We will still be notified by the warning.

Thanks for all your valuable feedback and testing. Let me know what
you think of this new series.

Thanks,
Miquèl