Message ID | 20240223115545.354541-1-miquel.raynal@bootlin.com |
---|---|
Headers | show |
Series | mtd: rawnand: More continuous read fixes | expand |
Hi Miquel, On 2/23/24 12:55, Miquel Raynal wrote: > Hello, > > Following Christophe report I manually inserted many different > conditions to test the logic enablingand configuring continuous reads in > the core, trying to clarify the core and hopefully fix it for real. I am > pretty confident regarding the first patch but a bit more in the fog for > the second/third. Even though I'm pretty sure they improve the situation > there might still be corner cases I don't have in mind. I have tested the patchset and the issue is fixed, so I will send a tested-by on patch 1. But, I think that there is still an issue using FMC2 and probably others drivers. FMC2 driver has 2 modes: polling mode that will called nand_read_page_op and the sequencer mode that is defining its own HW read algorithm. The FMC2 sequencer do not support continuous read feature. I have added basic logs in nand_do_read_ops. My understanding is that the continuous read feature should be disabled at the end of this function. FMC2 polling mode: root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex [ 41.083132] nand_do_read_ops starts: cont_read.ongoing=1 [ 41.086410] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1 [ 41.094797] nand_do_read_ops ends: cont_read.ongoing=0 [ 41.098111] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1 Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex It is OK. In polling mode, con_read.ongoing is set to false before leaving nand_do_read_ops function. FMC2 sequencer: root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex [ 57.143059] nand_do_read_ops starts: cont_read.ongoing=1 [ 57.146370] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1 [ 57.154469] nand_do_read_ops ends: cont_read.ongoing=1 [ 57.158020] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1 Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex KO, con_read.ongoing is set to true before leaving nand_do_read_ops function. That means that read_oob can returned wrong data (similar issue as the initial reported issue). So, I see 2 ways to fix this issue. On framework side by adding a callback (rawnand_disable_cont_reads) that will set to false con_read.ongoing before leaving nand_do_read_ops function. Or On FMC2 driver side by disabling the continuous read feature in case of the sequencer is used like it is done in nandsim.c driver. Something like: if (check_only) { for (op_id = 0; op_id < op->ninstrs; op_id++) { instr = &op->instrs[op_id]; if (instr->type == NAND_OP_CMD_INSTR && (instr->ctx.cmd.opcode == NAND_CMD_READCACHEEND || instr->ctx.cmd.opcode == NAND_CMD_READCACHESEQ)) return -EOPNOTSUPP; } return 0; } So, should it be fixed on driver side or framework side? Regards, Christophe Kerello. > > Link: https://lore.kernel.org/linux-mtd/20240221175327.42f7076d@xps-13/T/#m399bacb10db8f58f6b1f0149a1df867ec086bb0a > > Cheers, > Miquèl > > Miquel Raynal (3): > mtd: rawnand: Fix and simplify again the continuous read derivations > mtd: rawnand: Add a helper for calculating a page index > mtd: rawnand: Ensure all continuous terms are always in sync > > drivers/mtd/nand/raw/nand_base.c | 77 +++++++++++++++++++------------- > 1 file changed, 47 insertions(+), 30 deletions(-) >
Hi Christophe, christophe.kerello@foss.st.com wrote on Mon, 26 Feb 2024 15:28:59 +0100: > Hi Miquel, > > On 2/23/24 12:55, Miquel Raynal wrote: > > Hello, > > > > Following Christophe report I manually inserted many different > > conditions to test the logic enablingand configuring continuous reads in > > the core, trying to clarify the core and hopefully fix it for real. I am > > pretty confident regarding the first patch but a bit more in the fog for > > the second/third. Even though I'm pretty sure they improve the situation > > there might still be corner cases I don't have in mind. > > I have tested the patchset and the issue is fixed, so I will send a tested-by on patch 1. Great! Thanks! For now I could not get my hands on a chip with more than one LUN. If by chance yours has two LUNs (or more), could you please run some experiments when crossing the LUN boundary? Anyhow, your Tested-by will be welcome. > But, I think that there is still an issue using FMC2 and probably others drivers. > > FMC2 driver has 2 modes: polling mode that will called nand_read_page_op and the sequencer mode that is defining its own HW read algorithm. > > The FMC2 sequencer do not support continuous read feature. > > I have added basic logs in nand_do_read_ops. My understanding is that the continuous read feature should be disabled at the end of this function. > > FMC2 polling mode: > root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex > [ 41.083132] nand_do_read_ops starts: cont_read.ongoing=1 > [ 41.086410] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1 > [ 41.094797] nand_do_read_ops ends: cont_read.ongoing=0 > [ 41.098111] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1 > Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex > > It is OK. In polling mode, con_read.ongoing is set to false before leaving nand_do_read_ops function. > > FMC2 sequencer: > root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex > [ 57.143059] nand_do_read_ops starts: cont_read.ongoing=1 > [ 57.146370] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1 > [ 57.154469] nand_do_read_ops ends: cont_read.ongoing=1 > [ 57.158020] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1 > Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex > > KO, con_read.ongoing is set to true before leaving nand_do_read_ops function. That means that read_oob can returned wrong data (similar issue as the initial reported issue). > > So, I see 2 ways to fix this issue. > > On framework side by adding a callback (rawnand_disable_cont_reads) that will set to false con_read.ongoing before leaving nand_do_read_ops function. Can you please trace what happens here? Upon what specific condition nand_do_read_ops does return with ongoing set to true? This is probably what needs fixing, but I don't feel like a driver hook is the right approach. I was expecting the ongoing boolean to always be reset at the end of nand_do_read_ops(), I probably missed a scenario. In any case what is probably needed for the sequencer to work is: +++ b/drivers/mtd/nand/raw/nand_base.c @@ -3726,6 +3726,7 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from, } } nand_deselect_target(chip); + chip->cont_read.ongoing = false; ops->retlen = ops->len - (size_t) readlen; if (oob) > > Or > > On FMC2 driver side by disabling the continuous read feature in case of the sequencer is used like it is done in nandsim.c driver. > Something like: > if (check_only) { > for (op_id = 0; op_id < op->ninstrs; op_id++) { > instr = &op->instrs[op_id]; > if (instr->type == NAND_OP_CMD_INSTR && > (instr->ctx.cmd.opcode == NAND_CMD_READCACHEEND || > instr->ctx.cmd.opcode == NAND_CMD_READCACHESEQ)) > return -EOPNOTSUPP; > } > > return 0; > } This is a second possible choice if the first one does not give interesting results. Not my favorite in your case. Thanks for your help, Miquèl
Hi Miquel, On 2/29/24 11:46, Miquel Raynal wrote: > Hi Christophe, > > christophe.kerello@foss.st.com wrote on Mon, 26 Feb 2024 15:28:59 +0100: > >> Hi Miquel, >> >> On 2/23/24 12:55, Miquel Raynal wrote: >>> Hello, >>> >>> Following Christophe report I manually inserted many different >>> conditions to test the logic enablingand configuring continuous reads in >>> the core, trying to clarify the core and hopefully fix it for real. I am >>> pretty confident regarding the first patch but a bit more in the fog for >>> the second/third. Even though I'm pretty sure they improve the situation >>> there might still be corner cases I don't have in mind. >> >> I have tested the patchset and the issue is fixed, so I will send a tested-by on patch 1. > > Great! Thanks! > > For now I could not get my hands on a chip with more than one LUN. If > by chance yours has two LUNs (or more), could you please run some > experiments when crossing the LUN boundary? Sorry, I have one LUN. > > Anyhow, your Tested-by will be welcome. > >> But, I think that there is still an issue using FMC2 and probably others drivers. >> >> FMC2 driver has 2 modes: polling mode that will called nand_read_page_op and the sequencer mode that is defining its own HW read algorithm. >> >> The FMC2 sequencer do not support continuous read feature. >> >> I have added basic logs in nand_do_read_ops. My understanding is that the continuous read feature should be disabled at the end of this function. >> >> FMC2 polling mode: >> root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex >> [ 41.083132] nand_do_read_ops starts: cont_read.ongoing=1 >> [ 41.086410] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1 >> [ 41.094797] nand_do_read_ops ends: cont_read.ongoing=0 >> [ 41.098111] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1 >> Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex >> >> It is OK. In polling mode, con_read.ongoing is set to false before leaving nand_do_read_ops function. >> >> FMC2 sequencer: >> root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x2000 /tmp/read.hex >> [ 57.143059] nand_do_read_ops starts: cont_read.ongoing=1 >> [ 57.146370] nand_do_read_ops starts: cont_read.first_page=0, cont_read.last_page=1 >> [ 57.154469] nand_do_read_ops ends: cont_read.ongoing=1 >> [ 57.158020] nand_do_read_ops ends: cont_read.first_page=0, cont_read.last_page=1 >> Copied 8192 bytes from address 0x00000000 in flash to /tmp/read.hex >> >> KO, con_read.ongoing is set to true before leaving nand_do_read_ops function. That means that read_oob can returned wrong data (similar issue as the initial reported issue). >> >> So, I see 2 ways to fix this issue. >> >> On framework side by adding a callback (rawnand_disable_cont_reads) that will set to false con_read.ongoing before leaving nand_do_read_ops function. > > Can you please trace what happens here? Upon what specific condition > nand_do_read_ops does return with ongoing set to true? This is probably > what needs fixing, but I don't feel like a driver hook is the right > approach. The continuous read feature is disabled in nand_lp_exec_cont_read_page_op function. In case of the sequencer is used, this function is never called as the HW will read the page using its own algorithm. I have done the traces in polling and in sequencer mode. sequencer: root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x3000 /tmp/read.hex [ 70.867763] rawnand_enable_cont_reads [ 70.870033] rawnand_cap_cont_reads [ 70.873463] rawnand_cap_cont_reads: ppl=262144 luns_per_target=1 [ 70.879649] nand_do_read_ops before while loop: cont_read.ongoing=1 [ 70.885705] nand_do_read_ops before while loop: cont_read.first_page=0, cont_read.last_page=2 [ 70.894335] stm32_fmc2_nfc_seq_read_page [ 70.898637] stm32_fmc2_nfc_seq_read_page [ 70.902440] stm32_fmc2_nfc_seq_read_page [ 70.906471] nand_do_read_ops after while loop: cont_read.ongoing=1 [ 70.912293] nand_do_read_ops after while loop: cont_read.first_page=0, cont_read.last_page=2 Copied 12288 bytes from address 0x00000000 in flash to /tmp/read.hex polling: root@stm32mp1:~# mtd_debug read /dev/mtd9 0 0x3000 /tmp/read.hex [ 59.712206] rawnand_enable_cont_reads [ 59.714478] rawnand_cap_cont_reads [ 59.718060] rawnand_cap_cont_reads: ppl=262144 luns_per_target=1 [ 59.723951] nand_do_read_ops before while loop: cont_read.ongoing=1 [ 59.730273] nand_do_read_ops before while loop: cont_read.first_page=0, cont_read.last_page=2 [ 59.738793] stm32_fmc2_nfc_read_page [ 59.742335] nand_read_page_op [ 59.745264] rawnand_cont_read_ongoing [ 59.748942] nand_lp_exec_cont_read_page_op [ 59.753596] stm32_fmc2_nfc_read_page [ 59.756716] nand_read_page_op [ 59.759549] rawnand_cont_read_ongoing [ 59.763184] nand_lp_exec_cont_read_page_op [ 59.768329] stm32_fmc2_nfc_read_page [ 59.770879] nand_read_page_op [ 59.773907] rawnand_cont_read_ongoing [ 59.777615] nand_lp_exec_cont_read_page_op [ 59.782217] nand_do_read_ops after while loop: cont_read.ongoing=0 [ 59.787818] nand_do_read_ops after while loop: cont_read.first_page=0, cont_read.last_page=2 Copied 12288 bytes from address 0x00000000 in flash to /tmp/read.hex > > I was expecting the ongoing boolean to always be reset at the end of > nand_do_read_ops(), I probably missed a scenario. In any case what is > probably needed for the sequencer to work is: > > +++ b/drivers/mtd/nand/raw/nand_base.c > @@ -3726,6 +3726,7 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from, > } > } > nand_deselect_target(chip); > + chip->cont_read.ongoing = false; With this patch, it is OK, the continuous read feature will be disabled as expected in all cases. Regards, Christophe Kerello. > > ops->retlen = ops->len - (size_t) readlen; > if (oob) > >> >> Or >> >> On FMC2 driver side by disabling the continuous read feature in case of the sequencer is used like it is done in nandsim.c driver. >> Something like: >> if (check_only) { >> for (op_id = 0; op_id < op->ninstrs; op_id++) { >> instr = &op->instrs[op_id]; >> if (instr->type == NAND_OP_CMD_INSTR && >> (instr->ctx.cmd.opcode == NAND_CMD_READCACHEEND || >> instr->ctx.cmd.opcode == NAND_CMD_READCACHESEQ)) >> return -EOPNOTSUPP; >> } >> >> return 0; >> } > > This is a second possible choice if the first one does not give > interesting results. Not my favorite in your case. > > Thanks for your help, > Miquèl
Hi Christophe, > > I was expecting the ongoing boolean to always be reset at the end of > > nand_do_read_ops(), I probably missed a scenario. In any case what is > > probably needed for the sequencer to work is: > > > > +++ b/drivers/mtd/nand/raw/nand_base.c > > @@ -3726,6 +3726,7 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from, > > } > > } > > nand_deselect_target(chip); > > + chip->cont_read.ongoing = false; > > With this patch, it is OK, the continuous read feature will be disabled > as expected in all cases. Thanks to your feedback, I've thought again about this case. The fix presented here will most likely work but does not sound ideal as we still enable the continuous read internal flag without using it. So I am drafting something else that should ensure we are always enabling continuous read only when we can use it. As a second step, I will reset the flag like above, but with a WARN_ON_ONCE(). This way: - If it happens that we missed another path where this is not disabled, there will be no behavioral bug (hopefully). - We will still be notified by the warning. Thanks for all your valuable feedback and testing. Let me know what you think of this new series. Thanks, Miquèl