diff mbox

[RFC,0/5] fix data+OOB writes, add ioctl

Message ID CAN8TOE-jYm3pRa3utwnqYRB=DonxboW7PfFq-CqdqG+_OzCh9A@mail.gmail.com
State RFC
Headers show

Commit Message

Brian Norris Aug. 23, 2011, 4:47 p.m. UTC
On Mon, Aug 22, 2011 at 11:48 PM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
>>>> On Wed, 2011-08-17 at 16:50 -0700, Brian Norris wrote:
>>>>> 1) Broken "noecc" writes; `nandwrite -n -o' does not work on my
>>>>>   hardware, at least, without these patches
>
> This particular issue was fixed in a patch by Peter Wippich on June 6th:
>
> http://lists.infradead.org/pipermail/linux-mtd/2011-June/036016.html

Right, I actually found this patch already and have it lined up in my
test kernel, but apparently my issue is separate.

I will try to describe my issue further on the thread for patch 1.
Jason Liu has asked about the details there.

> I could test patch 1 of this set to see if it accomplishes the same thing,
> but initially to me it seems that the issue has already been fixed, albeit
> perhaps not pushed yet?

I am now doubting that my fix will solve your problem like Peter
Wippich's patch did. I believe the problem I am having is specifically
related to my NAND controller's characteristics (how it handles ECC).

So for hardware on which Peter's fix solves your problems, I suppose I
would be interested mostly in seeing if my patch *breaks* your build.
FYI, I already see an issue for those who use both the builtin
`nand_write_page_raw()' and `nand_write_buf()' functions. It can be
fixed if you amend my patch with the following. I can resend later,
but test these for breakage if you can!

Comments

Ricard Wanderlof Aug. 24, 2011, 3:36 p.m. UTC | #1
On Tue, 23 Aug 2011, Brian Norris wrote:

> On Mon, Aug 22, 2011 at 11:48 PM, Ricard Wanderlof
> <ricard.wanderlof@axis.com> wrote:
>>>>> On Wed, 2011-08-17 at 16:50 -0700, Brian Norris wrote:
>>>>>> 1) Broken "noecc" writes; `nandwrite -n -o' does not work on my
>>>>>>   hardware, at least, without these patches
>>
>> This particular issue was fixed in a patch by Peter Wippich on June 6th:
>>
>> http://lists.infradead.org/pipermail/linux-mtd/2011-June/036016.html
>
> Right, I actually found this patch already and have it lined up in my
> test kernel, but apparently my issue is separate.
>
>> I could test patch 1 of this set to see if it accomplishes the same thing,
>> but initially to me it seems that the issue has already been fixed, albeit
>> perhaps not pushed yet?
>
> I am now doubting that my fix will solve your problem like Peter
> Wippich's patch did. I believe the problem I am having is specifically
> related to my NAND controller's characteristics (how it handles ECC).
>
> So for hardware on which Peter's fix solves your problems, I suppose I
> would be interested mostly in seeing if my patch *breaks* your build.
> FYI, I already see an issue for those who use both the builtin
> `nand_write_page_raw()' and `nand_write_buf()' functions. It can be
> fixed if you amend my patch with the following. I can resend later,
> but test these for breakage if you can!

Ok. I tried applying patch #1 of your patch set, to a tree in which 
Peter's patch has already been applying, to see what would happen. I had a 
problem in that in the mtdchar.c I have it looks like this:

      memset(chip->oob_poi, 0xff, mtd->oobsize);
      nand_fill_oob(chip, ops->oobbuf, ops->ooblen, ops);
      status = chip->ecc.write_oob(mtd, chip, page & chip->pagemask);
      memset(chip->oob_poi, 0xff, mtd->oobsize);

whereas your patch looks like it was made against a version which lacks 
the memsets. First I thought it was because I was running an older kernel 
(2.6.35), but I looked at HEAD of the linux-2.6 and mtd-2.6 trees at 
git.infradead.org, and it's the same there. So I'm not sure exactly which 
version your patch was made against. Perhaps it's obvious to someone but 
not me right now.

Nevertheless, after applying the patch, as you suspected, using 'nandwrite 
-o -n' fails, in this case the application hangs after outputting

Writing data to block 0

After applying the following patch from your email, 'nandwrite -o -n' 
initially appears to work, however upon closer inspection the oob's of all 
the written pages are all-ff, i.e. the oob's are never written.

> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1898,7 +1898,8 @@ out:
> static void nand_write_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
>                                const uint8_t *buf)
> {
> -       chip->write_buf(mtd, buf, mtd->writesize);
> +       if (buf)
> +               chip->write_buf(mtd, buf, mtd->writesize);
>        chip->write_buf(mtd, chip->oob_poi, mtd->oobsize);
> }
>

Now, this may all be nonsense as as I mentioned I'm not sure that I've had 
the right version to start with. The memsets bug me, as they would explain 
the all-ff's stuff, but I don't really feel like just experimenting.

/Ricard
Brian Norris Aug. 24, 2011, 6:01 p.m. UTC | #2
On Wed, Aug 24, 2011 at 8:36 AM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
...
> I had a problem
> in that in the mtdchar.c I have it looks like this:

Did you mean nand_base.c?

> whereas your patch looks like it was made against a version which lacks the
> memsets. First I thought it was because I was running an older kernel
> (2.6.35), but I looked at HEAD of the linux-2.6 and mtd-2.6 trees at
> git.infradead.org, and it's the same there. So I'm not sure exactly which
> version your patch was made against. Perhaps it's obvious to someone but not
> me right now.

My patches were based on l2-mtd-2.6.git, actually. David Woodhouse
rarely pulls patches into his mtd-2.6 tree, so I have moved to working
with Artem's l2-mtd-2.6 tree, where all the MTD work that's waiting
for upstream sits (some stuff's been there since May). This is not
obvious, and usually when it matters, I try to mention it in the patch
summaries.

Artem: is there any official change in policy on patch submission? I
see documentation that says to base off mtd-2.6.git, but I've been
using l2-mtd-2.6 to help you avoid merge conflicts:
http://www.linux-mtd.infradead.org/source.html

Anyway, I believe the relevant patch dependency (from l2-mtd-2.6.git) is:

   commit a8ee364bbf14861d5d0af39c4da06c30441895fb
   Author: THOMSON, Adam (Adam) <adam.thomson@alcatel-lucent.com>
   mtd: nand_base: always initialise oob_poi before writing OOB data

> Nevertheless, after applying the patch, as you suspected, using 'nandwrite
> -o -n' fails, in this case the application hangs after outputting
>
> Writing data to block 0

I guessed you would be more likely to get a segmentation fault on a
NULL pointer, but I may be wrong.

> Now, this may all be nonsense as as I mentioned I'm not sure that I've had
> the right version to start with. The memsets bug me, as they would explain
> the all-ff's stuff, but I don't really feel like just experimenting.

Right, that's fair enough. I still think this patch is not 100% ready,
anyway. Thanks for the help though!

If you're still up for trying, you can try applying Adam Thomson's
patch, then my RFC, then the inlined amendment I sent. With your
feedback, I will

I need to figure out structurally how to handle the differences
between hardware that uses the standard functions (and works fine
without my patch) and hardware like mine that needs more information
passed to it. Perhaps there needs to be a replaceable
`chip->ecc.write_oob_raw' function? Or at least some modifications to
the other "replaceable" raw functions.

Brian
Artem Bityutskiy Aug. 25, 2011, 7:21 a.m. UTC | #3
On Wed, 2011-08-24 at 11:01 -0700, Brian Norris wrote:
> On Wed, Aug 24, 2011 at 8:36 AM, Ricard Wanderlof
> <ricard.wanderlof@axis.com> wrote:
> ...
> > I had a problem
> > in that in the mtdchar.c I have it looks like this:
> 
> Did you mean nand_base.c?
> 
> > whereas your patch looks like it was made against a version which lacks the
> > memsets. First I thought it was because I was running an older kernel
> > (2.6.35), but I looked at HEAD of the linux-2.6 and mtd-2.6 trees at
> > git.infradead.org, and it's the same there. So I'm not sure exactly which
> > version your patch was made against. Perhaps it's obvious to someone but not
> > me right now.
> 
> My patches were based on l2-mtd-2.6.git, actually. David Woodhouse
> rarely pulls patches into his mtd-2.6 tree, so I have moved to working
> with Artem's l2-mtd-2.6 tree, where all the MTD work that's waiting
> for upstream sits (some stuff's been there since May). This is not
> obvious, and usually when it matters, I try to mention it in the patch
> summaries.

David's tree is desperately out-of-date now, I did not talk to him
lately, he is not very reachable now.

There are patches from May because David did merge anything this merge
window, probably he had some issues/etc, let's hope he'll merge
everything next merge window. May be he wanted to ask me to merge it,
but I have been having vacation and was not available at the IRC chat.

> Artem: is there any official change in policy on patch submission? I
> see documentation that says to base off mtd-2.6.git, but I've been
> using l2-mtd-2.6 to help you avoid merge conflicts:
> http://www.linux-mtd.infradead.org/source.html

There is no official policy, this all works because enthusiasts who just
like MTD stuff and keep it alive. When I noticed that dwmw2 does not
give MTD ML enough attention, I just started my l2 tree to help him - it
was faster/easier for him to look with reviewed patches in my tree
rather than look through whole MTD ML, find out which acks/reviewed-by
to add and where, which patch versions are out of date, etc. 

At this point I think, that you have to use the l2 tree, because David's
tree is very out-of-date. Also, beware that the l2 tree is currently in
linux-next.
Ricard Wanderlof Aug. 25, 2011, 9:33 a.m. UTC | #4
On Wed, 24 Aug 2011, Brian Norris wrote:

> On Wed, Aug 24, 2011 at 8:36 AM, Ricard Wanderlof
> <ricard.wanderlof@axis.com> wrote:
> ...
>> I had a problem
>> in that in the mtdchar.c I have it looks like this:
>
> Did you mean nand_base.c?

Yes, sorry about that.

>> whereas your patch looks like it was made against a version which lacks the
>> memsets. First I thought it was because I was running an older kernel
>> ...
> My patches were based on l2-mtd-2.6.git, actually. David Woodhouse
> rarely pulls patches into his mtd-2.6 tree, so I have moved to working
> with Artem's l2-mtd-2.6 tree, where all the MTD work that's waiting
> for upstream sits (some stuff's been there since May). This is not
> obvious, and usually when it matters, I try to mention it in the patch
> summaries.

I guess I should have thought of it but never thought to look there.

> Anyway, I believe the relevant patch dependency (from l2-mtd-2.6.git) is:
>
>   commit a8ee364bbf14861d5d0af39c4da06c30441895fb
>   Author: THOMSON, Adam (Adam) <adam.thomson@alcatel-lucent.com>
>   mtd: nand_base: always initialise oob_poi before writing OOB data

That seems right.

>> Nevertheless, after applying the patch, as you suspected, using 'nandwrite
>> -o -n' fails, in this case the application hangs after outputting
>>
>> Writing data to block 0
>
> I guessed you would be more likely to get a segmentation fault on a
> NULL pointer, but I may be wrong.

Something seems to hang at a lower level, because the system becomes 
unresponsive at this point (I can telnet in, but trying to access the 
flash with ls for instance causes the shell to hang).

> If you're still up for trying, you can try applying Adam Thomson's
> patch, then my RFC, then the inlined amendment I sent. With your
> feedback, I will

(something seems missing here ... nevertheless)

I applied the patches as you mentioned, which brought success. Dumping a 
partition with nanddump, then writing it back with nandwrite -o -n results 
in the correct data being written both to the main and oob areas.

(Without your inlinend patch, the nandwrite application still hangs after 
Writing to data block 0).

So the conclusion would be that this combination of patches does not break 
Peter Wippich's patch.

/Ricard
Brian Norris Aug. 25, 2011, 5:54 p.m. UTC | #5
Hi Ricard,

On Thu, Aug 25, 2011 at 2:33 AM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
> On Wed, 24 Aug 2011, Brian Norris wrote:
>> On Wed, Aug 24, 2011 at 8:36 AM, Ricard Wanderlof <ricard.wanderlof@axis.com> wrote:
>> Anyway, I believe the relevant patch dependency (from l2-mtd-2.6.git) is:
>>
>>  commit a8ee364bbf14861d5d0af39c4da06c30441895fb
>>  Author: THOMSON, Adam (Adam) <adam.thomson@alcatel-lucent.com>
>>  mtd: nand_base: always initialise oob_poi before writing OOB data
>
> That seems right.

BTW, that patch seems a bit broken to me; I sent a fixup for it:
http://lists.infradead.org/pipermail/linux-mtd/2011-August/037698.html

>> I guessed you would be more likely to get a segmentation fault on a
>> NULL pointer, but I may be wrong.
>
> Something seems to hang at a lower level, because the system becomes
> unresponsive at this point (I can telnet in, but trying to access the flash
> with ls for instance causes the shell to hang).

I think that I had a little bit of the wrong approach. I was doing
some ill advised hacking to the existing write functions. I spun off
my first two patches here as a different series (I believe I CC'd
you); I used a different approach that should make as little impact on
currently working hardware as possible:
http://lists.infradead.org/pipermail/linux-mtd/2011-August/037695.html

> (something seems missing here ...

Acknowledged :)

> I applied the patches as you mentioned, which brought success. Dumping a
> partition with nanddump, then writing it back with nandwrite -o -n results
> in the correct data being written both to the main and oob areas.

Great!

> (Without your inlinend patch, the nandwrite application still hangs after
> Writing to data block 0).

Not quite sure what the hanging is all about - may be related to my
improper use of CMD_SEQIN, then a nand wait function being called
later? - but that's why I've changed my approach.

> So the conclusion would be that this combination of patches does not break
> Peter Wippich's patch.

Thanks a lot for the testing. I think that my first approach still may
easily have unintended consequences, though. I welcome any testing on
my new patch series, and any more systems with broken "noecc" should
be handled through that thread.

Brian
Ricard Wanderlof Aug. 26, 2011, 12:41 p.m. UTC | #6
On Thu, 25 Aug 2011, Brian Norris wrote:

>>>  Author: THOMSON, Adam (Adam) <adam.thomson@alcatel-lucent.com>
>>>  mtd: nand_base: always initialise oob_poi before writing OOB data
>>
>> That seems right.
>
> BTW, that patch seems a bit broken to me; I sent a fixup for it: 
> http://lists.infradead.org/pipermail/linux-mtd/2011-August/037698.html

Thanks, I did see it but it failed to register. I've applied it to my tree 
now.

> I think that I had a little bit of the wrong approach. I was doing
> some ill advised hacking to the existing write functions. I spun off
> my first two patches here as a different series (I believe I CC'd
> you); I used a different approach that should make as little impact on
> currently working hardware as possible:
> http://lists.infradead.org/pipermail/linux-mtd/2011-August/037695.html
>
> Thanks a lot for the testing. I think that my first approach still may 
> easily have unintended consequences, though. I welcome any testing on my 
> new patch series, and any more systems with broken "noecc" should be 
> handled through that thread.

I applied the two patches in the above mentioned patch series to my tree 
(after applying Peter's patch, Adam's, and your amendment to it). 
Conclusion: nandwrite -o -n still works for me, i.e. the patch doesn't 
break it.

One little caveat, the tree I'm using is 2.6.35 based. The patches don't 
apply cleanly, but it just takes a little effort. So I think the results 
are relevant, just wanted to mention it.

/Ricard
diff mbox

Patch

--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1898,7 +1898,8 @@  out:
 static void nand_write_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
                                const uint8_t *buf)
 {
-       chip->write_buf(mtd, buf, mtd->writesize);
+       if (buf)
+               chip->write_buf(mtd, buf, mtd->writesize);
        chip->write_buf(mtd, chip->oob_poi, mtd->oobsize);
 }