Message ID | 1464353701-23233-14-git-send-email-boris.brezillon@free-electrons.com |
---|---|
State | Superseded |
Headers | show |
On Fri, 27 May 2016 14:54:59 +0200, Boris Brezillon said: > From: Hans de Goede <hdegoede@redhat.com> > > On some nand controllers with hw-ecc the controller code wants to know > the ecc strength and size and having these as 0, 0 is not accepted. > > Specifying these in devicetree is possible but undesirable as the nand > may be different in different production runs of the same board, so it > is better to get this info from the nand id where possible. > > This commit adds code to read the ecc strength and size from the nand > for Samsung extended-id nands. This code is based on the info for the 5th > id byte in the datasheets for the following Samsung nands: K9GAG08U0E, > K9GAG08U0F, K9GAG08X0D, K9GBG08U0A, K9GBG08U0B. These all use these bits > in the exact same way. Is this correct for all Samsung nand devices supported by this driver? (If this driver only covers those 5 specific parts, it's OK. If there's others, more research is needed....)
Hi Valdis, On Sun, 29 May 2016 20:20:35 -0400 Valdis.Kletnieks@vt.edu wrote: > On Fri, 27 May 2016 14:54:59 +0200, Boris Brezillon said: > > From: Hans de Goede <hdegoede@redhat.com> > > > > On some nand controllers with hw-ecc the controller code wants to know > > the ecc strength and size and having these as 0, 0 is not accepted. > > > > Specifying these in devicetree is possible but undesirable as the nand > > may be different in different production runs of the same board, so it > > is better to get this info from the nand id where possible. > > > > This commit adds code to read the ecc strength and size from the nand > > for Samsung extended-id nands. This code is based on the info for the 5th > > id byte in the datasheets for the following Samsung nands: K9GAG08U0E, > > K9GAG08U0F, K9GAG08X0D, K9GBG08U0A, K9GBG08U0B. These all use these bits > > in the exact same way. > > Is this correct for all Samsung nand devices supported by this driver? > > (If this driver only covers those 5 specific parts, it's OK. If there's > others, more research is needed....) Actually, that was my first reaction [1], but the more I think about it the more I realize it's a non-issue. AFAICT, there's no full-id entries for Samsung NANDs in the nand_ids table, so this either means there's no real users of Samsung MLCs or NAND controller drivers connecting to those chips don't care about the ->ecc_{step_ds,strength_ds} fields. I agree that the solution is not perfect, but I'd prefer seeing the NAND detection code iteratively improved than rejecting everything until we're 100% sure that all cases are correctly handled (which might never happen since NAND vendors introduce new NAND ID scheme if they need to). BTW, do you have Samsung datasheets describing a different NAND ID format, or is it purely hypothetical? Regards, Boris [1]http://lists.infradead.org/pipermail/linux-mtd/2015-July/060582.html
On Mon, 30 May 2016 09:44:46 +0200, Boris Brezillon said: > Hi Valdis, > Actually, that was my first reaction [1], but the more I think about it > the more I realize it's a non-issue. > AFAICT, there's no full-id entries for Samsung NANDs in the nand_ids > table, so this either means there's no real users of Samsung MLCs or > NAND controller drivers connecting to those chips don't care about the > ->ecc_{step_ds,strength_ds} fields. I'm mostly, though not totally convinced (not having looked closely at the existing code). There's still a possible issue with the distinction between: A) "driver never references the variable" and B) driver check if it's zero, and acts like it doesn't care if it is, but if it's non-zero, it goes ahead and uses it, with possible hilarity ensuing if the value is wrong. Should be pretty easy for somebody who knows the code better than I to rule out case B fairly quickly... > I agree that the solution is not perfect, but I'd prefer seeing the > NAND detection code iteratively improved than rejecting everything > until we're 100% sure that all cases are correctly handled (which might > never happen since NAND vendors introduce new NAND ID scheme if they > need to). > > BTW, do you have Samsung datasheets describing a different NAND ID > format, or is it purely hypothetical? Mostly hypothetical. I've just seen too many patches that assume "all chips from vendor XYZ do *this*" that were not at all corrrect.
On Mon, 30 May 2016 16:56:09 -0400 Valdis.Kletnieks@vt.edu wrote: > On Mon, 30 May 2016 09:44:46 +0200, Boris Brezillon said: > > Hi Valdis, > > > Actually, that was my first reaction [1], but the more I think about it > > the more I realize it's a non-issue. > > AFAICT, there's no full-id entries for Samsung NANDs in the nand_ids > > table, so this either means there's no real users of Samsung MLCs or > > NAND controller drivers connecting to those chips don't care about the > > ->ecc_{step_ds,strength_ds} fields. > > I'm mostly, though not totally convinced (not having looked closely at > the existing code). There's still a possible issue with the distinction > between: > > A) "driver never references the variable" and > > B) driver check if it's zero, and acts like it doesn't care if it is, but if > it's non-zero, it goes ahead and uses it, with possible hilarity ensuing if the > value is wrong. > > Should be pretty easy for somebody who knows the code better than I to rule > out case B fairly quickly... Ok, so I had a quick look, and only 4 drivers are actually using the ->ecc_{strength,step}_ds fields, and AFAICT, all of them are already broken with the existing implementation, even if those fields are set to 0. - the atmel driver uses a default ECC config (2bits/512bytes) if those fields are set to 0, and this config is clearly not suitable for the MLC NANDs we are talking about (note that SLC NANDs seem to all use the 4 bytes extended ID scheme, which seems to be common to all vendors). - the gpmi driver either returns an error if one of these fields are set to zero and the 'fsl,use-minimum-ecc' DT property is defined, or tries to fill the whole OOB area with ECC bytes if the property is not defined. The 2nd solution could work, if only we were sure about the encoding of the OOB size, but, as the ECC requirements field, it depends on the extended ID scheme. So, in the end, it's broken too. - the pxa and sunxi drivers are just blindly relying on those fields if the 'nand-ecc-strength' and 'nand-ecc-step-size' properties are undefined. The pxa default to 1bit/512bytes if ecc strength or ecc step appear to be set to 0, while the sunxi driver completely rejects the NAND chip. In both cases, the current implementation is broken, either because you will use an unsuitable ECC config or because your NAND chip won't be registered. So, as you can see, we're just moving from a broken state to another broken state, except the new infrastructures allows one to extend the detection logic and thus allow for correct detection of more chips. > > > I agree that the solution is not perfect, but I'd prefer seeing the > > NAND detection code iteratively improved than rejecting everything > > until we're 100% sure that all cases are correctly handled (which might > > never happen since NAND vendors introduce new NAND ID scheme if they > > need to). > > > > BTW, do you have Samsung datasheets describing a different NAND ID > > format, or is it purely hypothetical? > > Mostly hypothetical. I've just seen too many patches that assume "all chips > from vendor XYZ do *this*" that were not at all corrrect. > Yep, that's true, except I'm not promising anything here, I just say that this patch adds code to detect a range of Samsung chips, and that it can be extended to properly detect chips that do not use this format if we appear to find some (which is very likely to happen). Of course, we could decide to leave everything as is and add full-id entries to the nand_ids table each time we want to support a new chip that does not expose a valid ONFI of JEDEC parameter table. But that means adding more and more info to the nand_flash_dev structure and polluting the nand_ids table with a bunch of NAND chips that could otherwise be handled by the same detection code. And as detailed above, this solution is just as broken as mine but in a different way (in both cases, NANDs that are not already supported by the kernel will either be rejected or used ).
On Tue, 31 May 2016 00:28:24 +0200 Boris Brezillon <boris.brezillon@free-electrons.com> wrote: > On Mon, 30 May 2016 16:56:09 -0400 > Valdis.Kletnieks@vt.edu wrote: > > > On Mon, 30 May 2016 09:44:46 +0200, Boris Brezillon said: > > > Hi Valdis, > > > > > Actually, that was my first reaction [1], but the more I think about it > > > the more I realize it's a non-issue. > > > AFAICT, there's no full-id entries for Samsung NANDs in the nand_ids > > > table, so this either means there's no real users of Samsung MLCs or > > > NAND controller drivers connecting to those chips don't care about the > > > ->ecc_{step_ds,strength_ds} fields. > > > > I'm mostly, though not totally convinced (not having looked closely at > > the existing code). There's still a possible issue with the distinction > > between: > > > > A) "driver never references the variable" and > > > > B) driver check if it's zero, and acts like it doesn't care if it is, but if > > it's non-zero, it goes ahead and uses it, with possible hilarity ensuing if the > > value is wrong. > > > > Should be pretty easy for somebody who knows the code better than I to rule > > out case B fairly quickly... > > Ok, so I had a quick look, and only 4 drivers are actually using the > ->ecc_{strength,step}_ds fields, and AFAICT, all of them are already > broken with the existing implementation, even if those fields are set > to 0. > > - the atmel driver uses a default ECC config (2bits/512bytes) if > those fields are set to 0, and this config is clearly not suitable > for the MLC NANDs we are talking about (note that SLC NANDs seem to > all use the 4 bytes extended ID scheme, which seems to be common to > all vendors). > > - the gpmi driver either returns an error if one of these fields > are set to zero and the 'fsl,use-minimum-ecc' DT property is defined, > or tries to fill the whole OOB area with ECC bytes if the property is > not defined. The 2nd solution could work, if only we were sure about > the encoding of the OOB size, but, as the ECC requirements field, it > depends on the extended ID scheme. So, in the end, it's broken too. > > - the pxa and sunxi drivers are just blindly relying on those fields if > the 'nand-ecc-strength' and 'nand-ecc-step-size' properties are > undefined. The pxa default to 1bit/512bytes if ecc strength or ecc > step appear to be set to 0, while the sunxi driver completely rejects > the NAND chip. > In both cases, the current implementation is broken, either because > you will use an unsuitable ECC config or because your NAND chip won't > be registered. > > So, as you can see, we're just moving from a broken state to another > broken state, except the new infrastructures allows one to extend the > detection logic and thus allow for correct detection of more chips. > > > > > > I agree that the solution is not perfect, but I'd prefer seeing the > > > NAND detection code iteratively improved than rejecting everything > > > until we're 100% sure that all cases are correctly handled (which might > > > never happen since NAND vendors introduce new NAND ID scheme if they > > > need to). > > > > > > BTW, do you have Samsung datasheets describing a different NAND ID > > > format, or is it purely hypothetical? > > > > Mostly hypothetical. I've just seen too many patches that assume "all chips > > from vendor XYZ do *this*" that were not at all corrrect. > > > > Yep, that's true, except I'm not promising anything here, I just say > that this patch adds code to detect a range of Samsung chips, and that > it can be extended to properly detect chips that do not use this format > if we appear to find some (which is very likely to happen). > > Of course, we could decide to leave everything as is and add full-id > entries to the nand_ids table each time we want to support a new chip > that does not expose a valid ONFI of JEDEC parameter table. But that > means adding more and more info to the nand_flash_dev structure and > polluting the nand_ids table with a bunch of NAND chips that could > otherwise be handled by the same detection code. > And as detailed above, this solution is just as broken as mine but in a > different way (in both cases, NANDs that are not already supported by > the kernel will either be rejected or used ). ^ inappropriately
diff --git a/drivers/mtd/nand/nand_samsung.c b/drivers/mtd/nand/nand_samsung.c index 27ede41..39448ff 100644 --- a/drivers/mtd/nand/nand_samsung.c +++ b/drivers/mtd/nand/nand_samsung.c @@ -61,6 +61,26 @@ static int samsung_nand_decode_id(struct nand_chip *chip) extid >>= 2; mtd->erasesize = (128 * 1024) << (((extid >> 1) & 0x04) | (extid & 0x03)); + + /* Extract ECC requirements from 5th id byte*/ + extid = (chip->id.data[4] >> 4) & 0x07; + if (extid < 5) { + chip->ecc_step_ds = 512; + chip->ecc_strength_ds = 1 << extid; + } else { + chip->ecc_step_ds = 1024; + switch (extid) { + case 5: + chip->ecc_strength_ds = 24; + break; + case 6: + chip->ecc_strength_ds = 40; + break; + case 7: + chip->ecc_strength_ds = 60; + break; + } + } } if (mtd->writesize > 512)