Message ID | 20250512084033.69718-1-csokas.bence@prolan.hu |
---|---|
State | New |
Headers | show |
Series | [v3] mtd: Verify written data in paranoid mode | expand |
On 12/05/2025 at 10:40:32 +02, Bence Csókás <csokas.bence@prolan.hu> wrote: > Add MTD_PARANOID config option for verifying all written data to prevent > silent bit errors being undetected, at the cost of some bandwidth overhead. > > Signed-off-by: Bence Csókás <csokas.bence@prolan.hu> > --- > drivers/mtd/Kconfig | 14 ++++++++++++ > drivers/mtd/mtdcore.c | 51 +++++++++++++++++++++++++++++++++++++++++-- > 2 files changed, 63 insertions(+), 2 deletions(-) > > diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig > index 796a2eccbef0..e75f4a57df6a 100644 > --- a/drivers/mtd/Kconfig > +++ b/drivers/mtd/Kconfig > @@ -206,6 +206,20 @@ config MTD_PARTITIONED_MASTER > the parent of the partition device be the master device, rather than > what lies behind the master. > > +config MTD_PARANOID > + bool "Read back written data (paranoid mode)" > + help > + This option makes the MTD core read back all data on a write and > + report an error if it doesn't match the written data. This can > + safeguard against silent bit errors resulting from a faulty Flash, > + controller oddities, bus noise etc. > + > + It is up to the layer above MTD (e.g. the filesystem) to handle > + this condition, for example by going read-only to prevent further > + data corruption, or to mark a certain region of Flash as bad. > + > + If you are unsure, select 'n'. > + > source "drivers/mtd/chips/Kconfig" > > source "drivers/mtd/maps/Kconfig" > diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c > index 5ba9a741f5ac..3f9874cd4126 100644 > --- a/drivers/mtd/mtdcore.c > +++ b/drivers/mtd/mtdcore.c > @@ -1745,8 +1745,8 @@ int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops) > } > EXPORT_SYMBOL_GPL(mtd_read_oob); > > -int mtd_write_oob(struct mtd_info *mtd, loff_t to, > - struct mtd_oob_ops *ops) > +static int _mtd_write_oob(struct mtd_info *mtd, loff_t to, > + struct mtd_oob_ops *ops) I don't like these '_' prefixes, they do not indicate much about the content of the function. I don't think we need an extra function for that, just include the check in mtd_write_oob? > { > struct mtd_info *master = mtd_get_master(mtd); > int ret; > @@ -1771,6 +1771,53 @@ int mtd_write_oob(struct mtd_info *mtd, loff_t to, > > return mtd_write_oob_std(mtd, to, ops); > } > + > +static int _mtd_verify(struct mtd_info *mtd, loff_t to, size_t len, const u8 *buf) > +{ > + struct device *dev = &mtd->dev; > + u_char *verify_buf; > + size_t r_retlen; > + int ret; > + > + verify_buf = devm_kmalloc(dev, len, GFP_KERNEL); > + if (!verify_buf) > + return -ENOMEM; > + > + ret = mtd_read(mtd, to, len, &r_retlen, verify_buf); > + if (ret < 0) > + goto err; > + > + if (len != r_retlen) { > + /* We shouldn't see short reads */ > + dev_err(dev, "Verify failed, written %zd but only read %zd", > + len, r_retlen); > + ret = -EIO; > + goto err; > + } > + > + if (memcmp(verify_buf, buf, len)) { > + dev_err(dev, "Verify failed, compare mismatch!"); > + ret = -EIO; > + } > + > +err: > + devm_kfree(dev, verify_buf); > + return ret; > +} > + > +int mtd_write_oob(struct mtd_info *mtd, loff_t to, > + struct mtd_oob_ops *ops) > +{ > + int ret = _mtd_write_oob(mtd, to, ops); > + > + if (ret < 0) > + return ret; > + > + if (IS_ENABLED(CONFIG_MTD_PARANOID)) > + ret = _mtd_verify(mtd, to, ops->retlen, ops->datbuf); Why _mtd_verify and not mtd_verify?
----- Ursprüngliche Mail ----- > Von: "Miquel Raynal" <miquel.raynal@bootlin.com> >> +config MTD_PARANOID >> + bool "Read back written data (paranoid mode)" >> + help >> + This option makes the MTD core read back all data on a write and >> + report an error if it doesn't match the written data. This can >> + safeguard against silent bit errors resulting from a faulty Flash, >> + controller oddities, bus noise etc. >> + >> + It is up to the layer above MTD (e.g. the filesystem) to handle >> + this condition, for example by going read-only to prevent further >> + data corruption, or to mark a certain region of Flash as bad. >> + >> + If you are unsure, select 'n'. I still have a hard time seeing the benefit of this. To me it looks like you're working around broken hardware. Thanks, //richard
Hi, On 2025. 05. 12. 11:14, Miquel Raynal wrote: > Why _mtd_verify and not mtd_verify? Hm, no particular reason, I was thinking that since it's an "internal" function, like `_mtd_write_oob()`, it would get the underscore. But now that I think about it, there are many static functions already without this underscore. Should I change it? On 2025. 05. 12. 11:45, Richard Weinberger wrote: > I still have a hard time seeing the benefit of this. > To me it looks like you're working around broken hardware. > > Thanks, > //richard Well, yes, in our case. But the point is, we have a strict requirement for data integrity, which is not unique to us I believe. I would think there are other industrial control applications like ours, which dictate a high data integrity. Bence
----- Ursprüngliche Mail ----- > Von: "Csókás Bence" <csokas.bence@prolan.hu> > Well, yes, in our case. But the point is, we have a strict requirement > for data integrity, which is not unique to us I believe. I would think > there are other industrial control applications like ours, which dictate > a high data integrity. In your last patch set you said your hardware has an issue that every now and that data is not properly written. Now you talk about data integrity requirements. I'm confused. My point is that at some level we need to trust hardware, if your flash memory is so broken that you can't rely on the write path you're in deep trouble. What is the next step, reading it back every five seconds to make sure it is still there? (just kidding). We do have plenty of tests to verify this for *testing* to find broken drivers or hardware. e.g. mtd/ubi tests, UBI verify at runtime. Thanks, //richard
Hi, On 2025. 05. 12. 14:47, Richard Weinberger wrote: > ----- Ursprüngliche Mail ----- >> Von: "Csókás Bence" <csokas.bence@prolan.hu> >> Well, yes, in our case. But the point is, we have a strict requirement >> for data integrity, which is not unique to us I believe. I would think >> there are other industrial control applications like ours, which dictate >> a high data integrity. > > In your last patch set you said your hardware has an issue that every > now and that data is not properly written. > Now you talk about data integrity requirements. I'm confused. The two problems are not too dissimilar: in one case you have a random, and _very_ low chance of data corruption, e.g. because of noise, aging hardware, power supply ripple etc. But you still need to make sure that the written data is absolutely correct; or if it is not, the system will immediately enter some fail-safe mode. This is the problem we want to solve, for everybody using Linux in high reliability environments. The problem we _have_ though happens to be a bit different: here we are blursed with a system that corrupts data at a noticeable probability. But the model is the same: a stochastic process introducing bit errors on write. But I sincerely hope no one else has this problem, and this is *not* the primary aim of this patch; it just happens to solve our issue as well. But I intend it to be useful for the larger Linux community, thus the primary goal is to solve the first issue. > My point is that at some level we need to trust hardware, > if your flash memory is so broken that you can't rely on the write > path you're in deep trouble. Sure, but at the moment, we're not giving any return path for hardware. We just shovel megabytes at it, and don't even ask back. In critical systems, this will not fly. > What is the next step, reading it back every five seconds to make > sure it is still there? (just kidding). (( Well, you're kidding now, but this is what we will have to do in another project, a rail interlocking system. Though obviously not in the kernel. But I digress... )) Bence
Hello, On 12/05/2025 at 15:13:20 +02, Csókás Bence <csokas.bence@prolan.hu> wrote: > Hi, > > On 2025. 05. 12. 14:47, Richard Weinberger wrote: >> ----- Ursprüngliche Mail ----- >>> Von: "Csókás Bence" <csokas.bence@prolan.hu> >>> Well, yes, in our case. But the point is, we have a strict requirement >>> for data integrity, which is not unique to us I believe. I would think >>> there are other industrial control applications like ours, which dictate >>> a high data integrity. >> In your last patch set you said your hardware has an issue that every >> now and that data is not properly written. >> Now you talk about data integrity requirements. I'm confused. > > The two problems are not too dissimilar: in one case you have a random, > and _very_ low chance of data corruption, e.g. because of noise, aging > hardware, power supply ripple etc. But you still need to make sure that > the written data is absolutely correct; or if it is not, the system will > immediately enter some fail-safe mode. This is the problem we want to > solve, for everybody using Linux in high reliability environments. > > The problem we _have_ though happens to be a bit different: here we are > blursed with a system that corrupts data at a noticeable > probability. But the model is the same: a stochastic process introducing > bit errors on write. But I sincerely hope no one else has this problem, > and this is *not* the primary aim of this patch; it just happens to > solve our issue as well. But I intend it to be useful for the larger > Linux community, thus the primary goal is to solve the first issue. I don't have a strong opinion there but I don't dislike this idea because it might also help troubleshooting errors sometimes. It is very hard to understand issues which happen to be discovered way after they have been generated (typically during a read, way later than a "faulty" write). Having this paranoid option would give a more synchronous approach which is easier to work with sometimes. Cheers, Miquèl
----- Ursprüngliche Mail ----- > Von: "Miquel Raynal" <miquel.raynal@bootlin.com> >> The problem we _have_ though happens to be a bit different: here we are >> blursed with a system that corrupts data at a noticeable >> probability. But the model is the same: a stochastic process introducing >> bit errors on write. But I sincerely hope no one else has this problem, >> and this is *not* the primary aim of this patch; it just happens to >> solve our issue as well. But I intend it to be useful for the larger >> Linux community, thus the primary goal is to solve the first issue. > > I don't have a strong opinion there but I don't dislike this idea > because it might also help troubleshooting errors sometimes. It is very > hard to understand issues which happen to be discovered way after they > have been generated (typically during a read, way later than a "faulty" > write). Having this paranoid option would give a more synchronous > approach which is easier to work with sometimes. UBI offers this already, there is a write self-check as part of the io checks that can be enabled via debugfs per UBI device. So for troubleshooting this should be good enough. There is room for improvement, though. Currently it uses vmalloc(). Thanks, //richard
On 12/05/2025 at 17:23:16 +02, Richard Weinberger <richard@nod.at> wrote: > ----- Ursprüngliche Mail ----- >> Von: "Miquel Raynal" <miquel.raynal@bootlin.com> >>> The problem we _have_ though happens to be a bit different: here we are >>> blursed with a system that corrupts data at a noticeable >>> probability. But the model is the same: a stochastic process introducing >>> bit errors on write. But I sincerely hope no one else has this problem, >>> and this is *not* the primary aim of this patch; it just happens to >>> solve our issue as well. But I intend it to be useful for the larger >>> Linux community, thus the primary goal is to solve the first issue. >> >> I don't have a strong opinion there but I don't dislike this idea >> because it might also help troubleshooting errors sometimes. It is very >> hard to understand issues which happen to be discovered way after they >> have been generated (typically during a read, way later than a "faulty" >> write). Having this paranoid option would give a more synchronous >> approach which is easier to work with sometimes. > > UBI offers this already, there is a write self-check as part of the io > checks that can be enabled > via debugfs per UBI device. > So for troubleshooting this should be good enough. > There is room for improvement, though. Currently it uses vmalloc(). UBI is full of uncovered resources :-)
Hi, On 2025. 05. 12. 17:33, Miquel Raynal wrote: > On 12/05/2025 at 17:23:16 +02, Richard Weinberger <richard@nod.at> wrote: > >> ----- Ursprüngliche Mail ----- >>> Von: "Miquel Raynal" <miquel.raynal@bootlin.com> >>>> The problem we _have_ though happens to be a bit different: here we are >>>> blursed with a system that corrupts data at a noticeable >>>> probability. But the model is the same: a stochastic process introducing >>>> bit errors on write. But I sincerely hope no one else has this problem, >>>> and this is *not* the primary aim of this patch; it just happens to >>>> solve our issue as well. But I intend it to be useful for the larger >>>> Linux community, thus the primary goal is to solve the first issue. >>> >>> I don't have a strong opinion there but I don't dislike this idea >>> because it might also help troubleshooting errors sometimes. It is very >>> hard to understand issues which happen to be discovered way after they >>> have been generated (typically during a read, way later than a "faulty" >>> write). Having this paranoid option would give a more synchronous >>> approach which is easier to work with sometimes. >> >> UBI offers this already, there is a write self-check as part of the io >> checks that can be enabled >> via debugfs per UBI device. >> So for troubleshooting this should be good enough. >> There is room for improvement, though. Currently it uses vmalloc(). > > UBI is full of uncovered resources :-) For sure. Plus, even though we use UBI + UBIFS, that may not necessarily be the case for others. Maybe someone uses mtdblock + some "conventional" FS (ext*, f2fs etc.). Or maybe they use JFFS. Or maybe no FS at all. We should still allow userspace or higher FS layers to be notified of the problem. Bence
----- Ursprüngliche Mail ----- > Von: "Csókás Bence" <csokas.bence@prolan.hu> >>> UBI offers this already, there is a write self-check as part of the io >>> checks that can be enabled >>> via debugfs per UBI device. >>> So for troubleshooting this should be good enough. >>> There is room for improvement, though. Currently it uses vmalloc(). >> >> UBI is full of uncovered resources :-) > > For sure. Plus, even though we use UBI + UBIFS, that may not necessarily > be the case for others. Maybe someone uses mtdblock + some > "conventional" FS (ext*, f2fs etc.). Or maybe they use JFFS. Or maybe no > FS at all. We should still allow userspace or higher FS layers to be > notified of the problem. Nobody sane would every use a regular block based filesystem in rw mode on-top of mtdblock. Especially if you care about your data. JFFS2 has CONFIG_JFFS2_FS_WBUF_VERIFY. AFAIK, this origins form the old and dark age of flashes where everything was wonky and no good test tooling existed. Long story short, I'm not convinced by this patch set. No ack form my side. At least it should be configurable per device and not being a global config option. If Michael or Miquel like it, I'll not block this decision. Thanks, //richard
diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig index 796a2eccbef0..e75f4a57df6a 100644 --- a/drivers/mtd/Kconfig +++ b/drivers/mtd/Kconfig @@ -206,6 +206,20 @@ config MTD_PARTITIONED_MASTER the parent of the partition device be the master device, rather than what lies behind the master. +config MTD_PARANOID + bool "Read back written data (paranoid mode)" + help + This option makes the MTD core read back all data on a write and + report an error if it doesn't match the written data. This can + safeguard against silent bit errors resulting from a faulty Flash, + controller oddities, bus noise etc. + + It is up to the layer above MTD (e.g. the filesystem) to handle + this condition, for example by going read-only to prevent further + data corruption, or to mark a certain region of Flash as bad. + + If you are unsure, select 'n'. + source "drivers/mtd/chips/Kconfig" source "drivers/mtd/maps/Kconfig" diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c index 5ba9a741f5ac..3f9874cd4126 100644 --- a/drivers/mtd/mtdcore.c +++ b/drivers/mtd/mtdcore.c @@ -1745,8 +1745,8 @@ int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops) } EXPORT_SYMBOL_GPL(mtd_read_oob); -int mtd_write_oob(struct mtd_info *mtd, loff_t to, - struct mtd_oob_ops *ops) +static int _mtd_write_oob(struct mtd_info *mtd, loff_t to, + struct mtd_oob_ops *ops) { struct mtd_info *master = mtd_get_master(mtd); int ret; @@ -1771,6 +1771,53 @@ int mtd_write_oob(struct mtd_info *mtd, loff_t to, return mtd_write_oob_std(mtd, to, ops); } + +static int _mtd_verify(struct mtd_info *mtd, loff_t to, size_t len, const u8 *buf) +{ + struct device *dev = &mtd->dev; + u_char *verify_buf; + size_t r_retlen; + int ret; + + verify_buf = devm_kmalloc(dev, len, GFP_KERNEL); + if (!verify_buf) + return -ENOMEM; + + ret = mtd_read(mtd, to, len, &r_retlen, verify_buf); + if (ret < 0) + goto err; + + if (len != r_retlen) { + /* We shouldn't see short reads */ + dev_err(dev, "Verify failed, written %zd but only read %zd", + len, r_retlen); + ret = -EIO; + goto err; + } + + if (memcmp(verify_buf, buf, len)) { + dev_err(dev, "Verify failed, compare mismatch!"); + ret = -EIO; + } + +err: + devm_kfree(dev, verify_buf); + return ret; +} + +int mtd_write_oob(struct mtd_info *mtd, loff_t to, + struct mtd_oob_ops *ops) +{ + int ret = _mtd_write_oob(mtd, to, ops); + + if (ret < 0) + return ret; + + if (IS_ENABLED(CONFIG_MTD_PARANOID)) + ret = _mtd_verify(mtd, to, ops->retlen, ops->datbuf); + + return ret; +} EXPORT_SYMBOL_GPL(mtd_write_oob); /**
Add MTD_PARANOID config option for verifying all written data to prevent silent bit errors being undetected, at the cost of some bandwidth overhead. Signed-off-by: Bence Csókás <csokas.bence@prolan.hu> --- drivers/mtd/Kconfig | 14 ++++++++++++ drivers/mtd/mtdcore.c | 51 +++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 63 insertions(+), 2 deletions(-) base-commit: d76bb1ebb5587f66b0f8b8099bfbb44722bc08b3