diff mbox series

[v3] mtd: Verify written data in paranoid mode

Message ID 20250512084033.69718-1-csokas.bence@prolan.hu
State New
Headers show
Series [v3] mtd: Verify written data in paranoid mode | expand

Commit Message

Bence Csókás May 12, 2025, 8:40 a.m. UTC
Add MTD_PARANOID config option for verifying all written data to prevent
silent bit errors being undetected, at the cost of some bandwidth overhead.

Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
---
 drivers/mtd/Kconfig   | 14 ++++++++++++
 drivers/mtd/mtdcore.c | 51 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 2 deletions(-)


base-commit: d76bb1ebb5587f66b0f8b8099bfbb44722bc08b3

Comments

Miquel Raynal May 12, 2025, 9:14 a.m. UTC | #1
On 12/05/2025 at 10:40:32 +02, Bence Csókás <csokas.bence@prolan.hu> wrote:

> Add MTD_PARANOID config option for verifying all written data to prevent
> silent bit errors being undetected, at the cost of some bandwidth overhead.
>
> Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
> ---
>  drivers/mtd/Kconfig   | 14 ++++++++++++
>  drivers/mtd/mtdcore.c | 51 +++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
> index 796a2eccbef0..e75f4a57df6a 100644
> --- a/drivers/mtd/Kconfig
> +++ b/drivers/mtd/Kconfig
> @@ -206,6 +206,20 @@ config MTD_PARTITIONED_MASTER
>  	  the parent of the partition device be the master device, rather than
>  	  what lies behind the master.
>  
> +config MTD_PARANOID
> +	bool "Read back written data (paranoid mode)"
> +	help
> +	  This option makes the MTD core read back all data on a write and
> +	  report an error if it doesn't match the written data. This can
> +	  safeguard against silent bit errors resulting from a faulty Flash,
> +	  controller oddities, bus noise etc.
> +
> +	  It is up to the layer above MTD (e.g. the filesystem) to handle
> +	  this condition, for example by going read-only to prevent further
> +	  data corruption, or to mark a certain region of Flash as bad.
> +
> +	  If you are unsure, select 'n'.
> +
>  source "drivers/mtd/chips/Kconfig"
>  
>  source "drivers/mtd/maps/Kconfig"
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index 5ba9a741f5ac..3f9874cd4126 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -1745,8 +1745,8 @@ int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(mtd_read_oob);
>  
> -int mtd_write_oob(struct mtd_info *mtd, loff_t to,
> -				struct mtd_oob_ops *ops)
> +static int _mtd_write_oob(struct mtd_info *mtd, loff_t to,
> +			  struct mtd_oob_ops *ops)

I don't like these '_' prefixes, they do not indicate much about the
content of the function. I don't think we need an extra function for
that, just include the check in mtd_write_oob?

>  {
>  	struct mtd_info *master = mtd_get_master(mtd);
>  	int ret;
> @@ -1771,6 +1771,53 @@ int mtd_write_oob(struct mtd_info *mtd, loff_t to,
>  
>  	return mtd_write_oob_std(mtd, to, ops);
>  }
> +
> +static int _mtd_verify(struct mtd_info *mtd, loff_t to, size_t len, const u8 *buf)
> +{
> +	struct device *dev = &mtd->dev;
> +	u_char *verify_buf;
> +	size_t r_retlen;
> +	int ret;
> +
> +	verify_buf = devm_kmalloc(dev, len, GFP_KERNEL);
> +	if (!verify_buf)
> +		return -ENOMEM;
> +
> +	ret = mtd_read(mtd, to, len, &r_retlen, verify_buf);
> +	if (ret < 0)
> +		goto err;
> +
> +	if (len != r_retlen) {
> +		/* We shouldn't see short reads */
> +		dev_err(dev, "Verify failed, written %zd but only read %zd",
> +			len, r_retlen);
> +		ret = -EIO;
> +		goto err;
> +	}
> +
> +	if (memcmp(verify_buf, buf, len)) {
> +		dev_err(dev, "Verify failed, compare mismatch!");
> +		ret = -EIO;
> +	}
> +
> +err:
> +	devm_kfree(dev, verify_buf);
> +	return ret;
> +}
> +
> +int mtd_write_oob(struct mtd_info *mtd, loff_t to,
> +		  struct mtd_oob_ops *ops)
> +{
> +	int ret = _mtd_write_oob(mtd, to, ops);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (IS_ENABLED(CONFIG_MTD_PARANOID))
> +		ret = _mtd_verify(mtd, to, ops->retlen, ops->datbuf);

Why _mtd_verify and not mtd_verify?
Richard Weinberger May 12, 2025, 9:45 a.m. UTC | #2
----- Ursprüngliche Mail -----
> Von: "Miquel Raynal" <miquel.raynal@bootlin.com>
>> +config MTD_PARANOID
>> +	bool "Read back written data (paranoid mode)"
>> +	help
>> +	  This option makes the MTD core read back all data on a write and
>> +	  report an error if it doesn't match the written data. This can
>> +	  safeguard against silent bit errors resulting from a faulty Flash,
>> +	  controller oddities, bus noise etc.
>> +
>> +	  It is up to the layer above MTD (e.g. the filesystem) to handle
>> +	  this condition, for example by going read-only to prevent further
>> +	  data corruption, or to mark a certain region of Flash as bad.
>> +
>> +	  If you are unsure, select 'n'.

I still have a hard time seeing the benefit of this.
To me it looks like you're working around broken hardware.

Thanks,
//richard
Bence Csókás May 12, 2025, 12:29 p.m. UTC | #3
Hi,

On 2025. 05. 12. 11:14, Miquel Raynal wrote:
> Why _mtd_verify and not mtd_verify?

Hm, no particular reason, I was thinking that since it's an "internal" 
function, like `_mtd_write_oob()`, it would get the underscore. But now 
that I think about it, there are many static functions already without 
this underscore. Should I change it?

On 2025. 05. 12. 11:45, Richard Weinberger wrote:
> I still have a hard time seeing the benefit of this.
> To me it looks like you're working around broken hardware.
>
> Thanks,
> //richard

Well, yes, in our case. But the point is, we have a strict requirement 
for data integrity, which is not unique to us I believe. I would think 
there are other industrial control applications like ours, which dictate 
a high data integrity.

Bence
Richard Weinberger May 12, 2025, 12:47 p.m. UTC | #4
----- Ursprüngliche Mail -----
> Von: "Csókás Bence" <csokas.bence@prolan.hu>
> Well, yes, in our case. But the point is, we have a strict requirement
> for data integrity, which is not unique to us I believe. I would think
> there are other industrial control applications like ours, which dictate
> a high data integrity.

In your last patch set you said your hardware has an issue that every
now and that data is not properly written.
Now you talk about data integrity requirements. I'm confused.

My point is that at some level we need to trust hardware,
if your flash memory is so broken that you can't rely on the write
path you're in deep trouble.
What is the next step, reading it back every five seconds to make
sure it is still there? (just kidding).

We do have plenty of tests to verify this for *testing*
to find broken drivers or hardware.
e.g. mtd/ubi tests, UBI verify at runtime.

Thanks,
//richard
Bence Csókás May 12, 2025, 1:13 p.m. UTC | #5
Hi,

On 2025. 05. 12. 14:47, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
>> Von: "Csókás Bence" <csokas.bence@prolan.hu>
>> Well, yes, in our case. But the point is, we have a strict requirement
>> for data integrity, which is not unique to us I believe. I would think
>> there are other industrial control applications like ours, which dictate
>> a high data integrity.
> 
> In your last patch set you said your hardware has an issue that every
> now and that data is not properly written.
> Now you talk about data integrity requirements. I'm confused.

The two problems are not too dissimilar: in one case you have a random, 
and _very_ low chance of data corruption, e.g. because of noise, aging 
hardware, power supply ripple etc. But you still need to make sure that 
the written data is absolutely correct; or if it is not, the system will 
immediately enter some fail-safe mode. This is the problem we want to 
solve, for everybody using Linux in high reliability environments.

The problem we _have_ though happens to be a bit different: here we are 
blursed with a system that corrupts data at a noticeable probability. 
But the model is the same: a stochastic process introducing bit errors 
on write. But I sincerely hope no one else has this problem, and this is 
*not* the primary aim of this patch; it just happens to solve our issue 
as well. But I intend it to be useful for the larger Linux community, 
thus the primary goal is to solve the first issue.

> My point is that at some level we need to trust hardware,
> if your flash memory is so broken that you can't rely on the write
> path you're in deep trouble.

Sure, but at the moment, we're not giving any return path for hardware. 
We just shovel megabytes at it, and don't even ask back. In critical 
systems, this will not fly.

> What is the next step, reading it back every five seconds to make
> sure it is still there? (just kidding).

(( Well, you're kidding now, but this is what we will have to do in 
another project, a rail interlocking system. Though obviously not in the 
kernel. But I digress... ))

Bence
Miquel Raynal May 12, 2025, 1:59 p.m. UTC | #6
Hello,

On 12/05/2025 at 15:13:20 +02, Csókás Bence <csokas.bence@prolan.hu> wrote:

> Hi,
>
> On 2025. 05. 12. 14:47, Richard Weinberger wrote:
>> ----- Ursprüngliche Mail -----
>>> Von: "Csókás Bence" <csokas.bence@prolan.hu>
>>> Well, yes, in our case. But the point is, we have a strict requirement
>>> for data integrity, which is not unique to us I believe. I would think
>>> there are other industrial control applications like ours, which dictate
>>> a high data integrity.
>> In your last patch set you said your hardware has an issue that every
>> now and that data is not properly written.
>> Now you talk about data integrity requirements. I'm confused.
>
> The two problems are not too dissimilar: in one case you have a random,
> and _very_ low chance of data corruption, e.g. because of noise, aging
> hardware, power supply ripple etc. But you still need to make sure that
> the written data is absolutely correct; or if it is not, the system will
> immediately enter some fail-safe mode. This is the problem we want to
> solve, for everybody using Linux in high reliability environments.
>
> The problem we _have_ though happens to be a bit different: here we are
> blursed with a system that corrupts data at a noticeable
> probability. But the model is the same: a stochastic process introducing
> bit errors on write. But I sincerely hope no one else has this problem,
> and this is *not* the primary aim of this patch; it just happens to
> solve our issue as well. But I intend it to be useful for the larger
> Linux community, thus the primary goal is to solve the first issue.

I don't have a strong opinion there but I don't dislike this idea
because it might also help troubleshooting errors sometimes. It is very
hard to understand issues which happen to be discovered way after they
have been generated (typically during a read, way later than a "faulty"
write). Having this paranoid option would give a more synchronous
approach which is easier to work with sometimes.

Cheers,
Miquèl
Richard Weinberger May 12, 2025, 3:23 p.m. UTC | #7
----- Ursprüngliche Mail -----
> Von: "Miquel Raynal" <miquel.raynal@bootlin.com>
>> The problem we _have_ though happens to be a bit different: here we are
>> blursed with a system that corrupts data at a noticeable
>> probability. But the model is the same: a stochastic process introducing
>> bit errors on write. But I sincerely hope no one else has this problem,
>> and this is *not* the primary aim of this patch; it just happens to
>> solve our issue as well. But I intend it to be useful for the larger
>> Linux community, thus the primary goal is to solve the first issue.
> 
> I don't have a strong opinion there but I don't dislike this idea
> because it might also help troubleshooting errors sometimes. It is very
> hard to understand issues which happen to be discovered way after they
> have been generated (typically during a read, way later than a "faulty"
> write). Having this paranoid option would give a more synchronous
> approach which is easier to work with sometimes.

UBI offers this already, there is a write self-check as part of the io
checks that can be enabled
via debugfs per UBI device.
So for troubleshooting this should be good enough.
There is room for improvement, though. Currently it uses vmalloc().

Thanks,
//richard
Miquel Raynal May 12, 2025, 3:33 p.m. UTC | #8
On 12/05/2025 at 17:23:16 +02, Richard Weinberger <richard@nod.at> wrote:

> ----- Ursprüngliche Mail -----
>> Von: "Miquel Raynal" <miquel.raynal@bootlin.com>
>>> The problem we _have_ though happens to be a bit different: here we are
>>> blursed with a system that corrupts data at a noticeable
>>> probability. But the model is the same: a stochastic process introducing
>>> bit errors on write. But I sincerely hope no one else has this problem,
>>> and this is *not* the primary aim of this patch; it just happens to
>>> solve our issue as well. But I intend it to be useful for the larger
>>> Linux community, thus the primary goal is to solve the first issue.
>> 
>> I don't have a strong opinion there but I don't dislike this idea
>> because it might also help troubleshooting errors sometimes. It is very
>> hard to understand issues which happen to be discovered way after they
>> have been generated (typically during a read, way later than a "faulty"
>> write). Having this paranoid option would give a more synchronous
>> approach which is easier to work with sometimes.
>
> UBI offers this already, there is a write self-check as part of the io
> checks that can be enabled
> via debugfs per UBI device.
> So for troubleshooting this should be good enough.
> There is room for improvement, though. Currently it uses vmalloc().

UBI is full of uncovered resources :-)
Bence Csókás May 12, 2025, 3:51 p.m. UTC | #9
Hi,

On 2025. 05. 12. 17:33, Miquel Raynal wrote:
> On 12/05/2025 at 17:23:16 +02, Richard Weinberger <richard@nod.at> wrote:
> 
>> ----- Ursprüngliche Mail -----
>>> Von: "Miquel Raynal" <miquel.raynal@bootlin.com>
>>>> The problem we _have_ though happens to be a bit different: here we are
>>>> blursed with a system that corrupts data at a noticeable
>>>> probability. But the model is the same: a stochastic process introducing
>>>> bit errors on write. But I sincerely hope no one else has this problem,
>>>> and this is *not* the primary aim of this patch; it just happens to
>>>> solve our issue as well. But I intend it to be useful for the larger
>>>> Linux community, thus the primary goal is to solve the first issue.
>>>
>>> I don't have a strong opinion there but I don't dislike this idea
>>> because it might also help troubleshooting errors sometimes. It is very
>>> hard to understand issues which happen to be discovered way after they
>>> have been generated (typically during a read, way later than a "faulty"
>>> write). Having this paranoid option would give a more synchronous
>>> approach which is easier to work with sometimes.
>>
>> UBI offers this already, there is a write self-check as part of the io
>> checks that can be enabled
>> via debugfs per UBI device.
>> So for troubleshooting this should be good enough.
>> There is room for improvement, though. Currently it uses vmalloc().
> 
> UBI is full of uncovered resources :-)

For sure. Plus, even though we use UBI + UBIFS, that may not necessarily 
be the case for others. Maybe someone uses mtdblock + some 
"conventional" FS (ext*, f2fs etc.). Or maybe they use JFFS. Or maybe no 
FS at all. We should still allow userspace or higher FS layers to be 
notified of the problem.

Bence
Richard Weinberger May 12, 2025, 3:57 p.m. UTC | #10
----- Ursprüngliche Mail -----
> Von: "Csókás Bence" <csokas.bence@prolan.hu>
>>> UBI offers this already, there is a write self-check as part of the io
>>> checks that can be enabled
>>> via debugfs per UBI device.
>>> So for troubleshooting this should be good enough.
>>> There is room for improvement, though. Currently it uses vmalloc().
>> 
>> UBI is full of uncovered resources :-)
> 
> For sure. Plus, even though we use UBI + UBIFS, that may not necessarily
> be the case for others. Maybe someone uses mtdblock + some
> "conventional" FS (ext*, f2fs etc.). Or maybe they use JFFS. Or maybe no
> FS at all. We should still allow userspace or higher FS layers to be
> notified of the problem.

Nobody sane would every use a regular block based filesystem in rw mode
on-top of mtdblock. Especially if you care about your data.

JFFS2 has CONFIG_JFFS2_FS_WBUF_VERIFY.
AFAIK, this origins form the old and dark age of flashes where everything
was wonky and no good test tooling existed.

Long story short, I'm not convinced by this patch set.
No ack form my side.
At least it should be configurable per device and not being a global
config option. If Michael or Miquel like it, I'll not block this decision.

Thanks,
//richard
diff mbox series

Patch

diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
index 796a2eccbef0..e75f4a57df6a 100644
--- a/drivers/mtd/Kconfig
+++ b/drivers/mtd/Kconfig
@@ -206,6 +206,20 @@  config MTD_PARTITIONED_MASTER
 	  the parent of the partition device be the master device, rather than
 	  what lies behind the master.
 
+config MTD_PARANOID
+	bool "Read back written data (paranoid mode)"
+	help
+	  This option makes the MTD core read back all data on a write and
+	  report an error if it doesn't match the written data. This can
+	  safeguard against silent bit errors resulting from a faulty Flash,
+	  controller oddities, bus noise etc.
+
+	  It is up to the layer above MTD (e.g. the filesystem) to handle
+	  this condition, for example by going read-only to prevent further
+	  data corruption, or to mark a certain region of Flash as bad.
+
+	  If you are unsure, select 'n'.
+
 source "drivers/mtd/chips/Kconfig"
 
 source "drivers/mtd/maps/Kconfig"
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 5ba9a741f5ac..3f9874cd4126 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -1745,8 +1745,8 @@  int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
 }
 EXPORT_SYMBOL_GPL(mtd_read_oob);
 
-int mtd_write_oob(struct mtd_info *mtd, loff_t to,
-				struct mtd_oob_ops *ops)
+static int _mtd_write_oob(struct mtd_info *mtd, loff_t to,
+			  struct mtd_oob_ops *ops)
 {
 	struct mtd_info *master = mtd_get_master(mtd);
 	int ret;
@@ -1771,6 +1771,53 @@  int mtd_write_oob(struct mtd_info *mtd, loff_t to,
 
 	return mtd_write_oob_std(mtd, to, ops);
 }
+
+static int _mtd_verify(struct mtd_info *mtd, loff_t to, size_t len, const u8 *buf)
+{
+	struct device *dev = &mtd->dev;
+	u_char *verify_buf;
+	size_t r_retlen;
+	int ret;
+
+	verify_buf = devm_kmalloc(dev, len, GFP_KERNEL);
+	if (!verify_buf)
+		return -ENOMEM;
+
+	ret = mtd_read(mtd, to, len, &r_retlen, verify_buf);
+	if (ret < 0)
+		goto err;
+
+	if (len != r_retlen) {
+		/* We shouldn't see short reads */
+		dev_err(dev, "Verify failed, written %zd but only read %zd",
+			len, r_retlen);
+		ret = -EIO;
+		goto err;
+	}
+
+	if (memcmp(verify_buf, buf, len)) {
+		dev_err(dev, "Verify failed, compare mismatch!");
+		ret = -EIO;
+	}
+
+err:
+	devm_kfree(dev, verify_buf);
+	return ret;
+}
+
+int mtd_write_oob(struct mtd_info *mtd, loff_t to,
+		  struct mtd_oob_ops *ops)
+{
+	int ret = _mtd_write_oob(mtd, to, ops);
+
+	if (ret < 0)
+		return ret;
+
+	if (IS_ENABLED(CONFIG_MTD_PARANOID))
+		ret = _mtd_verify(mtd, to, ops->retlen, ops->datbuf);
+
+	return ret;
+}
 EXPORT_SYMBOL_GPL(mtd_write_oob);
 
 /**