| Message ID | 20260512052230.2947683-2-hch@lst.de (mailing list archive) |
|---|---|
| State | Handled Elsewhere |
| Headers | show |
| Series | [01/19] btrfs: require at least 4 devices for RAID 6 | expand |
On Tue, May 12, 2026 at 07:20:41AM +0200, Christoph Hellwig wrote: > While the RAID6 algorithm could in theory support 3 devices by just > copying the data disk to the two parity disks, this version is not only > useless because it is a suboptimal version of 3-way mirroring, but also > broken with various crashes and incorrect parity generation in various > architecture-optimized implementations. Disallow it similar to mdraid > which requires at least 4 devices for RAID 6. > > Fixes: 53b381b3abeb ("Btrfs: RAID5 and RAID6") > Signed-off-by: Christoph Hellwig <hch@lst.de> This patch should have been sent separately as it has user visible impact and can potentially break some setups. The degenerate modes of raid0, 5, or 6 are explicit as a possible middle step when converting profiles. We can use a fallback implementation for this case if the accelerated implementations cannot do it.
On Tue, May 12, 2026 at 01:42:31PM +0200, David Sterba wrote: > On Tue, May 12, 2026 at 07:20:41AM +0200, Christoph Hellwig wrote: > > While the RAID6 algorithm could in theory support 3 devices by just > > copying the data disk to the two parity disks, this version is not only > > useless because it is a suboptimal version of 3-way mirroring, but also > > broken with various crashes and incorrect parity generation in various > > architecture-optimized implementations. Disallow it similar to mdraid > > which requires at least 4 devices for RAID 6. > > > > Fixes: 53b381b3abeb ("Btrfs: RAID5 and RAID6") > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > This patch should have been sent separately as it has user visible > impact and can potentially break some setups. It _is_ sent out separate. > The degenerate modes of > raid0, 5, or 6 are explicit as a possible middle step when converting > profiles. We can use a fallback implementation for this case if the > accelerated implementations cannot do it. This is not about a degenerated mode. For a degenerated RAID 6, parity generation uses the RAID 5 XOR routines as the second parity will be missing. This is about generating two parities for a single data disk, which must be explicitly selected.
On May 11, 2026 10:20:41 PM PDT, Christoph Hellwig <hch@lst.de> wrote: >While the RAID6 algorithm could in theory support 3 devices by just >copying the data disk to the two parity disks, this version is not only >useless because it is a suboptimal version of 3-way mirroring, but also >broken with various crashes and incorrect parity generation in various >architecture-optimized implementations. Disallow it similar to mdraid >which requires at least 4 devices for RAID 6. > >Fixes: 53b381b3abeb ("Btrfs: RAID5 and RAID6") >Signed-off-by: Christoph Hellwig <hch@lst.de> >--- > fs/btrfs/volumes.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >index a88e68f90564..0b54b97bdad8 100644 >--- a/fs/btrfs/volumes.c >+++ b/fs/btrfs/volumes.c >@@ -159,7 +159,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .sub_stripes = 1, > .dev_stripes = 1, > .devs_max = 0, >- .devs_min = 3, >+ .devs_min = 4, > .tolerated_failures = 2, > .devs_increment = 1, > .ncopies = 1, Yes, if anyone cares about < 4 disks for the RAID-6 case (or < 3 for the RAID-4/5 case), just use the RAID-1 code.
On Wed, May 13, 2026 at 07:47:42AM +0200, Christoph Hellwig wrote: > On Tue, May 12, 2026 at 01:42:31PM +0200, David Sterba wrote: > > On Tue, May 12, 2026 at 07:20:41AM +0200, Christoph Hellwig wrote: > > > While the RAID6 algorithm could in theory support 3 devices by just > > > copying the data disk to the two parity disks, this version is not only > > > useless because it is a suboptimal version of 3-way mirroring, but also > > > broken with various crashes and incorrect parity generation in various > > > architecture-optimized implementations. Disallow it similar to mdraid > > > which requires at least 4 devices for RAID 6. > > > > > > Fixes: 53b381b3abeb ("Btrfs: RAID5 and RAID6") > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > > > This patch should have been sent separately as it has user visible > > impact and can potentially break some setups. > > It _is_ sent out separate. It's public interface change of btrfs but in a patch series cleaning up some library code, I noticed it by accident. > > The degenerate modes of > > raid0, 5, or 6 are explicit as a possible middle step when converting > > profiles. We can use a fallback implementation for this case if the > > accelerated implementations cannot do it. > > This is not about a degenerated mode. For a degenerated RAID 6, parity > generation uses the RAID 5 XOR routines as the second parity will be > missing. This is about generating two parities for a single data disk, > which must be explicitly selected. The calcuation is a different than what I'm concened about, changing minimum devices from 3 to 4 is a breaking change. If the library won't provide the xor/parity functions then we'll have to add a fallback for this special case.
On 13/05/2026 07.47, Christoph Hellwig wrote: > On Tue, May 12, 2026 at 01:42:31PM +0200, David Sterba wrote: > >> The degenerate modes of >> raid0, 5, or 6 are explicit as a possible middle step when converting >> profiles. We can use a fallback implementation for this case if the >> accelerated implementations cannot do it. > > This is not about a degenerated mode. For a degenerated RAID 6, parity > generation uses the RAID 5 XOR routines as the second parity will be > missing. This is about generating two parities for a single data disk, > which must be explicitly selected. > I think that the David concern is : "what happens for an already existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?" (I am thinking when a new BG needs to be allocated)... BR GB
On May 14, 2026 12:51:59 PM PDT, Goffredo Baroncelli <kreijack@libero.it> wrote: >On 13/05/2026 07.47, Christoph Hellwig wrote: >> On Tue, May 12, 2026 at 01:42:31PM +0200, David Sterba wrote: > >> >>> The degenerate modes of >>> raid0, 5, or 6 are explicit as a possible middle step when converting >>> profiles. We can use a fallback implementation for this case if the >>> accelerated implementations cannot do it. >> >> This is not about a degenerated mode. For a degenerated RAID 6, parity >> generation uses the RAID 5 XOR routines as the second parity will be >> missing. This is about generating two parities for a single data disk, >> which must be explicitly selected. >> > >I think that the David concern is : "what happens for an already >existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?" >(I am thinking when a new BG needs to be allocated)... > >BR >GB > That's what I'm saying – it should invoke the RAID-1 code under the cover (as with 3 disks, D = P = Q.)
On Thu, May 14, 2026 at 09:51:59PM +0200, Goffredo Baroncelli wrote: > I think that the David concern is : "what happens for an already > existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?" > (I am thinking when a new BG needs to be allocated)... Then it will cleanly fail to mount instead of constantly corrupting data and memory with every write, yes. Which clearly suggest that such file systems don't exist in the wild. But if btrfs wants to keep supporting this I'll just add a _unsafe version without the check in the core library.
On Thu, May 14, 2026 at 12:57:53PM -0700, H. Peter Anvin wrote: > That's what I'm saying – it should invoke the RAID-1 code under the > cover (as with 3 disks, D = P = Q.) Yes, if the btrfs maintainer cared for this setup that is what should be done.
On Thu, May 14, 2026 at 12:57:53PM -0700, H. Peter Anvin wrote: > On May 14, 2026 12:51:59 PM PDT, Goffredo Baroncelli <kreijack@libero.it> wrote: > >On 13/05/2026 07.47, Christoph Hellwig wrote: > >> On Tue, May 12, 2026 at 01:42:31PM +0200, David Sterba wrote: > > > >> > >>> The degenerate modes of > >>> raid0, 5, or 6 are explicit as a possible middle step when converting > >>> profiles. We can use a fallback implementation for this case if the > >>> accelerated implementations cannot do it. > >> > >> This is not about a degenerated mode. For a degenerated RAID 6, parity > >> generation uses the RAID 5 XOR routines as the second parity will be > >> missing. This is about generating two parities for a single data disk, > >> which must be explicitly selected. > >> > > > >I think that the David concern is : "what happens for an already > >existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?" > >(I am thinking when a new BG needs to be allocated)... > > That's what I'm saying – it should invoke the RAID-1 code under the cover (as with 3 disks, D = P = Q.) Thanks, it was not clear to me what you meant. For the two edge cases the code should do simple memcpy for both calculations of parity and recovery.
On 15/05/2026 06.37, Christoph Hellwig wrote: > On Thu, May 14, 2026 at 09:51:59PM +0200, Goffredo Baroncelli wrote: >> I think that the David concern is : "what happens for an already >> existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?" >> (I am thinking when a new BG needs to be allocated)... > > Then it will cleanly fail to mount instead of constantly corrupting data > and memory with every write, yes. Which clearly suggest that such > file systems don't exist in the wild. > > But if btrfs wants to keep supporting this I'll just add a _unsafe > version without the check in the core library. > I am not arguing about this part. My point is that the change shouldn't have impacted the BTRFS interface versus the user (as patch 01/19 does), but instead the change should have modify the interface raid code <-> btrfs (e.g. doing a memcpy....), or at least the cover letter should warn that the raid6 code requires a number of disk >= 4, pointing to BTRFS as "client doing wrong things". At least, the message was received: don't relay to the raid6 code when the number of disk is less than 4. BR GB
On May 14, 2026 9:37:05 PM PDT, Christoph Hellwig <hch@lst.de> wrote: >On Thu, May 14, 2026 at 09:51:59PM +0200, Goffredo Baroncelli wrote: >> I think that the David concern is : "what happens for an already >> existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?" >> (I am thinking when a new BG needs to be allocated)... > >Then it will cleanly fail to mount instead of constantly corrupting data >and memory with every write, yes. Which clearly suggest that such >file systems don't exist in the wild. > >But if btrfs wants to keep supporting this I'll just add a _unsafe >version without the check in the core library. I don't think this is a good idea. Error out; it is the btrfs maintainers' job to ensure user data isn't lost. The RAID-6 code has *never* supported only 3 units, and if it ever worked for *any* of the implementations it was purely by accident. Speaking as the original author I should know; this was deliberate as in some cases the degenerate case (3) would have required extra trays in the code to no user benefit. I would not be surprised if the kernel crashed or corrupted the page cache in that case.
On Fri, May 15, 2026 at 12:59:34PM -0700, H. Peter Anvin wrote: > I don't think this is a good idea. Error out; it is the btrfs maintainers' job to ensure user data isn't lost. > > The RAID-6 code has *never* supported only 3 units, and if it ever worked for *any* of the implementations it was purely by accident. Speaking as the original author I should know; this was deliberate as in some cases the degenerate case (3) would have required extra trays in the code to no user benefit. > > I would not be surprised if the kernel crashed or corrupted the page cache in that case. It does, that's why I wanted to exclude it. Anyway, for the about to be resent version I'll drop this btrfs patch over the stated objection and will otherwise not change anything. This means the (IMHO hypothetical) users of this configuration will get a WARN_ON_ONCE triggered, but otherwise keep working (or rather not working) as before.
在 2026/5/18 14:42, Christoph Hellwig 写道: > On Fri, May 15, 2026 at 12:59:34PM -0700, H. Peter Anvin wrote: >> I don't think this is a good idea. Error out; it is the btrfs maintainers' job to ensure user data isn't lost. >> >> The RAID-6 code has *never* supported only 3 units, and if it ever worked for *any* of the implementations it was purely by accident. Speaking as the original author I should know; this was deliberate as in some cases the degenerate case (3) would have required extra trays in the code to no user benefit. >> >> I would not be surprised if the kernel crashed or corrupted the page cache in that case. > > It does, that's why I wanted to exclude it. Anyway, for the about to be > resent version I'll drop this btrfs patch over the stated objection and > will otherwise not change anything. This means the (IMHO hypothetical) > users of this configuration will get a WARN_ON_ONCE triggered, but > otherwise keep working (or rather not working) as before. > For the btrfs part, I believe I can get the current 2-disk-raid5 and 3-disk-raid6 to fallback to raid1 inside btrfs. I hope the btrfs part can be finished and reach the next merge window, but I'm not 100% sure. What is the planned cycle to merge this raid5/6 cleanup? Thanks, Qu
On Wed, 20 May 2026 18:11:09 +0930 Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > 在 2026/5/18 14:42, Christoph Hellwig 写道: > > On Fri, May 15, 2026 at 12:59:34PM -0700, H. Peter Anvin wrote: > >> I don't think this is a good idea. Error out; it is the btrfs maintainers' job to ensure user data isn't lost. > >> > >> The RAID-6 code has *never* supported only 3 units, and if it ever worked for *any* of the implementations it was purely by accident. Speaking as the original author I should know; this was deliberate as in some cases the degenerate case (3) would have required extra trays in the code to no user benefit. > >> > >> I would not be surprised if the kernel crashed or corrupted the page cache in that case. > > > > It does, that's why I wanted to exclude it. Anyway, for the about to be > > resent version I'll drop this btrfs patch over the stated objection and > > will otherwise not change anything. This means the (IMHO hypothetical) > > users of this configuration will get a WARN_ON_ONCE triggered, but > > otherwise keep working (or rather not working) as before. > > > > For the btrfs part, I believe I can get the current 2-disk-raid5 and > 3-disk-raid6 to fallback to raid1 inside btrfs. > > I hope the btrfs part can be finished and reach the next merge window, > but I'm not 100% sure. > > What is the planned cycle to merge this raid5/6 cleanup? At present it's on track for the 7.2-rc1 merge window. Does that suit?
在 2026/5/22 09:47, Andrew Morton 写道: > On Wed, 20 May 2026 18:11:09 +0930 Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >> >> >> 在 2026/5/18 14:42, Christoph Hellwig 写道: >>> On Fri, May 15, 2026 at 12:59:34PM -0700, H. Peter Anvin wrote: >>>> I don't think this is a good idea. Error out; it is the btrfs maintainers' job to ensure user data isn't lost. >>>> >>>> The RAID-6 code has *never* supported only 3 units, and if it ever worked for *any* of the implementations it was purely by accident. Speaking as the original author I should know; this was deliberate as in some cases the degenerate case (3) would have required extra trays in the code to no user benefit. >>>> >>>> I would not be surprised if the kernel crashed or corrupted the page cache in that case. >>> >>> It does, that's why I wanted to exclude it. Anyway, for the about to be >>> resent version I'll drop this btrfs patch over the stated objection and >>> will otherwise not change anything. This means the (IMHO hypothetical) >>> users of this configuration will get a WARN_ON_ONCE triggered, but >>> otherwise keep working (or rather not working) as before. >>> >> >> For the btrfs part, I believe I can get the current 2-disk-raid5 and >> 3-disk-raid6 to fallback to raid1 inside btrfs. >> >> I hope the btrfs part can be finished and reach the next merge window, >> but I'm not 100% sure. >> >> What is the planned cycle to merge this raid5/6 cleanup? > > At present it's on track for the 7.2-rc1 merge window. Does that suit? The current btrfs fix (*) is pretty small, I believe we can get it into the next merge window, as long as we got enough review on it. *: https://lore.kernel.org/linux-btrfs/a1d63733465229936351804f3760803d5894a962.1779274630.git.wqu@suse.com/T/#u
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a88e68f90564..0b54b97bdad8 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -159,7 +159,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .sub_stripes = 1, .dev_stripes = 1, .devs_max = 0, - .devs_min = 3, + .devs_min = 4, .tolerated_failures = 2, .devs_increment = 1, .ncopies = 1,
While the RAID6 algorithm could in theory support 3 devices by just copying the data disk to the two parity disks, this version is not only useless because it is a suboptimal version of 3-way mirroring, but also broken with various crashes and incorrect parity generation in various architecture-optimized implementations. Disallow it similar to mdraid which requires at least 4 devices for RAID 6. Fixes: 53b381b3abeb ("Btrfs: RAID5 and RAID6") Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/btrfs/volumes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)