ext4: Fix data corruption caused by unaligned direct AIO

Message ID 20190306110642.11804-1-lczerner@redhat.com
State Accepted, archived

Commit Message

Lukas Czerner March 6, 2019, 11:06 a.m. UTC
Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.

However, it decides not to serialize if the potentially unaligned AIO is
past i_size, with the rationale that no pending writes are possible past
i_size. Unfortunately, if i_size is not block aligned and the second
unaligned write lands past i_size but still in the same block, it can
corrupt the previous unaligned write to the same block.
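
The "same block" condition can be checked with a little arithmetic. The
following is a minimal userspace sketch, not kernel code; the helper names
block_of() and shares_eof_block() are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the filesystem block that contains byte offset pos.
 * blocksize is assumed to be a power of two, as it is for ext4. */
static uint64_t block_of(uint64_t pos, uint64_t blocksize)
{
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return pos / blocksize;
}

/* Returns 1 if a write starting at pos begins at or past i_size but
 * still falls into the partial block that holds the end of file. */
static int shares_eof_block(uint64_t pos, uint64_t i_size, uint64_t blocksize)
{
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	/* A block-aligned i_size leaves no partial tail block to share. */
	if (i_size % blocksize == 0)
		return 0;

	return pos >= i_size &&
	       block_of(pos, blocksize) == block_of(i_size, blocksize);
}
```

With a 4096-byte block size and i_size = 41472, a write starting at 41472
shares block 10 with the tail of a 4096-byte write that started at 37376,
while a write starting at 45056 begins in the next block.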

This is a (very simplified) reproducer from Frank:

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[0]);
    io_submit(io_ctx, 1, &iocbs[1]);

    io_getevents(io_ctx, 2, 2, events, NULL);

Without this patch, the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed, overwriting the data
written by the first write. This is data corruption.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31

With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
*
0000b200
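
The effect of the changed comparison can be modelled in userspace. This is
only a sketch under stated assumptions: align_up() stands in for the
kernel's ALIGN() macro, unaligned_aio_model() is a hypothetical name, and
the model looks at the file offset only, ignoring the buffer and length
alignment that the real check also takes into account.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two.
 * Stand-in for the kernel's ALIGN() macro. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Simplified model of the unaligned-AIO decision: returns 1 if the
 * write would be treated as unaligned (and therefore serialized).
 * patched == 0 models the old comparison against i_size, patched != 0
 * models the comparison against i_size rounded up to the block size. */
static int unaligned_aio_model(uint64_t pos, uint64_t i_size,
			       uint64_t blocksize, int patched)
{
	uint64_t blockmask = blocksize - 1;
	uint64_t limit = patched ? align_up(i_size, blocksize) : i_size;

	/* At or past the limit: assumed safe, no serialization. */
	if (pos >= limit)
		return 0;

	/* Offset-only approximation of the alignment test. */
	return (pos & blockmask) != 0;
}
```

For the reproducer's second write (pos = 41472, i_size = 41472, 4096-byte
blocks), the old comparison skips serialization, while the patched one
compares against 45056 and flags the write as unaligned.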

Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
Cc: <stable@vger.kernel.org>
---
 fs/ext4/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Theodore Ts'o March 15, 2019, 3:38 a.m. UTC | #1
On Wed, Mar 06, 2019 at 12:06:42PM +0100, Lukas Czerner wrote:
> Ext4 needs to serialize unaligned direct AIO because the zeroing of
> partial blocks of two competing unaligned AIOs can result in data
> corruption.
> 
> However it decides not to serialize if the potentially unaligned aio is
> past i_size with the rationale that no pending writes are possible past
> i_size. Unfortunately if the i_size is not block aligned and the second
> unaligned write lands past i_size, but still into the same block, it has
> the potential of corrupting the previous unaligned write to the same
> block.
> 
> This is (very simplified) reproducer from Frank
> 
>     // 41472 = (10 * 4096) + 512
>     // 37376 = 41472 - 4096
> 
>     ftruncate(fd, 41472);
>     io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
>     io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
> 
>     io_submit(io_ctx, 1, &iocbs[0]);
>     io_submit(io_ctx, 1, &iocbs[1]);
> 
>     io_getevents(io_ctx, 2, 2, events, NULL);
> 
> Without this patch the 512B range from 40960 up to the start of the
> second unaligned write (41472) is going to be zeroed overwriting the data
> written by the first write. This is a data corruption.
> 
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> *
> 00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
> *
> 0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> *
> 0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
> 
> With this patch the data corruption is avoided because we will recognize
> the unaligned_aio and wait for the unwritten extent conversion.
> 
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> *
> 00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
> *
> 0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
> *
> 0000b200
> 
> Reported-by: Frank Sorenson <fsorenso@redhat.com>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
> Cc: <stable@vger.kernel.org>

Thanks, applied.

					- Ted
Patch

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..98ec11f69cd4 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -125,7 +125,7 @@  ext4_unaligned_aio(struct inode *inode, struct iov_iter *from, loff_t pos)
 	struct super_block *sb = inode->i_sb;
 	int blockmask = sb->s_blocksize - 1;
 
-	if (pos >= i_size_read(inode))
+	if (pos >= ALIGN(i_size_read(inode), sb->s_blocksize))
 		return 0;
 
 	if ((pos | iov_iter_alignment(from)) & blockmask)