diff mbox series

mmp: do not use O_DIRECT when working with regular file

Message ID 20210212093719.162065-1-lczerner@redhat.com
State Accepted
Headers show
Series mmp: do not use O_DIRECT when working with regular file | expand

Commit Message

Lukas Czerner Feb. 12, 2021, 9:37 a.m. UTC
Currently the mmp block is read using O_DIRECT to avoid any caching tha
may be done by the VM. However when working with regular files this
creates alignment issues when the device of the host file system has
sector size smaller than the blocksize of the file system in the file
we're working with.

This can be reproduced with t_mmp_fail test when run on the device with
4k sector size because the mke2fs fails when trying to read the mmp
block.

Fix it by disabling O_DIRECT when working with regular file. I don't
think there is any risk of doing so since the file system layer, unlike
shared block device, should guarantee cache consistency.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 lib/ext2fs/mmp.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

Comments

Eric Sandeen Feb. 16, 2021, 9:24 p.m. UTC | #1
On 2/12/21 3:37 AM, Lukas Czerner wrote:
> Currently the mmp block is read using O_DIRECT to avoid any caching tha
> may be done by the VM. However when working with regular files this
> creates alignment issues when the device of the host file system has
> sector size smaller than the blocksize of the file system in the file
> we're working with.
> 
> This can be reproduced with t_mmp_fail test when run on the device with
> 4k sector size because the mke2fs fails when trying to read the mmp
> block.
> 
> Fix it by disabling O_DIRECT when working with regular file. I don't
> think there is any risk of doing so since the file system layer, unlike
> shared block device, should guarantee cache consistency.
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
>  lib/ext2fs/mmp.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
> index c21ae272..1ac22194 100644
> --- a/lib/ext2fs/mmp.c
> +++ b/lib/ext2fs/mmp.c
> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
>  	 * regardless of how the io_manager is doing reads, to avoid caching of
>  	 * the MMP block by the io_manager or the VM.  It needs to be fresh. */
>  	if (fs->mmp_fd <= 0) {
> +		struct stat st;
>  		int flags = O_RDWR | O_DIRECT;
>  
> -retry:
> +		/*
> +		 * There is no reason for using O_DIRECT if we're working with
> +		 * regular file. Disabling it also avoids problems with
> +		 * alignment when the device of the host file system has sector
> +		 * size smaller than blocksize of the fs we're working with.

I think the problem is when the host filesystem that contains the image is on
a device with a logical sector size which is /larger/ than the image filesystem's
block size, right? Not smaller?

Because then you might not be able to do an image-filesystem-block-aligned direct
IO on it, if it's sub-logical-block-size for the host filesystem/device, and lands
within the larger host sector at an offset?

otherwise, this seems at least as reasonable to me as the previous tmpfs work
around, so other than the question about the comment,

Reviewed-by: Eric Sandeen <sandeen@redhat.com>


> +		 */
> +		if (stat(fs->device_name, &st) == 0 &&
> +		    S_ISREG(st.st_mode))
> +			flags &= ~O_DIRECT;
> +
>  		fs->mmp_fd = open(fs->device_name, flags);
>  		if (fs->mmp_fd < 0) {
> -			struct stat st;
> -
> -			/* Avoid O_DIRECT for filesystem image files if open
> -			 * fails, since it breaks when running on tmpfs. */
> -			if (errno == EINVAL && (flags & O_DIRECT) &&
> -			    stat(fs->device_name, &st) == 0 &&
> -			    S_ISREG(st.st_mode)) {
> -				flags &= ~O_DIRECT;
> -				goto retry;
> -			}
>  			retval = EXT2_ET_MMP_OPEN_DIRECT;
>  			goto out;
>  		}
>
Lukas Czerner Feb. 16, 2021, 9:51 p.m. UTC | #2
On Tue, Feb 16, 2021 at 03:24:00PM -0600, Eric Sandeen wrote:
> On 2/12/21 3:37 AM, Lukas Czerner wrote:
> > Currently the mmp block is read using O_DIRECT to avoid any caching tha
> > may be done by the VM. However when working with regular files this
> > creates alignment issues when the device of the host file system has
> > sector size smaller than the blocksize of the file system in the file
> > we're working with.
> > 
> > This can be reproduced with t_mmp_fail test when run on the device with
> > 4k sector size because the mke2fs fails when trying to read the mmp
> > block.
> > 
> > Fix it by disabling O_DIRECT when working with regular file. I don't
> > think there is any risk of doing so since the file system layer, unlike
> > shared block device, should guarantee cache consistency.
> > 
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> >  lib/ext2fs/mmp.c | 22 +++++++++++-----------
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> > 
> > diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
> > index c21ae272..1ac22194 100644
> > --- a/lib/ext2fs/mmp.c
> > +++ b/lib/ext2fs/mmp.c
> > @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
> >  	 * regardless of how the io_manager is doing reads, to avoid caching of
> >  	 * the MMP block by the io_manager or the VM.  It needs to be fresh. */
> >  	if (fs->mmp_fd <= 0) {
> > +		struct stat st;
> >  		int flags = O_RDWR | O_DIRECT;
> >  
> > -retry:
> > +		/*
> > +		 * There is no reason for using O_DIRECT if we're working with
> > +		 * regular file. Disabling it also avoids problems with
> > +		 * alignment when the device of the host file system has sector
> > +		 * size smaller than blocksize of the fs we're working with.
> 
> I think the problem is when the host filesystem that contains the image is on
> a device with a logical sector size which is /larger/ than the image filesystem's
> block size, right? Not smaller?

Yeah, it is supposed to be *larger*, of course. If it is smaller, then
there is no problem. Thanks for pointing this out I'll change the
comment and the description.

> 
> Because then you might not be able to do an image-filesystem-block-aligned direct
> IO on it, if it's sub-logical-block-size for the host filesystem/device, and lands
> within the larger host sector at an offset?
> 
> otherwise, this seems at least as reasonable to me as the previous tmpfs work
> around, so other than the question about the comment,
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Thanks!
-Lukas

> 
> 
> > +		 */
> > +		if (stat(fs->device_name, &st) == 0 &&
> > +		    S_ISREG(st.st_mode))
> > +			flags &= ~O_DIRECT;
> > +
> >  		fs->mmp_fd = open(fs->device_name, flags);
> >  		if (fs->mmp_fd < 0) {
> > -			struct stat st;
> > -
> > -			/* Avoid O_DIRECT for filesystem image files if open
> > -			 * fails, since it breaks when running on tmpfs. */
> > -			if (errno == EINVAL && (flags & O_DIRECT) &&
> > -			    stat(fs->device_name, &st) == 0 &&
> > -			    S_ISREG(st.st_mode)) {
> > -				flags &= ~O_DIRECT;
> > -				goto retry;
> > -			}
> >  			retval = EXT2_ET_MMP_OPEN_DIRECT;
> >  			goto out;
> >  		}
> > 
>
diff mbox series

Patch

diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
index c21ae272..1ac22194 100644
--- a/lib/ext2fs/mmp.c
+++ b/lib/ext2fs/mmp.c
@@ -57,21 +57,21 @@  errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
 	 * regardless of how the io_manager is doing reads, to avoid caching of
 	 * the MMP block by the io_manager or the VM.  It needs to be fresh. */
 	if (fs->mmp_fd <= 0) {
+		struct stat st;
 		int flags = O_RDWR | O_DIRECT;
 
-retry:
+		/*
+		 * There is no reason for using O_DIRECT if we're working with
+		 * regular file. Disabling it also avoids problems with
+		 * alignment when the device of the host file system has sector
+		 * size smaller than blocksize of the fs we're working with.
+		 */
+		if (stat(fs->device_name, &st) == 0 &&
+		    S_ISREG(st.st_mode))
+			flags &= ~O_DIRECT;
+
 		fs->mmp_fd = open(fs->device_name, flags);
 		if (fs->mmp_fd < 0) {
-			struct stat st;
-
-			/* Avoid O_DIRECT for filesystem image files if open
-			 * fails, since it breaks when running on tmpfs. */
-			if (errno == EINVAL && (flags & O_DIRECT) &&
-			    stat(fs->device_name, &st) == 0 &&
-			    S_ISREG(st.st_mode)) {
-				flags &= ~O_DIRECT;
-				goto retry;
-			}
 			retval = EXT2_ET_MMP_OPEN_DIRECT;
 			goto out;
 		}