diff mbox series

[RFC,v3,1/7] fs: Add folio_may_straddle_isize helper

Message ID 20221216150626.670312-2-agruenba@redhat.com
State Not Applicable
Headers show
Series Turn iomap_page_ops into iomap_folio_ops | expand

Commit Message

Andreas Gruenbacher Dec. 16, 2022, 3:06 p.m. UTC
Add a folio_may_straddle_isize() helper as a replacement for
pagecache_isize_extended() when we have a locked folio.

Use the new helper in generic_write_end(), iomap_write_end(),
ext4_write_end(), and ext4_journalled_write_end().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/buffer.c            |  5 ++---
 fs/ext4/inode.c        | 13 ++++++-------
 fs/iomap/buffered-io.c |  3 +--
 include/linux/mm.h     |  2 ++
 mm/truncate.c          | 35 +++++++++++++++++++++++++++++++++++
 5 files changed, 46 insertions(+), 12 deletions(-)

Comments

Christoph Hellwig Dec. 23, 2022, 2:56 p.m. UTC | #1
On Fri, Dec 16, 2022 at 04:06:20PM +0100, Andreas Gruenbacher wrote:
> Add a folio_may_straddle_isize() helper as a replacement for
> pagecache_isize_extended() when we have a locked folio.

I find the naming very confusing.  Any good reason to not follow
the naming of pagecache_isize_extended an call it
folio_isize_extended?

> Use the new helper in generic_write_end(), iomap_write_end(),
> ext4_write_end(), and ext4_journalled_write_end().

Please split this into a patch per caller in addition to the one
adding the helper, and write commit logs explaining the rationale
for the helper.  The obious ones I'm trying to guess are that
the new helper avoid a page cache radix tree lookup and a lock
page/folio cycle, but I'd rather hear that from the horses mouth
in the commit log.

> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2164,16 +2164,15 @@ int generic_write_end(struct file *file, struct address_space *mapping,
>  	 * But it's important to update i_size while still holding page lock:
>  	 * page writeout could otherwise come in and zero beyond i_size.
>  	 */
> -	if (pos + copied > inode->i_size) {
> +	if (pos + copied > old_size) {

This is and unrelated and undocument (but useful) change.  Please split
it out as well.

> + * This function must be called while we still hold i_rwsem - this not only
> + * makes sure i_size is stable but also that userspace cannot observe the new
> + * i_size value before we are prepared to handle mmap writes there.

Please add a lockdep_assert_held_write to enforce that.

> +void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
> +			      loff_t old_size, loff_t start)
> +{
> +	unsigned int blocksize = i_blocksize(inode);
> +
> +	if (round_up(old_size, blocksize) >= round_down(start, blocksize))
> +		return;
> +
> +	/*
> +	 * See clear_page_dirty_for_io() for details why folio_set_dirty()
> +	 * is needed.
> +	 */
> +	if (folio_mkclean(folio))
> +		folio_set_dirty(folio);

Should pagecache_isize_extended be rewritten to use this helper,
i.e. turn this into a factoring out of a helper?

> +EXPORT_SYMBOL(folio_may_straddle_isize);

Please make this an EXPORT_SYMBOL_GPL just like folio_mkclean.
Andreas Grünbacher Dec. 23, 2022, 10:04 p.m. UTC | #2
Am Fr., 23. Dez. 2022 um 16:06 Uhr schrieb Christoph Hellwig
<hch@infradead.org>:
> On Fri, Dec 16, 2022 at 04:06:20PM +0100, Andreas Gruenbacher wrote:
> > Add a folio_may_straddle_isize() helper as a replacement for
> > pagecache_isize_extended() when we have a locked folio.
>
> I find the naming very confusing.  Any good reason to not follow
> the naming of pagecache_isize_extended an call it
> folio_isize_extended?

A good reason for a different name is because
folio_may_straddle_isize() requires a locked folio, while
pagecache_isize_extended() will fail if the folio is still locked. So
this doesn't follow the usual "replace 'page' with 'folio'" pattern.

> > Use the new helper in generic_write_end(), iomap_write_end(),
> > ext4_write_end(), and ext4_journalled_write_end().
>
> Please split this into a patch per caller in addition to the one
> adding the helper, and write commit logs explaining the rationale
> for the helper.  The obious ones I'm trying to guess are that
> the new helper avoid a page cache radix tree lookup and a lock
> page/folio cycle, but I'd rather hear that from the horses mouth
> in the commit log.

Yes, that's what the horse says.

> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -2164,16 +2164,15 @@ int generic_write_end(struct file *file, struct address_space *mapping,
> >        * But it's important to update i_size while still holding page lock:
> >        * page writeout could otherwise come in and zero beyond i_size.
> >        */
> > -     if (pos + copied > inode->i_size) {
> > +     if (pos + copied > old_size) {
>
> This is and unrelated and undocument (but useful) change.  Please split
> it out as well.
>
> > + * This function must be called while we still hold i_rwsem - this not only
> > + * makes sure i_size is stable but also that userspace cannot observe the new
> > + * i_size value before we are prepared to handle mmap writes there.
>
> Please add a lockdep_assert_held_write to enforce that.
>
> > +void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
> > +                           loff_t old_size, loff_t start)
> > +{
> > +     unsigned int blocksize = i_blocksize(inode);
> > +
> > +     if (round_up(old_size, blocksize) >= round_down(start, blocksize))
> > +             return;
> > +
> > +     /*
> > +      * See clear_page_dirty_for_io() for details why folio_set_dirty()
> > +      * is needed.
> > +      */
> > +     if (folio_mkclean(folio))
> > +             folio_set_dirty(folio);
>
> Should pagecache_isize_extended be rewritten to use this helper,
> i.e. turn this into a factoring out of a helper?

I'm not really sure about that. The boundary conditions in the two
functions are not identical. I think the logic in
folio_may_straddle_isize() is sufficient for the
extending-write-under-folio-lock case, but I'd still need confirmation
for that. If the same logic would also be enough in
pagecache_isize_extended() is more unclear to me.

> > +EXPORT_SYMBOL(folio_may_straddle_isize);
>
> Please make this an EXPORT_SYMBOL_GPL just like folio_mkclean.

Thanks,
Andreas
Christoph Hellwig Dec. 24, 2022, 7:21 a.m. UTC | #3
On Fri, Dec 23, 2022 at 11:04:51PM +0100, Andreas Grünbacher wrote:
> > I find the naming very confusing.  Any good reason to not follow
> > the naming of pagecache_isize_extended an call it
> > folio_isize_extended?
> 
> A good reason for a different name is because
> folio_may_straddle_isize() requires a locked folio, while
> pagecache_isize_extended() will fail if the folio is still locked. So
> this doesn't follow the usual "replace 'page' with 'folio'" pattern.

pagecache also doesn't say page, it says pagecache.  I'd still prepfer
to keep the postfix the same.  And I think the fact that it needs
a locked folio should also have an assert, which both documents this
and catches errors.  I think that's much better than an arbitrarily
different name.

> > Should pagecache_isize_extended be rewritten to use this helper,
> > i.e. turn this into a factoring out of a helper?
> 
> I'm not really sure about that. The boundary conditions in the two
> functions are not identical. I think the logic in
> folio_may_straddle_isize() is sufficient for the
> extending-write-under-folio-lock case, but I'd still need confirmation
> for that. If the same logic would also be enough in
> pagecache_isize_extended() is more unclear to me.

That's another thing that really needs to into the commit log,
why is the condition different and pagecache_isize_extended can't
just be extended for it (if it really can't).
diff mbox series

Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index d9c6d1fbb6dd..bbae1437994b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2164,16 +2164,15 @@  int generic_write_end(struct file *file, struct address_space *mapping,
 	 * But it's important to update i_size while still holding page lock:
 	 * page writeout could otherwise come in and zero beyond i_size.
 	 */
-	if (pos + copied > inode->i_size) {
+	if (pos + copied > old_size) {
 		i_size_write(inode, pos + copied);
 		i_size_changed = true;
+		folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
 	}
 
 	unlock_page(page);
 	put_page(page);
 
-	if (old_size < pos)
-		pagecache_isize_extended(inode, old_size, pos);
 	/*
 	 * Don't mark the inode dirty under page lock. First, it unnecessarily
 	 * makes the holding time of page lock longer. Second, it forces lock
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9d9f414f99fe..6fe1c9609d86 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1327,13 +1327,13 @@  static int ext4_write_end(struct file *file,
 	 * If FS_IOC_ENABLE_VERITY is running on this inode, then Merkle tree
 	 * blocks are being written past EOF, so skip the i_size update.
 	 */
-	if (!verity)
+	if (!verity) {
 		i_size_changed = ext4_update_inode_size(inode, pos + copied);
+		folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
+	}
 	unlock_page(page);
 	put_page(page);
 
-	if (old_size < pos && !verity)
-		pagecache_isize_extended(inode, old_size, pos);
 	/*
 	 * Don't mark the inode dirty under page lock. First, it unnecessarily
 	 * makes the holding time of page lock longer. Second, it forces lock
@@ -1439,16 +1439,15 @@  static int ext4_journalled_write_end(struct file *file,
 		if (!partial)
 			SetPageUptodate(page);
 	}
-	if (!verity)
+	if (!verity) {
 		size_changed = ext4_update_inode_size(inode, pos + copied);
+		folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
+	}
 	ext4_set_inode_state(inode, EXT4_STATE_JDATA);
 	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
 	unlock_page(page);
 	put_page(page);
 
-	if (old_size < pos && !verity)
-		pagecache_isize_extended(inode, old_size, pos);
-
 	if (size_changed) {
 		ret2 = ext4_mark_inode_dirty(handle, inode);
 		if (!ret)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..347010c6a652 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -734,11 +734,10 @@  static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 	if (pos + ret > old_size) {
 		i_size_write(iter->inode, pos + ret);
 		iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
+		folio_may_straddle_isize(iter->inode, folio, old_size, pos);
 	}
 	folio_unlock(folio);
 
-	if (old_size < pos)
-		pagecache_isize_extended(iter->inode, old_size, pos);
 	if (page_ops && page_ops->page_done)
 		page_ops->page_done(iter->inode, pos, ret, &folio->page);
 	folio_put(folio);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8178fe894e2e..a8632747780e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2016,6 +2016,8 @@  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 
 extern void truncate_pagecache(struct inode *inode, loff_t new);
 extern void truncate_setsize(struct inode *inode, loff_t newsize);
+void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
+			      loff_t old_size, loff_t start);
 void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
 void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
 int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 7b4ea4c4a46b..971b08399144 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -769,6 +769,41 @@  void truncate_setsize(struct inode *inode, loff_t newsize)
 }
 EXPORT_SYMBOL(truncate_setsize);
 
+/**
+ * folio_may_straddle_isize - update pagecache after extending i_size
+ * @inode:	inode for which i_size was extended
+ * @folio:	folio to maybe mark read-only
+ * @old_size:	original inode size
+ * @start:	start of the write
+ *
+ * Handle extending an inode by a write that starts behind the old inode size.
+ * If a block-aligned hole exists between the old inode size and the start of
+ * the write, we mark the folio read-only so that page_mkwrite() is called on
+ * the nearest write access to the page.  That way, the filesystem can be sure
+ * that page_mkwrite() is called on the page before a user writes to the page
+ * via mmap.
+ *
+ * This function must be called while we still hold i_rwsem - this not only
+ * makes sure i_size is stable but also that userspace cannot observe the new
+ * i_size value before we are prepared to handle mmap writes there.
+ */
+void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
+			      loff_t old_size, loff_t start)
+{
+	unsigned int blocksize = i_blocksize(inode);
+
+	if (round_up(old_size, blocksize) >= round_down(start, blocksize))
+		return;
+
+	/*
+	 * See clear_page_dirty_for_io() for details why folio_set_dirty()
+	 * is needed.
+	 */
+	if (folio_mkclean(folio))
+		folio_set_dirty(folio);
+}
+EXPORT_SYMBOL(folio_may_straddle_isize);
+
 /**
  * pagecache_isize_extended - update pagecache after extension of i_size
  * @inode:	inode for which i_size was extended