diff mbox series

[RFC,v2,2/5] jbd2: introduce journal callbacks j_submit|finish_inode_data_buffers

Message ID 20200810010210.3305322-3-mfo@canonical.com
State Superseded
Headers show
Series ext4/jbd2: data=journal: write-protect pages on transaction commit | expand

Commit Message

Mauricio Faria de Oliveira Aug. 10, 2020, 1:02 a.m. UTC
Add the callbacks as opt-in to override the default behavior for
the transaction's inode list, instead of moving that code around.

This is important as not only ext4 uses the inode list: ocfs2 too,
via jbd2_journal_inode_ranged_write(), and maybe out-of-tree code.

To opt-out of the default behavior (i.e., to do nothing), one has
to opt-in with a no-op function.
---
 fs/jbd2/commit.c     | 21 ++++++++++++++++-----
 include/linux/jbd2.h | 21 ++++++++++++++++++++-
 2 files changed, 36 insertions(+), 6 deletions(-)

Comments

Jan Kara Aug. 18, 2020, 2:52 p.m. UTC | #1
On Sun 09-08-20 22:02:05, Mauricio Faria de Oliveira wrote:
> Add the callbacks as opt-in to override the default behavior for
> the transaction's inode list, instead of moving that code around.
> 
> This is important as not only ext4 uses the inode list: ocfs2 too,
> via jbd2_journal_inode_ranged_write(), and maybe out-of-tree code.

I'd prefer if the callback is called unconditionally, jbd2 exports the
callback that implements the current behavior and and both ext4 & ocfs2
are adapted to use this callback. We don't care about out of tree code.
That way things are cleaner long term...

> To opt-out of the default behavior (i.e., to do nothing), one has
> to opt-in with a no-op function.

Your Signed-off-by is missing for this patch.

> ---
>  fs/jbd2/commit.c     | 21 ++++++++++++++++-----
>  include/linux/jbd2.h | 21 ++++++++++++++++++++-
>  2 files changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 51f713089e35..b98d227b50d8 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -237,10 +237,14 @@ static int journal_submit_data_buffers(journal_t *journal,
>  		 * instead of writepages. Because writepages can do
>  		 * block allocation  with delalloc. We need to write
>  		 * only allocated blocks here.
> +		 * This can be overriden with a custom callback.
>  		 */
>  		trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
> -		err = journal_submit_inode_data_buffers(mapping, dirty_start,
> -				dirty_end);
> +		if (journal->j_submit_inode_data_buffers)
> +			err = journal->j_submit_inode_data_buffers(jinode);
> +		else
> +			err = journal_submit_inode_data_buffers(mapping,
> +					dirty_start, dirty_end);
>  		if (!ret)
>  			ret = err;
>  		spin_lock(&journal->j_list_lock);
> @@ -274,9 +278,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
>  			continue;
>  		jinode->i_flags |= JI_COMMIT_RUNNING;
>  		spin_unlock(&journal->j_list_lock);
> -		err = filemap_fdatawait_range_keep_errors(
> -				jinode->i_vfs_inode->i_mapping, dirty_start,
> -				dirty_end);
> +		/*
> +		 * Wait for the inode data buffers writeout.
> +		 * This can be overriden with a custom callback.
> +		 */
> +		if (journal->j_finish_inode_data_buffers)
> +			err = journal->j_finish_inode_data_buffers(jinode);
> +		else
> +			err = filemap_fdatawait_range_keep_errors(
> +					jinode->i_vfs_inode->i_mapping,
> +					dirty_start, dirty_end);
>  		if (!ret)
>  			ret = err;
>  		spin_lock(&journal->j_list_lock);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index d56128df2aff..24efe88eda1b 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -628,7 +628,8 @@ struct transaction_s
>  	struct journal_head	*t_shadow_list;
>  
>  	/*
> -	 * List of inodes whose data we've modified in data=ordered mode.
> +	 * List of inodes whose data we've modified in data=ordered mode
> +	 * or whose pages we should write-protect in data=journaled mode.

I'd rather change the comment to generic "List of inodes associated with
the transaction. E.g. ext4 uses this to track inodes in data=ordered and
data=journal mode that need special handling on transaction commit.".

>  	 * [j_list_lock]
>  	 */
>  	struct list_head	t_inode_list;
> @@ -1110,6 +1111,24 @@ struct journal_s
>  	void			(*j_commit_callback)(journal_t *,
>  						     transaction_t *);
>  
> +	/**
> +	 * @j_submit_inode_data_buffers:
> +	 *
> +	 * This function is called before flushing metadata buffers.
> +	 * This overrides the default behavior (writeout data buffers.)
> +	 */

I'd change the comment to:
	 * This function is called for all inodes associated with the
	 * committing transaction marked with JI_WRITE_DATA flag before we
	 * start to write out the transaction to the journal.

> +	int			(*j_submit_inode_data_buffers)
> +					(struct jbd2_inode *);
> +
> +	/**
> +	 * @j_finish_inode_data_buffers:
> +	 *
> +	 * This function is called after flushing metadata buffers.
> +	 * This overrides the default behavior (wait writeout.)
> +	 */

And here:
	 * This function is called for all inodes associated with the
	 * committing transaction marked with JI_WAIT_DATA flag after we
	 * we have written the transaction to the journal but before we
	 * write out the commit block.


> +	int			(*j_finish_inode_data_buffers)
> +					(struct jbd2_inode *);
> +
>  	/*
>  	 * Journal statistics
>  	 */

								Honza
Mauricio Faria de Oliveira Aug. 19, 2020, 1:20 a.m. UTC | #2
Hi Jan,

Thanks for reviewing.

On Tue, Aug 18, 2020 at 11:52 AM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 09-08-20 22:02:05, Mauricio Faria de Oliveira wrote:
> > Add the callbacks as opt-in to override the default behavior for
> > the transaction's inode list, instead of moving that code around.
> >
> > This is important as not only ext4 uses the inode list: ocfs2 too,
> > via jbd2_journal_inode_ranged_write(), and maybe out-of-tree code.
>
> I'd prefer if the callback is called unconditionally, jbd2 exports the
> callback that implements the current behavior and and both ext4 & ocfs2
> are adapted to use this callback. We don't care about out of tree code.
> That way things are cleaner long term...

Understood.

>
> > To opt-out of the default behavior (i.e., to do nothing), one has
> > to opt-in with a no-op function.
>
> Your Signed-off-by is missing for this patch.

Oh, I thought it wasn't needed in RFC patches.

Thanks for the suggestions below; they're more precise and descriptive.

I had a few questions in the cover letter, but in case you didn't have
the time, I'll just try harder on them; no worries.

Kind regards,
Mauricio

>
> > ---
> >  fs/jbd2/commit.c     | 21 ++++++++++++++++-----
> >  include/linux/jbd2.h | 21 ++++++++++++++++++++-
> >  2 files changed, 36 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > index 51f713089e35..b98d227b50d8 100644
> > --- a/fs/jbd2/commit.c
> > +++ b/fs/jbd2/commit.c
> > @@ -237,10 +237,14 @@ static int journal_submit_data_buffers(journal_t *journal,
> >                * instead of writepages. Because writepages can do
> >                * block allocation  with delalloc. We need to write
> >                * only allocated blocks here.
> > +              * This can be overriden with a custom callback.
> >                */
> >               trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
> > -             err = journal_submit_inode_data_buffers(mapping, dirty_start,
> > -                             dirty_end);
> > +             if (journal->j_submit_inode_data_buffers)
> > +                     err = journal->j_submit_inode_data_buffers(jinode);
> > +             else
> > +                     err = journal_submit_inode_data_buffers(mapping,
> > +                                     dirty_start, dirty_end);
> >               if (!ret)
> >                       ret = err;
> >               spin_lock(&journal->j_list_lock);
> > @@ -274,9 +278,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
> >                       continue;
> >               jinode->i_flags |= JI_COMMIT_RUNNING;
> >               spin_unlock(&journal->j_list_lock);
> > -             err = filemap_fdatawait_range_keep_errors(
> > -                             jinode->i_vfs_inode->i_mapping, dirty_start,
> > -                             dirty_end);
> > +             /*
> > +              * Wait for the inode data buffers writeout.
> > +              * This can be overriden with a custom callback.
> > +              */
> > +             if (journal->j_finish_inode_data_buffers)
> > +                     err = journal->j_finish_inode_data_buffers(jinode);
> > +             else
> > +                     err = filemap_fdatawait_range_keep_errors(
> > +                                     jinode->i_vfs_inode->i_mapping,
> > +                                     dirty_start, dirty_end);
> >               if (!ret)
> >                       ret = err;
> >               spin_lock(&journal->j_list_lock);
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index d56128df2aff..24efe88eda1b 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -628,7 +628,8 @@ struct transaction_s
> >       struct journal_head     *t_shadow_list;
> >
> >       /*
> > -      * List of inodes whose data we've modified in data=ordered mode.
> > +      * List of inodes whose data we've modified in data=ordered mode
> > +      * or whose pages we should write-protect in data=journaled mode.
>
> I'd rather change the comment to generic "List of inodes associated with
> the transaction. E.g. ext4 uses this to track inodes in data=ordered and
> data=journal mode that need special handling on transaction commit.".
>
> >        * [j_list_lock]
> >        */
> >       struct list_head        t_inode_list;
> > @@ -1110,6 +1111,24 @@ struct journal_s
> >       void                    (*j_commit_callback)(journal_t *,
> >                                                    transaction_t *);
> >
> > +     /**
> > +      * @j_submit_inode_data_buffers:
> > +      *
> > +      * This function is called before flushing metadata buffers.
> > +      * This overrides the default behavior (writeout data buffers.)
> > +      */
>
> I'd change the comment to:
>          * This function is called for all inodes associated with the
>          * committing transaction marked with JI_WRITE_DATA flag before we
>          * start to write out the transaction to the journal.
>
> > +     int                     (*j_submit_inode_data_buffers)
> > +                                     (struct jbd2_inode *);
> > +
> > +     /**
> > +      * @j_finish_inode_data_buffers:
> > +      *
> > +      * This function is called after flushing metadata buffers.
> > +      * This overrides the default behavior (wait writeout.)
> > +      */
>
> And here:
>          * This function is called for all inodes associated with the
>          * committing transaction marked with JI_WAIT_DATA flag after we
>          * we have written the transaction to the journal but before we
>          * write out the commit block.
>
>
> > +     int                     (*j_finish_inode_data_buffers)
> > +                                     (struct jbd2_inode *);
> > +
> >       /*
> >        * Journal statistics
> >        */
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Jan Kara Aug. 19, 2020, 9:16 a.m. UTC | #3
On Tue 18-08-20 22:20:08, Mauricio Faria de Oliveira wrote:
> > > To opt-out of the default behavior (i.e., to do nothing), one has
> > > to opt-in with a no-op function.
> >
> > Your Signed-off-by is missing for this patch.
> 
> Oh, I thought it wasn't needed in RFC patches.

Yes, it's not strictly needed if you don't want patches included yet. But
usually they are present so that people have less things to worry about
when preparing final submission :)

								Honza
diff mbox series

Patch

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 51f713089e35..b98d227b50d8 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -237,10 +237,14 @@  static int journal_submit_data_buffers(journal_t *journal,
 		 * instead of writepages. Because writepages can do
 		 * block allocation  with delalloc. We need to write
 		 * only allocated blocks here.
+		 * This can be overriden with a custom callback.
 		 */
 		trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
-		err = journal_submit_inode_data_buffers(mapping, dirty_start,
-				dirty_end);
+		if (journal->j_submit_inode_data_buffers)
+			err = journal->j_submit_inode_data_buffers(jinode);
+		else
+			err = journal_submit_inode_data_buffers(mapping,
+					dirty_start, dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
@@ -274,9 +278,16 @@  static int journal_finish_inode_data_buffers(journal_t *journal,
 			continue;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
-		err = filemap_fdatawait_range_keep_errors(
-				jinode->i_vfs_inode->i_mapping, dirty_start,
-				dirty_end);
+		/*
+		 * Wait for the inode data buffers writeout.
+		 * This can be overriden with a custom callback.
+		 */
+		if (journal->j_finish_inode_data_buffers)
+			err = journal->j_finish_inode_data_buffers(jinode);
+		else
+			err = filemap_fdatawait_range_keep_errors(
+					jinode->i_vfs_inode->i_mapping,
+					dirty_start, dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index d56128df2aff..24efe88eda1b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -628,7 +628,8 @@  struct transaction_s
 	struct journal_head	*t_shadow_list;
 
 	/*
-	 * List of inodes whose data we've modified in data=ordered mode.
+	 * List of inodes whose data we've modified in data=ordered mode
+	 * or whose pages we should write-protect in data=journaled mode.
 	 * [j_list_lock]
 	 */
 	struct list_head	t_inode_list;
@@ -1110,6 +1111,24 @@  struct journal_s
 	void			(*j_commit_callback)(journal_t *,
 						     transaction_t *);
 
+	/**
+	 * @j_submit_inode_data_buffers:
+	 *
+	 * This function is called before flushing metadata buffers.
+	 * This overrides the default behavior (writeout data buffers.)
+	 */
+	int			(*j_submit_inode_data_buffers)
+					(struct jbd2_inode *);
+
+	/**
+	 * @j_finish_inode_data_buffers:
+	 *
+	 * This function is called after flushing metadata buffers.
+	 * This overrides the default behavior (wait writeout.)
+	 */
+	int			(*j_finish_inode_data_buffers)
+					(struct jbd2_inode *);
+
 	/*
 	 * Journal statistics
 	 */