ext4: fix deadlock while checkpoint thread waits commit thread to finish

Message ID 20181114114935.5250-1-xiaoguang.wang@linux.alibaba.com
State New
Headers show
Series
  • ext4: fix deadlock while checkpoint thread waits commit thread to finish
Related show

Commit Message

Xiaoguang Wang Nov. 14, 2018, 11:49 a.m.
This issue was found when I tried to put checkpoint work in a separate thread,
the deadlock below happened:
         Thread1                                |   Thread2
__jbd2_log_wait_for_space                       |
jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
  if (jh->b_transaction != NULL)                |
    ...                                         |
    jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
                                                |  will lock j_checkpoint_mutex,
                                                |  but will be blocked here.
                                                |
    jbd2_log_wait_commit(journal, tid);         |
    wait_event(journal->j_wait_done_commit,     |
     !tid_gt(tid, journal->j_commit_sequence)); |
     ...                                        |wake_up(j_wait_done_commit)
  }                                             |

then deadlock occurs, Thread1 will never be waken up.

To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
when we are going to wait for transaction commit.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 fs/jbd2/checkpoint.c | 21 +++++++++++++++++----
 fs/jbd2/journal.c    |  2 +-
 2 files changed, 18 insertions(+), 5 deletions(-)

Comments

Jan Kara Nov. 22, 2018, 12:36 p.m. | #1
On Wed 14-11-18 19:49:35, Xiaoguang Wang wrote:
> This issue was found when I tried to put checkpoint work in a separate thread,
> the deadlock below happened:
>          Thread1                                |   Thread2
> __jbd2_log_wait_for_space                       |
> jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
>   if (jh->b_transaction != NULL)                |
>     ...                                         |
>     jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
>                                                 |  will lock j_checkpoint_mutex,
>                                                 |  but will be blocked here.
>                                                 |
>     jbd2_log_wait_commit(journal, tid);         |
>     wait_event(journal->j_wait_done_commit,     |
>      !tid_gt(tid, journal->j_commit_sequence)); |
>      ...                                        |wake_up(j_wait_done_commit)
>   }                                             |
> 
> then deadlock occurs, Thread1 will never be waken up.
> 
> To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
> when we are going to wait for transaction commit.
> 
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

Thanks for the patch! One comment below...

> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 26f8d7e46462..e728844f2f0e 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -113,7 +113,7 @@ void __jbd2_log_wait_for_space(journal_t *journal)
>  	nblocks = jbd2_space_needed(journal);
>  	while (jbd2_log_space_left(journal) < nblocks) {
>  		write_unlock(&journal->j_state_lock);
> -		mutex_lock(&journal->j_checkpoint_mutex);
> +		mutex_lock_io(&journal->j_checkpoint_mutex);
>  
>  		/*
>  		 * Test again, another process may have checkpointed while we
> @@ -241,8 +241,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>  	 * done (maybe it's a new transaction, but it fell at the same
>  	 * address).
>  	 */
> -	if (journal->j_checkpoint_transactions != transaction ||
> -	    transaction->t_tid != this_tid)
> +	if (journal->j_checkpoint_transactions == NULL ||
> +	    journal->j_checkpoint_transactions->t_tid != this_tid)
>  		goto out;

Why did you change this? As far as I can tell there's no difference and the
previous condition makes it more obvious that we are still looking at the
same transaction.

Otherwise the patch looks good so after removing the above hunk feel free to
add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> @@ -276,9 +276,22 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>  		"JBD2: %s: Waiting for Godot: block %llu\n",
>  		journal->j_devname, (unsigned long long) bh->b_blocknr);
>  
> +			if (batch_count)
> +				__flush_batch(journal, &batch_count);
>  			jbd2_log_start_commit(journal, tid);
> +			/*
> +			 * jbd2_journal_commit_transaction() may want
> +			 * to take the checkpoint_mutex if JBD2_FLUSHED
> +			 * is set, jbd2_update_log_tail() called by
> +			 * jbd2_journal_commit_transaction() may also take
> +			 * checkpoint_mutex.  So we need to temporarily
> +			 * drop it.
> +			 */
> +			mutex_unlock(&journal->j_checkpoint_mutex);
>  			jbd2_log_wait_commit(journal, tid);
> -			goto retry;
> +			mutex_lock_io(&journal->j_checkpoint_mutex);
> +			spin_lock(&journal->j_list_lock);
> +			goto restart;
>  		}
>  		if (!buffer_dirty(bh)) {
>  			if (unlikely(buffer_write_io_error(bh)) && !result)
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 8ef6b6daaa7a..88d8f22d2cba 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2067,7 +2067,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
>  	err = jbd2_journal_skip_recovery(journal);
>  	if (write) {
>  		/* Lock to make assertions happy... */
> -		mutex_lock(&journal->j_checkpoint_mutex);
> +		mutex_lock_io(&journal->j_checkpoint_mutex);
>  		jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA);
>  		mutex_unlock(&journal->j_checkpoint_mutex);
>  	}
> -- 
> 2.17.2
>
Xiaoguang Wang Nov. 23, 2018, 2:45 a.m. | #2
hi,

> On Wed 14-11-18 19:49:35, Xiaoguang Wang wrote:
>> This issue was found when I tried to put checkpoint work in a separate thread,
>> the deadlock below happened:
>>           Thread1                                |   Thread2
>> __jbd2_log_wait_for_space                       |
>> jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
>>    if (jh->b_transaction != NULL)                |
>>      ...                                         |
>>      jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
>>                                                  |  will lock j_checkpoint_mutex,
>>                                                  |  but will be blocked here.
>>                                                  |
>>      jbd2_log_wait_commit(journal, tid);         |
>>      wait_event(journal->j_wait_done_commit,     |
>>       !tid_gt(tid, journal->j_commit_sequence)); |
>>       ...                                        |wake_up(j_wait_done_commit)
>>    }                                             |
>>
>> then deadlock occurs, Thread1 will never be waken up.
>>
>> To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
>> when we are going to wait for transaction commit.
>>
>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> 
> Thanks for the patch! One comment below...
> 
>> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
>> index 26f8d7e46462..e728844f2f0e 100644
>> --- a/fs/jbd2/checkpoint.c
>> +++ b/fs/jbd2/checkpoint.c
>> @@ -113,7 +113,7 @@ void __jbd2_log_wait_for_space(journal_t *journal)
>>   	nblocks = jbd2_space_needed(journal);
>>   	while (jbd2_log_space_left(journal) < nblocks) {
>>   		write_unlock(&journal->j_state_lock);
>> -		mutex_lock(&journal->j_checkpoint_mutex);
>> +		mutex_lock_io(&journal->j_checkpoint_mutex);
>>   
>>   		/*
>>   		 * Test again, another process may have checkpointed while we
>> @@ -241,8 +241,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>>   	 * done (maybe it's a new transaction, but it fell at the same
>>   	 * address).
>>   	 */
>> -	if (journal->j_checkpoint_transactions != transaction ||
>> -	    transaction->t_tid != this_tid)
>> +	if (journal->j_checkpoint_transactions == NULL ||
>> +	    journal->j_checkpoint_transactions->t_tid != this_tid)
>>   		goto out;
> 
> Why did you change this? As far as I can tell there's no difference and the
> previous condition makes it more obvious that we are still looking at the
> same transaction.
In this patch, we may drop j_checkpoint_mutex, then another thread may acquire
this lock, do checkpoint work and freed current transaction, "transaction->t_tid"
will cause an invalid pointer dereference.

Regards,
Xiaoguang Wang
> 
> Otherwise the patch looks good so after removing the above hunk feel free to
> add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 								Honza
> 
> 
>> @@ -276,9 +276,22 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>>   		"JBD2: %s: Waiting for Godot: block %llu\n",
>>   		journal->j_devname, (unsigned long long) bh->b_blocknr);
>>   
>> +			if (batch_count)
>> +				__flush_batch(journal, &batch_count);
>>   			jbd2_log_start_commit(journal, tid);
>> +			/*
>> +			 * jbd2_journal_commit_transaction() may want
>> +			 * to take the checkpoint_mutex if JBD2_FLUSHED
>> +			 * is set, jbd2_update_log_tail() called by
>> +			 * jbd2_journal_commit_transaction() may also take
>> +			 * checkpoint_mutex.  So we need to temporarily
>> +			 * drop it.
>> +			 */
>> +			mutex_unlock(&journal->j_checkpoint_mutex);
>>   			jbd2_log_wait_commit(journal, tid);
>> -			goto retry;
>> +			mutex_lock_io(&journal->j_checkpoint_mutex);
>> +			spin_lock(&journal->j_list_lock);
>> +			goto restart;
>>   		}
>>   		if (!buffer_dirty(bh)) {
>>   			if (unlikely(buffer_write_io_error(bh)) && !result)
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index 8ef6b6daaa7a..88d8f22d2cba 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -2067,7 +2067,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
>>   	err = jbd2_journal_skip_recovery(journal);
>>   	if (write) {
>>   		/* Lock to make assertions happy... */
>> -		mutex_lock(&journal->j_checkpoint_mutex);
>> +		mutex_lock_io(&journal->j_checkpoint_mutex);
>>   		jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA);
>>   		mutex_unlock(&journal->j_checkpoint_mutex);
>>   	}
>> -- 
>> 2.17.2
>>
Jan Kara Nov. 23, 2018, 11:15 a.m. | #3
On Fri 23-11-18 10:45:20, Xiaoguang Wang wrote:
> hi,
> 
> > On Wed 14-11-18 19:49:35, Xiaoguang Wang wrote:
> > > This issue was found when I tried to put checkpoint work in a separate thread,
> > > the deadlock below happened:
> > >           Thread1                                |   Thread2
> > > __jbd2_log_wait_for_space                       |
> > > jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
> > >    if (jh->b_transaction != NULL)                |
> > >      ...                                         |
> > >      jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
> > >                                                  |  will lock j_checkpoint_mutex,
> > >                                                  |  but will be blocked here.
> > >                                                  |
> > >      jbd2_log_wait_commit(journal, tid);         |
> > >      wait_event(journal->j_wait_done_commit,     |
> > >       !tid_gt(tid, journal->j_commit_sequence)); |
> > >       ...                                        |wake_up(j_wait_done_commit)
> > >    }                                             |
> > > 
> > > then deadlock occurs, Thread1 will never be waken up.
> > > 
> > > To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
> > > when we are going to wait for transaction commit.
> > > 
> > > Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> > 
> > Thanks for the patch! One comment below...
> > 
> > > diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> > > index 26f8d7e46462..e728844f2f0e 100644
> > > --- a/fs/jbd2/checkpoint.c
> > > +++ b/fs/jbd2/checkpoint.c
> > > @@ -113,7 +113,7 @@ void __jbd2_log_wait_for_space(journal_t *journal)
> > >   	nblocks = jbd2_space_needed(journal);
> > >   	while (jbd2_log_space_left(journal) < nblocks) {
> > >   		write_unlock(&journal->j_state_lock);
> > > -		mutex_lock(&journal->j_checkpoint_mutex);
> > > +		mutex_lock_io(&journal->j_checkpoint_mutex);
> > >   		/*
> > >   		 * Test again, another process may have checkpointed while we
> > > @@ -241,8 +241,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
> > >   	 * done (maybe it's a new transaction, but it fell at the same
> > >   	 * address).
> > >   	 */
> > > -	if (journal->j_checkpoint_transactions != transaction ||
> > > -	    transaction->t_tid != this_tid)
> > > +	if (journal->j_checkpoint_transactions == NULL ||
> > > +	    journal->j_checkpoint_transactions->t_tid != this_tid)
> > >   		goto out;
> > 
> > Why did you change this? As far as I can tell there's no difference and the
> > previous condition makes it more obvious that we are still looking at the
> > same transaction.
> In this patch, we may drop j_checkpoint_mutex, then another thread may acquire
> this lock, do checkpoint work and freed current transaction, "transaction->t_tid"
> will cause an invalid pointer dereference.

That is exactly the reason why we check:

if (journal->j_checkpoint_transactions != transaction || ...

So if this test is false and so transaction->t_tid != this_tid gets
evaluated we are sure that j_checkpoint_transactions actually still points
to our transaction.

								Honza
Xiaoguang Wang Nov. 24, 2018, 3:40 a.m. | #4
> On Fri 23-11-18 10:45:20, Xiaoguang Wang wrote:
>> hi,
>>
>>> On Wed 14-11-18 19:49:35, Xiaoguang Wang wrote:
>>>> This issue was found when I tried to put checkpoint work in a separate thread,
>>>> the deadlock below happened:
>>>>            Thread1                                |   Thread2
>>>> __jbd2_log_wait_for_space                       |
>>>> jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
>>>>     if (jh->b_transaction != NULL)                |
>>>>       ...                                         |
>>>>       jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
>>>>                                                   |  will lock j_checkpoint_mutex,
>>>>                                                   |  but will be blocked here.
>>>>                                                   |
>>>>       jbd2_log_wait_commit(journal, tid);         |
>>>>       wait_event(journal->j_wait_done_commit,     |
>>>>        !tid_gt(tid, journal->j_commit_sequence)); |
>>>>        ...                                        |wake_up(j_wait_done_commit)
>>>>     }                                             |
>>>>
>>>> then deadlock occurs, Thread1 will never be waken up.
>>>>
>>>> To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
>>>> when we are going to wait for transaction commit.
>>>>
>>>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
>>>
>>> Thanks for the patch! One comment below...
>>>
>>>> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
>>>> index 26f8d7e46462..e728844f2f0e 100644
>>>> --- a/fs/jbd2/checkpoint.c
>>>> +++ b/fs/jbd2/checkpoint.c
>>>> @@ -113,7 +113,7 @@ void __jbd2_log_wait_for_space(journal_t *journal)
>>>>    	nblocks = jbd2_space_needed(journal);
>>>>    	while (jbd2_log_space_left(journal) < nblocks) {
>>>>    		write_unlock(&journal->j_state_lock);
>>>> -		mutex_lock(&journal->j_checkpoint_mutex);
>>>> +		mutex_lock_io(&journal->j_checkpoint_mutex);
>>>>    		/*
>>>>    		 * Test again, another process may have checkpointed while we
>>>> @@ -241,8 +241,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>>>>    	 * done (maybe it's a new transaction, but it fell at the same
>>>>    	 * address).
>>>>    	 */
>>>> -	if (journal->j_checkpoint_transactions != transaction ||
>>>> -	    transaction->t_tid != this_tid)
>>>> +	if (journal->j_checkpoint_transactions == NULL ||
>>>> +	    journal->j_checkpoint_transactions->t_tid != this_tid)
>>>>    		goto out;
>>>
>>> Why did you change this? As far as I can tell there's no difference and the
>>> previous condition makes it more obvious that we are still looking at the
>>> same transaction.
>> In this patch, we may drop j_checkpoint_mutex, then another thread may acquire
>> this lock, do checkpoint work and freed current transaction, "transaction->t_tid"
>> will cause an invalid pointer dereference.
> 
> That is exactly the reason why we check:
> 
> if (journal->j_checkpoint_transactions != transaction || ...
> 
> So if this test is false and so transaction->t_tid != this_tid gets
> evaluated we are sure that j_checkpoint_transactions actually still points
> to our transaction.
I just realize that "journal->j_checkpoint_transactions != transaction" returns false, we
can make sure that transaction is valid, thanks. I'll send a patch v2 soon.

Regards,
Xiaoguang Wang
> 
> 								Honza
>

Patch

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 26f8d7e46462..e728844f2f0e 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -113,7 +113,7 @@  void __jbd2_log_wait_for_space(journal_t *journal)
 	nblocks = jbd2_space_needed(journal);
 	while (jbd2_log_space_left(journal) < nblocks) {
 		write_unlock(&journal->j_state_lock);
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 
 		/*
 		 * Test again, another process may have checkpointed while we
@@ -241,8 +241,8 @@  int jbd2_log_do_checkpoint(journal_t *journal)
 	 * done (maybe it's a new transaction, but it fell at the same
 	 * address).
 	 */
-	if (journal->j_checkpoint_transactions != transaction ||
-	    transaction->t_tid != this_tid)
+	if (journal->j_checkpoint_transactions == NULL ||
+	    journal->j_checkpoint_transactions->t_tid != this_tid)
 		goto out;
 
 	/* checkpoint all of the transaction's buffers */
@@ -276,9 +276,22 @@  int jbd2_log_do_checkpoint(journal_t *journal)
 		"JBD2: %s: Waiting for Godot: block %llu\n",
 		journal->j_devname, (unsigned long long) bh->b_blocknr);
 
+			if (batch_count)
+				__flush_batch(journal, &batch_count);
 			jbd2_log_start_commit(journal, tid);
+			/*
+			 * jbd2_journal_commit_transaction() may want
+			 * to take the checkpoint_mutex if JBD2_FLUSHED
+			 * is set, jbd2_update_log_tail() called by
+			 * jbd2_journal_commit_transaction() may also take
+			 * checkpoint_mutex.  So we need to temporarily
+			 * drop it.
+			 */
+			mutex_unlock(&journal->j_checkpoint_mutex);
 			jbd2_log_wait_commit(journal, tid);
-			goto retry;
+			mutex_lock_io(&journal->j_checkpoint_mutex);
+			spin_lock(&journal->j_list_lock);
+			goto restart;
 		}
 		if (!buffer_dirty(bh)) {
 			if (unlikely(buffer_write_io_error(bh)) && !result)
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 8ef6b6daaa7a..88d8f22d2cba 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2067,7 +2067,7 @@  int jbd2_journal_wipe(journal_t *journal, int write)
 	err = jbd2_journal_skip_recovery(journal);
 	if (write) {
 		/* Lock to make assertions happy... */
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 		jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 	}