diff mbox

[BUG,1/4] ext3: fix a cause of __schedule_bug via blkdev_releasepage

Message ID 20081208140138.GA17700@mit.edu
State Superseded, archived
Headers show

Commit Message

Theodore Ts'o Dec. 8, 2008, 2:01 p.m. UTC
Toshiyuki-san,

My apologies for not having a chance to review your patches; things
have been rather busy for me.  There were a couple of shortcomings in
your patch series, and I think there is a better way of solving the
issue at hand.  First of all, patches #1 and #2 use a new function
which is not actually defined until patches #3 and #4, respectively.
This can make it difficult for people who are trying to use "git
bisect" to try to localize the root cause of a problem.

Secondly, the introduction of a large number of wrapper functions
increases stack utilization, and makes the call depth deeper (and in
the long run, each increase in call depth makes the code that much
harder to trace through and understand), and so it should be done only
as last resort.  Fortunately, there is a simpler way of fixing this
problem.  I include the inter-diff below, but I will fold this into
the current blkdev_releasepage() patches that are in the ext4 patch
queue.

Best regards,

					- Ted

P.S.  Note that this patch is functionally identical to what you
proposed in your patch series, but since the gfp_wait mask already
controlls whether or not log_wait_commit() is called, instead of
introducing a new functional parameter, we just mask off __GFP_WAIT
before calling jbd2_journal_try_to_free_buffers.  I'll make a similar
change to the ext3 patch, and attach the two revised patches to this
mail thread.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Toshiyuki Okajima Dec. 12, 2008, 12:54 a.m. UTC | #1
Ted-san,
Thank you for your reviewing.

 > Toshiyuki-san,
 >
 > My apologies for not having a chance to review your patches; things
 > have been rather busy for me.  There were a couple of shortcomings in
 > your patch series, and I think there is a better way of solving the
 > issue at hand.  First of all, patches #1 and #2 use a new function
 > which is not actually defined until patches #3 and #4, respectively.
 > This can make it difficult for people who are trying to use "git
 > bisect" to try to localize the root cause of a problem.
 >
 > Secondly, the introduction of a large number of wrapper functions
 > increases stack utilization, and makes the call depth deeper (and in
 > the long run, each increase in call depth makes the code that much
 > harder to trace through and understand), and so it should be done only
 > as last resort.  Fortunately, there is a simpler way of fixing this
 > problem.  I include the inter-diff below, but I will fold this into
 > the current blkdev_releasepage() patches that are in the ext4 patch
 > queue.
 >
 > Best regards,
 >
 > 					- Ted
 >
 > P.S.  Note that this patch is functionally identical to what you
 > proposed in your patch series, but since the gfp_wait mask already
 > controlls whether or not log_wait_commit() is called, instead of
 > introducing a new functional parameter, we just mask off __GFP_WAIT
 > before calling jbd2_journal_try_to_free_buffers.  I'll make a similar
 > change to the ext3 patch, and attach the two revised patches to this
 > mail thread.

Through the idea as follows,  I agree to your change.

-----------------------------------------------------------------------------
To tell the truth, at first, I imagined the same patch as yours to fix this
problem. But I have made another patch because I thought that ext3(or ext4)
should not know the contents of the processing of journal_try_to_free_buffers
  in detail. (ext3 should not know there is a possibility to call
journal_wait_for_transaction_sync_data from journal_try_to_free_buffers.)
So, I have made a new function, journal_try_to_free_metadata_buffers
to release only metadata buffer_heads.
(I wanted a function which is the almost same as journal_try_to_free_buffers
except calling journal_wait_for_transaction_sync_data from it.)
However, this new function needed big changes, you know.

I reconsidered what was the most suitable patch to fix this problem
after I read your mail (patch).
And then, I thought that it was important to make the most concise patch
to fix only a root cause. Big patch is not easy to understand even if it is
more logical one.

Therefore, there is the fact that ext3_release_metadata must not sleep because
it can get a spinlock, and then, only changing ext3_release_metadata to the
logic to make it not sleep is the simplest fix for this problem.
-----------------------------------------------------------------------------

Best Regards,
Toshiyuki Okajima

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Dec. 12, 2008, 6:21 a.m. UTC | #2
On Fri, Dec 12, 2008 at 09:54:18AM +0900, Toshiyuki Okajima wrote:
> To tell the truth, at first, I imagined the same patch as yours to fix this
> problem. But I have made another patch because I thought that ext3(or ext4)
> should not know the contents of the processing of journal_try_to_free_buffers
>  in detail. (ext3 should not know there is a possibility to call
> journal_wait_for_transaction_sync_data from journal_try_to_free_buffers.)

I agree, but ext3 doesn't need to know that.  What my change did was
to mask off the _GFP_WAIT flag, which prohibits the function it calls
from blocking, because it knows that its caller is holding a spinlock.

And actually, come to think of it.  We can do even better; the right
fix is to have blkdev_releasepage() mask off the _GFP_WAIT flag; this
is the function which is taking spinlock, and by masking off the
__GFP_WAIT flag, this is simply requesting all of the downstream
functions not to block, but to do the best job they can do without
blocking.  It doesn't need to know whether it's going to call
log_wait_commit(), or anything else; all it needs to do is request
"please don't block".

That means we only make the request once, in the function which is
taking spinlock, so all of the per-filesystem implementations of
release_metadata() don't need to know that its caller is holding a
spinlock. 

							- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshiyuki Okajima Dec. 15, 2008, 2:21 a.m. UTC | #3
Ted-san,

Theodore Tso wrote:
 > On Fri, Dec 12, 2008 at 09:54:18AM +0900, Toshiyuki Okajima wrote:
 > > > To tell the truth, at first, I imagined the same patch as yours to fix this
 > > > problem. But I have made another patch because I thought that ext3(or ext4)
 > > > should not know the contents of the processing of journal_try_to_free_buffers
 > > >  in detail. (ext3 should not know there is a possibility to call
 > > > journal_wait_for_transaction_sync_data from journal_try_to_free_buffers.)
 >
 > I agree, but ext3 doesn't need to know that.  What my change did was
 > to mask off the _GFP_WAIT flag, which prohibits the function it calls
 > from blocking, because it knows that its caller is holding a spinlock.
 >
 > And actually, come to think of it.  We can do even better; the right
 > fix is to have blkdev_releasepage() mask off the _GFP_WAIT flag; this
 > is the function which is taking spinlock, and by masking off the
 > __GFP_WAIT flag, this is simply requesting all of the downstream
 > functions not to block, but to do the best job they can do without
 > blocking.  It doesn't need to know whether it's going to call
 > log_wait_commit(), or anything else; all it needs to do is request
 > "please don't block".

 > That means we only make the request once, in the function which is
 > taking spinlock, so all of the per-filesystem implementations of
 > release_metadata() don't need to know that its caller is holding a
 > spinlock.

OK. I agree your last change.

I also think blkdev_releasepage must do something so that downstream
functions of it don't sleep. Masking off the _GFP_WAIT flag is the
easiest achievement of it.
Besides, I think it is not valid implementation that brings us a care
about ei->client_releasepage's sleeping.

Additional Information:
I did an easy test with your last change and then I haven't experienced
any errors.

Regards,
Toshiyuki Okajima

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e77a059..543f3d0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3049,6 +3049,12 @@  static int ext4_releasepage(struct page *page, gfp_t wait)
  * mapped via the block device.  Since these pages could have journal heads
  * which would prevent try_to_free_buffers() from freeing them, we must use
  * jbd2 layer's try_to_free_buffers() function to release them.
+ *
+ * Note: we have to strip the __GFP_WAIT flag before calling
+ * jbd2_journal_try_to_free_buffers because blkdev_releasepage is
+ * called while holding a spinlock (bdev_inode.client_lock).
+ * Fortunately the metadata buffers we are interested are freed right
+ * away and do not require calling journal_wait_for_transaction_sync_data().
  */
 int ext4_release_metadata(void *client, struct page *page, gfp_t wait)
 {
@@ -3061,7 +3067,8 @@  int ext4_release_metadata(void *client, struct page *page, gfp_t wait)
 	BUG_ON(EXT4_SB(sb) == NULL);
 	journal = EXT4_SB(sb)->s_journal;
 	if (journal != NULL)
-		return jbd2_journal_try_to_free_buffers(journal, page, wait);
+		return jbd2_journal_try_to_free_buffers(journal, page,
+							wait & ~__GFP_WAIT);
 	else
 		return try_to_free_buffers(page);
 }