diff mbox

[-V2] ext4: Drop mapped buffer_head check during page_mkwrite

Message ID 1251264196-31382-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
State Accepted, archived
Headers show

Commit Message

Aneesh Kumar K.V Aug. 26, 2009, 5:23 a.m. UTC
Inorder to check whether the buffer_heads are mapped we need
to hold page lock. Otherwise a reclaim can cleanup the attached
buffer_heads. Instead of taking page lock and check whether
buffer_heads are mapped we let the write_begin/write_end callback
does the equivalent. It does have a performance impact in that we
are doing more work if we the buffer_heads are already mapped.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/inode.c |   11 -----------
 1 files changed, 0 insertions(+), 11 deletions(-)

Comments

Theodore Ts'o Aug. 29, 2009, 2:26 a.m. UTC | #1
On Wed, Aug 26, 2009 at 10:53:16AM +0530, Aneesh Kumar K.V wrote:
> Inorder to check whether the buffer_heads are mapped we need
> to hold page lock. Otherwise a reclaim can cleanup the attached
> buffer_heads. Instead of taking page lock and check whether
> buffer_heads are mapped we let the write_begin/write_end callback
> does the equivalent. It does have a performance impact in that we
> are doing more work if we the buffer_heads are already mapped.

So I started looking at all of the work that we need to do in
write_begin/write_end; did you check both write paths depending on
whether we are using delayed allocation or not?  It would seem to me
that it might be better to use lock_page() and unlock_page() around
the check, since in many work loads the buffer heads will already be
mapped often, and it appears to me that write_begin() will end up
locking the page anyway.

Am I missing something?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Aug. 31, 2009, 6:30 a.m. UTC | #2
On Fri, Aug 28, 2009 at 10:26:56PM -0400, Theodore Tso wrote:
> On Wed, Aug 26, 2009 at 10:53:16AM +0530, Aneesh Kumar K.V wrote:
> > Inorder to check whether the buffer_heads are mapped we need
> > to hold page lock. Otherwise a reclaim can cleanup the attached
> > buffer_heads. Instead of taking page lock and check whether
> > buffer_heads are mapped we let the write_begin/write_end callback
> > does the equivalent. It does have a performance impact in that we
> > are doing more work if we the buffer_heads are already mapped.
> 
> So I started looking at all of the work that we need to do in
> write_begin/write_end; did you check both write paths depending on
> whether we are using delayed allocation or not?  It would seem to me
> that it might be better to use lock_page() and unlock_page() around
> the check, since in many work loads the buffer heads will already be
> mapped often, and it appears to me that write_begin() will end up
> locking the page anyway.
> 
> Am I missing something?
> 

Below are the possibilities i looked at

a) mmap with no parallel write to the same offset. That would mean
we don't have attached buffer heads because nobody attach buffer
heads to the page.

b) mmap happening to the hole. The buffer heads are not mapped.

c) mmap with parallel write to the same offset. The parallel write
did attach mapped buffer heads to the same page. So we should find
all buffer heads mapped in the above case. 

if we will find buffer heads already be mapped in many workloads then
i guess it make sense to add page lock. It will also avoid the
journal_start that we do in write_begin. I will redo the patch

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Aug. 31, 2009, 12:24 p.m. UTC | #3
On Mon, Aug 31, 2009 at 12:00:06PM +0530, Aneesh Kumar K.V wrote:
> Below are the possibilities i looked at
> 
> a) mmap with no parallel write to the same offset. That would mean
> we don't have attached buffer heads because nobody attach buffer
> heads to the page.
> 
> b) mmap happening to the hole. The buffer heads are not mapped.
> 
> c) mmap with parallel write to the same offset. The parallel write
> did attach mapped buffer heads to the same page. So we should find
> all buffer heads mapped in the above case. 
> 
> if we will find buffer heads already be mapped in many workloads then
> i guess it make sense to add page lock. It will also avoid the
> journal_start that we do in write_begin. I will redo the patch

The usage case I was worried about is the one where we are mmap'ing an
existing file (say, like an Oracle or DB2 table space, or a berkdb
database file), and we are writing into already allocated blocks.  In
that case (which does use these code paths, right?) the second time we
write a particular page, the buffer heads will already be mapped.

For database applications where we aren't loading a table, but just
making changes to an already instantiated table, the buffer heads
would be mapped most of the time, would they not?

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Aug. 31, 2009, 12:33 p.m. UTC | #4
On Mon, Aug 31, 2009 at 08:24:48AM -0400, Theodore Tso wrote:
> On Mon, Aug 31, 2009 at 12:00:06PM +0530, Aneesh Kumar K.V wrote:
> > Below are the possibilities i looked at
> > 
> > a) mmap with no parallel write to the same offset. That would mean
> > we don't have attached buffer heads because nobody attach buffer
> > heads to the page.
> > 
> > b) mmap happening to the hole. The buffer heads are not mapped.
> > 
> > c) mmap with parallel write to the same offset. The parallel write
> > did attach mapped buffer heads to the same page. So we should find
> > all buffer heads mapped in the above case. 
> > 
> > if we will find buffer heads already be mapped in many workloads then
> > i guess it make sense to add page lock. It will also avoid the
> > journal_start that we do in write_begin. I will redo the patch
> 
> The usage case I was worried about is the one where we are mmap'ing an
> existing file (say, like an Oracle or DB2 table space, or a berkdb
> database file), and we are writing into already allocated blocks.  In
> that case (which does use these code paths, right?) the second time we
> write a particular page, the buffer heads will already be mapped.

If the database is not being updated via a write(2), then even though
the blocks are already allocated, we won't find buffer_heads attached to the page.

ie, page_buffers(page) will be NULL

The page_mkwrite -> write_begin  path would be allocating the buffer_heads
and attaching them to the page. So even in the above case we will be
doing write_begin -> write_end. That is, it is similar to the (a) i wrote
above.


> 
> For database applications where we aren't loading a table, but just
> making changes to an already instantiated table, the buffer heads
> would be mapped most of the time, would they not?
> 
> 					- Ted

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Aug. 31, 2009, 12:50 p.m. UTC | #5
On Mon, Aug 31, 2009 at 06:03:14PM +0530, Aneesh Kumar K.V wrote:
> If the database is not being updated via a write(2), then even though
> the blocks are already allocated, we won't find buffer_heads attached to the page.
> 
> ie, page_buffers(page) will be NULL
> 
> The page_mkwrite -> write_begin  path would be allocating the buffer_heads
> and attaching them to the page. So even in the above case we will be
> doing write_begin -> write_end. That is, it is similar to the (a) i wrote
> above.

What about the case where they are being updated via llseek(2) and
write(2)?  I'll grant that isn't as common these days (dbm used to do
it, but these days most people use berk_db, which does use mmap), but
it's not a totally unknown thing to do.  Certainly any of the
e2fsprogs tools operating on a filesystem-image-in-a-file (which isn't
that uncommon if you are using KVM or some other virtualization
situation) uses llseek(2) and write(2).  I'd have to check to see
whether KVM/qemu is using mmap(2) or write(2).

If we think when we update-in-place already allocated blocks, it's
much more common to use mmap(2) than lseek(2)/write(2), then I can see
how avoiding taking a page_lock in ext4_page_mkwrite() might be the
right choice.  On the other hand, if write(2) is more common, we'll be
starting and stopping a transaction handle, and going through a *much*
more complicated code path.

The other question I have then is that there are multiple
write_begin/write_end functions that could be used, if we are going to
be dropping this check in ext4_page_mkwrite() and depending in
write_begin/write_end to do the right thing.  (ext4_write_begin,
ext4_da_write_begin, ext4_ordered_write_end,
ext4_journalled_write_end, ext4_writeback_write_end).  You did check
all of the possible code path combinations, to make sure they will do
the right thing?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Aug. 31, 2009, 5:06 p.m. UTC | #6
On Mon, Aug 31, 2009 at 08:50:25AM -0400, Theodore Tso wrote:
> On Mon, Aug 31, 2009 at 06:03:14PM +0530, Aneesh Kumar K.V wrote:
> > If the database is not being updated via a write(2), then even though
> > the blocks are already allocated, we won't find buffer_heads attached to the page.
> > 
> > ie, page_buffers(page) will be NULL
> > 
> > The page_mkwrite -> write_begin  path would be allocating the buffer_heads
> > and attaching them to the page. So even in the above case we will be
> > doing write_begin -> write_end. That is, it is similar to the (a) i wrote
> > above.
> 
> What about the case where they are being updated via llseek(2) and
> write(2)?  I'll grant that isn't as common these days (dbm used to do
> it, but these days most people use berk_db, which does use mmap), but
> it's not a totally unknown thing to do.  Certainly any of the
> e2fsprogs tools operating on a filesystem-image-in-a-file (which isn't
> that uncommon if you are using KVM or some other virtualization
> situation) uses llseek(2) and write(2).  I'd have to check to see
> whether KVM/qemu is using mmap(2) or write(2).
> 
> If we think when we update-in-place already allocated blocks, it's
> much more common to use mmap(2) than lseek(2)/write(2), then I can see
> how avoiding taking a page_lock in ext4_page_mkwrite() might be the
> right choice.  On the other hand, if write(2) is more common, we'll be
> starting and stopping a transaction handle, and going through a *much*
> more complicated code path.
> 
> The other question I have then is that there are multiple
> write_begin/write_end functions that could be used, if we are going to
> be dropping this check in ext4_page_mkwrite() and depending in
> write_begin/write_end to do the right thing.  (ext4_write_begin,
> ext4_da_write_begin, ext4_ordered_write_end,
> ext4_journalled_write_end, ext4_writeback_write_end).  You did check
> all of the possible code path combinations, to make sure they will do
> the right thing?

Both ext4_write_begin and ext4_da_write_begin use block_write_begin
which calls __block_prepare_write which looks at the mapped flag of the
buffer_head and call get_block if not mapped. Delayed alloc get_block
does block reservation and returns a mapped buffer_head and non delayed
alloc get_block does block allocation and returns a mapped buffer_head.
So in both the case i guess we are ok

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Sept. 6, 2009, 3:49 a.m. UTC | #7
OK, I've added this patch to the ext4 patch queue.

    	       	    	     	 - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Sept. 7, 2009, 12:22 p.m. UTC | #8
On Sat, Sep 05, 2009 at 11:49:51PM -0400, Theodore Tso wrote:
> OK, I've added this patch to the ext4 patch queue.
> 


I have sent a v3 version of this patch. You may want to take that

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9c642b..ad99286 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5245,11 +5245,6 @@  int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	return err;
 }
 
-static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
-{
-	return !buffer_mapped(bh);
-}
-
 int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct page *page = vmf->page;
@@ -5281,12 +5276,6 @@  int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	else
 		len = PAGE_CACHE_SIZE;
 
-	if (page_has_buffers(page)) {
-		/* return if we have all the buffers mapped */
-		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
-				       ext4_bh_unmapped))
-			goto out_unlock;
-	}
 	/*
 	 * OK, we need to fill the hole... Do write_begin write_end
 	 * to do block allocation/reservation.We are not holding