Patchwork [1/2] ext4: let mpage_submit_io works well when blocksize < pagesize

login
register
mail settings
Submitter Yongqiang Yang
Date Nov. 23, 2011, 9:15 a.m.
Message ID <1322039726-3587-1-git-send-email-xiaoqiangnk@gmail.com>
Download mbox | patch
Permalink /patch/127279/
State Accepted
Headers show

Comments

Yongqiang Yang - Nov. 23, 2011, 9:15 a.m.
If there is a unwritten but clean buffer in a page and there is a dirty buffer
after the buffer, then mpage_submit_io does not write the dirty buffer out.
As a result, da_writepages loops forever.

This patch fixes the problem by checking dirty flag.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
 fs/ext4/inode.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)
Tao Ma - Nov. 26, 2011, 3:08 p.m.
Hi Yongqiang,
On 11/23/2011 05:15 PM, Yongqiang Yang wrote:
> If there is a unwritten but clean buffer in a page and there is a dirty buffer
> after the buffer, then mpage_submit_io does not write the dirty buffer out.
> As a result, da_writepages loops forever.
Did you ever meet with this bug or just find it to be a possible bug by
skimming the code? Actually, I can't find a proper way to get a buffer
which can satisfy the check.

Thanks
Tao
> 
> This patch fixes the problem by checking dirty flag.
> 
> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> ---
>  fs/ext4/inode.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 755f6c7..20a1d17 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1339,8 +1339,11 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
>  					clear_buffer_unwritten(bh);
>  				}
>  
> -				/* skip page if block allocation undone */
> -				if (buffer_delay(bh) || buffer_unwritten(bh))
> +				/*
> +				 * skip page if block allocation undone and
> +				 * block is dirty
> +				 */
> +				if (ext4_bh_delay_or_unwritten(NULL, bh))
>  					skip_page = 1;
>  				bh = bh->b_this_page;
>  				block_start += bh->b_size;


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma - Nov. 26, 2011, 3:22 p.m.
On 11/26/2011 11:13 PM, Yongqiang Yang wrote:
> Hi,
> 
> It can be tested by xfstests 61 or 91, I did not remember which one.
ok, I will try it later. But I still wonder how it happens. See my
comments below.
> 
> It can be reproduced by reading on a fallocted block and write the block
> after the fallocted block.  Then the written block can not be written
> out by da_writepages.
Why? in read no bh will be created, and in write only the written bh
will be mapped and set unwritten. How could that happen? Sorry, but this
explanation doesn't convince me.
> 
> Yongqiang.
> 
> On Saturday, November 26, 2011, Tao Ma <tm@tao.ma <mailto:tm@tao.ma>> wrote:
>> Hi Yongqiang,
>> On 11/23/2011 05:15 PM, Yongqiang Yang wrote:
>>> If there is a unwritten but clean buffer in a page and there is a
> dirty buffer
>>> after the buffer, then mpage_submit_io does not write the dirty
> buffer out.
>>> As a result, da_writepages loops forever.
>> Did you ever meet with this bug or just find it to be a possible bug by
>> skimming the code? Actually, I can't find a proper way to get a buffer
>> which can satisfy the check.
>>
>> Thanks
>> Tao
>>>
>>> This patch fixes the problem by checking dirty flag.
>>>
>>> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com
> <mailto:xiaoqiangnk@gmail.com>>
>>> ---
>>>  fs/ext4/inode.c |    7 +++++--
>>>  1 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>> index 755f6c7..20a1d17 100644
>>> --- a/fs/ext4/inode.c
>>> +++ b/fs/ext4/inode.c
>>> @@ -1339,8 +1339,11 @@ static int mpage_da_submit_io(struct
> mpage_da_data *mpd,
>>>                                       clear_buffer_unwritten(bh);
>>>                               }
>>>
>>> -                             /* skip page if block allocation undone */
>>> -                             if (buffer_delay(bh) ||
> buffer_unwritten(bh))
>>> +                             /*
>>> +                              * skip page if block allocation undone and
>>> +                              * block is dirty
>>> +                              */
>>> +                             if (ext4_bh_delay_or_unwritten(NULL, bh))
>>>                                       skip_page = 1;
>>>                               bh = bh->b_this_page;
>>>                               block_start += bh->b_size;
>>
>>
>>
> 
> -- 
> Best Wishes
> Yongqiang Yang
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma - Nov. 27, 2011, 8:59 a.m.
On 11/27/2011 09:29 AM, Yongqiang Yang wrote:
> 
> 
> On Saturday, November 26, 2011, Tao Ma <tm@tao.ma <mailto:tm@tao.ma>> wrote:
>> On 11/26/2011 11:13 PM, Yongqiang Yang wrote:
>>> Hi,
>>>
>>> It can be tested by xfstests 61 or 91, I did not remember which one.
>> ok, I will try it later. But I still wonder how it happens. See my
>> comments below.
>>>
>>> It can be reproduced by reading on a fallocted block and write the block
>>> after the fallocted block.  Then the written block can not be written
>>> out by da_writepages.
>> Why? in read no bh will be created, and in write only the written bh
>> will be mapped and set unwritten. How could that happen? Sorry, but this
>> explanation doesn't convince me.
> Ok!  The bug was found with punch hole enabled, and punch hole creates
> empty buffers sometimes.
> Actually, punching hole can read the whole page like read operation
> does, but it do not.  May be this is the source of the  problem.
> The problemastic  function is discard_partial_buffers.   I post a patch
> fixing another bug in it in this series.  If we read whole page in this
> function, the problem can also be solved.
Got it and thanks for the explanation. So we introduced a bug and then
fix it in another way. ;)

Thanks
Tao
> 
> Yongqiang.
>>>
>>> Yongqiang.
>>>
>>> On Saturday, November 26, 2011, Tao Ma <tm@tao.ma <mailto:tm@tao.ma>
> <mailto:tm@tao.ma <mailto:tm@tao.ma>>> wrote:
>>>> Hi Yongqiang,
>>>> On 11/23/2011 05:15 PM, Yongqiang Yang wrote:
>>>>> If there is a unwritten but clean buffer in a page and there is a
>>> dirty buffer
>>>>> after the buffer, then mpage_submit_io does not write the dirty
>>> buffer out.
>>>>> As a result, da_writepages loops forever.
>>>> Did you ever meet with this bug or just find it to be a possible bug by
>>>> skimming the code? Actually, I can't find a proper way to get a buffer
>>>> which can satisfy the check.
>>>>
>>>> Thanks
>>>> Tao
>>>>>
>>>>> This patch fixes the problem by checking dirty flag.
>>>>>
>>>>> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com
> <mailto:xiaoqiangnk@gmail.com>
>>> <mailto:xiaoqiangnk@gmail.com <mailto:xiaoqiangnk@gmail.com>>>
>>>>> ---
>>>>>  fs/ext4/inode.c |    7 +++++--
>>>>>  1 files changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>>>> index 755f6c7..20a1d17 100644
>>>>> --- a/fs/ext4/inode.c
>>>>> +++ b/fs/ext4/inode.c
>>>>> @@ -1339,8 +1339,11 @@ static int mpage_da_submit_io(struct
>>> mpage_da_data *mpd,
>>>>>                                       clear_buffer_unwritten(bh);
>>>>>                               }
>>>>>
>>>>> -                             /* skip page if block allocation
> undone */
>>>>> -                             if (buffer_delay(bh) ||
>>> buffer_unwritten(bh))
>>>>> +                             /*
>>>>> +                              * skip page if block allocation
> undone and
>>>>> +                              * block is dirty
>>>>> +                              */
>>>>> +                             if (ext4_bh_delay_or_unwritten(NULL, bh))
>>>>>                                       skip_page = 1;
>>>>>                               bh = bh->b_this_page;
>>>>>                               block_start += bh->b_size;
>>>>
>>>>
>>>>
>>>
>>> --
>>> Best Wishes
>>> Yongqiang Yang
>>>
>>
>>
> 
> -- 
> Best Wishes
> Yongqiang Yang
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Allison Henderson - Dec. 1, 2011, 8:13 p.m.
On 11/23/2011 02:15 AM, Yongqiang Yang wrote:
> If there is a unwritten but clean buffer in a page and there is a dirty buffer
> after the buffer, then mpage_submit_io does not write the dirty buffer out.
> As a result, da_writepages loops forever.
>
> This patch fixes the problem by checking dirty flag.
>
> Signed-off-by: Yongqiang Yang<xiaoqiangnk@gmail.com>
> ---
>   fs/ext4/inode.c |    7 +++++--
>   1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 755f6c7..20a1d17 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1339,8 +1339,11 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
>   					clear_buffer_unwritten(bh);
>   				}
>
> -				/* skip page if block allocation undone */
> -				if (buffer_delay(bh) || buffer_unwritten(bh))
> +				/*
> +				 * skip page if block allocation undone and
> +				 * block is dirty
> +				 */
> +				if (ext4_bh_delay_or_unwritten(NULL, bh))
>   					skip_page = 1;
>   				bh = bh->b_this_page;
>   				block_start += bh->b_size;

Hi Yongqiang,

Thank you for looking into the punch hole code, I know there's been some 
recent bugs reported, so I am looking at it too.  I've applied your 
patch and ran it through an fsx stress test, and I notice there are some 
failures, but it appears to run longer with the patch then with out it, 
so it may not be the cause of the errors I'm seeing. I think maybe 
something else may have happened between now and the last time it made 
it through 24hr of fsx (at least for me :) ), so I'm continuing to look 
through the recent code changes.  I will keep folks posted on my 
findings.  Thx!

Allison Henderson

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yongqiang Yang - Dec. 2, 2011, 1:15 a.m.
On Fri, Dec 2, 2011 at 4:13 AM, Allison Henderson
<achender@linux.vnet.ibm.com> wrote:
> On 11/23/2011 02:15 AM, Yongqiang Yang wrote:
>>
>> If there is a unwritten but clean buffer in a page and there is a dirty
>> buffer
>> after the buffer, then mpage_submit_io does not write the dirty buffer
>> out.
>> As a result, da_writepages loops forever.
>>
>> This patch fixes the problem by checking dirty flag.
>>
>> Signed-off-by: Yongqiang Yang<xiaoqiangnk@gmail.com>
>> ---
>>  fs/ext4/inode.c |    7 +++++--
>>  1 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 755f6c7..20a1d17 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1339,8 +1339,11 @@ static int mpage_da_submit_io(struct mpage_da_data
>> *mpd,
>>                                        clear_buffer_unwritten(bh);
>>                                }
>>
>> -                               /* skip page if block allocation undone */
>> -                               if (buffer_delay(bh) ||
>> buffer_unwritten(bh))
>> +                               /*
>> +                                * skip page if block allocation undone
>> and
>> +                                * block is dirty
>> +                                */
>> +                               if (ext4_bh_delay_or_unwritten(NULL, bh))
>>                                        skip_page = 1;
>>                                bh = bh->b_this_page;
>>                                block_start += bh->b_size;
>
> Hi Yongqiang,
>
> Thank you for looking into the punch hole code, I know there's been some
> recent bugs reported, so I am looking at it too.  I've applied your patch
> and ran it through an fsx stress test, and I notice there are some failures,
> but it appears to run longer with the patch then with out it, so it may not
> be the cause of the errors I'm seeing. I think maybe something else may have
> happened between now and the last time it made it through 24hr of fsx (at
> least for me :) ), so I'm continuing to look through the recent code
On the other hand, xfstests have a lot of changes since your last
test.   I am not sure if original xfstests did not discover some
errors.
> changes.  I will keep folks posted on my findings.  Thx!
Did you test it by multi-thread tests or single thread tests?  If
multi-thread, I suggest that we hold the i_mutex in punching hole.


Yongqiang.
>
> Allison Henderson
>
>
Allison Henderson - Dec. 2, 2011, 6:46 a.m.
On 12/01/2011 06:15 PM, Yongqiang Yang wrote:
> On Fri, Dec 2, 2011 at 4:13 AM, Allison Henderson
> <achender@linux.vnet.ibm.com>  wrote:
>> On 11/23/2011 02:15 AM, Yongqiang Yang wrote:
>>>
>>> If there is a unwritten but clean buffer in a page and there is a dirty
>>> buffer
>>> after the buffer, then mpage_submit_io does not write the dirty buffer
>>> out.
>>> As a result, da_writepages loops forever.
>>>
>>> This patch fixes the problem by checking dirty flag.
>>>
>>> Signed-off-by: Yongqiang Yang<xiaoqiangnk@gmail.com>
>>> ---
>>>   fs/ext4/inode.c |    7 +++++--
>>>   1 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>> index 755f6c7..20a1d17 100644
>>> --- a/fs/ext4/inode.c
>>> +++ b/fs/ext4/inode.c
>>> @@ -1339,8 +1339,11 @@ static int mpage_da_submit_io(struct mpage_da_data
>>> *mpd,
>>>                                         clear_buffer_unwritten(bh);
>>>                                 }
>>>
>>> -                               /* skip page if block allocation undone */
>>> -                               if (buffer_delay(bh) ||
>>> buffer_unwritten(bh))
>>> +                               /*
>>> +                                * skip page if block allocation undone
>>> and
>>> +                                * block is dirty
>>> +                                */
>>> +                               if (ext4_bh_delay_or_unwritten(NULL, bh))
>>>                                         skip_page = 1;
>>>                                 bh = bh->b_this_page;
>>>                                 block_start += bh->b_size;
>>
>> Hi Yongqiang,
>>
>> Thank you for looking into the punch hole code, I know there's been some
>> recent bugs reported, so I am looking at it too.  I've applied your patch
>> and ran it through an fsx stress test, and I notice there are some failures,
>> but it appears to run longer with the patch then with out it, so it may not
>> be the cause of the errors I'm seeing. I think maybe something else may have
>> happened between now and the last time it made it through 24hr of fsx (at
>> least for me :) ), so I'm continuing to look through the recent code
> On the other hand, xfstests have a lot of changes since your last
> test.   I am not sure if original xfstests did not discover some
> errors.
>> changes.  I will keep folks posted on my findings.  Thx!
> Did you test it by multi-thread tests or single thread tests?  If
> multi-thread, I suggest that we hold the i_mutex in punching hole.

Alrighty, well the test Im using is just the fsx test (in xfstests under 
the ltp folder).  I will check and see if there's been any updates to it 
since then.  The command I usually use is just "./fsx -d -S 1 
/mnt/ext4MntPt/test" from the ltp folder.  I dont think it's 
multi-threaded, but the i_mutex lock is another work item on my plate 
that I haven't gotten to yet.  The reason we dont want to just lock 
i_mutex is because folks are trying to reduce the use of i_mutex in 
ext4.  So the plan is to implement extent locks to replace i_mutex all 
together.  That's another project, but I will do a trial run with 
i_mutex locked just to rule it out.

Allison Henderson

>
>
> Yongqiang.
>>
>> Allison Henderson
>>
>>
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Dec. 14, 2011, 3:05 a.m.
On Wed, Nov 23, 2011 at 05:15:25PM +0800, Yongqiang Yang wrote:
> If there is a unwritten but clean buffer in a page and there is a dirty buffer
> after the buffer, then mpage_submit_io does not write the dirty buffer out.
> As a result, da_writepages loops forever.
> 
> This patch fixes the problem by checking dirty flag.
> 
> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>

Thanks, applied.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 755f6c7..20a1d17 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1339,8 +1339,11 @@  static int mpage_da_submit_io(struct mpage_da_data *mpd,
 					clear_buffer_unwritten(bh);
 				}
 
-				/* skip page if block allocation undone */
-				if (buffer_delay(bh) || buffer_unwritten(bh))
+				/*
+				 * skip page if block allocation undone and
+				 * block is dirty
+				 */
+				if (ext4_bh_delay_or_unwritten(NULL, bh))
 					skip_page = 1;
 				bh = bh->b_this_page;
 				block_start += bh->b_size;