ext3: fix message in ext3_remount for rw-remount case

Submitter Toshiyuki Okajima
Date Aug. 1, 2011, 4:54 a.m.
Message ID <20110801135451.cb73c981.toshi.okajima@jp.fujitsu.com>
Permalink /patch/107679/
State Not Applicable

Comments

Toshiyuki Okajima - Aug. 1, 2011, 4:54 a.m.
If there are some inodes on the orphan list while a filesystem is mounted
read-only, we should recommend that people umount and then mount it when
they try to remount it read-write. But the current message/comment
recommends that they umount and then remount it.

ext3_remount:
	/*
	 * If we have an unprocessed orphan list hanging
	 * around from a previously readonly bdev mount,
	 * require a full umount/remount for now.
                          ^^^^^^^^^^^^^^
	 */
	if (es->s_last_orphan) {
		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
			"remount RDWR because of unprocessed "
			"orphan inode list.  Please "
			"umount/remount instead.\n",
                         ^^^^^^^^^^^^^^
			sb->s_id);

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
---
 fs/ext3/super.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
Jan Kara - Aug. 1, 2011, 8:45 a.m.
On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
> If there are some inodes on the orphan list while a filesystem is mounted
> read-only, we should recommend that people umount and then mount it when
> they try to remount it read-write. But the current message/comment
> recommends that they umount and then remount it.
> 
> ext3_remount:
> 	/*
> 	 * If we have an unprocessed orphan list hanging
> 	 * around from a previously readonly bdev mount,
> 	 * require a full umount/remount for now.
>                           ^^^^^^^^^^^^^^
> 	 */
> 	if (es->s_last_orphan) {
> 		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
> 			"remount RDWR because of unprocessed "
> 			"orphan inode list.  Please "
> 			"umount/remount instead.\n",
>                          ^^^^^^^^^^^^^^
> 			sb->s_id);
  OK, so how about using "umount & mount"? The '/' is what would confuse me
the most... BTW, I guess you didn't really see this message in practice, did
you?

									Honza
Toshiyuki Okajima - Aug. 1, 2011, 9:45 a.m.
Hi.

(2011/08/01 17:45), Jan Kara wrote:
> On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
>> If there are some inodes on the orphan list while a filesystem is mounted
>> read-only, we should recommend that people umount and then mount it when
>> they try to remount it read-write. But the current message/comment
>> recommends that they umount and then remount it.
>>
>> ext3_remount:
>> 	/*
>> 	 * If we have an unprocessed orphan list hanging
>> 	 * around from a previously readonly bdev mount,
>> 	 * require a full umount/remount for now.
>>                            ^^^^^^^^^^^^^^
>> 	 */
>> 	if (es->s_last_orphan) {
>> 		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
>> 			"remount RDWR because of unprocessed "
>> 			"orphan inode list.  Please "
>> 			"umount/remount instead.\n",
>>                           ^^^^^^^^^^^^^^
>> 			sb->s_id);

>    OK, so how about using "umount & mount"? The '/' is what would confuse me
OK. I will modify it as you suggested.

  umount/mount => umount & mount

> the most... BTW, I guess you didn't really see this message in practice, did
> you?
Actually, yes. I have seen this message in practice while the quotacheck
command was being executed repeatedly, once an hour.

Thanks,
Toshiyuki Okajima

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara - Aug. 1, 2011, 9:57 a.m.
On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote:
> (2011/08/01 17:45), Jan Kara wrote:
> >On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
> >>If there are some inodes on the orphan list while a filesystem is mounted
> >>read-only, we should recommend that people umount and then mount it when
> >>they try to remount it read-write. But the current message/comment
> >>recommends that they umount and then remount it.
> >>
> >>ext3_remount:
> >>	/*
> >>	 * If we have an unprocessed orphan list hanging
> >>	 * around from a previously readonly bdev mount,
> >>	 * require a full umount/remount for now.
> >>                           ^^^^^^^^^^^^^^
> >>	 */
> >>	if (es->s_last_orphan) {
> >>		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
> >>			"remount RDWR because of unprocessed "
> >>			"orphan inode list.  Please "
> >>			"umount/remount instead.\n",
> >>                          ^^^^^^^^^^^^^^
> >>			sb->s_id);
> 
> >   OK, so how about using "umount & mount"? The '/' is what would confuse me
> OK. I will modify it as you suggested.
> 
>  umount/mount => umount & mount
  Thanks.

> >the most... BTW, I guess you didn't really see this message in practice, did
> >you?
> Actually, yes. I have seen this message in practice while the quotacheck
> command was being executed repeatedly, once an hour.
  Interesting. Are you able to reproduce this? Quotacheck does a remount
read-only + a remount read-write, but you cannot really remount the
filesystem read-only while it has orphan inodes, so you should not see
those when you remount read-write again. Possibly there's a race between
remounting and unlinking...

									Honza
Toshiyuki Okajima - Aug. 2, 2011, 9:14 a.m.
Hi.

(2011/08/01 18:57), Jan Kara wrote:
> On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote:
>> (2011/08/01 17:45), Jan Kara wrote:
>>> On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
>>>> If there are some inodes on the orphan list while a filesystem is mounted
>>>> read-only, we should recommend that people umount and then mount it when
>>>> they try to remount it read-write. But the current message/comment
>>>> recommends that they umount and then remount it.
>>>>
>>>> ext3_remount:
>>>> 	/*
>>>> 	 * If we have an unprocessed orphan list hanging
>>>> 	 * around from a previously readonly bdev mount,
>>>> 	 * require a full umount/remount for now.
>>>>                            ^^^^^^^^^^^^^^
>>>> 	 */
>>>> 	if (es->s_last_orphan) {
>>>> 		printk(KERN_WARNING "EXT3-fs: %s: couldn't "
>>>> 			"remount RDWR because of unprocessed "
>>>> 			"orphan inode list.  Please "
>>>> 			"umount/remount instead.\n",
>>>>                           ^^^^^^^^^^^^^^
>>>> 			sb->s_id);
>>
>>>    OK, so how about using "umount & mount"? The '/' is what would confuse me
>> OK. I will modify it as you suggested.
>>
>>   umount/mount => umount & mount
>    Thanks.
>

>>> the most... BTW, I guess you didn't really see this message in practice, did
>>> you?
>> Actually, yes. I have seen this message in practice while the quotacheck
>> command was being executed repeatedly, once an hour.
>    Interesting. Are you able to reproduce this? Quotacheck does a remount
> read-only + a remount read-write, but you cannot really remount the
> filesystem read-only while it has orphan inodes, so you should not see
> those when you remount read-write again. Possibly there's a race between
> remounting and unlinking...
Yes, I can reproduce it. However, it is not reproduced frequently by the
original procedure (quotacheck once an hour), so I made a reproducer.

It is:
[go.sh]
#!/bin/sh

dd if=/dev/zero of=./img bs=1k count=1 seek=100k > /dev/null 2>&1
/sbin/mkfs.ext3 -Fq ./img
/sbin/tune2fs -c 0 ./img
mkdir -p mnt
LOOP=10000
for ((i=0; i<LOOP; i++));
do
        mount -o loop ./img ./mnt
        sh ./writer.sh ./mnt &
        PID=$!
        j=0
        while [ 1 ];
        do
                sleep 1
                # remount, alternating read-only and read-write
                if ((j%2 == 0));
                then
                        mount -o loop,remount,ro ./mnt
                else
                        mount -o loop,remount,rw ./mnt
                fi
                tail -n 1 /var/log/messages | grep "remount RDWR"
                if [ $? -eq 0 ];
                then
                        break
                fi
                j=$((j+1))
        done
        kill -9 $PID
        umount ./mnt
        /sbin/e2fsck -pf ./img
done
exit

[writer.sh]
#!/bin/sh

i=0
path=$1
num=0
stride=64
while [ 1 ];
do
         for ((j=0;j<stride;j++));
         do
                 num=$((i*stride + j))
                 dd if=/dev/zero of=${path}/file${num} bs=8k count=1 > /dev/null 2>&1 &
         done
         for ((j=0;j<stride;j++));
         do
                 num=$((i*stride + j))
                 rm -f ${path}/file${num} &
         done
         wait
         i=$((i+1))
done

# vi go.sh
# vi writer.sh
# sh go.sh
(It might be reproduced after a while...)

Thanks,
Toshiyuki Okajima

Toshiyuki Okajima - Aug. 3, 2011, 2:42 a.m.
Hi.

(2011/08/02 18:14), Toshiyuki Okajima wrote:
> Hi.
>
> (2011/08/01 18:57), Jan Kara wrote:
>> On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote:
>>> (2011/08/01 17:45), Jan Kara wrote:
>>>> On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
>>>>> If there are some inodes on the orphan list while a filesystem is mounted
>>>>> read-only, we should recommend that people umount and then mount it when
>>>>> they try to remount it read-write. But the current message/comment
>>>>> recommends that they umount and then remount it.
<SNIP>
>>>> the most... BTW, I guess you didn't really see this message in practice, did
>>>> you?
>>> Actually, yes. I have seen this message in practice while the quotacheck
>>> command was being executed repeatedly, once an hour.
>> Interesting. Are you able to reproduce this? Quotacheck does a remount
>> read-only + a remount read-write, but you cannot really remount the
>> filesystem read-only while it has orphan inodes, so you should not see
>> those when you remount read-write again. Possibly there's a race between
>> remounting and unlinking...
> Yes, I can reproduce it. However, it is not reproduced frequently by the
> original procedure (quotacheck once an hour), so I made a reproducer.
To tell the truth, I think the race produces the message:
-----------------------------------------------------------------------
  EXT3-fs: <dev>: couldn't remount RDWR because of
       unprocessed orphan inode list.  Please umount/remount instead.
-----------------------------------------------------------------------
which hides a more serious problem.

By using my reproducer, I found that it can also produce another message,
different from the one above:
-----------------------------------------------------------------------
EXT3-fs error (device <dev>) in start_transaction: Readonly filesystem
-----------------------------------------------------------------------
After examining the code path that prints this message, I found that it is
displayed when the following steps occur:

[[CASE 1]]
( 1)  [process A] do_unlinkat
( 2)  [process B] do_remount_sb(, RDONLY, )
( 3)  [process A]  vfs_unlink
( 4)  [process A]   ext3_unlink
( 5)  [process A]    ext3_journal_start
( 6)  [process B]  fs_may_remount_ro   (=> return 0)
( 7)  [process A]    inode->i_nlink-- (i_nlink=0)
( 8)  [process A]    ext3_orphan_add
( 9)  [process A]    ext3_journal_stop
(10)  [process A]  dput
(11)  [process A]   iput
(12)  [process A]    ext3_evict_inode
(13)  [process B]  ext3_remount
(14)  [process A]     start_transaction
(15)  [process B]   sb->s_flags |= MS_RDONLY
(16)  [process B]   ext3_mark_recovery_complete
(17)  [process A]      start_this_handle (new transaction is created)
(18)  [process A]     ext3_truncate
(19)  [process A]      start_transaction (failed => this message is displayed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(20)  [process A]     ext3_orphan_del
(21)  [process A]     ext3_journal_stop

* "Process A" deletes a file successfully (21). However, the file's data
    blocks are left behind because (18) fails. **Furthermore, a new
    transaction can be created after ext3_mark_recovery_complete finishes.**

[[CASE 2]]
( 1)  [process A] do_unlinkat
( 2)  [process B] do_remount_sb(, RDONLY, )
( 3)  [process A]  vfs_unlink
( 4)  [process A]   ext3_unlink
( 5)  [process A]    ext3_journal_start
( 6)  [process B]  fs_may_remount_ro   (=> return 0)
( 7)  [process A]    inode->i_nlink-- (i_nlink=0)
( 8)  [process A]    ext3_orphan_add
( 9)  [process A]    ext3_journal_stop
(10)  [process A]  dput
(11)  [process A]   iput
(12)  [process A]    ext3_evict_inode
(13)  [process B]  ext3_remount
(14)  [process A]     start_transaction
(15)  [process B]   sb->s_flags |= MS_RDONLY
(17)  [process A]      start_this_handle (new transaction is created)
(18)  [process A]     ext3_truncate
(19)  [process A]      start_transaction (failed => this message is displayed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(20)  [process A]     ext3_orphan_del
(21)  [process A]     ext3_journal_stop
(22)  [process B]   ext3_mark_recovery_complete

* "Process A" deletes a file successfully (21). However, the file's data
    blocks are left behind because (18) fails. This transaction can finish
    before ext3_mark_recovery_complete finishes.
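
Both interleavings share the same failure: process A starts a fresh transaction after MS_RDONLY is already set, so step (19) fails. The following toy Python model (my own sketch, not kernel code; the class and field names are invented stand-ins for simplified ext3 state) replays that ordering deterministically:

```python
# Toy, deterministic replay of the interleaving above -- NOT kernel code.
# ToyExt3 and its fields are invented stand-ins for simplified ext3 state.

class ToyExt3:
    def __init__(self):
        self.rdonly = False   # stands in for sb->s_flags & MS_RDONLY
        self.orphans = []     # stands in for the on-disk orphan list
        self.errors = []      # collects printed fs errors

    def start_transaction(self):
        # Mirrors why start_transaction fails at step (19): the
        # filesystem is already flagged read-only.
        if self.rdonly:
            self.errors.append("EXT3-fs error in start_transaction: Readonly filesystem")
            return None
        return object()       # an opaque journal "handle"

fs = ToyExt3()

# [process A] steps (1)-(12): unlink gets as far as the orphan list
fs.orphans.append("inode 12")

# [process B] step (15): remount flips the filesystem read-only mid-flight;
# fs_may_remount_ro at step (6) raced with the unlink and saw no reason to refuse
fs.rdonly = True

# [process A] steps (18)-(19): ext3_truncate needs a new transaction and fails
handle = fs.start_transaction()
assert handle is None
print(fs.errors[0])
```

This only fixes the interleaving in place to show why step (19) must fail once the read-only flag is up; the actual bug is the timing window that lets the two processes reach that state.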

I will try to fix this problem so that it does not end in an fs-error.
Please comment on the fix once I have created it.

Thanks,
Toshiyuki Okajima

Jan Kara - Aug. 3, 2011, 9:57 a.m.
Hello,

On Wed 03-08-11 11:42:03, Toshiyuki Okajima wrote:
> >(2011/08/01 18:57), Jan Kara wrote:
> >>On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote:
> >>>(2011/08/01 17:45), Jan Kara wrote:
> >>>>On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote:
> >>>>>If there are some inodes on the orphan list while a filesystem is mounted
> >>>>>read-only, we should recommend that people umount and then mount it when
> >>>>>they try to remount it read-write. But the current message/comment
> >>>>>recommends that they umount and then remount it.
> <SNIP>
> >>>>the most... BTW, I guess you didn't really see this message in practice, did
> >>>>you?
> >>>Actually, yes. I have seen this message in practice while the quotacheck
> >>>command was being executed repeatedly, once an hour.
> >>Interesting. Are you able to reproduce this? Quotacheck does a remount
> >>read-only + a remount read-write, but you cannot really remount the
> >>filesystem read-only while it has orphan inodes, so you should not see
> >>those when you remount read-write again. Possibly there's a race between
> >>remounting and unlinking...
> >Yes, I can reproduce it. However, it is not reproduced frequently by the
> >original procedure (quotacheck once an hour), so I made a reproducer.
> To tell the truth, I think the race produces the message:
> -----------------------------------------------------------------------
>  EXT3-fs: <dev>: couldn't remount RDWR because of
>       unprocessed orphan inode list.  Please umount/remount instead.
> -----------------------------------------------------------------------
> which hides a more serious problem.
  I've inquired about this on linux-fsdevel (I think you were CCed, unless
I forgot). It's a race in the VFS remount code, as you properly analyzed
below. People are working on fixing it, but it's not trivial. The
filesystem is really the wrong place to fix such a problem. If there is a
trivial fix for ext3 to work around the issue, I can take it, but I'm not
willing to push anything complex - effort is better spent working on a
generic fix.

								Honza

> By using my reproducer, I found that it can also produce another message,
> different from the one above:
> -----------------------------------------------------------------------
> EXT3-fs error (device <dev>) in start_transaction: Readonly filesystem
> -----------------------------------------------------------------------
> After examining the code path that prints this message, I found that it is
> displayed when the following steps occur:
> 
> [[CASE 1]]
> ( 1)  [process A] do_unlinkat
> ( 2)  [process B] do_remount_sb(, RDONLY, )
> ( 3)  [process A]  vfs_unlink
> ( 4)  [process A]   ext3_unlink
> ( 5)  [process A]    ext3_journal_start
> ( 6)  [process B]  fs_may_remount_ro   (=> return 0)
> ( 7)  [process A]    inode->i_nlink-- (i_nlink=0)
> ( 8)  [process A]    ext3_orphan_add
> ( 9)  [process A]    ext3_journal_stop
> (10)  [process A]  dput
> (11)  [process A]   iput
> (12)  [process A]    ext3_evict_inode
> (13)  [process B]  ext3_remount
> (14)  [process A]     start_transaction
> (15)  [process B]   sb->s_flags |= MS_RDONLY
> (16)  [process B]   ext3_mark_recovery_complete
> (17)  [process A]      start_this_handle (new transaction is created)
> (18)  [process A]     ext3_truncate
> (19)  [process A]      start_transaction (failed => this message is displayed)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> (20)  [process A]     ext3_orphan_del
> (21)  [process A]     ext3_journal_stop
> 
> * "Process A" deletes a file successfully (21). However, the file's data
>    blocks are left behind because (18) fails. **Furthermore, a new
>    transaction can be created after ext3_mark_recovery_complete finishes.**
> 
> [[CASE 2]]
> ( 1)  [process A] do_unlinkat
> ( 2)  [process B] do_remount_sb(, RDONLY, )
> ( 3)  [process A]  vfs_unlink
> ( 4)  [process A]   ext3_unlink
> ( 5)  [process A]    ext3_journal_start
> ( 6)  [process B]  fs_may_remount_ro   (=> return 0)
> ( 7)  [process A]    inode->i_nlink-- (i_nlink=0)
> ( 8)  [process A]    ext3_orphan_add
> ( 9)  [process A]    ext3_journal_stop
> (10)  [process A]  dput
> (11)  [process A]   iput
> (12)  [process A]    ext3_evict_inode
> (13)  [process B]  ext3_remount
> (14)  [process A]     start_transaction
> (15)  [process B]   sb->s_flags |= MS_RDONLY
> (17)  [process A]      start_this_handle (new transaction is created)
> (18)  [process A]     ext3_truncate
> (19)  [process A]      start_transaction (failed => this message is displayed)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> (20)  [process A]     ext3_orphan_del
> (21)  [process A]     ext3_journal_stop
> (22)  [process B]   ext3_mark_recovery_complete
> 
> * "Process A" deletes a file successfully (21). However, the file's data
>    blocks are left behind because (18) fails. This transaction can finish
>    before ext3_mark_recovery_complete finishes.
> 
> I will try to fix this problem so that it does not end in an fs-error.
> Please comment on the fix once I have created it.
> 
> Thanks,
> Toshiyuki Okajima
>

Patch

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 7beb69a..d3df0d4 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2669,13 +2669,13 @@  static int ext3_remount (struct super_block * sb, int * flags, char * data)
 			/*
 			 * If we have an unprocessed orphan list hanging
 			 * around from a previously readonly bdev mount,
-			 * require a full umount/remount for now.
+			 * require a full umount/mount for now.
 			 */
 			if (es->s_last_orphan) {
 				ext3_msg(sb, KERN_WARNING, "warning: couldn't "
 				       "remount RDWR because of unprocessed "
 				       "orphan inode list.  Please "
-				       "umount/remount instead.");
+				       "umount/mount instead.");
 				err = -EINVAL;
 				goto restore_opts;
 			}