Message ID | 560984B4.7090105@codeaurora.org |
---|---|
State | Changes Requested |
Headers | show |
Hi! Am 28.09.2015 um 20:19 schrieb Nikhilesh Reddy: > The rename operation in UBIFS is synchronous (or nearly synchronous) > while the write operation is not. This can result in zero length files when > renaming of files followed by an abrupt power down or a crash. > > For example: > 1) Say a file a.txt exists with size 1KB. > 2) Create a file b.tmp (open) > 3) Update the data in b.tmp with new values (write and close) > 4) rename b.tmp to a.txt > 5) Abrupt power down or crash > > This above scenario can result in a.txt becoming a file of zero length and > giving the impression of a.txt being truncated. > This scenario can ofcourse be prevented by calling fsync or fdatasync > before the rename operation. > > There are many applications and libraries that do not call fsync or > fdatasync since they were tested on EXT4 which has a hack to handle > the above case. ...and these applications are buggy by design. ext4 has some hacks to detect some misuses (IIRC replace by rename and replace by truncate) as these applications worked by chance on ext3. Adding such a hack now to UBIFS needs a bit more justification. Especially as your new mount option is a sledgehammer. Which application triggers this issue? I'm asking because UBIFS is more or less an embedded filesystem. On ext4 mostly broken GUI programs like eclipse or kwrite forgot to fsync(). > Add a new mount option of sync_before_rename which would implicitly > sync the data before renaming the file to help address cases where the > rename cases need to be handled implicitly and prevent the zero length > files as a result of a rename. > > Change-Id: I4e8771d40307543b532df7f46bd87864f0d3d294 What is that? Thanks, //richard
On Mon 28 Sep 2015 12:38:29 PM PDT, Richard Weinberger wrote: > Hi! > > Am 28.09.2015 um 20:19 schrieb Nikhilesh Reddy: >> The rename operation in UBIFS is synchronous (or nearly synchronous) >> while the write operation is not. This can result in zero length files when >> renaming of files followed by an abrupt power down or a crash. >> >> For example: >> 1) Say a file a.txt exists with size 1KB. >> 2) Create a file b.tmp (open) >> 3) Update the data in b.tmp with new values (write and close) >> 4) rename b.tmp to a.txt >> 5) Abrupt power down or crash >> >> This above scenario can result in a.txt becoming a file of zero length and >> giving the impression of a.txt being truncated. >> This scenario can ofcourse be prevented by calling fsync or fdatasync >> before the rename operation. >> >> There are many applications and libraries that do not call fsync or >> fdatasync since they were tested on EXT4 which has a hack to handle >> the above case. > > ...and these applications are buggy by design. > ext4 has some hacks to detect some misuses (IIRC replace by rename and replace by truncate) > as these applications worked by chance on ext3. > Adding such a hack now to UBIFS needs a bit more justification. > Especially as your new mount option is a sledgehammer. > > Which application triggers this issue? > I'm asking because UBIFS is more or less an embedded filesystem. > On ext4 mostly broken GUI programs like eclipse or kwrite forgot to fsync(). Thanks Richard for lookign into this patch. I completely agree with you on the fact that these applications are indeed buggy. But yes the issues were seen on embedded systems. We saw this issue when debugging a few applications that used an xml parser library. to write data. There were a few other applications as well but i dont have access to their source. Fixing all the applications is not exactly feasible since they may have bugs in multiple places. And sometimes we dont have a legal go ahead to fix code that is from thirdparties who may never fix their code... or just distribute a s binaries. This change was made due to multiple requests that came from our customers who ran into this issue on the applications that they run on their products. We could use "-o sync" mount option. But this makes UBIFS perform badly that just syncing the old inode data alone. The idea was to have a mount point option that could be enabled only as needed and taking a performance hit during a rename. All the tests showed no real performance degradation. Since it would be disabled by default the normal mount without this would have no impact what so ever to the current behavior. Only on filesystems that are mounted with this option will this new behavior kick in. Please do consider applying the patch. If you have any suggestions on improving this patch to you liking please do let me know and I am happy to make any chances that you deem necessary. Thanks again for you consideration. > >> Add a new mount option of sync_before_rename which would implicitly >> sync the data before renaming the file to help address cases where the >> rename cases need to be handled implicitly and prevent the zero length >> files as a result of a rename. >> >> Change-Id: I4e8771d40307543b532df7f46bd87864f0d3d294 > > What is that? Oops this was a leftover from an internal change id... can get rid of it. Sorry about that. > > Thanks, > //richard -- Thanks Nikhilesh Reddy Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Hi! Am 28.09.2015 um 22:38 schrieb Nikhilesh Reddy: >> ...and these applications are buggy by design. >> ext4 has some hacks to detect some misuses (IIRC replace by rename and replace by truncate) >> as these applications worked by chance on ext3. >> Adding such a hack now to UBIFS needs a bit more justification. >> Especially as your new mount option is a sledgehammer. >> >> Which application triggers this issue? >> I'm asking because UBIFS is more or less an embedded filesystem. >> On ext4 mostly broken GUI programs like eclipse or kwrite forgot to fsync(). > > Thanks Richard for lookign into this patch. > I completely agree with you on the fact that these applications are indeed buggy. > But yes the issues were seen on embedded systems. > We saw this issue when debugging a few applications that used an xml parser library. > to write data. libxml2? > There were a few other applications as well but i dont have access to their source. > Fixing all the applications is not exactly feasible since they may have bugs in multiple places. > And sometimes we dont have a legal go ahead to fix code that is from thirdparties who may never fix their code... or just distribute a s binaries. > This change was made due to multiple requests that came from our customers who ran into this issue on the applications that they run on their products. > > We could use "-o sync" mount option. But this makes UBIFS perform badly that just syncing the old inode data alone. > The idea was to have a mount point option that could be enabled only as needed and taking a performance hit during a rename. > All the tests showed no real performance degradation. Hmm, I'd have assumed that programs with heavy rename() usage would degrade. > Since it would be disabled by default the normal mount without this would have no impact what so ever to the current behavior. > Only on filesystems that are mounted with this option will this new behavior kick in. > > Please do consider applying the patch. > If you have any suggestions on improving this patch to you liking please do let me know and I am happy to make any chances that you deem necessary. Instead of making all rename() synchronous it would be a good start to detect broken patterns like ext4 does. I'll happily test and review those. :-) Thanks, //richard
On Mon 28 Sep 2015 01:49:28 PM PDT, Richard Weinberger wrote: > Hi! > > Am 28.09.2015 um 22:38 schrieb Nikhilesh Reddy: >>> ...and these applications are buggy by design. >>> ext4 has some hacks to detect some misuses (IIRC replace by rename and replace by truncate) >>> as these applications worked by chance on ext3. >>> Adding such a hack now to UBIFS needs a bit more justification. >>> Especially as your new mount option is a sledgehammer. >>> >>> Which application triggers this issue? >>> I'm asking because UBIFS is more or less an embedded filesystem. >>> On ext4 mostly broken GUI programs like eclipse or kwrite forgot to fsync(). >> >> Thanks Richard for lookign into this patch. >> I completely agree with you on the fact that these applications are indeed buggy. >> But yes the issues were seen on embedded systems. >> We saw this issue when debugging a few applications that used an xml parser library. >> to write data. > > libxml2? No i believe its pugixml (not sure which version) or its variant. > >> There were a few other applications as well but i dont have access to their source. >> Fixing all the applications is not exactly feasible since they may have bugs in multiple places. >> And sometimes we dont have a legal go ahead to fix code that is from thirdparties who may never fix their code... or just distribute a s binaries. >> This change was made due to multiple requests that came from our customers who ran into this issue on the applications that they run on their products. >> >> We could use "-o sync" mount option. But this makes UBIFS perform badly that just syncing the old inode data alone. >> The idea was to have a mount point option that could be enabled only as needed and taking a performance hit during a rename. >> All the tests showed no real performance degradation. > > Hmm, I'd have assumed that programs with heavy rename() usage would degrade. > >> Since it would be disabled by default the normal mount without this would have no impact what so ever to the current behavior. >> Only on filesystems that are mounted with this option will this new behavior kick in. >> >> Please do consider applying the patch. >> If you have any suggestions on improving this patch to you liking please do let me know and I am happy to make any chances that you deem necessary. > > Instead of making all rename() synchronous it would be a good start to detect broken > patterns like ext4 does. > I'll happily test and review those. :-) Unfortunately major changes would not be completely up to me :( But before i spend time on it ... i was hoping to understand the disadvantages of this approach and the advantages of the other approach? Can you kindly help me understand? So the patch effectively converts all the rename to a fdatasync+rename which is what we would expect all the userspace applications to do anyway. If the application already calls fdatasync then nothing new occurs as part of the rename. If it doesnt then we perform an implicit fdatasync. So only the broken paths ( which do not fdatasync) would be impacted. Hopefully that is ok? If enough time had passed and the data is already written to flash before the rename then again nothing additional happens. Hopefully i did not miss something obvious. Please do correct me if i missed something or made wrong assumptions. > > Thanks, > //richard -- Thanks Nikhilesh Reddy Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Hi! Am 29.09.2015 um 19:04 schrieb Nikhilesh Reddy: > Unfortunately major changes would not be completely up to me :( hm? > But before i spend time on it ... i was hoping to understand the disadvantages of this approach and the advantages of the other approach? > Can you kindly help me understand? Sure. Instead of making rename() unconditionally synchronous we could do it like ext4 and detect rename+close patterns and make only these synchronous. So, your application does: fd = open("foo", ...) write(fd, "yadda yadda, ...) close(fd) rename("foo", "bar") and then expects "bar" to exist with contents "yadda yadda"? I'm asking because we also have to make sure that this is not an UBIFS bug. We have to be careful, ext4 has a metadata journal, UBIFS' is different. My point is that UBIFS better should detect the close+rename pattern then only then do rename synchronous. > So the patch effectively converts all the rename to a fdatasync+rename which is what we would expect all the userspace applications to do anyway. But there are also cases where you want a fast rename() and sync later. Artem, what do you think? Thanks, //richard
Hi! Am 28.09.2015 um 20:19 schrieb Nikhilesh Reddy: > The rename operation in UBIFS is synchronous (or nearly synchronous) > while the write operation is not. This can result in zero length files when > renaming of files followed by an abrupt power down or a crash. > > For example: > 1) Say a file a.txt exists with size 1KB. > 2) Create a file b.tmp (open) > 3) Update the data in b.tmp with new values (write and close) > 4) rename b.tmp to a.txt > 5) Abrupt power down or crash > > This above scenario can result in a.txt becoming a file of zero length and > giving the impression of a.txt being truncated. > This scenario can ofcourse be prevented by calling fsync or fdatasync > before the rename operation. I gave this a try and hacked up something to emulate a powercut *exactly* after rename() in UBIFS. fd = open("b.tmp", ...) write(fd, "foo", ...) close(fd) rename("b.tmp", "a.txt") ^---- powercut After remounting UBIFS both a.txt and b.tmp are present but b.tmp is truncated. Not a.txt as you said. Can you please double check? I want to make sure that we're talking about the same things. Thanks, //richard
On 10/02/2015 02:38 PM, Richard Weinberger wrote: > Hi! > > Am 28.09.2015 um 20:19 schrieb Nikhilesh Reddy: >> The rename operation in UBIFS is synchronous (or nearly synchronous) >> while the write operation is not. This can result in zero length files when >> renaming of files followed by an abrupt power down or a crash. >> >> For example: >> 1) Say a file a.txt exists with size 1KB. >> 2) Create a file b.tmp (open) >> 3) Update the data in b.tmp with new values (write and close) >> 4) rename b.tmp to a.txt >> 5) Abrupt power down or crash >> >> This above scenario can result in a.txt becoming a file of zero length and >> giving the impression of a.txt being truncated. >> This scenario can ofcourse be prevented by calling fsync or fdatasync >> before the rename operation. > > I gave this a try and hacked up something to emulate a powercut *exactly* after > rename() in UBIFS. > > fd = open("b.tmp", ...) > write(fd, "foo", ...) > close(fd) > rename("b.tmp", "a.txt") > ^---- powercut > > After remounting UBIFS both a.txt and b.tmp are present > but b.tmp is truncated. Not a.txt as you said. > > Can you please double check? > I want to make sure that we're talking about the same things. Since you mentioned a.txt and b.tmp are both present... i assume the file a.txt was present even before b.tmp was created? I will try and explain as to what i understand the situation to be. If both the files are present then the rename didnt actually get written to the device and was probably still in the internal ubifs write buffer. I believe there is a small delay between the rename call and the inodes being updated on the the device from the internal ubifs write buffer. The scenario i described above seems to occur when the inode update is committed to the device... i.e here the b.tmp should not exist since the rename was successfully written but the file data writeback (that is in the page cache) has not yet been committed to the device. Since the writeback buffer is way smaller than the page cache the inode update occurs first or is likely to have. Hopefully i did not mess up on my understanding or explanation. Please do correct me if i messed up. > > Thanks, > //richard >
Am 05.10.2015 um 20:05 schrieb Nikhilesh Reddy: > On 10/02/2015 02:38 PM, Richard Weinberger wrote: >> Hi! >> >> Am 28.09.2015 um 20:19 schrieb Nikhilesh Reddy: >>> The rename operation in UBIFS is synchronous (or nearly synchronous) >>> while the write operation is not. This can result in zero length files when >>> renaming of files followed by an abrupt power down or a crash. >>> >>> For example: >>> 1) Say a file a.txt exists with size 1KB. >>> 2) Create a file b.tmp (open) >>> 3) Update the data in b.tmp with new values (write and close) >>> 4) rename b.tmp to a.txt >>> 5) Abrupt power down or crash >>> >>> This above scenario can result in a.txt becoming a file of zero length and >>> giving the impression of a.txt being truncated. >>> This scenario can ofcourse be prevented by calling fsync or fdatasync >>> before the rename operation. >> >> I gave this a try and hacked up something to emulate a powercut *exactly* after >> rename() in UBIFS. >> >> fd = open("b.tmp", ...) >> write(fd, "foo", ...) >> close(fd) >> rename("b.tmp", "a.txt") >> ^---- powercut >> >> After remounting UBIFS both a.txt and b.tmp are present >> but b.tmp is truncated. Not a.txt as you said. >> >> Can you please double check? >> I want to make sure that we're talking about the same things. > > Since you mentioned a.txt and b.tmp are both present... i assume the file a.txt was present even before b.tmp was created? Yes. > I will try and explain as to what i understand the situation to be. > > If both the files are present then the rename didnt actually get written to the device and was probably still in the internal ubifs write buffer. A rename operation does not trigger a commit, therefore a powercut directly after rename() would make the rename() void. In this context "both files present" means a.txt and b.tmp exist and are both synched to disk? > I believe there is a small delay between the rename call and the inodes > being updated on the the device from the internal ubifs write buffer. > > The scenario i described above seems to occur when the inode update is committed to the device... i.e here the b.tmp should not exist since the rename was successfully written but > the file data writeback (that is in the page cache) has not yet been committed to the device. > Since the writeback buffer is way smaller than the page cache the inode update occurs first or is likely to have. > > > Hopefully i did not mess up on my understanding or explanation. Can you please share a reproducer? A simple sequence of syscall would also do it. Thanks, //richard
On Tue 06 Oct 2015 01:09:22 AM PDT, Richard Weinberger wrote: > Am 05.10.2015 um 20:05 schrieb Nikhilesh Reddy: >> On 10/02/2015 02:38 PM, Richard Weinberger wrote: >>> Hi! >>> >>> Am 28.09.2015 um 20:19 schrieb Nikhilesh Reddy: >>>> The rename operation in UBIFS is synchronous (or nearly synchronous) >>>> while the write operation is not. This can result in zero length files when >>>> renaming of files followed by an abrupt power down or a crash. >>>> >>>> For example: >>>> 1) Say a file a.txt exists with size 1KB. >>>> 2) Create a file b.tmp (open) >>>> 3) Update the data in b.tmp with new values (write and close) >>>> 4) rename b.tmp to a.txt >>>> 5) Abrupt power down or crash >>>> >>>> This above scenario can result in a.txt becoming a file of zero length and >>>> giving the impression of a.txt being truncated. >>>> This scenario can ofcourse be prevented by calling fsync or fdatasync >>>> before the rename operation. >>> >>> I gave this a try and hacked up something to emulate a powercut *exactly* after >>> rename() in UBIFS. >>> >>> fd = open("b.tmp", ...) >>> write(fd, "foo", ...) >>> close(fd) >>> rename("b.tmp", "a.txt") >>> ^---- powercut >>> >>> After remounting UBIFS both a.txt and b.tmp are present >>> but b.tmp is truncated. Not a.txt as you said. >>> >>> Can you please double check? >>> I want to make sure that we're talking about the same things. >> >> Since you mentioned a.txt and b.tmp are both present... i assume the file a.txt was present even before b.tmp was created? > > Yes. > >> I will try and explain as to what i understand the situation to be. >> >> If both the files are present then the rename didnt actually get written to the device and was probably still in the internal ubifs write buffer. > > A rename operation does not trigger a commit, therefore a powercut directly after rename() would make the rename() void. > In this context "both files present" means a.txt and b.tmp exist and are both synched to disk? > >> I believe there is a small delay between the rename call and the inodes >> being updated on the the device from the internal ubifs write buffer. >> >> The scenario i described above seems to occur when the inode update is committed to the device... i.e here the b.tmp should not exist since the rename was successfully written but >> the file data writeback (that is in the page cache) has not yet been committed to the device. >> Since the writeback buffer is way smaller than the page cache the inode update occurs first or is likely to have. >> >> >> Hopefully i did not mess up on my understanding or explanation. > > Can you please share a reproducer? > A simple sequence of syscall would also do it. > > Thanks, > //richard Sorry for the delay in my reply I got tied up... as for the reproducer... its exactly as i described in the commit message... though we performed the power reset after a bit of delay. it does take a few tries on our end to reproduce... so we have it on a loop until it is reproduced. I Will definitely send you more concrete steps once i have a bit of time. -- Thanks Nikhilesh Reddy Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Am 13.10.2015 um 20:41 schrieb Nikhilesh Reddy: > On Tue 06 Oct 2015 01:09:22 AM PDT, Richard Weinberger wrote: >> Am 05.10.2015 um 20:05 schrieb Nikhilesh Reddy: >>> On 10/02/2015 02:38 PM, Richard Weinberger wrote: >>>> Hi! >>>> >>>> Am 28.09.2015 um 20:19 schrieb Nikhilesh Reddy: >>>>> The rename operation in UBIFS is synchronous (or nearly synchronous) >>>>> while the write operation is not. This can result in zero length files when >>>>> renaming of files followed by an abrupt power down or a crash. >>>>> >>>>> For example: >>>>> 1) Say a file a.txt exists with size 1KB. >>>>> 2) Create a file b.tmp (open) >>>>> 3) Update the data in b.tmp with new values (write and close) >>>>> 4) rename b.tmp to a.txt >>>>> 5) Abrupt power down or crash >>>>> >>>>> This above scenario can result in a.txt becoming a file of zero length and >>>>> giving the impression of a.txt being truncated. >>>>> This scenario can ofcourse be prevented by calling fsync or fdatasync >>>>> before the rename operation. >>>> >>>> I gave this a try and hacked up something to emulate a powercut *exactly* after >>>> rename() in UBIFS. >>>> >>>> fd = open("b.tmp", ...) >>>> write(fd, "foo", ...) >>>> close(fd) >>>> rename("b.tmp", "a.txt") >>>> ^---- powercut >>>> >>>> After remounting UBIFS both a.txt and b.tmp are present >>>> but b.tmp is truncated. Not a.txt as you said. >>>> >>>> Can you please double check? >>>> I want to make sure that we're talking about the same things. >>> >>> Since you mentioned a.txt and b.tmp are both present... i assume the file a.txt was present even before b.tmp was created? >> >> Yes. >> >>> I will try and explain as to what i understand the situation to be. >>> >>> If both the files are present then the rename didnt actually get written to the device and was probably still in the internal ubifs write buffer. >> >> A rename operation does not trigger a commit, therefore a powercut directly after rename() would make the rename() void. >> In this context "both files present" means a.txt and b.tmp exist and are both synched to disk? >> >>> I believe there is a small delay between the rename call and the inodes >>> being updated on the the device from the internal ubifs write buffer. >>> >>> The scenario i described above seems to occur when the inode update is committed to the device... i.e here the b.tmp should not exist since the rename was successfully written but >>> the file data writeback (that is in the page cache) has not yet been committed to the device. >>> Since the writeback buffer is way smaller than the page cache the inode update occurs first or is likely to have. >>> >>> >>> Hopefully i did not mess up on my understanding or explanation. >> >> Can you please share a reproducer? >> A simple sequence of syscall would also do it. >> >> Thanks, >> //richard > > Sorry for the delay in my reply > I got tied up... No big deal. > as for the reproducer... its exactly as i described in the commit message... though we performed the power reset after a bit of delay. it does take a few tries on our end to > reproduce... so we have it on a loop until it is reproduced. > > I Will definitely send you more concrete steps once i have a bit of time. Please do so. As I said, if I do exactly what you wrote, as expected b.tmp will be truncated but the already synced file a.txt stays. Thanks, //richard
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index 5083036..7943291 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -968,6 +968,24 @@ static void unlock_3_inodes(struct inode *inode1, struct inode *inode2, mutex_unlock(&ubifs_inode(inode1)->ui_mutex); } +static int force_sync_inode_data(struct ubifs_info *c, struct inode *inode) +{ + int err; + + ubifs_assert(mutex_is_locked(&inode->i_mutex)); + + err = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); + if (err) { + ubifs_err(c, "filemap_write_and_wait_range failed with %d", + err); + return err; + } + err = ubifs_sync_wbufs_by_inode(c, inode); + if (err) + ubifs_err(c, "ubifs_sync_wbufs_by_inode failed with %d", err); + return err; +} + static int ubifs_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry) { @@ -1020,6 +1038,16 @@ static int ubifs_rename(struct inode *old_dir, struct dentry *old_dentry, return err; } + /* Before renaming, make sure old_inode is synced to disc if the + * mount option is selected. */ + if (c->sync_before_rename) { + err = force_sync_inode_data(c, old_inode); + if (err) { + ubifs_err(c, "old_inode data sync failed for rename"); + goto out; + } + } + lock_3_inodes(old_dir, new_dir, new_inode); /* @@ -1129,6 +1157,7 @@ out_cancel: } } unlock_3_inodes(old_dir, new_dir, new_inode); +out: ubifs_release_budget(c, &ino_req); ubifs_release_budget(c, &req); return err; diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index 4fa829f..f185894 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -443,6 +443,11 @@ static int ubifs_show_options(struct seq_file *s, struct dentry *root) ubifs_compr_name(c->mount_opts.compr_type)); } + if (c->mount_opts.sync_before_rename == 2) + seq_puts(s, ",sync_before_rename"); + else if (c->mount_opts.sync_before_rename == 1) + seq_puts(s, ",no_sync_before_rename"); + return 0; } @@ -928,6 +933,8 @@ enum { Opt_chk_data_crc, Opt_no_chk_data_crc, Opt_override_compr, + Opt_sync_before_rename, + Opt_no_sync_before_rename, Opt_err, }; @@ -939,6 +946,8 @@ static const match_table_t tokens = { {Opt_chk_data_crc, "chk_data_crc"}, {Opt_no_chk_data_crc, "no_chk_data_crc"}, {Opt_override_compr, "compr=%s"}, + {Opt_sync_before_rename, "sync_before_rename"}, + {Opt_no_sync_before_rename, "no_sync_before_rename"}, {Opt_err, NULL}, }; @@ -1039,6 +1048,14 @@ static int ubifs_parse_options(struct ubifs_info *c, char *options, c->default_compr = c->mount_opts.compr_type; break; } + case Opt_sync_before_rename: + c->mount_opts.sync_before_rename = 2; + c->sync_before_rename = 1; + break; + case Opt_no_sync_before_rename: + c->mount_opts.sync_before_rename = 1; + c->sync_before_rename = 0; + break; default: { unsigned long flag; diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h index de75902..c2adf9c 100644 --- a/fs/ubifs/ubifs.h +++ b/fs/ubifs/ubifs.h @@ -942,6 +942,8 @@ struct ubifs_orphan { * specified in @compr_type) * @compr_type: compressor type to override the superblock compressor with * (%UBIFS_COMPR_NONE, etc) + * @sync_before_rename: Sync the data associated with the old inode before + the rename.(%0 default, %1 disable, %2 enabled) */ struct ubifs_mount_opts { unsigned int unmount_mode:2; @@ -949,6 +951,7 @@ struct ubifs_mount_opts { unsigned int chk_data_crc:2; unsigned int override_compr:1; unsigned int compr_type:2; + unsigned int sync_before_rename:2; }; /** @@ -1034,6 +1037,7 @@ struct ubifs_debug_info; * @bulk_read: enable bulk-reads * @default_compr: default compression algorithm (%UBIFS_COMPR_LZO, etc) * @rw_incompat: the media is not R/W compatible + * @sync_before_rename: Perform an fdatasync on the old inode before rename. * * @tnc_mutex: protects the Tree Node Cache (TNC), @zroot, @cnext,
The rename operation in UBIFS is synchronous (or nearly synchronous) while the write operation is not. This can result in zero length files when renaming of files followed by an abrupt power down or a crash. For example: 1) Say a file a.txt exists with size 1KB. 2) Create a file b.tmp (open) 3) Update the data in b.tmp with new values (write and close) 4) rename b.tmp to a.txt 5) Abrupt power down or crash This above scenario can result in a.txt becoming a file of zero length and giving the impression of a.txt being truncated. This scenario can ofcourse be prevented by calling fsync or fdatasync before the rename operation. There are many applications and libraries that do not call fsync or fdatasync since they were tested on EXT4 which has a hack to handle the above case. Add a new mount option of sync_before_rename which would implicitly sync the data before renaming the file to help address cases where the rename cases need to be handled implicitly and prevent the zero length files as a result of a rename. Change-Id: I4e8771d40307543b532df7f46bd87864f0d3d294 Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org> --- fs/ubifs/dir.c | 29 +++++++++++++++++++++++++++++ fs/ubifs/super.c | 17 +++++++++++++++++ fs/ubifs/ubifs.h | 5 +++++ 3 files changed, 51 insertions(+) @enext, and * @calc_idx_sz @@ -1275,6 +1279,7 @@ struct ubifs_info { unsigned int bulk_read:1; unsigned int default_compr:2; unsigned int rw_incompat:1; + unsigned int sync_before_rename:1; struct mutex tnc_mutex; struct ubifs_zbranch zroot;