Message ID | 1364390347-4360-1-git-send-email-wenqing.lz@taobao.com |
---|---|
State | Superseded, archived |
Headers | show |
On Wed, Mar 27, 2013 at 09:19:07PM +0800, Zheng Liu wrote: > From: Zheng Liu <wenqing.lz@taobao.com> > > After applied this commit (8e4061cb), we will get a warning from > ext4_truncate when i_mutex isn't taken. Here the assumption is that > i_mutex should be taken when we do a truncation. In ext4_symlink we > could need to call ext4_truncate to trim some blocks beyond i_size, but > the i_mutex isn't taken. Hmm, and this is why I added the warning. Even after looking your patch, I'm having trouble finding the codepath that results in ext4_truncate() getting called from __page_symlink(). Can you send the stack trace from the WARN_ON, just so I can see what I missed? Thanks, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ah, now I see. Thanks for sending the stack trace. On the failure path, we're calling the inline function ext4_truncate_filaed_write() and this is calling ext4_truncate(). But I'm now wondering if we need to take the i_data_sem mutex in ext4_truncate_failed_write(). Otherwise, couldn't we end up with problems where a failed write calls ext4_truncate() without i_data_sem(), and that races with something else --- say, a punch or truncate call to that same inode? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 27, 2013 at 09:41:10AM -0400, Theodore Ts'o wrote: > On Wed, Mar 27, 2013 at 09:19:07PM +0800, Zheng Liu wrote: > > From: Zheng Liu <wenqing.lz@taobao.com> > > > > After applied this commit (8e4061cb), we will get a warning from > > ext4_truncate when i_mutex isn't taken. Here the assumption is that > > i_mutex should be taken when we do a truncation. In ext4_symlink we > > could need to call ext4_truncate to trim some blocks beyond i_size, but > > the i_mutex isn't taken. > > Hmm, and this is why I added the warning. Even after looking your > patch, I'm having trouble finding the codepath that results in > ext4_truncate() getting called from __page_symlink(). Can you send > the stack trace from the WARN_ON, just so I can see what I missed? Here it is. kernel: ------------[ cut here ]------------ kernel: WARNING: at fs/ext4/inode.c:3803 ext4_truncate+0x36/0x2d7 [ext4]() kernel: Hardware name: OptiPlex 780 kernel: Modules linked in: ext4(O) jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport dcdbas acpi_cpufreq mperf serio_raw button pcspkr i2c_i801 sg ehci_pci ehci_hcd e1000e ptp pps_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2 c_core [last unloaded: crc16] kernel: Pid: 4152, comm: fsstress Tainted: G W O 3.9.0-rc4+ #6 kernel: Call Trace: kernel: [<ffffffff82031e14>] warn_slowpath_common+0x85/0x9f kernel: [<ffffffff82031e48>] warn_slowpath_null+0x1a/0x1c kernel: [<ffffffffa044b789>] ext4_truncate+0x36/0x2d7 [ext4] kernel: [<ffffffffa044c9a5>] ext4_write_begin+0x35f/0x3b9 [ext4] kernel: [<ffffffff820af514>] pagecache_write_begin+0x1c/0x1e kernel: [<ffffffff820f5f80>] __page_symlink+0x6d/0x10f kernel: [<ffffffffa0450806>] ? ext4_orphan_add+0x1ea/0x219 [ext4] kernel: [<ffffffffa0451a7f>] ext4_symlink+0x1ca/0x33e [ext4] kernel: [<ffffffff820f8c41>] vfs_symlink+0x7c/0xd6 kernel: [<ffffffff820fb634>] sys_symlinkat+0x68/0xb9 kernel: [<ffffffff820fb69b>] sys_symlink+0x16/0x18 kernel: [<ffffffff8238fac2>] system_call_fastpath+0x16/0x1b kernel: ---[ end trace 05d179cc296c4f3a ]--- Regards, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 27, 2013 at 10:02:50PM +0800, Zheng Liu wrote: > On Wed, Mar 27, 2013 at 09:41:10AM -0400, Theodore Ts'o wrote: > > On Wed, Mar 27, 2013 at 09:19:07PM +0800, Zheng Liu wrote: > > > From: Zheng Liu <wenqing.lz@taobao.com> > > > > > > After applied this commit (8e4061cb), we will get a warning from > > > ext4_truncate when i_mutex isn't taken. Here the assumption is that > > > i_mutex should be taken when we do a truncation. In ext4_symlink we > > > could need to call ext4_truncate to trim some blocks beyond i_size, but > > > the i_mutex isn't taken. > > > > Hmm, and this is why I added the warning. Even after looking your > > patch, I'm having trouble finding the codepath that results in > > ext4_truncate() getting called from __page_symlink(). Can you send > > the stack trace from the WARN_ON, just so I can see what I missed? > > Here it is. > > kernel: ------------[ cut here ]------------ > kernel: WARNING: at fs/ext4/inode.c:3803 ext4_truncate+0x36/0x2d7 [ext4]() > kernel: Hardware name: OptiPlex 780 > kernel: Modules linked in: ext4(O) jbd2 crc16 cpufreq_ondemand ipv6 > dm_mirror dm_region_hash dm_log dm_mod parport_pc parport dcdbas > acpi_cpufreq mperf serio_raw button pcspkr i2c_i801 sg ehci_pci > ehci_hcd e1000e ptp pps_core ext3 jbd sd_mod ahci libahci libata > scsi_mod uhci_hcd radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2 > c_core [last unloaded: crc16] > kernel: Pid: 4152, comm: fsstress Tainted: G W O 3.9.0-rc4+ #6 > kernel: Call Trace: > kernel: [<ffffffff82031e14>] warn_slowpath_common+0x85/0x9f > kernel: [<ffffffff82031e48>] warn_slowpath_null+0x1a/0x1c > kernel: [<ffffffffa044b789>] ext4_truncate+0x36/0x2d7 [ext4] > kernel: [<ffffffffa044c9a5>] ext4_write_begin+0x35f/0x3b9 [ext4] > kernel: [<ffffffff820af514>] pagecache_write_begin+0x1c/0x1e > kernel: [<ffffffff820f5f80>] __page_symlink+0x6d/0x10f > kernel: [<ffffffffa0450806>] ? ext4_orphan_add+0x1ea/0x219 [ext4] > kernel: [<ffffffffa0451a7f>] ext4_symlink+0x1ca/0x33e [ext4] > kernel: [<ffffffff820f8c41>] vfs_symlink+0x7c/0xd6 > kernel: [<ffffffff820fb634>] sys_symlinkat+0x68/0xb9 > kernel: [<ffffffff820fb69b>] sys_symlink+0x16/0x18 > kernel: [<ffffffff8238fac2>] system_call_fastpath+0x16/0x1b > kernel: ---[ end trace 05d179cc296c4f3a ]--- Sorry, maybe you want to reproduce this warning. xfstests #083 can trigger it. Regards, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 27, 2013 at 09:51:55AM -0400, Theodore Ts'o wrote: > Ah, now I see. Thanks for sending the stack trace. On the failure > path, we're calling the inline function ext4_truncate_filaed_write() > and this is calling ext4_truncate(). > > But I'm now wondering if we need to take the i_data_sem mutex in > ext4_truncate_failed_write(). > > Otherwise, couldn't we end up with problems where a failed write calls > ext4_truncate() without i_data_sem(), and that races with something > else --- say, a punch or truncate call to that same inode? I don't think we need to take i_mutex lock honestly. In ext4_symlink when we call __page_symlink() the new inode doesn't access yet. So no one can do a punching hole or truncation to this inode. But I also think we need to add WARN_ON in ext4_truncate because i_mutex lock is used to serialize truncate/punch hole and buffered io. Regards, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 27, 2013 at 11:19:22PM +0800, Zheng Liu wrote: > > > Otherwise, couldn't we end up with problems where a failed write calls > > > ext4_truncate() without i_data_sem(), and that races with something > > > else --- say, a punch or truncate call to that same inode? > > Let me think about it. I need to take a close look at it. Note that I'm not so concerned when we are creating symlink --- you are quite right in pointing out in that case the inode isn't in the namespace yet, so that prevents races --- but also what might happen in an ENOSPC write(2) failure racing against a punch/truncate call. But again, this is why I added the warning --- it was to find these edge cases that we might not have considered. :-) - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 27, 2013 at 11:07:35PM +0800, Zheng Liu wrote: > On Wed, Mar 27, 2013 at 09:51:55AM -0400, Theodore Ts'o wrote: > > Ah, now I see. Thanks for sending the stack trace. On the failure > > path, we're calling the inline function ext4_truncate_filaed_write() > > and this is calling ext4_truncate(). > > > > But I'm now wondering if we need to take the i_data_sem mutex in ^^^ Sigh, I misread i_data_sem and i_mutex. Really sorry about that. :-/ > > ext4_truncate_failed_write(). > > > > Otherwise, couldn't we end up with problems where a failed write calls > > ext4_truncate() without i_data_sem(), and that races with something > > else --- say, a punch or truncate call to that same inode? Let me think about it. I need to take a close look at it. Regards, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 27, 2013 at 11:12:48AM -0400, Theodore Ts'o wrote: > On Wed, Mar 27, 2013 at 11:19:22PM +0800, Zheng Liu wrote: > > > > Otherwise, couldn't we end up with problems where a failed write calls > > > > ext4_truncate() without i_data_sem(), and that races with something > > > > else --- say, a punch or truncate call to that same inode? > > > > Let me think about it. I need to take a close look at it. > > Note that I'm not so concerned when we are creating symlink --- you > are quite right in pointing out in that case the inode isn't in the > namespace yet, so that prevents races --- but also what might happen > in an ENOSPC write(2) failure racing against a punch/truncate call. > > But again, this is why I added the warning --- it was to find these > edge cases that we might not have considered. :-) ext4_truncate_failed_write() is called by the following functions: - ext4_ind_direct_IO - ext4_convert_inline_data_to_extent - ext4_da_convert_inline_data_to_extent - ext4_write_begin - ext4_write_end - ext4_journalled_write_end - ext4_da_write_begin All these functions are protected by i_mutex. So we can serialize it with truncate/punch hole. Regards, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I looked more closely at the assumption that ext4_write_begin() holds i_mutex. This is guaranteed by Documentation/filesystems/Locking, which notes that write_begin() and write_end() functions hold i_mutex: PageLocked(page) i_mutex write_begin: locks the page yes write_end: yes, unlocks yes So the bug is that ext4_symlink() calls __page_symlink(); __page_symlink() calls pagecache_write_begin() which calls write_begin(), without taking i_mutex. So we can fix this by taking i_mutex in ext4_symlink(), but I think it would be better to take the i_mutex in __page_symlink(), since it would then address a violation of the locking rules for all file systems. Al, do you agree? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 3825d6a..d75f91a 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2856,7 +2856,9 @@ retry: ext4_journal_stop(handle); if (err) goto err_drop_inode; + mutex_lock(&inode->i_mutex); err = __page_symlink(inode, symname, l, 1); + mutex_unlock(&inode->i_mutex); if (err) goto err_drop_inode; /*