Message ID | 4FEEC973.4090905@oracle.com |
---|---|
State | Not Applicable, archived |
Headers | show |
> + * tty_write_message() will invoked by print_warning() > + * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING > + * is enabled when a user running out of disk quota limits. > + * It will end up call tty_write(). Here is a potential race tty->ops->write is the low level write method, not tty_write. This appears to be even more wrong than the other one in other ways too - it uses interruptible sleeps but doesn't handle the signal case so will spin on a signal and kill the box. NAK Looking gat the traces I suspect what you've actually got is a much more complicated deadlock where a process doing perfectly normal I/O to the tty has faulted and there is a chain of dependancies through the file system code to the thread which is doing the dquot_alloc_inode. If that is the case then dquot_alloc_inode shouldn't be making blocking calls to tty_write_message and probably the right thing to do is to queue work for it so the tty_write_message is done asynchronously. There are a very limited number of events that need reporting so probably something like a per mount flags and workqueue would allow you to do set_bit(DQUOT_INODEOVER, &foo->events); schedule_work() and the work queue can just xchg the events long for 0 and spew any messages required. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hey Alan, On 06/30/2012 08:44 PM, Alan Cox wrote: >> + * tty_write_message() will invoked by print_warning() >> + * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING >> + * is enabled when a user running out of disk quota limits. >> + * It will end up call tty_write(). Here is a potential race > > tty->ops->write is the low level write method, not tty_write. I was wondering if below call trace is come from tty_write_message()->tty->ops->write()? [ 2739.802106] -> #1 (&mm->mmap_sem){++++++}: [ 2739.802120] [<c10fa825>] lock_acquire+0x14e/0x189 [ 2739.802133] [<c11e3e83>] might_fault+0xbf/0xf8 [ 2739.802154] [<c13bcbdf>] _copy_from_user+0x40/0x8a [ 2739.802175] [<c14addc3>] copy_from_user+0x16/0x26 [ 2739.802195] [<c14b00b4>] tty_write+0x282/0x3c7 [ 2739.802212] [<c14b02dd>] redirected_tty_write+0xe4/0xfd [ 2739.802226] [<c1231879>] vfs_write+0xf5/0x1a3 [ 2739.802239] [<c1231bdc>] sys_write+0x6c/0xa9 [ 2739.802253] [<c186281f>] sysenter_do_call+0x12/0x38 > > This appears to be even more wrong than the other one in other ways too - > it uses interruptible sleeps but doesn't handle the signal case so will > spin on a signal and kill the box. > > NAK > > Looking gat the traces I suspect what you've actually got is a much more > complicated deadlock where a process doing perfectly normal I/O to the > tty has faulted and there is a chain of dependancies through the file > system code to the thread which is doing the dquot_alloc_inode. > > If that is the case then dquot_alloc_inode shouldn't be making blocking > calls to tty_write_message and probably the right thing to do is to queue > work for it so the tty_write_message is done asynchronously. > > There are a very limited number of events that need reporting so probably > something like a per mount flags and workqueue would allow you to do > > set_bit(DQUOT_INODEOVER, &foo->events); > schedule_work() > > and the work queue can just xchg the events long for 0 and spew any > messages required. Thanks for the teaching, I'll give a try. -Jeff > > Alan > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index b425c79..2b4664d 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1098,12 +1098,40 @@ out: * * We must still hold the BTM and test the CLOSING flag for the moment. */ - void tty_write_message(struct tty_struct *tty, char *msg) { +retry: if (tty) { - mutex_lock(&tty->atomic_write_lock); - tty_lock(); + /* + * tty_write_message() will invoked by print_warning() + * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING + * is enabled when a user running out of disk quota limits. + * It will end up call tty_write(). Here is a potential race + * situation since tty_write() call copy_from_user() which + * might produce page faults and turn to invoke inodes dirty + * process on the underlying file systems if needed, and + * it will try to acquire JBD/JBD2 lock accordingly. This + * might make lockdep unhappy to print dead lock warning. + * To solve this issue, we have to go to sleep and relinquish + * the CPU power to another process until the atomic_write_lock + * became available. + */ + if (!mutex_trylock(&tty->atomic_write_lock)) { + DEFINE_WAIT(wait); + prepare_to_wait_exclusive(&tty->write_wait, &wait, + TASK_INTERRUPTIBLE); + schedule(); + finish_wait(&tty->write_wait, &wait); + goto retry; + } + + /* + * Call tty_lock() directly might race with tty_open() even + * if we have already got the atomic_write_lock. However, we + * must perform TTY_CLOSING check up with the BTM hold, so call + * tty_lock_wait() which is sleepable instead. + */ + tty_lock_wait(); if (tty->ops->write && !test_bit(TTY_CLOSING, &tty->flags)) { tty_unlock(); tty->ops->write(tty, msg, strlen(msg)); @@ -1114,7 +1142,6 @@ void tty_write_message(struct tty_struct *tty, char *msg) return; } - /** * tty_write - write method for tty device file * @file: tty file pointer
Avoid lockdep warn by having both lock acquirements sleepable. BTM also have to go to sleep if lock failed at the first time, otherwise it will race with tty_open()->tty_lock(). Signed-off-by: Jie Liu <jeff.liu@oracle.com> --- drivers/tty/tty_io.c | 35 +++++++++++++++++++++++++++++++---- 1 files changed, 31 insertions(+), 4 deletions(-)