Patchwork [2/2] make both atomic_write_lock and BTM lock acquirement sleepable at tty_write_message()

login
register
mail settings
Submitter jeff.liu
Date June 30, 2012, 9:40 a.m.
Message ID <4FEEC973.4090905@oracle.com>
Download mbox | patch
Permalink /patch/168291/
State Not Applicable
Headers show

Comments

jeff.liu - June 30, 2012, 9:40 a.m.
Avoid lockdep warn by having both lock acquirements sleepable.
BTM also have to go to sleep if lock failed at the first time, otherwise it
will race with tty_open()->tty_lock().

Signed-off-by: Jie Liu <jeff.liu@oracle.com>

---
 drivers/tty/tty_io.c |   35 +++++++++++++++++++++++++++++++----
 1 files changed, 31 insertions(+), 4 deletions(-)
Alan Cox - June 30, 2012, 12:44 p.m.
> +		 * tty_write_message() will invoked by print_warning()
> +		 * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING
> +		 * is enabled when a user running out of disk quota limits.
> +		 * It will end up call tty_write().  Here is a potential race

tty->ops->write is the low level write method, not tty_write.

This appears to be even more wrong than the other one in other ways too -
it uses interruptible sleeps but doesn't handle the signal case so will
spin on a signal and kill the box.

NAK

Looking gat the traces I suspect what you've actually got is a much more
complicated deadlock where a process doing perfectly normal I/O to the
tty has faulted and there is a chain of dependancies through the file
system code to the thread which is doing the dquot_alloc_inode.

If that is the case then dquot_alloc_inode shouldn't be making blocking
calls to tty_write_message and probably the right thing to do is to queue
work for it so the tty_write_message is done asynchronously.

There are a very limited number of events that need reporting so probably
something like a per mount flags and workqueue would allow you to do

	set_bit(DQUOT_INODEOVER, &foo->events);
	schedule_work()

and the work queue can just xchg the events long for 0 and spew any
messages required.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jeff.liu - June 30, 2012, 1:35 p.m.
Hey Alan,

On 06/30/2012 08:44 PM, Alan Cox wrote:

>> +		 * tty_write_message() will invoked by print_warning()
>> +		 * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING
>> +		 * is enabled when a user running out of disk quota limits.
>> +		 * It will end up call tty_write().  Here is a potential race
> 
> tty->ops->write is the low level write method, not tty_write.

I was wondering if below call trace is come from tty_write_message()->tty->ops->write()?
[ 2739.802106] -> #1 (&mm->mmap_sem){++++++}:
[ 2739.802120]        [<c10fa825>] lock_acquire+0x14e/0x189
[ 2739.802133]        [<c11e3e83>] might_fault+0xbf/0xf8
[ 2739.802154]        [<c13bcbdf>] _copy_from_user+0x40/0x8a
[ 2739.802175]        [<c14addc3>] copy_from_user+0x16/0x26
[ 2739.802195]        [<c14b00b4>] tty_write+0x282/0x3c7
[ 2739.802212]        [<c14b02dd>] redirected_tty_write+0xe4/0xfd
[ 2739.802226]        [<c1231879>] vfs_write+0xf5/0x1a3
[ 2739.802239]        [<c1231bdc>] sys_write+0x6c/0xa9
[ 2739.802253]        [<c186281f>] sysenter_do_call+0x12/0x38

> 
> This appears to be even more wrong than the other one in other ways too -
> it uses interruptible sleeps but doesn't handle the signal case so will
> spin on a signal and kill the box.
> 
> NAK
> 
> Looking gat the traces I suspect what you've actually got is a much more
> complicated deadlock where a process doing perfectly normal I/O to the
> tty has faulted and there is a chain of dependancies through the file
> system code to the thread which is doing the dquot_alloc_inode.
> 
> If that is the case then dquot_alloc_inode shouldn't be making blocking
> calls to tty_write_message and probably the right thing to do is to queue
> work for it so the tty_write_message is done asynchronously.
> 
> There are a very limited number of events that need reporting so probably
> something like a per mount flags and workqueue would allow you to do
> 
> 	set_bit(DQUOT_INODEOVER, &foo->events);
> 	schedule_work()
> 
> and the work queue can just xchg the events long for 0 and spew any
> messages required.

Thanks for the teaching, I'll give a try. 

-Jeff

> 
> Alan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index b425c79..2b4664d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1098,12 +1098,40 @@  out:
  *
  * We must still hold the BTM and test the CLOSING flag for the moment.
  */
-
 void tty_write_message(struct tty_struct *tty, char *msg)
 {
+retry:
 	if (tty) {
-		mutex_lock(&tty->atomic_write_lock);
-		tty_lock();
+		/*
+		 * tty_write_message() will invoked by print_warning()
+		 * at fs/quota/dquot.c if CONFIG_PRINT_QUOTA_WARNING
+		 * is enabled when a user running out of disk quota limits.
+		 * It will end up call tty_write().  Here is a potential race
+		 * situation since tty_write() call copy_from_user() which
+		 * might produce page faults and turn to invoke inodes dirty
+		 * process on the underlying file systems if needed, and
+		 * it will try to acquire JBD/JBD2 lock accordingly.  This
+		 * might make lockdep unhappy to print dead lock warning.
+		 * To solve this issue, we have to go to sleep and relinquish
+		 * the CPU power to another process until the atomic_write_lock
+		 * became available.
+		 */
+		if (!mutex_trylock(&tty->atomic_write_lock)) {
+			DEFINE_WAIT(wait);
+			prepare_to_wait_exclusive(&tty->write_wait, &wait,
+						  TASK_INTERRUPTIBLE);
+			schedule();
+			finish_wait(&tty->write_wait, &wait);
+			goto retry;
+		}
+
+		/*
+		 * Call tty_lock() directly might race with tty_open() even
+		 * if we have already got the atomic_write_lock. However, we
+		 * must perform TTY_CLOSING check up with the BTM hold, so call
+		 * tty_lock_wait() which is sleepable instead.
+		 */
+		tty_lock_wait();
 		if (tty->ops->write && !test_bit(TTY_CLOSING, &tty->flags)) {
 			tty_unlock();
 			tty->ops->write(tty, msg, strlen(msg));
@@ -1114,7 +1142,6 @@  void tty_write_message(struct tty_struct *tty, char *msg)
 	return;
 }
 
-
 /**
  *	tty_write		-	write method for tty device file
  *	@file: tty file pointer