Message ID | 5644F3B5.6050807@canonical.com |
---|---|
State | New |
Headers | show |
On 11/12/2015 01:16 PM, Tim Gardner wrote: > On 11/12/2015 11:24 AM, Joseph Salisbury wrote: >> From: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> >> >> BugLink: http://bugs.launchpad.net/bugs/1512815 >> >> My colleague ran into a program stall on a x86_64 server, where >> n_tty_read() was waiting for data even if there was data in the buffer >> in the pty. kernel stack for the stuck process looks like below. >> #0 [ffff88303d107b58] __schedule at ffffffff815c4b20 >> #1 [ffff88303d107bd0] schedule at ffffffff815c513e >> #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818 >> #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2 >> #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23 >> #5 [ffff88303d107dd0] tty_read at ffffffff81368013 >> #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704 >> #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57 >> #8 [ffff88303d107f00] sys_read at ffffffff811a4306 >> #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7 >> >> There seems to be two problems causing this issue. >> >> First, in drivers/tty/n_tty.c, __receive_buf() stores the data and >> updates ldata->commit_head using smp_store_release() and then checks >> the wait queue using waitqueue_active(). However, since there is no >> memory barrier, __receive_buf() could return without calling >> wake_up_interactive_poll(), and at the same time, n_tty_read() could >> start to wait in wait_woken() as in the following chart. >> >> __receive_buf() n_tty_read() >> ------------------------------------------------------------------------ >> if (waitqueue_active(&tty->read_wait)) >> /* Memory operations issued after the >> RELEASE may be completed before the >> RELEASE operation has completed */ >> >> add_wait_queue(&tty->read_wait, &wait); >> ... >> if (!input_available_p(tty, >> 0)) { >> smp_store_release(&ldata->commit_head, >> ldata->read_head); >> ... >> timeout = wait_woken(&wait, >> TASK_INTERRUPTIBLE, timeout); >> ------------------------------------------------------------------------ >> >> The second problem is that n_tty_read() also lacks a memory barrier >> call and could also cause __receive_buf() to return without calling >> wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken() >> as in the chart below. >> >> __receive_buf() n_tty_read() >> ------------------------------------------------------------------------ >> spin_lock_irqsave(&q->lock, >> flags); >> /* from add_wait_queue() */ >> ... >> if (!input_available_p(tty, >> 0)) { >> /* Memory operations issued >> after the >> RELEASE may be completed >> before the >> RELEASE operation has >> completed */ >> smp_store_release(&ldata->commit_head, >> ldata->read_head); >> if (waitqueue_active(&tty->read_wait)) >> __add_wait_queue(q, wait); >> >> spin_unlock_irqrestore(&q->lock,flags); >> /* from add_wait_queue() */ >> ... >> timeout = wait_woken(&wait, >> TASK_INTERRUPTIBLE, timeout); >> ------------------------------------------------------------------------ >> >> There are also other places in drivers/tty/n_tty.c which have similar >> calls to waitqueue_active(), so instead of adding many memory barrier >> calls, this patch simply removes the call to waitqueue_active(), >> leaving just wake_up*() behind. >> >> This fixes both problems because, even though the memory access before >> or after the spinlocks in both wake_up*() and add_wait_queue() can >> sneak into the critical section, it cannot go past it and the critical >> section assures that they will be serialized (please see "INTER-CPU >> ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a >> better explanation). Moreover, the resulting code is much simpler. >> >> Latency measurement using a ping-pong test over a pty doesn't show any >> visible performance drop. >> >> Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> >> Cc: stable@vger.kernel.org >> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >> (backported from commit e81107d4c6bd098878af9796b24edc8d4a9524fd) >> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> >> --- >> drivers/tty/n_tty.c | 11 +++++------ >> 1 file changed, 5 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c >> index 84dcdf4..2d4088d 100644 >> --- a/drivers/tty/n_tty.c >> +++ b/drivers/tty/n_tty.c >> @@ -363,6 +363,7 @@ static void n_tty_packet_mode_flush(struct >> tty_struct *tty) >> spin_lock_irqsave(&tty->ctrl_lock, flags); >> if (tty->link->packet) { >> tty->ctrl_status |= TIOCPKT_FLUSHREAD; >> + spin_unlock_irqrestore(&tty->ctrl_lock, flags); >> wake_up_interruptible(&tty->link->read_wait); >> } >> spin_unlock_irqrestore(&tty->ctrl_lock, flags); > > This doesn't look right. I think spin lock/unlock have to be balanced, > plus it was not part of the original patch. > >> @@ -1173,7 +1174,7 @@ static void n_tty_receive_break(struct >> tty_struct *tty) >> put_tty_queue('\0', ldata); >> } >> put_tty_queue('\0', ldata); >> - wake_up_interruptible(&tty->read_wait); >> + wake_up_interruptible_poll(&tty->read_wait, POLLIN); >> } >> > > I'm also a little leery of using wake_up_interruptible_poll() when the > original code was wake_up_interruptible(), though upstream is currently > using wake_up_interruptible_poll(). However, test results appear to be > positive, so meh. > > See attached backport. > > rtg > > well, my backport isn't completely right either. Just reproduce your original patch without the spin_unlock_irqrestore() changes. rtg
Thanks for the feedback, Tim. On November 14, 2015 9:57:51 AM EST, Tim Gardner <tim.gardner@canonical.com> wrote: >On 11/12/2015 01:16 PM, Tim Gardner wrote: >> On 11/12/2015 11:24 AM, Joseph Salisbury wrote: >>> From: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> >>> >>> BugLink: http://bugs.launchpad.net/bugs/1512815 >>> >>> My colleague ran into a program stall on a x86_64 server, where >>> n_tty_read() was waiting for data even if there was data in the >buffer >>> in the pty. kernel stack for the stuck process looks like below. >>> #0 [ffff88303d107b58] __schedule at ffffffff815c4b20 >>> #1 [ffff88303d107bd0] schedule at ffffffff815c513e >>> #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818 >>> #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2 >>> #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23 >>> #5 [ffff88303d107dd0] tty_read at ffffffff81368013 >>> #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704 >>> #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57 >>> #8 [ffff88303d107f00] sys_read at ffffffff811a4306 >>> #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at >ffffffff815c86d7 >>> >>> There seems to be two problems causing this issue. >>> >>> First, in drivers/tty/n_tty.c, __receive_buf() stores the data and >>> updates ldata->commit_head using smp_store_release() and then checks >>> the wait queue using waitqueue_active(). However, since there is no >>> memory barrier, __receive_buf() could return without calling >>> wake_up_interactive_poll(), and at the same time, n_tty_read() could >>> start to wait in wait_woken() as in the following chart. >>> >>> __receive_buf() n_tty_read() >>> >------------------------------------------------------------------------ >>> if (waitqueue_active(&tty->read_wait)) >>> /* Memory operations issued after the >>> RELEASE may be completed before the >>> RELEASE operation has completed */ >>> >>> add_wait_queue(&tty->read_wait, &wait); >>> ... >>> if (!input_available_p(tty, >>> 0)) { >>> smp_store_release(&ldata->commit_head, >>> ldata->read_head); >>> ... >>> timeout = wait_woken(&wait, >>> TASK_INTERRUPTIBLE, >timeout); >>> >------------------------------------------------------------------------ >>> >>> The second problem is that n_tty_read() also lacks a memory barrier >>> call and could also cause __receive_buf() to return without calling >>> wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken() >>> as in the chart below. >>> >>> __receive_buf() n_tty_read() >>> >------------------------------------------------------------------------ >>> spin_lock_irqsave(&q->lock, >>> flags); >>> /* from add_wait_queue() */ >>> ... >>> if (!input_available_p(tty, >>> 0)) { >>> /* Memory operations issued >>> after the >>> RELEASE may be completed >>> before the >>> RELEASE operation has >>> completed */ >>> smp_store_release(&ldata->commit_head, >>> ldata->read_head); >>> if (waitqueue_active(&tty->read_wait)) >>> __add_wait_queue(q, wait); >>> >>> spin_unlock_irqrestore(&q->lock,flags); >>> /* from add_wait_queue() */ >>> ... >>> timeout = wait_woken(&wait, >>> TASK_INTERRUPTIBLE, >timeout); >>> >------------------------------------------------------------------------ >>> >>> There are also other places in drivers/tty/n_tty.c which have >similar >>> calls to waitqueue_active(), so instead of adding many memory >barrier >>> calls, this patch simply removes the call to waitqueue_active(), >>> leaving just wake_up*() behind. >>> >>> This fixes both problems because, even though the memory access >before >>> or after the spinlocks in both wake_up*() and add_wait_queue() can >>> sneak into the critical section, it cannot go past it and the >critical >>> section assures that they will be serialized (please see "INTER-CPU >>> ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for >a >>> better explanation). Moreover, the resulting code is much simpler. >>> >>> Latency measurement using a ping-pong test over a pty doesn't show >any >>> visible performance drop. >>> >>> Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >>> (backported from commit e81107d4c6bd098878af9796b24edc8d4a9524fd) >>> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> >>> --- >>> drivers/tty/n_tty.c | 11 +++++------ >>> 1 file changed, 5 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c >>> index 84dcdf4..2d4088d 100644 >>> --- a/drivers/tty/n_tty.c >>> +++ b/drivers/tty/n_tty.c >>> @@ -363,6 +363,7 @@ static void n_tty_packet_mode_flush(struct >>> tty_struct *tty) >>> spin_lock_irqsave(&tty->ctrl_lock, flags); >>> if (tty->link->packet) { >>> tty->ctrl_status |= TIOCPKT_FLUSHREAD; >>> + spin_unlock_irqrestore(&tty->ctrl_lock, flags); >>> wake_up_interruptible(&tty->link->read_wait); >>> } >>> spin_unlock_irqrestore(&tty->ctrl_lock, flags); >> >> This doesn't look right. I think spin lock/unlock have to be >balanced, >> plus it was not part of the original patch. >> >>> @@ -1173,7 +1174,7 @@ static void n_tty_receive_break(struct >>> tty_struct *tty) >>> put_tty_queue('\0', ldata); >>> } >>> put_tty_queue('\0', ldata); >>> - wake_up_interruptible(&tty->read_wait); >>> + wake_up_interruptible_poll(&tty->read_wait, POLLIN); >>> } >>> >> >> I'm also a little leery of using wake_up_interruptible_poll() when >the >> original code was wake_up_interruptible(), though upstream is >currently >> using wake_up_interruptible_poll(). However, test results appear to >be >> positive, so meh. >> >> See attached backport. >> >> rtg >> >> > >well, my backport isn't completely right either. Just reproduce your >original patch without the spin_unlock_irqrestore() changes. > >rtg >-- >Tim Gardner tim.gardner@canonical.com
From 4728d4041c3cd561b9b6a1ed6e7db001426ca502 Mon Sep 17 00:00:00 2001 From: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Date: Fri, 2 Oct 2015 08:27:05 +0000 Subject: [PATCH] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c My colleague ran into a program stall on a x86_64 server, where n_tty_read() was waiting for data even if there was data in the buffer in the pty. kernel stack for the stuck process looks like below. #0 [ffff88303d107b58] __schedule at ffffffff815c4b20 #1 [ffff88303d107bd0] schedule at ffffffff815c513e #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23 #5 [ffff88303d107dd0] tty_read at ffffffff81368013 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57 #8 [ffff88303d107f00] sys_read at ffffffff811a4306 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7 There seems to be two problems causing this issue. First, in drivers/tty/n_tty.c, __receive_buf() stores the data and updates ldata->commit_head using smp_store_release() and then checks the wait queue using waitqueue_active(). However, since there is no memory barrier, __receive_buf() could return without calling wake_up_interactive_poll(), and at the same time, n_tty_read() could start to wait in wait_woken() as in the following chart. __receive_buf() n_tty_read() ------------------------------------------------------------------------ if (waitqueue_active(&tty->read_wait)) /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ add_wait_queue(&tty->read_wait, &wait); ... if (!input_available_p(tty, 0)) { smp_store_release(&ldata->commit_head, ldata->read_head); ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ The second problem is that n_tty_read() also lacks a memory barrier call and could also cause __receive_buf() to return without calling wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken() as in the chart below. __receive_buf() n_tty_read() ------------------------------------------------------------------------ spin_lock_irqsave(&q->lock, flags); /* from add_wait_queue() */ ... if (!input_available_p(tty, 0)) { /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ smp_store_release(&ldata->commit_head, ldata->read_head); if (waitqueue_active(&tty->read_wait)) __add_wait_queue(q, wait); spin_unlock_irqrestore(&q->lock,flags); /* from add_wait_queue() */ ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ There are also other places in drivers/tty/n_tty.c which have similar calls to waitqueue_active(), so instead of adding many memory barrier calls, this patch simply removes the call to waitqueue_active(), leaving just wake_up*() behind. This fixes both problems because, even though the memory access before or after the spinlocks in both wake_up*() and add_wait_queue() can sneak into the critical section, it cannot go past it and the critical section assures that they will be serialized (please see "INTER-CPU ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a better explanation). Moreover, the resulting code is much simpler. Latency measurement using a ping-pong test over a pty doesn't show any visible performance drop. Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (back ported from commit e81107d4c6bd098878af9796b24edc8d4a9524fd) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Conflicts: drivers/tty/n_tty.c --- drivers/tty/n_tty.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 84dcdf4..3de41df 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1385,8 +1385,7 @@ handle_newline: put_tty_queue(c, ldata); ldata->canon_head = ldata->read_head; kill_fasync(&tty->fasync, SIGIO, POLL_IN); - if (waitqueue_active(&tty->read_wait)) - wake_up_interruptible(&tty->read_wait); + wake_up_interruptible_poll(&tty->read_wait, POLLIN); return 0; } } @@ -1671,8 +1670,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp, if ((!ldata->icanon && (read_cnt(ldata) >= ldata->minimum_to_wake)) || L_EXTPROC(tty)) { kill_fasync(&tty->fasync, SIGIO, POLL_IN); - if (waitqueue_active(&tty->read_wait)) - wake_up_interruptible(&tty->read_wait); + wake_up_interruptible_poll(&tty->read_wait, POLLIN); } } -- 1.9.1