Message ID: f711bc66ab1f1122d95753ad658264131574fbbf.1447861835.git.joseph.salisbury@canonical.com
State: New
On 18.11.2015 17:25, Joseph Salisbury wrote:
> From: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
>
> BugLink: http://bugs.launchpad.net/bugs/1512815
>
> My colleague ran into a program stall on a x86_64 server, where
> n_tty_read() was waiting for data even if there was data in the buffer
> in the pty.  The kernel stack for the stuck process looks like below.
>  #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
>  #1 [ffff88303d107bd0] schedule at ffffffff815c513e
>  #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
>  #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
>  #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
>  #5 [ffff88303d107dd0] tty_read at ffffffff81368013
>  #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
>  #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
>  #8 [ffff88303d107f00] sys_read at ffffffff811a4306
>  #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7
>
> There seem to be two problems causing this issue.
>
> First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
> updates ldata->commit_head using smp_store_release() and then checks
> the wait queue using waitqueue_active().  However, since there is no
> memory barrier, __receive_buf() could return without calling
> wake_up_interruptible_poll(), and at the same time, n_tty_read() could
> start to wait in wait_woken() as in the following chart.
>
>   __receive_buf()                     n_tty_read()
> ------------------------------------------------------------------------
> if (waitqueue_active(&tty->read_wait))
> /* Memory operations issued after the
>    RELEASE may be completed before the
>    RELEASE operation has completed */
>                                       add_wait_queue(&tty->read_wait, &wait);
>                                       ...
>                                       if (!input_available_p(tty, 0)) {
> smp_store_release(&ldata->commit_head,
>                   ldata->read_head);
>                                       ...
>                                       timeout = wait_woken(&wait,
>                                          TASK_INTERRUPTIBLE, timeout);
> ------------------------------------------------------------------------
>
> The second problem is that n_tty_read() also lacks a memory barrier
> call and could also cause __receive_buf() to return without calling
> wake_up_interruptible_poll(), and n_tty_read() to wait in wait_woken()
> as in the chart below.
>
>   __receive_buf()                     n_tty_read()
> ------------------------------------------------------------------------
>                                       spin_lock_irqsave(&q->lock, flags);
>                                       /* from add_wait_queue() */
>                                       ...
>                                       if (!input_available_p(tty, 0)) {
> /* Memory operations issued after the
>    RELEASE may be completed before the
>    RELEASE operation has completed */
> smp_store_release(&ldata->commit_head,
>                   ldata->read_head);
> if (waitqueue_active(&tty->read_wait))
>                                       __add_wait_queue(q, wait);
>                                       spin_unlock_irqrestore(&q->lock, flags);
>                                       /* from add_wait_queue() */
>                                       ...
>                                       timeout = wait_woken(&wait,
>                                          TASK_INTERRUPTIBLE, timeout);
> ------------------------------------------------------------------------
>
> There are also other places in drivers/tty/n_tty.c which have similar
> calls to waitqueue_active(), so instead of adding many memory barrier
> calls, this patch simply removes the call to waitqueue_active(),
> leaving just wake_up*() behind.
>
> This fixes both problems because, even though the memory access before
> or after the spinlocks in both wake_up*() and add_wait_queue() can
> sneak into the critical section, it cannot go past it and the critical
> section assures that they will be serialized (please see "INTER-CPU
> ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
> better explanation).  Moreover, the resulting code is much simpler.
>
> Latency measurement using a ping-pong test over a pty doesn't show any
> visible performance drop.
>
> Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit e81107d4c6bd098878af9796b24edc8d4a9524fd)
> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
> ---
>  drivers/tty/n_tty.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index 84dcdf4..3de41df 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1385,8 +1385,7 @@ handle_newline:
>  			put_tty_queue(c, ldata);
>  			ldata->canon_head = ldata->read_head;
>  			kill_fasync(&tty->fasync, SIGIO, POLL_IN);
> -			if (waitqueue_active(&tty->read_wait))
> -				wake_up_interruptible(&tty->read_wait);
> +			wake_up_interruptible_poll(&tty->read_wait, POLLIN);

Are you sure about changing the wake-up into a wakeup_*_poll? The
original patch only removed the if checks but left the type of wakeup
as it was before...

>  			return 0;
>  		}
>  	}
> @@ -1671,8 +1670,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp,
>  	if ((!ldata->icanon && (read_cnt(ldata) >= ldata->minimum_to_wake)) ||
>  	    L_EXTPROC(tty)) {
>  		kill_fasync(&tty->fasync, SIGIO, POLL_IN);
> -		if (waitqueue_active(&tty->read_wait))
> -			wake_up_interruptible(&tty->read_wait);
> +		wake_up_interruptible_poll(&tty->read_wait, POLLIN);

Same here...

-Stefan

>  	}
>  }
On Thu, Nov 19, 2015 at 01:18:13PM +0100, Stefan Bader wrote:
> On 18.11.2015 17:25, Joseph Salisbury wrote:
> > From: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
> >
> > BugLink: http://bugs.launchpad.net/bugs/1512815
[...]
> > @@ -1385,8 +1385,7 @@ handle_newline:
> >  			put_tty_queue(c, ldata);
> >  			ldata->canon_head = ldata->read_head;
> >  			kill_fasync(&tty->fasync, SIGIO, POLL_IN);
> > -			if (waitqueue_active(&tty->read_wait))
> > -				wake_up_interruptible(&tty->read_wait);
> > +			wake_up_interruptible_poll(&tty->read_wait, POLLIN);
>
> Are you sure about changing the wake-up into a wakeup_*_poll? The
> original patch only removed the if checks but left the type of wakeup
> as it was before...

Yeah, I was actually wondering the same thing.  Looking at Ben's
backport in stable 3.2.73, he uses wake_up_interruptible(), not
wake_up_interruptible_poll().

Commit 57087d515441 ("tty: Fix spurious poll() wakeups") has done the
change upstream to use wake_up_interruptible_poll(), so this could
actually not be a real problem.  But it's probably better to change the
backport (and, just in case, maybe ask the bug reporter to re-test
it...?).

Cheers,
--
Luís

[...]

> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
On 11/19/2015 07:38 AM, Luis Henriques wrote:
> On Thu, Nov 19, 2015 at 01:18:13PM +0100, Stefan Bader wrote:
>> On 18.11.2015 17:25, Joseph Salisbury wrote:
>>> From: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
>>>
>>> BugLink: http://bugs.launchpad.net/bugs/1512815
[...]
>>> @@ -1385,8 +1385,7 @@ handle_newline:
>>>  			put_tty_queue(c, ldata);
>>>  			ldata->canon_head = ldata->read_head;
>>>  			kill_fasync(&tty->fasync, SIGIO, POLL_IN);
>>> -			if (waitqueue_active(&tty->read_wait))
>>> -				wake_up_interruptible(&tty->read_wait);
>>> +			wake_up_interruptible_poll(&tty->read_wait, POLLIN);
>>
>> Are you sure about changing the wake-up into a wakeup_*_poll? The
>> original patch only removed the if checks but left the type of wakeup
>> as it was before...

Under 'handle_newline:', the original patch, commit e81107d, did remove
an if check but used wake_up_interruptible_poll() both before and after
the patch was applied.  Like Luis says, this is because of the changes
commit 57087d5 makes.

The original patch was also cc'd to stable and landed in 3.18-rc1, so
it will eventually land in Luis's and Kamal's queues.  Do you think we
should just wait until this lands in the upstream queues?  Or would it
be best to perform the backport like it was done by Ben in 3.2 and ask
for re-testing?

> Yeah, I was actually wondering the same thing.  Looking at Ben's
> backport in stable 3.2.73, he uses wake_up_interruptible(), not
> wake_up_interruptible_poll().
>
> Commit 57087d515441 ("tty: Fix spurious poll() wakeups") has done the
> change upstream to use wake_up_interruptible_poll(), so this could
> actually not be a real problem.  But it's probably better to change
> the backport (and, just in case, maybe ask the bug reporter to re-test
> it...?).
[...]
I'm just going to wait for this to come in via upstream stable.
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 84dcdf4..3de41df 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1385,8 +1385,7 @@ handle_newline:
 			put_tty_queue(c, ldata);
 			ldata->canon_head = ldata->read_head;
 			kill_fasync(&tty->fasync, SIGIO, POLL_IN);
-			if (waitqueue_active(&tty->read_wait))
-				wake_up_interruptible(&tty->read_wait);
+			wake_up_interruptible_poll(&tty->read_wait, POLLIN);
 			return 0;
 		}
 	}
@@ -1671,8 +1670,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp,
 	if ((!ldata->icanon && (read_cnt(ldata) >= ldata->minimum_to_wake)) ||
 	    L_EXTPROC(tty)) {
 		kill_fasync(&tty->fasync, SIGIO, POLL_IN);
-		if (waitqueue_active(&tty->read_wait))
-			wake_up_interruptible(&tty->read_wait);
+		wake_up_interruptible_poll(&tty->read_wait, POLLIN);
 	}
 }