Message ID | 20180614145318.7985-1-khalid.elmously@canonical.com |
---|---|
State | New |
Series | block: do not use interruptible wait anywhere |
Sent a v2 with correct subject tags

On 2018-06-14 10:53:18 , Khalid Elmously wrote:
> From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
>
> BugLink: http://bugs.launchpad.net/bugs/1776887
>
> When blk_queue_enter() waits for a queue to unfreeze, or unset the
> PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
>
> The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
> ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
> device is resumed asynchronously, i.e. after un-freezing userspace tasks.
>
> So that commit exposed the bug as a regression in v4.15. A mysterious
> SIGBUS (or -EIO) sometimes happened during the time the device was being
> resumed. Most frequently, there was no kernel log message, and we saw Xorg
> or Xwayland killed by SIGBUS.[1]
>
> [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
>
> Without this fix, I get an IO error in this test:
>
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd # stop after 5 seconds
>
> The interruptible wait was added to blk_queue_enter in
> commit 3ef28e83ab15 ("block: generic request_queue reference counting").
> Before then, the interruptible wait was only in blk-mq, but I don't think
> it could ever have been correct.
>
> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> (cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
>
> ---
>  block/blk-core.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
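The condition that blk_queue_enter() sleeps on is the same before and after this patch. It can be read as one predicate over three pieces of queue state. The sketch below is a userspace model, not kernel code. `struct queue_state` and `wait_done()` are hypothetical names, and the three fields stand in for `atomic_read(&q->mq_freeze_depth)`, `blk_queue_preempt_only(q)` and `blk_queue_dying(q)`. The `preempt` argument mirrors the local variable of the same name in blk_queue_enter().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the request_queue state read by the
 * wait condition in blk_queue_enter(); not the kernel's struct request_queue. */
struct queue_state {
	int freeze_depth;  /* models atomic_read(&q->mq_freeze_depth) */
	bool preempt_only; /* models blk_queue_preempt_only(q) */
	bool dying;        /* models blk_queue_dying(q) */
};

/* Same boolean expression as the wait condition in the patch. A true result
 * ends the wait; for a dying queue, blk_queue_enter() then returns -ENODEV
 * instead of entering the queue. */
bool wait_done(const struct queue_state *q, bool preempt)
{
	assert(q->freeze_depth >= 0); /* model invariant: depth never negative */
	return (q->freeze_depth == 0 && (preempt || !q->preempt_only)) ||
	       q->dying;
}
```

Take, for example, a queue that is unfrozen (depth 0) but still has PREEMPT_ONLY set. This is our reading of the window, described above, in which the SCSI device has not finished resuming. In that state `wait_done(&q, false)` stays false, so an ordinary request keeps sleeping. Only a caller with `preempt` set gets through.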
stable@vger.kernel.org : Please disregard this whole email thread, and sorry for the spam. Not sure why git-send-email is doing this to me (again).
Hi Khaled

As per the Ubuntu bug I quoted in my message to the Ubuntu kernel team, this patch has already been accepted in -stable kernel 4.16.7. AFAIK there is no need to resend to -stable, on the basis that they are not currently maintaining a 4.15.x kernel.

I'm not deeply familiar so it is possible I am mis-understanding. (Hence I will restrain myself from writing further commentary :-P).

Regards

Alan

On 14/06/18 16:06, Khaled Elmously wrote:
> stable@vger.kernel.org : Please disregard this whole email thread, and sorry for the spam. Not sure why git-send-email is doing this to me (again).
I screwed up with git-send-email (I think there was a change in its behaviour recently). I shouldn't have sent to -stable - sorry.

On 2018-06-14 16:36:46 , Alan Jenkins wrote:
> Hi Khaled
>
> As per the Ubuntu bug I quoted in my message to the Ubuntu kernel team, this
> patch has already been accepted in -stable kernel 4.16.7. AFAIK there is no
> need to resend to -stable, on the basis that they are not currently
> maintaining a 4.15.x kernel.
>
> I'm not deeply familiar so it is possible I am mis-understanding. (Hence I
> will restrain myself from writing further commentary :-P).
>
> Regards
>
> Alan
diff --git a/block/blk-core.c b/block/blk-core.c
index fc0666354af3..59c91e345eea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
-				(atomic_read(&q->mq_freeze_depth) == 0 &&
-				 (preempt || !blk_queue_preempt_only(q))) ||
-				blk_queue_dying(q));
+		wait_event(q->mq_freeze_wq,
+			   (atomic_read(&q->mq_freeze_depth) == 0 &&
+			    (preempt || !blk_queue_preempt_only(q))) ||
+			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
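The diff swaps wait_event_interruptible() for wait_event() and drops the `ret` plumbing. The following userspace sketch models why that matters. It is not kernel code, and `struct wake_event`, `model_wait()` and `MODEL_ERESTARTSYS` are hypothetical names. It assumes simplified semantics for the kernel wait macros. Each array element is one pass through the wait loop, and the condition is checked first on each pass. Only the interruptible variant gives up, with -ERESTARTSYS (512 in the kernel's include/linux/errno.h), when a signal is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value of the kernel-internal ERESTARTSYS (include/linux/errno.h); it is not
 * exported to userspace, so the model defines its own copy. */
#define MODEL_ERESTARTSYS 512

/* One pass through the wait loop: what the woken task observes. */
struct wake_event {
	bool queue_ready;    /* wait condition is now true */
	bool signal_pending; /* a signal (e.g. SIGUSR1 sent to dd) is pending */
};

enum wait_kind { WAIT_INTERRUPTIBLE, WAIT_UNINTERRUPTIBLE };

/* Returns 0 once the condition holds, or -MODEL_ERESTARTSYS if the
 * interruptible variant sees a pending signal first. The uninterruptible
 * variant ignores signals, as wait_event() does. */
int model_wait(enum wait_kind kind, const struct wake_event *ev, size_t n)
{
	/* The model requires the queue to become ready eventually. */
	assert(n > 0 && ev[n - 1].queue_ready);

	for (size_t i = 0; i < n; i++) {
		if (ev[i].queue_ready)
			return 0;
		if (kind == WAIT_INTERRUPTIBLE && ev[i].signal_pending)
			return -MODEL_ERESTARTSYS;
		/* Otherwise the task goes back to sleep until the next wakeup. */
	}
	return 0; /* not reached: the last event always has queue_ready set */
}
```

Suppose a signal arrives before the queue is ready, as in the `killall -SIGUSR1 dd` loop from the commit message. The interruptible model then returns -MODEL_ERESTARTSYS, which the pre-patch blk_queue_enter() handed straight back to its caller through `if (ret) return ret;`. The uninterruptible model keeps sleeping until the condition holds.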