[SRU,Xenial,1/2] UBUNTU: SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI command requeue

Message ID 20181016153820.29487-2-mfo@canonical.com
State New
Series Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)

Commit Message

Mauricio Faria de Oliveira Oct. 16, 2018, 3:38 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1798110

The 'reqs' counter is incremented in the SCSI .queuecommand() path,
right before virtscsi_queuecommand() is called, in either
 - virtscsi_queuecommand_single(), or
 - virtscsi_queuecommand_multi(), via virtscsi_pick_vq{_mq}().

And it's decremented only in the SCSI command completion callback
(after the command is successfully queued and completed by the adapter):
 - virtscsi_complete_cmd().

This allows the counter to be incremented but _not_ decremented
if virtscsi_queuecommand() fails to add/kick the command to the
virtio ring (i.e., virtscsi_kick_cmd() returns a value that is
neither 0 nor -EIO).

    static int virtscsi_queuecommand(...)
    {
            ...
            ret = virtscsi_kick_cmd(...);
            if (ret == -EIO) {
                    ...
                    virtscsi_complete_cmd(vscsi, cmd);
                    ...
            } else if (ret != 0) {
                    return SCSI_MLQUEUE_HOST_BUSY;
            }
            return 0;
    }

In that case, the return code SCSI_MLQUEUE_HOST_BUSY causes the SCSI
command to be requeued by the SCSI layer, which sends it again later
in the .queuecommand() path -- incrementing the reqs counter _again_.

This may happen many times for the same SCSI command, depending on
the virtio ring condition/implementation, so the reqs counter will
never return to zero (it is decremented only once in the completion
callback). And it may happen for any number of SCSI commands going
through this path.
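
For reference, the increment side of this path looks roughly like the
sketch below (a simplified sketch based on the description above, not
the exact Xenial source; helper and field names are assumptions that
follow the code quoted earlier):

    /* Simplified sketch of the single-queue submission path: 'reqs' is
     * incremented unconditionally before virtscsi_queuecommand() runs,
     * so every requeue adds an increment that only a single completion
     * will ever undo. */
    static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
                                            struct scsi_cmnd *sc)
    {
            struct virtio_scsi *vscsi = shost_priv(sh);
            struct virtio_scsi_target_state *tgt =
                            scsi_target(sc->device)->hostdata;

            atomic_inc(&tgt->reqs);
            return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
    }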

Unfortunately, that causes a problem with a downstream/SAUCE patch
for Xenial, which uses the reqs counter to synchronize with the
completion callback and waits for the value to become zero: commit
f1f609d8015e ("UBUNTU: SAUCE: (no-up) virtio-scsi: Fix race in
target free").
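
For context, the wait added by that SAUCE patch is along these lines
(a hypothetical sketch based only on the description above, not the
exact source of commit f1f609d8015e):

    /* Sketch: target teardown spins until all in-flight commands have
     * dropped 'reqs' back to zero before freeing the target state.  A
     * leaked increment makes this loop spin forever. */
    static void virtscsi_target_destroy(struct scsi_target *starget)
    {
            struct virtio_scsi_target_state *tgt = starget->hostdata;

            while (atomic_read(&tgt->reqs))
                    cpu_relax();

            kfree(tgt);
    }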

Together, this problem and that patch prevent the SCSI target removal
from finishing, eventually causing a CPU soft lockup on another CPU
that is waiting for a kernel resource which is (and remains) held in
the call chain of this CPU.

This has been verified 1) with a synthetic test case using QEMU+GDB
that fakes the number of available elements in the virtio ring just
once (harmless), so as to force the SCSI command to be requeued, and
then uses the QEMU monitor to remove the virtio-scsi target.

_AND_ 2) with the test case reported by the customer (a for-loop on
a cloud instance that repeatedly mounts the virtio-scsi drive, copies
data out of it, unmounts it, then detaches the virtio-scsi drive).
(Here, the problem usually happens in the 1st or 2nd iteration, but
with the patch it has run for 35 iterations without any problems.)

Upstream has done away with the reqs counter (it was originally used
only to check whether any requests were still active, for queue
steering, not for our synchronization purposes).  Instead of trying
to find an alternative synchronization scheme for now, let's just fix
the behavior that we know is incorrect.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Tested-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Tested-by: David Coronel <david.coronel@canonical.com>
---
 drivers/scsi/virtio_scsi.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Stefan Bader Oct. 17, 2018, 8:14 a.m. UTC | #1
On 16.10.18 17:38, Mauricio Faria de Oliveira wrote:
> [...]
Acked-by: Stefan Bader <stefan.bader@canonical.com>

Khalid Elmously Oct. 22, 2018, 7:20 a.m. UTC | #2
Only patch 1/2 has been applied

On 2018-10-16 12:38:19 , Mauricio Faria de Oliveira wrote:
> [...]

Patch

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index b4a41d581021..572722e86bef 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -571,6 +571,10 @@  static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 		virtscsi_complete_cmd(vscsi, cmd);
 		spin_unlock_irqrestore(&req_vq->vq_lock, flags);
 	} else if (ret != 0) {
+		/* The SCSI command requeue will increment 'tgt->reqs' again. */
+		struct virtio_scsi_target_state *tgt =
+				scsi_target(sc->device)->hostdata;
+		atomic_dec(&tgt->reqs);
 		return SCSI_MLQUEUE_HOST_BUSY;
 	}
 	return 0;