
[1/7] block/nvme: poll queues without q->lock

Message ID 20200519171138.201667-2-stefanha@redhat.com
State New
Series block/nvme: support nested aio_poll()

Commit Message

Stefan Hajnoczi May 19, 2020, 5:11 p.m. UTC
A lot of CPU time is spent simply locking/unlocking q->lock during
polling. Check for completion outside the lock to make q->lock disappear
from the profile.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nvme.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
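
For context, the check added by this patch relies on the NVMe completion queue phase tag: the controller inverts bit 0 of each completion entry's status word every time it passes through the queue, so the host can tell whether the entry at its current head is new without taking any lock or touching a doorbell. Below is a minimal sketch of that readiness test, using invented type and field names rather than the actual block/nvme.c structures:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical completion queue entry: bit 0 of the status word is the
     * phase tag written by the controller.  Names are made up for
     * illustration and do not match block/nvme.c. */
    typedef struct {
        uint32_t result;
        uint16_t sq_head;
        uint16_t sq_id;
        uint16_t cid;
        uint16_t status;
    } cqe_t;

    typedef struct {
        volatile cqe_t *entries;   /* completion ring shared with the device */
        unsigned head;             /* next entry the host will look at */
        int phase;                 /* phase value of entries already consumed */
    } cq_t;

    /* True if the controller has posted a new entry at the current head:
     * a fresh entry carries the opposite phase bit from what we last saw. */
    static bool cq_entry_ready(const cq_t *cq)
    {
        uint16_t status = cq->entries[cq->head].status; /* le16_to_cpu() on BE hosts */
        return (status & 0x1) != cq->phase;
    }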

Comments

Sergio Lopez May 25, 2020, 8:07 a.m. UTC | #1
On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> A lot of CPU time is spent simply locking/unlocking q->lock during
> polling. Check for completion outside the lock to make q->lock disappear
> from the profile.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/nvme.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index eb2f54dd9d..7eb4512666 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
>  
>      for (i = 0; i < s->nr_queues; i++) {
>          NVMeQueuePair *q = s->queues[i];
> +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> +
> +        /*
> +         * q->lock isn't needed for checking completion because
> +         * nvme_process_completion() only runs in the event loop thread and
> +         * cannot race with itself.
> +         */
> +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> +            continue;
> +        }
> +

IIUC, this is introducing an early check of the phase bit to determine
if there is something new in the queue.

I'm fine with this optimization, but I have the feeling that the
comment doesn't properly describe it.

Sergio.

>          qemu_mutex_lock(&q->lock);
>          while (nvme_process_completion(s, q)) {
>              /* Keep polling */
> -- 
> 2.25.3
>
Stefan Hajnoczi May 28, 2020, 3:23 p.m. UTC | #2
On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > A lot of CPU time is spent simply locking/unlocking q->lock during
> > polling. Check for completion outside the lock to make q->lock disappear
> > from the profile.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/nvme.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index eb2f54dd9d..7eb4512666 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> >  
> >      for (i = 0; i < s->nr_queues; i++) {
> >          NVMeQueuePair *q = s->queues[i];
> > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > +
> > +        /*
> > +         * q->lock isn't needed for checking completion because
> > +         * nvme_process_completion() only runs in the event loop thread and
> > +         * cannot race with itself.
> > +         */
> > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > +            continue;
> > +        }
> > +
> 
> IIUC, this is introducing an early check of the phase bit to determine
> if there is something new in the queue.
> 
> I'm fine with this optimization, but I have the feeling that the
> comment doesn't properly describe it.

I'm not sure I understand. The comment explains why it's safe not to
take q->lock. Normally it would be taken. Without the comment readers
could be confused why we ignore the locking rules here.

As for documenting the cqe->status expression itself, I didn't think of
explaining it since it's part of the theory of operation of this device.
Any polling driver will do this, there's nothing QEMU-specific or
unusual going on here.

Would you like me to expand the comment explaining that NVMe polling
consists of checking the phase bit of the latest cqe for readiness?

Or maybe I misunderstood? :)

Stefan
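
To make the "theory of operation" Stefan refers to concrete: the controller and the host each track a phase value and flip it whenever they wrap the ring, so an entry whose phase bit differs from the phase the host has already consumed is by definition new. A toy, self-contained simulation of that convention follows (invented names, single-threaded and lock-free purely for illustration, not the QEMU driver's code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 4

    static uint16_t ring[RING_SIZE];   /* bit 0 of each word is the phase tag */
    static unsigned prod_tail, cons_head;
    static int prod_phase = 1;         /* phase the "controller" writes next */
    static int cons_phase;             /* phase of entries the "host" has consumed */

    static void produce(void)          /* controller posts a completion */
    {
        ring[prod_tail] = (uint16_t)prod_phase;
        prod_tail = (prod_tail + 1) % RING_SIZE;
        if (prod_tail == 0) {
            prod_phase = !prod_phase;  /* flip on wrap */
        }
    }

    static bool consume(void)          /* host polls for a completion */
    {
        if ((ring[cons_head] & 0x1) == cons_phase) {
            return false;              /* same early check the patch adds */
        }
        cons_head = (cons_head + 1) % RING_SIZE;
        if (cons_head == 0) {
            cons_phase = !cons_phase;  /* track the controller's flip */
        }
        return true;
    }

    int main(void)
    {
        for (int i = 0; i < 10; i++) {
            produce();
            printf("completion %d seen: %s\n", i, consume() ? "yes" : "no");
        }
        return 0;
    }
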
Sergio Lopez May 29, 2020, 7:49 a.m. UTC | #3
On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote:
> On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > > A lot of CPU time is spent simply locking/unlocking q->lock during
> > > polling. Check for completion outside the lock to make q->lock disappear
> > > from the profile.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  block/nvme.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/block/nvme.c b/block/nvme.c
> > > index eb2f54dd9d..7eb4512666 100644
> > > --- a/block/nvme.c
> > > +++ b/block/nvme.c
> > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> > >
> > >      for (i = 0; i < s->nr_queues; i++) {
> > >          NVMeQueuePair *q = s->queues[i];
> > > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > > +
> > > +        /*
> > > +         * q->lock isn't needed for checking completion because
> > > +         * nvme_process_completion() only runs in the event loop thread and
> > > +         * cannot race with itself.
> > > +         */
> > > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > > +            continue;
> > > +        }
> > > +
> >
> > IIUC, this is introducing an early check of the phase bit to determine
> > if there is something new in the queue.
> >
> > I'm fine with this optimization, but I have the feeling that the
> > comment doesn't properly describe it.
>
> I'm not sure I understand. The comment explains why it's safe not to
> take q->lock. Normally it would be taken. Without the comment readers
> could be confused why we ignore the locking rules here.
>
> As for documenting the cqe->status expression itself, I didn't think of
> explaining it since it's part of the theory of operation of this device.
> Any polling driver will do this, there's nothing QEMU-specific or
> unusual going on here.
>
> Would you like me to expand the comment explaining that NVMe polling
> consists of checking the phase bit of the latest cqe for readiness?
>
> Or maybe I misunderstood? :)

I was thinking of something like "Do an early check for
completions. We don't need q->lock here because
nvme_process_completion() only runs (...)"

Sergio.
Stefan Hajnoczi June 17, 2020, 12:52 p.m. UTC | #4
On Fri, May 29, 2020 at 09:49:31AM +0200, Sergio Lopez wrote:
> On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote:
> > On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> > > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > > > A lot of CPU time is spent simply locking/unlocking q->lock during
> > > > polling. Check for completion outside the lock to make q->lock disappear
> > > > from the profile.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > >  block/nvme.c | 12 ++++++++++++
> > > >  1 file changed, 12 insertions(+)
> > > >
> > > > diff --git a/block/nvme.c b/block/nvme.c
> > > > index eb2f54dd9d..7eb4512666 100644
> > > > --- a/block/nvme.c
> > > > +++ b/block/nvme.c
> > > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> > > >
> > > >      for (i = 0; i < s->nr_queues; i++) {
> > > >          NVMeQueuePair *q = s->queues[i];
> > > > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > > > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > > > +
> > > > +        /*
> > > > +         * q->lock isn't needed for checking completion because
> > > > +         * nvme_process_completion() only runs in the event loop thread and
> > > > +         * cannot race with itself.
> > > > +         */
> > > > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > > > +            continue;
> > > > +        }
> > > > +
> > >
> > > IIUC, this is introducing an early check of the phase bit to determine
> > > if there is something new in the queue.
> > >
> > > I'm fine with this optimization, but I have the feeling that the
> > > comment doesn't properly describe it.
> >
> > I'm not sure I understand. The comment explains why it's safe not to
> > take q->lock. Normally it would be taken. Without the comment readers
> > could be confused why we ignore the locking rules here.
> >
> > As for documenting the cqe->status expression itself, I didn't think of
> > explaining it since it's part of the theory of operation of this device.
> > Any polling driver will do this, there's nothing QEMU-specific or
> > unusual going on here.
> >
> > Would you like me to expand the comment explaining that NVMe polling
> > consists of checking the phase bit of the latest cqe for readiness?
> >
> > Or maybe I misunderstood? :)
> 
> I was thinking of something like "Do an early check for
> completions. We don't need q->lock here because
> nvme_process_completion() only runs (...)"

Sure, will fix.

Stefan

Patch

diff --git a/block/nvme.c b/block/nvme.c
index eb2f54dd9d..7eb4512666 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
 
     for (i = 0; i < s->nr_queues; i++) {
         NVMeQueuePair *q = s->queues[i];
+        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
+        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
+
+        /*
+         * q->lock isn't needed for checking completion because
+         * nvme_process_completion() only runs in the event loop thread and
+         * cannot race with itself.
+         */
+        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
+            continue;
+        }
+
         qemu_mutex_lock(&q->lock);
         while (nvme_process_completion(s, q)) {
             /* Keep polling */