
[1/7] block/nvme: poll queues without q->lock

Message ID 20200519171138.201667-2-stefanha@redhat.com
State New
Series block/nvme: support nested aio_poll()

Commit Message

Stefan Hajnoczi May 19, 2020, 5:11 p.m. UTC
A lot of CPU time is spent simply locking/unlocking q->lock during
polling. Check for completion outside the lock to make q->lock disappear
from the profile.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nvme.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
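
For context, the check added by this patch relies on the NVMe completion queue phase tag: the controller inverts bit 0 of each completion entry's status word every time it passes through the queue, so the host can tell whether the entry at its current head is new without taking any lock or touching a doorbell. Below is a minimal sketch of that readiness test, using invented type and field names rather than the actual block/nvme.c structures:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical completion queue entry: bit 0 of the status word is the
     * phase tag written by the controller.  Names are made up for
     * illustration and do not match block/nvme.c. */
    typedef struct {
        uint32_t result;
        uint16_t sq_head;
        uint16_t sq_id;
        uint16_t cid;
        uint16_t status;
    } cqe_t;

    typedef struct {
        volatile cqe_t *entries;   /* completion ring shared with the device */
        unsigned head;             /* next entry the host will look at */
        int phase;                 /* phase value of entries already consumed */
    } cq_t;

    /* True if the controller has posted a new entry at the current head:
     * a fresh entry carries the opposite phase bit from what we last saw. */
    static bool cq_entry_ready(const cq_t *cq)
    {
        uint16_t status = cq->entries[cq->head].status; /* le16_to_cpu() on BE hosts */
        return (status & 0x1) != cq->phase;
    }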

Comments

Sergio Lopez May 25, 2020, 8:07 a.m. UTC | #1
On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> A lot of CPU time is spent simply locking/unlocking q->lock during
> polling. Check for completion outside the lock to make q->lock disappear
> from the profile.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/nvme.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index eb2f54dd9d..7eb4512666 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
>  
>      for (i = 0; i < s->nr_queues; i++) {
>          NVMeQueuePair *q = s->queues[i];
> +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> +
> +        /*
> +         * q->lock isn't needed for checking completion because
> +         * nvme_process_completion() only runs in the event loop thread and
> +         * cannot race with itself.
> +         */
> +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> +            continue;
> +        }
> +

IIUC, this is introducing an early check of the phase bit to determine
if there is something new in the queue.

I'm fine with this optimization, but I have the feeling that the
comment doesn't properly describe it.

Sergio.

>          qemu_mutex_lock(&q->lock);
>          while (nvme_process_completion(s, q)) {
>              /* Keep polling */
> -- 
> 2.25.3
>
Stefan Hajnoczi May 28, 2020, 3:23 p.m. UTC | #2
On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > A lot of CPU time is spent simply locking/unlocking q->lock during
> > polling. Check for completion outside the lock to make q->lock disappear
> > from the profile.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/nvme.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index eb2f54dd9d..7eb4512666 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> >  
> >      for (i = 0; i < s->nr_queues; i++) {
> >          NVMeQueuePair *q = s->queues[i];
> > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > +
> > +        /*
> > +         * q->lock isn't needed for checking completion because
> > +         * nvme_process_completion() only runs in the event loop thread and
> > +         * cannot race with itself.
> > +         */
> > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > +            continue;
> > +        }
> > +
> 
> IIUC, this is introducing an early check of the phase bit to determine
> if there is something new in the queue.
> 
> I'm fine with this optimization, but I have the feeling that the
> comment doesn't properly describe it.

I'm not sure I understand. The comment explains why it's safe not to
take q->lock. Normally it would be taken. Without the comment readers
could be confused why we ignore the locking rules here.

As for documenting the cqe->status expression itself, I didn't think of
explaining it since it's part of the theory of operation of this device.
Any polling driver will do this, there's nothing QEMU-specific or
unusual going on here.

Would you like me to expand the comment explaining that NVMe polling
consists of checking the phase bit of the latest cqe for readiness?

Or maybe I misunderstood? :)

Stefan
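
To make the "theory of operation" Stefan refers to concrete: the controller and the host each track a phase value and flip it whenever they wrap the ring, so an entry whose phase bit differs from the phase the host has already consumed is by definition new. A toy, self-contained simulation of that convention follows (invented names, single-threaded and lock-free purely for illustration, not the QEMU driver's code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 4

    static uint16_t ring[RING_SIZE];   /* bit 0 of each word is the phase tag */
    static unsigned prod_tail, cons_head;
    static int prod_phase = 1;         /* phase the "controller" writes next */
    static int cons_phase;             /* phase of entries the "host" has consumed */

    static void produce(void)          /* controller posts a completion */
    {
        ring[prod_tail] = (uint16_t)prod_phase;
        prod_tail = (prod_tail + 1) % RING_SIZE;
        if (prod_tail == 0) {
            prod_phase = !prod_phase;  /* flip on wrap */
        }
    }

    static bool consume(void)          /* host polls for a completion */
    {
        if ((ring[cons_head] & 0x1) == cons_phase) {
            return false;              /* same early check the patch adds */
        }
        cons_head = (cons_head + 1) % RING_SIZE;
        if (cons_head == 0) {
            cons_phase = !cons_phase;  /* track the controller's flip */
        }
        return true;
    }

    int main(void)
    {
        for (int i = 0; i < 10; i++) {
            produce();
            printf("completion %d seen: %s\n", i, consume() ? "yes" : "no");
        }
        return 0;
    }
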
Sergio Lopez May 29, 2020, 7:49 a.m. UTC | #3
On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote:
> On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > > A lot of CPU time is spent simply locking/unlocking q->lock during
> > > polling. Check for completion outside the lock to make q->lock disappear
> > > from the profile.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  block/nvme.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/block/nvme.c b/block/nvme.c
> > > index eb2f54dd9d..7eb4512666 100644
> > > --- a/block/nvme.c
> > > +++ b/block/nvme.c
> > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> > >
> > >      for (i = 0; i < s->nr_queues; i++) {
> > >          NVMeQueuePair *q = s->queues[i];
> > > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > > +
> > > +        /*
> > > +         * q->lock isn't needed for checking completion because
> > > +         * nvme_process_completion() only runs in the event loop thread and
> > > +         * cannot race with itself.
> > > +         */
> > > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > > +            continue;
> > > +        }
> > > +
> >
> > IIUC, this is introducing an early check of the phase bit to determine
> > if there is something new in the queue.
> >
> > I'm fine with this optimization, but I have the feeling that the
> > comment doesn't properly describe it.
>
> I'm not sure I understand. The comment explains why it's safe not to
> take q->lock. Normally it would be taken. Without the comment readers
> could be confused why we ignore the locking rules here.
>
> As for documenting the cqe->status expression itself, I didn't think of
> explaining it since it's part of the theory of operation of this device.
> Any polling driver will do this, there's nothing QEMU-specific or
> unusual going on here.
>
> Would you like me to expand the comment explaining that NVMe polling
> consists of checking the phase bit of the latest cqe for readiness?
>
> Or maybe I misunderstood? :)

I was thinking of something like "Do an early check for
completions. We don't need q->lock here because
nvme_process_completion() only runs (...)"

Sergio.
Stefan Hajnoczi June 17, 2020, 12:52 p.m. UTC | #4
On Fri, May 29, 2020 at 09:49:31AM +0200, Sergio Lopez wrote:
> On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote:
> > On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> > > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > > > A lot of CPU time is spent simply locking/unlocking q->lock during
> > > > polling. Check for completion outside the lock to make q->lock disappear
> > > > from the profile.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > >  block/nvme.c | 12 ++++++++++++
> > > >  1 file changed, 12 insertions(+)
> > > >
> > > > diff --git a/block/nvme.c b/block/nvme.c
> > > > index eb2f54dd9d..7eb4512666 100644
> > > > --- a/block/nvme.c
> > > > +++ b/block/nvme.c
> > > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> > > >
> > > >      for (i = 0; i < s->nr_queues; i++) {
> > > >          NVMeQueuePair *q = s->queues[i];
> > > > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > > > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > > > +
> > > > +        /*
> > > > +         * q->lock isn't needed for checking completion because
> > > > +         * nvme_process_completion() only runs in the event loop thread and
> > > > +         * cannot race with itself.
> > > > +         */
> > > > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > > > +            continue;
> > > > +        }
> > > > +
> > >
> > > IIUC, this is introducing an early check of the phase bit to determine
> > > if there is something new in the queue.
> > >
> > > I'm fine with this optimization, but I have the feeling that the
> > > comment doesn't properly describe it.
> >
> > I'm not sure I understand. The comment explains why it's safe not to
> > take q->lock. Normally it would be taken. Without the comment readers
> > could be confused why we ignore the locking rules here.
> >
> > As for documenting the cqe->status expression itself, I didn't think of
> > explaining it since it's part of the theory of operation of this device.
> > Any polling driver will do this, there's nothing QEMU-specific or
> > unusual going on here.
> >
> > Would you like me to expand the comment explaining that NVMe polling
> > consists of checking the phase bit of the latest cqe for readiness?
> >
> > Or maybe I misunderstood? :)
> 
> I was thinking of something like "Do an early check for
> completions. We don't need q->lock here because
> nvme_process_completion() only runs (...)"

Sure, will fix.

Stefan

Patch

diff --git a/block/nvme.c b/block/nvme.c
index eb2f54dd9d..7eb4512666 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
 
     for (i = 0; i < s->nr_queues; i++) {
         NVMeQueuePair *q = s->queues[i];
+        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
+        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
+
+        /*
+         * q->lock isn't needed for checking completion because
+         * nvme_process_completion() only runs in the event loop thread and
+         * cannot race with itself.
+         */
+        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
+            continue;
+        }
+
         qemu_mutex_lock(&q->lock);
         while (nvme_process_completion(s, q)) {
             /* Keep polling */