Message ID | 20200519171138.201667-2-stefanha@redhat.com |
---|---|
State | New |
Headers | show |
Series | block/nvme: support nested aio_poll() | expand |
On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote: > A lot of CPU time is spent simply locking/unlocking q->lock during > polling. Check for completion outside the lock to make q->lock disappear > from the profile. > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > block/nvme.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/block/nvme.c b/block/nvme.c > index eb2f54dd9d..7eb4512666 100644 > --- a/block/nvme.c > +++ b/block/nvme.c > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s) > > for (i = 0; i < s->nr_queues; i++) { > NVMeQueuePair *q = s->queues[i]; > + const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES; > + NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset]; > + > + /* > + * q->lock isn't needed for checking completion because > + * nvme_process_completion() only runs in the event loop thread and > + * cannot race with itself. > + */ > + if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) { > + continue; > + } > + IIUC, this is introducing an early check of the phase bit to determine if there is something new in the queue. I'm fine with this optimization, but I have the feeling that the comment doesn't properly describe it. Sergio. > qemu_mutex_lock(&q->lock); > while (nvme_process_completion(s, q)) { > /* Keep polling */ > -- > 2.25.3 >
On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote: > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote: > > A lot of CPU time is spent simply locking/unlocking q->lock during > > polling. Check for completion outside the lock to make q->lock disappear > > from the profile. > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > --- > > block/nvme.c | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/block/nvme.c b/block/nvme.c > > index eb2f54dd9d..7eb4512666 100644 > > --- a/block/nvme.c > > +++ b/block/nvme.c > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s) > > > > for (i = 0; i < s->nr_queues; i++) { > > NVMeQueuePair *q = s->queues[i]; > > + const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES; > > + NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset]; > > + > > + /* > > + * q->lock isn't needed for checking completion because > > + * nvme_process_completion() only runs in the event loop thread and > > + * cannot race with itself. > > + */ > > + if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) { > > + continue; > > + } > > + > > IIUC, this is introducing an early check of the phase bit to determine > if there is something new in the queue. > > I'm fine with this optimization, but I have the feeling that the > comment doesn't properly describe it. I'm not sure I understand. The comment explains why it's safe not to take q->lock. Normally it would be taken. Without the comment readers could be confused why we ignore the locking rules here. As for documenting the cqe->status expression itself, I didn't think of explaining it since it's part of the theory of operation of this device. Any polling driver will do this, there's nothing QEMU-specific or unusual going on here. Would you like me to expand the comment explaining that NVMe polling consists of checking the phase bit of the latest cqe to check for readiness? Or maybe I misunderstood? :) Stefan
On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote: > On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote: > > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote: > > > A lot of CPU time is spent simply locking/unlocking q->lock during > > > polling. Check for completion outside the lock to make q->lock disappear > > > from the profile. > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > > --- > > > block/nvme.c | 12 ++++++++++++ > > > 1 file changed, 12 insertions(+) > > > > > > diff --git a/block/nvme.c b/block/nvme.c > > > index eb2f54dd9d..7eb4512666 100644 > > > --- a/block/nvme.c > > > +++ b/block/nvme.c > > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s) > > > > > > for (i = 0; i < s->nr_queues; i++) { > > > NVMeQueuePair *q = s->queues[i]; > > > + const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES; > > > + NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset]; > > > + > > > + /* > > > + * q->lock isn't needed for checking completion because > > > + * nvme_process_completion() only runs in the event loop thread and > > > + * cannot race with itself. > > > + */ > > > + if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) { > > > + continue; > > > + } > > > + > > > > IIUC, this is introducing an early check of the phase bit to determine > > if there is something new in the queue. > > > > I'm fine with this optimization, but I have the feeling that the > > comment doesn't properly describe it. > > I'm not sure I understand. The comment explains why it's safe not to > take q->lock. Normally it would be taken. Without the comment readers > could be confused why we ignore the locking rules here. > > As for documenting the cqe->status expression itself, I didn't think of > explaining it since it's part of the theory of operation of this device. > Any polling driver will do this, there's nothing QEMU-specific or > unusual going on here. > > Would you like me to expand the comment explaining that NVMe polling > consists of checking the phase bit of the latest cqe to check for > readiness? > > Or maybe I misunderstood? :) I was thinking of something like "Do an early check for completions. We don't need q->lock here because nvme_process_completion() only runs (...)" Sergio.
On Fri, May 29, 2020 at 09:49:31AM +0200, Sergio Lopez wrote: > On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote: > > On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote: > > > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote: > > > > A lot of CPU time is spent simply locking/unlocking q->lock during > > > > polling. Check for completion outside the lock to make q->lock disappear > > > > from the profile. > > > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > --- > > > > block/nvme.c | 12 ++++++++++++ > > > > 1 file changed, 12 insertions(+) > > > > > > > > diff --git a/block/nvme.c b/block/nvme.c > > > > index eb2f54dd9d..7eb4512666 100644 > > > > --- a/block/nvme.c > > > > +++ b/block/nvme.c > > > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s) > > > > > > > > for (i = 0; i < s->nr_queues; i++) { > > > > NVMeQueuePair *q = s->queues[i]; > > > > + const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES; > > > > + NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset]; > > > > + > > > > + /* > > > > + * q->lock isn't needed for checking completion because > > > > + * nvme_process_completion() only runs in the event loop thread and > > > > + * cannot race with itself. > > > > + */ > > > > + if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) { > > > > + continue; > > > > + } > > > > + > > > > > > IIUC, this is introducing an early check of the phase bit to determine > > > if there is something new in the queue. > > > > > > I'm fine with this optimization, but I have the feeling that the > > > comment doesn't properly describe it. > > > > I'm not sure I understand. The comment explains why it's safe not to > > take q->lock. Normally it would be taken. Without the comment readers > > could be confused why we ignore the locking rules here. > > > > As for documenting the cqe->status expression itself, I didn't think of > > explaining it since it's part of the theory of operation of this device. > > Any polling driver will do this, there's nothing QEMU-specific or > > unusual going on here. > > > > Would you like me to expand the comment explaining that NVMe polling > > consists of checking the phase bit of the latest cqe to check for > > readiness? > > > > Or maybe I misunderstood? :) > > I was thinking of something like "Do an early check for > completions. We don't need q->lock here because > nvme_process_completion() only runs (...)" Sure, will fix. Stefan
diff --git a/block/nvme.c b/block/nvme.c index eb2f54dd9d..7eb4512666 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s) for (i = 0; i < s->nr_queues; i++) { NVMeQueuePair *q = s->queues[i]; + const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES; + NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset]; + + /* + * q->lock isn't needed for checking completion because + * nvme_process_completion() only runs in the event loop thread and + * cannot race with itself. + */ + if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) { + continue; + } + qemu_mutex_lock(&q->lock); while (nvme_process_completion(s, q)) { /* Keep polling */
A lot of CPU time is spent simply locking/unlocking q->lock during polling. Check for completion outside the lock to make q->lock disappear from the profile. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- block/nvme.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)