
[v2,1/5] block/nvme: don't flip CQ phase bits

Message ID 20190417195355.16123-2-mlevitsk@redhat.com
State New
Series Few fixes for userspace NVME driver

Commit Message

Maxim Levitsky April 17, 2019, 7:53 p.m. UTC
Phase bits are only set by the hardware to indicate new completions
and not by the device driver.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/nvme.c | 2 --
 1 file changed, 2 deletions(-)

Comments

John Snow June 3, 2019, 10:25 p.m. UTC | #1
On 4/17/19 3:53 PM, Maxim Levitsky wrote:
> Phase bits are only set by the hardware to indicate new completions
> and not by the device driver.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/nvme.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 0684bbd077..2d208000df 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>          qemu_mutex_lock(&q->lock);
>          c->cid = cpu_to_le16(0);
>          q->inflight--;
> -        /* Flip Phase Tag bit. */
> -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
>          progress = true;
>      }
>      if (progress) {
> 

Since you've not got much traction on this and you've pinged a v2, can
you point me to a spec or a reproducer that illustrates the problem?

(Or wait for more NVME knowledgeable people to give you a review...!)
Maxim Levitsky June 5, 2019, 7:47 a.m. UTC | #2
On Mon, 2019-06-03 at 18:25 -0400, John Snow wrote:
> 
> On 4/17/19 3:53 PM, Maxim Levitsky wrote:
> > Phase bits are only set by the hardware to indicate new completions
> > and not by the device driver.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/nvme.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 0684bbd077..2d208000df 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
> >          qemu_mutex_lock(&q->lock);
> >          c->cid = cpu_to_le16(0);
> >          q->inflight--;
> > -        /* Flip Phase Tag bit. */
> > -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
> >          progress = true;
> >      }
> >      if (progress) {
> > 
> 
> Since you've not got much traction on this and you've pinged a v2, can
> you point me to a spec or a reproducer that illustrates the problem?
> 
> (Or wait for more NVME knowledgeable people to give you a review...!)

"A Completion Queue entry is posted to the Completion Queue when the controller write of that Completion
Queue entry to the next free Completion Queue slot inverts the Phase Tag (P) bit from its previous value
in memory. The controller may generate an interrupt to the host to indicate that one or more Completion
Queue entries have been posted."
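
In other words, the host only compares the Phase Tag of the slot it is
looking at with the phase value it currently expects; a minimal sketch of
that check (hypothetical names, little-endian host assumed, not the actual
block/nvme.c code):

#include <stdbool.h>
#include <stdint.h>

/* Completion Queue entry, trimmed to the fields that matter here
 * (the real CQE is 16 bytes; bit 0 of 'status' is the Phase Tag). */
struct cqe_sketch {
    uint16_t cid;
    uint16_t status;
};

/* A slot holds a newly posted completion when its Phase Tag matches the
 * phase the host currently expects; the controller inverts the bit when
 * it writes the entry, and the host never writes it back. */
static bool cqe_is_new(const struct cqe_sketch *c, int expected_phase)
{
    return (c->status & 0x1) == expected_phase;   /* assumes LE host */
}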



Best regards,
	Maxim Levitsky
John Snow June 6, 2019, 9:23 p.m. UTC | #3
On 6/5/19 3:47 AM, Maxim Levitsky wrote:
> On Mon, 2019-06-03 at 18:25 -0400, John Snow wrote:
>>
>> On 4/17/19 3:53 PM, Maxim Levitsky wrote:
>>> Phase bits are only set by the hardware to indicate new completions
>>> and not by the device driver.
>>>
>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>  block/nvme.c | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/block/nvme.c b/block/nvme.c
>>> index 0684bbd077..2d208000df 100644
>>> --- a/block/nvme.c
>>> +++ b/block/nvme.c
>>> @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>>>          qemu_mutex_lock(&q->lock);
>>>          c->cid = cpu_to_le16(0);
>>>          q->inflight--;
>>> -        /* Flip Phase Tag bit. */
>>> -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
>>>          progress = true;
>>>      }
>>>      if (progress) {
>>>
>>
>> Since you've not got much traction on this and you've pinged a v2, can
>> you point me to a spec or a reproducer that illustrates the problem?
>>
>> (Or wait for more NVME knowledgeable people to give you a review...!)
> 
> "A Completion Queue entry is posted to the Completion Queue when the controller write of that Completion
> Queue entry to the next free Completion Queue slot inverts the Phase Tag (P) bit from its previous value
> in memory. The controller may generate an interrupt to the host to indicate that one or more Completion
> Queue entries have been posted."
> 

In the future, please reference the sections in your commit messages
when relevant:

NVM Express 1.3, Section 4.1 "Submission Queue & Completion Queue
Definition"

I also found 4.6 "Completion Queue Entry" to be informative, especially
Figure 28, which defines the phase bit.

So: This looks right; does this fix a bug that can be observed? Do we
have any regression tests for block/NVMe?

--js

> 
> 
> Best regards,
> 	Maxim Levitsky
>
Paolo Bonzini June 7, 2019, 11:08 a.m. UTC | #4
On 06/06/19 23:23, John Snow wrote:
> So: This looks right; does this fix a bug that can be observed? Do we
> have any regression tests for block/NVMe?

I don't think it fixes a bug; by the time the CQ entry is picked up by
QEMU, the device is not supposed to touch it anymore.

However, the idea behind the phase bits is that you can decide whether
the controller has placed a completion in the queue.  When we get here, we have

	(le16_to_cpu(c->status) & 0x1) == !q->cq_phase

On the next pass through the ring buffer q->cq_phase will be flipped,
and thus when we see this element we'll get

	(le16_to_cpu(c->status) & 0x1) == q->cq_phase

and not process it.  Since block/nvme.c flips the bit, this mechanism
does not work and the loop termination relies on the other part of the
condition, "if (!c->cid) break;".

So the patch is correct, but it would also be nice to either remove
phase handling altogether, or check that the phase handling works
properly and drop the !c->cid test.
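
For illustration, a phase-only loop would look roughly like this (a
hypothetical sketch, not the actual nvme_process_completion(); here
expected_phase is the value the controller writes into newly posted
entries, i.e. the inverse of how q->cq_phase reads above):

#include <stdint.h>

struct cqe_sketch {                  /* trimmed CQE: bit 0 of status = Phase Tag */
    uint16_t cid;
    uint16_t status;
};

extern void handle_completion(struct cqe_sketch *c);   /* hypothetical helper */

/* The CQ memory starts zeroed, so *expected_phase starts at 1 and is
 * inverted by the host each time the head index wraps around. */
static void poll_cq_sketch(struct cqe_sketch *cq, unsigned cq_size,
                           unsigned *head, int *expected_phase)
{
    while ((cq[*head].status & 0x1) == *expected_phase) {   /* LE host assumed */
        handle_completion(&cq[*head]);
        /* The host does not write the entry's status back; only the
         * controller flips the Phase Tag when it reuses this slot. */
        *head = (*head + 1) % cq_size;
        if (*head == 0) {
            *expected_phase = !*expected_phase;   /* wrapped around */
        }
    }
}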

Paolo
John Snow June 7, 2019, 7:28 p.m. UTC | #5
On 6/7/19 7:08 AM, Paolo Bonzini wrote:
> On 06/06/19 23:23, John Snow wrote:
>> So: This looks right; does this fix a bug that can be observed? Do we
>> have any regression tests for block/NVMe?
> 
> I don't think it fixes a bug; by the time the CQ entry is picked up by
> QEMU, the device is not supposed to touch it anymore.
> 
> However, the idea behind the phase bits is that you can decide whether
> the controller has placed a completion in the queue.  When we get here, we have
> 
> 	(le16_to_cpu(c->status) & 0x1) == !q->cq_phase
> 
> On the next pass through the ring buffer q->cq_phase will be flipped,
> and thus when we see this element we'll get
> 
> 	(le16_to_cpu(c->status) & 0x1) == q->cq_phase
> 
> and not process it.  Since block/nvme.c flips the bit, this mechanism
> does not work and the loop termination relies on the other part of the
> condition, "if (!c->cid) break;".
> 
> So the patch is correct, but it would also be nice to either remove
> phase handling altogether, or check that the phase handling works
> properly and drop the !c->cid test.
> 
> Paolo
> 

Gotcha, I see, that's why it doesn't cause problems. Thanks :)

--js
Maxim Levitsky June 11, 2019, 8:50 a.m. UTC | #6
On Fri, 2019-06-07 at 15:28 -0400, John Snow wrote:
> 
> On 6/7/19 7:08 AM, Paolo Bonzini wrote:
> > On 06/06/19 23:23, John Snow wrote:
> > > So: This looks right; does this fix a bug that can be observed? Do we
> > > have any regression tests for block/NVMe?
> > 
> > I don't think it fixes a bug; by the time the CQ entry is picked up by
> > QEMU, the device is not supposed to touch it anymore.
> > 
> > However, the idea behind the phase bits is that you can decide whether
> > the controller has placed a completion in the queue.  When we get here, we have
> > 
> > 	(le16_to_cpu(c->status) & 0x1) == !q->cq_phase
> > 
> > On the next pass through the ring buffer q->cq_phase will be flipped,
> > and thus when we see this element we'll get
> > 
> > 	(le16_to_cpu(c->status) & 0x1) == q->cq_phase
> > 
> > and not process it.  Since block/nvme.c flips the bit, this mechanism
> > does not work and the loop termination relies on the other part of the
> > condition, "if (!c->cid) break;".
> > 
> > So the patch is correct, but it would also be nice to either remove
> > phase handling altogether, or check that the phase handling works
> > properly and drop the !c->cid test.
> > 
> > Paolo


I agree with that and I'll send an updated patch soon.

The driver should not touch the completion entries at all, but rather just scan for the entries whose
phase bit was flipped by the hardware.

In fact I don't even think the 'c->cid' check needs to be the exit condition: since the device is not allowed
to completely fill the completion queue (it must always keep at least one free entry there), the end condition can still
be the check on the flipped phase bit.
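
As a sketch of why that is enough (hypothetical names, not the actual
block/nvme.c code): as long as the host keeps at most size - 1 commands in
flight per queue pair, at least one CQ slot always still carries the stale
Phase Tag, so a phase-only scan is guaranteed to stop:

#include <stdbool.h>

/* Hypothetical queue pair, trimmed to the two fields that matter here. */
typedef struct {
    unsigned inflight;   /* commands submitted but not yet completed */
    unsigned size;       /* number of entries in the SQ/CQ */
} QueuePairSketch;

static bool sketch_can_submit(const QueuePairSketch *q)
{
    /* Keeping one slot free means the completion queue can never be
     * completely filled with new-phase entries. */
    return q->inflight < q->size - 1;
}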


I'll fix that to match the spec.

Best regards,
	Maxim Levitsky

Patch

diff --git a/block/nvme.c b/block/nvme.c
index 0684bbd077..2d208000df 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -340,8 +340,6 @@  static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
         qemu_mutex_lock(&q->lock);
         c->cid = cpu_to_le16(0);
         q->inflight--;
-        /* Flip Phase Tag bit. */
-        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
         progress = true;
     }
     if (progress) {