Message ID | 20190417195355.16123-2-mlevitsk@redhat.com |
---|---|
State | New |
Series | Few fixes for userspace NVME driver |
On 4/17/19 3:53 PM, Maxim Levitsky wrote:
> Phase bits are only set by the hardware to indicate new completions
> and not by the device driver.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/nvme.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index 0684bbd077..2d208000df 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>          qemu_mutex_lock(&q->lock);
>          c->cid = cpu_to_le16(0);
>          q->inflight--;
> -        /* Flip Phase Tag bit. */
> -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
>          progress = true;
>      }
>      if (progress) {
>

Since you've not got much traction on this and you've pinged a v2, can
you point me to a spec or a reproducer that illustrates the problem?

(Or wait for more NVME knowledgeable people to give you a review...!)
On Mon, 2019-06-03 at 18:25 -0400, John Snow wrote:
>
> On 4/17/19 3:53 PM, Maxim Levitsky wrote:
> > Phase bits are only set by the hardware to indicate new completions
> > and not by the device driver.
> >
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/nvme.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 0684bbd077..2d208000df 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
> >          qemu_mutex_lock(&q->lock);
> >          c->cid = cpu_to_le16(0);
> >          q->inflight--;
> > -        /* Flip Phase Tag bit. */
> > -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
> >          progress = true;
> >      }
> >      if (progress) {
> >
>
> Since you've not got much traction on this and you've pinged a v2, can
> you point me to a spec or a reproducer that illustrates the problem?
>
> (Or wait for more NVME knowledgeable people to give you a review...!)

"A Completion Queue entry is posted to the Completion Queue when the
controller write of that Completion Queue entry to the next free
Completion Queue slot inverts the Phase Tag (P) bit from its previous
value in memory. The controller may generate an interrupt to the host to
indicate that one or more Completion Queue entries have been posted."

Best regards,
Maxim Levitsky
On 6/5/19 3:47 AM, Maxim Levitsky wrote:
> On Mon, 2019-06-03 at 18:25 -0400, John Snow wrote:
>>
>> On 4/17/19 3:53 PM, Maxim Levitsky wrote:
>>> Phase bits are only set by the hardware to indicate new completions
>>> and not by the device driver.
>>>
>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>  block/nvme.c | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/block/nvme.c b/block/nvme.c
>>> index 0684bbd077..2d208000df 100644
>>> --- a/block/nvme.c
>>> +++ b/block/nvme.c
>>> @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>>>          qemu_mutex_lock(&q->lock);
>>>          c->cid = cpu_to_le16(0);
>>>          q->inflight--;
>>> -        /* Flip Phase Tag bit. */
>>> -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
>>>          progress = true;
>>>      }
>>>      if (progress) {
>>>
>>
>> Since you've not got much traction on this and you've pinged a v2, can
>> you point me to a spec or a reproducer that illustrates the problem?
>>
>> (Or wait for more NVME knowledgeable people to give you a review...!)
>
> "A Completion Queue entry is posted to the Completion Queue when the controller write of that Completion
> Queue entry to the next free Completion Queue slot inverts the Phase Tag (P) bit from its previous value
> in memory. The controller may generate an interrupt to the host to indicate that one or more Completion
> Queue entries have been posted."
>

In the future, please reference the sections in your commit messages
when relevant:

NVM Express 1.3, Section 4.1 "Submission Queue & Completion Queue
Definition"

I also found 4.6 "Completion Queue Entry" to be informative; especially
Figure 28 which defines the phase bit.

So: This looks right; does this fix a bug that can be observed? Do we
have any regression tests for block/NVMe?

--js

>
>
> Best regards,
> Maxim Levitsky
>
On 06/06/19 23:23, John Snow wrote:
> So: This looks right; does this fix a bug that can be observed? Do we
> have any regression tests for block/NVMe?

I don't think it fixes a bug; by the time the CQ entry is picked up by
QEMU, the device is not supposed to touch it anymore.

However, the idea behind the phase bits is that you can decide whether
the device has placed a completion in the queue. When we get here, we have

    (le16_to_cpu(c->status) & 0x1) == !q->cq_phase

On the next pass through the ring buffer q->cq_phase will be flipped,
and thus when we see this element we'll get

    (le16_to_cpu(c->status) & 0x1) == q->cq_phase

and not process it. Since block/nvme.c flips the bit, this mechanism
does not work and the loop termination relies on the other part of the
condition, "if (!c->cid) break;".

So the patch is correct, but it would also be nice to either remove
phase handling altogether, or check that the phase handling works
properly and drop the !c->cid test.

Paolo
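The effect described in this reply can be illustrated with a minimal sketch. It is not code from block/nvme.c: the helper names `phase_says_new` and `stale_entry_looks_new` are hypothetical, the status word is kept in host byte order for simplicity, and a single completion slot is followed across two passes of the ring, assuming the device does not write the slot again in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Phase-only test, following the condition quoted in the thread: an entry
 * counts as new when its Phase Tag (bit 0 of status) differs from cq_phase. */
static bool phase_says_new(uint16_t status, bool cq_phase)
{
    return (status & 0x1) != cq_phase;
}

/*
 * Hypothetical helper: follows one completion slot across two passes of the
 * ring and reports whether the already consumed entry would look new again
 * on the second pass if only the phase test were used.
 */
static bool stale_entry_looks_new(bool driver_flips_bit)
{
    bool cq_phase = false;   /* queue memory starts zeroed */
    uint16_t status = 0x1;   /* device posts the entry with P inverted to 1 */

    /* First pass: the entry is new, so the driver consumes it. */
    assert(phase_says_new(status, cq_phase));
    if (driver_flips_bit) {
        status ^= 0x1;       /* models the two lines removed by the patch */
    }

    /* The head wraps around the ring, so the driver toggles cq_phase. */
    cq_phase = !cq_phase;

    /* Second pass: the device has not written this slot again. */
    return phase_says_new(status, cq_phase);
}
```

Under these assumptions, `stale_entry_looks_new(true)` reports that the consumed entry passes the phase test a second time, so something else (here, the `!c->cid` check) has to stop the loop, while `stale_entry_looks_new(false)` reports that the phase test alone rejects it.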
On 6/7/19 7:08 AM, Paolo Bonzini wrote:
> On 06/06/19 23:23, John Snow wrote:
>> So: This looks right; does this fix a bug that can be observed? Do we
>> have any regression tests for block/NVMe?
>
> I don't think it fixes a bug; by the time the CQ entry is picked up by
> QEMU, the device is not supposed to touch it anymore.
>
> However, the idea behind the phase bits is that you can decide whether
> the device has placed a completion in the queue. When we get here, we have
>
>     (le16_to_cpu(c->status) & 0x1) == !q->cq_phase
>
> On the next pass through the ring buffer q->cq_phase will be flipped,
> and thus when we see this element we'll get
>
>     (le16_to_cpu(c->status) & 0x1) == q->cq_phase
>
> and not process it. Since block/nvme.c flips the bit, this mechanism
> does not work and the loop termination relies on the other part of the
> condition, "if (!c->cid) break;".
>
> So the patch is correct, but it would also be nice to either remove
> phase handling altogether, or check that the phase handling works
> properly and drop the !c->cid test.
>
> Paolo
>

Gotcha, I see, that's why it doesn't cause problems. Thanks :)

--js
On Fri, 2019-06-07 at 15:28 -0400, John Snow wrote:
>
> On 6/7/19 7:08 AM, Paolo Bonzini wrote:
> > On 06/06/19 23:23, John Snow wrote:
> > > So: This looks right; does this fix a bug that can be observed? Do we
> > > have any regression tests for block/NVMe?
> >
> > I don't think it fixes a bug; by the time the CQ entry is picked up by
> > QEMU, the device is not supposed to touch it anymore.
> >
> > However, the idea behind the phase bits is that you can decide whether
> > the device has placed a completion in the queue. When we get here, we have
> >
> >     (le16_to_cpu(c->status) & 0x1) == !q->cq_phase
> >
> > On the next pass through the ring buffer q->cq_phase will be flipped,
> > and thus when we see this element we'll get
> >
> >     (le16_to_cpu(c->status) & 0x1) == q->cq_phase
> >
> > and not process it. Since block/nvme.c flips the bit, this mechanism
> > does not work and the loop termination relies on the other part of the
> > condition, "if (!c->cid) break;".
> >
> > So the patch is correct, but it would also be nice to either remove
> > phase handling altogether, or check that the phase handling works
> > properly and drop the !c->cid test.
> >
> > Paolo

I agree with that and I'll send an updated patch soon.

The driver should not touch the completion entries at all, but rather
just scan for the entries whose phase bit was flipped by the hardware.

In fact, I don't even think that 'c->cid' became the exit condition;
rather, since the device is not allowed to completely fill the completion
queue (it must always keep at least one free entry there), the end
condition would still be the check on the flipped phase bit.

I'll fix that to conform to the spec.

Best regards,
Maxim Levitsky
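The approach outlined above, where the driver only reads completion entries and relies on the phase bit alone, could look roughly like the following sketch. This is an illustration under stated assumptions, not the updated patch: `Cqe`, `Cq`, `cq_init`, `device_post`, `driver_reap`, and `QUEUE_SIZE` are all hypothetical names, the device side is simulated in the same program, and the status word is kept in host byte order. The `cq_phase` field follows the convention used in the thread, where an entry is stale when its phase bit equals `cq_phase`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 4            /* hypothetical, tiny ring for illustration */

typedef struct {
    uint16_t cid;
    uint16_t status;            /* bit 0 is the Phase Tag (host byte order here) */
} Cqe;

typedef struct {
    Cqe ring[QUEUE_SIZE];       /* zero-initialized, so every P bit starts at 0 */
    unsigned tail;              /* device side: next slot to write */
    bool dev_phase;             /* device side: P value for the current pass */
    unsigned head;              /* driver side: next slot to read */
    bool cq_phase;              /* driver side: P value that marks a stale entry */
} Cq;

static void cq_init(Cq *q)
{
    *q = (Cq){ .dev_phase = true };
}

/* Simulated device: post one completion with P inverted relative to the last pass. */
static void device_post(Cq *q, uint16_t cid)
{
    assert(q->tail < QUEUE_SIZE);
    q->ring[q->tail].cid = cid;
    q->ring[q->tail].status = q->dev_phase ? 0x1 : 0x0;
    q->tail = (q->tail + 1) % QUEUE_SIZE;
    if (q->tail == 0) {
        q->dev_phase = !q->dev_phase;
    }
}

/* Driver: consume new entries without writing to them; returns how many. */
static unsigned driver_reap(Cq *q)
{
    unsigned reaped = 0;

    for (unsigned i = 0; i < QUEUE_SIZE; i++) {
        const Cqe *c = &q->ring[q->head];

        if ((c->status & 0x1) == q->cq_phase) {
            break;              /* P unchanged: the device has not written this slot */
        }
        q->head = (q->head + 1) % QUEUE_SIZE;
        if (q->head == 0) {
            q->cq_phase = !q->cq_phase;
        }
        reaped++;
    }
    return reaped;
}
```

In this model, posting three completions and calling `driver_reap` twice consumes three entries and then none, and the same holds after the ring wraps, without the driver ever clearing `cid` or touching `status`. The simulated device never posts more than `QUEUE_SIZE - 1` unconsumed entries, matching the remark that one slot always stays free.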
diff --git a/block/nvme.c b/block/nvme.c
index 0684bbd077..2d208000df 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
         qemu_mutex_lock(&q->lock);
         c->cid = cpu_to_le16(0);
         q->inflight--;
-        /* Flip Phase Tag bit. */
-        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
         progress = true;
     }
     if (progress) {
Phase bits are only set by the hardware to indicate new completions
and not by the device driver.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/nvme.c | 2 --
 1 file changed, 2 deletions(-)