[12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

Message ID 20211007162406.1920374-13-lukasz.maniak@linux.intel.com
State New
Series hw/nvme: SR-IOV with Virtualization Enhancements

Commit Message

Lukasz Maniak Oct. 7, 2021, 4:24 p.m. UTC
From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>

With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
can configure the maximum number of virtual queues and interrupts
assignable to a single virtual device. The primary and secondary
controller capability structures are initialized accordingly.

Since the number of available queues (interrupts) now varies between
VF/PF, BAR size calculation is also adjusted.
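
For example (a hypothetical invocation; namespace and backing-drive
wiring omitted for brevity), a PF with two VFs, each limited to a
single I/O queue pair and a single interrupt vector, could be
configured roughly as:

    -device nvme-subsys,id=subsys0
    -device nvme,serial=deadbeef,subsys=subsys0,max_ioqpairs=4, \
            sriov_max_vfs=2,sriov_max_vq_per_vf=2,sriov_max_vi_per_vf=1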

Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
---
 hw/nvme/ctrl.c       | 110 +++++++++++++++++++++++++++++++++++++++----
 hw/nvme/nvme.h       |   2 +
 include/block/nvme.h |   5 ++
 3 files changed, 108 insertions(+), 9 deletions(-)

Comments

Klaus Jensen Nov. 3, 2021, 12:07 p.m. UTC | #1
On Oct  7 18:24, Lukasz Maniak wrote:
> From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> 
> With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> can configure the maximum number of virtual queues and interrupts
> assignable to a single virtual device. The primary and secondary
> controller capability structures are initialized accordingly.
> 
> Since the number of available queues (interrupts) now varies between
> VF/PF, BAR size calculation is also adjusted.
> 

While this patch allows configuring the VQFRSM and VIFRSM fields, it
implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
sriov_max_vi_per_vf and max_vfs). That just sets them to an upper
bound, and it removes a testable case for host software (e.g.
requesting more flexible resources than are currently available).

This patch also requires that these parameters are set if sriov_max_vfs
is. I think we can provide better defaults.

How about,

1. if only sriov_max_vfs is set, then all VFs get private resources
   equal to max_ioqpairs. Like before this patch. This limits the number
   of parameters required to get a basic setup going.

2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
   10), the difference between that and max_ioqpairs becomes flexible
   resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
   instead and just make the difference become private resources.
   Potato/potato.

   a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
      of calculated flexible resources.
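
A rough sketch of those defaults (VQ side only, VI analogous;
sriov_vq_private and sriov_max_vq_per_vf are the hypothetical
parameter names from above):

    /* Sketch only (hypothetical names): deriving the proposed VQ
     * defaults; the VI side would be analogous. */
    typedef struct {
        unsigned max_ioqpairs;
        unsigned sriov_vq_private;    /* 0 = not set */
        unsigned sriov_max_vq_per_vf; /* 0 = not set */
    } Params;

    static void derive_vq_defaults(Params *p, unsigned *vq_private_per_vf,
                                   unsigned *vq_flexible_total)
    {
        if (!p->sriov_vq_private) {
            /* 1. only sriov_max_vfs set: each VF gets max_ioqpairs
             *    private resources, like before this patch. */
            *vq_private_per_vf = p->max_ioqpairs;
            *vq_flexible_total = 0;
        } else {
            /* 2. the difference becomes the flexible resource pool. */
            *vq_private_per_vf = 0;
            *vq_flexible_total = p->max_ioqpairs - p->sriov_vq_private;

            /* 2a. default the per-VF maximum to the whole pool. */
            if (!p->sriov_max_vq_per_vf) {
                p->sriov_max_vq_per_vf = *vq_flexible_total;
            }
        }
    }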

This probably smells a bit like bikeshedding, but I think this gives
more flexibility and better defaults, which helps with verifying host
software.

If we can't agree on this now, I suggest we could go ahead and merge the
base functionality (i.e. private resources only) and ruminate some more
about these parameters.
Łukasz Gieryk Nov. 4, 2021, 3:48 p.m. UTC | #2
On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> On Oct  7 18:24, Lukasz Maniak wrote:
> > From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> > 
> > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> > can configure the maximum number of virtual queues and interrupts
> > assignable to a single virtual device. The primary and secondary
> > controller capability structures are initialized accordingly.
> > 
> > Since the number of available queues (interrupts) now varies between
> > VF/PF, BAR size calculation is also adjusted.
> > 
> 
> While this patch allows configuring the VQFRSM and VIFRSM fields, it
> implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
> sriov_max_vi_per_vf and max_vfs). Which is just setting it to an upper
> bound and this removes a testable case for host software (e.g.
> requesting more flexible resources than what is currently available).
> 
> This patch also requires that these parameters are set if sriov_max_vfs
> is. I think we can provide better defaults.
> 

Originally I considered more params, but ended up coding the simplest,
user-friendly solution, because I did not like the mess with so many
parameters, and the flexibility wasn't needed for my use cases. But I do
agree: others may need the flexibility. The case (FRT < max_vfs * FRSM)
valid and resembles an actual device.

> How about,
> 
> 1. if only sriov_max_vfs is set, then all VFs get private resources
>    equal to max_ioqpairs. Like before this patch. This limits the number
>    of parameters required to get a basic setup going.
> 
> 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
>    10), the difference between that and max_ioqpairs become flexible
>    resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
>    instead and just make the difference become private resources.
>    Potato/potato.
> 
>    a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
>       of calculated flexible resources.
> 
> This probably smells a bit like bikeshedding, but I think this gives
> more flexibility and better defaults, which helps with verifying host
> software.
> 
> If we can't agree on this now, I suggest we could go ahead and merge the
> base functionality (i.e. private resources only) and ruminate some more
> about these parameters.

The problem is that the spec allows VFs to support either only private,
or only flexible resources.

At this point I have to admit that, since my use cases for
QEMU/NVMe/SR-IOV require flexible resources, I haven’t paid much
attention to the case of VFs having private resources. So this SR-IOV
implementation doesn’t even support such a case (max_vX_per_vf != 0).

Let me summarize the possible config space, and how the current
parameters (could) map to these (interrupt-related ones omitted):

Flexible resources not supported (not implemented):
 - Private resources for PF     = max_ioqpairs
 - Private resources per VF     = ?
 - (error if flexible resources are configured)

With flexible resources:
 - VQPRT, private resources for PF      = max_ioqpairs
 - VQFRT, total flexible resources      = max_vq_per_vf * num_vfs
 - VQFRSM, maximum assignable per VF    = max_vq_per_vf
 - VQGRAN, granularity                  = #define constant
 - (error if private resources per VF are configured)
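
For illustration (hypothetical numbers): with max_ioqpairs=8,
max_vq_per_vf=4 and num_vfs=2, this map yields VQPRT=8, VQFRT=4*2=8
and VQFRSM=4.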

Since I don’t want to misunderstand your suggestion: could you provide a
similar map with your parameters, formulas, and explain how to determine
if flexible resources are active? I want to be sure we are on the
same page.
Łukasz Gieryk Nov. 5, 2021, 8:46 a.m. UTC | #3
On Thu, Nov 04, 2021 at 04:48:43PM +0100, Łukasz Gieryk wrote:
> On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> > On Oct  7 18:24, Lukasz Maniak wrote:
> > > From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> > > 
> > > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> > > can configure the maximum number of virtual queues and interrupts
> > > assignable to a single virtual device. The primary and secondary
> > > controller capability structures are initialized accordingly.
> > > 
> > > Since the number of available queues (interrupts) now varies between
> > > VF/PF, BAR size calculation is also adjusted.
> > > 
> > 
> > While this patch allows configuring the VQFRSM and VIFRSM fields, it
> > implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
> > sriov_max_vi_per_vf and max_vfs). Which is just setting it to an upper
> > bound and this removes a testable case for host software (e.g.
> > requesting more flexible resources than what is currently available).
> > 
> > This patch also requires that these parameters are set if sriov_max_vfs
> > is. I think we can provide better defaults.
> > 
> 
> Originally I considered more params, but ended up coding the simplest,
> user-friendly solution, because I did not like the mess with so many
> parameters, and the flexibility wasn't needed for my use cases. But I do
> agree: others may need the flexibility. Case (FRT < max_vfs * FRSM) is
> valid and resembles an actual device.
> 
> > How about,
> > 
> > 1. if only sriov_max_vfs is set, then all VFs get private resources
> >    equal to max_ioqpairs. Like before this patch. This limits the number
> >    of parameters required to get a basic setup going.
> > 
> > 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
> >    10), the difference between that and max_ioqpairs become flexible
> >    resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
> >    instead and just make the difference become private resources.
> >    Potato/potato.
> > 
> >    a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
> >       of calculated flexible resources.
> > 
> > This probably smells a bit like bikeshedding, but I think this gives
> > more flexibility and better defaults, which helps with verifying host
> > software.
> > 
> > If we can't agree on this now, I suggest we could go ahead and merge the
> > base functionality (i.e. private resources only) and ruminate some more
> > about these parameters.
> 
> The problem is that the spec allows VFs to support either only private,
> or only flexible resources.
> 
> At this point I have to admit, that since my use cases for
> QEMU/Nvme/SRIOV require flexible resources, I haven’t paid much
> attention to the case with VFs having private resources. So this SR/IOV
> implementation doesn’t even support such case (max_vX_per_vf != 0).
> 
> Let me summarize the possible config space, and how the current
> parameters (could) map to these (interrupt-related ones omitted):
> 
> Flexible resources not supported (not implemented):
>  - Private resources for PF     = max_ioqpairs
>  - Private resources per VF     = ?
>  - (error if flexible resources are configured)
> 
> With flexible resources:
>  - VQPRT, private resources for PF      = max_ioqpairs
>  - VQFRT, total flexible resources      = max_vq_per_vf * num_vfs
>  - VQFRSM, maximum assignable per VF    = max_vq_per_vf
>  - VQGRAN, granularity                  = #define constant
>  - (error if private resources per VF are configured)
> 
> Since I don’t want to misunderstand your suggestion: could you provide a
> similar map with your parameters, formulas, and explain how to determine
> if flexible resources are active? I want to be sure we are on the
> same page.
> 

I’ve just re-read through my email and decided that some bits need
clarification.

This implementation supports the “Flexible”-resources-only flavor of
SR-IOV, while the “Private” flavor could also be supported. Some effort
is required to support both, and I cannot afford that (at least I
cannot commit to it today, and neither can the other Lukasz).

While I’m ready to rework the Flexible config and prepare it to be
extended later to handle the Private variant, the 2nd version of these
patches will still support the Flexible flavor only.

I will include an appropriate TODO/open item in the next cover letter.
Łukasz Gieryk Nov. 5, 2021, 2:04 p.m. UTC | #4
On Fri, Nov 05, 2021 at 09:46:28AM +0100, Łukasz Gieryk wrote:
> On Thu, Nov 04, 2021 at 04:48:43PM +0100, Łukasz Gieryk wrote:
> > On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> > > On Oct  7 18:24, Lukasz Maniak wrote:
> > > > From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> > > > 
> > > > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> > > > can configure the maximum number of virtual queues and interrupts
> > > > assignable to a single virtual device. The primary and secondary
> > > > controller capability structures are initialized accordingly.
> > > > 
> > > > Since the number of available queues (interrupts) now varies between
> > > > VF/PF, BAR size calculation is also adjusted.
> > > > 
> > > 
> > > While this patch allows configuring the VQFRSM and VIFRSM fields, it
> > > implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
> > > sriov_max_vi_per_vf and max_vfs). Which is just setting it to an upper
> > > bound and this removes a testable case for host software (e.g.
> > > requesting more flexible resources than what is currently available).
> > > 
> > > This patch also requires that these parameters are set if sriov_max_vfs
> > > is. I think we can provide better defaults.
> > > 
> > 
> > Originally I considered more params, but ended up coding the simplest,
> > user-friendly solution, because I did not like the mess with so many
> > parameters, and the flexibility wasn't needed for my use cases. But I do
> > agree: others may need the flexibility. Case (FRT < max_vfs * FRSM) is
> > valid and resembles an actual device.
> > 
> > > How about,
> > > 
> > > 1. if only sriov_max_vfs is set, then all VFs get private resources
> > >    equal to max_ioqpairs. Like before this patch. This limits the number
> > >    of parameters required to get a basic setup going.
> > > 
> > > 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
> > >    10), the difference between that and max_ioqpairs become flexible
> > >    resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
> > >    instead and just make the difference become private resources.
> > >    Potato/potato.
> > > 
> > >    a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
> > >       of calculated flexible resources.
> > > 
> > > This probably smells a bit like bikeshedding, but I think this gives
> > > more flexibility and better defaults, which helps with verifying host
> > > software.
> > > 
> > > If we can't agree on this now, I suggest we could go ahead and merge the
> > > base functionality (i.e. private resources only) and ruminate some more
> > > about these parameters.
> > 
> > The problem is that the spec allows VFs to support either only private,
> > or only flexible resources.
> > 
> > At this point I have to admit, that since my use cases for
> > QEMU/Nvme/SRIOV require flexible resources, I haven’t paid much
> > attention to the case with VFs having private resources. So this SR/IOV
> > implementation doesn’t even support such case (max_vX_per_vf != 0).
> > 
> > Let me summarize the possible config space, and how the current
> > parameters (could) map to these (interrupt-related ones omitted):
> > 
> > Flexible resources not supported (not implemented):
> >  - Private resources for PF     = max_ioqpairs
> >  - Private resources per VF     = ?
> >  - (error if flexible resources are configured)
> > 
> > With flexible resources:
> >  - VQPRT, private resources for PF      = max_ioqpairs
> >  - VQFRT, total flexible resources      = max_vq_per_vf * num_vfs
> >  - VQFRSM, maximum assignable per VF    = max_vq_per_vf
> >  - VQGRAN, granularity                  = #define constant
> >  - (error if private resources per VF are configured)
> > 
> > Since I don’t want to misunderstand your suggestion: could you provide a
> > similar map with your parameters, formulas, and explain how to determine
> > if flexible resources are active? I want to be sure we are on the
> > same page.
> > 
> 
> I’ve just re-read through my email and decided that some bits need
> clarification.
> 
> This implementation supports the “Flexible”-resources-only flavor of
> SR/IOV, while the “Private” also could be supported. Some effort is
> required to support both, and I cannot afford that (at least I cannot
> commit today, neither the other Lukasz).
> 
> While I’m ready to rework the Flexible config and prepare it to be
> extended later to handle the Private variant, the 2nd version of these
> patches will still support the Flexible flavor only.
> 
> I will include appropriate TODO/open in the next cover letter.
> 

The summary of my thoughts, so far:
- I'm going to introduce sriov_v{q,i}_flexible and better defaults,
  according to your suggestion (as far as I understand your intentions,
  please correct me if I've missed something).
- The Private SR/IOV flavor, if it's ever implemented, could introduce
  sriov_vq_private_per_vf.
- The updated formulas are listed below.

Flexible resources not supported (not implemented):
 - Private resources for PF     = max_ioqpairs
 - Private resources per VF     = sriov_vq_private_per_vf
 - (error if sriov_vq_flexible is set)

With flexible resources:
 - VQPRT, private resources for PF      = max_ioqpairs - sriov_vq_flexible
 - VQFRT, total flexible resources      = sriov_vq_flexible (if set, or)
                                          VQPRT * num_vfs
 - VQFRSM, maximum assignable per VF    = sriov_max_vq_per_vf (if set, or)
                                          VQPRT
 - VQGRAN, granularity                  = #define constant
 - (error if sriov_vq_private_per_vf is set)
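
In code, that derivation could look roughly like this (a sketch using
the hypothetical names above; VQ side only, endianness conversion and
validation omitted):

    /* Sketch only: derive the VQ capability fields from the params. */
    vqprt  = params->max_ioqpairs - params->sriov_vq_flexible;
    vqfrt  = params->sriov_vq_flexible ? params->sriov_vq_flexible
                                       : vqprt * num_vfs;
    vqfrsm = params->sriov_max_vq_per_vf ? params->sriov_max_vq_per_vf
                                         : vqprt;
    vqgran = NVME_VF_RES_GRANULARITY;  /* the #define'd constant */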

Is this version acceptable?
Klaus Jensen Nov. 8, 2021, 8:25 a.m. UTC | #5
On Nov  5 15:04, Łukasz Gieryk wrote:
> On Fri, Nov 05, 2021 at 09:46:28AM +0100, Łukasz Gieryk wrote:
> > On Thu, Nov 04, 2021 at 04:48:43PM +0100, Łukasz Gieryk wrote:
> > > On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> > > > On Oct  7 18:24, Lukasz Maniak wrote:
> > > > > From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> > > > > 
> > > > > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> > > > > can configure the maximum number of virtual queues and interrupts
> > > > > assignable to a single virtual device. The primary and secondary
> > > > > controller capability structures are initialized accordingly.
> > > > > 
> > > > > Since the number of available queues (interrupts) now varies between
> > > > > VF/PF, BAR size calculation is also adjusted.
> > > > > 
> > > > 
> > > > While this patch allows configuring the VQFRSM and VIFRSM fields, it
> > > > implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
> > > > sriov_max_vi_per_vf and max_vfs). Which is just setting it to an upper
> > > > bound and this removes a testable case for host software (e.g.
> > > > requesting more flexible resources than what is currently available).
> > > > 
> > > > This patch also requires that these parameters are set if sriov_max_vfs
> > > > is. I think we can provide better defaults.
> > > > 
> > > 
> > > Originally I considered more params, but ended up coding the simplest,
> > > user-friendly solution, because I did not like the mess with so many
> > > parameters, and the flexibility wasn't needed for my use cases. But I do
> > > agree: others may need the flexibility. Case (FRT < max_vfs * FRSM) is
> > > valid and resembles an actual device.
> > > 
> > > > How about,
> > > > 
> > > > 1. if only sriov_max_vfs is set, then all VFs get private resources
> > > >    equal to max_ioqpairs. Like before this patch. This limits the number
> > > >    of parameters required to get a basic setup going.
> > > > 
> > > > 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
> > > >    10), the difference between that and max_ioqpairs become flexible
> > > >    resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
> > > >    instead and just make the difference become private resources.
> > > >    Potato/potato.
> > > > 
> > > >    a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
> > > >       of calculated flexible resources.
> > > > 
> > > > This probably smells a bit like bikeshedding, but I think this gives
> > > > more flexibility and better defaults, which helps with verifying host
> > > > software.
> > > > 
> > > > If we can't agree on this now, I suggest we could go ahead and merge the
> > > > base functionality (i.e. private resources only) and ruminate some more
> > > > about these parameters.
> > > 
> > > The problem is that the spec allows VFs to support either only private,
> > > or only flexible resources.
> > > 
> > > At this point I have to admit, that since my use cases for
> > > QEMU/Nvme/SRIOV require flexible resources, I haven’t paid much
> > > attention to the case with VFs having private resources. So this SR/IOV
> > > implementation doesn’t even support such case (max_vX_per_vf != 0).
> > > 
> > > Let me summarize the possible config space, and how the current
> > > parameters (could) map to these (interrupt-related ones omitted):
> > > 
> > > Flexible resources not supported (not implemented):
> > >  - Private resources for PF     = max_ioqpairs
> > >  - Private resources per VF     = ?
> > >  - (error if flexible resources are configured)
> > > 
> > > With flexible resources:
> > >  - VQPRT, private resources for PF      = max_ioqpairs
> > >  - VQFRT, total flexible resources      = max_vq_per_vf * num_vfs
> > >  - VQFRSM, maximum assignable per VF    = max_vq_per_vf
> > >  - VQGRAN, granularity                  = #define constant
> > >  - (error if private resources per VF are configured)
> > > 
> > > Since I don’t want to misunderstand your suggestion: could you provide a
> > > similar map with your parameters, formulas, and explain how to determine
> > > if flexible resources are active? I want to be sure we are on the
> > > same page.
> > > 
> > 
> > I’ve just re-read through my email and decided that some bits need
> > clarification.
> > 
> > This implementation supports the “Flexible”-resources-only flavor of
> > SR/IOV, while the “Private” also could be supported. Some effort is
> > required to support both, and I cannot afford that (at least I cannot
> > commit today, neither the other Lukasz).
> > 
> > While I’m ready to rework the Flexible config and prepare it to be
> > extended later to handle the Private variant, the 2nd version of these
> > patches will still support the Flexible flavor only.
> > 
> > I will include appropriate TODO/open in the next cover letter.
> > 
> 
> The summary of my thoughts, so far:
> - I'm going to introduce sriov_v{q,i}_flexible and better defaults,
>   according to your suggestion (as far as I understand your intentions,
>   please correct me if I've missed something).
> - The Private SR/IOV flavor, if it's ever implemented, could introduce
>   sriov_vq_private_per_vf.
> - The updated formulas are listed below.
> 
> Flexible resources not supported (not implemented):
>  - Private resources for PF     = max_ioqpairs
>  - Private resources per VF     = sriov_vq_private_per_vf

I would just keep it simple and say, if sriov_v{q,i}_flexible is not
set, then each VF gets max_ioqpairs private resources.

>  - (error if sriov_vq_flexible is set)
> 
> With flexible resources:
>  - VQPRT, private resources for PF      = max_ioqpairs - sriov_vq_flexible
>  - VQFRT, total flexible resources      = sriov_vq_flexible (if set, or)
>                                           VQPRT * num_vfs
>  - VQFRSM, maximum assignable per VF    = sriov_max_vq_per_vf (if set, or)
>                                           VQPRT

You mean VQFRT here, right?

>  - VQGRAN, granularity                  = #define constant

Yeah, 1 seems pretty reasonable here.

>  - (error if sriov_vq_private_per_vf is set)
> 
> Is this version acceptable?
> 

Sounds good to me. The only one I am not too happy about is the default
of VQPRT * num_vfs (i.e. max_ioqpairs * num_vfs) when vq_flexible is
not set. I think this is the case where we should default to private
resources. If you don't want to work with private resources right now,
can we instead have it bug out and complain that sriov_vq_flexible must
be set? We can then later lift that restriction and implement private
resources.
Łukasz Gieryk Nov. 8, 2021, 1:57 p.m. UTC | #6
On Mon, Nov 08, 2021 at 09:25:58AM +0100, Klaus Jensen wrote:
> On Nov  5 15:04, Łukasz Gieryk wrote:
> > On Fri, Nov 05, 2021 at 09:46:28AM +0100, Łukasz Gieryk wrote:
> > > On Thu, Nov 04, 2021 at 04:48:43PM +0100, Łukasz Gieryk wrote:
> > > > On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> > > > > On Oct  7 18:24, Lukasz Maniak wrote:
> > > > > > From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> > > > > > 
> > > > > > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> > > > > > can configure the maximum number of virtual queues and interrupts
> > > > > > assignable to a single virtual device. The primary and secondary
> > > > > > controller capability structures are initialized accordingly.
> > > > > > 
> > > > > > Since the number of available queues (interrupts) now varies between
> > > > > > VF/PF, BAR size calculation is also adjusted.
> > > > > > 
> > > > > 
> > > > > While this patch allows configuring the VQFRSM and VIFRSM fields, it
> > > > > implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
> > > > > sriov_max_vi_per_vf and max_vfs). Which is just setting it to an upper
> > > > > bound and this removes a testable case for host software (e.g.
> > > > > requesting more flexible resources than what is currently available).
> > > > > 
> > > > > This patch also requires that these parameters are set if sriov_max_vfs
> > > > > is. I think we can provide better defaults.
> > > > > 
> > > > 
> > > > Originally I considered more params, but ended up coding the simplest,
> > > > user-friendly solution, because I did not like the mess with so many
> > > > parameters, and the flexibility wasn't needed for my use cases. But I do
> > > > agree: others may need the flexibility. Case (FRT < max_vfs * FRSM) is
> > > > valid and resembles an actual device.
> > > > 
> > > > > How about,
> > > > > 
> > > > > 1. if only sriov_max_vfs is set, then all VFs get private resources
> > > > >    equal to max_ioqpairs. Like before this patch. This limits the number
> > > > >    of parameters required to get a basic setup going.
> > > > > 
> > > > > 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
> > > > >    10), the difference between that and max_ioqpairs become flexible
> > > > >    resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
> > > > >    instead and just make the difference become private resources.
> > > > >    Potato/potato.
> > > > > 
> > > > >    a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
> > > > >       of calculated flexible resources.
> > > > > 
> > > > > This probably smells a bit like bikeshedding, but I think this gives
> > > > > more flexibility and better defaults, which helps with verifying host
> > > > > software.
> > > > > 
> > > > > If we can't agree on this now, I suggest we could go ahead and merge the
> > > > > base functionality (i.e. private resources only) and ruminate some more
> > > > > about these parameters.
> > > > 
> > > > The problem is that the spec allows VFs to support either only private,
> > > > or only flexible resources.
> > > > 
> > > > At this point I have to admit, that since my use cases for
> > > > QEMU/Nvme/SRIOV require flexible resources, I haven’t paid much
> > > > attention to the case with VFs having private resources. So this SR/IOV
> > > > implementation doesn’t even support such case (max_vX_per_vf != 0).
> > > > 
> > > > Let me summarize the possible config space, and how the current
> > > > parameters (could) map to these (interrupt-related ones omitted):
> > > > 
> > > > Flexible resources not supported (not implemented):
> > > >  - Private resources for PF     = max_ioqpairs
> > > >  - Private resources per VF     = ?
> > > >  - (error if flexible resources are configured)
> > > > 
> > > > With flexible resources:
> > > >  - VQPRT, private resources for PF      = max_ioqpairs
> > > >  - VQFRT, total flexible resources      = max_vq_per_vf * num_vfs
> > > >  - VQFRSM, maximum assignable per VF    = max_vq_per_vf
> > > >  - VQGRAN, granularity                  = #define constant
> > > >  - (error if private resources per VF are configured)
> > > > 
> > > > Since I don’t want to misunderstand your suggestion: could you provide a
> > > > similar map with your parameters, formulas, and explain how to determine
> > > > if flexible resources are active? I want to be sure we are on the
> > > > same page.
> > > > 
> > > 
> > > I’ve just re-read through my email and decided that some bits need
> > > clarification.
> > > 
> > > This implementation supports the “Flexible”-resources-only flavor of
> > > SR/IOV, while the “Private” also could be supported. Some effort is
> > > required to support both, and I cannot afford that (at least I cannot
> > > commit today, neither the other Lukasz).
> > > 
> > > While I’m ready to rework the Flexible config and prepare it to be
> > > extended later to handle the Private variant, the 2nd version of these
> > > patches will still support the Flexible flavor only.
> > > 
> > > I will include appropriate TODO/open in the next cover letter.
> > > 
> > 
> > The summary of my thoughts, so far:
> > - I'm going to introduce sriov_v{q,i}_flexible and better defaults,
> >   according to your suggestion (as far as I understand your intentions,
> >   please correct me if I've missed something).
> > - The Private SR/IOV flavor, if it's ever implemented, could introduce
> >   sriov_vq_private_per_vf.
> > - The updated formulas are listed below.
> > 
> > Flexible resources not supported (not implemented):
> >  - Private resources for PF     = max_ioqpairs
> >  - Private resources per VF     = sriov_vq_private_per_vf
> 
> I would just keep it simple and say, if sriov_v{q,i}_flexible is not
> set, then each VF gets max_ioqpairs private resources.
> 

Since you did request more tuning knobs for the Flexible variant, the
Private one should follow that and allow full configuration. A device
where PF.priv=64 and each VF.priv=4 makes sense, and I couldn’t
configure it if sriov_v{q,i}_flexible=0 enabled the Private mode.

> >  - (error if sriov_vq_flexible is set)
> > 
> > With flexible resources:
> >  - VQPRT, private resources for PF      = max_ioqpairs - sriov_vq_flexible
> >  - VQFRT, total flexible resources      = sriov_vq_flexible (if set, or)
> >                                           VQPRT * num_vfs
> >  - VQFRSM, maximum assignable per VF    = sriov_max_vq_per_vf (if set, or)
> >                                           VQPRT
> 
> You mean VQFRT here, right?
> 

VQPRT is right, and – in my opinion – makes a better default than VQFRT.

E.g., configuring a device:

(max_vfs=32, PF.priv=VQPRT=X, PF.flex_total=VQFRT=256)

as (num_vfs=1, VF0.flex=256) doesn’t make much sense. Virtualization is
not needed in such a case, and the user should probably use the PF
directly. On the other hand, VQPRT is probably tuned to offer most (if
not all) of the performance and functionality; it thus serves as a sane
default.

> >  - VQGRAN, granularity                  = #define constant
> 
> Yeah, 1 seems pretty reasonable here.
> 
> >  - (error if sriov_vq_private_per_vf is set)
> > 
> > Is this version acceptable?
> > 
> 
> Sounds good to me. The only one I am not too happy about is the default
> of VQPRT * num_vfs. (i.e. max_ioqpairs * num_vfs) when vq_flexible is
> not set. I think this is the case where we should default to private
> resources. If you don't want to work with private resources right now,
> can we instead have it bug out and complain that sriov_vq_flexible must
> be set? We can then later lift that restriction and implement private
> resources.

I would prefer reserving sriov_v{q,i}_flexible=0 for now. That's my current
plan for V2.
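
In code, that reservation would amount to roughly (a sketch; the
parameter names are the hypothetical ones discussed above):

    if (params->sriov_max_vfs && !params->sriov_vq_flexible) {
        error_setg(errp, "sriov_vq_flexible must be set when "
                   "sriov_max_vfs > 0");
        return;
    }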
Klaus Jensen Nov. 9, 2021, 12:22 p.m. UTC | #7
On Nov  8 14:57, Łukasz Gieryk wrote:
> On Mon, Nov 08, 2021 at 09:25:58AM +0100, Klaus Jensen wrote:
> > On Nov  5 15:04, Łukasz Gieryk wrote:
> > > On Fri, Nov 05, 2021 at 09:46:28AM +0100, Łukasz Gieryk wrote:
> > > > On Thu, Nov 04, 2021 at 04:48:43PM +0100, Łukasz Gieryk wrote:
> > > > > On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> > > > > > On Oct  7 18:24, Lukasz Maniak wrote:
> > > > > > > From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> > > > > > > 
> > > > > > > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> > > > > > > can configure the maximum number of virtual queues and interrupts
> > > > > > > assignable to a single virtual device. The primary and secondary
> > > > > > > controller capability structures are initialized accordingly.
> > > > > > > 
> > > > > > > Since the number of available queues (interrupts) now varies between
> > > > > > > VF/PF, BAR size calculation is also adjusted.
> > > > > > > 
> > > > > > 
> > > > > > While this patch allows configuring the VQFRSM and VIFRSM fields, it
> > > > > > implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
> > > > > > sriov_max_vi_per_vf and max_vfs). Which is just setting it to an upper
> > > > > > bound and this removes a testable case for host software (e.g.
> > > > > > requesting more flexible resources than what is currently available).
> > > > > > 
> > > > > > This patch also requires that these parameters are set if sriov_max_vfs
> > > > > > is. I think we can provide better defaults.
> > > > > > 
> > > > > 
> > > > > Originally I considered more params, but ended up coding the simplest,
> > > > > user-friendly solution, because I did not like the mess with so many
> > > > > parameters, and the flexibility wasn't needed for my use cases. But I do
> > > > > agree: others may need the flexibility. Case (FRT < max_vfs * FRSM) is
> > > > > valid and resembles an actual device.
> > > > > 
> > > > > > How about,
> > > > > > 
> > > > > > 1. if only sriov_max_vfs is set, then all VFs get private resources
> > > > > >    equal to max_ioqpairs. Like before this patch. This limits the number
> > > > > >    of parameters required to get a basic setup going.
> > > > > > 
> > > > > > 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
> > > > > >    10), the difference between that and max_ioqpairs become flexible
> > > > > >    resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
> > > > > >    instead and just make the difference become private resources.
> > > > > >    Potato/potato.
> > > > > > 
> > > > > >    a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
> > > > > >       of calculated flexible resources.
> > > > > > 
> > > > > > This probably smells a bit like bikeshedding, but I think this gives
> > > > > > more flexibility and better defaults, which helps with verifying host
> > > > > > software.
> > > > > > 
> > > > > > If we can't agree on this now, I suggest we could go ahead and merge the
> > > > > > base functionality (i.e. private resources only) and ruminate some more
> > > > > > about these parameters.
> > > > > 
> > > > > The problem is that the spec allows VFs to support either only private,
> > > > > or only flexible resources.
> > > > > 
> > > > > At this point I have to admit, that since my use cases for
> > > > > QEMU/Nvme/SRIOV require flexible resources, I haven’t paid much
> > > > > attention to the case with VFs having private resources. So this SR/IOV
> > > > > implementation doesn’t even support such case (max_vX_per_vf != 0).
> > > > > 
> > > > > Let me summarize the possible config space, and how the current
> > > > > parameters (could) map to these (interrupt-related ones omitted):
> > > > > 
> > > > > Flexible resources not supported (not implemented):
> > > > >  - Private resources for PF     = max_ioqpairs
> > > > >  - Private resources per VF     = ?
> > > > >  - (error if flexible resources are configured)
> > > > > 
> > > > > With flexible resources:
> > > > >  - VQPRT, private resources for PF      = max_ioqpairs
> > > > >  - VQFRT, total flexible resources      = max_vq_per_vf * num_vfs
> > > > >  - VQFRSM, maximum assignable per VF    = max_vq_per_vf
> > > > >  - VQGRAN, granularity                  = #define constant
> > > > >  - (error if private resources per VF are configured)
> > > > > 
> > > > > Since I don’t want to misunderstand your suggestion: could you provide a
> > > > > similar map with your parameters, formulas, and explain how to determine
> > > > > if flexible resources are active? I want to be sure we are on the
> > > > > same page.
> > > > > 
> > > > 
> > > > I’ve just re-read through my email and decided that some bits need
> > > > clarification.
> > > > 
> > > > This implementation supports the “Flexible”-resources-only flavor of
> > > > SR/IOV, while the “Private” also could be supported. Some effort is
> > > > required to support both, and I cannot afford that (at least I cannot
> > > > commit today, neither the other Lukasz).
> > > > 
> > > > While I’m ready to rework the Flexible config and prepare it to be
> > > > extended later to handle the Private variant, the 2nd version of these
> > > > patches will still support the Flexible flavor only.
> > > > 
> > > > I will include appropriate TODO/open in the next cover letter.
> > > > 
> > > 
> > > The summary of my thoughts, so far:
> > > - I'm going to introduce sriov_v{q,i}_flexible and better defaults,
> > >   according to your suggestion (as far as I understand your intentions,
> > >   please correct me if I've missed something).
> > > - The Private SR/IOV flavor, if it's ever implemented, could introduce
> > >   sriov_vq_private_per_vf.
> > > - The updated formulas are listed below.
> > > 
> > > Flexible resources not supported (not implemented):
> > >  - Private resources for PF     = max_ioqpairs
> > >  - Private resources per VF     = sriov_vq_private_per_vf
> > 
> > I would just keep it simple and say, if sriov_v{q,i}_flexible is not
> > set, then each VF gets max_ioqpairs private resources.
> > 
> 
> Since you did request more tuning knobs for the Flexible variant, the
> Private one should follow that and allow full configuration. A device
> where PF.priv=64 and each VF.priv=4 makes sense, and I couldn’t
> configure it if sriov_v{q,i}_flexible=0 enabled the Private mode.
> 

It was just to simplify; I am fine with having
`sriov_vq_private_per_vf` :)

> > >  - (error if sriov_vq_flexible is set)
> > > 
> > > With flexible resources:
> > >  - VQPRT, private resources for PF      = max_ioqpairs - sriov_vq_flexible
> > >  - VQFRT, total flexible resources      = sriov_vq_flexible (if set, or)
> > >                                           VQPRT * num_vfs
> > >  - VQFRSM, maximum assignable per VF    = sriov_max_vq_per_vf (if set, or)
> > >                                           VQPRT
> > 
> > You mean VQFRT here, right?
> > 
> 
> VQPRT is right, and – in my opinion – makes a better default than VQFRT.
> 
> E.g., configuring a device:
> 
> (max_vfs=32, PF.priv=VQPRT=X, PF.flex_total=VQFRT=256)
> 
> as (num_vfs=1, VF0.flex=256) doesn’t make much sense. Virtualization is
> not needed in such case, and user should probably use PF directly. On
> the other hand, VQPRT is probably tuned to offer most (if not all) of
> the performance and functionality; thus serves as a sane default.
> 

Alright.

> > >  - VQGRAN, granularity                  = #define constant
> > 
> > Yeah, 1 seems pretty reasonable here.
> > 
> > >  - (error if sriov_vq_private_per_vf is set)
> > > 
> > > Is this version acceptable?
> > > 
> > 
> > Sounds good to me. The only one I am not too happy about is the default
> > of VQPRT * num_vfs. (i.e. max_ioqpairs * num_vfs) when vq_flexible is
> > not set. I think this is the case where we should default to private
> > resources. If you don't want to work with private resources right now,
> > can we instead have it bug out and complain that sriov_vq_flexible must
> > be set? We can then later lift that restriction and implement private
> > resources.
> 
> I would prefer reserving sriov_v{q,i}_flexible=0 for now. That's my current
> plan for V2.
> 

Alright.

Patch

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 425fbf2c73..67c7210d7e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -36,6 +36,8 @@ 
  *              zoned.zasl=<N[optional]>, \
  *              zoned.auto_transition=<on|off[optional]>, \
  *              sriov_max_vfs=<N[optional]> \
+ *              sriov_max_vi_per_vf=<N[optional]> \
+ *              sriov_max_vq_per_vf=<N[optional]> \
  *              subsys=<subsys_id>
  *      -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>,\
  *              zoned=<true|false[optional]>, \
@@ -113,6 +115,18 @@ 
  *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
  *   Virtual function controllers will not report SR-IOV capability.
  *
+ * - `sriov_max_vi_per_vf`
+ *   Indicates the maximum number of virtual interrupt resources assignable
+ *   to a secondary controller. Must be explicitly set if sriov_max_vfs != 0.
+ *   The parameter affects VFs similarly to how msix_qsize affects the PF, i.e.,
+ *   determines the number of interrupts available to all queues (admin, io).
+ *
+ * - `sriov_max_vq_per_vf`
+ *   Indicates the maximum number of virtual queue resources assignable to
+ *   a secondary controller. Must be explicitly set if sriov_max_vfs != 0.
+ *   The parameter affects VFs similarly to how max_ioqpairs affects the PF,
+ *   except the number of flexible queues includes the admin queue.
+ *
  * nvme namespace device parameters
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * - `shared`
@@ -184,6 +198,7 @@ 
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 #define NVME_MAX_VFS 127
+#define NVME_VF_RES_GRANULARITY 1
 #define NVME_VF_OFFSET 0x1
 #define NVME_VF_STRIDE 1
 
@@ -6254,6 +6269,7 @@  static const MemoryRegionOps nvme_cmb_ops = {
 static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 {
     NvmeParams *params = &n->params;
+    int msix_total;
 
     if (params->num_queues) {
         warn_report("num_queues is deprecated; please use max_ioqpairs "
@@ -6324,6 +6340,30 @@  static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
                        NVME_MAX_VFS);
             return;
         }
+
+        if (params->sriov_max_vi_per_vf < 1 ||
+            (params->sriov_max_vi_per_vf - 1) % NVME_VF_RES_GRANULARITY) {
+            error_setg(errp, "sriov_max_vi_per_vf must meet:"
+                       " (X - 1) %% %d == 0 and X >= 1",
+                       NVME_VF_RES_GRANULARITY);
+            return;
+        }
+
+        if (params->sriov_max_vq_per_vf < 2 ||
+            (params->sriov_max_vq_per_vf - 1) % NVME_VF_RES_GRANULARITY) {
+            error_setg(errp, "sriov_max_vq_per_vf must meet:"
+                       " (X - 1) %% %d == 0 and X >= 2",
+                       NVME_VF_RES_GRANULARITY);
+            return;
+        }
+
+        msix_total = params->msix_qsize +
+                     params->sriov_max_vfs * params->sriov_max_vi_per_vf;
+        if (msix_total > PCI_MSIX_FLAGS_QSIZE + 1) {
+            error_setg(errp, "sriov_max_vi_per_vf is too big for max_vfs=%d",
+                       params->sriov_max_vfs);
+            return;
+        }
     }
 }
 
@@ -6332,13 +6372,35 @@  static void nvme_init_state(NvmeCtrl *n)
     NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
     NvmeSecCtrlList *list = &n->sec_ctrl_list;
     NvmeSecCtrlEntry *sctrl;
+    uint8_t max_vfs;
+    uint32_t total_vq, total_vi;
     int i;
 
-    n->max_ioqpairs = n->params.max_ioqpairs;
-    n->conf_ioqpairs = n->max_ioqpairs;
+    if (pci_is_vf(&n->parent_obj)) {
+        sctrl = nvme_sctrl(n);
+
+        max_vfs = 0;
+
+        n->max_ioqpairs = n->params.sriov_max_vq_per_vf - 1;
+        n->conf_ioqpairs = sctrl->nvq ? le16_to_cpu(sctrl->nvq) - 1 : 0;
+
+        n->max_msix_qsize = n->params.sriov_max_vi_per_vf;
+        n->conf_msix_qsize = sctrl->nvi ? le16_to_cpu(sctrl->nvi) : 1;
+    } else {
+        max_vfs = n->params.sriov_max_vfs;
+
+        n->max_ioqpairs = n->params.max_ioqpairs +
+                          max_vfs * n->params.sriov_max_vq_per_vf;
+        n->conf_ioqpairs = n->max_ioqpairs;
+
+        n->max_msix_qsize = n->params.msix_qsize +
+                            max_vfs * n->params.sriov_max_vi_per_vf;
+        n->conf_msix_qsize = n->max_msix_qsize;
+    }
+
+    total_vq = n->params.sriov_max_vq_per_vf * max_vfs;
+    total_vi = n->params.sriov_max_vi_per_vf * max_vfs;
 
-    n->max_msix_qsize = n->params.msix_qsize;
-    n->conf_msix_qsize = n->max_msix_qsize;
     n->sq = g_new0(NvmeSQueue *, n->max_ioqpairs + 1);
     n->cq = g_new0(NvmeCQueue *, n->max_ioqpairs + 1);
     n->temperature = NVME_TEMPERATURE;
@@ -6346,13 +6408,34 @@  static void nvme_init_state(NvmeCtrl *n)
     n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
     n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 
-    list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
-    for (i = 0; i < n->params.sriov_max_vfs; i++) {
+    list->numcntl = cpu_to_le16(max_vfs);
+    for (i = 0; i < max_vfs; i++) {
         sctrl = &list->sec[i];
         sctrl->pcid = cpu_to_le16(n->cntlid);
     }
 
     cap->cntlid = cpu_to_le16(n->cntlid);
+    cap->crt = NVME_CRT_VQ | NVME_CRT_VI;
+
+    cap->vqfrt = cpu_to_le32(total_vq);
+    cap->vqrfap = cpu_to_le32(total_vq);
+    if (pci_is_vf(&n->parent_obj)) {
+        cap->vqprt = cpu_to_le16(n->conf_ioqpairs + 1);
+    } else {
+        cap->vqprt = cpu_to_le16(n->params.max_ioqpairs + 1);
+        cap->vqfrsm = cpu_to_le16(n->params.sriov_max_vq_per_vf);
+        cap->vqgran = cpu_to_le16(NVME_VF_RES_GRANULARITY);
+    }
+
+    cap->vifrt = cpu_to_le32(total_vi);
+    cap->virfap = cpu_to_le32(total_vi);
+    if (pci_is_vf(&n->parent_obj)) {
+        cap->viprt = cpu_to_le16(n->conf_msix_qsize);
+    } else {
+        cap->viprt = cpu_to_le16(n->params.msix_qsize);
+        cap->vifrsm = cpu_to_le16(n->params.sriov_max_vi_per_vf);
+        cap->vigran = cpu_to_le16(NVME_VF_RES_GRANULARITY);
+    }
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -6448,11 +6531,13 @@  static void nvme_update_vfs(PCIDevice *pci_dev, uint16_t prev_num_vfs,
     }
 }
 
-static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
-                            uint64_t bar_size)
+static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
 {
     uint16_t vf_dev_id = n->params.use_intel_id ?
                          PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
+    uint64_t bar_size = nvme_bar_size(n->params.sriov_max_vq_per_vf,
+                                      n->params.sriov_max_vi_per_vf,
+                                      NULL, NULL);
 
     pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
                        n->params.sriov_max_vfs, n->params.sriov_max_vfs,
@@ -6550,7 +6635,7 @@  static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
     }
 
     if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) {
-        nvme_init_sriov(n, pci_dev, 0x120, bar_size);
+        nvme_init_sriov(n, pci_dev, 0x120);
     }
 
     return 0;
@@ -6574,6 +6659,7 @@  static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     NvmeIdCtrl *id = &n->id_ctrl;
     uint8_t *pci_conf = pci_dev->config;
     uint64_t cap = ldq_le_p(&n->bar.cap);
+    NvmeSecCtrlEntry *sctrl = nvme_sctrl(n);
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -6665,6 +6751,10 @@  static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 
     stl_le_p(&n->bar.vs, NVME_SPEC_VER);
     n->bar.intmc = n->bar.intms = 0;
+
+    if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
+        stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
+    }
 }
 
 static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
@@ -6804,6 +6894,8 @@  static Property nvme_props[] = {
     DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
                      params.auto_transition_zones, true),
     DEFINE_PROP_UINT8("sriov_max_vfs", NvmeCtrl, params.sriov_max_vfs, 0),
+    DEFINE_PROP_UINT8("sriov_max_vi_per_vf", NvmeCtrl, params.sriov_max_vi_per_vf, 0),
+    DEFINE_PROP_UINT8("sriov_max_vq_per_vf", NvmeCtrl, params.sriov_max_vq_per_vf, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index a8eded4713..43609c979a 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -393,6 +393,8 @@  typedef struct NvmeParams {
     bool     auto_transition_zones;
     bool     legacy_cmb;
     uint8_t  sriov_max_vfs;
+    uint8_t  sriov_max_vq_per_vf;
+    uint8_t  sriov_max_vi_per_vf;
 } NvmeParams;
 
 typedef struct NvmeCtrl {
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 96595ea8f1..26672d0a31 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1488,6 +1488,11 @@  typedef struct QEMU_PACKED NvmePriCtrlCap {
     uint8_t     rsvd80[4016];
 } NvmePriCtrlCap;
 
+typedef enum NvmePriCtrlCapCrt {
+    NVME_CRT_VQ             = 1 << 0,
+    NVME_CRT_VI             = 1 << 1,
+} NvmePriCtrlCapCrt;
+
 typedef struct QEMU_PACKED NvmeSecCtrlEntry {
     uint16_t    scid;
     uint16_t    pcid;