Patchwork virtio-spec: document MSI-X

Submitter Michael S. Tsirkin
Date Feb. 11, 2010, 5:22 p.m.
Message ID <20100211172236.GA20357@redhat.com>
Permalink /patch/45140/
State New

Comments

Michael S. Tsirkin - Feb. 11, 2010, 5:22 p.m.
This documents MSI-X support in virtio.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---
 virtio-spec.lyx |  358 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 332 insertions(+), 26 deletions(-)
Rusty Russell - Feb. 12, 2010, 9:47 a.m.
On Fri, 12 Feb 2010 03:52:36 am Michael S. Tsirkin wrote:
> This documents MSI-X support in virtio.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Wow, great!

I reworked one paragraph for better grammar.  Mainly adding "the":

(pseudo-patch):
    Devices report such failures by returning +the+ NO_VECTOR value
    when the relevant Vector field is read. After mapping an event to vector,
    +the+ driver must verify success by reading the Vector field value: on
    success, +the+ previously written value is returned-;-+, and+ on failure,
    NO_VECTOR -value- is returned. If +a+ mapping failure is detected, +the+
    driver can retry mapping with +fewer+-less- vectors, or disable MSI-X.

I really liked the conversational style: standards can be intimidating and
unfriendly documents if they concentrate too much on partitioning all
information into precise sections.

That makes it 0.8.6.  I will re-read the entire document for consistency
before releasing 0.9.

Thanks!
Rusty.
Michael S. Tsirkin - Feb. 12, 2010, 10:47 a.m.
On Fri, Feb 12, 2010 at 08:17:55PM +1030, Rusty Russell wrote:
> On Fri, 12 Feb 2010 03:52:36 am Michael S. Tsirkin wrote:
> > This documents MSI-X support in virtio.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Wow, great!
> 
> I reworked one paragraph for better grammar.  Mainly adding "the":
> 
> (pseudo-patch):
>     Devices report such failures by returning +the+ NO_VECTOR value
>     when the relevant Vector field is read. After mapping an event to vector,
>     +the+ driver must verify success by reading the Vector field value: on
>     success, +the+ previously written value is returned-;-+, and+ on failure,
>     NO_VECTOR -value- is returned. If +a+ mapping failure is detected, +the+
>     driver can retry mapping with +fewer+-less- vectors, or disable MSI-X.

Looks good, thanks for the corrections!

> I really liked the conversational style: standards can be intimidating and
> unfriendly documents if they concentrate too much on partitioning all
> information into precise sections.
> 
> That makes it 0.8.6.  I will re-read the entire document for consistency
> before releasing 0.9.
> 
> Thanks!
> Rusty.
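
Illustration (not part of the patch or the thread): the map, read back, retry
sequence discussed above can be sketched in C roughly as follows. Only
VIRTIO_MSI_NO_VECTOR (0xffff) and the byte offsets 20 and 24 come from the
patch. The split of the 4 bytes at offset 20 into a 16-bit Configuration
Vector at 20 and a 16-bit Queue Vector at 22 is read off the header table,
and struct fake_dev plus every function name below are hypothetical stand-ins
for real register accessors, so treat this as a sketch under those
assumptions rather than a driver implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the patch: vector value meaning "no vector mapped". */
#define VIRTIO_MSI_NO_VECTOR 0xffff

/* Assumed layout of the 4 bytes at offset 20 (two 16-bit fields). */
#define HDR_CONFIG_VECTOR 20
#define HDR_QUEUE_VECTOR  22

/* Hypothetical fake device: it can only back vectors below `limit`. */
struct fake_dev {
    uint16_t limit;
    uint16_t config_vector;
    uint16_t queue_vector;
};

void fake_dev_init(struct fake_dev *d, uint16_t limit)
{
    d->limit = limit;
    /* All events are unmapped by default. */
    d->config_vector = VIRTIO_MSI_NO_VECTOR;
    d->queue_vector = VIRTIO_MSI_NO_VECTOR;
}

/* Device side: a mapping that fails reads back as NO_VECTOR. */
void dev_write16(struct fake_dev *d, unsigned off, uint16_t v)
{
    if (v != VIRTIO_MSI_NO_VECTOR && v >= d->limit)
        v = VIRTIO_MSI_NO_VECTOR;
    if (off == HDR_CONFIG_VECTOR)
        d->config_vector = v;
    else if (off == HDR_QUEUE_VECTOR)
        d->queue_vector = v;
}

uint16_t dev_read16(const struct fake_dev *d, unsigned off)
{
    if (off == HDR_CONFIG_VECTOR)
        return d->config_vector;
    if (off == HDR_QUEUE_VECTOR)
        return d->queue_vector;
    return VIRTIO_MSI_NO_VECTOR;
}

/* Driver side: write the vector, then verify success by reading it back. */
bool map_vector(struct fake_dev *d, unsigned off, uint16_t vec)
{
    dev_write16(d, off, vec);
    return dev_read16(d, off) == vec;
}

/*
 * Try one vector per event, then a single shared vector.
 * Returns the number of vectors in use; 0 means the caller
 * should disable MSI-X and fall back to the ISR Status field.
 */
unsigned setup_vectors(struct fake_dev *d)
{
    if (map_vector(d, HDR_CONFIG_VECTOR, 0) &&
        map_vector(d, HDR_QUEUE_VECTOR, 1))
        return 2;
    if (map_vector(d, HDR_CONFIG_VECTOR, 0) &&
        map_vector(d, HDR_QUEUE_VECTOR, 0))
        return 1;
    /* Unmap both events before giving up on MSI-X. */
    dev_write16(d, HDR_CONFIG_VECTOR, VIRTIO_MSI_NO_VECTOR);
    dev_write16(d, HDR_QUEUE_VECTOR, VIRTIO_MSI_NO_VECTOR);
    return 0;
}

/* Device-specific configuration starts after the vector fields, if present. */
unsigned device_config_offset(bool msix_enabled)
{
    return msix_enabled ? 24 : 20;
}
```

In a real driver the two accessors would be 16-bit I/O reads and writes
against the virtio header, and the Queue Vector write would apply to the
currently selected queue.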

Patch

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 49ed612..d16104a 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1,4 +1,4 @@ 
-#LyX 1.6.4 created this file. For more info see http://www.lyx.org/
+#LyX 1.6.5 created this file. For more info see http://www.lyx.org/
 \lyxformat 345
 \begin_document
 \begin_header
@@ -35,9 +35,8 @@ 
 \papersides 1
 \paperpagestyle default
 \tracking_changes true
-\output_changes true
-\author "" 
-\author "" 
+\output_changes false
+\author "Michael S. Tsirkin" 
 \author "" 
 \end_header
 
@@ -72,7 +71,11 @@  FIXME: virtio block scsi passthrough section
 \end_layout
 
 \begin_layout Standard
+
+\change_deleted 0 1265908736
 FIXME: MSI-X documentation
+\change_unchanged
+
 \end_layout
 
 \begin_layout Chapter
@@ -590,8 +593,11 @@  The DRIVER status bit is set: we know how to drive the device.
 
 \begin_layout Enumerate
 Device-specific setup, including reading the Device Feature Bits, discovery
- of virtqueues for the device, and reading and possibly writing the virtio
- configuration space.
+ of virtqueues for the device, 
+\change_inserted 0 1265905891
+optional MSI-X setup, 
+\change_unchanged
+and reading and possibly writing the virtio configuration space.
 \end_layout
 
 \begin_layout Enumerate
@@ -636,7 +642,7 @@  Virtio Header
 
 \begin_layout Standard
 \begin_inset Tabular
-<lyxtabular version="3" rows="4" columns="10">
+<lyxtabular version="3" rows="4" columns="12">
 <features>
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
@@ -648,6 +654,8 @@  Virtio Header
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
+<column alignment="left" valignment="top" width="0">
+<column alignment="left" valignment="top" width="0">
 <row>
 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
 \begin_inset Text
@@ -730,6 +738,28 @@  Bits
 
 \end_inset
 </cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895519
+16 (optional)
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895525
+16 (optional)
+\end_layout
+
+\end_inset
+</cell>
 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
 \begin_inset Text
 
@@ -822,6 +852,28 @@  R
 
 \end_inset
 </cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895422
+R+W
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895531
+R+W
+\end_layout
+
+\end_inset
+</cell>
 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
 \begin_inset Text
 
@@ -930,6 +982,28 @@  ISR
 
 \end_inset
 </cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895579
+Configuration
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895618
+Queue
+\end_layout
+
+\end_inset
+</cell>
 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
 \begin_inset Text
 
@@ -1040,6 +1114,28 @@  Status
 
 \end_inset
 </cell>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895695
+Vector
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265895623
+Vector
+\end_layout
+
+\end_inset
+</cell>
 <cell alignment="center" valignment="top" bottomline="true" leftline="true" rightline="true" usebox="none">
 \begin_inset Text
 
@@ -1181,6 +1277,88 @@  This allows for forwards and backwards compatibility: if the device is enhanced
  support, it will not see that feature bit in the Device Features field
  and can go into backwards compatibility mode (or, for poor implementations,
  set the FAILED Device Status bit).
+\change_inserted 0 1265896046
+
+\end_layout
+
+\begin_layout Subsubsection
+
+\change_inserted 0 1265896301
+Configuration/Queue Vectors
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1265908336
+When MSI-X capability is present and enabled in the device (through standard
+ PCI configuration space) 4 bytes at byte offset 20 are used to map configuratio
+n change and queue interrupts to MSI-X vectors.
+ In this case, the ISR Status field is unused, and device specific configuration
+ starts at byte offset 24 in virtio header structure.
+ When MSI-X capability is not enabled, device specific configuration starts
+ at byte offset 20 in virtio header.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1265907969
+Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of Configuration/Qu
+eue Vector registers, 
+\emph on
+maps
+\emph default
+ interrupts triggered by the configuration change/selected queue events
+ respectively to the corresponding MSI-X vector.
+ To disable interrupts for a specific event type, unmap it by writing a
+ special NO_VECTOR value:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1265902253
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265902147
+
+/* Vector value used to disable MSI for queue */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1265902136
+
+#define VIRTIO_MSI_NO_VECTOR            0xffff 
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1265905829
+Reading these registers returns vector mapped to a given event, or NO_VECTOR
+ if unmapped.
+ All queue and configuration change events are unmapped by default.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1265907870
+Note that mapping an event to vector might require allocating internal device
+ resources, and might fail.
+ Devices report such failures by returning NO_VECTOR value when the relevant
+ Vector field is read.
+ After mapping an event to vector, driver must verify success by reading
+ the Vector field value: on success, previously written value is returned;
+ on failure, NO_VECTOR value is returned.
+ If mapping failure is detected, driver can retry mapping with less vectors,
+ or disable MSI-X.
 \end_layout
 
 \begin_layout Section
@@ -1224,6 +1402,19 @@  The 4096 is based on the x86 page size, but it's also large enough to ensure
 \end_inset
 
 
+\change_inserted 0 1265902802
+
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 0 1265907664
+Optionally, if MSI-X capability is present and enabled on the device, select
+ a vector to use to request interrupts triggered by virtqueue events.
+ Write the MSI-X Table entry number corresponding to this vector in Queue
+ Vector field.
+ Read the Queue Vector field: on success, previously written value is returned;
+ on failure, NO_VECTOR value is returned.
 \end_layout
 
 \begin_layout Standard
@@ -2107,6 +2298,17 @@  Update the used ring idx.
 
 \begin_layout Enumerate
 If the VRING_AVAIL_F_NO_INTERRUPT flag is not set in avail->flags:
+\change_inserted 0 1265903387
+
+\end_layout
+
+\begin_deeper
+\begin_layout Enumerate
+
+\change_inserted 0 1265903435
+If MSI-X capability is disabled:
+\change_unchanged
+
 \end_layout
 
 \begin_deeper
@@ -2116,16 +2318,66 @@  Set the lower bit of the ISR Status field for the device.
 
 \begin_layout Enumerate
 Send the appropriate PCI interrupt for the device.
+\change_inserted 0 1265904154
+
 \end_layout
 
 \end_deeper
+\begin_layout Enumerate
+
+\change_inserted 0 1265903452
+If MSI-X capability is enabled:
+\end_layout
+
+\begin_deeper
+\begin_layout Enumerate
+
+\change_inserted 0 1265907522
+Request the appropriate MSI-X interrupt message for the device, Queue Vector
+ field sets the MSI-X Table entry number.
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 0 1265907541
+If Queue Vector field value is NO_VECTOR, no interrupt message is requested
+ for this event.
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\end_deeper
 \begin_layout Standard
-The guest interrupt handler should read the ISR Status field, which will
- reset it to zero.
+The guest interrupt handler should
+\change_inserted 0 1265904434
+:
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 0 1265904449
+If MSI-X capability is disabled: 
+\change_deleted 0 1265904425
+ 
+\change_unchanged
+read the ISR Status field, which will reset it to zero.
  If the lower bit is zero, the interrupt was not for this device.
  Otherwise, the guest driver should look through the used rings of each
  virtqueue for the device, to see if any progress has been made by the device
  which requires servicing.
+\change_inserted 0 1265904489
+
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 0 1265904546
+If MSI-X capability is enabled: look through the used rings of each virtqueue
+ mapped to the specific MSI-X vector for the device, to see if any progress
+ has been made by the device which requires servicing.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Standard
@@ -2170,12 +2422,23 @@  Dealing With Configuration Changes
 \begin_layout Standard
 Some virtio PCI devices can change the device configuration state, as reflected
  in the virtio header in the PCI configuration space.
- In this case, an interrupt is delivered and the second highest bit is set
- in the ISR Status field to indicate that the driver should re-examine the
- configuration space.
+ In this case
+\change_inserted 0 1265904732
+:
 \end_layout
 
-\begin_layout Standard
+\begin_layout Enumerate
+
+\change_inserted 0 1265904810
+If MSI-X capability is disabled:
+\change_deleted 0 1265904811
+,
+\change_unchanged
+ an interrupt is delivered and the second highest bit is set in the ISR
+ Status field to indicate that the driver should re-examine the configuration
+ space.
+\change_deleted 0 1265905023
+
 \begin_inset listings
 inline false
 status open
@@ -2188,12 +2451,31 @@  status open
 \end_inset
 
 
+\change_inserted 0 1265905350
+Note that a single interrupt can indicate both that one or more virtqueue
+ has been used and that the configuration space has changed: even if the
+ config bit is set, virtqueues must be scanned.
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 0 1265907476
+If MSI-X capability is enabled: an interrupt message is requested.
+ The Configuration Vector field sets the MSI-X Table entry number to use.
+ If Configuration Vector field value is NO_VECTOR, no interrupt message
+ is requested for this event.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Standard
+
+\change_deleted 0 1265905342
 Note that a single interrupt can indicate both that one or more virtqueue
  has been used and that the configuration space has changed: even if the
  config bit is set, virtqueues must be scanned.
+\change_inserted 0 1265905057
+
 \end_layout
 
 \begin_layout Chapter
@@ -2259,6 +2541,30 @@  Meanwhile for experimental drivers, use 65535 and work backwards.
 \end_layout
 
 \begin_layout Section*
+
+\change_inserted 0 1265906688
+How many MSI-X vectors?
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1265907268
+Using the optional MSI-X capability devices can speed up interrupt processing
+ by removing the need to read ISR Status register by guest driver (which
+ might be an expensive operation), reducing interrupt sharing between devices
+ and queues within the device, and handling interrupts from multiple CPUs.
+ However, some systems impose a limit (which might be as low as 256) on
+ the total number of MSI-X vectors that can be allocated to all devices.
+ Devices and/or device drivers should take this into account, limiting the
+ number of vectors used unless the device is expected to cause a high volume
+ of interrupts.
+ Devices can control the number of vectors used by limiting the MSI-X Table
+ Size or not presenting MSI-X capability in PCI configuration space.
+ Drivers can control this by mapping events to as small number of vectors
+ as possible, or disabling MSI-X capability altogether.
+\end_layout
+
+\begin_layout Section*
 Message Framing
 \end_layout
 
@@ -2276,7 +2582,7 @@  The descriptors used for a buffer should not effect the semantics of the
 In particular, no implementation should use the descriptor boundaries to
  determine the size of any header in a request.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 The current qemu device implementations mistakenly insist that the first
@@ -2298,7 +2604,7 @@  Any change to configuration space, or new virtqueues, or behavioural changes,
  should be indicated be negotiation of a new feature bit.
  This establishes clarity
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 Even if it does mean documenting design or implementation mistakes!
@@ -3092,7 +3398,7 @@  Virtqueues 0:receiveq.
  1:transmitq.
  2:controlq
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 Only if VIRTIO_NET_F_CTRL_VQ set
@@ -3143,7 +3449,7 @@  VIRTIO_NET_F_GSO
 
 (6) (Deprecated) device handles packets with any GSO type.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 It was supposed to indicate segmentation offload support, but upon further
@@ -3412,7 +3718,7 @@  This is a common restriction in real, older network cards.
 The converse features are also available: a driver can save the virtual
  device some work by negotiating these features.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 For example, a network packet transported between two guests on the same
@@ -3576,7 +3882,7 @@  csum_start is set to the offset within the packet to begin checksumming,
 csum_offset indicates how many bytes after the csum_start the new (16 bit
  ones' complement) checksum should be placed.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 For example, consider a partially checksummed TCP (IPv4) packet.
@@ -3653,7 +3959,7 @@  gso_type
 
  as well, indicating that the TCP packet has the ECN bit set.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 This case is not handled by some older hardware, so is called out specifically
@@ -3682,7 +3988,7 @@  reference "sub:Notifying-The-Device"
 
 ).
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 Note that the header will be two bytes longer for the VIRTIO_NET_F_MRG_RXBUF
@@ -4070,7 +4376,7 @@  struct virtio_net_ctrl_mac {
 The device can filter incoming packets by any number of destination MAC
  addresses.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 Since there are no guarentees, it can use a hash filter orsilently switch
@@ -4633,7 +4939,7 @@  Device Operation
 \begin_layout Enumerate
 For output, a buffer containing the characters is placed in the port's transmitq.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 Because this is high importance and low bandwidth, the current Linux implementat
@@ -4843,7 +5149,7 @@  Virtqueues 0:inflateq.
  1:deflateq.
  2:statsq.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 Only if VIRTIO_BALLON_F_STATS_VQ set
@@ -5001,7 +5307,7 @@  To supply memory to the balloon (aka.
 The driver constructs an array of addresses of unused memory pages.
  These addresses are divided by 4096
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 This is historical, and independent of the guest page size
@@ -5062,7 +5368,7 @@  actual
  field of the configuration should be updated to reflect the new number
  of pages in the balloon.
 \begin_inset Foot
-status collapsed
+status open
 
 \begin_layout Plain Layout
 As updates to configuration space are not atomic, this field isn't particularly