Message ID: 20121203105843.GA26194@redhat.com
State: RFC, archived
Delegated to: David Miller
On Mon, 2012-12-03 at 12:58 +0200, Michael S. Tsirkin wrote:
> Add RFS support to virtio network device.
> Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> configuration field max_virtqueue_pairs to detect supported number of
> virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> packet steering for unidirectional protocols.
[...]
> +Programming of the receive flow classificator is implicit.
> + Transmitting a packet of a specific flow on transmitqX will cause incoming
> + packets for this flow to be steered to receiveqX.
> + For uni-directional protocols, or where no packets have been transmitted
> + yet, device will steer a packet to a random queue out of the specified
> + receiveq0..receiveqn.
[...]

It doesn't seem like this is usable to implement accelerated RFS in the
guest, though perhaps that doesn't matter.

On the host side, presumably you'll want vhost_net to do the equivalent
of sock_rps_record_flow() - only without a socket? But in any case, that
requires an rxhash, so I don't see how this is supposed to work.

Ben.
On Wed, Dec 05, 2012 at 08:39:26PM +0000, Ben Hutchings wrote:
> On Mon, 2012-12-03 at 12:58 +0200, Michael S. Tsirkin wrote:
> > Add RFS support to virtio network device.
> > Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> > configuration field max_virtqueue_pairs to detect supported number of
> > virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> > packet steering for unidirectional protocols.
> [...]
> > +Programming of the receive flow classificator is implicit.
> > + Transmitting a packet of a specific flow on transmitqX will cause incoming
> > + packets for this flow to be steered to receiveqX.
> > + For uni-directional protocols, or where no packets have been transmitted
> > + yet, device will steer a packet to a random queue out of the specified
> > + receiveq0..receiveqn.
> [...]
>
> It doesn't seem like this is usable to implement accelerated RFS in the
> guest, though perhaps that doesn't matter.

What is the issue? Could you be more explicit please?

It seems to work pretty well: if we have # of queues >= # of cpus,
incoming TCP_STREAM into the guest scales very nicely without manual
tweaks in the guest.

The way it works is: when the guest sends a packet, the driver selects
the rx queue that we want to use for incoming packets for this flow, and
transmits on the matching tx queue. This is exactly what the text above
suggests, no?

> On the host side, presumably
> you'll want vhost_net to do the equivalent of sock_rps_record_flow() -
> only without a socket? But in any case, that requires an rxhash, so I
> don't see how this is supposed to work.
>
> Ben.

The host should just do what the guest tells it to. On the host side we
build up the steering table as we get packets to transmit. See the code
in drivers/net/tun.c in recent kernels.

Again, this actually works fine - what are the problems that you see?
Could you give an example please?

> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
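To make the implicit scheme Michael describes concrete: the device learns a flow-to-queue mapping from transmitted packets and uses it to steer received packets, falling back to a hash over the configured queues when no transmit has been seen yet (e.g. uni-directional protocols). The sketch below is an illustrative Python model of that idea only, not the actual drivers/net/tun.c code; all names in it are made up.

```python
import zlib

class SteeringTable:
    """Toy model of implicit per-flow receive steering."""

    def __init__(self, num_queues):
        self.num_queues = num_queues
        self.flows = {}  # flow hash -> queue index learned from tx

    def _hash(self, flow):
        # Stand-in for the rxhash a real device or driver would compute.
        return zlib.crc32(repr(flow).encode())

    def record_tx(self, flow, txq):
        # Transmitting a flow on transmitqX steers it to receiveqX.
        self.flows[self._hash(flow)] = txq

    def select_rxq(self, flow):
        h = self._hash(flow)
        if h in self.flows:
            return self.flows[h]
        # No tx seen yet: fall back to a hash-based choice over the
        # configured receiveq0..receiveqN.
        return h % self.num_queues

table = SteeringTable(num_queues=4)
flow = ("10.0.0.1", 5000, "10.0.0.2", 80, "tcp")
table.record_tx(flow, txq=2)
assert table.select_rxq(flow) == 2  # steered to the queue used for tx
```

The guest driver completes the loop by picking the tx queue that matches the rx queue it wants the flow's replies on, which is why no explicit programming interface is needed.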
On Thu, 2012-12-06 at 10:13 +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 05, 2012 at 08:39:26PM +0000, Ben Hutchings wrote:
> > On Mon, 2012-12-03 at 12:58 +0200, Michael S. Tsirkin wrote:
> > > Add RFS support to virtio network device.
> > > Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> > > configuration field max_virtqueue_pairs to detect supported number of
> > > virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> > > packet steering for unidirectional protocols.
> > [...]
> > > +Programming of the receive flow classificator is implicit.
> > > + Transmitting a packet of a specific flow on transmitqX will cause incoming
> > > + packets for this flow to be steered to receiveqX.
> > > + For uni-directional protocols, or where no packets have been transmitted
> > > + yet, device will steer a packet to a random queue out of the specified
> > > + receiveq0..receiveqn.
> > [...]
> >
> > It doesn't seem like this is usable to implement accelerated RFS in the
> > guest, though perhaps that doesn't matter.
>
> What is the issue? Could you be more explicit please?
>
> It seems to work pretty well: if we have
> # of queues >= # of cpus, incoming TCP_STREAM into
> guest scales very nicely without manual tweaks in guest.
>
> The way it works is: when the guest sends a packet, the driver
> selects the rx queue that we want to use for incoming
> packets for this flow, and transmits on the matching tx queue.
> This is exactly what the text above suggests, no?

Yes, I get that.

> > On the host side, presumably
> > you'll want vhost_net to do the equivalent of sock_rps_record_flow() -
> > only without a socket? But in any case, that requires an rxhash, so I
> > don't see how this is supposed to work.
> >
> > Ben.
>
> Host should just do what guest tells it to.
> On the host side we build up the steering table as we get packets
> to transmit. See the code in drivers/net/tun.c in recent
> kernels.
>
> Again this actually works fine - what are the problems that you see?
> Could you give an example please?

I'm not saying it doesn't work in its own way, I just don't see how you
would make it work with the existing RFS!

Since this doesn't seem to be intended to have *any* connection with the
existing core networking feature called RFS, perhaps you could find a
different name for it.

Ben.
On Thu, Dec 06, 2012 at 08:03:14PM +0000, Ben Hutchings wrote:
> On Thu, 2012-12-06 at 10:13 +0200, Michael S. Tsirkin wrote:
> > On Wed, Dec 05, 2012 at 08:39:26PM +0000, Ben Hutchings wrote:
> > > On Mon, 2012-12-03 at 12:58 +0200, Michael S. Tsirkin wrote:
> > > > Add RFS support to virtio network device.
> > > > Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> > > > configuration field max_virtqueue_pairs to detect supported number of
> > > > virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> > > > packet steering for unidirectional protocols.
> > > [...]
> > > > +Programming of the receive flow classificator is implicit.
> > > > + Transmitting a packet of a specific flow on transmitqX will cause incoming
> > > > + packets for this flow to be steered to receiveqX.
> > > > + For uni-directional protocols, or where no packets have been transmitted
> > > > + yet, device will steer a packet to a random queue out of the specified
> > > > + receiveq0..receiveqn.
> > > [...]
> > >
> > > It doesn't seem like this is usable to implement accelerated RFS in the
> > > guest, though perhaps that doesn't matter.
> >
> > What is the issue? Could you be more explicit please?
> >
> > It seems to work pretty well: if we have
> > # of queues >= # of cpus, incoming TCP_STREAM into
> > guest scales very nicely without manual tweaks in guest.
> >
> > The way it works is: when the guest sends a packet, the driver
> > selects the rx queue that we want to use for incoming
> > packets for this flow, and transmits on the matching tx queue.
> > This is exactly what the text above suggests, no?
>
> Yes, I get that.
>
> > > On the host side, presumably
> > > you'll want vhost_net to do the equivalent of sock_rps_record_flow() -
> > > only without a socket? But in any case, that requires an rxhash, so I
> > > don't see how this is supposed to work.
> > >
> > > Ben.
> >
> > Host should just do what guest tells it to.
> > On the host side we build up the steering table as we get packets
> > to transmit. See the code in drivers/net/tun.c in recent
> > kernels.
>
> > Again this actually works fine - what are the problems that you see?
> > Could you give an example please?
>
> I'm not saying it doesn't work in its own way, I just don't see how you
> would make it work with the existing RFS!
>
> Since this doesn't seem to be intended to have *any* connection with the
> existing core networking feature called RFS, perhaps you could find a
> different name for it.
>
> Ben.

Ah, I see what you mean. We started out calling this feature "multiqueue".
Rusty suggested "RFS" since it gives similar functionality to RFS, but in
the device: it has receive steering logic per flow as part of the device.

Maybe simply adding a statement similar to the one above would be
sufficient to avoid confusion?

> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
On Thu, 2012-12-06 at 22:29 +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2012 at 08:03:14PM +0000, Ben Hutchings wrote:
[...]
> > Since this doesn't seem to be intended to have *any* connection with the
> > existing core networking feature called RFS, perhaps you could find a
> > different name for it.
> >
> > Ben.
>
> Ah, I see what you mean. We started out calling this feature "multiqueue".
> Rusty suggested "RFS" since it gives similar functionality to RFS, but in
> the device: it has receive steering logic per flow as part of the device.

The name is quite generic, but in the context of Linux it has so far
been used for a specific software feature and not as a generic name for
flow steering by hardware (or drivers). The existing documentation
(Documentation/networking/scaling.txt) states quite clearly that 'RFS'
means that specific software implementation (with optional driver
integration) and configuration interface.

> Maybe simply adding a statement similar to the one above would be
> sufficient to avoid confusion?

No, I don't think it's sufficient. We have documentation that says how
to configure 'RFS', and you're proposing to add a very similar feature
called 'RFS' that is configured differently. No matter how clearly you
distinguish them in new documentation, this will make the old
documentation confusing.

Ben.
On Thu, Dec 06, 2012 at 08:53:59PM +0000, Ben Hutchings wrote:
> On Thu, 2012-12-06 at 22:29 +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 06, 2012 at 08:03:14PM +0000, Ben Hutchings wrote:
> [...]
> > > Since this doesn't seem to be intended to have *any* connection with the
> > > existing core networking feature called RFS, perhaps you could find a
> > > different name for it.
> > >
> > > Ben.
> >
> > Ah, I see what you mean. We started out calling this feature "multiqueue".
> > Rusty suggested "RFS" since it gives similar functionality to RFS, but in
> > the device: it has receive steering logic per flow as part of the device.
>
> The name is quite generic, but in the context of Linux it has so far
> been used for a specific software feature and not as a generic name for
> flow steering by hardware (or drivers). The existing documentation
> (Documentation/networking/scaling.txt) states quite clearly that 'RFS'
> means that specific software implementation (with optional driver
> integration) and configuration interface.
>
> > Maybe simply adding a statement similar to the one above would be
> > sufficient to avoid confusion?
>
> No, I don't think it's sufficient. We have documentation that says how
> to configure 'RFS', and you're proposing to add a very similar feature
> called 'RFS' that is configured differently. No matter how clearly you
> distinguish them in new documentation, this will make the old
> documentation confusing.
>
> Ben.

I don't mind, renaming is just s/RFS/whatever/ away -
what should hardware call this, in your opinion?

> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
On Thu, 2012-12-06 at 23:01 +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2012 at 08:53:59PM +0000, Ben Hutchings wrote:
> > On Thu, 2012-12-06 at 22:29 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Dec 06, 2012 at 08:03:14PM +0000, Ben Hutchings wrote:
> > [...]
> > > > Since this doesn't seem to be intended to have *any* connection with the
> > > > existing core networking feature called RFS, perhaps you could find a
> > > > different name for it.
> > > >
> > > > Ben.
> > >
> > > Ah, I see what you mean. We started out calling this feature "multiqueue".
> > > Rusty suggested "RFS" since it gives similar functionality to RFS, but in
> > > the device: it has receive steering logic per flow as part of the device.
> >
> > The name is quite generic, but in the context of Linux it has so far
> > been used for a specific software feature and not as a generic name for
> > flow steering by hardware (or drivers). The existing documentation
> > (Documentation/networking/scaling.txt) states quite clearly that 'RFS'
> > means that specific software implementation (with optional driver
> > integration) and configuration interface.
> >
> > > Maybe simply adding a statement similar to the one above would be
> > > sufficient to avoid confusion?
> >
> > No, I don't think it's sufficient. We have documentation that says how
> > to configure 'RFS', and you're proposing to add a very similar feature
> > called 'RFS' that is configured differently. No matter how clearly you
> > distinguish them in new documentation, this will make the old
> > documentation confusing.
> >
> > Ben.
>
> I don't mind, renaming is just s/RFS/whatever/ away -
> what should hardware call this, in your opinion?

If by 'this' you mean the use of perfect filters or a large hash table
to select the RX queue per flow, then 'flow steering'. But that is
usually combined with the fall-back of a simple mapping from hash to
queue ('RSS' or 'flow hashing') in case there is no specific queue
selection yet, which I can see tun has. And you're specifying multiple
transmit queues too.

If you want a name for the whole set of features involved, I don't see
any better name than 'multiqueue'/'MQ'. If you want a name for this
specific flow steering mechanism, add some distinguishing adjective(s)
like 'virtual' or 'automatic'.

Ben.
Ben Hutchings <bhutchings@solarflare.com> writes:
> If you want a name for the whole set of
> features involved, I don't see any better name than 'multiqueue'/'MQ'.

OK, let's go back to multiqueue then, and perhaps refer to the current
receive steering as 'automatic'.

Cheers,
Rusty.
diff --git a/virtio-spec.lyx b/virtio-spec.lyx index 83f2771..119925c 100644 --- a/virtio-spec.lyx +++ b/virtio-spec.lyx @@ -59,6 +59,7 @@ \author -608949062 "Rusty Russell,,," \author -385801441 "Cornelia Huck" cornelia.huck@de.ibm.com \author 1531152142 "Paolo Bonzini,,," +\author 1986246365 "Michael S. Tsirkin" \end_header \begin_body @@ -4170,9 +4171,46 @@ ID 1 \end_layout \begin_layout Description -Virtqueues 0:receiveq. - 1:transmitq. - 2:controlq +Virtqueues 0:receiveq +\change_inserted 1986246365 1352742829 +0 +\change_unchanged +. + 1:transmitq +\change_inserted 1986246365 1352742832 +0 +\change_deleted 1986246365 1352742947 +. + +\change_inserted 1986246365 1352742952 +. + .... + 2N +\begin_inset Foot +status open + +\begin_layout Plain Layout + +\change_inserted 1986246365 1354531595 +N=0 if VIRTIO_NET_F_RFS is not negotiated, otherwise N is derived from +\emph on +max_virtqueue_pairs +\emph default + control +\emph on + +\emph default +field. + +\end_layout + +\end_inset + +: receivqN. + 2N+1: transmitqN. + 2N+ +\change_unchanged +2:controlq \begin_inset Foot status open @@ -4343,6 +4381,16 @@ VIRTIO_NET_F_CTRL_VLAN \begin_layout Description VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets. +\change_inserted 1986246365 1352742767 + +\end_layout + +\begin_layout Description + +\change_inserted 1986246365 1352742808 +VIRTIO_NET_F_RFS(22) Device supports Receive Flow Steering. +\change_unchanged + \end_layout \end_deeper @@ -4355,11 +4403,45 @@ configuration \begin_inset space ~ \end_inset -layout Two configuration fields are currently defined. +layout +\change_deleted 1986246365 1352743300 +Two +\change_inserted 1986246365 1354531413 +Three +\change_unchanged + configuration fields are currently defined. The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC is set), and the status field only exists if VIRTIO_NET_F_STATUS is set. 
Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN K_UP and VIRTIO_NET_S_ANNOUNCE. + +\change_inserted 1986246365 1354531470 + The following read-only field, +\emph on +max_virtqueue_pairs +\emph default + only exists if VIRTIO_NET_F_RFS is set. + This field specifies the maximum number of each of transmit and receive + virtqueues (receiveq0..receiveq +\emph on +N +\emph default + and transmitq0..transmitq +\emph on +N +\emph default + respectively; +\emph on +N +\emph default += +\emph on +max_virtqueue_pairs - 1 +\emph default +) that can be configured once VIRTIO_NET_F_RFS is negotiated. + Legal values for this field are 1 to 8000h. + +\change_unchanged \begin_inset listings inline false @@ -4392,6 +4474,17 @@ struct virtio_net_config { \begin_layout Plain Layout u16 status; +\change_inserted 1986246365 1354531427 + +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1354531437 + + u16 max_virtqueue_pairs; +\change_unchanged + \end_layout \begin_layout Plain Layout @@ -4410,7 +4503,24 @@ Device Initialization \begin_layout Enumerate The initialization routine should identify the receive and transmission - virtqueues. + virtqueues +\change_inserted 1986246365 1352744077 +, up to N+1 of each kind +\change_unchanged +. + +\change_inserted 1986246365 1352743942 + If VIRTIO_NET_F_RFS feature bit is negotiated, +\emph on +N=max_virtqueue_pairs-1 +\emph default +, otherwise identify +\emph on +N=0 +\emph default +. +\change_unchanged + \end_layout \begin_layout Enumerate @@ -4452,10 +4562,33 @@ status config field. Otherwise, the link should be assumed active. +\change_inserted 1986246365 1354529306 + \end_layout \begin_layout Enumerate -The receive virtqueue should be filled with receive buffers. + +\change_inserted 1986246365 1354531717 +Only receiveq0, transmitq0 and controlq are used by default. 
+ To use more queues driver must negotiate the VIRTIO_NET_F_RFS feature; + initialize up to +\emph on +max_virtqueue_pairs +\emph default + of each of transmit and receive queues; execute_VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SE +T command specifying the number of the transmit and receive queues that + is going to be used and wait until the device consumes the controlq buffer + and acks this command. +\change_unchanged + +\end_layout + +\begin_layout Enumerate +The receive virtqueue +\change_inserted 1986246365 1352743953 +s +\change_unchanged + should be filled with receive buffers. This is described in detail below in \begin_inset Quotes eld \end_inset @@ -4550,8 +4683,15 @@ Device Operation \end_layout \begin_layout Standard -Packets are transmitted by placing them in the transmitq, and buffers for - incoming packets are placed in the receiveq. +Packets are transmitted by placing them in the transmitq +\change_inserted 1986246365 1353593685 +0..transmitqN +\change_unchanged +, and buffers for incoming packets are placed in the receiveq +\change_inserted 1986246365 1353593692 +0..receiveqN +\change_unchanged +. In each case, the packet itself is preceeded by a header: \end_layout @@ -4861,6 +5001,17 @@ If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at least the struct virtio_net_hdr \family default . +\change_inserted 1986246365 1353594518 + +\end_layout + +\begin_layout Standard + +\change_inserted 1986246365 1353594638 +If VIRTIO_NET_F_RFS is negotiated, each of receiveq0...receiveqN that will + be used should be populated with receive buffers. +\change_unchanged + \end_layout \begin_layout Subsection* @@ -5293,8 +5444,149 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq. 
\end_layout -\begin_layout Enumerate +\begin_layout Subsection* + +\change_inserted 1986246365 1353593879 +Packet Receive Flow Steering +\end_layout + +\begin_layout Standard + +\change_inserted 1986246365 1354528882 +If the driver negotiates the VIRTIO_NET_F_RFS feature bit (depends on VIRTIO_NET +_F_CTRL_VQ), it can transmit outgoing packets on one of the multiple transmitq0..t +ransmitqN and ask the device to queue incoming packets into one the multiple + receiveq0..receiveqN depending on the packet flow. +\change_unchanged + +\end_layout + +\begin_layout Standard + +\change_inserted 1986246365 1353594292 +\begin_inset listings +inline false +status open + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594178 + +struct virtio_net_ctrl_rfs { +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594212 + + u16 virtqueue_pairs; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594172 + +}; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594172 + +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594263 + +#define VIRTIO_NET_CTRL_RFS 1 +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594273 + + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET 0 +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594273 + + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN 1 +\end_layout + +\begin_layout Plain Layout + +\change_inserted 1986246365 1353594273 + + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX 0x8000 +\end_layout + +\end_inset + + +\end_layout + +\begin_layout Standard + +\change_inserted 1986246365 1354531492 +RFS acceleration is disabled by default. 
+ Driver enables RFS by executing the VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET command, + specifying the number of the transmit and receive queues that will be used; + thus transmitq0..transmitqn and receiveq0..receiveqn where +\emph on +n=virtqueue_pairs-1 +\emph default + will be used. + All these virtqueues must have been pre-configured in advance. + The range of legal values for the +\emph on + virtqueue_pairs +\emph off + field is between 1 and +\emph on +max_virtqueue_pairs +\emph off +. +\end_layout + +\begin_layout Standard + +\change_inserted 1986246365 1353595328 +Programming of the receive flow classificator is implicit. + Transmitting a packet of a specific flow on transmitqX will cause incoming + packets for this flow to be steered to receiveqX. + For uni-directional protocols, or where no packets have been transmitted + yet, device will steer a packet to a random queue out of the specified + receiveq0..receiveqn. +\change_unchanged + +\end_layout + +\begin_layout Standard + +\change_inserted 1986246365 1354528710 +RFS acceleration is disabled by setting +\emph on +virtqueue_pairs = 1 +\emph default + (this is the default). + After the command is consumed by the device, the device will not steer + new packets on virtqueues receveq1..receiveqN (i.e. + other than receiveq0) nor read from transmitq1..transmitqN (i.e. + other than transmitq0); accordingly, driver should not transmit new packets + on virtqueues other than transmitq0. +\change_unchanged + +\end_layout + +\begin_layout Standard + +\change_deleted 1986246365 1353593873 . + +\change_unchanged \end_layout