Message ID | 20200516012948.3173993-1-vinicius.gomes@intel.com |
---|---|
Headers | show |
Series | ethtool: Add support for frame preemption | expand |
On Fri, May 15, 2020 at 06:29:44PM -0700, Vinicius Costa Gomes wrote: > Hi, > > This series adds support for configuring frame preemption, as defined > by IEEE 802.1Q-2018 (previously IEEE 802.1Qbu) and IEEE 802.3br. > > Frame preemption allows a packet from a higher priority queue marked > as "express" to preempt a packet from lower priority queue marked as > "preemptible". The idea is that this can help reduce the latency for > higher priority traffic. > > Previously, the proposed interface for configuring these features was > using the qdisc layer. But as this is very hardware dependent and all > that qdisc did was pass the information to the driver, it makes sense > to have this in ethtool. > > One example, for retrieving and setting the configuration: > > $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 > Frame preemption settings for enp3s0: > support: supported IMHO we don't need a special bool for this. IIUC this is not a state flag that would change value for a particular device; either the device supports the feature or it does not. If it does not, the ethtool_ops callbacks would return -EOPNOTSUPP (or would not even exist if the driver has no support) and ethtool would say so. > active: active > supported queues: 0xf > supported queues: 0xe > minimum fragment size: 68 > > > $ ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 preemptible-queues-mask 0xe > > This is a RFC because I wanted to have feedback on some points: > > - The parameters added are enough for the hardware I have, is it > enough in general? > > - even with the ethtool via netlink effort, I chose to keep the > ioctl() way, in case someone wants to backport this to an older > kernel, is there a problem with this? I would prefer not extending ioctl interface with new features, with obvious exceptions like adding new link modes or so. Not only because having new features only available through netlink will motivate authors of userspace tools to support netlink but mostly because the lack of flexibility and extensibility of ioctl interface inevitably leads to compromises you wouldn't have to do if you only implement netlink requests. One example I can see is the use of u32 for queue bitmaps. Perhaps you don't expect this feature to be supported on devices with more than 32 queues (and I don't have enough expertise to tell if it's justified at the moment) but can you be sure it will be the case in 10 or 20 years? As long as these hardcoded u32 bitmaps are only part of internal kernel API (ethtool_ops), extending the support for bigger devices will mean some code churn (possibly large if many drivers implement the feature) but it's something that can be done. But if you have this limit in userspace API, you are in a much bigger trouble. The same can be said for adding new attributes - easy with netlink but with ioctl you never know if those reserved fields will suffice. > > - Some space for bikeshedding the names and location (for example, > does it make sense for these settings to be per-queue?), as I am > not quite happy with them, one example, is the use of preemptible > vs. preemptable; > > > About the patches, should be quite straightforward: > > Patch 1, adds the ETHTOOL_GFP and ETHOOL_SFP commands and the > associated data structures; > > Patch 2, adds the ETHTOOL_MSG_PREEMPT_GET and ETHTOOL_MSG_PREEMPT_SET > netlink messages and the associated attributes; I didn't look too deeply but one thing I noticed is that setting the parameters using ioctl() does not trigger netlink notification. If we decide to implement ioctl support (and I'm not a fan of that), the notifications should be sent even when ioctl is used. Michal
From: Vinicius Costa Gomes <vinicius.gomes@intel.com> Date: Fri, 15 May 2020 18:29:44 -0700 > This series adds support for configuring frame preemption, as defined > by IEEE 802.1Q-2018 (previously IEEE 802.1Qbu) and IEEE 802.3br. > > Frame preemption allows a packet from a higher priority queue marked > as "express" to preempt a packet from lower priority queue marked as > "preemptible". The idea is that this can help reduce the latency for > higher priority traffic. Why do we need yet another name for something which is just basic traffic prioritization and why is this configured via ethtool instead of the "traffic classifier" which is where all of this stuff should be done?
Hi David, On Sat, 16 May 2020 at 23:39, David Miller <davem@davemloft.net> wrote: > > From: Vinicius Costa Gomes <vinicius.gomes@intel.com> > Date: Fri, 15 May 2020 18:29:44 -0700 > > > This series adds support for configuring frame preemption, as defined > > by IEEE 802.1Q-2018 (previously IEEE 802.1Qbu) and IEEE 802.3br. > > > > Frame preemption allows a packet from a higher priority queue marked > > as "express" to preempt a packet from lower priority queue marked as > > "preemptible". The idea is that this can help reduce the latency for > > higher priority traffic. > > Why do we need yet another name for something which is just basic > traffic prioritization and why is this configured via ethtool instead > of the "traffic classifier" which is where all of this stuff should > be done? It is not 'just another name for basic traffic prioritization'. With basic traffic prioritization only, a high-priority packet still has to wait in the egress queue of a switch until a (potentially large) low-priority packet has finished transmission and has freed the medium. Frame preemption changes that. Actually it requires hardware support on both ends, because the way it is transmitted on the wire is not compatible with regular Ethernet frames (it uses a special Start Of Frame Delimiter to encode preemptible traffic). I know we are talking about ridiculously low improvements in latency, but the background is that Ethernet is making its way into the industrial and process control fields, and for that type of application you need to ensure minimally low and maximally consistent end-to-end latencies. Frame preemption helps with the "minimally low" part. The way it works is that typically there are 2 MACs per interface (1 is "express" - equivalent to the legacy type, and the other is "preemptible" - the new type) and this new IEEE 802.1Q clause thing allows some arbitration/preemption event to happen between the two MACs. When a preemption event happens, the preemptible MAC quickly wraps up and aborts the frame it's currently transmitting (to come back and continue later), making room for the express MAC to do its thing because it's time-constrained. Then, after the express MAC finishes, the preemptible MAC continues with the rest of the frame fragments from where it was preempted. As to why this doesn't go to tc but to ethtool: why would it go to tc? You can't emulate such behavior in software. It's a hardware feature. You only* (more or less) need to specify which traffic classes on a port go to the preemptible MAC and which go to the express MAC. We discussed about the possibility of extending tc-taprio to configure frame preemption through it, but the consensus was that somebody might want to use frame preemption as a standalone feature, without scheduled traffic, and that inventing another qdisc for frame preemption alone would be too much of a formalism. (I hope I didn't omit anything important from the previous discussion on the topic) Thanks, -Vladimir
From: Vladimir Oltean <olteanv@gmail.com> Date: Sun, 17 May 2020 00:03:39 +0300 > As to why this doesn't go to tc but to ethtool: why would it go to tc? Maybe you can't %100 duplicate the on-the-wire special format and whatever, but the queueing behavior ABSOLUTELY you can emulate in software. And then you have the proper hooks added for HW offload which can do the on-the-wire stuff. That's how we do these things, not with bolted on ethtool stuff.
On Sun, 17 May 2020 at 01:19, David Miller <davem@davemloft.net> wrote: > > From: Vladimir Oltean <olteanv@gmail.com> > Date: Sun, 17 May 2020 00:03:39 +0300 > > > As to why this doesn't go to tc but to ethtool: why would it go to tc? > > Maybe you can't %100 duplicate the on-the-wire special format and > whatever, but the queueing behavior ABSOLUTELY you can emulate in > software. > > And then you have the proper hooks added for HW offload which can > do the on-the-wire stuff. > > That's how we do these things, not with bolted on ethtool stuff. When talking about frame preemption in the way that it is defined in 802.1Qbu and 802.3br, it says or assumes nothing about queuing. It describes the fact that there are 2 MACs per interface, 1 MAC dealing with some traffic classes and the other dealing with the rest. Queuing sits completely above the layer where frame preemption happens: - The queuing layer does not care if packets go to a traffic class that is serviced by a preemptible MAC or an express MAC - The MAC does not care by what means have packets been classified to one traffic class or another. I have no idea what emulation of queuing behavior you are talking about. Frame preemption is a MAC hardware feature. Your NIC supports it or it doesn't. Which means it can talk to a link partner that supports frame preemption, or it can't. Thanks, -Vladimir
On Sun, May 17, 2020 at 01:51:19PM +0300, Vladimir Oltean wrote: > On Sun, 17 May 2020 at 01:19, David Miller <davem@davemloft.net> wrote: > > > > From: Vladimir Oltean <olteanv@gmail.com> > > Date: Sun, 17 May 2020 00:03:39 +0300 > > > > > As to why this doesn't go to tc but to ethtool: why would it go to tc? > > > > Maybe you can't %100 duplicate the on-the-wire special format and > > whatever, but the queueing behavior ABSOLUTELY you can emulate in > > software. > > > > And then you have the proper hooks added for HW offload which can > > do the on-the-wire stuff. > > > > That's how we do these things, not with bolted on ethtool stuff. > > When talking about frame preemption in the way that it is defined in > 802.1Qbu and 802.3br, it says or assumes nothing about queuing. It > describes the fact that there are 2 MACs per interface, 1 MAC dealing > with some traffic classes and the other dealing with the rest. I did not follow the previous discussion, but i assume you talked about modelling it in Linux as two MACs? Why was that approach not followed? Andrew
Hi Andrew, On Sun, 17 May 2020 at 21:45, Andrew Lunn <andrew@lunn.ch> wrote: > > On Sun, May 17, 2020 at 01:51:19PM +0300, Vladimir Oltean wrote: > > On Sun, 17 May 2020 at 01:19, David Miller <davem@davemloft.net> wrote: > > > > > > From: Vladimir Oltean <olteanv@gmail.com> > > > Date: Sun, 17 May 2020 00:03:39 +0300 > > > > > > > As to why this doesn't go to tc but to ethtool: why would it go to tc? > > > > > > Maybe you can't %100 duplicate the on-the-wire special format and > > > whatever, but the queueing behavior ABSOLUTELY you can emulate in > > > software. > > > > > > And then you have the proper hooks added for HW offload which can > > > do the on-the-wire stuff. > > > > > > That's how we do these things, not with bolted on ethtool stuff. > > > > When talking about frame preemption in the way that it is defined in > > 802.1Qbu and 802.3br, it says or assumes nothing about queuing. It > > describes the fact that there are 2 MACs per interface, 1 MAC dealing > > with some traffic classes and the other dealing with the rest. > > I did not follow the previous discussion, but i assume you talked > about modelling it in Linux as two MACs? Why was that approach not > followed? > > Andrew I don't recall having discussed that option, but I guess that you can see why, if I quote this portion from IEEE 802.1Q-2018: 6.7.1 Support of the ISS by IEEE Std 802.3 (Ethernet) Mapping between M_CONTROL.requests/indications and IEEE 802.3 MA_CONTROL.requests/indications is performed as specified in IEEE Std 802.1AC. If the MAC supports IEEE 802.3br (TM) Interspersing Express Traffic, then PFC M_CONTROL.requests are mapped onto the MAC control interface associated with the express MAC (eMAC). If frame preemption (6.7.2) is supported on a Port, then the IEEE 802.3 MAC provides the following two MAC service interfaces (99.4 of IEEE Std 802.3br (TM)-2016 [B21]): a) A preemptible MAC (pMAC) service interface b) An express MAC (eMAC) service interface For priority values that are identified in the frame preemption status table (6.7.2) as preemptible, frames that are selected for transmission shall be transmitted using the pMAC service instance, and for priority values that are identified in the frame preemption status table as express, frames that are selected for transmission shall be transmitted using the eMAC service instance. _In all other respects, the Port behaves as if it is supported by a single MAC service interface_ (emphasis mine). In particular, all frames received by the Port are treated as if they were received on a single MAC service interface regardless of whether they were received on the eMAC service interface or the pMAC service interface, except with respect to frame preemption. Thanks, -Vladimir
Hi, David Miller <davem@davemloft.net> writes: > From: Vladimir Oltean <olteanv@gmail.com> > Date: Sun, 17 May 2020 00:03:39 +0300 > >> As to why this doesn't go to tc but to ethtool: why would it go to tc? > > Maybe you can't %100 duplicate the on-the-wire special format and > whatever, but the queueing behavior ABSOLUTELY you can emulate in > software. Just saying what Vladimir said in different words: the queueing behavior is already implemented in software, by mqprio or taprio, for example. That is to say, if we add frame preemption support to those qdiscs all they will do is pass the information to the driver, and that's it. They won't be able to use that information at all. The mental model I have for this feature is that is more similar to the segmentation offloads, energy efficient ethernet or auto-negotiation than it is to a traffic shaper like CBS. > > And then you have the proper hooks added for HW offload which can > do the on-the-wire stuff. > > That's how we do these things, not with bolted on ethtool stuff. Cheers,
Hi, Michal Kubecek <mkubecek@suse.cz> writes: > On Fri, May 15, 2020 at 06:29:44PM -0700, Vinicius Costa Gomes wrote: >> Hi, >> >> This series adds support for configuring frame preemption, as defined >> by IEEE 802.1Q-2018 (previously IEEE 802.1Qbu) and IEEE 802.3br. >> >> Frame preemption allows a packet from a higher priority queue marked >> as "express" to preempt a packet from lower priority queue marked as >> "preemptible". The idea is that this can help reduce the latency for >> higher priority traffic. >> >> Previously, the proposed interface for configuring these features was >> using the qdisc layer. But as this is very hardware dependent and all >> that qdisc did was pass the information to the driver, it makes sense >> to have this in ethtool. >> >> One example, for retrieving and setting the configuration: >> >> $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 >> Frame preemption settings for enp3s0: >> support: supported > > IMHO we don't need a special bool for this. IIUC this is not a state > flag that would change value for a particular device; either the device > supports the feature or it does not. If it does not, the ethtool_ops > callbacks would return -EOPNOTSUPP (or would not even exist if the > driver has no support) and ethtool would say so. (I know that the comments below only apply if "ethtool-way" is what's decided) Cool. Will remove the supported bit. > >> active: active >> supported queues: 0xf >> supported queues: 0xe >> minimum fragment size: 68 >> >> >> $ ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 preemptible-queues-mask 0xe >> >> This is a RFC because I wanted to have feedback on some points: >> >> - The parameters added are enough for the hardware I have, is it >> enough in general? >> >> - even with the ethtool via netlink effort, I chose to keep the >> ioctl() way, in case someone wants to backport this to an older >> kernel, is there a problem with this? > > I would prefer not extending ioctl interface with new features, with > obvious exceptions like adding new link modes or so. Not only because > having new features only available through netlink will motivate authors > of userspace tools to support netlink but mostly because the lack of > flexibility and extensibility of ioctl interface inevitably leads to > compromises you wouldn't have to do if you only implement netlink > requests. Agreed. Will send the next version with only the netlink interface, and let's see who complains. > > One example I can see is the use of u32 for queue bitmaps. Perhaps you > don't expect this feature to be supported on devices with more than 32 > queues (and I don't have enough expertise to tell if it's justified at > the moment) but can you be sure it will be the case in 10 or 20 years? > As long as these hardcoded u32 bitmaps are only part of internal kernel > API (ethtool_ops), extending the support for bigger devices will mean > some code churn (possibly large if many drivers implement the feature) > but it's something that can be done. But if you have this limit in > userspace API, you are in a much bigger trouble. The same can be said > for adding new attributes - easy with netlink but with ioctl you never > know if those reserved fields will suffice. A bit of background for this decision (using u32), frame preemption has dimishing returns in relation to link speeds, my gut feeling is that for links faster than 2.5G it stops making sense, at least in Linux, the measurement noise will hide any latency improvement brought by frame preemption. And I don't see many 2.5G NICs supporting more than 32 queues. But I agree that keeping the interface future proof is better. Will change to expose the queues configuration as bitset. > >> >> - Some space for bikeshedding the names and location (for example, >> does it make sense for these settings to be per-queue?), as I am >> not quite happy with them, one example, is the use of preemptible >> vs. preemptable; >> >> >> About the patches, should be quite straightforward: >> >> Patch 1, adds the ETHTOOL_GFP and ETHOOL_SFP commands and the >> associated data structures; >> >> Patch 2, adds the ETHTOOL_MSG_PREEMPT_GET and ETHTOOL_MSG_PREEMPT_SET >> netlink messages and the associated attributes; > > I didn't look too deeply but one thing I noticed is that setting the > parameters using ioctl() does not trigger netlink notification. If we > decide to implement ioctl support (and I'm not a fan of that), the > notifications should be sent even when ioctl is used. Oh, yeah, that's right. Nice catch. Cheers,
On Mon, 18 May 2020 12:05:04 -0700 Vinicius Costa Gomes wrote: > David Miller <davem@davemloft.net> writes: > > From: Vladimir Oltean <olteanv@gmail.com> > > Date: Sun, 17 May 2020 00:03:39 +0300 > > > >> As to why this doesn't go to tc but to ethtool: why would it go to tc? > > > > Maybe you can't %100 duplicate the on-the-wire special format and > > whatever, but the queueing behavior ABSOLUTELY you can emulate in > > software. > > Just saying what Vladimir said in different words: the queueing behavior > is already implemented in software, by mqprio or taprio, for example. > > That is to say, if we add frame preemption support to those qdiscs all > they will do is pass the information to the driver, and that's it. They > won't be able to use that information at all. > > The mental model I have for this feature is that is more similar to the > segmentation offloads, energy efficient ethernet or auto-negotiation > than it is to a traffic shaper like CBS. Please take a look at the example from the cover letter: $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 Frame preemption settings for enp3s0: support: supported active: active supported queues: 0xf supported queues: 0xe minimum fragment size: 68 Reading this I have no idea what 0xe is. I have to go and query TC API to see what priorities and queues that will be. Which IMHO is a strong argument that this information belongs there in the first place.
Hi, Jakub Kicinski <kuba@kernel.org> writes: > > Please take a look at the example from the cover letter: > > $ ethtool $ sudo ./ethtool --show-frame-preemption > enp3s0 Frame preemption settings for enp3s0: > support: supported > active: active > supported queues: 0xf > supported queues: 0xe > minimum fragment size: 68 > > Reading this I have no idea what 0xe is. I have to go and query TC API > to see what priorities and queues that will be. Which IMHO is a strong > argument that this information belongs there in the first place. That was the (only?) strong argument in favor of having frame preemption in the TC side when this was last discussed. We can have a hybrid solution, we can move the express/preemptible per queue map to mqprio/taprio/whatever. And have the more specific configuration knobs, minimum fragment size, etc, in ethtool. What do you think? Cheers,
On Mon, 18 May 2020 15:06:26 -0700 Vinicius Costa Gomes wrote: > Jakub Kicinski <kuba@kernel.org> writes: > > > > Please take a look at the example from the cover letter: > > > > $ ethtool $ sudo ./ethtool --show-frame-preemption > > enp3s0 Frame preemption settings for enp3s0: > > support: supported > > active: active > > supported queues: 0xf > > supported queues: 0xe > > minimum fragment size: 68 > > > > Reading this I have no idea what 0xe is. I have to go and query TC API > > to see what priorities and queues that will be. Which IMHO is a strong > > argument that this information belongs there in the first place. > > That was the (only?) strong argument in favor of having frame preemption > in the TC side when this was last discussed. > > We can have a hybrid solution, we can move the express/preemptible per > queue map to mqprio/taprio/whatever. And have the more specific > configuration knobs, minimum fragment size, etc, in ethtool. > > What do you think? Does the standard specify minimum fragment size as a global MAC setting?
Jakub Kicinski <kuba@kernel.org> writes: >> That was the (only?) strong argument in favor of having frame preemption >> in the TC side when this was last discussed. >> >> We can have a hybrid solution, we can move the express/preemptible per >> queue map to mqprio/taprio/whatever. And have the more specific >> configuration knobs, minimum fragment size, etc, in ethtool. >> >> What do you think? > > Does the standard specify minimum fragment size as a global MAC setting? Yes, it's a per-MAC setting, not per-queue.
On Mon, 18 May 2020 16:05:08 -0700 Vinicius Costa Gomes wrote: > Jakub Kicinski <kuba@kernel.org> writes: > >> That was the (only?) strong argument in favor of having frame preemption > >> in the TC side when this was last discussed. > >> > >> We can have a hybrid solution, we can move the express/preemptible per > >> queue map to mqprio/taprio/whatever. And have the more specific > >> configuration knobs, minimum fragment size, etc, in ethtool. > >> > >> What do you think? > > > > Does the standard specify minimum fragment size as a global MAC setting? > > Yes, it's a per-MAC setting, not per-queue. If standard defines it as per-MAC and we can reasonably expect vendors won't try to "add value" and make it per queue (unlikely here AFAIU), then for this part ethtool configuration seems okay to me.
Hi Vinicius, On 5/15/20 9:29 PM, Vinicius Costa Gomes wrote: > Hi, > > This series adds support for configuring frame preemption, as defined > by IEEE 802.1Q-2018 (previously IEEE 802.1Qbu) and IEEE 802.3br. > > Frame preemption allows a packet from a higher priority queue marked > as "express" to preempt a packet from lower priority queue marked as > "preemptible". The idea is that this can help reduce the latency for > higher priority traffic. > > Previously, the proposed interface for configuring these features was > using the qdisc layer. But as this is very hardware dependent and all > that qdisc did was pass the information to the driver, it makes sense > to have this in ethtool. > > One example, for retrieving and setting the configuration: > > $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 > Frame preemption settings for enp3s0: > support: supported > active: active > supported queues: 0xf I assume this is will be in sync with ethtool -L output which indicates how many tx h/w queues present? I mean if there are 8 h/w queues, supported queues will show 0xff. > supported queues: 0xe From the command below, it appears this is the preemptible queue mask. bit 0 is Q0, bit 1 Q1 and so forth. Right? In that case isn't it more clear to display preemptible queues : 0xef In the above Q7 is express queue and Q6-Q0 are preemptible. Also there is a handshake called verify that happens which initiated by the h/w to check the capability of peer. It looks like not all vendor's hardware supports it and good to have it displayed something like Verify supported/{not supported} If Verify is supported, FPE is enabled only if it succeeds. So may be good to show a status of Verify if it is supported something like Verify success/Failed > minimum fragment size: 68 > > > $ ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 preemptible-queues-mask 0xe > > This is a RFC because I wanted to have feedback on some points: > > - The parameters added are enough for the hardware I have, is it > enough in general? As described above, it would be good to add an optional parameter for verify ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 preemptible-queues-mask 0xe verify on > > - even with the ethtool via netlink effort, I chose to keep the > ioctl() way, in case someone wants to backport this to an older > kernel, is there a problem with this? > > - Some space for bikeshedding the names and location (for example, > does it make sense for these settings to be per-queue?), as I am > not quite happy with them, one example, is the use of preemptible > vs. preemptable; > > > About the patches, should be quite straightforward: > > Patch 1, adds the ETHTOOL_GFP and ETHOOL_SFP commands and the > associated data structures; > > Patch 2, adds the ETHTOOL_MSG_PREEMPT_GET and ETHTOOL_MSG_PREEMPT_SET > netlink messages and the associated attributes; > > Patch 3, is the example implementation for the igc driver, the catch > here is that frame preemption in igc is dependent on the TSN "mode" > being enabled; > > Patch 4, adds some registers that helped during implementation. > > Another thing is that because igc is still under development, to avoid > conflicts, I think it might be easier for the PATCH version of this > series to go through Jeff Kirsher's tree. > > Vinicius Costa Gomes (4): > ethtool: Add support for configuring frame preemption > ethtool: Add support for configuring frame preemption via netlink > igc: Add support for configuring frame preemption > igc: Add support for exposing frame preemption stats registers > > drivers/net/ethernet/intel/igc/igc.h | 3 + > drivers/net/ethernet/intel/igc/igc_defines.h | 6 + > drivers/net/ethernet/intel/igc/igc_ethtool.c | 77 ++++++++ > drivers/net/ethernet/intel/igc/igc_regs.h | 10 + > drivers/net/ethernet/intel/igc/igc_tsn.c | 46 ++++- > include/linux/ethtool.h | 6 + > include/uapi/linux/ethtool.h | 25 +++ > include/uapi/linux/ethtool_netlink.h | 19 ++ > net/ethtool/Makefile | 3 +- > net/ethtool/ioctl.c | 36 ++++ > net/ethtool/netlink.c | 15 ++ > net/ethtool/netlink.h | 2 + > net/ethtool/preempt.c | 181 +++++++++++++++++++ > 13 files changed, 423 insertions(+), 6 deletions(-) > create mode 100644 net/ethtool/preempt.c >
Murali Karicheri <m-karicheri2@ti.com> writes: >> $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 >> Frame preemption settings for enp3s0: >> support: supported >> active: active >> supported queues: 0xf > > I assume this is will be in sync with ethtool -L output which indicates > how many tx h/w queues present? I mean if there are 8 h/w queues, > supported queues will show 0xff. In this approach, the driver builds these bitmasks, so it's responsible to keep it consistent with the rest of the stuff that's exposed in ethtool. > >> supported queues: 0xe > From the command below, it appears this is the preemptible queue mask. > bit 0 is Q0, bit 1 Q1 and so forth. Right? In that case isn't it more > clear to display > preemptible queues : 0xef > > In the above Q7 is express queue and Q6-Q0 are preemptible. In my case, the controller I have here only has 4 queues, and Queue 0 is the highest priority one, and it's marked as express. > > Also there is a handshake called verify that happens which initiated > by the h/w to check the capability of peer. It looks like > not all vendor's hardware supports it and good to have it displayed > something like > > Verify supported/{not supported} > > If Verify is supported, FPE is enabled only if it succeeds. So may be > good to show a status of Verify if it is supported something like > Verify success/Failed > >> minimum fragment size: 68 >> >> >> $ ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 preemptible-queues-mask 0xe >> >> This is a RFC because I wanted to have feedback on some points: >> >> - The parameters added are enough for the hardware I have, is it >> enough in general? > > As described above, it would be good to add an optional parameter for > verify > > ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 > preemptible-queues-mask 0xe verify on > The hardware I have do not support this, but this makes sense. Cheers,
Hi Vinicius, On 5/19/20 11:32 AM, Vinicius Costa Gomes wrote: > Murali Karicheri <m-karicheri2@ti.com> writes: > >>> $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 >>> Frame preemption settings for enp3s0: >>> support: supported >>> active: active >>> supported queues: 0xf >> >> I assume this is will be in sync with ethtool -L output which indicates >> how many tx h/w queues present? I mean if there are 8 h/w queues, >> supported queues will show 0xff. > > In this approach, the driver builds these bitmasks, so it's responsible > to keep it consistent with the rest of the stuff that's exposed in > ethtool. Agree > >> >>> supported queues: 0xe >> From the command below, it appears this is the preemptible queue mask. >> bit 0 is Q0, bit 1 Q1 and so forth. Right? In that case isn't it more >> clear to display >> preemptible queues : 0xef >> >> In the above Q7 is express queue and Q6-Q0 are preemptible. > > In my case, the controller I have here only has 4 queues, and Queue 0 is > the highest priority one, and it's marked as express. > Ok. So it is up to the device driver to manage this. >> >> Also there is a handshake called verify that happens which initiated >> by the h/w to check the capability of peer. It looks like >> not all vendor's hardware supports it and good to have it displayed >> something like >> >> Verify supported/{not supported} >> >> If Verify is supported, FPE is enabled only if it succeeds. So may be >> good to show a status of Verify if it is supported something like >> Verify success/Failed >> >>> minimum fragment size: 68 >>> >>> >>> $ ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 preemptible-queues-mask 0xe >>> >>> This is a RFC because I wanted to have feedback on some points: >>> >>> - The parameters added are enough for the hardware I have, is it >>> enough in general? >> >> As described above, it would be good to add an optional parameter for >> verify >> >> ethtool --set-frame-preemption enp3s0 fp on min-frag-size 68 >> preemptible-queues-mask 0xe verify on >> > > The hardware I have do not support this, but this makes sense. I can work to support this for TI AM65 CPSW once a formal patch is available. AM65 CPSW supports Verify as an optional feature. H/w works also in manual mode where it is assumed that it is connected to a IET FPE capable partner, but can't do Verify. Thanks for sending the RFC. regards, Murali > > > Cheers, >
On 5/18/20 6:06 PM, Vinicius Costa Gomes wrote: > Hi, > > Jakub Kicinski <kuba@kernel.org> writes: >> >> Please take a look at the example from the cover letter: >> >> $ ethtool $ sudo ./ethtool --show-frame-preemption >> enp3s0 Frame preemption settings for enp3s0: >> support: supported >> active: active >> supported queues: 0xf >> supported queues: 0xe >> minimum fragment size: 68 >> >> Reading this I have no idea what 0xe is. I have to go and query TC API >> to see what priorities and queues that will be. Which IMHO is a strong >> argument that this information belongs there in the first place. > > That was the (only?) strong argument in favor of having frame preemption > in the TC side when this was last discussed. > > We can have a hybrid solution, we can move the express/preemptible per > queue map to mqprio/taprio/whatever. And have the more specific > configuration knobs, minimum fragment size, etc, in ethtool. Isn't this a pure h/w feature? FPE is implemented at L2 and involves fragments that are only seen by h/w and never at Linux network core unlike IP fragments and is transparent to network stack. However it enhances priority handling at h/w to the next level by pre-empting existing lower priority traffic to give way to express queue traffic and improve latency. So everything happens in h/w. So ethtool makes perfect sense here as it is a queue configuration. I agree with Vinicius and Vladmir to support this in ethtool instead of TC. Murali > > What do you think? > > > Cheers, >
Murali Karicheri <m-karicheri2@ti.com> writes: >> That was the (only?) strong argument in favor of having frame preemption >> in the TC side when this was last discussed. >> >> We can have a hybrid solution, we can move the express/preemptible per >> queue map to mqprio/taprio/whatever. And have the more specific >> configuration knobs, minimum fragment size, etc, in ethtool. > > Isn't this a pure h/w feature? FPE is implemented at L2 and involves > fragments that are only seen by h/w and never at Linux network core > unlike IP fragments and is transparent to network stack. However it > enhances priority handling at h/w to the next level by pre-empting > existing lower priority traffic to give way to express queue traffic > and improve latency. So everything happens in h/w. So ethtool makes > perfect sense here as it is a queue configuration. I agree with Vinicius > and Vladmir to support this in ethtool instead of TC. The way I see, the issue that Jakub is pointing here is more of usability/understandability. By having the express/preemptible queue mapping in TC, we have the configuration near where the "priority to queue" mapping happens. That improves the ease of configuration, makes it easier to spot mistakes, that kind of thing, all of which are a big plus. Right now, I am seeing this hybrid approach as a good compromise, we have the queue settings near to where the kinds of traffic are mapped to queues, and we have the rest of the hardware configuration in ethtool. Cheers,
Hi, Quoting Vinicius Costa Gomes (2020-05-15 18:29:44) > One example, for retrieving and setting the configuration: > > $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 > Frame preemption settings for enp3s0: > support: supported > active: active IIUC the code in patch 2, 'active' is the actual configuration knob that enables or disables the FP functionality on the NIC. That sounded a bit confusing to me since the spec uses the term 'active' to indicate FP is currently enabled at both ends, and it is a read-only information (see 12.30.1.4 from IEEE 802.1Q-2018). Maybe if we called this 'enabled' it would be more clear. > supported queues: 0xf > supported queues: 0xe > minimum fragment size: 68 I'm assuming this is the configuration knob for the minimal non-final fragment defined in 802.3br. My understanding from the specs is that this value must be a multiple from 64 and cannot assume arbitrary values like shown here. See 99.4.7.3 from IEEE 802.3 and Note 1 in S.2 from IEEE 802.1Q. In the previous discussion about FP, we had this as a multiplier factor, not absolute value. Regards, Andre
Hi, Quoting Vinicius Costa Gomes (2020-05-18 12:34:22) > Hi, > > Michal Kubecek <mkubecek@suse.cz> writes: > > > On Fri, May 15, 2020 at 06:29:44PM -0700, Vinicius Costa Gomes wrote: > >> Hi, > >> > >> This series adds support for configuring frame preemption, as defined > >> by IEEE 802.1Q-2018 (previously IEEE 802.1Qbu) and IEEE 802.3br. > >> > >> Frame preemption allows a packet from a higher priority queue marked > >> as "express" to preempt a packet from lower priority queue marked as > >> "preemptible". The idea is that this can help reduce the latency for > >> higher priority traffic. > >> > >> Previously, the proposed interface for configuring these features was > >> using the qdisc layer. But as this is very hardware dependent and all > >> that qdisc did was pass the information to the driver, it makes sense > >> to have this in ethtool. > >> > >> One example, for retrieving and setting the configuration: > >> > >> $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 > >> Frame preemption settings for enp3s0: > >> support: supported > > > > IMHO we don't need a special bool for this. IIUC this is not a state > > flag that would change value for a particular device; either the device > > supports the feature or it does not. If it does not, the ethtool_ops > > callbacks would return -EOPNOTSUPP (or would not even exist if the > > driver has no support) and ethtool would say so. > > (I know that the comments below only apply if "ethtool-way" is what's > decided) > > Cool. Will remove the supported bit. > > > > >> active: active > >> supported queues: 0xf Following the same rationale, is this 'supported queue' going aways as well? - Andre
Andre Guedes <andre.guedes@intel.com> writes: >> > >> >> active: active >> >> supported queues: 0xf > > Following the same rationale, is this 'supported queue' going aways as well? > I think so, with good error messages, when trying to set an express-only queue as preemptible, no need to expose this information.
Andre Guedes <andre.guedes@intel.com> writes: > Hi, > > Quoting Vinicius Costa Gomes (2020-05-15 18:29:44) >> One example, for retrieving and setting the configuration: >> >> $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 >> Frame preemption settings for enp3s0: >> support: supported >> active: active > > IIUC the code in patch 2, 'active' is the actual configuration knob that > enables or disables the FP functionality on the NIC. > > That sounded a bit confusing to me since the spec uses the term 'active' to > indicate FP is currently enabled at both ends, and it is a read-only > information (see 12.30.1.4 from IEEE 802.1Q-2018). Maybe if we called this > 'enabled' it would be more clear. Good point. Will rename this to "enabled". > >> supported queues: 0xf >> supported queues: 0xe >> minimum fragment size: 68 > > I'm assuming this is the configuration knob for the minimal non-final fragment > defined in 802.3br. > > My understanding from the specs is that this value must be a multiple from 64 > and cannot assume arbitrary values like shown here. See 99.4.7.3 from IEEE > 802.3 and Note 1 in S.2 from IEEE 802.1Q. In the previous discussion about FP, > we had this as a multiplier factor, not absolute value. I thought that exposing this as "(1 + N)*64" (with 0 <= N <= 3) that it was more related to what's exposed via LLDP than the actual capabilities of the hardware. And for the hardware I have actually the values supported are: (1 + N)*64 + 4 (for N = 0, 1, 2, 3). So I thought I was better to let the driver decide what values are acceptable. This is a good question for people working with other hardware.
Hi Vinicius, On 5/19/20 7:37 PM, Vinicius Costa Gomes wrote: > Andre Guedes <andre.guedes@intel.com> writes: > >> Hi, >> >> Quoting Vinicius Costa Gomes (2020-05-15 18:29:44) >>> One example, for retrieving and setting the configuration: >>> >>> $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 >>> Frame preemption settings for enp3s0: >>> support: supported >>> active: active >> >> IIUC the code in patch 2, 'active' is the actual configuration knob that >> enables or disables the FP functionality on the NIC. >> >> That sounded a bit confusing to me since the spec uses the term 'active' to >> indicate FP is currently enabled at both ends, and it is a read-only >> information (see 12.30.1.4 from IEEE 802.1Q-2018). Maybe if we called this >> 'enabled' it would be more clear. > > Good point. Will rename this to "enabled". > >> >>> supported queues: 0xf >>> supported queues: 0xe >>> minimum fragment size: 68 >> >> I'm assuming this is the configuration knob for the minimal non-final fragment >> defined in 802.3br. >> >> My understanding from the specs is that this value must be a multiple from 64 >> and cannot assume arbitrary values like shown here. See 99.4.7.3 from IEEE >> 802.3 and Note 1 in S.2 from IEEE 802.1Q. In the previous discussion about FP, >> we had this as a multiplier factor, not absolute value. > > I thought that exposing this as "(1 + N)*64" (with 0 <= N <= 3) that it > was more related to what's exposed via LLDP than the actual capabilities > of the hardware. And for the hardware I have actually the values > supported are: (1 + N)*64 + 4 (for N = 0, 1, 2, 3). > > So I thought I was better to let the driver decide what values are > acceptable. > > This is a good question for people working with other hardware. > > AM65 CPSW supports this as a multiple of 64. Since this ethtool configuration is for the hardware, might want to make it as a multiple of 64. Murali
The 05/19/2020 16:37, Vinicius Costa Gomes wrote: > > Andre Guedes <andre.guedes@intel.com> writes: > > > Hi, > > > > Quoting Vinicius Costa Gomes (2020-05-15 18:29:44) > >> One example, for retrieving and setting the configuration: > >> > >> $ ethtool $ sudo ./ethtool --show-frame-preemption enp3s0 > >> Frame preemption settings for enp3s0: > >> support: supported > >> active: active > > > > IIUC the code in patch 2, 'active' is the actual configuration knob that > > enables or disables the FP functionality on the NIC. > > > > That sounded a bit confusing to me since the spec uses the term 'active' to > > indicate FP is currently enabled at both ends, and it is a read-only > > information (see 12.30.1.4 from IEEE 802.1Q-2018). Maybe if we called this > > 'enabled' it would be more clear. > > Good point. Will rename this to "enabled". > > > > >> supported queues: 0xf > >> supported queues: 0xe > >> minimum fragment size: 68 > > > > I'm assuming this is the configuration knob for the minimal non-final fragment > > defined in 802.3br. > > > > My understanding from the specs is that this value must be a multiple from 64 > > and cannot assume arbitrary values like shown here. See 99.4.7.3 from IEEE > > 802.3 and Note 1 in S.2 from IEEE 802.1Q. In the previous discussion about FP, > > we had this as a multiplier factor, not absolute value. > > I thought that exposing this as "(1 + N)*64" (with 0 <= N <= 3) that it > was more related to what's exposed via LLDP than the actual capabilities > of the hardware. And for the hardware I have actually the values > supported are: (1 + N)*64 + 4 (for N = 0, 1, 2, 3). > > So I thought I was better to let the driver decide what values are > acceptable. > > This is a good question for people working with other hardware. > I think it's most intuitive to use the values for AddFragSize as described in 802.3br (N = 0, 1, 2, 3). You will anyway have to use one of these values when you want to expose the requirements of your receiver through LLDP. > > -- > Vinicius
Joergen Andreasen <joergen.andreasen@microchip.com> writes: >> So I thought I was better to let the driver decide what values are >> acceptable. >> >> This is a good question for people working with other hardware. >> > > I think it's most intuitive to use the values for AddFragSize as described in > 802.3br (N = 0, 1, 2, 3). > You will anyway have to use one of these values when you want to expose the > requirements of your receiver through LLDP. > Thanks. Seems that keeping this value restricted to multiples of 64 is the way to go. Will fix for the next version of the series. Cheers,
Hi, Quoting Jakub Kicinski (2020-05-18 16:09:06) > On Mon, 18 May 2020 16:05:08 -0700 Vinicius Costa Gomes wrote: > > Jakub Kicinski <kuba@kernel.org> writes: > > >> That was the (only?) strong argument in favor of having frame preemption > > >> in the TC side when this was last discussed. > > >> > > >> We can have a hybrid solution, we can move the express/preemptible per > > >> queue map to mqprio/taprio/whatever. And have the more specific > > >> configuration knobs, minimum fragment size, etc, in ethtool. > > >> > > >> What do you think? > > > > > > Does the standard specify minimum fragment size as a global MAC setting? > > > > Yes, it's a per-MAC setting, not per-queue. > > If standard defines it as per-MAC and we can reasonably expect vendors > won't try to "add value" and make it per queue (unlikely here AFAIU), > then for this part ethtool configuration seems okay to me. Before we move forward with this hybrid approach, let's recap a few points that we discussed in the previous thread and make sure it addresses them properly. 1) Frame Preemption (FP) can be enabled without EST, as described in IEEE 802.1Q. In this case, the user has to create a dummy EST schedule in taprio just to be able to enable FP, which doesn't look natural. 2) Mpqrio already looks overloaded. Besides mapping traffic classes into hardware queues, it also supports different modes and traffic shaping. Do we want to add yet another setting to it? Regards, Andre
Andre Guedes <andre.guedes@intel.com> writes: >> If standard defines it as per-MAC and we can reasonably expect vendors >> won't try to "add value" and make it per queue (unlikely here AFAIU), >> then for this part ethtool configuration seems okay to me. > > Before we move forward with this hybrid approach, let's recap a few points that > we discussed in the previous thread and make sure it addresses them > properly. Thanks for bringing them up. > > 1) Frame Preemption (FP) can be enabled without EST, as described in IEEE > 802.1Q. In this case, the user has to create a dummy EST schedule in taprio > just to be able to enable FP, which doesn't look natural. What I meant by "dummy" schedule, is to configure taprio without specifying any "sched-entry". And since we have support for adding schedules during runtime, this might be even useful in general. > > 2) Mpqrio already looks overloaded. Besides mapping traffic classes into > hardware queues, it also supports different modes and traffic shaping. Do we > want to add yet another setting to it? I also don't see this as a problem. The parameters that mqprio has carry a lot of information, but the number of them is not that big. Cheers,