Message ID: 20170901012625.14838-1-vinicius.gomes@intel.com
Series: TSN: Add qdisc-based config interfaces for traffic shapers
I'm happy to see this posted.  At first glance, it seems like a step in the right direction.

On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
> * Time-aware shaper (802.1Qbv):
...
> S 0x01 300
> S 0x03 500
>
> This means that there are two intervals, the first will have the gate
> for traffic class 0 open for 300 nanoseconds, the second will have
> both traffic classes open for 500 nanoseconds.

The i210 doesn't support this in HW, or does it?

> * Frame Preemption (802.1Qbu):
>
> To control even further the latency, it may prove useful to signal which
> traffic classes are marked as preemptable. For that, 'taprio' provides the
> preemption command so you set each traffic class as preemptable or not:
>
> $ tc qdisc (...) \
>      preemption 0 1 1 1

Neither can the i210 preempt frames, or what am I missing?

The timing of this RFC is good, as I am just finishing up an RFC that implements time-based transmit using the i210.  I'll try and get that out ASAP.

Thanks,
Richard
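The gate-control-list file format quoted above (`<cmd> <gate mask> <interval>`) is simple enough to sketch a parser for. This is an illustrative helper only, not code from the posted patches; the "H"/"R" commands are the hold/release variants introduced later in the thread:

```python
# Hypothetical parser for the taprio schedule file format quoted above:
#   <cmd> <gate mask> <interval in nanoseconds>
# Qbv itself defines only "S" (SetGates); "H" and "R" are the
# set-gates-and-hold / set-gates-and-release extensions from the RFC.

def parse_sched(lines):
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cmd, mask, interval = line.split()
        if cmd not in ("S", "H", "R"):
            raise ValueError(f"unknown command: {cmd}")
        entries.append((cmd, int(mask, 16), int(interval)))
    return entries

# The two-entry example from the cover letter: gate for traffic class 0
# open for 300 ns, then gates for classes 0 and 1 open for 500 ns.
sched = parse_sched(["S 0x01 300", "S 0x03 500"])
```

Each entry becomes a `(cmd, gate_bitmask, interval_ns)` tuple, so the mask can be tested per traffic class with `mask & (1 << tc)`.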
Hi Richard,

On 09/01/2017 06:03 AM, Richard Cochran wrote:
>
> I'm happy to see this posted.  At first glance, it seems like a step in
> the right direction.
>
> On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>> * Time-aware shaper (802.1Qbv):
> ...
>> S 0x01 300
>> S 0x03 500
>>
>> This means that there are two intervals, the first will have the gate
>> for traffic class 0 open for 300 nanoseconds, the second will have
>> both traffic classes open for 500 nanoseconds.
>
> The i210 doesn't support this in HW, or does it?

No, it does not. The i210 only provides support for a per-packet feature called LaunchTime that can be used to control both the fetch and the transmission time of packets.

>> * Frame Preemption (802.1Qbu):
>>
>> To control even further the latency, it may prove useful to signal which
>> traffic classes are marked as preemptable. For that, 'taprio' provides the
>> preemption command so you set each traffic class as preemptable or not:
>>
>> $ tc qdisc (...) \
>>      preemption 0 1 1 1
>
> Neither can the i210 preempt frames, or what am I missing?

No, it does not. But when we started working on the shapers we decided to look ahead and try to come up with interfaces that could cover beyond 802.1Qav. These are just some ideas we've been prototyping here together with the 'cbs' qdisc.

> The timing of this RFC is good, as I am just finishing up an RFC that
> implements time-based transmit using the i210.  I'll try and get that
> out ASAP.

Is it correct to assume you are referring to an interface for LaunchTime here?

Thanks,
Jesus
On Fri, Sep 01, 2017 at 09:12:17AM -0700, Jesus Sanchez-Palencia wrote:
> Is it correct to assume you are referring to an interface for Launchtime here?
Yes.
Thanks,
Richard
On Fri, Sep 01, 2017 at 09:12:17AM -0700, Jesus Sanchez-Palencia wrote:
> On 09/01/2017 06:03 AM, Richard Cochran wrote:
> > The timing of this RFC is good, as I am just finishing up an RFC that
> > implements time-based transmit using the i210.  I'll try and get that
> > out ASAP.

I have an RFC series ready for net-next, but the merge window just started.  I'll post it when the window closes again...

Thanks,
Richard
On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
> Hi,
>
> This patchset is an RFC on a proposal of how the Traffic Control subsystem can
> be used to offload the configuration of traffic shapers into network devices
> that provide support for them in HW. Our goal here is to start upstreaming
> support for features related to the Time-Sensitive Networking (TSN) set of
> standards into the kernel.

Nice to see that others are working on this as well! :)

A short disclaimer; I'm pretty much anchored in the view "linux is the end-station in a TSN domain". Is this your approach as well, or are you looking at this driver to be used in bridges as well? (Because that will affect the comments on time-aware shaper and frame preemption.)

Yet another disclaimer; I am not a linux networking subsystem expert. Not by a long shot! There is black magic happening in the internals of the networking subsystem that I am not even aware of. So if something I say or ask does not make sense _at_all_, that's probably why. I do know a tiny bit about TSN though, and I have been messing around with it for a little while, hence my comments below.

> As part of this work, we've assessed previous public discussions related to TSN
> enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
> at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
> the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

/me eyes Cc ;p

> Overview
> ========
>
> Time-sensitive Networking (TSN) is a set of standards that aim to address
> resource availability for providing bandwidth reservation and bounded latency
> on Ethernet based LANs. The proposal described here aims to cover mainly what is
> needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
> 802.1Qbu.
>
> The initial target of this work is the Intel i210 NIC, but other controllers'
> datasheets were also taken into account, like the Renesas RZ/A1H RZ/A1M group and
> the Synopsys DesignWare Ethernet QoS controller.

NXP has a TSN aware chip on the i.MX7 sabre board as well </fyi>

> Proposal
> ========
>
> Feature-wise, what is covered here are configuration interfaces for HW
> implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper
> (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while
> Qbv and Qbu must be configured per port, with the configuration covering all
> queues. Given that these features are related to traffic shaping, and that the
> traffic control subsystem already provides a queueing discipline that offloads
> config into the device driver (i.e. mqprio), designing new qdiscs for the
> specific purpose of offloading the config for each shaper seemed like a good
> fit.

Just to be clear, you register sch_cbs as a subclass of mqprio, not as a root class?

> For steering traffic into the correct queues, we use the socket option
> SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues.
> The qdisc mqprio is currently used in our tests.

Right, fair enough. I'd prefer the TSN qdisc to be the root device and rather have mqprio for high priority traffic and another for 'everything else', but this would work too. This is not that relevant at this stage I guess :)

> As for the shapers config interface:
>
> * CBS (802.1Qav)
>
> This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
> $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
>      idleslope I

So this confuses me a bit, why specify sendSlope?

sendSlope = idleSlope - portTransmitRate

and portTransmitRate is the speed of the MAC (which you get from the driver). Adding sendSlope here is just redundant I think.
Also, does this mean that when you create the qdisc, you have locked the bandwidth for the scheduler? Meaning, if I later want to add another stream that requires more bandwidth, I have to close all active streams, reconfigure the qdisc and then restart?

> Note that the parameters for this qdisc are the ones defined by the
> 802.1Q-2014 spec, so no hardware specific functionality is exposed here.

You do need to know if the link is brought up as 100 or 1000 though - which the driver already knows.

> * Time-aware shaper (802.1Qbv):
>
> The idea we are currently exploring is to add a "time-aware", priority based
> qdisc, that also exposes the Tx queues available and provides a mechanism for
> mapping priority <-> traffic class <-> Tx queues in a similar fashion as
> mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:

As far as I know, this is not supported by the i210, and if time-aware shaping is enabled in the network, you'll be queued on a bridge until the window opens, as time-aware shaping is enforced on the tx-port and not on rx. Is this required in this driver?

> $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>      map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>      queues 0 1 2 3 \
>      sched-file gates.sched [base-time <interval>] \
>      [cycle-time <interval>] [extension-time <interval>]

That was a lot of priorities! 802.1Q lists 8 priorities, where do these 16 come from?

You map pri 0,1 to queue 2, pri 2 to queue 1 (Class B), pri 3 to queue 0 (Class A) and everything else to queue 3. This is what I would expect, except for the additional 8 priorities.
> <file> is multi-line, with each line being of the following format:
> <cmd> <gate mask> <interval in nanoseconds>
>
> Qbv only defines one <cmd>: "S" for 'SetGates'
>
> For example:
>
> S 0x01 300
> S 0x03 500
>
> This means that there are two intervals, the first will have the gate
> for traffic class 0 open for 300 nanoseconds, the second will have
> both traffic classes open for 500 nanoseconds.

Are you aware of any hw except dedicated switching stuff that supports this? (meant as "I'm curious and would like to know")

> Additionally, an option to set just one entry of the gate control list will
> also be provided by 'taprio':
>
> $ tc qdisc (...) \
>      sched-row <row number> <cmd> <gate mask> <interval> \
>      [base-time <interval>] [cycle-time <interval>] \
>      [extension-time <interval>]
>
>
> * Frame Preemption (802.1Qbu):

So frame preemption is nice, but my understanding of Qbu is that the real benefit is at the bridges and not in the endpoints. As jumbo frames are explicitly disallowed in Qav, the maximum latency incurred by a frame in flight is 12us on a 1Gbps link. I am not sure if these 12us is what will be the main delay in your application.

Or have I missed some crucial point here?

> To control even further the latency, it may prove useful to signal which
> traffic classes are marked as preemptable. For that, 'taprio' provides the
> preemption command so you set each traffic class as preemptable or not:
>
> $ tc qdisc (...) \
>      preemption 0 1 1 1
>
> * Time-aware shaper + Preemption:
>
> As an example of how Qbv and Qbu can be used together, we may specify
> both the schedule and the preempt-mask, and this way we may also
> specify the Set-Gates-and-Hold and Set-Gates-and-Release commands as
> specified in the Qbu spec:
>
> $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>      map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>      queues 0 1 2 3 \
>      preemption 0 1 1 1 \
>      sched-file preempt_gates.sched
>
> <file> is multi-line, with each line being of the following format:
> <cmd> <gate mask> <interval in nanoseconds>
>
> For this case, two new commands are introduced:
>
> "H" for 'set gates and hold'
> "R" for 'set gates and release'
>
> H 0x01 300
> R 0x03 500

So my understanding of all of this is that you configure the *total* bandwidth for each class when you load the qdisc and then let userspace handle the rest. Is this correct?

In my view, it would be nice if the qdisc had some notion of streams so that you could create a stream, feed frames to it and let the driver pace them out. (The fewer you queue, the shorter the delay.) This will also allow you to enforce per-stream bandwidth restrictions. I don't see how you can do this here unless you want to do this in userspace.

Do you have any plans for adding support for multiplexing streams? If you have multiple streams, how do you enforce that one stream does not eat into the bandwidth of another stream? AFAIK, this is something the network must enforce, but I see no option of doing so here.

> Testing this RFC
> ================
>
> For testing the patches of this RFC only, you can refer to the samples and
> helper script being added to samples/tsn/ and use the 'mqprio' qdisc to
> setup the priorities to Tx queues mapping, together with the 'cbs' qdisc to
> configure the HW shaper of the i210 controller:

I will test it, feedback will be provided soon! :)

Thanks!
-Henrik

> 1) Setup priorities to traffic classes to hardware queues mapping
> $ tc qdisc replace dev enp3s0 parent root mqprio num_tc 3 \
>      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
>
> 2) Check scheme. You want to get the inner qdiscs ID from the bottom up
> $ tc -g class show dev enp3s0
>
> Ex.:
> +---(802a:3) mqprio
> |    +---(802a:6) mqprio
> |    +---(802a:7) mqprio
> |
> +---(802a:2) mqprio
> |    +---(802a:5) mqprio
> |
> +---(802a:1) mqprio
>      +---(802a:4) mqprio
>
> * Here '802a:4' is Tx Queue #0 and '802a:5' is Tx Queue #1.
>
> 3) Calculate CBS parameters for classes A and B. i.e. BW for A is 20Mbps and
> for B is 10Mbps:
> $ ./samples/tsn/calculate_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500
>
> 4) Configure CBS for traffic class A (priority 3) as provided by the script:
> $ tc qdisc replace dev enp3s0 parent 802a:4 cbs locredit -1470 \
>      hicredit 30 sendslope -980000 idleslope 20000
>
> 5) Configure CBS for traffic class B (priority 2):
> $ tc qdisc replace dev enp3s0 parent 802a:5 cbs \
>      locredit -1485 hicredit 31 sendslope -990000 idleslope 10000
>
> 6) Run Listener, compiled from samples/tsn/listener.c
> $ ./listener -i enp3s0
>
> 7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
> $ ./talker -i enp3s0 -p 3
>
> * The bandwidth displayed on the listener output at this stage should be very
> close to the one configured for class A.
>
> 8) You can also run a Talker for class B (prio 2 here)
> $ ./talker -i enp3s0 -p 2
>
> * The bandwidth displayed on the listener output now should increase to very
> close to the one configured for class A + class B.

Because you grab both class A *and* B, or because B will eat what A does not use?
-H

> Authors
> =======
>  - Andre Guedes <andre.guedes@intel.com>
>  - Ivan Briano <ivan.briano@intel.com>
>  - Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>  - Vinicius Gomes <vinicius.gomes@intel.com>
>
>
> Andre Guedes (2):
>   igb: Add support for CBS offload
>   samples/tsn: Add script for calculating CBS config
>
> Jesus Sanchez-Palencia (1):
>   sample: Add TSN Talker and Listener examples
>
> Vinicius Costa Gomes (2):
>   net/sched: Introduce the user API for the CBS shaper
>   net/sched: Introduce Credit Based Shaper (CBS) qdisc
>
>  drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
>  drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
>  drivers/net/ethernet/intel/igb/igb.h           |   6 +
>  drivers/net/ethernet/intel/igb/igb_main.c      | 349 +++++++++++++++++++++++++
>  include/linux/netdevice.h                      |   1 +
>  include/uapi/linux/pkt_sched.h                 |  29 ++
>  net/sched/Kconfig                              |  11 +
>  net/sched/Makefile                             |   1 +
>  net/sched/sch_cbs.c                            | 286 ++++++++++++++++++++
>  samples/tsn/calculate_cbs_params.py            | 112 ++++++++
>  samples/tsn/listener.c                         | 254 ++++++++++++++++++
>  samples/tsn/talker.c                           | 136 ++++++++++
>  12 files changed, 1216 insertions(+)
>  create mode 100644 net/sched/sch_cbs.c
>  create mode 100755 samples/tsn/calculate_cbs_params.py
>  create mode 100644 samples/tsn/listener.c
>  create mode 100644 samples/tsn/talker.c
>
> --
> 2.14.1
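The CBS numbers used in steps 3-5 of the quoted test recipe follow from the 802.1Q-2014 Annex L relations. Below is a minimal sketch of the arithmetic for class A only (this is not the actual samples/tsn/calculate_cbs_params.py, whose code is not shown in the thread), assuming a 1 Gbps link and 1500-byte frames:

```python
def cbs_params_class_a(idle_slope_kbps, max_frame_bytes=1500,
                       port_rate_kbps=1_000_000):
    """Derive 'cbs' qdisc parameters from idleSlope (802.1Q-2014 Annex L).

    For the highest class, the worst-case interference is one max-size
    lower-priority frame already on the wire.
    """
    send_slope = idle_slope_kbps - port_rate_kbps  # negative by construction
    # Credits in bytes: credit gained/lost while a max-size frame transmits.
    hi_credit = round(max_frame_bytes * idle_slope_kbps / port_rate_kbps)
    lo_credit = round(max_frame_bytes * send_slope / port_rate_kbps)
    return send_slope, hi_credit, lo_credit

# Class A at 20 Mbps reproduces step 4 of the recipe:
print(cbs_params_class_a(20000))  # (-980000, 30, -1470)
```

For class B the same loCredit formula reproduces step 5's -1485, but its hiCredit is larger (31 rather than 15) because the worst-case interference for class B also includes pending class A traffic, not just one best-effort frame.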
On Thu, Sep 07, 2017 at 07:34:11AM +0200, Henrik Austad wrote:
> Also, does this mean that when you create the qdisc, you have locked the
> bandwidth for the scheduler? Meaning, if I later want to add another
> stream that requires more bandwidth, I have to close all active streams,
> reconfigure the qdisc and then restart?

No, just allocate enough bandwidth to accommodate all of the expected streams.  The streams can start and stop at will.

> So my understanding of all of this is that you configure the *total*
> bandwidth for each class when you load the qdisc and then let userspace
> handle the rest. Is this correct?

Nothing wrong with that.

> In my view, it would be nice if the qdisc had some notion of streams so
> that you could create a stream, feed frames to it and let the driver pace
> them out. (The fewer you queue, the shorter the delay.) This will also
> allow you to enforce per-stream bandwidth restrictions. I don't see how you
> can do this here unless you want to do this in userspace.
>
> Do you have any plans for adding support for multiplexing streams? If you
> have multiple streams, how do you enforce that one stream does not eat into
> the bandwidth of another stream? AFAIK, this is something the network must
> enforce, but I see no option of doing so here.

Please, let's keep this simple.  Today we have exactly zero user space applications using this kind of bandwidth reservation.  The case of wanting the kernel to police individual stream usage does not exist, and probably never will.

For serious TSN use cases, the bandwidth needed by each system and indeed the entire network will be engineered, and we can reasonably expect applications to cooperate in this regard.

Thanks,
Richard
On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> On Thu, Sep 07, 2017 at 07:34:11AM +0200, Henrik Austad wrote:
> > Also, does this mean that when you create the qdisc, you have locked the
> > bandwidth for the scheduler? Meaning, if I later want to add another
> > stream that requires more bandwidth, I have to close all active streams,
> > reconfigure the qdisc and then restart?
>
> No, just allocate enough bandwidth to accommodate all of the expected
> streams.  The streams can start and stop at will.

Sure, that'll work. And if you want this driver to act as a bridge, how do you accommodate changes in network requirements? (i.e. how does this work with switchdev?) - Or am I overthinking this?

> > So my understanding of all of this is that you configure the *total*
> > bandwidth for each class when you load the qdisc and then let userspace
> > handle the rest. Is this correct?
>
> Nothing wrong with that.

Didn't mean to say it was wrong, just making sure I've understood the concept.

> > In my view, it would be nice if the qdisc had some notion of streams so
> > that you could create a stream, feed frames to it and let the driver pace
> > them out. (The fewer you queue, the shorter the delay.) This will also
> > allow you to enforce per-stream bandwidth restrictions. I don't see how you
> > can do this here unless you want to do this in userspace.
> >
> > Do you have any plans for adding support for multiplexing streams? If you
> > have multiple streams, how do you enforce that one stream does not eat into
> > the bandwidth of another stream? AFAIK, this is something the network must
> > enforce, but I see no option of doing so here.
>
> Please, let's keep this simple.

Simple is always good.

> Today we have exactly zero user space
> applications using this kind of bandwidth reservation. The case of
> wanting the kernel to police individual stream usage does not exist,
> and probably never will.
That we have *zero* userspace applications today is probably related to the fact that we have exactly *zero* drivers in the kernel that talk TSN :)

To rephrase a bit, what I'm worried about: if you have more than one application in userspace that wants to send data using this scheduler, how do you ensure fair transmission of frames? (Both how much bandwidth they use, and also the ordering of frames from each application.) Do you expect all of this to be handled in userspace?

> For serious TSN use cases, the bandwidth needed by each system and
> indeed the entire network will be engineered, and we can reasonably
> expect applications to cooperate in this regard.

Yes.. that'll happen ;)

> Thanks,
> Richard

Don't get me wrong, I think it is great that others are working on this! I'm just trying to fully understand the thought that has gone into this and how it is intended to be used. I'll get busy testing the code and wrapping my head around the different parameters.
On Thu, Sep 07, 2017 at 05:27:51PM +0200, Henrik Austad wrote:
> On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> And if you want this driver to act as a bridge, how do you accommodate
> changes in network requirements? (i.e. how does this work with switchdev?)

To my understanding, this qdisc idea provides QoS for the host's transmitted traffic, and nothing more.

> - Or am I overthinking this?

Being able to configure the external ports of a switchdev is probably a nice feature, but that is another story.  (But maybe I misunderstood the authors' intent!)

> If you have more than 1 application in userspace that wants to send data
> using this scheduler, how do you ensure fair transmission of frames? (both
> how much bandwidth they use,

There are many ways to handle this, and we shouldn't put any of that policy into the kernel.  For example, there might be a monolithic application with configurable threads, or an allocation server that grants bandwidth to applications via IPC, or a multiplexing stream server like jack, pulse, etc, and so on...

> but also ordering of frames from each application)

Not sure what you mean by this.

> Do you expect all of this to be handled in userspace?

Yes, I do.

Thanks,
Richard
On Thu, Sep 07, 2017 at 05:53:15PM +0200, Richard Cochran wrote:
> On Thu, Sep 07, 2017 at 05:27:51PM +0200, Henrik Austad wrote:
> > On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> > And if you want this driver to act as a bridge, how do you accommodate
> > changes in network requirements? (i.e. how does this work with switchdev?)
>
> To my understanding, this qdisc idea provides QoS for the host's
> transmitted traffic, and nothing more.

Ok, then we're on the same page.

> > - Or am I overthinking this?
>
> Being able to configure the external ports of a switchdev is probably
> a nice feature, but that is another story.  (But maybe I misunderstood
> the authors' intent!)

Ok, chalk that one up for later perhaps.

> > If you have more than 1 application in userspace that wants to send data
> > using this scheduler, how do you ensure fair transmission of frames? (both
> > how much bandwidth they use,
>
> There are many ways to handle this, and we shouldn't put any of that
> policy into the kernel. For example, there might be a monolithic
> application with configurable threads, or an allocation server that
> grants bandwidth to applications via IPC, or a multiplexing stream
> server like jack, pulse, etc, and so on...

True.

> > but also ordering of frames from each application)
>
> Not sure what you mean by this.

Fair enough, I'm not that good at making myself clear :) Let's see if I can make a better attempt:

If you have 2 separate applications that have their own streams going to different endpoints, but both are in the same class, then they will share the qdisc bandwidth. So application
 - A sends frames A1, A2, A3, .. An
 - B sends frames B1, B2, .. Bn

What I was trying to describe was: if application A sends 2 frames, and B sends 2 frames at the same time, then you would hope that the order would be A1, B1, A2, B2, and not A1, A2, B1, B2. None of this would be a problem if you expect a *single* user, like the allocation server you described above.
Again, I think this is just me overthinking the problem right now :)

> > Do you expect all of this to be handled in userspace?
>
> Yes, I do.

Ok, fair enough.

Thanks for answering my questions!
Hi Henrik,

Thanks for your feedback! I'll address some of your comments below.

On Thu, 2017-09-07 at 07:34 +0200, Henrik Austad wrote:
> > As for the shapers config interface:
> >
> > * CBS (802.1Qav)
> >
> > This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
> > $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
> >      idleslope I
>
> So this confuses me a bit, why specify sendSlope?
>
> sendSlope = idleSlope - portTransmitRate
>
> and portTransmitRate is the speed of the MAC (which you get from the
> driver). Adding sendSlope here is just redundant I think.

Yes, this was something we spent quite some time discussing before this RFC series. After reading Annex L from 802.1Q-2014 (operation of the CBS algorithm) so many times, we came up with the rationale explained below.

The rationale here is that sendSlope is just another parameter of the CBS algorithm, like idleSlope, hiCredit and loCredit. As such, its calculation should be done at the same "layer" as the other parameters (in this case, user space) in order to keep consistency. Moreover, in this design, the driver layer is dead simple: all the device driver has to do is apply the CBS parameters to hardware. Having any CBS parameter calculation in the driver layer means all device drivers must implement that calculation.

> Also, does this mean that when you create the qdisc, you have locked the
> bandwidth for the scheduler? Meaning, if I later want to add another
> stream that requires more bandwidth, I have to close all active streams,
> reconfigure the qdisc and then restart?

If we want to reserve more bandwidth to "accommodate" a new stream, we don't need to close all active streams. All we have to do is change the CBS qdisc and pass the new CBS parameters.
Here is what the command line would look like:

$ tc qdisc change dev enp0s4 parent 8001:5 cbs locredit -1470 \
     hicredit 30 sendslope -980000 idleslope 20000

No application/stream is interrupted while the new CBS parameters are applied.

> > Note that the parameters for this qdisc are the ones defined by the
> > 802.1Q-2014 spec, so no hardware specific functionality is exposed here.
>
> You do need to know if the link is brought up as 100 or 1000 though - which
> the driver already knows.

User space knows that information via ethtool or /sys.

> > Testing this RFC
> > ================
> >
> > For testing the patches of this RFC only, you can refer to the samples and
> > helper script being added to samples/tsn/ and use the 'mqprio' qdisc to
> > setup the priorities to Tx queues mapping, together with the 'cbs' qdisc to
> > configure the HW shaper of the i210 controller:
>
> I will test it, feedback will be provided soon! :)

That's great! Please let us know if you find any issue, and thanks for your support.

> > 8) You can also run a Talker for class B (prio 2 here)
> > $ ./talker -i enp3s0 -p 2
> >
> > * The bandwidth displayed on the listener output now should increase to very
> > close to the one configured for class A + class B.
>
> Because you grab both class A *and* B, or because B will eat what A does
> not use?

Because the listener application grabs both class A and B traffic.

Regards,
Andre
On Thu, 2017-09-07 at 18:18 +0200, Henrik Austad wrote:
> On Thu, Sep 07, 2017 at 05:53:15PM +0200, Richard Cochran wrote:
> > On Thu, Sep 07, 2017 at 05:27:51PM +0200, Henrik Austad wrote:
> > > On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> > > And if you want this driver to act as a bridge, how do you accommodate
> > > changes in network requirements? (i.e. how does this work with switchdev?)
> >
> > To my understanding, this qdisc idea provides QoS for the host's
> > transmitted traffic, and nothing more.
>
> Ok, then we're on the same page.
>
> > > - Or am I overthinking this?
> >
> > Being able to configure the external ports of a switchdev is probably
> > a nice feature, but that is another story. (But maybe I misunderstood
> > the authors' intent!)
>
> ok, chalk that one up for later perhaps

Just to clarify, we've been most focused on end-station use-cases. We've considered some bridge use-cases as well, just to verify that the proposed design won't be an issue if someone else goes for it.

- Andre
Henrik Austad <henrik@austad.us> writes:

> On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>> Hi,
>>
>> This patchset is an RFC on a proposal of how the Traffic Control subsystem can
>> be used to offload the configuration of traffic shapers into network devices
>> that provide support for them in HW. Our goal here is to start upstreaming
>> support for features related to the Time-Sensitive Networking (TSN) set of
>> standards into the kernel.
>
> Nice to see that others are working on this as well! :)
>
> A short disclaimer; I'm pretty much anchored in the view "linux is the
> end-station in a TSN domain", is this your approach as well, or are you
> looking at this driver to be used in bridges as well? (because that will
> affect the comments on time-aware shaper and frame preemption)
>
> Yet another disclaimer; I am not a linux networking subsystem expert. Not
> by a long shot! There is black magic happening in the internals of the
> networking subsystem that I am not even aware of. So if something I say or
> ask does not make sense _at_all_, that's probably why..
>
> I do know a tiny bit about TSN though, and I have been messing around
> with it for a little while, hence my comments below
>
>> As part of this work, we've assessed previous public discussions related to TSN
>> enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
>> at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
>> the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
>
> /me eyes Cc ;p
>
>> Overview
>> ========
>>
>> Time-sensitive Networking (TSN) is a set of standards that aim to address
>> resource availability for providing bandwidth reservation and bounded latency
>> on Ethernet based LANs. The proposal described here aims to cover mainly what is
>> needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
>> 802.1Qbu.
>>
>> The initial target of this work is the Intel i210 NIC, but other controllers'
>> datasheets were also taken into account, like the Renesas RZ/A1H RZ/A1M group and
>> the Synopsys DesignWare Ethernet QoS controller.
>
> NXP has a TSN aware chip on the i.MX7 sabre board as well </fyi>

Cool. Will take a look.

>
>> Proposal
>> ========
>>
>> Feature-wise, what is covered here are configuration interfaces for HW
>> implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper
>> (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while
>> Qbv and Qbu must be configured per port, with the configuration covering all
>> queues. Given that these features are related to traffic shaping, and that the
>> traffic control subsystem already provides a queueing discipline that offloads
>> config into the device driver (i.e. mqprio), designing new qdiscs for the
>> specific purpose of offloading the config for each shaper seemed like a good
>> fit.
>
> just to be clear, you register sch_cbs as a subclass of mqprio, not as a
> root class?

That's right.

>
>> For steering traffic into the correct queues, we use the socket option
>> SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues.
>> The qdisc mqprio is currently used in our tests.
>
> Right, fair enough, I'd prefer the TSN qdisc to be the root device and
> rather have mqprio for high priority traffic and another for 'everything
> else', but this would work too. This is not that relevant at this stage I
> guess :)

That's a scenario I haven't considered, will give it some thought.

>
>> As for the shapers config interface:
>>
>> * CBS (802.1Qav)
>>
>> This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
>> $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
>>      idleslope I
>
> So this confuses me a bit, why specify sendSlope?
>
> sendSlope = idleSlope - portTransmitRate
>
> and portTransmitRate is the speed of the MAC (which you get from the
> driver). Adding sendSlope here is just redundant I think.
>
> Also, does this mean that when you create the qdisc, you have locked the
> bandwidth for the scheduler? Meaning, if I later want to add another
> stream that requires more bandwidth, I have to close all active streams,
> reconfigure the qdisc and then restart?
>
>> Note that the parameters for this qdisc are the ones defined by the
>> 802.1Q-2014 spec, so no hardware specific functionality is exposed here.
>
> You do need to know if the link is brought up as 100 or 1000 though - which
> the driver already knows.
>
>> * Time-aware shaper (802.1Qbv):
>>
>> The idea we are currently exploring is to add a "time-aware", priority based
>> qdisc, that also exposes the Tx queues available and provides a mechanism for
>> mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>> mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
>
> As far as I know, this is not supported by the i210, and if time-aware shaping
> is enabled in the network - you'll be queued on a bridge until the window
> opens as time-aware shaping is enforced on the tx-port and not on rx. Is
> this required in this driver?

Yeah, the i210 doesn't support the time-aware shaper. I think the second part of your question doesn't really apply, then.

>
>> $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>>      map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>>      queues 0 1 2 3 \
>>      sched-file gates.sched [base-time <interval>] \
>>      [cycle-time <interval>] [extension-time <interval>]
>
> That was a lot of priorities! 802.1Q lists 8 priorities, where do these
> 16 come from?

Even if 802.1Q only defines 8 priorities, the Linux network stack supports a lot more (and this command line is more than slightly inspired by the mqprio equivalent).
> > You map pri 0,1 to queue 2, pri 2 to queue 1 (Class B), pri 3 to queue 0 > (class A) and everything else to queue 3. This is what I would expect, > except for the additional 8 priorities. > >> <file> is multi-line, with each line being of the following format: >> <cmd> <gate mask> <interval in nanoseconds> >> >> Qbv only defines one <cmd>: "S" for 'SetGates' >> >> For example: >> >> S 0x01 300 >> S 0x03 500 >> >> This means that there are two intervals, the first will have the gate >> for traffic class 0 open for 300 nanoseconds, the second will have >> both traffic classes open for 500 nanoseconds. > > Are you aware of any hw except dedicated switching stuff that supports > this? (meant as "I'm curious and would like to know") Not really. I couldn't find any public documentation about products destined for end stations that support this. I, too, would like to know more. > >> Additionally, an option to set just one entry of the gate control list will >> also be provided by 'taprio': >> >> $ tc qdisc (...) \ >> sched-row <row number> <cmd> <gate mask> <interval> \ >> [base-time <interval>] [cycle-time <interval>] \ >> [extension-time <interval>] >> >> >> * Frame Preemption (802.1Qbu): > > So Frame preemption is nice, but my understanding of Qbu is that the real > benefit is at the bridges and not in the endpoints. As jumbo frames are > explicitly disallowed in Qav, the maximum latency incurred by a frame in > flight is 12us on a 1Gbps link. I am not sure if those 12us are what will be > the main delay in your application. > > Or have I missed some crucial point here? You don't seem to have missed anything. What I see as the biggest point of frame preemption is that, when it is used with scheduled traffic, you could keep the preemptable traffic classes' gates always open, have a few time windows for periodic traffic, and still have predictable behaviour for unscheduled "emergency" traffic. Cheers, -- Vinicius
On Thu, Sep 07, 2017 at 07:58:53PM +0000, Guedes, Andre wrote: > Hi Henrik, > > Thanks for your feedback! I'll address some of your comments below. > > On Thu, 2017-09-07 at 07:34 +0200, Henrik Austad wrote: > > > As for the shapers config interface: > > > > > > * CBS (802.1Qav) > > > > > > This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line > > > is: > > > $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S > > > \ > > > idleslope I > > > > So this confuses me a bit, why specify sendSlope? > > > > sendSlope = portTransmitRate - idleSlope > > > > and portTransmitRate is the speed of the MAC (which you get from the > > driver). Adding sendSlope here is just redundant I think. > > Yes, this was something we've spent quite some time discussing before this RFC > series. After reading Annex L from 802.1Q-2014 (operation of the CBS algorithm) > so many times, we've come up with the rationale explained below. > > The rationale here is that sendSlope is just another parameter of the CBS > algorithm, like idleSlope, hiCredit and loCredit. As such, its calculation > should be done at the same "layer" as the other parameters (in this case, user > space) in order to keep consistency. Moreover, in this design, the driver layer > is dead simple: all the device driver has to do is apply the CBS parameters to > the hardware. Having any CBS parameter calculation in the driver layer means all > device drivers must implement that calculation. Ok, that actually makes a lot of sense, and anything that keeps this kind of arithmetic outside the kernel is a good thing! Thanks for the clarification! > > Also, does this mean that when you create the qdisc, you have locked the > bandwidth for the scheduler? Meaning, if I later want to add another > stream that requires more bandwidth, I have to close all active streams, > reconfigure the qdisc and then restart?
> > If we want to reserve more bandwidth to "accommodate" a new stream, we don't > need to close all active streams. All we have to do is change the CBS qdisc > and pass the new CBS parameters. Here is what the command-line would look like: > > $ tc qdisc change dev enp0s4 parent 8001:5 cbs locredit -1470 hicredit 30 > sendslope -980000 idleslope 20000 > > No application/stream is interrupted while new CBS parameters are applied. Ah, good. > > > Note that the parameters for this qdisc are the ones defined by the > > > 802.1Q-2014 spec, so no hardware specific functionality is exposed here. > > > > You do need to know if the link is brought up as 100 or 1000 though - which > > the driver already knows. > > User space knows that information via ethtool or /sys. Fair point. > > > Testing this RFC > > > ================ > > > > > > For testing the patches of this RFC only, you can refer to the samples and > > > helper script being added to samples/tsn/ and use the 'mqprio' qdisc to > > > setup the priorities to Tx queues mapping, together with the 'cbs' qdisc to > > > configure the HW shaper of the i210 controller: > > > > I will test it, feedback will be provided soon! :) > > That's great! Please let us know if you find any issue and thanks for your > support. > > > 8) You can also run a Talker for class B (prio 2 here) > > > $ ./talker -i enp3s0 -p 2 > > > > > > * The bandwidth displayed on the listener output now should increase to > > > very > > > close to the one configured for class A + class B. > > > > Because you grab both class A *and* B, or because B will eat what A does > > not use? > > Because the listener application grabs both class A and B traffic. Right, got it. Thanks for the feedback, I'm getting really excited about this! :D
On Thu, Sep 07, 2017 at 06:29:00PM -0700, Vinicius Costa Gomes wrote: > >> * Time-aware shaper (802.1Qbv): > >> > >> The idea we are currently exploring is to add a "time-aware", priority based > >> qdisc, that also exposes the Tx queues available and provides a mechanism for > >> mapping priority <-> traffic class <-> Tx queues in a similar fashion as > >> mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be: > > > > As far as I know, this is not supported by i210, and if time-aware shaping > > is enabled in the network - you'll be queued on a bridge until the window > > opens as time-aware shaping is enforced on the tx-port and not on rx. Is > > this required in this driver? > > Yeah, i210 doesn't support the time-aware shaper. I think the second > part of your question doesn't really apply, then. Actually, you can implement 802.1Qbv (as an end station) quite easily using the i210. I'll show how by posting a series after net-next opens up again. Thanks, Richard
On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote: > * Time-aware shaper (802.1Qbv): I just posted a working alternative showing how to handle 802.1Qbv and many other Ethernet field buses. > The idea we are currently exploring is to add a "time-aware", priority based > qdisc, that also exposes the Tx queues available and provides a mechanism for > mapping priority <-> traffic class <-> Tx queues in a similar fashion as > mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be: > > $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \ > map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \ > queues 0 1 2 3 \ > sched-file gates.sched [base-time <interval>] \ > [cycle-time <interval>] [extension-time <interval>] > > <file> is multi-line, with each line being of the following format: > <cmd> <gate mask> <interval in nanoseconds> > > Qbv only defines one <cmd>: "S" for 'SetGates' > > For example: > > S 0x01 300 > S 0x03 500 > > This means that there are two intervals, the first will have the gate > for traffic class 0 open for 300 nanoseconds, the second will have > both traffic classes open for 500 nanoseconds. The idea of the schedule file will not work in practice. Consider the fact that the application wants to deliver time-critical data in a particular slot. How can it find out a) what the time slots are and b) when the next slot is scheduled? With this Qdisc, it cannot do this, AFAICT. The admin might delete the file after configuring the Qdisc! Using the SO_TXTIME option, the application has total control over the scheduling. The great advantage of this approach is that we can support any possible combination of periodic or aperiodic scheduling and we can support any priority scheme user space dreams up. For example, one can imagine running two or more loops that only occasionally collide. When they do collide, which packet should be sent first? Just let user space decide. Thanks, Richard
On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote: > This patchset is an RFC on a proposal of how the Traffic Control subsystem can > be used to offload the configuration of traffic shapers into network devices > that provide support for them in HW. Our goal here is to start upstreaming > support for features related to the Time-Sensitive Networking (TSN) set of > standards into the kernel. Just for the record, here is my score card showing the current status of TSN support in Linux. Comments and corrections are most welcome. Thanks, Richard

| FEATURE                                        | STANDARD            | STATUS                       |
|------------------------------------------------+---------------------+------------------------------|
| Synchronization                                | 802.1AS-2011        | Implemented in               |
|                                                |                     | - Linux kernel PHC subsystem |
|                                                |                     | - linuxptp (userspace)       |
|------------------------------------------------+---------------------+------------------------------|
| Forwarding and Queuing Enhancements            | 802.1Q-2014 sec. 34 | RFC posted (this thread)     |
| for Time-Sensitive Streams (FQTSS)             |                     |                              |
|------------------------------------------------+---------------------+------------------------------|
| Stream Reservation Protocol (SRP)              | 802.1Q-2014 sec. 35 | in Open-AVB [1]              |
|------------------------------------------------+---------------------+------------------------------|
| Audio Video Transport Protocol (AVTP)          | IEEE 1722-2011      | DNE                          |
|------------------------------------------------+---------------------+------------------------------|
| Audio/Video Device Discovery, Enumeration,     | IEEE 1722.1-2013    | jdksavdecc-c [2]             |
| Connection Management and Control (AVDECC)     |                     |                              |
| AVDECC Connection Management Protocol (ACMP)   |                     |                              |
| AVDECC Enumeration and Control Protocol (AECP) |                     |                              |
| MAC Address Acquisition Protocol (MAAP)        |                     | in Open-AVB                  |
|------------------------------------------------+---------------------+------------------------------|
| Frame Preemption                               | P802.1Qbu           | DNE                          |
| Scheduled Traffic                              | P802.1Qbv           | RFC posted (SO_TXTIME)       |
| SRP Enhancements and Performance Improvements  | P802.1Qcc           | DNE                          |

DNE = Does Not Exist (to my knowledge) 1. https://github.com/Avnu/OpenAvnu (DISCLAIMER from the website:) It is planned to eventually include the various packet encapsulation types, protocol discovery daemons, libraries to convert media clocks to AVB clocks and vice versa, and drivers. This repository does not include all components required to build a full production AVB/TSN system (e.g. a turnkey solution to stream stored or live audio or video content). Some simple example applications are provided which illustrate the flow - but a professional Audio/Video system requires a full media stack - including audio and video inputs and outputs, media processing elements, and various graphical user interfaces. Various companies provide such integrated solutions. 2. https://github.com/jdkoftinoff/jdksavdecc-c
On Mon, Sep 18, 2017 at 10:02:14AM +0200, Richard Cochran wrote: > On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote: > > * Time-aware shaper (802.1Qbv): > > I just posted a working alternative showing how to handle 802.1Qbv and > many other Ethernet field buses. Yes, I saw them, grabbing them for testing now - thanks! > > The idea we are currently exploring is to add a "time-aware", priority based > > qdisc, that also exposes the Tx queues available and provides a mechanism for > > mapping priority <-> traffic class <-> Tx queues in a similar fashion as > > mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be: > > > > $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \ > > map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \ > > queues 0 1 2 3 \ > > sched-file gates.sched [base-time <interval>] \ > > [cycle-time <interval>] [extension-time <interval>] > > > > <file> is multi-line, with each line being of the following format: > > <cmd> <gate mask> <interval in nanoseconds> > > > > Qbv only defines one <cmd>: "S" for 'SetGates' > > > > For example: > > > > S 0x01 300 > > S 0x03 500 > > > > This means that there are two intervals, the first will have the gate > > for traffic class 0 open for 300 nanoseconds, the second will have > > both traffic classes open for 500 nanoseconds. > > The idea of the schedule file will not work in practice. Consider the > fact that the application wants to deliver time critical data in a > particular slot. How can it find out a) what the time slots are and > b) when the next slot is scheduled? With this Qdisc, it cannot do > this, AFAICT. The admin might delete the file after configuring the > Qdisc! > > Using the SO_TXTIME option, the application has total control over the > scheduling. The great advantages of this approach is that we can > support any possible combination of periodic or aperiodic scheduling > and we can support any priority scheme user space dreams up. 
Using SO_TXTIME makes a lot of sense. TSN has a presentation_time, which you can use to deduce the time it should be transmitted (Class A has a 2ms latency guarantee, B has 50ms), but given how TSN uses the timestamp, it will wrap every 4.3 seconds; using SO_TXTIME allows you to schedule transmission at a much later time. It should also lessen the dependency on a specific protocol, which is also good. > For example, one can imagine running two or more loops that only > occasionally collide. When they do collide, which packet should be > sent first? Just let user space decide. If 2 userspace apps send to the same Tx-queue with the same priority, would it not make sense to just do FIFO? For all practical purposes, they have the same importance (same SO_PRIORITY, same SO_TXTIME). If the priority differs, then they would be directed to different queues, where one queue will take precedence anyway. How far into the future would it make sense to schedule packets anyway? I'll have a look at the other series you just posted!
Hi Richard, Richard Cochran <richardcochran@gmail.com> writes: > On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote: >> * Time-aware shaper (802.1Qbv): > > I just posted a working alternative showing how to handle 802.1Qbv and > many other Ethernet field buses. > >> The idea we are currently exploring is to add a "time-aware", priority based >> qdisc, that also exposes the Tx queues available and provides a mechanism for >> mapping priority <-> traffic class <-> Tx queues in a similar fashion as >> mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be: >> >> $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \ >> map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \ >> queues 0 1 2 3 \ >> sched-file gates.sched [base-time <interval>] \ >> [cycle-time <interval>] [extension-time <interval>] >> >> <file> is multi-line, with each line being of the following format: >> <cmd> <gate mask> <interval in nanoseconds> >> >> Qbv only defines one <cmd>: "S" for 'SetGates' >> >> For example: >> >> S 0x01 300 >> S 0x03 500 >> >> This means that there are two intervals, the first will have the gate >> for traffic class 0 open for 300 nanoseconds, the second will have >> both traffic classes open for 500 nanoseconds. > > The idea of the schedule file will not work in practice. Consider the > fact that the application wants to deliver time critical data in a > particular slot. How can it find out a) what the time slots are and > b) when the next slot is scheduled? With this Qdisc, it cannot do > this, AFAICT. The admin might delete the file after configuring the > Qdisc! That's the point, the application does not need to know that, and asking that would be stupid. From the point of view of the Qbv specification, applications only need to care about its basic bandwidth requirements: its interval, frame size, frames per interval (using the terms of the SRP section of 802.1Q). 
The traffic schedule is provided (out of band) by a "god box" which knows all the requirements of all applications in all the nodes and how they are connected. (And that's another nice point of how 802.1Qbv works, applications do not need to be changed to use it, and I think we should work to achieve this on the Linux side) That being said, that only works for kinds of traffic that map well to this configuration-in-advance model, which is the model that the IEEE (see 802.1Qcc) and the AVNU Alliance[1] are pushing for. In the real world, I can see multiple types of applications, some using something like TXTIME, and some configured in advance. > > Using the SO_TXTIME option, the application has total control over the > scheduling. The great advantage of this approach is that we can > support any possible combination of periodic or aperiodic scheduling > and we can support any priority scheme user space dreams up. It has the disadvantage that the scheduling information has to be in-band with the data. I *really* think that for scheduled traffic, there should be a clear separation; we should not mix the dataflow with scheduling. In short, an application in the network doesn't need to have all the information necessary to schedule its own traffic well. I have two points here: 1. I see both "solutions" (taprio and SO_TXTIME) as being orthogonal, and both useful; 2. trying to make one do the job of the other, however, looks like "If all I have is a hammer, everything looks like a nail". In short, I see a per-packet transmission time and a per-queue schedule as solutions to different problems. > > For example, one can imagine running two or more loops that only > occasionally collide. When they do collide, which packet should be > sent first? Just let user space decide. > > Thanks, > Richard Cheers, -- Vinicius [1] http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/
On Mon, Sep 18, 2017 at 04:06:28PM -0700, Vinicius Costa Gomes wrote: > That's the point, the application does not need to know that, and asking > that would be stupid. On the contrary, this information is essential to the application. Probably you have never seen an actual Ethernet field bus in operation? In any case, you are missing the point. > (And that's another nice point of how 802.1Qbv works, applications do > not need to be changed to use it, and I think we should work to achieve > this on the Linux side) Once you start to care about real time performance, then you need to consider the applications. This is industrial control, not streaming your tunes from your iPod. > That being said, that only works for kinds of traffic that maps well to > this configuration in advance model, which is the model that the IEEE > (see 802.1Qcc) and the AVNU Alliance[1] are pushing for. Again, you are missing the point of what they are aiming for. I have looked at a number of production systems, and in each case the developers want total control over the transmission, in order to reduce latency to an absolute minimum. Typically the data to be sent are available only microseconds before the transmission deadline. Consider OpenAVB on github that people are already using. Take a look at simple_talker.c and explain how "applications do not need to be changed to use it." > [1] > http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/ Did you even read this? [page 24] As described in section 2, some industrial control systems require predictable, very low latency and cycle-to-cycle variation to meet hard real-time application requirements. In these systems, multiple distributed controllers commonly synchronize their sensor/actuator operations with other controllers by scheduling these operations in time, typically using a repeating control cycle. ... The gate control mechanism is itself a time-aware PTP application operating within a bridge or end station port.
It is an application, not a "god box." > In short, I see a per-packet transmission time and a per-queue schedule > as solutions to different problems. Well, I can agree with that. For some non real-time applications, bandwidth shaping is enough, and your Qdisc idea is sufficient. For the really challenging TSN targets (industrial control, automotive), your idea of an opaque schedule file won't fly. Thanks, Richard
Hi all, On Tue, Sep 19, 2017 at 07:22:44AM +0200, Richard Cochran wrote: > On Mon, Sep 18, 2017 at 04:06:28PM -0700, Vinicius Costa Gomes wrote: > > That's the point, the application does not need to know that, and asking > > that would be stupid. > > On the contrary, this information is essential to the application. > Probably you have never seen an actual Ethernet field bus in > operation? In any case, you are missing the point. > > > (And that's another nice point of how 802.1Qbv works, applications do > > not need to be changed to use it, and I think we should work to achieve > > this on the Linux side) > > Once you start to care about real time performance, then you need to > consider the applications. This is industrial control, not streaming > your tunes from your ipod. Do not underestimate the need for media over TSN. I fully see your point about real-time systems, but they are not the only valid use-cases for TSN. > > That being said, that only works for kinds of traffic that maps well to > > this configuration in advance model, which is the model that the IEEE > > (see 802.1Qcc) and the AVNU Alliance[1] are pushing for. > > Again, you are missing the point of what they are aiming for. I have > looked at a number of production systems, and in each case the > developers want total control over the transmission, in order to > reduce latency to an absolute minimum. Typically the data to be sent > are available only microseconds before the transmission deadline. > > Consider OpenAVB on github that people are already using. Take a look > at simple_talker.c and explain how "applications do not need to be > changed to use it." I do not think simple-talker was ever intended to be how users of AVB should implement applications, but as a demonstration of what the protocol could do. ALSA/V4L2 should supply some interface to this so that you can attach media-applications to it without the application itself having to be "TSN aware". 
> > [1] > > http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/ > > Did you even read this? > > [page 24] > > As described in section 2, some industrial control systems require > predictable, very low latency and cycle-to-cycle variation to meet > hard real-time application requirements. In these systems, > multiple distributed controllers commonly synchronize their > sensor/actuator operations with other controllers by scheduling > these operations in time, typically using a repeating control > cycle. > ... > The gate control mechanism is itself a time-aware PTP application > operating within a bridge or end station port. > > It is an application, not a "god box." > > > In short, I see a per-packet transmission time and a per-queue schedule > > as solutions to different problems. > > Well, I can agree with that. For some non real-time applications, > bandwidth shaping is enough, and your Qdisc idea is sufficient. For > the really challenging TSN targets (industrial control, automotive), > your idea of an opaque schedule file won't fly. Would it make sense to adapt the proposed Qdisc here as well as the back-of-the-napkin idea in the other thread to a per-socket queue for each priority and then sort those sockets based on SO_TXTIME? TSN operates on a per-StreamID basis, and that should map fairly well to a per-socket approach I think (let us just assume that an application that sends TSN traffic will open up a separate socket for each stream). This should allow a userspace application that is _very_ aware of its timing constraints to send frames exactly when it needs to as you have SO_TXTIME available. It would also let applications that basically want fine-grained rate control (audio and video comes to mind) to use the same qdisc. 
For those sockets that do not support SO_TXTIME, but still map to a priority handled by sch_cbs (or whatever it'll end up being called) you can set the transmit-time to be the time of the last skb in the queue + a delta which will give you the correct rate (TSN operates on observation intervals which you can specify via tc when you create the queues). Then you can have, as you propose in your other series, a hrtimer that is being called when the next SO_TXTIME deadline arrives, grab a skb and move it to the hw-queue. This should also allow you to keep a sorted per-socket queue should an application send frames in the wrong order, without having to rearrange descriptors for the DMA machinery. If this makes sense, I am more than happy to give it a stab and see how it goes. -Henrik
Hi Richard, Richard Cochran <richardcochran@gmail.com> writes: > On Mon, Sep 18, 2017 at 04:06:28PM -0700, Vinicius Costa Gomes wrote: >> That's the point, the application does not need to know that, and asking >> that would be stupid. > > On the contrary, this information is essential to the application. > Probably you have never seen an actual Ethernet field bus in > operation? In any case, you are missing the point. > >> (And that's another nice point of how 802.1Qbv works, applications do >> not need to be changed to use it, and I think we should work to achieve >> this on the Linux side) > > Once you start to care about real time performance, then you need to > consider the applications. This is industrial control, not streaming > your tunes from your ipod. > >> That being said, that only works for kinds of traffic that maps well to >> this configuration in advance model, which is the model that the IEEE >> (see 802.1Qcc) and the AVNU Alliance[1] are pushing for. > > Again, you are missing the point of what they aiming for. I have > looked at a number of production systems, and in each case the > developers want total control over the transmission, in order to > reduce latency to an absolute minimum. Typically the data to be sent > are available only microseconds before the transmission deadline. > > Consider OpenAVB on github that people are already using. Take a look > at simple_talker.c and explain how "applications do not need to be > changed to use it." Just let me use the mention of OpenAVNU as a hook to explain what we (the team I am part of) are working to do, perhaps it will make our choices and designs clearer. One of the problems with OpenAVNU is that it's too coupled with the i210 NIC. One of the things we want is to decouple OpenAVNU from the controller. 
The way we thought best was to propose interfaces (that would work alongside the Linux networking stack) as close as possible to what the current standards define, meaning the IEEE 802.1Q family of specifications, in the hope that network controller vendors would also look at the specifications when designing their controllers. Our objective with the Qdiscs we are proposing (both cbs and taprio) is to provide a sane way to configure controllers that support TSN features (we were looking specifically at the IEEE specs). After we have some rough consensus on the interfaces to use, then we can start working on OpenAVNU. > >> [1] >> http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/ > > Did you even read this? > > [page 24] > > As described in section 2, some industrial control systems require > predictable, very low latency and cycle-to-cycle variation to meet > hard real-time application requirements. In these systems, > multiple distributed controllers commonly synchronize their > sensor/actuator operations with other controllers by scheduling > these operations in time, typically using a repeating control > cycle. > ... > The gate control mechanism is itself a time-aware PTP application > operating within a bridge or end station port. > > It is an application, not a "god box." > >> In short, I see a per-packet transmission time and a per-queue schedule >> as solutions to different problems. > > Well, I can agree with that. For some non real-time applications, > bandwidth shaping is enough, and your Qdisc idea is sufficient. For > the really challenging TSN targets (industrial control, automotive), > your idea of an opaque schedule file won't fly. (Sorry if I am being annoying here, but the idea of an opaque schedule is not ours, that comes from the people who wrote the Qbv specification) I have a question, what about a controller that doesn't provide a way to set a per-packet transmission time, but does support Qbv/Qbu? 
What would be your proposal to configure it? (I think LaunchTime is something specific to the i210, right?) Cheers, -- Vinicius
On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote: > Hi, > > This patchset is an RFC on a proposal of how the Traffic Control subsystem can > be used to offload the configuration of traffic shapers into network devices > that provide support for them in HW. Our goal here is to start upstreaming > support for features related to the Time-Sensitive Networking (TSN) set of > standards into the kernel. I'm very excited to see these features moving into the kernel! I am one of the maintainers of the OpenAvnu project and I've been involved in building AVB/TSN systems and working on the standards for around 10 years, so the support that's been slowly making it into more silicon and now Linux drivers is very encouraging. My team at Harman is working on endpoint code based on what's in the OpenAvnu project and a few Linux-based platforms. The Qav interface you've proposed will fit nicely with our traffic shaper management daemon, which already uses mqprio as a base but uses the htb shaper to approximate the Qav credit-based shaper on platforms where launch time scheduling isn't available. I've applied your patches and plan on testing them in conjunction with our shaper manager to see if we run into any hitches, but I don't expect any problems. > As part of this work, we've assessed previous public discussions related to TSN > enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann > at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and > the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/). > > Please note that the patches provided as part of this RFC are implementing what > is needed only for 802.1Qav (FQTSS) only, but we'd like to take advantage of > this discussion and share our WIP ideas for the 802.1Qbv and 802.1Qbu interfaces > as well. The current patches are only providing support for HW offload of the > configs. 
> > > Overview > ======== > > Time-sensitive Networking (TSN) is a set of standards that aim to address > resource availability for providing bandwidth reservation and bounded latency > on Ethernet based LANs. The proposal described here aims to cover mainly what is > needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and > 802.1Qbu. > > The initial target of this work is the Intel i210 NIC, but other controllers' > datasheets were also taken into account, like the Renesas RZ/A1H RZ/A1M group and > the Synopsys DesignWare Ethernet QoS controller. Recent SoCs from NXP (the i.MX 6 SoloX, and all the i.MX 7 and 8 parts) support Qav shaping as well as scheduled launch functionality; these are the parts I have been mostly working with. Marvell silicon (some subset of Armada processors and Link Street DSA switches) generally supports traffic shaping as well. I think a lack of an interface like this has probably slowed upstream driver support for this functionality where it exists; most vendors have an out-of-tree version of their driver with TSN functionality enabled via non-standard interfaces. Hopefully making it available will encourage vendors to upstream their driver support! > Proposal > ======== > > Feature-wise, what is covered here are configuration interfaces for HW > implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper > (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while > Qbv and Qbu must be configured per port, with the configuration covering all > queues. Given that these features are related to traffic shaping, and that the > traffic control subsystem already provides a queueing discipline that offloads > config into the device driver (i.e. mqprio), designing new qdiscs for the > specific purpose of offloading the config for each shaper seemed like a good > fit. This makes sense to me too. 
The 802.1Q standards are all based on the sort of mappings between priority, traffic class, and hardware queues that the existing tc infrastructure seems to be modeling. I believe the mqprio module's mapping scheme is flexible enough to meet any TSN needs in conjunction with the other parts of the kernel qdisc system.

> For steering traffic into the correct queues, we use the socket option
> SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues.
> The qdisc mqprio is currently used in our tests.
>
> As for the shapers config interface:
>
> * CBS (802.1Qav)
>
> This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
> $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
>      idleslope I
>
> Note that the parameters for this qdisc are the ones defined by the
> 802.1Q-2014 spec, so no hardware specific functionality is exposed here.

These parameters look good to me as a baseline; some additional optional parameters may be useful for software-based implementations, such as setting an interval at which to recalculate queues, but those can be discussed later.

> * Time-aware shaper (802.1Qbv):

I haven't come across any specific NIC or SoC MAC that does Qbv, but I have been experimenting with an EspressoBin board, which has a "Topaz" DSA switch in it that has some features intended for Qbv support, although they were done with a draft version in mind. I haven't looked at the interaction between the qdisc subsystem and DSA yet, but this mechanism might be useful to configure Qbv on the slave ports in that context. I've got both the board and the documentation, so I might be able to work on an implementation at some point.

If some endpoint device shows up with direct Qbv support, this interface would probably work well there too, although a talker would need to be able to schedule its transmits pretty precisely to achieve the lowest possible latency.
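Since the 'cbs' qdisc takes the 802.1Q-2014 parameters directly, it may help to see how the four values relate to each other. The sketch below is purely illustrative (the function name, units, and example numbers are mine, not from the patchset); it derives sendslope, hicredit and locredit from an idleslope reservation using the standard credit-based shaper relationships.

```python
def cbs_params(idle_slope_bps, port_rate_bps, max_frame_bytes, max_interference_bytes):
    """Derive the remaining 'cbs' qdisc parameters from an idleSlope
    reservation, following the 802.1Q credit-based shaper relationships.
    Slopes are in bits/s, credits in bits."""
    send_slope = idle_slope_bps - port_rate_bps  # sendSlope is always negative
    # hiCredit: credit gained while the largest interfering frame transmits
    hi_credit = max_interference_bytes * 8 * idle_slope_bps // port_rate_bps
    # loCredit: credit spent while our own largest frame transmits
    lo_credit = max_frame_bytes * 8 * send_slope // port_rate_bps
    return send_slope, hi_credit, lo_credit

# Example: a 25 Mbit/s reservation on a 100 Mbit/s port, 1000-byte stream
# frames, 1500-byte worst-case interfering frame.
send, hi, lo = cbs_params(25_000_000, 100_000_000, 1000, 1500)
```

With these inputs the credits come out to 3000 and -6000 bits; real deployments would plug in the values computed from their SRP reservations.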
> The idea we are currently exploring is to add a "time-aware", priority based
> qdisc, that also exposes the Tx queues available and provides a mechanism for
> mapping priority <-> traffic class <-> Tx queues in a similar fashion as
> mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
>
> $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>      map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>      queues 0 1 2 3 \
>      sched-file gates.sched [base-time <interval>] \
>      [cycle-time <interval>] [extension-time <interval>]

One concern here is calling the base-time parameter an interval; it's really an absolute time with respect to the PTP timescale. Good documentation will be important for this one, since the specification discusses some subtleties regarding the impact of different time values chosen here.

The format for specifying the actual intervals, such as cycle-time, could prove to be an important detail as well; Qbv specifies cycle-time as a ratio of two integers expressed in seconds, while extension-time is specified as an integer number of nanoseconds.

Precision with the cycle-time is especially important, since base-time can be almost arbitrarily far in the past or future, and any given cycle start should be calculable from the base-time plus/minus some integer multiple of cycle-time.

> <file> is multi-line, with each line being of the following format:
> <cmd> <gate mask> <interval in nanoseconds>
>
> Qbv only defines one <cmd>: "S" for 'SetGates'
>
> For example:
>
> S 0x01 300
> S 0x03 500
>
> This means that there are two intervals, the first will have the gate
> for traffic class 0 open for 300 nanoseconds, the second will have
> both traffic classes open for 500 nanoseconds.
>
> Additionally, an option to set just one entry of the gate control list will
> also be provided by 'taprio':
>
> $ tc qdisc (...) \
>      sched-row <row number> <cmd> <gate mask> <interval> \
>      [base-time <interval>] [cycle-time <interval>] \
>      [extension-time <interval>]

If I understand correctly, 'sched-row' is meant to be usable multiple times in a single command, and the 'sched-file' option is just a shorthand version for large tables? Or is it meant to update an existing schedule table? It doesn't seem very useful if it can only be specified once when the whole taprio instance is being established.

> * Frame Preemption (802.1Qbu):
>
> To control even further the latency, it may prove useful to signal which
> traffic classes are marked as preemptable. For that, 'taprio' provides the
> preemption command so you set each traffic class as preemptable or not:
>
> $ tc qdisc (...) \
>      preemption 0 1 1 1
>
>
> * Time-aware shaper + Preemption:
>
> As an example of how Qbv and Qbu can be used together, we may specify
> both the schedule and the preempt-mask, and this way we may also
> specify the Set-Gates-and-Hold and Set-Gates-and-Release commands as
> specified in the Qbu spec:
>
> $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>      map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>      queues 0 1 2 3 \
>      preemption 0 1 1 1 \
>      sched-file preempt_gates.sched
>
> <file> is multi-line, with each line being of the following format:
> <cmd> <gate mask> <interval in nanoseconds>
>
> For this case, two new commands are introduced:
>
> "H" for 'set gates and hold'
> "R" for 'set gates and release'
>
> H 0x01 300
> R 0x03 500

The new Hold and Release gate commands look right, but I'm not sure about the preemption flags. Qbu describes a preemption parameter table indexed by *priority* rather than traffic class or queue. These select which of two MAC service interfaces is used by the frame at the ISS layer, either express or preemptable, at the time the frame is selected for transmit.
If my understanding is correct, it's possible to map a preemptable priority as well as an express priority to the same queue, so flagging preemptability at the queue level is not correct.

I'm not aware of any endpoint interfaces that support Qbu either, nor do I know of any switches that support it that someone could experiment with right now, so there's no pressure on getting that interface nailed down yet.

Hopefully you find this feedback useful, and I appreciate the effort taken to get the RFC posted here!

Levi
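The base-time/cycle-time arithmetic discussed above can be made concrete with a small sketch (mine, purely illustrative, not from any posted patch). It evaluates which gates a control list like the `S 0x01 300 / S 0x03 500` example holds open at a given instant; note that the modulo gives a well-defined phase whether base-time lies in the past or the future, which is exactly why cycle-time precision matters.

```python
def active_gates(now_ns, base_ns, entries):
    """Return the (cmd, gate mask) in effect at now_ns for a gate control
    list of (cmd, mask, interval_ns) entries that repeats every cycle.
    Python's % always yields a non-negative offset into the cycle, so this
    works for base times arbitrarily far in the past or future."""
    cycle_ns = sum(interval for _, _, interval in entries)
    offset = (now_ns - base_ns) % cycle_ns
    for cmd, mask, interval in entries:
        if offset < interval:
            return cmd, mask
        offset -= interval

# The example schedule from the RFC: TC0 open for 300 ns, then TC0+TC1 for 500 ns
sched = [("S", 0x01, 300), ("S", 0x03, 500)]
```

For instance, 100 ns into a cycle only gate 0x01 is open, while 400 ns in both gates (mask 0x03) are open.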
On Mon, Sep 18, 2017, Richard Cochran wrote:
> Just for the record, here is my score card showing the current status
> of TSN support in Linux. Comments and corrections are most welcome.
>
> Thanks,
> Richard
>
>
> | FEATURE                                        | STANDARD            | STATUS                       |
> |------------------------------------------------+---------------------+------------------------------|
> | Synchronization                                | 802.1AS-2011        | Implemented in               |
> |                                                |                     | - Linux kernel PHC subsystem |
> |                                                |                     | - linuxptp (userspace)       |
> |------------------------------------------------+---------------------+------------------------------|

An alternate implementation of the userspace portion of gPTP is also available at [1]

> | Forwarding and Queuing Enhancements            | 802.1Q-2014 sec. 34 | RFC posted (this thread)     |
> | for Time-Sensitive Streams (FQTSS)             |                     |                              |
> |------------------------------------------------+---------------------+------------------------------|
> | Stream Reservation Protocol (SRP)              | 802.1Q-2014 sec. 35 | in Open-AVB [1]              |
> |------------------------------------------------+---------------------+------------------------------|
> | Audio Video Transport Protocol (AVTP)          | IEEE 1722-2011      | DNE                          |
> |------------------------------------------------+---------------------+------------------------------|
> | Audio/Video Device Discovery, Enumeration,     | IEEE 1722.1-2013    | jdksavdecc-c [2]             |
> | Connection Management and Control (AVDECC)     |                     |                              |
> | AVDECC Connection Management Protocol (ACMP)   |                     |                              |
> | AVDECC Enumeration and Control Protocol (AECP) |                     |                              |
> | MAC Address Acquisition Protocol (MAAP)        |                     | in Open-AVB                  |
> |------------------------------------------------+---------------------+------------------------------|

All of the above are available to some degree in the AVTP Pipeline part of [1], specifically at this location:

https://github.com/AVnu/OpenAvnu/tree/master/lib/avtp_pipeline

The code is very modular and configurable, although some parts are in better shape than others.
The AVTP portion can use the custom userspace driver for the i210, which can be configured to use launch scheduling, or it can use standard kernel interfaces via sendmsg or PACKET_MMAP. It runs as-is when configured for standard interfaces with any network hardware that supports gPTP.

I previously implemented a CMSG-based launch time scheduling mechanism like the one you have proposed, and I have a socket backend for it that could easily be ported to your proposal. It is not part of the repository yet since there's no kernel support for it outside of my prototype and your RFC.

It is currently tied to the OpenAvnu gPTP daemon rather than linuxptp, as it uses a shared memory interface to get the current rate-ratio and offset information between the various clocks. There may be better ways to do this, but that's how the initial port of the codebase was done. It would be nice to get it working with linuxptp's userspace tools at some point as well, though.

The libraries under avtp_pipeline are designed to be used separately, but a simple integrated application is provided and is built by the CI system.

In addition to OpenAvnu, Renesas has a number of github repositories with what looks like a fairly complete media streaming system:

https://github.com/renesas-rcar/avb-mse
https://github.com/renesas-rcar/avb-streaming
https://github.com/renesas-rcar/avb-applications

I haven't examined them in great detail yet, though.

> | Frame Preemption                               | P802.1Qbu           | DNE                          |
> | Scheduled Traffic                              | P802.1Qbv           | RFC posted (SO_TXTIME)       |
> | SRP Enhancements and Performance Improvements  | P802.1Qcc           | DNE                          |
>
> DNE = Does Not Exist (to my knowledge)

Although your SO_TXTIME proposal could certainly form the basis of an endpoint's implementation of Qbv, I think it is a stretch to consider it a Qbv implementation in itself, if that's what you mean by this.
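For readers unfamiliar with the CMSG approach mentioned here: the per-packet launch time rides as ancillary data on sendmsg(). The sketch below only shows the packing of such a control message; the level/type constants are placeholders I made up, since at the time of this thread no kernel ABI for this existed beyond the prototypes and the SO_TXTIME RFC being discussed.

```python
import struct

# Placeholder constants -- NOT a real kernel ABI. They stand in for whatever
# cmsg level/type a merged SO_TXTIME-style interface would define.
SOL_SOCKET_LEVEL = 1
SCM_TXTIME_TYPE = 61

def txtime_cmsg(launch_ns):
    """Build the (level, type, data) triple for sendmsg() ancillary data
    carrying a single u64 launch time in nanoseconds."""
    return (SOL_SOCKET_LEVEL, SCM_TXTIME_TYPE, struct.pack("=Q", launch_ns))

# Usage sketch (not run here):
#   sock.sendmsg([payload], [txtime_cmsg(nic_now_ns + 500_000)])
```

The appeal of this shape is that the socket API stays generic: the qdisc or driver underneath decides whether the timestamp is honored in software or offloaded to hardware like the i210's LaunchTime.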
I have been working with colleagues on some experiments relating to a Linux-controlled DSA switch (a Marvell Topaz) that are a part of this effort in TSN:

http://ieee802.org/1/files/public/docs2017/tsn-cgunther-802-3cg-multidrop-0917-v01.pdf

The proper interfaces for the Qbv configuration and managing of switch-level PTP timestamps are not yet in place, so there's nothing even at RFC stage to present yet, but Qbv-capable Linux-managed switch hardware is available and we hope to get some reusable code published even if it's not yet ready to be integrated in the kernel.

>
> 1. https://github.com/Avnu/OpenAvnu
>
> (DISCLAIMER from the website:)
>
> It is planned to eventually include the various packet encapsulation types,
> protocol discovery daemons, libraries to convert media clocks to AVB clocks
> and vice versa, and drivers.
>
> This repository does not include all components required to build a full
> production AVB/TSN system (e.g. a turnkey solution to stream stored or live audio
> or video content). Some simple example applications are provided which
> illustrate the flow - but a professional Audio/Video system requires a full media stack
> - including audio and video inputs and outputs, media processing elements, and
> various graphical user interfaces. Various companies provide such integrated
> solutions.

A bit of progress has been made since that was written, although it is true that it's still not quite complete and certainly not turnkey. The most glaring absence at the moment is the media clock recovery portion of AVTP, but I am actively working on this.

>
> 2. https://github.com/jdkoftinoff/jdksavdecc-c

This is pulled in as a dependency of the AVDECC code in OpenAvnu; it's used in the command line driven controller, but not in the avtp_pipeline code that implements the endpoint AVDECC behavior. I don't think either are complete by any means, but they are complete enough to be mostly compliant and usable in the subset of behavior they support.
The bulk of the command line controller is a clone of: https://github.com/audioscience/avdecc-lib Things are maybe a bit farther along than they seemed, but there is still important kernel work to be done to reduce the need for out-of-tree drivers and to get everyone on the same interfaces. I plan to be an active participant going forward. Levi
On Tue, Sep 19, 2017 at 05:19:18PM -0700, Vinicius Costa Gomes wrote:
> One of the problems with OpenAVNU is that it's too coupled with the i210
> NIC. One of the things we want is to decouple OpenAVNU from the
> controller.

Yes, I want that, too.

> The way we thought best was to propose interfaces (that
> would work alongside the Linux networking stack) as close as
> possible to what the current standards define, that means the IEEE
> 802.1Q family of specifications, in the hope that network controller
> vendors would also look at the specifications when designing their
> controllers.

These standards define the *behavior*, not the programming APIs. Our task as kernel developers is to invent the best interfaces for supporting 802.1Q and other standards, the hardware capabilities, and the widest range of applications (not just AVB).

> Our objective with the Qdiscs we are proposing (both cbs and taprio) is
> to provide a sane way to configure controllers that support TSN features
> (we were looking specifically at the IEEE specs).

I can see how your proposed Qdiscs are inspired by the IEEE standards. However, in the case of time based transmission, I think there is a better way to do it, namely with SO_TXTIME (which BTW was originally proposed by Eric Mann).

> After we have some rough consensus on the interfaces to use, then we can
> start working on OpenAVNU.

Did you see my table in the other mail? Any comments?

> (Sorry if I am being annoying here, but the idea of an opaque schedule
> is not ours, that comes from the people who wrote the Qbv specification)

The schedule is easy to implement using SO_TXTIME.

> I have a question, what about a controller that doesn't provide a way to
> set a per-packet transmission time, but it supports Qbv/Qbu. What would
> be your proposal to configure it?

SO_TXTIME will have a generic SW fallback.

BTW, regarding the i210, there is no sensible way to configure both CBS and time based transmission at the same time.
The card performs a logical AND to make the launch decision. The effect of this is that each and every packet needs a LaunchTime, and the driver would be forced to guess the time for a packet before entering it into its queue.

So if we end up merging CBS and SO_TXTIME, then we'll have to make them exclusive of each other (in the case of the i210) and manage the i210 queue configurations correctly.

> (I think LaunchTime is something specific to the i210, right?)

To my knowledge yes. However, if TSN does take hold, then other MAC vendors will copy it.

Thanks,
Richard
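A toy model of the launch decision Richard describes (my own simplification, not driver code) shows why mixing the two modes is awkward: with per-packet launch times enabled, a descriptor whose time field was never set simply never becomes eligible, so the driver has to invent one.

```python
def i210_eligible(now_ns, credit_bits, launch_ns):
    """Toy model of the i210 gating: with launch times enabled, the CBS
    credit check is ANDed with the time check, so both must pass."""
    credit_ok = credit_bits >= 0
    time_ok = launch_ns is not None and now_ns >= launch_ns
    return credit_ok and time_ok

def fallback_launch(launch_ns, nic_now_ns, guess_delta_ns):
    """Driver-side guess for packets queued without a launch time."""
    return launch_ns if launch_ns is not None else nic_now_ns + guess_delta_ns
```

In this model a packet with negative credit or a missing launch time never transmits, which is the motivation for either making the features mutually exclusive or having the driver assign a guessed time to every descriptor.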
On Tue, Sep 19, 2017 at 11:17:54PM -0600, levipearson@gmail.com wrote:
> In addition to OpenAvnu, Renesas has a number of github repositories with what looks like a fairly
> complete media streaming system:

Is it a generic stack or a set of hacks for their HW?

> Although your SO_TXTIME proposal could certainly form the basis of an endpoint's implementation of Qbv, I
> think it is a stretch to consider it a Qbv implementation in itself, if that's what you mean by this.

No, that is not what I meant. We need some minimal additional kernel support in order to fully implement the TSN family of standards. Of course, the bulk will have to be done in user space. It would be a mistake to cram the stuff that belongs in userland into the kernel.

Looking at the table, and reading your descriptions of the state of OpenAVB, I remain convinced that the kernel needs only three additions:

1. SO_TXTIME
2. CBS Qdisc
3. ALSA support for DAC clock control (but that is another story)

> The proper interfaces for the Qbv configuration and managing of switch-level PTP timestamps are not yet
> in place, so there's nothing even at RFC stage to present yet, but Qbv-capable Linux-managed switch
> hardware is available and we hope to get some reusable code published even if it's not yet ready to be
> integrated in the kernel.

Right, configuring Qbv in an attached DSA switch needs its own interface. Regarding PHC support for DSA switches, I have something in the works to be published soon.

> A bit of progress has been made since that was written, although it is true that it's still not
> quite complete and certainly not turnkey.

So OpenAVB is neither complete nor turnkey. That was my impression, too.

> Things are maybe a bit farther along than they seemed, but there is still important kernel work to be
> done to reduce the need for out-of-tree drivers and to get everyone on the same interfaces. I plan
> to be an active participant going forward.
You mentioned a couple of different kernel things you implemented. I would encourage you to post the work already done. Thanks, Richard
On Tue, Sep 19, 2017 at 07:59:11PM -0600, levipearson@gmail.com wrote:
> If some endpoint device shows up with direct Qbv support, this interface would
> probably work well there too, although a talker would need to be able to
> schedule its transmits pretty precisely to achieve the lowest possible latency.

This is an argument for SO_TXTIME.

> One concern here is calling the base-time parameter an interval; it's really
> an absolute time with respect to the PTP timescale. Good documentation will
> be important to this one, since the specification discusses some subtleties
> regarding the impact of different time values chosen here.
>
> The format for specifying the actual intervals such as cycle-time could prove
> to be an important detail as well; Qbv specifies cycle-time as a ratio of two
> integers expressed in seconds, while extension-time is specified as an integer
> number of nanoseconds.
>
> Precision with the cycle-time is especially important, since base-time can be
> almost arbitrarily far in the past or future, and any given cycle start should
> be calculable from the base-time plus/minus some integer multiple of
> cycle-time.

The above three points also.

Thanks,
Richard
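The cycle-time precision point can be made concrete with exact rational arithmetic (an illustrative calculation of my own): representing a 1/3-second cycle as a truncated integer nanosecond count drifts by a full millisecond after three million cycles, while the rational form Qbv specifies stays exact.

```python
from fractions import Fraction

NS = 10**9
cycle = Fraction(1, 3)        # Qbv-style rational cycle-time, in seconds
n = 3_000_000                 # number of elapsed cycles

exact_ns = cycle * n * NS     # exact start of cycle n, in ns (10**15 exactly)
truncated_ns = (NS // 3) * n  # naive integer-nanosecond cycle-time (333333333 ns)
drift_ns = int(exact_ns) - truncated_ns   # accumulated error: 1 ms
```

Since base-time may lie arbitrarily far in the past, an implementation that stores cycle-time as truncated nanoseconds would compute cycle boundaries that are visibly out of phase with a peer that keeps the exact ratio.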
On Tue, Sep 19, 2017 at 05:19:18PM -0700, Vinicius Costa Gomes wrote:
> (I think LaunchTime is something specific to the i210, right?)
Levi just told us:
Recent SoCs from NXP (the i.MX 6 SoloX, and all the i.MX 7 and 8
parts) support Qav shaping as well as scheduled launch
functionality;
Thanks,
Richard
Hi,

On 09/19/2017 10:49 PM, Richard Cochran wrote:

(...)

>
> No, that is not what I meant. We need some minimal additional kernel
> support in order to fully implement the TSN family of standards. Of
> course, the bulk will have to be done in user space. It would be a
> mistake to cram the stuff that belongs in userland into the kernel.
>
> Looking at the table, and reading your descriptions of the state of
> OpenAVB, I remain convinced that the kernel needs only three
> additions:
>
> 1. SO_TXTIME
> 2. CBS Qdisc
> 3. ALSA support for DAC clock control (but that is another story)

We'll be posting the CBS v1 series for review soon.

The current SO_TXTIME RFC for the purpose of Launchtime looks great, and we are looking forward to the v1 + its companion qdisc so we can test / review and provide feedback.

We are still under the impression that a config interface for HW offload of Qbv / Qbu config will be needed, but we'll be deferring the 'taprio' proposal until there are NICs (end stations) that support these standards available. We can revisit it if that ever happens, and if it's still needed, but then taking into account SO_TXTIME (and its related qdisc).

Thanks everyone for all the feedback so far.

Regards,
Jesus
Hi Richard,

On 09/19/2017 10:25 PM, Richard Cochran wrote:

(...)

>
>> I have a question, what about a controller that doesn't provide a way to
>> set a per-packet transmission time, but it supports Qbv/Qbu. What would
>> be your proposal to configure it?
>
> SO_TXTIME will have a generic SW fallback.
>
> BTW, regarding the i210, there is no sensible way to configure both
> CBS and time based transmission at the same time. The card performs a
> logical AND to make the launch decision. The effect of this is that
> each and every packet needs a LaunchTime, and the driver would be
> forced to guess the time for a packet before entering it into its
> queue.
>
> So if we end up merging CBS and SO_TXTIME, then we'll have to make
> them exclusive of each other (in the case of the i210) and manage the
> i210 queue configurations correctly.
>

I've run some quick tests here having launch time enabled on i210 + our cbs patchset. When valid Launch times are set on each packet you still get the expected behavior, so I'm not sure we should just make them exclusive of each other.

I also did some tests where you don't set valid launch times, here using your idea from above, so with the driver calculating a valid launch time (i.e. current NIC time + X ns, varying X across tests) for packets that didn't have it set by the user, and I wasn't too happy with its reliability. It could definitely be improved, but it has left me wondering: instead, what about documenting that if you enable TXTIME, then you *must* provide a valid Launch time for all packets on traffic classes that are affected? With the SO_TXTIME qdisc idea in place, that could even be enforced before packets were enqueued into the netdevice.

Regards,
Jesus
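The "must provide a valid launch time" rule suggested here could be enforced at enqueue time rather than left to a driver-side guess. A rough sketch of such a check (hypothetical, not from any posted patch; names are mine):

```python
class TxTimeRequired(Exception):
    """Raised when a packet lacks a launch time on a txtime-enabled class."""
    pass

def check_enqueue(tc, launch_ns, txtime_enabled_tcs):
    """Reject packets on txtime-enabled traffic classes that carry no
    launch time, instead of letting the driver invent one later."""
    if tc in txtime_enabled_tcs and launch_ns is None:
        raise TxTimeRequired(f"traffic class {tc} requires a launch time")

check_enqueue(2, 123_456, {2, 3})   # accepted: launch time present
check_enqueue(0, None, {2, 3})      # accepted: TC0 is not txtime-enabled
```

Doing the rejection at the qdisc boundary keeps the policy visible to applications (they get an error) instead of silently degrading timing behavior in the driver.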
On Wed, Oct 18, 2017 at 03:37:35PM -0700, Jesus Sanchez-Palencia wrote:
> I also did some tests where you don't set valid launch times, here using
> your idea from above, so with the driver calculating a valid launch time (i.e.
> current NIC time + X ns, varying X across tests) for packets that didn't have it
> set by the user, and I wasn't too happy with its reliability. It could
> definitely be improved, but it has left me wondering: instead, what about
> documenting that if you enable TXTIME, then you *must* provide a valid Launch
> time for all packets on traffic classes that are affected?

If txtime is enabled, then CBS is pointless because the txtime already specifies the bandwidth implicitly.

The problem is that when one program uses txtime and another uses CBS, the CBS user will experience really wrong performance.

Thanks,
Richard
Hi,

On 10/19/2017 01:39 PM, Richard Cochran wrote:
> On Wed, Oct 18, 2017 at 03:37:35PM -0700, Jesus Sanchez-Palencia wrote:
>> I also did some tests where you don't set valid launch times, here using
>> your idea from above, so with the driver calculating a valid launch time (i.e.
>> current NIC time + X ns, varying X across tests) for packets that didn't have it
>> set by the user, and I wasn't too happy with its reliability. It could
>> definitely be improved, but it has left me wondering: instead, what about
>> documenting that if you enable TXTIME, then you *must* provide a valid Launch
>> time for all packets on traffic classes that are affected?
>
> If txtime is enabled, then CBS is pointless because the txtime already
> specifies the bandwidth implicitly.

Assuming there is no "interfering" traffic on that traffic class, yes. Otherwise, CBS could be configured just to avoid that outbound traffic ever goes beyond the reserved bandwidth.

>
> The problem is that when one program uses txtime and another uses CBS,
> the CBS user will experience really wrong performance.

Good point. We'll need to adjust the launch time for controllers that behave like the i210 then, imo.

Thanks,
Jesus

>
> Thanks,
> Richard
>