mbox series

[RFC,net-next,0/6] tc-taprio offload for SJA1105 DSA

Message ID 20190707172921.17731-1-olteanv@gmail.com
Headers show
Series tc-taprio offload for SJA1105 DSA | expand

Message

Vladimir Oltean July 7, 2019, 5:29 p.m. UTC
Using Vinicius Costa Gomes' configuration interface for 802.1Qbv (later
resent by Voon Weifeng for the stmmac driver), I am submitting for
review a draft implementation of this offload for a DSA switch.

I don't want to insist too much on the hardware specifics of SJA1105
which isn't otherwise very compliant to the IEEE spec.

In order to be able to test with Vedang Patel's iproute2 patch for
taprio offload (https://www.spinics.net/lists/netdev/msg573072.html)
I had to actually revert the txtime-assist branch as it had changed the
iproute2 interface.

In terms of impact for DSA drivers, I would like to point out that:

- Maybe somebody should pre-populate qopt->cycle_time in case the user
  does not provide one. Otherwise each driver needs to iterate over the
  GCL once, just to set the cycle time (right now stmmac does as well).

- Configuring the switch over SPI cannot apparently be done from this
  ndo_setup_tc callback because it runs in atomic context. I also have
  some downstream patches to offload tc clsact matchall with mirred
  action, but in that case it looks like the atomic context restriction
  does not apply.

- I had to copy the struct tc_taprio_qopt_offload to driver private
  memory because a static config needs to be constructed every time a
  change takes place, and there are up to 4 switch ports that may take a
  TAS configuration. I have created a private
  tc_taprio_qopt_offload_copy() helper for this - I don't know whether
  it's of any help in the general case.

There is more to be done however. The TAS needs to be integrated with
the PTP driver. This is because with a PTP clock source, the base time
is written dynamically to the PTPSCHTM (PTP schedule time) register and
must be a time in the future. Then the "real" base time of each port's
TAS config can be offset by at most ~50 ms (the DELTA field from the
Schedule Entry Points Table) relative to PTPSCHTM.
Because base times in the past are completely ignored by this hardware,
we need to decide if it's ok behaviorally for a driver to "roll" a past
base time into the immediate future by incrementally adding the cycle
time (so the phase doesn't change). If it is, then decide by how long in
the future it is ok to do so. Or alternatively, is it preferable if the
driver errors out if the user-supplied base time is in the past and the
hardware doesn't like it? But even then, there might be fringe cases
when the base time becomes a past PTP time right as the driver tries to
apply the config.
Also applying a tc-taprio offload to a second SJA1105 switch port will
inevitably need to roll the first port's (now past) base time into an
equivalent future time.
All of this is going to be complicated even further by the fact that
resetting the switch (to apply the tc-taprio offload) makes it reset its
PTP time.

Vinicius Costa Gomes (1):
  taprio: Add support for hardware offloading

Vladimir Oltean (5):
  Revert "Merge branch 'net-sched-Add-txtime-assist-support-for-taprio'"
  net: dsa: Pass tc-taprio offload to drivers
  net: dsa: sja1105: Add static config tables for scheduling
  net: dsa: sja1105: Advertise the 8 TX queues
  net: dsa: sja1105: Configure the Time-Aware Shaper via tc-taprio
    offload

 drivers/net/dsa/sja1105/Kconfig               |   8 +
 drivers/net/dsa/sja1105/Makefile              |   4 +
 drivers/net/dsa/sja1105/sja1105.h             |   6 +
 .../net/dsa/sja1105/sja1105_dynamic_config.c  |   8 +
 drivers/net/dsa/sja1105/sja1105_main.c        |  19 +-
 .../net/dsa/sja1105/sja1105_static_config.c   | 167 +++++
 .../net/dsa/sja1105/sja1105_static_config.h   |  48 +-
 drivers/net/dsa/sja1105/sja1105_tas.c         | 452 ++++++++++++
 drivers/net/dsa/sja1105/sja1105_tas.h         |  22 +
 drivers/net/ethernet/intel/igb/igb_main.c     |   1 -
 include/linux/netdevice.h                     |   1 +
 include/net/dsa.h                             |   3 +
 include/net/pkt_sched.h                       |  18 +
 include/uapi/linux/pkt_sched.h                |  11 +-
 net/dsa/slave.c                               |  14 +
 net/dsa/tag_sja1105.c                         |   3 +-
 net/sched/sch_etf.c                           |  10 -
 net/sched/sch_taprio.c                        | 666 ++++++++----------
 18 files changed, 1060 insertions(+), 401 deletions(-)
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h

Comments

Andrew Lunn July 7, 2019, 5:47 p.m. UTC | #1
> - Configuring the switch over SPI cannot apparently be done from this
>   ndo_setup_tc callback because it runs in atomic context. I also have
>   some downstream patches to offload tc clsact matchall with mirred
>   action, but in that case it looks like the atomic context restriction
>   does not apply.

There have been similar problems in the past. We can probably have the
DSA layer turn it into a notifier. Look at the dsa_port_mdb_*
functions for example.

	  Andrew
Vladimir Oltean July 7, 2019, 8:28 p.m. UTC | #2
On Sun, 7 Jul 2019 at 20:47, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > - Configuring the switch over SPI cannot apparently be done from this
> >   ndo_setup_tc callback because it runs in atomic context. I also have
> >   some downstream patches to offload tc clsact matchall with mirred
> >   action, but in that case it looks like the atomic context restriction
> >   does not apply.
>
> There have been similar problems in the past. We can probably have the
> DSA layer turn it into a notifier. Look at the dsa_port_mdb_*
> functions for example.
>
>           Andrew

Ok, thanks. I thought the dsa_port_notify functions are just to be
called from switchdev. I'm still not sure I fully understand, but I'll
try to switch to that in v2 and see what happens.

-Vladimir
Vivien Didelot July 8, 2019, 4:10 p.m. UTC | #3
On Sun, 7 Jul 2019 23:28:24 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> I thought the dsa_port_notify functions are just to be
> called from switchdev. I'm still not sure I fully understand, but I'll
> try to switch to that in v2 and see what happens.

dsa_port_notify is used to forward an event occuring on a given port, to
all switch chips of a fabric. For example when programming a VLAN on a port,
all ports interconnecting switches together must to programmed too. Even
though mainly switchdev objects or attributes are notified at the moment,
a DSA notification is not specific to switchdev.


Thanks,

	Vivien
Vinicius Costa Gomes July 8, 2019, 11:28 p.m. UTC | #4
Hi Vladimir,

Vladimir Oltean <olteanv@gmail.com> writes:

> Using Vinicius Costa Gomes' configuration interface for 802.1Qbv (later
> resent by Voon Weifeng for the stmmac driver), I am submitting for
> review a draft implementation of this offload for a DSA switch.
>
> I don't want to insist too much on the hardware specifics of SJA1105
> which isn't otherwise very compliant to the IEEE spec.
>
> In order to be able to test with Vedang Patel's iproute2 patch for
> taprio offload (https://www.spinics.net/lists/netdev/msg573072.html)
> I had to actually revert the txtime-assist branch as it had changed the
> iproute2 interface.

Now, that Vedang's work was merged, I will send a rebased version of
the taprio offload series (also taking your feedback into account, see
below).

>
> In terms of impact for DSA drivers, I would like to point out that:
>
> - Maybe somebody should pre-populate qopt->cycle_time in case the user
>   does not provide one. Otherwise each driver needs to iterate over the
>   GCL once, just to set the cycle time (right now stmmac does as
> well).

Very fair, this should be very easy to do from taprio side.

>
> - Configuring the switch over SPI cannot apparently be done from this
>   ndo_setup_tc callback because it runs in atomic context. I also have
>   some downstream patches to offload tc clsact matchall with mirred
>   action, but in that case it looks like the atomic context restriction
>   does not apply.
>
> - I had to copy the struct tc_taprio_qopt_offload to driver private
>   memory because a static config needs to be constructed every time a
>   change takes place, and there are up to 4 switch ports that may take a
>   TAS configuration. I have created a private
>   tc_taprio_qopt_offload_copy() helper for this - I don't know whether
>   it's of any help in the general case.

If everyone needs to do this, perhaps we can think of something else,
one first idea is that taprio builds this configuration and gives
ownership of it to the driver, but we would need to add _ref()/_unref()
(or similar) helpers to the qdisc/driver API.

>
> There is more to be done however. The TAS needs to be integrated with
> the PTP driver. This is because with a PTP clock source, the base time
> is written dynamically to the PTPSCHTM (PTP schedule time) register and
> must be a time in the future. Then the "real" base time of each port's
> TAS config can be offset by at most ~50 ms (the DELTA field from the
> Schedule Entry Points Table) relative to PTPSCHTM.
> Because base times in the past are completely ignored by this hardware,
> we need to decide if it's ok behaviorally for a driver to "roll" a past
> base time into the immediate future by incrementally adding the cycle
> time (so the phase doesn't change).

That's another good piece of information. My understanding from reading
section 8.6.9.1.1 from the IEEE 802.1Q-2018 spec, is that it's ok:

	"""
	CycleStartTime = (OperBaseTime + N*OperCycleTime)
	where N is the smallest integer for which the relation:
	CycleStartTime >= CurrentTime
	would be TRUE.
	"""

> If it is, then decide by how long in
> the future it is ok to do so. Or alternatively, is it preferable if the
> driver errors out if the user-supplied base time is in the past and the
> hardware doesn't like it? But even then, there might be fringe cases
> when the base time becomes a past PTP time right as the driver tries to
> apply the config.

This fringe case is interesting. I don't know how to handle it well. The
first idea that comes to mind is for the driver to add a integer number
of cycles so there's enough time to apply the config. But this I think
will be different for every driver, no?

> Also applying a tc-taprio offload to a second SJA1105 switch port will
> inevitably need to roll the first port's (now past) base time into an
> equivalent future time.
> All of this is going to be complicated even further by the fact that
> resetting the switch (to apply the tc-taprio offload) makes it reset its
> PTP time.

This is going to be complicated indeed.


Thanks a lot,
--
Vinicius