diff mbox series

[RFC,v3,net-next,14/18] net/sched: Add HW offloading capability to TBS

Message ID 20180307011230.24001-15-jesus.sanchez-palencia@intel.com
State RFC, archived
Delegated to: David Miller
Headers show
Series Time based packet transmission | expand

Commit Message

Jesus Sanchez-Palencia March 7, 2018, 1:12 a.m. UTC
Add new queueing modes to tbs qdisc so HW offload is supported.

For hw offload, if sorting is on, then the time sorted list will still
be used, but when sorting is disabled the enqueue / dequeue flow will
be based on a 'raw' FIFO through the usage of qdisc_enqueue_tail() and
qdisc_dequeue_head(). For the 'raw hw offload' mode, the drop_if_late
flag from skbuffs is not used by the Qdisc since this mode implicitly
assumes the PHC clock is being used by applications.

Example 1:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
           map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 tbs offload

In this example, the Qdisc will use HW offload for the control of the
transmission time through the network adapter. It's assumed the timestamp
in skbuffs are in reference to the interface's PHC and setting any other
valid clockid would be treated as an error. Because there is no
scheduling being performed in the qdisc, setting a delta != 0 would also
be considered an error.

Example 2:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
           map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 tbs offload delta 100000 \
	   clockid CLOCK_REALTIME sorting

Here, the Qdisc will use HW offload for the txtime control again,
but now sorting will be enabled, and thus there will be scheduling being
performed by the qdisc. That is done based on the clockid CLOCK_REALTIME
reference and packets leave the Qdisc "delta" (100000) nanoseconds before
their transmission time. Because this will be using HW offload and
since dynamic clocks are not supported by the hrtimer, the system clock
and the PHC clock must be synchronized for this mode to behave as expected.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 include/net/pkt_sched.h        |   5 ++
 include/uapi/linux/pkt_sched.h |   1 +
 net/sched/sch_tbs.c            | 159 +++++++++++++++++++++++++++++++++++------
 3 files changed, 144 insertions(+), 21 deletions(-)

Comments

Thomas Gleixner March 21, 2018, 2:22 p.m. UTC | #1
On Tue, 6 Mar 2018, Jesus Sanchez-Palencia wrote:
> $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
>            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
> 
> $ tc qdisc add dev enp2s0 parent 100:1 tbs offload
> 
> In this example, the Qdisc will use HW offload for the control of the
> transmission time through the network adapter. It's assumed the timestamp
> in skbuffs are in reference to the interface's PHC and setting any other
> valid clockid would be treated as an error. Because there is no
> scheduling being performed in the qdisc, setting a delta != 0 would also
> be considered an error.

Which clockid will be handed in from the application? The network adapter
time has no fixed clockid. The only way you can get to it is via a fd based
posix clock and that does not work at all because the qdisc setup might
have a different FD than the application which queues packets.

I think this should look like this:

    clock_adapter:	1 = clock of the network adapter
    			0 = system clock selected by clock_system

    clock_system:	0 = CLOCK_REALTIME
    			1 = CLOCK_MONOTONIC

or something like that.

> Example 2:
> 
> $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
>            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
> 
> $ tc qdisc add dev enp2s0 parent 100:1 tbs offload delta 100000 \
> 	   clockid CLOCK_REALTIME sorting
> 
> Here, the Qdisc will use HW offload for the txtime control again,
> but now sorting will be enabled, and thus there will be scheduling being
> performed by the qdisc. That is done based on the clockid CLOCK_REALTIME
> reference and packets leave the Qdisc "delta" (100000) nanoseconds before
> their transmission time. Because this will be using HW offload and
> since dynamic clocks are not supported by the hrtimer, the system clock
> and the PHC clock must be synchronized for this mode to behave as expected.

So what you do here is queueing the packets in the qdisk and then schedule
them at some point ahead of actual transmission time for delivery to the
hardware. That delivery uses the same txtime as used for qdisc scheduling
to tell the hardware when the packet should go on the wire. That's needed
when the network adapter does not support queueing of multiple packets.

Bah, and probably there you need CLOCK_TAI because that's what PTP is based
on, so clock_system needs to accomodate that as well. Dammit, there goes
the simple 2 bits implementation. CLOCK_TAI is 11, so we'd need 4 clock
bits plus the adapter bit.

Though we could spare a bit. The fixed CLOCK_* space goes from 0 to 15. I
don't see us adding new fixed clocks, so we really can reserve #15 for
selecting the adapter clock if sparing that extra bit is truly required.

Thanks,

	tglx
Richard Cochran March 21, 2018, 3:03 p.m. UTC | #2
On Wed, Mar 21, 2018 at 03:22:11PM +0100, Thomas Gleixner wrote:
> Which clockid will be handed in from the application? The network adapter
> time has no fixed clockid. The only way you can get to it is via a fd based
> posix clock and that does not work at all because the qdisc setup might
> have a different FD than the application which queues packets.

Duh.  That explains it.  Please ignore my "why not?" Q in the other thread...

Thanks,
Richard
Thomas Gleixner March 21, 2018, 4:18 p.m. UTC | #3
On Wed, 21 Mar 2018, Richard Cochran wrote:

> On Wed, Mar 21, 2018 at 03:22:11PM +0100, Thomas Gleixner wrote:
> > Which clockid will be handed in from the application? The network adapter
> > time has no fixed clockid. The only way you can get to it is via a fd based
> > posix clock and that does not work at all because the qdisc setup might
> > have a different FD than the application which queues packets.
> 
> Duh.  That explains it.  Please ignore my "why not?" Q in the other thread...

:)

So in that case you are either bound to rely on the application to use the
proper dynamic clock or if we need a sanity check, then you need a cookie
of some form which can be retrieved from the posix clock file descriptor
and handed in as 'clockid' together with clock_adapter = true.

That's doable, but that needs a bit more trickery. A simple unique ID per
dynamic posix-clock would be trivial to add, but that would not give you
any form of verification whether this ID actually belongs to the network
adapter or not.

So either you ignore the clockid and rely on the application not being
stupid when it says "clock_adpater = true" or you need some extra
complexity to build an association of a "clockid" to a network adapter.

There is a connection already, via

     adapter->ptp_clock->devid

which is MKDEV(major, index) which is accessible at least at the network
driver level, but probably not from networking core. So you'd need to drill
a few more holes by adding yet another callback to net_device_ops.

I'm not sure if its worth the trouble. If the application hands in bogus
timestamps, packets go out at the wrong time or are dropped. That's true
whether it uses the proper clock or not. So nothing the kernel should
really worry about.

For clock_system - REAL/MONO/TAI(sigh) - you surely need a sanity check,
but that is independent of the underlying network adapater even in the
qdisc assisted HW offload case.

Thanks,

	tglx
Jesus Sanchez-Palencia March 22, 2018, 10:01 p.m. UTC | #4
Hi,


On 03/21/2018 09:18 AM, Thomas Gleixner wrote:
> On Wed, 21 Mar 2018, Richard Cochran wrote:
> 
>> On Wed, Mar 21, 2018 at 03:22:11PM +0100, Thomas Gleixner wrote:
>>> Which clockid will be handed in from the application? The network adapter
>>> time has no fixed clockid. The only way you can get to it is via a fd based
>>> posix clock and that does not work at all because the qdisc setup might
>>> have a different FD than the application which queues packets.
>>
>> Duh.  That explains it.  Please ignore my "why not?" Q in the other thread...
> 
> :)
> 
> So in that case you are either bound to rely on the application to use the
> proper dynamic clock or if we need a sanity check, then you need a cookie
> of some form which can be retrieved from the posix clock file descriptor
> and handed in as 'clockid' together with clock_adapter = true.
> 
> That's doable, but that needs a bit more trickery. A simple unique ID per
> dynamic posix-clock would be trivial to add, but that would not give you
> any form of verification whether this ID actually belongs to the network
> adapter or not.
> 
> So either you ignore the clockid and rely on the application not being
> stupid when it says "clock_adpater = true" or you need some extra
> complexity to build an association of a "clockid" to a network adapter.
> 
> There is a connection already, via
> 
>      adapter->ptp_clock->devid
> 
> which is MKDEV(major, index) which is accessible at least at the network
> driver level, but probably not from networking core. So you'd need to drill
> a few more holes by adding yet another callback to net_device_ops.
> 
> I'm not sure if its worth the trouble. If the application hands in bogus
> timestamps, packets go out at the wrong time or are dropped. That's true
> whether it uses the proper clock or not. So nothing the kernel should
> really worry about.


+1 and that is the approach we've taken so far with the qdisc setting
"CLOCKID_INVALID" to its internal clockid for the "raw" (non-assisted) hw
offload case.

thanks,
Jesus



> 
> For clock_system - REAL/MONO/TAI(sigh) - you surely need a sanity check,
> but that is independent of the underlying network adapater even in the
> qdisc assisted HW offload case.
> 
> Thanks,
> 
> 	tglx
> 
> 
> 
> 
> 
>
Jesus Sanchez-Palencia March 22, 2018, 11:15 p.m. UTC | #5
Hi,


On 03/21/2018 07:22 AM, Thomas Gleixner wrote:
> On Tue, 6 Mar 2018, Jesus Sanchez-Palencia wrote:
>> $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
>>            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
>>
>> $ tc qdisc add dev enp2s0 parent 100:1 tbs offload
>>
>> In this example, the Qdisc will use HW offload for the control of the
>> transmission time through the network adapter. It's assumed the timestamp
>> in skbuffs are in reference to the interface's PHC and setting any other
>> valid clockid would be treated as an error. Because there is no
>> scheduling being performed in the qdisc, setting a delta != 0 would also
>> be considered an error.
> 
> Which clockid will be handed in from the application? The network adapter
> time has no fixed clockid. The only way you can get to it is via a fd based
> posix clock and that does not work at all because the qdisc setup might
> have a different FD than the application which queues packets.


Yes. As a result, we came up with a rather simplistic solution that would still
allow for dynamic clocks to be used in the future without any API changes. As of
the v3 RFC, the qdisc returns -EINVAL if a netlink application (i.e. tc) tries
to initialize it in 'raw' hw offload passing any clockid != CLOCKID_INVALID. The
skbuffs' clockid was initialized with the same value, so if the application sets
its value to any other valid clockids through the cmsg interface, the qdisc
would just drop the patches on enqueue() due to the mismatch.

In other words, dynamic clocks are currently not used at all.

(I noticed later that this was broken anyway because the definition of invalid
clockids from posix-timers.h is actually only valid for negative numbers.)

Given all the feedback against adding the clockid into struct sk_buff, for the
next version, we'll have to re-think this anyway now that clockid will be set
per socket (i.e. as an argument to the SO_TXTIME) and not per packet anymore.




> 
> I think this should look like this:
> 
>     clock_adapter:	1 = clock of the network adapter
>     			0 = system clock selected by clock_system
> 
>     clock_system:	0 = CLOCK_REALTIME
>     			1 = CLOCK_MONOTONIC
> 
> or something like that.
> 
>> Example 2:
>>
>> $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
>>            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
>>
>> $ tc qdisc add dev enp2s0 parent 100:1 tbs offload delta 100000 \
>> 	   clockid CLOCK_REALTIME sorting
>>
>> Here, the Qdisc will use HW offload for the txtime control again,
>> but now sorting will be enabled, and thus there will be scheduling being
>> performed by the qdisc. That is done based on the clockid CLOCK_REALTIME
>> reference and packets leave the Qdisc "delta" (100000) nanoseconds before
>> their transmission time. Because this will be using HW offload and
>> since dynamic clocks are not supported by the hrtimer, the system clock
>> and the PHC clock must be synchronized for this mode to behave as expected.
> 
> So what you do here is queueing the packets in the qdisk and then schedule
> them at some point ahead of actual transmission time for delivery to the
> hardware. That delivery uses the same txtime as used for qdisc scheduling
> to tell the hardware when the packet should go on the wire. That's needed
> when the network adapter does not support queueing of multiple packets.
> 
> Bah, and probably there you need CLOCK_TAI because that's what PTP is based
> on, so clock_system needs to accomodate that as well. Dammit, there goes
> the simple 2 bits implementation. CLOCK_TAI is 11, so we'd need 4 clock
> bits plus the adapter bit.
> 
> Though we could spare a bit. The fixed CLOCK_* space goes from 0 to 15. I
> don't see us adding new fixed clocks, so we really can reserve #15 for
> selecting the adapter clock if sparing that extra bit is truly required.


So what about just using the previous single 'clockid' argument, but then just
adding to uapi time.h something like:

#define DYNAMIC_CLOCKID 15

And using it for that, instead. This way applications that will use the raw hw
offload mode must use this value for their per-socket clockid, and the qdisc's
clockid would be implicitly initialized to the same value.

What do you think?

Thanks,
Jesus



> 
> Thanks,
> 
> 	tglx
> 
>
Thomas Gleixner March 23, 2018, 8:51 a.m. UTC | #6
On Thu, 22 Mar 2018, Jesus Sanchez-Palencia wrote:
> On 03/21/2018 07:22 AM, Thomas Gleixner wrote:
> > Bah, and probably there you need CLOCK_TAI because that's what PTP is based
> > on, so clock_system needs to accomodate that as well. Dammit, there goes
> > the simple 2 bits implementation. CLOCK_TAI is 11, so we'd need 4 clock
> > bits plus the adapter bit.
> > 
> > Though we could spare a bit. The fixed CLOCK_* space goes from 0 to 15. I
> > don't see us adding new fixed clocks, so we really can reserve #15 for
> > selecting the adapter clock if sparing that extra bit is truly required.
> 
> 
> So what about just using the previous single 'clockid' argument, but then just
> adding to uapi time.h something like:
> 
> #define DYNAMIC_CLOCKID 15
> 
> And using it for that, instead. This way applications that will use the raw hw
> offload mode must use this value for their per-socket clockid, and the qdisc's
> clockid would be implicitly initialized to the same value.

That's what I suggested above.

Thanks,

	tglx
diff mbox series

Patch

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2466ea143d01..d042ffda7f21 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -155,4 +155,9 @@  struct tc_cbs_qopt_offload {
 	s32 sendslope;
 };
 
+struct tc_tbs_qopt_offload {
+	u8 enable;
+	s32 queue;
+};
+
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index a33b5b9da81a..92af9fa4dee4 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -941,6 +941,7 @@  struct tc_tbs_qopt {
 	__s32 clockid;
 	__u32 flags;
 #define TC_TBS_SORTING_ON BIT(0)
+#define TC_TBS_OFFLOAD_ON BIT(1)
 };
 
 enum {
diff --git a/net/sched/sch_tbs.c b/net/sched/sch_tbs.c
index c19eedda9bc5..2aafa55de42c 100644
--- a/net/sched/sch_tbs.c
+++ b/net/sched/sch_tbs.c
@@ -25,8 +25,10 @@ 
 #include <net/sock.h>
 
 #define SORTING_IS_ON(x) (x->flags & TC_TBS_SORTING_ON)
+#define OFFLOAD_IS_ON(x) (x->flags & TC_TBS_OFFLOAD_ON)
 
 struct tbs_sched_data {
+	bool offload;
 	bool sorting;
 	int clockid;
 	int queue;
@@ -68,25 +70,42 @@  static inline int validate_input_params(struct tc_tbs_qopt *qopt,
 					struct netlink_ext_ack *extack)
 {
 	/* Check if params comply to the following rules:
-	 *	* If SW best-effort, then clockid and delta must be valid
-	 *	  regardless of sorting enabled or not.
+	 *	* If SW best-effort, then clockid and delta must be valid.
+	 *
+	 *	* If HW offload is ON and sorting is ON, then clockid and delta
+	 *	  must be valid.
+	 *
+	 *	* If HW offload is ON and sorting is OFF, then clockid and
+	 *	  delta must not have been set. The netdevice PHC will be used
+	 *	  implictly.
 	 *
 	 *	* Dynamic clockids are not supported.
 	 *	* Delta must be a positive integer.
 	 */
-	if ((qopt->clockid & CLOCKID_INVALID) == CLOCKID_INVALID ||
-	    qopt->clockid >= MAX_CLOCKS) {
-		NL_SET_ERR_MSG(extack, "Invalid clockid");
-		return -EINVAL;
-	} else if (qopt->clockid < 0 ||
-		   !clockid_to_get_time[qopt->clockid]) {
-		NL_SET_ERR_MSG(extack, "Clockid is not supported");
-		return -ENOTSUPP;
-	}
-
-	if (qopt->delta < 0) {
-		NL_SET_ERR_MSG(extack, "Delta must be positive");
-		return -EINVAL;
+	if (!OFFLOAD_IS_ON(qopt) || SORTING_IS_ON(qopt)) {
+		if ((qopt->clockid & CLOCKID_INVALID) == CLOCKID_INVALID ||
+		    qopt->clockid >= MAX_CLOCKS) {
+			NL_SET_ERR_MSG(extack, "Invalid clockid");
+			return -EINVAL;
+		} else if (qopt->clockid < 0 ||
+			   !clockid_to_get_time[qopt->clockid]) {
+			NL_SET_ERR_MSG(extack, "Clockid is not supported");
+			return -ENOTSUPP;
+		}
+
+		if (qopt->delta < 0) {
+			NL_SET_ERR_MSG(extack, "Delta must be positive");
+			return -EINVAL;
+		}
+	} else {
+		if (qopt->delta != 0) {
+			NL_SET_ERR_MSG(extack, "Cannot set delta for this mode");
+			return -EINVAL;
+		}
+		if ((qopt->clockid & CLOCKID_INVALID) != CLOCKID_INVALID) {
+			NL_SET_ERR_MSG(extack, "Cannot set clockid for this mode");
+			return -EINVAL;
+		}
 	}
 
 	return 0;
@@ -155,6 +174,15 @@  static int tbs_enqueue(struct sk_buff *nskb, struct Qdisc *sch,
 	return q->enqueue(nskb, sch, to_free);
 }
 
+static int tbs_enqueue_fifo(struct sk_buff *nskb, struct Qdisc *sch,
+			    struct sk_buff **to_free)
+{
+	if (!is_packet_valid(sch, nskb))
+		return qdisc_drop(nskb, sch, to_free);
+
+	return qdisc_enqueue_tail(nskb, sch);
+}
+
 static int tbs_enqueue_scheduledfifo(struct sk_buff *nskb, struct Qdisc *sch,
 				     struct sk_buff **to_free)
 {
@@ -242,6 +270,21 @@  static struct sk_buff *tbs_dequeue(struct Qdisc *sch)
 	return q->dequeue(sch);
 }
 
+static struct sk_buff *tbs_dequeue_fifo(struct Qdisc *sch)
+{
+	struct tbs_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb = qdisc_dequeue_head(sch);
+
+	/* XXX: The drop_if_late bit is not checked here because that would
+	 *      require the PHC time to be read directly.
+	 */
+
+	if (skb)
+		q->last = skb->tstamp;
+
+	return skb;
+}
+
 static struct sk_buff *tbs_dequeue_scheduledfifo(struct Qdisc *sch)
 {
 	struct tbs_sched_data *q = qdisc_priv(sch);
@@ -318,6 +361,56 @@  static struct sk_buff *tbs_dequeue_timesortedlist(struct Qdisc *sch)
 	return skb;
 }
 
+static void tbs_disable_offload(struct net_device *dev,
+				struct tbs_sched_data *q)
+{
+	struct tc_tbs_qopt_offload tbs = { };
+	const struct net_device_ops *ops;
+	int err;
+
+	if (!q->offload)
+		return;
+
+	ops = dev->netdev_ops;
+	if (!ops->ndo_setup_tc)
+		return;
+
+	tbs.queue = q->queue;
+	tbs.enable = 0;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TBS, &tbs);
+	if (err < 0)
+		pr_warn("Couldn't disable TBS offload for queue %d\n",
+			tbs.queue);
+}
+
+static int tbs_enable_offload(struct net_device *dev, struct tbs_sched_data *q,
+			      struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_tbs_qopt_offload tbs = { };
+	int err;
+
+	if (q->offload)
+		return 0;
+
+	if (!ops->ndo_setup_tc) {
+		NL_SET_ERR_MSG(extack, "Specified device does not support TBS offload");
+		return -EOPNOTSUPP;
+	}
+
+	tbs.queue = q->queue;
+	tbs.enable = 1;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TBS, &tbs);
+	if (err < 0) {
+		NL_SET_ERR_MSG(extack, "Specified device failed to setup TBS hardware offload");
+		return err;
+	}
+
+	return 0;
+}
+
 static inline void setup_queueing_mode(struct tbs_sched_data *q)
 {
 	if (q->sorting) {
@@ -325,9 +418,15 @@  static inline void setup_queueing_mode(struct tbs_sched_data *q)
 		q->dequeue = tbs_dequeue_timesortedlist;
 		q->peek = tbs_peek_timesortedlist;
 	} else {
-		q->enqueue = tbs_enqueue_scheduledfifo;
-		q->dequeue = tbs_dequeue_scheduledfifo;
-		q->peek = qdisc_peek_head;
+		if (q->offload) {
+			q->enqueue = tbs_enqueue_fifo;
+			q->dequeue = tbs_dequeue_fifo;
+			q->peek = qdisc_peek_head;
+		} else {
+			q->enqueue = tbs_enqueue_scheduledfifo;
+			q->dequeue = tbs_dequeue_scheduledfifo;
+			q->peek = qdisc_peek_head;
+		}
 	}
 }
 
@@ -356,8 +455,9 @@  static int tbs_init(struct Qdisc *sch, struct nlattr *opt,
 
 	qopt = nla_data(tb[TCA_TBS_PARMS]);
 
-	pr_debug("delta %d clockid %d sorting %s\n",
+	pr_debug("delta %d clockid %d offload %s sorting %s\n",
 		 qopt->delta, qopt->clockid,
+		 OFFLOAD_IS_ON(qopt) ? "on" : "off",
 		 SORTING_IS_ON(qopt) ? "on" : "off");
 
 	err = validate_input_params(qopt, extack);
@@ -366,15 +466,26 @@  static int tbs_init(struct Qdisc *sch, struct nlattr *opt,
 
 	q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
 
+	if (OFFLOAD_IS_ON(qopt)) {
+		err = tbs_enable_offload(dev, q, extack);
+		if (err < 0)
+			return err;
+	}
+
 	/* Everything went OK, save the parameters used. */
 	q->delta = qopt->delta;
 	q->clockid = qopt->clockid;
+	q->offload = OFFLOAD_IS_ON(qopt);
 	q->sorting = SORTING_IS_ON(qopt);
 
-	/* Select queueing mode based on parameters. */
+	/* Select queueing mode based on offload and sorting parameters. */
 	setup_queueing_mode(q);
 
-	qdisc_watchdog_init_clockid(&q->watchdog, sch, q->clockid);
+	/* The watchdog will be needed for SW best-effort or if TxTime
+	 * based sorting is on.
+	 */
+	if (!q->offload || q->sorting)
+		qdisc_watchdog_init_clockid(&q->watchdog, sch, q->clockid);
 
 	return 0;
 }
@@ -416,10 +527,13 @@  static void tbs_reset(struct Qdisc *sch)
 static void tbs_destroy(struct Qdisc *sch)
 {
 	struct tbs_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
 	/* Only cancel watchdog if it's been initialized. */
 	if (q->watchdog.qdisc == sch)
 		qdisc_watchdog_cancel(&q->watchdog);
+
+	tbs_disable_offload(dev, q);
 }
 
 static int tbs_dump(struct Qdisc *sch, struct sk_buff *skb)
@@ -434,6 +548,9 @@  static int tbs_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	opt.delta = q->delta;
 	opt.clockid = q->clockid;
+	if (q->offload)
+		opt.flags |= TC_TBS_OFFLOAD_ON;
+
 	if (q->sorting)
 		opt.flags |= TC_TBS_SORTING_ON;