[v8] Add support for cake qdisc

Message ID 20180515151328.3547-1-toke@toke.dk
State New
Delegated to: David Ahern

Commit Message

Toke Høiland-Jørgensen May 15, 2018, 3:13 p.m.
sch_cake is intended to squeeze the most bandwidth and the lowest latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided):

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort

Cake is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* A deficit-based shaper that can also be used in an unlimited mode.
* 8-way set-associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with DOCSIS 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring loss, ECN markings, and latency variation.

Various versions have been available as an out-of-tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
Changelog:
v8:
  - Change rates to 64bit values (apparently, 32 Gbps is not enough for
    everyone).
    
v7:
  - Move the target/interval presets to a table and check that only
    one is passed.

v6:
  - Identical to v5 because apparently I don't git so well... :/

v5:
  - Print the SPLIT_GSO flag
  - Switch to print_u64() for JSON output
  - Fix a format string for mpu option output

v4:
  - Switch stats parsing to use nested netlink attributes
  - Tweaks to JSON stats output keys

v3:
  - Remove accidentally included test flag

v2:
  - Updated netlink config ABI
  - Remove diffserv-llt mode
  - Various tweaks and clean-ups of stats output
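
Note on the presets table (v7): every row keeps the same ratio that the
"rtt TIME" branch of cake_parse_opt() applies, namely target = interval / 20
with a floor of 1, both in microseconds. A minimal sketch of that rule is
below; target_from_interval() is a hypothetical helper name used only for
illustration and is not part of q_cake.c.

```c
/* Illustrative only: mirrors the "rtt TIME" handling in cake_parse_opt().
 * target_from_interval() is a hypothetical name, not part of the patch.
 * Both values are in microseconds.
 */
static unsigned int target_from_interval(unsigned int interval_us)
{
	unsigned int target = interval_us / 20;

	/* Floor at 1 us so a very small rtt still yields a non-zero target. */
	return target ? target : 1;
}
```

For example, the "internet" preset (interval 100000 us) gives a target of
5000 us, which is the 5.0ms / 100.0ms pair shown in the stats output.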

 man/man8/tc-cake.8 | 632 ++++++++++++++++++++++++++++++++++++++
 man/man8/tc.8      |   1 +
 tc/Makefile        |   1 +
 tc/q_cake.c        | 750 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 1384 insertions(+)
 create mode 100644 man/man8/tc-cake.8
 create mode 100644 tc/q_cake.c
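
Note on the overhead keywords documented in tc-cake.8: they reduce to simple
per-packet arithmetic. The sketch below is one reading of the man page text
(add the overhead, round up to the mpu, then apply cell framing). It is not
the kernel implementation; the enum, the function name, and the PTM rounding
(one extra byte per started 64-byte block) are assumptions made for
illustration.

```c
/* Hypothetical names for illustration; these are not the CAKE_ATM_*
 * constants used by q_cake.c.
 */
enum framing { FRAMING_NONE, FRAMING_ATM, FRAMING_PTM };

/* Estimate the size a packet of `len` bytes occupies on the wire. */
static unsigned int wire_size(unsigned int len, int overhead,
			      unsigned int mpu, enum framing framing)
{
	long size = (long)len + overhead;	/* overhead may be negative */

	if (size < 0)
		size = 0;
	if (size < (long)mpu)			/* round up to the minimum */
		size = (long)mpu;

	if (framing == FRAMING_ATM)
		/* 53-byte cells, each carrying 48 bytes of payload */
		size = (size + 47) / 48 * 53;
	else if (framing == FRAMING_PTM)
		/* 64b/65b coding; rounding up is an assumption */
		size += (size + 63) / 64;

	return (unsigned int)size;
}
```

With the "ethernet" keyword (overhead 38, mpu 84, noatm), a 28-byte packet
comes out as 84 and a 1500-byte packet as 1538, which agrees with the min/max
overhead-adjusted sizes in the "tc -s qdisc show" example in the man page.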

Patch

diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
new file mode 100644
index 00000000..dff2e360
--- /dev/null
+++ b/man/man8/tc-cake.8
@@ -0,0 +1,632 @@ 
+.TH CAKE 8 "27 April 2018" "iproute2" "Linux"
+.SH NAME
+CAKE \- Common Applications Kept Enhanced (CAKE)
+.SH SYNOPSIS
+.B tc qdisc ... cake
+.br
+[
+.BR bandwidth
+RATE |
+.BR unlimited*
+|
+.BR autorate_ingress
+]
+.br
+[
+.BR rtt
+TIME |
+.BR datacentre
+|
+.BR lan
+|
+.BR metro
+|
+.BR regional
+|
+.BR internet*
+|
+.BR oceanic
+|
+.BR satellite
+|
+.BR interplanetary
+]
+.br
+[
+.BR besteffort
+|
+.BR diffserv8
+|
+.BR diffserv4
+|
+.BR diffserv3*
+]
+.br
+[
+.BR flowblind
+|
+.BR srchost
+|
+.BR dsthost
+|
+.BR hosts
+|
+.BR flows
+|
+.BR dual-srchost
+|
+.BR dual-dsthost
+|
+.BR triple-isolate*
+]
+.br
+[
+.BR nat
+|
+.BR nonat*
+]
+.br
+[
+.BR wash
+|
+.BR nowash*
+]
+.br
+[
+.BR ack-filter
+|
+.BR ack-filter-aggressive
+|
+.BR no-ack-filter*
+]
+.br
+[
+.BR memlimit
+LIMIT ]
+.br
+[
+.BR ptm
+|
+.BR atm
+|
+.BR noatm*
+]
+.br
+[
+.BR overhead
+N |
+.BR conservative
+|
+.BR raw*
+]
+.br
+[
+.BR mpu
+N ]
+.br
+[
+.BR ingress
+|
+.BR egress*
+]
+.br
+(* marks defaults)
+
+
+.SH DESCRIPTION
+CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline
+which uses both AQM and FQ.  It combines COBALT, which is an AQM algorithm
+combining Codel and BLUE, a shaper which operates in deficit mode, and a variant
+of DRR++ for flow isolation.  8-way set-associative hashing is used to virtually
+eliminate hash collisions.  Priority queuing is available through a simplified
+diffserv implementation.  Overhead compensation for various encapsulation
+schemes is tightly integrated.
+
+All settings are optional; the default settings are chosen to be sensible in
+most common deployments.  Most people will only need to set the
+.B bandwidth
+parameter to get useful results, but reading the
+.B Overhead Compensation
+and
+.B Round Trip Time
+sections is strongly encouraged.
+
+.SH SHAPER PARAMETERS
+CAKE uses a deficit-mode shaper, which does not exhibit the initial burst
+typical of token-bucket shapers.  It will automatically burst precisely as much
+as required to maintain the configured throughput.  As such, it is very
+straightforward to configure.
+.PP
+.B unlimited
+(default)
+.br
+	No limit on the bandwidth.
+.PP
+.B bandwidth
+RATE
+.br
+	Set the shaper bandwidth.  See
+.BR tc(8)
+or examples below for details of the RATE value.
+.PP
+.B autorate_ingress
+.br
+	Automatic capacity estimation based on traffic arriving at this qdisc.
+This is most likely to be useful with cellular links, which tend to change
+quality randomly.  A
+.B bandwidth
+parameter can be used in conjunction to specify an initial estimate.  The shaper
+will periodically be set to a bandwidth slightly below the estimated rate.  This
+estimator cannot estimate the bandwidth of links downstream of itself.
+
+.SH OVERHEAD COMPENSATION PARAMETERS
+The size of each packet on the wire may differ from that seen by Linux.  The
+following parameters allow CAKE to compensate for this difference by internally
+considering each packet to be bigger than Linux informs it.  To assist users who
+are not expert network engineers, keywords have been provided to represent a
+number of common link technologies.
+
+.SS	Manual Overhead Specification
+.B overhead
+BYTES
+.br
+	Adds BYTES to the size of each packet.  BYTES may be negative; values
+between -64 and 256 (inclusive) are accepted.
+.PP
+.B mpu
+BYTES
+.br
+	Rounds each packet (including overhead) up to a minimum length
+BYTES. BYTES may not be negative; values between 0 and 256 (inclusive)
+are accepted.
+.PP
+.B atm
+.br
+	Compensates for ATM cell framing, which is normally found on ADSL links.
+This is performed after the
+.B overhead
+parameter above.  ATM uses fixed 53-byte cells, each of which can carry 48 bytes
+payload.
+.PP
+.B ptm
+.br
+	Compensates for PTM encoding, which is normally found on VDSL2 links and
+uses a 64b/65b encoding scheme. It is even more efficient to simply
+derate the specified shaper bandwidth by a factor of 64/65 or 0.984. See
+ITU G.992.3 Annex N and IEEE 802.3 Section 61.3 for details.
+.PP
+.B noatm
+.br
+	Disables ATM and PTM compensation.
+
+.SS	Failsafe Overhead Keywords
+These two keywords are provided for quick-and-dirty setup.  Use them if you
+can't be bothered to read the rest of this section.
+.PP
+.B raw
+(default)
+.br
+	Turns off all overhead compensation in CAKE.  The packet size reported
+by Linux will be used directly.
+.PP
+	Other overhead keywords may be added after "raw".  The effect of this is
+to make the overhead compensation operate relative to the reported packet size,
+not the underlying IP packet size.
+.PP
+.B conservative
+.br
+	Compensates for more overhead than is likely to occur on any
+widely-deployed link technology.
+.br
+	Equivalent to
+.B overhead 48 atm.
+
+.SS ADSL Overhead Keywords
+Most ADSL modems have a way to check which framing scheme is in use.  Often this
+is also specified in the settings document provided by the ISP.  The keywords in
+this section are intended to correspond with these sources of information.  All
+of them implicitly set the
+.B atm
+flag.
+.PP
+.B pppoa-vcmux
+.br
+	Equivalent to
+.B overhead 10 atm
+.PP
+.B pppoa-llc
+.br
+	Equivalent to
+.B overhead 14 atm
+.PP
+.B pppoe-vcmux
+.br
+	Equivalent to
+.B overhead 32 atm
+.PP
+.B pppoe-llcsnap
+.br
+	Equivalent to
+.B overhead 40 atm
+.PP
+.B bridged-vcmux
+.br
+	Equivalent to
+.B overhead 24 atm
+.PP
+.B bridged-llcsnap
+.br
+	Equivalent to
+.B overhead 32 atm
+.PP
+.B ipoa-vcmux
+.br
+	Equivalent to
+.B overhead 8 atm
+.PP
+.B ipoa-llcsnap
+.br
+	Equivalent to
+.B overhead 16 atm
+.PP
+See also the Ethernet Correction Factors section below.
+
+.SS VDSL2 Overhead Keywords
+ATM was dropped from VDSL2 in favour of PTM, which is a much more
+straightforward framing scheme.  Some ISPs retained PPPoE for compatibility with
+their existing back-end systems.
+.PP
+.B pppoe-ptm
+.br
+	Equivalent to
+.B overhead 30 ptm
+
+.br
+	PPPoE: 2B PPP + 6B PPPoE +
+.br
+	ETHERNET: 6B dest MAC + 6B src MAC + 2B ethertype + 4B Frame Check Sequence +
+.br
+	PTM: 1B Start of Frame (S) + 1B End of Frame (Ck) + 2B TC-CRC (PTM-FCS)
+.br
+.PP
+.B bridged-ptm
+.br
+	Equivalent to
+.B overhead 22 ptm
+.br
+	ETHERNET: 6B dest MAC + 6B src MAC + 2B ethertype + 4B Frame Check Sequence +
+.br
+	PTM: 1B Start of Frame (S) + 1B End of Frame (Ck) + 2B TC-CRC (PTM-FCS)
+.br
+.PP
+See also the Ethernet Correction Factors section below.
+
+.SS DOCSIS Cable Overhead Keyword
+DOCSIS is the universal standard for providing Internet service over cable-TV
+infrastructure.
+
+In this case, the actual on-wire overhead is less important than the packet size
+the head-end equipment uses for shaping and metering.  This is specified to be
+an Ethernet frame including the CRC (aka FCS).
+.PP
+.B docsis
+.br
+	Equivalent to
+.B overhead 18 mpu 64 noatm
+
+.SS Ethernet Overhead Keywords
+.PP
+.B ethernet
+.br
+	Accounts for Ethernet's preamble, inter-frame gap, and Frame Check
+Sequence.  Use this keyword when the bottleneck being shaped for is an
+actual Ethernet cable.
+.br
+	Equivalent to
+.B overhead 38 mpu 84 noatm
+.PP
+.B ether-vlan
+.br
+	Adds 4 bytes to the overhead compensation, accounting for an IEEE 802.1Q
+VLAN header appended to the Ethernet frame header.  NB: Some ISPs use one or
+even two of these within PPPoE; this keyword may be repeated as necessary to
+express this.
+
+.SH ROUND TRIP TIME PARAMETERS
+Active Queue Management (AQM) consists of embedding congestion signals in the
+packet flow, which receivers use to instruct senders to slow down when the queue
+is persistently occupied.  CAKE uses ECN signalling when available, and packet
+drops otherwise, according to a combination of the Codel and BLUE AQM algorithms
+called COBALT.
+
+Very short latencies require a very rapid AQM response to adequately control
+latency.  However, such a rapid response tends to impair throughput when the
+actual RTT is relatively long.  CAKE allows specifying the RTT it assumes for
+tuning various parameters.  Actual RTTs within an order of magnitude of this
+will generally work well for both throughput and latency management.
+
+At the 'lan' setting and below, the time constants are similar in magnitude to
+the jitter in the Linux kernel itself, so congestion might be signalled
+prematurely. The flows will then become sparse and total throughput reduced,
+leaving little or no back-pressure for the fairness logic to work against. Use
+the "metro" setting for local LANs unless you have a custom kernel.
+.PP
+.B rtt
+TIME
+.br
+	Manually specify an RTT.
+.PP
+.B datacentre
+.br
+	For extremely high-performance 10GigE+ networks only.  Equivalent to
+.B rtt 100us.
+.PP
+.B lan
+.br
+	For pure Ethernet (not Wi-Fi) networks, at home or in the office.  Don't
+use this when shaping for an Internet access link.  Equivalent to
+.B rtt 1ms.
+.PP
+.B metro
+.br
+	For traffic mostly within a single city.  Equivalent to
+.B rtt 10ms.
+.PP
+.B regional
+.br
+	For traffic mostly within a European-sized country.  Equivalent to
+.B rtt 30ms.
+.PP
+.B internet
+(default)
+.br
+	This is suitable for most Internet traffic.  Equivalent to
+.B rtt 100ms.
+.PP
+.B oceanic
+.br
+	For Internet traffic with generally above-average latency, such as that
+suffered by Australasian residents.  Equivalent to
+.B rtt 300ms.
+.PP
+.B satellite
+.br
+	For traffic via geostationary satellites.  Equivalent to
+.B rtt 1000ms.
+.PP
+.B interplanetary
+.br
+	So named because Jupiter is about 1 light-hour from Earth.  Use this to
+(almost) completely disable AQM actions.  Equivalent to
+.B rtt 1000s.
+
+.SH FLOW ISOLATION PARAMETERS
+With flow isolation enabled, CAKE places packets from different flows into
+different queues, each of which carries its own AQM state.  Packets from each
+queue are then delivered fairly, according to a DRR++ algorithm which minimises
+latency for "sparse" flows.  CAKE uses a set-associative hashing algorithm to
+minimise flow collisions.
+
+These keywords specify whether fairness based on source address, destination
+address, individual flows, or any combination of those is desired.
+.PP
+.B flowblind
+.br
+	Disables flow isolation; all traffic passes through a single queue for
+each tin.
+.PP
+.B srchost
+.br
+	Flows are defined only by source address.  Could be useful on the egress
+path of an ISP backhaul.
+.PP
+.B dsthost
+.br
+	Flows are defined only by destination address.  Could be useful on the
+ingress path of an ISP backhaul.
+.PP
+.B hosts
+.br
+	Flows are defined by source-destination host pairs.  This is host
+isolation, rather than flow isolation.
+.PP
+.B flows
+.br
+	Flows are defined by the entire 5-tuple of source address, destination
+address, transport protocol, source port and destination port.  This is the type
+of flow isolation performed by SFQ and fq_codel.
+.PP
+.B dual-srchost
+.br
+	Flows are defined by the 5-tuple, and fairness is applied first over
+source addresses, then over individual flows.  Good for use on egress traffic
+from a LAN to the internet, where it'll prevent any one LAN host from
+monopolising the uplink, regardless of the number of flows they use.
+.PP
+.B dual-dsthost
+.br
+	Flows are defined by the 5-tuple, and fairness is applied first over
+destination addresses, then over individual flows.  Good for use on ingress
+traffic to a LAN from the internet, where it'll prevent any one LAN host from
+monopolising the downlink, regardless of the number of flows they use.
+.PP
+.B triple-isolate
+(default)
+.br
+	Flows are defined by the 5-tuple, and fairness is applied over source
+*and* destination addresses intelligently (i.e. not merely by host-pairs), and
+also over individual flows.  Use this if you're not certain whether to use
+dual-srchost or dual-dsthost; it'll do both jobs at once, preventing any one
+host on *either* side of the link from monopolising it with a large number of
+flows.
+.PP
+.B nat
+.br
+	Instructs Cake to perform a NAT lookup before applying flow-isolation
+rules, to determine the true addresses and port numbers of the packet, to
+improve fairness between hosts "inside" the NAT.  This has no practical effect
+in "flowblind" or "flows" modes, or if NAT is performed on a different host.
+.PP
+.B nonat
+(default)
+.br
+	Cake will not perform a NAT lookup.  Flow isolation will be performed
+using the addresses and port numbers directly visible to the interface Cake is
+attached to.
+
+.SH PRIORITY QUEUE PARAMETERS
+CAKE can divide traffic into "tins" based on the Diffserv field.  Each tin has
+its own independent set of flow-isolation queues, and is serviced based on a WRR
+algorithm.  To avoid perverse Diffserv marking incentives, tin weights have a
+"priority sharing" value when bandwidth used by that tin is below a threshold,
+and a lower "bandwidth sharing" value when above.  Bandwidth is compared against
+the threshold using the same algorithm as the deficit-mode shaper.
+
+Detailed customisation of tin parameters is not provided.  The following presets
+perform all necessary tuning, relative to the current shaper bandwidth and RTT
+settings.
+.PP
+.B besteffort
+.br
+	Disables priority queuing by placing all traffic in one tin.
+.PP
+.B precedence
+.br
+	Enables legacy interpretation of TOS "Precedence" field.  Use of this
+preset on the modern Internet is firmly discouraged.
+.PP
+.B diffserv4
+.br
+	Provides a general-purpose Diffserv implementation with four tins:
+.br
+		Bulk (CS1), 6.25% threshold, generally low priority.
+.br
+		Best Effort (general), 100% threshold.
+.br
+		Video (AF4x, AF3x, CS3, AF2x, CS2, TOS4, TOS1), 50% threshold.
+.br
+		Voice (CS7, CS6, EF, VA, CS5, CS4), 25% threshold.
+.PP
+.B diffserv3
+(default)
+.br
+	Provides a simple, general-purpose Diffserv implementation with three tins:
+.br
+		Bulk (CS1), 6.25% threshold, generally low priority.
+.br
+		Best Effort (general), 100% threshold.
+.br
+		Voice (CS7, CS6, EF, VA, TOS4), 25% threshold, reduced Codel interval.
+
+.SH OTHER PARAMETERS
+.B memlimit
+LIMIT
+.br
+	Limit the memory consumed by Cake to LIMIT bytes. Note that this does
+not translate directly to queue size (so do not size this based on bandwidth
+delay product considerations, but rather on worst case acceptable memory
+consumption), as there is some overhead in the data structures containing the
+packets, especially for small packets.
+
+	By default, the limit is calculated based on the bandwidth and RTT
+settings.
+
+.PP
+.B wash
+
+.br
+	Traffic entering your diffserv domain is frequently mis-marked in
+transit from the perspective of your network, and traffic exiting yours may be
+mis-marked from the perspective of the transiting provider.
+
+Apply the wash option to clear all extra diffserv markings (but not the ECN bits), after
+priority queuing has taken place.
+
+If you are shaping inbound, and cannot trust the diffserv markings (as is the
+case for Comcast Cable, among others), it is best to use a single queue
+"besteffort" mode with wash.
+
+.SH EXAMPLES
+# tc qdisc delete root dev eth0
+.br
+# tc qdisc add root dev eth0 cake bandwidth 100Mbit ethernet
+.br
+# tc -s qdisc show dev eth0
+.br
+qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 100.0ms noatm overhead 38 mpu 84 
+ Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
+ backlog 0b 0p requeues 0
+ memory used: 0b of 5000000b
+ capacity estimate: 100Mbit
+ min/max network layer size:        65535 /       0
+ min/max overhead-adjusted size:    65535 /       0
+ average network hdr offset:            0
+
+                   Bulk  Best Effort        Voice
+  thresh       6250Kbit      100Mbit       25Mbit
+  target          5.0ms        5.0ms        5.0ms
+  interval      100.0ms      100.0ms      100.0ms
+  pk_delay          0us          0us          0us
+  av_delay          0us          0us          0us
+  sp_delay          0us          0us          0us
+  pkts                0            0            0
+  bytes               0            0            0
+  way_inds            0            0            0
+  way_miss            0            0            0
+  way_cols            0            0            0
+  drops               0            0            0
+  marks               0            0            0
+  ack_drop            0            0            0
+  sp_flows            0            0            0
+  bk_flows            0            0            0
+  un_flows            0            0            0
+  max_len             0            0            0
+  quantum           300         1514          762
+
+After some use:
+.br
+# tc -s qdisc show dev eth0
+
+qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 100.0ms noatm overhead 38 mpu 84 
+ Sent 44709231 bytes 31931 pkt (dropped 45, overlimits 93782 requeues 0) 
+ backlog 33308b 22p requeues 0
+ memory used: 292352b of 5000000b
+ capacity estimate: 100Mbit
+ min/max network layer size:           28 /    1500
+ min/max overhead-adjusted size:       84 /    1538
+ average network hdr offset:           14
+
+                   Bulk  Best Effort        Voice
+  thresh       6250Kbit      100Mbit       25Mbit
+  target          5.0ms        5.0ms        5.0ms
+  interval      100.0ms      100.0ms      100.0ms
+  pk_delay        8.7ms        6.9ms        5.0ms
+  av_delay        4.9ms        5.3ms        3.8ms
+  sp_delay        727us        1.4ms        511us
+  pkts             2590        21271         8137
+  bytes         3081804     30302659     11426206
+  way_inds            0           46            0
+  way_miss            3           17            4
+  way_cols            0            0            0
+  drops              20           15           10
+  marks               0            0            0
+  ack_drop            0            0            0
+  sp_flows            2            4            1
+  bk_flows            1            2            1
+  un_flows            0            0            0
+  max_len          1514         1514         1514
+  quantum           300         1514          762
+
+.SH SEE ALSO
+.BR tc (8),
+.BR tc-codel (8),
+.BR tc-fq_codel (8),
+.BR tc-red (8)
+
+.SH AUTHORS
+Cake's principal author is Jonathan Morton, with contributions from
+Tony Ambardar, Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen,
+Sebastian Moeller, Ryan Mounce, Dean Scarff, Nils Andreas Svee, and Dave Täht.
+
+This manual page was written by Loganaden Velvindron. Please report corrections
+to the Linux Networking mailing list <netdev@vger.kernel.org>.
diff --git a/man/man8/tc.8 b/man/man8/tc.8
index 840880fb..716dfec5 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -795,6 +795,7 @@  was written by Alexey N. Kuznetsov and added in Linux 2.2.
 .BR tc-basic (8),
 .BR tc-bfifo (8),
 .BR tc-bpf (8),
+.BR tc-cake (8),
 .BR tc-cbq (8),
 .BR tc-cgroup (8),
 .BR tc-choke (8),
diff --git a/tc/Makefile b/tc/Makefile
index dfd00267..d9a43568 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -66,6 +66,7 @@  TCMODULES += q_codel.o
 TCMODULES += q_fq_codel.o
 TCMODULES += q_fq.o
 TCMODULES += q_pie.o
+TCMODULES += q_cake.o
 TCMODULES += q_hhf.o
 TCMODULES += q_clsact.o
 TCMODULES += e_bpf.o
diff --git a/tc/q_cake.c b/tc/q_cake.c
new file mode 100644
index 00000000..09c498a9
--- /dev/null
+++ b/tc/q_cake.c
@@ -0,0 +1,750 @@ 
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/*
+ * Common Applications Kept Enhanced  --  CAKE
+ *
+ *  Copyright (C) 2014-2018 Jonathan Morton <chromatix99@gmail.com>
+ *  Copyright (C) 2017-2018 Toke Høiland-Jørgensen <toke@toke.dk>
+ */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include <inttypes.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+struct cake_preset {
+	char *name;
+	unsigned int target;
+	unsigned int interval;
+};
+
+static struct cake_preset presets[] = {
+	{"datacentre",		5,		100},
+	{"lan",			50,		1000},
+	{"metro",		500,		10000},
+	{"regional",		1500,		30000},
+	{"internet",		5000,		100000},
+	{"oceanic",		15000,		300000},
+	{"satellite",		50000,		1000000},
+	{"interplanetary",	50000000,	1000000000},
+};
+
+
+static struct cake_preset *find_preset(char *argv)
+{
+	for (int i = 0; i < ARRAY_SIZE(presets); i++)
+		if (!strcmp(argv, presets[i].name))
+			return &presets[i];
+	return NULL;
+}
+
+static void explain(void)
+{
+	fprintf(stderr,
+"Usage: ... cake [ bandwidth RATE | unlimited* | autorate_ingress ]\n"
+"                [ rtt TIME | datacentre | lan | metro | regional |\n"
+"                  internet* | oceanic | satellite | interplanetary ]\n"
+"                [ besteffort | diffserv8 | diffserv4 | diffserv3* ]\n"
+"                [ flowblind | srchost | dsthost | hosts | flows |\n"
+"                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
+"                [ nat | nonat* ]\n"
+"                [ wash | nowash* ]\n"
+"                [ ack-filter | ack-filter-aggressive | no-ack-filter* ]\n"
+"                [ memlimit LIMIT ]\n"
+"                [ ptm | atm | noatm* ] [ overhead N | conservative | raw* ]\n"
+"                [ mpu N ] [ ingress | egress* ]\n"
+"                (* marks defaults)\n");
+}
+
+static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			  struct nlmsghdr *n, const char *dev)
+{
+	int unlimited = 0;
+	__u64 bandwidth = 0;
+	unsigned interval = 0;
+	unsigned target = 0;
+	unsigned diffserv = 0;
+	unsigned memlimit = 0;
+	int  overhead = 0;
+	bool overhead_set = false;
+	bool overhead_override = false;
+	int mpu = 0;
+	int flowmode = -1;
+	int nat = -1;
+	int atm = -1;
+	int autorate = -1;
+	int wash = -1;
+	int ingress = -1;
+	int ack_filter = -1;
+	struct rtattr *tail;
+	struct cake_preset *preset, *preset_set = NULL;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "bandwidth") == 0) {
+			NEXT_ARG();
+			if (get_rate64(&bandwidth, *argv)) {
+				fprintf(stderr, "Illegal \"bandwidth\"\n");
+				return -1;
+			}
+			unlimited = 0;
+			autorate = 0;
+		} else if (strcmp(*argv, "unlimited") == 0) {
+			bandwidth = 0;
+			unlimited = 1;
+			autorate = 0;
+		} else if (strcmp(*argv, "autorate_ingress") == 0) {
+			autorate = 1;
+
+		} else if (strcmp(*argv, "rtt") == 0) {
+			NEXT_ARG();
+			if (get_time(&interval, *argv)) {
+				fprintf(stderr, "Illegal \"rtt\"\n");
+				return -1;
+			}
+			target = interval / 20;
+			if(!target)
+				target = 1;
+		} else if ((preset = find_preset(*argv))) {
+			if (preset_set)
+				duparg(*argv, preset_set->name);
+			preset_set = preset;
+			target = preset->target;
+			interval = preset->interval;
+
+		} else if (strcmp(*argv, "besteffort") == 0) {
+			diffserv = CAKE_DIFFSERV_BESTEFFORT;
+		} else if (strcmp(*argv, "precedence") == 0) {
+			diffserv = CAKE_DIFFSERV_PRECEDENCE;
+		} else if (strcmp(*argv, "diffserv8") == 0) {
+			diffserv = CAKE_DIFFSERV_DIFFSERV8;
+		} else if (strcmp(*argv, "diffserv4") == 0) {
+			diffserv = CAKE_DIFFSERV_DIFFSERV4;
+		} else if (strcmp(*argv, "diffserv") == 0) {
+			diffserv = CAKE_DIFFSERV_DIFFSERV4;
+		} else if (strcmp(*argv, "diffserv3") == 0) {
+			diffserv = CAKE_DIFFSERV_DIFFSERV3;
+
+		} else if (strcmp(*argv, "nowash") == 0) {
+			wash = 0;
+		} else if (strcmp(*argv, "wash") == 0) {
+			wash = 1;
+
+		} else if (strcmp(*argv, "flowblind") == 0) {
+			flowmode = CAKE_FLOW_NONE;
+		} else if (strcmp(*argv, "srchost") == 0) {
+			flowmode = CAKE_FLOW_SRC_IP;
+		} else if (strcmp(*argv, "dsthost") == 0) {
+			flowmode = CAKE_FLOW_DST_IP;
+		} else if (strcmp(*argv, "hosts") == 0) {
+			flowmode = CAKE_FLOW_HOSTS;
+		} else if (strcmp(*argv, "flows") == 0) {
+			flowmode = CAKE_FLOW_FLOWS;
+		} else if (strcmp(*argv, "dual-srchost") == 0) {
+			flowmode = CAKE_FLOW_DUAL_SRC;
+		} else if (strcmp(*argv, "dual-dsthost") == 0) {
+			flowmode = CAKE_FLOW_DUAL_DST;
+		} else if (strcmp(*argv, "triple-isolate") == 0) {
+			flowmode = CAKE_FLOW_TRIPLE;
+
+		} else if (strcmp(*argv, "nat") == 0) {
+			nat = 1;
+		} else if (strcmp(*argv, "nonat") == 0) {
+			nat = 0;
+
+		} else if (strcmp(*argv, "ptm") == 0) {
+			atm = CAKE_ATM_PTM;
+		} else if (strcmp(*argv, "atm") == 0) {
+			atm = CAKE_ATM_ATM;
+		} else if (strcmp(*argv, "noatm") == 0) {
+			atm = CAKE_ATM_NONE;
+
+		} else if (strcmp(*argv, "raw") == 0) {
+			atm = CAKE_ATM_NONE;
+			overhead = 0;
+			overhead_set = true;
+			overhead_override = true;
+		} else if (strcmp(*argv, "conservative") == 0) {
+			/*
+			 * Deliberately over-estimate overhead:
+			 * one whole ATM cell plus ATM framing.
+			 * A safe choice if the actual overhead is unknown.
+			 */
+			atm = CAKE_ATM_ATM;
+			overhead = 48;
+			overhead_set = true;
+
+		/* Various ADSL framing schemes, all over ATM cells */
+		} else if (strcmp(*argv, "ipoa-vcmux") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 8;
+			overhead_set = true;
+		} else if (strcmp(*argv, "ipoa-llcsnap") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 16;
+			overhead_set = true;
+		} else if (strcmp(*argv, "bridged-vcmux") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 24;
+			overhead_set = true;
+		} else if (strcmp(*argv, "bridged-llcsnap") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 32;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoa-vcmux") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 10;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoa-llc") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 14;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoe-vcmux") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 32;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoe-llcsnap") == 0) {
+			atm = CAKE_ATM_ATM;
+			overhead += 40;
+			overhead_set = true;
+
+		/* Typical VDSL2 framing schemes, both over PTM */
+		/* PTM has 64b/65b coding which absorbs some bandwidth */
+		} else if (strcmp(*argv, "pppoe-ptm") == 0) {
+			/* 2B PPP + 6B PPPoE + 6B dest MAC + 6B src MAC
+			 * + 2B ethertype + 4B Frame Check Sequence
+			 * + 1B Start of Frame (S) + 1B End of Frame (Ck)
+			 * + 2B TC-CRC (PTM-FCS) = 30B
+			 */
+			atm = CAKE_ATM_PTM;
+			overhead += 30;
+			overhead_set = true;
+		} else if (strcmp(*argv, "bridged-ptm") == 0) {
+			/* 6B dest MAC + 6B src MAC + 2B ethertype
+			 * + 4B Frame Check Sequence
+			 * + 1B Start of Frame (S) + 1B End of Frame (Ck)
+			 * + 2B TC-CRC (PTM-FCS) = 22B
+			 */
+			atm = CAKE_ATM_PTM;
+			overhead += 22;
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "via-ethernet") == 0) {
+			/*
+			 * We used to use this flag to manually compensate for
+			 * Linux including the Ethernet header on Ethernet-type
+			 * interfaces, but not on IP-type interfaces.
+			 *
+			 * It is no longer needed, because Cake now adjusts for
+			 * that automatically, and is thus ignored.
+			 *
+			 * It would be deleted entirely, but it appears in the
+			 * stats output when the automatic compensation is
+			 * active.
+			 */
+
+		} else if (strcmp(*argv, "ethernet") == 0) {
+			/* ethernet pre-amble & interframe gap & FCS
+			 * you may need to add vlan tag */
+			overhead += 38;
+			overhead_set = true;
+			mpu = 84;
+
+		/* Additional Ethernet-related overhead used by some ISPs */
+		} else if (strcmp(*argv, "ether-vlan") == 0) {
+			/* 802.1q VLAN tag - may be repeated */
+			overhead += 4;
+			overhead_set = true;
+
+		/*
+		 * DOCSIS cable shapers account for Ethernet frame with FCS,
+		 * but not interframe gap or preamble.
+		 */
+		} else if (strcmp(*argv, "docsis") == 0) {
+			atm = CAKE_ATM_NONE;
+			overhead += 18;
+			overhead_set = true;
+			mpu = 64;
+
+		} else if (strcmp(*argv, "overhead") == 0) {
+			char* p = NULL;
+			NEXT_ARG();
+			overhead = strtol(*argv, &p, 10);
+			if(!p || *p || !*argv || overhead < -64 || overhead > 256) {
+				fprintf(stderr, "Illegal \"overhead\", valid range is -64 to 256\n");
+				return -1;
+			}
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "mpu") == 0) {
+			char* p = NULL;
+			NEXT_ARG();
+			mpu = strtol(*argv, &p, 10);
+			if(!p || *p || !*argv || mpu < 0 || mpu > 256) {
+				fprintf(stderr, "Illegal \"mpu\", valid range is 0 to 256\n");
+				return -1;
+			}
+
+		} else if (strcmp(*argv, "ingress") == 0) {
+			ingress = 1;
+		} else if (strcmp(*argv, "egress") == 0) {
+			ingress = 0;
+
+		} else if (strcmp(*argv, "no-ack-filter") == 0) {
+			ack_filter = CAKE_ACK_NONE;
+		} else if (strcmp(*argv, "ack-filter") == 0) {
+			ack_filter = CAKE_ACK_FILTER;
+		} else if (strcmp(*argv, "ack-filter-aggressive") == 0) {
+			ack_filter = CAKE_ACK_AGGRESSIVE;
+
+		} else if (strcmp(*argv, "memlimit") == 0) {
+			NEXT_ARG();
+			if (get_size(&memlimit, *argv)) {
+				fprintf(stderr, "Illegal value for \"memlimit\": \"%s\"\n", *argv);
+				return -1;
+			}
+
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (bandwidth || unlimited)
+		addattr_l(n, 1024, TCA_CAKE_BASE_RATE64, &bandwidth, sizeof(bandwidth));
+	if (diffserv)
+		addattr_l(n, 1024, TCA_CAKE_DIFFSERV_MODE, &diffserv, sizeof(diffserv));
+	if (atm != -1)
+		addattr_l(n, 1024, TCA_CAKE_ATM, &atm, sizeof(atm));
+	if (flowmode != -1)
+		addattr_l(n, 1024, TCA_CAKE_FLOW_MODE, &flowmode, sizeof(flowmode));
+	if (overhead_set)
+		addattr_l(n, 1024, TCA_CAKE_OVERHEAD, &overhead, sizeof(overhead));
+	if (overhead_override) {
+		unsigned zero = 0;
+		addattr_l(n, 1024, TCA_CAKE_RAW, &zero, sizeof(zero));
+	}
+	if (mpu > 0)
+		addattr_l(n, 1024, TCA_CAKE_MPU, &mpu, sizeof(mpu));
+	if (interval)
+		addattr_l(n, 1024, TCA_CAKE_RTT, &interval, sizeof(interval));
+	if (target)
+		addattr_l(n, 1024, TCA_CAKE_TARGET, &target, sizeof(target));
+	if (autorate != -1)
+		addattr_l(n, 1024, TCA_CAKE_AUTORATE, &autorate, sizeof(autorate));
+	if (memlimit)
+		addattr_l(n, 1024, TCA_CAKE_MEMORY, &memlimit, sizeof(memlimit));
+	if (nat != -1)
+		addattr_l(n, 1024, TCA_CAKE_NAT, &nat, sizeof(nat));
+	if (wash != -1)
+		addattr_l(n, 1024, TCA_CAKE_WASH, &wash, sizeof(wash));
+	if (ingress != -1)
+		addattr_l(n, 1024, TCA_CAKE_INGRESS, &ingress, sizeof(ingress));
+	if (ack_filter != -1)
+		addattr_l(n, 1024, TCA_CAKE_ACK_FILTER, &ack_filter, sizeof(ack_filter));
+
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+
+static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CAKE_MAX + 1];
+	__u64 bandwidth = 0;
+	unsigned diffserv = 0;
+	unsigned flowmode = 0;
+	unsigned interval = 0;
+	unsigned memlimit = 0;
+	int overhead = 0;
+	int raw = 0;
+	int mpu = 0;
+	int atm = 0;
+	int nat = 0;
+	int autorate = 0;
+	int wash = 0;
+	int ingress = 0;
+	int ack_filter = 0;
+	int split_gso = 0;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CAKE_MAX, opt);
+
+	if (tb[TCA_CAKE_BASE_RATE64] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_BASE_RATE64]) >= sizeof(bandwidth)) {
+		bandwidth = rta_getattr_u64(tb[TCA_CAKE_BASE_RATE64]);
+		if (bandwidth) {
+			print_uint(PRINT_JSON, "bandwidth", NULL, bandwidth);
+			print_string(PRINT_FP, NULL, "bandwidth %s ", sprint_rate(bandwidth, b1));
+		} else
+			print_string(PRINT_ANY, "bandwidth", "bandwidth %s ", "unlimited");
+	}
+	if (tb[TCA_CAKE_AUTORATE] &&
+		RTA_PAYLOAD(tb[TCA_CAKE_AUTORATE]) >= sizeof(__u32)) {
+		autorate = rta_getattr_u32(tb[TCA_CAKE_AUTORATE]);
+		if (autorate == 1)
+			print_string(PRINT_ANY, "autorate", "autorate_%s ", "ingress");
+		else if (autorate)
+			print_string(PRINT_ANY, "autorate", "(?autorate?) ", "unknown");
+	}
+	if (tb[TCA_CAKE_DIFFSERV_MODE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_DIFFSERV_MODE]) >= sizeof(__u32)) {
+		diffserv = rta_getattr_u32(tb[TCA_CAKE_DIFFSERV_MODE]);
+		switch (diffserv) {
+		case CAKE_DIFFSERV_DIFFSERV3:
+			print_string(PRINT_ANY, "diffserv", "%s ", "diffserv3");
+			break;
+		case CAKE_DIFFSERV_DIFFSERV4:
+			print_string(PRINT_ANY, "diffserv", "%s ", "diffserv4");
+			break;
+		case CAKE_DIFFSERV_DIFFSERV8:
+			print_string(PRINT_ANY, "diffserv", "%s ", "diffserv8");
+			break;
+		case CAKE_DIFFSERV_BESTEFFORT:
+			print_string(PRINT_ANY, "diffserv", "%s ", "besteffort");
+			break;
+		case CAKE_DIFFSERV_PRECEDENCE:
+			print_string(PRINT_ANY, "diffserv", "%s ", "precedence");
+			break;
+		default:
+			print_string(PRINT_ANY, "diffserv", "(?diffserv?) ", "unknown");
+			break;
+		}
+	}
+	if (tb[TCA_CAKE_FLOW_MODE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_FLOW_MODE]) >= sizeof(__u32)) {
+		flowmode = rta_getattr_u32(tb[TCA_CAKE_FLOW_MODE]);
+		switch (flowmode) {
+		case CAKE_FLOW_NONE:
+			print_string(PRINT_ANY, "flowmode", "%s ", "flowblind");
+			break;
+		case CAKE_FLOW_SRC_IP:
+			print_string(PRINT_ANY, "flowmode", "%s ", "srchost");
+			break;
+		case CAKE_FLOW_DST_IP:
+			print_string(PRINT_ANY, "flowmode", "%s ", "dsthost");
+			break;
+		case CAKE_FLOW_HOSTS:
+			print_string(PRINT_ANY, "flowmode", "%s ", "hosts");
+			break;
+		case CAKE_FLOW_FLOWS:
+			print_string(PRINT_ANY, "flowmode", "%s ", "flows");
+			break;
+		case CAKE_FLOW_DUAL_SRC:
+			print_string(PRINT_ANY, "flowmode", "%s ", "dual-srchost");
+			break;
+		case CAKE_FLOW_DUAL_DST:
+			print_string(PRINT_ANY, "flowmode", "%s ", "dual-dsthost");
+			break;
+		case CAKE_FLOW_TRIPLE:
+			print_string(PRINT_ANY, "flowmode", "%s ", "triple-isolate");
+			break;
+		default:
+			print_string(PRINT_ANY, "flowmode", "(?flowmode?) ", "unknown");
+			break;
+		}
+
+	}
+
+	if (tb[TCA_CAKE_NAT] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_NAT]) >= sizeof(__u32)) {
+		nat = rta_getattr_u32(tb[TCA_CAKE_NAT]);
+	}
+
+	if (nat)
+		print_string(PRINT_FP, NULL, "nat ", NULL);
+	print_bool(PRINT_JSON, "nat", NULL, nat);
+
+	if (tb[TCA_CAKE_WASH] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_WASH]) >= sizeof(__u32)) {
+		wash = rta_getattr_u32(tb[TCA_CAKE_WASH]);
+	}
+	if (tb[TCA_CAKE_ATM] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_ATM]) >= sizeof(__u32)) {
+		atm = rta_getattr_u32(tb[TCA_CAKE_ATM]);
+	}
+	if (tb[TCA_CAKE_OVERHEAD] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_OVERHEAD]) >= sizeof(__s32)) {
+		overhead = *(__s32 *) RTA_DATA(tb[TCA_CAKE_OVERHEAD]);
+	}
+	if (tb[TCA_CAKE_MPU] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_MPU]) >= sizeof(__u32)) {
+		mpu = rta_getattr_u32(tb[TCA_CAKE_MPU]);
+	}
+	if (tb[TCA_CAKE_INGRESS] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_INGRESS]) >= sizeof(__u32)) {
+		ingress = rta_getattr_u32(tb[TCA_CAKE_INGRESS]);
+	}
+	if (tb[TCA_CAKE_ACK_FILTER] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_ACK_FILTER]) >= sizeof(__u32)) {
+		ack_filter = rta_getattr_u32(tb[TCA_CAKE_ACK_FILTER]);
+	}
+	if (tb[TCA_CAKE_SPLIT_GSO] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_SPLIT_GSO]) >= sizeof(__u32)) {
+		split_gso = rta_getattr_u32(tb[TCA_CAKE_SPLIT_GSO]);
+	}
+	if (tb[TCA_CAKE_RAW]) {
+		raw = 1;
+	}
+	if (tb[TCA_CAKE_RTT] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_RTT]) >= sizeof(__u32)) {
+		interval = rta_getattr_u32(tb[TCA_CAKE_RTT]);
+	}
+
+	if (wash)
+		print_string(PRINT_FP, NULL, "wash ", NULL);
+	print_bool(PRINT_JSON, "wash", NULL, wash);
+
+	if (ingress)
+		print_string(PRINT_FP, NULL, "ingress ", NULL);
+	print_bool(PRINT_JSON, "ingress", NULL, ingress);
+
+	if (ack_filter == CAKE_ACK_AGGRESSIVE)
+		print_string(PRINT_ANY, "ack-filter", "ack-filter-%s ", "aggressive");
+	else if (ack_filter == CAKE_ACK_FILTER)
+		print_string(PRINT_ANY, "ack-filter", "ack-filter ", "enabled");
+	else
+		print_string(PRINT_JSON, "ack-filter", NULL, "disabled");
+
+	if (split_gso)
+		print_string(PRINT_FP, NULL, "split-gso ", NULL);
+	print_bool(PRINT_JSON, "split_gso", NULL, split_gso);
+
+	if (interval)
+		print_string(PRINT_FP, NULL, "rtt %s ", sprint_time(interval, b2));
+	print_uint(PRINT_JSON, "rtt", NULL, interval);
+
+	if (raw)
+		print_string(PRINT_FP, NULL, "raw ", NULL);
+	print_bool(PRINT_JSON, "raw", NULL, raw);
+
+	if (atm == CAKE_ATM_ATM)
+		print_string(PRINT_ANY, "atm", "%s ", "atm");
+	else if (atm == CAKE_ATM_PTM)
+		print_string(PRINT_ANY, "atm", "%s ", "ptm");
+	else if (!raw)
+		print_string(PRINT_ANY, "atm", "%s ", "noatm");
+
+	print_int(PRINT_ANY, "overhead", "overhead %d ", overhead);
+
+	if (mpu)
+		print_uint(PRINT_ANY, "mpu", "mpu %u ", mpu);
+
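+	if (tb[TCA_CAKE_MEMORY] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_MEMORY]) >= sizeof(__u32)) {
+		memlimit = rta_getattr_u32(tb[TCA_CAKE_MEMORY]);
+	}
+
+	if (memlimit) {
+		print_uint(PRINT_JSON, "memlimit", NULL, memlimit);
+		print_string(PRINT_FP, NULL, "memlimit %s ", sprint_size(memlimit, b1));
+	}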
+
+	return 0;
+}
+
+static void cake_print_json_tin(struct rtattr **tstat)
+{
+#define PRINT_TSTAT_JSON(type, name, attr) if (tstat[TCA_CAKE_TIN_STATS_ ## attr]) \
+		print_u64(PRINT_JSON, name, NULL,			\
+			rta_getattr_ ## type((struct rtattr *)tstat[TCA_CAKE_TIN_STATS_ ## attr]))
+
+	open_json_object(NULL);
+	PRINT_TSTAT_JSON(u64, "threshold_rate", THRESHOLD_RATE64);
+	PRINT_TSTAT_JSON(u32, "target_us", TARGET_US);
+	PRINT_TSTAT_JSON(u32, "interval_us", INTERVAL_US);
+	PRINT_TSTAT_JSON(u32, "peak_delay_us", PEAK_DELAY_US);
+	PRINT_TSTAT_JSON(u32, "avg_delay_us", AVG_DELAY_US);
+	PRINT_TSTAT_JSON(u32, "base_delay_us", BASE_DELAY_US);
+	PRINT_TSTAT_JSON(u32, "sent_packets", SENT_PACKETS);
+	PRINT_TSTAT_JSON(u64, "sent_bytes", SENT_BYTES64);
+	PRINT_TSTAT_JSON(u32, "way_indirect_hits", WAY_INDIRECT_HITS);
+	PRINT_TSTAT_JSON(u32, "way_misses", WAY_MISSES);
+	PRINT_TSTAT_JSON(u32, "way_collisions", WAY_COLLISIONS);
+	PRINT_TSTAT_JSON(u32, "drops", DROPPED_PACKETS);
+	PRINT_TSTAT_JSON(u32, "ecn_mark", ECN_MARKED_PACKETS);
+	PRINT_TSTAT_JSON(u32, "ack_drops", ACKS_DROPPED_PACKETS);
+	PRINT_TSTAT_JSON(u32, "sparse_flows", SPARSE_FLOWS);
+	PRINT_TSTAT_JSON(u32, "bulk_flows", BULK_FLOWS);
+	PRINT_TSTAT_JSON(u32, "unresponsive_flows", UNRESPONSIVE_FLOWS);
+	PRINT_TSTAT_JSON(u32, "max_pkt_len", MAX_SKBLEN);
+	PRINT_TSTAT_JSON(u32, "flow_quantum", FLOW_QUANTUM);
+	close_json_object();
+
+#undef PRINT_TSTAT_JSON
+}
+
+static int cake_print_xstats(struct qdisc_util *qu, FILE *f,
+			     struct rtattr *xstats)
+{
+	SPRINT_BUF(b1);
+	struct rtattr *st[TCA_CAKE_STATS_MAX + 1];
+	int i;
+
+	if (xstats == NULL)
+		return 0;
+
+#define GET_STAT_U32(attr) rta_getattr_u32(st[TCA_CAKE_STATS_ ## attr])
+#define GET_STAT_U64(attr) rta_getattr_u64(st[TCA_CAKE_STATS_ ## attr])
+
+	parse_rtattr_nested(st, TCA_CAKE_STATS_MAX, xstats);
+
+	if (st[TCA_CAKE_STATS_MEMORY_USED] &&
+	    st[TCA_CAKE_STATS_MEMORY_LIMIT]) {
+		print_string(PRINT_FP, NULL, " memory used: %s",
+			sprint_size(GET_STAT_U32(MEMORY_USED), b1));
+
+		print_string(PRINT_FP, NULL, " of %s\n",
+			sprint_size(GET_STAT_U32(MEMORY_LIMIT), b1));
+
+		print_uint(PRINT_JSON, "memory_used", NULL,
+			GET_STAT_U32(MEMORY_USED));
+		print_uint(PRINT_JSON, "memory_limit", NULL,
+			GET_STAT_U32(MEMORY_LIMIT));
+	}
+
+	if (st[TCA_CAKE_STATS_CAPACITY_ESTIMATE64]) {
+		print_string(PRINT_FP, NULL, " capacity estimate: %s\n",
+			sprint_rate(GET_STAT_U64(CAPACITY_ESTIMATE64), b1));
+		print_uint(PRINT_JSON, "capacity_estimate", NULL,
+			GET_STAT_U64(CAPACITY_ESTIMATE64));
+	}
+
+	if (st[TCA_CAKE_STATS_MIN_NETLEN] &&
+	    st[TCA_CAKE_STATS_MAX_NETLEN]) {
+		print_uint(PRINT_ANY, "min_network_size",
+			   " min/max network layer size: %12u",
+			   GET_STAT_U32(MIN_NETLEN));
+		print_uint(PRINT_ANY, "max_network_size",
+			   " /%8u\n", GET_STAT_U32(MAX_NETLEN));
+	}
+
+	if (st[TCA_CAKE_STATS_MIN_ADJLEN] &&
+	    st[TCA_CAKE_STATS_MAX_ADJLEN]) {
+		print_uint(PRINT_ANY, "min_adj_size",
+			   " min/max overhead-adjusted size: %8u",
+			   GET_STAT_U32(MIN_ADJLEN));
+		print_uint(PRINT_ANY, "max_adj_size",
+			   " /%8u\n", GET_STAT_U32(MAX_ADJLEN));
+	}
+
+	if (st[TCA_CAKE_STATS_AVG_NETOFF])
+		print_uint(PRINT_ANY, "avg_hdr_offset",
+			   " average network hdr offset: %12u\n\n",
+			   GET_STAT_U32(AVG_NETOFF));
+
+#undef GET_STAT_U32
+#undef GET_STAT_U64
+
+	if (st[TCA_CAKE_STATS_TIN_STATS]) {
+		struct rtattr *tins[TC_CAKE_MAX_TINS + 1];
+		struct rtattr *tstat[TC_CAKE_MAX_TINS][TCA_CAKE_TIN_STATS_MAX + 1];
+		int num_tins = 0;
+
+		parse_rtattr_nested(tins, TC_CAKE_MAX_TINS, st[TCA_CAKE_STATS_TIN_STATS]);
+
+		for (i = 1; i <= TC_CAKE_MAX_TINS && tins[i]; i++) {
+			parse_rtattr_nested(tstat[i-1], TCA_CAKE_TIN_STATS_MAX, tins[i]);
+			num_tins++;
+		}
+
+		if (!num_tins)
+			return 0;
+
+		if (is_json_context()) {
+			open_json_array(PRINT_JSON, "tins");
+			for (i = 0; i < num_tins; i++)
+				cake_print_json_tin(tstat[i]);
+			close_json_array(PRINT_JSON, NULL);
+
+			return 0;
+		}
+
+
+		switch (num_tins) {
+		case 3:
+			fprintf(f, "                   Bulk  Best Effort        Voice\n");
+			break;
+
+		case 4:
+			fprintf(f, "                   Bulk  Best Effort        Video        Voice\n");
+			break;
+
+		default:
+			fprintf(f, "          ");
+			for (i = 0; i < num_tins; i++)
+				fprintf(f, "        Tin %d", i);
+			fprintf(f, "\n");
+		}
+
+#define GET_TSTAT(i, attr) (tstat[i][TCA_CAKE_TIN_STATS_ ## attr])
+#define PRINT_TSTAT(name, attr, fmts, val)	do {		\
+			if (GET_TSTAT(0, attr)) {		\
+				fprintf(f, "%s", name);	\
+				for (i = 0; i < num_tins; i++)	\
+					fprintf(f, " %12" fmts,	val);	\
+				fprintf(f, "\n");			\
+			}						\
+		} while (0)
+
+#define SPRINT_TSTAT(pfunc, type, name, attr) PRINT_TSTAT(		\
+			name, attr, "s", sprint_ ## pfunc(		\
+				rta_getattr_ ## type(GET_TSTAT(i, attr)), b1))
+
+#define PRINT_TSTAT_U32(name, attr)	PRINT_TSTAT(			\
+			name, attr, "u", rta_getattr_u32(GET_TSTAT(i, attr)))
+
+#define PRINT_TSTAT_U64(name, attr)	PRINT_TSTAT(			\
+			name, attr, "llu", rta_getattr_u64(GET_TSTAT(i, attr)))
+
+		SPRINT_TSTAT(rate, u64, "  thresh  ", THRESHOLD_RATE64);
+		SPRINT_TSTAT(time, u32, "  target  ", TARGET_US);
+		SPRINT_TSTAT(time, u32, "  interval", INTERVAL_US);
+		SPRINT_TSTAT(time, u32, "  pk_delay", PEAK_DELAY_US);
+		SPRINT_TSTAT(time, u32, "  av_delay", AVG_DELAY_US);
+		SPRINT_TSTAT(time, u32, "  sp_delay", BASE_DELAY_US);
+
+		PRINT_TSTAT_U32("  pkts    ", SENT_PACKETS);
+		PRINT_TSTAT_U64("  bytes   ", SENT_BYTES64);
+
+		PRINT_TSTAT_U32("  way_inds", WAY_INDIRECT_HITS);
+		PRINT_TSTAT_U32("  way_miss", WAY_MISSES);
+		PRINT_TSTAT_U32("  way_cols", WAY_COLLISIONS);
+		PRINT_TSTAT_U32("  drops   ", DROPPED_PACKETS);
+		PRINT_TSTAT_U32("  marks   ", ECN_MARKED_PACKETS);
+		PRINT_TSTAT_U32("  ack_drop", ACKS_DROPPED_PACKETS);
+		PRINT_TSTAT_U32("  sp_flows", SPARSE_FLOWS);
+		PRINT_TSTAT_U32("  bk_flows", BULK_FLOWS);
+		PRINT_TSTAT_U32("  un_flows", UNRESPONSIVE_FLOWS);
+		PRINT_TSTAT_U32("  max_len ", MAX_SKBLEN);
+		PRINT_TSTAT_U32("  quantum ", FLOW_QUANTUM);
+
+#undef GET_TSTAT
+#undef PRINT_TSTAT
+#undef SPRINT_TSTAT
+#undef PRINT_TSTAT_U32
+#undef PRINT_TSTAT_U64
+	}
+	return 0;
+}
+
+struct qdisc_util cake_qdisc_util = {
+	.id		= "cake",
+	.parse_qopt	= cake_parse_opt,
+	.print_qopt	= cake_print_opt,
+	.print_xstats	= cake_print_xstats,
+};