
[RFC,v1] iproute2: add IFLA_TC support to 'ip link'

Message ID 20101201182758.3297.34345.stgit@jf-dev1-dcblab
State RFC, archived
Delegated to: stephen hemminger

Commit Message

John Fastabend Dec. 1, 2010, 6:27 p.m. UTC
Add support to return IFLA_TC qos settings to the 'ip link'
command. The following sets the number of traffic classes
supported in HW and builds a priority map.

#ip link set eth3 tc num 8 map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0

With the output from 'ip link' showing maps for interfaces with
the ability to use HW traffic classes.

#ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:f0:fc:88 brdff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:30:48:f0:fc:89 brdff:ff:ff:ff:ff:ff
6: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:1b:21:55:23:58 brdff:ff:ff:ff:ff:ff
    tc 0:8
7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:55:23:59 brdff:ff:ff:ff:ff:ff
    tc 8:8 map: { 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 }
    txqs: (0:8) (8:16) (16:24) (24:32) (32:40) (40:48) (48:56) (56:64)

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/if_link.h |   51 ++++++++++++++++++++++++++++++++++++++
 ip/ipaddress.c          |   48 ++++++++++++++++++++++++++++++++++--
 ip/iplink.c             |   63 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 160 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

stephen hemminger Dec. 1, 2010, 6:38 p.m. UTC | #1
On Wed, 01 Dec 2010 10:27:58 -0800
John Fastabend <john.r.fastabend@intel.com> wrote:

> Add support to return IFLA_TC qos settings to the 'ip link'
> command. The following sets the number of traffic classes
> supported in HW and builds a priority map.
> 
> #ip link set eth3 tc num 8 map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0
> 
> With the output from 'ip link' showing maps for interfaces with
> the ability to use HW traffic classes.
> 
> #ip link show
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>     link/loopback 00:00:00:00:00:00 brd00:00:00:00:00:00
> 2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>     link/ether 00:30:48:f0:fc:88 brdff:ff:ff:ff:ff:ff
> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether 00:30:48:f0:fc:89 brdff:ff:ff:ff:ff:ff
> 6: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
>     link/ether 00:1b:21:55:23:58 brdff:ff:ff:ff:ff:ff
>     tc 0:8
> 7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether 00:1b:21:55:23:59 brdff:ff:ff:ff:ff:ff
>     tc 8:8 map: { 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 }
>     txqs: (0:8) (8:16) (16:24) (24:32) (32:40) (40:48) (48:56) (56:64)
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Ok, but it will not be applied until after 2.6.38 (when kernel support
is upstream).
David Miller Dec. 1, 2010, 6:48 p.m. UTC | #2
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 1 Dec 2010 10:38:02 -0800

> Ok. but will not be applied until after 2.6.38 (when kernel support
> is upstream).

Speaking of which, Stephen could you please process the iproute2 patches
which have been rotting in the patchwork queue since... May?!

Just reject them if you don't plan to apply them.  Letting them just sit
around there for half a year bothers me (and patch submitters) a lot.

stephen hemminger Dec. 1, 2010, 7:27 p.m. UTC | #3
On Wed, 01 Dec 2010 10:48:23 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 1 Dec 2010 10:38:02 -0800
> 
> > Ok. but will not be applied until after 2.6.38 (when kernel support
> > is upstream).
> 
> Speaking of which, Stephen could you please process the iproute2 patches
> which have been rotting in the patchwork queue since... May?!
> 
> Just reject them if you don't plan to apply them.  Letting them just sit
> around there for half a year bothers me (and patch submitters) a lot.
> 

Now up to date with 2.6.37-rcX in git tree.

David Miller Dec. 1, 2010, 7:30 p.m. UTC | #4
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 1 Dec 2010 11:27:16 -0800

> Now up to date with 2.6.37-rcX in git tree.

Thank you.
John Fastabend Dec. 1, 2010, 8:57 p.m. UTC | #5
On 12/1/2010 10:38 AM, Stephen Hemminger wrote:
> On Wed, 01 Dec 2010 10:27:58 -0800
> John Fastabend <john.r.fastabend@intel.com> wrote:
> 
>> Add support to return IFLA_TC qos settings to the 'ip link'
>> command. The following sets the number of traffic classes
>> supported in HW and builds a priority map.
>>
>> #ip link set eth3 tc num 8 map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0
>>
>> With the output from 'ip link' showing maps for interfaces with
>> the ability to use HW traffic classes.
>>
>> #ip link show
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>>     link/loopback 00:00:00:00:00:00 brd00:00:00:00:00:00
>> 2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>     link/ether 00:30:48:f0:fc:88 brdff:ff:ff:ff:ff:ff
>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>>     link/ether 00:30:48:f0:fc:89 brdff:ff:ff:ff:ff:ff
>> 6: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
>>     link/ether 00:1b:21:55:23:58 brdff:ff:ff:ff:ff:ff
>>     tc 0:8
>> 7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>>     link/ether 00:1b:21:55:23:59 brdff:ff:ff:ff:ff:ff
>>     tc 8:8 map: { 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 }
>>     txqs: (0:8) (8:16) (16:24) (24:32) (32:40) (40:48) (48:56) (56:64)
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> 
> Ok. but will not be applied until after 2.6.38 (when kernel support
> is upstream).
> 
> 

Agreed. I wanted to send this out to illustrate the interface. I'll post a non-RFC patch after I get the corresponding kernel support accepted.

Thanks,
John
jamal Dec. 2, 2010, 10:40 a.m. UTC | #6
On Wed, 2010-12-01 at 10:27 -0800, John Fastabend wrote:
> Add support to return IFLA_TC qos settings to the 'ip link'
> command. The following sets the number of traffic classes
> supported in HW and builds a priority map.
> 
> #ip link set eth3 tc num 8 map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0
> 
> With the output from 'ip link' showing maps for interfaces with
> the ability to use HW traffic classes.

2 comments apply to the kernel patches as well - but easier to point
out here.

1) IMO, this looks like the wrong interface to use. 
Was there any reason not to use tc and instead having it show
itself embedded within "ip" abstraction? 
Example, this would suit your intent:
tc qdisc add dev eth3 hware-kinda-8021q-sched num 8 map blah bleh

You can then modify individual classes of traffic with "tc class".

[There are plenty of other chips (switching chips for example) that
implement a variety different hardware schedulers, hence the
"hardware-kinda-8021q-sched" above]

2) How does this mapping in hardware correlate to the software side
mapping? When packets of class X make it off the hardware and hit
the stack are they still going to get the same treatment as they
would have in h/ware?


cheers,
jamal

John Fastabend Dec. 2, 2010, 7:51 p.m. UTC | #7
On 12/2/2010 2:40 AM, jamal wrote:
> On Wed, 2010-12-01 at 10:27 -0800, John Fastabend wrote:
>> Add support to return IFLA_TC qos settings to the 'ip link'
>> command. The following sets the number of traffic classes
>> supported in HW and builds a priority map.
>>
>> #ip link set eth3 tc num 8 map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0
>>
>> With the output from 'ip link' showing maps for interfaces with
>> the ability to use HW traffic classes.
> 
> 2 comments apply to the kernel patches as well - but easier to point
> out here.
> 
> 1) IMO, this looks like the wrong interface to use. 
> Was there any reason not to use tc and instead having it show
> itself embedded within "ip" abstraction? 
> Example, this would suit your intent:
> tc qdisc add dev eth3 hware-kinda-8021q-sched num 8 map blah bleh
> 

I viewed the HW QOS as L2 link attributes more than a queuing discipline per se. Plus 'ip link' is already used to set things outside of ip. For example 'txqueuelen' and 'vf x'.

> You can then modify individual classes of traffic with "tc class".
> 
> [There are plenty of other chips (switching chips for example) that
> implement a variety different hardware schedulers, hence the
> "hardware-kinda-8021q-sched" above]

However, thinking about this a bit more, qdisc support seems cleaner. For one, we can configure QOS policies per class with Qdisc_class_ops, and then also aggregate statistics with dump_stats. I would avoid the "hardware-kinda-8021q-sched" name though, to account for schedulers that may not be 802.1Q compliant; maybe 'mclass-sched' for a multi-class scheduler. I'll look into this. Thanks for the suggestion!

> 
> 2) How does this mapping in hardware correlate to the software side
> mapping? When packets of class X make it off the hardware and hit
> the stack are they still going to get the same treatment as they
> would have in h/ware?
>

On egress the skb priority is mapped to a class which is associated with a range of queues (qoffset:qoffset + qcount). In the 802.1Q case this queue range is mapped to the 802.1Qp traffic class in hardware. So the hardware traffic class is mapped 1-1 with the software class. Additionally in software the VLAN egress mapping is used to map the skb priority to the 802.1Q priority. Here I expect user policies to configure this to get a consistent mapping. On ingress the skb priority is set using the 802.1Q ingress mapping. This case is something a userspace policy could configure if egress/ingress mappings should be symmetric.

In the simpler case of hardware rate limiting (not 802.1Q) this is not really a concern at all. With this mechanism we can identify traffic and push it to the correct queues that are grouped into a rate limited class. If there are egress/ingress mappings then those will apply skb priority tags on egress and the correct skb priority on ingress.

Currently everything works reasonably well with this scheme and the mq qdisc. The mq qdisc uses pfifo and the driver then pauses the queues as needed. Using the enhanced transmission selection algorithm (ETS - 802.1Qaz pre-standard) in hardware we see variations from expected bandwidth around +-5% with TCP/UDP. Instrumenting HW rate limiters gives similar variations. I tested this with ixgbe and the 82599 device.

Bit long winded but hopefully that answers your question.

> 
> cheers,
> jamal
> 

jamal Dec. 3, 2010, 11:06 a.m. UTC | #8
On Thu, 2010-12-02 at 11:51 -0800, John Fastabend wrote:
> On 12/2/2010 2:40 AM, jamal wrote:


> I viewed the HW QOS as L2 link attributes more than a queuing discipline per se.
> Plus 'ip link' is already used to set things outside of ip. 
> For example 'txqueuelen' and 'vf x'.

the vf one is maybe borderline-ok; txqueuelen is probably inherited from
ifconfig (and I'm not sure a single queue qualifies as a scheduler)


> However thinking about this a bit more qdisc support seems cleaner. 
> For one we can configure QOS policies per class with Qdisc_class_ops. 
> And then also aggregate statistics with dump_stats. I would avoid the 
> "hardware-kinda-8021q-sched" name though to account for schedulers that 
> may not be 802.1Q compliant maybe 'mclass-sched' for multi-class scheduler. 

Typically the scheduler would be a very familiar one, implemented
per-spec by many vendors, and will have a name acceptable to all.
So pick an appropriate noun so that user expectations match it.

> I'll look into this. Thanks for the suggestion!

> 
> On egress the skb priority is mapped to a class which is associated with a
> range of queues (qoffset:qoffset + qcount). 
> In the 802.1Q case this queue range is mapped to the 802.1Qp 
> traffic class in hardware. So the hardware traffic class is mapped 1-1 
> with the software class. Additionally in software the VLAN egress mapping
> is used to map the skb priority to the 802.1Q priority. Here I expect user
> policies to configure this to get a consistent mapping. On ingress the 
> skb priority is set using the 802.1Q ingress mapping. This case is 
> something a userspace policy could configure if egress/ingress mappings
> should be symmetric.
> 

Sounds sensible. 

> In the simpler case of hardware rate limiting (not 802.1Q) this is not
> really a concern at all. With this mechanism we can identify traffic 
> and push it to the correct queues that are grouped into a rate limited class.

Ok, so you can do rate control as well?

> If there are egress/ingress mappings then those will apply skb priority tags 
> on egress and the correct skb priority on ingress.

Curious how you would do this in a rate controlled environment. EX: on
egress, do you use whatever skb prio you get to map to a specific rate
queue in h/ware? Note: skb prio has strict priority scheduling
semantics, so a 1-1 mapping doesn't sound reasonable...

> Currently everything works reasonably well with this scheme and the mq qdisc.
>  The mq qdisc uses pfifo and the driver then pauses the queues as needed. 
> Using the enhanced transmission selection algorithm (ETS - 802.1Qaz pre-standard)
>  in hardware we see variations from expected bandwidth around +-5% with TCP/UDP. 
> Instrumenting HW rate limiters gives similar variations. I tested this is with 
> ixgbe and the 82599 device.
> 
> Bit long winded but hopefully that answers your question.

I am curious about the rate based scheme - and i hope you are looking at
a different qdisc for that?

cheers,
jamal

John Fastabend Dec. 9, 2010, 7:58 p.m. UTC | #9
On 12/3/2010 3:06 AM, jamal wrote:
> On Thu, 2010-12-02 at 11:51 -0800, John Fastabend wrote:
>> On 12/2/2010 2:40 AM, jamal wrote:
> 
> 
>> I viewed the HW QOS as L2 link attributes more than a queuing discipline per se.
>> Plus 'ip link' is already used to set things outside of ip. 
>> For example 'txqueuelen' and 'vf x'.
> 
> the vf one maybe borderline-ok txquelen is probably inherited from
> ifconfig (and not sure a single queue a scheduler qualifies)
> 
> 
>> However thinking about this a bit more qdisc support seems cleaner. 
>> For one we can configure QOS policies per class with Qdisc_class_ops. 
>> And then also aggregate statistics with dump_stats. I would avoid the 
>> "hardware-kinda-8021q-sched" name though to account for schedulers that 
>> may not be 802.1Q compliant maybe 'mclass-sched' for multi-class scheduler. 
> 
> Typically the scheduler would be a very familiar one implemented
> per-spec by many vendors and will have a name acceptable by all.
> So pick an appropriate noun so the user expectation matches it.
> 

I think what we really want is a container to create groups of tx queues which can then be managed and given a scheduler. One reason for this is that the 802.1Q spec allows for different schedulers to be running on different traffic classes, including vendor-specific schedulers. So having a root "hardware-kinda-8021q-sched" doesn't seem flexible enough to handle adding/removing schedulers per traffic class.

With a container qdisc, statistics roll up nicely as expected, and the default scheduler can be the usual mq qdisc.

A first take at this coming shortly. Any thoughts?

>> I'll look into this. Thanks for the suggestion!
> 
>>
>> On egress the skb priority is mapped to a class which is associated with a
>> range of queues (qoffset:qoffset + qcount). 
>> In the 802.1Q case this queue range is mapped to the 802.1Qp 
>> traffic class in hardware. So the hardware traffic class is mapped 1-1 
>> with the software class. Additionally in software the VLAN egress mapping
>> is used to map the skb priority to the 802.1Q priority. Here I expect user
>> policies to configure this to get a consistent mapping. On ingress the 
>> skb priority is set using the 802.1Q ingress mapping. This case is 
>> something a userspace policy could configure if egress/ingress mappings
>> should be symmetric.
>>
> 
> Sounds sensible. 
> 
>> In the simpler case of hardware rate limiting (not 802.1Q) this is not
>> really a concern at all. With this mechanism we can identify traffic 
>> and push it to the correct queues that are grouped into a rate limited class.
> 
> Ok, so you can do rate control as well?
> 

Yes, but per tx_ring. So software needs to then balance the rings into an aggregated rate limiter. Using the container scheme I imagine a root mclass qdisc with multiple "sch_rate_limiter" qdiscs. This qdisc could manage the individual rate limiters per queue and get something like a rate limiter per group of tx queues.

>> If there are egress/ingress mappings then those will apply skb priority tags 
>> on egress and the correct skb priority on ingress.
> 
> Curious how you would do this in a rate controlled environment. EX: on
> egress, do you use whatever skb prio you get to map to a specific rate
> queue in h/ware? Note: skb prio has a strict priority scheduling
> semantics so a 1-1 mapping doesnt sound reasonable..

Yes this is how I would expect this to work. The prio mapping is configurable so I think this could be worked around by policy in tc. iproute2 would need to pick a reasonable default mapping.

Warning, thinking out loud here, but maybe we could also add a qdisc op to pick the underlying tx queue, basically a qdisc ops for dev_pick_tx(). This op could be part of the root qdisc and called in dev_queue_xmit(). I would need to think about this some more to see if it is sane, but the bottom line is that the tx queue needs to be learned before __dev_xmit_skb(), the default mechanism in this patch set being the skb prio.

> 
>> Currently everything works reasonably well with this scheme and the mq qdisc.
>>  The mq qdisc uses pfifo and the driver then pauses the queues as needed. 
>> Using the enhanced transmission selection algorithm (ETS - 802.1Qaz pre-standard)
>>  in hardware we see variations from expected bandwidth around +-5% with TCP/UDP. 
>> Instrumenting HW rate limiters gives similar variations. I tested this is with 
>> ixgbe and the 82599 device.
>>
>> Bit long winded but hopefully that answers your question.
> 
> I am curious about the rate based scheme - and i hope you are looking at
> a different qdisc for that?

Yes a different qdisc.

Thanks,
John

> 
> cheers,
> jamal
> 
jamal Dec. 15, 2010, 1:19 p.m. UTC | #10
Sorry for the latency.

On Thu, 2010-12-09 at 11:58 -0800, John Fastabend wrote:

> I think what we really want is a container to create groups of tx queues 
> which can then be managed and given a scheduler. One reason for this is 
> the 802.1Q spec allows for different schedulers to be running on different
> traffic classes including vendor specific schedulers. So having a root 
> "hardware-kinda-8021q-sched" doesn't seem flexible enough to handle 
> adding/removing schedulers per traffic class.
> 
> With a container qdisc statistics roll up nicely as expected and 
> the default scheduler can be the usual mq qdisc.

As far as I can see the "container" is a qdisc. The noun doesn't
matter; mq looks sufficient.
[I just said "hardware-kinda-8021q-sched" because what you posted didn't
look 8021q conformant.]
 
> A first take at this coming shortly. Any thoughts?

Haven't had time to look at the patches you posted.

> > Ok, so you can do rate control as well?
> 
> Yes, but per tx_ring. So software needs to then balance the rings into
> an aggregated rate limiter. Using the container scheme I imagine a root 
> mclass qdisc with multiple "sch_rate_limiter" qdiscs. This qdisc could 
> manage the individual rate limiters per queue and get something like a 
> rate limiter per groups of tx queues.
> 

The qdisc semantics allow for hierarchies, i.e. you could have qdiscs
that hold other qdiscs, each holding different scheduling algorithms, etc.

> Yes this is how I would expect this to work. The prio mapping is configurable
> so I think this could be worked around by policy in tc. iproute2 would need 
> to pick a reasonable default mapping.
> 
> Warning thinking out loud here but maybe we could also add a qdisc op to pick 
> the underlying tx queue basically a qdisc ops for dev_pick_tx(). This ops could 
> be part of the root qdisc and called in dev_queue_xmit(). I would need to think 
> about this some more to see if it is sane but bottom line is the tx queue needs 
> to be learned before __dev_xmit_skb(). The default mechanism in this patch set 
> being the skb prio.
> 

You could use the qdisc major:minor to map to the hardware level queues.
But care is needed so that the user doesn't choose the wrong mapping, an
out-of-boundary mapping, etc. I am sure such validation can be done at
the iproute2 level, way before the hardware is configured.

cheers,
jamal


Patch

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index f5bb2dc..190c70d 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,6 +116,8 @@  enum {
 	IFLA_STATS64,
 	IFLA_VF_PORTS,
 	IFLA_PORT_SELF,
+	IFLA_AF_SPEC,
+	IFLA_TC,
 	__IFLA_MAX
 };
 
@@ -348,4 +350,53 @@  struct ifla_port_vsi {
 	__u8 pad[3];
 };
 
+/* HW QOS management section
+ *
+ *	Nested layout of set/get msg is:
+ *
+ *		[IFLA_TC]
+ *			[IFLA_TC_TXMAX]
+ *			[IFLA_TC_TXNUM]
+ *			[IFLA_TC_TXQS]
+ *				[IFLA_TC_TXQ]
+ *				...
+ *			[IFLA_TC_MAPS]
+ *				[IFLA_TC_MAP]
+ *				...
+ */
+enum {
+	IFLA_TC_UNSPEC,
+	IFLA_TC_TXMAX,
+	IFLA_TC_TXNUM,
+	IFLA_TC_TXQS,
+	IFLA_TC_MAPS,
+	__IFLA_TC_MAX,
+};
+#define IFLA_TC_MAX (__IFLA_TC_MAX - 1)
+
+struct ifla_tc_txq {
+	__u8 num;
+	__u16 count;
+	__u16 offset;
+};
+
+enum {
+	IFLA_TC_TXQ_UNSPEC,
+	IFLA_TC_TXQ,
+	__IFLA_TC_TCQ_MAX,
+};
+#define IFLA_TC_TXQS_MAX (__IFLA_TC_TCQ_MAX - 1)
+
+struct ifla_tc_map {
+	__u8 prio;
+	__u8 tc;
+};
+
+enum {
+	IFLA_TC_MAP_UNSPEC,
+	IFLA_TC_MAP,
+	__IFLA_TC_MAP_MAX,
+};
+#define IFLA_TC_MAP_MAX (__IFLA_TC_MAP_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 19b3d6e..39b357f 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -152,6 +152,47 @@  static void print_queuelen(FILE *f, const char *name)
 		fprintf(f, "qlen %d", ifr.ifr_qlen);
 }
 
+static void print_tc_table(FILE *fp, struct rtattr *tb)
+{
+	struct rtattr *table[IFLA_TC_MAX+1];
+	struct rtattr *attr;
+	struct ifla_tc_map *map;
+	struct ifla_tc_txq *txq;
+	__u8 max = 0, num = 0;
+	int rem;
+
+	parse_rtattr_nested(table, IFLA_TC_MAX, tb);
+
+	if (table[IFLA_TC_TXMAX])
+		max = * (__u8 *) RTA_DATA(table[IFLA_TC_TXMAX]);
+	if (table[IFLA_TC_TXNUM])
+		num = * (__u8 *) RTA_DATA(table[IFLA_TC_TXNUM]);
+
+
+	fprintf(fp, "\n    tc %d:%d ", num, max);
+
+	if (!num)
+		return;
+
+	rem = RTA_PAYLOAD(table[IFLA_TC_MAPS]);
+	attr = RTA_DATA(table[IFLA_TC_MAPS]);
+	fprintf(fp, "map: {");
+	for (; RTA_OK(attr, rem); attr = RTA_NEXT(attr, rem)) {
+		map = RTA_DATA(attr);
+		fprintf(fp, " %d", map->tc);
+	}
+	fprintf(fp, " }");
+
+	rem = RTA_PAYLOAD(table[IFLA_TC_TXQS]);
+	attr = RTA_DATA(table[IFLA_TC_TXQS]);
+	fprintf(fp, "\n    txqs: ");
+	for (; RTA_OK(attr, rem); attr = RTA_NEXT(attr, rem)) {
+		txq = RTA_DATA(attr);
+		fprintf(fp, "(%d:%d) ",
+			txq->offset, txq->offset + txq->count);
+	}
+}
+
 static void print_linktype(FILE *fp, struct rtattr *tb)
 {
 	struct rtattr *linkinfo[IFLA_INFO_MAX+1];
@@ -299,8 +340,8 @@  int print_linkinfo(const struct sockaddr_nl *who,
 			if (ifi->ifi_flags&IFF_POINTOPOINT)
 				fprintf(fp, " peer ");
 			else
-				fprintf(fp, " brd ");
-			fprintf(fp, "%s", ll_addr_n2a(RTA_DATA(tb[IFLA_BROADCAST]),
+				fprintf(fp, " brd");
+			fprintf(fp, "%s ", ll_addr_n2a(RTA_DATA(tb[IFLA_BROADCAST]),
 						      RTA_PAYLOAD(tb[IFLA_BROADCAST]),
 						      ifi->ifi_type,
 						      b1, sizeof(b1)));
@@ -421,6 +462,9 @@  int print_linkinfo(const struct sockaddr_nl *who,
 			print_vfinfo(fp, i);
 	}
 
+	if (tb[IFLA_TC])
+		print_tc_table(fp, tb[IFLA_TC]);
+
 	fprintf(fp, "\n");
 	fflush(fp);
 	return 0;
diff --git a/ip/iplink.c b/ip/iplink.c
index cb2c4f5..fa9236e 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -71,6 +71,8 @@  void iplink_usage(void)
 	fprintf(stderr, "	                  [ vf NUM [ mac LLADDR ]\n");
 	fprintf(stderr, "				   [ vlan VLANID [ qos VLAN-QOS ] ]\n");
 	fprintf(stderr, "				   [ rate TXRATE ] ] \n");
+	fprintf(stderr, "	                  [ tc [ num NUMTC ]\n");
+	fprintf(stderr, "			       [ map PRIO1 PRIO2 ... ] ]\n");
 	fprintf(stderr, "       ip link show [ DEVICE ]\n");
 
 	if (iplink_have_newlink()) {
@@ -242,6 +244,58 @@  int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 	return 0;
 }
 
+int iplink_parse_tc(int *argcp, char ***argvp, struct iplink_req *req)
+{
+	int argc = *argcp;
+	char **argv = *argvp;
+
+	while (NEXT_ARG_OK()) {
+		NEXT_ARG();
+		if (matches(*argv, "num") == 0) {
+			__u8 numtc;
+			NEXT_ARG();
+			if (get_u8(&numtc,  *argv, 0))
+				invarg("Invalid \"num\" value\n", *argv);
+			addattr_l(&req->n, sizeof(*req), IFLA_TC_TXNUM,
+				  &numtc, 1);
+		}
+
+		if (matches(*argv, "map") == 0) {
+			struct ifla_tc_map map;
+			struct rtattr *maps;
+			int i;
+
+			maps = addattr_nest(&req->n, sizeof(*req),
+					      IFLA_TC_MAPS);
+
+			for (i = 0; NEXT_ARG_OK(); i++) {
+				NEXT_ARG(); 
+				if (i > 15)
+					invarg("\"map\" value exceeds "
+					       "prio map\n", *argv);
+				map.prio = i;
+				if (get_u8(&map.tc,  *argv, 0))
+					invarg("Invalid \"map\" value\n", *argv);
+
+				if (map.tc > 15)
+					invarg("\"map\" value exceeds max tc",
+					       *argv);
+
+				addattr_l(&req->n, sizeof(*req), IFLA_TC_MAP,
+				  	  &map, sizeof(map));
+			}
+
+			addattr_nest_end(&req->n, maps);
+		}
+	}
+
+	if (argc == *argcp)
+		incomplete_command();
+	*argcp = argc;
+	*argvp = argv;
+	return 0;
+}
+
 
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
 		char **name, char **type, char **link, char **dev)
@@ -361,6 +415,15 @@  int iplink_parse(int argc, char **argv, struct iplink_req *req,
 			if (len < 0)
 				return -1;
 			addattr_nest_end(&req->n, vflist);
+		} else if (strcmp(*argv, "tc") == 0) {
+			struct rtattr *table;
+
+			table = addattr_nest(&req->n, sizeof(*req),
+					     IFLA_TC);
+			len = iplink_parse_tc(&argc, &argv, req);
+			if (len < 0)
+				return -1;
+			addattr_nest_end(&req->n, table);
 #ifdef IFF_DYNAMIC
 		} else if (matches(*argv, "dynamic") == 0) {
 			NEXT_ARG();