[net-next-2.6,1/2] net: add IFLA_NUM_TXQ attribute

Message ID 1284712288.3391.36.camel@edumazet-laptop
State Deferred, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Sept. 17, 2010, 8:31 a.m. UTC
In order to enable multiqueue support on some devices,
add an IFLA_NUM_TXQ attribute (the number of transmit queues) that "ip link"
can use at creation and show time:

# ip link add gre34 txqueues 8 type gre remote 192.168.20.80

# ip link sho dev gre34
8: gre34: <POINTOPOINT,NOARP> mtu 1476 qdisc noop state DOWN txqueues 8 
    link/gre 0.0.0.0 peer 192.168.20.80

Drivers that are not yet multiqueue aware are supported, because the core
network code temporarily sets real_num_tx_queues to one.

Multiqueue-enabled drivers must then set real_num_tx_queues to
num_tx_queues in their newlink() method.

The number of queues is limited to 256 for the moment.
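
The contract described in this commit message can be modelled in a few lines of userspace C. This is only a sketch, not the kernel code: `struct fake_netdev`, `fake_create_link()` and `fake_mq_newlink()` are hypothetical names made up for illustration. The 256 limit and the temporary single active queue are taken from the description here.

```c
#include <errno.h>

#define MAX_TXQ 256 /* limit stated in the commit message */

/* Hypothetical stand-in for the two net_device fields discussed here. */
struct fake_netdev {
	unsigned int num_tx_queues;      /* queues allocated */
	unsigned int real_num_tx_queues; /* queues actually in use */
};

/* Models the core: validate the request, then start with one active queue. */
static int fake_create_link(struct fake_netdev *dev, unsigned int requested)
{
	if (requested < 1 || requested > MAX_TXQ)
		return -EINVAL;
	dev->num_tx_queues = requested;
	dev->real_num_tx_queues = 1; /* safe default for old drivers */
	return 0;
}

/* Models a multiqueue-aware driver's newlink(): opt in to all queues. */
static void fake_mq_newlink(struct fake_netdev *dev)
{
	dev->real_num_tx_queues = dev->num_tx_queues;
}
```

A driver that never calls the opt-in step keeps a single active queue, which is why drivers that are not multiqueue aware continue to work unchanged.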

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/if_link.h |    1 +
 net/core/rtnetlink.c    |   17 +++++++++++++++++
 2 files changed, 18 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 17, 2010, 11:29 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 17 Sep 2010 10:31:28 +0200

> In order to enable multiqueue support on some devices,
> add an IFLA_NUM_TXQ attribute (the number of transmit queues) that "ip link"
> can use at creation and show time:
> 
> # ip link add gre34 txqueues 8 type gre remote 192.168.20.80
> 
> # ip link sho dev gre34
> 8: gre34: <POINTOPOINT,NOARP> mtu 1476 qdisc noop state DOWN txqueues 8 
>     link/gre 0.0.0.0 peer 192.168.20.80
> 
> Drivers that are not yet multiqueue aware are supported, because the core
> network code temporarily sets real_num_tx_queues to one.
> 
> Multiqueue-enabled drivers must then set real_num_tx_queues to
> num_tx_queues in their newlink() method.
> 
> The number of queues is limited to 256 for the moment.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

This is one way to solve the problem, but I think we can do a lot
better.

What is the true barrier for full parallel processing over GRE tunnels
at the moment?

It seems to me that the only issue that exists is the TXQ->lock done
by dev_queue_xmit() for the GRE tunnel xmit.

This is something we should have fixed ages ago, and we tried with the
ugly LLTX thing.  In my opinion all paths leading to a non-queueing
device should not take the TX lock, because by definition there is no
queueing state or synchronization to be cognizant of.
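
As a rough illustration of the rule proposed above, here is a userspace sketch. It is not the actual dev_queue_xmit() code: `struct fake_qdisc`, `fake_needs_tx_lock()` and `fake_enqueue()` are hypothetical names, and the only assumption carried over from the discussion is that a non-queueing device is one whose qdisc has no enqueue handler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a qdisc: enqueue is NULL for non-queueing devices. */
struct fake_qdisc {
	int (*enqueue)(void *pkt);
};

/* Proposed rule: only devices with queueing state need the TX lock. */
static bool fake_needs_tx_lock(const struct fake_qdisc *q)
{
	return q->enqueue != NULL;
}

/* Placeholder enqueue handler standing in for a queueing device. */
static int fake_enqueue(void *pkt)
{
	(void)pkt;
	return 0;
}
```

Under this rule a software tunnel without a queue would transmit without serializing on the TX lock, while devices that do queue keep their existing locking.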

Actually, statistics can matter but we already have to address that
problem separately for the sake of 64-bit stats on 32-bit machines.

Alexey even openly condones this in the huge comment that sits in
the "!q->enqueue" path of dev_queue_xmit().

If we take care of this, then TX multi-queue works transparently for
all software devices layered on top of suitably capable hardware,
without us having to make any explicit multi-queue changes to the
software device code.

Eric, if you can demonstrate a real need for this once we solve the
fundamental issue, as I have outlined above, I am happy to add this
netlink attribute and tunable.  But for now I'm deferring these
two patches.

Thanks!
Eric Dumazet Sept. 18, 2010, 5:33 a.m. UTC | #2
On Friday, 17 September 2010 at 16:29 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 17 Sep 2010 10:31:28 +0200
> 
> > In order to enable multiqueue support on some devices,
> > add an IFLA_NUM_TXQ attribute (the number of transmit queues) that "ip link"
> > can use at creation and show time:
> > 
> > # ip link add gre34 txqueues 8 type gre remote 192.168.20.80
> > 
> > # ip link sho dev gre34
> > 8: gre34: <POINTOPOINT,NOARP> mtu 1476 qdisc noop state DOWN txqueues 8 
> >     link/gre 0.0.0.0 peer 192.168.20.80
> > 
> > Drivers that are not yet multiqueue aware are supported, because the core
> > network code temporarily sets real_num_tx_queues to one.
> > 
> > Multiqueue-enabled drivers must then set real_num_tx_queues to
> > num_tx_queues in their newlink() method.
> > 
> > The number of queues is limited to 256 for the moment.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> This is one way to solve the problem, but I think we can do a lot
> better.
> 
> What is the true barrier for full parallel processing over GRE tunnels
> at the moment?
> 
> It seems to me that the only issue that exists is the TXQ->lock done
> by dev_queue_xmit() for the GRE tunnel xmit.
> 
> This is something we should have fixed ages ago, and we tried with the
> ugly LLTX thing.  In my opinion all paths leading to a non-queueing
> device should not take the TX lock, because by definition there is no
> queueing state or synchronization to be cognizant of.
> 
> Actually, statistics can matter but we already have to address that
> problem separately for the sake of 64-bit stats on 32-bit machines.
> 

Agreed. Even before 64-bit stats, we used per-CPU stats on
loopback/bridge...
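
For readers unfamiliar with the per-CPU approach, a minimal userspace sketch of the idea follows. The names `FAKE_NR_CPUS`, `struct fake_pcpu_stats`, `fake_count_tx()` and `fake_read_tx()` are hypothetical and do not correspond to kernel APIs. The sketch also ignores the extra care needed to read 64-bit counters consistently on 32-bit machines.

```c
#include <stdint.h>

#define FAKE_NR_CPUS 4 /* hypothetical CPU count for the sketch */

/* One counter slot per CPU; each CPU only ever writes its own slot. */
struct fake_pcpu_stats {
	uint64_t tx_packets[FAKE_NR_CPUS];
};

/* Hot path: no lock, because the slot is private to the calling CPU. */
static void fake_count_tx(struct fake_pcpu_stats *s, unsigned int cpu)
{
	s->tx_packets[cpu]++;
}

/* Slow path: a reader folds all slots into one total. */
static uint64_t fake_read_tx(const struct fake_pcpu_stats *s)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < FAKE_NR_CPUS; i++)
		sum += s->tx_packets[i];
	return sum;
}
```

Because writers never share a slot, the transmit path needs no lock just to keep statistics correct.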

> Alexey even openly condones this in the huge comment that sits in
> the "!q->enqueue" path of dev_queue_xmit().
> 
> If we take care of this, then TX multi-queue works transparently for
> all software devices layered on top of suitably capable hardware,
> without us having to make any explicit multi-queue changes to the
> software device code.
> 
> Eric, if you can demonstrate a real need for this once we solve the
> fundamental issue, as I have outlined above, I am happy to add this
> netlink attribute and tunable.  But for now I'm deferring these
> two patches.
> 
> Thanks!

Hmm, in my case, I was interested in RX processing and RPS
(asymmetric routing).

The only way to have more than one rx queue today is to have more than
one tx queue.

But this probably can be addressed separately.

Thanks!


Patch

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 2fc66dd..87dca81 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,6 +116,7 @@  enum {
 	IFLA_STATS64,
 	IFLA_VF_PORTS,
 	IFLA_PORT_SELF,
+	IFLA_NUM_TXQ,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b2a718d..1b9af34 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -676,6 +676,7 @@  static noinline size_t if_nlmsg_size(const struct net_device *dev)
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_NUM_VF */
+	       + nla_total_size(4) /* IFLA_NUM_TXQ */
 	       + rtnl_vfinfo_size(dev) /* IFLA_VFINFO_LIST */
 	       + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev); /* IFLA_LINKINFO */
@@ -791,6 +792,9 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (dev->master)
 		NLA_PUT_U32(skb, IFLA_MASTER, dev->master->ifindex);
 
+	if (dev->real_num_tx_queues > 1)
+		NLA_PUT_U32(skb, IFLA_NUM_TXQ, dev->real_num_tx_queues);
+
 	if (dev->qdisc)
 		NLA_PUT_STRING(skb, IFLA_QDISC, dev->qdisc->ops->id);
 
@@ -922,6 +926,7 @@  const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_MTU]		= { .type = NLA_U32 },
 	[IFLA_LINK]		= { .type = NLA_U32 },
 	[IFLA_TXQLEN]		= { .type = NLA_U32 },
+	[IFLA_NUM_TXQ]		= { .type = NLA_U32 },
 	[IFLA_WEIGHT]		= { .type = NLA_U32 },
 	[IFLA_OPERSTATE]	= { .type = NLA_U8 },
 	[IFLA_LINKMODE]		= { .type = NLA_U8 },
@@ -1357,6 +1362,18 @@  struct net_device *rtnl_create_link(struct net *src_net, struct net *net,
 		if (err)
 			goto err;
 	}
+	if (tb[IFLA_NUM_TXQ]) {
+		err = -EINVAL;
+		num_queues = nla_get_u32(tb[IFLA_NUM_TXQ]);
+		if (num_queues < 1 || num_queues > 256)
+			goto err;
+		/* multiqueue drivers have to set
+		 * dev->real_num_tx_queues = dev->num_tx_queues;
+		 * in their ->newlink() method. We force a temporary
+		 * single queue to be compatible with old drivers.
+		 */
+		real_num_queues = 1;
+	}
 	err = -ENOMEM;
 	dev = alloc_netdev_mq(ops->priv_size, ifname, ops->setup, num_queues);
 	if (!dev)