diff mbox

[net-next] dql: add a burst attribute

Message ID 1412081485.30721.75.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 30, 2014, 12:51 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

Add a new dql attribute, to control how much packets we are allowed to
burst from qdisc to device.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
Jesper, please consider using this instead of 7 or 8 values you hard coded.

 include/linux/dynamic_queue_limits.h |    1 
 lib/dynamic_queue_limits.c           |    1 
 net/core/net-sysfs.c                 |   31 +++++++++++++++++++++++++
 3 files changed, 33 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Florian Westphal Sept. 30, 2014, 1:46 p.m. UTC | #1
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Add a new dql attribute, to control how much packets we are allowed to
> burst from qdisc to device.

I understand the motivation for this, but I find it a bit out-of-place
to have a 'packet' type counter in bql context?

Would it perhaps make more sense to restrict bulk dequeues by an upper
(possibly changeable from userspace) byte counter limit?

Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 30, 2014, 2:17 p.m. UTC | #2
On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Add a new dql attribute, to control how much packets we are allowed to
> > burst from qdisc to device.
> 
> I understand the motivation for this, but I find it a bit out-of-place
> to have a 'packet' type counter in bql context?
> 
> Would it perhaps make more sense to restrict bulk dequeues by an upper
> (possibly changeable from userspace) byte counter limit?

The byte count is already provided : its the BQL limit.
We already have ways to tune it (limit_min & limit_max)
We do not think we need something else here.
That was the whole point of reusing BQL in the first place : Its already
powerful and implemented.

This new attribute is a packet burst.


When I suggested 'pburst' attribute for pktgen to Alexei, he sent a
patch with "burst". Nobody complained.

Now I am reusing what apparently was OK for pktgen, you seem to object.

If you feel not comfortable with "burst", rename it to whatever you
think is best.

But please, do not hard code magic 7 in your code.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert Sept. 30, 2014, 2:26 p.m. UTC | #3
On Tue, Sep 30, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > From: Eric Dumazet <edumazet@google.com>
>> >
>> > Add a new dql attribute, to control how much packets we are allowed to
>> > burst from qdisc to device.
>>
>> I understand the motivation for this, but I find it a bit out-of-place
>> to have a 'packet' type counter in bql context?
>>
>> Would it perhaps make more sense to restrict bulk dequeues by an upper
>> (possibly changeable from userspace) byte counter limit?
>
> The byte count is already provided : its the BQL limit.
> We already have ways to tune it (limit_min & limit_max)
> We do not think we need something else here.
> That was the whole point of reusing BQL in the first place : Its already
> powerful and implemented.
>
> This new attribute is a packet burst.
>
Yes, this is the wrong place to put it. DQL knows nothing about
packets or technically even bytes, it's a generic library that deal
with "objects" on a queue. I don't see how burst relates to this.

I would infer from the patch that what you want is to add an attribute
to the TX queue. Why not do this directly?

>
> When I suggested 'pburst' attribute for pktgen to Alexei, he sent a
> patch with "burst". Nobody complained.
>
> Now I am reusing what apparently was OK for pktgen, you seem to object.
>
> If you feel not comfortable with "burst", rename it to whatever you
> think is best.
>
> But please, do not hard code magic 7 in your code.
>
> Thanks
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Sept. 30, 2014, 2:31 p.m. UTC | #4
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> If you feel not comfortable with "burst", rename it to whatever you
> think is best.
> But please, do not hard code magic 7 in your code.

I had hoped that this 'magic' value could be removed
completely, only using bql data for bulking decisions.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 30, 2014, 2:49 p.m. UTC | #5
On Tue, 2014-09-30 at 07:26 -0700, Tom Herbert wrote:


> Yes, this is the wrong place to put it. DQL knows nothing about
> packets or technically even bytes, it's a generic library that deal
> with "objects" on a queue. I don't see how burst relates to this.

I wonder then why you put :

netdev_tx_completed_queue(struct netdev_queue *dev_queue,
			  unsigned int pkts, unsigned int bytes)

Really, I don't care _where_ the attribute is, as long we dont access
yet another cache line in fast path.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 30, 2014, 2:55 p.m. UTC | #6
On Tue, 2014-09-30 at 16:31 +0200, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > If you feel not comfortable with "burst", rename it to whatever you
> > think is best.
> > But please, do not hard code magic 7 in your code.
> 
> I had hoped that this 'magic' value could be removed
> completely, only using bql data for bulking decisions.

But it is apparently not the case, since you guys decided to had it set
to 8, then to 7 later, based on experiments.

This value is probably dependent on the NIC hardware, while BQL
adj_limit is more likely tied to external factors, like latencies to
react to TX IRQ and perform TX completion.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert Sept. 30, 2014, 3:05 p.m. UTC | #7
On Tue, Sep 30, 2014 at 7:49 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-09-30 at 07:26 -0700, Tom Herbert wrote:
>
>
>> Yes, this is the wrong place to put it. DQL knows nothing about
>> packets or technically even bytes, it's a generic library that deal
>> with "objects" on a queue. I don't see how burst relates to this.
>
> I wonder then why you put :
>
> netdev_tx_completed_queue(struct netdev_queue *dev_queue,
>                           unsigned int pkts, unsigned int bytes)
>
In case we ever wanted to count by packets instead of or in addition to bytes.

> Really, I don't care _where_ the attribute is, as long we dont access
> yet another cache line in fast path.

Just put it after/before dql in netdev_queue structure.

>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Sept. 30, 2014, 3:26 p.m. UTC | #8
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > Add a new dql attribute, to control how much packets we are allowed to
> > > burst from qdisc to device.
> > 
> > I understand the motivation for this, but I find it a bit out-of-place
> > to have a 'packet' type counter in bql context?
> > 
> > Would it perhaps make more sense to restrict bulk dequeues by an upper
> > (possibly changeable from userspace) byte counter limit?
> 
> The byte count is already provided : its the BQL limit.
> We already have ways to tune it (limit_min & limit_max)
> We do not think we need something else here.

So you're saying that a bulk dequeue should just grab as many skbs
as possible until no more available or dql_avail exhausted?

The "magic" value was just to be conservative and not induce any
hol blocking, which is also why Jesper reduced it again in the latest
submission.

Then, we might later be able to remove the TSO restriction and switch
to a pure byte-based limit.

(I don't think having a packet-based count makes sense once tso/gso
 skbs can be bulk dequeued).

Sorry for the confusion.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Taht Sept. 30, 2014, 3:39 p.m. UTC | #9
On Tue, Sep 30, 2014 at 8:26 AM, Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote:
>> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > > From: Eric Dumazet <edumazet@google.com>
>> > >
>> > > Add a new dql attribute, to control how much packets we are allowed to
>> > > burst from qdisc to device.
>> >
>> > I understand the motivation for this, but I find it a bit out-of-place
>> > to have a 'packet' type counter in bql context?
>> >
>> > Would it perhaps make more sense to restrict bulk dequeues by an upper
>> > (possibly changeable from userspace) byte counter limit?
>>
>> The byte count is already provided : its the BQL limit.
>> We already have ways to tune it (limit_min & limit_max)
>> We do not think we need something else here.
>
> So you're saying that a bulk dequeue should just grab as many skbs
> as possible until no more available or dql_avail exhausted?
>
> The "magic" value was just to be conservative and not induce any
> hol blocking, which is also why Jesper reduced it again in the latest
> submission.
>
> Then, we might later be able to remove the TSO restriction and switch
> to a pure byte-based limit.
>
> (I don't think having a packet-based count makes sense once tso/gso
>  skbs can be bulk dequeued).
>
> Sorry for the confusion.

I still have some hope that we can one day fix wifi packet aggregation,
which is a limit of 42 packets or 64k bytes per aggregate (best case),
with something BQL-like.

As for the size of the tx ring problem vs GSO, there is at least one
driver with a very limited tx ring that I can think of that tears apart
GSO packets in the driver... but I'm not sure if it's mainlined.

I would not mind at all if TSO/GSO were banned on devices running
slower than 100Mbit.

Perhaps exporting the tx-ring size would be saner than a burst
parameter?
Eric Dumazet Sept. 30, 2014, 3:41 p.m. UTC | #10
On Tue, 2014-09-30 at 17:26 +0200, Florian Westphal wrote:

> So you're saying that a bulk dequeue should just grab as many skbs
> as possible until no more available or dql_avail exhausted?
> 

This was certainly the intent.


> The "magic" value was just to be conservative and not induce any
> hol blocking, which is also why Jesper reduced it again in the latest
> submission.

Think about whats happening here.

BQL controls the optimal queue _in_ the NIC TX ring, given latencies and
throughput goals.

Once packets are queued in the NIC TX ring, there is no way a high prio
packet can be injected ahead of them, unless NIC has separate TX ring
for different prio already.

There is no need redoing this in another layer. BQL logic was hard to
setup and tune, I have no idea why you want to reinvent it.

> 
> Then, we might later be able to remove the TSO restriction and switch
> to a pure byte-based limit

Why later ? If it is not needed now, it should not be added 'to be safe'

> .
> 
> (I don't think having a packet-based count makes sense once tso/gso
>  skbs can be bulk dequeued).

So do not add this limit, so that we can spot bugs ahead of time.

If some NIC wants a 'doorbell at least once every 8 packets', then the
NIC driver should be fixed.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesper Dangaard Brouer Sept. 30, 2014, 9:29 p.m. UTC | #11
On Tue, 30 Sep 2014 05:51:25 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> Add a new dql attribute, to control how much packets we are allowed to
> burst from qdisc to device.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> Jesper, please consider using this instead of 7 or 8 values you hard coded.

The hard coded budget was only a safe guard.  I didn't want to export
it (as we can never remove it then), I plan to remove it or replace it
with something else.
Jesper Dangaard Brouer Sept. 30, 2014, 9:46 p.m. UTC | #12
On Tue, 30 Sep 2014 07:55:36 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Tue, 2014-09-30 at 16:31 +0200, Florian Westphal wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > If you feel not comfortable with "burst", rename it to whatever you
> > > think is best.
> > > But please, do not hard code magic 7 in your code.
> > 
> > I had hoped that this 'magic' value could be removed
> > completely, only using bql data for bulking decisions.
> 
> But it is apparently not the case, since you guys decided to had it set
> to 8, then to 7 later, based on experiments.

The "magic" limit is only a conservative save guard.  We do plan to
remove this completely.  That is the reason for not exporting this, as
we want free-hands to remove this completely.

Guess, I'll remove it completely now, so we can move on.


I would like some off/on switch, exported to userspace, for disabling
this bulking (then we don't need this conservative magic number).  How
would that be done best?
Tom Herbert Oct. 1, 2014, 4:44 a.m. UTC | #13
On Tue, Sep 30, 2014 at 2:46 PM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Tue, 30 Sep 2014 07:55:36 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Tue, 2014-09-30 at 16:31 +0200, Florian Westphal wrote:
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > > If you feel not comfortable with "burst", rename it to whatever you
> > > > think is best.
> > > > But please, do not hard code magic 7 in your code.
> > >
> > > I had hoped that this 'magic' value could be removed
> > > completely, only using bql data for bulking decisions.
> >
> > But it is apparently not the case, since you guys decided to had it set
> > to 8, then to 7 later, based on experiments.
>
> The "magic" limit is only a conservative save guard.  We do plan to
> remove this completely.  That is the reason for not exporting this, as
> we want free-hands to remove this completely.
>
> Guess, I'll remove it completely now, so we can move on.
>
>
> I would like some off/on switch, exported to userspace, for disabling
> this bulking (then we don't need this conservative magic number).  How
> would that be done best?


I'd suggest to keep the packet limit as an unsigned. Default value is
infinity (-1UL), 0 turns it off, and other values are available if
someone really wants to tune this (for example, if BQL is disabled--
BQL limit is infinity). Export this as sys variable for the TX queue.

Tom

>
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Sr. Network Kernel Developer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index a4be70398ce1..8509053eb330 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -42,6 +42,7 @@  struct dql {
 	unsigned int	num_queued;		/* Total ever queued */
 	unsigned int	adj_limit;		/* limit + num_completed */
 	unsigned int	last_obj_cnt;		/* Count at last queuing */
+	unsigned int	burst;
 
 	/* Fields accessed only by completion path (dql_completed) */
 
diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
index 0777c5a45fa0..acc0d6105191 100644
--- a/lib/dynamic_queue_limits.c
+++ b/lib/dynamic_queue_limits.c
@@ -132,6 +132,7 @@  int dql_init(struct dql *dql, unsigned hold_time)
 	dql->max_limit = DQL_MAX_LIMIT;
 	dql->min_limit = 0;
 	dql->slack_hold_time = hold_time;
+	dql->burst = 7;
 	dql_reset(dql);
 	return 0;
 }
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9dd06699b09c..80cf6f166706 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -978,6 +978,36 @@  static struct netdev_queue_attribute bql_hold_time_attribute =
 	__ATTR(hold_time, S_IRUGO | S_IWUSR, bql_show_hold_time,
 	    bql_set_hold_time);
 
+static ssize_t bql_show_burst(struct netdev_queue *queue,
+			      struct netdev_queue_attribute *attr,
+			      char *buf)
+{
+	const struct dql *dql = &queue->dql;
+
+	return sprintf(buf, "%u\n", dql->burst);
+}
+
+static ssize_t bql_set_burst(struct netdev_queue *queue,
+			     struct netdev_queue_attribute *attribute,
+			     const char *buf, size_t len)
+{
+	struct dql *dql = &queue->dql;
+	unsigned int value;
+	int err;
+
+	err = kstrtouint(buf, 10, &value);
+	if (err < 0)
+		return err;
+
+	dql->burst = value;
+
+	return len;
+}
+
+static struct netdev_queue_attribute bql_burst_attribute =
+	__ATTR(burst, S_IRUGO | S_IWUSR, bql_show_burst,
+	    bql_set_burst);
+
 static ssize_t bql_show_inflight(struct netdev_queue *queue,
 				 struct netdev_queue_attribute *attr,
 				 char *buf)
@@ -1019,6 +1049,7 @@  static struct attribute *dql_attrs[] = {
 	&bql_limit_min_attribute.attr,
 	&bql_hold_time_attribute.attr,
 	&bql_inflight_attribute.attr,
+	&bql_burst_attribute.attr,
 	NULL
 };