Message ID | 1412081485.30721.75.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > Add a new dql attribute, to control how much packets we are allowed to > burst from qdisc to device. I understand the motivation for this, but I find it a bit out-of-place to have a 'packet' type counter in bql context? Would it perhaps make more sense to restrict bulk dequeues by an upper (possibly changeable from userspace) byte counter limit? Thanks, Florian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote: > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > From: Eric Dumazet <edumazet@google.com> > > > > Add a new dql attribute, to control how much packets we are allowed to > > burst from qdisc to device. > > I understand the motivation for this, but I find it a bit out-of-place > to have a 'packet' type counter in bql context? > > Would it perhaps make more sense to restrict bulk dequeues by an upper > (possibly changeable from userspace) byte counter limit? The byte count is already provided : its the BQL limit. We already have ways to tune it (limit_min & limit_max) We do not think we need something else here. That was the whole point of reusing BQL in the first place : Its already powerful and implemented. This new attribute is a packet burst. When I suggested 'pburst' attribute for pktgen to Alexei, he sent a patch with "burst". Nobody complained. Now I am reusing what apparently was OK for pktgen, you seem to object. If you feel not comfortable with "burst", rename it to whatever you think is best. But please, do not hard code magic 7 in your code. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 30, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote: >> Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > From: Eric Dumazet <edumazet@google.com> >> > >> > Add a new dql attribute, to control how much packets we are allowed to >> > burst from qdisc to device. >> >> I understand the motivation for this, but I find it a bit out-of-place >> to have a 'packet' type counter in bql context? >> >> Would it perhaps make more sense to restrict bulk dequeues by an upper >> (possibly changeable from userspace) byte counter limit? > > The byte count is already provided : its the BQL limit. > We already have ways to tune it (limit_min & limit_max) > We do not think we need something else here. > That was the whole point of reusing BQL in the first place : Its already > powerful and implemented. > > This new attribute is a packet burst. > Yes, this is the wrong place to put it. DQL knows nothing about packets or technically even bytes, it's a generic library that deal with "objects" on a queue. I don't see how burst relates to this. I would infer from the patch that what you want is to add an attribute to the TX queue. Why not do this directly? > > When I suggested 'pburst' attribute for pktgen to Alexei, he sent a > patch with "burst". Nobody complained. > > Now I am reusing what apparently was OK for pktgen, you seem to object. > > If you feel not comfortable with "burst", rename it to whatever you > think is best. > > But please, do not hard code magic 7 in your code. > > Thanks > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > If you feel not comfortable with "burst", rename it to whatever you > think is best. > But please, do not hard code magic 7 in your code. I had hoped that this 'magic' value could be removed completely, only using bql data for bulking decisions. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-09-30 at 07:26 -0700, Tom Herbert wrote: > Yes, this is the wrong place to put it. DQL knows nothing about > packets or technically even bytes, it's a generic library that deal > with "objects" on a queue. I don't see how burst relates to this. I wonder then why you put : netdev_tx_completed_queue(struct netdev_queue *dev_queue, unsigned int pkts, unsigned int bytes) Really, I don't care _where_ the attribute is, as long we dont access yet another cache line in fast path. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-09-30 at 16:31 +0200, Florian Westphal wrote: > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > If you feel not comfortable with "burst", rename it to whatever you > > think is best. > > But please, do not hard code magic 7 in your code. > > I had hoped that this 'magic' value could be removed > completely, only using bql data for bulking decisions. But it is apparently not the case, since you guys decided to had it set to 8, then to 7 later, based on experiments. This value is probably dependent on the NIC hardware, while BQL adj_limit is more likely tied to external factors, like latencies to react to TX IRQ and perform TX completion. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 30, 2014 at 7:49 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2014-09-30 at 07:26 -0700, Tom Herbert wrote: > > >> Yes, this is the wrong place to put it. DQL knows nothing about >> packets or technically even bytes, it's a generic library that deal >> with "objects" on a queue. I don't see how burst relates to this. > > I wonder then why you put : > > netdev_tx_completed_queue(struct netdev_queue *dev_queue, > unsigned int pkts, unsigned int bytes) > In case we ever wanted to count by packets instead of or in addition to bytes. > Really, I don't care _where_ the attribute is, as long we dont access > yet another cache line in fast path. Just put it after/before dql in netdev_queue structure. > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote: > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > From: Eric Dumazet <edumazet@google.com> > > > > > > Add a new dql attribute, to control how much packets we are allowed to > > > burst from qdisc to device. > > > > I understand the motivation for this, but I find it a bit out-of-place > > to have a 'packet' type counter in bql context? > > > > Would it perhaps make more sense to restrict bulk dequeues by an upper > > (possibly changeable from userspace) byte counter limit? > > The byte count is already provided : its the BQL limit. > We already have ways to tune it (limit_min & limit_max) > We do not think we need something else here. So you're saying that a bulk dequeue should just grab as many skbs as possible until no more available or dql_avail exhausted? The "magic" value was just to be conservative and not induce any hol blocking, which is also why Jesper reduced it again in the latest submission. Then, we might later be able to remove the TSO restriction and switch to a pure byte-based limit. (I don't think having a packet-based count makes sense once tso/gso skbs can be bulk dequeued). Sorry for the confusion. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 30, 2014 at 8:26 AM, Florian Westphal <fw@strlen.de> wrote: > Eric Dumazet <eric.dumazet@gmail.com> wrote: >> On Tue, 2014-09-30 at 15:46 +0200, Florian Westphal wrote: >> > Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > > From: Eric Dumazet <edumazet@google.com> >> > > >> > > Add a new dql attribute, to control how much packets we are allowed to >> > > burst from qdisc to device. >> > >> > I understand the motivation for this, but I find it a bit out-of-place >> > to have a 'packet' type counter in bql context? >> > >> > Would it perhaps make more sense to restrict bulk dequeues by an upper >> > (possibly changeable from userspace) byte counter limit? >> >> The byte count is already provided : its the BQL limit. >> We already have ways to tune it (limit_min & limit_max) >> We do not think we need something else here. > > So you're saying that a bulk dequeue should just grab as many skbs > as possible until no more available or dql_avail exhausted? > > The "magic" value was just to be conservative and not induce any > hol blocking, which is also why Jesper reduced it again in the latest > submission. > > Then, we might later be able to remove the TSO restriction and switch > to a pure byte-based limit. > > (I don't think having a packet-based count makes sense once tso/gso > skbs can be bulk dequeued). > > Sorry for the confusion. I still have some hope that we can one day fix wifi packet aggregation, which is a limit of 42 packets or 64k bytes per aggregate (best case), with something BQL-like. As for the size of the tx ring problem vs GSO, there is at least one driver with a very limited tx ring that I can think of that tears apart GSO packets in the driver... but I'm not sure if it's mainlined. I would not mind at all if TSO/GSO were banned on devices running slower than 100Mbit. Perhaps exporting the tx-ring size would be saner than a burst parameter?
On Tue, 2014-09-30 at 17:26 +0200, Florian Westphal wrote: > So you're saying that a bulk dequeue should just grab as many skbs > as possible until no more available or dql_avail exhausted? > This was certainly the intent. > The "magic" value was just to be conservative and not induce any > hol blocking, which is also why Jesper reduced it again in the latest > submission. Think about whats happening here. BQL controls the optimal queue _in_ the NIC TX ring, given latencies and throughput goals. Once packets are queued in the NIC TX ring, there is no way a high prio packet can be injected ahead of them, unless NIC has separate TX ring for different prio already. There is no need redoing this in another layer. BQL logic was hard to setup and tune, I have no idea why you want to reinvent it. > > Then, we might later be able to remove the TSO restriction and switch > to a pure byte-based limit Why later ? If it is not needed now, it should not be added 'to be safe' > . > > (I don't think having a packet-based count makes sense once tso/gso > skbs can be bulk dequeued). So do not add this limit, so that we can spot bugs ahead of time. If some NIC wants a 'doorbell at least once every 8 packets', then the NIC driver should be fixed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 30 Sep 2014 05:51:25 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > Add a new dql attribute, to control how much packets we are allowed to > burst from qdisc to device. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > --- > Jesper, please consider using this instead of 7 or 8 values you hard coded. The hard coded budget was only a safe guard. I didn't want to export it (as we can never remove it then), I plan to remove it or replace it with something else.
On Tue, 30 Sep 2014 07:55:36 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2014-09-30 at 16:31 +0200, Florian Westphal wrote: > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > If you feel not comfortable with "burst", rename it to whatever you > > > think is best. > > > But please, do not hard code magic 7 in your code. > > > > I had hoped that this 'magic' value could be removed > > completely, only using bql data for bulking decisions. > > But it is apparently not the case, since you guys decided to had it set > to 8, then to 7 later, based on experiments. The "magic" limit is only a conservative save guard. We do plan to remove this completely. That is the reason for not exporting this, as we want free-hands to remove this completely. Guess, I'll remove it completely now, so we can move on. I would like some off/on switch, exported to userspace, for disabling this bulking (then we don't need this conservative magic number). How would that be done best?
On Tue, Sep 30, 2014 at 2:46 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote: > > On Tue, 30 Sep 2014 07:55:36 -0700 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > On Tue, 2014-09-30 at 16:31 +0200, Florian Westphal wrote: > > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > If you feel not comfortable with "burst", rename it to whatever you > > > > think is best. > > > > But please, do not hard code magic 7 in your code. > > > > > > I had hoped that this 'magic' value could be removed > > > completely, only using bql data for bulking decisions. > > > > But it is apparently not the case, since you guys decided to had it set > > to 8, then to 7 later, based on experiments. > > The "magic" limit is only a conservative save guard. We do plan to > remove this completely. That is the reason for not exporting this, as > we want free-hands to remove this completely. > > Guess, I'll remove it completely now, so we can move on. > > > I would like some off/on switch, exported to userspace, for disabling > this bulking (then we don't need this conservative magic number). How > would that be done best? I'd suggest to keep the packet limit as an unsigned. Default value is infinity (-1UL), 0 turns it off, and other values are available if someone really wants to tune this (for example, if BQL is disabled-- BQL limit is infinity). Export this as sys variable for the TX queue. Tom > > > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Sr. Network Kernel Developer at Red Hat > Author of http://www.iptv-analyzer.org > LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h index a4be70398ce1..8509053eb330 100644 --- a/include/linux/dynamic_queue_limits.h +++ b/include/linux/dynamic_queue_limits.h @@ -42,6 +42,7 @@ struct dql { unsigned int num_queued; /* Total ever queued */ unsigned int adj_limit; /* limit + num_completed */ unsigned int last_obj_cnt; /* Count at last queuing */ + unsigned int burst; /* Fields accessed only by completion path (dql_completed) */ diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c index 0777c5a45fa0..acc0d6105191 100644 --- a/lib/dynamic_queue_limits.c +++ b/lib/dynamic_queue_limits.c @@ -132,6 +132,7 @@ int dql_init(struct dql *dql, unsigned hold_time) dql->max_limit = DQL_MAX_LIMIT; dql->min_limit = 0; dql->slack_hold_time = hold_time; + dql->burst = 7; dql_reset(dql); return 0; } diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 9dd06699b09c..80cf6f166706 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -978,6 +978,36 @@ static struct netdev_queue_attribute bql_hold_time_attribute = __ATTR(hold_time, S_IRUGO | S_IWUSR, bql_show_hold_time, bql_set_hold_time); +static ssize_t bql_show_burst(struct netdev_queue *queue, + struct netdev_queue_attribute *attr, + char *buf) +{ + const struct dql *dql = &queue->dql; + + return sprintf(buf, "%u\n", dql->burst); +} + +static ssize_t bql_set_burst(struct netdev_queue *queue, + struct netdev_queue_attribute *attribute, + const char *buf, size_t len) +{ + struct dql *dql = &queue->dql; + unsigned int value; + int err; + + err = kstrtouint(buf, 10, &value); + if (err < 0) + return err; + + dql->burst = value; + + return len; +} + +static struct netdev_queue_attribute bql_burst_attribute = + __ATTR(burst, S_IRUGO | S_IWUSR, bql_show_burst, + bql_set_burst); + static ssize_t bql_show_inflight(struct netdev_queue *queue, struct netdev_queue_attribute *attr, char *buf) @@ -1019,6 +1049,7 @@ static struct attribute *dql_attrs[] = { &bql_limit_min_attribute.attr, &bql_hold_time_attribute.attr, &bql_inflight_attribute.attr, + &bql_burst_attribute.attr, NULL };