[iproute2] tc: add dualpi2 scheduler module

Message ID	20190324095234.20742-1-olga@albisser.org
State	Awaiting Upstream
Delegated to:	David Ahern
Headers	Return-Path: <netdev-owner@vger.kernel.org>
From: Olga Albisser <olgabnd@gmail.com>
To: netdev@vger.kernel.org
Cc: Olga Albisser <olga@albisser.org>, Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>, Oliver Tilmans <olivier.tilmans@nokia-bell-labs.com>, Bob Briscoe <research@bobbriscoe.net>, Henrik Steen <henrist@henrist.net>
Subject: [PATCH iproute2] tc: add dualpi2 scheduler module
Date: Sun, 24 Mar 2019 10:52:34 +0100
Message-Id: <20190324095234.20742-1-olga@albisser.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
Series	[iproute2] tc: add dualpi2 scheduler module

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index 0d18b1d1..dae44257 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -1160,4 +1160,36 @@ enum { #define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1) +/* DUALPI2 */ +enum { + TCA_DUALPI2_UNSPEC, + TCA_DUALPI2_ALPHA, + TCA_DUALPI2_BETA, + TCA_DUALPI2_DUALQ, + TCA_DUALPI2_ECN, + TCA_DUALPI2_K, + TCA_DUALPI2_L_DROP, + TCA_DUALPI2_ET_PACKETS, + TCA_DUALPI2_L_THRESH, + TCA_DUALPI2_LIMIT, + TCA_DUALPI2_T_SHIFT, + TCA_DUALPI2_T_SPEED, + TCA_DUALPI2_TARGET, + TCA_DUALPI2_TUPDATE, + TCA_DUALPI2_DROP_EARLY, + TCA_DUALPI2_WRR_RATIO, + __TCA_DUALPI2_MAX +}; + +#define TCA_DUALPI2_MAX (__TCA_DUALPI2_MAX - 1) +struct tc_dualpi2_xstats { + __u32 prob; /* current probability */ + __u32 delay_c; /* current delay in C queue */ + __u32 delay_l; /* current delay in L queue */ + __u32 packets_in; /* total number of packets enqueued */ + __u32 dropped; /* packets dropped due to pie_action */ + __u32 overlimit; /* dropped due to lack of space in queue */ + __u32 maxq; /* maximum queue size */ + __u32 ecn_mark; /* packets marked with ecn*/ +}; #endif diff --git a/man/man8/tc-dualpi2.8 b/man/man8/tc-dualpi2.8 new file mode 100644 index 00000000..c94fc583 --- /dev/null +++ b/man/man8/tc-dualpi2.8 @@ -0,0 +1,201 @@ +.TH DUALPI2 8 "13 December 2018" "iproute2" "Linux" + +.SH NAME +DUALPI2 \- Dual Queue Proportional Integral Controller AQM - Improved with a square +.SH SYNOPSIS +.sp +.ad l +.in +8 +.ti -8 +.BR tc " " qdisc " ... 
" dualpi2 +.br +.RB "[ " limit +.IR PACKETS " ]" +.br +.RB "[ " target +.IR TIME " ]" +.br +.RB "[ " tupdate +.IR TIME " ]" +.br +.RB "[ " alpha +.IR float " ]" +.br +.RB "[ " beta +.IR float " ] " +.br +.RB "[ " no_dualq " | " l4s_dualq " | " dc_dualq " ]" +.br +.RB "[ " k +.IR int " ]" +.br +.RB "[ " no_ecn " | " classic_ecn " | " l4s_ecn " | " dc_ecn " ]" +.br +.RB "[ " l_thresh +.IR TIME " ]" +.br +.RB "[ " t_shift +.IR TIME " ] " +.br +.RB "[ " c_limit " | " l_drop +.IR int " ] " +.br +.RB "[ " drop_enqueue " | " drop_dequeue " ]" +.br +.RB "[ " wrr_ratio +.IR PACKETS " ] " + +.SH DESCRIPTION +DUALPI2 AQM is a combination of the DUALQ Coupled-AQM with a PI2 base-AQM. The PI2 AQM (details can be found in the paper cited below) is in turn both an extension and a simplification of the PIE AQM. PI2 makes quite some PIE heuristics unnecessary, while being able to control scalable congestion controls like DCTCP and TCP-Prague. With PI2, both Reno/Cubic can be used in parallel with DCTCP, maintaining window fairness. DUALQ provides latency separation between low latency DCTCP flows and Reno/Cubic flows that need a bigger queue. The main design goals are: +.PD 0 +.IP \(bu 4 +L4S - Low Loss, Low Latency and Scalable congestion control support +.IP \(bu 4 +DualQ option to separate the L4S traffic in a low latency queue, without harming remaining traffic that is scheduled in classic queue due to congestion-coupling +.IP \(bu 4 +Configurable overload strategies +.IP \(bu 4 +Use of sojourn time to reliably estimate queue delay +.IP \(bu 4 +Simple implementation +.IP \(bu 4 +Guaranteed stability and fast responsiveness +.PD + +.SH ALGORITHM +DUALPI2 is designed to provide low loss and low latency to L4S traffic, without harming classic traffic. Every update interval a new internal base probability is calculated, based on queue delay. 
The base probability is updated with a delta based on the difference between the current queue delay and the target delay, and the queue growth comparing with the queuing delay during the previous interval. +The integral gain factor alpha is used to correct slowly enough any persistent standing queue error to the user specified target delay, while the proportional gain factor beta is used to quickly compensate for queue changes (growth or shrink). + +The updated base probability is used as input to decide to mark and drop packets. DUALPI2 scales the calculated probability for each of the two queues accordingly. For the L4S queue, the probability is multiplied by a scaling factor k, while for the classic queue, it is squared to compensate the squareroot rate equation of Reno/Cubic. The ECT identifier is used to classify traffic into respective queues. + +If DUALPI2 AQM has detected overload (when excessive non-responsive traffic is sent), it can signal congestion solely using drop, irrespective of the ECN field, or alternatively limit the drop probability and let the queue grow and eventually overflow (like tail-drop). + +Additional details can be found in the draft cited below. + +.SH PARAMETERS +.TP +.BI limit " PACKETS" +Limit the number of packets that can be enqueued. Incoming packets are dropped when this limit +is reached. This limit is common for the L4S and Classic queue. Defaults to +.I 10000 +packets. This is about 120ms delay on a 1Gbps link. +.TP +.BI target " TIME" +Set the expected queue delay. Defaults to +.I 20 +ms. +.TP +.BI tupdate " TIME" +Set the frequency at which the system drop probability is calculated. Defaults to +.I 32 +ms. This should be a third of the max RTT supported. +.TP +.BI alpha " float" +.PD 0 +.TP +.PD +.BI beta " float" +Set alpha and beta, the integral and proportional gain factors in Hz for the PI controller. These + can be calculated based on control theory and should be in the range between 0.00390625 and 40000. 
Defaults are +.I 0.3125 +and +.I 3.125 +Hz, which provide stable control for RTT's up to 100ms with tupdate of 30ms. Be aware, unlike with PIE, these are the real unscaled gain factors. +.TP +.B no_dualq | l4s_dualq | dc_dualq +Configures the ECN based queue selection/classifier. Two dualq options are available, the first for the standardized L4S Internet traffic (TCP-Prague) and the second for DCTCP traffic (mis)using ECT(0). Defaults to +.I l4s_dualq +.PD 0 +.TS +tab(:); +rb l. +no_dualq:all traffic in a single queue +l4s_dualq:dual queue, Classic TCP (non-ECT and ECT(0)) in the classic and TCP-Prague in the low latency queue (for ECT(1) and CE) +dc_dualq:dual queue, all non-ECN in the classic and all ECN traffic in the low latency queue (legacy DCTCP compatibility) +.TE +.PD +.TP +.BI k " int" +Set the coupling rate factor between Classic and L4S. Defaults to +.I 2 +.TP +.B no_ecn | classic_ecn | l4s_ecn | dc_ecn +Configures the ECN or drop classifier. Defaults to +.I l4s_ecn +.PD 0 +.TS +tab(:); +rb l. +no_ecn:All traffic is dropped with the classic (squared) probability +classic_ecn:Mark all ECN capable traffic with classic marking probability +l4s_ecn:Mark all ECT(0) with classic (squared) probability and ECT(1) with scalable (base*k) probability +dc_ecn:Mark all ECN Capable Traffic with scalable probability (legacy DCTCP compatibility) +.TE +.PD +.TP +.BI l_thresh " TIME" +Set the sojourn queue size when low-latency packets get marked. Defaults to +.I 1 +ms. +.TP +.BI t_shift " TIME +Set the L4S FIFO time shift. Defaults to +.I 40 +ms. +.TP +.B c_limit | l_drop +Control the overload strategy. Defaults to +.I c_limit +.PD 0 +.TS +tab(:); +rb l. +c_limit:Limits the Classic probability to align with 100% scalable probability. Further load will increase the queue and eventually results in overflow. +l_drop:Set the L4S maximum probability where classic drop is applied to all traffic. 
Results in high drop probabilities (up to 100%) for all traffic, while maintaining queue target control. +.TE +.PD +.TP +.B drop_enqueue | drop_dequeue +Decide when packets are PI-based dropped or marked. The l_thresh based L4S marking is always at dequeue. Defaults to +.I drop_dequeue +.PD 0 +.TS +tab(:); +rb l. +drop_enqueue:Drop at enqueue +drop_dequeue:Drop at dequeue +.TE +.PD +.TP +.BI wrr_ratio " PACKETS" +Set the Weighted Round-Robin ratio between the L4S queue and the classic one. This protects the L4S queue from bursts in the classic queue, by pacing out those bursts. In reaction, the PI2 controller eventually adjusts the marking/drop probabilities, such that the L4S sender ends up pacing itself (i.e., has sufficiently large transmission gaps in its packet stream to let the classic flow recover). This behavior is disabled if wrr_ratio is set to 0. Defaults to +.I 0 + +.SH EXAMPLES +Setting DUALPI2 for the Internet with default parameters: + # sudo tc qdisc add dev eth0 root dualpi2 + # tc qdisc + +qdisc dualpi2 8001: dev eth0 root refcnt 2 limit 10000p target 20.0ms tupdate 33.3ms alpha 0.312500 beta 3.125000 l4s_dualq l4s_ecn k 2 c_limit et_time l_thresh 1.0ms t_shift 40.0ms t_speed 0 drop_dequeue wrr_ratio 0 + +Setting DUALPI2 for datacenter with legacy DCTCP using ECT(0): + # sudo tc qdisc add dev eth0 root dualpi2 dc_dualq dc_ecn + # tc qdisc + +qdisc dualpi2 8002: dev eth0 root refcnt 2 limit 10000p target 20.0ms tupdate 32.0ms alpha 0.312500 beta 3.125000 dc_dualq dc_ecn k 2 c_limit et_time l_thresh 1.0ms t_shift 40.0ms t_speed 0 drop_dequeue wrr_ratio 0 + +.SH SEE ALSO +.BR tc (8), +.BR tc-pie (8) + +.SH SOURCES +.IP \(bu 4 +IETF draft submission is at https://www.ietf.org/id/draft-ietf-tsvwg-aqm-dualq-coupled +.IP \(bu 4 +CoNEXT '16 Proceedings of the 12th International on Conference on emerging Networking EXperiments and Technologies : "PI2: A +Linearized AQM for both Classic and Scalable TCP" + +.SH AUTHORS +DUALPI2 was implemented by Koen De 
Schepper, Olga Albisser, Henrik Steen, and Olivier Tilmans also the authors of +this man page. Please report bugs and corrections to the Linux networking +development mailing list at <netdev@vger.kernel.org>. diff --git a/tc/Makefile b/tc/Makefile index 2edaf2c8..abbd6439 100644 --- a/tc/Makefile +++ b/tc/Makefile @@ -75,6 +75,7 @@ TCMODULES += f_matchall.o TCMODULES += q_cbs.o TCMODULES += q_etf.o TCMODULES += q_taprio.o +TCMODULES += q_dualpi2.o TCSO := ifeq ($(TC_CONFIG_ATM),y) diff --git a/tc/q_dualpi2.c b/tc/q_dualpi2.c new file mode 100644 index 00000000..3eb1e6d8 --- /dev/null +++ b/tc/q_dualpi2.c @@ -0,0 +1,472 @@ +/* Copyright (C) 2019 Nokia. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * DualQ PI Improved with a Square (dualpi2) + * Supports controlling scalable congestion controls (DCTCP, etc...) + * Supports DualQ with PI2 + * Supports L4S ECN identifier + * Author: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com> + * Author: Olga Albisser <olga@albisser.org> + * Author: Henrik Steen <henrist@henrist.net> + * + * Based on the PIE implementation: + * Copyright (C) 2013 Cisco Systems, Inc, 2013. 
+ * Author: Vijay Subramanian <vijaynsu@cisco.com> + * Author: Mythili Prabhu <mysuryan@cisco.com> + * + */ + +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <syslog.h> +#include <fcntl.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <string.h> +#include <math.h> + +#include "utils.h" +#include "tc_util.h" + +enum { + INET_ECN_NOT_ECT = 0, + INET_ECN_ECT_1 = 1, + INET_ECN_CE = 3, + INET_ECN_MASK = 3, +}; + +static void explain(void) +{ + fprintf(stderr, "Usage: ... dualpi2 [limit PACKETS] [target TIME]"); + fprintf(stderr, " [tupdate TIME]\n"); + fprintf(stderr, " [alpha ALPHA] [beta BETA]\n"); + fprintf(stderr, " [no_dualq|l4s_dualq|dc_dualq]"); + fprintf(stderr, " [k KFACTOR]\n"); + fprintf(stderr, " [no_ecn|classic_ecn|l4s_ecn|dc_ecn]\n"); + fprintf(stderr, " [et_packets|et_time] "); + fprintf(stderr, " [l_thresh TIME|PACKETS]"); + fprintf(stderr, " [t_shift TIME]\n"); + fprintf(stderr, " [c_limit|l_drop PROBABILITY %%]\n"); + fprintf(stderr, " [drop_enqueue|drop_dequeue]\n"); + fprintf(stderr, " [wrr_ratio PACKETS]\n"); +} + +static int get_float(float *val, const char *arg) +{ + float res; + char *ptr; + + if (!arg || !*arg) + return -1; + res = strtof(arg, &ptr); + if (!ptr || ptr == arg || *ptr) + return -1; + *val = res; + return 0; +} + +#define DEFAULT_ALPHA_BETA 0xffffffff +#define ALPHA_BETA_MAX 40000 +#define ALPHA_BETA_MIN_ENABLED 0.00390625 +#define ALPHA_BETA_MIN 0 +#define K_MAX 8 +#define K_MIN 1 +#define P_MAX 100 +#define P_MIN 0 +#define T_SPEED_MAX 31 + +/* iproute2 v4.15 changed the API in commit b317557f5854bb8 / tag v4.15 + * Conveniently, the CBS scheduler got introduced in Linux in commit + * 0f7787b4133fb / tag v4.15. 
+ * Checking for the presence of *CBS*-related defines in pkt_sched.h is thus + * a sign that we are on iproute2 >= 4.15 + */ +static int dualpi2_parse_opt(struct qdisc_util *qu, int argc, char **argv, + #ifdef TCA_CBS_MAX + struct nlmsghdr *n, const char *dev) +#else + struct nlmsghdr *n) +#endif +{ + unsigned int limit = 0; + unsigned int target = 0; + unsigned int tupdate = 0; + unsigned int alpha = DEFAULT_ALPHA_BETA; + float alpha_f = 0; + unsigned int beta = DEFAULT_ALPHA_BETA; + float beta_f = 0; + unsigned int kfactor = 0; + unsigned int et_packets = 0; + unsigned int et_time = 0; + unsigned int l_thresh = 0; + __u16 t_speed = 0; + unsigned int t_shift = 0; + unsigned int c_limit = 0; + unsigned int l_drop = 0; + unsigned int wrr_ratio = 0; + int queue_mask = -1; + int ecn = -1; + int drop_early = -1; + struct rtattr *tail; + + while (argc > 0) { + if (strcmp(*argv, "limit") == 0) { + NEXT_ARG(); + if (get_unsigned(&limit, *argv, 0)) { + fprintf(stderr, "Illegal \"limit\"\n"); + return -1; + } + } else if (strcmp(*argv, "target") == 0) { + NEXT_ARG(); + if (get_time(&target, *argv)) { + fprintf(stderr, "Illegal \"target\"\n"); + return -1; + } + } else if (strcmp(*argv, "tupdate") == 0) { + NEXT_ARG(); + if (get_time(&tupdate, *argv)) { + fprintf(stderr, "Illegal \"tupdate\"\n"); + return -1; + } + } else if (strcmp(*argv, "alpha") == 0) { + NEXT_ARG(); + if (get_float(&alpha_f, *argv) || + alpha_f > ALPHA_BETA_MAX || + alpha_f < ALPHA_BETA_MIN) { + fprintf(stderr, "Illegal \"alpha\"\n"); + return -1; + } + if (alpha_f == 0) { + fprintf(stderr, "Warning: \"alpha\" is zero."); + fprintf(stderr, " Integral controller will"); + fprintf(stderr, " be disabled.\n"); + } else if (alpha_f < ALPHA_BETA_MIN_ENABLED) { + fprintf(stderr, "Warning: \"alpha\" is too "); + fprintf(stderr, "small and will be rounded "); + fprintf(stderr, "to zero. 
Integral controller"); + fprintf(stderr, " will be disabled.\n"); + } + alpha = (unsigned int)(alpha_f * 256); + } else if (strcmp(*argv, "beta") == 0) { + NEXT_ARG(); + if (get_float(&beta_f, *argv) || + beta_f > ALPHA_BETA_MAX || + beta_f < ALPHA_BETA_MIN) { + fprintf(stderr, "Illegal \"beta\"\n"); + return -1; + } + if (beta_f == 0) { + fprintf(stderr, "Warning: \"beta\" is zero. "); + fprintf(stderr, "Proportional controller will"); + fprintf(stderr, " be disabled.\n"); + } else if (beta_f < ALPHA_BETA_MIN_ENABLED) { + fprintf(stderr, "Warning: \"beta\" is too "); + fprintf(stderr, "small and will be rounded to"); + fprintf(stderr, " zero. Proportional "); + fprintf(stderr, "controller will be"); + fprintf(stderr, " disabled.\n"); + } + beta = (unsigned int)(beta_f * 256); + } else if (strcmp(*argv, "k") == 0) { + NEXT_ARG(); + if (get_unsigned(&kfactor, *argv, 0) || + kfactor > K_MAX || kfactor < K_MIN) { + fprintf(stderr, "Illegal \"k\"\n"); + return -1; + } + } else if (strcmp(*argv, "no_dualq") == 0) { + queue_mask = INET_ECN_NOT_ECT; + } else if (strcmp(*argv, "l4s_dualq") == 0) { + queue_mask = INET_ECN_ECT_1; + } else if (strcmp(*argv, "dc_dualq") == 0) { + queue_mask = INET_ECN_CE; + } else if (strcmp(*argv, "no_ecn") == 0) { + ecn = INET_ECN_NOT_ECT; + } else if (strcmp(*argv, "classic_ecn") == 0) { + ecn = (INET_ECN_MASK << 2) | INET_ECN_NOT_ECT; + } else if (strcmp(*argv, "l4s_ecn") == 0) { + ecn = (INET_ECN_MASK << 2) | INET_ECN_ECT_1; + } else if (strcmp(*argv, "dc_ecn") == 0) { + ecn = (INET_ECN_MASK << 2) | INET_ECN_CE; + } else if (strcmp(*argv, "et_packets") == 0) { + et_packets = 1; + } else if (strcmp(*argv, "et_time") == 0) { + et_time = 1; + } else if (strcmp(*argv, "l_thresh") == 0) { + NEXT_ARG(); + if (get_time(&l_thresh, *argv)) { + fprintf(stderr, "Illegal \"l_thresh\"\n"); + return -1; + } + } else if (strcmp(*argv, "t_shift") == 0) { + NEXT_ARG(); + if (get_time(&t_shift, *argv)) { + fprintf(stderr, "Illegal \"t_shift\"\n"); + return 
-1; + } + } else if (strcmp(*argv, "t_speed") == 0) { + NEXT_ARG(); + if (get_u16(&t_speed, *argv, 0) || + t_speed > T_SPEED_MAX) { + fprintf(stderr, "Illegal \"t_speed\"\n"); + return -1; + } + } else if (strcmp(*argv, "l_drop") == 0) { + NEXT_ARG(); + if (get_unsigned(&l_drop, *argv, 0) || + l_drop > P_MAX || l_drop < P_MIN) { + fprintf(stderr, "Illegal \"l_drop\"\n"); + return -1; + } + } else if (strcmp(*argv, "c_limit") == 0) { + c_limit = 1; + } else if (strcmp(*argv, "drop_enqueue") == 0) { + drop_early = 1; + } else if (strcmp(*argv, "drop_dequeue") == 0) { + drop_early = 0; + } else if (strcmp(*argv, "wrr_ratio") == 0) { + NEXT_ARG(); + if (get_unsigned(&wrr_ratio, *argv, 0)) { + fprintf(stderr, "Illegal \"wrr_ratio\"\n"); + return -1; + } + } else if (strcmp(*argv, "help") == 0) { + explain(); + return -1; + } else { + fprintf(stderr, "What is \"%s\"?\n", *argv); + explain(); + return -1; + } + argc--; + argv++; + } + + if (c_limit && l_drop) { + fprintf(stderr, "c_limit cannot be used with l_drop, use "); + fprintf(stderr, "either c_limit or l_drop (refer to README)\n"); + explain(); + return -1; + } + + if (et_packets && et_time) { + fprintf(stderr, "et_packets cannot be used with et_time, use "); + fprintf(stderr, "either et_packets or et_time "); + fprintf(stderr, "(refer to README)\n"); + explain(); + return -1; + } + + tail = NLMSG_TAIL(n); + addattr_l(n, 1024, TCA_OPTIONS, NULL, 0); + if (limit) + addattr_l(n, 1024, TCA_DUALPI2_LIMIT, &limit, sizeof(limit)); + if (tupdate) + addattr_l(n, 1024, TCA_DUALPI2_TUPDATE, &tupdate, + sizeof(tupdate)); + if (target) + addattr_l(n, 1024, TCA_DUALPI2_TARGET, &target, sizeof(target)); + if (alpha != DEFAULT_ALPHA_BETA) + addattr_l(n, 1024, TCA_DUALPI2_ALPHA, &alpha, sizeof(alpha)); + if (beta != DEFAULT_ALPHA_BETA) + addattr_l(n, 1024, TCA_DUALPI2_BETA, &beta, sizeof(beta)); + if (queue_mask != -1) + addattr_l(n, 1024, TCA_DUALPI2_DUALQ, &queue_mask, + sizeof(queue_mask)); + if (ecn != -1) + addattr_l(n, 1024, 
TCA_DUALPI2_ECN, &ecn, sizeof(ecn)); + if (l_drop) + addattr_l(n, 1024, TCA_DUALPI2_L_DROP, &l_drop, + sizeof(l_drop)); + if (kfactor) + addattr_l(n, 1024, TCA_DUALPI2_K, &kfactor, sizeof(kfactor)); + if (et_packets) + addattr_l(n, 1024, TCA_DUALPI2_ET_PACKETS, &et_packets, + sizeof(et_packets)); + if (l_thresh) + addattr_l(n, 1024, TCA_DUALPI2_L_THRESH, &l_thresh, + sizeof(l_thresh)); + if (t_shift) + addattr_l(n, 1024, TCA_DUALPI2_T_SHIFT, &t_shift, + sizeof(t_shift)); + if (t_speed) + addattr_l(n, 1024, TCA_DUALPI2_T_SPEED, &t_speed, + sizeof(t_speed)); + if (drop_early != -1) + addattr_l(n, 1024, TCA_DUALPI2_DROP_EARLY, &drop_early, + sizeof(drop_early)); + addattr_l(n, 1024, TCA_DUALPI2_WRR_RATIO, &wrr_ratio, + sizeof(wrr_ratio)); + tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail; + return 0; +} + +static int dualpi2_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) +{ + struct rtattr *tb[TCA_DUALPI2_MAX + 1]; + unsigned int limit; + unsigned int tupdate; + unsigned int target; + unsigned int alpha; + float alpha_f; + unsigned int beta; + float beta_f; + unsigned int queue_mask; + unsigned int ecn; + unsigned int et_packets; + unsigned int l_thresh; + unsigned int t_shift; + unsigned int t_speed; + unsigned int kfactor; + unsigned int l_drop; + unsigned int drop_early; + unsigned int wrr_ratio; + + SPRINT_BUF(b1); + + if (!opt) + return 0; + + parse_rtattr_nested(tb, TCA_DUALPI2_MAX, opt); + + if (tb[TCA_DUALPI2_LIMIT] && + RTA_PAYLOAD(tb[TCA_DUALPI2_LIMIT]) >= sizeof(__u32)) { + limit = rta_getattr_u32(tb[TCA_DUALPI2_LIMIT]); + fprintf(f, "limit %up ", limit); + } + if (tb[TCA_DUALPI2_TARGET] && + RTA_PAYLOAD(tb[TCA_DUALPI2_TARGET]) >= sizeof(__u32)) { + target = rta_getattr_u32(tb[TCA_DUALPI2_TARGET]); + fprintf(f, "target %s ", sprint_time(target, b1)); + } + if (tb[TCA_DUALPI2_TUPDATE] && + RTA_PAYLOAD(tb[TCA_DUALPI2_TUPDATE]) >= sizeof(__u32)) { + tupdate = rta_getattr_u32(tb[TCA_DUALPI2_TUPDATE]); + fprintf(f, "tupdate %s ", 
sprint_time(tupdate, b1)); + } + if (tb[TCA_DUALPI2_ALPHA] && + RTA_PAYLOAD(tb[TCA_DUALPI2_ALPHA]) >= sizeof(__u32)) { + alpha = rta_getattr_u32(tb[TCA_DUALPI2_ALPHA]); + alpha_f = (float)alpha / 256; + fprintf(f, "alpha %f ", alpha_f); + } + if (tb[TCA_DUALPI2_BETA] && + RTA_PAYLOAD(tb[TCA_DUALPI2_BETA]) >= sizeof(__u32)) { + beta = rta_getattr_u32(tb[TCA_DUALPI2_BETA]); + beta_f = (float)beta / 256; + fprintf(f, "beta %f ", beta_f); + } + if (tb[TCA_DUALPI2_DUALQ] && + RTA_PAYLOAD(tb[TCA_DUALPI2_DUALQ]) >= sizeof(__u32)) { + queue_mask = rta_getattr_u32(tb[TCA_DUALPI2_DUALQ]); + if (queue_mask == INET_ECN_NOT_ECT) + fprintf(f, "no_dualq "); + else if (queue_mask == INET_ECN_ECT_1) + fprintf(f, "l4s_dualq "); + else if (queue_mask == INET_ECN_CE) + fprintf(f, "dc_dualq "); + } + if (tb[TCA_DUALPI2_ECN] && + RTA_PAYLOAD(tb[TCA_DUALPI2_ECN]) >= sizeof(__u32)) { + ecn = rta_getattr_u32(tb[TCA_DUALPI2_ECN]); + if (ecn == INET_ECN_NOT_ECT) + fprintf(f, "no_ecn "); + else if (ecn == (INET_ECN_MASK << 2)) + fprintf(f, "classic_ecn "); + else if (ecn == ((INET_ECN_MASK << 2) | 1)) + fprintf(f, "l4s_ecn "); + else if (ecn == ((INET_ECN_MASK << 2) | INET_ECN_CE)) + fprintf(f, "dc_ecn "); + } + if (tb[TCA_DUALPI2_K] && + RTA_PAYLOAD(tb[TCA_DUALPI2_K]) >= sizeof(__u32)) { + kfactor = rta_getattr_u32(tb[TCA_DUALPI2_K]); + fprintf(f, "k %u ", kfactor); + } + if (tb[TCA_DUALPI2_L_DROP] && + RTA_PAYLOAD(tb[TCA_DUALPI2_L_DROP]) >= sizeof(__u32)) { + l_drop = rta_getattr_u32(tb[TCA_DUALPI2_L_DROP]); + if (l_drop > 0) + fprintf(f, "l_drop %u ", l_drop); + else + fprintf(f, "c_limit "); + } + if (tb[TCA_DUALPI2_ET_PACKETS] && + RTA_PAYLOAD(tb[TCA_DUALPI2_ET_PACKETS]) >= sizeof(__u32)) { + et_packets = rta_getattr_u32(tb[TCA_DUALPI2_ET_PACKETS]); + if (et_packets > 0) + fprintf(f, "et_packets "); + else + fprintf(f, "et_time "); + } + if (tb[TCA_DUALPI2_L_THRESH] && + RTA_PAYLOAD(tb[TCA_DUALPI2_L_THRESH]) >= sizeof(__u32)) { + l_thresh = rta_getattr_u32(tb[TCA_DUALPI2_L_THRESH]); + 
fprintf(f, "l_thresh %s ", sprint_time(l_thresh, b1)); + } + if (tb[TCA_DUALPI2_T_SHIFT] && + RTA_PAYLOAD(tb[TCA_DUALPI2_T_SHIFT]) >= sizeof(__u32)) { + t_shift = rta_getattr_u32(tb[TCA_DUALPI2_T_SHIFT]); + fprintf(f, "t_shift %s ", sprint_time(t_shift, b1)); + } + if (tb[TCA_DUALPI2_T_SPEED] && + RTA_PAYLOAD(tb[TCA_DUALPI2_T_SPEED]) >= sizeof(__u16)) { + t_speed = rta_getattr_u16(tb[TCA_DUALPI2_T_SPEED]); + fprintf(f, "t_speed %u ", t_speed); + } + if (tb[TCA_DUALPI2_DROP_EARLY] && + RTA_PAYLOAD(tb[TCA_DUALPI2_DROP_EARLY]) >= sizeof(__u32)) { + drop_early = rta_getattr_u32(tb[TCA_DUALPI2_DROP_EARLY]); + if (drop_early) + fprintf(f, "drop_enqueue "); + else + fprintf(f, "drop_dequeue "); + } + if (tb[TCA_DUALPI2_WRR_RATIO] && + RTA_PAYLOAD(tb[TCA_DUALPI2_WRR_RATIO]) >= sizeof(__u32)) { + wrr_ratio = rta_getattr_u32(tb[TCA_DUALPI2_WRR_RATIO]); + fprintf(f, "wrr_ratio %u ", wrr_ratio); + } + + return 0; +} + +static int dualpi2_print_xstats(struct qdisc_util *qu, FILE *f, + struct rtattr *xstats) +{ + struct tc_dualpi2_xstats *st; + + if (!xstats) + return 0; + + if (RTA_PAYLOAD(xstats) < sizeof(*st)) + return -1; + + st = RTA_DATA(xstats); + /*prob is returned as a fraction of maximum integer value */ + fprintf(f, "prob %f delay_c %uus delay_l %uus\n", + (double)st->prob / (double)0xffffffff, st->delay_c, + st->delay_l); + fprintf(f, "pkts_in %u overlimit %u dropped %u maxq %u ecn_mark %u\n", + st->packets_in, st->overlimit, st->dropped, st->maxq, + st->ecn_mark); + return 0; +} + +struct qdisc_util dualpi2_qdisc_util = { + .id = "dualpi2", + .parse_qopt = dualpi2_parse_opt, + .print_qopt = dualpi2_print_opt, + .print_xstats = dualpi2_print_xstats, +};
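
A note on the fixed-point conversions in q_dualpi2.c: dualpi2_parse_opt() sends alpha and beta as the parsed float multiplied by 256 and truncated to an unsigned int, dualpi2_print_opt() divides by 256 on the way back, and dualpi2_print_xstats() prints prob as a fraction of 0xffffffff. The sketch below restates those three conversions as standalone helpers, mainly to show why ALPHA_BETA_MIN_ENABLED is 0.00390625 (that is, 1/256): any smaller gain truncates to zero. The names gain_encode, gain_decode, prob_decode and GAIN_SCALE are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Assumed name: the patch uses the literal 256 rather than a macro. */
#define GAIN_SCALE 256

/*
 * Encode a gain in Hz as the parser does for alpha and beta:
 * multiply by 256 and truncate toward zero.
 */
static uint32_t gain_encode(float gain_hz)
{
	return (uint32_t)(gain_hz * GAIN_SCALE);
}

/* Decode a raw attribute value back to Hz, as the print path does. */
static float gain_decode(uint32_t raw)
{
	return (float)raw / GAIN_SCALE;
}

/*
 * Convert the xstats "prob" field to a probability in [0, 1].
 * The patch treats the field as a fraction of the maximum 32-bit value.
 */
static double prob_decode(uint32_t raw)
{
	return (double)raw / (double)0xffffffff;
}
```

Under these assumptions, the documented defaults of 0.3125 Hz and 3.125 Hz encode to 80 and 800, a gain of exactly 0.00390625 encodes to 1, and a gain such as 0.003 encodes to 0, which is the case the "too small and will be rounded to zero" warning covers.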
