Patch Detail
get:
Show a patch.
patch:
Partially update a patch (only the fields supplied in the request are changed).
put:
Update a patch.
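The GET request and response shown next can also be reproduced from a script. The following is a minimal sketch, not part of the Patchwork documentation: the helper names `patch_url`, `summarize_patch` and `fetch_patch` are hypothetical, the API root is copied from the URLs in the response, and requesting JSON through an `Accept: application/json` header is an assumption based on the usual Django REST framework content negotiation. Only field names that appear in the response are used.

```python
import json
import urllib.request

# Assumption: API root copied from the "url" fields in the response.
API_ROOT = "http://patchwork.ozlabs.org/api/1.2"


def patch_url(patch_id, api_root=API_ROOT):
    """Build the detail URL for one patch (hypothetical helper)."""
    return f"{api_root}/patches/{patch_id}/"


def summarize_patch(doc):
    """Pick a few fields out of a decoded patch-detail response."""
    return {
        "id": doc["id"],
        "name": doc["name"],
        "state": doc["state"],
        "archived": doc["archived"],
        "submitter": doc["submitter"]["name"],
        # "series" is a list of objects; keep only their names.
        "series": [s["name"] for s in doc.get("series", [])],
    }


def fetch_patch(patch_id, api_root=API_ROOT, timeout=30):
    """GET one patch as JSON. Needs network access, so it is not called here."""
    req = urllib.request.Request(
        patch_url(patch_id, api_root),
        headers={"Accept": "application/json"},  # assumption: JSON via content negotiation
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline demonstration using values taken from the response in this page.
    sample = {
        "id": 825817,
        "name": "[RFC,v2,5/5] wave: Added TCP Wave",
        "state": "rfc",
        "archived": True,
        "submitter": {"id": 72063, "name": "Natale Patriciello"},
        "series": [{"id": 8182, "name": "TCP Wave", "version": 2}],
    }
    print(patch_url(sample["id"]))
    print(summarize_patch(sample))
```

With network access, `summarize_patch(fetch_patch(825817))` would apply the same extraction to the live response.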
GET /api/1.2/patches/825817/?format=api
{ "id": 825817, "url": "http://patchwork.ozlabs.org/api/1.2/patches/825817/?format=api", "web_url": "http://patchwork.ozlabs.org/project/netdev/patch/20171014114714.3694-6-natale.patriciello@gmail.com/", "project": { "id": 7, "url": "http://patchwork.ozlabs.org/api/1.2/projects/7/?format=api", "name": "Linux network development", "link_name": "netdev", "list_id": "netdev.vger.kernel.org", "list_email": "netdev@vger.kernel.org", "web_url": null, "scm_url": null, "webscm_url": null, "list_archive_url": "", "list_archive_url_format": "", "commit_url_format": "" }, "msgid": "<20171014114714.3694-6-natale.patriciello@gmail.com>", "list_archive_url": null, "date": "2017-10-14T11:47:14", "name": "[RFC,v2,5/5] wave: Added TCP Wave", "commit_ref": null, "pull_url": null, "state": "rfc", "archived": true, "hash": "ad388aa5fa6bf872cfb579a02723a33eedcd2317", "submitter": { "id": 72063, "url": "http://patchwork.ozlabs.org/api/1.2/people/72063/?format=api", "name": "Natale Patriciello", "email": "natale.patriciello@gmail.com" }, "delegate": { "id": 34, "url": "http://patchwork.ozlabs.org/api/1.2/users/34/?format=api", "username": "davem", "first_name": "David", "last_name": "Miller", "email": "davem@davemloft.net" }, "mbox": "http://patchwork.ozlabs.org/project/netdev/patch/20171014114714.3694-6-natale.patriciello@gmail.com/mbox/", "series": [ { "id": 8182, "url": "http://patchwork.ozlabs.org/api/1.2/series/8182/?format=api", "web_url": "http://patchwork.ozlabs.org/project/netdev/list/?series=8182", "date": "2017-10-14T11:47:09", "name": "TCP Wave", "version": 2, "mbox": "http://patchwork.ozlabs.org/series/8182/mbox/" } ], "comments": "http://patchwork.ozlabs.org/api/patches/825817/comments/", "check": "pending", "checks": "http://patchwork.ozlabs.org/api/patches/825817/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<netdev-owner@vger.kernel.org>", "X-Original-To": "patchwork-incoming@ozlabs.org", "Delivered-To": "patchwork-incoming@ozlabs.org", 
"Authentication-Results": [ "ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)", "ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"i3Lx70Nu\"; dkim-atps=neutral" ], "Received": [ "from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3yDjZ56KYSz9sNx\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 14 Oct 2017 22:50:05 +1100 (AEDT)", "(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1753538AbdJNLuD (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tSat, 14 Oct 2017 07:50:03 -0400", "from mail-wr0-f193.google.com ([209.85.128.193]:34004 \"EHLO\n\tmail-wr0-f193.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1753506AbdJNLty (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Sat, 14 Oct 2017 07:49:54 -0400", "by mail-wr0-f193.google.com with SMTP id l1so2187553wrc.1\n\tfor <netdev@vger.kernel.org>; Sat, 14 Oct 2017 04:49:53 -0700 (PDT)", "from localhost.localdomain (62.57.152.197.dyn.user.ono.com.\n\t[62.57.152.197])\n\tby smtp.gmail.com with ESMTPSA id 4sm4560337wmm.1.2017.10.14.04.49.51\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tSat, 14 Oct 2017 04:49:52 -0700 (PDT)" ], "DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=from:to:cc:subject:date:message-id:in-reply-to:references;\n\tbh=90vY5j8fstNhZBoJGM6tRdnvwZuBffB1KeTgwvYmi7A=;\n\tb=i3Lx70NuQijVm+trutfEfDwPSLg84kmi84iW8iF+O3q1JDVZJW19nwvNKT2YTap8LY\n\thazUFUpZeVb9GXFQemPFpXDo1P5A7GcwdeR7qjgCpDh+DnKUkEdIbDYAe5E6+C7yQobi\n\txR7AABn085y6B3n6CMsMucIRpj2poLOBcor+bz5CaUfdHrtSDGGHOqB1cGRj9BD68Ooy\n\ts4LAHgqU7j4kj6aKSiUoQtOrXkg1gEgSCv376JEYZP7Q4b998oOGUEuiw6rnK6cTCQ/l\n\tAg55r7aqQE1ATRK4SGyEtq6WvdR5bsITZrMfig09/MjJdO0/49Qr6V/nzrigsXmUvf0s\n\tQhlA==", "X-Google-DKIM-Signature": "v=1; 
a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to\n\t:references;\n\tbh=90vY5j8fstNhZBoJGM6tRdnvwZuBffB1KeTgwvYmi7A=;\n\tb=Eo6zqEfXfEg7u/BWA+8yleuDiDLDAuyNom2WuynQkebIXdq79KzxuD6yuO/8FtnH/s\n\t0H3GNDt6gFoDx3Js1ripvaHNXG8bNSg3OecCV7LaSIajMPVQXfAwIcSFxu3dY77TFuSN\n\tZlznxMP8lzsiTFQrZ2wlTuuf3zwg1wvu/+UhlTY+cu2omxSgw/7uZLp3gkiljttkDvZJ\n\tGFI0OYWbKVBSamQPbBcpXKbLv5Mzte3jUQWsMlQdbKnnYaKI/ywFK1ca3oMcvRMdiIsE\n\t5KjwjmJgaxJGN5fhqYWllIpPcWH9Pm8r07V5I9U7Ou8nftqGK0MMrea+rMI0zHmB5ebb\n\tqtlQ==", "X-Gm-Message-State": "AMCzsaWE1/LdVkRm/zRHdOqlec2CXP8NVM+su2zzZAD93Wp1IP3Z545w\n\t6zT0aLN+yfsXzInjpKiXYVU=", "X-Google-Smtp-Source": "AOwi7QCAaa4TI59xDs6Tvt9vshTUrnKNO1K9MZ0mCgp1aapeKwgSBl+X0FD2fkPflBH0bTh2EnbD5g==", "X-Received": "by 10.223.163.141 with SMTP id l13mr3889096wrb.54.1507981792670; \n\tSat, 14 Oct 2017 04:49:52 -0700 (PDT)", "From": "Natale Patriciello <natale.patriciello@gmail.com>", "To": "\"David S . Miller\" <davem@davemloft.net>,\n\tEric Dumazet <eric.dumazet@gmail.com>", "Cc": "netdev <netdev@vger.kernel.org>, Ahmed Said <ahmed.said@uniroma2.it>,\n\tNatale Patriciello <natale.patriciello@gmail.com>,\n\tFrancesco Zampognaro <zampognaro@ing.uniroma2.it>,\n\tCesare Roseti <roseti@ing.uniroma2.it>", "Subject": "[RFC PATCH v2 5/5] wave: Added TCP Wave", "Date": "Sat, 14 Oct 2017 13:47:14 +0200", "Message-Id": "<20171014114714.3694-6-natale.patriciello@gmail.com>", "X-Mailer": "git-send-email 2.14.2", "In-Reply-To": "<20171014114714.3694-1-natale.patriciello@gmail.com>", "References": "<20171014114714.3694-1-natale.patriciello@gmail.com>", "Sender": "netdev-owner@vger.kernel.org", "Precedence": "bulk", "List-ID": "<netdev.vger.kernel.org>", "X-Mailing-List": "netdev@vger.kernel.org" }, "content": "TCP Wave (TCPW) replaces the window-based transmission paradigm of the\nstandard TCP with a burst-based transmission, the ACK-clock scheduling\nwith a self-managed timer and the RTT-based 
congestion control loop\nwith an Ack-based Capacity and Congestion Estimation (ACCE) module. In\nnon-technical words, it sends data down the stack when its internal\ntimer expires, and the timing of the received ACKs contribute to\nupdating this timer regularly.\n\nIt is the first TCP congestion control that uses the timing constraint\ndeveloped in the Linux kernel.\n\nSigned-off-by: Natale Patriciello <natale.patriciello@gmail.com>\nTested-by: Ahmed Said <ahmed.said@uniroma2.it>\n---\n MAINTAINERS | 6 +\n include/uapi/linux/inet_diag.h | 13 +\n net/ipv4/Kconfig | 16 +\n net/ipv4/Makefile | 1 +\n net/ipv4/tcp_output.c | 4 +-\n net/ipv4/tcp_wave.c | 1035 ++++++++++++++++++++++++++++++++++++++++\n 6 files changed, 1074 insertions(+), 1 deletion(-)\n create mode 100644 net/ipv4/tcp_wave.c", "diff": "diff --git a/MAINTAINERS b/MAINTAINERS\nindex 2d3d750b19c0..b59815dcda67 100644\n--- a/MAINTAINERS\n+++ b/MAINTAINERS\n@@ -13024,6 +13024,12 @@ W:\thttp://tcp-lp-mod.sourceforge.net/\n S:\tMaintained\n F:\tnet/ipv4/tcp_lp.c\n \n+TCP WAVE MODULE\n+M:\t\"Natale Patriciello\" <natale.patriciello@gmail.com>\n+W:\thttp://tlcsat.uniroma2.it/tcpwave4linux/\n+S:\tMaintained\n+F:\tnet/ipv4/tcp_wave.c\n+\n TDA10071 MEDIA DRIVER\n M:\tAntti Palosaari <crope@iki.fi>\n L:\tlinux-media@vger.kernel.org\ndiff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h\nindex f52ff62bfabe..2f204844e580 100644\n--- a/include/uapi/linux/inet_diag.h\n+++ b/include/uapi/linux/inet_diag.h\n@@ -142,6 +142,7 @@ enum {\n \tINET_DIAG_PAD,\n \tINET_DIAG_MARK,\n \tINET_DIAG_BBRINFO,\n+\tINET_DIAG_WAVEINFO,\n \tINET_DIAG_CLASS_ID,\n \tINET_DIAG_MD5SIG,\n \t__INET_DIAG_MAX,\n@@ -188,9 +189,21 @@ struct tcp_bbr_info {\n \t__u32\tbbr_cwnd_gain;\t\t/* cwnd gain shifted left 8 bits */\n };\n \n+/* INET_DIAG_WAVEINFO */\n+\n+struct tcp_wave_info {\n+\t__u32\ttx_timer;\n+\t__u16\tburst;\n+\t__u32\tprevious_ack_t_disp;\n+\t__u32\tmin_rtt;\n+\t__u32\tavg_rtt;\n+\t__u32\tmax_rtt;\n+};\n+\n union 
tcp_cc_info {\n \tstruct tcpvegas_info\tvegas;\n \tstruct tcp_dctcp_info\tdctcp;\n \tstruct tcp_bbr_info\tbbr;\n+\tstruct tcp_wave_info\twave;\n };\n #endif /* _UAPI_INET_DIAG_H_ */\ndiff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig\nindex 91a2557942fa..de23b3a04b98 100644\n--- a/net/ipv4/Kconfig\n+++ b/net/ipv4/Kconfig\n@@ -492,6 +492,18 @@ config TCP_CONG_BIC\n \tincrease provides TCP friendliness.\n \tSee http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/\n \n+config TCP_CONG_WAVE\n+\ttristate \"Wave TCP\"\n+\tdefault m\n+\t---help---\n+\tTCP Wave (TCPW) replaces the window-based transmission paradigm of the\n+\tstandard TCP with a burst-based transmission, the ACK-clock scheduling\n+\twith a self-managed timer and the RTT-based congestion control loop with\n+\tan Ack-based Capacity and Congestion Estimation (ACCE) module. In\n+\tnon-technical words, it sends data down the stack when its internal\n+\ttimer expires, and the timing of the received ACKs contribute to\n+\tupdating this timer regularly.\n+\n config TCP_CONG_CUBIC\n \ttristate \"CUBIC TCP\"\n \tdefault y\n@@ -690,6 +702,9 @@ choice\n \tconfig DEFAULT_CUBIC\n \t\tbool \"Cubic\" if TCP_CONG_CUBIC=y\n \n+\tconfig DEFAULT_WAVE\n+\t\tbool \"Wave\" if TCP_CONG_WAVE=y\n+\n \tconfig DEFAULT_HTCP\n \t\tbool \"Htcp\" if TCP_CONG_HTCP=y\n \n@@ -729,6 +744,7 @@ config DEFAULT_TCP_CONG\n \tstring\n \tdefault \"bic\" if DEFAULT_BIC\n \tdefault \"cubic\" if DEFAULT_CUBIC\n+\tdefault \"wave\" if DEFAULT_WAVE\n \tdefault \"htcp\" if DEFAULT_HTCP\n \tdefault \"hybla\" if DEFAULT_HYBLA\n \tdefault \"vegas\" if DEFAULT_VEGAS\ndiff --git a/net/ipv4/Makefile b/net/ipv4/Makefile\nindex afcb435adfbe..bdc8cd1a804a 100644\n--- a/net/ipv4/Makefile\n+++ b/net/ipv4/Makefile\n@@ -47,6 +47,7 @@ obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o\n obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o\n obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o\n obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o\n+obj-$(CONFIG_TCP_CONG_WAVE) += tcp_wave.o\n 
obj-$(CONFIG_TCP_CONG_DCTCP) += tcp_dctcp.o\n obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o\n obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o\ndiff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c\nindex ef50202659da..40ec467e5afd 100644\n--- a/net/ipv4/tcp_output.c\n+++ b/net/ipv4/tcp_output.c\n@@ -2527,7 +2527,9 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)\n {\n \tstruct sk_buff *skb = tcp_send_head(sk);\n \n-\tBUG_ON(!skb || skb->len < mss_now);\n+\t/* Don't be forced to send not meaningful data */\n+\tif (!skb || skb->len < mss_now)\n+\t\treturn;\n \n \ttcp_write_xmit(sk, mss_now, TCP_NAGLE_PUSH, 1, sk->sk_allocation);\n }\ndiff --git a/net/ipv4/tcp_wave.c b/net/ipv4/tcp_wave.c\nnew file mode 100644\nindex 000000000000..f5a1e1412caf\n--- /dev/null\n+++ b/net/ipv4/tcp_wave.c\n@@ -0,0 +1,1035 @@\n+/*\n+ * TCP Wave\n+ *\n+ * Copyright 2017 Natale Patriciello <natale.patriciello@gmail.com>\n+ *\n+ * This program is free software: you can redistribute it and/or modify\n+ * it under the terms of the GNU General Public License as published by\n+ * the Free Software Foundation, either version 3 of the License, or\n+ * (at your option) any later version.\n+ *\n+ * This program is distributed in the hope that it will be useful,\n+ * but WITHOUT ANY WARRANTY; without even the implied warranty of\n+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n+ * GNU General Public License for more details.\n+ *\n+ * You should have received a copy of the GNU General Public License\n+ * along with this program. 
If not, see <http://www.gnu.org/licenses/>.\n+ *\n+ */\n+\n+#define pr_fmt(fmt) \"WAVE: \" fmt\n+\n+#include <net/tcp.h>\n+#include <linux/inet_diag.h>\n+#include <linux/module.h>\n+\n+#define NOW ktime_to_us(ktime_get())\n+#define SPORT(sk) ntohs(inet_sk(sk)->inet_sport)\n+#define DPORT(sk) ntohs(inet_sk(sk)->inet_dport)\n+\n+static uint init_burst __read_mostly = 10;\n+static uint min_burst __read_mostly = 3;\n+static uint init_timer_ms __read_mostly = 200;\n+static uint beta_ms __read_mostly = 150;\n+\n+module_param(init_burst, uint, 0644);\n+MODULE_PARM_DESC(init_burst, \"initial burst (segments)\");\n+module_param(min_burst, uint, 0644);\n+MODULE_PARM_DESC(min_burst, \"minimum burst (segments)\");\n+module_param(init_timer_ms, uint, 0644);\n+MODULE_PARM_DESC(init_timer_ms, \"initial timer (ms)\");\n+module_param(beta_ms, uint, 0644);\n+MODULE_PARM_DESC(beta_ms, \"beta parameter (ms)\");\n+\n+/* Shift factor for the exponentially weighted average. */\n+#define AVG_SCALE 20\n+#define AVG_UNIT BIT(AVG_SCALE)\n+\n+/* Tell if the driver is initialized (init has been called) */\n+#define FLAG_INIT 0x1\n+/* Tell if, as sender, the driver is started (after TX_START) */\n+#define FLAG_START 0x2\n+/* If it's true, we save the sent size as a burst */\n+#define FLAG_SAVE 0x4\n+\n+/* List for saving the size of sent burst over time */\n+struct wavetcp_burst_hist {\n+\tu16 size; /* The burst size */\n+\tstruct list_head list; /* Kernel list declaration */\n+};\n+\n+static bool test_flag(u8 flags, u8 value)\n+{\n+\treturn (flags & value) == value;\n+}\n+\n+static void set_flag(u8 *flags, u8 value)\n+{\n+\t*flags |= value;\n+}\n+\n+static void clear_flag(u8 *flags, u8 value)\n+{\n+\t*flags &= ~(value);\n+}\n+\n+static bool ktime_is_null(ktime_t kt)\n+{\n+\treturn ktime_compare(kt, ns_to_ktime(0)) == 0;\n+}\n+\n+/* TCP Wave private struct */\n+struct wavetcp {\n+\tu8 flags; /* The module flags */\n+\tu32 tx_timer; /* The current transmission timer (us) */\n+\tu8 burst; /* The 
current burst size (segments) */\n+\ts8 delta_segments; /* Difference between sent and burst size */\n+\tu16 pkts_acked; /* The segments acked in the round */\n+\tu8 backup_pkts_acked;\n+\tu8 aligned_acks_rcv; /* The number of ACKs received in a round */\n+\tu8 heuristic_scale; /* Heuristic scale, to divide the RTT */\n+\tktime_t previous_ack_t_disp; /* Previous ack_train_disp Value */\n+\tktime_t first_ack_time; /* First ACK time of the round */\n+\tktime_t last_ack_time; /* Last ACK time of the round */\n+\tu32 backup_first_ack_time_us; /* Backup value of the first ack time */\n+\tu32 previous_rtt; /* RTT of the previous acked segment */\n+\tu32 first_rtt; /* First RTT of the round */\n+\tu32 min_rtt; /* Minimum RTT of the round */\n+\tu32 avg_rtt; /* Average RTT of the previous round */\n+\tu32 max_rtt; /* Maximum RTT */\n+\tu8 stab_factor; /* Stability factor */\n+\tstruct kmem_cache *cache; /* The memory for saving the burst sizes */\n+\tstruct wavetcp_burst_hist *history; /* The burst history */\n+};\n+\n+/* Called to setup Wave for the current socket after it enters the CONNECTED\n+ * state (i.e., called after the SYN-ACK is received). 
The slow start should be\n+ * 0 (see wavetcp_get_ssthresh) and we set the initial cwnd to the initial\n+ * burst.\n+ *\n+ * After the ACK of the SYN-ACK is sent, the TCP will add a bit of delay to\n+ * permit the queueing of data from the application, otherwise we will end up\n+ * in a scattered situation (we have one segment -> send it -> no other segment,\n+ * don't set the timer -> slightly after, another segment come and we loop).\n+ *\n+ * At the first expiration, the cwnd will be large enough to push init_burst\n+ * segments out.\n+ */\n+static void wavetcp_init(struct sock *sk)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tstruct tcp_sock *tp = tcp_sk(sk);\n+\n+\tpr_debug(\"%llu sport: %u [%s]\\n\", NOW, SPORT(sk), __func__);\n+\n+\t/* Setting the initial Cwnd to 0 will not call the TX_START event */\n+\ttp->snd_ssthresh = 0;\n+\ttp->snd_cwnd = init_burst;\n+\n+\t/* Used to avoid to take the SYN-ACK measurements */\n+\tca->flags = 0;\n+\tca->flags = FLAG_INIT | FLAG_SAVE;\n+\n+\tca->burst = init_burst;\n+\tca->delta_segments = init_burst;\n+\tca->tx_timer = init_timer_ms * USEC_PER_MSEC;\n+\tca->pkts_acked = 0;\n+\tca->backup_pkts_acked = 0;\n+\tca->aligned_acks_rcv = 0;\n+\tca->first_ack_time = ns_to_ktime(0);\n+\tca->backup_first_ack_time_us = 0;\n+\tca->heuristic_scale = 0;\n+\tca->first_rtt = 0;\n+\tca->min_rtt = -1; /* a lot of time */\n+\tca->avg_rtt = 0;\n+\tca->max_rtt = 0;\n+\tca->stab_factor = 0;\n+\tca->previous_ack_t_disp = ns_to_ktime(0);\n+\n+\tca->history = kmalloc(sizeof(*ca->history), GFP_KERNEL);\n+\n+\t/* Init the history of bwnd */\n+\tINIT_LIST_HEAD(&ca->history->list);\n+\n+\t/* Init our cache pool for the bwnd history */\n+\tca->cache = KMEM_CACHE(wavetcp_burst_hist, 0);\n+\n+\tcmpxchg(&sk->sk_pacing_status, SK_PACING_NONE, SK_PACING_NEEDED);\n+}\n+\n+static void wavetcp_release(struct sock *sk)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tstruct wavetcp_burst_hist *tmp;\n+\tstruct list_head *pos, *q;\n+\n+\tif 
(!test_flag(ca->flags, FLAG_INIT))\n+\t\treturn;\n+\n+\tpr_debug(\"%llu sport: %u [%s]\\n\", NOW, SPORT(sk), __func__);\n+\n+\tlist_for_each_safe(pos, q, &ca->history->list) {\n+\t\ttmp = list_entry(pos, struct wavetcp_burst_hist, list);\n+\t\tlist_del(pos);\n+\t\tkmem_cache_free(ca->cache, tmp);\n+\t}\n+\n+\tkfree(ca->history);\n+\tkmem_cache_destroy(ca->cache);\n+}\n+\n+/* Please explain that we will be forever in congestion avoidance. */\n+static u32 wavetcp_recalc_ssthresh(struct sock *sk)\n+{\n+\tpr_debug(\"%llu [%s]\\n\", NOW, __func__);\n+\treturn 0;\n+}\n+\n+static void wavetcp_state(struct sock *sk, u8 new_state)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\tif (!test_flag(ca->flags, FLAG_INIT))\n+\t\treturn;\n+\n+\tswitch (new_state) {\n+\tcase TCP_CA_Open:\n+\t\tpr_debug(\"%llu sport: %u [%s] set CA_Open\\n\", NOW,\n+\t\t\t SPORT(sk), __func__);\n+\t\t/* We have fully recovered, so reset some variables */\n+\t\tca->delta_segments = 0;\n+\t\tbreak;\n+\tdefault:\n+\t\tpr_debug(\"%llu sport: %u [%s] set state %u, ignored\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, new_state);\n+\t}\n+}\n+\n+static u32 wavetcp_undo_cwnd(struct sock *sk)\n+{\n+\tstruct tcp_sock *tp = tcp_sk(sk);\n+\n+\t/* Not implemented yet. 
We stick to the decision made earlier */\n+\tpr_debug(\"%llu [%s]\\n\", NOW, __func__);\n+\treturn tp->snd_cwnd;\n+}\n+\n+/* Add the size of the burst in the history of bursts */\n+static void wavetcp_insert_burst(struct wavetcp *ca, u32 burst)\n+{\n+\tstruct wavetcp_burst_hist *cur;\n+\n+\tpr_debug(\"%llu [%s] adding %u segment in the history of burst\\n\", NOW,\n+\t\t __func__, burst);\n+\t/* Take the memory from the pre-allocated pool */\n+\tcur = (struct wavetcp_burst_hist *)kmem_cache_alloc(ca->cache,\n+\t\t\t\t\t\t\t GFP_KERNEL);\n+\tBUG_ON(!cur);\n+\n+\tcur->size = burst;\n+\tlist_add_tail(&cur->list, &ca->history->list);\n+}\n+\n+static void wavetcp_cwnd_event(struct sock *sk, enum tcp_ca_event event)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\tif (!test_flag(ca->flags, FLAG_INIT))\n+\t\treturn;\n+\n+\tswitch (event) {\n+\tcase CA_EVENT_TX_START:\n+\t\t/* first transmit when no packets in flight */\n+\t\tpr_debug(\"%llu sport: %u [%s] TX_START\\n\", NOW,\n+\t\t\t SPORT(sk), __func__);\n+\n+\t\tset_flag(&ca->flags, FLAG_START);\n+\n+\t\tbreak;\n+\tdefault:\n+\t\tpr_debug(\"%llu sport: %u [%s] got event %u, ignored\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, event);\n+\t\tbreak;\n+\t}\n+}\n+\n+static void wavetcp_adj_mode(struct sock *sk, unsigned long delta_rtt)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\tca->stab_factor = ca->avg_rtt / ca->tx_timer;\n+\n+\tca->min_rtt = -1; /* a lot of time */\n+\tca->avg_rtt = ca->max_rtt;\n+\tca->tx_timer = init_timer_ms * USEC_PER_MSEC;\n+\n+\tpr_debug(\"%llu sport: %u [%s] stab_factor %u, timer %u us, avg_rtt %u us\\n\",\n+\t\t NOW, SPORT(sk), __func__, ca->stab_factor,\n+\t\t ca->tx_timer, ca->avg_rtt);\n+}\n+\n+static void wavetcp_tracking_mode(struct sock *sk, u64 delta_rtt,\n+\t\t\t\t ktime_t ack_train_disp)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\tif (ktime_is_null(ack_train_disp)) {\n+\t\tpr_debug(\"%llu sport: %u [%s] ack_train_disp is 0. 
Impossible to do tracking.\\n\",\n+\t\t\t NOW, SPORT(sk), __func__);\n+\t\treturn;\n+\t}\n+\n+\tca->tx_timer = (ktime_to_us(ack_train_disp) + (delta_rtt / 2));\n+\n+\tif (ca->tx_timer == 0) {\n+\t\tpr_debug(\"%llu sport: %u [%s] WARNING: tx timer is 0\"\n+\t\t\t \", forcefully set it to 1000 us\\n\",\n+\t\t\t NOW, SPORT(sk), __func__);\n+\t\tca->tx_timer = 1000;\n+\t}\n+\n+\tpr_debug(\"%llu sport: %u [%s] tx timer is %u us\\n\",\n+\t\t NOW, SPORT(sk), __func__, ca->tx_timer);\n+}\n+\n+/* The weight a is:\n+ *\n+ * a = (first_rtt - min_rtt) / first_rtt\n+ *\n+ */\n+static u64 wavetcp_compute_weight(u32 first_rtt, u32 min_rtt)\n+{\n+\tu64 diff = first_rtt - min_rtt;\n+\n+\tdiff = diff * AVG_UNIT;\n+\n+\treturn diff / first_rtt;\n+}\n+\n+static ktime_t heuristic_ack_train_disp(struct sock *sk,\n+\t\t\t\t\tconst struct rate_sample *rs,\n+\t\t\t\t\tu32 burst)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tktime_t ack_train_disp = ns_to_ktime(0);\n+\tktime_t interval = ns_to_ktime(0);\n+\tktime_t backup_first_ack = ns_to_ktime(0);\n+\n+\tif (rs->interval_us <= 0) {\n+\t\tpr_debug(\"%llu sport: %u [%s] WARNING is not possible \"\n+\t\t\t \"to heuristically calculate ack_train_disp, returning 0.\"\n+\t\t\t \"Delivered %u, interval_us %li\\n\",\n+\t\t\t NOW, SPORT(sk), __func__,\n+\t\t\t rs->delivered, rs->interval_us);\n+\t\treturn ack_train_disp;\n+\t}\n+\n+\tinterval = ns_to_ktime(rs->interval_us * NSEC_PER_USEC);\n+\tbackup_first_ack = ns_to_ktime(ca->backup_first_ack_time_us * NSEC_PER_USEC);\n+\n+\t/* The heuristic takes the RTT of the first ACK, the RTT of the\n+\t * latest ACK, and uses the difference as ack_train_disp.\n+\t *\n+\t * If the sample for the first and last ACK are the same (e.g.,\n+\t * one ACK per burst) we use as the latest option the value of\n+\t * interval_us (which is the RTT). 
However, this value is\n+\t * exponentially lowered each time we don't have any valid\n+\t * sample (i.e., we perform a division by 2, by 4, and so on).\n+\t * The increased transmitted rate, if it is out of the capacity\n+\t * of the bottleneck, will be compensated by an higher\n+\t * delta_rtt, and so limited by the adjustment algorithm. This\n+\t * is a blind search, but we do not have any valid sample...\n+\t */\n+\tif (ktime_compare(interval, backup_first_ack) > 0) {\n+\t\t/* first heuristic */\n+\t\tack_train_disp = ktime_sub(interval, backup_first_ack);\n+\t} else {\n+\t\t/* this branch avoids an overflow. However, reaching\n+\t\t * this point means that the ACK train is not aligned\n+\t\t * with the sent burst.\n+\t\t */\n+\t\tack_train_disp = ktime_sub(backup_first_ack, interval);\n+\t}\n+\n+\tif (ktime_is_null(ack_train_disp)) {\n+\t\t/* Blind search */\n+\t\tu32 blind_interval_us = rs->interval_us >> ca->heuristic_scale;\n+\t\t++ca->heuristic_scale;\n+\t\tack_train_disp = ns_to_ktime(blind_interval_us * NSEC_PER_USEC);\n+\t\tpr_debug(\"%llu sport: %u [%s] we received one BIG ack.\"\n+\t\t\t \" Doing an heuristic with scale %u, interval_us\"\n+\t\t\t \" %li us, and setting ack_train_disp to %lli us\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, ca->heuristic_scale,\n+\t\t\t rs->interval_us, ktime_to_us(ack_train_disp));\n+\t} else {\n+\t\tpr_debug(\"%llu sport: %u [%s] we got the first ack with\"\n+\t\t\t \" interval %u us, the last (this) with interval %li us.\"\n+\t\t\t \" Doing a substraction and setting ack_train_disp\"\n+\t\t\t \" to %lli us\\n\", NOW, SPORT(sk), __func__,\n+\t\t\t ca->backup_first_ack_time_us, rs->interval_us,\n+\t\t\t ktime_to_us(ack_train_disp));\n+\t}\n+\n+\treturn ack_train_disp;\n+}\n+\n+/* In case that round_burst == current_burst:\n+ *\n+ * ack_train_disp = last - first * (rcv_ack/rcv_ack-1)\n+ * |__________| |_________________|\n+ * left right\n+ *\n+ * else (assuming left is last - first)\n+ *\n+ * left\n+ * ack_train_disp = 
------------ * current_burst\n+ * round_burst\n+ */\n+static ktime_t get_ack_train_disp(const ktime_t *last_ack_time,\n+\t\t\t\t const ktime_t *first_ack_time,\n+\t\t\t\t u8 aligned_acks_rcv, u32 round_burst,\n+\t\t\t\t u32 current_burst)\n+{\n+\tu64 left = ktime_to_ns(*last_ack_time) - ktime_to_ns(*first_ack_time);\n+\tu64 right;\n+\n+\tif (round_burst == current_burst) {\n+\t\tright = (aligned_acks_rcv * AVG_UNIT) / (aligned_acks_rcv - 1);\n+\t\tpr_debug(\"%llu [%s] last %lli us, first %lli us, acks %u round_burst %u current_burst %u\\n\",\n+\t\t\t NOW, __func__, ktime_to_us(*last_ack_time),\n+\t\t\t ktime_to_us(*first_ack_time), aligned_acks_rcv,\n+\t\t\t round_burst, current_burst);\n+\t} else {\n+\t\tright = current_burst;\n+\t\tleft *= AVG_UNIT;\n+\t\tleft = left / round_burst;\n+\t\tpr_debug(\"%llu [%s] last %lli us, first %lli us, small_round_burst %u\\n\",\n+\t\t\t NOW, __func__, ktime_to_us(*last_ack_time),\n+\t\t\t ktime_to_us(*first_ack_time), round_burst);\n+\t}\n+\n+\treturn ns_to_ktime((left * right) / AVG_UNIT);\n+}\n+\n+static ktime_t calculate_ack_train_disp(struct sock *sk,\n+\t\t\t\t\tconst struct rate_sample *rs,\n+\t\t\t\t\tu32 burst, u64 delta_rtt_us)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tktime_t ack_train_disp = ns_to_ktime(0);\n+\n+\tif (ktime_is_null(ca->first_ack_time) || ca->aligned_acks_rcv <= 1) {\n+\t\t/* We don't have the initial bound of the burst,\n+\t\t * or we don't have samples to do measurements\n+\t\t */\n+\t\tif (ktime_is_null(ca->previous_ack_t_disp))\n+\t\t\t/* do heuristic without saving anything */\n+\t\t\treturn heuristic_ack_train_disp(sk, rs, burst);\n+\n+\t\t/* Returning the previous value */\n+\t\treturn ca->previous_ack_t_disp;\n+\t}\n+\n+\t/* If we have a complete burst, the value returned by get_ack_train_disp\n+\t * is safe to use. Otherwise, it can be a bad approximation, so it's better\n+\t * to use the previous value. 
Of course, if we don't have such value,\n+\t * a bad approximation is better than nothing.\n+\t */\n+\tif (burst == ca->burst || ktime_is_null(ca->previous_ack_t_disp))\n+\t\tack_train_disp = get_ack_train_disp(&ca->last_ack_time,\n+\t\t\t\t\t\t &ca->first_ack_time,\n+\t\t\t\t\t\t ca->aligned_acks_rcv,\n+\t\t\t\t\t\t burst, ca->burst);\n+\telse\n+\t\treturn ca->previous_ack_t_disp;\n+\n+\tif (ktime_is_null(ack_train_disp)) {\n+\t\t/* Use the plain previous value */\n+\t\tpr_debug(\"%llu sport: %u [%s] use_plain previous_ack_train_disp %lli us, ack_train_disp %lli us\\n\",\n+\t\t\t NOW, SPORT(sk), __func__,\n+\t\t\t ktime_to_us(ca->previous_ack_t_disp),\n+\t\t\t ktime_to_us(ack_train_disp));\n+\t\treturn ca->previous_ack_t_disp;\n+\t}\n+\n+\t/* We have a real sample! */\n+\tca->heuristic_scale = 0;\n+\tca->previous_ack_t_disp = ack_train_disp;\n+\n+\tpr_debug(\"%llu sport: %u [%s] previous_ack_train_disp %lli us, final_ack_train_disp %lli us\\n\",\n+\t\t NOW, SPORT(sk), __func__, ktime_to_us(ca->previous_ack_t_disp),\n+\t\t ktime_to_us(ack_train_disp));\n+\n+\treturn ack_train_disp;\n+}\n+\n+static u32 calculate_avg_rtt(struct sock *sk)\n+{\n+\tconst struct wavetcp *ca = inet_csk_ca(sk);\n+\n+\t/* Why the if?\n+\t *\n+\t * a = (first_rtt - min_rtt) / first_rtt = 1 - (min_rtt/first_rtt)\n+\t *\n+\t * avg_rtt_0 = (1 - a) * first_rtt\n+\t * = (1 - (1 - (min_rtt/first_rtt))) * first_rtt\n+\t * = first_rtt - (first_rtt - min_rtt)\n+\t * = min_rtt\n+\t *\n+\t *\n+\t * And.. what happen in the else branch? 
We calculate first a (scaled by\n+\t * 1024), then do the substraction (1-a) by keeping in the consideration\n+\t * the scale, and in the end coming back to the result removing the\n+\t * scaling.\n+\t *\n+\t * We divide the equation\n+\t *\n+\t * AvgRtt = a * AvgRtt + (1-a)*Rtt\n+\t *\n+\t * in two part properly scaled, left and right, and then having a sum of\n+\t * the two parts to avoid (possible) overflow.\n+\t */\n+\tif (ca->avg_rtt == 0) {\n+\t\tpr_debug(\"%llu sport: %u [%s] returning min_rtt %u\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, ca->min_rtt);\n+\t\treturn ca->min_rtt;\n+\t} else if (ca->first_rtt > 0) {\n+\t\tu32 old_value = ca->avg_rtt;\n+\t\tu64 right;\n+\t\tu64 left;\n+\t\tu64 a;\n+\n+\t\ta = wavetcp_compute_weight(ca->first_rtt, ca->min_rtt);\n+\n+\t\tleft = (a * ca->avg_rtt) / AVG_UNIT;\n+\t\tright = ((AVG_UNIT - a) * ca->first_rtt) / AVG_UNIT;\n+\n+\t\tpr_debug(\"%llu sport: %u [%s] previous avg %u us, first_rtt %u us, \"\n+\t\t\t \"min %u us, a (shifted) %llu, calculated avg %u us\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, old_value, ca->first_rtt,\n+\t\t\t ca->min_rtt, a, (u32)left + (u32)right);\n+\t\treturn (u32)left + (u32)right;\n+\t}\n+\n+\tpr_debug(\"%llu sport: %u [%s] Can't calculate avg_rtt.\\n\",\n+\t\t NOW, SPORT(sk), __func__);\n+\treturn 0;\n+}\n+\n+static u64 calculate_delta_rtt(const struct wavetcp *ca)\n+{\n+\treturn ca->avg_rtt - ca->min_rtt;\n+}\n+\n+static void wavetcp_round_terminated(struct sock *sk,\n+\t\t\t\t const struct rate_sample *rs,\n+\t\t\t\t u32 burst)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tktime_t ack_train_disp;\n+\tu64 delta_rtt_us;\n+\tu32 avg_rtt;\n+\n+\tavg_rtt = calculate_avg_rtt(sk);\n+\tif (avg_rtt != 0)\n+\t\tca->avg_rtt = avg_rtt;\n+\n+\t/* If we have to wait, let's wait */\n+\tif (ca->stab_factor > 0) {\n+\t\t--ca->stab_factor;\n+\t\tpr_debug(\"%llu sport: %u [%s] reached burst %u, not applying (stab left: %u)\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, burst, 
ca->stab_factor);\n+\t\treturn;\n+\t}\n+\n+\tdelta_rtt_us = calculate_delta_rtt(ca);\n+\tack_train_disp = calculate_ack_train_disp(sk, rs, burst, delta_rtt_us);\n+\n+\tpr_debug(\"%llu sport: %u [%s] reached burst %u, drtt %llu, atd %lli\\n\",\n+\t\t NOW, SPORT(sk), __func__, burst, delta_rtt_us,\n+\t\t ktime_to_us(ack_train_disp));\n+\n+\t/* delta_rtt_us is in us, beta_ms in ms */\n+\tif (delta_rtt_us > beta_ms * USEC_PER_MSEC)\n+\t\twavetcp_adj_mode(sk, delta_rtt_us);\n+\telse\n+\t\twavetcp_tracking_mode(sk, delta_rtt_us, ack_train_disp);\n+}\n+\n+static void wavetcp_reset_round(struct wavetcp *ca)\n+{\n+\tca->first_ack_time = ns_to_ktime(0);\n+\tca->last_ack_time = ca->first_ack_time;\n+\tca->backup_first_ack_time_us = 0;\n+\tca->aligned_acks_rcv = 0;\n+\tca->first_rtt = 0;\n+}\n+\n+static void wavetcp_middle_round(struct sock *sk, ktime_t *last_ack_time,\n+\t\t\t\t const ktime_t *now)\n+{\n+\tpr_debug(\"%llu sport: %u [%s]\", NOW, SPORT(sk), __func__);\n+\t*last_ack_time = *now;\n+}\n+\n+static void wavetcp_begin_round(struct sock *sk, ktime_t *first_ack_time,\n+\t\t\t\tktime_t *last_ack_time, const ktime_t *now)\n+{\n+\tpr_debug(\"%llu sport: %u [%s]\", NOW, SPORT(sk), __func__);\n+\t*first_ack_time = *now;\n+\t*last_ack_time = *now;\n+\tpr_debug(\"%llu sport: %u [%s], first %lli\\n\", NOW, SPORT(sk),\n+\t\t __func__, ktime_to_us(*first_ack_time));\n+}\n+\n+static void wavetcp_rtt_measurements(struct sock *sk, s32 rtt_us,\n+\t\t\t\t s32 interval_us)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\tif (ca->backup_first_ack_time_us == 0 && interval_us > 0)\n+\t\tca->backup_first_ack_time_us = interval_us;\n+\n+\tif (rtt_us <= 0)\n+\t\treturn;\n+\n+\tca->previous_rtt = rtt_us;\n+\n+\t/* Check the first RTT in the round */\n+\tif (ca->first_rtt == 0) {\n+\t\tca->first_rtt = rtt_us;\n+\n+\t\t/* Check the minimum RTT we have seen */\n+\t\tif (rtt_us < ca->min_rtt) {\n+\t\t\tca->min_rtt = rtt_us;\n+\t\t\tpr_debug(\"%llu sport: %u [%s] min rtt %u\\n\", 
NOW,\n+\t\t\t\t SPORT(sk), __func__, rtt_us);\n+\t\t}\n+\n+\t\t/* Check the maximum RTT we have seen */\n+\t\tif (rtt_us > ca->max_rtt) {\n+\t\t\tca->max_rtt = rtt_us;\n+\t\t\tpr_debug(\"%llu sport: %u [%s] max rtt %u\\n\", NOW,\n+\t\t\t\t SPORT(sk), __func__, rtt_us);\n+\t\t}\n+\t}\n+}\n+\n+static void wavetcp_end_round(struct sock *sk, const struct rate_sample *rs,\n+\t\t\t const ktime_t *now)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tstruct wavetcp_burst_hist *tmp;\n+\tstruct list_head *pos;\n+\n+\tpr_debug(\"%llu [%s]\", NOW, __func__);\n+\tpos = ca->history->list.next;\n+\ttmp = list_entry(pos, struct wavetcp_burst_hist, list);\n+\n+\tif (!tmp || ca->pkts_acked < tmp->size) {\n+\t\tpr_debug(\"%llu sport: %u [%s] WARNING: Something wrong\\n\",\n+\t\t\t NOW, SPORT(sk), __func__);\n+\t\treturn;\n+\t}\n+\n+\t/* The position we are is end_round, but if the following is false,\n+\t * in reality we are at the beginning of the next round,\n+\t * and the previous middle was an end. In the other case,\n+\t * update last_ack_time with the current time, and the number of\n+\t * received acks.\n+\t */\n+\tif (rs->rtt_us >= ca->previous_rtt) {\n+\t\t++ca->aligned_acks_rcv;\n+\t\tca->last_ack_time = *now;\n+\t}\n+\n+\t/* If the round terminates without a sample of RTT, use the average */\n+\tif (ca->first_rtt == 0) {\n+\t\tca->first_rtt = ca->avg_rtt;\n+\t\tpr_debug(\"%llu sport: %u [%s] Using the average value for first_rtt %u\\n\",\n+\t\t NOW, SPORT(sk), __func__, ca->first_rtt);\n+\t}\n+\n+\tif (tmp->size > min_burst) {\n+\t\twavetcp_round_terminated(sk, rs, tmp->size);\n+\t} else {\n+\t\tpr_debug(\"%llu sport: %u [%s] skipping burst of %u segments\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, tmp->size);\n+\t}\n+\n+\t/* Consume the burst history if it's a cumulative ACK for many bursts */\n+\twhile (tmp && ca->pkts_acked >= tmp->size) {\n+\t\tca->pkts_acked -= tmp->size;\n+\n+\t\t/* Delete the burst from the history */\n+\t\tpr_debug(\"%llu sport: %u [%s] deleting 
burst of %u segments\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, tmp->size);\n+\t\tlist_del(pos);\n+\t\tkmem_cache_free(ca->cache, tmp);\n+\n+\t\t/* Take next burst */\n+\t\tpos = ca->history->list.next;\n+\t\ttmp = list_entry(pos, struct wavetcp_burst_hist, list);\n+\t}\n+\n+\twavetcp_reset_round(ca);\n+\n+\t/* We have to emulate a beginning of the round in case this RTT is less than\n+\t * the previous one\n+\t */\n+\tif (rs->rtt_us > 0 && rs->rtt_us < ca->previous_rtt) {\n+\t\tpr_debug(\"%llu sport: %u [%s] Emulating the beginning, set the first_rtt to %u\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, ca->first_rtt);\n+\n+\t\t/* Emulate the beginning of the round using as \"now\"\n+\t\t * the time of the previous ACK\n+\t\t */\n+\t\twavetcp_begin_round(sk, &ca->first_ack_time,\n+\t\t\t\t &ca->last_ack_time, now);\n+\t\t/* Emulate a middle round with the current time */\n+\t\twavetcp_middle_round(sk, &ca->last_ack_time, now);\n+\n+\t\t/* Take the measurements for the RTT. If we are not emulating a\n+\t\t * beginning, then let the real begin to take it\n+\t\t */\n+\t\twavetcp_rtt_measurements(sk, rs->rtt_us, rs->interval_us);\n+\n+\t\t/* Emulate the reception of one aligned ack, this */\n+\t\tca->aligned_acks_rcv = 1;\n+\t} else if (rs->rtt_us > 0) {\n+\t\tca->previous_rtt = rs->rtt_us;\n+\t}\n+}\n+\n+static void wavetcp_cong_control(struct sock *sk, const struct rate_sample *rs)\n+{\n+\tktime_t now = ktime_get();\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tstruct wavetcp_burst_hist *tmp;\n+\tstruct list_head *pos;\n+\n+\tif (!test_flag(ca->flags, FLAG_INIT))\n+\t\treturn;\n+\n+\tpr_debug(\"%llu sport: %u [%s] prior_delivered %u, delivered %i, interval_us %li, \"\n+\t\t \"rtt_us %li, losses %i, ack_sack %u, prior_in_flight %u, is_app %i,\"\n+\t\t \" is_retrans %i\\n\", NOW, SPORT(sk), __func__,\n+\t\t rs->prior_delivered, rs->delivered, rs->interval_us,\n+\t\t rs->rtt_us, rs->losses, rs->acked_sacked, rs->prior_in_flight,\n+\t\t rs->is_app_limited, 
rs->is_retrans);\n+\n+\tpos = ca->history->list.next;\n+\ttmp = list_entry(pos, struct wavetcp_burst_hist, list);\n+\n+\tif (!tmp)\n+\t\treturn;\n+\n+\t/* Train management.*/\n+\tca->pkts_acked += rs->acked_sacked;\n+\n+\tif (ca->previous_rtt < rs->rtt_us)\n+\t\tpr_debug(\"%llu sport: %u [%s] previous < rtt: %u < %li\",\n+\t\t\t NOW, SPORT(sk), __func__, ca->previous_rtt,\n+\t\t\t rs->rtt_us);\n+\telse\n+\t\tpr_debug(\"%llu sport: %u [%s] previous >= rtt: %u >= %li\",\n+\t\t\t NOW, SPORT(sk), __func__, ca->previous_rtt,\n+\t\t\t rs->rtt_us);\n+\n+\t/* We have three possibilities: beginning, middle, end.\n+\t * - Beginning: is the moment in which we receive the first ACK for\n+\t * the round\n+\t * - Middle: we are receiving ACKs but still not as many to cover a\n+\t * complete burst\n+\t * - End: the other end ACKed sufficient bytes to declare a round\n+\t * completed\n+\t */\n+\tif (ca->pkts_acked < tmp->size) {\n+\t\t/* The way to discriminate between beginning and end is thanks\n+\t\t * to ca->first_ack_time, which is zeroed at the end of a run\n+\t\t */\n+\t\tif (ktime_is_null(ca->first_ack_time)) {\n+\t\t\twavetcp_begin_round(sk, &ca->first_ack_time,\n+\t\t\t\t\t &ca->last_ack_time, &now);\n+\t\t\t++ca->aligned_acks_rcv;\n+\t\t\tca->backup_pkts_acked = ca->pkts_acked - rs->acked_sacked;\n+\n+\t\t\tpr_debug(\"%llu sport: %u [%s] first ack of the train\\n\",\n+\t\t\t\t NOW, SPORT(sk), __func__);\n+\t\t} else {\n+\t\t\tif (rs->rtt_us >= ca->previous_rtt) {\n+\t\t\t\twavetcp_middle_round(sk, &ca->last_ack_time, &now);\n+\t\t\t\t++ca->aligned_acks_rcv;\n+\t\t\t\tpr_debug(\"%llu sport: %u [%s] middle aligned ack (tot %u)\\n\",\n+\t\t\t\t\t NOW, SPORT(sk), __func__,\n+\t\t\t\t\t ca->aligned_acks_rcv);\n+\t\t\t} else if (rs->rtt_us > 0) {\n+\t\t\t\t/* This is the real round beginning! 
*/\n+\t\t\t\tca->aligned_acks_rcv = 1;\n+\t\t\t\tca->pkts_acked = ca->backup_pkts_acked + rs->acked_sacked;\n+\n+\t\t\t\twavetcp_begin_round(sk, &ca->first_ack_time,\n+\t\t\t\t\t\t &ca->last_ack_time, &now);\n+\n+\t\t\t\tpr_debug(\"%llu sport: %u [%s] changed beginning to NOW\\n\",\n+\t\t\t\t\t NOW, SPORT(sk), __func__);\n+\t\t\t}\n+\t\t}\n+\n+\t\t/* Take RTT measurements for min and max measurments. For the\n+\t\t * end of the burst, do it manually depending on the case\n+\t\t */\n+\t\twavetcp_rtt_measurements(sk, rs->rtt_us, rs->interval_us);\n+\t} else {\n+\t\twavetcp_end_round(sk, rs, &now);\n+\t}\n+}\n+\n+/* Invoked each time we receive an ACK. Obviously, this function also gets\n+ * called when we receive the SYN-ACK, but we ignore it thanks to the\n+ * FLAG_INIT flag.\n+ *\n+ * We close the cwnd of the amount of segments acked, because we don't like\n+ * sending out segments if the timer is not expired. Without doing this, we\n+ * would end with cwnd - in_flight > 0.\n+ */\n+static void wavetcp_acked(struct sock *sk, const struct ack_sample *sample)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tstruct tcp_sock *tp = tcp_sk(sk);\n+\n+\tif (!test_flag(ca->flags, FLAG_INIT))\n+\t\treturn;\n+\n+\tif (tp->snd_cwnd < sample->pkts_acked) {\n+\t\t/* We sent some scattered segments, so the burst segments and\n+\t\t * the ACK we get is not aligned.\n+\t\t */\n+\t\tpr_debug(\"%llu sport: %u [%s] delta_seg %i\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, ca->delta_segments);\n+\n+\t\tca->delta_segments += sample->pkts_acked - tp->snd_cwnd;\n+\t}\n+\n+\tpr_debug(\"%llu sport: %u [%s] pkts_acked %u, rtt_us %i, in_flight %u \"\n+\t\t \", cwnd %u, seq ack %u, delta %i\\n\", NOW, SPORT(sk),\n+\t\t __func__, sample->pkts_acked, sample->rtt_us,\n+\t\t sample->in_flight, tp->snd_cwnd, tp->snd_una,\n+\t\t ca->delta_segments);\n+\n+\t/* Brutally set the cwnd in order to not let segment out */\n+\ttp->snd_cwnd = tcp_packets_in_flight(tp);\n+}\n+\n+/* The TCP informs us that the 
timer is expired (or has never been set). We can\n+ * infer the latter by the FLAG_STARTED flag: if it's false, don't increase the\n+ * cwnd, because it is at its default value (init_burst) and we still have to\n+ * transmit the first burst.\n+ */\n+static void wavetcp_timer_expired(struct sock *sk)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tstruct tcp_sock *tp = tcp_sk(sk);\n+\tu32 current_burst = ca->burst;\n+\n+\tif (!test_flag(ca->flags, FLAG_START) ||\n+\t !test_flag(ca->flags, FLAG_INIT)) {\n+\t\tpr_debug(\"%llu sport: %u [%s] returning because of flags, leaving cwnd %u\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, tp->snd_cwnd);\n+\t\treturn;\n+\t}\n+\n+\tpr_debug(\"%llu sport: %u [%s] starting with delta %u current_burst %u\\n\",\n+\t\t NOW, SPORT(sk), __func__, ca->delta_segments, current_burst);\n+\n+\tif (ca->delta_segments < 0) {\n+\t\t/* In the previous round, we sent more than the allowed burst,\n+\t\t * so reduce the current burst.\n+\t\t */\n+\t\tBUG_ON(current_burst > ca->delta_segments);\n+\t\tcurrent_burst += ca->delta_segments; /* please *reduce* */\n+\n+\t\t/* Right now, we should send \"current_burst\" segments out */\n+\n+\t\tif (tcp_packets_in_flight(tp) > tp->snd_cwnd) {\n+\t\t\t/* For some reasons (e.g., tcp loss probe)\n+\t\t\t * we sent something outside the allowed window.\n+\t\t\t * Add the amount of segments into the burst, in order\n+\t\t\t * to effectively send the previous \"current_burst\"\n+\t\t\t * segments, but without touching delta_segments.\n+\t\t\t */\n+\t\t\tu32 diff = tcp_packets_in_flight(tp) - tp->snd_cwnd;\n+\n+\t\t\tcurrent_burst += diff;\n+\t\t\tpr_debug(\"%llu sport: %u [%s] adding %u to balance \"\n+\t\t\t\t \"segments sent out of window\", NOW,\n+\t\t\t\t SPORT(sk), __func__, diff);\n+\t\t}\n+\t}\n+\n+\tca->delta_segments = current_burst;\n+\tpr_debug(\"%llu sport: %u [%s] setting delta_seg %u current burst %u\\n\",\n+\t\t NOW, SPORT(sk), __func__, ca->delta_segments, current_burst);\n+\n+\tif (current_burst < 
min_burst) {\n+\t\tpr_debug(\"%llu sport: %u [%s] WARNING !! not min_burst\",\n+\t\t\t NOW, SPORT(sk), __func__);\n+\t\tca->delta_segments += min_burst - current_burst;\n+\t\tcurrent_burst = min_burst;\n+\t}\n+\n+\ttp->snd_cwnd += current_burst;\n+\tset_flag(&ca->flags, FLAG_SAVE);\n+\n+\tpr_debug(\"%llu sport: %u [%s], increased window of %u segments, \"\n+\t\t \"total %u, delta %i, in_flight %u\\n\", NOW, SPORT(sk),\n+\t\t __func__, ca->burst, tp->snd_cwnd, ca->delta_segments,\n+\t\t tcp_packets_in_flight(tp));\n+\n+\tif (tp->snd_cwnd - tcp_packets_in_flight(tp) > current_burst) {\n+\t\tpr_debug(\"%llu sport: %u [%s] WARNING! \"\n+\t\t\t \" cwnd %u, in_flight %u, current burst %u\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, tp->snd_cwnd,\n+\t\t\t tcp_packets_in_flight(tp), current_burst);\n+\t}\n+}\n+\n+static u64 wavetcp_get_timer(struct sock *sk)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\tu64 timer;\n+\n+\tBUG_ON(!test_flag(ca->flags, FLAG_INIT));\n+\n+\ttimer = min_t(u64,\n+\t\t ca->tx_timer * NSEC_PER_USEC,\n+\t\t init_timer_ms * NSEC_PER_MSEC);\n+\n+\t/* Very low pacing rate. Ideally, we don't need pacing. */\n+\tsk->sk_max_pacing_rate = 1;\n+\n+\tpr_debug(\"%llu sport: %u [%s] returning timer of %llu ns\\n\",\n+\t\t NOW, SPORT(sk), __func__, timer);\n+\n+\treturn timer;\n+}\n+\n+static void wavetcp_segment_sent(struct sock *sk, u32 sent)\n+{\n+\tstruct tcp_sock *tp = tcp_sk(sk);\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\tif (!test_flag(ca->flags, FLAG_START)) {\n+\t\tpr_debug(\"%llu sport: %u [%s] !START\\n\",\n+\t\t\t NOW, SPORT(sk), __func__);\n+\t\treturn;\n+\t}\n+\n+\tif (test_flag(ca->flags, FLAG_SAVE) && sent > 0) {\n+\t\twavetcp_insert_burst(ca, sent);\n+\t\tclear_flag(&ca->flags, FLAG_SAVE);\n+\t} else {\n+\t\tpr_debug(\"%llu sport: %u [%s] not saving burst, sent %u\\n\",\n+\t\t\t NOW, SPORT(sk), __func__, sent);\n+\t}\n+\n+\tif (sent > ca->burst) {\n+\t\tpr_debug(\"%llu sport: %u [%s] WARNING! 
sent %u, burst %u\"\n+\t\t \" cwnd %u delta_seg %i\\n, TSO very probable\", NOW,\n+\t\t SPORT(sk), __func__, sent, ca->burst,\n+\t\t tp->snd_cwnd, ca->delta_segments);\n+\t}\n+\n+\tca->delta_segments -= sent;\n+\n+\tif (ca->delta_segments >= 0 &&\n+\t ca->burst > sent &&\n+\t tcp_packets_in_flight(tp) <= tp->snd_cwnd) {\n+\t\t/* Reduce the cwnd accordingly, because we didn't sent enough\n+\t\t * to cover it (we are app limited probably)\n+\t\t */\n+\t\tu32 diff = ca->burst - sent;\n+\n+\t\tif (tp->snd_cwnd >= diff)\n+\t\t\ttp->snd_cwnd -= diff;\n+\t\telse\n+\t\t\ttp->snd_cwnd = 0;\n+\t\tpr_debug(\"%llu sport: %u [%s] reducing cwnd by %u, value %u\\n\",\n+\t\t\t NOW, SPORT(sk), __func__,\n+\t\t\t ca->burst - sent, tp->snd_cwnd);\n+\t}\n+}\n+\n+static size_t wavetcp_get_info(struct sock *sk, u32 ext, int *attr,\n+\t\t\t union tcp_cc_info *info)\n+{\n+\tpr_debug(\"%llu [%s] ext=%u\", NOW, __func__, ext);\n+\n+\tif (ext & (1 << (INET_DIAG_WAVEINFO - 1))) {\n+\t\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\t\tmemset(&info->wave, 0, sizeof(info->wave));\n+\t\tinfo->wave.tx_timer\t= ca->tx_timer;\n+\t\tinfo->wave.burst\t= ca->burst;\n+\t\tinfo->wave.previous_ack_t_disp = ca->previous_ack_t_disp;\n+\t\tinfo->wave.min_rtt\t= ca->min_rtt;\n+\t\tinfo->wave.avg_rtt\t= ca->avg_rtt;\n+\t\tinfo->wave.max_rtt\t= ca->max_rtt;\n+\t\t*attr = INET_DIAG_WAVEINFO;\n+\t\treturn sizeof(info->wave);\n+\t}\n+\treturn 0;\n+}\n+\n+static u32 wavetcp_sndbuf_expand(struct sock *sk)\n+{\n+\treturn 10;\n+}\n+\n+static u32 wavetcp_get_segs_per_round(struct sock *sk)\n+{\n+\tstruct wavetcp *ca = inet_csk_ca(sk);\n+\n+\treturn ca->burst;\n+}\n+\n+static struct tcp_congestion_ops wave_cong_tcp __read_mostly = {\n+\t.init\t\t\t= wavetcp_init,\n+\t.get_info\t\t= wavetcp_get_info,\n+\t.release\t\t= wavetcp_release,\n+\t.ssthresh\t\t= wavetcp_recalc_ssthresh,\n+/*\t.cong_avoid\t\t= wavetcp_cong_avoid, */\n+\t.cong_control\t\t= wavetcp_cong_control,\n+\t.set_state\t\t= wavetcp_state,\n+\t.undo_cwnd\t\t= 
wavetcp_undo_cwnd,\n+\t.cwnd_event\t\t= wavetcp_cwnd_event,\n+\t.pkts_acked\t\t= wavetcp_acked,\n+\t.sndbuf_expand\t\t= wavetcp_sndbuf_expand,\n+\t.get_pacing_time\t= wavetcp_get_timer,\n+\t.pacing_timer_expired\t= wavetcp_timer_expired,\n+\t.get_segs_per_round\t= wavetcp_get_segs_per_round,\n+\t.segments_sent\t\t= wavetcp_segment_sent,\n+\t.owner\t\t\t= THIS_MODULE,\n+\t.name\t\t\t= \"wave\",\n+};\n+\n+static int __init wavetcp_register(void)\n+{\n+\tBUILD_BUG_ON(sizeof(struct wavetcp) > ICSK_CA_PRIV_SIZE);\n+\n+\treturn tcp_register_congestion_control(&wave_cong_tcp);\n+}\n+\n+static void __exit wavetcp_unregister(void)\n+{\n+\ttcp_unregister_congestion_control(&wave_cong_tcp);\n+}\n+\n+module_init(wavetcp_register);\n+module_exit(wavetcp_unregister);\n+\n+MODULE_AUTHOR(\"Natale Patriciello\");\n+MODULE_LICENSE(\"GPL\");\n+MODULE_DESCRIPTION(\"WAVE TCP\");\n+MODULE_VERSION(\"0.2\");\n", "prefixes": [ "RFC", "v2", "5/5" ] }