From patchwork Thu Aug 10 15:38:06 2017
X-Patchwork-Submitter: Ilya Maximets
X-Patchwork-Id: 800242
From: Ilya Maximets
To: ovs-dev@openvswitch.org, Bhanuprakash Bodireddy
Date: Thu, 10 Aug 2017 18:38:06 +0300
Message-id: <1502379486-1568-5-git-send-email-i.maximets@samsung.com>
X-Mailer: git-send-email 2.7.4
In-reply-to: <1502379486-1568-1-git-send-email-i.maximets@samsung.com>
References: <1502379486-1568-1-git-send-email-i.maximets@samsung.com>
Cc: Heetae Ahn ,
Ilya Maximets
Subject: [ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output
batching.
This allows collecting packets from more than one RX burst and sending
them together, with a configurable maximum latency.
'other_config:output-max-latency' can be used to configure the time that
a packet may wait in the output batch before it is sent.
Signed-off-by: Ilya Maximets
---
Notes:
* This is an RFC and should not be used for performance testing.
* Millisecond granularity is used for now. It can easily be switched
to microseconds instead.
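For reference, once this patch is applied the new knob can be tuned at
runtime through the Open_vSwitch table; the 50 ms value below is purely
illustrative:

```shell
# Let packets wait up to 50 ms in an output batch (illustrative value).
ovs-vsctl set Open_vSwitch . other_config:output-max-latency=50

# Revert to the default, i.e. instant sending.
ovs-vsctl remove Open_vSwitch . other_config output-max-latency
```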
lib/dpif-netdev.c | 121 ++++++++++++++++++++++++++++++++++++++++++---------
vswitchd/vswitch.xml | 15 +++++++
2 files changed, 115 insertions(+), 21 deletions(-)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index dcf55f3..0d78ae4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -85,6 +85,9 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
#define MAX_RECIRC_DEPTH 5
DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
+/* Use instant packet send by default. */
+#define DEFAULT_OUTPUT_MAX_LATENCY 0
+
/* Configuration parameters. */
enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
enum { MAX_METERS = 65536 }; /* Maximum number of meters. */
@@ -262,6 +265,9 @@ struct dp_netdev {
struct hmap ports;
struct seq *port_seq; /* Incremented whenever a port changes. */
+ /* The time that a packet can wait in the output batch before being sent. */
+ atomic_uint32_t output_max_latency;
+
/* Meters. */
struct ovs_mutex meter_locks[N_METER_LOCKS];
struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
@@ -502,6 +508,7 @@ struct tx_port {
int qid;
long long last_used;
struct hmap_node node;
+ long long output_time;
struct dp_packet_batch output_pkts;
};
@@ -574,6 +581,9 @@ struct dp_netdev_pmd_thread {
* than 'cmap_count(dp->poll_threads)'. */
uint32_t static_tx_qid;
+ /* Number of filled output batches. */
+ int n_output_batches;
+
struct ovs_mutex port_mutex; /* Mutex for 'poll_list' and 'tx_ports'. */
/* List of rx queues to poll. */
struct hmap poll_list OVS_GUARDED;
@@ -669,9 +679,9 @@ static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd,
struct rxq_poll *poll)
OVS_REQUIRES(pmd->port_mutex);
-static void
+static int
dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
- long long now);
+ long long now, bool force);
static void reconfigure_datapath(struct dp_netdev *dp)
OVS_REQUIRES(dp->port_mutex);
static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
@@ -1193,6 +1203,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
conntrack_init(&dp->conntrack);
atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
+ atomic_init(&dp->output_max_latency, DEFAULT_OUTPUT_MAX_LATENCY);
cmap_init(&dp->poll_threads);
@@ -2858,7 +2869,7 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
dp_packet_batch_init_packet(&pp, execute->packet);
dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
execute->actions, execute->actions_len, now);
- dp_netdev_pmd_flush_output_packets(pmd, now);
+ dp_netdev_pmd_flush_output_packets(pmd, now, true);
if (pmd->core_id == NON_PMD_CORE_ID) {
ovs_mutex_unlock(&dp->non_pmd_mutex);
@@ -2907,6 +2918,16 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
smap_get_ullong(other_config, "emc-insert-inv-prob",
DEFAULT_EM_FLOW_INSERT_INV_PROB);
uint32_t insert_min, cur_min;
+ uint32_t output_max_latency, cur_max_latency;
+
+ output_max_latency = smap_get_int(other_config, "output-max-latency",
+ DEFAULT_OUTPUT_MAX_LATENCY);
+ atomic_read_relaxed(&dp->output_max_latency, &cur_max_latency);
+ if (output_max_latency != cur_max_latency) {
+ atomic_store_relaxed(&dp->output_max_latency, output_max_latency);
+ VLOG_INFO("Output maximum latency set to %"PRIu32" ms",
+ output_max_latency);
+ }
if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
free(dp->pmd_cmask);
@@ -3107,11 +3128,12 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
non_atomic_ullong_add(&pmd->cycles.n[type], interval);
}
-static void
+static int
dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
struct tx_port *p, long long now)
{
int tx_qid;
+ int output_cnt;
bool dynamic_txqs;
dynamic_txqs = p->port->dynamic_txqs;
@@ -3121,21 +3143,39 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
tx_qid = pmd->static_tx_qid;
}
+ output_cnt = dp_packet_batch_size(&p->output_pkts);
netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs);
dp_packet_batch_init(&p->output_pkts);
+
+ if (output_cnt) {
+ ovs_assert(pmd->n_output_batches > 0);
+ pmd->n_output_batches--;
+ }
+ return output_cnt;
}
-static void
+static int
dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
- long long now)
+ long long now, bool force)
{
struct tx_port *p;
+ int output_cnt = 0;
+
+ if (!pmd->n_output_batches) {
+ return 0;
+ }
+
+ if (!now) {
+ now = time_msec();
+ }
HMAP_FOR_EACH (p, node, &pmd->send_port_cache) {
- if (!dp_packet_batch_is_empty(&p->output_pkts)) {
- dp_netdev_pmd_flush_output_on_port(pmd, p, now);
+ if (!dp_packet_batch_is_empty(&p->output_pkts)
+ && (force || p->output_time <= now)) {
+ output_cnt += dp_netdev_pmd_flush_output_on_port(pmd, p, now);
}
}
+ return output_cnt;
}
static int
@@ -3145,7 +3185,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
{
struct dp_packet_batch batch;
int error;
- int batch_cnt = 0;
+ int batch_cnt = 0, output_cnt = 0;
dp_packet_batch_init(&batch);
error = netdev_rxq_recv(rx, &batch);
@@ -3156,7 +3196,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
batch_cnt = batch.count;
dp_netdev_input(pmd, &batch, port_no, now);
- dp_netdev_pmd_flush_output_packets(pmd, now);
+ output_cnt = dp_netdev_pmd_flush_output_packets(pmd, now, false);
} else if (error != EAGAIN && error != EOPNOTSUPP) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
@@ -3164,7 +3204,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
netdev_rxq_get_name(rx), ovs_strerror(error));
}
- return batch_cnt;
+ return batch_cnt + output_cnt;
}
static struct tx_port *
@@ -3691,7 +3731,8 @@ dpif_netdev_run(struct dpif *dpif)
struct dp_netdev *dp = get_dp_netdev(dpif);
struct dp_netdev_pmd_thread *non_pmd;
uint64_t new_tnl_seq;
- int process_packets = 0;
+ int process_packets;
+ bool need_to_flush = true;
ovs_mutex_lock(&dp->port_mutex);
non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
@@ -3707,12 +3748,25 @@ dpif_netdev_run(struct dpif *dpif)
dp_netdev_process_rxq_port(non_pmd,
port->rxqs[i].rx,
port->port_no);
- cycles_count_intermediate(non_pmd, process_packets ?
- PMD_CYCLES_PROCESSING
- : PMD_CYCLES_IDLE);
+ cycles_count_intermediate(non_pmd, process_packets
+ ? PMD_CYCLES_PROCESSING
+ : PMD_CYCLES_IDLE);
+ if (process_packets) {
+ need_to_flush = false;
+ }
}
}
}
+ if (need_to_flush) {
+ /* We didn't receive anything in the process loop.
+ * Check if we need to send something. */
+ process_packets = dp_netdev_pmd_flush_output_packets(non_pmd,
+ 0, false);
+ cycles_count_intermediate(non_pmd, process_packets
+ ? PMD_CYCLES_PROCESSING
+ : PMD_CYCLES_IDLE);
+ }
+
cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
ovs_mutex_unlock(&dp->non_pmd_mutex);
@@ -3764,6 +3818,8 @@ pmd_free_cached_ports(struct dp_netdev_pmd_thread *pmd)
{
struct tx_port *tx_port_cached;
+ /* Flush all the queued packets. */
+ dp_netdev_pmd_flush_output_packets(pmd, 0, true);
/* Free all used tx queue ids. */
dpif_netdev_xps_revalidate_pmd(pmd, 0, true);
@@ -3860,7 +3916,6 @@ pmd_thread_main(void *f_)
bool exiting;
int poll_cnt;
int i;
- int process_packets = 0;
poll_list = NULL;
@@ -3890,6 +3945,9 @@ reload:
cycles_count_start(pmd);
for (;;) {
+ int process_packets;
+ bool need_to_flush = true;
+
for (i = 0; i < poll_cnt; i++) {
process_packets =
dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
@@ -3897,6 +3955,19 @@ reload:
cycles_count_intermediate(pmd,
process_packets ? PMD_CYCLES_PROCESSING
: PMD_CYCLES_IDLE);
+ if (process_packets) {
+ need_to_flush = false;
+ }
+ }
+
+ if (need_to_flush) {
+ /* We didn't receive anything in the process loop.
+ * Check if we need to send something. */
+ process_packets = dp_netdev_pmd_flush_output_packets(pmd,
+ 0, false);
+ cycles_count_intermediate(pmd,
+ process_packets ? PMD_CYCLES_PROCESSING
+ : PMD_CYCLES_IDLE);
}
if (lc++ > 1024) {
@@ -4336,6 +4407,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
pmd->core_id = core_id;
pmd->numa_id = numa_id;
pmd->need_reload = false;
+ pmd->n_output_batches = 0;
ovs_refcount_init(&pmd->ref_cnt);
latch_init(&pmd->exit_latch);
@@ -4521,6 +4593,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
tx->port = port;
tx->qid = -1;
+ tx->output_time = 0LL;
dp_packet_batch_init(&tx->output_pkts);
hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
@@ -5197,14 +5270,20 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
dp_netdev_pmd_flush_output_on_port(pmd, p, now);
}
#endif
-
- if (OVS_UNLIKELY(dp_packet_batch_size(&p->output_pkts)
- + dp_packet_batch_size(packets_) > NETDEV_MAX_BURST)) {
- /* Some packets was generated while input batch processing.
- * Flush here to avoid overflow. */
+ if (dp_packet_batch_size(&p->output_pkts)
+ + dp_packet_batch_size(packets_) > NETDEV_MAX_BURST) {
+ /* Flush here to avoid overflow. */
dp_netdev_pmd_flush_output_on_port(pmd, p, now);
}
+ if (dp_packet_batch_is_empty(&p->output_pkts)) {
+ uint32_t cur_max_latency;
+
+ atomic_read_relaxed(&dp->output_max_latency, &cur_max_latency);
+ p->output_time = now + cur_max_latency;
+ pmd->n_output_batches++;
+ }
+
DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
dp_packet_batch_add(&p->output_pkts, packet);
}
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 074535b..23930f0 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -344,6 +344,21 @@
+      <column name="other_config" key="output-max-latency">
+        <p>
+          Specifies the time in milliseconds that a packet can wait in the
+          output batch for sending, i.e. the amount of time that a packet
+          can spend in an intermediate output queue before being sent to
+          the netdev.  This option can be used to balance throughput
+          against latency.  Lower values decrease latency, while higher
+          values may help to achieve higher throughput.
+        </p>
+        <p>
+          Defaults to 0, i.e. instant packet sending (latency optimized).
+        </p>
+      </column>