
[ovs-dev] dpif-netlink: don't allocate per thread netlink sockets

Message ID 20180919124703.18704-1-mcroce@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] dpif-netlink: don't allocate per thread netlink sockets

Commit Message

Matteo Croce Sept. 19, 2018, 12:47 p.m. UTC
When using the kernel datapath, OVS allocates a pool of sockets to handle
netlink events. The number of sockets is ports * n-handler-threads, where
n-handler-threads is user-configurable and defaults to 3/4 of the number of cores.

This is because vswitchd starts n-handler-threads threads, each one with a
netlink socket for every port of the switch. Each thread then starts
listening for events on its set of sockets with epoll().

On setups with lots of CPUs and ports, the number of sockets easily hits
the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.

Change the number of allocated sockets to just one per port by moving
the socket array from a per-handler structure to a per-datapath one,
and let all the handlers share the same sockets by using the EPOLLEXCLUSIVE
epoll flag, which avoids duplicate events on systems that support it.

The patch was tested on a 56-core machine running Linux 4.18 and the latest
Open vSwitch. A bridge was created with 2000+ ports, some of them being
veth interfaces with the peer outside the bridge. The latency of the upcall
is measured by setting a single 'action=controller,local' OpenFlow rule to
force all the packets through the slow path and then to the local port.
A tool[1] injects packets into the veth outside the bridge and measures
the delay until the packet is captured on the local port. The rx timestamp
is taken from the socket ancillary data in the SO_TIMESTAMPNS attribute, to
keep the scheduler delay out of the measured time.

The first test measures the average latency of an upcall generated from
a single port. To measure it, 100k packets, one every msec, are sent to a
single port and their latencies are recorded.

The second test checks latency fairness among ports, namely whether
latency is equal across ports or some ports have lower priority.
The previous test is repeated for every port, and the average of the per-port
average latencies and the standard deviation between those averages are computed.

The third test measures responsiveness under load. Heavy traffic
is sent through all ports, and latency and packet loss are measured
on a single idle port.

The fourth test is all about fairness. Heavy traffic is injected into all
ports but one, and latency and packet loss are measured on the single idle port.

This is the test setup:

  # nproc
  56
  # ovs-vsctl show |grep -c Port
  2223
  # ovs-ofctl dump-flows ovs_upc_br
   cookie=0x0, duration=4.827s, table=0, n_packets=0, n_bytes=0, actions=CONTROLLER:65535,LOCAL
  # uname -a
  Linux fc28 4.18.7-200.fc28.x86_64 #1 SMP Mon Sep 10 15:44:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

And these are the results of the tests:

                                          Stock OVS                 Patched
  netlink sockets
  in use by vswitchd
  lsof -p $(pidof ovs-vswitchd) \
      |grep -c GENERIC                        91187                    2227

  Test 1
  one port latency
  min/avg/max/mdev (us)           2.7/6.6/238.7/1.8       1.6/6.8/160.6/1.7

  Test 2
  all port
  avg latency/mdev (us)                   6.51/0.97               6.86/0.17

  Test 3
  single port latency
  under load
  avg/mdev (us)                             7.5/5.9                 3.8/4.8
  packet loss                                  95 %                    62 %

  Test 4
  idle port latency
  under load
  min/avg/max/mdev (us)           0.8/1.5/210.5/0.9       1.0/2.1/344.5/1.2
  packet loss                                  94 %                     4 %

CPU and RAM usage seem unaffected; the resource usage of vswitchd
sitting idle with 2000+ ports is unchanged:

  # ps u $(pidof ovs-vswitchd)
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  openvsw+  5430 54.3  0.3 4263964 510968 pts/1  RLl+ 16:20   0:50 ovs-vswitchd

[1] https://github.com/teknoraver/network-tools/blob/master/weed.c

Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 lib/dpif-netlink.c | 308 ++++++++++++---------------------------------
 1 file changed, 80 insertions(+), 228 deletions(-)

Comments

Flavio Leitner Sept. 20, 2018, 2:12 p.m. UTC | #1
On Wed, Sep 19, 2018 at 02:47:03PM +0200, Matteo Croce wrote:
> When using the kernel datapath, OVS allocates a pool of sockets to handle
> netlink events. The number of sockets is: ports * n-handler-threads, where
> n-handler-threads is user configurable and defaults to 3/4*number of cores.
> 
> This because vswitchd starts n-handler-threads threads, each one with a
> netlink socket for every port of the switch. Every thread then, starts
> listening on events on its set of sockets with epoll().
> 
> On setup with lot of CPUs and ports, the number of sockets easily hits
> the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.
> 
> Change the number of allocated sockets to just one per port by moving
> the socket array from a per handler structure to a per datapath one,
> and let all the handlers share the same sockets by using EPOLLEXCLUSIVE
> epoll flag which avoids duplicate events, on systems that support it.
> 
> The patch was tested on a 56 core machine running Linux 4.18 and latest
> Open vSwitch. A bridge was created with 2000+ ports, some of them being
> veth interfaces with the peer outside the bridge. The latency of the upcall
> is measured by setting a single 'action=controller,local' OpenFlow rule to
> force all the packets going to the slow path and then to the local port.
> A tool[1] injects some packets to the veth outside the bridge, and measures
> the delay until the packet is captured on the local port. The rx timestamp
> is get from the socket ancillary data in the attribute SO_TIMESTAMPNS, to
> avoid having the scheduler delay in the measured time.
> 
> The first test measures the average latency for an upcall generated from
> a single port. To measure it 100k packets, one every msec, are sent to a
> single port and the latencies are measured.
> 
> The second test is meant to check latency fairness among ports, namely if
> latency is equal between ports or if some ports have lower priority.
> The previous test is repeated for every port, the average of the average
> latencies and the standard deviation between averages is measured.
> 
> The third test serves to measure responsiveness under load. Heavy traffic
> is sent through all ports, latency and packet loss is measured
> on a single idle port.
> 
> The fourth test is all about fairness. Heavy traffic is injected in all
> ports but one, latency and packet loss is measured on the single idle port.
> 
> This is the test setup:
> 
>   # nproc
>   56
>   # ovs-vsctl show |grep -c Port
>   2223
>   # ovs-ofctl dump-flows ovs_upc_br
>    cookie=0x0, duration=4.827s, table=0, n_packets=0, n_bytes=0, actions=CONTROLLER:65535,LOCAL
>   # uname -a
>   Linux fc28 4.18.7-200.fc28.x86_64 #1 SMP Mon Sep 10 15:44:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> And these are the results of the tests:
> 
>                                           Stock OVS                 Patched
>   netlink sockets
>   in use by vswitchd
>   lsof -p $(pidof ovs-vswitchd) \
>       |grep -c GENERIC                        91187                    2227
> 
>   Test 1
>   one port latency
>   min/avg/max/mdev (us)           2.7/6.6/238.7/1.8       1.6/6.8/160.6/1.7
> 
>   Test 2
>   all port
>   avg latency/mdev (us)                   6.51/0.97               6.86/0.17
> 
>   Test 3
>   single port latency
>   under load
>   avg/mdev (us)                             7.5/5.9                 3.8/4.8
>   packet loss                                  95 %                    62 %
> 
>   Test 4
>   idle port latency
>   under load
>   min/avg/max/mdev (us)           0.8/1.5/210.5/0.9       1.0/2.1/344.5/1.2
>   packet loss                                  94 %                     4 %
> 
> CPU and RAM usage seems not to be affected, the resource usage of vswitchd
> idle with 2000+ ports is unchanged:
> 
>   # ps u $(pidof ovs-vswitchd)
>   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>   openvsw+  5430 54.3  0.3 4263964 510968 pts/1  RLl+ 16:20   0:50 ovs-vswitchd
> 
> [1] https://github.com/teknoraver/network-tools/blob/master/weed.c
> 
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
> ---

Matteo and I discussed this off list and my questions/suggestions
were addressed in this version.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Ben Pfaff Sept. 21, 2018, 8:41 p.m. UTC | #2
On Wed, Sep 19, 2018 at 02:47:03PM +0200, Matteo Croce wrote:
> When using the kernel datapath, OVS allocates a pool of sockets to handle
> netlink events. The number of sockets is: ports * n-handler-threads, where
> n-handler-threads is user configurable and defaults to 3/4*number of cores.
> 
> This because vswitchd starts n-handler-threads threads, each one with a
> netlink socket for every port of the switch. Every thread then, starts
> listening on events on its set of sockets with epoll().
> 
> On setup with lot of CPUs and ports, the number of sockets easily hits
> the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.
> 
> Change the number of allocated sockets to just one per port by moving
> the socket array from a per handler structure to a per datapath one,
> and let all the handlers share the same sockets by using EPOLLEXCLUSIVE
> epoll flag which avoids duplicate events, on systems that support it.

Thanks a lot for working on this.  OVS clearly uses too many fds in some
situations and we need to fix it.

I noticed that the check for EPOLLEXCLUSIVE is only a build-time check,
using #ifdef.  This will only be effective if the build system is where
OVS runs and only if the build system uses its own kernel headers for
the build.  Usually, in OVS, we instead try to always enable a feature
at build time, e.g. through something like:

        #ifndef EPOLLEXCLUSIVE
        #define EPOLLEXCLUSIVE (whatever)
        #endif

and then at runtime test whether the feature is actually available and
try to use it.

Is EPOLLEXCLUSIVE safe, that is, can we miss events or end up sleeping
before processing them?  If it is safe, then what invariants does OVS
need to maintain to assure them?

With this commit, the "hash" argument to dpif_port_get_pid() is never
used.  The overall design of the function can be simplified.  That could
be folded into this patch or go in a later patch.

Does this patch maintain the same level of thread safety as before?

Thanks,

Ben.
Matteo Croce Sept. 23, 2018, 10:18 p.m. UTC | #3
On Fri, Sep 21, 2018 at 8:41 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Wed, Sep 19, 2018 at 02:47:03PM +0200, Matteo Croce wrote:
> > When using the kernel datapath, OVS allocates a pool of sockets to handle
> > netlink events. The number of sockets is: ports * n-handler-threads, where
> > n-handler-threads is user configurable and defaults to 3/4*number of cores.
> >
> > This because vswitchd starts n-handler-threads threads, each one with a
> > netlink socket for every port of the switch. Every thread then, starts
> > listening on events on its set of sockets with epoll().
> >
> > On setup with lot of CPUs and ports, the number of sockets easily hits
> > the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.
> >
> > Change the number of allocated sockets to just one per port by moving
> > the socket array from a per handler structure to a per datapath one,
> > and let all the handlers share the same sockets by using EPOLLEXCLUSIVE
> > epoll flag which avoids duplicate events, on systems that support it.
>
> Thanks a lot for working on this.  OVS clearly uses too many fds in some
> situations and we need to fix it.
>
> I noticed that the check for EPOLLEXCLUSIVE is only a build-time check,
> using #ifdef.  This will only be effective if the build system is where
> OVS runs and only if the build system uses its own kernel headers for
> the build.  Usually, in OVS, we instead try to always enable a feature
> at build time, e.g. through something like:
>
>         #ifndef EPOLLEXCLUSIVE
>         #define EPOLLEXCLUSIVE (whatever)
>         #endif
>
> and then at runtime test whether the feature is actually available and
> try to use it.
>

Hi Ben,

Thanks for the review.

Got it, I will do it in a v2.

BTW, one of my tests was to run the binary compiled against the latest
kernel headers on:
- RHEL 7.2, which has a 3.10 kernel without EPOLLEXCLUSIVE backported
(it was backported in 7.3)
- Devuan Jessie, which has a 3.16 kernel.
I ran strace on vswitchd and I clearly saw this call to epoll_ctl()
with an unknown flag:

epoll_ctl(22, EPOLL_CTL_ADD, 23, {EPOLLIN|0x10000000, {u32=0, u64=0}}) = 0

which means that the call doesn't fail and the extra flag is ignored.

> Is EPOLLEXCLUSIVE safe, that is, can we miss events or end up sleeping
> before processing them?  If it is safe, then what invariants does OVS
> need to maintain to assure them?
>

What do you mean by safe? I've skimmed the kernel code; the flag just
lets the kernel use add_wait_queue_exclusive() instead of
add_wait_queue(), so the raised (or possibly missed) events should be
the same, just notified to only one handler.

> With this commit, the "hash" argument to dpif_port_get_pid() is never
> used.  The overall design of the function can be simplified.  That could
> be folded into this patch or go in a later patch.
>

Yes, I noticed it, but I didn't want to mess with generic code, so I
simply used OVS_UNUSED to suppress the compiler warnings.
Such a refactor can be done as a second step.

> Does this patch maintain the same level of thread safety as before?
>

I thought a lot about it. Reading from a socket is thread-safe; I
highly doubt that a netlink datagram can be split and received half
and half by two handlers.
Also, the code which adds or removes sockets is protected with
OVS_REQ_WRLOCK(dpif->upcall_lock), so everything should be as safe as
before.

I ran this test: on a 56-core machine, I created a bridge with 2200
dummy interfaces and 22 veths, then started 22 mausezahn instances, one
on each veth peer outside the bridge.
All packets were forced to the slow path with action=controller, and
the whole test lasted 48 hours; no errors found so far.

As always, four eyes are better than two, if you find something
suspicious, I'll try to sort it out.

Regards,
Ben Pfaff Sept. 23, 2018, 11:15 p.m. UTC | #4
On Sun, Sep 23, 2018 at 10:18:50PM +0000, Matteo Croce wrote:
> On Fri, Sep 21, 2018 at 8:41 PM Ben Pfaff <blp@ovn.org> wrote:
> >
> > On Wed, Sep 19, 2018 at 02:47:03PM +0200, Matteo Croce wrote:
> > > When using the kernel datapath, OVS allocates a pool of sockets to handle
> > > netlink events. The number of sockets is: ports * n-handler-threads, where
> > > n-handler-threads is user configurable and defaults to 3/4*number of cores.
> > >
> > > This because vswitchd starts n-handler-threads threads, each one with a
> > > netlink socket for every port of the switch. Every thread then, starts
> > > listening on events on its set of sockets with epoll().
> > >
> > > On setup with lot of CPUs and ports, the number of sockets easily hits
> > > the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.
> > >
> > > Change the number of allocated sockets to just one per port by moving
> > > the socket array from a per handler structure to a per datapath one,
> > > and let all the handlers share the same sockets by using EPOLLEXCLUSIVE
> > > epoll flag which avoids duplicate events, on systems that support it.
> >
> > Thanks a lot for working on this.  OVS clearly uses too many fds in some
> > situations and we need to fix it.
> >
> > I noticed that the check for EPOLLEXCLUSIVE is only a build-time check,
> > using #ifdef.  This will only be effective if the build system is where
> > OVS runs and only if the build system uses its own kernel headers for
> > the build.  Usually, in OVS, we instead try to always enable a feature
> > at build time, e.g. through something like:
> >
> >         #ifndef EPOLLEXCLUSIVE
> >         #define EPOLLEXCLUSIVE (whatever)
> >         #endif
> >
> > and then at runtime test whether the feature is actually available and
> > try to use it.
> >
> 
> Hi Ben,
> 
> Thanks for the review.
> 
> Got it, will do it in a v2
> 
> BTW, one of my tests was to run the binary compiled against the latest
> kernel header on:
> - RHEL 7.2, which has a 3.10 kernel without EPOLLEXCLUSIVE backported
> (it was backported in 7.3)
> - Devuan Jessie, which has a 3.16 kernel.
> I ran strace on vswitchd and I clearly saw this call to epoll_ctl()
> with an unknown flag:
> 
> epoll_ctl(22, EPOLL_CTL_ADD, 23, {EPOLLIN|0x10000000, {u32=0, u64=0}}) = 0
> 
> which means that the call doesn't fail and the extra flag is ignored.

OK, in that case then we don't even need to test for support at runtime,
we can just make sure it's always defined properly.

> > Is EPOLLEXCLUSIVE safe, that is, can we miss events or end up sleeping
> > before processing them?  If it is safe, then what invariants does OVS
> > need to maintain to assure them?
> >
> 
> What do you mean by safe? I've skimmed the kernel code, the flag will
> just let the kernel use add_wait_queue_exclusive() instead of
> add_wait_queue(), so the raised or eventually missed events should be
> the same, just notified to only one handler.

OK, sounds good.

> > With this commit, the "hash" argument to dpif_port_get_pid() is never
> > used.  The overall design of the function can be simplified.  That could
> > be folded into this patch or go in a later patch.
> >
> 
> Yes I noticed it, but didn't want to mess with generic code, so I
> simply used OVS_UNUSED to suppress compiler warnings.
> Such refactor can be done as a second step.

OK.

> > Does this patch maintain the same level of thread safety as before?
> >
> 
> I thinked a lot about it. Reading from a socket is thread safe, I
> highly doubt that a netlink datagram can be split and received half
> and half by two handlers.
> Also, the code which adds or removes sockets is protected with
> OVS_REQ_WRLOCK(dpif->upcall_lock) so everything should be safe as
> before.
> 
> I ran this test: on a 56 core machine, I created a bridge with 2200
> dummy interfaces and 22 veth, then started 22 mausezahn on every veth
> peer outside the bridge.
> All packets were forced to the slowpath with action=controller, and
> the whole test lasted 48 hours, no errors found so far.
> 
> As always, four eyes are better than two, if you find something
> suspicious, I'll try to sort it out.

That sounds pretty good.  Thank you.  I'll look forward to v2.

Would you mind describing the testing in the commit message?

Thanks,

Ben.

Patch

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e6d5a6ec5..79036bf0c 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -170,7 +170,6 @@  struct dpif_windows_vport_sock {
 #endif
 
 struct dpif_handler {
-    struct dpif_channel *channels;/* Array of channels for each handler. */
     struct epoll_event *epoll_events;
     int epoll_fd;                 /* epoll fd that includes channel socks. */
     int n_events;                 /* Num events returned by epoll_wait(). */
@@ -193,6 +192,7 @@  struct dpif_netlink {
     struct fat_rwlock upcall_lock;
     struct dpif_handler *handlers;
     uint32_t n_handlers;           /* Num of upcall handlers. */
+    struct dpif_channel *channels; /* Array of channels for each port. */
     int uc_array_size;             /* Size of 'handler->channels' and */
                                    /* 'handler->epoll_events'. */
 
@@ -331,43 +331,6 @@  open_dpif(const struct dpif_netlink_dp *dp, struct dpif **dpifp)
     return 0;
 }
 
-/* Destroys the netlink sockets pointed by the elements in 'socksp'
- * and frees the 'socksp'.  */
-static void
-vport_del_socksp__(struct nl_sock **socksp, uint32_t n_socks)
-{
-    size_t i;
-
-    for (i = 0; i < n_socks; i++) {
-        nl_sock_destroy(socksp[i]);
-    }
-
-    free(socksp);
-}
-
-/* Creates an array of netlink sockets.  Returns an array of the
- * corresponding pointers.  Records the error in 'error'. */
-static struct nl_sock **
-vport_create_socksp__(uint32_t n_socks, int *error)
-{
-    struct nl_sock **socksp = xzalloc(n_socks * sizeof *socksp);
-    size_t i;
-
-    for (i = 0; i < n_socks; i++) {
-        *error = nl_sock_create(NETLINK_GENERIC, &socksp[i]);
-        if (*error) {
-            goto error;
-        }
-    }
-
-    return socksp;
-
-error:
-    vport_del_socksp__(socksp, n_socks);
-
-    return NULL;
-}
-
 #ifdef _WIN32
 static void
 vport_delete_sock_pool(struct dpif_handler *handler)
@@ -422,129 +385,34 @@  error:
     vport_delete_sock_pool(handler);
     return error;
 }
-
-/* Returns an array pointers to netlink sockets.  The sockets are picked from a
- * pool. Records the error in 'error'. */
-static struct nl_sock **
-vport_create_socksp_windows(struct dpif_netlink *dpif, int *error)
-    OVS_REQ_WRLOCK(dpif->upcall_lock)
-{
-    uint32_t n_socks = dpif->n_handlers;
-    struct nl_sock **socksp;
-    size_t i;
-
-    ovs_assert(n_socks <= 1);
-    socksp = xzalloc(n_socks * sizeof *socksp);
-
-    /* Pick netlink sockets to use in a round-robin fashion from each
-     * handler's pool of sockets. */
-    for (i = 0; i < n_socks; i++) {
-        struct dpif_handler *handler = &dpif->handlers[i];
-        struct dpif_windows_vport_sock *sock_pool = handler->vport_sock_pool;
-        size_t index = handler->last_used_pool_idx;
-
-        /* A pool of sockets is allocated when the handler is initialized. */
-        if (sock_pool == NULL) {
-            free(socksp);
-            *error = EINVAL;
-            return NULL;
-        }
-
-        ovs_assert(index < VPORT_SOCK_POOL_SIZE);
-        socksp[i] = sock_pool[index].nl_sock;
-        socksp[i] = sock_pool[index].nl_sock;
-        ovs_assert(socksp[i]);
-        index = (index == VPORT_SOCK_POOL_SIZE - 1) ? 0 : index + 1;
-        handler->last_used_pool_idx = index;
-    }
-
-    return socksp;
-}
-
-static void
-vport_del_socksp_windows(struct dpif_netlink *dpif, struct nl_sock **socksp)
-{
-    free(socksp);
-}
 #endif /* _WIN32 */
 
-static struct nl_sock **
-vport_create_socksp(struct dpif_netlink *dpif, int *error)
-{
-#ifdef _WIN32
-    return vport_create_socksp_windows(dpif, error);
-#else
-    return vport_create_socksp__(dpif->n_handlers, error);
-#endif
-}
-
-static void
-vport_del_socksp(struct dpif_netlink *dpif, struct nl_sock **socksp)
-{
-#ifdef _WIN32
-    vport_del_socksp_windows(dpif, socksp);
-#else
-    vport_del_socksp__(socksp, dpif->n_handlers);
-#endif
-}
-
-/* Given the array of pointers to netlink sockets 'socksp', returns
- * the array of corresponding pids. If the 'socksp' is NULL, returns
- * a single-element array of value 0. */
-static uint32_t *
-vport_socksp_to_pids(struct nl_sock **socksp, uint32_t n_socks)
-{
-    uint32_t *pids;
-
-    if (!socksp) {
-        pids = xzalloc(sizeof *pids);
-    } else {
-        size_t i;
-
-        pids = xzalloc(n_socks * sizeof *pids);
-        for (i = 0; i < n_socks; i++) {
-            pids[i] = nl_sock_pid(socksp[i]);
-        }
-    }
-
-    return pids;
-}
-
-/* Given the port number 'port_idx', extracts the pids of netlink sockets
- * associated to the port and assigns it to 'upcall_pids'. */
+/* Given the port number 'port_idx', extracts the pid of netlink socket
+ * associated to the port and assigns it to 'upcall_pid'. */
 static bool
-vport_get_pids(struct dpif_netlink *dpif, uint32_t port_idx,
-               uint32_t **upcall_pids)
+vport_get_pid(struct dpif_netlink *dpif, uint32_t port_idx,
+              uint32_t *upcall_pid)
 {
-    uint32_t *pids;
-    size_t i;
-
     /* Since the nl_sock can only be assigned in either all
-     * or none "dpif->handlers" channels, the following check
+     * or none "dpif" channels, the following check
      * would suffice. */
-    if (!dpif->handlers[0].channels[port_idx].sock) {
+    if (!dpif->channels[port_idx].sock) {
         return false;
     }
     ovs_assert(!WINDOWS || dpif->n_handlers <= 1);
 
-    pids = xzalloc(dpif->n_handlers * sizeof *pids);
-
-    for (i = 0; i < dpif->n_handlers; i++) {
-        pids[i] = nl_sock_pid(dpif->handlers[i].channels[port_idx].sock);
-    }
-
-    *upcall_pids = pids;
+    *upcall_pid = nl_sock_pid(dpif->channels[port_idx].sock);
 
     return true;
 }
 
 static int
-vport_add_channels(struct dpif_netlink *dpif, odp_port_t port_no,
-                   struct nl_sock **socksp)
+vport_add_channel(struct dpif_netlink *dpif, odp_port_t port_no,
+                  struct nl_sock *socksp)
 {
     struct epoll_event event;
     uint32_t port_idx = odp_to_u32(port_no);
-    size_t i, j;
+    size_t i;
     int error;
 
     if (dpif->handlers == NULL) {
@@ -553,7 +421,7 @@  vport_add_channels(struct dpif_netlink *dpif, odp_port_t port_no,
 
     /* We assume that the datapath densely chooses port numbers, which can
      * therefore be used as an index into 'channels' and 'epoll_events' of
-     * 'dpif->handler'. */
+     * 'dpif'. */
     if (port_idx >= dpif->uc_array_size) {
         uint32_t new_size = port_idx + 1;
 
@@ -563,15 +431,15 @@  vport_add_channels(struct dpif_netlink *dpif, odp_port_t port_no,
             return EFBIG;
         }
 
-        for (i = 0; i < dpif->n_handlers; i++) {
-            struct dpif_handler *handler = &dpif->handlers[i];
+        dpif->channels = xrealloc(dpif->channels,
+                                  new_size * sizeof *dpif->channels);
 
-            handler->channels = xrealloc(handler->channels,
-                                         new_size * sizeof *handler->channels);
+        for (i = dpif->uc_array_size; i < new_size; i++) {
+            dpif->channels[i].sock = NULL;
+        }
 
-            for (j = dpif->uc_array_size; j < new_size; j++) {
-                handler->channels[j].sock = NULL;
-            }
+        for (i = 0; i < dpif->n_handlers; i++) {
+            struct dpif_handler *handler = &dpif->handlers[i];
 
             handler->epoll_events = xrealloc(handler->epoll_events,
                 new_size * sizeof *handler->epoll_events);
@@ -582,32 +450,35 @@  vport_add_channels(struct dpif_netlink *dpif, odp_port_t port_no,
 
     memset(&event, 0, sizeof event);
     event.events = EPOLLIN;
+#ifdef EPOLLEXCLUSIVE
+    event.events |= EPOLLEXCLUSIVE;
+#endif
     event.data.u32 = port_idx;
 
     for (i = 0; i < dpif->n_handlers; i++) {
         struct dpif_handler *handler = &dpif->handlers[i];
 
 #ifndef _WIN32
-        if (epoll_ctl(handler->epoll_fd, EPOLL_CTL_ADD, nl_sock_fd(socksp[i]),
+        if (epoll_ctl(handler->epoll_fd, EPOLL_CTL_ADD, nl_sock_fd(socksp),
                       &event) < 0) {
             error = errno;
             goto error;
         }
 #endif
-        dpif->handlers[i].channels[port_idx].sock = socksp[i];
-        dpif->handlers[i].channels[port_idx].last_poll = LLONG_MIN;
     }
+    dpif->channels[port_idx].sock = socksp;
+    dpif->channels[port_idx].last_poll = LLONG_MIN;
 
     return 0;
 
 error:
-    for (j = 0; j < i; j++) {
 #ifndef _WIN32
-        epoll_ctl(dpif->handlers[j].epoll_fd, EPOLL_CTL_DEL,
-                  nl_sock_fd(socksp[j]), NULL);
-#endif
-        dpif->handlers[j].channels[port_idx].sock = NULL;
+    while (i--) {
+        epoll_ctl(dpif->handlers[i].epoll_fd, EPOLL_CTL_DEL,
+                  nl_sock_fd(socksp), NULL);
     }
+#endif
+    dpif->channels[port_idx].sock = NULL;
 
     return error;
 }
@@ -618,14 +489,8 @@  vport_del_channels(struct dpif_netlink *dpif, odp_port_t port_no)
     uint32_t port_idx = odp_to_u32(port_no);
     size_t i;
 
-    if (!dpif->handlers || port_idx >= dpif->uc_array_size) {
-        return;
-    }
-
-    /* Since the sock can only be assigned in either all or none
-     * of "dpif->handlers" channels, the following check would
-     * suffice. */
-    if (!dpif->handlers[0].channels[port_idx].sock) {
+    if (!dpif->handlers || port_idx >= dpif->uc_array_size
+        || !dpif->channels[port_idx].sock) {
         return;
     }
 
@@ -633,12 +498,14 @@  vport_del_channels(struct dpif_netlink *dpif, odp_port_t port_no)
         struct dpif_handler *handler = &dpif->handlers[i];
 #ifndef _WIN32
         epoll_ctl(handler->epoll_fd, EPOLL_CTL_DEL,
-                  nl_sock_fd(handler->channels[port_idx].sock), NULL);
-        nl_sock_destroy(handler->channels[port_idx].sock);
+                  nl_sock_fd(dpif->channels[port_idx].sock), NULL);
 #endif
-        handler->channels[port_idx].sock = NULL;
         handler->event_offset = handler->n_events = 0;
     }
+#ifndef _WIN32
+    nl_sock_destroy(dpif->channels[port_idx].sock);
+#endif
+    dpif->channels[port_idx].sock = NULL;
 }
 
 static void
@@ -655,10 +522,7 @@  destroy_all_channels(struct dpif_netlink *dpif)
         struct dpif_netlink_vport vport_request;
         uint32_t upcall_pids = 0;
 
-        /* Since the sock can only be assigned in either all or none
-         * of "dpif->handlers" channels, the following check would
-         * suffice. */
-        if (!dpif->handlers[0].channels[i].sock) {
+        if (!dpif->channels[i].sock) {
             continue;
         }
 
@@ -679,11 +543,11 @@  destroy_all_channels(struct dpif_netlink *dpif)
 
         dpif_netlink_handler_uninit(handler);
         free(handler->epoll_events);
-        free(handler->channels);
     }
-
+    free(dpif->channels);
     free(dpif->handlers);
     dpif->handlers = NULL;
+    dpif->channels = NULL;
     dpif->n_handlers = 0;
     dpif->uc_array_size = 0;
 }
@@ -846,13 +710,12 @@  dpif_netlink_port_add__(struct dpif_netlink *dpif, const char *name,
 {
     struct dpif_netlink_vport request, reply;
     struct ofpbuf *buf;
-    struct nl_sock **socksp = NULL;
-    uint32_t *upcall_pids;
+    struct nl_sock *socksp = NULL;
+    uint32_t upcall_pids;
     int error = 0;
 
     if (dpif->handlers) {
-        socksp = vport_create_socksp(dpif, &error);
-        if (!socksp) {
+        if (nl_sock_create(NETLINK_GENERIC, &socksp)) {
             return error;
         }
     }
@@ -864,9 +727,9 @@  dpif_netlink_port_add__(struct dpif_netlink *dpif, const char *name,
     request.name = name;
 
     request.port_no = *port_nop;
-    upcall_pids = vport_socksp_to_pids(socksp, dpif->n_handlers);
-    request.n_upcall_pids = socksp ? dpif->n_handlers : 1;
-    request.upcall_pids = upcall_pids;
+    upcall_pids = nl_sock_pid(socksp);
+    request.n_upcall_pids = 1;
+    request.upcall_pids = &upcall_pids;
 
     if (options) {
         request.options = options->data;
@@ -882,31 +745,27 @@  dpif_netlink_port_add__(struct dpif_netlink *dpif, const char *name,
                       dpif_name(&dpif->dpif), *port_nop);
         }
 
-        vport_del_socksp(dpif, socksp);
+        nl_sock_destroy(socksp);
         goto exit;
     }
 
-    if (socksp) {
-        error = vport_add_channels(dpif, *port_nop, socksp);
-        if (error) {
-            VLOG_INFO("%s: could not add channel for port %s",
-                      dpif_name(&dpif->dpif), name);
-
-            /* Delete the port. */
-            dpif_netlink_vport_init(&request);
-            request.cmd = OVS_VPORT_CMD_DEL;
-            request.dp_ifindex = dpif->dp_ifindex;
-            request.port_no = *port_nop;
-            dpif_netlink_vport_transact(&request, NULL, NULL);
-            vport_del_socksp(dpif, socksp);
-            goto exit;
-        }
+    error = vport_add_channel(dpif, *port_nop, socksp);
+    if (error) {
+        VLOG_INFO("%s: could not add channel for port %s",
+                    dpif_name(&dpif->dpif), name);
+
+        /* Delete the port. */
+        dpif_netlink_vport_init(&request);
+        request.cmd = OVS_VPORT_CMD_DEL;
+        request.dp_ifindex = dpif->dp_ifindex;
+        request.port_no = *port_nop;
+        dpif_netlink_vport_transact(&request, NULL, NULL);
+        nl_sock_destroy(socksp);
+        goto exit;
     }
-    free(socksp);
 
 exit:
     ofpbuf_delete(buf);
-    free(upcall_pids);
 
     return error;
 }
@@ -1131,7 +990,7 @@  dpif_netlink_port_query_by_name(const struct dpif *dpif_, const char *devname,
 
 static uint32_t
 dpif_netlink_port_get_pid__(const struct dpif_netlink *dpif,
-                            odp_port_t port_no, uint32_t hash)
+                            odp_port_t port_no, uint32_t hash OVS_UNUSED)
     OVS_REQ_RDLOCK(dpif->upcall_lock)
 {
     uint32_t port_idx = odp_to_u32(port_no);
@@ -1141,14 +1000,13 @@  dpif_netlink_port_get_pid__(const struct dpif_netlink *dpif,
         /* The ODPP_NONE "reserved" port number uses the "ovs-system"'s
          * channel, since it is not heavily loaded. */
         uint32_t idx = port_idx >= dpif->uc_array_size ? 0 : port_idx;
-        struct dpif_handler *h = &dpif->handlers[hash % dpif->n_handlers];
 
         /* Needs to check in case the socket pointer is changed in between
          * the holding of upcall_lock.  A known case happens when the main
          * thread deletes the vport while the handler thread is handling
          * the upcall from that port. */
-        if (h->channels[idx].sock) {
-            pid = nl_sock_pid(h->channels[idx].sock);
+        if (dpif->channels[idx].sock) {
+            pid = nl_sock_pid(dpif->channels[idx].sock);
         }
     }
 
@@ -2382,42 +2240,40 @@  dpif_netlink_refresh_channels(struct dpif_netlink *dpif, uint32_t n_handlers)
     dpif_netlink_port_dump_start__(dpif, &dump);
     while (!dpif_netlink_port_dump_next__(dpif, &dump, &vport, &buf)) {
         uint32_t port_no = odp_to_u32(vport.port_no);
-        uint32_t *upcall_pids = NULL;
+        uint32_t upcall_pid;
         int error;
 
         if (port_no >= dpif->uc_array_size
-            || !vport_get_pids(dpif, port_no, &upcall_pids)) {
-            struct nl_sock **socksp = vport_create_socksp(dpif, &error);
+            || !vport_get_pid(dpif, port_no, &upcall_pid)) {
+            struct nl_sock *socksp;
 
-            if (!socksp) {
+            if (nl_sock_create(NETLINK_GENERIC, &socksp)) {
                 goto error;
             }
 
-            error = vport_add_channels(dpif, vport.port_no, socksp);
+            error = vport_add_channel(dpif, vport.port_no, socksp);
             if (error) {
                 VLOG_INFO("%s: could not add channels for port %s",
                           dpif_name(&dpif->dpif), vport.name);
-                vport_del_socksp(dpif, socksp);
+                nl_sock_destroy(socksp);
                 retval = error;
                 goto error;
             }
-            upcall_pids = vport_socksp_to_pids(socksp, dpif->n_handlers);
-            free(socksp);
+            upcall_pid = nl_sock_pid(socksp);
         }
 
         /* Configure the vport to deliver misses to 'sock'. */
         if (vport.upcall_pids[0] == 0
-            || vport.n_upcall_pids != dpif->n_handlers
-            || memcmp(upcall_pids, vport.upcall_pids, n_handlers * sizeof
-                      *upcall_pids)) {
+            || vport.n_upcall_pids != 1
+            || upcall_pid != vport.upcall_pids[0]) {
             struct dpif_netlink_vport vport_request;
 
             dpif_netlink_vport_init(&vport_request);
             vport_request.cmd = OVS_VPORT_CMD_SET;
             vport_request.dp_ifindex = dpif->dp_ifindex;
             vport_request.port_no = vport.port_no;
-            vport_request.n_upcall_pids = dpif->n_handlers;
-            vport_request.upcall_pids = upcall_pids;
+            vport_request.n_upcall_pids = 1;
+            vport_request.upcall_pids = &upcall_pid;
             error = dpif_netlink_vport_transact(&vport_request, NULL, NULL);
             if (error) {
                 VLOG_WARN_RL(&error_rl,
@@ -2438,11 +2294,9 @@  dpif_netlink_refresh_channels(struct dpif_netlink *dpif, uint32_t n_handlers)
         if (port_no < keep_channels_nbits) {
             bitmap_set1(keep_channels, port_no);
         }
-        free(upcall_pids);
         continue;
 
     error:
-        free(upcall_pids);
         vport_del_channels(dpif, vport.port_no);
     }
     nl_dump_done(&dump);
@@ -2701,7 +2555,7 @@  dpif_netlink_recv__(struct dpif_netlink *dpif, uint32_t handler_id,
 
     while (handler->event_offset < handler->n_events) {
         int idx = handler->epoll_events[handler->event_offset].data.u32;
-        struct dpif_channel *ch = &dpif->handlers[handler_id].channels[idx];
+        struct dpif_channel *ch = &dpif->channels[idx];
 
         handler->event_offset++;
 
@@ -2803,16 +2657,14 @@  dpif_netlink_recv_purge__(struct dpif_netlink *dpif)
     OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
     if (dpif->handlers) {
-        size_t i, j;
+        size_t i;
 
         for (i = 0; i < dpif->uc_array_size; i++ ) {
-            if (!dpif->handlers[0].channels[i].sock) {
+            if (!dpif->channels[i].sock) {
                 continue;
             }
 
-            for (j = 0; j < dpif->n_handlers; j++) {
-                nl_sock_drain(dpif->handlers[j].channels[i].sock);
-            }
+            nl_sock_drain(dpif->channels[i].sock);
         }
     }
 }