[{"id":1777352,"web_url":"http://patchwork.ozlabs.org/comment/1777352/","msgid":"<20170929032146.vs5v454wjs4niu4k@ast-mbp>","list_archive_url":null,"date":"2017-09-29T03:21:47","subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","submitter":{"id":42586,"url":"http://patchwork.ozlabs.org/api/people/42586/","name":"Alexei Starovoitov","email":"alexei.starovoitov@gmail.com"},"content":"On Thu, Sep 28, 2017 at 02:57:08PM +0200, Jesper Dangaard Brouer wrote:\n> The 'cpumap' is primary used as a backend map for XDP BPF helper\n> call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.\n> \n> This patch implement the main part of the map.  It is not connected to\n> the XDP redirect system yet, and no SKB allocation are done yet.\n> \n> The main concern in this patch is to ensure the datapath can run\n> without any locking.  This adds complexity to the setup and tear-down\n> procedure, which assumptions are extra carefully documented in the\n> code comments.\n> \n> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>\n> ---\n>  include/linux/bpf_types.h      |    1 \n>  include/uapi/linux/bpf.h       |    1 \n>  kernel/bpf/Makefile            |    1 \n>  kernel/bpf/cpumap.c            |  547 ++++++++++++++++++++++++++++++++++++++++\n>  kernel/bpf/syscall.c           |    8 +\n>  tools/include/uapi/linux/bpf.h |    1 \n>  6 files changed, 558 insertions(+), 1 deletion(-)\n>  create mode 100644 kernel/bpf/cpumap.c\n> \n> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h\n> index 6f1a567667b8..814c1081a4a9 100644\n> --- a/include/linux/bpf_types.h\n> +++ b/include/linux/bpf_types.h\n> @@ -41,4 +41,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)\n>  #ifdef CONFIG_STREAM_PARSER\n>  BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)\n>  #endif\n> +BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)\n>  #endif\n> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h\n> index 
e43491ac4823..f14e15702533 100644\n> --- a/include/uapi/linux/bpf.h\n> +++ b/include/uapi/linux/bpf.h\n> @@ -111,6 +111,7 @@ enum bpf_map_type {\n>  \tBPF_MAP_TYPE_HASH_OF_MAPS,\n>  \tBPF_MAP_TYPE_DEVMAP,\n>  \tBPF_MAP_TYPE_SOCKMAP,\n> +\tBPF_MAP_TYPE_CPUMAP,\n>  };\n>  \n>  enum bpf_prog_type {\n> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile\n> index 897daa005b23..dba0bd33a43c 100644\n> --- a/kernel/bpf/Makefile\n> +++ b/kernel/bpf/Makefile\n> @@ -4,6 +4,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o\n>  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o\n>  ifeq ($(CONFIG_NET),y)\n>  obj-$(CONFIG_BPF_SYSCALL) += devmap.o\n> +obj-$(CONFIG_BPF_SYSCALL) += cpumap.o\n>  ifeq ($(CONFIG_STREAM_PARSER),y)\n>  obj-$(CONFIG_BPF_SYSCALL) += sockmap.o\n>  endif\n> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c\n> new file mode 100644\n> index 000000000000..f0948af82e65\n> --- /dev/null\n> +++ b/kernel/bpf/cpumap.c\n> @@ -0,0 +1,547 @@\n> +/* bpf/cpumap.c\n> + *\n> + * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.\n> + * Released under terms in GPL version 2.  See COPYING.\n> + */\n> +\n> +/* The 'cpumap' is primary used as a backend map for XDP BPF helper\n> + * call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.\n> + *\n> + * Unlike devmap which redirect XDP frames out another NIC device,\n> + * this map type redirect raw XDP frames to another CPU.  The remote\n> + * CPU will do SKB-allocation and call the normal network stack.\n> + *\n> + * This is a scalability and isolation mechanism, that allow\n> + * separating the early driver network XDP layer, from the rest of the\n> + * netstack, and assigning dedicated CPUs for this stage.  
This\n> + * basically allows for 10G wirespeed pre-filtering via bpf.\n> + */\n> +#include <linux/bpf.h>\n> +#include <linux/filter.h>\n> +#include <linux/ptr_ring.h>\n> +\n> +#include <linux/sched.h>\n> +#include <linux/workqueue.h>\n> +#include <linux/kthread.h>\n> +\n> +/*\n> + * General idea: XDP packets getting XDP redirected to another CPU,\n> + * will maximum be stored/queued for one driver ->poll() call.  It is\n> + * guaranteed that setting flush bit and flush operation happen on\n> + * same CPU.  Thus, cpu_map_flush operation can deduct via this_cpu_ptr()\n> + * which queue in bpf_cpu_map_entry contains packets.\n> + */\n> +\n> +#define CPU_MAP_BULK_SIZE 8  /* 8 == one cacheline on 64-bit archs */\n> +struct xdp_bulk_queue {\n> +\tvoid *q[CPU_MAP_BULK_SIZE];\n> +\tunsigned int count;\n> +};\n> +\n> +/* Struct for every remote \"destination\" CPU in map */\n> +struct bpf_cpu_map_entry {\n> +\tu32 cpu;    /* kthread CPU and map index */\n> +\tint map_id; /* Back reference to map */\n> +\tu32 qsize;  /* Redundant queue size for map lookup */\n> +\n> +\t/* XDP can run multiple RX-ring queues, need __percpu enqueue store */\n> +\tstruct xdp_bulk_queue __percpu *bulkq;\n> +\n> +\t/* Queue with potential multi-producers, and single-consumer kthread */\n> +\tstruct ptr_ring *queue;\n> +\tstruct task_struct *kthread;\n> +\tstruct work_struct kthread_stop_wq;\n> +\n> +\tatomic_t refcnt; /* Control when this struct can be free'ed */\n> +\tstruct rcu_head rcu;\n> +};\n> +\n> +struct bpf_cpu_map {\n> +\tstruct bpf_map map;\n> +\t/* Below members specific for map type */\n> +\tstruct bpf_cpu_map_entry **cpu_map;\n> +\tunsigned long __percpu *flush_needed;\n> +};\n> +\n> +static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu,\n> +\t\t\t     struct xdp_bulk_queue *bq);\n> +\n> +static u64 cpu_map_bitmap_size(const union bpf_attr *attr)\n> +{\n> +\treturn BITS_TO_LONGS(attr->max_entries) * sizeof(unsigned long);\n> +}\n> +\n> +static struct bpf_map 
*cpu_map_alloc(union bpf_attr *attr)\n> +{\n> +\tstruct bpf_cpu_map *cmap;\n> +\tu64 cost;\n> +\tint err;\n> +\n> +\t/* check sanity of attributes */\n> +\tif (attr->max_entries == 0 || attr->key_size != 4 ||\n> +\t    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)\n> +\t\treturn ERR_PTR(-EINVAL);\n> +\n> +\tcmap = kzalloc(sizeof(*cmap), GFP_USER);\n> +\tif (!cmap)\n> +\t\treturn ERR_PTR(-ENOMEM);\n> +\n> +\t/* mandatory map attributes */\n> +\tcmap->map.map_type = attr->map_type;\n> +\tcmap->map.key_size = attr->key_size;\n> +\tcmap->map.value_size = attr->value_size;\n> +\tcmap->map.max_entries = attr->max_entries;\n> +\tcmap->map.map_flags = attr->map_flags;\n> +\tcmap->map.numa_node = bpf_map_attr_numa_node(attr);\n> +\n> +\t/* make sure page count doesn't overflow */\n> +\tcost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);\n> +\tcost += cpu_map_bitmap_size(attr) * num_possible_cpus();\n> +\tif (cost >= U32_MAX - PAGE_SIZE)\n> +\t\tgoto free_cmap;\n> +\tcmap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;\n> +\n> +\t/* if map size is larger than memlock limit, reject it early */\n> +\terr = bpf_map_precharge_memlock(cmap->map.pages);\n> +\tif (err)\n> +\t\tgoto free_cmap;\n> +\n> +\t/* A per cpu bitfield with a bit per possible CPU in map  */\n> +\tcmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),\n> +\t\t\t\t\t    __alignof__(unsigned long));\n> +\tif (!cmap->flush_needed)\n> +\t\tgoto free_cmap;\n> +\n> +\t/* Alloc array for possible remote \"destination\" CPUs */\n> +\tcmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *\n> +\t\t\t\t\t   sizeof(struct bpf_cpu_map_entry *),\n> +\t\t\t\t\t   cmap->map.numa_node);\n> +\tif (!cmap->cpu_map)\n> +\t\tgoto free_cmap;\n> +\n> +\treturn &cmap->map;\n> +free_cmap:\n> +\tfree_percpu(cmap->flush_needed);\n> +\tkfree(cmap);\n> +\treturn ERR_PTR(-ENOMEM);\n> +}\n> +\n> +void __cpu_map_queue_destructor(void *ptr)\n> +{\n> +\t/* For now, just catch this as an 
error */\n> +\tif (!ptr)\n> +\t\treturn;\n> +\tpr_err(\"ERROR: %s() cpu_map queue was not empty\\n\", __func__);\n> +\tpage_frag_free(ptr);\n> +}\n> +\n> +static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)\n> +{\n> +\tif (atomic_dec_and_test(&rcpu->refcnt)) {\n> +\t\t/* The queue should be empty at this point */\n> +\t\tptr_ring_cleanup(rcpu->queue, __cpu_map_queue_destructor);\n> +\t\tkfree(rcpu->queue);\n> +\t\tkfree(rcpu);\n> +\t}\n> +}\n> +\n> +static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)\n> +{\n> +\tatomic_inc(&rcpu->refcnt);\n> +}\n> +\n> +/* called from workqueue, to workaround syscall using preempt_disable */\n> +static void cpu_map_kthread_stop(struct work_struct *work)\n> +{\n> +\tstruct bpf_cpu_map_entry *rcpu;\n> +\n> +\trcpu = container_of(work, struct bpf_cpu_map_entry, kthread_stop_wq);\n> +\tsynchronize_rcu(); /* wait for flush in __cpu_map_entry_free() */\n> +\tkthread_stop(rcpu->kthread); /* calls put_cpu_map_entry */\n> +}\n> +\n> +static int cpu_map_kthread_run(void *data)\n> +{\n> +\tstruct bpf_cpu_map_entry *rcpu = data;\n> +\n> +\tset_current_state(TASK_INTERRUPTIBLE);\n> +\twhile (!kthread_should_stop()) {\n> +\t\tstruct xdp_pkt *xdp_pkt;\n> +\n> +\t\tschedule();\n> +\t\t/* Do work */\n> +\t\twhile ((xdp_pkt = ptr_ring_consume(rcpu->queue))) {\n> +\t\t\t/* For now just \"refcnt-free\" */\n> +\t\t\tpage_frag_free(xdp_pkt);\n> +\t\t}\n> +\t\t__set_current_state(TASK_INTERRUPTIBLE);\n> +\t}\n> +\tput_cpu_map_entry(rcpu);\n> +\n> +\t__set_current_state(TASK_RUNNING);\n> +\treturn 0;\n> +}\n> +\n> +struct bpf_cpu_map_entry *__cpu_map_entry_alloc(u32 qsize, u32 cpu, int map_id)\n> +{\n> +\tgfp_t gfp = GFP_ATOMIC|__GFP_NOWARN;\n> +\tstruct bpf_cpu_map_entry *rcpu;\n> +\tint numa, err;\n> +\n> +\t/* Have map->numa_node, but choose node of redirect target CPU */\n> +\tnuma = cpu_to_node(cpu);\n> +\n> +\trcpu = kzalloc_node(sizeof(*rcpu), gfp, numa);\n> +\tif (!rcpu)\n> +\t\treturn NULL;\n> +\n> +\t/* Alloc percpu bulkq 
*/\n> +\trcpu->bulkq = __alloc_percpu_gfp(sizeof(*rcpu->bulkq),\n> +\t\t\t\t\t sizeof(void *), gfp);\n> +\tif (!rcpu->bulkq)\n> +\t\tgoto fail;\n> +\n> +\t/* Alloc queue */\n> +\trcpu->queue = kzalloc_node(sizeof(*rcpu->queue), gfp, numa);\n> +\tif (!rcpu->queue)\n> +\t\tgoto fail;\n> +\n> +\terr = ptr_ring_init(rcpu->queue, qsize, gfp);\n> +\tif (err)\n> +\t\tgoto fail;\n> +\trcpu->qsize = qsize;\n> +\n> +\t/* Setup kthread */\n> +\trcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,\n> +\t\t\t\t\t       \"cpumap/%d/map:%d\", cpu, map_id);\n> +\tif (IS_ERR(rcpu->kthread))\n> +\t\tgoto fail;\n> +\n> +\t/* Make sure kthread runs on a single CPU */\n> +\tkthread_bind(rcpu->kthread, cpu);\n\nis there a check that max_entries <= num_possible_cpu ? I couldn't find it.\notherwise it will be binding to impossible cpu?\n\n> +\twake_up_process(rcpu->kthread);\n\nIn general the whole thing looks like 'threaded NAPI' that Hannes was\nproposing some time back. I liked it back then and I like it now.\nI don't remember what were the objections back then.\nSomething scheduler related?\nAdding Hannes.\n\nStill curious about the questions I asked in the other thread\non what's causing it to be so much better than RPS","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"k7Jk/pSx\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y3H3j6gP0z9t3R\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 13:24:33 +1000 (AEST)","(majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand\n\tid S1750981AbdI2DVx (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 23:21:53 -0400","from mail-pg0-f65.google.com ([74.125.83.65]:38843 \"EHLO\n\tmail-pg0-f65.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1750763AbdI2DVv (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Thu, 28 Sep 2017 23:21:51 -0400","by mail-pg0-f65.google.com with SMTP id y192so100403pgd.5\n\tfor <netdev@vger.kernel.org>; Thu, 28 Sep 2017 20:21:51 -0700 (PDT)","from ast-mbp ([2620:10d:c090:180::1:c1f4])\n\tby smtp.gmail.com with ESMTPSA id\n\tf13sm5417540pfj.127.2017.09.28.20.21.48\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tThu, 28 Sep 2017 20:21:50 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:in-reply-to:user-agent;\n\tbh=tGmUxC7P/pnKSwWMHMNSkXk4NB8AKO7N+Pm/uAlRSn0=;\n\tb=k7Jk/pSxL+t2CN/Go1Azc5IA4PJsILyasWanwYBzUXl+IMUs8/aAxMlwbax+yRQrPn\n\tw8kV4HBUBgZFVz74IkR8CvYvWGi0t2dqTkHNbnWLZv5QWHo4i9nCkmBETkthpITtxBNK\n\t54NnMga1LIP/5jG6gpY8t8BuL674mYYEoHARnrNrD5tQpvwi1aSHrF/3p8W48j1kAXC/\n\tbwvrxBTI7WKOUDtdy+v17ObS7DLOiOFkevSRw6EBQyJG8CRvCbtRZmWRGSbdkrLE0Gkh\n\t3a86Af9B0r49124DIhfiaMBvlVYASYIq22Zl7tKM4PVEMFNUiHRdEy1Y/nCJ8q4xFqet\n\tWzbw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:in-reply-to:user-agent;\n\tbh=tGmUxC7P/pnKSwWMHMNSkXk4NB8AKO7N+Pm/uAlRSn0=;\n\tb=a09ZnIYYSudS+qdgI4xhesT1h9K8xdBIJ3hCjARUHD5u0ompTdG+MGxKHq8mmQE96B\n\t3BwrawQ2hRxfeb/zUNs5Z4wPncpXc0JltufoA6eihjnFQSAbrC0daCwaq6jb8o/l4kGL\n\tPOGlL0h2yaN5j9Ps5OEb6/ubo30YxnwoC/KFmJJJgQc/JaDAquVRI1KjpDJBygWxmvgt\n\t94EY97uslMSitOv12wK7MquIYHVzTkc4Mml8L0R07udDC7zA4T3kEVZB0weRlrMUcKnI\n\tEe+6kMYBD/gqxOlc0t6lz8eIohalAT8pv4QvcmPYiyVAD4YaviGf/fM8tFMtR+NotpOe\n\tLang==","X-Gm-Message-State":"AHPjjUiZr6iJhpl7OdKXx5ltDAcEtO0O6DjBvJw2cKq/qEZpg6Vxv0rH\n\tPemKrK9AeH4Y0lN1Vldc81E=","X-Google-Smtp-Source":"AOwi7QBQgASBjfXmlAe0zEQ/tw4gyB4axGO+W/FhrnR7vAoyF7xS2m/a0J7Ur7vs5qwlP9QUO5PVBw==","X-Received":"by 10.99.101.68 with SMTP id z65mr2908491pgb.205.1506655310996; \n\tThu, 28 Sep 2017 20:21:50 -0700 (PDT)","Date":"Thu, 28 Sep 2017 20:21:47 -0700","From":"Alexei Starovoitov <alexei.starovoitov@gmail.com>","To":"Jesper Dangaard Brouer <brouer@redhat.com>","Cc":"netdev@vger.kernel.org, jakub.kicinski@netronome.com,\n\t\"Michael S. 
Tsirkin\" <mst@redhat.com>,\n\tJason Wang <jasowang@redhat.com>, mchan@broadcom.com,\n\tJohn Fastabend <john.fastabend@gmail.com>, peter.waskiewicz.jr@intel.com,\n\tDaniel Borkmann <borkmann@iogearbox.net>,\n\tAndy Gospodarek <andy@greyhouse.net>, hannes@stressinduktion.org","Subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","Message-ID":"<20170929032146.vs5v454wjs4niu4k@ast-mbp>","References":"<150660339205.2808.7084136789768233829.stgit@firesoul>\n\t<150660342793.2808.10838498581615265043.stgit@firesoul>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<150660342793.2808.10838498581615265043.stgit@firesoul>","User-Agent":"NeoMutt/20170421 (1.8.2)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777401,"web_url":"http://patchwork.ozlabs.org/comment/1777401/","msgid":"<8737760wg5.fsf@stressinduktion.org>","list_archive_url":null,"date":"2017-09-29T07:56:42","subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","submitter":{"id":18284,"url":"http://patchwork.ozlabs.org/api/people/18284/","name":"Hannes Frederic Sowa","email":"hannes@stressinduktion.org"},"content":"[adding Paolo, Eric]\n\nAlexei Starovoitov <alexei.starovoitov@gmail.com> writes:\n\n> On Thu, Sep 28, 2017 at 02:57:08PM +0200, Jesper Dangaard Brouer wrote:\n\n[...]\n\n>> +\twake_up_process(rcpu->kthread);\n>\n> In general the whole thing looks like 'threaded NAPI' that Hannes was\n> proposing some time back. I liked it back then and I like it now.\n> I don't remember what were the objections back then.\n> Something scheduler related?\n> Adding Hannes.\n\nYes.\n\nThe main objection from Eric at that time was that user space now starts\nto compete with the threaded NAPI threads depending on process\npriorities, which are under control of user space. 
Softirq always runs\nfirst to end. Networking could starve because a process with higher\npriority is runnable. At that time Eric found a way to fix the\nparticular problem, which resulted in commit 4cd13c21b207e80d. Pinning\nand other control is also possible from user space, causing more complex\ntuning set ups and problems will be harder to debug.\n\nIn particular after Eric's patch threaded NAPI proofed itself to be not\nuseful anymore, because his patch successfully deferred work to the\nksoftirqd more reliable thus allowing the UDP rx queue to get drained by\nuser space.\n\n> Still curious about the questions I asked in the other thread\n> on what's causing it to be so much better than RPS\n\nMy guess is that RPS uses expensive IPI to notify the remote\nsoftirq. The batching size on RPS depends on how many packets could get\nworked on during one softirq invocation on the source CPU until we wake\nup remote CPU(s!), if they are not constantly running.","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=stressinduktion.org\n\theader.i=@stressinduktion.org header.b=\"caYQXgcr\"; \n\tdkim=pass (2048-bit key;\n\tunprotected) header.d=messagingengine.com\n\theader.i=@messagingengine.com header.b=\"fCUHIkL1\"; \n\tdkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y3P5w28MQz9t2m\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 17:56:52 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1752024AbdI2H4u (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 29 Sep 2017 
03:56:50 -0400","from out1-smtp.messagingengine.com ([66.111.4.25]:56817 \"EHLO\n\tout1-smtp.messagingengine.com\" rhost-flags-OK-OK-OK-OK)\n\tby vger.kernel.org with ESMTP id S1751037AbdI2H4r (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Fri, 29 Sep 2017 03:56:47 -0400","from compute7.internal (compute7.nyi.internal [10.202.2.47])\n\tby mailout.nyi.internal (Postfix) with ESMTP id 0AD4620C7F;\n\tFri, 29 Sep 2017 03:56:47 -0400 (EDT)","from frontend2 ([10.202.2.161])\n\tby compute7.internal (MEProxy); Fri, 29 Sep 2017 03:56:47 -0400","from z.localhost.stressinduktion.org (unknown [185.72.66.238])\n\tby mail.messagingengine.com (Postfix) with ESMTPA id 88BE2249FA;\n\tFri, 29 Sep 2017 03:56:43 -0400 (EDT)"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=\n\tstressinduktion.org; h=cc:content-type:date:from:in-reply-to\n\t:message-id:mime-version:references:subject:to:x-me-sender\n\t:x-me-sender:x-sasl-enc:x-sasl-enc; s=fm1; bh=B81By3M1XjrdVOuw6D\n\t+Mw1pvCJL2mfQUvFviyVbUWOc=; b=caYQXgcrJ1/yX6uX2YcMoy1XzWQP29T+2o\n\tYIeG3fSz5xHX23j9leFAeBPkTSA1SFGuInOLPVNP1AiMniNw4+/Fpw7+A80NhH0u\n\t5E7Kf9urQhAvTcnVBumS/wBgQnPuy4YkF7guWu9zIMZrKeF58p6eqLTcG9TJnEUL\n\t88PlTlhx7lnLUIASyfU6KcWnM+VQWv0O0cgX+SPlXvHNsUdI8X0mU99OkzuF2AUQ\n\tz7Zhqn/DNm2KBC9MNt7w1Hx5s6nrrRZJ9ljKIKKAsacdNRAZcPngeth8rJjlpJHU\n\tt06Hjapb+Qr8KhMMsTXsJS9ZI2u731VqigpN6PlVeEW79Q1qJODg==","v=1; a=rsa-sha256; c=relaxed/relaxed; d=\n\tmessagingengine.com; h=cc:content-type:date:from:in-reply-to\n\t:message-id:mime-version:references:subject:to:x-me-sender\n\t:x-me-sender:x-sasl-enc:x-sasl-enc; s=fm1; bh=B81By3M1XjrdVOuw6D\n\t+Mw1pvCJL2mfQUvFviyVbUWOc=; 
b=fCUHIkL1+3xUcvhdQnnGCpfXfL5kqMfYVf\n\tcDIywqe4RBsPLbP1woBCfZo8WcJyQSJq0AbUGiyiaIKs40+p4N1ODN7MpBFLrXnY\n\t9WWtWA48Ass9M0nLaV1CFdXTih28yIr8RRBkfA/9M2de6t7DRURJOwMa2lTb+GBO\n\tMUD8y8yC7mK1oeYAv8KPGEtAU+rduRMlO2V3ydqRyXGwIzKiIw4yU21JPX14G/98\n\tMpZtgi4tAtVznyz5jxGRTR38KlPzbyMJ5MghLXbSNObg90gn8V8EWITdK+i1bzPp\n\t1HLW4Ukrpxr+sjc5C+VYBH6zT1VWjUvv6Gp6vf32gaZtp2znVqvw=="],"X-ME-Sender":"<xms:vvzNWVJuyixJGpLJqIe3mc-n09CZsuHHLbMf5odls_bFL1O8fpL_YA>","X-Sasl-enc":"+iYg7xb5WtxbtbHHoql1v93EoPHyPDzmCShuGtt1bft9 1506671805","From":"Hannes Frederic Sowa <hannes@stressinduktion.org>","To":"Alexei Starovoitov <alexei.starovoitov@gmail.com>","Cc":"Jesper Dangaard Brouer <brouer@redhat.com>, netdev@vger.kernel.org,\n\tjakub.kicinski@netronome.com, \"Michael S. Tsirkin\" <mst@redhat.com>,\n\tJason Wang <jasowang@redhat.com>, mchan@broadcom.com,\n\tJohn Fastabend <john.fastabend@gmail.com>, peter.waskiewicz.jr@intel.com,\n\tDaniel Borkmann <borkmann@iogearbox.net>,\n\tAndy Gospodarek <andy@greyhouse.net>, pabeni@redhat.com,\n\tedumazet@google.com","Subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","References":"<150660339205.2808.7084136789768233829.stgit@firesoul>\n\t<150660342793.2808.10838498581615265043.stgit@firesoul>\n\t<20170929032146.vs5v454wjs4niu4k@ast-mbp>","Date":"Fri, 29 Sep 2017 09:56:42 +0200","In-Reply-To":"<20170929032146.vs5v454wjs4niu4k@ast-mbp> (Alexei Starovoitov's\n\tmessage of \"Thu, 28 Sep 2017 20:21:47 -0700\")","Message-ID":"<8737760wg5.fsf@stressinduktion.org>","User-Agent":"Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)","MIME-Version":"1.0","Content-Type":"text/plain","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777435,"web_url":"http://patchwork.ozlabs.org/comment/1777435/","msgid":"<20170929111411.59ef54d7@redhat.com>","list_archive_url":null,"date":"2017-09-29T09:14:11","subject":"Re: [net-next PATCH 1/5] 
bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"content":"On Thu, 28 Sep 2017 20:21:47 -0700\nAlexei Starovoitov <alexei.starovoitov@gmail.com> wrote:\n\n> On Thu, Sep 28, 2017 at 02:57:08PM +0200, Jesper Dangaard Brouer wrote:\n> > The 'cpumap' is primary used as a backend map for XDP BPF helper\n> > call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.\n> > \n> > This patch implement the main part of the map.  It is not connected to\n> > the XDP redirect system yet, and no SKB allocation are done yet.\n> > \n> > The main concern in this patch is to ensure the datapath can run\n> > without any locking.  This adds complexity to the setup and tear-down\n> > procedure, which assumptions are extra carefully documented in the\n> > code comments.\n> > \n> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>\n> > ---\n> >  include/linux/bpf_types.h      |    1 \n> >  include/uapi/linux/bpf.h       |    1 \n> >  kernel/bpf/Makefile            |    1 \n> >  kernel/bpf/cpumap.c            |  547 ++++++++++++++++++++++++++++++++++++++++\n> >  kernel/bpf/syscall.c           |    8 +\n> >  tools/include/uapi/linux/bpf.h |    1 \n> >  6 files changed, 558 insertions(+), 1 deletion(-)\n> >  create mode 100644 kernel/bpf/cpumap.c\n> > \n> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h\n> > index 6f1a567667b8..814c1081a4a9 100644\n> > --- a/include/linux/bpf_types.h\n> > +++ b/include/linux/bpf_types.h\n> > @@ -41,4 +41,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)\n> >  #ifdef CONFIG_STREAM_PARSER\n> >  BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)\n> >  #endif\n> > +BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)\n> >  #endif\n> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h\n> > index e43491ac4823..f14e15702533 100644\n> > --- a/include/uapi/linux/bpf.h\n> > +++ 
b/include/uapi/linux/bpf.h\n> > @@ -111,6 +111,7 @@ enum bpf_map_type {\n> >  \tBPF_MAP_TYPE_HASH_OF_MAPS,\n> >  \tBPF_MAP_TYPE_DEVMAP,\n> >  \tBPF_MAP_TYPE_SOCKMAP,\n> > +\tBPF_MAP_TYPE_CPUMAP,\n> >  };\n> >  \n> >  enum bpf_prog_type {\n> > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile\n> > index 897daa005b23..dba0bd33a43c 100644\n> > --- a/kernel/bpf/Makefile\n> > +++ b/kernel/bpf/Makefile\n> > @@ -4,6 +4,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o\n> >  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o\n> >  ifeq ($(CONFIG_NET),y)\n> >  obj-$(CONFIG_BPF_SYSCALL) += devmap.o\n> > +obj-$(CONFIG_BPF_SYSCALL) += cpumap.o\n> >  ifeq ($(CONFIG_STREAM_PARSER),y)\n> >  obj-$(CONFIG_BPF_SYSCALL) += sockmap.o\n> >  endif\n> > diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c\n> > new file mode 100644\n> > index 000000000000..f0948af82e65\n> > --- /dev/null\n> > +++ b/kernel/bpf/cpumap.c\n> > @@ -0,0 +1,547 @@\n> > +/* bpf/cpumap.c\n> > + *\n> > + * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.\n> > + * Released under terms in GPL version 2.  See COPYING.\n> > + */\n> > +\n> > +/* The 'cpumap' is primary used as a backend map for XDP BPF helper\n> > + * call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.\n> > + *\n> > + * Unlike devmap which redirect XDP frames out another NIC device,\n> > + * this map type redirect raw XDP frames to another CPU.  The remote\n> > + * CPU will do SKB-allocation and call the normal network stack.\n> > + *\n> > + * This is a scalability and isolation mechanism, that allow\n> > + * separating the early driver network XDP layer, from the rest of the\n> > + * netstack, and assigning dedicated CPUs for this stage.  
This\n> > + * basically allows for 10G wirespeed pre-filtering via bpf.\n> > + */\n> > +#include <linux/bpf.h>\n> > +#include <linux/filter.h>\n> > +#include <linux/ptr_ring.h>\n> > +\n> > +#include <linux/sched.h>\n> > +#include <linux/workqueue.h>\n> > +#include <linux/kthread.h>\n> > +\n> > +/*\n> > + * General idea: XDP packets getting XDP redirected to another CPU,\n> > + * will maximum be stored/queued for one driver ->poll() call.  It is\n> > + * guaranteed that setting flush bit and flush operation happen on\n> > + * same CPU.  Thus, cpu_map_flush operation can deduct via this_cpu_ptr()\n> > + * which queue in bpf_cpu_map_entry contains packets.\n> > + */\n> > +\n> > +#define CPU_MAP_BULK_SIZE 8  /* 8 == one cacheline on 64-bit archs */\n> > +struct xdp_bulk_queue {\n> > +\tvoid *q[CPU_MAP_BULK_SIZE];\n> > +\tunsigned int count;\n> > +};\n> > +\n> > +/* Struct for every remote \"destination\" CPU in map */\n> > +struct bpf_cpu_map_entry {\n> > +\tu32 cpu;    /* kthread CPU and map index */\n> > +\tint map_id; /* Back reference to map */\n> > +\tu32 qsize;  /* Redundant queue size for map lookup */\n> > +\n> > +\t/* XDP can run multiple RX-ring queues, need __percpu enqueue store */\n> > +\tstruct xdp_bulk_queue __percpu *bulkq;\n> > +\n> > +\t/* Queue with potential multi-producers, and single-consumer kthread */\n> > +\tstruct ptr_ring *queue;\n> > +\tstruct task_struct *kthread;\n> > +\tstruct work_struct kthread_stop_wq;\n> > +\n> > +\tatomic_t refcnt; /* Control when this struct can be free'ed */\n> > +\tstruct rcu_head rcu;\n> > +};\n> > +\n> > +struct bpf_cpu_map {\n> > +\tstruct bpf_map map;\n> > +\t/* Below members specific for map type */\n> > +\tstruct bpf_cpu_map_entry **cpu_map;\n> > +\tunsigned long __percpu *flush_needed;\n> > +};\n> > +\n> > +static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu,\n> > +\t\t\t     struct xdp_bulk_queue *bq);\n> > +\n> > +static u64 cpu_map_bitmap_size(const union bpf_attr *attr)\n> > +{\n> > +\treturn 
BITS_TO_LONGS(attr->max_entries) * sizeof(unsigned long);\n> > +}\n> > +\n> > +static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)\n> > +{\n> > +\tstruct bpf_cpu_map *cmap;\n> > +\tu64 cost;\n> > +\tint err;\n> > +\n> > +\t/* check sanity of attributes */\n> > +\tif (attr->max_entries == 0 || attr->key_size != 4 ||\n> > +\t    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)\n> > +\t\treturn ERR_PTR(-EINVAL);\n> > +\n> > +\tcmap = kzalloc(sizeof(*cmap), GFP_USER);\n> > +\tif (!cmap)\n> > +\t\treturn ERR_PTR(-ENOMEM);\n> > +\n> > +\t/* mandatory map attributes */\n> > +\tcmap->map.map_type = attr->map_type;\n> > +\tcmap->map.key_size = attr->key_size;\n> > +\tcmap->map.value_size = attr->value_size;\n> > +\tcmap->map.max_entries = attr->max_entries;\n> > +\tcmap->map.map_flags = attr->map_flags;\n> > +\tcmap->map.numa_node = bpf_map_attr_numa_node(attr);\n> > +\n> > +\t/* make sure page count doesn't overflow */\n> > +\tcost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);\n> > +\tcost += cpu_map_bitmap_size(attr) * num_possible_cpus();\n> > +\tif (cost >= U32_MAX - PAGE_SIZE)\n> > +\t\tgoto free_cmap;\n> > +\tcmap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;\n> > +\n> > +\t/* if map size is larger than memlock limit, reject it early */\n> > +\terr = bpf_map_precharge_memlock(cmap->map.pages);\n> > +\tif (err)\n> > +\t\tgoto free_cmap;\n> > +\n> > +\t/* A per cpu bitfield with a bit per possible CPU in map  */\n> > +\tcmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),\n> > +\t\t\t\t\t    __alignof__(unsigned long));\n> > +\tif (!cmap->flush_needed)\n> > +\t\tgoto free_cmap;\n> > +\n> > +\t/* Alloc array for possible remote \"destination\" CPUs */\n> > +\tcmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *\n> > +\t\t\t\t\t   sizeof(struct bpf_cpu_map_entry *),\n> > +\t\t\t\t\t   cmap->map.numa_node);\n> > +\tif (!cmap->cpu_map)\n> > +\t\tgoto free_cmap;\n> > +\n> > +\treturn &cmap->map;\n> > 
+free_cmap:\n> > +\tfree_percpu(cmap->flush_needed);\n> > +\tkfree(cmap);\n> > +\treturn ERR_PTR(-ENOMEM);\n> > +}\n> > +\n> > +void __cpu_map_queue_destructor(void *ptr)\n> > +{\n> > +\t/* For now, just catch this as an error */\n> > +\tif (!ptr)\n> > +\t\treturn;\n> > +\tpr_err(\"ERROR: %s() cpu_map queue was not empty\\n\", __func__);\n> > +\tpage_frag_free(ptr);\n> > +}\n> > +\n> > +static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)\n> > +{\n> > +\tif (atomic_dec_and_test(&rcpu->refcnt)) {\n> > +\t\t/* The queue should be empty at this point */\n> > +\t\tptr_ring_cleanup(rcpu->queue, __cpu_map_queue_destructor);\n> > +\t\tkfree(rcpu->queue);\n> > +\t\tkfree(rcpu);\n> > +\t}\n> > +}\n> > +\n> > +static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)\n> > +{\n> > +\tatomic_inc(&rcpu->refcnt);\n> > +}\n> > +\n> > +/* called from workqueue, to workaround syscall using preempt_disable */\n> > +static void cpu_map_kthread_stop(struct work_struct *work)\n> > +{\n> > +\tstruct bpf_cpu_map_entry *rcpu;\n> > +\n> > +\trcpu = container_of(work, struct bpf_cpu_map_entry, kthread_stop_wq);\n> > +\tsynchronize_rcu(); /* wait for flush in __cpu_map_entry_free() */\n> > +\tkthread_stop(rcpu->kthread); /* calls put_cpu_map_entry */\n> > +}\n> > +\n> > +static int cpu_map_kthread_run(void *data)\n> > +{\n> > +\tstruct bpf_cpu_map_entry *rcpu = data;\n> > +\n> > +\tset_current_state(TASK_INTERRUPTIBLE);\n> > +\twhile (!kthread_should_stop()) {\n> > +\t\tstruct xdp_pkt *xdp_pkt;\n> > +\n> > +\t\tschedule();\n> > +\t\t/* Do work */\n> > +\t\twhile ((xdp_pkt = ptr_ring_consume(rcpu->queue))) {\n> > +\t\t\t/* For now just \"refcnt-free\" */\n> > +\t\t\tpage_frag_free(xdp_pkt);\n> > +\t\t}\n> > +\t\t__set_current_state(TASK_INTERRUPTIBLE);\n> > +\t}\n> > +\tput_cpu_map_entry(rcpu);\n> > +\n> > +\t__set_current_state(TASK_RUNNING);\n> > +\treturn 0;\n> > +}\n> > +\n> > +struct bpf_cpu_map_entry *__cpu_map_entry_alloc(u32 qsize, u32 cpu, int map_id)\n> > +{\n> > 
+\tgfp_t gfp = GFP_ATOMIC|__GFP_NOWARN;\n> > +\tstruct bpf_cpu_map_entry *rcpu;\n> > +\tint numa, err;\n> > +\n> > +\t/* Have map->numa_node, but choose node of redirect target CPU */\n> > +\tnuma = cpu_to_node(cpu);\n> > +\n> > +\trcpu = kzalloc_node(sizeof(*rcpu), gfp, numa);\n> > +\tif (!rcpu)\n> > +\t\treturn NULL;\n> > +\n> > +\t/* Alloc percpu bulkq */\n> > +\trcpu->bulkq = __alloc_percpu_gfp(sizeof(*rcpu->bulkq),\n> > +\t\t\t\t\t sizeof(void *), gfp);\n> > +\tif (!rcpu->bulkq)\n> > +\t\tgoto fail;\n> > +\n> > +\t/* Alloc queue */\n> > +\trcpu->queue = kzalloc_node(sizeof(*rcpu->queue), gfp, numa);\n> > +\tif (!rcpu->queue)\n> > +\t\tgoto fail;\n> > +\n> > +\terr = ptr_ring_init(rcpu->queue, qsize, gfp);\n> > +\tif (err)\n> > +\t\tgoto fail;\n> > +\trcpu->qsize = qsize;\n> > +\n> > +\t/* Setup kthread */\n> > +\trcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,\n> > +\t\t\t\t\t       \"cpumap/%d/map:%d\", cpu, map_id);\n> > +\tif (IS_ERR(rcpu->kthread))\n> > +\t\tgoto fail;\n> > +\n> > +\t/* Make sure kthread runs on a single CPU */\n> > +\tkthread_bind(rcpu->kthread, cpu);  \n> \n> is there a check that max_entries <= num_possible_cpu ? I couldn't\n> find it. otherwise it will be binding to impossible cpu?\n\nGood point! -- I'll find an appropriate place to add such a limit.\n\n\n> > +\twake_up_process(rcpu->kthread);  \n> \n> In general the whole thing looks like 'threaded NAPI' that Hannes was\n> proposing some time back. I liked it back then and I like it now.\n> I don't remember what were the objections back then.\n> Something scheduler related?\n> Adding Hannes.\n\nIt is related to the threaded NAPI' idea[1], and I did choose kthreads\nbecause this was used by this patch[1].\n(Link to Hannes & Paolo's patch:[1] http://patchwork.ozlabs.org/patch/620657/)\n\nIt's less-intrusive, as it's only activated specifically when activating\nbpf+XDP+cpumap.  
Plus, it's not taking over the calling of napi->poll,\nit is \"just\" making the \"cost\" of calling napi->poll significantly\nsmaller, as it moves invoking the network stack to another kthread. And\nthe choice is done on a per-packet level (you don't get more\nflexibility than that).\n\n> Still curious about the questions I asked in the other thread\n> on what's causing it to be so much better than RPS\n\nAnswered in that thread.  It is simply that the RPS-RX CPU has to do\ntoo much work (like memory allocations).  Plus, RPS uses more expensive\nIPI calls, whereas I use wake_up_process(), which doesn't do an IPI if it\ncan see that the remote thread is already running.","headers":{"Date":"Fri, 29 Sep 2017 11:14:11 +0200","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"Alexei Starovoitov <alexei.starovoitov@gmail.com>","Subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","Message-ID":"<20170929111411.59ef54d7@redhat.com>","In-Reply-To":"<20170929032146.vs5v454wjs4niu4k@ast-mbp>"}},{"id":1777444,"web_url":"http://patchwork.ozlabs.org/comment/1777444/","msgid":"<1506677857.2478.5.camel@redhat.com>","list_archive_url":null,"date":"2017-09-29T09:37:37","subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map 
type\n\tBPF_MAP_TYPE_CPUMAP","submitter":{"id":67312,"url":"http://patchwork.ozlabs.org/api/people/67312/","name":"Paolo Abeni","email":"pabeni@redhat.com"},"content":"On Fri, 2017-09-29 at 09:56 +0200, Hannes Frederic Sowa wrote:\n> [adding Paolo, Eric]\n> \n> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:\n> \n> > On Thu, Sep 28, 2017 at 02:57:08PM +0200, Jesper Dangaard Brouer wrote:\n> \n> [...]\n> \n> > > +\twake_up_process(rcpu->kthread);\n> > \n> > In general the whole thing looks like 'threaded NAPI' that Hannes was\n> > proposing some time back. I liked it back then and I like it now.\n> > I don't remember what were the objections back then.\n> > Something scheduler related?\n> > Adding Hannes.\n\nBeyond the added scheduling complexity, the threaded NAPI\nimplementation proposed some time ago also possibly introduced OoO\npacket delivery, because the NAPI threads were left unbound to any CPU.\n\nCheers,\n\nPaolo","headers":{"Date":"Fri, 29 Sep 2017 11:37:37 +0200","From":"Paolo Abeni <pabeni@redhat.com>","To":"Hannes Frederic Sowa <hannes@stressinduktion.org>,\n\tAlexei Starovoitov <alexei.starovoitov@gmail.com>","Subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","Message-ID":"<1506677857.2478.5.camel@redhat.com>","In-Reply-To":"<8737760wg5.fsf@stressinduktion.org>"}},{"id":1777445,"web_url":"http://patchwork.ozlabs.org/comment/1777445/","msgid":"<87r2upn8pr.fsf@stressinduktion.org>","list_archive_url":null,"date":"2017-09-29T09:40:48","subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","submitter":{"id":18284,"url":"http://patchwork.ozlabs.org/api/people/18284/","name":"Hannes Frederic Sowa","email":"hannes@stressinduktion.org"},"content":"Paolo Abeni <pabeni@redhat.com> writes:\n\n> On Fri, 2017-09-29 at 09:56 +0200, Hannes Frederic Sowa wrote:\n>> [adding Paolo, Eric]\n>> \n>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:\n>> \n>> > On Thu, Sep 28, 2017 at 02:57:08PM +0200, Jesper Dangaard Brouer wrote:\n>> \n>> [...]\n>> \n>> > > +\twake_up_process(rcpu->kthread);\n>> > \n>> > In general the whole thing looks like 'threaded NAPI' that Hannes was\n>> > proposing some time back. 
I liked it back then and I like it now.\n>> > I don't remember what were the objections back then.\n>> > Something scheduler related?\n>> > Adding Hannes.\n>\n> Beyond the added scheduling complexity, the threaded NAPI\n> implementation proposed some time ago also possibly introduced OoO\n> packet delivery, because the NAPI threads were left unbound to any CPU.\n\nRight, yes, but that can be resolved. The problem was just in that\nparticular patch.","headers":{"Date":"Fri, 29 Sep 2017 11:40:48 +0200","From":"Hannes Frederic Sowa <hannes@stressinduktion.org>","To":"Paolo Abeni <pabeni@redhat.com>","Subject":"Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type\n\tBPF_MAP_TYPE_CPUMAP","Message-ID":"<87r2upn8pr.fsf@stressinduktion.org>","In-Reply-To":"<1506677857.2478.5.camel@redhat.com> (Paolo Abeni's message of\n\t\"Fri, 29 Sep 2017 11:37:37 +0200\")"}}]