[{"id":1774900,"web_url":"http://patchwork.ozlabs.org/comment/1774900/","msgid":"<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>","list_archive_url":null,"date":"2017-09-25T18:10:00","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":971,"url":"http://patchwork.ozlabs.org/api/people/971/","name":"Andy Gospodarek","email":"andy@greyhouse.net"},"content":"On Mon, Sep 25, 2017 at 02:25:51AM +0200, Daniel Borkmann wrote:\n> This work enables generic transfer of metadata from XDP into skb. The\n> basic idea is that we can make use of the fact that the resulting skb\n> must be linear and already comes with a larger headroom for supporting\n> bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work\n> on a similar principle and introduce a small helper bpf_xdp_adjust_meta()\n> for adjusting a new pointer called xdp->data_meta. Thus, the packet has\n> a flexible and programmable room for meta data, followed by the actual\n> packet data. struct xdp_buff is therefore laid out that we first point\n> to data_hard_start, then data_meta directly prepended to data followed\n> by data_end marking the end of packet. bpf_xdp_adjust_head() takes into\n> account whether we have meta data already prepended and if so, memmove()s\n> this along with the given offset provided there's enough room.\n> \n> xdp->data_meta is optional and programs are not required to use it. The\n> rationale is that when we process the packet in XDP (e.g. as DoS filter),\n> we can push further meta data along with it for the XDP_PASS case, and\n> give the guarantee that a clsact ingress BPF program on the same device\n> can pick this up for further post-processing. Since we work with skb\n> there, we can also set skb->mark, skb->priority or other skb meta data\n> out of BPF, thus having this scratch space generic and programmable\n> allows for more flexibility than defining a direct 1:1 transfer of\n> potentially new XDP members into skb (it's also more efficient as we\n> don't need to initialize/handle each of such new members).  The facility\n> also works together with GRO aggregation. The scratch space at the head\n> of the packet can be multiple of 4 byte up to 32 byte large. Drivers not\n> yet supporting xdp->data_meta can simply be set up with xdp->data_meta\n> as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,\n> such that the subsequent match against xdp->data for later access is\n> guaranteed to fail.\n> \n> The verifier treats xdp->data_meta/xdp->data the same way as we treat\n> xdp->data/xdp->data_end pointer comparisons. The requirement for doing\n> the compare against xdp->data is that it hasn't been modified from it's\n> original address we got from ctx access. It may have a range marking\n> already from prior successful xdp->data/xdp->data_end pointer comparisons\n> though.\n\nFirst, thanks for this detailed description.  It was helpful to read\nalong with the patches.\n\nMy only concern about this area being generic is that you are now in a\nstate where any bpf program must know about all the bpf programs in the\nreceive pipeline before it can properly parse what is stored in the\nmeta-data and add it to an skb (or perform any other action).\nEspecially if each program adds it's own meta-data along the way.\n\nMaybe this isn't a big concern based on the number of users of this\ntoday, but it just starts to seem like a concern as there are these\nhints being passed between layers that are challenging to track due to a\nlack of a standard format for passing data between.\n\nThe main reason I bring this up is that Michael and I had discussed and\ndesigned a way for drivers to communicate between each other that rx\nresources could be freed after a tx completion on an XDP_REDIRECT\naction.  Much like this code, it involved adding an new element to\nstruct xdp_md that could point to the important information.  Now that\nthere is a generic way to handle this, it would seem nice to be able to\nleverage it, but I'm not sure how reliable this meta-data area would be\nwithout the ability to mark it in some manner.\n\nFor additional background, the minimum amount of data needed in the case\nMichael and I were discussing was really 2 words.  One to serve as a\npointer to an rx_ring structure and one to have a counter to the rx\nproducer entry.  This data could be acessed by the driver processing the\ntx completions and callback to the driver that received the frame off the wire\nto perform any needed processing.  (For those curious this would also require a\nnew callback/netdev op to act on this data stored in the XDP buffer.)\n\nIIUC, I could use this meta_data area to store this information, but would it\nalso be useful to create some type field/marker that could also be stored in\nthe meta_data to indicate what type of information is there?  I hate to propose\nsuch a thing as it may add unneeded complexity, but I just wanted to make sure\nto say something before it was too late as there may be more users of this\nright away.  :-)\n\n> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>\n> Acked-by: Alexei Starovoitov <ast@kernel.org>\n> Acked-by: John Fastabend <john.fastabend@gmail.com>\n> ---\n>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c      |   1 +\n>  drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   1 +\n>  drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   1 +\n>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   1 +\n>  drivers/net/ethernet/mellanox/mlx4/en_rx.c         |   1 +\n>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   1 +\n>  .../net/ethernet/netronome/nfp/nfp_net_common.c    |   1 +\n>  drivers/net/ethernet/qlogic/qede/qede_fp.c         |   1 +\n>  drivers/net/tun.c                                  |   1 +\n>  drivers/net/virtio_net.c                           |   2 +\n>  include/linux/bpf.h                                |   1 +\n>  include/linux/filter.h                             |  21 +++-\n>  include/linux/skbuff.h                             |  68 +++++++++++-\n>  include/uapi/linux/bpf.h                           |  13 ++-\n>  kernel/bpf/verifier.c                              | 114 ++++++++++++++++-----\n>  net/bpf/test_run.c                                 |   1 +\n>  net/core/dev.c                                     |  31 +++++-\n>  net/core/filter.c                                  |  77 +++++++++++++-\n>  net/core/skbuff.c                                  |   2 +\n>  19 files changed, 297 insertions(+), 42 deletions(-)\n[...]","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=greyhouse-net.20150623.gappssmtp.com\n\theader.i=@greyhouse-net.20150623.gappssmtp.com\n\theader.b=\"n93zFtUR\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y1BvS343Tz9tX8\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 26 Sep 2017 04:10:12 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S964952AbdIYSKK (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tMon, 25 Sep 2017 14:10:10 -0400","from mail-qk0-f170.google.com ([209.85.220.170]:49506 \"EHLO\n\tmail-qk0-f170.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S935062AbdIYSKH (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Mon, 25 Sep 2017 14:10:07 -0400","by mail-qk0-f170.google.com with SMTP id u67so7555658qkg.6\n\tfor <netdev@vger.kernel.org>; Mon, 25 Sep 2017 11:10:06 -0700 (PDT)","from C02RW35GFVH8.dhcp.broadcom.net ([192.19.231.250])\n\tby smtp.gmail.com with ESMTPSA id\n\te18sm5407295qtc.59.2017.09.25.11.10.04\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tMon, 25 Sep 2017 11:10:05 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=greyhouse-net.20150623.gappssmtp.com; s=20150623;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:in-reply-to:user-agent;\n\tbh=WsJTWdakfkw+eJZScMdduyGuVqd7ANXWbvS/zenyn4A=;\n\tb=n93zFtURbaiU9cOLAM0Psl4USQTr23FxdwCb7QCYRM/0b7TxA6oPYbjAbx+XzMqMpT\n\tlSFi3OztyGZyj9/QhT8WhKmisuviXHxnFhfEqXa2BFiUb5W9d3tU8wtYID/H1KaEBoDl\n\t0YGcRv5972CF7IdsZ6dv8Q2awUiuuhP0QnBkA4kY+CRS4sHYw43lAvvr+/fzQiOe+z72\n\tKznh95DjVxWm58suHSHkOlgfUCtr22i2mn73I6FWuBl+k/jTk2v1XJqDPowPw/V16IJ5\n\tojI9SkRzfhzgSMmTunvFyTmiwTtZlVp9IZ2leN7q314VLfYua56hLPnffNgyqsgTCA5v\n\t8mTA==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:in-reply-to:user-agent;\n\tbh=WsJTWdakfkw+eJZScMdduyGuVqd7ANXWbvS/zenyn4A=;\n\tb=KhzmSBxnj7fzGuETFZizt3GBspjfKa1vZe8Qwtrkq5osHvw324iVvedymnHNoiuwPq\n\tsMXY15jj3hTLufNFM3+3IIvG3R8sMy9S4m95QuBl6jr1FgHXJSxFUP6XEUoPu6r4mQpa\n\tNIHPG0/2CHLxsb0DG1gVxm0pfHrQrMBBpJt/7RSPlVZ8/Oq7phFda/gWtdk1ydTMGl4a\n\teFef8G9nrHwGTRb3v96p73F4BpEIleDtj0Jn5MmmCwVHhNlHpmxlPc0dK7cE3Vf2Lmki\n\t6nLGbbHnC/UgPZfXSFKkcYb8rHvCiEnG6Z9Qr9p8VdKnmCzYC5Hi3FjERaRYQ/GGmunC\n\tZwIg==","X-Gm-Message-State":"AHPjjUj/osp9x/wWuo9qWsILsS/3RBh1tS+F43hyE4qDjj1xPObMIw6u\n\trDywlZKaf6rOGbLzGFca8x7qYw==","X-Google-Smtp-Source":"AOwi7QAdrWbx39nOWP/y8fyJ3NlPuKxk/OxNqGicBRdOFnsSuu8uyA1EtemkPzIC3RMnwAMQx0OEPg==","X-Received":"by 10.55.49.67 with SMTP id x64mr1256294qkx.138.1506363006161;\n\tMon, 25 Sep 2017 11:10:06 -0700 (PDT)","Date":"Mon, 25 Sep 2017 14:10:00 -0400","From":"Andy Gospodarek <andy@greyhouse.net>","To":"Daniel Borkmann <daniel@iogearbox.net>","Cc":"davem@davemloft.net, alexei.starovoitov@gmail.com,\n\tjohn.fastabend@gmail.com, peter.waskiewicz.jr@intel.com,\n\tjakub.kicinski@netronome.com, netdev@vger.kernel.org, mchan@broadcom.com","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Message-ID":"<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>","User-Agent":"Mutt/1.8.0 (2017-02-23)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1774926,"web_url":"http://patchwork.ozlabs.org/comment/1774926/","msgid":"<59C94FF4.8070900@iogearbox.net>","list_archive_url":null,"date":"2017-09-25T18:50:28","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":65705,"url":"http://patchwork.ozlabs.org/api/people/65705/","name":"Daniel Borkmann","email":"daniel@iogearbox.net"},"content":"On 09/25/2017 08:10 PM, Andy Gospodarek wrote:\n[...]\n> First, thanks for this detailed description.  It was helpful to read\n> along with the patches.\n>\n> My only concern about this area being generic is that you are now in a\n> state where any bpf program must know about all the bpf programs in the\n> receive pipeline before it can properly parse what is stored in the\n> meta-data and add it to an skb (or perform any other action).\n> Especially if each program adds it's own meta-data along the way.\n>\n> Maybe this isn't a big concern based on the number of users of this\n> today, but it just starts to seem like a concern as there are these\n> hints being passed between layers that are challenging to track due to a\n> lack of a standard format for passing data between.\n\nBtw, we do have similar kind of programmable scratch buffer also today\nwrt skb cb[] that you can program from tc side, the perf ring buffer,\nwhich doesn't have any fixed layout for the slots, or a per-cpu map\nwhere you can transfer data between tail calls for example, then tail\ncalls themselves that need to coordinate, or simply mangling of packets\nitself if you will, but more below to your use case ...\n\n> The main reason I bring this up is that Michael and I had discussed and\n> designed a way for drivers to communicate between each other that rx\n> resources could be freed after a tx completion on an XDP_REDIRECT\n> action.  Much like this code, it involved adding an new element to\n> struct xdp_md that could point to the important information.  Now that\n> there is a generic way to handle this, it would seem nice to be able to\n> leverage it, but I'm not sure how reliable this meta-data area would be\n> without the ability to mark it in some manner.\n>\n> For additional background, the minimum amount of data needed in the case\n> Michael and I were discussing was really 2 words.  One to serve as a\n> pointer to an rx_ring structure and one to have a counter to the rx\n> producer entry.  This data could be acessed by the driver processing the\n> tx completions and callback to the driver that received the frame off the wire\n> to perform any needed processing.  (For those curious this would also require a\n> new callback/netdev op to act on this data stored in the XDP buffer.)\n\nWhat you describe above doesn't seem to be fitting to the use-case of\nthis set, meaning the area here is fully programmable out of the BPF\nprogram, the infrastructure you're describing is some sort of means of\ncommunication between drivers for the XDP_REDIRECT, and should be\noutside of the control of the BPF program to mangle.\n\nYou could probably reuse the base infra here and make a part of that\ninaccessible for the program with some sort of a fixed layout, but I\nhaven't seen your code yet to be able to fully judge. Intention here\nis to allow for programmability within the BPF prog in a generic way,\nsuch that based on the use-case it can be populated in specific ways\nand propagated to the skb w/o having to define a fixed layout and\nbloat xdp_buff all the way to an skb while still retaining all the\nflexibility.\n\nThanks,\nDaniel","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y1CpB3wRzz9tX8\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 26 Sep 2017 04:50:42 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1754194AbdIYSuj (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tMon, 25 Sep 2017 14:50:39 -0400","from www62.your-server.de ([213.133.104.62]:60882 \"EHLO\n\twww62.your-server.de\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1754157AbdIYSui (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Mon, 25 Sep 2017 14:50:38 -0400","from [85.7.161.218] (helo=localhost.localdomain)\n\tby www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-SHA:256)\n\t(Exim 4.85_2) (envelope-from <daniel@iogearbox.net>)\n\tid 1dwYSb-0001rR-Qj; Mon, 25 Sep 2017 20:50:29 +0200"],"Message-ID":"<59C94FF4.8070900@iogearbox.net>","Date":"Mon, 25 Sep 2017 20:50:28 +0200","From":"Daniel Borkmann <daniel@iogearbox.net>","User-Agent":"Mozilla/5.0 (X11; Linux x86_64;\n\trv:31.0) Gecko/20100101 Thunderbird/31.7.0","MIME-Version":"1.0","To":"Andy Gospodarek <andy@greyhouse.net>","CC":"davem@davemloft.net, alexei.starovoitov@gmail.com,\n\tjohn.fastabend@gmail.com, peter.waskiewicz.jr@intel.com,\n\tjakub.kicinski@netronome.com, netdev@vger.kernel.org, mchan@broadcom.com","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>","In-Reply-To":"<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>","Content-Type":"text/plain; charset=windows-1252; format=flowed","Content-Transfer-Encoding":"7bit","X-Authenticated-Sender":"daniel@iogearbox.net","X-Virus-Scanned":"Clear (ClamAV 0.99.2/23872/Mon Sep 25 18:43:22 2017)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1774950,"web_url":"http://patchwork.ozlabs.org/comment/1774950/","msgid":"<dea162f5-62e0-85d1-2fef-44fbffb9f860@gmail.com>","list_archive_url":null,"date":"2017-09-25T19:47:06","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":20028,"url":"http://patchwork.ozlabs.org/api/people/20028/","name":"John Fastabend","email":"john.fastabend@gmail.com"},"content":"On 09/25/2017 11:50 AM, Daniel Borkmann wrote:\n> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:\n> [...]\n>> First, thanks for this detailed description.  It was helpful to read\n>> along with the patches.\n>>\n>> My only concern about this area being generic is that you are now in a\n>> state where any bpf program must know about all the bpf programs in the\n>> receive pipeline before it can properly parse what is stored in the\n>> meta-data and add it to an skb (or perform any other action).\n>> Especially if each program adds it's own meta-data along the way.\n>>\n>> Maybe this isn't a big concern based on the number of users of this\n>> today, but it just starts to seem like a concern as there are these\n>> hints being passed between layers that are challenging to track due to a\n>> lack of a standard format for passing data between.\n> \n> Btw, we do have similar kind of programmable scratch buffer also today\n> wrt skb cb[] that you can program from tc side, the perf ring buffer,\n> which doesn't have any fixed layout for the slots, or a per-cpu map\n> where you can transfer data between tail calls for example, then tail\n> calls themselves that need to coordinate, or simply mangling of packets\n> itself if you will, but more below to your use case ...\n> \n>> The main reason I bring this up is that Michael and I had discussed and\n>> designed a way for drivers to communicate between each other that rx\n>> resources could be freed after a tx completion on an XDP_REDIRECT\n>> action.  Much like this code, it involved adding an new element to\n>> struct xdp_md that could point to the important information.  Now that\n>> there is a generic way to handle this, it would seem nice to be able to\n>> leverage it, but I'm not sure how reliable this meta-data area would be\n>> without the ability to mark it in some manner.\n>>\n>> For additional background, the minimum amount of data needed in the case\n>> Michael and I were discussing was really 2 words.  One to serve as a\n>> pointer to an rx_ring structure and one to have a counter to the rx\n>> producer entry.  This data could be acessed by the driver processing the\n>> tx completions and callback to the driver that received the frame off\n>> the wire\n>> to perform any needed processing.  (For those curious this would also\n>> require a\n>> new callback/netdev op to act on this data stored in the XDP buffer.)\n> \n> What you describe above doesn't seem to be fitting to the use-case of\n> this set, meaning the area here is fully programmable out of the BPF\n> program, the infrastructure you're describing is some sort of means of\n> communication between drivers for the XDP_REDIRECT, and should be\n> outside of the control of the BPF program to mangle.\n> \n> You could probably reuse the base infra here and make a part of that\n> inaccessible for the program with some sort of a fixed layout, but I\n> haven't seen your code yet to be able to fully judge. Intention here\n> is to allow for programmability within the BPF prog in a generic way,\n> such that based on the use-case it can be populated in specific ways\n> and propagated to the skb w/o having to define a fixed layout and\n> bloat xdp_buff all the way to an skb while still retaining all the\n> flexibility.\n> \n> Thanks,\n> Daniel\n\nHi Andy,\n\nI'm guessing this data needs to be passed from the input dev to the\noutput dev based on your description. If the driver data\nis pushed after the BPF program is run but before the xdp_do_flush_map\ncall no other BPF programs can be run on that xdp_buff. It\nshould be safe at this point to use the metadata region directly\nfrom the driver. We would just need to add a few helpers for the\ndrivers to use for this maybe, xdp_metadata_write_drv,\nxdp_meadata_read_drv. I think this would work for your use case?\nThe data structure would have to be agreed upon by all the drivers\nbut would not be UAPI because it would only be exposed in the\ndriver. So we would be free to change/update it as needed.\n\nThanks,\nJohn","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"pFXzYLJc\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y1F3g4Tsmz9s76\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 26 Sep 2017 05:47:27 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S936457AbdIYTrZ (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tMon, 25 Sep 2017 15:47:25 -0400","from mail-pf0-f171.google.com ([209.85.192.171]:52075 \"EHLO\n\tmail-pf0-f171.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S934788AbdIYTrX (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Mon, 25 Sep 2017 15:47:23 -0400","by mail-pf0-f171.google.com with SMTP id b70so4322837pfl.8\n\tfor <netdev@vger.kernel.org>; Mon, 25 Sep 2017 12:47:23 -0700 (PDT)","from [192.168.86.24] ([72.168.144.40])\n\tby smtp.gmail.com with ESMTPSA id\n\tz30sm13329633pfg.54.2017.09.25.12.47.13\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tMon, 25 Sep 2017 12:47:22 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=subject:to:cc:references:from:message-id:date:user-agent\n\t:mime-version:in-reply-to:content-language:content-transfer-encoding; \n\tbh=oj9AMwOgUm+Y8Ansv+lkngqVpwR22tKoMl2EjUgvFRQ=;\n\tb=pFXzYLJcJZhDLjdxliNh2zDjbCTs2XC12igtmQWhjavzScURig1lj9hIWREiJB+8Oo\n\t0WOaKx5FeWjk4C2w4dqP6mgROGkUNHGeRnHgeT0Nv5xVv/t44R2a9EbUlk+kyyU9eWGY\n\t7ymKbutSEO8ljcccTMEjeIu5Z8pgZMYo4VN4/asXykPpj1sWHCLpgp4SZSOQ9oYF9CJ6\n\tkx+e09phJyWTChlr3i7KJwRfhzpPi2/Rwfm+tUL6QpahyCwUercvd1B4Qws6hruCsQ2M\n\tzYFOL63SX5a9EcHUxyAS7YAhujCgBomV82we4+NGDYngHURVVknA044xWvAMTB4oX9pB\n\t3sxw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:subject:to:cc:references:from:message-id:date\n\t:user-agent:mime-version:in-reply-to:content-language\n\t:content-transfer-encoding;\n\tbh=oj9AMwOgUm+Y8Ansv+lkngqVpwR22tKoMl2EjUgvFRQ=;\n\tb=Ccf9+c9G4l4k10LTeYUNzxlXGI8g8NHRGVArAQRnCeLhiFonKGlpvrh+HDMXnW1juS\n\tv15GZrJIEWzSsN+tbzJCa+N7VNG25FBdLZEO6wKkYAbfEWr9wD2O5z7i1TC8zXap/kpB\n\tNMHYk99+jmwDhDvR1Imp30QpjXkWGgsYRU/ZFrR6xrN0TXjLSj4SXEi1DCFFfhzRLnr5\n\tXfGHkcKV5qJXJgP7G2i4PiyoPJ+CDRZkBZ80NUPEG3JUOTEb3i1dVHKBuNs2uZskn4I5\n\tywDqTB9g3rQ6X9kaH3x94dKCbpkjyh1DQyEHtEhfNvmM2aLvvFWRTHu6V8XRTFMr2xXN\n\tmV8A==","X-Gm-Message-State":"AHPjjUiUJLetfgO+Po0q3IJC/k5tjgIEzowxscG4Agcv02sRMAb82ma2\n\tys3vO04PsC0dfhwwfanzYfBa2mM8","X-Google-Smtp-Source":"AOwi7QBE0xZ5aPSoiimiGlGlW+hlsBywr8vMgnk/6fPf7snB5hvYC6iFN1Y1Hg0Nrjtj02x7HDGplg==","X-Received":"by 10.84.133.164 with SMTP id f33mr8471950plf.73.1506368843105; \n\tMon, 25 Sep 2017 12:47:23 -0700 (PDT)","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","To":"Daniel Borkmann <daniel@iogearbox.net>,\n\tAndy Gospodarek <andy@greyhouse.net>","Cc":"davem@davemloft.net, alexei.starovoitov@gmail.com,\n\tpeter.waskiewicz.jr@intel.com, jakub.kicinski@netronome.com,\n\tnetdev@vger.kernel.org, mchan@broadcom.com","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>","From":"John Fastabend <john.fastabend@gmail.com>","Message-ID":"<dea162f5-62e0-85d1-2fef-44fbffb9f860@gmail.com>","Date":"Mon, 25 Sep 2017 12:47:06 -0700","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<59C94FF4.8070900@iogearbox.net>","Content-Type":"text/plain; charset=windows-1252","Content-Language":"en-US","Content-Transfer-Encoding":"8bit","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1775715,"web_url":"http://patchwork.ozlabs.org/comment/1775715/","msgid":"<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>","list_archive_url":null,"date":"2017-09-26T17:21:40","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":971,"url":"http://patchwork.ozlabs.org/api/people/971/","name":"Andy Gospodarek","email":"andy@greyhouse.net"},"content":"On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:\n> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:\n> [...]\n> > First, thanks for this detailed description.  It was helpful to read\n> > along with the patches.\n> > \n> > My only concern about this area being generic is that you are now in a\n> > state where any bpf program must know about all the bpf programs in the\n> > receive pipeline before it can properly parse what is stored in the\n> > meta-data and add it to an skb (or perform any other action).\n> > Especially if each program adds it's own meta-data along the way.\n> > \n> > Maybe this isn't a big concern based on the number of users of this\n> > today, but it just starts to seem like a concern as there are these\n> > hints being passed between layers that are challenging to track due to a\n> > lack of a standard format for passing data between.\n> \n> Btw, we do have similar kind of programmable scratch buffer also today\n> wrt skb cb[] that you can program from tc side, the perf ring buffer,\n> which doesn't have any fixed layout for the slots, or a per-cpu map\n> where you can transfer data between tail calls for example, then tail\n> calls themselves that need to coordinate, or simply mangling of packets\n> itself if you will, but more below to your use case ...\n> \n> > The main reason I bring this up is that Michael and I had discussed and\n> > designed a way for drivers to communicate between each other that rx\n> > resources could be freed after a tx completion on an XDP_REDIRECT\n> > action.  Much like this code, it involved adding an new element to\n> > struct xdp_md that could point to the important information.  Now that\n> > there is a generic way to handle this, it would seem nice to be able to\n> > leverage it, but I'm not sure how reliable this meta-data area would be\n> > without the ability to mark it in some manner.\n> > \n> > For additional background, the minimum amount of data needed in the case\n> > Michael and I were discussing was really 2 words.  One to serve as a\n> > pointer to an rx_ring structure and one to have a counter to the rx\n> > producer entry.  This data could be acessed by the driver processing the\n> > tx completions and callback to the driver that received the frame off the wire\n> > to perform any needed processing.  (For those curious this would also require a\n> > new callback/netdev op to act on this data stored in the XDP buffer.)\n> \n> What you describe above doesn't seem to be fitting to the use-case of\n> this set, meaning the area here is fully programmable out of the BPF\n> program, the infrastructure you're describing is some sort of means of\n> communication between drivers for the XDP_REDIRECT, and should be\n> outside of the control of the BPF program to mangle.\n\nOK, I understand that perspective.  I think saying this is really meant\nas a BPF<->BPF communication channel for now is fine.\n\n> You could probably reuse the base infra here and make a part of that\n> inaccessible for the program with some sort of a fixed layout, but I\n> haven't seen your code yet to be able to fully judge. Intention here\n> is to allow for programmability within the BPF prog in a generic way,\n> such that based on the use-case it can be populated in specific ways\n> and propagated to the skb w/o having to define a fixed layout and\n> bloat xdp_buff all the way to an skb while still retaining all the\n> flexibility.\n\nSome level of reuse might be proper, but I'd rather it be explicit for\nmy use since it's not exclusively something that will need to be used by\na BPF prog, but rather the driver.  I'll produce some patches this week\nfor reference.","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=greyhouse-net.20150623.gappssmtp.com\n\theader.i=@greyhouse-net.20150623.gappssmtp.com\n\theader.b=\"qIlNTsgp\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y1nnB6w9Hz9s83\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 27 Sep 2017 03:21:50 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S968734AbdIZRVs (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tTue, 26 Sep 2017 13:21:48 -0400","from mail-qk0-f172.google.com ([209.85.220.172]:44939 \"EHLO\n\tmail-qk0-f172.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S965047AbdIZRVq (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Tue, 26 Sep 2017 13:21:46 -0400","by mail-qk0-f172.google.com with SMTP id b23so10807004qkg.1\n\tfor <netdev@vger.kernel.org>; Tue, 26 Sep 2017 10:21:46 -0700 (PDT)","from C02RW35GFVH8.dhcp.broadcom.net ([192.19.231.250])\n\tby smtp.gmail.com with ESMTPSA id\n\ts90sm7162162qkl.81.2017.09.26.10.21.44\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tTue, 26 Sep 2017 10:21:45 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=greyhouse-net.20150623.gappssmtp.com; s=20150623;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:in-reply-to:user-agent;\n\tbh=qLGxha+EDXOKYT98RS8b9UZvr+5hfTnfHigLplOGnwE=;\n\tb=qIlNTsgpb4yia6ArtIH2Tt4dc7FM0An+466HXeVmGIwnBnMcjPKUxPbJtiU1sS6jKJ\n\tQS21Rl3+jxWtNPqaB4XpJH76fYhjKXNNd8N3KX2b71GpiCPr+849fkSUR5B0yFtO57e2\n\t8Troihb8HCr7H5kq3H9Zg9AT/DZ2Jn7+eTXhuFXDuIbq9zSMsJ7qRNBV8GLJ4hbPcMjB\n\tqmxCZoSwiYvjQlQVbMg5Siu2AsuOxM6EMU/djltrW2PAG+dX40/1U8vSJp5ceAP76aLr\n\tPg6S5SxSRtI0QyDTegDV4Jwjme2i968sSNn49EFFzlnHOffor+BaD1qCAUoOxu/EhYKO\n\tgnBw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:in-reply-to:user-agent;\n\tbh=qLGxha+EDXOKYT98RS8b9UZvr+5hfTnfHigLplOGnwE=;\n\tb=YugnGv50qO3NVKTSJmv93yNkcLgkJyWj7w3yHUNOjZAw/fjjzWR3WkyKXepV4GCPaa\n\t8UV7/UwEbEeoNbJlc2MluLv5MpZiXGy0z19oYCq/pBgZagQtQGcX6JARvrUek4t8V7a4\n\tlj65L0GQU+aKzZ/gsiNUPxiSZgzDt8agn1wSR1tMBoMGm2rUfAs1RAkR36K1iiL9gREU\n\tlqU8d9tk6OdkPYe2f1SbcCBn7RmFGvHBfccCYwWjInn90x+dws012+KUoH5/U570W1TT\n\tl+932Nhlr4rzfuf/GMcMLiBjR6YaUOUCPOytxaTANLPUv5tA/ajR2D/sPit3sqffwz/n\n\tAT7w==","X-Gm-Message-State":"AHPjjUgCbKF8RYTMTHCk5hMV/YHHwm2SZ6Cx8N8bi+1BfWIv8+6JAxHt\n\tdRHJta7gir5+HmxhH9vsYWIFww==","X-Google-Smtp-Source":"AOwi7QBwd0LIATz/7/nnLxt1v8+nGzZgpSmdt4xYIyTY6oOEy5lbRB2/wplVpQorFRfpo9pp+mR1pw==","X-Received":"by 10.55.104.140 with SMTP id\n\td134mr16488100qkc.133.1506446505799; \n\tTue, 26 Sep 2017 10:21:45 -0700 (PDT)","Date":"Tue, 26 Sep 2017 13:21:40 -0400","From":"Andy Gospodarek <andy@greyhouse.net>","To":"Daniel Borkmann <daniel@iogearbox.net>","Cc":"davem@davemloft.net, alexei.starovoitov@gmail.com,\n\tjohn.fastabend@gmail.com, peter.waskiewicz.jr@intel.com,\n\tjakub.kicinski@netronome.com, netdev@vger.kernel.org, mchan@broadcom.com","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Message-ID":"<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<59C94FF4.8070900@iogearbox.net>","User-Agent":"Mutt/1.8.0 (2017-02-23)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1775793,"web_url":"http://patchwork.ozlabs.org/comment/1775793/","msgid":"<20170926211342.0c8e72b0@redhat.com>","list_archive_url":null,"date":"2017-09-26T19:13:42","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"content":"On Mon, 25 Sep 2017 02:25:51 +0200\nDaniel Borkmann <daniel@iogearbox.net> wrote:\n\n> This work enables generic transfer of metadata from XDP into skb. The\n> basic idea is that we can make use of the fact that the resulting skb\n> must be linear and already comes with a larger headroom for supporting\n> bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work\n> on a similar principle and introduce a small helper bpf_xdp_adjust_meta()\n> for adjusting a new pointer called xdp->data_meta. Thus, the packet has\n> a flexible and programmable room for meta data, followed by the actual\n> packet data. struct xdp_buff is therefore laid out that we first point\n> to data_hard_start, then data_meta directly prepended to data followed\n> by data_end marking the end of packet. bpf_xdp_adjust_head() takes into\n> account whether we have meta data already prepended and if so, memmove()s\n> this along with the given offset provided there's enough room.\n> \n> [...] The scratch space at the head\n> of the packet can be multiple of 4 byte up to 32 byte large. Drivers not\n> yet supporting xdp->data_meta can simply be set up with xdp->data_meta\n> as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,\n> such that the subsequent match against xdp->data for later access is\n> guaranteed to fail.\n\nSo, xdp->meta_data is placed just before the packet xdp->data starts.\n\nI'm currently implementing a cpumap type, that transfers raw XDP frames\nto another CPU, and the SKB is allocated on the remote CPU.  (It\nactually works extremely well).  \n\nFor transferring info I need, I'm currently using xdp->data_hard_start\n(the top/start of the xdp page).  Which should be compatible with your\napproach, right?\n\nThe info I need:\n\n struct xdp_pkt {\n\tvoid *data;\n\tu16 len;\n\tu16 headroom;\n\tstruct net_device *dev_rx;\n };\n\nWhen I enqueue the xdp packet I do the following:\n\n int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,\n\tstruct net_device *dev_rx)\n {\n\tstruct xdp_pkt *xdp_pkt;\n\tint headroom;\n\n\t/* Convert xdp_buff to xdp_pkt */\n\theadroom = xdp->data - xdp->data_hard_start;\n\tif (headroom < sizeof(*xdp_pkt))\n\t\treturn -EOVERFLOW;\n\txdp_pkt = xdp->data_hard_start;\n\txdp_pkt->data = xdp->data;\n\txdp_pkt->len  = xdp->data_end - xdp->data;\n\txdp_pkt->headroom = headroom - sizeof(*xdp_pkt);\n\n\t/* Info needed when constructing SKB on remote CPU */\n\txdp_pkt->dev_rx = dev_rx;\n\n\tbq_enqueue(rcpu, xdp_pkt);\n\treturn 0;\n }\n\nOn the remote CPU dequeueing the packet, I'm doing the following.  As\nyou can see I'm still lacking some meta-data, that would be nice to\nalso transfer.  Could I use your infrastructure for that?\n\n static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,\n\t\t\t\t\t  struct xdp_pkt *xdp_pkt)\n {\n\tunsigned int truesize;\n\tvoid *pkt_data_start;\n\tstruct sk_buff *skb;\n\n\t/* TODO: rcpu could provide truesize, it's static per RX-ring */\n\ttruesize = 2048;\n\n\t// pkt_data_start = xdp_pkt + sizeof(*xdp_pkt);\n\tpkt_data_start = xdp_pkt->data - xdp_pkt->headroom;\n\n\t/* Need to adjust \"truesize\" for skb_shared_info to get proper\n\t * placed, to take into account that xdp_pkt is using part of\n\t * headroom\n\t */\n\tskb = build_skb(pkt_data_start, truesize - sizeof(*xdp_pkt));\n\tif (!skb)\n\t\treturn NULL;\n\n\tskb_reserve(skb, xdp_pkt->headroom);\n\t__skb_put(skb, xdp_pkt->len);\n\n\t// skb_record_rx_queue(skb, rx_ring->queue_index);\n\tskb->protocol = eth_type_trans(skb, xdp_pkt->dev_rx);\n\n\t// How much does csum matter? \n //\tskb->ip_summed = CHECKSUM_UNNECESSARY; // Try to fake it...\n\n\t// Does setting skb_set_hash()) matter?\n //\t__skb_set_hash(skb, 42, true, false); // Say it is software\n //\t__skb_set_hash(skb, 42, false, true); // Say it is hardware\n\n\t// Do we lack setting rx_queue... it doesn't seem to matter\n //\tskb_record_rx_queue(skb, 0);\n\n\treturn skb;\n }\n\n(I'll send out some patches soonish, hopefully tomorrow... to show in\nmore details what I'm doing)","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=brouer@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y1rGV4x9Gz9t2Q\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 27 Sep 2017 05:13:54 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S936829AbdIZTNw (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tTue, 26 Sep 2017 15:13:52 -0400","from mx1.redhat.com ([209.132.183.28]:46226 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S935538AbdIZTNu (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tTue, 26 Sep 2017 15:13:50 -0400","from smtp.corp.redhat.com\n\t(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id AE545C047B66;\n\tTue, 26 Sep 2017 19:13:50 +0000 (UTC)","from localhost (ovpn-200-30.brq.redhat.com [10.40.200.30])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 32771189B4;\n\tTue, 26 Sep 2017 19:13:44 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com AE545C047B66","Date":"Tue, 26 Sep 2017 21:13:42 +0200","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"Daniel Borkmann <daniel@iogearbox.net>","Cc":"brouer@redhat.com, davem@davemloft.net,\n\talexei.starovoitov@gmail.com, john.fastabend@gmail.com,\n\tpeter.waskiewicz.jr@intel.com, jakub.kicinski@netronome.com,\n\tnetdev@vger.kernel.org, Andy Gospodarek <andy@greyhouse.net>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Message-ID":"<20170926211342.0c8e72b0@redhat.com>","In-Reply-To":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>","References":"<cover.1506297988.git.daniel@iogearbox.net>\n\t<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.11","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.31]);\n\tTue, 26 Sep 2017 19:13:50 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1775823,"web_url":"http://patchwork.ozlabs.org/comment/1775823/","msgid":"<59CAB17D.5090204@iogearbox.net>","list_archive_url":null,"date":"2017-09-26T19:58:53","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":65705,"url":"http://patchwork.ozlabs.org/api/people/65705/","name":"Daniel Borkmann","email":"daniel@iogearbox.net"},"content":"On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:\n[...]\n> I'm currently implementing a cpumap type, that transfers raw XDP frames\n> to another CPU, and the SKB is allocated on the remote CPU.  (It\n> actually works extremely well).\n\nMeaning you let all the XDP_PASS packets get processed on a\ndifferent CPU, so you can reserve the whole CPU just for\nprefiltering, right? Do you have some numbers to share at\nthis point, just curious when you mention it works extremely\nwell.\n\n> For transferring info I need, I'm currently using xdp->data_hard_start\n> (the top/start of the xdp page).  Which should be compatible with your\n> approach, right?\n\nShould be possible, yes. More below.\n\n> The info I need:\n>\n>   struct xdp_pkt {\n> \tvoid *data;\n> \tu16 len;\n> \tu16 headroom;\n> \tstruct net_device *dev_rx;\n>   };\n>\n> When I enqueue the xdp packet I do the following:\n>\n>   int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,\n> \tstruct net_device *dev_rx)\n>   {\n> \tstruct xdp_pkt *xdp_pkt;\n> \tint headroom;\n>\n> \t/* Convert xdp_buff to xdp_pkt */\n> \theadroom = xdp->data - xdp->data_hard_start;\n> \tif (headroom < sizeof(*xdp_pkt))\n> \t\treturn -EOVERFLOW;\n> \txdp_pkt = xdp->data_hard_start;\n> \txdp_pkt->data = xdp->data;\n> \txdp_pkt->len  = xdp->data_end - xdp->data;\n> \txdp_pkt->headroom = headroom - sizeof(*xdp_pkt);\n>\n> \t/* Info needed when constructing SKB on remote CPU */\n> \txdp_pkt->dev_rx = dev_rx;\n>\n> \tbq_enqueue(rcpu, xdp_pkt);\n> \treturn 0;\n>   }\n>\n> On the remote CPU dequeueing the packet, I'm doing the following.  As\n> you can see I'm still lacking some meta-data, that would be nice to\n> also transfer.  Could I use your infrastructure for that?\n\nThere could be multiple options to use it, in case you have a\nhelper where you look up the CPU in the map and would also store\nthe meta data, you could use a per-CPU scratch buffer similarly\nas we do with struct redirect_info, and move that later e.g.\nafter program return into xdp->data_hard_start pointer. You\ncould also reserve that upfront potentially, so it's hidden from\nthe beginning from the program unless you want the program itself\nto fill it out (modulo the pointers). Not all drivers currently\nleave room though, I've also seen where xdp->data_hard_start\npoints directly to xdp->data, so there's 0 headroom available to\nuse; in such case it could either be treated as a hint and for\nthose drivers where they just pass the skb up the current CPU or\nyou would need some other means to move the meta data to the\nremote CPU, or potentially just use tail room.\n\nThanks,\nDaniel","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y1sGc6SNhz9t3F\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 27 Sep 2017 05:59:04 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S968556AbdIZT7C (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tTue, 26 Sep 2017 15:59:02 -0400","from www62.your-server.de ([213.133.104.62]:37956 \"EHLO\n\twww62.your-server.de\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S965209AbdIZT7B (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Tue, 26 Sep 2017 15:59:01 -0400","from [85.7.161.218] (helo=localhost.localdomain)\n\tby www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-SHA:256)\n\t(Exim 4.85_2) (envelope-from <daniel@iogearbox.net>)\n\tid 1dww0M-00059d-Gu; Tue, 26 Sep 2017 21:58:54 +0200"],"Message-ID":"<59CAB17D.5090204@iogearbox.net>","Date":"Tue, 26 Sep 2017 21:58:53 +0200","From":"Daniel Borkmann <daniel@iogearbox.net>","User-Agent":"Mozilla/5.0 (X11; Linux x86_64;\n\trv:31.0) Gecko/20100101 Thunderbird/31.7.0","MIME-Version":"1.0","To":"Jesper Dangaard Brouer <brouer@redhat.com>","CC":"davem@davemloft.net, alexei.starovoitov@gmail.com,\n\tjohn.fastabend@gmail.com, peter.waskiewicz.jr@intel.com,\n\tjakub.kicinski@netronome.com, netdev@vger.kernel.org,\n\tAndy Gospodarek <andy@greyhouse.net>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","References":"<cover.1506297988.git.daniel@iogearbox.net>\n\t<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170926211342.0c8e72b0@redhat.com>","In-Reply-To":"<20170926211342.0c8e72b0@redhat.com>","Content-Type":"text/plain; charset=windows-1252; format=flowed","Content-Transfer-Encoding":"7bit","X-Authenticated-Sender":"daniel@iogearbox.net","X-Virus-Scanned":"Clear (ClamAV 0.99.2/23875/Tue Sep 26 14:44:28 2017)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776178,"web_url":"http://patchwork.ozlabs.org/comment/1776178/","msgid":"<20170927112604.1284f536@redhat.com>","list_archive_url":null,"date":"2017-09-27T09:26:04","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"content":"On Tue, 26 Sep 2017 21:58:53 +0200\nDaniel Borkmann <daniel@iogearbox.net> wrote:\n\n> On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:\n> [...]\n> > I'm currently implementing a cpumap type, that transfers raw XDP frames\n> > to another CPU, and the SKB is allocated on the remote CPU.  (It\n> > actually works extremely well).  \n> \n> Meaning you let all the XDP_PASS packets get processed on a\n> different CPU, so you can reserve the whole CPU just for\n> prefiltering, right? \n\nYes, exactly.  Except I use the XDP_REDIRECT action to steer packets.\nThe trick is using the map-flush point, to transfer packets in bulk to\nthe remote CPU (single call IPC is too slow), but at the same time\nflush single packets if NAPI didn't see a bulk.\n\n> Do you have some numbers to share at this point, just curious when\n> you mention it works extremely well.\n\nSure... I've done a lot of benchmarking on this patchset ;-)\nI have a benchmark program called xdp_redirect_cpu [1][2], that collect\nstats via tracepoints (atm I'm limiting bulking 8 packets, and have\ntracepoints at bulk spots, to amortize tracepoint cost 25ns/8=3.125ns)\n\n [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_kern.c\n [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_user.c\n\nHere I'm installing a DDoS program that drops UDP port 9 (pktgen\npackets) on RX CPU=0.  I'm forcing my netperf to hit the same CPU, that\nthe 11.9Mpps DDoS attack is hitting.\n\nRunning XDP/eBPF prog_num:4\nXDP-cpumap      CPU:to  pps            drop-pps    extra-info\nXDP-RX          0       12,030,471     11,966,982  0          \nXDP-RX          total   12,030,471     11,966,982 \ncpumap-enqueue    0:2   63,488         0           0          \ncpumap-enqueue  sum:2   63,488         0           0          \ncpumap_kthread  2       63,488         0           3          time_exceed\ncpumap_kthread  total   63,488         0           0          \nredirect_err    total   0              0          \n\n$ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -D1 -T5,5 -- -r 1024,1024\nLocal /Remote\nSocket Size   Request  Resp.   Elapsed  Trans.\nSend   Recv   Size     Size    Time     Rate         \nbytes  Bytes  bytes    bytes   secs.    per sec   \n\n16384  87380  1024     1024    10.00    12735.97   \n16384  87380 \n\nThe netperf TCP_CRR performance is the same, without XDP loaded.\n\n\n> Another test\n\nI've previously shown (and optimized) in commit c0303efeab73 (\"net:\nreduce cycles spend on ICMP replies that gets rate limited\"), that my\nsystem can handle approx 2.7Mpps for UdpNoPorts, before the network\nstack chokes.\n\nThus it is interesting to see, when I get UDP traffic that hits the\nsame CPU, if I can simply round-robin distribute it other CPUs.  This\nevaluate if the cross-CPU transfer mechanism is fast-enough.\n\nI do have to increase the ixgbe RX-ring size, else the ixgbe recycle\nscheme breaks down, and we stall on the page spin_lock (as Tariq have\ndemonstrated before).\n\n # ethtool -G ixgbe1 rx 1024 tx 1024\n\nStart RR program and add some CPUs:\n\n # ./xdp_redirect_cpu --dev ixgbe1 --prog 2 --cpu 1 --cpu 2 --cpu 3 --cpu 4\n\nRunning XDP/eBPF prog_num:2\nXDP-cpumap      CPU:to  pps            drop-pps    extra-info\nXDP-RX          0       11,006,992     0           0          \nXDP-RX          total   11,006,992     0          \ncpumap-enqueue    0:1   2,751,744      0           0          \ncpumap-enqueue  sum:1   2,751,744      0           0          \ncpumap-enqueue    0:2   2,751,748      0           0          \ncpumap-enqueue  sum:2   2,751,748      0           0          \ncpumap-enqueue    0:3   2,751,744      35          0          \ncpumap-enqueue  sum:3   2,751,744      35          0          \ncpumap-enqueue    0:4   2,751,748      0           0          \ncpumap-enqueue  sum:4   2,751,748      0           0          \ncpumap_kthread  1       2,751,745      0           156        time_exceed\ncpumap_kthread  2       2,751,749      0           142        time_exceed\ncpumap_kthread  3       2,751,713      0           131        time_exceed\ncpumap_kthread  4       2,751,749      0           128        time_exceed\ncpumap_kthread  total   11,006,957     0           0          \nredirect_err    total   0              0          \n\n$ nstat > /dev/null && sleep 1 && nstat | grep UdpNoPorts\nUdpNoPorts                      11042282           0.0\n\nThe nstat show that the Linux network stack is actually now processing,\nSKB alloc + free, 11Mpps. \n\nThe generator was sending with 14Mpps, thus the XDP-RX program is\nactually a bottleneck here. And I do see some drops on the HW level.\nThus, 1-CPU was not 100% fast-enough.\n\nThus, lets allocate two CPUs for XDP-RX:\n\nRunning XDP/eBPF prog_num:2\nXDP-cpumap      CPU:to  pps            drop-pps    extra-info\nXDP-RX          0       6,352,578      0           0          \nXDP-RX          1       6,352,711      0           0          \nXDP-RX          total   12,705,289     0          \ncpumap-enqueue    0:2   1,588,156      1,351       0          \ncpumap-enqueue    1:2   1,588,174      1,330       0          \ncpumap-enqueue  sum:2   3,176,331      2,682       0          \ncpumap-enqueue    0:3   1,588,157      994         0          \ncpumap-enqueue    1:3   1,588,170      912         0          \ncpumap-enqueue  sum:3   3,176,327      1,907       0          \ncpumap-enqueue    0:4   1,588,157      529         0          \ncpumap-enqueue    1:4   1,588,167      514         0          \ncpumap-enqueue  sum:4   3,176,324      1,044       0          \ncpumap-enqueue    0:5   1,588,159      625         0          \ncpumap-enqueue    1:5   1,588,166      614         0          \ncpumap-enqueue  sum:5   3,176,326      1,240       0          \ncpumap_kthread  2       3,173,642      0           11257      time_exceed\ncpumap_kthread  3       3,174,423      0           9779       time_exceed\ncpumap_kthread  4       3,175,283      0           3938       time_exceed\ncpumap_kthread  5       3,175,083      0           3120       time_exceed\ncpumap_kthread  total   12,698,432     0           0          (null)\nredirect_err    total   0              0          \n\nBelow, I'm using ./pktgen_sample04_many_flows.sh, and my generator\nmachine cannot generate more that 12,682,445 tx_packets /sec.\nnstat says: UdpNoPorts 12,698,001 pps.  The XDP-RX CPUs actually have\n30% idle CPU cycles, as the \"only\" handle 6.3Mpps each ;-)\n\nPerf top on a CPU(3) that have to alloc and free SKBs etc.\n\n# Overhead  CPU  Symbol                                 \n# ........  ...  .......................................\n#\n    15.51%  003  [k] fib_table_lookup\n     8.91%  003  [k] cpu_map_kthread_run\n     8.04%  003  [k] build_skb\n     7.88%  003  [k] page_frag_free\n     5.13%  003  [k] kmem_cache_alloc\n     4.76%  003  [k] ip_route_input_rcu\n     4.59%  003  [k] kmem_cache_free\n     4.02%  003  [k] __udp4_lib_rcv\n     3.20%  003  [k] fib_validate_source\n     3.02%  003  [k] __netif_receive_skb_core\n     3.02%  003  [k] udp_v4_early_demux\n     2.90%  003  [k] ip_rcv\n     2.80%  003  [k] ip_rcv_finish\n     2.26%  003  [k] eth_type_trans\n     2.23%  003  [k] __build_skb\n     2.00%  003  [k] icmp_send\n     1.84%  003  [k] __rcu_read_unlock\n     1.30%  003  [k] ip_local_deliver_finish\n     1.26%  003  [k] netif_receive_skb_internal\n     1.17%  003  [k] ip_route_input_noref\n     1.11%  003  [k] make_kuid\n     1.09%  003  [k] __udp4_lib_lookup\n     1.07%  003  [k] skb_release_head_state\n     1.04%  003  [k] __rcu_read_lock\n     0.95%  003  [k] kfree_skb\n     0.89%  003  [k] __local_bh_enable_ip\n     0.88%  003  [k] skb_release_data\n     0.71%  003  [k] ip_local_deliver\n     0.58%  003  [k] netif_receive_skb\n\ncmdline:\n perf report --sort cpu,symbol --kallsyms=/proc/kallsyms  --no-children  -C3 -g none --stdio","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=brouer@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2CB83p99z9tXT\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 27 Sep 2017 19:26:24 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1752516AbdI0J0U (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tWed, 27 Sep 2017 05:26:20 -0400","from mx1.redhat.com ([209.132.183.28]:59160 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1752329AbdI0J0N (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tWed, 27 Sep 2017 05:26:13 -0400","from smtp.corp.redhat.com\n\t(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id E532D4A708;\n\tWed, 27 Sep 2017 09:26:12 +0000 (UTC)","from localhost (ovpn-200-30.brq.redhat.com [10.40.200.30])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 8D2097EE94;\n\tWed, 27 Sep 2017 09:26:06 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com E532D4A708","Date":"Wed, 27 Sep 2017 11:26:04 +0200","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"Daniel Borkmann <daniel@iogearbox.net>","Cc":"davem@davemloft.net, alexei.starovoitov@gmail.com,\n\tjohn.fastabend@gmail.com, peter.waskiewicz.jr@intel.com,\n\tjakub.kicinski@netronome.com, netdev@vger.kernel.org,\n\tAndy Gospodarek <andy@greyhouse.net>, brouer@redhat.com","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Message-ID":"<20170927112604.1284f536@redhat.com>","In-Reply-To":"<59CAB17D.5090204@iogearbox.net>","References":"<cover.1506297988.git.daniel@iogearbox.net>\n\t<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170926211342.0c8e72b0@redhat.com>\n\t<59CAB17D.5090204@iogearbox.net>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.11","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.38]);\n\tWed, 27 Sep 2017 09:26:13 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776346,"web_url":"http://patchwork.ozlabs.org/comment/1776346/","msgid":"<645e7a39-c172-5882-5dd9-f038430114d1@gmail.com>","list_archive_url":null,"date":"2017-09-27T13:35:40","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":20028,"url":"http://patchwork.ozlabs.org/api/people/20028/","name":"John Fastabend","email":"john.fastabend@gmail.com"},"content":"On 09/27/2017 02:26 AM, Jesper Dangaard Brouer wrote:\n> On Tue, 26 Sep 2017 21:58:53 +0200\n> Daniel Borkmann <daniel@iogearbox.net> wrote:\n> \n>> On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:\n>> [...]\n>>> I'm currently implementing a cpumap type, that transfers raw XDP frames\n>>> to another CPU, and the SKB is allocated on the remote CPU.  (It\n>>> actually works extremely well).  \n>>\n>> Meaning you let all the XDP_PASS packets get processed on a\n>> different CPU, so you can reserve the whole CPU just for\n>> prefiltering, right? \n> \n> Yes, exactly.  Except I use the XDP_REDIRECT action to steer packets.\n> The trick is using the map-flush point, to transfer packets in bulk to\n> the remote CPU (single call IPC is too slow), but at the same time\n> flush single packets if NAPI didn't see a bulk.\n> \n>> Do you have some numbers to share at this point, just curious when\n>> you mention it works extremely well.\n> \n> Sure... I've done a lot of benchmarking on this patchset ;-)\n> I have a benchmark program called xdp_redirect_cpu [1][2], that collect\n> stats via tracepoints (atm I'm limiting bulking 8 packets, and have\n> tracepoints at bulk spots, to amortize tracepoint cost 25ns/8=3.125ns)\n> \n>  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_kern.c\n>  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_user.c\n> \n> Here I'm installing a DDoS program that drops UDP port 9 (pktgen\n> packets) on RX CPU=0.  I'm forcing my netperf to hit the same CPU, that\n> the 11.9Mpps DDoS attack is hitting.\n> \n> Running XDP/eBPF prog_num:4\n> XDP-cpumap      CPU:to  pps            drop-pps    extra-info\n> XDP-RX          0       12,030,471     11,966,982  0          \n> XDP-RX          total   12,030,471     11,966,982 \n> cpumap-enqueue    0:2   63,488         0           0          \n> cpumap-enqueue  sum:2   63,488         0           0          \n> cpumap_kthread  2       63,488         0           3          time_exceed\n> cpumap_kthread  total   63,488         0           0          \n> redirect_err    total   0              0          \n> \n> $ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -D1 -T5,5 -- -r 1024,1024\n> Local /Remote\n> Socket Size   Request  Resp.   Elapsed  Trans.\n> Send   Recv   Size     Size    Time     Rate         \n> bytes  Bytes  bytes    bytes   secs.    per sec   \n> \n> 16384  87380  1024     1024    10.00    12735.97   \n> 16384  87380 \n> \n> The netperf TCP_CRR performance is the same, without XDP loaded.\n> \n\nJust curious could you also try this with RPS enabled (or does this have\nRPS enabled). RPS should effectively do the same thing but higher in the\nstack. I'm curious what the delta would be. Might be another interesting\ncase and fairly easy to setup if you already have the above scripts.\n\nThanks,\nJohn\n\n[...]","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"oACMzPzk\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2Jk65vVKz9sPr\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 27 Sep 2017 23:35:58 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1752933AbdI0Nf4 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tWed, 27 Sep 2017 09:35:56 -0400","from mail-pf0-f193.google.com ([209.85.192.193]:33173 \"EHLO\n\tmail-pf0-f193.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1752895AbdI0Nfy (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Wed, 27 Sep 2017 09:35:54 -0400","by mail-pf0-f193.google.com with SMTP id h4so6858016pfk.0\n\tfor <netdev@vger.kernel.org>; Wed, 27 Sep 2017 06:35:54 -0700 (PDT)","from [192.168.86.74] ([72.168.144.35])\n\tby smtp.gmail.com with ESMTPSA id\n\tf13sm22614340pfj.127.2017.09.27.06.35.45\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tWed, 27 Sep 2017 06:35:53 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=subject:to:cc:references:from:message-id:date:user-agent\n\t:mime-version:in-reply-to:content-language:content-transfer-encoding; \n\tbh=bDBNsH9spgz7yHlAl3MOT1ucPWUn+2W97y7F46xS/4w=;\n\tb=oACMzPzkQF8oCaBjCMFcg2PCMJt33oeWZUjmbk7CIW1MaNSF+WQrhYTEDXzWeaEJCv\n\tgOFm3q4Xd9b3JKkj1J3p7GzA65e6YiI4Xve+IcrGNfYmhGg8BgSHID3+CC5FrhLTIFsb\n\tyfAfKPTIpmEr7ODdwbsMAy53tWhm6Ghs49kM0bYHEzP0laRlDoKtFvIb6VmmJn2RshbZ\n\tEJfUdzpmdS6qd/TjJGdz7HHCBSM0ebITdEDspRlZnA/BWkQcl24fJJoJQ6CnmdWkKAca\n\tUOPhKBJuyw9g9Y/lObxSAmBSi6u0nGQDx9e0GJGpAkjosV9bGT1SV9PiftZfwx6oDFJu\n\tnntw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:subject:to:cc:references:from:message-id:date\n\t:user-agent:mime-version:in-reply-to:content-language\n\t:content-transfer-encoding;\n\tbh=bDBNsH9spgz7yHlAl3MOT1ucPWUn+2W97y7F46xS/4w=;\n\tb=aD8s8DAbwSJS1GR5t/gq/ywN/PcHFS+QBxo4yKC7R9EpC9LupwYUuBmFdQp7FZqfcN\n\tdlk9YvARi5HKtXeKE3o8bZGGS8a5GZ5ivbG70+eTYiVI9qfjcte+WKRbsY7mNsKJRKWC\n\tbo7DT+zqgb6My6eNd443RMkQbMtuZi/IUNvFxP/STk6IHZMqCMNUeZDygnVJCCo5DCRM\n\tUUTl0ENsybEBZ0s8RBvFvvhgo8tNqoAuIIOu9WKNpSPYlUGo29KHBJqSM5ElTnztDuF4\n\tIJgShY4/TbFcwkb7JcYCrmjjuhPra/URhLH8DpKUGsL92CAkfA20bbIU5bDbJIJ608t1\n\tSzuQ==","X-Gm-Message-State":"AHPjjUheuBAlFIevOast4xke8fYD02XiznaWnKT8slO37LGoZRRDq6WR\n\t7rPZ9vRsmUSbK/rY7sJoomg=","X-Google-Smtp-Source":"AOwi7QAJt/BE8RQT26UDBl8lw3wARvSW8bzcVGVAtgrpxed7RfEcbwgxGO66lSrE/KchMRqKn1Vrng==","X-Received":"by 10.98.163.156 with SMTP id q28mr1377221pfl.185.1506519354081; \n\tWed, 27 Sep 2017 06:35:54 -0700 (PDT)","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","To":"Jesper Dangaard Brouer <brouer@redhat.com>,\n\tDaniel Borkmann <daniel@iogearbox.net>","Cc":"davem@davemloft.net, alexei.starovoitov@gmail.com,\n\tpeter.waskiewicz.jr@intel.com, jakub.kicinski@netronome.com,\n\tnetdev@vger.kernel.org, Andy Gospodarek <andy@greyhouse.net>","References":"<cover.1506297988.git.daniel@iogearbox.net>\n\t<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170926211342.0c8e72b0@redhat.com> <59CAB17D.5090204@iogearbox.net>\n\t<20170927112604.1284f536@redhat.com>","From":"John Fastabend <john.fastabend@gmail.com>","Message-ID":"<645e7a39-c172-5882-5dd9-f038430114d1@gmail.com>","Date":"Wed, 27 Sep 2017 06:35:40 -0700","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170927112604.1284f536@redhat.com>","Content-Type":"text/plain; charset=utf-8","Content-Language":"en-US","Content-Transfer-Encoding":"7bit","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776409,"web_url":"http://patchwork.ozlabs.org/comment/1776409/","msgid":"<20170927165457.4265bfc3@redhat.com>","list_archive_url":null,"date":"2017-09-27T14:54:57","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"content":"On Wed, 27 Sep 2017 06:35:40 -0700\nJohn Fastabend <john.fastabend@gmail.com> wrote:\n\n> On 09/27/2017 02:26 AM, Jesper Dangaard Brouer wrote:\n> > On Tue, 26 Sep 2017 21:58:53 +0200\n> > Daniel Borkmann <daniel@iogearbox.net> wrote:\n> >   \n> >> On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:\n> >> [...]  \n> >>> I'm currently implementing a cpumap type, that transfers raw XDP frames\n> >>> to another CPU, and the SKB is allocated on the remote CPU.  (It\n> >>> actually works extremely well).    \n> >>\n> >> Meaning you let all the XDP_PASS packets get processed on a\n> >> different CPU, so you can reserve the whole CPU just for\n> >> prefiltering, right?   \n> > \n> > Yes, exactly.  Except I use the XDP_REDIRECT action to steer packets.\n> > The trick is using the map-flush point, to transfer packets in bulk to\n> > the remote CPU (single call IPC is too slow), but at the same time\n> > flush single packets if NAPI didn't see a bulk.\n> >   \n> >> Do you have some numbers to share at this point, just curious when\n> >> you mention it works extremely well.  \n> > \n> > Sure... I've done a lot of benchmarking on this patchset ;-)\n> > I have a benchmark program called xdp_redirect_cpu [1][2], that collect\n> > stats via tracepoints (atm I'm limiting bulking 8 packets, and have\n> > tracepoints at bulk spots, to amortize tracepoint cost 25ns/8=3.125ns)\n> > \n> >  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_kern.c\n> >  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_user.c\n> > \n> > Here I'm installing a DDoS program that drops UDP port 9 (pktgen\n> > packets) on RX CPU=0.  I'm forcing my netperf to hit the same CPU, that\n> > the 11.9Mpps DDoS attack is hitting.\n> > \n> > Running XDP/eBPF prog_num:4\n> > XDP-cpumap      CPU:to  pps            drop-pps    extra-info\n> > XDP-RX          0       12,030,471     11,966,982  0          \n> > XDP-RX          total   12,030,471     11,966,982 \n> > cpumap-enqueue    0:2   63,488         0           0          \n> > cpumap-enqueue  sum:2   63,488         0           0          \n> > cpumap_kthread  2       63,488         0           3          time_exceed\n> > cpumap_kthread  total   63,488         0           0          \n> > redirect_err    total   0              0          \n> > \n> > $ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -D1 -T5,5 -- -r 1024,1024\n> > Local /Remote\n> > Socket Size   Request  Resp.   Elapsed  Trans.\n> > Send   Recv   Size     Size    Time     Rate         \n> > bytes  Bytes  bytes    bytes   secs.    per sec   \n> > \n> > 16384  87380  1024     1024    10.00    12735.97   \n> > 16384  87380 \n> > \n> > The netperf TCP_CRR performance is the same, without XDP loaded.\n> >   \n> \n> Just curious could you also try this with RPS enabled (or does this have\n> RPS enabled). RPS should effectively do the same thing but higher in the\n> stack. I'm curious what the delta would be. Might be another interesting\n> case and fairly easy to setup if you already have the above scripts.\n\nYes, I'm essentially competing with RSP, thus such a comparison is very\nrelevant...\n\nThis is only a 6 CPUs system. Allocate 2 CPUs to RPS receive and let\nother 4 CPUS process packet.\n\nSummary of RPS (Receive Packet Steering) performance:\n * End result is 6.3 Mpps max performance\n * netperf TCP_CRR is 1 trans/sec.\n * Each RX-RPS CPU stall at ~3.2Mpps.\n\nThe full test report below with setup:\n\nThe mask needed::\n\n perl -e 'printf \"%b\\n\",0x3C'\n 111100\n\nRPS setup::\n\n sudo sh -c 'echo 32768 > /proc/sys/net/core/rps_sock_flow_entries'\n\n for N in $(seq 0 5) ; do \\\n   sudo sh -c \"echo 8192 > /sys/class/net/ixgbe1/queues/rx-$N/rps_flow_cnt\" ; \\\n   sudo sh -c \"echo 3c > /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus\" ; \\\n   grep -H . /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus ; \\\n done\n\nReduce RX queues to two ::\n\n ethtool -L ixgbe1 combined 2\n\nIRQ align to CPU numbers::\n\n $ ~/setup01.sh\n Not root, running with sudo\n  --- Disable Ethernet flow-control ---\n rx unmodified, ignoring\n tx unmodified, ignoring\n no pause parameters changed, aborting\n rx unmodified, ignoring\n tx unmodified, ignoring\n no pause parameters changed, aborting\n  --- Align IRQs ---\n /proc/irq/54/ixgbe1-TxRx-0/../smp_affinity_list:0\n /proc/irq/55/ixgbe1-TxRx-1/../smp_affinity_list:1\n /proc/irq/56/ixgbe1/../smp_affinity_list:0-5\n\n$ grep -H . /sys/class/net/ixgbe1/queues/rx-*/rps_cpus\n/sys/class/net/ixgbe1/queues/rx-0/rps_cpus:3c\n/sys/class/net/ixgbe1/queues/rx-1/rps_cpus:3c\n\nGenerator is sending: 12,715,782 tx_packets /sec\n\n ./pktgen_sample04_many_flows.sh -vi ixgbe2 -m 00:1b:21:bb:9a:84 \\\n    -d 172.16.0.2 -t8\n\n$ nstat > /dev/null && sleep 1 && nstat\n#kernel\nIpInReceives                    6346544            0.0\nIpInDelivers                    6346544            0.0\nIpOutRequests                   1020               0.0\nIcmpOutMsgs                     1020               0.0\nIcmpOutDestUnreachs             1020               0.0\nIcmpMsgOutType3                 1020               0.0\nUdpNoPorts                      6346898            0.0\nIpExtInOctets                   291964714          0.0\nIpExtOutOctets                  73440              0.0\nIpExtInNoECTPkts                6347063            0.0\n\n$ mpstat -P ALL -u -I SCPU -I SUM\n\nAverage:     CPU    %usr   %nice    %sys   %irq   %soft  %idle\nAverage:     all    0.00    0.00    0.00   0.42   72.97  26.61\nAverage:       0    0.00    0.00    0.00   0.17   99.83   0.00\nAverage:       1    0.00    0.00    0.00   0.17   99.83   0.00\nAverage:       2    0.00    0.00    0.00   0.67   60.37  38.96\nAverage:       3    0.00    0.00    0.00   0.67   58.70  40.64\nAverage:       4    0.00    0.00    0.00   0.67   59.53  39.80\nAverage:       5    0.00    0.00    0.00   0.67   58.93  40.40\n\nAverage:     CPU    intr/s\nAverage:     all 152067.22\nAverage:       0  50064.73\nAverage:       1  50089.35\nAverage:       2  45095.17\nAverage:       3  44875.04\nAverage:       4  44906.32\nAverage:       5  45152.08\n\nAverage:     CPU     TIMER/s   NET_TX/s   NET_RX/s TASKLET/s  SCHED/s     RCU/s\nAverage:       0      609.48       0.17   49431.28      0.00     2.66     21.13\nAverage:       1      567.55       0.00   49498.00      0.00     2.66     21.13\nAverage:       2      998.34       0.00   43941.60      4.16    82.86     68.22\nAverage:       3      540.60       0.17   44140.27      0.00    85.52    108.49\nAverage:       4      537.27       0.00   44219.63      0.00    84.53     64.89\nAverage:       5      530.78       0.17   44445.59      0.00    85.02     90.52\n\nFrom mpstat it looks like it is the RX-RPS CPUs that are the bottleneck.\n\nShow adapter(s) (ixgbe1) statistics (ONLY that changed!)\nEthtool(ixgbe1) stat:     11109531 (   11,109,531) <= fdir_miss /sec\nEthtool(ixgbe1) stat:    380632356 (  380,632,356) <= rx_bytes /sec\nEthtool(ixgbe1) stat:    812792611 (  812,792,611) <= rx_bytes_nic /sec\nEthtool(ixgbe1) stat:      1753550 (    1,753,550) <= rx_missed_errors /sec\nEthtool(ixgbe1) stat:      4602487 (    4,602,487) <= rx_no_dma_resources /sec\nEthtool(ixgbe1) stat:      6343873 (    6,343,873) <= rx_packets /sec\nEthtool(ixgbe1) stat:     10946441 (   10,946,441) <= rx_pkts_nic /sec\nEthtool(ixgbe1) stat:    190287853 (  190,287,853) <= rx_queue_0_bytes /sec\nEthtool(ixgbe1) stat:      3171464 (    3,171,464) <= rx_queue_0_packets /sec\nEthtool(ixgbe1) stat:    190344503 (  190,344,503) <= rx_queue_1_bytes /sec\nEthtool(ixgbe1) stat:      3172408 (    3,172,408) <= rx_queue_1_packets /sec\n\nNotice, each RX-CPU can only process 3.1Mpps.\n\nRPS RX-CPU(0):\n\n # Overhead  CPU  Symbol\n # ........  ...  .......................................\n #\n    11.72%  000  [k] ixgbe_poll\n    11.29%  000  [k] _raw_spin_lock\n    10.35%  000  [k] dev_gro_receive\n     8.36%  000  [k] __build_skb\n     7.35%  000  [k] __skb_get_hash\n     6.22%  000  [k] enqueue_to_backlog\n     5.89%  000  [k] __skb_flow_dissect\n     4.43%  000  [k] inet_gro_receive\n     4.19%  000  [k] ___slab_alloc\n     3.90%  000  [k] queued_spin_lock_slowpath\n     3.85%  000  [k] kmem_cache_alloc\n     3.06%  000  [k] build_skb\n     2.66%  000  [k] get_rps_cpu\n     2.57%  000  [k] napi_gro_receive\n     2.34%  000  [k] eth_type_trans\n     1.81%  000  [k] __cmpxchg_double_slab.isra.61\n     1.47%  000  [k] ixgbe_alloc_rx_buffers\n     1.43%  000  [k] get_partial_node.isra.81\n     0.84%  000  [k] swiotlb_sync_single\n     0.74%  000  [k] udp4_gro_receive\n     0.73%  000  [k] netif_receive_skb_internal\n     0.72%  000  [k] udp_gro_receive\n     0.63%  000  [k] skb_gro_reset_offset\n     0.49%  000  [k] __skb_flow_get_ports\n     0.48%  000  [k] llist_add_batch\n     0.36%  000  [k] swiotlb_sync_single_for_cpu\n     0.34%  000  [k] __slab_alloc\n\n\nRemote RPS-CPU(3) getting packets::\n\n # Overhead  CPU  Symbol\n # ........  ...  ..............................................\n #\n    33.02%  003  [k] poll_idle\n    10.99%  003  [k] __netif_receive_skb_core\n    10.45%  003  [k] page_frag_free\n     8.49%  003  [k] ip_rcv\n     4.19%  003  [k] fib_table_lookup\n     2.84%  003  [k] __udp4_lib_rcv\n     2.81%  003  [k] __slab_free\n     2.23%  003  [k] __udp4_lib_lookup\n     2.09%  003  [k] ip_route_input_rcu\n     2.07%  003  [k] kmem_cache_free\n     2.06%  003  [k] udp_v4_early_demux\n     1.73%  003  [k] ip_rcv_finish\n     1.44%  003  [k] process_backlog\n     1.32%  003  [k] icmp_send\n     1.30%  003  [k] cmpxchg_double_slab.isra.73\n     0.95%  003  [k] intel_idle\n     0.88%  003  [k] _raw_spin_lock\n     0.84%  003  [k] fib_validate_source\n     0.79%  003  [k] ip_local_deliver_finish\n     0.67%  003  [k] ip_local_deliver\n     0.56%  003  [k] skb_release_data\n     0.53%  003  [k] unfreeze_partials.isra.80\n     0.51%  003  [k] skb_release_head_state\n     0.44%  003  [k] kfree_skb\n     0.44%  003  [k] queued_spin_lock_slowpath\n     0.44%  003  [k] __cmpxchg_double_slab.isra.61\n\n$ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -T5,5 -- -r 1024,1024\nMIGRATED TCP Connect/Request/Response TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.0.2 () port 0 AF_INET : histogram : demo : cpu bind\nLocal /Remote\nSocket Size   Request  Resp.   Elapsed  Trans.\nSend   Recv   Size     Size    Time     Rate         \nbytes  Bytes  bytes    bytes   secs.    per sec   \n\n16384  87380  1024     1024    10.00       1.10   \n16384  87380","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=brouer@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2LTZ5mzWz9tXs\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 28 Sep 2017 00:55:14 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751949AbdI0OzK convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tWed, 27 Sep 2017 10:55:10 -0400","from mx1.redhat.com ([209.132.183.28]:40926 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751907AbdI0OzJ (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tWed, 27 Sep 2017 10:55:09 -0400","from smtp.corp.redhat.com\n\t(int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 8D84872D73;\n\tWed, 27 Sep 2017 14:55:08 +0000 (UTC)","from localhost (ovpn-200-30.brq.redhat.com [10.40.200.30])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 446CFCEAE1;\n\tWed, 27 Sep 2017 14:54:58 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 8D84872D73","Date":"Wed, 27 Sep 2017 16:54:57 +0200","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"John Fastabend <john.fastabend@gmail.com>","Cc":"Daniel Borkmann <daniel@iogearbox.net>, davem@davemloft.net,\n\talexei.starovoitov@gmail.com, peter.waskiewicz.jr@intel.com,\n\tjakub.kicinski@netronome.com, netdev@vger.kernel.org,\n\tAndy Gospodarek <andy@greyhouse.net>, brouer@redhat.com","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Message-ID":"<20170927165457.4265bfc3@redhat.com>","In-Reply-To":"<645e7a39-c172-5882-5dd9-f038430114d1@gmail.com>","References":"<cover.1506297988.git.daniel@iogearbox.net>\n\t<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170926211342.0c8e72b0@redhat.com>\n\t<59CAB17D.5090204@iogearbox.net>\n\t<20170927112604.1284f536@redhat.com>\n\t<645e7a39-c172-5882-5dd9-f038430114d1@gmail.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"8BIT","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.15","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.38]);\n\tWed, 27 Sep 2017 14:55:08 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776493,"web_url":"http://patchwork.ozlabs.org/comment/1776493/","msgid":"<20170927173233.tuqlutz6t2gwdk53@ast-mbp>","list_archive_url":null,"date":"2017-09-27T17:32:36","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":42586,"url":"http://patchwork.ozlabs.org/api/people/42586/","name":"Alexei Starovoitov","email":"alexei.starovoitov@gmail.com"},"content":"On Wed, Sep 27, 2017 at 04:54:57PM +0200, Jesper Dangaard Brouer wrote:\n> On Wed, 27 Sep 2017 06:35:40 -0700\n> John Fastabend <john.fastabend@gmail.com> wrote:\n> \n> > On 09/27/2017 02:26 AM, Jesper Dangaard Brouer wrote:\n> > > On Tue, 26 Sep 2017 21:58:53 +0200\n> > > Daniel Borkmann <daniel@iogearbox.net> wrote:\n> > >   \n> > >> On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:\n> > >> [...]  \n> > >>> I'm currently implementing a cpumap type, that transfers raw XDP frames\n> > >>> to another CPU, and the SKB is allocated on the remote CPU.  (It\n> > >>> actually works extremely well).    \n> > >>\n> > >> Meaning you let all the XDP_PASS packets get processed on a\n> > >> different CPU, so you can reserve the whole CPU just for\n> > >> prefiltering, right?   \n> > > \n> > > Yes, exactly.  Except I use the XDP_REDIRECT action to steer packets.\n> > > The trick is using the map-flush point, to transfer packets in bulk to\n> > > the remote CPU (single call IPC is too slow), but at the same time\n> > > flush single packets if NAPI didn't see a bulk.\n> > >   \n> > >> Do you have some numbers to share at this point, just curious when\n> > >> you mention it works extremely well.  \n> > > \n> > > Sure... I've done a lot of benchmarking on this patchset ;-)\n> > > I have a benchmark program called xdp_redirect_cpu [1][2], that collect\n> > > stats via tracepoints (atm I'm limiting bulking 8 packets, and have\n> > > tracepoints at bulk spots, to amortize tracepoint cost 25ns/8=3.125ns)\n> > > \n> > >  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_kern.c\n> > >  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_user.c\n> > > \n> > > Here I'm installing a DDoS program that drops UDP port 9 (pktgen\n> > > packets) on RX CPU=0.  I'm forcing my netperf to hit the same CPU, that\n> > > the 11.9Mpps DDoS attack is hitting.\n> > > \n> > > Running XDP/eBPF prog_num:4\n> > > XDP-cpumap      CPU:to  pps            drop-pps    extra-info\n> > > XDP-RX          0       12,030,471     11,966,982  0          \n> > > XDP-RX          total   12,030,471     11,966,982 \n> > > cpumap-enqueue    0:2   63,488         0           0          \n> > > cpumap-enqueue  sum:2   63,488         0           0          \n> > > cpumap_kthread  2       63,488         0           3          time_exceed\n> > > cpumap_kthread  total   63,488         0           0          \n> > > redirect_err    total   0              0          \n> > > \n> > > $ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -D1 -T5,5 -- -r 1024,1024\n> > > Local /Remote\n> > > Socket Size   Request  Resp.   Elapsed  Trans.\n> > > Send   Recv   Size     Size    Time     Rate         \n> > > bytes  Bytes  bytes    bytes   secs.    per sec   \n> > > \n> > > 16384  87380  1024     1024    10.00    12735.97   \n> > > 16384  87380 \n> > > \n> > > The netperf TCP_CRR performance is the same, without XDP loaded.\n> > >   \n> > \n> > Just curious could you also try this with RPS enabled (or does this have\n> > RPS enabled). RPS should effectively do the same thing but higher in the\n> > stack. I'm curious what the delta would be. Might be another interesting\n> > case and fairly easy to setup if you already have the above scripts.\n> \n> Yes, I'm essentially competing with RSP, thus such a comparison is very\n> relevant...\n> \n> This is only a 6 CPUs system. Allocate 2 CPUs to RPS receive and let\n> other 4 CPUS process packet.\n> \n> Summary of RPS (Receive Packet Steering) performance:\n>  * End result is 6.3 Mpps max performance\n>  * netperf TCP_CRR is 1 trans/sec.\n>  * Each RX-RPS CPU stall at ~3.2Mpps.\n> \n> The full test report below with setup:\n> \n> The mask needed::\n> \n>  perl -e 'printf \"%b\\n\",0x3C'\n>  111100\n> \n> RPS setup::\n> \n>  sudo sh -c 'echo 32768 > /proc/sys/net/core/rps_sock_flow_entries'\n> \n>  for N in $(seq 0 5) ; do \\\n>    sudo sh -c \"echo 8192 > /sys/class/net/ixgbe1/queues/rx-$N/rps_flow_cnt\" ; \\\n>    sudo sh -c \"echo 3c > /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus\" ; \\\n>    grep -H . /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus ; \\\n>  done\n> \n> Reduce RX queues to two ::\n> \n>  ethtool -L ixgbe1 combined 2\n> \n> IRQ align to CPU numbers::\n> \n>  $ ~/setup01.sh\n>  Not root, running with sudo\n>   --- Disable Ethernet flow-control ---\n>  rx unmodified, ignoring\n>  tx unmodified, ignoring\n>  no pause parameters changed, aborting\n>  rx unmodified, ignoring\n>  tx unmodified, ignoring\n>  no pause parameters changed, aborting\n>   --- Align IRQs ---\n>  /proc/irq/54/ixgbe1-TxRx-0/../smp_affinity_list:0\n>  /proc/irq/55/ixgbe1-TxRx-1/../smp_affinity_list:1\n>  /proc/irq/56/ixgbe1/../smp_affinity_list:0-5\n> \n> $ grep -H . /sys/class/net/ixgbe1/queues/rx-*/rps_cpus\n> /sys/class/net/ixgbe1/queues/rx-0/rps_cpus:3c\n> /sys/class/net/ixgbe1/queues/rx-1/rps_cpus:3c\n> \n> Generator is sending: 12,715,782 tx_packets /sec\n> \n>  ./pktgen_sample04_many_flows.sh -vi ixgbe2 -m 00:1b:21:bb:9a:84 \\\n>     -d 172.16.0.2 -t8\n> \n> $ nstat > /dev/null && sleep 1 && nstat\n> #kernel\n> IpInReceives                    6346544            0.0\n> IpInDelivers                    6346544            0.0\n> IpOutRequests                   1020               0.0\n> IcmpOutMsgs                     1020               0.0\n> IcmpOutDestUnreachs             1020               0.0\n> IcmpMsgOutType3                 1020               0.0\n> UdpNoPorts                      6346898            0.0\n> IpExtInOctets                   291964714          0.0\n> IpExtOutOctets                  73440              0.0\n> IpExtInNoECTPkts                6347063            0.0\n> \n> $ mpstat -P ALL -u -I SCPU -I SUM\n> \n> Average:     CPU    %usr   %nice    %sys   %irq   %soft  %idle\n> Average:     all    0.00    0.00    0.00   0.42   72.97  26.61\n> Average:       0    0.00    0.00    0.00   0.17   99.83   0.00\n> Average:       1    0.00    0.00    0.00   0.17   99.83   0.00\n> Average:       2    0.00    0.00    0.00   0.67   60.37  38.96\n> Average:       3    0.00    0.00    0.00   0.67   58.70  40.64\n> Average:       4    0.00    0.00    0.00   0.67   59.53  39.80\n> Average:       5    0.00    0.00    0.00   0.67   58.93  40.40\n> \n> Average:     CPU    intr/s\n> Average:     all 152067.22\n> Average:       0  50064.73\n> Average:       1  50089.35\n> Average:       2  45095.17\n> Average:       3  44875.04\n> Average:       4  44906.32\n> Average:       5  45152.08\n> \n> Average:     CPU     TIMER/s   NET_TX/s   NET_RX/s TASKLET/s  SCHED/s     RCU/s\n> Average:       0      609.48       0.17   49431.28      0.00     2.66     21.13\n> Average:       1      567.55       0.00   49498.00      0.00     2.66     21.13\n> Average:       2      998.34       0.00   43941.60      4.16    82.86     68.22\n> Average:       3      540.60       0.17   44140.27      0.00    85.52    108.49\n> Average:       4      537.27       0.00   44219.63      0.00    84.53     64.89\n> Average:       5      530.78       0.17   44445.59      0.00    85.02     90.52\n> \n> From mpstat it looks like it is the RX-RPS CPUs that are the bottleneck.\n> \n> Show adapter(s) (ixgbe1) statistics (ONLY that changed!)\n> Ethtool(ixgbe1) stat:     11109531 (   11,109,531) <= fdir_miss /sec\n> Ethtool(ixgbe1) stat:    380632356 (  380,632,356) <= rx_bytes /sec\n> Ethtool(ixgbe1) stat:    812792611 (  812,792,611) <= rx_bytes_nic /sec\n> Ethtool(ixgbe1) stat:      1753550 (    1,753,550) <= rx_missed_errors /sec\n> Ethtool(ixgbe1) stat:      4602487 (    4,602,487) <= rx_no_dma_resources /sec\n> Ethtool(ixgbe1) stat:      6343873 (    6,343,873) <= rx_packets /sec\n> Ethtool(ixgbe1) stat:     10946441 (   10,946,441) <= rx_pkts_nic /sec\n> Ethtool(ixgbe1) stat:    190287853 (  190,287,853) <= rx_queue_0_bytes /sec\n> Ethtool(ixgbe1) stat:      3171464 (    3,171,464) <= rx_queue_0_packets /sec\n> Ethtool(ixgbe1) stat:    190344503 (  190,344,503) <= rx_queue_1_bytes /sec\n> Ethtool(ixgbe1) stat:      3172408 (    3,172,408) <= rx_queue_1_packets /sec\n> \n> Notice, each RX-CPU can only process 3.1Mpps.\n> \n> RPS RX-CPU(0):\n> \n>  # Overhead  CPU  Symbol\n>  # ........  ...  .......................................\n>  #\n>     11.72%  000  [k] ixgbe_poll\n>     11.29%  000  [k] _raw_spin_lock\n>     10.35%  000  [k] dev_gro_receive\n>      8.36%  000  [k] __build_skb\n>      7.35%  000  [k] __skb_get_hash\n>      6.22%  000  [k] enqueue_to_backlog\n>      5.89%  000  [k] __skb_flow_dissect\n>      4.43%  000  [k] inet_gro_receive\n>      4.19%  000  [k] ___slab_alloc\n>      3.90%  000  [k] queued_spin_lock_slowpath\n>      3.85%  000  [k] kmem_cache_alloc\n>      3.06%  000  [k] build_skb\n>      2.66%  000  [k] get_rps_cpu\n>      2.57%  000  [k] napi_gro_receive\n>      2.34%  000  [k] eth_type_trans\n>      1.81%  000  [k] __cmpxchg_double_slab.isra.61\n>      1.47%  000  [k] ixgbe_alloc_rx_buffers\n>      1.43%  000  [k] get_partial_node.isra.81\n>      0.84%  000  [k] swiotlb_sync_single\n>      0.74%  000  [k] udp4_gro_receive\n>      0.73%  000  [k] netif_receive_skb_internal\n>      0.72%  000  [k] udp_gro_receive\n>      0.63%  000  [k] skb_gro_reset_offset\n>      0.49%  000  [k] __skb_flow_get_ports\n>      0.48%  000  [k] llist_add_batch\n>      0.36%  000  [k] swiotlb_sync_single_for_cpu\n>      0.34%  000  [k] __slab_alloc\n> \n> \n> Remote RPS-CPU(3) getting packets::\n> \n>  # Overhead  CPU  Symbol\n>  # ........  ...  ..............................................\n>  #\n>     33.02%  003  [k] poll_idle\n>     10.99%  003  [k] __netif_receive_skb_core\n>     10.45%  003  [k] page_frag_free\n>      8.49%  003  [k] ip_rcv\n>      4.19%  003  [k] fib_table_lookup\n>      2.84%  003  [k] __udp4_lib_rcv\n>      2.81%  003  [k] __slab_free\n>      2.23%  003  [k] __udp4_lib_lookup\n>      2.09%  003  [k] ip_route_input_rcu\n>      2.07%  003  [k] kmem_cache_free\n>      2.06%  003  [k] udp_v4_early_demux\n>      1.73%  003  [k] ip_rcv_finish\n\nVery interesting data.\nSo above perf report compares to xdp-redirect-cpu this one:\nPerf top on a CPU(3) that have to alloc and free SKBs etc.\n\n# Overhead  CPU  Symbol\n# ........  ...  .......................................\n#\n    15.51%  003  [k] fib_table_lookup\n     8.91%  003  [k] cpu_map_kthread_run\n     8.04%  003  [k] build_skb\n     7.88%  003  [k] page_frag_free\n     5.13%  003  [k] kmem_cache_alloc\n     4.76%  003  [k] ip_route_input_rcu\n     4.59%  003  [k] kmem_cache_free\n     4.02%  003  [k] __udp4_lib_rcv\n     3.20%  003  [k] fib_validate_source\n     3.02%  003  [k] __netif_receive_skb_core\n     3.02%  003  [k] udp_v4_early_demux\n     2.90%  003  [k] ip_rcv\n     2.80%  003  [k] ip_rcv_finish\n\nright?\nand in RPS case the consumer cpu is 33% idle whereas in redirect-cpu\nyou can load it up all the way.\nAm I interpreting all this correctly that with RPS cpu0 cannot\ndistributed the packets to other cpus fast enough and that's\na bottleneck?\nwhereas in redirect-cpu you're doing early packet distribution\nbefore skb alloc?\nSo in other words with redirect-cpu all consumer cpus are doing\nskb alloc and in RPS cpu0 is allocating skbs for all ?\nand that's where 6M->12M performance gain comes from?","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"qZZ6UQ7d\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2PzH6Ftmz9t4b\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 28 Sep 2017 03:32:43 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751495AbdI0Rcm (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tWed, 27 Sep 2017 13:32:42 -0400","from mail-pg0-f65.google.com ([74.125.83.65]:34815 \"EHLO\n\tmail-pg0-f65.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1750852AbdI0Rck (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Wed, 27 Sep 2017 13:32:40 -0400","by mail-pg0-f65.google.com with SMTP id u18so9884133pgo.1\n\tfor <netdev@vger.kernel.org>; Wed, 27 Sep 2017 10:32:40 -0700 (PDT)","from ast-mbp ([2620:10d:c090:180::1:cbbe])\n\tby smtp.gmail.com with ESMTPSA id\n\tp85sm21559914pfj.47.2017.09.27.10.32.37\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tWed, 27 Sep 2017 10:32:38 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:in-reply-to:user-agent;\n\tbh=ryR61/gOGPl1LM1LPonRKMsc/X7/8vvsi5zsX1sZ1/g=;\n\tb=qZZ6UQ7dIQaf/t2ubXVifqHSkB/KXp2VcWGLrTJ3fNuyeBX5BhSRG/wuxkozC0VpV/\n\t8D4Bb+xVlziEk3Z+RINvapclpZUpWXIuLfOQxIUVZ5DJuREzAvimQftvMzha2PBOrecO\n\tPdZyAKdVlig1I/EyrT+sUXn7HJniXPJbA0ATVQxRAauXyJa2SffqzmbEfi7L91VodSe0\n\t2F6byRSanlhLJdSFZ7sdfH9a4JG4t6aXh2uF3yf+JlrG4jOX1w2dBer4iAldNyT9ZDGj\n\tetmPEcSxFfYtPjfnuvnwrM/MbRml69QdtuCG/MIsxNB26bpTnOzxk053K3IN/IjxouWu\n\tZLcg==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:in-reply-to:user-agent;\n\tbh=ryR61/gOGPl1LM1LPonRKMsc/X7/8vvsi5zsX1sZ1/g=;\n\tb=lRYx7wytPowjaklvnI/1mDtpmlZtq0ptBlaymegsY2vASwUlhTbDceqB5t5h5hwTuo\n\tTl1FzfVeMSQtvfa9n+WRVUbzaLhbEDfze4nYLB11S0VdEPO99C51akA4rpoTfqdwn4Kx\n\tsEuU7a/LpDrFxJhWgh+x+T2SVYtBmfTjg7AcXq6Z/4bYm7GRCeWKAJR/BgFpknqsPHT+\n\tZUxvRVF/YBasWI5xwa2FlO+XDCNdeJrcQJWGVCn6afJaXMGIWhXZk+CnsgoPosCuuJzY\n\t560chAwvUT+mj0MX7B8hx2nKz+aXeZfynvFp0Tmd6QlC06siaH9CvnSlBTsn/TOtSOVN\n\tEJ9Q==","X-Gm-Message-State":"AHPjjUiQMjvLUgE4XWRNBaoz1qSlfBZXwHKFsl/tgueFDFihugxNlO5t\n\tfTNMx5wSmf2tIy+AIzaCoXE=","X-Google-Smtp-Source":"AOwi7QBBurKJByU0mLwwL55/C+Yc+7bYUn67yOzD6/C+CEebBChsHYqSsxPbaO+acfwStlEaQJ5SCQ==","X-Received":"by 10.159.253.137 with SMTP id q9mr1869565pls.16.1506533559327; \n\tWed, 27 Sep 2017 10:32:39 -0700 (PDT)","Date":"Wed, 27 Sep 2017 10:32:36 -0700","From":"Alexei Starovoitov <alexei.starovoitov@gmail.com>","To":"Jesper Dangaard Brouer <brouer@redhat.com>","Cc":"John Fastabend <john.fastabend@gmail.com>,\n\tDaniel Borkmann <daniel@iogearbox.net>, davem@davemloft.net,\n\tpeter.waskiewicz.jr@intel.com, jakub.kicinski@netronome.com,\n\tnetdev@vger.kernel.org, Andy Gospodarek <andy@greyhouse.net>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Message-ID":"<20170927173233.tuqlutz6t2gwdk53@ast-mbp>","References":"<cover.1506297988.git.daniel@iogearbox.net>\n\t<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170926211342.0c8e72b0@redhat.com>\n\t<59CAB17D.5090204@iogearbox.net>\n\t<20170927112604.1284f536@redhat.com>\n\t<645e7a39-c172-5882-5dd9-f038430114d1@gmail.com>\n\t<20170927165457.4265bfc3@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170927165457.4265bfc3@redhat.com>","User-Agent":"NeoMutt/20170421 (1.8.2)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1776785,"web_url":"http://patchwork.ozlabs.org/comment/1776785/","msgid":"<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>","list_archive_url":null,"date":"2017-09-28T05:59:38","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":72228,"url":"http://patchwork.ozlabs.org/api/people/72228/","name":"Waskiewicz Jr, Peter","email":"peter.waskiewicz.jr@intel.com"},"content":"On 9/26/17 10:21 AM, Andy Gospodarek wrote:\n> On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:\n>> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:\n>> [...]\n>>> First, thanks for this detailed description.  It was helpful to read\n>>> along with the patches.\n>>>\n>>> My only concern about this area being generic is that you are now in a\n>>> state where any bpf program must know about all the bpf programs in the\n>>> receive pipeline before it can properly parse what is stored in the\n>>> meta-data and add it to an skb (or perform any other action).\n>>> Especially if each program adds it's own meta-data along the way.\n>>>\n>>> Maybe this isn't a big concern based on the number of users of this\n>>> today, but it just starts to seem like a concern as there are these\n>>> hints being passed between layers that are challenging to track due to a\n>>> lack of a standard format for passing data between.\n>>\n>> Btw, we do have similar kind of programmable scratch buffer also today\n>> wrt skb cb[] that you can program from tc side, the perf ring buffer,\n>> which doesn't have any fixed layout for the slots, or a per-cpu map\n>> where you can transfer data between tail calls for example, then tail\n>> calls themselves that need to coordinate, or simply mangling of packets\n>> itself if you will, but more below to your use case ...\n>>\n>>> The main reason I bring this up is that Michael and I had discussed and\n>>> designed a way for drivers to communicate between each other that rx\n>>> resources could be freed after a tx completion on an XDP_REDIRECT\n>>> action.  Much like this code, it involved adding an new element to\n>>> struct xdp_md that could point to the important information.  Now that\n>>> there is a generic way to handle this, it would seem nice to be able to\n>>> leverage it, but I'm not sure how reliable this meta-data area would be\n>>> without the ability to mark it in some manner.\n>>>\n>>> For additional background, the minimum amount of data needed in the case\n>>> Michael and I were discussing was really 2 words.  One to serve as a\n>>> pointer to an rx_ring structure and one to have a counter to the rx\n>>> producer entry.  This data could be acessed by the driver processing the\n>>> tx completions and callback to the driver that received the frame off the wire\n>>> to perform any needed processing.  (For those curious this would also require a\n>>> new callback/netdev op to act on this data stored in the XDP buffer.)\n>>\n>> What you describe above doesn't seem to be fitting to the use-case of\n>> this set, meaning the area here is fully programmable out of the BPF\n>> program, the infrastructure you're describing is some sort of means of\n>> communication between drivers for the XDP_REDIRECT, and should be\n>> outside of the control of the BPF program to mangle.\n> \n> OK, I understand that perspective.  I think saying this is really meant\n> as a BPF<->BPF communication channel for now is fine.\n> \n>> You could probably reuse the base infra here and make a part of that\n>> inaccessible for the program with some sort of a fixed layout, but I\n>> haven't seen your code yet to be able to fully judge. Intention here\n>> is to allow for programmability within the BPF prog in a generic way,\n>> such that based on the use-case it can be populated in specific ways\n>> and propagated to the skb w/o having to define a fixed layout and\n>> bloat xdp_buff all the way to an skb while still retaining all the\n>> flexibility.\n> \n> Some level of reuse might be proper, but I'd rather it be explicit for\n> my use since it's not exclusively something that will need to be used by\n> a BPF prog, but rather the driver.  I'll produce some patches this week\n> for reference.\n\nSorry for chiming in late, I've been offline.\n\nWe're looking to add some functionality from driver to XDP inside this \nxdp_buff->data_meta region.  We want to assign it to an opaque \nstructure, that would be specific per driver (think of a flex descriptor \ncoming out of the hardware).  We'd like to pass these offloaded \ncomputations into XDP programs to help accelerate them, such as packet \ntype, where headers are located, etc.  It's similar to Jesper's RFC \npatches back in May when passing through the mlx Rx descriptor to XDP.\n\nThis is actually what a few of us are planning to present at NetDev 2.2 \nin November.  If you're hoping to restrict this headroom in the xdp_buff \nfor an exclusive use case with XDP_REDIRECT, then I'd like to discuss \nthat further.\n\n-PJ","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2kYL2XFRz9t38\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 28 Sep 2017 15:59:50 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1750977AbdI1F7q convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 01:59:46 -0400","from mga02.intel.com ([134.134.136.20]:64031 \"EHLO mga02.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1750775AbdI1F7p (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 28 Sep 2017 01:59:45 -0400","from fmsmga004.fm.intel.com ([10.253.24.48])\n\tby orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t27 Sep 2017 22:59:44 -0700","from orsmsx105.amr.corp.intel.com ([10.22.225.132])\n\tby fmsmga004.fm.intel.com with ESMTP; 27 Sep 2017 22:59:39 -0700","from orsmsx111.amr.corp.intel.com (10.22.240.12) by\n\tORSMSX105.amr.corp.intel.com (10.22.225.132) with Microsoft SMTP\n\tServer (TLS) id 14.3.319.2; Wed, 27 Sep 2017 22:59:39 -0700","from orsmsx103.amr.corp.intel.com ([169.254.5.89]) by\n\tORSMSX111.amr.corp.intel.com ([169.254.12.101]) with mapi id\n\t14.03.0319.002; Wed, 27 Sep 2017 22:59:38 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,448,1500966000\"; d=\"scan'208\";a=\"317116403\"","From":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>","To":"Andy Gospodarek <andy@greyhouse.net>,\n\tDaniel Borkmann <daniel@iogearbox.net>","CC":"\"davem@davemloft.net\" <davem@davemloft.net>,\n\t\"alexei.starovoitov@gmail.com\" <alexei.starovoitov@gmail.com>,\n\t\"john.fastabend@gmail.com\" <john.fastabend@gmail.com>,\n\t\"jakub.kicinski@netronome.com\" <jakub.kicinski@netronome.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\t\"mchan@broadcom.com\" <mchan@broadcom.com>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Thread-Topic":"[PATCH net-next 2/6] bpf: add meta pointer for direct access","Thread-Index":"AQHTNZUpk+ugo/bl606zMT9GCRR4ag==","Date":"Thu, 28 Sep 2017 05:59:38 +0000","Message-ID":"<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>\n\t<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>","Accept-Language":"en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.254.101.78]","Content-Type":"text/plain; charset=\"us-ascii\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777260,"web_url":"http://patchwork.ozlabs.org/comment/1777260/","msgid":"<CAHashqBMfXp-uYH9ANfdaNfez9f4pcrOjnbX2WAFAdBwaJAtvw@mail.gmail.com>","list_archive_url":null,"date":"2017-09-28T19:58:55","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":971,"url":"http://patchwork.ozlabs.org/api/people/971/","name":"Andy Gospodarek","email":"andy@greyhouse.net"},"content":"On Thu, Sep 28, 2017 at 1:59 AM, Waskiewicz Jr, Peter\n<peter.waskiewicz.jr@intel.com> wrote:\n> On 9/26/17 10:21 AM, Andy Gospodarek wrote:\n>> On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:\n>>> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:\n>>> [...]\n>>>> First, thanks for this detailed description.  It was helpful to read\n>>>> along with the patches.\n>>>>\n>>>> My only concern about this area being generic is that you are now in a\n>>>> state where any bpf program must know about all the bpf programs in the\n>>>> receive pipeline before it can properly parse what is stored in the\n>>>> meta-data and add it to an skb (or perform any other action).\n>>>> Especially if each program adds it's own meta-data along the way.\n>>>>\n>>>> Maybe this isn't a big concern based on the number of users of this\n>>>> today, but it just starts to seem like a concern as there are these\n>>>> hints being passed between layers that are challenging to track due to a\n>>>> lack of a standard format for passing data between.\n>>>\n>>> Btw, we do have similar kind of programmable scratch buffer also today\n>>> wrt skb cb[] that you can program from tc side, the perf ring buffer,\n>>> which doesn't have any fixed layout for the slots, or a per-cpu map\n>>> where you can transfer data between tail calls for example, then tail\n>>> calls themselves that need to coordinate, or simply mangling of packets\n>>> itself if you will, but more below to your use case ...\n>>>\n>>>> The main reason I bring this up is that Michael and I had discussed and\n>>>> designed a way for drivers to communicate between each other that rx\n>>>> resources could be freed after a tx completion on an XDP_REDIRECT\n>>>> action.  Much like this code, it involved adding an new element to\n>>>> struct xdp_md that could point to the important information.  Now that\n>>>> there is a generic way to handle this, it would seem nice to be able to\n>>>> leverage it, but I'm not sure how reliable this meta-data area would be\n>>>> without the ability to mark it in some manner.\n>>>>\n>>>> For additional background, the minimum amount of data needed in the case\n>>>> Michael and I were discussing was really 2 words.  One to serve as a\n>>>> pointer to an rx_ring structure and one to have a counter to the rx\n>>>> producer entry.  This data could be acessed by the driver processing the\n>>>> tx completions and callback to the driver that received the frame off the wire\n>>>> to perform any needed processing.  (For those curious this would also require a\n>>>> new callback/netdev op to act on this data stored in the XDP buffer.)\n>>>\n>>> What you describe above doesn't seem to be fitting to the use-case of\n>>> this set, meaning the area here is fully programmable out of the BPF\n>>> program, the infrastructure you're describing is some sort of means of\n>>> communication between drivers for the XDP_REDIRECT, and should be\n>>> outside of the control of the BPF program to mangle.\n>>\n>> OK, I understand that perspective.  I think saying this is really meant\n>> as a BPF<->BPF communication channel for now is fine.\n>>\n>>> You could probably reuse the base infra here and make a part of that\n>>> inaccessible for the program with some sort of a fixed layout, but I\n>>> haven't seen your code yet to be able to fully judge. Intention here\n>>> is to allow for programmability within the BPF prog in a generic way,\n>>> such that based on the use-case it can be populated in specific ways\n>>> and propagated to the skb w/o having to define a fixed layout and\n>>> bloat xdp_buff all the way to an skb while still retaining all the\n>>> flexibility.\n>>\n>> Some level of reuse might be proper, but I'd rather it be explicit for\n>> my use since it's not exclusively something that will need to be used by\n>> a BPF prog, but rather the driver.  I'll produce some patches this week\n>> for reference.\n>\n> Sorry for chiming in late, I've been offline.\n>\n> We're looking to add some functionality from driver to XDP inside this\n> xdp_buff->data_meta region.  We want to assign it to an opaque\n> structure, that would be specific per driver (think of a flex descriptor\n> coming out of the hardware).  We'd like to pass these offloaded\n> computations into XDP programs to help accelerate them, such as packet\n> type, where headers are located, etc.  It's similar to Jesper's RFC\n> patches back in May when passing through the mlx Rx descriptor to XDP.\n>\n> This is actually what a few of us are planning to present at NetDev 2.2\n> in November.  If you're hoping to restrict this headroom in the xdp_buff\n> for an exclusive use case with XDP_REDIRECT, then I'd like to discuss\n> that further.\n>\n\nNo sweat, PJ, thanks for replying.  I saw the notes for your accepted\nsession and I'm looking forward to it.\n\nJohn's suggestion earlier in the thread was actually similar to the\nconclusion I reached when thinking about Daniel's patch a bit more.\n(I like John's better though as it doesn't get constrained by UAPI.)\nSince redirect actions happen at a point where no other programs will\nrun on the buffer, that space can be used for this redirect data and\nthere are no conflicts.\n\nIt sounds like the idea behind your proposal includes populating some\ndata into the buffer before the XDP program is executed so that it can\nbe used by the program.  Would this data be useful later in the driver\nor stack or are you just hoping to accelerate processing of frames in\nthe BPF program?\n\nIf the headroom needed for redirect info was only added after it was\nclear the redirect action was needed, would this conflict with the\ninformation you are trying to provide?  I had planned to add this just\nafter the action was XDP_REDIRECT was selected or at the end of the\ndriver's ndo_xdp_xmit function -- it seems like it would not conflict.\n\n(There's also Jesper's series from today -- I've seen it but have not\nhad time to fully grok all of those changes.)\n\nThoughts?","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=greyhouse-net.20150623.gappssmtp.com\n\theader.i=@greyhouse-net.20150623.gappssmtp.com\n\theader.b=\"hwQTxDd+\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y35B12Kqwz9t5x\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 05:59:21 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751417AbdI1T7S (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 15:59:18 -0400","from mail-qk0-f182.google.com ([209.85.220.182]:56731 \"EHLO\n\tmail-qk0-f182.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1750821AbdI1T7R (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Thu, 28 Sep 2017 15:59:17 -0400","by mail-qk0-f182.google.com with SMTP id g128so2554931qke.13\n\tfor <netdev@vger.kernel.org>; Thu, 28 Sep 2017 12:59:16 -0700 (PDT)","by 10.12.153.66 with HTTP; Thu, 28 Sep 2017 12:58:55 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=greyhouse-net.20150623.gappssmtp.com; s=20150623;\n\th=mime-version:in-reply-to:references:from:date:message-id:subject:to\n\t:cc; bh=MPM9AqNAI5x2e/78ai2OGjouwGCe+2UDiXKF8VKygbs=;\n\tb=hwQTxDd+4yKQO7Qjq7Hn0t7diBQ2qCvis/rn+J1YpUBIHtf0AVzrcINlZI2BM/EyK9\n\tP2Klv69bCA7ozNTfhPgeGgYzD51KapOogBh1xrgeNCTosQN7QLUNyyBrqOd/TfHAiSXh\n\t5LSg6vTqxD4e+apNJ6huMAtAkesxlm2KpYDjvPFTF7A2H3B9XwxHmyL+ytu6tzBy+uof\n\tVjYkIVlZ/G0E+DjZnRDUnVgmN6kzjZOtydo3mtUmdp9O2fdIvzxVuNFbOIv9rD6WKxxm\n\t7uxLCXDCNncT3SxHnn98Sg6KHtpphze2HmpfpSUvGlfH050odc2mVKJFIeAsYQAa0wUR\n\tAq2w==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:in-reply-to:references:from:date\n\t:message-id:subject:to:cc;\n\tbh=MPM9AqNAI5x2e/78ai2OGjouwGCe+2UDiXKF8VKygbs=;\n\tb=BqXF83SFB4ugkyjynb2VSOpoC62wDupby/QYU7eqwAamUv/vybmEzDVNhHd4A1U9LP\n\tnIhAqvXRBQ6YOuVxDYQRsp5V5FVT4jr0s2gONxpSWdsa2kqmbEQaPp3kf6arqkR807uk\n\tKBz/1wKIwGKV/yhexMbILLdoCr4CPuEMoQYQ+01elO/K6/4G4/SuKlRP2fuGmt19LpKC\n\ty0qq/hJgWi6mvhNtzQh9LejZ94bDejkzpwot6kckVWFPwMKDrzGUWGwnJbOMvy9WsIcg\n\tIfnVEMfoGXvYmhik6Ta5cc5pb2BdQNWnHFtgebOUZSBzw9QRONCIme+6gsWCcsXaVQ3/\n\th5qA==","X-Gm-Message-State":"AMCzsaUXqBhBD9XlgsMaVpmMvm7w39l8qQ9jQv4aO0cJv8gCvbHREMmq\n\tC7f0hIPgTSChwNFNG4SPXvP+oS8fxGhk4OeSZHoULw==","X-Google-Smtp-Source":"AOwi7QDVRT453Ed6Jzsfi6ChvSjEsPL1s2Kr4PdDFZU+Dq8uSN5BK1mxfEMi1RIEx7NbwJbFvMHXETUM5SUSod1pAyA=","X-Received":"by 10.55.27.136 with SMTP id m8mr122259qkh.356.1506628756228;\n\tThu, 28 Sep 2017 12:59:16 -0700 (PDT)","MIME-Version":"1.0","X-Originating-IP":"[192.19.231.250]","In-Reply-To":"<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>\n\t<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>","From":"Andy Gospodarek <andy@greyhouse.net>","Date":"Thu, 28 Sep 2017 15:58:55 -0400","Message-ID":"<CAHashqBMfXp-uYH9ANfdaNfez9f4pcrOjnbX2WAFAdBwaJAtvw@mail.gmail.com>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","To":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>","Cc":"Daniel Borkmann <daniel@iogearbox.net>,\n\t\"davem@davemloft.net\" <davem@davemloft.net>,\n\t\"alexei.starovoitov@gmail.com\" <alexei.starovoitov@gmail.com>,\n\t\"john.fastabend@gmail.com\" <john.fastabend@gmail.com>,\n\t\"jakub.kicinski@netronome.com\" <jakub.kicinski@netronome.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\t\"mchan@broadcom.com\" <mchan@broadcom.com>","Content-Type":"text/plain; charset=\"UTF-8\"","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777283,"web_url":"http://patchwork.ozlabs.org/comment/1777283/","msgid":"<E0D909EE5BB15A4699798539EA149D7F077E6438@ORSMSX103.amr.corp.intel.com>","list_archive_url":null,"date":"2017-09-28T20:52:34","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":72228,"url":"http://patchwork.ozlabs.org/api/people/72228/","name":"Waskiewicz Jr, Peter","email":"peter.waskiewicz.jr@intel.com"},"content":"On 9/28/17 12:59 PM, Andy Gospodarek wrote:\n> On Thu, Sep 28, 2017 at 1:59 AM, Waskiewicz Jr, Peter\n> <peter.waskiewicz.jr@intel.com> wrote:\n>> On 9/26/17 10:21 AM, Andy Gospodarek wrote:\n>>> On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:\n>>>> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:\n>>>> [...]\n>>>>> First, thanks for this detailed description.  It was helpful to read\n>>>>> along with the patches.\n>>>>>\n>>>>> My only concern about this area being generic is that you are now in a\n>>>>> state where any bpf program must know about all the bpf programs in the\n>>>>> receive pipeline before it can properly parse what is stored in the\n>>>>> meta-data and add it to an skb (or perform any other action).\n>>>>> Especially if each program adds it's own meta-data along the way.\n>>>>>\n>>>>> Maybe this isn't a big concern based on the number of users of this\n>>>>> today, but it just starts to seem like a concern as there are these\n>>>>> hints being passed between layers that are challenging to track due to a\n>>>>> lack of a standard format for passing data between.\n>>>>\n>>>> Btw, we do have similar kind of programmable scratch buffer also today\n>>>> wrt skb cb[] that you can program from tc side, the perf ring buffer,\n>>>> which doesn't have any fixed layout for the slots, or a per-cpu map\n>>>> where you can transfer data between tail calls for example, then tail\n>>>> calls themselves that need to coordinate, or simply mangling of packets\n>>>> itself if you will, but more below to your use case ...\n>>>>\n>>>>> The main reason I bring this up is that Michael and I had discussed and\n>>>>> designed a way for drivers to communicate between each other that rx\n>>>>> resources could be freed after a tx completion on an XDP_REDIRECT\n>>>>> action.  Much like this code, it involved adding an new element to\n>>>>> struct xdp_md that could point to the important information.  Now that\n>>>>> there is a generic way to handle this, it would seem nice to be able to\n>>>>> leverage it, but I'm not sure how reliable this meta-data area would be\n>>>>> without the ability to mark it in some manner.\n>>>>>\n>>>>> For additional background, the minimum amount of data needed in the case\n>>>>> Michael and I were discussing was really 2 words.  One to serve as a\n>>>>> pointer to an rx_ring structure and one to have a counter to the rx\n>>>>> producer entry.  This data could be acessed by the driver processing the\n>>>>> tx completions and callback to the driver that received the frame off the wire\n>>>>> to perform any needed processing.  (For those curious this would also require a\n>>>>> new callback/netdev op to act on this data stored in the XDP buffer.)\n>>>>\n>>>> What you describe above doesn't seem to be fitting to the use-case of\n>>>> this set, meaning the area here is fully programmable out of the BPF\n>>>> program, the infrastructure you're describing is some sort of means of\n>>>> communication between drivers for the XDP_REDIRECT, and should be\n>>>> outside of the control of the BPF program to mangle.\n>>>\n>>> OK, I understand that perspective.  I think saying this is really meant\n>>> as a BPF<->BPF communication channel for now is fine.\n>>>\n>>>> You could probably reuse the base infra here and make a part of that\n>>>> inaccessible for the program with some sort of a fixed layout, but I\n>>>> haven't seen your code yet to be able to fully judge. Intention here\n>>>> is to allow for programmability within the BPF prog in a generic way,\n>>>> such that based on the use-case it can be populated in specific ways\n>>>> and propagated to the skb w/o having to define a fixed layout and\n>>>> bloat xdp_buff all the way to an skb while still retaining all the\n>>>> flexibility.\n>>>\n>>> Some level of reuse might be proper, but I'd rather it be explicit for\n>>> my use since it's not exclusively something that will need to be used by\n>>> a BPF prog, but rather the driver.  I'll produce some patches this week\n>>> for reference.\n>>\n>> Sorry for chiming in late, I've been offline.\n>>\n>> We're looking to add some functionality from driver to XDP inside this\n>> xdp_buff->data_meta region.  We want to assign it to an opaque\n>> structure, that would be specific per driver (think of a flex descriptor\n>> coming out of the hardware).  We'd like to pass these offloaded\n>> computations into XDP programs to help accelerate them, such as packet\n>> type, where headers are located, etc.  It's similar to Jesper's RFC\n>> patches back in May when passing through the mlx Rx descriptor to XDP.\n>>\n>> This is actually what a few of us are planning to present at NetDev 2.2\n>> in November.  If you're hoping to restrict this headroom in the xdp_buff\n>> for an exclusive use case with XDP_REDIRECT, then I'd like to discuss\n>> that further.\n>>\n> \n> No sweat, PJ, thanks for replying.  I saw the notes for your accepted\n> session and I'm looking forward to it.\n> \n> John's suggestion earlier in the thread was actually similar to the\n> conclusion I reached when thinking about Daniel's patch a bit more.\n> (I like John's better though as it doesn't get constrained by UAPI.)\n> Since redirect actions happen at a point where no other programs will\n> run on the buffer, that space can be used for this redirect data and\n> there are no conflicts.\n\nAh, yes, John and I spoke about this at Plumber's and this is basically \nwhat we came to as well.  A set of helpers that won't have to be in \nUAPI, but they will be potentially vendor-specific to extract the \nmeta-data hints out for the XDP program to use.\n\n> It sounds like the idea behind your proposal includes populating some\n> data into the buffer before the XDP program is executed so that it can\n> be used by the program.  Would this data be useful later in the driver\n> or stack or are you just hoping to accelerate processing of frames in\n> the BPF program?\n\nRight now we're thinking it would only be useful for XDP programs to \nexecute things quicker, i.e. not have to compute things that are already \ncomputed by the hardware (rxhash, ptype, header locations, etc.).  I \ndon't have any plans to pass this data off elsewhere in the stack or \nback to the driver at this point.\n\n> If the headroom needed for redirect info was only added after it was\n> clear the redirect action was needed, would this conflict with the\n> information you are trying to provide?  I had planned to add this just\n> after the action was XDP_REDIRECT was selected or at the end of the\n> driver's ndo_xdp_xmit function -- it seems like it would not conflict.\n\nI'm pretty sure I misunderstood what you were going after with \nXDP_REDIRECT reserving the headroom.  Our use case (patches coming in a \nfew weeks) will populate the headroom coming out of the driver to XDP, \nand then once the XDP program extracts whatever hints it wants via \nhelpers, I fully expect that area in the headroom to get stomped by \nsomething else.  If we want to send any of that hint data up farther, \nwe'll already have it extracted via the helpers, and the eBPF program \ncan happily assign it to wherever in the outbound metadata area.\n\n> (There's also Jesper's series from today -- I've seen it but have not\n> had time to fully grok all of those changes.)\n\nI'm also working through my inbox to get to that series.  I have some \nemail to catch up on...\n\nThanks Andy,\n-PJ","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y36MX2MFrz9t32\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 06:52:40 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751362AbdI1Uwh convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 16:52:37 -0400","from mga07.intel.com ([134.134.136.100]:6410 \"EHLO mga07.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751008AbdI1Uwg (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 28 Sep 2017 16:52:36 -0400","from orsmga002.jf.intel.com ([10.7.209.21])\n\tby orsmga105.jf.intel.com with ESMTP; 28 Sep 2017 13:52:35 -0700","from orsmsx110.amr.corp.intel.com ([10.22.240.8])\n\tby orsmga002.jf.intel.com with ESMTP; 28 Sep 2017 13:52:35 -0700","from orsmsx160.amr.corp.intel.com (10.22.226.43) by\n\tORSMSX110.amr.corp.intel.com (10.22.240.8) with Microsoft SMTP Server\n\t(TLS) id 14.3.319.2; Thu, 28 Sep 2017 13:52:35 -0700","from orsmsx103.amr.corp.intel.com ([169.254.5.89]) by\n\tORSMSX160.amr.corp.intel.com ([169.254.13.61]) with mapi id\n\t14.03.0319.002; Thu, 28 Sep 2017 13:52:34 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,450,1500966000\"; d=\"scan'208\";a=\"140657916\"","From":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>","To":"Andy Gospodarek <andy@greyhouse.net>","CC":"Daniel Borkmann <daniel@iogearbox.net>,\n\t\"davem@davemloft.net\" <davem@davemloft.net>,\n\t\"alexei.starovoitov@gmail.com\" <alexei.starovoitov@gmail.com>,\n\t\"john.fastabend@gmail.com\" <john.fastabend@gmail.com>,\n\t\"jakub.kicinski@netronome.com\" <jakub.kicinski@netronome.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\t\"mchan@broadcom.com\" <mchan@broadcom.com>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Thread-Topic":"[PATCH net-next 2/6] bpf: add meta pointer for direct access","Thread-Index":"AQHTNZUpk+ugo/bl606zMT9GCRR4ag==","Date":"Thu, 28 Sep 2017 20:52:34 +0000","Message-ID":"<E0D909EE5BB15A4699798539EA149D7F077E6438@ORSMSX103.amr.corp.intel.com>","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>\n\t<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>\n\t<CAHashqBMfXp-uYH9ANfdaNfez9f4pcrOjnbX2WAFAdBwaJAtvw@mail.gmail.com>","Accept-Language":"en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.254.101.78]","Content-Type":"text/plain; charset=\"us-ascii\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777290,"web_url":"http://patchwork.ozlabs.org/comment/1777290/","msgid":"<bdce98b7-1d32-3cd9-1289-79807af8443f@gmail.com>","list_archive_url":null,"date":"2017-09-28T21:22:43","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":20028,"url":"http://patchwork.ozlabs.org/api/people/20028/","name":"John Fastabend","email":"john.fastabend@gmail.com"},"content":"[...]\n\n> I'm pretty sure I misunderstood what you were going after with \n> XDP_REDIRECT reserving the headroom.  Our use case (patches coming in a \n> few weeks) will populate the headroom coming out of the driver to XDP, \n> and then once the XDP program extracts whatever hints it wants via \n> helpers, I fully expect that area in the headroom to get stomped by \n> something else.  If we want to send any of that hint data up farther, \n> we'll already have it extracted via the helpers, and the eBPF program \n> can happily assign it to wherever in the outbound metadata area.\n\nIn case its not obvious with the latest xdp metadata patches the outbound\nmetadata can then be pushed into skb fields via a tc_cls program if needed.\n\n.John\n\n> \n>> (There's also Jesper's series from today -- I've seen it but have not\n>> had time to fully grok all of those changes.)\n> \n> I'm also working through my inbox to get to that series.  I have some \n> email to catch up on...\n> \n> Thanks Andy,\n> -PJ\n>","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"d7baAgN0\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y372g2Nwzz9t4F\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 07:23:07 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751527AbdI1VXE (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 17:23:04 -0400","from mail-pg0-f54.google.com ([74.125.83.54]:53365 \"EHLO\n\tmail-pg0-f54.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751129AbdI1VXC (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Thu, 28 Sep 2017 17:23:02 -0400","by mail-pg0-f54.google.com with SMTP id j70so1607028pgc.10\n\tfor <netdev@vger.kernel.org>; Thu, 28 Sep 2017 14:23:02 -0700 (PDT)","from [192.168.86.74] ([72.168.144.131])\n\tby smtp.gmail.com with ESMTPSA id\n\tr12sm3853478pgp.81.2017.09.28.14.22.48\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tThu, 28 Sep 2017 14:23:01 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=subject:to:cc:references:from:message-id:date:user-agent\n\t:mime-version:in-reply-to:content-language:content-transfer-encoding; \n\tbh=KeRDA+8BLRFAlkZqeHvGqxo7GyZAm6MssQgekvKGOBc=;\n\tb=d7baAgN04yztO2Xavk43sX4MpCyCBXbreJsjvXTxeeLN9UMzd/5jKJR46rAYq/J6LA\n\teT6lX8VqSjqZCnx6vbXZD4I2VOzUrTQ7XIDzsS4mu0Efb2LXrGdgCoBJ40sxinfqJif4\n\tcBzfnfeCobxjtdGQsrmlPHHKaQ50abqwe66fVAiV8SXitYLqK4cR+lRvMInhZOUVSBts\n\t6tV+xSB7vKxKe4NUsRPKvrfiCx6y15r8LNYwv2UzN2xJtvyV7ZkMrGNI/xlPZ8X8WwAZ\n\tNPS/krqz2e35jDgTcamVdzaLJ8txVjm2m5KKYm4BhasJRzBZ7lSwg0cf7Trb0ag2frCf\n\t4bBQ==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:subject:to:cc:references:from:message-id:date\n\t:user-agent:mime-version:in-reply-to:content-language\n\t:content-transfer-encoding;\n\tbh=KeRDA+8BLRFAlkZqeHvGqxo7GyZAm6MssQgekvKGOBc=;\n\tb=Kqyf2GR+HP/gWGMuDGVomN2mvMcz/eAkavWn/S7NEsrkH+qFT3K0ZMMcNzgDHi0Ks+\n\tApQYaA+8+a97cOSzp0v4rRJcUHfGwYniFVnZR24jvhuOOcYhgOKXQQu5Gi1W+XOBcyXM\n\tqszg1XzlOupyksUVu+TWTrgW9VJS0YlMbKT/29Z9uyCdSq/rgNMw93QhSz5NWoJGK3gj\n\tLIAJ80qsgOzqVIdPUQ798DRSTr3xG0ZeXZ1ruJ8mwyvqXhllvEqM0c9ez/hTupTJptkI\n\tCuNtWYcEbtUwk/jGHfQv1iUxre0nVlzymeYQl3CSYBTJumzXTiFq2GDKmE7rMqluwOFI\n\tGKzw==","X-Gm-Message-State":"AHPjjUixMACsBlaIiSEWyAxTpLxFVFbjyeNvlwQBMig3fH78R/5reTWU\n\t6aJ6fWCHBtA1surJzksqGRU=","X-Google-Smtp-Source":"AOwi7QDtTcV4uwymrhW8SM5nO17qMieyAZ9WwE0kWO46M0DAnNSIAVTff4PgT72fctuQkXCHI35aBg==","X-Received":"by 10.99.116.90 with SMTP id e26mr5289948pgn.290.1506633782314; \n\tThu, 28 Sep 2017 14:23:02 -0700 (PDT)","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","To":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>,\n\tAndy Gospodarek <andy@greyhouse.net>","Cc":"Daniel Borkmann <daniel@iogearbox.net>,\n\t\"davem@davemloft.net\" <davem@davemloft.net>,\n\t\"alexei.starovoitov@gmail.com\" <alexei.starovoitov@gmail.com>,\n\t\"jakub.kicinski@netronome.com\" <jakub.kicinski@netronome.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\t\"mchan@broadcom.com\" <mchan@broadcom.com>","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>\n\t<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>\n\t<CAHashqBMfXp-uYH9ANfdaNfez9f4pcrOjnbX2WAFAdBwaJAtvw@mail.gmail.com>\n\t<E0D909EE5BB15A4699798539EA149D7F077E6438@ORSMSX103.amr.corp.intel.com>","From":"John Fastabend <john.fastabend@gmail.com>","Message-ID":"<bdce98b7-1d32-3cd9-1289-79807af8443f@gmail.com>","Date":"Thu, 28 Sep 2017 14:22:43 -0700","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<E0D909EE5BB15A4699798539EA149D7F077E6438@ORSMSX103.amr.corp.intel.com>","Content-Type":"text/plain; charset=utf-8","Content-Language":"en-US","Content-Transfer-Encoding":"7bit","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777294,"web_url":"http://patchwork.ozlabs.org/comment/1777294/","msgid":"<59CD69CC.7070809@iogearbox.net>","list_archive_url":null,"date":"2017-09-28T21:29:48","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":65705,"url":"http://patchwork.ozlabs.org/api/people/65705/","name":"Daniel Borkmann","email":"daniel@iogearbox.net"},"content":"On 09/28/2017 10:52 PM, Waskiewicz Jr, Peter wrote:\n> On 9/28/17 12:59 PM, Andy Gospodarek wrote:\n>> On Thu, Sep 28, 2017 at 1:59 AM, Waskiewicz Jr, Peter\n>> <peter.waskiewicz.jr@intel.com> wrote:\n>>> On 9/26/17 10:21 AM, Andy Gospodarek wrote:\n>>>> On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:\n>>>>> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:\n>>>>> [...]\n>>>>>> First, thanks for this detailed description.  It was helpful to read\n>>>>>> along with the patches.\n>>>>>>\n>>>>>> My only concern about this area being generic is that you are now in a\n>>>>>> state where any bpf program must know about all the bpf programs in the\n>>>>>> receive pipeline before it can properly parse what is stored in the\n>>>>>> meta-data and add it to an skb (or perform any other action).\n>>>>>> Especially if each program adds it's own meta-data along the way.\n>>>>>>\n>>>>>> Maybe this isn't a big concern based on the number of users of this\n>>>>>> today, but it just starts to seem like a concern as there are these\n>>>>>> hints being passed between layers that are challenging to track due to a\n>>>>>> lack of a standard format for passing data between.\n>>>>>\n>>>>> Btw, we do have similar kind of programmable scratch buffer also today\n>>>>> wrt skb cb[] that you can program from tc side, the perf ring buffer,\n>>>>> which doesn't have any fixed layout for the slots, or a per-cpu map\n>>>>> where you can transfer data between tail calls for example, then tail\n>>>>> calls themselves that need to coordinate, or simply mangling of packets\n>>>>> itself if you will, but more below to your use case ...\n>>>>>\n>>>>>> The main reason I bring this up is that Michael and I had discussed and\n>>>>>> designed a way for drivers to communicate between each other that rx\n>>>>>> resources could be freed after a tx completion on an XDP_REDIRECT\n>>>>>> action.  Much like this code, it involved adding an new element to\n>>>>>> struct xdp_md that could point to the important information.  Now that\n>>>>>> there is a generic way to handle this, it would seem nice to be able to\n>>>>>> leverage it, but I'm not sure how reliable this meta-data area would be\n>>>>>> without the ability to mark it in some manner.\n>>>>>>\n>>>>>> For additional background, the minimum amount of data needed in the case\n>>>>>> Michael and I were discussing was really 2 words.  One to serve as a\n>>>>>> pointer to an rx_ring structure and one to have a counter to the rx\n>>>>>> producer entry.  This data could be acessed by the driver processing the\n>>>>>> tx completions and callback to the driver that received the frame off the wire\n>>>>>> to perform any needed processing.  (For those curious this would also require a\n>>>>>> new callback/netdev op to act on this data stored in the XDP buffer.)\n>>>>>\n>>>>> What you describe above doesn't seem to be fitting to the use-case of\n>>>>> this set, meaning the area here is fully programmable out of the BPF\n>>>>> program, the infrastructure you're describing is some sort of means of\n>>>>> communication between drivers for the XDP_REDIRECT, and should be\n>>>>> outside of the control of the BPF program to mangle.\n>>>>\n>>>> OK, I understand that perspective.  I think saying this is really meant\n>>>> as a BPF<->BPF communication channel for now is fine.\n>>>>\n>>>>> You could probably reuse the base infra here and make a part of that\n>>>>> inaccessible for the program with some sort of a fixed layout, but I\n>>>>> haven't seen your code yet to be able to fully judge. Intention here\n>>>>> is to allow for programmability within the BPF prog in a generic way,\n>>>>> such that based on the use-case it can be populated in specific ways\n>>>>> and propagated to the skb w/o having to define a fixed layout and\n>>>>> bloat xdp_buff all the way to an skb while still retaining all the\n>>>>> flexibility.\n>>>>\n>>>> Some level of reuse might be proper, but I'd rather it be explicit for\n>>>> my use since it's not exclusively something that will need to be used by\n>>>> a BPF prog, but rather the driver.  I'll produce some patches this week\n>>>> for reference.\n>>>\n>>> Sorry for chiming in late, I've been offline.\n>>>\n>>> We're looking to add some functionality from driver to XDP inside this\n>>> xdp_buff->data_meta region.  We want to assign it to an opaque\n>>> structure, that would be specific per driver (think of a flex descriptor\n>>> coming out of the hardware).  We'd like to pass these offloaded\n>>> computations into XDP programs to help accelerate them, such as packet\n>>> type, where headers are located, etc.  It's similar to Jesper's RFC\n>>> patches back in May when passing through the mlx Rx descriptor to XDP.\n>>>\n>>> This is actually what a few of us are planning to present at NetDev 2.2\n>>> in November.  If you're hoping to restrict this headroom in the xdp_buff\n>>> for an exclusive use case with XDP_REDIRECT, then I'd like to discuss\n>>> that further.\n>>\n>> No sweat, PJ, thanks for replying.  I saw the notes for your accepted\n>> session and I'm looking forward to it.\n>>\n>> John's suggestion earlier in the thread was actually similar to the\n>> conclusion I reached when thinking about Daniel's patch a bit more.\n>> (I like John's better though as it doesn't get constrained by UAPI.)\n>> Since redirect actions happen at a point where no other programs will\n>> run on the buffer, that space can be used for this redirect data and\n>> there are no conflicts.\n\nYep fully agree, it's not read anywhere else anymore or could go up\nthe stack where we'd read it out again, so that's the best solution\nfor your use-case moving forward, Andy. I do like that we don't expose\nto uapi.\n\n> Ah, yes, John and I spoke about this at Plumber's and this is basically\n> what we came to as well.  A set of helpers that won't have to be in\n> UAPI, but they will be potentially vendor-specific to extract the\n> meta-data hints out for the XDP program to use.\n>\n>> It sounds like the idea behind your proposal includes populating some\n>> data into the buffer before the XDP program is executed so that it can\n>> be used by the program.  Would this data be useful later in the driver\n>> or stack or are you just hoping to accelerate processing of frames in\n>> the BPF program?\n>\n> Right now we're thinking it would only be useful for XDP programs to\n> execute things quicker, i.e. not have to compute things that are already\n> computed by the hardware (rxhash, ptype, header locations, etc.).  I\n> don't have any plans to pass this data off elsewhere in the stack or\n> back to the driver at this point.\n>\n>> If the headroom needed for redirect info was only added after it was\n>> clear the redirect action was needed, would this conflict with the\n>> information you are trying to provide?  I had planned to add this just\n>> after the action was XDP_REDIRECT was selected or at the end of the\n>> driver's ndo_xdp_xmit function -- it seems like it would not conflict.\n>\n> I'm pretty sure I misunderstood what you were going after with\n> XDP_REDIRECT reserving the headroom.  Our use case (patches coming in a\n> few weeks) will populate the headroom coming out of the driver to XDP,\n> and then once the XDP program extracts whatever hints it wants via\n> helpers, I fully expect that area in the headroom to get stomped by\n> something else.  If we want to send any of that hint data up farther,\n> we'll already have it extracted via the helpers, and the eBPF program\n> can happily assign it to wherever in the outbound metadata area.\n\nSure, these two are compatible with each other; in your case it's\npopulated before the prog is called, and the prog can use it while\nprocessing, in Andy's case it's populated after the prog was called\nwhen we need to redirect, so both fine.\n\nThanks,\nDaniel","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y37C252MMz9t5l\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 07:30:22 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751927AbdI1VaD (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 17:30:03 -0400","from www62.your-server.de ([213.133.104.62]:45278 \"EHLO\n\twww62.your-server.de\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751438AbdI1V3z (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Thu, 28 Sep 2017 17:29:55 -0400","from [85.7.161.218] (helo=localhost.localdomain)\n\tby www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-SHA:256)\n\t(Exim 4.85_2) (envelope-from <daniel@iogearbox.net>)\n\tid 1dxgNR-0002d9-Fw; Thu, 28 Sep 2017 23:29:49 +0200"],"Message-ID":"<59CD69CC.7070809@iogearbox.net>","Date":"Thu, 28 Sep 2017 23:29:48 +0200","From":"Daniel Borkmann <daniel@iogearbox.net>","User-Agent":"Mozilla/5.0 (X11; Linux x86_64;\n\trv:31.0) Gecko/20100101 Thunderbird/31.7.0","MIME-Version":"1.0","To":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>,\n\tAndy Gospodarek <andy@greyhouse.net>","CC":"\"davem@davemloft.net\" <davem@davemloft.net>,\n\t\"alexei.starovoitov@gmail.com\" <alexei.starovoitov@gmail.com>,\n\t\"john.fastabend@gmail.com\" <john.fastabend@gmail.com>,\n\t\"jakub.kicinski@netronome.com\" <jakub.kicinski@netronome.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\t\"mchan@broadcom.com\" <mchan@broadcom.com>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>\n\t<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>\n\t<CAHashqBMfXp-uYH9ANfdaNfez9f4pcrOjnbX2WAFAdBwaJAtvw@mail.gmail.com>\n\t<E0D909EE5BB15A4699798539EA149D7F077E6438@ORSMSX103.amr.corp.intel.com>","In-Reply-To":"<E0D909EE5BB15A4699798539EA149D7F077E6438@ORSMSX103.amr.corp.intel.com>","Content-Type":"text/plain; charset=windows-1252; format=flowed","Content-Transfer-Encoding":"7bit","X-Authenticated-Sender":"daniel@iogearbox.net","X-Virus-Scanned":"Clear (ClamAV 0.99.2/23884/Thu Sep 28 22:46:49 2017)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777296,"web_url":"http://patchwork.ozlabs.org/comment/1777296/","msgid":"<E0D909EE5BB15A4699798539EA149D7F077E66D3@ORSMSX103.amr.corp.intel.com>","list_archive_url":null,"date":"2017-09-28T21:40:40","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":72228,"url":"http://patchwork.ozlabs.org/api/people/72228/","name":"Waskiewicz Jr, Peter","email":"peter.waskiewicz.jr@intel.com"},"content":"On 9/28/17 2:23 PM, John Fastabend wrote:\n> [...]\n> \n>> I'm pretty sure I misunderstood what you were going after with\n>> XDP_REDIRECT reserving the headroom.  Our use case (patches coming in a\n>> few weeks) will populate the headroom coming out of the driver to XDP,\n>> and then once the XDP program extracts whatever hints it wants via\n>> helpers, I fully expect that area in the headroom to get stomped by\n>> something else.  If we want to send any of that hint data up farther,\n>> we'll already have it extracted via the helpers, and the eBPF program\n>> can happily assign it to wherever in the outbound metadata area.\n> \n> In case its not obvious with the latest xdp metadata patches the outbound\n> metadata can then be pushed into skb fields via a tc_cls program if needed.\n\nYes, that was what I was alluding to with \"can happily assign it to \nwherever.\"  The patches we're working on are driver->XDP, then anything \nelse using the latest meta-data patches would be XDP->anywhere else.  So \nI don't think we're going to step on any toes.\n\nThanks John,\n-PJ","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y37R11TJ5z9sP1\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 07:40:45 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751485AbdI1Vkn convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 28 Sep 2017 17:40:43 -0400","from mga05.intel.com ([192.55.52.43]:4294 \"EHLO mga05.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751058AbdI1Vkm (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 28 Sep 2017 17:40:42 -0400","from fmsmga001.fm.intel.com ([10.253.24.23])\n\tby fmsmga105.fm.intel.com with ESMTP; 28 Sep 2017 14:40:41 -0700","from orsmsx106.amr.corp.intel.com ([10.22.225.133])\n\tby fmsmga001.fm.intel.com with ESMTP; 28 Sep 2017 14:40:41 -0700","from orsmsx162.amr.corp.intel.com (10.22.240.85) by\n\tORSMSX106.amr.corp.intel.com (10.22.225.133) with Microsoft SMTP\n\tServer (TLS) id 14.3.319.2; Thu, 28 Sep 2017 14:40:41 -0700","from orsmsx103.amr.corp.intel.com ([169.254.5.89]) by\n\tORSMSX162.amr.corp.intel.com ([169.254.3.44]) with mapi id\n\t14.03.0319.002; Thu, 28 Sep 2017 14:40:41 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,451,1500966000\"; d=\"scan'208\";a=\"1200199986\"","From":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>","To":"John Fastabend <john.fastabend@gmail.com>,\n\tAndy Gospodarek <andy@greyhouse.net>","CC":"Daniel Borkmann <daniel@iogearbox.net>,\n\t\"davem@davemloft.net\" <davem@davemloft.net>,\n\t\"alexei.starovoitov@gmail.com\" <alexei.starovoitov@gmail.com>,\n\t\"jakub.kicinski@netronome.com\" <jakub.kicinski@netronome.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\t\"mchan@broadcom.com\" <mchan@broadcom.com>","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Thread-Topic":"[PATCH net-next 2/6] bpf: add meta pointer for direct access","Thread-Index":"AQHTNZUpk+ugo/bl606zMT9GCRR4ag==","Date":"Thu, 28 Sep 2017 21:40:40 +0000","Message-ID":"<E0D909EE5BB15A4699798539EA149D7F077E66D3@ORSMSX103.amr.corp.intel.com>","References":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170925181000.GA60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<59C94FF4.8070900@iogearbox.net>\n\t<20170926172140.GB60144@C02RW35GFVH8.dhcp.broadcom.net>\n\t<E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>\n\t<CAHashqBMfXp-uYH9ANfdaNfez9f4pcrOjnbX2WAFAdBwaJAtvw@mail.gmail.com>\n\t<E0D909EE5BB15A4699798539EA149D7F077E6438@ORSMSX103.amr.corp.intel.com>\n\t<bdce98b7-1d32-3cd9-1289-79807af8443f@gmail.com>","Accept-Language":"en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.254.101.78]","Content-Type":"text/plain; charset=\"us-ascii\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1777378,"web_url":"http://patchwork.ozlabs.org/comment/1777378/","msgid":"<20170929090738.040231a3@redhat.com>","list_archive_url":null,"date":"2017-09-29T07:09:40","subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"content":"On Wed, 27 Sep 2017 10:32:36 -0700\nAlexei Starovoitov <alexei.starovoitov@gmail.com> wrote:\n\n> On Wed, Sep 27, 2017 at 04:54:57PM +0200, Jesper Dangaard Brouer wrote:\n> > On Wed, 27 Sep 2017 06:35:40 -0700\n> > John Fastabend <john.fastabend@gmail.com> wrote:\n> >   \n> > > On 09/27/2017 02:26 AM, Jesper Dangaard Brouer wrote:  \n> > > > On Tue, 26 Sep 2017 21:58:53 +0200\n> > > > Daniel Borkmann <daniel@iogearbox.net> wrote:\n> > > >     \n> > > >> On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:\n> > > >> [...]    \n> > > >>> I'm currently implementing a cpumap type, that transfers raw XDP frames\n> > > >>> to another CPU, and the SKB is allocated on the remote CPU.  (It\n> > > >>> actually works extremely well).      \n> > > >>\n> > > >> Meaning you let all the XDP_PASS packets get processed on a\n> > > >> different CPU, so you can reserve the whole CPU just for\n> > > >> prefiltering, right?     \n> > > > \n> > > > Yes, exactly.  Except I use the XDP_REDIRECT action to steer packets.\n> > > > The trick is using the map-flush point, to transfer packets in bulk to\n> > > > the remote CPU (single call IPC is too slow), but at the same time\n> > > > flush single packets if NAPI didn't see a bulk.\n> > > >     \n> > > >> Do you have some numbers to share at this point, just curious when\n> > > >> you mention it works extremely well.    \n> > > > \n> > > > Sure... I've done a lot of benchmarking on this patchset ;-)\n> > > > I have a benchmark program called xdp_redirect_cpu [1][2], that collect\n> > > > stats via tracepoints (atm I'm limiting bulking 8 packets, and have\n> > > > tracepoints at bulk spots, to amortize tracepoint cost 25ns/8=3.125ns)\n> > > > \n> > > >  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_kern.c\n> > > >  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_user.c\n> > > > \n> > > > Here I'm installing a DDoS program that drops UDP port 9 (pktgen\n> > > > packets) on RX CPU=0.  I'm forcing my netperf to hit the same CPU, that\n> > > > the 11.9Mpps DDoS attack is hitting.\n> > > > \n> > > > Running XDP/eBPF prog_num:4\n> > > > XDP-cpumap      CPU:to  pps            drop-pps    extra-info\n> > > > XDP-RX          0       12,030,471     11,966,982  0          \n> > > > XDP-RX          total   12,030,471     11,966,982 \n> > > > cpumap-enqueue    0:2   63,488         0           0          \n> > > > cpumap-enqueue  sum:2   63,488         0           0          \n> > > > cpumap_kthread  2       63,488         0           3          time_exceed\n> > > > cpumap_kthread  total   63,488         0           0          \n> > > > redirect_err    total   0              0          \n> > > > \n> > > > $ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -D1 -T5,5 -- -r 1024,1024\n> > > > Local /Remote\n> > > > Socket Size   Request  Resp.   Elapsed  Trans.\n> > > > Send   Recv   Size     Size    Time     Rate         \n> > > > bytes  Bytes  bytes    bytes   secs.    per sec   \n> > > > \n> > > > 16384  87380  1024     1024    10.00    12735.97   \n> > > > 16384  87380 \n> > > > \n> > > > The netperf TCP_CRR performance is the same, without XDP loaded.\n> > > >     \n> > > \n> > > Just curious could you also try this with RPS enabled (or does this have\n> > > RPS enabled). RPS should effectively do the same thing but higher in the\n> > > stack. I'm curious what the delta would be. Might be another interesting\n> > > case and fairly easy to setup if you already have the above scripts.  \n> > \n> > Yes, I'm essentially competing with RSP, thus such a comparison is very\n> > relevant...\n> > \n> > This is only a 6 CPUs system. Allocate 2 CPUs to RPS receive and let\n> > other 4 CPUS process packet.\n> > \n> > Summary of RPS (Receive Packet Steering) performance:\n> >  * End result is 6.3 Mpps max performance\n> >  * netperf TCP_CRR is 1 trans/sec.\n> >  * Each RX-RPS CPU stall at ~3.2Mpps.\n> > \n> > The full test report below with setup:\n> > \n> > The mask needed::\n> > \n> >  perl -e 'printf \"%b\\n\",0x3C'\n> >  111100\n> > \n> > RPS setup::\n> > \n> >  sudo sh -c 'echo 32768 > /proc/sys/net/core/rps_sock_flow_entries'\n> > \n> >  for N in $(seq 0 5) ; do \\\n> >    sudo sh -c \"echo 8192 > /sys/class/net/ixgbe1/queues/rx-$N/rps_flow_cnt\" ; \\\n> >    sudo sh -c \"echo 3c > /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus\" ; \\\n> >    grep -H . /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus ; \\\n> >  done\n> > \n> > Reduce RX queues to two ::\n> > \n> >  ethtool -L ixgbe1 combined 2\n> > \n> > IRQ align to CPU numbers::\n> > \n> >  $ ~/setup01.sh\n> >  Not root, running with sudo\n> >   --- Disable Ethernet flow-control ---\n> >  rx unmodified, ignoring\n> >  tx unmodified, ignoring\n> >  no pause parameters changed, aborting\n> >  rx unmodified, ignoring\n> >  tx unmodified, ignoring\n> >  no pause parameters changed, aborting\n> >   --- Align IRQs ---\n> >  /proc/irq/54/ixgbe1-TxRx-0/../smp_affinity_list:0\n> >  /proc/irq/55/ixgbe1-TxRx-1/../smp_affinity_list:1\n> >  /proc/irq/56/ixgbe1/../smp_affinity_list:0-5\n> > \n> > $ grep -H . /sys/class/net/ixgbe1/queues/rx-*/rps_cpus\n> > /sys/class/net/ixgbe1/queues/rx-0/rps_cpus:3c\n> > /sys/class/net/ixgbe1/queues/rx-1/rps_cpus:3c\n> > \n> > Generator is sending: 12,715,782 tx_packets /sec\n> > \n> >  ./pktgen_sample04_many_flows.sh -vi ixgbe2 -m 00:1b:21:bb:9a:84 \\\n> >     -d 172.16.0.2 -t8\n> > \n> > $ nstat > /dev/null && sleep 1 && nstat\n> > #kernel\n> > IpInReceives                    6346544            0.0\n> > IpInDelivers                    6346544            0.0\n> > IpOutRequests                   1020               0.0\n> > IcmpOutMsgs                     1020               0.0\n> > IcmpOutDestUnreachs             1020               0.0\n> > IcmpMsgOutType3                 1020               0.0\n> > UdpNoPorts                      6346898            0.0\n> > IpExtInOctets                   291964714          0.0\n> > IpExtOutOctets                  73440              0.0\n> > IpExtInNoECTPkts                6347063            0.0\n> > \n> > $ mpstat -P ALL -u -I SCPU -I SUM\n> > \n> > Average:     CPU    %usr   %nice    %sys   %irq   %soft  %idle\n> > Average:     all    0.00    0.00    0.00   0.42   72.97  26.61\n> > Average:       0    0.00    0.00    0.00   0.17   99.83   0.00\n> > Average:       1    0.00    0.00    0.00   0.17   99.83   0.00\n> > Average:       2    0.00    0.00    0.00   0.67   60.37  38.96\n> > Average:       3    0.00    0.00    0.00   0.67   58.70  40.64\n> > Average:       4    0.00    0.00    0.00   0.67   59.53  39.80\n> > Average:       5    0.00    0.00    0.00   0.67   58.93  40.40\n> > \n> > Average:     CPU    intr/s\n> > Average:     all 152067.22\n> > Average:       0  50064.73\n> > Average:       1  50089.35\n> > Average:       2  45095.17\n> > Average:       3  44875.04\n> > Average:       4  44906.32\n> > Average:       5  45152.08\n> > \n> > Average:     CPU     TIMER/s   NET_TX/s   NET_RX/s TASKLET/s  SCHED/s     RCU/s\n> > Average:       0      609.48       0.17   49431.28      0.00     2.66     21.13\n> > Average:       1      567.55       0.00   49498.00      0.00     2.66     21.13\n> > Average:       2      998.34       0.00   43941.60      4.16    82.86     68.22\n> > Average:       3      540.60       0.17   44140.27      0.00    85.52    108.49\n> > Average:       4      537.27       0.00   44219.63      0.00    84.53     64.89\n> > Average:       5      530.78       0.17   44445.59      0.00    85.02     90.52\n> > \n> > From mpstat it looks like it is the RX-RPS CPUs that are the bottleneck.\n> > \n> > Show adapter(s) (ixgbe1) statistics (ONLY that changed!)\n> > Ethtool(ixgbe1) stat:     11109531 (   11,109,531) <= fdir_miss /sec\n> > Ethtool(ixgbe1) stat:    380632356 (  380,632,356) <= rx_bytes /sec\n> > Ethtool(ixgbe1) stat:    812792611 (  812,792,611) <= rx_bytes_nic /sec\n> > Ethtool(ixgbe1) stat:      1753550 (    1,753,550) <= rx_missed_errors /sec\n> > Ethtool(ixgbe1) stat:      4602487 (    4,602,487) <= rx_no_dma_resources /sec\n> > Ethtool(ixgbe1) stat:      6343873 (    6,343,873) <= rx_packets /sec\n> > Ethtool(ixgbe1) stat:     10946441 (   10,946,441) <= rx_pkts_nic /sec\n> > Ethtool(ixgbe1) stat:    190287853 (  190,287,853) <= rx_queue_0_bytes /sec\n> > Ethtool(ixgbe1) stat:      3171464 (    3,171,464) <= rx_queue_0_packets /sec\n> > Ethtool(ixgbe1) stat:    190344503 (  190,344,503) <= rx_queue_1_bytes /sec\n> > Ethtool(ixgbe1) stat:      3172408 (    3,172,408) <= rx_queue_1_packets /sec\n> > \n> > Notice, each RX-CPU can only process 3.1Mpps.\n> > \n> > RPS RX-CPU(0):\n> > \n> >  # Overhead  CPU  Symbol\n> >  # ........  ...  .......................................\n> >  #\n> >     11.72%  000  [k] ixgbe_poll\n> >     11.29%  000  [k] _raw_spin_lock\n> >     10.35%  000  [k] dev_gro_receive\n> >      8.36%  000  [k] __build_skb\n> >      7.35%  000  [k] __skb_get_hash\n> >      6.22%  000  [k] enqueue_to_backlog\n> >      5.89%  000  [k] __skb_flow_dissect\n> >      4.43%  000  [k] inet_gro_receive\n> >      4.19%  000  [k] ___slab_alloc\n> >      3.90%  000  [k] queued_spin_lock_slowpath\n> >      3.85%  000  [k] kmem_cache_alloc\n> >      3.06%  000  [k] build_skb\n> >      2.66%  000  [k] get_rps_cpu\n> >      2.57%  000  [k] napi_gro_receive\n> >      2.34%  000  [k] eth_type_trans\n> >      1.81%  000  [k] __cmpxchg_double_slab.isra.61\n> >      1.47%  000  [k] ixgbe_alloc_rx_buffers\n> >      1.43%  000  [k] get_partial_node.isra.81\n> >      0.84%  000  [k] swiotlb_sync_single\n> >      0.74%  000  [k] udp4_gro_receive\n> >      0.73%  000  [k] netif_receive_skb_internal\n> >      0.72%  000  [k] udp_gro_receive\n> >      0.63%  000  [k] skb_gro_reset_offset\n> >      0.49%  000  [k] __skb_flow_get_ports\n> >      0.48%  000  [k] llist_add_batch\n> >      0.36%  000  [k] swiotlb_sync_single_for_cpu\n> >      0.34%  000  [k] __slab_alloc\n> > \n> > \n> > Remote RPS-CPU(3) getting packets::\n> > \n> >  # Overhead  CPU  Symbol\n> >  # ........  ...  ..............................................\n> >  #\n> >     33.02%  003  [k] poll_idle\n> >     10.99%  003  [k] __netif_receive_skb_core\n> >     10.45%  003  [k] page_frag_free\n> >      8.49%  003  [k] ip_rcv\n> >      4.19%  003  [k] fib_table_lookup\n> >      2.84%  003  [k] __udp4_lib_rcv\n> >      2.81%  003  [k] __slab_free\n\nNotice slow-path of SLUB\n\n> >      2.23%  003  [k] __udp4_lib_lookup\n> >      2.09%  003  [k] ip_route_input_rcu\n> >      2.07%  003  [k] kmem_cache_free\n> >      2.06%  003  [k] udp_v4_early_demux\n> >      1.73%  003  [k] ip_rcv_finish  \n> \n> Very interesting data.\n\nYou removed some of the more interesting part of the perf-report, that\nshowed us hitting more of the SLUB slowpath for SKBs.  The slowpath\nconsist of many separate function calls, thus it doesn't bubble to the\ntop (the FlameGraph tool shows them easier).\n\n> So above perf report compares to xdp-redirect-cpu this one:\n> Perf top on a CPU(3) that have to alloc and free SKBs etc.\n> \n> # Overhead  CPU  Symbol\n> # ........  ...  .......................................\n> #\n>     15.51%  003  [k] fib_table_lookup\n>      8.91%  003  [k] cpu_map_kthread_run\n>      8.04%  003  [k] build_skb\n>      7.88%  003  [k] page_frag_free\n>      5.13%  003  [k] kmem_cache_alloc\n>      4.76%  003  [k] ip_route_input_rcu\n>      4.59%  003  [k] kmem_cache_free\n>      4.02%  003  [k] __udp4_lib_rcv\n>      3.20%  003  [k] fib_validate_source\n>      3.02%  003  [k] __netif_receive_skb_core\n>      3.02%  003  [k] udp_v4_early_demux\n>      2.90%  003  [k] ip_rcv\n>      2.80%  003  [k] ip_rcv_finish\n> \n> right?\n> and in RPS case the consumer cpu is 33% idle whereas in redirect-cpu\n> you can load it up all the way.\n> Am I interpreting all this correctly that with RPS cpu0 cannot\n> distributed the packets to other cpus fast enough and that's\n> a bottleneck?\n\nYes, exactly. The work needed on the RPS cpu0 is simply too much.\n\n> whereas in redirect-cpu you're doing early packet distribution\n> before skb alloc?\n\nYes, the main point to reducing the CPU cycles spend on the packet for\ndoing early packet distribution.\n\n> So in other words with redirect-cpu all consumer cpus are doing\n> skb alloc and in RPS cpu0 is allocating skbs for all ?\n\nYes.\n\n> and that's where 6M->12M performance gain comes from?\n\nYes, basically.  There are many small thing that help this along.  Like\ncpumap case always hitting the SLUB fastpath.  Another big thing is\nbulking. It is sort of hidden, but the XDP_REDIRECT flush mechanism is\nimplementing the RX bulking (I've been \"screaming\" about for the last\ncouple of years! ;-))","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=brouer@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y3N3l1lh7z9t2c\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 29 Sep 2017 17:09:55 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751963AbdI2HJw (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 29 Sep 2017 03:09:52 -0400","from mx1.redhat.com ([209.132.183.28]:35346 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1750979AbdI2HJu (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 29 Sep 2017 03:09:50 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id F18973680F;\n\tFri, 29 Sep 2017 07:09:49 +0000 (UTC)","from localhost (ovpn-200-30.brq.redhat.com [10.40.200.30])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id ECF685C1A3;\n\tFri, 29 Sep 2017 07:09:41 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com F18973680F","Date":"Fri, 29 Sep 2017 09:09:40 +0200","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"Alexei Starovoitov <alexei.starovoitov@gmail.com>","Cc":"John Fastabend <john.fastabend@gmail.com>,\n\tDaniel Borkmann <daniel@iogearbox.net>,\n\tpeter.waskiewicz.jr@intel.com, jakub.kicinski@netronome.com,\n\tnetdev@vger.kernel.org, Andy Gospodarek <andy@greyhouse.net>,\n\tbrouer@redhat.com","Subject":"Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access","Message-ID":"<20170929090738.040231a3@redhat.com>","In-Reply-To":"<20170927173233.tuqlutz6t2gwdk53@ast-mbp>","References":"<cover.1506297988.git.daniel@iogearbox.net>\n\t<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>\n\t<20170926211342.0c8e72b0@redhat.com>\n\t<59CAB17D.5090204@iogearbox.net>\n\t<20170927112604.1284f536@redhat.com>\n\t<645e7a39-c172-5882-5dd9-f038430114d1@gmail.com>\n\t<20170927165457.4265bfc3@redhat.com>\n\t<20170927173233.tuqlutz6t2gwdk53@ast-mbp>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.30]);\n\tFri, 29 Sep 2017 07:09:50 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}}]