{"id":818024,"url":"http://patchwork.ozlabs.org/api/patches/818024/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/patch/458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net/","project":{"id":7,"url":"http://patchwork.ozlabs.org/api/projects/7/?format=json","name":"Linux network development","link_name":"netdev","list_id":"netdev.vger.kernel.org","list_email":"netdev@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>","list_archive_url":null,"date":"2017-09-25T00:25:51","name":"[net-next,2/6] bpf: add meta pointer for direct access","commit_ref":null,"pull_url":null,"state":"accepted","archived":true,"hash":"235dbbd311d40c2ff20b0214d3039f14cd6c2847","submitter":{"id":65705,"url":"http://patchwork.ozlabs.org/api/people/65705/?format=json","name":"Daniel Borkmann","email":"daniel@iogearbox.net"},"delegate":{"id":34,"url":"http://patchwork.ozlabs.org/api/users/34/?format=json","username":"davem","first_name":"David","last_name":"Miller","email":"davem@davemloft.net"},"mbox":"http://patchwork.ozlabs.org/project/netdev/patch/458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net/mbox/","series":[{"id":4860,"url":"http://patchwork.ozlabs.org/api/series/4860/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/list/?series=4860","date":"2017-09-25T00:25:49","name":"BPF metadata for direct 
access","version":1,"mbox":"http://patchwork.ozlabs.org/series/4860/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/818024/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/818024/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y0lKM2vLQz9s82\n\tfor <patchwork-incoming@ozlabs.org>;\n\tMon, 25 Sep 2017 10:27:35 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S933361AbdIYA1d (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tSun, 24 Sep 2017 20:27:33 -0400","from www62.your-server.de ([213.133.104.62]:37213 \"EHLO\n\twww62.your-server.de\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S933009AbdIYA1G (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Sun, 24 Sep 2017 20:27:06 -0400","from [85.7.161.218] (helo=localhost)\n\tby www62.your-server.de with esmtpsa\n\t(TLSv1.2:DHE-RSA-AES128-GCM-SHA256:128) (Exim 4.85_2)\n\t(envelope-from <daniel@iogearbox.net>)\n\tid 1dwHEl-0000Hh-7m; Mon, 25 Sep 2017 02:27:03 +0200"],"From":"Daniel Borkmann <daniel@iogearbox.net>","To":"davem@davemloft.net","Cc":"alexei.starovoitov@gmail.com, john.fastabend@gmail.com,\n\tpeter.waskiewicz.jr@intel.com, jakub.kicinski@netronome.com,\n\tnetdev@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>","Subject":"[PATCH net-next 2/6] bpf: add meta pointer for direct access","Date":"Mon, 25 Sep 2017 02:25:51 +0200","Message-Id":"<458f9c13ab58abb1a15627906d03c33c42b02a7c.1506297988.git.daniel@iogearbox.net>","X-Mailer":"git-send-email 
1.9.3","In-Reply-To":["<cover.1506297988.git.daniel@iogearbox.net>","<cover.1506297988.git.daniel@iogearbox.net>"],"References":["<cover.1506297988.git.daniel@iogearbox.net>","<cover.1506297988.git.daniel@iogearbox.net>"],"X-Authenticated-Sender":"daniel@iogearbox.net","X-Virus-Scanned":"Clear (ClamAV 0.99.2/23869/Sun Sep 24 18:45:57 2017)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"},"content":"This work enables generic transfer of metadata from XDP into skb. The\nbasic idea is that we can make use of the fact that the resulting skb\nmust be linear and already comes with a larger headroom for supporting\nbpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work\non a similar principle and introduce a small helper bpf_xdp_adjust_meta()\nfor adjusting a new pointer called xdp->data_meta. Thus, the packet has\na flexible and programmable room for meta data, followed by the actual\npacket data. struct xdp_buff is therefore laid out such that we first point\nto data_hard_start, then data_meta directly prepended to data followed\nby data_end marking the end of the packet. bpf_xdp_adjust_head() takes into\naccount whether we have meta data already prepended and if so, memmove()s\nthis along with the given offset provided there's enough room.\n\nxdp->data_meta is optional and programs are not required to use it. The\nrationale is that when we process the packet in XDP (e.g. as DoS filter),\nwe can push further meta data along with it for the XDP_PASS case, and\ngive the guarantee that a clsact ingress BPF program on the same device\ncan pick this up for further post-processing. 
Since we work with skb\nthere, we can also set skb->mark, skb->priority or other skb meta data\nout of BPF, thus having this scratch space generic and programmable\nallows for more flexibility than defining a direct 1:1 transfer of\npotentially new XDP members into skb (it's also more efficient as we\ndon't need to initialize/handle each of such new members). The facility\nalso works together with GRO aggregation. The scratch space at the head\nof the packet can be a multiple of 4 bytes, up to 32 bytes large. Drivers not\nyet supporting xdp->data_meta can simply be set up with xdp->data_meta\nas xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,\nsuch that the subsequent match against xdp->data for later access is\nguaranteed to fail.\n\nThe verifier treats xdp->data_meta/xdp->data the same way as we treat\nxdp->data/xdp->data_end pointer comparisons. The requirement for doing\nthe compare against xdp->data is that it hasn't been modified from its\noriginal address we got from ctx access. 
It may have a range marking\nalready from prior successful xdp->data/xdp->data_end pointer comparisons\nthough.\n\nSigned-off-by: Daniel Borkmann <daniel@iogearbox.net>\nAcked-by: Alexei Starovoitov <ast@kernel.org>\nAcked-by: John Fastabend <john.fastabend@gmail.com>\n---\n drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c      |   1 +\n drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   1 +\n drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   1 +\n drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   1 +\n drivers/net/ethernet/mellanox/mlx4/en_rx.c         |   1 +\n drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   1 +\n .../net/ethernet/netronome/nfp/nfp_net_common.c    |   1 +\n drivers/net/ethernet/qlogic/qede/qede_fp.c         |   1 +\n drivers/net/tun.c                                  |   1 +\n drivers/net/virtio_net.c                           |   2 +\n include/linux/bpf.h                                |   1 +\n include/linux/filter.h                             |  21 +++-\n include/linux/skbuff.h                             |  68 +++++++++++-\n include/uapi/linux/bpf.h                           |  13 ++-\n kernel/bpf/verifier.c                              | 114 ++++++++++++++++-----\n net/bpf/test_run.c                                 |   1 +\n net/core/dev.c                                     |  31 +++++-\n net/core/filter.c                                  |  77 +++++++++++++-\n net/core/skbuff.c                                  |   2 +\n 19 files changed, 297 insertions(+), 42 deletions(-)","diff":"diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c\nindex d8f0c83..06ce63c 100644\n--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c\n+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c\n@@ -94,6 +94,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,\n \n \txdp.data_hard_start = *data_ptr - offset;\n \txdp.data = 
*data_ptr;\n+\txdp_set_data_meta_invalid(&xdp);\n \txdp.data_end = *data_ptr + *len;\n \torig_data = xdp.data;\n \tmapping = rx_buf->mapping - bp->rx_dma_offset;\ndiff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c\nindex 49b80da..d68478a 100644\n--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c\n+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c\n@@ -523,6 +523,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,\n \n \txdp.data_hard_start = page_address(page);\n \txdp.data = (void *)cpu_addr;\n+\txdp_set_data_meta_invalid(&xdp);\n \txdp.data_end = xdp.data + len;\n \torig_data = xdp.data;\n \ndiff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c\nindex 1519dfb..f426762 100644\n--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c\n+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c\n@@ -2107,6 +2107,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)\n \t\tif (!skb) {\n \t\t\txdp.data = page_address(rx_buffer->page) +\n \t\t\t\t   rx_buffer->page_offset;\n+\t\t\txdp_set_data_meta_invalid(&xdp);\n \t\t\txdp.data_hard_start = xdp.data -\n \t\t\t\t\t      i40e_rx_offset(rx_ring);\n \t\t\txdp.data_end = xdp.data + size;\ndiff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c\nindex d962368..04bb03b 100644\n--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c\n+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c\n@@ -2326,6 +2326,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,\n \t\tif (!skb) {\n \t\t\txdp.data = page_address(rx_buffer->page) +\n \t\t\t\t   rx_buffer->page_offset;\n+\t\t\txdp_set_data_meta_invalid(&xdp);\n \t\t\txdp.data_hard_start = xdp.data -\n \t\t\t\t\t      ixgbe_rx_offset(rx_ring);\n \t\t\txdp.data_end = xdp.data + size;\ndiff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c\nindex b97a55c8..8f9cb8a 100644\n--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c\n+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c\n@@ -762,6 +762,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud\n \n \t\t\txdp.data_hard_start = va - frags[0].page_offset;\n \t\t\txdp.data = va;\n+\t\t\txdp_set_data_meta_invalid(&xdp);\n \t\t\txdp.data_end = xdp.data + length;\n \t\t\torig_data = xdp.data;\n \ndiff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c\nindex f1dd638..30b3f3f 100644\n--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c\n+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c\n@@ -794,6 +794,7 @@ static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,\n \t\treturn false;\n \n \txdp.data = va + *rx_headroom;\n+\txdp_set_data_meta_invalid(&xdp);\n \txdp.data_end = xdp.data + *len;\n \txdp.data_hard_start = va;\n \ndiff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c\nindex 1c0187f..e3a38be 100644\n--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c\n+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c\n@@ -1583,6 +1583,7 @@ static int nfp_net_run_xdp(struct bpf_prog *prog, void *data, void *hard_start,\n \n \txdp.data_hard_start = hard_start;\n \txdp.data = data + *off;\n+\txdp_set_data_meta_invalid(&xdp);\n \txdp.data_end = data + *off + *len;\n \n \torig_data = xdp.data;\ndiff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c\nindex 6fc854b..48ec4c5 100644\n--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c\n+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c\n@@ -1004,6 +1004,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,\n \n \txdp.data_hard_start = page_address(bd->data);\n \txdp.data = xdp.data_hard_start + *data_offset;\n+\txdp_set_data_meta_invalid(&xdp);\n \txdp.data_end = xdp.data + *len;\n 
\n \t/* Queues always have a full reset currently, so for the time\ndiff --git a/drivers/net/tun.c b/drivers/net/tun.c\nindex 3c9985f..1757fd7 100644\n--- a/drivers/net/tun.c\n+++ b/drivers/net/tun.c\n@@ -1314,6 +1314,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,\n \n \t\txdp.data_hard_start = buf;\n \t\txdp.data = buf + pad;\n+\t\txdp_set_data_meta_invalid(&xdp);\n \t\txdp.data_end = xdp.data + len;\n \t\torig_data = xdp.data;\n \t\tact = bpf_prog_run_xdp(xdp_prog, &xdp);\ndiff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c\nindex dd14a45..fc059f1 100644\n--- a/drivers/net/virtio_net.c\n+++ b/drivers/net/virtio_net.c\n@@ -554,6 +554,7 @@ static struct sk_buff *receive_small(struct net_device *dev,\n \n \t\txdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;\n \t\txdp.data = xdp.data_hard_start + xdp_headroom;\n+\t\txdp_set_data_meta_invalid(&xdp);\n \t\txdp.data_end = xdp.data + len;\n \t\torig_data = xdp.data;\n \t\tact = bpf_prog_run_xdp(xdp_prog, &xdp);\n@@ -686,6 +687,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,\n \t\tdata = page_address(xdp_page) + offset;\n \t\txdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;\n \t\txdp.data = data + vi->hdr_len;\n+\t\txdp_set_data_meta_invalid(&xdp);\n \t\txdp.data_end = xdp.data + (len - vi->hdr_len);\n \t\tact = bpf_prog_run_xdp(xdp_prog, &xdp);\n \ndiff --git a/include/linux/bpf.h b/include/linux/bpf.h\nindex 8390859..2b672c5 100644\n--- a/include/linux/bpf.h\n+++ b/include/linux/bpf.h\n@@ -137,6 +137,7 @@ enum bpf_reg_type {\n \tPTR_TO_MAP_VALUE,\t /* reg points to map element value */\n \tPTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */\n \tPTR_TO_STACK,\t\t /* reg == frame_pointer + offset */\n+\tPTR_TO_PACKET_META,\t /* skb->data - meta_len */\n \tPTR_TO_PACKET,\t\t /* reg points to skb->data */\n \tPTR_TO_PACKET_END,\t /* skb->data + headlen */\n };\ndiff --git a/include/linux/filter.h b/include/linux/filter.h\nindex 
052bab3..911d454 100644\n--- a/include/linux/filter.h\n+++ b/include/linux/filter.h\n@@ -487,12 +487,14 @@ struct sk_filter {\n \n struct bpf_skb_data_end {\n \tstruct qdisc_skb_cb qdisc_cb;\n+\tvoid *data_meta;\n \tvoid *data_end;\n };\n \n struct xdp_buff {\n \tvoid *data;\n \tvoid *data_end;\n+\tvoid *data_meta;\n \tvoid *data_hard_start;\n };\n \n@@ -507,7 +509,8 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)\n \tstruct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;\n \n \tBUILD_BUG_ON(sizeof(*cb) > FIELD_SIZEOF(struct sk_buff, cb));\n-\tcb->data_end = skb->data + skb_headlen(skb);\n+\tcb->data_meta = skb->data - skb_metadata_len(skb);\n+\tcb->data_end  = skb->data + skb_headlen(skb);\n }\n \n static inline u8 *bpf_skb_cb(struct sk_buff *skb)\n@@ -728,8 +731,22 @@ int xdp_do_redirect(struct net_device *dev,\n \t\t    struct bpf_prog *prog);\n void xdp_do_flush_map(void);\n \n+/* Drivers not supporting XDP metadata can use this helper, which\n+ * rejects any room expansion for metadata as a result.\n+ */\n+static __always_inline void\n+xdp_set_data_meta_invalid(struct xdp_buff *xdp)\n+{\n+\txdp->data_meta = xdp->data + 1;\n+}\n+\n+static __always_inline bool\n+xdp_data_meta_unsupported(const struct xdp_buff *xdp)\n+{\n+\treturn unlikely(xdp->data_meta > xdp->data);\n+}\n+\n void bpf_warn_invalid_xdp_action(u32 act);\n-void bpf_warn_invalid_xdp_redirect(u32 ifindex);\n \n struct sock *do_sk_redirect_map(void);\n \ndiff --git a/include/linux/skbuff.h b/include/linux/skbuff.h\nindex f9db553..19e64bf 100644\n--- a/include/linux/skbuff.h\n+++ b/include/linux/skbuff.h\n@@ -489,8 +489,9 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,\n  * the end of the header data, ie. 
at skb->end.\n  */\n struct skb_shared_info {\n-\tunsigned short\t_unused;\n-\tunsigned char\tnr_frags;\n+\t__u8\t\t__unused;\n+\t__u8\t\tmeta_len;\n+\t__u8\t\tnr_frags;\n \t__u8\t\ttx_flags;\n \tunsigned short\tgso_size;\n \t/* Warning: this field is not always filled in (UFO)! */\n@@ -3400,6 +3401,69 @@ static inline ktime_t net_invalid_timestamp(void)\n \treturn 0;\n }\n \n+static inline u8 skb_metadata_len(const struct sk_buff *skb)\n+{\n+\treturn skb_shinfo(skb)->meta_len;\n+}\n+\n+static inline void *skb_metadata_end(const struct sk_buff *skb)\n+{\n+\treturn skb_mac_header(skb);\n+}\n+\n+static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,\n+\t\t\t\t\t  const struct sk_buff *skb_b,\n+\t\t\t\t\t  u8 meta_len)\n+{\n+\tconst void *a = skb_metadata_end(skb_a);\n+\tconst void *b = skb_metadata_end(skb_b);\n+\t/* Using more efficient varaiant than plain call to memcmp(). */\n+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64\n+\tu64 diffs = 0;\n+\n+\tswitch (meta_len) {\n+#define __it(x, op) (x -= sizeof(u##op))\n+#define __it_diff(a, b, op) (*(u##op *)__it(a, op)) ^ (*(u##op *)__it(b, op))\n+\tcase 32: diffs |= __it_diff(a, b, 64);\n+\tcase 24: diffs |= __it_diff(a, b, 64);\n+\tcase 16: diffs |= __it_diff(a, b, 64);\n+\tcase  8: diffs |= __it_diff(a, b, 64);\n+\t\tbreak;\n+\tcase 28: diffs |= __it_diff(a, b, 64);\n+\tcase 20: diffs |= __it_diff(a, b, 64);\n+\tcase 12: diffs |= __it_diff(a, b, 64);\n+\tcase  4: diffs |= __it_diff(a, b, 32);\n+\t\tbreak;\n+\t}\n+\treturn diffs;\n+#else\n+\treturn memcmp(a - meta_len, b - meta_len, meta_len);\n+#endif\n+}\n+\n+static inline bool skb_metadata_differs(const struct sk_buff *skb_a,\n+\t\t\t\t\tconst struct sk_buff *skb_b)\n+{\n+\tu8 len_a = skb_metadata_len(skb_a);\n+\tu8 len_b = skb_metadata_len(skb_b);\n+\n+\tif (!(len_a | len_b))\n+\t\treturn false;\n+\n+\treturn len_a != len_b ?\n+\t       true : __skb_metadata_differs(skb_a, skb_b, len_a);\n+}\n+\n+static inline void 
skb_metadata_set(struct sk_buff *skb, u8 meta_len)\n+{\n+\tskb_shinfo(skb)->meta_len = meta_len;\n+}\n+\n+static inline void skb_metadata_clear(struct sk_buff *skb)\n+{\n+\tskb_metadata_set(skb, 0);\n+}\n+\n struct sk_buff *skb_clone_sk(struct sk_buff *skb);\n \n #ifdef CONFIG_NETWORK_PHY_TIMESTAMPING\ndiff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h\nindex 43ab5c4..e43491a 100644\n--- a/include/uapi/linux/bpf.h\n+++ b/include/uapi/linux/bpf.h\n@@ -582,6 +582,12 @@ enum bpf_attach_type {\n  *\t@map: pointer to sockmap to update\n  *\t@key: key to insert/update sock in map\n  *\t@flags: same flags as map update elem\n+ *\n+ * int bpf_xdp_adjust_meta(xdp_md, delta)\n+ *     Adjust the xdp_md.data_meta by delta\n+ *     @xdp_md: pointer to xdp_md\n+ *     @delta: An positive/negative integer to be added to xdp_md.data_meta\n+ *     Return: 0 on success or negative on error\n  */\n #define __BPF_FUNC_MAPPER(FN)\t\t\\\n \tFN(unspec),\t\t\t\\\n@@ -638,6 +644,7 @@ enum bpf_attach_type {\n \tFN(redirect_map),\t\t\\\n \tFN(sk_redirect_map),\t\t\\\n \tFN(sock_map_update),\t\t\\\n+\tFN(xdp_adjust_meta),\n \n /* integer value in 'imm' field of BPF_CALL instruction selects which helper\n  * function eBPF program intends to call\n@@ -715,7 +722,7 @@ struct __sk_buff {\n \t__u32 data_end;\n \t__u32 napi_id;\n \n-\t/* accessed by BPF_PROG_TYPE_sk_skb types */\n+\t/* Accessed by BPF_PROG_TYPE_sk_skb types from here to ... */\n \t__u32 family;\n \t__u32 remote_ip4;\t/* Stored in network byte order */\n \t__u32 local_ip4;\t/* Stored in network byte order */\n@@ -723,6 +730,9 @@ struct __sk_buff {\n \t__u32 local_ip6[4];\t/* Stored in network byte order */\n \t__u32 remote_port;\t/* Stored in network byte order */\n \t__u32 local_port;\t/* stored in host byte order */\n+\t/* ... here. 
*/\n+\n+\t__u32 data_meta;\n };\n \n struct bpf_tunnel_key {\n@@ -783,6 +793,7 @@ enum xdp_action {\n struct xdp_md {\n \t__u32 data;\n \t__u32 data_end;\n+\t__u32 data_meta;\n };\n \n enum sk_action {\ndiff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c\nindex b914fbe..f849eca 100644\n--- a/kernel/bpf/verifier.c\n+++ b/kernel/bpf/verifier.c\n@@ -177,6 +177,12 @@ static __printf(1, 2) void verbose(const char *fmt, ...)\n \tva_end(args);\n }\n \n+static bool type_is_pkt_pointer(enum bpf_reg_type type)\n+{\n+\treturn type == PTR_TO_PACKET ||\n+\t       type == PTR_TO_PACKET_META;\n+}\n+\n /* string representation of 'enum bpf_reg_type' */\n static const char * const reg_type_str[] = {\n \t[NOT_INIT]\t\t= \"?\",\n@@ -187,6 +193,7 @@ static __printf(1, 2) void verbose(const char *fmt, ...)\n \t[PTR_TO_MAP_VALUE_OR_NULL] = \"map_value_or_null\",\n \t[PTR_TO_STACK]\t\t= \"fp\",\n \t[PTR_TO_PACKET]\t\t= \"pkt\",\n+\t[PTR_TO_PACKET_META]\t= \"pkt_meta\",\n \t[PTR_TO_PACKET_END]\t= \"pkt_end\",\n };\n \n@@ -226,7 +233,7 @@ static void print_verifier_state(struct bpf_verifier_state *state)\n \t\t\tverbose(\"(id=%d\", reg->id);\n \t\t\tif (t != SCALAR_VALUE)\n \t\t\t\tverbose(\",off=%d\", reg->off);\n-\t\t\tif (t == PTR_TO_PACKET)\n+\t\t\tif (type_is_pkt_pointer(t))\n \t\t\t\tverbose(\",r=%d\", reg->range);\n \t\t\telse if (t == CONST_PTR_TO_MAP ||\n \t\t\t\t t == PTR_TO_MAP_VALUE ||\n@@ -519,6 +526,31 @@ static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)\n \t__mark_reg_known_zero(regs + regno);\n }\n \n+static bool reg_is_pkt_pointer(const struct bpf_reg_state *reg)\n+{\n+\treturn type_is_pkt_pointer(reg->type);\n+}\n+\n+static bool reg_is_pkt_pointer_any(const struct bpf_reg_state *reg)\n+{\n+\treturn reg_is_pkt_pointer(reg) ||\n+\t       reg->type == PTR_TO_PACKET_END;\n+}\n+\n+/* Unmodified PTR_TO_PACKET[_META,_END] register from ctx access. 
*/\n+static bool reg_is_init_pkt_pointer(const struct bpf_reg_state *reg,\n+\t\t\t\t    enum bpf_reg_type which)\n+{\n+\t/* The register can already have a range from prior markings.\n+\t * This is fine as long as it hasn't been advanced from its\n+\t * origin.\n+\t */\n+\treturn reg->type == which &&\n+\t       reg->id == 0 &&\n+\t       reg->off == 0 &&\n+\t       tnum_equals_const(reg->var_off, 0);\n+}\n+\n /* Attempts to improve min/max values based on var_off information */\n static void __update_reg_bounds(struct bpf_reg_state *reg)\n {\n@@ -702,6 +734,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)\n \tcase PTR_TO_STACK:\n \tcase PTR_TO_CTX:\n \tcase PTR_TO_PACKET:\n+\tcase PTR_TO_PACKET_META:\n \tcase PTR_TO_PACKET_END:\n \tcase CONST_PTR_TO_MAP:\n \t\treturn true;\n@@ -1047,7 +1080,10 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,\n \n \tswitch (reg->type) {\n \tcase PTR_TO_PACKET:\n-\t\t/* special case, because of NET_IP_ALIGN */\n+\tcase PTR_TO_PACKET_META:\n+\t\t/* Special case, because of NET_IP_ALIGN. Given metadata sits\n+\t\t * right in front, treat it the very same way.\n+\t\t */\n \t\treturn check_pkt_ptr_alignment(reg, off, size, strict);\n \tcase PTR_TO_MAP_VALUE:\n \t\tpointer_desc = \"value \";\n@@ -1124,8 +1160,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn\n \t\terr = check_ctx_access(env, insn_idx, off, size, t, &reg_type);\n \t\tif (!err && t == BPF_READ && value_regno >= 0) {\n \t\t\t/* ctx access returns either a scalar, or a\n-\t\t\t * PTR_TO_PACKET[_END].  In the latter case, we know\n-\t\t\t * the offset is zero.\n+\t\t\t * PTR_TO_PACKET[_META,_END]. 
In the latter\n+\t\t\t * case, we know the offset is zero.\n \t\t\t */\n \t\t\tif (reg_type == SCALAR_VALUE)\n \t\t\t\tmark_reg_unknown(state->regs, value_regno);\n@@ -1170,7 +1206,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn\n \t\t} else {\n \t\t\terr = check_stack_read(state, off, size, value_regno);\n \t\t}\n-\t} else if (reg->type == PTR_TO_PACKET) {\n+\t} else if (reg_is_pkt_pointer(reg)) {\n \t\tif (t == BPF_WRITE && !may_access_direct_pkt_data(env, NULL, t)) {\n \t\t\tverbose(\"cannot write into packet\\n\");\n \t\t\treturn -EACCES;\n@@ -1310,6 +1346,7 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,\n \n \tswitch (reg->type) {\n \tcase PTR_TO_PACKET:\n+\tcase PTR_TO_PACKET_META:\n \t\treturn check_packet_access(env, regno, reg->off, access_size);\n \tcase PTR_TO_MAP_VALUE:\n \t\treturn check_map_access(env, regno, reg->off, access_size);\n@@ -1342,7 +1379,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,\n \t\treturn 0;\n \t}\n \n-\tif (type == PTR_TO_PACKET &&\n+\tif (type_is_pkt_pointer(type) &&\n \t    !may_access_direct_pkt_data(env, meta, BPF_READ)) {\n \t\tverbose(\"helper access to the packet is not allowed\\n\");\n \t\treturn -EACCES;\n@@ -1351,7 +1388,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,\n \tif (arg_type == ARG_PTR_TO_MAP_KEY ||\n \t    arg_type == ARG_PTR_TO_MAP_VALUE) {\n \t\texpected_type = PTR_TO_STACK;\n-\t\tif (type != PTR_TO_PACKET && type != expected_type)\n+\t\tif (!type_is_pkt_pointer(type) &&\n+\t\t    type != expected_type)\n \t\t\tgoto err_type;\n \t} else if (arg_type == ARG_CONST_SIZE ||\n \t\t   arg_type == ARG_CONST_SIZE_OR_ZERO) {\n@@ -1375,7 +1413,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,\n \t\t */\n \t\tif (register_is_null(*reg))\n \t\t\t/* final test in check_stack_boundary() */;\n-\t\telse if (type != PTR_TO_PACKET && type != PTR_TO_MAP_VALUE &&\n+\t\telse if 
(!type_is_pkt_pointer(type) &&\n+\t\t\t type != PTR_TO_MAP_VALUE &&\n \t\t\t type != expected_type)\n \t\t\tgoto err_type;\n \t\tmeta->raw_mode = arg_type == ARG_PTR_TO_UNINIT_MEM;\n@@ -1401,7 +1440,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,\n \t\t\tverbose(\"invalid map_ptr to access map->key\\n\");\n \t\t\treturn -EACCES;\n \t\t}\n-\t\tif (type == PTR_TO_PACKET)\n+\t\tif (type_is_pkt_pointer(type))\n \t\t\terr = check_packet_access(env, regno, reg->off,\n \t\t\t\t\t\t  meta->map_ptr->key_size);\n \t\telse\n@@ -1417,7 +1456,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,\n \t\t\tverbose(\"invalid map_ptr to access map->value\\n\");\n \t\t\treturn -EACCES;\n \t\t}\n-\t\tif (type == PTR_TO_PACKET)\n+\t\tif (type_is_pkt_pointer(type))\n \t\t\terr = check_packet_access(env, regno, reg->off,\n \t\t\t\t\t\t  meta->map_ptr->value_size);\n \t\telse\n@@ -1590,8 +1629,8 @@ static int check_raw_mode(const struct bpf_func_proto *fn)\n \treturn count > 1 ? 
-EINVAL : 0;\n }\n \n-/* Packet data might have moved, any old PTR_TO_PACKET[_END] are now invalid,\n- * so turn them into unknown SCALAR_VALUE.\n+/* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]\n+ * are now invalid, so turn them into unknown SCALAR_VALUE.\n  */\n static void clear_all_pkt_pointers(struct bpf_verifier_env *env)\n {\n@@ -1600,18 +1639,15 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)\n \tint i;\n \n \tfor (i = 0; i < MAX_BPF_REG; i++)\n-\t\tif (regs[i].type == PTR_TO_PACKET ||\n-\t\t    regs[i].type == PTR_TO_PACKET_END)\n+\t\tif (reg_is_pkt_pointer_any(&regs[i]))\n \t\t\tmark_reg_unknown(regs, i);\n \n \tfor (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {\n \t\tif (state->stack_slot_type[i] != STACK_SPILL)\n \t\t\tcontinue;\n \t\treg = &state->spilled_regs[i / BPF_REG_SIZE];\n-\t\tif (reg->type != PTR_TO_PACKET &&\n-\t\t    reg->type != PTR_TO_PACKET_END)\n-\t\t\tcontinue;\n-\t\t__mark_reg_unknown(reg);\n+\t\tif (reg_is_pkt_pointer_any(reg))\n+\t\t\t__mark_reg_unknown(reg);\n \t}\n }\n \n@@ -1871,7 +1907,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,\n \t\t}\n \t\tdst_reg->var_off = tnum_add(ptr_reg->var_off, off_reg->var_off);\n \t\tdst_reg->off = ptr_reg->off;\n-\t\tif (ptr_reg->type == PTR_TO_PACKET) {\n+\t\tif (reg_is_pkt_pointer(ptr_reg)) {\n \t\t\tdst_reg->id = ++env->id_gen;\n \t\t\t/* something was added to pkt_ptr, set range to zero */\n \t\t\tdst_reg->range = 0;\n@@ -1931,7 +1967,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,\n \t\t}\n \t\tdst_reg->var_off = tnum_sub(ptr_reg->var_off, off_reg->var_off);\n \t\tdst_reg->off = ptr_reg->off;\n-\t\tif (ptr_reg->type == PTR_TO_PACKET) {\n+\t\tif (reg_is_pkt_pointer(ptr_reg)) {\n \t\t\tdst_reg->id = ++env->id_gen;\n \t\t\t/* something was added to pkt_ptr, set range to zero */\n \t\t\tif (smin_val < 0)\n@@ -2421,7 +2457,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)\n }\n \n 
static void find_good_pkt_pointers(struct bpf_verifier_state *state,\n-\t\t\t\t   struct bpf_reg_state *dst_reg)\n+\t\t\t\t   struct bpf_reg_state *dst_reg,\n+\t\t\t\t   enum bpf_reg_type type)\n {\n \tstruct bpf_reg_state *regs = state->regs, *reg;\n \tint i;\n@@ -2483,7 +2520,7 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *state,\n \t * dst_reg->off is known < MAX_PACKET_OFF, therefore it fits in a u16.\n \t */\n \tfor (i = 0; i < MAX_BPF_REG; i++)\n-\t\tif (regs[i].type == PTR_TO_PACKET && regs[i].id == dst_reg->id)\n+\t\tif (regs[i].type == type && regs[i].id == dst_reg->id)\n \t\t\t/* keep the maximum range already checked */\n \t\t\tregs[i].range = max_t(u16, regs[i].range, dst_reg->off);\n \n@@ -2491,7 +2528,7 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *state,\n \t\tif (state->stack_slot_type[i] != STACK_SPILL)\n \t\t\tcontinue;\n \t\treg = &state->spilled_regs[i / BPF_REG_SIZE];\n-\t\tif (reg->type == PTR_TO_PACKET && reg->id == dst_reg->id)\n+\t\tif (reg->type == type && reg->id == dst_reg->id)\n \t\t\treg->range = max_t(u16, reg->range, dst_reg->off);\n \t}\n }\n@@ -2856,19 +2893,39 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,\n \t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JGT &&\n \t\t   dst_reg->type == PTR_TO_PACKET &&\n \t\t   regs[insn->src_reg].type == PTR_TO_PACKET_END) {\n-\t\tfind_good_pkt_pointers(this_branch, dst_reg);\n+\t\tfind_good_pkt_pointers(this_branch, dst_reg, PTR_TO_PACKET);\n \t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JLT &&\n \t\t   dst_reg->type == PTR_TO_PACKET &&\n \t\t   regs[insn->src_reg].type == PTR_TO_PACKET_END) {\n-\t\tfind_good_pkt_pointers(other_branch, dst_reg);\n+\t\tfind_good_pkt_pointers(other_branch, dst_reg, PTR_TO_PACKET);\n \t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JGE &&\n \t\t   dst_reg->type == PTR_TO_PACKET_END &&\n \t\t   regs[insn->src_reg].type == PTR_TO_PACKET) 
{\n-\t\tfind_good_pkt_pointers(other_branch, &regs[insn->src_reg]);\n+\t\tfind_good_pkt_pointers(other_branch, &regs[insn->src_reg],\n+\t\t\t\t       PTR_TO_PACKET);\n \t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JLE &&\n \t\t   dst_reg->type == PTR_TO_PACKET_END &&\n \t\t   regs[insn->src_reg].type == PTR_TO_PACKET) {\n-\t\tfind_good_pkt_pointers(this_branch, &regs[insn->src_reg]);\n+\t\tfind_good_pkt_pointers(this_branch, &regs[insn->src_reg],\n+\t\t\t\t       PTR_TO_PACKET);\n+\t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JGT &&\n+\t\t   dst_reg->type == PTR_TO_PACKET_META &&\n+\t\t   reg_is_init_pkt_pointer(&regs[insn->src_reg], PTR_TO_PACKET)) {\n+\t\tfind_good_pkt_pointers(this_branch, dst_reg, PTR_TO_PACKET_META);\n+\t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JLT &&\n+\t\t   dst_reg->type == PTR_TO_PACKET_META &&\n+\t\t   reg_is_init_pkt_pointer(&regs[insn->src_reg], PTR_TO_PACKET)) {\n+\t\tfind_good_pkt_pointers(other_branch, dst_reg, PTR_TO_PACKET_META);\n+\t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JGE &&\n+\t\t   reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&\n+\t\t   regs[insn->src_reg].type == PTR_TO_PACKET_META) {\n+\t\tfind_good_pkt_pointers(other_branch, &regs[insn->src_reg],\n+\t\t\t\t       PTR_TO_PACKET_META);\n+\t} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JLE &&\n+\t\t   reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&\n+\t\t   regs[insn->src_reg].type == PTR_TO_PACKET_META) {\n+\t\tfind_good_pkt_pointers(this_branch, &regs[insn->src_reg],\n+\t\t\t\t       PTR_TO_PACKET_META);\n \t} else if (is_pointer_value(env, insn->dst_reg)) {\n \t\tverbose(\"R%d pointer comparison prohibited\\n\", insn->dst_reg);\n \t\treturn -EACCES;\n@@ -3298,8 +3355,9 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,\n \t\t\treturn false;\n \t\t/* Check our ids match any regs they're supposed to */\n \t\treturn check_ids(rold->id, rcur->id, idmap);\n+\tcase 
PTR_TO_PACKET_META:\n \tcase PTR_TO_PACKET:\n-\t\tif (rcur->type != PTR_TO_PACKET)\n+\t\tif (rcur->type != rold->type)\n \t\t\treturn false;\n \t\t/* We must have at least as much range as the old ptr\n \t\t * did, so that any accesses which were safe before are\ndiff --git a/net/bpf/test_run.c b/net/bpf/test_run.c\nindex df67251..a86e668 100644\n--- a/net/bpf/test_run.c\n+++ b/net/bpf/test_run.c\n@@ -162,6 +162,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,\n \n \txdp.data_hard_start = data;\n \txdp.data = data + XDP_PACKET_HEADROOM + NET_IP_ALIGN;\n+\txdp.data_meta = xdp.data;\n \txdp.data_end = xdp.data + size;\n \n \tretval = bpf_test_run(prog, &xdp, repeat, &duration);\ndiff --git a/net/core/dev.c b/net/core/dev.c\nindex 97abddd..e350c76 100644\n--- a/net/core/dev.c\n+++ b/net/core/dev.c\n@@ -3864,8 +3864,8 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,\n static u32 netif_receive_generic_xdp(struct sk_buff *skb,\n \t\t\t\t     struct bpf_prog *xdp_prog)\n {\n+\tu32 metalen, act = XDP_DROP;\n \tstruct xdp_buff xdp;\n-\tu32 act = XDP_DROP;\n \tvoid *orig_data;\n \tint hlen, off;\n \tu32 mac_len;\n@@ -3876,8 +3876,25 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,\n \tif (skb_cloned(skb))\n \t\treturn XDP_PASS;\n \n-\tif (skb_linearize(skb))\n-\t\tgoto do_drop;\n+\t/* XDP packets must be linear and must have sufficient headroom\n+\t * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also\n+\t * native XDP provides, thus we need to do it here as well.\n+\t */\n+\tif (skb_is_nonlinear(skb) ||\n+\t    skb_headroom(skb) < XDP_PACKET_HEADROOM) {\n+\t\tint hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);\n+\t\tint troom = skb->tail + skb->data_len - skb->end;\n+\n+\t\t/* In case we have to go down the path and also linearize,\n+\t\t * then lets do the pskb_expand_head() work just once here.\n+\t\t */\n+\t\tif (pskb_expand_head(skb,\n+\t\t\t\t     hroom > 0 ? 
ALIGN(hroom, NET_SKB_PAD) : 0,\n+\t\t\t\t     troom > 0 ? troom + 128 : 0, GFP_ATOMIC))\n+\t\t\tgoto do_drop;\n+\t\tif (troom > 0 && __skb_linearize(skb))\n+\t\t\tgoto do_drop;\n+\t}\n \n \t/* The XDP program wants to see the packet starting at the MAC\n \t * header.\n@@ -3885,6 +3902,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,\n \tmac_len = skb->data - skb_mac_header(skb);\n \thlen = skb_headlen(skb) + mac_len;\n \txdp.data = skb->data - mac_len;\n+\txdp.data_meta = xdp.data;\n \txdp.data_end = xdp.data + hlen;\n \txdp.data_hard_start = skb->data - skb_headroom(skb);\n \torig_data = xdp.data;\n@@ -3902,10 +3920,12 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,\n \tcase XDP_REDIRECT:\n \tcase XDP_TX:\n \t\t__skb_push(skb, mac_len);\n-\t\t/* fall through */\n+\t\tbreak;\n \tcase XDP_PASS:\n+\t\tmetalen = xdp.data - xdp.data_meta;\n+\t\tif (metalen)\n+\t\t\tskb_metadata_set(skb, metalen);\n \t\tbreak;\n-\n \tdefault:\n \t\tbpf_warn_invalid_xdp_action(act);\n \t\t/* fall through */\n@@ -4695,6 +4715,7 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)\n \t\tdiffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;\n \t\tdiffs |= p->vlan_tci ^ skb->vlan_tci;\n \t\tdiffs |= skb_metadata_dst_cmp(p, skb);\n+\t\tdiffs |= skb_metadata_differs(p, skb);\n \t\tif (maclen == ETH_HLEN)\n \t\t\tdiffs |= compare_ether_header(skb_mac_header(p),\n \t\t\t\t\t\t      skb_mac_header(skb));\ndiff --git a/net/core/filter.c b/net/core/filter.c\nindex c468e7c..9b6e7e8 100644\n--- a/net/core/filter.c\n+++ b/net/core/filter.c\n@@ -2447,14 +2447,26 @@ static int bpf_skb_trim_rcsum(struct sk_buff *skb, unsigned int new_len)\n \t.arg3_type\t= ARG_ANYTHING,\n };\n \n+static unsigned long xdp_get_metalen(const struct xdp_buff *xdp)\n+{\n+\treturn xdp_data_meta_unsupported(xdp) ? 
0 :\n+\t       xdp->data - xdp->data_meta;\n+}\n+\n BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)\n {\n+\tunsigned long metalen = xdp_get_metalen(xdp);\n+\tvoid *data_start = xdp->data_hard_start + metalen;\n \tvoid *data = xdp->data + offset;\n \n-\tif (unlikely(data < xdp->data_hard_start ||\n+\tif (unlikely(data < data_start ||\n \t\t     data > xdp->data_end - ETH_HLEN))\n \t\treturn -EINVAL;\n \n+\tif (metalen)\n+\t\tmemmove(xdp->data_meta + offset,\n+\t\t\txdp->data_meta, metalen);\n+\txdp->data_meta += offset;\n \txdp->data = data;\n \n \treturn 0;\n@@ -2468,6 +2480,33 @@ static int bpf_skb_trim_rcsum(struct sk_buff *skb, unsigned int new_len)\n \t.arg2_type\t= ARG_ANYTHING,\n };\n \n+BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)\n+{\n+\tvoid *meta = xdp->data_meta + offset;\n+\tunsigned long metalen = xdp->data - meta;\n+\n+\tif (xdp_data_meta_unsupported(xdp))\n+\t\treturn -ENOTSUPP;\n+\tif (unlikely(meta < xdp->data_hard_start ||\n+\t\t     meta > xdp->data))\n+\t\treturn -EINVAL;\n+\tif (unlikely((metalen & (sizeof(__u32) - 1)) ||\n+\t\t     (metalen > 32)))\n+\t\treturn -EACCES;\n+\n+\txdp->data_meta = meta;\n+\n+\treturn 0;\n+}\n+\n+static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {\n+\t.func\t\t= bpf_xdp_adjust_meta,\n+\t.gpl_only\t= false,\n+\t.ret_type\t= RET_INTEGER,\n+\t.arg1_type\t= ARG_PTR_TO_CTX,\n+\t.arg2_type\t= ARG_ANYTHING,\n+};\n+\n static int __bpf_tx_xdp(struct net_device *dev,\n \t\t\tstruct bpf_map *map,\n \t\t\tstruct xdp_buff *xdp,\n@@ -2692,7 +2731,8 @@ bool bpf_helper_changes_pkt_data(void *func)\n \t    func == bpf_clone_redirect ||\n \t    func == bpf_l3_csum_replace ||\n \t    func == bpf_l4_csum_replace ||\n-\t    func == bpf_xdp_adjust_head)\n+\t    func == bpf_xdp_adjust_head ||\n+\t    func == bpf_xdp_adjust_meta)\n \t\treturn true;\n \n \treturn false;\n@@ -3288,6 +3328,8 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,\n \t\treturn 
&bpf_get_smp_processor_id_proto;\n \tcase BPF_FUNC_xdp_adjust_head:\n \t\treturn &bpf_xdp_adjust_head_proto;\n+\tcase BPF_FUNC_xdp_adjust_meta:\n+\t\treturn &bpf_xdp_adjust_meta_proto;\n \tcase BPF_FUNC_redirect:\n \t\treturn &bpf_xdp_redirect_proto;\n \tcase BPF_FUNC_redirect_map:\n@@ -3418,6 +3460,7 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type\n \tcase bpf_ctx_range_till(struct __sk_buff, remote_ip4, remote_ip4):\n \tcase bpf_ctx_range_till(struct __sk_buff, local_ip4, local_ip4):\n \tcase bpf_ctx_range(struct __sk_buff, data):\n+\tcase bpf_ctx_range(struct __sk_buff, data_meta):\n \tcase bpf_ctx_range(struct __sk_buff, data_end):\n \t\tif (size != size_default)\n \t\t\treturn false;\n@@ -3444,6 +3487,7 @@ static bool sk_filter_is_valid_access(int off, int size,\n \tswitch (off) {\n \tcase bpf_ctx_range(struct __sk_buff, tc_classid):\n \tcase bpf_ctx_range(struct __sk_buff, data):\n+\tcase bpf_ctx_range(struct __sk_buff, data_meta):\n \tcase bpf_ctx_range(struct __sk_buff, data_end):\n \tcase bpf_ctx_range_till(struct __sk_buff, family, local_port):\n \t\treturn false;\n@@ -3468,6 +3512,7 @@ static bool lwt_is_valid_access(int off, int size,\n \tswitch (off) {\n \tcase bpf_ctx_range(struct __sk_buff, tc_classid):\n \tcase bpf_ctx_range_till(struct __sk_buff, family, local_port):\n+\tcase bpf_ctx_range(struct __sk_buff, data_meta):\n \t\treturn false;\n \t}\n \n@@ -3586,6 +3631,9 @@ static bool tc_cls_act_is_valid_access(int off, int size,\n \tcase bpf_ctx_range(struct __sk_buff, data):\n \t\tinfo->reg_type = PTR_TO_PACKET;\n \t\tbreak;\n+\tcase bpf_ctx_range(struct __sk_buff, data_meta):\n+\t\tinfo->reg_type = PTR_TO_PACKET_META;\n+\t\tbreak;\n \tcase bpf_ctx_range(struct __sk_buff, data_end):\n \t\tinfo->reg_type = PTR_TO_PACKET_END;\n \t\tbreak;\n@@ -3619,6 +3667,9 @@ static bool xdp_is_valid_access(int off, int size,\n \tcase offsetof(struct xdp_md, data):\n \t\tinfo->reg_type = PTR_TO_PACKET;\n \t\tbreak;\n+\tcase 
offsetof(struct xdp_md, data_meta):\n+\t\tinfo->reg_type = PTR_TO_PACKET_META;\n+\t\tbreak;\n \tcase offsetof(struct xdp_md, data_end):\n \t\tinfo->reg_type = PTR_TO_PACKET_END;\n \t\tbreak;\n@@ -3677,6 +3728,12 @@ static bool sk_skb_is_valid_access(int off, int size,\n \t\t\t\t   enum bpf_access_type type,\n \t\t\t\t   struct bpf_insn_access_aux *info)\n {\n+\tswitch (off) {\n+\tcase bpf_ctx_range(struct __sk_buff, tc_classid):\n+\tcase bpf_ctx_range(struct __sk_buff, data_meta):\n+\t\treturn false;\n+\t}\n+\n \tif (type == BPF_WRITE) {\n \t\tswitch (off) {\n \t\tcase bpf_ctx_range(struct __sk_buff, mark):\n@@ -3689,8 +3746,6 @@ static bool sk_skb_is_valid_access(int off, int size,\n \t}\n \n \tswitch (off) {\n-\tcase bpf_ctx_range(struct __sk_buff, tc_classid):\n-\t\treturn false;\n \tcase bpf_ctx_range(struct __sk_buff, data):\n \t\tinfo->reg_type = PTR_TO_PACKET;\n \t\tbreak;\n@@ -3847,6 +3902,15 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,\n \t\t\t\t      offsetof(struct sk_buff, data));\n \t\tbreak;\n \n+\tcase offsetof(struct __sk_buff, data_meta):\n+\t\toff  = si->off;\n+\t\toff -= offsetof(struct __sk_buff, data_meta);\n+\t\toff += offsetof(struct sk_buff, cb);\n+\t\toff += offsetof(struct bpf_skb_data_end, data_meta);\n+\t\t*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg,\n+\t\t\t\t      si->src_reg, off);\n+\t\tbreak;\n+\n \tcase offsetof(struct __sk_buff, data_end):\n \t\toff  = si->off;\n \t\toff -= offsetof(struct __sk_buff, data_end);\n@@ -4095,6 +4159,11 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,\n \t\t\t\t      si->dst_reg, si->src_reg,\n \t\t\t\t      offsetof(struct xdp_buff, data));\n \t\tbreak;\n+\tcase offsetof(struct xdp_md, data_meta):\n+\t\t*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_meta),\n+\t\t\t\t      si->dst_reg, si->src_reg,\n+\t\t\t\t      offsetof(struct xdp_buff, data_meta));\n+\t\tbreak;\n \tcase offsetof(struct xdp_md, data_end):\n \t\t*insn++ = 
BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_end),\n \t\t\t\t      si->dst_reg, si->src_reg,\ndiff --git a/net/core/skbuff.c b/net/core/skbuff.c\nindex 16982de..681177b 100644\n--- a/net/core/skbuff.c\n+++ b/net/core/skbuff.c\n@@ -1509,6 +1509,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,\n \tskb->nohdr    = 0;\n \tatomic_set(&skb_shinfo(skb)->dataref, 1);\n \n+\tskb_metadata_clear(skb);\n+\n \t/* It is not generally safe to change skb->truesize.\n \t * For the moment, we really care of rx path, or\n \t * when skb is orphaned (not attached to a socket).\n","prefixes":["net-next","2/6"]}