diff mbox

[net-next] ebpf: add skb->hash to offset map for usage in {cls,act}_bpf or filters

Message ID 4b25d0b28e9c366a8a307d855d1690038956e6bd.1438382433.git.daniel@iogearbox.net
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Borkmann July 31, 2015, 10:46 p.m. UTC
Add skb->hash to the __sk_buff offset map, so it can be accessed from
an eBPF program. We currently already do this for classic BPF filters,
but not yet on eBPF, it might be useful as a demuxer in combination with
helpers like bpf_clone_redirect(), toy example:

  __section("cls-lb") int ingress_main(struct __sk_buff *skb)
  {
    unsigned int which = 3 + (skb->hash & 7);
    /* bpf_skb_store_bytes(skb, ...); */
    /* bpf_l{3,4}_csum_replace(skb, ...); */
    bpf_clone_redirect(skb, which, 0);
    return -1;
  }

I was thinking whether to add skb_get_hash(), but then concluded the
raw skb->hash seems fine in this case: we can directly access the hash
w/o extra eBPF helper function call, it's filled out by many NICs on
ingress, and in case the entropy level would not be sufficient, people
can still implement their own specific sw fallback hash mix anyway.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/bpf.h | 1 +
 net/core/filter.c        | 7 +++++++
 2 files changed, 8 insertions(+)

Comments

David Miller Aug. 3, 2015, 12:21 a.m. UTC | #1
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat,  1 Aug 2015 00:46:29 +0200

> Add skb->hash to the __sk_buff offset map, so it can be accessed from
> an eBPF program. We currently already do this for classic BPF filters,
> but not yet on eBPF, it might be useful as a demuxer in combination with
> helpers like bpf_clone_redirect(), toy example:
> 
>   __section("cls-lb") int ingress_main(struct __sk_buff *skb)
>   {
>     unsigned int which = 3 + (skb->hash & 7);
>     /* bpf_skb_store_bytes(skb, ...); */
>     /* bpf_l{3,4}_csum_replace(skb, ...); */
>     bpf_clone_redirect(skb, which, 0);
>     return -1;
>   }
> 
> I was thinking whether to add skb_get_hash(), but then concluded the
> raw skb->hash seems fine in this case: we can directly access the hash
> w/o extra eBPF helper function call, it's filled out by many NICs on
> ingress, and in case the entropy level would not be sufficient, people
> can still implement their own specific sw fallback hash mix anyway.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@plumgrid.com>

Applied, thanks Daniel.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert Aug. 3, 2015, 1:09 a.m. UTC | #2
On Fri, Jul 31, 2015 at 3:46 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Add skb->hash to the __sk_buff offset map, so it can be accessed from
> an eBPF program. We currently already do this for classic BPF filters,
> but not yet on eBPF, it might be useful as a demuxer in combination with
> helpers like bpf_clone_redirect(), toy example:
>
>   __section("cls-lb") int ingress_main(struct __sk_buff *skb)
>   {
>     unsigned int which = 3 + (skb->hash & 7);
>     /* bpf_skb_store_bytes(skb, ...); */
>     /* bpf_l{3,4}_csum_replace(skb, ...); */
>     bpf_clone_redirect(skb, which, 0);
>     return -1;
>   }
>
> I was thinking whether to add skb_get_hash(), but then concluded the
> raw skb->hash seems fine in this case: we can directly access the hash
> w/o extra eBPF helper function call, it's filled out by many NICs on
> ingress, and in case the entropy level would not be sufficient, people
> can still implement their own specific sw fallback hash mix anyway.
>
Maybe we should add the skb_get_hash also? It doesn't as useful if
some scenarios we get a valid hash and in others not.

Tom

> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
>  include/uapi/linux/bpf.h | 1 +
>  net/core/filter.c        | 7 +++++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index bc0d27d..2ce13c1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -290,6 +290,7 @@ struct __sk_buff {
>         __u32 ifindex;
>         __u32 tc_index;
>         __u32 cb[5];
> +       __u32 hash;
>  };
>
>  struct bpf_tunnel_key {
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1b72264..a50dbfa 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1711,6 +1711,13 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
>                                       offsetof(struct net_device, ifindex));
>                 break;
>
> +       case offsetof(struct __sk_buff, hash):
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
> +
> +               *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
> +                                     offsetof(struct sk_buff, hash));
> +               break;
> +
>         case offsetof(struct __sk_buff, mark):
>                 BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
>
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Aug. 3, 2015, 6:16 a.m. UTC | #3
On 8/2/15 6:09 PM, Tom Herbert wrote:
>> I was thinking whether to add skb_get_hash(), but then concluded the
>> >raw skb->hash seems fine in this case: we can directly access the hash
>> >w/o extra eBPF helper function call, it's filled out by many NICs on
>> >ingress, and in case the entropy level would not be sufficient, people
>> >can still implement their own specific sw fallback hash mix anyway.
>> >
> Maybe we should add the skb_get_hash also? It doesn't as useful if
> some scenarios we get a valid hash and in others not.

we also discussed whether it makes sense to expose l4_hash and sw_hash
bits as well. imo, seems a bit of overkill, since such call into sw hash
function like this exposes the logic of flow_dissector looking into
inner header. There are pros and cons. I think if we expose
flow_dissector it's cleaner to do it directly (instead of skb_get_hash).
Alternatively we can obfuscate skb_get_hash by calling it
'please compute some a hash on a packet somehow', but that will be
awkward to use. The programs can compute whatever hash they like anyway.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Aug. 3, 2015, 12:49 p.m. UTC | #4
On 08/03/2015 08:16 AM, Alexei Starovoitov wrote:
> On 8/2/15 6:09 PM, Tom Herbert wrote:
>>> I was thinking whether to add skb_get_hash(), but then concluded the
>>> >raw skb->hash seems fine in this case: we can directly access the hash
>>> >w/o extra eBPF helper function call, it's filled out by many NICs on
>>> >ingress, and in case the entropy level would not be sufficient, people
>>> >can still implement their own specific sw fallback hash mix anyway.
>>> >
>> Maybe we should add the skb_get_hash also? It doesn't as useful if
>> some scenarios we get a valid hash and in others not.
>
> we also discussed whether it makes sense to expose l4_hash and sw_hash
> bits as well. imo, seems a bit of overkill, since such call into sw hash
> function like this exposes the logic of flow_dissector looking into
> inner header. There are pros and cons. I think if we expose
> flow_dissector it's cleaner to do it directly (instead of skb_get_hash).
> Alternatively we can obfuscate skb_get_hash by calling it
> 'please compute some a hash on a packet somehow', but that will be
> awkward to use. The programs can compute whatever hash they like anyway.

I don't have a particularly strong opinion if you want to expose skb_get_hash()
to eBPF as well as a helper function (note, it will add a function call as
extra cost each time), but as Alexei says, there are pros and cons on either
way. We just have to be careful as what this is being advertised to uapi, so
we can reserve ourselves changes in future. I would definitely avoid to expose
l4_hash and sw_hash, though, as that should really remain an implementation
detail inside the kernel, and flow dissector for your own hash function you
could actually already code up for your specific use case in eBPF, hm.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bc0d27d..2ce13c1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -290,6 +290,7 @@  struct __sk_buff {
 	__u32 ifindex;
 	__u32 tc_index;
 	__u32 cb[5];
+	__u32 hash;
 };
 
 struct bpf_tunnel_key {
diff --git a/net/core/filter.c b/net/core/filter.c
index 1b72264..a50dbfa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1711,6 +1711,13 @@  static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 				      offsetof(struct net_device, ifindex));
 		break;
 
+	case offsetof(struct __sk_buff, hash):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
+				      offsetof(struct sk_buff, hash));
+		break;
+
 	case offsetof(struct __sk_buff, mark):
 		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);