diff mbox series

[bpf,1/2] flow_dissector: support FLOW_DISSECTOR_KEY_ETH_ADDRSwith BPF

Message ID 20190513185402.220122-1-sdf@google.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series [bpf,1/2] flow_dissector: support FLOW_DISSECTOR_KEY_ETH_ADDRSwith BPF | expand

Commit Message

Stanislav Fomichev May 13, 2019, 6:54 p.m. UTC
If we have a flow dissector BPF program attached to the namespace,
FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.

Handle FLOW_DISSECTOR_KEY_ETH_ADDRS before BPF and only if we have
an skb (used by tc-flower only).

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 net/core/flow_dissector.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

Comments

Willem de Bruijn May 13, 2019, 8:33 p.m. UTC | #1
On Mon, May 13, 2019 at 3:53 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> If we have a flow dissector BPF program attached to the namespace,
> FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.

I suppose that this is true for a variety of keys? For instance, also
FLOW_DISSECTOR_KEY_IPV4_ADDRS.

We originally intended BPF flow dissection for all paths except
tc_flower. As that catches all the vulnerable cases on the ingress
path on the one hand and it is infeasible to support all the
flower features, now and future. I think that is the real fix.

>
> Handle FLOW_DISSECTOR_KEY_ETH_ADDRS before BPF and only if we have
> an skb (used by tc-flower only).
>
> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  net/core/flow_dissector.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 9ca784c592ac..ba76d9168c8b 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -825,6 +825,18 @@ bool __skb_flow_dissect(const struct net *net,
>                         else if (skb->sk)
>                                 net = sock_net(skb->sk);
>                 }
> +
> +               if (dissector_uses_key(flow_dissector,
> +                                      FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
> +                       struct ethhdr *eth = eth_hdr(skb);

Here as well as in the original patch: is it safe to just cast to
eth_hdr? In the same file, __skb_flow_dissect_gre does test for
(encapsulated) protocol first.
Stanislav Fomichev May 13, 2019, 9:02 p.m. UTC | #2
On 05/13, Willem de Bruijn wrote:
> On Mon, May 13, 2019 at 3:53 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > If we have a flow dissector BPF program attached to the namespace,
> > FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.
> 
> I suppose that this is true for a variety of keys? For instance, also
> FLOW_DISSECTOR_KEY_IPV4_ADDRS.
I though the intent was to support most of the basic stuff (eth/ip/tcp/udp)
without any esoteric protocols. Not sure about FLOW_DISSECTOR_KEY_IPV4_ADDRS,
looks like we support that (except FLOW_DISSECTOR_KEY_TIPC part).

> We originally intended BPF flow dissection for all paths except
> tc_flower. As that catches all the vulnerable cases on the ingress
> path on the one hand and it is infeasible to support all the
> flower features, now and future. I think that is the real fix.
Sorry, didn't get what you meant by the real fix.
Don't care about tc_flower? Just support a minimal set of features
needed by selftests?

> >
> > Handle FLOW_DISSECTOR_KEY_ETH_ADDRS before BPF and only if we have
> > an skb (used by tc-flower only).
> >
> > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  net/core/flow_dissector.c | 23 ++++++++++++-----------
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> >
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 9ca784c592ac..ba76d9168c8b 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -825,6 +825,18 @@ bool __skb_flow_dissect(const struct net *net,
> >                         else if (skb->sk)
> >                                 net = sock_net(skb->sk);
> >                 }
> > +
> > +               if (dissector_uses_key(flow_dissector,
> > +                                      FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
> > +                       struct ethhdr *eth = eth_hdr(skb);
> 
> Here as well as in the original patch: is it safe to just cast to
> eth_hdr? In the same file, __skb_flow_dissect_gre does test for
> (encapsulated) protocol first.
Good question, I guess the assumption here is that
FLOW_DISSECTOR_KEY_ETH_ADDRS is only used by tc_flower and the appropriate
checks should be there as well.
It's probably better to check skb->proto here though.
Willem de Bruijn May 13, 2019, 9:21 p.m. UTC | #3
On Mon, May 13, 2019 at 5:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 05/13, Willem de Bruijn wrote:
> > On Mon, May 13, 2019 at 3:53 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > If we have a flow dissector BPF program attached to the namespace,
> > > FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.
> >
> > I suppose that this is true for a variety of keys? For instance, also
> > FLOW_DISSECTOR_KEY_IPV4_ADDRS.

> I though the intent was to support most of the basic stuff (eth/ip/tcp/udp)
> without any esoteric protocols.

Indeed. But this applies both to protocols and the feature set. Both
are more limited.

> Not sure about FLOW_DISSECTOR_KEY_IPV4_ADDRS,
> looks like we support that (except FLOW_DISSECTOR_KEY_TIPC part).

Ah, I chose a bad example then.

> > We originally intended BPF flow dissection for all paths except
> > tc_flower. As that catches all the vulnerable cases on the ingress
> > path on the one hand and it is infeasible to support all the
> > flower features, now and future. I think that is the real fix.

> Sorry, didn't get what you meant by the real fix.
> Don't care about tc_flower? Just support a minimal set of features
> needed by selftests?

I do mean exclude BPF flow dissector (only) for tc_flower, as we
cannot guarantee that the BPF program can fully implement the
requested feature.

>
> > >
> > > Handle FLOW_DISSECTOR_KEY_ETH_ADDRS before BPF and only if we have
> > > an skb (used by tc-flower only).
> > >
> > > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  net/core/flow_dissector.c | 23 ++++++++++++-----------
> > >  1 file changed, 12 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > > index 9ca784c592ac..ba76d9168c8b 100644
> > > --- a/net/core/flow_dissector.c
> > > +++ b/net/core/flow_dissector.c
> > > @@ -825,6 +825,18 @@ bool __skb_flow_dissect(const struct net *net,
> > >                         else if (skb->sk)
> > >                                 net = sock_net(skb->sk);
> > >                 }
> > > +
> > > +               if (dissector_uses_key(flow_dissector,
> > > +                                      FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
> > > +                       struct ethhdr *eth = eth_hdr(skb);
> >
> > Here as well as in the original patch: is it safe to just cast to
> > eth_hdr? In the same file, __skb_flow_dissect_gre does test for
> > (encapsulated) protocol first.

> Good question, I guess the assumption here is that
> FLOW_DISSECTOR_KEY_ETH_ADDRS is only used by tc_flower and the appropriate
> checks should be there as well.
> It's probably better to check skb->proto here though.

Right, as a mistaken or malicious admin can request it on a non
Ethernet device and read garbage.
Willem de Bruijn May 13, 2019, 10:47 p.m. UTC | #4
On Mon, May 13, 2019 at 5:21 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, May 13, 2019 at 5:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 05/13, Willem de Bruijn wrote:
> > > On Mon, May 13, 2019 at 3:53 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > If we have a flow dissector BPF program attached to the namespace,
> > > > FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.
> > >
> > > I suppose that this is true for a variety of keys? For instance, also
> > > FLOW_DISSECTOR_KEY_IPV4_ADDRS.
>
> > I though the intent was to support most of the basic stuff (eth/ip/tcp/udp)
> > without any esoteric protocols.
>
> Indeed. But this applies both to protocols and the feature set. Both
> are more limited.
>
> > Not sure about FLOW_DISSECTOR_KEY_IPV4_ADDRS,
> > looks like we support that (except FLOW_DISSECTOR_KEY_TIPC part).
>
> Ah, I chose a bad example then.
>
> > > We originally intended BPF flow dissection for all paths except
> > > tc_flower. As that catches all the vulnerable cases on the ingress
> > > path on the one hand and it is infeasible to support all the
> > > flower features, now and future. I think that is the real fix.
>
> > Sorry, didn't get what you meant by the real fix.
> > Don't care about tc_flower? Just support a minimal set of features
> > needed by selftests?
>
> I do mean exclude BPF flow dissector (only) for tc_flower, as we
> cannot guarantee that the BPF program can fully implement the
> requested feature.

Though, the user inserting the BPF flow dissector is the same as the
user inserting the flower program, the (per netns) admin. So arguably
is aware of the constraints incurred by BPF flow dissection. And else
can still detect when a feature is not supported from the (lack of)
output, as in this case of Ethernet address. I don't think we want to
mix BPF and non-BPF flow dissection though. That subverts the safety
argument of switching to BPF for flow dissection.
Stanislav Fomichev May 13, 2019, 11:05 p.m. UTC | #5
On 05/13, Willem de Bruijn wrote:
> On Mon, May 13, 2019 at 5:21 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Mon, May 13, 2019 at 5:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 05/13, Willem de Bruijn wrote:
> > > > On Mon, May 13, 2019 at 3:53 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >
> > > > > If we have a flow dissector BPF program attached to the namespace,
> > > > > FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.
> > > >
> > > > I suppose that this is true for a variety of keys? For instance, also
> > > > FLOW_DISSECTOR_KEY_IPV4_ADDRS.
> >
> > > I though the intent was to support most of the basic stuff (eth/ip/tcp/udp)
> > > without any esoteric protocols.
> >
> > Indeed. But this applies both to protocols and the feature set. Both
> > are more limited.
> >
> > > Not sure about FLOW_DISSECTOR_KEY_IPV4_ADDRS,
> > > looks like we support that (except FLOW_DISSECTOR_KEY_TIPC part).
> >
> > Ah, I chose a bad example then.
> >
> > > > We originally intended BPF flow dissection for all paths except
> > > > tc_flower. As that catches all the vulnerable cases on the ingress
> > > > path on the one hand and it is infeasible to support all the
> > > > flower features, now and future. I think that is the real fix.
> >
> > > Sorry, didn't get what you meant by the real fix.
> > > Don't care about tc_flower? Just support a minimal set of features
> > > needed by selftests?
> >
> > I do mean exclude BPF flow dissector (only) for tc_flower, as we
> > cannot guarantee that the BPF program can fully implement the
> > requested feature.
> 
> Though, the user inserting the BPF flow dissector is the same as the
> user inserting the flower program, the (per netns) admin. So arguably
> is aware of the constraints incurred by BPF flow dissection. And else
> can still detect when a feature is not supported from the (lack of)
> output, as in this case of Ethernet address. I don't think we want to
> mix BPF and non-BPF flow dissection though. That subverts the safety
> argument of switching to BPF for flow dissection.
Yes, we cannot completely avoid tc_flower because we use it to do
the end-to-end testing. That's why I was trying to make sure "basic"
stuff works (it might feel confusing that tc_flower {src,dst}_mac
stop working with a bpf program installed).

TBH, I'd not call this particular piece of code that exports src/dst
addresses a dissection. At this point, it's a well-formed skb with
a proper l2 header and we just copy the addresses out. It's probably
part of the reason the original patch didn't include any skb->protocol
checks.
Stanislav Fomichev May 13, 2019, 11:21 p.m. UTC | #6
> On 05/13, Willem de Bruijn wrote:
> > On Mon, May 13, 2019 at 5:21 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Mon, May 13, 2019 at 5:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 05/13, Willem de Bruijn wrote:
> > > > > On Mon, May 13, 2019 at 3:53 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > >
> > > > > > If we have a flow dissector BPF program attached to the namespace,
> > > > > > FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.
> > > > >
> > > > > I suppose that this is true for a variety of keys? For instance, also
> > > > > FLOW_DISSECTOR_KEY_IPV4_ADDRS.
> > >
> > > > I though the intent was to support most of the basic stuff (eth/ip/tcp/udp)
> > > > without any esoteric protocols.
> > >
> > > Indeed. But this applies both to protocols and the feature set. Both
> > > are more limited.
> > >
> > > > Not sure about FLOW_DISSECTOR_KEY_IPV4_ADDRS,
> > > > looks like we support that (except FLOW_DISSECTOR_KEY_TIPC part).
> > >
> > > Ah, I chose a bad example then.
> > >
> > > > > We originally intended BPF flow dissection for all paths except
> > > > > tc_flower. As that catches all the vulnerable cases on the ingress
> > > > > path on the one hand and it is infeasible to support all the
> > > > > flower features, now and future. I think that is the real fix.
> > >
> > > > Sorry, didn't get what you meant by the real fix.
> > > > Don't care about tc_flower? Just support a minimal set of features
> > > > needed by selftests?
> > >
> > > I do mean exclude BPF flow dissector (only) for tc_flower, as we
> > > cannot guarantee that the BPF program can fully implement the
> > > requested feature.
> >
> > Though, the user inserting the BPF flow dissector is the same as the
> > user inserting the flower program, the (per netns) admin. So arguably
> > is aware of the constraints incurred by BPF flow dissection. And else
> > can still detect when a feature is not supported from the (lack of)
> > output, as in this case of Ethernet address. I don't think we want to
> > mix BPF and non-BPF flow dissection though. That subverts the safety
> > argument of switching to BPF for flow dissection.
> Yes, we cannot completely avoid tc_flower because we use it to do
> the end-to-end testing. That's why I was trying to make sure "basic"
> stuff works (it might feel confusing that tc_flower {src,dst}_mac
> stop working with a bpf program installed).
>
> TBH, I'd not call this particular piece of code that exports src/dst
> addresses a dissection. At this point, it's a well-formed skb with
> a proper l2 header and we just copy the addresses out. It's probably
> part of the reason the original patch didn't include any skb->protocol
> checks.
On the other hand, we can probably follow a simple rule:
if it's not exported via bpf_flow_keys (and src/dsc mac is not),
tc_flower is not supported as well.
Willem de Bruijn May 13, 2019, 11:44 p.m. UTC | #7
From: Stanislav Fomichev <sdf@google.com>
Date: Mon, May 13, 2019 at 7:21 PM
To: Stanislav Fomichev
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>, Network
Development, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
Willem de Bruijn <willemb@google.com>, Petar Penkov

> > On 05/13, Willem de Bruijn wrote:
> > > On Mon, May 13, 2019 at 5:21 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > On Mon, May 13, 2019 at 5:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > >
> > > > > On 05/13, Willem de Bruijn wrote:
> > > > > > On Mon, May 13, 2019 at 3:53 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > >
> > > > > > > If we have a flow dissector BPF program attached to the namespace,
> > > > > > > FLOW_DISSECTOR_KEY_ETH_ADDRS won't trigger because we exit early.
> > > > > >
> > > > > > I suppose that this is true for a variety of keys? For instance, also
> > > > > > FLOW_DISSECTOR_KEY_IPV4_ADDRS.
> > > >
> > > > > I though the intent was to support most of the basic stuff (eth/ip/tcp/udp)
> > > > > without any esoteric protocols.
> > > >
> > > > Indeed. But this applies both to protocols and the feature set. Both
> > > > are more limited.
> > > >
> > > > > Not sure about FLOW_DISSECTOR_KEY_IPV4_ADDRS,
> > > > > looks like we support that (except FLOW_DISSECTOR_KEY_TIPC part).
> > > >
> > > > Ah, I chose a bad example then.
> > > >
> > > > > > We originally intended BPF flow dissection for all paths except
> > > > > > tc_flower. As that catches all the vulnerable cases on the ingress
> > > > > > path on the one hand and it is infeasible to support all the
> > > > > > flower features, now and future. I think that is the real fix.
> > > >
> > > > > Sorry, didn't get what you meant by the real fix.
> > > > > Don't care about tc_flower? Just support a minimal set of features
> > > > > needed by selftests?
> > > >
> > > > I do mean exclude BPF flow dissector (only) for tc_flower, as we
> > > > cannot guarantee that the BPF program can fully implement the
> > > > requested feature.
> > >
> > > Though, the user inserting the BPF flow dissector is the same as the
> > > user inserting the flower program, the (per netns) admin. So arguably
> > > is aware of the constraints incurred by BPF flow dissection. And else
> > > can still detect when a feature is not supported from the (lack of)
> > > output, as in this case of Ethernet address. I don't think we want to
> > > mix BPF and non-BPF flow dissection though. That subverts the safety
> > > argument of switching to BPF for flow dissection.
> > Yes, we cannot completely avoid tc_flower because we use it to do
> > the end-to-end testing. That's why I was trying to make sure "basic"
> > stuff works (it might feel confusing that tc_flower {src,dst}_mac
> > stop working with a bpf program installed).
> >
> > TBH, I'd not call this particular piece of code that exports src/dst
> > addresses a dissection. At this point, it's a well-formed skb with
> > a proper l2 header and we just copy the addresses out. It's probably
> > part of the reason the original patch didn't include any skb->protocol
> > checks.

But it is not guaranteed to be an Ethernet link layer device. Making
this a good example of why when moving to BPF for safety we should not
keep any C dissection code in the path at all.

> On the other hand, we can probably follow a simple rule:
> if it's not exported via bpf_flow_keys (and src/dsc mac is not),
> tc_flower is not supported as well.

Agreed. I was using that as point of reference just now, too.
diff mbox series

Patch

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 9ca784c592ac..ba76d9168c8b 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -825,6 +825,18 @@  bool __skb_flow_dissect(const struct net *net,
 			else if (skb->sk)
 				net = sock_net(skb->sk);
 		}
+
+		if (dissector_uses_key(flow_dissector,
+				       FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+			struct ethhdr *eth = eth_hdr(skb);
+			struct flow_dissector_key_eth_addrs *key_eth_addrs;
+
+			key_eth_addrs = skb_flow_dissector_target(flow_dissector,
+								  FLOW_DISSECTOR_KEY_ETH_ADDRS,
+								  target_container);
+			memcpy(key_eth_addrs, &eth->h_dest,
+			       sizeof(*key_eth_addrs));
+		}
 	}
 
 	WARN_ON_ONCE(!net);
@@ -860,17 +872,6 @@  bool __skb_flow_dissect(const struct net *net,
 		rcu_read_unlock();
 	}
 
-	if (dissector_uses_key(flow_dissector,
-			       FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
-		struct ethhdr *eth = eth_hdr(skb);
-		struct flow_dissector_key_eth_addrs *key_eth_addrs;
-
-		key_eth_addrs = skb_flow_dissector_target(flow_dissector,
-							  FLOW_DISSECTOR_KEY_ETH_ADDRS,
-							  target_container);
-		memcpy(key_eth_addrs, &eth->h_dest, sizeof(*key_eth_addrs));
-	}
-
 proto_again:
 	fdret = FLOW_DISSECT_RET_CONTINUE;