diff mbox series

[RFC,2/6] net/sched: flower: add support for matching on ConnTrack

Message ID 6c976cc538f1f565b74bd2c750639af91a93adc1.1548285996.git.mleitner@redhat.com
State RFC
Delegated to: David Miller
Headers show
Series Initial, PoC implementation of sw datapath of tc+CT | expand

Commit Message

Marcelo Ricardo Leitner Jan. 25, 2019, 2:32 a.m. UTC
Hook on flow dissector's new interface on ConnTrack from previous patch.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/pkt_cls.h |  9 +++++++++
 net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

Comments

Simon Horman Jan. 25, 2019, 1:37 p.m. UTC | #1
Hi Marcelo,

On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> Hook on flow dissector's new interface on ConnTrack from previous patch.
> 
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
>  include/uapi/linux/pkt_cls.h |  9 +++++++++
>  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -490,6 +490,15 @@ enum {
>  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
>  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
>  
> +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */

With the corresponding flow dissector patch this API is
exposing the contents of an instance of enum ip_conntrack_info
as an ABI for conntrack state.

I believe (after getting similar review for my geneve options macthing
patches for flower) that this exposes implementation details as an ABI
to a degree that is not desirable.

My suggested would be to define, say in the form of named bits,
an ABI, that describes the state information that is exposed.
These bits may not correspond directly to the implementation of
ip_conntrack_info.

I think there should also be some consideration of if a mask makes
sense for the state as, f.e. in the implementation of enum
ip_conntrack_info not all bit combinations are valid. 

> +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> +
>  	__TCA_FLOWER_MAX,
>  };

...
Marcelo Ricardo Leitner Jan. 26, 2019, 3:52 p.m. UTC | #2
On Fri, Jan 25, 2019 at 02:37:13PM +0100, Simon Horman wrote:
> Hi Marcelo,
> 
> On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> > Hook on flow dissector's new interface on ConnTrack from previous patch.
> > 
> > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > ---
> >  include/uapi/linux/pkt_cls.h |  9 +++++++++
> >  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+)
> > 
> > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> > index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> > --- a/include/uapi/linux/pkt_cls.h
> > +++ b/include/uapi/linux/pkt_cls.h
> > @@ -490,6 +490,15 @@ enum {
> >  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
> >  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
> >  
> > +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> > +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> > +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> > +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
> 
> With the corresponding flow dissector patch this API is
> exposing the contents of an instance of enum ip_conntrack_info
> as an ABI for conntrack state.
> 
> I believe (after getting similar review for my geneve options macthing
> patches for flower) that this exposes implementation details as an ABI
> to a degree that is not desirable.
> 
> My suggested would be to define, say in the form of named bits,
> an ABI, that describes the state information that is exposed.
> These bits may not correspond directly to the implementation of
> ip_conntrack_info.
> 
> I think there should also be some consideration of if a mask makes
> sense for the state as, f.e. in the implementation of enum
> ip_conntrack_info not all bit combinations are valid. 

Right. ct_state must be handled differently. For conntrack it is a
linear enum and as we want to be able to OR match, we will have to
convert the states in a bitfield as you were saying or so.

I don't think the representation above wouldn't change, though: we have
8 bits wrapped under a u8. What would change is how we deal with it.

If iproute tc is able to parse the cmdline and set a corresponding bit
for each state, the flower-side of flow dissector here should be
mostly fine (need to consider the invalid bits as you mentioned, as
part of sanity checking).
Then just need to change on how flow dissector is reading ct_state
from the packet.


Is your comment only related to ct_state or other fields too? I'm
thinking only ct_state.

> 
> > +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> > +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> > +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> > +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> > +
> >  	__TCA_FLOWER_MAX,
> >  };
> 
> ...
>
Simon Horman Jan. 28, 2019, 9:44 a.m. UTC | #3
On Sat, Jan 26, 2019 at 01:52:01PM -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Jan 25, 2019 at 02:37:13PM +0100, Simon Horman wrote:
> > Hi Marcelo,
> > 
> > On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> > > Hook on flow dissector's new interface on ConnTrack from previous patch.
> > > 
> > > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > > ---
> > >  include/uapi/linux/pkt_cls.h |  9 +++++++++
> > >  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
> > >  2 files changed, 42 insertions(+)
> > > 
> > > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> > > index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> > > --- a/include/uapi/linux/pkt_cls.h
> > > +++ b/include/uapi/linux/pkt_cls.h
> > > @@ -490,6 +490,15 @@ enum {
> > >  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
> > >  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
> > >  
> > > +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> > > +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> > > +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> > > +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
> > 
> > With the corresponding flow dissector patch this API is
> > exposing the contents of an instance of enum ip_conntrack_info
> > as an ABI for conntrack state.
> > 
> > I believe (after getting similar review for my geneve options macthing
> > patches for flower) that this exposes implementation details as an ABI
> > to a degree that is not desirable.
> > 
> > My suggested would be to define, say in the form of named bits,
> > an ABI, that describes the state information that is exposed.
> > These bits may not correspond directly to the implementation of
> > ip_conntrack_info.
> > 
> > I think there should also be some consideration of if a mask makes
> > sense for the state as, f.e. in the implementation of enum
> > ip_conntrack_info not all bit combinations are valid. 
> 
> Right. ct_state must be handled differently. For conntrack it is a
> linear enum and as we want to be able to OR match, we will have to
> convert the states in a bitfield as you were saying or so.
> 
> I don't think the representation above wouldn't change, though: we have
> 8 bits wrapped under a u8. What would change is how we deal with it.
> 
> If iproute tc is able to parse the cmdline and set a corresponding bit
> for each state, the flower-side of flow dissector here should be
> mostly fine (need to consider the invalid bits as you mentioned, as
> part of sanity checking).
> Then just need to change on how flow dissector is reading ct_state
> from the packet.

I'm not entirely opposed to a KABI which defines bits of
TCA_FLOWER_KEY_CT_STATE in such a way that they match exactly
the current implementation of enum ip_conntrack_info (though I do suspect
that a better representation is possible, for some value of better).

But, on the other hand, I am not comfortable in simply sating that
TCA_FLOWER_KEY_CT_STATE is the same as enum ip_conntrack_info, because
that exposes an internal implementation detail which may change.

> Is your comment only related to ct_state or other fields too? I'm
> thinking only ct_state.

Sorry that I wasn't clear, I was only referring to ct_state.

> > > +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> > > +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> > > +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> > > +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> > > +
> > >  	__TCA_FLOWER_MAX,
> > >  };
> > 
> > ...
> >
Marcelo Ricardo Leitner Jan. 28, 2019, 12:55 p.m. UTC | #4
On Mon, Jan 28, 2019 at 10:44:23AM +0100, Simon Horman wrote:
> On Sat, Jan 26, 2019 at 01:52:01PM -0200, Marcelo Ricardo Leitner wrote:
> > On Fri, Jan 25, 2019 at 02:37:13PM +0100, Simon Horman wrote:
> > > Hi Marcelo,
> > > 
> > > On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> > > > Hook on flow dissector's new interface on ConnTrack from previous patch.
> > > > 
> > > > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > > > ---
> > > >  include/uapi/linux/pkt_cls.h |  9 +++++++++
> > > >  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
> > > >  2 files changed, 42 insertions(+)
> > > > 
> > > > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> > > > index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> > > > --- a/include/uapi/linux/pkt_cls.h
> > > > +++ b/include/uapi/linux/pkt_cls.h
> > > > @@ -490,6 +490,15 @@ enum {
> > > >  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
> > > >  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
> > > >  
> > > > +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> > > > +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> > > > +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> > > > +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
> > > 
> > > With the corresponding flow dissector patch this API is
> > > exposing the contents of an instance of enum ip_conntrack_info
> > > as an ABI for conntrack state.
> > > 
> > > I believe (after getting similar review for my geneve options macthing
> > > patches for flower) that this exposes implementation details as an ABI
> > > to a degree that is not desirable.
> > > 
> > > My suggested would be to define, say in the form of named bits,
> > > an ABI, that describes the state information that is exposed.
> > > These bits may not correspond directly to the implementation of
> > > ip_conntrack_info.
> > > 
> > > I think there should also be some consideration of if a mask makes
> > > sense for the state as, f.e. in the implementation of enum
> > > ip_conntrack_info not all bit combinations are valid. 
> > 
> > Right. ct_state must be handled differently. For conntrack it is a
> > linear enum and as we want to be able to OR match, we will have to
> > convert the states in a bitfield as you were saying or so.
> > 
> > I don't think the representation above wouldn't change, though: we have
> > 8 bits wrapped under a u8. What would change is how we deal with it.
> > 
> > If iproute tc is able to parse the cmdline and set a corresponding bit
> > for each state, the flower-side of flow dissector here should be
> > mostly fine (need to consider the invalid bits as you mentioned, as
> > part of sanity checking).
> > Then just need to change on how flow dissector is reading ct_state
> > from the packet.
> 
> I'm not entirely opposed to a KABI which defines bits of
> TCA_FLOWER_KEY_CT_STATE in such a way that they match exactly
> the current implementation of enum ip_conntrack_info (though I do suspect
> that a better representation is possible, for some value of better).

Ok. Will check.

> 
> But, on the other hand, I am not comfortable in simply sating that
> TCA_FLOWER_KEY_CT_STATE is the same as enum ip_conntrack_info, because
> that exposes an internal implementation detail which may change.

"internal" here is relative because enum ip_conntrack_info is on UAPI
already, at include/uapi/linux/netfilter/nf_conntrack_common.h.
Anyhow, agree that a new listing may make more sense.

The duplicity in enum ip_conntrack_info might hide some surpises for
us too:

enum ip_conntrack_info {
        IP_CT_ESTABLISHED,                = 0
        IP_CT_RELATED,                    = 1
        IP_CT_NEW,                        = 2
        IP_CT_IS_REPLY,                   = 3   <--
        IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
                                    0 + 3 = 3   <--
        IP_CT_RELATED_REPLY = IP_CT_RELATED + IP_CT_IS_REPLY,
                                    1 + 3 = 4

> 
> > Is your comment only related to ct_state or other fields too? I'm
> > thinking only ct_state.
> 
> Sorry that I wasn't clear, I was only referring to ct_state.

No need to be :-)

Thanks,
Marcelo

> 
> > > > +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> > > > +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> > > > +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> > > > +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> > > > +
> > > >  	__TCA_FLOWER_MAX,
> > > >  };
> > > 
> > > ...
> > > 
>
Florian Westphal Jan. 28, 2019, 1:02 p.m. UTC | #5
Marcelo Ricardo Leitner <mleitner@redhat.com> wrote:
> 
> us too:
> 
> enum ip_conntrack_info {
>         IP_CT_ESTABLISHED,                = 0
>         IP_CT_RELATED,                    = 1
>         IP_CT_NEW,                        = 2
>         IP_CT_IS_REPLY,                   = 3   <--
>         IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
>                                     0 + 3 = 3   <--

Don't worry, there is no IP_CT_ESTABLISHED_REPLY state, just pretend
you did not see it :-)

As for UAPI, I suggest to check net/netfilter/nft_ct.c which already
exposes established/new/invalid and so on.
diff mbox series

Patch

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -490,6 +490,15 @@  enum {
 	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
 	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
 
+	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
+	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
+	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
+	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
+	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
+	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
+	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
+	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 85e9f8e1da10aa7b01b0f51768edfefbe63d6a10..430b7fceeca0998b8c904acd91f8de53571814ff 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -57,6 +57,7 @@  struct fl_flow_key {
 	struct flow_dissector_key_enc_opts enc_opts;
 	struct flow_dissector_key_ports tp_min;
 	struct flow_dissector_key_ports tp_max;
+	struct flow_dissector_key_ct ct;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -1079,6 +1080,22 @@  static int fl_set_key(struct net *net, struct nlattr **tb,
 
 	fl_set_key_ip(tb, true, &key->enc_ip, &mask->enc_ip);
 
+	fl_set_key_val(tb, &key->ct.mark, TCA_FLOWER_KEY_CT_MARK,
+		       &mask->ct.mark, TCA_FLOWER_KEY_CT_MARK_MASK,
+		       sizeof(key->ct.mark));
+
+	fl_set_key_val(tb, &key->ct.zone, TCA_FLOWER_KEY_CT_ZONE,
+		       &mask->ct.zone, TCA_FLOWER_KEY_CT_ZONE_MASK,
+		       sizeof(key->ct.zone));
+
+	fl_set_key_val(tb, &key->ct.state, TCA_FLOWER_KEY_CT_STATE,
+		       &mask->ct.state, TCA_FLOWER_KEY_CT_STATE_MASK,
+		       sizeof(key->ct.state));
+
+	fl_set_key_val(tb, &key->ct.label, TCA_FLOWER_KEY_CT_LABEL,
+		       &mask->ct.label, TCA_FLOWER_KEY_CT_LABEL_MASK,
+		       sizeof(key->ct.label));
+
 	if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
 		ret = fl_set_enc_opt(tb, key, mask, extack);
 		if (ret)
@@ -1183,6 +1200,8 @@  static void fl_init_dissector(struct flow_dissector *dissector,
 			     FLOW_DISSECTOR_KEY_ENC_IP, enc_ip);
 	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 			     FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
+	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+			     FLOW_DISSECTOR_KEY_CT, ct);
 
 	skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -1994,6 +2013,20 @@  static int fl_dump_key(struct sk_buff *skb, struct net *net,
 	    fl_dump_key_enc_opt(skb, &key->enc_opts, &mask->enc_opts))
 		goto nla_put_failure;
 
+	if (fl_dump_key_val(skb, &key->ct.zone, TCA_FLOWER_KEY_CT_ZONE,
+			    &mask->ct.zone, TCA_FLOWER_KEY_CT_ZONE_MASK,
+			    sizeof(key->ct.zone)) ||
+	    fl_dump_key_val(skb, &key->ct.mark, TCA_FLOWER_KEY_CT_MARK,
+			    &mask->ct.mark, TCA_FLOWER_KEY_CT_MARK_MASK,
+			    sizeof(key->ct.mark)) ||
+	    fl_dump_key_val(skb, &key->ct.state, TCA_FLOWER_KEY_CT_STATE,
+			    &mask->ct.state, TCA_FLOWER_KEY_CT_STATE_MASK,
+			    sizeof(key->ct.state)) ||
+	    fl_dump_key_val(skb, &key->ct.label, TCA_FLOWER_KEY_CT_LABEL,
+			    &mask->ct.label, TCA_FLOWER_KEY_CT_LABEL_MASK,
+			    sizeof(key->ct.label)))
+		goto nla_put_failure;
+
 	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
 		goto nla_put_failure;