diff mbox

[RFC,nft,userspace] nft: connlabel matching support

Message ID 1392504432-20918-1-git-send-email-fw@strlen.de
State RFC
Headers show

Commit Message

Florian Westphal Feb. 15, 2014, 10:47 p.m. UTC
Takes advantage of the fact that the current maximum label storage area
is 128 bits, i.e. it will fit into a nft register.

Advantages:
- kernel part is very small (just copies extension storage to register)
- userspace part is not huge either and very powerful since
  we can operate on the entire '128 bit connlabel pseudo-register'
  (f.e. "ct labels & foo|bar == foo" to match when foo is set and bar is not,
   etc)

Disadvantage:

- We get into trouble if kernel would ever increase the current limit
  of 128 labels (its just a matter of changing #define in kernel and
  making sure nf_conn extension offsets can deal with larger sizes).
- Its currently not necessarily doing what the user would think.

The latter point deserves example:

add rule filter output ct labels foo

is NOT the translation of iptables
-m connlabel --label foo

The former matches only if foo and ONLY foo is set.
The latter matches when foo label bit is set and doesn't care about
other labels, i.e. the nft translation would be

add rule filter output ct labels & foo == foo

Thus I am not sure at the moment if this is really the right approach
and if it would be better to follow the xt_connlabel model and ask kernel
for the desired *bit* instead.

It could be done by adding a 'ctlabel' instruction that would check the
bit given in sreg (eg. 3 would do test_bit(... 3) and BREAK if its not set.

Something to also keep in mind is that this currently lacks write
support.  In iptables this is done via
'-m connlabel --label foo --set', which causes atomic set-bit operation.

For nft with current approach it would mean we would need to copy to
reg, alter reg, then copy altered register back to extension area.

What do others think?

Just for reference i've included the userspace part of the change
(treating labels as 128 bit register).

Its not complete for merging at this time, so no SOB line.
---
 include/datatype.h                  |  2 ++
 include/linux/netfilter/nf_tables.h |  2 ++
 src/ct.c                            | 65 +++++++++++++++++++++++++++++++++++++
 src/parser.y                        |  2 ++
 src/scanner.l                       |  1 +
 5 files changed, 72 insertions(+)

Comments

Patrick McHardy Feb. 16, 2014, 8:53 a.m. UTC | #1
On Sat, Feb 15, 2014 at 11:47:12PM +0100, Florian Westphal wrote:
> Takes advantage of the fact that the current maximum label storage area
> is 128 bits, i.e. it will fit into a nft register.
> 
> Advantages:
> - kernel part is very small (just copies extension storage to register)
> - userspace part is not huge either and very powerful since
>   we can operate on the entire '128 bit connlabel pseudo-register'
>   (f.e. "ct labels & foo|bar == foo" to match when foo is set and bar is not,
>    etc)
> 
> Disadvantage:
> 
> - We get into trouble if kernel would ever increase the current limit
>   of 128 labels (its just a matter of changing #define in kernel and
>   making sure nf_conn extension offsets can deal with larger sizes).

We need to change the kernel registers anyway to deal with concatenations
properly. So this will most likely not be a problem.

> - Its currently not necessarily doing what the user would think.
> 
> The latter point deserves example:
> 
> add rule filter output ct labels foo
> 
> is NOT the translation of iptables
> -m connlabel --label foo
> 
> The former matches only if foo and ONLY foo is set.
> The latter matches when foo label bit is set and doesn't care about
> other labels, i.e. the nft translation would be
> 
> add rule filter output ct labels & foo == foo
> 
> Thus I am not sure at the moment if this is really the right approach
> and if it would be better to follow the xt_connlabel model and ask kernel
> for the desired *bit* instead.
> 
> It could be done by adding a 'ctlabel' instruction that would check the
> bit given in sreg (eg. 3 would do test_bit(... 3) and BREAK if its not set.

Actually you can also easily do this with by just generating an equality
comparison in userspace. Since you're using bitmask as a base type,
nft automatically generates instructions to only check that bit when
using an *implicit* op. If you use

"filter output ct labels == foo" that should match only the one label.

I have to admit that I don't know how connlabels are used, so I don't
know what the best default implicit op would be.

> Something to also keep in mind is that this currently lacks write
> support.  In iptables this is done via
> '-m connlabel --label foo --set', which causes atomic set-bit operation.
> 
> For nft with current approach it would mean we would need to copy to
> reg, alter reg, then copy altered register back to extension area.
> 
> What do others think?

Sounds fine. Its a quite rare operation I assume.

> --- a/include/datatype.h
> +++ b/include/datatype.h
> @@ -32,6 +32,7 @@
>   * @TYPE_CT_STATE:	conntrack state (bitmask subtype)
>   * @TYPE_CT_DIR:	conntrack direction
>   * @TYPE_CT_STATUS:	conntrack status (bitmask subtype)
> + * @TYPE_CT_LABELS:	Conntrack Labels (bitmask subtype)
>   * @TYPE_ICMP6_TYPE:	ICMPv6 type codes (integer subtype)
>   */
>  enum datatypes {
> @@ -63,6 +64,7 @@ enum datatypes {
>  	TYPE_CT_STATE,
>  	TYPE_CT_DIR,
>  	TYPE_CT_STATUS,
> +	TYPE_CT_LABELS,
>  	TYPE_ICMP6_TYPE,
>  	__TYPE_MAX
>  };

Small note - we should add datatypes at the end. The kernel stores those
types for userspace to interpret the type of sets, so when doing an update
of nft, the types should still match.

> +static void ct_label_type_print(const struct expr *expr)
> +{
> +	const struct symbolic_constant *s;
> +	unsigned long bit = 0;
> +	int cnt = 0;
> +
> +	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
> +		bit = mpz_scan1(expr->value, s->value);
> +		if (bit < CT_LABELS_BIT_SIZE && bit == s->value) {
> +			if (cnt)
> +				printf("|");

This looks incorrect since it doesn't match the input format. When using
a bitmask base type, nft should automatically reconstruct the list as
comma-seperated values. So I think you don't need a print function.

> +			printf("%s", s->identifier);
> +			++cnt;
> +		}
> +	}
> +}
> +
> +static struct error_record *ct_label_type_parse(const struct expr *sym,
> +						struct expr **res)
> +{
> +	const struct symbolic_constant *s;
> +	const struct datatype *dtype;
> +	uint8_t data[CT_LABELS_BIT_SIZE];
> +	mpz_t value;
> +
> +	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
> +		if (!strcmp(sym->identifier, s->identifier))
> +			break;
> +	}
> +
> +	dtype = sym->dtype;
> +	if (s->identifier == NULL)
> +		return error(&sym->location, "Could not parse %s", dtype->desc);
> +
> +	mpz_init2(value, dtype->size);
> +	mpz_setbit(value, s->value);
> +	mpz_export_data(data, value, BYTEORDER_HOST_ENDIAN, sizeof(data));
> +
> +	*res = constant_expr_alloc(&sym->location, dtype,
> +				   dtype->byteorder, sizeof(data),
> +				   data);
> +	mpz_clear(value);
> +	return NULL;
> +}

Same here. The defaults should work fine.

> +static const struct datatype ct_labels_type = {
> +	.type		= TYPE_CT_LABELS,
> +	.name		= "ct_labels",
> +	.desc		= "conntrack labels",

The type only refers to a single instance, so all should be singular.


> +	.byteorder	= BYTEORDER_HOST_ENDIAN,
> +	.size		= CT_LABELS_BIT_SIZE,
> +	.basetype	= &bitmask_type,
> +	.print		= ct_label_type_print,
> +	.parse		= ct_label_type_parse,
> +};
> +
>  static const struct ct_template ct_templates[] = {
>  	[NFT_CT_STATE]		= CT_TEMPLATE("state",	    &ct_state_type,
>  					      BYTEORDER_HOST_ENDIAN,
> @@ -124,6 +185,9 @@ static const struct ct_template ct_templates[] = {
>  	[NFT_CT_PROTO_DST]	= CT_TEMPLATE("proto-dst",  &invalid_type,
>  					      BYTEORDER_BIG_ENDIAN,
>  					      2 * BITS_PER_BYTE),
> +	[NFT_CT_LABELS]		= CT_TEMPLATE("labels", &ct_labels_type,
> +					      BYTEORDER_HOST_ENDIAN,
> +					      CT_LABELS_BIT_SIZE),
>  };
>  
>  static void ct_expr_print(const struct expr *expr)
> @@ -159,4 +223,5 @@ static void __init ct_init(void)
>  	datatype_register(&ct_state_type);
>  	datatype_register(&ct_dir_type);
>  	datatype_register(&ct_status_type);
> +	ct_label_tbl = rt_symbol_table_init("/etc/xtables/connlabel.conf");

We're using per type init functions for symbol table initializations so
far (see meta.c). I'd suggest to keep this consistent.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Feb. 16, 2014, 11:01 a.m. UTC | #2
Patrick McHardy <kaber@trash.net> wrote:
> On Sat, Feb 15, 2014 at 11:47:12PM +0100, Florian Westphal wrote:
> > Takes advantage of the fact that the current maximum label storage area
> > is 128 bits, i.e. it will fit into a nft register.
> > 
> > Advantages:
> > - kernel part is very small (just copies extension storage to register)
> > - userspace part is not huge either and very powerful since
> >   we can operate on the entire '128 bit connlabel pseudo-register'
> >   (f.e. "ct labels & foo|bar == foo" to match when foo is set and bar is not,
> >    etc)
> > 
> > Disadvantage:
> > 
> > - We get into trouble if kernel would ever increase the current limit
> >   of 128 labels (its just a matter of changing #define in kernel and
> >   making sure nf_conn extension offsets can deal with larger sizes).
> 
> We need to change the kernel registers anyway to deal with concatenations
> properly. So this will most likely not be a problem.

Oh.  Thats good.  One problem disappearing :-)

> > - Its currently not necessarily doing what the user would think.
> > 
> > The latter point deserves example:
> > 
> > add rule filter output ct labels foo
> > 
> > is NOT the translation of iptables
> > -m connlabel --label foo
> > 
> > The former matches only if foo and ONLY foo is set.
> > The latter matches when foo label bit is set and doesn't care about
> > other labels, i.e. the nft translation would be
> > 
> > add rule filter output ct labels & foo == foo
> > 
> > Thus I am not sure at the moment if this is really the right approach
> > and if it would be better to follow the xt_connlabel model and ask kernel
> > for the desired *bit* instead.
> > 
> > It could be done by adding a 'ctlabel' instruction that would check the
> > bit given in sreg (eg. 3 would do test_bit(... 3) and BREAK if its not set.
> 
> Actually you can also easily do this with by just generating an equality
> comparison in userspace. Since you're using bitmask as a base type,
> nft automatically generates instructions to only check that bit when
> using an *implicit* op. If you use
> 
> "filter output ct labels == foo" that should match only the one label.

src/nft add rule filter output ct labels == foo --debug=netlink
ip filter output 0 0
[ ct load labels => reg 1 ]
[ cmp eq reg 1 0x00000001 0x00000000 0x00000000 0x00000000 ]

I guess I am doing something wrong in my patch.
Let me investigate.  The behaviour you describe seems to be exactly
what I want.

> I have to admit that I don't know how connlabels are used, so I don't
> know what the best default implicit op would be.

I think it should be a bit-test to make it behave like xt_connlabels
match.

> > Something to also keep in mind is that this currently lacks write
> > support.  In iptables this is done via
> > '-m connlabel --label foo --set', which causes atomic set-bit operation.
> > 
> > For nft with current approach it would mean we would need to copy to
> > reg, alter reg, then copy altered register back to extension area.
> > 
> > What do others think?
> 
> Sounds fine. Its a quite rare operation I assume.

Yes.

The iptables match performs the set operation only if the bit is not
set to avoid the write in the common case.

It should be possible to do something similar for nft.

> > --- a/include/datatype.h
> > +++ b/include/datatype.h
> > @@ -32,6 +32,7 @@
> >   * @TYPE_CT_STATE:	conntrack state (bitmask subtype)
> >   * @TYPE_CT_DIR:	conntrack direction
> >   * @TYPE_CT_STATUS:	conntrack status (bitmask subtype)
> > + * @TYPE_CT_LABELS:	Conntrack Labels (bitmask subtype)
> >   * @TYPE_ICMP6_TYPE:	ICMPv6 type codes (integer subtype)
> >   */
> >  enum datatypes {
> > @@ -63,6 +64,7 @@ enum datatypes {
> >  	TYPE_CT_STATE,
> >  	TYPE_CT_DIR,
> >  	TYPE_CT_STATUS,
> > +	TYPE_CT_LABELS,
> >  	TYPE_ICMP6_TYPE,
> >  	__TYPE_MAX
> >  };
> 
> Small note - we should add datatypes at the end. The kernel stores those
> types for userspace to interpret the type of sets, so when doing an update
> of nft, the types should still match.

I was unaware of this.  Thanks for pointing it out.

> > +static void ct_label_type_print(const struct expr *expr)
> > +{
> > +	const struct symbolic_constant *s;
> > +	unsigned long bit = 0;
> > +	int cnt = 0;
> > +
> > +	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
> > +		bit = mpz_scan1(expr->value, s->value);
> > +		if (bit < CT_LABELS_BIT_SIZE && bit == s->value) {
> > +			if (cnt)
> > +				printf("|");
> 
> This looks incorrect since it doesn't match the input format. When using
> a bitmask base type, nft should automatically reconstruct the list as
> comma-seperated values. So I think you don't need a print function.

see below

> > +static struct error_record *ct_label_type_parse(const struct expr *sym,
> > +						struct expr **res)
> > +{
> > +	const struct symbolic_constant *s;
> > +	const struct datatype *dtype;
> > +	uint8_t data[CT_LABELS_BIT_SIZE];
> > +	mpz_t value;
> > +
> > +	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
> > +		if (!strcmp(sym->identifier, s->identifier))
> > +			break;
> > +	}
> > +
> > +	dtype = sym->dtype;
> > +	if (s->identifier == NULL)
> > +		return error(&sym->location, "Could not parse %s", dtype->desc);
> > +
> > +	mpz_init2(value, dtype->size);
> > +	mpz_setbit(value, s->value);
> > +	mpz_export_data(data, value, BYTEORDER_HOST_ENDIAN, sizeof(data));
> > +
> > +	*res = constant_expr_alloc(&sym->location, dtype,
> > +				   dtype->byteorder, sizeof(data),
> > +				   data);
> > +	mpz_clear(value);
> > +	return NULL;
> > +}
> 
> Same here. The defaults should work fine.

Hmm, I think the problem is that connlabel.conf
maps the bit-number to names, e.g.

0 foo
1 bar

But these should be parsed as (1<<$number)
[ in iptables these values are passed to the kernel which
  uses them as argument to test_bit and friends ].

Could probably be fixed by having my own parsing
function instead of reusing the rt_symbol_table one, but I would
need to replace the uint64_t value in the sym table structure
for that [need to be able to store (1<<127)]...

> > +static const struct datatype ct_labels_type = {
> > +	.type		= TYPE_CT_LABELS,
> > +	.name		= "ct_labels",
> > +	.desc		= "conntrack labels",
> 
> The type only refers to a single instance, so all should be singular.

Ok

> >  static void ct_expr_print(const struct expr *expr)
> > @@ -159,4 +223,5 @@ static void __init ct_init(void)
> >  	datatype_register(&ct_state_type);
> >  	datatype_register(&ct_dir_type);
> >  	datatype_register(&ct_status_type);
> > +	ct_label_tbl = rt_symbol_table_init("/etc/xtables/connlabel.conf");
> 
> We're using per type init functions for symbol table initializations so
> far (see meta.c). I'd suggest to keep this consistent.

Sure, I'll change it.

Thanks Patrick.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Feb. 16, 2014, 11:12 a.m. UTC | #3
On Sun, Feb 16, 2014 at 12:01:52PM +0100, Florian Westphal wrote:
> Patrick McHardy <kaber@trash.net> wrote:
> 
> > > - Its currently not necessarily doing what the user would think.
> > > 
> > > The latter point deserves example:
> > > 
> > > add rule filter output ct labels foo
> > > 
> > > is NOT the translation of iptables
> > > -m connlabel --label foo
> > > 
> > > The former matches only if foo and ONLY foo is set.
> > > The latter matches when foo label bit is set and doesn't care about
> > > other labels, i.e. the nft translation would be
> > > 
> > > add rule filter output ct labels & foo == foo
> > > 
> > > Thus I am not sure at the moment if this is really the right approach
> > > and if it would be better to follow the xt_connlabel model and ask kernel
> > > for the desired *bit* instead.
> > > 
> > > It could be done by adding a 'ctlabel' instruction that would check the
> > > bit given in sreg (eg. 3 would do test_bit(... 3) and BREAK if its not set.
> > 
> > Actually you can also easily do this with by just generating an equality
> > comparison in userspace. Since you're using bitmask as a base type,
> > nft automatically generates instructions to only check that bit when
> > using an *implicit* op. If you use
> > 
> > "filter output ct labels == foo" that should match only the one label.
> 
> src/nft add rule filter output ct labels == foo --debug=netlink
> ip filter output 0 0
> [ ct load labels => reg 1 ]
> [ cmp eq reg 1 0x00000001 0x00000000 0x00000000 0x00000000 ]
> 
> I guess I am doing something wrong in my patch.
> Let me investigate.  The behaviour you describe seems to be exactly
> what I want.

This looks correct, doesn't it? Its an equality comparison of the label.

> > I have to admit that I don't know how connlabels are used, so I don't
> > know what the best default implicit op would be.
> 
> I think it should be a bit-test to make it behave like xt_connlabels
> match.

Ok I misunderstood your initial problem statement. So basically what it
should currently do:

ct labels foo => test whether that bit is set
ct labels == foo => test whether foo and only foo is set

Ok I can see the problem :)

The implicit op only selects FLAGCMP for EXPR_LIST (see
expr_evaluate_relational()). That should probably be changed to take the
base type into account. This also seems wrong for the ct state expression,
we currently use equality if only one state is specified but use a flag
comparison if multiple flags are specified.

If you change the default case to take the basetype into account it should
do what you want.

> > > +static void ct_label_type_print(const struct expr *expr)
> > > +{
> > > +	const struct symbolic_constant *s;
> > > +	unsigned long bit = 0;
> > > +	int cnt = 0;
> > > +
> > > +	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
> > > +		bit = mpz_scan1(expr->value, s->value);
> > > +		if (bit < CT_LABELS_BIT_SIZE && bit == s->value) {
> > > +			if (cnt)
> > > +				printf("|");
> > 
> > This looks incorrect since it doesn't match the input format. When using
> > a bitmask base type, nft should automatically reconstruct the list as
> > comma-seperated values. So I think you don't need a print function.
> 
> see below
> 
> > > +static struct error_record *ct_label_type_parse(const struct expr *sym,
> > > +						struct expr **res)
> > > +{
> > > +	const struct symbolic_constant *s;
> > > +	const struct datatype *dtype;
> > > +	uint8_t data[CT_LABELS_BIT_SIZE];
> > > +	mpz_t value;
> > > +
> > > +	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
> > > +		if (!strcmp(sym->identifier, s->identifier))
> > > +			break;
> > > +	}
> > > +
> > > +	dtype = sym->dtype;
> > > +	if (s->identifier == NULL)
> > > +		return error(&sym->location, "Could not parse %s", dtype->desc);
> > > +
> > > +	mpz_init2(value, dtype->size);
> > > +	mpz_setbit(value, s->value);
> > > +	mpz_export_data(data, value, BYTEORDER_HOST_ENDIAN, sizeof(data));
> > > +
> > > +	*res = constant_expr_alloc(&sym->location, dtype,
> > > +				   dtype->byteorder, sizeof(data),
> > > +				   data);
> > > +	mpz_clear(value);
> > > +	return NULL;
> > > +}
> > 
> > Same here. The defaults should work fine.
> 
> Hmm, I think the problem is that connlabel.conf
> maps the bit-number to names, e.g.
> 
> 0 foo
> 1 bar
> 
> But these should be parsed as (1<<$number)
> [ in iptables these values are passed to the kernel which
>   uses them as argument to test_bit and friends ].
> 
> Could probably be fixed by having my own parsing
> function instead of reusing the rt_symbol_table one, but I would
> need to replace the uint64_t value in the sym table structure
> for that [need to be able to store (1<<127)]...

I see. Actually I think we could change the rt_parse_symbol function
to generate bitmasks in case of basetype == bitmask_type.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Feb. 16, 2014, 11:27 a.m. UTC | #4
Patrick McHardy <kaber@trash.net> wrote:
> > src/nft add rule filter output ct labels == foo --debug=netlink
> > ip filter output 0 0
> > [ ct load labels => reg 1 ]
> > [ cmp eq reg 1 0x00000001 0x00000000 0x00000000 0x00000000 ]
> > 
> > I guess I am doing something wrong in my patch.
> > Let me investigate.  The behaviour you describe seems to be exactly
> > what I want.
> 
> This looks correct, doesn't it? Its an equality comparison of the label.

Argh.  Sorry.  I misunderstood what you were saying :-/
This is not the behaviour that iptables -m connlabel --label foo
would have, since that peforms bit-test.

The equality compare would only match if 'foo' and only 'foo' label
is set.  This is obviously not *wrong*.  It just needs to be understood
that when doing 'ct labels foo' it may not be whats expected.

I think i'll leave it as-is for the time being; I like the register
approach since you can test for multiple labels in one operation
and can even express things like 'match if foo and bar are both set but
baz is not'.

> > > I have to admit that I don't know how connlabels are used, so I don't
> > > know what the best default implicit op would be.
> > 
> > I think it should be a bit-test to make it behave like xt_connlabels
> > match.
> 
> Ok I misunderstood your initial problem statement. So basically what it
> should currently do:
> 
> ct labels foo => test whether that bit is set
> ct labels == foo => test whether foo and only foo is set

Yes :-)

I think it would mimic the connlabel match in itpables better.

> Ok I can see the problem :)
> 
> The implicit op only selects FLAGCMP for EXPR_LIST (see
> expr_evaluate_relational()). That should probably be changed to take the
> base type into account. This also seems wrong for the ct state expression,
> we currently use equality if only one state is specified but use a flag
> comparison if multiple flags are specified.
> 
> If you change the default case to take the basetype into account it should
> do what you want.

Awesome, thanks a lot Patrick.  I'll try it out later today.

> > Could probably be fixed by having my own parsing
> > function instead of reusing the rt_symbol_table one, but I would
> > need to replace the uint64_t value in the sym table structure
> > for that [need to be able to store (1<<127)]...
> 
> I see. Actually I think we could change the rt_parse_symbol function
> to generate bitmasks in case of basetype == bitmask_type.

Ok.  I'll look into this, but, as I said I would have to replace
uint64_t with mpz_t to get the wider range required.

Thanks for your help.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Feb. 16, 2014, 11:33 a.m. UTC | #5
On Sun, Feb 16, 2014 at 12:27:39PM +0100, Florian Westphal wrote:
> Patrick McHardy <kaber@trash.net> wrote:
> 
> > > Could probably be fixed by having my own parsing
> > > function instead of reusing the rt_symbol_table one, but I would
> > > need to replace the uint64_t value in the sym table structure
> > > for that [need to be able to store (1<<127)]...
> > 
> > I see. Actually I think we could change the rt_parse_symbol function
> > to generate bitmasks in case of basetype == bitmask_type.
> 
> Ok.  I'll look into this, but, as I said I would have to replace
> uint64_t with mpz_t to get the wider range required.

Not sure if that will even work since those types have dynamic sizes.

If you keep your own functions (which seem to make sense), they should
only parse and print a single value at a time, handling lists and bitmasks
is done by the generic code - at least it should be :)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Feb. 16, 2014, 4:59 p.m. UTC | #6
Patrick McHardy <kaber@trash.net> wrote:
> Ok I misunderstood your initial problem statement. So basically what it
> should currently do:
> 
> ct labels foo => test whether that bit is set
> ct labels == foo => test whether foo and only foo is set
> 
> Ok I can see the problem :)
> 
> The implicit op only selects FLAGCMP for EXPR_LIST (see
> expr_evaluate_relational()). That should probably be changed to take the
> base type into account. This also seems wrong for the ct state expression,
> we currently use equality if only one state is specified but use a flag
> comparison if multiple flags are specified.

I hacked something up to also select FLAGCMP for bitmask type.

$ nft --debug=netlink add rule filter output ct labels foo
ip filter output 0 0
 [ ct load labels => reg 1 ]
 [ bitwise reg 1 = (reg=1 & 0x00000001 0x00000000 0x00000000 0x00000000 ) ^ 0x00000000 0x00000000 0x00000000 0x00000000 ]
 [ cmp neq reg 1 0x00000001 0x00000000 0x00000000 0x00000000 ]

looks better.  Still not exactly the same though.
The cmp neq will cause it to match when the label is not set.

I then tried again with vanilla master branch:
tcp flags syn counter packets 0 bytes 0
tcp flags fin,syn counter packets 184 bytes 24880

So, same problem there: EXPR_LIST == cmp neq.  Is that intentional?
It seems wrong to me, e.g.  "tcp flags fin,syn" will match virtually all
tcp packets.

Maybe netlink_gen_flagcmp() should generate NFT_CMP_GT i.e.:
 [ bitwise reg 1 = (reg=1 & 0x00000012 ) ^ 0x00000000 ]
 [ cmp gt reg 1 0x00000000 ]

At least that would be what I would have expected :-}

Am I wrong?
As a side note, experimenting a bit with tcp flags:

add rule filter output tcp flags & (syn|ack) == (syn|ack)

works fine with current master branch.  But list shows

"tcp flags & 18 == 18", i.e. no symbol translation.

Shouldn't it restore the symbolic names?
I think this is the very same problem that I had with my connlabel
dabbling, so it would be nice if it could be solved in generic way.

Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/datatype.h b/include/datatype.h
index 84dcdc8..3176092 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -32,6 +32,7 @@ 
  * @TYPE_CT_STATE:	conntrack state (bitmask subtype)
  * @TYPE_CT_DIR:	conntrack direction
  * @TYPE_CT_STATUS:	conntrack status (bitmask subtype)
+ * @TYPE_CT_LABELS:	Conntrack Labels (bitmask subtype)
  * @TYPE_ICMP6_TYPE:	ICMPv6 type codes (integer subtype)
  */
 enum datatypes {
@@ -63,6 +64,7 @@  enum datatypes {
 	TYPE_CT_STATE,
 	TYPE_CT_DIR,
 	TYPE_CT_STATUS,
+	TYPE_CT_LABELS,
 	TYPE_ICMP6_TYPE,
 	__TYPE_MAX
 };
diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index a236cc3..8d9d317 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -507,6 +507,7 @@  enum nft_meta_attributes {
  * @NFT_CT_PROTOCOL: conntrack layer 4 protocol
  * @NFT_CT_PROTO_SRC: conntrack layer 4 protocol source
  * @NFT_CT_PROTO_DST: conntrack layer 4 protocol destination
+ * @NFT_CT_LABELS: conntrack label bitset (stored in conntrack extension)
  */
 enum nft_ct_keys {
 	NFT_CT_STATE,
@@ -522,6 +523,7 @@  enum nft_ct_keys {
 	NFT_CT_PROTOCOL,
 	NFT_CT_PROTO_SRC,
 	NFT_CT_PROTO_DST,
+	NFT_CT_LABELS,
 };
 
 /**
diff --git a/src/ct.c b/src/ct.c
index c7849fb..7639944 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -20,8 +20,10 @@ 
 #include <linux/netfilter/nf_conntrack_common.h>
 #include <linux/netfilter/nf_conntrack_tuple_common.h>
 
+#include <erec.h>
 #include <expression.h>
 #include <datatype.h>
+#include <gmputil.h>
 #include <ct.h>
 #include <utils.h>
 
@@ -90,6 +92,65 @@  static const struct datatype ct_status_type = {
 	.sym_tbl	= &ct_status_tbl,
 };
 
+static struct symbol_table *ct_label_tbl;
+#define CT_LABELS_BIT_SIZE 128
+
+static void ct_label_type_print(const struct expr *expr)
+{
+	const struct symbolic_constant *s;
+	unsigned long bit = 0;
+	int cnt = 0;
+
+	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
+		bit = mpz_scan1(expr->value, s->value);
+		if (bit < CT_LABELS_BIT_SIZE && bit == s->value) {
+			if (cnt)
+				printf("|");
+			printf("%s", s->identifier);
+			++cnt;
+		}
+	}
+}
+
+static struct error_record *ct_label_type_parse(const struct expr *sym,
+						struct expr **res)
+{
+	const struct symbolic_constant *s;
+	const struct datatype *dtype;
+	uint8_t data[CT_LABELS_BIT_SIZE];
+	mpz_t value;
+
+	for (s = ct_label_tbl->symbols; s->identifier != NULL; s++) {
+		if (!strcmp(sym->identifier, s->identifier))
+			break;
+	}
+
+	dtype = sym->dtype;
+	if (s->identifier == NULL)
+		return error(&sym->location, "Could not parse %s", dtype->desc);
+
+	mpz_init2(value, dtype->size);
+	mpz_setbit(value, s->value);
+	mpz_export_data(data, value, BYTEORDER_HOST_ENDIAN, sizeof(data));
+
+	*res = constant_expr_alloc(&sym->location, dtype,
+				   dtype->byteorder, sizeof(data),
+				   data);
+	mpz_clear(value);
+	return NULL;
+}
+
+static const struct datatype ct_labels_type = {
+	.type		= TYPE_CT_LABELS,
+	.name		= "ct_labels",
+	.desc		= "conntrack labels",
+	.byteorder	= BYTEORDER_HOST_ENDIAN,
+	.size		= CT_LABELS_BIT_SIZE,
+	.basetype	= &bitmask_type,
+	.print		= ct_label_type_print,
+	.parse		= ct_label_type_parse,
+};
+
 static const struct ct_template ct_templates[] = {
 	[NFT_CT_STATE]		= CT_TEMPLATE("state",	    &ct_state_type,
 					      BYTEORDER_HOST_ENDIAN,
@@ -124,6 +185,9 @@  static const struct ct_template ct_templates[] = {
 	[NFT_CT_PROTO_DST]	= CT_TEMPLATE("proto-dst",  &invalid_type,
 					      BYTEORDER_BIG_ENDIAN,
 					      2 * BITS_PER_BYTE),
+	[NFT_CT_LABELS]		= CT_TEMPLATE("labels", &ct_labels_type,
+					      BYTEORDER_HOST_ENDIAN,
+					      CT_LABELS_BIT_SIZE),
 };
 
 static void ct_expr_print(const struct expr *expr)
@@ -159,4 +223,5 @@  static void __init ct_init(void)
 	datatype_register(&ct_state_type);
 	datatype_register(&ct_dir_type);
 	datatype_register(&ct_status_type);
+	ct_label_tbl = rt_symbol_table_init("/etc/xtables/connlabel.conf");
 }
diff --git a/src/parser.y b/src/parser.y
index 345d8d0..24bd9df 100644
--- a/src/parser.y
+++ b/src/parser.y
@@ -305,6 +305,7 @@  static void location_update(struct location *loc, struct location *rhs, int n)
 %token L3PROTOCOL		"l3proto"
 %token PROTO_SRC		"proto-src"
 %token PROTO_DST		"proto-dst"
+%token LABELS			"labels"
 
 %token COUNTER			"counter"
 %token PACKETS			"packets"
@@ -1405,6 +1406,7 @@  ct_key			:	STATE		{ $$ = NFT_CT_STATE; }
 			|	PROTOCOL	{ $$ = NFT_CT_PROTOCOL; }
 			|	PROTO_SRC	{ $$ = NFT_CT_PROTO_SRC; }
 			|	PROTO_DST	{ $$ = NFT_CT_PROTO_DST; }
+			|	LABELS		{ $$ = NFT_CT_LABELS; }
 			;
 
 payload_expr		:	payload_raw_expr
diff --git a/src/scanner.l b/src/scanner.l
index c47e610..bfb087c 100644
--- a/src/scanner.l
+++ b/src/scanner.l
@@ -399,6 +399,7 @@  addrstring	({macaddr}|{ip4addr}|{ip6addr})
 "l3proto"		{ return L3PROTOCOL; }
 "proto-src"		{ return PROTO_SRC; }
 "proto-dst"		{ return PROTO_DST; }
+"labels"		{ return LABELS; }
 
 {addrstring}		{
 				yylval->string = xstrdup(yytext);