[nftables,v2] src: Replace TOS support for using DSCP support
diff mbox

Message ID 1401200300-24583-1-git-send-email-alvaroneay@gmail.com
State Deferred
Headers show

Commit Message

Alvaro Neira May 27, 2014, 2:18 p.m. UTC
From: Álvaro Neira Ayuso <alvaroneay@gmail.com>

Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
---
[changes in v2]
* Replace TOS support for using DSCP support

Now, when we add a rule with DSCP, in the code generation step, nftables
compares 1 bytes but it should compare 6 bits. I think that the problem should
be in the code generation.

 include/datatype.h |    2 ++
 include/proto.h    |    2 +-
 src/parser.y       |    4 +--
 src/proto.c        |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/scanner.l      |    2 +-
 5 files changed, 82 insertions(+), 5 deletions(-)

Comments

Patrick McHardy June 1, 2014, 8:27 p.m. UTC | #1
On Tue, May 27, 2014 at 04:18:20PM +0200, Alvaro Neira Ayuso wrote:
> From: Álvaro Neira Ayuso <alvaroneay@gmail.com>
> 
> Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
> ---
> [changes in v2]
> * Replace TOS support for using DSCP support
> 
> Now, when we add a rule with DSCP, in the code generation step, nftables
> compares 1 bytes but it should compare 6 bits. I think that the problem should
> be in the code generation.

I don't really see how this patch changes this. The kernel operates in units
of bytes. For anything smaller nftables will have to generate appropriate
bitwise operations. Please explain in more detail how this patch changes this.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alvaro Neira June 2, 2014, 8:06 a.m. UTC | #2
El 01/06/14 22:27, Patrick McHardy escribió:
> On Tue, May 27, 2014 at 04:18:20PM +0200, Alvaro Neira Ayuso wrote:
>> From: Álvaro Neira Ayuso <alvaroneay@gmail.com>
>>
>> Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
>> ---
>> [changes in v2]
>> * Replace TOS support for using DSCP support
>>
>> Now, when we add a rule with DSCP, in the code generation step, nftables
>> compares 1 bytes but it should compare 6 bits. I think that the problem should
>> be in the code generation.
>
> I don't really see how this patch changes this. The kernel operates in units
> of bytes. For anything smaller nftables will have to generate appropriate
> bitwise operations. Please explain in more detail how this patch changes this.
>

Now, nothing. For that it's stopped. I'm working for doing a patch for 
operating in the kernel not only with units of bytes like you say. In a 
couple of days, I'm going to send it to the list.

Thanks for the review, Patrick

Álvaro
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal June 2, 2014, 8:58 a.m. UTC | #3
Álvaro Neira Ayuso <alvaroneay@gmail.com> wrote:
> >>Now, when we add a rule with DSCP, in the code generation step, nftables
> >>compares 1 bytes but it should compare 6 bits. I think that the problem should
> >>be in the code generation.
> >
> >I don't really see how this patch changes this. The kernel operates in units
> >of bytes. For anything smaller nftables will have to generate appropriate
> >bitwise operations. Please explain in more detail how this patch changes this.
> >
> 
> Now, nothing. For that it's stopped. I'm working for doing a patch
> for operating in the kernel not only with units of bytes like you
> say. In a couple of days, I'm going to send it to the list.

Are you sure this is the right approach?

It might be better to create appropriate masking instructions in
userspace, in most cases byte addressing is sufficient.

Something like this (warning: untested, misses 'reverse' mapping to
remove the implicit bitops when listing rules):

http://git.breakpoint.cc/cgit/fw/nftables.git/commit/?h=payload_offset_04&id=76ac27643400111785a8abb21fdd9e4311d9876e
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alvaro Neira June 2, 2014, 6:57 p.m. UTC | #4
El 02/06/14 10:58, Florian Westphal escribió:
> Álvaro Neira Ayuso <alvaroneay@gmail.com> wrote:
>>>> Now, when we add a rule with DSCP, in the code generation step, nftables
>>>> compares 1 bytes but it should compare 6 bits. I think that the problem should
>>>> be in the code generation.
>>>
>>> I don't really see how this patch changes this. The kernel operates in units
>>> of bytes. For anything smaller nftables will have to generate appropriate
>>> bitwise operations. Please explain in more detail how this patch changes this.
>>>
>>
>> Now, nothing. For that it's stopped. I'm working for doing a patch
>> for operating in the kernel not only with units of bytes like you
>> say. In a couple of days, I'm going to send it to the list.
>
> Are you sure this is the right approach?
>
> It might be better to create appropriate masking instructions in
> userspace, in most cases byte addressing is sufficient.
>
> Something like this (warning: untested, misses 'reverse' mapping to
> remove the implicit bitops when listing rules):
>
> http://git.breakpoint.cc/cgit/fw/nftables.git/commit/?h=payload_offset_04&id=76ac27643400111785a8abb21fdd9e4311d9876e
>

I have explained very bad. I'm working in a patch like you but I have 
done a different solution. I have done my solution in the evaluation. I 
have added a bitwise node in the tree when we evaluate the relational if 
we have a EXPR_PAYLOAD node in the left and when the size of this left 
node is not a multiple of BITS_PER_BYTE. And I have used the function 
mpz_prefixmask for doing the masks. The problem come when I have added a 
rule like:

nft add rule ip filter input ip frag-off != 0

The mask that we need to use for take the 13 bits for frag-off is like this:
|00052|N-|00002|	|len |flags| type|
|00008|--|00001|	|len |flags| type|
| 00 00 00 01  |	|      data      |	
|00008|--|00002|	|len |flags| type|
| 00 00 00 01  |	|      data      |	
|00008|--|00003|	|len |flags| type|
| 00 00 00 02  |	|      data      |	
|00012|N-|00004|	|len |flags| type|
|00006|--|00001|	|len |flags| type|
| 1f ff 00 00  |	|      data      |	

The problem is when I have seen the mask of the bitwise in the kernel, I 
have seen that the mask is 0xff1f. I'm working for trying to fix that. I 
have thought that maybe was a problem that I have tried this rule 
without my patch and we have the same problem:

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski June 3, 2014, 5:26 a.m. UTC | #5
On Mon, Jun 2, 2014 at 1:58 AM, Florian Westphal <fw@strlen.de> wrote:
> Álvaro Neira Ayuso <alvaroneay@gmail.com> wrote:
>> >>Now, when we add a rule with DSCP, in the code generation step, nftables
>> >>compares 1 bytes but it should compare 6 bits. I think that the problem should
>> >>be in the code generation.
>> >
>> >I don't really see how this patch changes this. The kernel operates in units
>> >of bytes. For anything smaller nftables will have to generate appropriate
>> >bitwise operations. Please explain in more detail how this patch changes this.
>> >
>>
>> Now, nothing. For that it's stopped. I'm working for doing a patch
>> for operating in the kernel not only with units of bytes like you
>> say. In a couple of days, I'm going to send it to the list.
>
> Are you sure this is the right approach?
>
> It might be better to create appropriate masking instructions in
> userspace, in most cases byte addressing is sufficient.
>
> Something like this (warning: untested, misses 'reverse' mapping to
> remove the implicit bitops when listing rules):
>
> http://git.breakpoint.cc/cgit/fw/nftables.git/commit/?h=payload_offset_04&id=76ac27643400111785a8abb21fdd9e4311d9876e
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I know very little about how nftables is actually implemented.  That said:

The IPv6 TOS byte is actually half in one byte half in the next, as a
result of which the 6 dscp bits are 4 bits of one byte and 2 bits of
the next.
The IPv6 flowinfo field is similarly a 20 bit field 12 bits into the packet.
The IPv4 fragment offset is 13 bits in size.

I wonder if for performance reasons the right answer isn't to allow a
fetch N<=64 bits from bit offset M (M=0..inf) opcode as opposed to
doing the masking by hand (perhaps require (M%8)+N<=64 as well).
Perhaps it should also be capable of left shifting the result by K
bits (to deal with fields - like the ipv4 header length field - which
count dwords instead of bytes).
Theoretically with a sufficiently good jit compiler it shouldn't
matter... but I'm assuming that's in the far future.  (??)

I'm not sure if there should be multiple versions of this
(little-endian vs big-endian, bit ordering...) or just the immediately
useful ones (ie. network byte order).

- Maciej
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/include/datatype.h b/include/datatype.h
index 2c66e9d..87f1202 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -35,6 +35,7 @@ 
  * @TYPE_CT_STATUS:	conntrack status (bitmask subtype)
  * @TYPE_ICMP6_TYPE:	ICMPv6 type codes (integer subtype)
  * @TYPE_CT_LABEL:	Conntrack Label (bitmask subtype)
+ * @TYPE_DSCP:		Differentiated Services Code Point (integer subtype)
  */
 enum datatypes {
 	TYPE_INVALID,
@@ -68,6 +69,7 @@  enum datatypes {
 	TYPE_CT_STATUS,
 	TYPE_ICMP6_TYPE,
 	TYPE_CT_LABEL,
+	TYPE_DSCP,
 	__TYPE_MAX
 };
 #define TYPE_MAX		(__TYPE_MAX - 1)
diff --git a/include/proto.h b/include/proto.h
index bd3701e..2a9cb86 100644
--- a/include/proto.h
+++ b/include/proto.h
@@ -169,7 +169,7 @@  enum ip_hdr_fields {
 	IPHDR_INVALID,
 	IPHDR_VERSION,
 	IPHDR_HDRLENGTH,
-	IPHDR_TOS,
+	IPHDR_DSCP,
 	IPHDR_LENGTH,
 	IPHDR_ID,
 	IPHDR_FRAG_OFF,
diff --git a/src/parser.y b/src/parser.y
index 9c20737..e64192c 100644
--- a/src/parser.y
+++ b/src/parser.y
@@ -229,7 +229,7 @@  static void location_update(struct location *loc, struct location *rhs, int n)
 %token IP			"ip"
 %token VERSION			"version"
 %token HDRLENGTH		"hdrlength"
-%token TOS			"tos"
+%token DSCP			"dscp"
 %token LENGTH			"length"
 %token FRAG_OFF			"frag-off"
 %token TTL			"ttl"
@@ -1792,7 +1792,7 @@  ip_hdr_expr		:	IP	ip_hdr_field
 
 ip_hdr_field		:	VERSION		{ $$ = IPHDR_VERSION; }
 			|	HDRLENGTH	{ $$ = IPHDR_HDRLENGTH; }
-			|	TOS		{ $$ = IPHDR_TOS; }
+			|	DSCP		{ $$ = IPHDR_DSCP; }
 			|	LENGTH		{ $$ = IPHDR_LENGTH; }
 			|	ID		{ $$ = IPHDR_ID; }
 			|	FRAG_OFF	{ $$ = IPHDR_FRAG_OFF; }
diff --git a/src/proto.c b/src/proto.c
index 0a37a65..0639d37 100644
--- a/src/proto.c
+++ b/src/proto.c
@@ -478,10 +478,84 @@  const struct proto_desc proto_sctp = {
  */
 
 #include <netinet/ip.h>
+
+static const struct symbol_table dscp_type_tbl = {
+	.symbols	= {
+		SYMBOL("CS0",	0x00),
+		SYMBOL("CS1",	0x08),
+		SYMBOL("CS2",	0x10),
+		SYMBOL("CS3",	0x18),
+		SYMBOL("CS4",	0x20),
+		SYMBOL("CS5",	0x28),
+		SYMBOL("CS6",	0x30),
+		SYMBOL("CS7",	0x38),
+		SYMBOL("BE",	0x00),
+		SYMBOL("AF11",	0x0a),
+		SYMBOL("AF12",	0x0c),
+		SYMBOL("AF13",	0x0e),
+		SYMBOL("AF21",	0x12),
+		SYMBOL("AF22",	0x14),
+		SYMBOL("AF23",	0x16),
+		SYMBOL("AF31",	0x1a),
+		SYMBOL("AF32",	0x1c),
+		SYMBOL("AF33",	0x1e),
+		SYMBOL("AF41",	0x22),
+		SYMBOL("AF42",	0x24),
+		SYMBOL("AF43",	0x26),
+		SYMBOL("EF",	0x2e),
+		SYMBOL_LIST_END
+	},
+};
+
+static struct error_record *dscp_type_parse(const struct expr *sym,
+						  struct expr **res)
+{
+	struct error_record *erec;
+	const struct symbolic_constant *s;
+
+	for (s = dscp_type_tbl.symbols; s->identifier != NULL; s++) {
+		if (!strcmp(sym->identifier, s->identifier)) {
+			*res = constant_expr_alloc(&sym->location, sym->dtype,
+						   sym->dtype->byteorder,
+						   sym->dtype->size,
+						   &s->value);
+			return NULL;
+		}
+	}
+
+	*res = NULL;
+	erec = sym->dtype->basetype->parse(sym, res);
+	if (erec != NULL)
+		return erec;
+	if (*res)
+		return NULL;
+
+	return symbolic_constant_parse(sym, &dscp_type_tbl, res);
+}
+
+static void dscp_type_print(const struct expr *expr)
+{
+	return symbolic_constant_print(&dscp_type_tbl, expr);
+}
+
+static const struct datatype dscp_type = {
+	.type		= TYPE_DSCP,
+	.name		= "dscp_type",
+	.desc		= "Differentiated Services Field",
+	.byteorder	= BYTEORDER_BIG_ENDIAN,
+	.size		= 6,
+	.basetype	= &integer_type,
+	.basefmt	= "0x%.2Zx",
+	.print		= dscp_type_print,
+	.parse		= dscp_type_parse,
+};
+
 #define IPHDR_FIELD(__name, __member) \
 	HDR_FIELD(__name, struct iphdr, __member)
 #define IPHDR_ADDR(__name, __member) \
 	HDR_TYPE(__name, &ipaddr_type, struct iphdr, __member)
+#define IPHDR_TOS(__name, __type) \
+	HDR_TYPE(__name, __type, struct iphdr, tos)
 
 const struct proto_desc proto_ip = {
 	.name		= "ip",
@@ -501,7 +575,7 @@  const struct proto_desc proto_ip = {
 	.templates	= {
 		[IPHDR_VERSION]		= HDR_BITFIELD("version", &integer_type, 0, 4),
 		[IPHDR_HDRLENGTH]	= HDR_BITFIELD("hdrlength", &integer_type, 4, 4),
-		[IPHDR_TOS]		= IPHDR_FIELD("tos",		tos),
+		[IPHDR_DSCP]		= IPHDR_TOS("dscp",		&dscp_type),
 		[IPHDR_LENGTH]		= IPHDR_FIELD("length",		tot_len),
 		[IPHDR_ID]		= IPHDR_FIELD("id",		id),
 		[IPHDR_FRAG_OFF]	= IPHDR_FIELD("frag-off",	frag_off),
@@ -811,4 +885,5 @@  static void __init proto_init(void)
 	datatype_register(&arpop_type);
 	datatype_register(&ethertype_type);
 	datatype_register(&icmp6_type_type);
+	datatype_register(&dscp_type);
 }
diff --git a/src/scanner.l b/src/scanner.l
index 801c030..8371e6f 100644
--- a/src/scanner.l
+++ b/src/scanner.l
@@ -329,7 +329,7 @@  addrstring	({macaddr}|{ip4addr}|{ip6addr})
 "ip"			{ return IP; }
 "version"		{ return VERSION; }
 "hdrlength"		{ return HDRLENGTH; }
-"tos"			{ return TOS; }
+"dscp"			{ return DSCP; }
 "length"		{ return LENGTH; }
 "frag-off"		{ return FRAG_OFF; }
 "ttl"			{ return TTL; }