diff mbox series

[ovs-dev,RFC,2/2] lib/tc: add set ipv6 traffic class action offload via pedit

Message ID 1547207513-22802-3-git-send-email-pieter.jansenvanvuuren@netronome.com
State Changes Requested
Headers show
Series extend ovs-tc offload for more pedit action | expand

Commit Message

Pieter Jansen van Vuuren Jan. 11, 2019, 11:51 a.m. UTC
Extend ovs-tc translation by allowing non-byte-aligned fields
for set actions. Use new boundary shifts and add set ipv6 traffic
class action offload via pedit.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 lib/byte-order.h         | 18 ++++++++++++
 lib/netdev-tc-offloads.c |  4 +++
 lib/tc.c                 | 63 ++++++++++++++++++++++++++++------------
 lib/tc.h                 |  1 +
 4 files changed, 68 insertions(+), 18 deletions(-)

Comments

Ben Pfaff Jan. 11, 2019, 4:49 p.m. UTC | #1
On Fri, Jan 11, 2019 at 11:51:53AM +0000, Pieter Jansen van Vuuren wrote:
> +/* These functions specifically help shifting words in network
> + * byte order, given that they are specified in host order. */
> +static inline uint32_t
> +shift_ovs_be32_left(uint32_t word, int shift)
> +{
> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
> +
> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
> +}
> +
> +static inline uint32_t
> +shift_ovs_be32_right(uint32_t word, int shift)
> +{
> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
> +
> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
> +}

I don't understand how these new operations make sense.  Can you
explain?
Pieter Jansen van Vuuren Jan. 11, 2019, 5:35 p.m. UTC | #2
On 11/01/2019 16:49, Ben Pfaff wrote:
> On Fri, Jan 11, 2019 at 11:51:53AM +0000, Pieter Jansen van Vuuren wrote:
>> +/* These functions specifically help shifting words in network
>> + * byte order, given that they are specified in host order. */
>> +static inline uint32_t
>> +shift_ovs_be32_left(uint32_t word, int shift)
>> +{
>> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
>> +
>> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
>> +}
>> +
>> +static inline uint32_t
>> +shift_ovs_be32_right(uint32_t word, int shift)
>> +{
>> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
>> +
>> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
>> +}
> 
> I don't understand how these new operations make sense.  Can you
> explain?
> 

Sure,

Background:
The original issue is that pedit performs set action operation per word, and
the way we are describing the translation from OvS to TC is by using byte
offsets(flower_pedit_map). However some fields like Traffic Class in IPv6 span
2 pedit-words - one nibble of the field is in one word and the other nibble is
in another word. In order to accommodate this a shift may be used.

We need to do this shift in network order thus we need the conversion from host
to network, but the types for htonl/ntohl either takes as parameter or returns
ovs_be32. Doing the shift operation on ovs_be32 results in the compiler
degrades to integer:

 error: restricted ovs_be32 degrades to integer

Passing that back to ntohl would result in:

 error: incorrect type in argument 1 (different base types)
    expected restricted ovs_be32 [usertype] x
    got unsigned int

therefore we introduced these new operations. Hope this helps. I'm open to
alternate approaches to this problem.
Ben Pfaff Jan. 11, 2019, 6:02 p.m. UTC | #3
On Fri, Jan 11, 2019 at 05:35:45PM +0000, Pieter Jansen van Vuuren wrote:
> On 11/01/2019 16:49, Ben Pfaff wrote:
> > On Fri, Jan 11, 2019 at 11:51:53AM +0000, Pieter Jansen van Vuuren wrote:
> >> +/* These functions specifically help shifting words in network
> >> + * byte order, given that they are specified in host order. */
> >> +static inline uint32_t
> >> +shift_ovs_be32_left(uint32_t word, int shift)
> >> +{
> >> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
> >> +
> >> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
> >> +}
> >> +
> >> +static inline uint32_t
> >> +shift_ovs_be32_right(uint32_t word, int shift)
> >> +{
> >> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
> >> +
> >> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
> >> +}
> > 
> > I don't understand how these new operations make sense.  Can you
> > explain?
> > 
> 
> Sure,
> 
> Background:
> The original issue is that pedit performs set action operation per word, and
> the way we are describing the translation from OvS to TC is by using byte
> offsets(flower_pedit_map). However some fields like Traffic Class in IPv6 span
> 2 pedit-words - one nibble of the field is in one word and the other nibble is
> in another word. In order to accommodate this a shift may be used.
> 
> We need to do this shift in network order thus we need the conversion from host
> to network, but the types for htonl/ntohl either takes as parameter or returns
> ovs_be32. Doing the shift operation on ovs_be32 results in the compiler
> degrades to integer:
> 
>  error: restricted ovs_be32 degrades to integer
> 
> Passing that back to ntohl would result in:
> 
>  error: incorrect type in argument 1 (different base types)
>     expected restricted ovs_be32 [usertype] x
>     got unsigned int
> 
> therefore we introduced these new operations. Hope this helps. I'm open to
> alternate approaches to this problem.

The part I don't understand is why doing a shift of a word in network
byte order makes any sense.  It will have different semantics depending
on whether the host is little-endian or big-endian.  Why is that
desirable?
Pieter Jansen van Vuuren Jan. 13, 2019, 11:59 a.m. UTC | #4
On 11/01/2019 18:02, Ben Pfaff wrote:
> On Fri, Jan 11, 2019 at 05:35:45PM +0000, Pieter Jansen van Vuuren wrote:
>> On 11/01/2019 16:49, Ben Pfaff wrote:
>>> On Fri, Jan 11, 2019 at 11:51:53AM +0000, Pieter Jansen van Vuuren wrote:
>>>> +/* These functions specifically help shifting words in network
>>>> + * byte order, given that they are specified in host order. */
>>>> +static inline uint32_t
>>>> +shift_ovs_be32_left(uint32_t word, int shift)
>>>> +{
>>>> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
>>>> +
>>>> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
>>>> +}
>>>> +
>>>> +static inline uint32_t
>>>> +shift_ovs_be32_right(uint32_t word, int shift)
>>>> +{
>>>> +        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
>>>> +
>>>> +        return ntohl((OVS_FORCE ovs_be32)word_shifted);
>>>> +}
>>>
>>> I don't understand how these new operations make sense.  Can you
>>> explain?
>>>
>>
>> Sure,
>>
>> Background:
>> The original issue is that pedit performs set action operation per word, and
>> the way we are describing the translation from OvS to TC is by using byte
>> offsets(flower_pedit_map). However some fields like Traffic Class in IPv6 span
>> 2 pedit-words - one nibble of the field is in one word and the other nibble is
>> in another word. In order to accommodate this a shift may be used.
>>
>> We need to do this shift in network order thus we need the conversion from host
>> to network, but the types for htonl/ntohl either takes as parameter or returns
>> ovs_be32. Doing the shift operation on ovs_be32 results in the compiler
>> degrades to integer:
>>
>>  error: restricted ovs_be32 degrades to integer
>>
>> Passing that back to ntohl would result in:
>>
>>  error: incorrect type in argument 1 (different base types)
>>     expected restricted ovs_be32 [usertype] x
>>     got unsigned int
>>
>> therefore we introduced these new operations. Hope this helps. I'm open to
>> alternate approaches to this problem.
> 
> The part I don't understand is why doing a shift of a word in network
> byte order makes any sense.  It will have different semantics depending
> on whether the host is little-endian or big-endian.  Why is that
> desirable?
> 

Oh, yes that makes sense. We would not want different semantics for little/big
endian machines.

One confusing aspect of the code is that for historical reasons the pedit deals
with values that are in network byte order using uint32_t variables. So I think
in this case the comment is incorrect and the usage of htonl/ntohl is inverted.
What we actually want is a shift of a value stored in network byte order, and
we want that shift done in host byte order (to avoid the problem you describe).

In terms of run-time the result would be the same as the current patch. But it
would be documented correctly. As a follow-up documentation of the byte order
used by pedit would likely be a good idea.
diff mbox series

Patch

diff --git a/lib/byte-order.h b/lib/byte-order.h
index 66d29a2b3..7bbadb24e 100644
--- a/lib/byte-order.h
+++ b/lib/byte-order.h
@@ -116,6 +116,24 @@  bytes_to_be32(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
     return (OVS_FORCE ovs_be32) x;
 }
 
+/* These functions specifically help shifting words in network
+ * byte order, given that they are specified in host order. */
+static inline uint32_t
+shift_ovs_be32_left(uint32_t word, int shift)
+{
+        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) << shift;
+
+        return ntohl((OVS_FORCE ovs_be32)word_shifted);
+}
+
+static inline uint32_t
+shift_ovs_be32_right(uint32_t word, int shift)
+{
+        uint32_t word_shifted = (OVS_FORCE uint32_t)htonl(word) >> shift;
+
+        return ntohl((OVS_FORCE ovs_be32)word_shifted);
+}
+
 /* These functions zero-extend big-endian values to longer ones,
  * or truncate long big-endian value to shorter ones. */
 #ifndef __CHECKER__
diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index 90bd3c585..1c5bd22e4 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -84,6 +84,10 @@  static struct netlink_field set_flower_map[][4] = {
           offsetof(struct tc_flower_key, ipv6.rewrite_hlimit),
           MEMBER_SIZEOF(struct tc_flower_key, ipv6.rewrite_hlimit)
         },
+        { offsetof(struct ovs_key_ipv6, ipv6_tclass),
+          offsetof(struct tc_flower_key, ipv6.rewrite_tclass),
+          MEMBER_SIZEOF(struct tc_flower_key, ipv6.rewrite_tclass)
+        },
     },
     [OVS_KEY_ATTR_ETHERNET] = {
         { offsetof(struct ovs_key_ethernet, eth_src),
diff --git a/lib/tc.c b/lib/tc.c
index 1696c9836..f93814fa3 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -73,6 +73,7 @@  struct flower_key_to_pedit {
     int offset;
     int flower_offset;
     int size;
+    int boundary_shift;
 };
 
 static struct flower_key_to_pedit flower_pedit_map[] = {
@@ -80,72 +81,92 @@  static struct flower_key_to_pedit flower_pedit_map[] = {
         TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
         12,
         offsetof(struct tc_flower_key, ipv4.ipv4_src),
-        MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_src)
+        MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_src),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
         16,
         offsetof(struct tc_flower_key, ipv4.ipv4_dst),
-        MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_dst)
+        MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_dst),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
         8,
         offsetof(struct tc_flower_key, ipv4.rewrite_ttl),
-        MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_ttl)
+        MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_ttl),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
         1,
         offsetof(struct tc_flower_key, ipv4.rewrite_tos),
-        MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_tos)
+        MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_tos),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_IP6,
         7,
         offsetof(struct tc_flower_key, ipv6.rewrite_hlimit),
-        MEMBER_SIZEOF(struct tc_flower_key, ipv6.rewrite_hlimit)
+        MEMBER_SIZEOF(struct tc_flower_key, ipv6.rewrite_hlimit),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_IP6,
         8,
         offsetof(struct tc_flower_key, ipv6.ipv6_src),
-        MEMBER_SIZEOF(struct tc_flower_key, ipv6.ipv6_src)
+        MEMBER_SIZEOF(struct tc_flower_key, ipv6.ipv6_src),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_IP6,
         24,
         offsetof(struct tc_flower_key, ipv6.ipv6_dst),
-        MEMBER_SIZEOF(struct tc_flower_key, ipv6.ipv6_dst)
+        MEMBER_SIZEOF(struct tc_flower_key, ipv6.ipv6_dst),
+        0
+    }, {
+        TCA_PEDIT_KEY_EX_HDR_TYPE_IP6,
+        0,
+        offsetof(struct tc_flower_key, ipv6.rewrite_tclass),
+        MEMBER_SIZEOF(struct tc_flower_key, ipv6.rewrite_tclass),
+        4
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_ETH,
         6,
         offsetof(struct tc_flower_key, src_mac),
-        MEMBER_SIZEOF(struct tc_flower_key, src_mac)
+        MEMBER_SIZEOF(struct tc_flower_key, src_mac),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_ETH,
         0,
         offsetof(struct tc_flower_key, dst_mac),
-        MEMBER_SIZEOF(struct tc_flower_key, dst_mac)
+        MEMBER_SIZEOF(struct tc_flower_key, dst_mac),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_ETH,
         12,
         offsetof(struct tc_flower_key, eth_type),
-        MEMBER_SIZEOF(struct tc_flower_key, eth_type)
+        MEMBER_SIZEOF(struct tc_flower_key, eth_type),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_TCP,
         0,
         offsetof(struct tc_flower_key, tcp_src),
-        MEMBER_SIZEOF(struct tc_flower_key, tcp_src)
+        MEMBER_SIZEOF(struct tc_flower_key, tcp_src),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_TCP,
         2,
         offsetof(struct tc_flower_key, tcp_dst),
-        MEMBER_SIZEOF(struct tc_flower_key, tcp_dst)
+        MEMBER_SIZEOF(struct tc_flower_key, tcp_dst),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_UDP,
         0,
         offsetof(struct tc_flower_key, udp_src),
-        MEMBER_SIZEOF(struct tc_flower_key, udp_src)
+        MEMBER_SIZEOF(struct tc_flower_key, udp_src),
+        0
     }, {
         TCA_PEDIT_KEY_EX_HDR_TYPE_UDP,
         2,
         offsetof(struct tc_flower_key, udp_dst),
-        MEMBER_SIZEOF(struct tc_flower_key, udp_dst)
+        MEMBER_SIZEOF(struct tc_flower_key, udp_dst),
+        0
     },
 };
 
@@ -832,8 +853,11 @@  nl_parse_act_pedit(struct nlattr *options, struct tc_flower *flower)
                 int diff = flower_off + (keys->off - mf);
                 uint32_t *dst = (void *) (rewrite_key + diff);
                 uint32_t *dst_m = (void *) (rewrite_mask + diff);
-                uint32_t mask = ~(keys->mask);
-                uint32_t zero_bits;
+                uint32_t mask_word, data_word, mask, zero_bits;
+
+                mask_word = shift_ovs_be32_left(keys->mask, m->boundary_shift);
+                data_word = shift_ovs_be32_left(keys->val, m->boundary_shift);
+                mask = ~(mask_word);
 
                 if (keys->off < mf) {
                     zero_bits = 8 * (mf - keys->off);
@@ -844,7 +868,7 @@  nl_parse_act_pedit(struct nlattr *options, struct tc_flower *flower)
                 }
 
                 *dst_m |= mask;
-                *dst |= keys->val & mask;
+                *dst |= data_word & mask;
             }
         }
 
@@ -1832,6 +1856,7 @@  nl_msg_put_flower_rewrite_pedits(struct ofpbuf *request,
 
         for (j = 0; j < cnt; j++,  mask++, data++, cur_offset += 4) {
             uint32_t mask_word = *mask;
+            uint32_t data_word = *data;
 
             if (j == 0) {
                 mask_word &= first_word_mask;
@@ -1853,8 +1878,10 @@  nl_msg_put_flower_rewrite_pedits(struct ofpbuf *request,
             pedit_key_ex->cmd = TCA_PEDIT_KEY_EX_CMD_SET;
             pedit_key_ex->htype = m->htype;
             pedit_key->off = cur_offset;
+            mask_word = shift_ovs_be32_right(mask_word, m->boundary_shift);
+            data_word = shift_ovs_be32_right(data_word, m->boundary_shift);
             pedit_key->mask = ~mask_word;
-            pedit_key->val = *data & mask_word;
+            pedit_key->val = data_word & mask_word;
             sel.sel.nkeys++;
 
             err = csum_update_flag(flower, m->htype);
diff --git a/lib/tc.h b/lib/tc.h
index 04b08e298..6c909df21 100644
--- a/lib/tc.h
+++ b/lib/tc.h
@@ -109,6 +109,7 @@  struct tc_flower_key {
         struct in6_addr ipv6_src;
         struct in6_addr ipv6_dst;
         uint8_t rewrite_hlimit;
+        uint8_t rewrite_tclass;
     } ipv6;
 
     struct {