diff mbox series

[ovs-dev,v3,2/2] userspace: Add new option srv6_flowlabel in SRv6 tunnel.

Message ID 20230509093800.33596-3-nmiki@yahoo-corp.jp
State Changes Requested
Headers show
Series Support flowlabel calculation in SRv6 tunnels | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Nobuhiro MIKI May 9, 2023, 9:38 a.m. UTC
It supports flowlabel based load balancing by controlling the flowlabel
of outer IPv6 header, which is already implemented in Linux kernel as
seg6_flowlabel sysctl [1].

[1]: https://docs.kernel.org/networking/seg6-sysctl.html

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
---
 lib/flow.c                    | 24 +++++++++++
 lib/flow.h                    |  1 +
 lib/netdev-native-tnl.c       | 22 +++++++++-
 lib/netdev-vport.c            |  8 ++++
 lib/netdev.h                  | 12 ++++++
 tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
 vswitchd/vswitch.xml          | 29 +++++++++++++
 7 files changed, 174 insertions(+), 1 deletion(-)

Comments

Ilya Maximets May 10, 2023, 9:33 p.m. UTC | #1
On 5/9/23 11:38, Nobuhiro MIKI wrote:
> It supports flowlabel based load balancing by controlling the flowlabel
> of outer IPv6 header, which is already implemented in Linux kernel as
> seg6_flowlabel sysctl [1].
> 
> [1]: https://docs.kernel.org/networking/seg6-sysctl.html
> 
> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
> ---

Hi.  Thanks for the patch!  See some comments inline.

Best regards, Ilya Maximets.

>  lib/flow.c                    | 24 +++++++++++
>  lib/flow.h                    |  1 +
>  lib/netdev-native-tnl.c       | 22 +++++++++-
>  lib/netdev-vport.c            |  8 ++++
>  lib/netdev.h                  | 12 ++++++
>  tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
>  vswitchd/vswitch.xml          | 29 +++++++++++++
>  7 files changed, 174 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/flow.c b/lib/flow.c
> index 9501a259e9d4..f27ec4795bc7 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -2734,6 +2734,30 @@ flow_hash_in_wildcards(const struct flow *flow,
>      return hash_finish(hash, 8 * FLOW_U64S);
>  }
>  
> +uint32_t
> +flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
> +{
> +    uint32_t hash = basis;
> +
> +    if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
> +        const uint64_t *flow_u64 = (const uint64_t *) flow;
> +        int ofs = offsetof(struct flow, ipv6_src) / 8;
> +        int end = ofs + 2 * sizeof flow->ipv6_src / 8;
> +
> +        for (;ofs < end; ofs++) {
> +            hash = hash_add64(hash, flow_u64[ofs]);
> +        }
> +
> +        hash = hash_add(hash, flow->nw_proto);
> +        hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
> +    } else if (flow->dl_type == htons(ETH_TYPE_IP)
> +               || flow->dl_type == htons(ETH_TYPE_ARP)) {
> +        hash = flow_hash_5tuple(flow, basis);
> +    }
> +
> +    return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number. */
> +}

Is it necessary to use a custom hash function for this?
I guess, above can be replaced with a call to flow_hash_5tuple using the
ipv6_label as a basis.  OTOH, the seg6_make_flowlabel() in the kernel
just using the skb hash which likely doesn't include label and even just
a random number of locally originated traffic, so maybe mixing in the
label is not really necessary?

See comments below though.

> +
>  /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted as an
>   * OpenFlow 1.0 "dl_vlan" value:
>   *
> diff --git a/lib/flow.h b/lib/flow.h
> index a9d026e1ce3b..7b8ef5164465 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -258,6 +258,7 @@ bool flow_hash_fields_valid(enum nx_hash_fields);
>  uint32_t flow_hash_in_wildcards(const struct flow *,
>                                  const struct flow_wildcards *,
>                                  uint32_t basis);
> +uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
>  
>  bool flow_equal_except(const struct flow *a, const struct flow *b,
>                         const struct flow_wildcards *);
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index 55e1bd567fa1..18bd9df57175 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -856,6 +856,7 @@ netdev_srv6_build_header(const struct netdev *netdev,
>      struct netdev_tunnel_config *tnl_cfg;
>      const struct in6_addr *segs;
>      struct srv6_base_hdr *srh;
> +    uint32_t ipv6_label = 0;
>      struct in6_addr *s;
>      ovs_be16 dl_type;
>      int err = 0;
> @@ -882,7 +883,26 @@ netdev_srv6_build_header(const struct netdev *netdev,
>          goto out;
>      }
>  
> -    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
> +    switch (tnl_cfg->srv6_flowlabel) {
> +    case SRV6_FLOWLABEL_COPY:
> +        ipv6_label = ntohl(params->flow->ipv6_label);
> +        break;
> +
> +    case SRV6_FLOWLABEL_ZERO:
> +        ipv6_label = 0;
> +        break;
> +
> +    case SRV6_FLOWLABEL_COMPUTE:
> +        ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
> +        break;
> +
> +    default:
> +        err = EINVAL;
> +        goto out;
> +    }
> +
> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
> +                                     ipv6_label);

We can't really do that on the header build stage.  The main reason
is that we're building the header based on the packet information.
In case of COPY we're using the label from the original packaet, in case
of COMPUTE we're using majority of the packet header to produce the value.
But if we're using packet fields, we have to add these fields to the
match criteria.  And that is not happening here.

I didn't try, but I'm sure that if you'll look at results of ofproto/trace
calls in your tests below, you'll see that even though produced actions
are different, the match criteria is exactly the same.  That means that
when the first packet with label A will hit the datapath, the flow that
pushes SRv6 header with flow label A will be added to the datapath.
If the next packet will have label B it will match the installed datapath
flow (because it doesn't match on the label) and the SRv6 header with
label A will be pushed.

Matching on the actual fields used to produce the action is necessary.
However, there is a workaround.  That is to generate these fileds
on a header push, and not on the header build.

See for example the netdev_tnl_push_udp_header() function.  It is updating
the UDP source port on header push.  IIUC, this is for the same reason
the SRv6 flow label is updated - to ensure load balancing of UDP flows
in tunnels.  The way update is done is by using packet's RSS hash.
See the netdev_tnl_get_src_port().  By using RSS hash we can avoid
re-parsing and re-hashing of the packet.  And packets from the same session
should normally have the same RSS hash, so that works.  And we don't need
to match on any extra fileds, because the hash of actual packet is used
while pushing the header.

Another example is netdev_gre_push_header() where we update a sequence
number while pushing a header (doesn't look thread-safe though :) ).

So, we should probbaly just use RSS hash as a flow label for SRv6 and
set it while pushing SRv6 header (in netdev_tnl_push_ip_header()) in the
COMPUTE case and parse out and use the ipv6_label of the current packet
in case of COPY.  The actual packet is available in the header push
context.  And kernel's seg6_make_flowlabel() seems to do a very similar
thing.  Full packet parsing will not be good for performance reasons.

Does that make sense?

>      srh->rt_hdr.segments_left = nr_segs - 1;
>      srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>      srh->rt_hdr.hdrlen = 2 * nr_segs;
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 663ee8606c3b..2141621cf23e 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -792,6 +792,14 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>                                name, node->value);
>                  break;
>              }
> +        } else if (!strcmp(node->key, "srv6_flowlabel")) {
> +            if (!strcmp(node->value, "zero")) {
> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
> +            } else if (!strcmp(node->value, "compute")) {
> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
> +            } else {
> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
> +            }
>          } else if (!strcmp(node->key, "payload_type")) {
>              if (!strcmp(node->value, "mpls")) {
>                   tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
> diff --git a/lib/netdev.h b/lib/netdev.h
> index ff207f56c28c..743a56ca1629 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -97,6 +97,17 @@ enum netdev_pt_mode {
>      NETDEV_PT_LEGACY_L3,
>  };
>  
> +enum netdev_srv6_flowlabel {
> +    /* Copy the flowlabel from inner packet. */
> +    SRV6_FLOWLABEL_COPY,
> +
> +    /* Simply set flowlabel to 0. */
> +    SRV6_FLOWLABEL_ZERO,
> +
> +    /* Calculate a hash for some fields and set the result to flowlabel. */
> +    SRV6_FLOWLABEL_COMPUTE,
> +};
> +
>  /* Configuration specific to tunnels. */
>  struct netdev_tunnel_config {
>      ovs_be64 in_key;
> @@ -144,6 +155,7 @@ struct netdev_tunnel_config {
>      uint8_t srv6_num_segs;
>      #define SRV6_MAX_SEGS 6
>      struct in6_addr srv6_segs[SRV6_MAX_SEGS];
> +    enum netdev_srv6_flowlabel srv6_flowlabel;
>  };
>  
>  void netdev_run(void);
> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
> index e300fe3a0d26..33edc8319eed 100644
> --- a/tests/tunnel-push-pop-ipv6.at
> +++ b/tests/tunnel-push-pop-ipv6.at
> @@ -1,5 +1,84 @@
>  AT_BANNER([tunnel_push_pop_ipv6])
>  
> +AT_SETUP([tunnel_push_pop_ipv6 - srv6])
> +
> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> +AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1 datapath_type=dummy], [0])
> +AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2 datapath_type=dummy], [0])
> +AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3 datapath_type=dummy], [0])
> +AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
> +                       options:remote_ip=2001:cafe::91 ofport_request=2 \
> +                       options:srv6_flowlabel=copy \
> +                       ], [0])
> +AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
> +                       options:remote_ip=2001:cafe::92 ofport_request=3 \
> +                       options:srv6_flowlabel=zero \
> +                       ], [0])
> +AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
> +                       options:remote_ip=2001:cafe::93 ofport_request=4 \
> +                       options:srv6_flowlabel=compute \
> +                       ], [0])
> +
> +dnl First setup dummy interface IP address, then add the route
> +dnl so that tnl-port table can get valid IP address for the device.
> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01], [0], [OK
> +])
> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02], [0], [OK
> +])
> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03], [0], [OK
> +])
> +AT_CHECK([ovs-ofctl add-flow br0 action=1])
> +AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
> +AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
> +AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
> +
> +dnl Check "srv6_flowlabel=copy".
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
> +cat stdout
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=12345
> +])
> +
> +dnl Check "srv6_flowlabel=zero".
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=0
> +])
> +
> +dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=944785
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=772289
> +])
> +
> +dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=468935
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=1012207
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=629290
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
>  
>  OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index edb5eafa04c3..7a0682503233 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3287,6 +3287,35 @@
>             <ref column="options" key="remote_ip"/>.
>          </p>
>        </column>
> +      <column name="options" key="srv6_flowlabel"
> +              type='{"type": "string",
> +                     "enum": ["set", ["zero", "copy", "compute"]]}'>
> +        <p>
> +          Optional.
> +          This option controls how flowlabel in outer IPv6 header is
> +          configured. This would give you the benefit of IPv6 flow label

We should avoid pronouns like 'you' in the documentation.  It should describe
the functionality without trying to talk to the reader.  E.g. 'It gives'
instead of 'it will give you'.

> +          based load balancing, which is supported by some popular vendor
> +          appliances. You can choose from the following three types, as

This also should be somehow re-worded to not have 'You'.  If nothing works,
can just be replaced with 'Users'.

> +          in net.ipv6.seg6_flowlabel syscall.

s/syscall/sysconfig/

> +        </p>
> +        <ul>
> +          <li>
> +            By default, or if this option is <code>copy</code>, copy the
> +            flowlabel of inner IPv6 header to the flowlabel of outer IPv6
> +            header. If inner header is not IPv6, it is set to 0.
> +          </li>
> +          <li>
> +            If this option is <code>zero</code>, simply set flowlabel to 0.
> +          </li>
> +          <li>
> +            If this option is <code>compute</code>, calculate a hash for
> +            some fields in inner header and set the result to flowlabel.

'some fields' is a bit strange thing to say.  "hash over fields of the inner
header" or smething like that.  Same applie to the comment in the code.

> +            If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
> +            flowlabel are the targets of hash calculation. If it is IPv4,
> +            src_ip, dst_ip, L4 proto, src_port, and dst_port are the targets.

I'm not sure if we can actually say which fields are going to be used
if we go with RSS hash.

> +          </li>
> +        </ul>
> +      </column>
>      </group>
>  
>      <group title="Patch Options">
Nobuhiro MIKI May 11, 2023, 10:39 a.m. UTC | #2
On 2023/05/11 6:33, Ilya Maximets wrote:
> On 5/9/23 11:38, Nobuhiro MIKI wrote:
>> It supports flowlabel based load balancing by controlling the flowlabel
>> of outer IPv6 header, which is already implemented in Linux kernel as
>> seg6_flowlabel sysctl [1].
>>
>> [1]: https://docs.kernel.org/networking/seg6-sysctl.html
>>
>> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
>> ---
> 
> Hi.  Thanks for the patch!  See some comments inline.
> 

Hi Ilya,

Thanks for your review!

> 
>>  lib/flow.c                    | 24 +++++++++++
>>  lib/flow.h                    |  1 +
>>  lib/netdev-native-tnl.c       | 22 +++++++++-
>>  lib/netdev-vport.c            |  8 ++++
>>  lib/netdev.h                  | 12 ++++++
>>  tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
>>  vswitchd/vswitch.xml          | 29 +++++++++++++
>>  7 files changed, 174 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/flow.c b/lib/flow.c
>> index 9501a259e9d4..f27ec4795bc7 100644
>> --- a/lib/flow.c
>> +++ b/lib/flow.c
>> @@ -2734,6 +2734,30 @@ flow_hash_in_wildcards(const struct flow *flow,
>>      return hash_finish(hash, 8 * FLOW_U64S);
>>  }
>>  
>> +uint32_t
>> +flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
>> +{
>> +    uint32_t hash = basis;
>> +
>> +    if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>> +        const uint64_t *flow_u64 = (const uint64_t *) flow;
>> +        int ofs = offsetof(struct flow, ipv6_src) / 8;
>> +        int end = ofs + 2 * sizeof flow->ipv6_src / 8;
>> +
>> +        for (;ofs < end; ofs++) {
>> +            hash = hash_add64(hash, flow_u64[ofs]);
>> +        }
>> +
>> +        hash = hash_add(hash, flow->nw_proto);
>> +        hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
>> +    } else if (flow->dl_type == htons(ETH_TYPE_IP)
>> +               || flow->dl_type == htons(ETH_TYPE_ARP)) {
>> +        hash = flow_hash_5tuple(flow, basis);
>> +    }
>> +
>> +    return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number. */
>> +}
> 
> Is it necessary to use a custom hash function for this?
> I guess, above can be replaced with a call to flow_hash_5tuple using the
> ipv6_label as a basis.  OTOH, the seg6_make_flowlabel() in the kernel
> just using the skb hash which likely doesn't include label and even just
> a random number of locally originated traffic, so maybe mixing in the
> label is not really necessary?
> 
> See comments below though.
> 

I also did a quick check seg6_make_flowlabel() in the kernel.
It calls skb_get_hash() and flowlabel seems to not be included.
I guess I was mistaken. Sorry. 

Af far as I know, the purpose is load balancing, so I feel 5tuple
hash is sufficient. And there is SRV6_FLOWLABEL_COPY if someone want
flowlabel based hash instead 5tuple hash.

>> +
>>  /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted as an
>>   * OpenFlow 1.0 "dl_vlan" value:
>>   *
>> diff --git a/lib/flow.h b/lib/flow.h
>> index a9d026e1ce3b..7b8ef5164465 100644
>> --- a/lib/flow.h
>> +++ b/lib/flow.h
>> @@ -258,6 +258,7 @@ bool flow_hash_fields_valid(enum nx_hash_fields);
>>  uint32_t flow_hash_in_wildcards(const struct flow *,
>>                                  const struct flow_wildcards *,
>>                                  uint32_t basis);
>> +uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
>>  
>>  bool flow_equal_except(const struct flow *a, const struct flow *b,
>>                         const struct flow_wildcards *);
>> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
>> index 55e1bd567fa1..18bd9df57175 100644
>> --- a/lib/netdev-native-tnl.c
>> +++ b/lib/netdev-native-tnl.c
>> @@ -856,6 +856,7 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>      struct netdev_tunnel_config *tnl_cfg;
>>      const struct in6_addr *segs;
>>      struct srv6_base_hdr *srh;
>> +    uint32_t ipv6_label = 0;
>>      struct in6_addr *s;
>>      ovs_be16 dl_type;
>>      int err = 0;
>> @@ -882,7 +883,26 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>          goto out;
>>      }
>>  
>> -    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
>> +    switch (tnl_cfg->srv6_flowlabel) {
>> +    case SRV6_FLOWLABEL_COPY:
>> +        ipv6_label = ntohl(params->flow->ipv6_label);
>> +        break;
>> +
>> +    case SRV6_FLOWLABEL_ZERO:
>> +        ipv6_label = 0;
>> +        break;
>> +
>> +    case SRV6_FLOWLABEL_COMPUTE:
>> +        ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
>> +        break;
>> +
>> +    default:
>> +        err = EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
>> +                                     ipv6_label);
> 
> We can't really do that on the header build stage.  The main reason
> is that we're building the header based on the packet information.
> In case of COPY we're using the label from the original packaet, in case
> of COMPUTE we're using majority of the packet header to produce the value.
> But if we're using packet fields, we have to add these fields to the
> match criteria.  And that is not happening here.
> 
> I didn't try, but I'm sure that if you'll look at results of ofproto/trace
> calls in your tests below, you'll see that even though produced actions
> are different, the match criteria is exactly the same.  That means that
> when the first packet with label A will hit the datapath, the flow that
> pushes SRv6 header with flow label A will be added to the datapath.
> If the next packet will have label B it will match the installed datapath
> flow (because it doesn't match on the label) and the SRv6 header with
> label A will be pushed.
> 
> Matching on the actual fields used to produce the action is necessary.
> However, there is a workaround.  That is to generate these fileds
> on a header push, and not on the header build.
> 
> See for example the netdev_tnl_push_udp_header() function.  It is updating
> the UDP source port on header push.  IIUC, this is for the same reason
> the SRv6 flow label is updated - to ensure load balancing of UDP flows
> in tunnels.  The way update is done is by using packet's RSS hash.
> See the netdev_tnl_get_src_port().  By using RSS hash we can avoid
> re-parsing and re-hashing of the packet.  And packets from the same session
> should normally have the same RSS hash, so that works.  And we don't need
> to match on any extra fileds, because the hash of actual packet is used
> while pushing the header.
> 
> Another example is netdev_gre_push_header() where we update a sequence
> number while pushing a header (doesn't look thread-safe though :) ).
> 
> So, we should probbaly just use RSS hash as a flow label for SRv6 and
> set it while pushing SRv6 header (in netdev_tnl_push_ip_header()) in the
> COMPUTE case and parse out and use the ipv6_label of the current packet
> in case of COPY.  The actual packet is available in the header push
> context.  And kernel's seg6_make_flowlabel() seems to do a very similar
> thing.  Full packet parsing will not be good for performance reasons.
> 
> Does that make sense?
> 

Thanks very much for your kind explanation.
It makes sense to me.

I would like to add a test for the flowlabel for the second and
subsequent packets in the packet capture by specifying options:pcap
for the dummy port and sending the packet with "ovs-appctl netdev-dummy/receive".
I have tried this and it seems to work.

Thanks also for the UDP and GRE examples. I now understand that both
rewrite fields at the time of header push.

Since header push is performed in the context of the actual packet,
I assume that header parsing and hashing is done for all packets.
For performance reasons, it makes sense to avoid parsing and hashing
and instead use the RSS hash. I would replace it with RSS hash.
And it can probably be retrieved with dp_packet_get_rss_hash().

>>      srh->rt_hdr.segments_left = nr_segs - 1;
>>      srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>>      srh->rt_hdr.hdrlen = 2 * nr_segs;
>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index 663ee8606c3b..2141621cf23e 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -792,6 +792,14 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>>                                name, node->value);
>>                  break;
>>              }
>> +        } else if (!strcmp(node->key, "srv6_flowlabel")) {
>> +            if (!strcmp(node->value, "zero")) {
>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
>> +            } else if (!strcmp(node->value, "compute")) {
>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
>> +            } else {
>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
>> +            }
>>          } else if (!strcmp(node->key, "payload_type")) {
>>              if (!strcmp(node->value, "mpls")) {
>>                   tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
>> diff --git a/lib/netdev.h b/lib/netdev.h
>> index ff207f56c28c..743a56ca1629 100644
>> --- a/lib/netdev.h
>> +++ b/lib/netdev.h
>> @@ -97,6 +97,17 @@ enum netdev_pt_mode {
>>      NETDEV_PT_LEGACY_L3,
>>  };
>>  
>> +enum netdev_srv6_flowlabel {
>> +    /* Copy the flowlabel from inner packet. */
>> +    SRV6_FLOWLABEL_COPY,
>> +
>> +    /* Simply set flowlabel to 0. */
>> +    SRV6_FLOWLABEL_ZERO,
>> +
>> +    /* Calculate a hash for some fields and set the result to flowlabel. */
>> +    SRV6_FLOWLABEL_COMPUTE,
>> +};
>> +
>>  /* Configuration specific to tunnels. */
>>  struct netdev_tunnel_config {
>>      ovs_be64 in_key;
>> @@ -144,6 +155,7 @@ struct netdev_tunnel_config {
>>      uint8_t srv6_num_segs;
>>      #define SRV6_MAX_SEGS 6
>>      struct in6_addr srv6_segs[SRV6_MAX_SEGS];
>> +    enum netdev_srv6_flowlabel srv6_flowlabel;
>>  };
>>  
>>  void netdev_run(void);
>> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
>> index e300fe3a0d26..33edc8319eed 100644
>> --- a/tests/tunnel-push-pop-ipv6.at
>> +++ b/tests/tunnel-push-pop-ipv6.at
>> @@ -1,5 +1,84 @@
>>  AT_BANNER([tunnel_push_pop_ipv6])
>>  
>> +AT_SETUP([tunnel_push_pop_ipv6 - srv6])
>> +
>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>> +AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1 datapath_type=dummy], [0])
>> +AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2 datapath_type=dummy], [0])
>> +AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3 datapath_type=dummy], [0])
>> +AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
>> +                       options:remote_ip=2001:cafe::91 ofport_request=2 \
>> +                       options:srv6_flowlabel=copy \
>> +                       ], [0])
>> +AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
>> +                       options:remote_ip=2001:cafe::92 ofport_request=3 \
>> +                       options:srv6_flowlabel=zero \
>> +                       ], [0])
>> +AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
>> +                       options:remote_ip=2001:cafe::93 ofport_request=4 \
>> +                       options:srv6_flowlabel=compute \
>> +                       ], [0])
>> +
>> +dnl First setup dummy interface IP address, then add the route
>> +dnl so that tnl-port table can get valid IP address for the device.
>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01], [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02], [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03], [0], [OK
>> +])
>> +AT_CHECK([ovs-ofctl add-flow br0 action=1])
>> +AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
>> +AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
>> +AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
>> +
>> +dnl Check "srv6_flowlabel=copy".
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>> +cat stdout
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=12345
>> +])
>> +
>> +dnl Check "srv6_flowlabel=zero".
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=0
>> +])
>> +
>> +dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'], [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=944785
>> +])
>> +
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'], [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=772289
>> +])
>> +
>> +dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=468935
>> +])
>> +
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=1012207
>> +])
>> +
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=629290
>> +])
>> +
>> +OVS_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
>>  
>>  OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index edb5eafa04c3..7a0682503233 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3287,6 +3287,35 @@
>>             <ref column="options" key="remote_ip"/>.
>>          </p>
>>        </column>
>> +      <column name="options" key="srv6_flowlabel"
>> +              type='{"type": "string",
>> +                     "enum": ["set", ["zero", "copy", "compute"]]}'>
>> +        <p>
>> +          Optional.
>> +          This option controls how flowlabel in outer IPv6 header is
>> +          configured. This would give you the benefit of IPv6 flow label
> 
> We should avoid pronouns like 'you' in the documentation.  It should describe
> the functionality without trying to talk to the reader.  E.g. 'It gives'
> instead of 'it will give you'.
> 

OK. I'll fix.

>> +          based load balancing, which is supported by some popular vendor
>> +          appliances. You can choose from the following three types, as
> 
> This also should be somehow re-worded to not have 'You'.  If nothing works,
> can just be replaced with 'Users'.
> 

OK.

>> +          in net.ipv6.seg6_flowlabel syscall.
> 
> s/syscall/sysconfig/
> 

OK.

>> +        </p>
>> +        <ul>
>> +          <li>
>> +            By default, or if this option is <code>copy</code>, copy the
>> +            flowlabel of inner IPv6 header to the flowlabel of outer IPv6
>> +            header. If inner header is not IPv6, it is set to 0.
>> +          </li>
>> +          <li>
>> +            If this option is <code>zero</code>, simply set flowlabel to 0.
>> +          </li>
>> +          <li>
>> +            If this option is <code>compute</code>, calculate a hash for
>> +            some fields in inner header and set the result to flowlabel.
> 
> 'some fields' is a bit strange thing to say.  "hash over fields of the inner
> header" or smething like that.  Same applie to the comment in the code.
> 

OK.

>> +            If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
>> +            flowlabel are the targets of hash calculation. If it is IPv4,
>> +            src_ip, dst_ip, L4 proto, src_port, and dst_port are the targets.
> 
> I'm not sure if we can actually say which fields are going to be used
> if we go with RSS hash.
> 

I will only mention that RSS hash is used, not field names.

>> +          </li>
>> +        </ul>
>> +      </column>
>>      </group>
>>  
>>      <group title="Patch Options">
>
Ilya Maximets May 11, 2023, 11:50 a.m. UTC | #3
On 5/11/23 12:39, Nobuhiro MIKI wrote:
> On 2023/05/11 6:33, Ilya Maximets wrote:
>> On 5/9/23 11:38, Nobuhiro MIKI wrote:
>>> It supports flowlabel based load balancing by controlling the flowlabel
>>> of outer IPv6 header, which is already implemented in Linux kernel as
>>> seg6_flowlabel sysctl [1].
>>>
>>> [1]: https://docs.kernel.org/networking/seg6-sysctl.html
>>>
>>> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
>>> ---
>>
>> Hi.  Thanks for the patch!  See some comments inline.
>>
> 
> Hi Ilya,
> 
> Thanks for your review!
> 
>>
>>>  lib/flow.c                    | 24 +++++++++++
>>>  lib/flow.h                    |  1 +
>>>  lib/netdev-native-tnl.c       | 22 +++++++++-
>>>  lib/netdev-vport.c            |  8 ++++
>>>  lib/netdev.h                  | 12 ++++++
>>>  tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
>>>  vswitchd/vswitch.xml          | 29 +++++++++++++
>>>  7 files changed, 174 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/flow.c b/lib/flow.c
>>> index 9501a259e9d4..f27ec4795bc7 100644
>>> --- a/lib/flow.c
>>> +++ b/lib/flow.c
>>> @@ -2734,6 +2734,30 @@ flow_hash_in_wildcards(const struct flow *flow,
>>>      return hash_finish(hash, 8 * FLOW_U64S);
>>>  }
>>>  
>>> +uint32_t
>>> +flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
>>> +{
>>> +    uint32_t hash = basis;
>>> +
>>> +    if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>>> +        const uint64_t *flow_u64 = (const uint64_t *) flow;
>>> +        int ofs = offsetof(struct flow, ipv6_src) / 8;
>>> +        int end = ofs + 2 * sizeof flow->ipv6_src / 8;
>>> +
>>> +        for (;ofs < end; ofs++) {
>>> +            hash = hash_add64(hash, flow_u64[ofs]);
>>> +        }
>>> +
>>> +        hash = hash_add(hash, flow->nw_proto);
>>> +        hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
>>> +    } else if (flow->dl_type == htons(ETH_TYPE_IP)
>>> +               || flow->dl_type == htons(ETH_TYPE_ARP)) {
>>> +        hash = flow_hash_5tuple(flow, basis);
>>> +    }
>>> +
>>> +    return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number. */
>>> +}
>>
>> Is it necessary to use a custom hash function for this?
>> I guess, above can be replaced with a call to flow_hash_5tuple using the
>> ipv6_label as a basis.  OTOH, the seg6_make_flowlabel() in the kernel
>> just using the skb hash which likely doesn't include label and even just
>> a random number of locally originated traffic, so maybe mixing in the
>> label is not really necessary?
>>
>> See comments below though.
>>
> 
> I also did a quick check seg6_make_flowlabel() in the kernel.
> It calls skb_get_hash() and flowlabel seems to not be included.
> I guess I was mistaken. Sorry. 
> 
> Af far as I know, the purpose is load balancing, so I feel 5tuple
> hash is sufficient. And there is SRV6_FLOWLABEL_COPY if someone want
> flowlabel based hash instead 5tuple hash.
> 
>>> +
>>>  /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted as an
>>>   * OpenFlow 1.0 "dl_vlan" value:
>>>   *
>>> diff --git a/lib/flow.h b/lib/flow.h
>>> index a9d026e1ce3b..7b8ef5164465 100644
>>> --- a/lib/flow.h
>>> +++ b/lib/flow.h
>>> @@ -258,6 +258,7 @@ bool flow_hash_fields_valid(enum nx_hash_fields);
>>>  uint32_t flow_hash_in_wildcards(const struct flow *,
>>>                                  const struct flow_wildcards *,
>>>                                  uint32_t basis);
>>> +uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
>>>  
>>>  bool flow_equal_except(const struct flow *a, const struct flow *b,
>>>                         const struct flow_wildcards *);
>>> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
>>> index 55e1bd567fa1..18bd9df57175 100644
>>> --- a/lib/netdev-native-tnl.c
>>> +++ b/lib/netdev-native-tnl.c
>>> @@ -856,6 +856,7 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>>      struct netdev_tunnel_config *tnl_cfg;
>>>      const struct in6_addr *segs;
>>>      struct srv6_base_hdr *srh;
>>> +    uint32_t ipv6_label = 0;
>>>      struct in6_addr *s;
>>>      ovs_be16 dl_type;
>>>      int err = 0;
>>> @@ -882,7 +883,26 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>>          goto out;
>>>      }
>>>  
>>> -    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
>>> +    switch (tnl_cfg->srv6_flowlabel) {
>>> +    case SRV6_FLOWLABEL_COPY:
>>> +        ipv6_label = ntohl(params->flow->ipv6_label);
>>> +        break;
>>> +
>>> +    case SRV6_FLOWLABEL_ZERO:
>>> +        ipv6_label = 0;
>>> +        break;
>>> +
>>> +    case SRV6_FLOWLABEL_COMPUTE:
>>> +        ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
>>> +        break;
>>> +
>>> +    default:
>>> +        err = EINVAL;
>>> +        goto out;
>>> +    }
>>> +
>>> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
>>> +                                     ipv6_label);
>>
>> We can't really do that on the header build stage.  The main reason
>> is that we're building the header based on the packet information.
>> In case of COPY we're using the label from the original packaet, in case
>> of COMPUTE we're using majority of the packet header to produce the value.
>> But if we're using packet fields, we have to add these fields to the
>> match criteria.  And that is not happening here.
>>
>> I didn't try, but I'm sure that if you'll look at results of ofproto/trace
>> calls in your tests below, you'll see that even though produced actions
>> are different, the match criteria is exactly the same.  That means that
>> when the first packet with label A will hit the datapath, the flow that
>> pushes SRv6 header with flow label A will be added to the datapath.
>> If the next packet will have label B it will match the installed datapath
>> flow (because it doesn't match on the label) and the SRv6 header with
>> label A will be pushed.
>>
>> Matching on the actual fields used to produce the action is necessary.
>> However, there is a workaround.  That is to generate these fileds
>> on a header push, and not on the header build.
>>
>> See for example the netdev_tnl_push_udp_header() function.  It is updating
>> the UDP source port on header push.  IIUC, this is for the same reason
>> the SRv6 flow label is updated - to ensure load balancing of UDP flows
>> in tunnels.  The way update is done is by using packet's RSS hash.
>> See the netdev_tnl_get_src_port().  By using RSS hash we can avoid
>> re-parsing and re-hashing of the packet.  And packets from the same session
>> should normally have the same RSS hash, so that works.  And we don't need
>> to match on any extra fileds, because the hash of actual packet is used
>> while pushing the header.
>>
>> Another example is netdev_gre_push_header() where we update a sequence
>> number while pushing a header (doesn't look thread-safe though :) ).
>>
>> So, we should probbaly just use RSS hash as a flow label for SRv6 and
>> set it while pushing SRv6 header (in netdev_tnl_push_ip_header()) in the
>> COMPUTE case and parse out and use the ipv6_label of the current packet
>> in case of COPY.  The actual packet is available in the header push
>> context.  And kernel's seg6_make_flowlabel() seems to do a very similar
>> thing.  Full packet parsing will not be good for performance reasons.
>>
>> Does that make sense?
>>
> 
> Thanks very much for your kind explanation.
> It makes sense to me.
> 
> I would like to add a test for the flowlabel for the second and
> subsequent packets in the packet capture by specifying options:pcap
> for the dummy port and sending the packet with "ovs-appctl netdev-dummy/receive".
> I have tried this and it seems to work.

Just make sure that you're not checking the exact hash value.
Hashing implementations depend on compiler flags and CPU
architecture.  So, the exact value of the flowlabel will be
different depending on these factors.  For example, OVS built
with -msse4.2 will use special CPU instructions for crc hash
calculations.  Without this flag it will use implementation
of Murmurhash.

You may find one test example for the UDP port number being
carved out in tests/tunnel-push-pop.at by looking for
"Source port is based on a packet hash".

So, you may extract the flow label from one packet and compare
with the other, but you can't directly compare with some
pre-computed value.

> 
> Thanks also for the UDP and GRE examples. I now understand that both
> rewrite fields at the time of header push.
> 
> Since header push is performed in the context of the actual packet,
> I assume that header parsing and hashing is done for all packets.
> For performance reasons, it makes sense to avoid parsing and hashing
> and instead use the RSS hash. I would replace it with RSS hash.
> And it can probably be retrieved with dp_packet_get_rss_hash().

Right.  That's what netdev_tnl_get_src_port() does, for example.

> 
>>>      srh->rt_hdr.segments_left = nr_segs - 1;
>>>      srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>>>      srh->rt_hdr.hdrlen = 2 * nr_segs;
>>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>>> index 663ee8606c3b..2141621cf23e 100644
>>> --- a/lib/netdev-vport.c
>>> +++ b/lib/netdev-vport.c
>>> @@ -792,6 +792,14 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>>>                                name, node->value);
>>>                  break;
>>>              }
>>> +        } else if (!strcmp(node->key, "srv6_flowlabel")) {
>>> +            if (!strcmp(node->value, "zero")) {
>>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
>>> +            } else if (!strcmp(node->value, "compute")) {
>>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
>>> +            } else {
>>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
>>> +            }
>>>          } else if (!strcmp(node->key, "payload_type")) {
>>>              if (!strcmp(node->value, "mpls")) {
>>>                   tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
>>> diff --git a/lib/netdev.h b/lib/netdev.h
>>> index ff207f56c28c..743a56ca1629 100644
>>> --- a/lib/netdev.h
>>> +++ b/lib/netdev.h
>>> @@ -97,6 +97,17 @@ enum netdev_pt_mode {
>>>      NETDEV_PT_LEGACY_L3,
>>>  };
>>>  
>>> +enum netdev_srv6_flowlabel {
>>> +    /* Copy the flowlabel from inner packet. */
>>> +    SRV6_FLOWLABEL_COPY,
>>> +
>>> +    /* Simply set flowlabel to 0. */
>>> +    SRV6_FLOWLABEL_ZERO,
>>> +
>>> +    /* Calculate a hash for some fields and set the result to flowlabel. */
>>> +    SRV6_FLOWLABEL_COMPUTE,
>>> +};
>>> +
>>>  /* Configuration specific to tunnels. */
>>>  struct netdev_tunnel_config {
>>>      ovs_be64 in_key;
>>> @@ -144,6 +155,7 @@ struct netdev_tunnel_config {
>>>      uint8_t srv6_num_segs;
>>>      #define SRV6_MAX_SEGS 6
>>>      struct in6_addr srv6_segs[SRV6_MAX_SEGS];
>>> +    enum netdev_srv6_flowlabel srv6_flowlabel;
>>>  };
>>>  
>>>  void netdev_run(void);
>>> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
>>> index e300fe3a0d26..33edc8319eed 100644
>>> --- a/tests/tunnel-push-pop-ipv6.at
>>> +++ b/tests/tunnel-push-pop-ipv6.at
>>> @@ -1,5 +1,84 @@
>>>  AT_BANNER([tunnel_push_pop_ipv6])
>>>  
>>> +AT_SETUP([tunnel_push_pop_ipv6 - srv6])
>>> +
>>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>> +AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1 datapath_type=dummy], [0])
>>> +AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2 datapath_type=dummy], [0])
>>> +AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3 datapath_type=dummy], [0])
>>> +AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
>>> +                       options:remote_ip=2001:cafe::91 ofport_request=2 \
>>> +                       options:srv6_flowlabel=copy \
>>> +                       ], [0])
>>> +AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
>>> +                       options:remote_ip=2001:cafe::92 ofport_request=3 \
>>> +                       options:srv6_flowlabel=zero \
>>> +                       ], [0])
>>> +AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
>>> +                       options:remote_ip=2001:cafe::93 ofport_request=4 \
>>> +                       options:srv6_flowlabel=compute \
>>> +                       ], [0])
>>> +
>>> +dnl First setup dummy interface IP address, then add the route
>>> +dnl so that tnl-port table can get valid IP address for the device.
>>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
>>> +])
>>> +AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
>>> +])
>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01], [0], [OK
>>> +])
>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02], [0], [OK
>>> +])
>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03], [0], [OK
>>> +])
>>> +AT_CHECK([ovs-ofctl add-flow br0 action=1])
>>> +AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
>>> +AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
>>> +AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
>>> +
>>> +dnl Check "srv6_flowlabel=copy".
>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>> +cat stdout
>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>> +label=12345
>>> +])
>>> +
>>> +dnl Check "srv6_flowlabel=zero".
>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>> +label=0
>>> +])
>>> +
>>> +dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'], [0], [stdout])
>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>> +label=944785
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'], [0], [stdout])
>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>> +label=772289
>>> +])
>>> +
>>> +dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>> +label=468935
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>> +label=1012207
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>> +label=629290
>>> +])
>>> +
>>> +OVS_VSWITCHD_STOP
>>> +AT_CLEANUP
>>> +
>>>  AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
>>>  
>>>  OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index edb5eafa04c3..7a0682503233 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -3287,6 +3287,35 @@
>>>             <ref column="options" key="remote_ip"/>.
>>>          </p>
>>>        </column>
>>> +      <column name="options" key="srv6_flowlabel"
>>> +              type='{"type": "string",
>>> +                     "enum": ["set", ["zero", "copy", "compute"]]}'>
>>> +        <p>
>>> +          Optional.
>>> +          This option controls how flowlabel in outer IPv6 header is
>>> +          configured. This would give you the benefit of IPv6 flow label
>>
>> We should avoid pronouns like 'you' in the documentation.  It should describe
>> the functionality without trying to talk to the reader.  E.g. 'It gives'
>> instead of 'it will give you'.
>>
> 
> OK. I'll fix.
> 
>>> +          based load balancing, which is supported by some popular vendor
>>> +          appliances. You can choose from the following three types, as
>>
>> This also should be somehow re-worded to not have 'You'.  If nothing works,
>> can just be replaced with 'Users'.
>>
> 
> OK.
> 
>>> +          in net.ipv6.seg6_flowlabel syscall.
>>
>> s/syscall/sysconfig/
>>
> 
> OK.
> 
>>> +        </p>
>>> +        <ul>
>>> +          <li>
>>> +            By default, or if this option is <code>copy</code>, copy the
>>> +            flowlabel of inner IPv6 header to the flowlabel of outer IPv6
>>> +            header. If inner header is not IPv6, it is set to 0.
>>> +          </li>
>>> +          <li>
>>> +            If this option is <code>zero</code>, simply set flowlabel to 0.
>>> +          </li>
>>> +          <li>
>>> +            If this option is <code>compute</code>, calculate a hash for
>>> +            some fields in inner header and set the result to flowlabel.
>>
>> 'some fields' is a bit strange thing to say.  "hash over fields of the inner
>> header" or smething like that.  Same applie to the comment in the code.
>>
> 
> OK.
> 
>>> +            If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
>>> +            flowlabel are the targets of hash calculation. If it is IPv4,
>>> +            src_ip, dst_ip, L4 proto, src_port, and dst_port are the targets.
>>
>> I'm not sure if we can actually say which fields are going to be used
>> if we go with RSS hash.
>>
> 
> I will only mention that RSS hash is used, not field names.
> 
>>> +          </li>
>>> +        </ul>
>>> +      </column>
>>>      </group>
>>>  
>>>      <group title="Patch Options">
>>
Nobuhiro MIKI May 12, 2023, 1:46 a.m. UTC | #4
On 2023/05/11 20:50, Ilya Maximets wrote:
> On 5/11/23 12:39, Nobuhiro MIKI wrote:
>> On 2023/05/11 6:33, Ilya Maximets wrote:
>>> On 5/9/23 11:38, Nobuhiro MIKI wrote:
>>>> It supports flowlabel based load balancing by controlling the flowlabel
>>>> of outer IPv6 header, which is already implemented in Linux kernel as
>>>> seg6_flowlabel sysctl [1].
>>>>
>>>> [1]: https://docs.kernel.org/networking/seg6-sysctl.html
>>>>
>>>> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
>>>> ---
>>>
>>> Hi.  Thanks for the patch!  See some comments inline.
>>>
>>
>> Hi Ilya,
>>
>> Thanks for your review!
>>
>>>
>>>>  lib/flow.c                    | 24 +++++++++++
>>>>  lib/flow.h                    |  1 +
>>>>  lib/netdev-native-tnl.c       | 22 +++++++++-
>>>>  lib/netdev-vport.c            |  8 ++++
>>>>  lib/netdev.h                  | 12 ++++++
>>>>  tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
>>>>  vswitchd/vswitch.xml          | 29 +++++++++++++
>>>>  7 files changed, 174 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/flow.c b/lib/flow.c
>>>> index 9501a259e9d4..f27ec4795bc7 100644
>>>> --- a/lib/flow.c
>>>> +++ b/lib/flow.c
>>>> @@ -2734,6 +2734,30 @@ flow_hash_in_wildcards(const struct flow *flow,
>>>>      return hash_finish(hash, 8 * FLOW_U64S);
>>>>  }
>>>>  
>>>> +uint32_t
>>>> +flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
>>>> +{
>>>> +    uint32_t hash = basis;
>>>> +
>>>> +    if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>>>> +        const uint64_t *flow_u64 = (const uint64_t *) flow;
>>>> +        int ofs = offsetof(struct flow, ipv6_src) / 8;
>>>> +        int end = ofs + 2 * sizeof flow->ipv6_src / 8;
>>>> +
>>>> +        for (;ofs < end; ofs++) {
>>>> +            hash = hash_add64(hash, flow_u64[ofs]);
>>>> +        }
>>>> +
>>>> +        hash = hash_add(hash, flow->nw_proto);
>>>> +        hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
>>>> +    } else if (flow->dl_type == htons(ETH_TYPE_IP)
>>>> +               || flow->dl_type == htons(ETH_TYPE_ARP)) {
>>>> +        hash = flow_hash_5tuple(flow, basis);
>>>> +    }
>>>> +
>>>> +    return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number. */
>>>> +}
>>>
>>> Is it necessary to use a custom hash function for this?
>>> I guess, above can be replaced with a call to flow_hash_5tuple using the
>>> ipv6_label as a basis.  OTOH, the seg6_make_flowlabel() in the kernel
>>> just using the skb hash which likely doesn't include label and even just
>>> a random number of locally originated traffic, so maybe mixing in the
>>> label is not really necessary?
>>>
>>> See comments below though.
>>>
>>
>> I also did a quick check seg6_make_flowlabel() in the kernel.
>> It calls skb_get_hash() and flowlabel seems to not be included.
>> I guess I was mistaken. Sorry. 
>>
>> Af far as I know, the purpose is load balancing, so I feel 5tuple
>> hash is sufficient. And there is SRV6_FLOWLABEL_COPY if someone want
>> flowlabel based hash instead 5tuple hash.
>>
>>>> +
>>>>  /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted as an
>>>>   * OpenFlow 1.0 "dl_vlan" value:
>>>>   *
>>>> diff --git a/lib/flow.h b/lib/flow.h
>>>> index a9d026e1ce3b..7b8ef5164465 100644
>>>> --- a/lib/flow.h
>>>> +++ b/lib/flow.h
>>>> @@ -258,6 +258,7 @@ bool flow_hash_fields_valid(enum nx_hash_fields);
>>>>  uint32_t flow_hash_in_wildcards(const struct flow *,
>>>>                                  const struct flow_wildcards *,
>>>>                                  uint32_t basis);
>>>> +uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
>>>>  
>>>>  bool flow_equal_except(const struct flow *a, const struct flow *b,
>>>>                         const struct flow_wildcards *);
>>>> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
>>>> index 55e1bd567fa1..18bd9df57175 100644
>>>> --- a/lib/netdev-native-tnl.c
>>>> +++ b/lib/netdev-native-tnl.c
>>>> @@ -856,6 +856,7 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>>>      struct netdev_tunnel_config *tnl_cfg;
>>>>      const struct in6_addr *segs;
>>>>      struct srv6_base_hdr *srh;
>>>> +    uint32_t ipv6_label = 0;
>>>>      struct in6_addr *s;
>>>>      ovs_be16 dl_type;
>>>>      int err = 0;
>>>> @@ -882,7 +883,26 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>>>          goto out;
>>>>      }
>>>>  
>>>> -    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
>>>> +    switch (tnl_cfg->srv6_flowlabel) {
>>>> +    case SRV6_FLOWLABEL_COPY:
>>>> +        ipv6_label = ntohl(params->flow->ipv6_label);
>>>> +        break;
>>>> +
>>>> +    case SRV6_FLOWLABEL_ZERO:
>>>> +        ipv6_label = 0;
>>>> +        break;
>>>> +
>>>> +    case SRV6_FLOWLABEL_COMPUTE:
>>>> +        ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        err = EINVAL;
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
>>>> +                                     ipv6_label);
>>>
>>> We can't really do that on the header build stage.  The main reason
>>> is that we're building the header based on the packet information.
>>> In case of COPY we're using the label from the original packaet, in case
>>> of COMPUTE we're using majority of the packet header to produce the value.
>>> But if we're using packet fields, we have to add these fields to the
>>> match criteria.  And that is not happening here.
>>>
>>> I didn't try, but I'm sure that if you'll look at results of ofproto/trace
>>> calls in your tests below, you'll see that even though produced actions
>>> are different, the match criteria is exactly the same.  That means that
>>> when the first packet with label A will hit the datapath, the flow that
>>> pushes SRv6 header with flow label A will be added to the datapath.
>>> If the next packet will have label B it will match the installed datapath
>>> flow (because it doesn't match on the label) and the SRv6 header with
>>> label A will be pushed.
>>>
>>> Matching on the actual fields used to produce the action is necessary.
>>> However, there is a workaround.  That is to generate these fileds
>>> on a header push, and not on the header build.
>>>
>>> See for example the netdev_tnl_push_udp_header() function.  It is updating
>>> the UDP source port on header push.  IIUC, this is for the same reason
>>> the SRv6 flow label is updated - to ensure load balancing of UDP flows
>>> in tunnels.  The way update is done is by using packet's RSS hash.
>>> See the netdev_tnl_get_src_port().  By using RSS hash we can avoid
>>> re-parsing and re-hashing of the packet.  And packets from the same session
>>> should normally have the same RSS hash, so that works.  And we don't need
>>> to match on any extra fileds, because the hash of actual packet is used
>>> while pushing the header.
>>>
>>> Another example is netdev_gre_push_header() where we update a sequence
>>> number while pushing a header (doesn't look thread-safe though :) ).
>>>
>>> So, we should probbaly just use RSS hash as a flow label for SRv6 and
>>> set it while pushing SRv6 header (in netdev_tnl_push_ip_header()) in the
>>> COMPUTE case and parse out and use the ipv6_label of the current packet
>>> in case of COPY.  The actual packet is available in the header push
>>> context.  And kernel's seg6_make_flowlabel() seems to do a very similar
>>> thing.  Full packet parsing will not be good for performance reasons.
>>>
>>> Does that make sense?
>>>
>>
>> Thanks very much for your kind explanation.
>> It makes sense to me.
>>
>> I would like to add a test for the flowlabel for the second and
>> subsequent packets in the packet capture by specifying options:pcap
>> for the dummy port and sending the packet with "ovs-appctl netdev-dummy/receive".
>> I have tried this and it seems to work.
> 
> Just make sure that you're not checking the exact hash value.
> Hashing implementations depend on compiler flags and CPU
> architecture.  So, the exact value of the flowlabel will be
> different depending on these factors.  For example, OVS built
> with -msse4.2 will use special CPU instructions for crc hash
> calculations.  Without this flag it will use implementation
> of Murmurhash.
> 
> You may find one test example for the UDP port number being
> carved out in tests/tunnel-push-pop.at by looking for
> "Source port is based on a packet hash".
> 
> So, you may extract the flow label from one packet and compare
> with the other, but you can't directly compare with some
> pre-computed value.
> 

Thanks for the explanation.
Instead of using pre-computed values, hash values carved out of
several packets should be compared at runtime.

>>
>> Thanks also for the UDP and GRE examples. I now understand that both
>> rewrite fields at the time of header push.
>>
>> Since header push is performed in the context of the actual packet,
>> I assume that header parsing and hashing is done for all packets.
>> For performance reasons, it makes sense to avoid parsing and hashing
>> and instead use the RSS hash. I would replace it with RSS hash.
>> And it can probably be retrieved with dp_packet_get_rss_hash().
> 
> Right.  That's what netdev_tnl_get_src_port() does, for example.
> 

I'll implement flowlabel hash as well.
Again, thanks for your review. I'll start working on v4.

>>
>>>>      srh->rt_hdr.segments_left = nr_segs - 1;
>>>>      srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>>>>      srh->rt_hdr.hdrlen = 2 * nr_segs;
>>>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>>>> index 663ee8606c3b..2141621cf23e 100644
>>>> --- a/lib/netdev-vport.c
>>>> +++ b/lib/netdev-vport.c
>>>> @@ -792,6 +792,14 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>>>>                                name, node->value);
>>>>                  break;
>>>>              }
>>>> +        } else if (!strcmp(node->key, "srv6_flowlabel")) {
>>>> +            if (!strcmp(node->value, "zero")) {
>>>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
>>>> +            } else if (!strcmp(node->value, "compute")) {
>>>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
>>>> +            } else {
>>>> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
>>>> +            }
>>>>          } else if (!strcmp(node->key, "payload_type")) {
>>>>              if (!strcmp(node->value, "mpls")) {
>>>>                   tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
>>>> diff --git a/lib/netdev.h b/lib/netdev.h
>>>> index ff207f56c28c..743a56ca1629 100644
>>>> --- a/lib/netdev.h
>>>> +++ b/lib/netdev.h
>>>> @@ -97,6 +97,17 @@ enum netdev_pt_mode {
>>>>      NETDEV_PT_LEGACY_L3,
>>>>  };
>>>>  
>>>> +enum netdev_srv6_flowlabel {
>>>> +    /* Copy the flowlabel from inner packet. */
>>>> +    SRV6_FLOWLABEL_COPY,
>>>> +
>>>> +    /* Simply set flowlabel to 0. */
>>>> +    SRV6_FLOWLABEL_ZERO,
>>>> +
>>>> +    /* Calculate a hash for some fields and set the result to flowlabel. */
>>>> +    SRV6_FLOWLABEL_COMPUTE,
>>>> +};
>>>> +
>>>>  /* Configuration specific to tunnels. */
>>>>  struct netdev_tunnel_config {
>>>>      ovs_be64 in_key;
>>>> @@ -144,6 +155,7 @@ struct netdev_tunnel_config {
>>>>      uint8_t srv6_num_segs;
>>>>      #define SRV6_MAX_SEGS 6
>>>>      struct in6_addr srv6_segs[SRV6_MAX_SEGS];
>>>> +    enum netdev_srv6_flowlabel srv6_flowlabel;
>>>>  };
>>>>  
>>>>  void netdev_run(void);
>>>> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
>>>> index e300fe3a0d26..33edc8319eed 100644
>>>> --- a/tests/tunnel-push-pop-ipv6.at
>>>> +++ b/tests/tunnel-push-pop-ipv6.at
>>>> @@ -1,5 +1,84 @@
>>>>  AT_BANNER([tunnel_push_pop_ipv6])
>>>>  
>>>> +AT_SETUP([tunnel_push_pop_ipv6 - srv6])
>>>> +
>>>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>>> +AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1 datapath_type=dummy], [0])
>>>> +AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2 datapath_type=dummy], [0])
>>>> +AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3 datapath_type=dummy], [0])
>>>> +AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
>>>> +                       options:remote_ip=2001:cafe::91 ofport_request=2 \
>>>> +                       options:srv6_flowlabel=copy \
>>>> +                       ], [0])
>>>> +AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
>>>> +                       options:remote_ip=2001:cafe::92 ofport_request=3 \
>>>> +                       options:srv6_flowlabel=zero \
>>>> +                       ], [0])
>>>> +AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
>>>> +                       options:remote_ip=2001:cafe::93 ofport_request=4 \
>>>> +                       options:srv6_flowlabel=compute \
>>>> +                       ], [0])
>>>> +
>>>> +dnl First setup dummy interface IP address, then add the route
>>>> +dnl so that tnl-port table can get valid IP address for the device.
>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01], [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02], [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03], [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-ofctl add-flow br0 action=1])
>>>> +AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
>>>> +AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
>>>> +AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
>>>> +
>>>> +dnl Check "srv6_flowlabel=copy".
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>>> +cat stdout
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=12345
>>>> +])
>>>> +
>>>> +dnl Check "srv6_flowlabel=zero".
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=0
>>>> +])
>>>> +
>>>> +dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'], [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=944785
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'], [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=772289
>>>> +])
>>>> +
>>>> +dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=468935
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=1012207
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=629290
>>>> +])
>>>> +
>>>> +OVS_VSWITCHD_STOP
>>>> +AT_CLEANUP
>>>> +
>>>>  AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
>>>>  
>>>>  OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>>> index edb5eafa04c3..7a0682503233 100644
>>>> --- a/vswitchd/vswitch.xml
>>>> +++ b/vswitchd/vswitch.xml
>>>> @@ -3287,6 +3287,35 @@
>>>>             <ref column="options" key="remote_ip"/>.
>>>>          </p>
>>>>        </column>
>>>> +      <column name="options" key="srv6_flowlabel"
>>>> +              type='{"type": "string",
>>>> +                     "enum": ["set", ["zero", "copy", "compute"]]}'>
>>>> +        <p>
>>>> +          Optional.
>>>> +          This option controls how flowlabel in outer IPv6 header is
>>>> +          configured. This would give you the benefit of IPv6 flow label
>>>
>>> We should avoid pronouns like 'you' in the documentation.  It should describe
>>> the functionality without trying to talk to the reader.  E.g. 'It gives'
>>> instead of 'it will give you'.
>>>
>>
>> OK. I'll fix.
>>
>>>> +          based load balancing, which is supported by some popular vendor
>>>> +          appliances. You can choose from the following three types, as
>>>
>>> This also should be somehow re-worded to not have 'You'.  If nothing works,
>>> can just be replaced with 'Users'.
>>>
>>
>> OK.
>>
>>>> +          in net.ipv6.seg6_flowlabel syscall.
>>>
>>> s/syscall/sysconfig/
>>>
>>
>> OK.
>>
>>>> +        </p>
>>>> +        <ul>
>>>> +          <li>
>>>> +            By default, or if this option is <code>copy</code>, copy the
>>>> +            flowlabel of inner IPv6 header to the flowlabel of outer IPv6
>>>> +            header. If inner header is not IPv6, it is set to 0.
>>>> +          </li>
>>>> +          <li>
>>>> +            If this option is <code>zero</code>, simply set flowlabel to 0.
>>>> +          </li>
>>>> +          <li>
>>>> +            If this option is <code>compute</code>, calculate a hash for
>>>> +            some fields in inner header and set the result to flowlabel.
>>>
>>> 'some fields' is a bit strange thing to say.  "hash over fields of the inner
>>> header" or smething like that.  Same applie to the comment in the code.
>>>
>>
>> OK.
>>
>>>> +            If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
>>>> +            flowlabel are the targets of hash calculation. If it is IPv4,
>>>> +            src_ip, dst_ip, L4 proto, src_port, and dst_port are the targets.
>>>
>>> I'm not sure if we can actually say which fields are going to be used
>>> if we go with RSS hash.
>>>
>>
>> I will only mention that RSS hash is used, not field names.
>>
>>>> +          </li>
>>>> +        </ul>
>>>> +      </column>
>>>>      </group>
>>>>  
>>>>      <group title="Patch Options">
>>>
>
diff mbox series

Patch

diff --git a/lib/flow.c b/lib/flow.c
index 9501a259e9d4..f27ec4795bc7 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -2734,6 +2734,30 @@  flow_hash_in_wildcards(const struct flow *flow,
     return hash_finish(hash, 8 * FLOW_U64S);
 }
 
+uint32_t
+flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
+{
+    uint32_t hash = basis;
+
+    if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
+        const uint64_t *flow_u64 = (const uint64_t *) flow;
+        int ofs = offsetof(struct flow, ipv6_src) / 8;
+        int end = ofs + 2 * sizeof flow->ipv6_src / 8;
+
+        for (;ofs < end; ofs++) {
+            hash = hash_add64(hash, flow_u64[ofs]);
+        }
+
+        hash = hash_add(hash, flow->nw_proto);
+        hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
+    } else if (flow->dl_type == htons(ETH_TYPE_IP)
+               || flow->dl_type == htons(ETH_TYPE_ARP)) {
+        hash = flow_hash_5tuple(flow, basis);
+    }
+
+    return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number. */
+}
+
 /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted as an
  * OpenFlow 1.0 "dl_vlan" value:
  *
diff --git a/lib/flow.h b/lib/flow.h
index a9d026e1ce3b..7b8ef5164465 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -258,6 +258,7 @@  bool flow_hash_fields_valid(enum nx_hash_fields);
 uint32_t flow_hash_in_wildcards(const struct flow *,
                                 const struct flow_wildcards *,
                                 uint32_t basis);
+uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
 
 bool flow_equal_except(const struct flow *a, const struct flow *b,
                        const struct flow_wildcards *);
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 55e1bd567fa1..18bd9df57175 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -856,6 +856,7 @@  netdev_srv6_build_header(const struct netdev *netdev,
     struct netdev_tunnel_config *tnl_cfg;
     const struct in6_addr *segs;
     struct srv6_base_hdr *srh;
+    uint32_t ipv6_label = 0;
     struct in6_addr *s;
     ovs_be16 dl_type;
     int err = 0;
@@ -882,7 +883,26 @@  netdev_srv6_build_header(const struct netdev *netdev,
         goto out;
     }
 
-    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
+    switch (tnl_cfg->srv6_flowlabel) {
+    case SRV6_FLOWLABEL_COPY:
+        ipv6_label = ntohl(params->flow->ipv6_label);
+        break;
+
+    case SRV6_FLOWLABEL_ZERO:
+        ipv6_label = 0;
+        break;
+
+    case SRV6_FLOWLABEL_COMPUTE:
+        ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
+        break;
+
+    default:
+        err = EINVAL;
+        goto out;
+    }
+
+    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
+                                     ipv6_label);
     srh->rt_hdr.segments_left = nr_segs - 1;
     srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
     srh->rt_hdr.hdrlen = 2 * nr_segs;
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 663ee8606c3b..2141621cf23e 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -792,6 +792,14 @@  set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
                               name, node->value);
                 break;
             }
+        } else if (!strcmp(node->key, "srv6_flowlabel")) {
+            if (!strcmp(node->value, "zero")) {
+                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
+            } else if (!strcmp(node->value, "compute")) {
+                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
+            } else {
+                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
+            }
         } else if (!strcmp(node->key, "payload_type")) {
             if (!strcmp(node->value, "mpls")) {
                  tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
diff --git a/lib/netdev.h b/lib/netdev.h
index ff207f56c28c..743a56ca1629 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -97,6 +97,17 @@  enum netdev_pt_mode {
     NETDEV_PT_LEGACY_L3,
 };
 
+enum netdev_srv6_flowlabel {
+    /* Copy the flowlabel from inner packet. */
+    SRV6_FLOWLABEL_COPY,
+
+    /* Simply set flowlabel to 0. */
+    SRV6_FLOWLABEL_ZERO,
+
+    /* Calculate a hash for some fields and set the result to flowlabel. */
+    SRV6_FLOWLABEL_COMPUTE,
+};
+
 /* Configuration specific to tunnels. */
 struct netdev_tunnel_config {
     ovs_be64 in_key;
@@ -144,6 +155,7 @@  struct netdev_tunnel_config {
     uint8_t srv6_num_segs;
     #define SRV6_MAX_SEGS 6
     struct in6_addr srv6_segs[SRV6_MAX_SEGS];
+    enum netdev_srv6_flowlabel srv6_flowlabel;
 };
 
 void netdev_run(void);
diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
index e300fe3a0d26..33edc8319eed 100644
--- a/tests/tunnel-push-pop-ipv6.at
+++ b/tests/tunnel-push-pop-ipv6.at
@@ -1,5 +1,84 @@ 
 AT_BANNER([tunnel_push_pop_ipv6])
 
+AT_SETUP([tunnel_push_pop_ipv6 - srv6])
+
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
+AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1 datapath_type=dummy], [0])
+AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2 datapath_type=dummy], [0])
+AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3 datapath_type=dummy], [0])
+AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
+                       options:remote_ip=2001:cafe::91 ofport_request=2 \
+                       options:srv6_flowlabel=copy \
+                       ], [0])
+AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
+                       options:remote_ip=2001:cafe::92 ofport_request=3 \
+                       options:srv6_flowlabel=zero \
+                       ], [0])
+AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
+                       options:remote_ip=2001:cafe::93 ofport_request=4 \
+                       options:srv6_flowlabel=compute \
+                       ], [0])
+
+dnl First setup dummy interface IP address, then add the route
+dnl so that tnl-port table can get valid IP address for the device.
+AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01], [0], [OK
+])
+AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02], [0], [OK
+])
+AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03], [0], [OK
+])
+AT_CHECK([ovs-ofctl add-flow br0 action=1])
+AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
+AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
+AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
+
+dnl Check "srv6_flowlabel=copy".
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
+cat stdout
+AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
+label=12345
+])
+
+dnl Check "srv6_flowlabel=zero".
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
+AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
+label=0
+])
+
+dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'], [0], [stdout])
+AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
+label=944785
+])
+
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'], [0], [stdout])
+AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
+label=772289
+])
+
+dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
+AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
+label=468935
+])
+
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
+AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
+label=1012207
+])
+
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
+AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
+label=629290
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
 
 OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index edb5eafa04c3..7a0682503233 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3287,6 +3287,35 @@ 
            <ref column="options" key="remote_ip"/>.
         </p>
       </column>
+      <column name="options" key="srv6_flowlabel"
+              type='{"type": "string",
+                     "enum": ["set", ["zero", "copy", "compute"]]}'>
+        <p>
+          Optional.
+          This option controls how flowlabel in outer IPv6 header is
+          configured. This would give you the benefit of IPv6 flow label
+          based load balancing, which is supported by some popular vendor
+          appliances. You can choose from the following three types, as
+          in net.ipv6.seg6_flowlabel syscall.
+        </p>
+        <ul>
+          <li>
+            By default, or if this option is <code>copy</code>, copy the
+            flowlabel of inner IPv6 header to the flowlabel of outer IPv6
+            header. If inner header is not IPv6, it is set to 0.
+          </li>
+          <li>
+            If this option is <code>zero</code>, simply set flowlabel to 0.
+          </li>
+          <li>
+            If this option is <code>compute</code>, calculate a hash for
+            some fields in inner header and set the result to flowlabel.
+            If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
+            flowlabel are the targets of hash calculation. If it is IPv4,
+            src_ip, dst_ip, L4 proto, src_port, and dst_port are the targets.
+          </li>
+        </ul>
+      </column>
     </group>
 
     <group title="Patch Options">