[ovs-dev,v9,4/5] userspace: Add SRv6 tunnel support.

Message ID 20230315060725.61286-5-nmiki@yahoo-corp.jp
State Changes Requested
Series userspace: Add SRv6 tunnel support.

Checks

Context                                Check    Description
ovsrobot/apply-robot                   success  apply and check: success
ovsrobot/github-robot-_Build_and_Test  success  github build: passed
ovsrobot/intel-ovs-compilation         success  test: success

Commit Message

Nobuhiro MIKI March 15, 2023, 6:07 a.m. UTC
SRv6 (Segment Routing IPv6) tunnel vport is responsible
for encapsulating and decapsulating the inner packets with
an IPv6 header and an extension header called SRH
(Segment Routing Header). See the spec at:

https://datatracker.ietf.org/doc/html/rfc8754

This patch implements SRv6 tunneling in the userspace datapath.
It uses the `remote_ip` and `local_ip` options, as with existing
tunnel protocols. It also adds a dedicated `srv6_segs` option
to define a sequence of routers called the segment list.

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
---
 Documentation/faq/configuration.rst |  21 ++++
 Documentation/faq/releases.rst      |   1 +
 NEWS                                |   2 +
 include/linux/openvswitch.h         |   1 +
 lib/dpif-netlink-rtnl.c             |   5 +
 lib/dpif-netlink.c                  |   5 +
 lib/netdev-native-tnl.c             | 145 ++++++++++++++++++++++++++++
 lib/netdev-native-tnl.h             |  10 ++
 lib/netdev-vport.c                  |  53 ++++++++++
 lib/netdev.h                        |   4 +
 lib/packets.h                       |  11 +++
 lib/tnl-ports.c                     |   6 +-
 ofproto/ofproto-dpif-xlate.c        |   3 +
 tests/system-kmod-macros.at         |   8 ++
 tests/system-traffic.at             | 119 +++++++++++++++++++++++
 tests/system-userspace-macros.at    |   6 ++
 tests/tunnel.at                     |  56 +++++++++++
 17 files changed, 455 insertions(+), 1 deletion(-)

Comments

Ilya Maximets March 21, 2023, 11:15 p.m. UTC | #1
On 3/15/23 07:07, Nobuhiro MIKI wrote:
> SRv6 (Segment Routing IPv6) tunnel vport is responsible
> for encapsulation and decapsulation the inner packets with
> IPv6 header and an extended header called SRH
> (Segment Routing Header). See spec in:
> 
> https://datatracker.ietf.org/doc/html/rfc8754
> 
> This patch implements SRv6 tunneling in userspace datapath.
> It uses `remote_ip` and `local_ip` options as with existing
> tunnel protocols. It also adds a dedicated `srv6_segs` option
> to define a sequence of routers called segment list.
> 
> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
> ---
>  Documentation/faq/configuration.rst |  21 ++++
>  Documentation/faq/releases.rst      |   1 +
>  NEWS                                |   2 +
>  include/linux/openvswitch.h         |   1 +
>  lib/dpif-netlink-rtnl.c             |   5 +
>  lib/dpif-netlink.c                  |   5 +
>  lib/netdev-native-tnl.c             | 145 ++++++++++++++++++++++++++++
>  lib/netdev-native-tnl.h             |  10 ++
>  lib/netdev-vport.c                  |  53 ++++++++++
>  lib/netdev.h                        |   4 +
>  lib/packets.h                       |  11 +++
>  lib/tnl-ports.c                     |   6 +-
>  ofproto/ofproto-dpif-xlate.c        |   3 +
>  tests/system-kmod-macros.at         |   8 ++
>  tests/system-traffic.at             | 119 +++++++++++++++++++++++
>  tests/system-userspace-macros.at    |   6 ++
>  tests/tunnel.at                     |  56 +++++++++++
>  17 files changed, 455 insertions(+), 1 deletion(-)

Thanks for the new version!  The code looks good in general.
See some small comments inline.

Best regards, Ilya Maximets.

> 
> diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
> index dc6c92446f98..11b1c7e826f3 100644
> --- a/Documentation/faq/configuration.rst
> +++ b/Documentation/faq/configuration.rst
> @@ -238,6 +238,27 @@ Q: Does Open vSwitch support GTP-U?
>                  set int gtpu0 type=gtpu options:key=<teid> \
>                  options:remote_ip=172.31.1.1
>  
> +Q: Does Open vSwitch support SRv6?
> +
> +    A: Yes. Starting with version 3.2, the Open vSwitch userspace
> +    datapath supports SRv6 (Segment Routing over IPv6). The following
> +    example shows tunneling to fc00:300::1 via fc00:100::1 and fc00:200::1.
> +    In the current implementation, if "IPv6 in IPv6" or "IPv4 in IPv6" packets
> +    are routed to this interface, and these packets are not SRv6 packets, they
> +    may be dropped, so be careful in workloads with a mix of these tunnels.
> +    Also note the following restrictions:
> +
> +    * Segment list length is limited to 6.
> +    * SRv6 packets with other than segments_left = 0 are simply dropped.
> +
> +    ::
> +
> +        $ ovs-vsctl add-br br0
> +        $ ovs-vsctl add-port br0 srv6_0 -- \
> +                set int srv6_0 type=srv6  \
> +                options:remote_ip=fc00:300::1 \
> +                options:srv6_segs="fc00:100::1,fc00:200::1,fc00:300::1"
> +
>  Q: How do I connect two bridges?
>  
>      A: First, why do you want to do this?  Two connected bridges are not much
> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
> index 9e1b42262000..9fb679e307d9 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -151,6 +151,7 @@ Q: Are all features available with all datapaths?
>      Tunnel - ERSPAN                 4.18           2.10         2.10     NO
>      Tunnel - ERSPAN-IPv6            4.18           2.10         2.10     NO
>      Tunnel - GTP-U                  NO             NO           2.14     NO
> +    Tunnel - SRv6                   NO             NO           3.2      NO
>      Tunnel - Bareudp                5.7            NO           NO       NO
>      QoS - Policing                  YES            1.1          2.6      NO
>      QoS - Shaping                   YES            1.1          NO       NO
> diff --git a/NEWS b/NEWS
> index 72b9024e6d8a..9d2adff6cf3e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,6 +17,8 @@ Post-v3.1.0
>         in order to create OVSDB sockets with access mode of 0770.
>     - QoS:
>       * Added new configuration option 'jitter' for a linux-netem QoS type.
> +   - SRv6 Tunnel Protocol
> +     * Only support for userspace datapath.
>  
>  
>  v3.1.0 - 16 Feb 2023
> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> index bc8f74991849..e305c331516b 100644
> --- a/include/linux/openvswitch.h
> +++ b/include/linux/openvswitch.h
> @@ -254,6 +254,7 @@ enum ovs_vport_type {
>  	OVS_VPORT_TYPE_IP6GRE = 109,
>  	OVS_VPORT_TYPE_GTPU = 110,
>  	OVS_VPORT_TYPE_BAREUDP = 111,  /* Bareudp tunnel. */
> +	OVS_VPORT_TYPE_SRV6 = 112,  /* SRv6 tunnel. */
>  	__OVS_VPORT_TYPE_MAX
>  };
>  
> diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> index 4fc42daed2d9..5788294ae0d7 100644
> --- a/lib/dpif-netlink-rtnl.c
> +++ b/lib/dpif-netlink-rtnl.c
> @@ -129,6 +129,8 @@ vport_type_to_kind(enum ovs_vport_type type,
>          }
>      case OVS_VPORT_TYPE_GTPU:
>          return NULL;
> +    case OVS_VPORT_TYPE_SRV6:
> +        return "srv6";
>      case OVS_VPORT_TYPE_BAREUDP:
>          return "bareudp";
>      case OVS_VPORT_TYPE_NETDEV:
> @@ -319,6 +321,7 @@ dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg,
>      case OVS_VPORT_TYPE_LISP:
>      case OVS_VPORT_TYPE_STT:
>      case OVS_VPORT_TYPE_GTPU:
> +    case OVS_VPORT_TYPE_SRV6:
>      case OVS_VPORT_TYPE_UNSPEC:
>      case __OVS_VPORT_TYPE_MAX:
>      default:
> @@ -411,6 +414,7 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
>      case OVS_VPORT_TYPE_LISP:
>      case OVS_VPORT_TYPE_STT:
>      case OVS_VPORT_TYPE_GTPU:
> +    case OVS_VPORT_TYPE_SRV6:
>      case OVS_VPORT_TYPE_UNSPEC:
>      case __OVS_VPORT_TYPE_MAX:
>      default:
> @@ -519,6 +523,7 @@ dpif_netlink_rtnl_port_destroy(const char *name, const char *type)
>      case OVS_VPORT_TYPE_ERSPAN:
>      case OVS_VPORT_TYPE_IP6ERSPAN:
>      case OVS_VPORT_TYPE_IP6GRE:
> +    case OVS_VPORT_TYPE_SRV6:
>      case OVS_VPORT_TYPE_BAREUDP:
>          return dpif_netlink_rtnl_destroy(name);
>      case OVS_VPORT_TYPE_NETDEV:
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 7875e573e643..44da6f54c983 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -919,6 +919,9 @@ get_vport_type(const struct dpif_netlink_vport *vport)
>      case OVS_VPORT_TYPE_GTPU:
>          return "gtpu";
>  
> +    case OVS_VPORT_TYPE_SRV6:
> +        return "srv6";
> +
>      case OVS_VPORT_TYPE_BAREUDP:
>          return "bareudp";
>  
> @@ -957,6 +960,8 @@ netdev_to_ovs_vport_type(const char *type)
>          return OVS_VPORT_TYPE_GRE;
>      } else if (!strcmp(type, "gtpu")) {
>          return OVS_VPORT_TYPE_GTPU;
> +    } else if (!strcmp(type, "srv6")) {
> +        return OVS_VPORT_TYPE_SRV6;
>      } else if (!strcmp(type, "bareudp")) {
>          return OVS_VPORT_TYPE_BAREUDP;
>      } else {
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index b89dfdd52a86..b324c8d058e1 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -845,6 +845,151 @@ netdev_gtpu_build_header(const struct netdev *netdev,
>      return 0;
>  }
>  
> +static void
> +srv6_build_header(struct ovs_action_push_tnl *data,
> +                  const struct netdev_tnl_build_header_params *params,
> +                  int nr_segs, const struct in6_addr *segs)
> +{
> +    struct ovs_16aligned_ip6_hdr *nh6;
> +    struct srv6_base_hdr *srh;
> +    struct in6_addr *s;
> +    unsigned int hlen;
> +    ovs_be16 dl_type;
> +    int i;
> +
> +    ovs_assert(nr_segs > 0);
> +
> +    nh6 = (struct ovs_16aligned_ip6_hdr *) eth_build_header(data, params);
> +    put_16aligned_be32(&nh6->ip6_flow, htonl(6 << 28) |
> +                       htonl(params->flow->tunnel.ip_tos << 20));
> +    nh6->ip6_hlim = params->flow->tunnel.ip_ttl;
> +    nh6->ip6_nxt = IPPROTO_ROUTING;
> +    memcpy(&nh6->ip6_src, params->s_ip, sizeof(ovs_be32[4]));
> +    memcpy(&nh6->ip6_dst, &segs[0], sizeof(ovs_be32[4]));

We should probably use netdev_tnl_ip_build_header() here.
See below...

> +
> +    srh = (struct srv6_base_hdr *) (nh6 + 1);
> +    dl_type = params->flow->dl_type;
> +    if (dl_type == htons(ETH_TYPE_IP)) {
> +        srh->rt_hdr.nexthdr = IPPROTO_IPIP;
> +    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> +        srh->rt_hdr.nexthdr = IPPROTO_IPV6;
> +    }
> +    srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
> +    srh->rt_hdr.hdrlen = 2 * nr_segs;
> +    srh->rt_hdr.segments_left = nr_segs - 1;
> +    srh->last_entry = nr_segs - 1;
> +    srh->flags = 0;
> +    srh->tag = 0;
> +
> +    s = ALIGNED_CAST(struct in6_addr *,
> +                     (char *) srh + sizeof(struct srv6_base_hdr));
> +    for (i = 0; i < nr_segs; i++) {
> +        /* Segment list is written to the header in reverse order. */
> +        memcpy(s, &segs[nr_segs - i - 1], sizeof(ovs_be32[4]));

It should be sizeof *s instead.  We should avoid using type sizes
as per the coding style. (I know netdev_tnl_ip_build_header is not
following that, unfortunately.)
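
To illustrate the sizeof point, here is a minimal standalone sketch of the
reverse-order copy loop.  It is not the patch code: copy_segs_reversed() is
a hypothetical helper name, and plain struct in6_addr from <netinet/in.h>
stands in for the OVS header layout.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper, not part of OVS: copies 'nr_segs' addresses from
 * 'segs' into 'dst' in reverse order, as the SRH segment list requires. */
static void
copy_segs_reversed(struct in6_addr *dst, const struct in6_addr *segs,
                   int nr_segs)
{
    struct in6_addr *s = dst;
    int i;

    assert(nr_segs > 0);

    for (i = 0; i < nr_segs; i++) {
        /* 'sizeof *s' follows the pointed-to type, so no type name or
         * hand-written size appears in the copy. */
        memcpy(s, &segs[nr_segs - i - 1], sizeof *s);
        s++;
    }
}
```

If the type of 's' ever changes, the copy size changes with it, which is
the reason the coding style prefers this form.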

> +        s++;
> +    }
> +
> +    hlen = IPV6_HEADER_LEN + sizeof(struct srv6_base_hdr) +
> +           8 * srh->rt_hdr.hdrlen;
> +
> +    data->header_len += hlen;
> +    data->tnl_type = OVS_VPORT_TYPE_SRV6;
> +}
> +
> +int
> +netdev_srv6_build_header(const struct netdev *netdev,
> +                         struct ovs_action_push_tnl *data,
> +                         const struct netdev_tnl_build_header_params *params)
> +{
> +    struct netdev_vport *dev = netdev_vport_cast(netdev);
> +    struct netdev_tunnel_config *tnl_cfg;
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    tnl_cfg = &dev->tnl_cfg;
> +
> +    if (tnl_cfg->srv6_num_segs) {
> +        srv6_build_header(data, params,
> +                          tnl_cfg->srv6_num_segs, tnl_cfg->srv6_segs);
> +    } else {
> +        /*
> +         * If explicit segment list setting is omitted, tunnel destination
> +         * is considered to be the first segment list.
> +         */
> +        srv6_build_header(data, params,
> +                          1, &params->flow->tunnel.ipv6_dst);
> +    }

Isn't it a misconfig if segs[0] != params->flow->tunnel.ipv6_dst ?
I mean, we shouldn't be sending a packet anywhere other than the
first address in the segment list, right?

I'm thinking that maybe we can check that tunnel.ipv6_dst equals segs[0]
and fail the header build if it's not the case.  Then we can simply
use the generic function netdev_tnl_ip_build_header() to build the base
IPv6 header for us, and only fill in the SRv6-specific fields.

What do you think?
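
One possible shape for such a check, as a standalone sketch rather than the
actual OVS change.  srv6_check_first_seg() is a hypothetical name, and
memcmp() stands in for whatever address comparison helper the real code
would use.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper, not part of OVS.  Returns 0 if the tunnel
 * destination matches the first segment (or if no explicit segment list
 * is configured), EINVAL otherwise so that the header build can fail. */
static int
srv6_check_first_seg(const struct in6_addr *tun_dst,
                     const struct in6_addr *segs, int nr_segs)
{
    if (nr_segs == 0) {
        /* No explicit list: the destination itself is the only segment. */
        return 0;
    }
    return memcmp(tun_dst, &segs[0], sizeof *tun_dst) ? EINVAL : 0;
}
```

With a check along these lines, the destination of the outer IPv6 header
and the first segment can no longer disagree, so building the base header
generically and filling in the SRH afterwards should be safe.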

> +
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    return 0;
> +}
> +
> +void
> +netdev_srv6_push_header(const struct netdev *netdev OVS_UNUSED,
> +                        struct dp_packet *packet OVS_UNUSED,
> +                        const struct ovs_action_push_tnl *data OVS_UNUSED)
> +{
> +    int ip_tot_size;
> +
> +    netdev_tnl_push_ip_header(packet, data->header,
> +                              data->header_len, &ip_tot_size);
> +}
> +
> +struct dp_packet *
> +netdev_srv6_pop_header(struct dp_packet *packet)
> +{
> +    const struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
> +    size_t size = dp_packet_l3_size(packet) - IPV6_HEADER_LEN;
> +    const struct ovs_16aligned_ip6_frag *frag_hdr = NULL;
> +    const struct ip6_rt_hdr *rt_hdr = NULL;
> +    struct pkt_metadata *md = &packet->md;
> +    struct flow_tnl *tnl = &md->tunnel;
> +    uint8_t nw_proto = nh->ip6_nxt;
> +    const void *data = nh + 1;
> +    uint8_t nw_frag = 0;
> +    unsigned int hlen;
> +
> +    /*
> +     * Verifies that the routing header is present in the IPv6
> +     * extension headers and that its type is SRv6.
> +     * */
> +    if (!parse_ipv6_ext_hdrs(&data, &size, &nw_proto, &nw_frag,
> +                             &frag_hdr, &rt_hdr)) {
> +        goto err;
> +    }
> +
> +    if (!rt_hdr) {
> +        goto err;
> +    }
> +
> +    if (rt_hdr->type != IPV6_SRCRT_TYPE_4) {
> +        goto err;
> +    }
> +
> +    if (rt_hdr->segments_left > 0) {
> +        VLOG_WARN_RL(&err_rl, "invalid srv6 segments_left=%d\n",
> +                     rt_hdr->segments_left);
> +        goto err;
> +    }
> +
> +    if (rt_hdr->nexthdr == IPPROTO_IPIP) {
> +        packet->packet_type = htonl(PT_IPV4);
> +    } else if (rt_hdr->nexthdr == IPPROTO_IPV6) {
> +        packet->packet_type = htonl(PT_IPV6);
> +    } else {
> +        goto err;
> +    }
> +
> +    pkt_metadata_init_tnl(md);
> +    netdev_tnl_ip_extract_tnl_md(packet, tnl, &hlen);
> +    dp_packet_reset_packet(packet, hlen);
> +
> +    return packet;
> +err:
> +    dp_packet_delete(packet);
> +    return NULL;
> +}
> +
>  struct dp_packet *
>  netdev_vxlan_pop_header(struct dp_packet *packet)
>  {
> diff --git a/lib/netdev-native-tnl.h b/lib/netdev-native-tnl.h
> index 22ae2ce5369b..07dae27973e6 100644
> --- a/lib/netdev-native-tnl.h
> +++ b/lib/netdev-native-tnl.h
> @@ -65,6 +65,16 @@ netdev_gtpu_build_header(const struct netdev *netdev,
>                           struct ovs_action_push_tnl *data,
>                           const struct netdev_tnl_build_header_params *p);
>  
> +struct dp_packet *netdev_srv6_pop_header(struct dp_packet *packet);
> +
> +void netdev_srv6_push_header(const struct netdev *netdev,
> +                             struct dp_packet *packet,
> +                             const struct ovs_action_push_tnl *data);
> +
> +int netdev_srv6_build_header(const struct netdev *netdev,
> +                             struct ovs_action_push_tnl *data,
> +                             const struct netdev_tnl_build_header_params *p);

Nit: There is no need to have variable names in function prototypes.
     Types here are self-descriptive.
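
For illustration, the three prototypes could look roughly like this without
parameter names.  The forward declarations are only there to make the
sketch compile on its own; in the real header the types come from the
existing includes.

```c
#include <stddef.h>

/* Opaque stand-ins so the sketch is self-contained; only pointers to
 * these types are used. */
struct netdev;
struct dp_packet;
struct ovs_action_push_tnl;
struct netdev_tnl_build_header_params;

struct dp_packet *netdev_srv6_pop_header(struct dp_packet *);

void netdev_srv6_push_header(const struct netdev *,
                             struct dp_packet *,
                             const struct ovs_action_push_tnl *);

int netdev_srv6_build_header(const struct netdev *,
                             struct ovs_action_push_tnl *,
                             const struct netdev_tnl_build_header_params *);
```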

> +
>  void
>  netdev_tnl_push_udp_header(const struct netdev *netdev,
>                             struct dp_packet *packet,
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 3b39278650d3..663ee8606c3b 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -424,6 +424,35 @@ parse_tunnel_ip(const char *value, bool accept_mcast, bool *flow,
>      return 0;
>  }
>  
> +static int
> +parse_srv6_segs(char *s, struct in6_addr *segs, uint8_t *num_segs)
> +{
> +    char *save_ptr = NULL;
> +    char *token;
> +
> +    if (!s) {
> +        return EINVAL;
> +    }
> +
> +    *num_segs = 0;
> +
> +    while ((token = strtok_r(s, ",", &save_ptr)) != NULL) {
> +        if (*num_segs == SRV6_MAX_SEGS) {
> +            return EINVAL;
> +        }
> +
> +        if (inet_pton(AF_INET6, token, segs) != 1) {
> +            return EINVAL;
> +        }
> +
> +        segs++;
> +        (*num_segs)++;
> +        s = NULL;
> +    }
> +
> +    return 0;
> +}
> +
>  enum tunnel_layers {
>      TNL_L2 = 1 << 0,       /* 1 if a tunnel type can carry Ethernet traffic. */
>      TNL_L3 = 1 << 1        /* 1 if a tunnel type can carry L3 traffic. */
> @@ -443,6 +472,8 @@ tunnel_supported_layers(const char *type,
>          return TNL_L3;
>      } else if (!strcmp(type, "bareudp")) {
>          return TNL_L3;
> +    } else if (!strcmp(type, "srv6")) {
> +        return TNL_L3;
>      } else {
>          return TNL_L2;
>      }
> @@ -750,6 +781,17 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>                      goto out;
>                  }
>              }
> +        } else if (!strcmp(node->key, "srv6_segs")) {
> +            err = parse_srv6_segs(node->value,
> +                                  tnl_cfg.srv6_segs,
> +                                  &tnl_cfg.srv6_num_segs);
> +
> +            switch (err) {
> +            case EINVAL:
> +                ds_put_format(&errors, "%s: bad %s 'srv6_segs'\n",
> +                              name, node->value);
> +                break;
> +            }
>          } else if (!strcmp(node->key, "payload_type")) {
>              if (!strcmp(node->value, "mpls")) {
>                   tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
> @@ -1290,6 +1332,17 @@ netdev_vport_tunnel_register(void)
>            },
>            {{NULL, NULL, 0, 0}}
>          },
> +        { "srv6_sys",
> +          {
> +              TUNNEL_FUNCTIONS_COMMON,
> +              .type = "srv6",
> +              .build_header = netdev_srv6_build_header,
> +              .push_header = netdev_srv6_push_header,
> +              .pop_header = netdev_srv6_pop_header,
> +              .get_ifindex = NETDEV_VPORT_GET_IFINDEX,
> +          },
> +          {{NULL, NULL, 0, 0}}
> +        },
>  
>      };
>      static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> diff --git a/lib/netdev.h b/lib/netdev.h
> index acf174927d24..ff207f56c28c 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -140,6 +140,10 @@ struct netdev_tunnel_config {
>      bool erspan_idx_flow;
>      bool erspan_dir_flow;
>      bool erspan_hwid_flow;
> +
> +    uint8_t srv6_num_segs;
> +    #define SRV6_MAX_SEGS 6
> +    struct in6_addr srv6_segs[SRV6_MAX_SEGS];
>  };
>  
>  void netdev_run(void);
> diff --git a/lib/packets.h b/lib/packets.h
> index abff24db016e..312e849f9f26 100644
> --- a/lib/packets.h
> +++ b/lib/packets.h
> @@ -1527,6 +1527,17 @@ BUILD_ASSERT_DECL(sizeof(struct vxlanhdr) == 8);
>  #define VXLAN_F_GPE  0x4000
>  #define VXLAN_HF_GPE 0x04000000
>  
> +/* SRv6 protocol header */

Period at the end of a comment.

> +#define IPV6_SRCRT_TYPE_4 4
> +#define SRV6_BASE_HDR_LEN 8
> +struct srv6_base_hdr {
> +    struct ip6_rt_hdr rt_hdr;
> +    uint8_t last_entry;
> +    uint8_t flags;
> +    ovs_be16 tag;
> +};
> +BUILD_ASSERT_DECL(sizeof(struct srv6_base_hdr) == SRV6_BASE_HDR_LEN);
> +
>  /* Input values for PACKET_TYPE macros have to be in host byte order.
>   * The _BE postfix indicates result is in network byte order. Otherwise result
>   * is in host byte order. */
> diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c
> index da9939afa8a1..297962895f20 100644
> --- a/lib/tnl-ports.c
> +++ b/lib/tnl-ports.c
> @@ -126,7 +126,7 @@ map_insert(odp_port_t port, struct eth_addr mac, struct in6_addr *addr,
>           /* XXX: No fragments support. */
>          match.wc.masks.nw_frag = FLOW_NW_FRAG_MASK;
>  
> -        /* 'tp_port' is zero for GRE tunnels. In this case it
> +        /* 'tp_port' is zero for GRE and SRv6 tunnels. In this case it
>           * doesn't make sense to match on UDP port numbers. */
>          if (tp_port) {
>              match.wc.masks.tp_dst = OVS_BE16_MAX;
> @@ -182,6 +182,10 @@ tnl_type_to_nw_proto(const char type[], uint8_t nw_protos[2])
>      if (!strcmp(type, "gtpu")) {
>          nw_protos[0] = IPPROTO_UDP;
>      }
> +    if (!strcmp(type, "srv6")) {
> +        nw_protos[0] = IPPROTO_IPIP;
> +        nw_protos[1] = IPPROTO_IPV6;
> +    }
>  }
>  
>  static void
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index a9cf3cbee0be..15c814d6285b 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3632,6 +3632,9 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, struct eth_addr dmac,
>      case OVS_VPORT_TYPE_BAREUDP:
>          nw_proto = IPPROTO_UDP;
>          break;
> +    case OVS_VPORT_TYPE_SRV6:
> +        nw_proto = IPPROTO_IPIP;
> +        break;
>      case OVS_VPORT_TYPE_LISP:
>      case OVS_VPORT_TYPE_STT:
>      case OVS_VPORT_TYPE_UNSPEC:
> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> index 822a80618d6f..fb15a5a7ce03 100644
> --- a/tests/system-kmod-macros.at
> +++ b/tests/system-kmod-macros.at
> @@ -202,6 +202,14 @@ m4_define([OVS_CHECK_KERNEL_EXCL],
>      AT_SKIP_IF([ ! ( test $version -lt $1 || ( test $version -eq $1 && test $sublevel -lt $2 ) || test $version -gt $3 || ( test $version -eq $3 && test $sublevel -gt $4 ) ) ])
>  ])
>  
> +# OVS_CHECK_SRV6()
> +#
> +# The kernel datapath does not support this feature.
> +m4_define([OVS_CHECK_SRV6],
> +[
> +    AT_SKIP_IF([:])
> +])
> +
>  # CHECK_LATER_IPV6_FRAGMENTS()
>  #
>  # Upstream kernels beetween 4.20 and 5.19 are not parsing IPv6 fragments
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 2558f3b24d7d..c023932a1d48 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -1164,6 +1164,125 @@ OVS_WAIT_UNTIL([cat p0.pcap | grep -E "IP6 fc00:100::100 > fc00:100::1: GREv0, .
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([datapath - ping over srv6 tunnel])
> +OVS_CHECK_TUNNEL_TSO()
> +OVS_CHECK_SRV6()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0)
> +ADD_NAMESPACES(at_ns1)
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.seg6_enabled=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv4.conf.default.forwarding=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.forwarding=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.seg6_enabled=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv4.conf.all.forwarding=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.forwarding=1])
> +
> +dnl Set up underlay link from host into the namespace 'at_ns0'
> +dnl using veth pair. Kernel side tunnel endpoint (SID) is
> +dnl 'fc00:a::1/128', so add it to the route.
> +ADD_BR([br-underlay])
> +ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +AT_CHECK([ip route add fc00:a::1/128 dev br-underlay via fc00::1])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace.
> +ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [10.100.100.100/24])
> +AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
> +AT_CHECK([ip route add 10.1.1.0/24 dev br0 via 10.100.100.1])
> +AT_CHECK([arp -s 10.100.100.1 aa:55:aa:55:00:01])
> +AT_CHECK([ovs-ofctl add-flow br0 in_port=LOCAL,actions=output:at_srv6])
> +AT_CHECK([ovs-ofctl add-flow br0 in_port=at_srv6,actions=mod_dl_dst:aa:55:aa:55:00:00,output:LOCAL])
> +
> +dnl Set up tunnel endpoints on the namespace 'at_ns0',
> +dnl and overlay port on the namespace 'at_ns1'
> +ADD_VETH_NS([at_ns0], [veth0], [10.1.1.2/24], [at_ns1], [veth1], [10.1.1.1/24])
> +NS_CHECK_EXEC([at_ns0], [ip sr tunsrc set fc00:a::1])
> +NS_CHECK_EXEC([at_ns0], [ip route add 10.100.100.0/24 encap seg6 mode encap segs fc00::100 dev p0])
> +NS_CHECK_EXEC([at_ns0], [ip -6 route add fc00:a::1 encap seg6local action End.DX4 nh4 0.0.0.0 dev veth0])
> +NS_CHECK_EXEC([at_ns1], [ip route add 10.100.100.0/24 via 10.1.1.2 dev veth1])
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> +
> +dnl First, check the underlay.
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay.
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.100.100.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping6 over srv6 tunnel])
> +OVS_CHECK_TUNNEL_TSO()
> +OVS_CHECK_SRV6()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0)
> +ADD_NAMESPACES(at_ns1)
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.seg6_enabled=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.forwarding=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.seg6_enabled=1])
> +NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.forwarding=1])
> +
> +dnl Set up underlay link from host into the namespace 'at_ns0'
> +dnl using veth pair. Kernel side tunnel endpoint (SID) is
> +dnl 'fc00:a::1/128', so add it to the route.
> +ADD_BR([br-underlay])
> +ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +AT_CHECK([ip -6 route add fc00:a::1/128 dev br-underlay via fc00::1])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace.
> +ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [fc00:100::100/64])
> +AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
> +dnl [sleep infinity]

This line seems strange.

> +AT_CHECK([ip addr add dev br0 fc00:100::100/64])
> +AT_CHECK([ip -6 route add fc00:1::1/128 dev br0 via fc00:100::1])
> +AT_CHECK([ip -6 neigh add fc00:100::1 lladdr aa:55:aa:55:00:01 dev br0])
> +AT_CHECK([ovs-ofctl add-flow br0 in_port=LOCAL,actions=output:at_srv6])
> +AT_CHECK([ovs-ofctl add-flow br0 in_port=at_srv6,actions=mod_dl_dst:aa:55:aa:55:00:00,output:LOCAL])
> +
> +dnl Set up tunnel endpoints on the namespace 'at_ns0',
> +dnl and overlay port on the namespace 'at_ns1'
> +ADD_VETH_NS([at_ns0], [veth0], [fc00:1::2/64], [at_ns1], [veth1], [fc00:1::1/64])
> +NS_CHECK_EXEC([at_ns0], [ip sr tunsrc set fc00:a::1])
> +NS_CHECK_EXEC([at_ns0], [ip -6 route add fc00:100::0/64 encap seg6 mode encap segs fc00::100 dev p0])
> +NS_CHECK_EXEC([at_ns0], [ip -6 route add fc00:a::1 encap seg6local action End.DX6 nh6 :: dev veth0])
> +NS_CHECK_EXEC([at_ns1], [ip -6 route add fc00:100::/64 via fc00:1::2 dev veth1])
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> +OVS_WAIT_UNTIL([ip netns exec at_ns1 ping6 -c 1 fc00:100::100])
> +
> +dnl First, check the underlay.
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay.
> +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([datapath - clone action])
>  OVS_TRAFFIC_VSWITCHD_START()
>  
> diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
> index 610fa2e94ae8..482079386a43 100644
> --- a/tests/system-userspace-macros.at
> +++ b/tests/system-userspace-macros.at
> @@ -301,6 +301,12 @@ m4_define([OVS_CHECK_KERNEL_EXCL],
>      AT_SKIP_IF([:])
>  ])
>  
> +# OVS_CHECK_SRV6()
> +m4_define([OVS_CHECK_SRV6],
> +    [AT_SKIP_IF([! ip -6 route add fc00::1/96 encap seg6 mode encap dev lo 2>&1 >/dev/null])
> +     AT_CHECK([ip -6 route del fc00::1/96 2>&1 >/dev/null])
> +     OVS_CHECK_FIREWALL()])
> +
>  # CHECK_LATER_IPV6_FRAGMENTS()
>  #
>  # Userspace is parsing later IPv6 fragments correctly.
> diff --git a/tests/tunnel.at b/tests/tunnel.at
> index 78cc3f3e99a6..ddeb66bc9fb7 100644
> --- a/tests/tunnel.at
> +++ b/tests/tunnel.at
> @@ -1223,3 +1223,59 @@ AT_CHECK([ovs-vsctl add-port br0 p1 -- set int p1 type=dummy])
>  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])]
>  AT_CLEANUP
> +
> +AT_SETUP([tunnel - SRV6 basic])
> +OVS_VSWITCHD_START([add-port br0 p1 -- set Interface p1 type=dummy \
> +                    ofport_request=1 \
> +                    -- add-port br0 p2 -- set Interface p2 type=srv6 \
> +                    options:remote_ip=flow \
> +                    ofport_request=2])
> +OVS_VSWITCHD_DISABLE_TUNNEL_PUSH_POP
> +
> +dnl First setup dummy interface IP address, then add the route
> +dnl so that tnl-port table can get valid IP address for the device.
> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 fc00::1/64], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00::0/64 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/show], [0], [dnl
> +Route Table:
> +User: fc00::/64 dev br0 SRC fc00::1
> +])
> +
> +AT_DATA([flows.txt], [dnl
> +in_port=1,actions=set_field:fc00::2->tun_ipv6_dst,output:2
> +in_port=2,actions=1
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +AT_CHECK([ovs-appctl dpif/show | tail -n +3], [0], [dnl
> +    br0 65534/100: (dummy-internal)
> +    p1 1/1: (dummy)
> +    p2 2/6: (srv6: remote_ip=flow)
> +])
> +
> +AT_CHECK([ovs-appctl tnl/ports/show |sort], [0], [dnl
> +Listening ports:
> +srv6_sys (6) ref_cnt=1
> +srv6_sys (6) ref_cnt=1
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/list-tunnels], [0], [dnl
> +port 6: p2 (srv6: ::->flow, key=0, legacy_l3, dp port=6, ttl=64)
> +])
> +
> +dnl Encap: ipv4 inner packet
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=4,ttl=128,frag=no),tcp(src=8,dst=9)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout], [0],
> +  [Datapath actions: set(tunnel(ipv6_dst=fc00::2,ttl=64,flags(df))),pop_eth,6
> +])
> +
> +dnl Encap: ipv6 inner packet
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x86dd),ipv6(src=2001:cafe::92,dst=2001:cafe::88,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
> +AT_CHECK([tail -1 stdout], [0],
> +  [Datapath actions: set(tunnel(ipv6_dst=fc00::2,ttl=64,flags(df))),pop_eth,6
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
Nobuhiro MIKI March 22, 2023, 7:34 a.m. UTC | #2
On 2023/03/22 8:15, Ilya Maximets wrote:
> On 3/15/23 07:07, Nobuhiro MIKI wrote:
> 
> Thanks for the new version!  The code looks good in general.
> See some small comments inline.
> 

Thanks for your review!
I have replied inline.

Best Regards,
Nobuhiro Miki

>> +static void
>> +srv6_build_header(struct ovs_action_push_tnl *data,
>> +                  const struct netdev_tnl_build_header_params *params,
>> +                  int nr_segs, const struct in6_addr *segs)
>> +{
>> +    struct ovs_16aligned_ip6_hdr *nh6;
>> +    struct srv6_base_hdr *srh;
>> +    struct in6_addr *s;
>> +    unsigned int hlen;
>> +    ovs_be16 dl_type;
>> +    int i;
>> +
>> +    ovs_assert(nr_segs > 0);
>> +
>> +    nh6 = (struct ovs_16aligned_ip6_hdr *) eth_build_header(data, params);
>> +    put_16aligned_be32(&nh6->ip6_flow, htonl(6 << 28) |
>> +                       htonl(params->flow->tunnel.ip_tos << 20));
>> +    nh6->ip6_hlim = params->flow->tunnel.ip_ttl;
>> +    nh6->ip6_nxt = IPPROTO_ROUTING;
>> +    memcpy(&nh6->ip6_src, params->s_ip, sizeof(ovs_be32[4]));
>> +    memcpy(&nh6->ip6_dst, &segs[0], sizeof(ovs_be32[4]));
> 
> We should probably use netdev_tnl_ip_build_header() here.
> See below...
> 

Indeed. I'll fix it.

>> +
>> +    srh = (struct srv6_base_hdr *) (nh6 + 1);
>> +    dl_type = params->flow->dl_type;
>> +    if (dl_type == htons(ETH_TYPE_IP)) {
>> +        srh->rt_hdr.nexthdr = IPPROTO_IPIP;
>> +    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
>> +        srh->rt_hdr.nexthdr = IPPROTO_IPV6;
>> +    }
>> +    srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>> +    srh->rt_hdr.hdrlen = 2 * nr_segs;
>> +    srh->rt_hdr.segments_left = nr_segs - 1;
>> +    srh->last_entry = nr_segs - 1;
>> +    srh->flags = 0;
>> +    srh->tag = 0;
>> +
>> +    s = ALIGNED_CAST(struct in6_addr *,
>> +                     (char *) srh + sizeof(struct srv6_base_hdr));
>> +    for (i = 0; i < nr_segs; i++) {
>> +        /* Segment list is written to the header in reverse order. */
>> +        memcpy(s, &segs[nr_segs - i - 1], sizeof(ovs_be32[4]));
> 
> It should be sizeof *s instead.  We should avoid using type sizes
> as per the coding style. (I know netdev_tnl_ip_build_header is not
> following that, unfortunately.)
> 

OK. I'll also check similar code for the same issue.
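
As a rough illustration of the two points above (the reverse-order copy and `sizeof *s`), here is a minimal standalone sketch. `srh_fill_segments()` and `srv6_encap_len()` are hypothetical helpers written for this example and are not part of the patch; the constants 40 and 8 are assumed to correspond to IPV6_HEADER_LEN and SRV6_BASE_HDR_LEN.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: copies 'n' segments from
 * 'segs' into 'out' in reverse order, the way the quoted loop fills the
 * SRH segment list.  Returns the value for the SRH 'hdrlen' field, which
 * counts 8-octet units after the first 8 octets, i.e. 2 per address. */
static uint8_t
srh_fill_segments(struct in6_addr *out, const struct in6_addr *segs, int n)
{
    struct in6_addr *s = out;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        /* 'sizeof *s' follows the pointer's type, so no type name has to
         * be repeated (and kept in sync) at the call site. */
        memcpy(s, &segs[n - i - 1], sizeof *s);
        s++;
    }
    return 2 * n;
}

/* Hypothetical helper: bytes added by the encapsulation, assuming a
 * 40-byte IPv6 header and an 8-byte SRH base header. */
static unsigned int
srv6_encap_len(uint8_t hdrlen)
{
    return 40 + 8 + 8 * hdrlen;
}
```

With three segments, hdrlen comes out as 6 and the encapsulation adds 40 + 8 + 48 = 96 bytes.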

>> +        s++;
>> +    }
>> +
>> +    hlen = IPV6_HEADER_LEN + sizeof(struct srv6_base_hdr) +
>> +           8 * srh->rt_hdr.hdrlen;
>> +
>> +    data->header_len += hlen;
>> +    data->tnl_type = OVS_VPORT_TYPE_SRV6;
>> +}
>> +
>> +int
>> +netdev_srv6_build_header(const struct netdev *netdev,
>> +                         struct ovs_action_push_tnl *data,
>> +                         const struct netdev_tnl_build_header_params *params)
>> +{
>> +    struct netdev_vport *dev = netdev_vport_cast(netdev);
>> +    struct netdev_tunnel_config *tnl_cfg;
>> +
>> +    ovs_mutex_lock(&dev->mutex);
>> +    tnl_cfg = &dev->tnl_cfg;
>> +
>> +    if (tnl_cfg->srv6_num_segs) {
>> +        srv6_build_header(data, params,
>> +                          tnl_cfg->srv6_num_segs, tnl_cfg->srv6_segs);
>> +    } else {
>> +        /*
>> +         * If explicit segment list setting is omitted, tunnel destination
>> +         * is considered to be the first segment list.
>> +         */
>> +        srv6_build_header(data, params,
>> +                          1, &params->flow->tunnel.ipv6_dst);
>> +    }
> 
> Isn't it a misconfig if segs[0] != params->flow->tunnel.ipv6_dst ?
> I mean, we shouldn't be sending a packet anywhere other than the
> first address in the segment list, right?
> 

Right.
We need to be sure that segs[0] == params->flow->tunnel.ipv6_dst.

> I'm thinking that maybe we can check that tunnel.ipv6_dst equals segs[0]
> and fail the header build if it's not the case.  Then we can simply
> use a generic function netdev_tnl_ip_build_header() to build a base
> IPv6 header for us.  Then only fill in SRv6 specific fields.
> 
> What do you think?
> 

Thanks. It seems like a simpler and better approach.
It has less conditional branching and matches the intended behavior.
I imagine the implementation to be as follows:

1. if num_segs = 0 (i.e., the user did not specify a segment list),
   set segs[0] = tunnel.ipv6_dst and num_segs = 1.

2. if tunnel.ipv6_dst != segs[0], fail the header build.

3. call netdev_tnl_ip_build_header().

4. fill in the segment routing header.

How about this approach?
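
Steps 1 and 2 could look roughly like the sketch below. It is standalone and only an assumption about the eventual shape: `struct srv6_cfg` and `srv6_select_segs()` are hypothetical stand-ins for the tunnel configuration and the check, and steps 3 and 4 are left as a comment because they depend on OVS internals.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

#define SRV6_MAX_SEGS 6

/* Hypothetical stand-in for the SRv6 fields of the tunnel config. */
struct srv6_cfg {
    uint8_t num_segs;
    struct in6_addr segs[SRV6_MAX_SEGS];
};

/* Picks the effective segment list and validates it against the tunnel
 * destination.  Returns 0 on success and EINVAL on a mismatch. */
static int
srv6_select_segs(const struct srv6_cfg *cfg, const struct in6_addr *tun_dst,
                 const struct in6_addr **segs, int *nr_segs)
{
    assert(cfg->num_segs <= SRV6_MAX_SEGS);

    if (cfg->num_segs) {
        *segs = cfg->segs;
        *nr_segs = cfg->num_segs;
    } else {
        /* Step 1: no explicit list, so the tunnel destination is the
         * only segment. */
        *segs = tun_dst;
        *nr_segs = 1;
    }

    /* Step 2: the outer destination must be the first segment. */
    if (memcmp(tun_dst, &(*segs)[0], sizeof *tun_dst)) {
        return EINVAL;
    }

    /* Steps 3 and 4 (building the base IPv6 header and filling in the
     * SRH) would follow here in the real code. */
    return 0;
}
```

The caller would return the error from the build_header callback instead of building a header whose destination disagrees with the segment list.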

>> diff --git a/lib/netdev-native-tnl.h b/lib/netdev-native-tnl.h
>> index 22ae2ce5369b..07dae27973e6 100644
>> --- a/lib/netdev-native-tnl.h
>> +++ b/lib/netdev-native-tnl.h
>> @@ -65,6 +65,16 @@ netdev_gtpu_build_header(const struct netdev *netdev,
>>                           struct ovs_action_push_tnl *data,
>>                           const struct netdev_tnl_build_header_params *p);
>>  
>> +struct dp_packet *netdev_srv6_pop_header(struct dp_packet *packet);
>> +
>> +void netdev_srv6_push_header(const struct netdev *netdev,
>> +                             struct dp_packet *packet,
>> +                             const struct ovs_action_push_tnl *data);
>> +
>> +int netdev_srv6_build_header(const struct netdev *netdev,
>> +                             struct ovs_action_push_tnl *data,
>> +                             const struct netdev_tnl_build_header_params *p);
> 
> Nit: There is no need to have variable names in function prototypes.
>      Types here are self-descriptive.
> 

OK. I'll fix.

>> diff --git a/lib/packets.h b/lib/packets.h
>> index abff24db016e..312e849f9f26 100644
>> --- a/lib/packets.h
>> +++ b/lib/packets.h
>> @@ -1527,6 +1527,17 @@ BUILD_ASSERT_DECL(sizeof(struct vxlanhdr) == 8);
>>  #define VXLAN_F_GPE  0x4000
>>  #define VXLAN_HF_GPE 0x04000000
>>  
>> +/* SRv6 protocol header */
> 
> Period at the end of a comment.
> 

OK. I'll fix.

>> +dnl Set up tunnel endpoints on OVS outside the namespace.
>> +ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [fc00:100::100/64])
>> +AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
>> +dnl [sleep infinity]
> 
> This line seems strange.
> 

Sorry. I'll remove it.
Ilya Maximets March 22, 2023, 9:57 a.m. UTC | #3
On 3/22/23 08:34, Nobuhiro MIKI wrote:
> Thanks. It seems like a simpler and better approach.
> It has less conditional branching and matches the intended behavior.
> I imagine the implementation to be as follows:
> 
> 1. if num_segs = 0 (i.e., the user did not specify a segment list),
>    set segs[0] = tunnel.ipv6_dst and num_segs = 1.
> 
> 2. if tunnel.ipv6_dst != segs[0], fail the header build.
> 
> 3. call netdev_tnl_ip_build_header().
> 
> 4. fill in the segment routing header.
> 
> How about this approach?

Sounds good.  Thanks!

Best regards, Ilya Maximets.

Patch

diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
index dc6c92446f98..11b1c7e826f3 100644
--- a/Documentation/faq/configuration.rst
+++ b/Documentation/faq/configuration.rst
@@ -238,6 +238,27 @@  Q: Does Open vSwitch support GTP-U?
                 set int gtpu0 type=gtpu options:key=<teid> \
                 options:remote_ip=172.31.1.1
 
+Q: Does Open vSwitch support SRv6?
+
+    A: Yes. Starting with version 3.2, the Open vSwitch userspace
+    datapath supports SRv6 (Segment Routing over IPv6). The following
+    example shows tunneling to fc00:300::1 via fc00:100::1 and fc00:200::1.
+    In the current implementation, if "IPv6 in IPv6" or "IPv4 in IPv6" packets
+    are routed to this interface, and these packets are not SRv6 packets, they
+    may be dropped, so be careful in workloads with a mix of these tunnels.
+    Also note the following restrictions:
+
+    * Segment list length is limited to 6.
+    * SRv6 packets with segments_left other than 0 are simply dropped.
+
+    ::
+
+        $ ovs-vsctl add-br br0
+        $ ovs-vsctl add-port br0 srv6_0 -- \
+                set int srv6_0 type=srv6  \
+                options:remote_ip=fc00:300::1 \
+                options:srv6_segs="fc00:100::1,fc00:200::1,fc00:300::1"
+
 Q: How do I connect two bridges?
 
     A: First, why do you want to do this?  Two connected bridges are not much
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 9e1b42262000..9fb679e307d9 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -151,6 +151,7 @@  Q: Are all features available with all datapaths?
     Tunnel - ERSPAN                 4.18           2.10         2.10     NO
     Tunnel - ERSPAN-IPv6            4.18           2.10         2.10     NO
     Tunnel - GTP-U                  NO             NO           2.14     NO
+    Tunnel - SRv6                   NO             NO           3.2      NO
     Tunnel - Bareudp                5.7            NO           NO       NO
     QoS - Policing                  YES            1.1          2.6      NO
     QoS - Shaping                   YES            1.1          NO       NO
diff --git a/NEWS b/NEWS
index 72b9024e6d8a..9d2adff6cf3e 100644
--- a/NEWS
+++ b/NEWS
@@ -17,6 +17,8 @@  Post-v3.1.0
        in order to create OVSDB sockets with access mode of 0770.
    - QoS:
      * Added new configuration option 'jitter' for a linux-netem QoS type.
+   - SRv6 Tunnel Protocol
+     * Supported only in the userspace datapath.
 
 
 v3.1.0 - 16 Feb 2023
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index bc8f74991849..e305c331516b 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -254,6 +254,7 @@  enum ovs_vport_type {
 	OVS_VPORT_TYPE_IP6GRE = 109,
 	OVS_VPORT_TYPE_GTPU = 110,
 	OVS_VPORT_TYPE_BAREUDP = 111,  /* Bareudp tunnel. */
+	OVS_VPORT_TYPE_SRV6 = 112,  /* SRv6 tunnel. */
 	__OVS_VPORT_TYPE_MAX
 };
 
diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
index 4fc42daed2d9..5788294ae0d7 100644
--- a/lib/dpif-netlink-rtnl.c
+++ b/lib/dpif-netlink-rtnl.c
@@ -129,6 +129,8 @@  vport_type_to_kind(enum ovs_vport_type type,
         }
     case OVS_VPORT_TYPE_GTPU:
         return NULL;
+    case OVS_VPORT_TYPE_SRV6:
+        return "srv6";
     case OVS_VPORT_TYPE_BAREUDP:
         return "bareudp";
     case OVS_VPORT_TYPE_NETDEV:
@@ -319,6 +321,7 @@  dpif_netlink_rtnl_verify(const struct netdev_tunnel_config *tnl_cfg,
     case OVS_VPORT_TYPE_LISP:
     case OVS_VPORT_TYPE_STT:
     case OVS_VPORT_TYPE_GTPU:
+    case OVS_VPORT_TYPE_SRV6:
     case OVS_VPORT_TYPE_UNSPEC:
     case __OVS_VPORT_TYPE_MAX:
     default:
@@ -411,6 +414,7 @@  dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
     case OVS_VPORT_TYPE_LISP:
     case OVS_VPORT_TYPE_STT:
     case OVS_VPORT_TYPE_GTPU:
+    case OVS_VPORT_TYPE_SRV6:
     case OVS_VPORT_TYPE_UNSPEC:
     case __OVS_VPORT_TYPE_MAX:
     default:
@@ -519,6 +523,7 @@  dpif_netlink_rtnl_port_destroy(const char *name, const char *type)
     case OVS_VPORT_TYPE_ERSPAN:
     case OVS_VPORT_TYPE_IP6ERSPAN:
     case OVS_VPORT_TYPE_IP6GRE:
+    case OVS_VPORT_TYPE_SRV6:
     case OVS_VPORT_TYPE_BAREUDP:
         return dpif_netlink_rtnl_destroy(name);
     case OVS_VPORT_TYPE_NETDEV:
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 7875e573e643..44da6f54c983 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -919,6 +919,9 @@  get_vport_type(const struct dpif_netlink_vport *vport)
     case OVS_VPORT_TYPE_GTPU:
         return "gtpu";
 
+    case OVS_VPORT_TYPE_SRV6:
+        return "srv6";
+
     case OVS_VPORT_TYPE_BAREUDP:
         return "bareudp";
 
@@ -957,6 +960,8 @@  netdev_to_ovs_vport_type(const char *type)
         return OVS_VPORT_TYPE_GRE;
     } else if (!strcmp(type, "gtpu")) {
         return OVS_VPORT_TYPE_GTPU;
+    } else if (!strcmp(type, "srv6")) {
+        return OVS_VPORT_TYPE_SRV6;
     } else if (!strcmp(type, "bareudp")) {
         return OVS_VPORT_TYPE_BAREUDP;
     } else {
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index b89dfdd52a86..b324c8d058e1 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -845,6 +845,151 @@  netdev_gtpu_build_header(const struct netdev *netdev,
     return 0;
 }
 
+static void
+srv6_build_header(struct ovs_action_push_tnl *data,
+                  const struct netdev_tnl_build_header_params *params,
+                  int nr_segs, const struct in6_addr *segs)
+{
+    struct ovs_16aligned_ip6_hdr *nh6;
+    struct srv6_base_hdr *srh;
+    struct in6_addr *s;
+    unsigned int hlen;
+    ovs_be16 dl_type;
+    int i;
+
+    ovs_assert(nr_segs > 0);
+
+    nh6 = (struct ovs_16aligned_ip6_hdr *) eth_build_header(data, params);
+    put_16aligned_be32(&nh6->ip6_flow, htonl(6 << 28) |
+                       htonl(params->flow->tunnel.ip_tos << 20));
+    nh6->ip6_hlim = params->flow->tunnel.ip_ttl;
+    nh6->ip6_nxt = IPPROTO_ROUTING;
+    memcpy(&nh6->ip6_src, params->s_ip, sizeof(ovs_be32[4]));
+    memcpy(&nh6->ip6_dst, &segs[0], sizeof(ovs_be32[4]));
+
+    srh = (struct srv6_base_hdr *) (nh6 + 1);
+    dl_type = params->flow->dl_type;
+    if (dl_type == htons(ETH_TYPE_IP)) {
+        srh->rt_hdr.nexthdr = IPPROTO_IPIP;
+    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
+        srh->rt_hdr.nexthdr = IPPROTO_IPV6;
+    }
+    srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
+    srh->rt_hdr.hdrlen = 2 * nr_segs;
+    srh->rt_hdr.segments_left = nr_segs - 1;
+    srh->last_entry = nr_segs - 1;
+    srh->flags = 0;
+    srh->tag = 0;
+
+    s = ALIGNED_CAST(struct in6_addr *,
+                     (char *) srh + sizeof(struct srv6_base_hdr));
+    for (i = 0; i < nr_segs; i++) {
+        /* Segment list is written to the header in reverse order. */
+        memcpy(s, &segs[nr_segs - i - 1], sizeof(ovs_be32[4]));
+        s++;
+    }
+
+    hlen = IPV6_HEADER_LEN + sizeof(struct srv6_base_hdr) +
+           8 * srh->rt_hdr.hdrlen;
+
+    data->header_len += hlen;
+    data->tnl_type = OVS_VPORT_TYPE_SRV6;
+}
+
+int
+netdev_srv6_build_header(const struct netdev *netdev,
+                         struct ovs_action_push_tnl *data,
+                         const struct netdev_tnl_build_header_params *params)
+{
+    struct netdev_vport *dev = netdev_vport_cast(netdev);
+    struct netdev_tunnel_config *tnl_cfg;
+
+    ovs_mutex_lock(&dev->mutex);
+    tnl_cfg = &dev->tnl_cfg;
+
+    if (tnl_cfg->srv6_num_segs) {
+        srv6_build_header(data, params,
+                          tnl_cfg->srv6_num_segs, tnl_cfg->srv6_segs);
+    } else {
+        /*
+         * If explicit segment list setting is omitted, tunnel destination
+         * is considered to be the first segment list.
+         */
+        srv6_build_header(data, params,
+                          1, &params->flow->tunnel.ipv6_dst);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return 0;
+}
+
+void
+netdev_srv6_push_header(const struct netdev *netdev OVS_UNUSED,
+                        struct dp_packet *packet,
+                        const struct ovs_action_push_tnl *data)
+{
+    int ip_tot_size;
+
+    netdev_tnl_push_ip_header(packet, data->header,
+                              data->header_len, &ip_tot_size);
+}
+
+struct dp_packet *
+netdev_srv6_pop_header(struct dp_packet *packet)
+{
+    const struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
+    size_t size = dp_packet_l3_size(packet) - IPV6_HEADER_LEN;
+    const struct ovs_16aligned_ip6_frag *frag_hdr = NULL;
+    const struct ip6_rt_hdr *rt_hdr = NULL;
+    struct pkt_metadata *md = &packet->md;
+    struct flow_tnl *tnl = &md->tunnel;
+    uint8_t nw_proto = nh->ip6_nxt;
+    const void *data = nh + 1;
+    uint8_t nw_frag = 0;
+    unsigned int hlen;
+
+    /*
+     * Verifies that the routing header is present in the IPv6
+     * extension headers and that its type is SRv6.
+     */
+    if (!parse_ipv6_ext_hdrs(&data, &size, &nw_proto, &nw_frag,
+                             &frag_hdr, &rt_hdr)) {
+        goto err;
+    }
+
+    if (!rt_hdr) {
+        goto err;
+    }
+
+    if (rt_hdr->type != IPV6_SRCRT_TYPE_4) {
+        goto err;
+    }
+
+    if (rt_hdr->segments_left > 0) {
+        VLOG_WARN_RL(&err_rl, "invalid srv6 segments_left=%d",
+                     rt_hdr->segments_left);
+        goto err;
+    }
+
+    if (rt_hdr->nexthdr == IPPROTO_IPIP) {
+        packet->packet_type = htonl(PT_IPV4);
+    } else if (rt_hdr->nexthdr == IPPROTO_IPV6) {
+        packet->packet_type = htonl(PT_IPV6);
+    } else {
+        goto err;
+    }
+
+    pkt_metadata_init_tnl(md);
+    netdev_tnl_ip_extract_tnl_md(packet, tnl, &hlen);
+    dp_packet_reset_packet(packet, hlen);
+
+    return packet;
+err:
+    dp_packet_delete(packet);
+    return NULL;
+}
+
 struct dp_packet *
 netdev_vxlan_pop_header(struct dp_packet *packet)
 {
diff --git a/lib/netdev-native-tnl.h b/lib/netdev-native-tnl.h
index 22ae2ce5369b..07dae27973e6 100644
--- a/lib/netdev-native-tnl.h
+++ b/lib/netdev-native-tnl.h
@@ -65,6 +65,16 @@  netdev_gtpu_build_header(const struct netdev *netdev,
                          struct ovs_action_push_tnl *data,
                          const struct netdev_tnl_build_header_params *p);
 
+struct dp_packet *netdev_srv6_pop_header(struct dp_packet *packet);
+
+void netdev_srv6_push_header(const struct netdev *netdev,
+                             struct dp_packet *packet,
+                             const struct ovs_action_push_tnl *data);
+
+int netdev_srv6_build_header(const struct netdev *netdev,
+                             struct ovs_action_push_tnl *data,
+                             const struct netdev_tnl_build_header_params *p);
+
 void
 netdev_tnl_push_udp_header(const struct netdev *netdev,
                            struct dp_packet *packet,
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 3b39278650d3..663ee8606c3b 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -424,6 +424,35 @@  parse_tunnel_ip(const char *value, bool accept_mcast, bool *flow,
     return 0;
 }
 
+static int
+parse_srv6_segs(char *s, struct in6_addr *segs, uint8_t *num_segs)
+{
+    char *save_ptr = NULL;
+    char *token;
+
+    if (!s) {
+        return EINVAL;
+    }
+
+    *num_segs = 0;
+
+    while ((token = strtok_r(s, ",", &save_ptr)) != NULL) {
+        if (*num_segs == SRV6_MAX_SEGS) {
+            return EINVAL;
+        }
+
+        if (inet_pton(AF_INET6, token, segs) != 1) {
+            return EINVAL;
+        }
+
+        segs++;
+        (*num_segs)++;
+        s = NULL;
+    }
+
+    return 0;
+}
+
 enum tunnel_layers {
     TNL_L2 = 1 << 0,       /* 1 if a tunnel type can carry Ethernet traffic. */
     TNL_L3 = 1 << 1        /* 1 if a tunnel type can carry L3 traffic. */
@@ -443,6 +472,8 @@  tunnel_supported_layers(const char *type,
         return TNL_L3;
     } else if (!strcmp(type, "bareudp")) {
         return TNL_L3;
+    } else if (!strcmp(type, "srv6")) {
+        return TNL_L3;
     } else {
         return TNL_L2;
     }
@@ -750,6 +781,17 @@  set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
                     goto out;
                 }
             }
+        } else if (!strcmp(node->key, "srv6_segs")) {
+            err = parse_srv6_segs(node->value,
+                                  tnl_cfg.srv6_segs,
+                                  &tnl_cfg.srv6_num_segs);
+
+            switch (err) {
+            case EINVAL:
+                ds_put_format(&errors, "%s: bad %s 'srv6_segs'\n",
+                              name, node->value);
+                break;
+            }
         } else if (!strcmp(node->key, "payload_type")) {
             if (!strcmp(node->value, "mpls")) {
                  tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
@@ -1290,6 +1332,17 @@  netdev_vport_tunnel_register(void)
           },
           {{NULL, NULL, 0, 0}}
         },
+        { "srv6_sys",
+          {
+              TUNNEL_FUNCTIONS_COMMON,
+              .type = "srv6",
+              .build_header = netdev_srv6_build_header,
+              .push_header = netdev_srv6_push_header,
+              .pop_header = netdev_srv6_pop_header,
+              .get_ifindex = NETDEV_VPORT_GET_IFINDEX,
+          },
+          {{NULL, NULL, 0, 0}}
+        },
 
     };
     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
diff --git a/lib/netdev.h b/lib/netdev.h
index acf174927d24..ff207f56c28c 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -140,6 +140,10 @@  struct netdev_tunnel_config {
     bool erspan_idx_flow;
     bool erspan_dir_flow;
     bool erspan_hwid_flow;
+
+    uint8_t srv6_num_segs;
+    #define SRV6_MAX_SEGS 6
+    struct in6_addr srv6_segs[SRV6_MAX_SEGS];
 };
 
 void netdev_run(void);
diff --git a/lib/packets.h b/lib/packets.h
index abff24db016e..312e849f9f26 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -1527,6 +1527,17 @@  BUILD_ASSERT_DECL(sizeof(struct vxlanhdr) == 8);
 #define VXLAN_F_GPE  0x4000
 #define VXLAN_HF_GPE 0x04000000
 
+/* SRv6 protocol header */
+#define IPV6_SRCRT_TYPE_4 4
+#define SRV6_BASE_HDR_LEN 8
+struct srv6_base_hdr {
+    struct ip6_rt_hdr rt_hdr;
+    uint8_t last_entry;
+    uint8_t flags;
+    ovs_be16 tag;
+};
+BUILD_ASSERT_DECL(sizeof(struct srv6_base_hdr) == SRV6_BASE_HDR_LEN);
+
 /* Input values for PACKET_TYPE macros have to be in host byte order.
  * The _BE postfix indicates result is in network byte order. Otherwise result
  * is in host byte order. */
diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c
index da9939afa8a1..297962895f20 100644
--- a/lib/tnl-ports.c
+++ b/lib/tnl-ports.c
@@ -126,7 +126,7 @@  map_insert(odp_port_t port, struct eth_addr mac, struct in6_addr *addr,
          /* XXX: No fragments support. */
         match.wc.masks.nw_frag = FLOW_NW_FRAG_MASK;
 
-        /* 'tp_port' is zero for GRE tunnels. In this case it
+        /* 'tp_port' is zero for GRE and SRv6 tunnels. In this case it
          * doesn't make sense to match on UDP port numbers. */
         if (tp_port) {
             match.wc.masks.tp_dst = OVS_BE16_MAX;
@@ -182,6 +182,10 @@  tnl_type_to_nw_proto(const char type[], uint8_t nw_protos[2])
     if (!strcmp(type, "gtpu")) {
         nw_protos[0] = IPPROTO_UDP;
     }
+    if (!strcmp(type, "srv6")) {
+        nw_protos[0] = IPPROTO_IPIP;
+        nw_protos[1] = IPPROTO_IPV6;
+    }
 }
 
 static void
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index a9cf3cbee0be..15c814d6285b 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3632,6 +3632,9 @@  propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, struct eth_addr dmac,
     case OVS_VPORT_TYPE_BAREUDP:
         nw_proto = IPPROTO_UDP;
         break;
+    case OVS_VPORT_TYPE_SRV6:
+        nw_proto = IPPROTO_IPIP;
+        break;
     case OVS_VPORT_TYPE_LISP:
     case OVS_VPORT_TYPE_STT:
     case OVS_VPORT_TYPE_UNSPEC:
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 822a80618d6f..fb15a5a7ce03 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -202,6 +202,14 @@  m4_define([OVS_CHECK_KERNEL_EXCL],
     AT_SKIP_IF([ ! ( test $version -lt $1 || ( test $version -eq $1 && test $sublevel -lt $2 ) || test $version -gt $3 || ( test $version -eq $3 && test $sublevel -gt $4 ) ) ])
 ])
 
+# OVS_CHECK_SRV6()
+#
+# The kernel datapath does not support this feature.
+m4_define([OVS_CHECK_SRV6],
+[
+    AT_SKIP_IF([:])
+])
+
 # CHECK_LATER_IPV6_FRAGMENTS()
 #
 # Upstream kernels beetween 4.20 and 5.19 are not parsing IPv6 fragments
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2558f3b24d7d..c023932a1d48 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1164,6 +1164,125 @@  OVS_WAIT_UNTIL([cat p0.pcap | grep -E "IP6 fc00:100::100 > fc00:100::1: GREv0, .
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - ping over srv6 tunnel])
+OVS_CHECK_TUNNEL_TSO()
+OVS_CHECK_SRV6()
+
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0)
+ADD_NAMESPACES(at_ns1)
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.seg6_enabled=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv4.conf.default.forwarding=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.forwarding=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.seg6_enabled=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv4.conf.all.forwarding=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.forwarding=1])
+
+dnl Set up underlay link from host into the namespace 'at_ns0'
+dnl using veth pair. Kernel side tunnel endpoint (SID) is
+dnl 'fc00:a::1/128', so add it to the route.
+ADD_BR([br-underlay])
+ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+AT_CHECK([ip route add fc00:a::1/128 dev br-underlay via fc00::1])
+
+dnl Set up tunnel endpoints on OVS outside the namespace.
+ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [10.100.100.100/24])
+AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
+AT_CHECK([ip route add 10.1.1.0/24 dev br0 via 10.100.100.1])
+AT_CHECK([arp -s 10.100.100.1 aa:55:aa:55:00:01])
+AT_CHECK([ovs-ofctl add-flow br0 in_port=LOCAL,actions=output:at_srv6])
+AT_CHECK([ovs-ofctl add-flow br0 in_port=at_srv6,actions=mod_dl_dst:aa:55:aa:55:00:00,output:LOCAL])
+
+dnl Set up the tunnel endpoint in namespace 'at_ns0'
+dnl and the overlay port in namespace 'at_ns1'.
+ADD_VETH_NS([at_ns0], [veth0], [10.1.1.2/24], [at_ns1], [veth1], [10.1.1.1/24])
+NS_CHECK_EXEC([at_ns0], [ip sr tunsrc set fc00:a::1])
+NS_CHECK_EXEC([at_ns0], [ip route add 10.100.100.0/24 encap seg6 mode encap segs fc00::100 dev p0])
+NS_CHECK_EXEC([at_ns0], [ip -6 route add fc00:a::1 encap seg6local action End.DX4 nh4 0.0.0.0 dev veth0])
+NS_CHECK_EXEC([at_ns1], [ip route add 10.100.100.0/24 via 10.1.1.2 dev veth1])
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay.
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.100.100.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 over srv6 tunnel])
+OVS_CHECK_TUNNEL_TSO()
+OVS_CHECK_SRV6()
+
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0)
+ADD_NAMESPACES(at_ns1)
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.seg6_enabled=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.default.forwarding=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.seg6_enabled=1])
+NS_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.forwarding=1])
+
+dnl Set up the underlay link from the host into namespace 'at_ns0'
+dnl using a veth pair. The kernel-side tunnel endpoint (SID) is
+dnl 'fc00:a::1/128', so add a route to it.
+ADD_BR([br-underlay])
+ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+AT_CHECK([ip -6 route add fc00:a::1/128 dev br-underlay via fc00::1])
+
+dnl Set up tunnel endpoints on OVS outside the namespace.
+ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [fc00:100::100/64])
+AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
+dnl Add the overlay address, route, and static neighbor entry on br0.
+AT_CHECK([ip addr add dev br0 fc00:100::100/64])
+AT_CHECK([ip -6 route add fc00:1::1/128 dev br0 via fc00:100::1])
+AT_CHECK([ip -6 neigh add fc00:100::1 lladdr aa:55:aa:55:00:01 dev br0])
+AT_CHECK([ovs-ofctl add-flow br0 in_port=LOCAL,actions=output:at_srv6])
+AT_CHECK([ovs-ofctl add-flow br0 in_port=at_srv6,actions=mod_dl_dst:aa:55:aa:55:00:00,output:LOCAL])
+
+dnl Set up the tunnel endpoint in namespace 'at_ns0'
+dnl and the overlay port in namespace 'at_ns1'.
+ADD_VETH_NS([at_ns0], [veth0], [fc00:1::2/64], [at_ns1], [veth1], [fc00:1::1/64])
+NS_CHECK_EXEC([at_ns0], [ip sr tunsrc set fc00:a::1])
+NS_CHECK_EXEC([at_ns0], [ip -6 route add fc00:100::0/64 encap seg6 mode encap segs fc00::100 dev p0])
+NS_CHECK_EXEC([at_ns0], [ip -6 route add fc00:a::1 encap seg6local action End.DX6 nh6 :: dev veth0])
+NS_CHECK_EXEC([at_ns1], [ip -6 route add fc00:100::/64 via fc00:1::2 dev veth1])
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+OVS_WAIT_UNTIL([ip netns exec at_ns1 ping6 -c 1 fc00:100::100])
+
+dnl First, check the underlay.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay.
+NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - clone action])
 OVS_TRAFFIC_VSWITCHD_START()
 
diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
index 610fa2e94ae8..482079386a43 100644
--- a/tests/system-userspace-macros.at
+++ b/tests/system-userspace-macros.at
@@ -301,6 +301,12 @@  m4_define([OVS_CHECK_KERNEL_EXCL],
     AT_SKIP_IF([:])
 ])
 
+# OVS_CHECK_SRV6()
+m4_define([OVS_CHECK_SRV6],
+    [AT_SKIP_IF([! ip -6 route add fc00::1/96 encap seg6 mode encap dev lo 2>&1 >/dev/null])
+     AT_CHECK([ip -6 route del fc00::1/96 2>&1 >/dev/null])
+     OVS_CHECK_FIREWALL()])
+
 # CHECK_LATER_IPV6_FRAGMENTS()
 #
 # Userspace is parsing later IPv6 fragments correctly.
diff --git a/tests/tunnel.at b/tests/tunnel.at
index 78cc3f3e99a6..ddeb66bc9fb7 100644
--- a/tests/tunnel.at
+++ b/tests/tunnel.at
@@ -1223,3 +1223,59 @@  AT_CHECK([ovs-vsctl add-port br0 p1 -- set int p1 type=dummy])
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])]
 AT_CLEANUP
+
+AT_SETUP([tunnel - SRv6 basic])
+OVS_VSWITCHD_START([add-port br0 p1 -- set Interface p1 type=dummy \
+                    ofport_request=1 \
+                    -- add-port br0 p2 -- set Interface p2 type=srv6 \
+                    options:remote_ip=flow \
+                    ofport_request=2])
+OVS_VSWITCHD_DISABLE_TUNNEL_PUSH_POP
+
+dnl First set up the dummy interface IP address, then add the route
+dnl so that the tnl-port table can get a valid IP address for the device.
+AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 fc00::1/64], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::0/64 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/show], [0], [dnl
+Route Table:
+User: fc00::/64 dev br0 SRC fc00::1
+])
+
+AT_DATA([flows.txt], [dnl
+in_port=1,actions=set_field:fc00::2->tun_ipv6_dst,output:2
+in_port=2,actions=1
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-appctl dpif/show | tail -n +3], [0], [dnl
+    br0 65534/100: (dummy-internal)
+    p1 1/1: (dummy)
+    p2 2/6: (srv6: remote_ip=flow)
+])
+
+AT_CHECK([ovs-appctl tnl/ports/show |sort], [0], [dnl
+Listening ports:
+srv6_sys (6) ref_cnt=1
+srv6_sys (6) ref_cnt=1
+])
+
+AT_CHECK([ovs-appctl ofproto/list-tunnels], [0], [dnl
+port 6: p2 (srv6: ::->flow, key=0, legacy_l3, dp port=6, ttl=64)
+])
+
+dnl Encap: ipv4 inner packet
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=4,ttl=128,frag=no),tcp(src=8,dst=9)'], [0], [stdout])
+AT_CHECK([tail -1 stdout], [0],
+  [Datapath actions: set(tunnel(ipv6_dst=fc00::2,ttl=64,flags(df))),pop_eth,6
+])
+
+dnl Encap: ipv6 inner packet
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x86dd),ipv6(src=2001:cafe::92,dst=2001:cafe::88,label=0,proto=47,tclass=0x0,hlimit=64)'], [0], [stdout])
+AT_CHECK([tail -1 stdout], [0],
+  [Datapath actions: set(tunnel(ipv6_dst=fc00::2,ttl=64,flags(df))),pop_eth,6
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP