[ovs-dev] RFC: Adding a new TLV for transferring the route table.

Message ID 20251218165808.70904-1-arukomoinikova@k2.cloud
State New
Series [ovs-dev] RFC: Adding a new TLV for transferring the route table.

Checks

Context                                Check    Description
ovsrobot/apply-robot                   success  apply and check: success
ovsrobot/github-robot-_Build_and_Test  fail     github build: failed
ovsrobot/github-robot-_ovn-kubernetes  fail     github build: failed

Commit Message

Alexandra Rukomoinikova Dec. 18, 2025, 4:58 p.m. UTC
The patch below is very messy and is only a proof of concept. It cannot be developed
further until the register issue described in the cover letter is resolved.

Problem the patch solves:
Currently, we need to create a separate switch for each routing table,
because the routing table is determined by the port on which the traffic arrived.

The diagram below shows this for a cross-AZ deployment:
      +---------------+                         +---------------+
      |  subnet 1     |                         |  subnet 2     |
      | route table 1 |                         | route table 2 |
      +-------+-------+                         +-------+-------+
              |                                         |
              |               +---------------+         |
              +-------------->|    Router     |<--------+
                              +-------+-------+
                                      |
                    +-----------------+----------+
                    |                            |
              +-----+--------+            +------+-------+
              | TS Router    |            | TS Router    |
              |  Table 1     |            |   Table 2    |
              +-----+--------+            +------+-------+
                                   AZ2

After these changes:
The route selector stores the route table ID, which is sent over the tunnel in a
Geneve header option.
When the packet is processed on the router in the second availability zone,
the correct routing table is selected based on this value, so a separate switch
for each route table is no longer needed.
In the future, this functionality will be hidden behind an optional flag
to avoid breaking offloading in regular deployments.

Signed-off-by: Alexandra Rukomoinikova <arukomoinikova@k2.cloud>
---
 controller/ofctrl.c          |  45 ++++++++++++---
 controller/ofctrl.h          |   3 +-
 controller/ovn-controller.c  |  22 +++++---
 controller/physical.c        | 104 +++++++++++++++++++++++++----------
 controller/physical.h        |   7 +++
 include/ovn/logical-fields.h |   2 +
 lib/logical-fields.c         |   2 +
 northd/northd.c              |  14 +++--
 8 files changed, 149 insertions(+), 50 deletions(-)
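
For readers unfamiliar with the wire format, here is a minimal sketch of how a
16-bit route table ID could be packed into a Geneve option. It assumes the
standard option layout from RFC 8926 (16-bit class, 8-bit type, 5-bit length in
4-byte units) and uses the class 0x0101, type 0x81 and length 4 that the patch
defines in controller/physical.h. The function names are hypothetical, and
placing the ID in the low 16 bits of the 4-byte body is an assumption based on
the patch moving bits 0..15 of the tunnel metadata field. This is not code from
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the patch (controller/physical.h). */
#define ROUTE_SELECTOR_CLASS 0x0101
#define ROUTE_SELECTOR_TYPE  0x81
#define ROUTE_SELECTOR_LEN   4      /* Option body length in bytes. */

/* Geneve option header (4 bytes) plus the 4-byte body. */
#define ROUTE_SELECTOR_OPT_SIZE (4 + ROUTE_SELECTOR_LEN)

/* Hypothetical helper: writes the option into 'buf' in network byte order
 * and returns the number of bytes written. */
static size_t
encode_route_selector_opt(uint16_t route_table_id,
                          uint8_t buf[ROUTE_SELECTOR_OPT_SIZE])
{
    assert(ROUTE_SELECTOR_LEN % 4 == 0);

    buf[0] = ROUTE_SELECTOR_CLASS >> 8;     /* Option class, high byte. */
    buf[1] = ROUTE_SELECTOR_CLASS & 0xff;   /* Option class, low byte. */
    buf[2] = ROUTE_SELECTOR_TYPE;           /* Type; 0x80 bit = critical. */
    buf[3] = ROUTE_SELECTOR_LEN / 4;        /* 3 reserved bits, 5-bit length. */
    buf[4] = 0;                             /* Upper 16 bits unused. */
    buf[5] = 0;
    buf[6] = route_table_id >> 8;           /* Assumed: ID in low 16 bits. */
    buf[7] = route_table_id & 0xff;
    return ROUTE_SELECTOR_OPT_SIZE;
}

/* Hypothetical helper: reads the route table ID back from an encoded
 * option. */
static uint16_t
decode_route_selector_opt(const uint8_t buf[ROUTE_SELECTOR_OPT_SIZE])
{
    return (uint16_t) ((buf[6] << 8) | buf[7]);
}
```

Encoding route table 0x1234 this way yields an 8-byte option whose last two
bytes are 0x12 0x34; the receiving side would read the same two bytes back.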

Comments

Alexandra Rukomoinikova Dec. 18, 2025, 5:05 p.m. UTC | #1
Hi everyone!

I added the maintainers in CC because I would really like your opinion.

It seems we are already running out of available registers in Open
vSwitch, at least at the controller level.

Currently, all registers accessible to the controller are already in
use. Earlier in this thread I sent an RFC patch which, at the moment, is
nothing more than a messy proof of concept, precisely because I ran into
the lack of registers at the controller level. My goal is to pass
register values between pipelines, which is only possible with
ovn-controller registers. For now, I've implemented this through a
very dirty workaround using registers from northd, which is just
terrible, but I couldn't find another option.

What do you think: should we expand the number of registers in Open
vSwitch? It seems it's time! If it's not yet the right moment,
should I add a separate mff_ field in OVS instead? I initially dismissed
this idea because transferring a routing table doesn't really fit into
the OpenFlow concept.

Thanks in advance! If we agree that it's time to expand the registers, I
would like to take this on myself.
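
To make the workaround concrete: the flows that hand a packet from one pipeline
to the next zero every logical register, so a value survives only if its
register is skipped. The sketch below models that with a plain array and a
preserve mask instead of a hard-coded index. The names are hypothetical, and
N_LOG_REGS = 10 is an assumption about the value of MFF_N_LOG_REGS.

```c
#include <assert.h>
#include <stdint.h>

#define N_LOG_REGS 10   /* Assumed value of MFF_N_LOG_REGS. */

/* Zeroes all logical registers except those whose bit is set in
 * 'preserve_mask' (bit i corresponds to regs[i]). */
static void
clear_log_regs(uint32_t regs[N_LOG_REGS], uint32_t preserve_mask)
{
    assert(!(preserve_mask >> N_LOG_REGS));  /* Only bits 0..9 are valid. */

    for (int i = 0; i < N_LOG_REGS; i++) {
        if (preserve_mask & (UINT32_C(1) << i)) {
            continue;   /* Keep this register across the pipeline boundary. */
        }
        regs[i] = 0;
    }
}
```

Calling clear_log_regs(regs, UINT32_C(1) << 3) corresponds to the
"if (i == 3) continue;" check that the patch adds to the register-clearing
loops.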
On 18.12.2025 19:58, Alexandra Rukomoinikova wrote:
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index 9f3ce0368..e35c034ac 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -399,6 +399,7 @@ static void ofctrl_meter_bands_clear(void);
>    * S_CLEAR_FLOWS or S_UPDATE_FLOWS, this is really the option we have. */
>   static enum mf_field_id mff_ovn_geneve;
>   
> +static enum mf_field_id mff_ovn_geneve_route_selector;
>   /* Indicates if we just went through the S_CLEAR_FLOWS state, which means we
>    * need to perform a one time deletion for all the existing flows, groups and
>    * meters. This can happen during initialization or OpenFlow reconnection
> @@ -505,6 +506,13 @@ process_tlv_table_reply(const struct ofputil_tlv_table_reply *reply)
>               } else {
>                   mff_ovn_geneve = MFF_TUN_METADATA0 + map->index;
>                   state = S_WAIT_BEFORE_CLEAR;
> +                /* Allocate the next index for the route selector TLV. */
> +                /* TODO: Handle the case where this option is set
> +                   from the command line. */
> +                if (ovs_list_size(&reply->mappings) > map->index + 1) {
> +                    mff_ovn_geneve_route_selector =
> +                        MFF_TUN_METADATA0 + map->index + 1;
> +                }
>                   return true;
>               }
>           }
> @@ -520,18 +528,28 @@ process_tlv_table_reply(const struct ofputil_tlv_table_reply *reply)
>           return false;
>       }
>   
> +    /* TODO: This works but looks ugly; remove the code duplication. */
>       unsigned int index = rightmost_1bit_idx(md_free);
>       mff_ovn_geneve = MFF_TUN_METADATA0 + index;
> -    struct ofputil_tlv_map tm;
> -    tm.option_class = OVN_GENEVE_CLASS;
> -    tm.option_type = OVN_GENEVE_TYPE;
> -    tm.option_len = OVN_GENEVE_LEN;
> -    tm.index = index;
> +    struct ofputil_tlv_map tm_ovn_geneve;
> +    tm_ovn_geneve.option_class = OVN_GENEVE_CLASS;
> +    tm_ovn_geneve.option_type = OVN_GENEVE_TYPE;
> +    tm_ovn_geneve.option_len = OVN_GENEVE_LEN;
> +    tm_ovn_geneve.index = index;
> +    index++;
> +
> +    mff_ovn_geneve_route_selector = MFF_TUN_METADATA0 + index;
> +    struct ofputil_tlv_map tm_route_selector;
> +    tm_route_selector.option_class = OVN_GENEVE_ROUTE_SELECTOR_CLASS;
> +    tm_route_selector.option_type = OVN_GENEVE_ROUTE_SELECTOR_TYPE;
> +    tm_route_selector.option_len = OVN_GENEVE_ROUTE_SELECTOR_LEN;
> +    tm_route_selector.index = index;
>   
>       struct ofputil_tlv_table_mod ttm;
>       ttm.command = NXTTMC_ADD;
>       ovs_list_init(&ttm.mappings);
> -    ovs_list_push_back(&ttm.mappings, &tm.list_node);
> +    ovs_list_push_back(&ttm.mappings, &tm_ovn_geneve.list_node);
> +    ovs_list_push_back(&ttm.mappings, &tm_route_selector.list_node);
>   
>       xid = queue_msg(ofputil_encode_tlv_table_mod(OFP15_VERSION, &ttm));
>       xid2 = queue_msg(ofputil_encode_barrier_request(OFP15_VERSION));
> @@ -572,6 +590,7 @@ recv_S_TLV_TABLE_REQUESTED(const struct ofp_header *oh, enum ofptype type,
>   
>       /* Error path. */
>       mff_ovn_geneve = 0;
> +    mff_ovn_geneve_route_selector = 0;
>       state = S_WAIT_BEFORE_CLEAR;
>   }
>   
> @@ -768,7 +787,7 @@ recv_S_UPDATE_FLOWS(const struct ofp_header *oh, enum ofptype type,
>   
>   
>   enum mf_field_id
> -ofctrl_get_mf_field_id(void)
> +ofctrl_get_mf_field_id_ovn_geneve_base(void)
>   {
>       if (!rconn_is_connected(swconn)) {
>           return 0;
> @@ -779,6 +798,18 @@ ofctrl_get_mf_field_id(void)
>               ? mff_ovn_geneve : 0);
>   }
>   
> +enum mf_field_id
> +ofctrl_get_mf_field_id_ovn_geneve_route_selector(void)
> +{
> +    if (!rconn_is_connected(swconn)) {
> +        return 0;
> +    }
> +    return (state == S_WAIT_BEFORE_CLEAR
> +            || state == S_CLEAR_FLOWS
> +            || state == S_UPDATE_FLOWS
> +            ? mff_ovn_geneve_route_selector : 0);
> +}
> +
>   /* Runs the OpenFlow state machine against 'br_int', which is local to the
>    * hypervisor on which we are running.  Attempts to negotiate a Geneve option
>    * field for class OVN_GENEVE_CLASS, type OVN_GENEVE_TYPE.
> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> index abd2ff1c9..5b004a298 100644
> --- a/controller/ofctrl.h
> +++ b/controller/ofctrl.h
> @@ -57,7 +57,8 @@ bool ofctrl_run(const char *conn_target, int probe_interval,
>                   const struct ovsrec_open_vswitch_table *ovs_table,
>                   struct shash *pending_ct_zones,
>                   struct tracked_acl_ids *tracked_acl_ids);
> -enum mf_field_id ofctrl_get_mf_field_id(void);
> +enum mf_field_id ofctrl_get_mf_field_id_ovn_geneve_base(void);
> +enum mf_field_id ofctrl_get_mf_field_id_ovn_geneve_route_selector(void);
>   void ofctrl_put(struct ovn_desired_flow_table *lflow_table,
>                   struct ovn_desired_flow_table *pflow_table,
>                   struct shash *pending_ct_zones,
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 2d9b3e033..dedb76ff9 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -2471,6 +2471,7 @@ en_ct_zones_is_valid(struct engine_node *node OVS_UNUSED)
>   
>   struct ed_type_mff_ovn_geneve {
>       enum mf_field_id mff_ovn_geneve;
> +    enum mf_field_id mff_ovn_geneve_route_selector;
>   };
>   
>   static void *
> @@ -2489,13 +2490,18 @@ en_mff_ovn_geneve_cleanup(void *data OVS_UNUSED)
>   static enum engine_node_state
>   en_mff_ovn_geneve_run(struct engine_node *node OVS_UNUSED, void *data)
>   {
> -    struct ed_type_mff_ovn_geneve *ed_mff_ovn_geneve = data;
> -    enum mf_field_id mff_ovn_geneve = ofctrl_get_mf_field_id();
> -    if (ed_mff_ovn_geneve->mff_ovn_geneve != mff_ovn_geneve) {
> -        ed_mff_ovn_geneve->mff_ovn_geneve = mff_ovn_geneve;
> -        return EN_UPDATED;
> -    }
> -    return EN_UNCHANGED;
> +    struct ed_type_mff_ovn_geneve *ed = data;
> +    enum mf_field_id old_geneve = ed->mff_ovn_geneve;
> +    enum mf_field_id old_selector = ed->mff_ovn_geneve_route_selector;
> +
> +    ed->mff_ovn_geneve =
> +        ofctrl_get_mf_field_id_ovn_geneve_base();
> +    ed->mff_ovn_geneve_route_selector =
> +        ofctrl_get_mf_field_id_ovn_geneve_route_selector();
> +
> +    return (ed->mff_ovn_geneve != old_geneve
> +            || ed->mff_ovn_geneve_route_selector != old_selector)
> +           ? EN_UPDATED : EN_UNCHANGED;
>   }
>   
>   /* Stores the load balancers that are applied to the datapath 'dp'. */
> @@ -4649,6 +4655,8 @@ static void init_physical_ctx(struct engine_node *node,
>       p_ctx->local_datapaths = &rt_data->local_datapaths;
>       p_ctx->ct_zones = ct_zones;
>       p_ctx->mff_ovn_geneve = ed_mff_ovn_geneve->mff_ovn_geneve;
> +    p_ctx->mff_ovn_geneve_route_selector =
> +        ed_mff_ovn_geneve->mff_ovn_geneve_route_selector;
>       p_ctx->local_bindings = &rt_data->lbinding_data.bindings;
>       p_ctx->patch_ofports = &non_vif_data->patch_ofports;
>       p_ctx->chassis_tunnels = &non_vif_data->chassis_tunnels;
> diff --git a/controller/physical.c b/controller/physical.c
> index 2683f2d97..9bbb28e1e 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -147,6 +147,7 @@ get_port_binding_tun(const struct sbrec_encap *remote_encap,
>   
>   static void
>   put_encapsulation(enum mf_field_id mff_ovn_geneve,
> +                  enum mf_field_id mff_ovn_geneve_route_selector,
>                     const struct chassis_tunnel *tun,
>                     const struct sbrec_datapath_binding *datapath,
>                     uint16_t outport, bool is_ramp_switch,
> @@ -156,6 +157,8 @@ put_encapsulation(enum mf_field_id mff_ovn_geneve,
>           put_load(datapath->tunnel_key, MFF_TUN_ID, 0, 24, ofpacts);
>           put_load(outport, mff_ovn_geneve, 0, 32, ofpacts);
>           put_move(MFF_LOG_INPORT, 0, mff_ovn_geneve, 16, 15, ofpacts);
> +        put_move(MFF_LOG_ROUTE_SELECTOR, 0, mff_ovn_geneve_route_selector,
> +                 0, 16, ofpacts);
>       } else if (tun->type == VXLAN) {
>           uint64_t vni = datapath->tunnel_key;
>           if (!is_ramp_switch) {
> @@ -171,6 +174,7 @@ put_encapsulation(enum mf_field_id mff_ovn_geneve,
>   
>   static void
>   put_decapsulation(enum mf_field_id mff_ovn_geneve,
> +                  enum mf_field_id mff_ovn_geneve_route_selector,
>                     const struct chassis_tunnel *tun,
>                     struct ofpbuf *ofpacts)
>   {
> @@ -178,8 +182,9 @@ put_decapsulation(enum mf_field_id mff_ovn_geneve,
>           put_move(MFF_TUN_ID, 0,  MFF_LOG_DATAPATH, 0, 24, ofpacts);
>           put_move(mff_ovn_geneve, 16, MFF_LOG_INPORT, 0, 15, ofpacts);
>           put_move(mff_ovn_geneve, 0, MFF_LOG_OUTPORT, 0, 16, ofpacts);
> -        put_load(ofp_to_u16(tun->ofport), MFF_LOG_TUN_OFPORT,
> -                 16, 16, ofpacts);
> +        put_move(mff_ovn_geneve_route_selector, 0,
> +                 MFF_LOG_ROUTE_SELECTOR, 0, 16, ofpacts);
> +        put_load(ofp_to_u16(tun->ofport), MFF_LOG_TUN_OFPORT, 16, 16, ofpacts);
>       } else if (tun->type == VXLAN) {
>           /* Add flows for non-VTEP tunnels. Split VNI into two 12-bit
>            * sections and use them for datapath and outport IDs. */
> @@ -255,6 +260,7 @@ put_set_tunnel_ip(const char *ip, bool is_src, struct ofpbuf *ofpacts)
>   /* Flow-based encapsulation that sets tunnel metadata and endpoint IPs. */
>   static void
>   put_flow_based_encapsulation(enum mf_field_id mff_ovn_geneve,
> +                             enum mf_field_id mff_ovn_geneve_route_selector,
>                                enum chassis_tunnel_type tunnel_type,
>                                const char *local_ip, const char *remote_ip,
>                                const struct sbrec_datapath_binding *datapath,
> @@ -264,8 +270,10 @@ put_flow_based_encapsulation(enum mf_field_id mff_ovn_geneve,
>       struct chassis_tunnel temp_tun = {
>           .type = tunnel_type,
>       };
> -    put_encapsulation(mff_ovn_geneve, &temp_tun, datapath,
> -                      outport, is_ramp_switch, ofpacts);
> +    put_encapsulation(mff_ovn_geneve,
> +                      mff_ovn_geneve_route_selector,
> +                      &temp_tun, datapath, outport,
> +                      is_ramp_switch, ofpacts);
>   
>       /* Set tunnel source and destination IPs (flow-based specific) */
>       put_set_tunnel_ip(local_ip, true, ofpacts);
> @@ -335,10 +343,12 @@ put_flow_based_remote_port_redirect_overlay(
>           }
>   
>           /* Set flow-based tunnel encapsulation. */
> -        put_flow_based_encapsulation(ctx->mff_ovn_geneve, tunnel_type,
> -                                     local_encap_ip, remote_ip,
> -                                     binding->datapath, port_key,
> -                                     is_vtep_port, ofpacts_clone);
> +        put_flow_based_encapsulation(ctx->mff_ovn_geneve,
> +                                     ctx->mff_ovn_geneve_route_selector,
> +                                     tunnel_type, local_encap_ip,
> +                                     remote_ip, binding->datapath,
> +                                     port_key, is_vtep_port,
> +                                     ofpacts_clone);
>   
>           ofpact_put_OUTPUT(ofpacts_clone)->port = flow_port;
>           put_resubmit(OFTABLE_LOCAL_OUTPUT, ofpacts_clone);
> @@ -354,6 +364,7 @@ put_flow_based_remote_port_redirect_overlay(
>   static void
>   add_tunnel_ingress_flows(const struct chassis_tunnel *tun,
>                            enum mf_field_id mff_ovn_geneve,
> +                         enum mf_field_id mff_ovn_geneve_route_selector,
>                            struct ovn_desired_flow_table *flow_table,
>                            struct ofpbuf *ofpacts)
>   {
> @@ -362,7 +373,8 @@ add_tunnel_ingress_flows(const struct chassis_tunnel *tun,
>       match_set_in_port(&match, tun->ofport);
>   
>       ofpbuf_clear(ofpacts);
> -    put_decapsulation(mff_ovn_geneve, tun, ofpacts);
> +    put_decapsulation(mff_ovn_geneve, mff_ovn_geneve_route_selector,
> +                      tun, ofpacts);
>       put_resubmit(OFTABLE_LOCAL_OUTPUT, ofpacts);
>   
>       ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 100, 0, &match,
> @@ -636,8 +648,10 @@ put_port_based_remote_port_redirect_overlay(
>   
>               const struct chassis_tunnel *tun;
>               VECTOR_FOR_EACH (&tuns, tun) {
> -                put_encapsulation(ctx->mff_ovn_geneve, tun, binding->datapath,
> -                                  port_key, is_vtep_port, ofpacts_clone);
> +                put_encapsulation(ctx->mff_ovn_geneve,
> +                                  ctx->mff_ovn_geneve_route_selector,
> +                                  tun, binding->datapath, port_key,
> +                                  is_vtep_port, ofpacts_clone);
>                   ofpact_put_OUTPUT(ofpacts_clone)->port = tun->ofport;
>               }
>               put_resubmit(OFTABLE_REMOTE_VTEP_OUTPUT, ofpacts_clone);
> @@ -909,7 +923,9 @@ put_remote_port_redirect_overlay_ha_remote(
>       const struct sbrec_port_binding *binding,
>       const enum en_lport_type type,
>       struct ha_chassis_ordered *ha_ch_ordered,
> -    enum mf_field_id mff_ovn_geneve, uint32_t port_key,
> +    enum mf_field_id mff_ovn_geneve,
> +    enum mf_field_id mff_ovn_geneve_route_selector,
> +    uint32_t port_key,
>       struct match *match, struct ofpbuf *ofpacts_p,
>       const struct hmap *chassis_tunnels,
>       struct ovn_desired_flow_table *flow_table)
> @@ -947,9 +963,11 @@ put_remote_port_redirect_overlay_ha_remote(
>           return;
>       }
>   
> -    put_encapsulation(mff_ovn_geneve, tun, binding->datapath, port_key,
> +    put_encapsulation(mff_ovn_geneve, mff_ovn_geneve_route_selector,
> +                      tun, binding->datapath, port_key,
>                         type == LP_VTEP, ofpacts_p);
>   
> +
>       /* Output to tunnels with active/backup */
>       struct ofpact_bundle *bundle = ofpact_put_BUNDLE(ofpacts_p);
>   
> @@ -2141,7 +2159,9 @@ enforce_tunneling_for_multichassis_ports(
>   
>           const struct chassis_tunnel *tun;
>           VECTOR_FOR_EACH (&tuns, tun) {
> -            put_encapsulation(ctx->mff_ovn_geneve, tun, binding->datapath,
> +            put_encapsulation(ctx->mff_ovn_geneve,
> +                              ctx->mff_ovn_geneve_route_selector,
> +                              tun, binding->datapath,
>                                 port_key, is_vtep_port, &ofpacts);
>               ofpact_put_OUTPUT(&ofpacts)->port = tun->ofport;
>           }
> @@ -2216,6 +2236,10 @@ consider_port_binding(const struct physical_ctx *ctx,
>           put_load(0, MFF_LOG_FLAGS, 0, 32, ofpacts_p);
>           put_load(0, MFF_LOG_OUTPORT, 0, 32, ofpacts_p);
>           for (int i = 0; i < MFF_N_LOG_REGS; i++) {
> +            /* TODO: Figure out the register issue and remove this mess. */
> +            if (i == 3) {
> +                continue;
> +            }
>               put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
>           }
>           put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
> @@ -2670,7 +2694,8 @@ consider_port_binding(const struct physical_ctx *ctx,
>               &match, ofpacts_p, flow_table);
>       } else if (access_type == PORT_HA_REMOTE) {
>           put_remote_port_redirect_overlay_ha_remote(
> -            binding, type, ha_ch_ordered, ctx->mff_ovn_geneve, port_key,
> +            binding, type, ha_ch_ordered, ctx->mff_ovn_geneve,
> +            ctx->mff_ovn_geneve_route_selector, port_key,
>               &match, ofpacts_p, ctx->chassis_tunnels, flow_table);
>       } else {
>           put_remote_port_redirect_overlay(
> @@ -2697,6 +2722,7 @@ get_vxlan_port_key(int64_t port_key)
>   /* Encapsulate and send to a single remote chassis. */
>   static void
>   tunnel_to_chassis(enum mf_field_id mff_ovn_geneve,
> +                  enum mf_field_id mff_ovn_geneve_route_selector,
>                     const char *chassis_name,
>                     const struct hmap *chassis_tunnels,
>                     const struct sbrec_datapath_binding *datapath,
> @@ -2708,8 +2734,8 @@ tunnel_to_chassis(enum mf_field_id mff_ovn_geneve,
>           return;
>       }
>   
> -    put_encapsulation(mff_ovn_geneve, tun, datapath, outport, false,
> -                      remote_ofpacts);
> +    put_encapsulation(mff_ovn_geneve, mff_ovn_geneve_route_selector,
> +                      tun, datapath, outport, false, remote_ofpacts);
>       ofpact_put_OUTPUT(remote_ofpacts)->port = tun->ofport;
>   }
>   
> @@ -2788,8 +2814,10 @@ fanout_to_chassis_flow_based(const struct physical_ctx *ctx,
>                   .ofport = flow_port,
>                   .type = tunnel_type
>               };
> -            put_encapsulation(ctx->mff_ovn_geneve, &temp_tun, datapath,
> -                              outport, is_ramp_switch, remote_ofpacts);
> +            put_encapsulation(ctx->mff_ovn_geneve,
> +                              ctx->mff_ovn_geneve_route_selector,
> +                              &temp_tun, datapath, outport,
> +                              is_ramp_switch, remote_ofpacts);
>               prev_type = tunnel_type;
>           }
>   
> @@ -2803,6 +2831,7 @@ fanout_to_chassis_flow_based(const struct physical_ctx *ctx,
>   /* Encapsulate and send to a set of remote chassis (port-based tunnels). */
>   static void
>   fanout_to_chassis_port_based(enum mf_field_id mff_ovn_geneve,
> +                             enum mf_field_id mff_ovn_geneve_route_selector,
>                                struct sset *remote_chassis,
>                                const struct hmap *chassis_tunnels,
>                                const struct sbrec_datapath_binding *datapath,
> @@ -2819,8 +2848,10 @@ fanout_to_chassis_port_based(enum mf_field_id mff_ovn_geneve,
>           }
>   
>           if (!prev || tun->type != prev->type) {
> -            put_encapsulation(mff_ovn_geneve, tun, datapath,
> -                              outport, is_ramp_switch, remote_ofpacts);
> +            put_encapsulation(mff_ovn_geneve,
> +                              mff_ovn_geneve_route_selector,
> +                              tun, datapath, outport, is_ramp_switch,
> +                              remote_ofpacts);
>               prev = tun;
>           }
>           ofpact_put_OUTPUT(remote_ofpacts)->port = tun->ofport;
> @@ -2974,7 +3005,9 @@ consider_mc_group(const struct physical_ctx *ctx,
>               if (port->chassis) {
>                   put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32,
>                            &remote_ctx->ofpacts);
> -                tunnel_to_chassis(ctx->mff_ovn_geneve, port->chassis->name,
> +                tunnel_to_chassis(ctx->mff_ovn_geneve,
> +                                  ctx->mff_ovn_geneve_route_selector,
> +                                  port->chassis->name,
>                                     ctx->chassis_tunnels, mc->datapath,
>                                     port->tunnel_key, &remote_ctx->ofpacts);
>               }
> @@ -3055,11 +3088,15 @@ consider_mc_group(const struct physical_ctx *ctx,
>           VLOG_DBG("Using port-based tunnels for multicast group %s "
>                    "(tunnel_key=%"PRId64") with %"PRIuSIZE" remote chassis",
>                    mc->name, mc->tunnel_key, sset_count(&remote_chassis));
> -        fanout_to_chassis_port_based(ctx->mff_ovn_geneve, &remote_chassis,
> +        fanout_to_chassis_port_based(ctx->mff_ovn_geneve,
> +                                     ctx->mff_ovn_geneve_route_selector,
> +                                     &remote_chassis,
>                                        ctx->chassis_tunnels, mc->datapath,
>                                        mc->tunnel_key, false,
>                                        &remote_ctx->ofpacts);
> -        fanout_to_chassis_port_based(ctx->mff_ovn_geneve, &vtep_chassis,
> +        fanout_to_chassis_port_based(ctx->mff_ovn_geneve,
> +                                     ctx->mff_ovn_geneve_route_selector,
> +                                     &vtep_chassis,
>                                        ctx->chassis_tunnels, mc->datapath,
>                                        mc->tunnel_key, true,
>                                        &remote_ctx->ofpacts);
> @@ -3130,7 +3167,9 @@ physical_eval_remote_chassis_flows(const struct physical_ctx *ctx,
>           ofpbuf_clear(&ingress_ofpacts);
>           put_load(1, MFF_LOG_FLAGS, MLF_RX_FROM_TUNNEL_BIT, 1,
>                    &ingress_ofpacts);
> -        put_decapsulation(ctx->mff_ovn_geneve, tun, &ingress_ofpacts);
> +        put_decapsulation(ctx->mff_ovn_geneve,
> +                          ctx->mff_ovn_geneve_route_selector,
> +                          tun, &ingress_ofpacts);
>           put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, &ingress_ofpacts);
>           if (tun->type == VXLAN) {
>               /* VXLAN doesn't carry the inport information, we cannot set
> @@ -3709,8 +3748,9 @@ physical_run(struct physical_ctx *p_ctx,
>        * packets to the local hypervisor. */
>       struct chassis_tunnel *tun;
>       HMAP_FOR_EACH (tun, hmap_node, p_ctx->chassis_tunnels) {
> -        add_tunnel_ingress_flows(tun, p_ctx->mff_ovn_geneve, flow_table,
> -                                &ofpacts);
> +        add_tunnel_ingress_flows(tun, p_ctx->mff_ovn_geneve,
> +                                 p_ctx->mff_ovn_geneve_route_selector,
> +                                 flow_table, &ofpacts);
>       }
>   
>       /* Process packets that arrive from flow-based tunnels. */
> @@ -3733,8 +3773,10 @@ physical_run(struct physical_ctx *p_ctx,
>                        "type=%s", p_ctx->flow_tunnels[i].ofport,
>                        i == GENEVE ? "geneve" : "vxlan");
>   
> -            add_tunnel_ingress_flows(&temp_tunnel, p_ctx->mff_ovn_geneve,
> -                                    flow_table, &ofpacts);
> +            add_tunnel_ingress_flows(&temp_tunnel,
> +                                     p_ctx->mff_ovn_geneve,
> +                                     p_ctx->mff_ovn_geneve_route_selector,
> +                                     flow_table, &ofpacts);
>           }
>       }
>   
> @@ -3904,6 +3946,10 @@ physical_run(struct physical_ctx *p_ctx,
>       match_init_catchall(&match);
>       ofpbuf_clear(&ofpacts);
>       for (int i = 0; i < MFF_N_LOG_REGS; i++) {
> +        /* TODO: Figure out the register issue and remove this mess. */
> +        if (i == 3) {
> +            continue;
> +        }
>           put_load(0, MFF_REG0 + i, 0, 32, &ofpacts);
>       }
>       put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, &ofpacts);
> diff --git a/controller/physical.h b/controller/physical.h
> index c7a33bd02..7997ba913 100644
> --- a/controller/physical.h
> +++ b/controller/physical.h
> @@ -43,6 +43,12 @@ struct local_nonvif_data;
>   #define OVN_GENEVE_TYPE 0x80     /* Critical option. */
>   #define OVN_GENEVE_LEN 4
>   
> +/* Assigned Geneve class for OVN route selector. */
> +#define OVN_GENEVE_ROUTE_SELECTOR_CLASS 0x0101
> +#define OVN_GENEVE_ROUTE_SELECTOR_TYPE 0x81
> +/* Only the first 16 bits are used. */
> +#define OVN_GENEVE_ROUTE_SELECTOR_LEN 4
> +
>   struct physical_debug {
>       uint32_t collector_set_id;
>       uint32_t obs_domain_id;
> @@ -63,6 +69,7 @@ struct physical_ctx {
>       struct hmap *local_datapaths;
>       const struct shash *ct_zones;
>       enum mf_field_id mff_ovn_geneve;
> +    enum mf_field_id mff_ovn_geneve_route_selector;
>       struct shash *local_bindings;
>       struct simap *patch_ofports;
>       struct hmap *chassis_tunnels;
> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
> index f0d34196a..4ed827e08 100644
> --- a/include/ovn/logical-fields.h
> +++ b/include/ovn/logical-fields.h
> @@ -30,6 +30,7 @@ enum ovn_controller_event {
>    *
>    * These values are documented in ovn-architecture(7), please update the
>    * documentation if you change any of them. */
> + /* Logical datapath (64 bits). */
>   #define MFF_LOG_DATAPATH MFF_METADATA /* Logical datapath (64 bits). */
>   #define MFF_LOG_FLAGS      MFF_REG10  /* One of MLF_* (32 bits). */
>   #define MFF_LOG_DNAT_ZONE  MFF_REG11  /* conntrack dnat zone for gateway router
> @@ -43,6 +44,7 @@ enum ovn_controller_event {
>   #define MFF_LOG_INPORT     MFF_REG14  /* Logical input port (32 bits). */
>   #define MFF_LOG_OUTPORT    MFF_REG15  /* Logical output port (32 bits). */
>   #define MFF_LOG_TUN_OFPORT MFF_REG5   /* 16..31 of the 32 bits */
> +#define MFF_LOG_ROUTE_SELECTOR MFF_REG7
>   
>   /* Logical registers.
>    *
> diff --git a/lib/logical-fields.c b/lib/logical-fields.c
> index c8bddcdc5..978b3468f 100644
> --- a/lib/logical-fields.c
> +++ b/lib/logical-fields.c
> @@ -71,6 +71,8 @@ ovn_init_symtab(struct shash *symtab)
>        * doesn't yet support string fields that occupy less than a full OXM. */
>       expr_symtab_add_string(symtab, "inport", MFF_LOG_INPORT, NULL);
>       expr_symtab_add_string(symtab, "outport", MFF_LOG_OUTPORT, NULL);
> +    expr_symtab_add_field(symtab, "route_selector",
> +                          MFF_LOG_ROUTE_SELECTOR, NULL, false);
>   
>       /* The port isn't reserved along the pipeline it's just defined as symbol
>        * to support matching on string and moving between string registers. */
> diff --git a/northd/northd.c b/northd/northd.c
> index c3c0780a3..17fa93346 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -217,7 +217,6 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2));
>   #define REG_SRC_IPV6 "xxreg1"
>   #define REG_DHCP_RELAY_DIP_IPV4 "reg2"
>   #define REG_POLICY_CHAIN_ID "reg9[16..31]"
> -#define REG_ROUTE_TABLE_ID "reg7"
>   
>   /* Registers used for pasing observability information for switches:
>    * domain and point ID. */
> @@ -11607,9 +11606,10 @@ build_route_table_lflow(struct ovn_datapath *od, struct lflow_table *lflows,
>           return;
>       }
>   
> -    ds_put_format(&match, "inport == \"%s\"", lrp->name);
> -    ds_put_format(&actions, "%s = %d; next;",
> -                  REG_ROUTE_TABLE_ID, rtb_id);
> +    /* Don't overwrite route_selector if we received it. */
> +    ds_put_format(&match, "route_selector == 0 && inport == \"%s\"",
> +                  lrp->name);
> +    ds_put_format(&actions, "route_selector = %d; next;", rtb_id);
>   
>       ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING_PRE, 100,
>                     ds_cstr(&match), ds_cstr(&actions), lflow_ref);
> @@ -12081,14 +12081,16 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id,
>       if (op_inport) {
>           ds_put_format(match, "inport == %s && ", op_inport->json_key);
>       }
> +
>       if (rtb_id || source == ROUTE_SOURCE_STATIC ||
>               source == ROUTE_SOURCE_LEARNED) {
> -        ds_put_format(match, "%s == %d && ", REG_ROUTE_TABLE_ID, rtb_id);
> +        ds_put_format(match, "route_selector == %d && ", rtb_id);
>       }
>   
>       if (has_protocol_match) {
>           ofs += 1;
>       }
> +
>       *priority = (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs;
>   
>       ds_put_format(match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6", dir,
> @@ -14426,7 +14428,7 @@ build_ip_routing_pre_flows_for_lrouter(struct ovn_datapath *od,
>   {
>       ovs_assert(od->nbr);
>       ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING_PRE, 0, "1",
> -                  REG_ROUTE_TABLE_ID" = 0; next;", lflow_ref);
> +                  "next;", lflow_ref);
>   }
>   
>   static void
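
The routing-table selection in the quoted northd hunks can be summarized with a small model. This is only a sketch under assumptions: `struct pkt_md` and `routing_pre()` are hypothetical names that do not exist in OVN, and a port without a route table is modelled as `inport_rtb_id == 0`. It mirrors the match `route_selector == 0 && inport == ...`: a selector that arrived through the tunnel is kept, otherwise the ingress port's table is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet logical metadata. */
struct pkt_md {
    uint32_t route_selector;    /* Models MFF_LOG_ROUTE_SELECTOR. */
};

/* Models S_ROUTER_IN_IP_ROUTING_PRE after the patch: the priority-100 flow
 * assigns the ingress port's route table only when no selector was received,
 * and the priority-0 flow ("next;") leaves the register untouched.
 * Returns the selector that the routing stage will match on. */
static uint32_t
routing_pre(struct pkt_md *md, uint32_t inport_rtb_id)
{
    assert(md);
    if (md->route_selector == 0 && inport_rtb_id != 0) {
        md->route_selector = inport_rtb_id;
    }
    return md->route_selector;
}
```

Under this model, a packet that carries selector 7 from the first availability zone keeps 7 even if it enters through a port bound to table 5.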
Ilya Maximets Dec. 19, 2025, 12:18 p.m. UTC | #2
On 12/18/25 6:05 PM, Rukomoinikova Aleksandra wrote:
> Hi everyone!
> 
> I added the maintainers in CC because I really need your opinion :)
> 
> It seems we are already exceeding the number of available registers in 
> Open vSwitch, at least at the controller level.
> 
> Currently, all registers accessible to the controller are already in
> use. Earlier in this thread, I sent an RFC patch, which at the moment is
> nothing more than a messy proof of concept, precisely because I ran into
> the lack of registers at the controller level. My goal is to pass
> register values between pipelines, which is possible only with
> ovn-controller registers. For now, I've implemented this through a
> very dirty workaround using registers from northd, which is just
> terrible, but I couldn't find another option.
> 
> What do you think: should we expand the number of registers in Open 
> vSwitch? It seems it's time! If it seems it's not yet the right moment, 
> should I add a separate mff_ field in OVS? I initially dismissed this 
> idea because transferring a routing table doesn't really fit into the 
> OpenFlow concept.
> 
> Thanks in advance! If we agree that it's time to expand the registers, I 
> would like to take this on myself.
Hi!  Increasing the number of registers has been brought up multiple times,
I believe, in recent years.  The latest conversation happened in the
following thread just recently:
  https://mail.openvswitch.org/pipermail/ovs-dev/2025-October/426703.html

And, AFAIK, Dumitru already started working on that some time ago with a
plan to likely post it in time for the OVS 3.7 soft freeze.  So, we may
hopefully see a patch for that soon.

FWIW, there was also an off-list conversation about detection of support
for new registers and how to properly handle turning new features on and
off based on that.  Primarily in context of maintainability.  E.g. we
likely need to just not allow use of new features that require extra
registers instead of trying to work around the limitation and implement
the same feature differently based on availability.  But that's a separate
topic that can be explored when implementing first features that need new
registers in OVN.

Best regards, Ilya Maximets.
Dumitru Ceara Dec. 19, 2025, 1:24 p.m. UTC | #3
Hi Alexandra, Ilya,

On 12/19/25 1:18 PM, Ilya Maximets wrote:
> On 12/18/25 6:05 PM, Rukomoinikova Aleksandra wrote:
>> Hi everyone!
>>
>> I added the maintainers in CC because I really need your opinion :)
>>
>> It seems we are already exceeding the number of available registers in 
>> Open vSwitch, at least at the controller level.
>>
>> Currently, all registers accessible to the controller are already in
>> use. Earlier in this thread, I sent an RFC patch, which at the moment is
>> nothing more than a messy proof of concept, precisely because I ran into
>> the lack of registers at the controller level. My goal is to pass
>> register values between pipelines, which is possible only with
>> ovn-controller registers. For now, I've implemented this through a
>> very dirty workaround using registers from northd, which is just
>> terrible, but I couldn't find another option.
>>
>> What do you think: should we expand the number of registers in Open 
>> vSwitch? It seems it's time! If it seems it's not yet the right moment, 
>> should I add a separate mff_ field in OVS? I initially dismissed this 
>> idea because transferring a routing table doesn't really fit into the 
>> OpenFlow concept.
>>
>> Thanks in advance! If we agree that it's time to expand the registers, I 
>> would like to take this on myself.
> Hi!  Increasing the number of registers has been brought up multiple times,
> I believe, in recent years.  The latest conversation happened in the
> following thread just recently:
>   https://mail.openvswitch.org/pipermail/ovs-dev/2025-October/426703.html
> 
> And, AFAIK, Dumitru already started working on that some time ago with a
> plan to likely post it in time for the OVS 3.7 soft freeze.  So, we may
> hopefully see a patch for that soon.
> 

I managed to post the patch now, hopefully in time for someone to review
it before soft freeze:

https://patchwork.ozlabs.org/project/openvswitch/patch/20251219131927.1669591-1-dceara@redhat.com/

I also ran OVN tests (with a modified version of OVN that still only
uses 16 registers but compiles with OVS that supports 32 registers) and
CI looked green there too.

> FWIW, there was also an off-list conversation about detection of support
> for new registers and how to properly handle turning new features on and
> off based on that.  Primarily in context of maintainability.  E.g. we
> likely need to just not allow use of new features that require extra
> registers instead of trying to work around the limitation and implement
> the same feature differently based on availability.  But that's a separate
> topic that can be explored when implementing first features that need new
> registers in OVN.
> 

I agree, let's figure out the right way to consume this in OVN in the
future.

Regards,
Dumitru

> Best regards, Ilya Maximets.
>
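
The gating approach discussed in the two replies above (refuse a feature that needs extra registers rather than implement it a second way) might look roughly like the sketch below. Only the counts 16 and 32 come from the thread. `struct feature`, `feature_enabled()`, and the macro names are hypothetical illustrations, not existing OVN or OVS identifiers, and how the supported register count would actually be detected is left open, as it is in the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Register counts mentioned in the thread; the macro names are made up. */
#define N_REGS_LEGACY   16
#define N_REGS_EXTENDED 32

/* Hypothetical description of a feature and the highest register index
 * (0-based) that its flows would touch. */
struct feature {
    const char *name;
    int highest_reg;
};

/* A feature is either fully available or rejected outright. There is
 * deliberately no fallback path that emulates it with fewer registers. */
static bool
feature_enabled(const struct feature *f, int n_regs_supported)
{
    assert(f && n_regs_supported > 0);
    return f->highest_reg < n_regs_supported;
}
```

With this shape, a feature that needs register index 16 is simply unavailable against a 16-register switch and becomes available once 32 registers are supported.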
Patch

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 9f3ce0368..e35c034ac 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -399,6 +399,7 @@  static void ofctrl_meter_bands_clear(void);
  * S_CLEAR_FLOWS or S_UPDATE_FLOWS, this is really the option we have. */
 static enum mf_field_id mff_ovn_geneve;
 
+static enum mf_field_id mff_ovn_geneve_route_selector;
 /* Indicates if we just went through the S_CLEAR_FLOWS state, which means we
  * need to perform a one time deletion for all the existing flows, groups and
  * meters. This can happen during initialization or OpenFlow reconnection
@@ -505,6 +506,13 @@  process_tlv_table_reply(const struct ofputil_tlv_table_reply *reply)
             } else {
                 mff_ovn_geneve = MFF_TUN_METADATA0 + map->index;
                 state = S_WAIT_BEFORE_CLEAR;
+                /* Allocate the next index for the route selector TLV. */
+                /* TODO: Handle the case where this option is set
+                 * from the command line. */
+                if (ovs_list_size(&reply->mappings) > map->index + 1) {
+                    mff_ovn_geneve_route_selector =
+                        MFF_TUN_METADATA0 + map->index + 1;
+                }
                 return true;
             }
         }
@@ -520,18 +528,28 @@  process_tlv_table_reply(const struct ofputil_tlv_table_reply *reply)
         return false;
     }
 
+    /* TODO: This works but looks ugly; remove the code duplication. */
     unsigned int index = rightmost_1bit_idx(md_free);
     mff_ovn_geneve = MFF_TUN_METADATA0 + index;
-    struct ofputil_tlv_map tm;
-    tm.option_class = OVN_GENEVE_CLASS;
-    tm.option_type = OVN_GENEVE_TYPE;
-    tm.option_len = OVN_GENEVE_LEN;
-    tm.index = index;
+    struct ofputil_tlv_map tm_ovn_geneve;
+    tm_ovn_geneve.option_class = OVN_GENEVE_CLASS;
+    tm_ovn_geneve.option_type = OVN_GENEVE_TYPE;
+    tm_ovn_geneve.option_len = OVN_GENEVE_LEN;
+    tm_ovn_geneve.index = index;
+    index++;
+
+    mff_ovn_geneve_route_selector = MFF_TUN_METADATA0 + index;
+    struct ofputil_tlv_map tm_route_selector;
+    tm_route_selector.option_class = OVN_GENEVE_ROUTE_SELECTOR_CLASS;
+    tm_route_selector.option_type = OVN_GENEVE_ROUTE_SELECTOR_TYPE;
+    tm_route_selector.option_len = OVN_GENEVE_ROUTE_SELECTOR_LEN;
+    tm_route_selector.index = index;
 
     struct ofputil_tlv_table_mod ttm;
     ttm.command = NXTTMC_ADD;
     ovs_list_init(&ttm.mappings);
-    ovs_list_push_back(&ttm.mappings, &tm.list_node);
+    ovs_list_push_back(&ttm.mappings, &tm_ovn_geneve.list_node);
+    ovs_list_push_back(&ttm.mappings, &tm_route_selector.list_node);
 
     xid = queue_msg(ofputil_encode_tlv_table_mod(OFP15_VERSION, &ttm));
     xid2 = queue_msg(ofputil_encode_barrier_request(OFP15_VERSION));
@@ -572,6 +590,7 @@  recv_S_TLV_TABLE_REQUESTED(const struct ofp_header *oh, enum ofptype type,
 
     /* Error path. */
     mff_ovn_geneve = 0;
+    mff_ovn_geneve_route_selector = 0;
     state = S_WAIT_BEFORE_CLEAR;
 }
 
@@ -768,7 +787,7 @@  recv_S_UPDATE_FLOWS(const struct ofp_header *oh, enum ofptype type,
 
 
 enum mf_field_id
-ofctrl_get_mf_field_id(void)
+ofctrl_get_mf_field_id_ovn_geneve_base(void)
 {
     if (!rconn_is_connected(swconn)) {
         return 0;
@@ -779,6 +798,18 @@  ofctrl_get_mf_field_id(void)
             ? mff_ovn_geneve : 0);
 }
 
+enum mf_field_id
+ofctrl_get_mf_field_id_ovn_geneve_route_selector(void)
+{
+    if (!rconn_is_connected(swconn)) {
+        return 0;
+    }
+    return (state == S_WAIT_BEFORE_CLEAR
+            || state == S_CLEAR_FLOWS
+            || state == S_UPDATE_FLOWS
+            ? mff_ovn_geneve_route_selector : 0);
+}
+
 /* Runs the OpenFlow state machine against 'br_int', which is local to the
  * hypervisor on which we are running.  Attempts to negotiate a Geneve option
  * field for class OVN_GENEVE_CLASS, type OVN_GENEVE_TYPE.
diff --git a/controller/ofctrl.h b/controller/ofctrl.h
index abd2ff1c9..5b004a298 100644
--- a/controller/ofctrl.h
+++ b/controller/ofctrl.h
@@ -57,7 +57,8 @@  bool ofctrl_run(const char *conn_target, int probe_interval,
                 const struct ovsrec_open_vswitch_table *ovs_table,
                 struct shash *pending_ct_zones,
                 struct tracked_acl_ids *tracked_acl_ids);
-enum mf_field_id ofctrl_get_mf_field_id(void);
+enum mf_field_id ofctrl_get_mf_field_id_ovn_geneve_base(void);
+enum mf_field_id ofctrl_get_mf_field_id_ovn_geneve_route_selector(void);
 void ofctrl_put(struct ovn_desired_flow_table *lflow_table,
                 struct ovn_desired_flow_table *pflow_table,
                 struct shash *pending_ct_zones,
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 2d9b3e033..dedb76ff9 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -2471,6 +2471,7 @@  en_ct_zones_is_valid(struct engine_node *node OVS_UNUSED)
 
 struct ed_type_mff_ovn_geneve {
     enum mf_field_id mff_ovn_geneve;
+    enum mf_field_id mff_ovn_geneve_route_selector;
 };
 
 static void *
@@ -2489,13 +2490,18 @@  en_mff_ovn_geneve_cleanup(void *data OVS_UNUSED)
 static enum engine_node_state
 en_mff_ovn_geneve_run(struct engine_node *node OVS_UNUSED, void *data)
 {
-    struct ed_type_mff_ovn_geneve *ed_mff_ovn_geneve = data;
-    enum mf_field_id mff_ovn_geneve = ofctrl_get_mf_field_id();
-    if (ed_mff_ovn_geneve->mff_ovn_geneve != mff_ovn_geneve) {
-        ed_mff_ovn_geneve->mff_ovn_geneve = mff_ovn_geneve;
-        return EN_UPDATED;
-    }
-    return EN_UNCHANGED;
+    struct ed_type_mff_ovn_geneve *ed = data;
+    enum mf_field_id old_geneve = ed->mff_ovn_geneve;
+    enum mf_field_id old_selector = ed->mff_ovn_geneve_route_selector;
+
+    ed->mff_ovn_geneve =
+        ofctrl_get_mf_field_id_ovn_geneve_base();
+    ed->mff_ovn_geneve_route_selector =
+        ofctrl_get_mf_field_id_ovn_geneve_route_selector();
+
+    bool changed = ed->mff_ovn_geneve != old_geneve
+        || ed->mff_ovn_geneve_route_selector != old_selector;
+    return changed ? EN_UPDATED : EN_UNCHANGED;
 }
 
 /* Stores the load balancers that are applied to the datapath 'dp'. */
@@ -4649,6 +4655,8 @@  static void init_physical_ctx(struct engine_node *node,
     p_ctx->local_datapaths = &rt_data->local_datapaths;
     p_ctx->ct_zones = ct_zones;
     p_ctx->mff_ovn_geneve = ed_mff_ovn_geneve->mff_ovn_geneve;
+    p_ctx->mff_ovn_geneve_route_selector =
+        ed_mff_ovn_geneve->mff_ovn_geneve_route_selector;
     p_ctx->local_bindings = &rt_data->lbinding_data.bindings;
     p_ctx->patch_ofports = &non_vif_data->patch_ofports;
     p_ctx->chassis_tunnels = &non_vif_data->chassis_tunnels;
diff --git a/controller/physical.c b/controller/physical.c
index 2683f2d97..9bbb28e1e 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -147,6 +147,7 @@  get_port_binding_tun(const struct sbrec_encap *remote_encap,
 
 static void
 put_encapsulation(enum mf_field_id mff_ovn_geneve,
+                  enum mf_field_id mff_ovn_geneve_route_selector,
                   const struct chassis_tunnel *tun,
                   const struct sbrec_datapath_binding *datapath,
                   uint16_t outport, bool is_ramp_switch,
@@ -156,6 +157,8 @@  put_encapsulation(enum mf_field_id mff_ovn_geneve,
         put_load(datapath->tunnel_key, MFF_TUN_ID, 0, 24, ofpacts);
         put_load(outport, mff_ovn_geneve, 0, 32, ofpacts);
         put_move(MFF_LOG_INPORT, 0, mff_ovn_geneve, 16, 15, ofpacts);
+        put_move(MFF_LOG_ROUTE_SELECTOR, 0, mff_ovn_geneve_route_selector,
+                 0, 16, ofpacts);
     } else if (tun->type == VXLAN) {
         uint64_t vni = datapath->tunnel_key;
         if (!is_ramp_switch) {
@@ -171,6 +174,7 @@  put_encapsulation(enum mf_field_id mff_ovn_geneve,
 
 static void
 put_decapsulation(enum mf_field_id mff_ovn_geneve,
+                  enum mf_field_id mff_ovn_geneve_route_selector,
                   const struct chassis_tunnel *tun,
                   struct ofpbuf *ofpacts)
 {
@@ -178,8 +182,9 @@  put_decapsulation(enum mf_field_id mff_ovn_geneve,
         put_move(MFF_TUN_ID, 0,  MFF_LOG_DATAPATH, 0, 24, ofpacts);
         put_move(mff_ovn_geneve, 16, MFF_LOG_INPORT, 0, 15, ofpacts);
         put_move(mff_ovn_geneve, 0, MFF_LOG_OUTPORT, 0, 16, ofpacts);
-        put_load(ofp_to_u16(tun->ofport), MFF_LOG_TUN_OFPORT,
-                 16, 16, ofpacts);
+        put_move(mff_ovn_geneve_route_selector, 0,
+                 MFF_LOG_ROUTE_SELECTOR, 0, 16, ofpacts);
+        put_load(ofp_to_u16(tun->ofport), MFF_LOG_TUN_OFPORT, 16, 16, ofpacts);
     } else if (tun->type == VXLAN) {
         /* Add flows for non-VTEP tunnels. Split VNI into two 12-bit
          * sections and use them for datapath and outport IDs. */
@@ -255,6 +260,7 @@  put_set_tunnel_ip(const char *ip, bool is_src, struct ofpbuf *ofpacts)
 /* Flow-based encapsulation that sets tunnel metadata and endpoint IPs. */
 static void
 put_flow_based_encapsulation(enum mf_field_id mff_ovn_geneve,
+                             enum mf_field_id mff_ovn_geneve_route_selector,
                              enum chassis_tunnel_type tunnel_type,
                              const char *local_ip, const char *remote_ip,
                              const struct sbrec_datapath_binding *datapath,
@@ -264,8 +270,10 @@  put_flow_based_encapsulation(enum mf_field_id mff_ovn_geneve,
     struct chassis_tunnel temp_tun = {
         .type = tunnel_type,
     };
-    put_encapsulation(mff_ovn_geneve, &temp_tun, datapath,
-                      outport, is_ramp_switch, ofpacts);
+    put_encapsulation(mff_ovn_geneve,
+                      mff_ovn_geneve_route_selector,
+                      &temp_tun, datapath, outport,
+                      is_ramp_switch, ofpacts);
 
     /* Set tunnel source and destination IPs (flow-based specific) */
     put_set_tunnel_ip(local_ip, true, ofpacts);
@@ -335,10 +343,12 @@  put_flow_based_remote_port_redirect_overlay(
         }
 
         /* Set flow-based tunnel encapsulation. */
-        put_flow_based_encapsulation(ctx->mff_ovn_geneve, tunnel_type,
-                                     local_encap_ip, remote_ip,
-                                     binding->datapath, port_key,
-                                     is_vtep_port, ofpacts_clone);
+        put_flow_based_encapsulation(ctx->mff_ovn_geneve,
+                                     ctx->mff_ovn_geneve_route_selector,
+                                     tunnel_type, local_encap_ip,
+                                     remote_ip, binding->datapath,
+                                     port_key, is_vtep_port,
+                                     ofpacts_clone);
 
         ofpact_put_OUTPUT(ofpacts_clone)->port = flow_port;
         put_resubmit(OFTABLE_LOCAL_OUTPUT, ofpacts_clone);
@@ -354,6 +364,7 @@  put_flow_based_remote_port_redirect_overlay(
 static void
 add_tunnel_ingress_flows(const struct chassis_tunnel *tun,
                          enum mf_field_id mff_ovn_geneve,
+                         enum mf_field_id mff_ovn_geneve_route_selector,
                          struct ovn_desired_flow_table *flow_table,
                          struct ofpbuf *ofpacts)
 {
@@ -362,7 +373,8 @@  add_tunnel_ingress_flows(const struct chassis_tunnel *tun,
     match_set_in_port(&match, tun->ofport);
 
     ofpbuf_clear(ofpacts);
-    put_decapsulation(mff_ovn_geneve, tun, ofpacts);
+    put_decapsulation(mff_ovn_geneve, mff_ovn_geneve_route_selector,
+                      tun, ofpacts);
     put_resubmit(OFTABLE_LOCAL_OUTPUT, ofpacts);
 
     ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 100, 0, &match,
@@ -636,8 +648,10 @@  put_port_based_remote_port_redirect_overlay(
 
             const struct chassis_tunnel *tun;
             VECTOR_FOR_EACH (&tuns, tun) {
-                put_encapsulation(ctx->mff_ovn_geneve, tun, binding->datapath,
-                                  port_key, is_vtep_port, ofpacts_clone);
+                put_encapsulation(ctx->mff_ovn_geneve,
+                                  ctx->mff_ovn_geneve_route_selector,
+                                  tun, binding->datapath, port_key,
+                                  is_vtep_port, ofpacts_clone);
                 ofpact_put_OUTPUT(ofpacts_clone)->port = tun->ofport;
             }
             put_resubmit(OFTABLE_REMOTE_VTEP_OUTPUT, ofpacts_clone);
@@ -909,7 +923,9 @@  put_remote_port_redirect_overlay_ha_remote(
     const struct sbrec_port_binding *binding,
     const enum en_lport_type type,
     struct ha_chassis_ordered *ha_ch_ordered,
-    enum mf_field_id mff_ovn_geneve, uint32_t port_key,
+    enum mf_field_id mff_ovn_geneve,
+    enum mf_field_id mff_ovn_geneve_route_selector,
+    uint32_t port_key,
     struct match *match, struct ofpbuf *ofpacts_p,
     const struct hmap *chassis_tunnels,
     struct ovn_desired_flow_table *flow_table)
@@ -947,9 +963,11 @@  put_remote_port_redirect_overlay_ha_remote(
         return;
     }
 
-    put_encapsulation(mff_ovn_geneve, tun, binding->datapath, port_key,
+    put_encapsulation(mff_ovn_geneve, mff_ovn_geneve_route_selector,
+                      tun, binding->datapath, port_key,
                       type == LP_VTEP, ofpacts_p);
 
+
     /* Output to tunnels with active/backup */
     struct ofpact_bundle *bundle = ofpact_put_BUNDLE(ofpacts_p);
 
@@ -2141,7 +2159,9 @@  enforce_tunneling_for_multichassis_ports(
 
         const struct chassis_tunnel *tun;
         VECTOR_FOR_EACH (&tuns, tun) {
-            put_encapsulation(ctx->mff_ovn_geneve, tun, binding->datapath,
+            put_encapsulation(ctx->mff_ovn_geneve,
+                              ctx->mff_ovn_geneve_route_selector,
+                              tun, binding->datapath,
                               port_key, is_vtep_port, &ofpacts);
             ofpact_put_OUTPUT(&ofpacts)->port = tun->ofport;
         }
@@ -2216,6 +2236,10 @@  consider_port_binding(const struct physical_ctx *ctx,
         put_load(0, MFF_LOG_FLAGS, 0, 32, ofpacts_p);
         put_load(0, MFF_LOG_OUTPORT, 0, 32, ofpacts_p);
         for (int i = 0; i < MFF_N_LOG_REGS; i++) {
+            /* TODO: Figure out the register issue and remove this hack. */
+            if (i == 3) {
+                continue;
+            }
             put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
         }
         put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
@@ -2670,7 +2694,8 @@  consider_port_binding(const struct physical_ctx *ctx,
             &match, ofpacts_p, flow_table);
     } else if (access_type == PORT_HA_REMOTE) {
         put_remote_port_redirect_overlay_ha_remote(
-            binding, type, ha_ch_ordered, ctx->mff_ovn_geneve, port_key,
+            binding, type, ha_ch_ordered, ctx->mff_ovn_geneve,
+            ctx->mff_ovn_geneve_route_selector, port_key,
             &match, ofpacts_p, ctx->chassis_tunnels, flow_table);
     } else {
         put_remote_port_redirect_overlay(
@@ -2697,6 +2722,7 @@  get_vxlan_port_key(int64_t port_key)
 /* Encapsulate and send to a single remote chassis. */
 static void
 tunnel_to_chassis(enum mf_field_id mff_ovn_geneve,
+                  enum mf_field_id mff_ovn_geneve_route_selector,
                   const char *chassis_name,
                   const struct hmap *chassis_tunnels,
                   const struct sbrec_datapath_binding *datapath,
@@ -2708,8 +2734,8 @@  tunnel_to_chassis(enum mf_field_id mff_ovn_geneve,
         return;
     }
 
-    put_encapsulation(mff_ovn_geneve, tun, datapath, outport, false,
-                      remote_ofpacts);
+    put_encapsulation(mff_ovn_geneve, mff_ovn_geneve_route_selector,
+                      tun, datapath, outport, false, remote_ofpacts);
     ofpact_put_OUTPUT(remote_ofpacts)->port = tun->ofport;
 }
 
@@ -2788,8 +2814,10 @@  fanout_to_chassis_flow_based(const struct physical_ctx *ctx,
                 .ofport = flow_port,
                 .type = tunnel_type
             };
-            put_encapsulation(ctx->mff_ovn_geneve, &temp_tun, datapath,
-                              outport, is_ramp_switch, remote_ofpacts);
+            put_encapsulation(ctx->mff_ovn_geneve,
+                              ctx->mff_ovn_geneve_route_selector,
+                              &temp_tun, datapath, outport,
+                              is_ramp_switch, remote_ofpacts);
             prev_type = tunnel_type;
         }
 
@@ -2803,6 +2831,7 @@  fanout_to_chassis_flow_based(const struct physical_ctx *ctx,
 /* Encapsulate and send to a set of remote chassis (port-based tunnels). */
 static void
 fanout_to_chassis_port_based(enum mf_field_id mff_ovn_geneve,
+                             enum mf_field_id mff_ovn_geneve_route_selector,
                              struct sset *remote_chassis,
                              const struct hmap *chassis_tunnels,
                              const struct sbrec_datapath_binding *datapath,
@@ -2819,8 +2848,10 @@  fanout_to_chassis_port_based(enum mf_field_id mff_ovn_geneve,
         }
 
         if (!prev || tun->type != prev->type) {
-            put_encapsulation(mff_ovn_geneve, tun, datapath,
-                              outport, is_ramp_switch, remote_ofpacts);
+            put_encapsulation(mff_ovn_geneve,
+                              mff_ovn_geneve_route_selector,
+                              tun, datapath, outport, is_ramp_switch,
+                              remote_ofpacts);
             prev = tun;
         }
         ofpact_put_OUTPUT(remote_ofpacts)->port = tun->ofport;
@@ -2974,7 +3005,9 @@  consider_mc_group(const struct physical_ctx *ctx,
             if (port->chassis) {
                 put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32,
                          &remote_ctx->ofpacts);
-                tunnel_to_chassis(ctx->mff_ovn_geneve, port->chassis->name,
+                tunnel_to_chassis(ctx->mff_ovn_geneve,
+                                  ctx->mff_ovn_geneve_route_selector,
+                                  port->chassis->name,
                                   ctx->chassis_tunnels, mc->datapath,
                                   port->tunnel_key, &remote_ctx->ofpacts);
             }
@@ -3055,11 +3088,15 @@  consider_mc_group(const struct physical_ctx *ctx,
         VLOG_DBG("Using port-based tunnels for multicast group %s "
                  "(tunnel_key=%"PRId64") with %"PRIuSIZE" remote chassis",
                  mc->name, mc->tunnel_key, sset_count(&remote_chassis));
-        fanout_to_chassis_port_based(ctx->mff_ovn_geneve, &remote_chassis,
+        fanout_to_chassis_port_based(ctx->mff_ovn_geneve,
+                                     ctx->mff_ovn_geneve_route_selector,
+                                     &remote_chassis,
                                      ctx->chassis_tunnels, mc->datapath,
                                      mc->tunnel_key, false,
                                      &remote_ctx->ofpacts);
-        fanout_to_chassis_port_based(ctx->mff_ovn_geneve, &vtep_chassis,
+        fanout_to_chassis_port_based(ctx->mff_ovn_geneve,
+                                     ctx->mff_ovn_geneve_route_selector,
+                                     &vtep_chassis,
                                      ctx->chassis_tunnels, mc->datapath,
                                      mc->tunnel_key, true,
                                      &remote_ctx->ofpacts);
@@ -3130,7 +3167,9 @@  physical_eval_remote_chassis_flows(const struct physical_ctx *ctx,
         ofpbuf_clear(&ingress_ofpacts);
         put_load(1, MFF_LOG_FLAGS, MLF_RX_FROM_TUNNEL_BIT, 1,
                  &ingress_ofpacts);
-        put_decapsulation(ctx->mff_ovn_geneve, tun, &ingress_ofpacts);
+        put_decapsulation(ctx->mff_ovn_geneve,
+                          ctx->mff_ovn_geneve_route_selector,
+                          tun, &ingress_ofpacts);
         put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, &ingress_ofpacts);
         if (tun->type == VXLAN) {
             /* VXLAN doesn't carry the inport information, we cannot set
@@ -3709,8 +3748,9 @@  physical_run(struct physical_ctx *p_ctx,
      * packets to the local hypervisor. */
     struct chassis_tunnel *tun;
     HMAP_FOR_EACH (tun, hmap_node, p_ctx->chassis_tunnels) {
-        add_tunnel_ingress_flows(tun, p_ctx->mff_ovn_geneve, flow_table,
-                                &ofpacts);
+        add_tunnel_ingress_flows(tun, p_ctx->mff_ovn_geneve,
+                                 p_ctx->mff_ovn_geneve_route_selector,
+                                 flow_table, &ofpacts);
     }
 
     /* Process packets that arrive from flow-based tunnels. */
@@ -3733,8 +3773,10 @@  physical_run(struct physical_ctx *p_ctx,
                      "type=%s", p_ctx->flow_tunnels[i].ofport,
                      i == GENEVE ? "geneve" : "vxlan");
 
-            add_tunnel_ingress_flows(&temp_tunnel, p_ctx->mff_ovn_geneve,
-                                    flow_table, &ofpacts);
+            add_tunnel_ingress_flows(&temp_tunnel,
+                                     p_ctx->mff_ovn_geneve,
+                                     p_ctx->mff_ovn_geneve_route_selector,
+                                     flow_table, &ofpacts);
         }
     }
 
@@ -3904,6 +3946,10 @@  physical_run(struct physical_ctx *p_ctx,
     match_init_catchall(&match);
     ofpbuf_clear(&ofpacts);
     for (int i = 0; i < MFF_N_LOG_REGS; i++) {
+        /* TODO: Figure out the register issue and remove this hack. */
+        if (i == 3) {
+            continue;
+        }
         put_load(0, MFF_REG0 + i, 0, 32, &ofpacts);
     }
     put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, &ofpacts);
diff --git a/controller/physical.h b/controller/physical.h
index c7a33bd02..7997ba913 100644
--- a/controller/physical.h
+++ b/controller/physical.h
@@ -43,6 +43,12 @@  struct local_nonvif_data;
 #define OVN_GENEVE_TYPE 0x80     /* Critical option. */
 #define OVN_GENEVE_LEN 4
 
+/* Assigned Geneve class for OVN route selector. */
+#define OVN_GENEVE_ROUTE_SELECTOR_CLASS 0x0101
+#define OVN_GENEVE_ROUTE_SELECTOR_TYPE 0x81
+/* Only the first 16 bits are used. */
+#define OVN_GENEVE_ROUTE_SELECTOR_LEN 4
+
 struct physical_debug {
     uint32_t collector_set_id;
     uint32_t obs_domain_id;
@@ -63,6 +69,7 @@  struct physical_ctx {
     struct hmap *local_datapaths;
     const struct shash *ct_zones;
     enum mf_field_id mff_ovn_geneve;
+    enum mf_field_id mff_ovn_geneve_route_selector;
     struct shash *local_bindings;
     struct simap *patch_ofports;
     struct hmap *chassis_tunnels;
diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
index f0d34196a..4ed827e08 100644
--- a/include/ovn/logical-fields.h
+++ b/include/ovn/logical-fields.h
@@ -30,6 +30,7 @@  enum ovn_controller_event {
  *
  * These values are documented in ovn-architecture(7), please update the
  * documentation if you change any of them. */
+ /* Logical datapath (64 bits). */
 #define MFF_LOG_DATAPATH MFF_METADATA /* Logical datapath (64 bits). */
 #define MFF_LOG_FLAGS      MFF_REG10  /* One of MLF_* (32 bits). */
 #define MFF_LOG_DNAT_ZONE  MFF_REG11  /* conntrack dnat zone for gateway router
@@ -43,6 +44,7 @@  enum ovn_controller_event {
 #define MFF_LOG_INPORT     MFF_REG14  /* Logical input port (32 bits). */
 #define MFF_LOG_OUTPORT    MFF_REG15  /* Logical output port (32 bits). */
 #define MFF_LOG_TUN_OFPORT MFF_REG5   /* 16..31 of the 32 bits */
+#define MFF_LOG_ROUTE_SELECTOR MFF_REG7
 
 /* Logical registers.
  *
diff --git a/lib/logical-fields.c b/lib/logical-fields.c
index c8bddcdc5..978b3468f 100644
--- a/lib/logical-fields.c
+++ b/lib/logical-fields.c
@@ -71,6 +71,8 @@  ovn_init_symtab(struct shash *symtab)
      * doesn't yet support string fields that occupy less than a full OXM. */
     expr_symtab_add_string(symtab, "inport", MFF_LOG_INPORT, NULL);
     expr_symtab_add_string(symtab, "outport", MFF_LOG_OUTPORT, NULL);
+    expr_symtab_add_field(symtab, "route_selector",
+                          MFF_LOG_ROUTE_SELECTOR, NULL, false);
 
     /* The port isn't reserved along the pipeline it's just defined as symbol
      * to support matching on string and moving between string registers. */
diff --git a/northd/northd.c b/northd/northd.c
index c3c0780a3..17fa93346 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -217,7 +217,6 @@  BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2));
 #define REG_SRC_IPV6 "xxreg1"
 #define REG_DHCP_RELAY_DIP_IPV4 "reg2"
 #define REG_POLICY_CHAIN_ID "reg9[16..31]"
-#define REG_ROUTE_TABLE_ID "reg7"
 
 /* Registers used for pasing observability information for switches:
  * domain and point ID. */
@@ -11607,9 +11606,10 @@  build_route_table_lflow(struct ovn_datapath *od, struct lflow_table *lflows,
         return;
     }
 
-    ds_put_format(&match, "inport == \"%s\"", lrp->name);
-    ds_put_format(&actions, "%s = %d; next;",
-                  REG_ROUTE_TABLE_ID, rtb_id);
+    /* Don't overwrite route_selector if we received it. */
+    ds_put_format(&match, "route_selector == 0 && inport == \"%s\"",
+                  lrp->name);
+    ds_put_format(&actions, "route_selector = %d; next;", rtb_id);
 
     ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING_PRE, 100,
                   ds_cstr(&match), ds_cstr(&actions), lflow_ref);
@@ -12081,14 +12081,16 @@  build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id,
     if (op_inport) {
         ds_put_format(match, "inport == %s && ", op_inport->json_key);
     }
+
     if (rtb_id || source == ROUTE_SOURCE_STATIC ||
             source == ROUTE_SOURCE_LEARNED) {
-        ds_put_format(match, "%s == %d && ", REG_ROUTE_TABLE_ID, rtb_id);
+        ds_put_format(match, "route_selector == %d && ", rtb_id);
     }
 
     if (has_protocol_match) {
         ofs += 1;
     }
+
     *priority = (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs;
 
     ds_put_format(match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6", dir,
@@ -14426,7 +14428,7 @@  build_ip_routing_pre_flows_for_lrouter(struct ovn_datapath *od,
 {
     ovs_assert(od->nbr);
     ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING_PRE, 0, "1",
-                  REG_ROUTE_TABLE_ID" = 0; next;", lflow_ref);
+                  "next;", lflow_ref);
 }
 
 static void
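
For reference, the on-wire shape of the new option can be sketched as follows. This is an illustration under assumptions, not code from the patch: the helper names are hypothetical, the class, type, and length values are copied from the `OVN_GENEVE_ROUTE_SELECTOR_*` defines above, the option header layout follows the Geneve specification (16-bit class, 8-bit type, 5-bit length in 4-byte words), and the 4-byte payload is assumed to be a big-endian value whose low 16 bits hold the selector, matching the `put_move(..., 0, 16, ...)` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the OVN_GENEVE_ROUTE_SELECTOR_* defines in the patch. */
#define ROUTE_SELECTOR_CLASS 0x0101
#define ROUTE_SELECTOR_TYPE  0x81   /* High bit set: critical option. */
#define ROUTE_SELECTOR_LEN   4      /* Payload length in bytes. */

#define ROUTE_SELECTOR_OPT_SIZE (4 + ROUTE_SELECTOR_LEN)

/* Hypothetical helper: writes the option header and payload into 'buf'
 * and returns the number of bytes written. */
static size_t
encode_route_selector_opt(uint16_t selector,
                          uint8_t buf[ROUTE_SELECTOR_OPT_SIZE])
{
    assert(buf);
    buf[0] = ROUTE_SELECTOR_CLASS >> 8;     /* Option class, big-endian. */
    buf[1] = ROUTE_SELECTOR_CLASS & 0xff;
    buf[2] = ROUTE_SELECTOR_TYPE;
    buf[3] = ROUTE_SELECTOR_LEN / 4;        /* Reserved bits 0, length 1. */
    buf[4] = 0;                             /* Upper 16 payload bits unused. */
    buf[5] = 0;
    buf[6] = selector >> 8;                 /* Bits 0..15 of the field. */
    buf[7] = selector & 0xff;
    return ROUTE_SELECTOR_OPT_SIZE;
}

/* Hypothetical helper: extracts the selector from an encoded option. */
static uint16_t
decode_route_selector_opt(const uint8_t buf[ROUTE_SELECTOR_OPT_SIZE])
{
    assert(buf);
    return (uint16_t) ((buf[6] << 8) | buf[7]);
}
```

The option adds 8 bytes to each Geneve packet, on top of the 8 bytes already used by the existing OVN option.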