[ovs-dev,v7,4/4] Replace router internal MAC with gateway MAC for reply packets

Message ID 20180801121635.14509-5-vkommadi@redhat.com
State Deferred
Delegated to: Justin Pettit
Headers show
Series
  • Use VLANs for VLAN packets redirected to a gateway chassis
Related show

Commit Message

Venkata Anil Aug. 1, 2018, 12:16 p.m.
From: venkata anil <vkommadi@redhat.com>

Previous patches in the series doesn't address issue 1 explained in [1]
i.e
1) removal of router gateway port MAC address on external switches
   after expiring of aging time.
2) then external switches unable to learn the gateway MAC as
   reply packets carry router internal port MAC address as source

To fix this, router on gateway node will use router gateway MAC address
instead of router internal port MAC address as source for reply packets,
so that external switches can learn gateway MAC address.
This is done only for reply packets from router gateway to tenant VLAN
switch ports.
Later before delivering the packet to the port, ovn-controller will
replace the gateway MAC with router internal port MAC in table 33.

[1] //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html

Reported-by: Miguel Angel Ajo <majopela@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
Signed-off-by: Venkata Anil <vkommadi@redhat.com>
---

v6->v7:
* Added this patch


 ovn/controller/physical.c   | 60 ++++++++++++++++++++++++++++++++++++++++++---
 ovn/northd/ovn-northd.8.xml | 10 ++++++++
 ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
 ovn/ovn-architecture.7.xml  |  4 ++-
 4 files changed, 99 insertions(+), 4 deletions(-)

Comments

Mark Michelson Aug. 2, 2018, 8:47 p.m. | #1
On 08/01/2018 08:16 AM, vkommadi@redhat.com wrote:
> From: venkata anil <vkommadi@redhat.com>
> 
> Previous patches in the series doesn't address issue 1 explained in [1]
> i.e
> 1) removal of router gateway port MAC address on external switches
>     after expiring of aging time.
> 2) then external switches unable to learn the gateway MAC as
>     reply packets carry router internal port MAC address as source
> 
> To fix this, router on gateway node will use router gateway MAC address
> instead of router internal port MAC address as source for reply packets,
> so that external switches can learn gateway MAC address.
> This is done only for reply packets from router gateway to tenant VLAN
> switch ports.
> Later before delivering the packet to the port, ovn-controller will
> replace the gateway MAC with router internal port MAC in table 33.
> 
> [1] //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
> 
> Reported-by: Miguel Angel Ajo <majopela@redhat.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
> Signed-off-by: Venkata Anil <vkommadi@redhat.com>
> ---
> 
> v6->v7:
> * Added this patch
> 
> 
>   ovn/controller/physical.c   | 60 ++++++++++++++++++++++++++++++++++++++++++---
>   ovn/northd/ovn-northd.8.xml | 10 ++++++++
>   ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>   ovn/ovn-architecture.7.xml  |  4 ++-
>   4 files changed, 99 insertions(+), 4 deletions(-)
> 
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index f269a1d..1f41f59 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>   static void
>   put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>                          bool nested_container, const struct zone_ids *zone_ids,
> -                       struct ofpbuf *ofpacts_p, struct hmap *flow_table)
> +                       struct ofpbuf *ofpacts_p, struct hmap *flow_table,
> +                       struct local_datapath *ld,
> +                       const struct hmap *local_datapaths)
>   {
>       struct match match;
>   
> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>           }
>       }
>   
> +    struct ofpbuf *clone = NULL;
> +    clone = ofpbuf_clone(ofpacts_p);
> +

I don't understand the use of the cloned ofpbuf here. You could just 
ofpbuf_clear(ofpacts_p) in each iteration of the for loop below and 
reuse ofpacts_p where you've used clone below.

>       /* Resubmit to table 34. */
>       put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>       ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>                       &match, ofpacts_p);
>   
> +    /* For a reply packet from gateway with VLAN switch port as destination
> +     * (excluding localnet_port and external VLAN networks), gateway router
> +     * will use gateway MAC address as source MAC instead of router internal
> +     * port MAC, so that external switches can learn gateway MAC address.
> +     * Here (before packet is given to the port) we replace router gateway
> +     * MAC address with router internal port MAC. */
> +    if (ld->localnet_port && (port_key != ld->localnet_port->tunnel_key)) {
> +        for (int i = 0; i < ld->n_peer_dps; i++) {
> +            struct local_datapath *peer_ldp = get_local_datapath(
> +                local_datapaths, ld->peer_dps[i]->peer_dp->tunnel_key);
> +            const struct sbrec_port_binding *crp;
> +            crp = peer_ldp->chassisredirect_port;
> +            if (!crp) {
> +                continue;
> +            }
> +
> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
> +                       ld->peer_dps[i]->peer->logical_port) &&
> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
> +                for (int j = 0; j < crp->n_mac; j++) {
> +                    struct lport_addresses laddrs;
> +                    if (!extract_lsp_addresses(crp->mac[j], &laddrs)) {
> +                        continue;
> +                    }
> +                    match_set_dl_src(&match, laddrs.ea);
> +                    destroy_lport_addresses(&laddrs);
> +                    break;
> +                }
> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; j++) {
> +                    struct lport_addresses laddrs;
> +                    uint64_t mac64;
> +                    if (!extract_lsp_addresses(
> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
> +                        continue;
> +                    }
> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> +                    put_load(mac64,
> +                             MFF_ETH_SRC, 0, 48, clone);
> +                    destroy_lport_addresses(&laddrs);
> +                    break;
> +                }
> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 150, 0,
> +                    &match, clone);
> +            }
> +        }
> +    }
> +    ofpbuf_delete(clone);
> +
>       /* Table 34, Priority 100.
>        * =======================
>        *
> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
>   
>           struct zone_ids binding_zones = get_zone_ids(binding, ct_zones);
>           put_local_common_flows(dp_key, port_key, false, &binding_zones,
> -                               ofpacts_p, flow_table);
> +                               ofpacts_p, flow_table, ld, local_datapaths);
>   
>           match_init_catchall(&match);
>           ofpbuf_clear(ofpacts_p);
> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
>   
>           struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
>           put_local_common_flows(dp_key, port_key, nested_container, &zone_ids,
> -                               ofpacts_p, flow_table);
> +                               ofpacts_p, flow_table, ld, local_datapaths);
>   
>           /* Table 0, Priority 200, 150 and 100.
>            * ==============================
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 8fa5272..876c121 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -2013,6 +2013,16 @@ next;
>         </li>
>   
>         <li>
> +        A priority-100 logical flow with match
> +        <code>inport == <var>GW</var> &amp;&amp;
> +        flags.rcv_from_vlan == 1</code> has actions
> +        <code>eth.dst = <var>E</var>; next;</code>, where
> +        <var>GW</var> is the logical router distributed gateway
> +        port and <var>E</var> is the MAC address of router
> +        distributed gateway port.
> +      </li>
> +
> +      <li>
>           For each NAT rule in the OVN Northbound database that can
>           be handled in a distributed manner, a priority-100 logical
>           flow with match <code>ip4.src == <var>B</var> &amp;&amp;
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index bcf0b66..d012bb8 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
>       } else {
>           ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>       }
> +
> +    if (op->peer && op->peer->od->localnet_port &&
> +        op->od->l3dgw_port && op->od->l3redirect_port &&
> +        (op != op->od->l3redirect_port) &&
> +        (op != op->od->l3dgw_port)) {
> +        ds_put_format(&match, " && is_chassis_resident(%s)",
> +                      op->od->l3redirect_port->json_key);
> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
> +    }
>       ds_put_format(&actions, "; "
>                     "%sreg1 = %s; "
>                     "eth.src = %s; "
> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                         op->lrp_networks.ipv6_addrs[i].network_s,
>                         op->lrp_networks.ipv6_addrs[i].plen, NULL, NULL);
>           }
> +
> +        /* For a reply packet from gateway with VLAN switch port as
> +         * destination, replace router internal port MAC with router gateway
> +         * MAC address, so that external switches can learn gateway MAC
> +         * address. Later before delivering the packet to the port,
> +         * controller will replace the gateway MAC with router internal port
> +         * MAC in table 33. */
> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
> +            op->od->l3redirect_port) {
> +            ds_clear(&actions);
> +            ds_clear(&match);
> +            ds_put_format(&match, "inport == %s", op->json_key);
> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
> +            ds_put_format(&match, " && is_chassis_resident(%s)",
> +                          op->od->l3redirect_port->json_key);
> +            ds_put_format(&actions,
> +                          "eth.src = %s; next;", op->lrp_networks.ea_s);
> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, 100,
> +                          ds_cstr(&match), ds_cstr(&actions));
> +        }
>       }
>   
>       /* Convert the static routes to flows. */
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index ad2101c..0de41d2 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1067,7 +1067,9 @@
>   
>         <p>
>           Flows in table 33 resemble those in table 32 but for logical ports that
> -        reside locally rather than remotely.  For unicast logical output ports
> +        reside locally rather than remotely.  If these are VLAN ports and
> +        packet has router gateway port MAC address as source, replace it with
> +        router internal port MAC address. For unicast logical output ports
>           on the local hypervisor, the actions just resubmit to table 34.  For
>           multicast output ports that include one or more logical ports on the
>           local hypervisor, for each such logical port <var>P</var>, the actions
>
Anil Venkata Aug. 6, 2018, 5:58 p.m. | #2
Thanks Mark. Kindly look at my comment inline.

On Fri, Aug 3, 2018 at 2:17 AM, Mark Michelson <mmichels@redhat.com> wrote:

> On 08/01/2018 08:16 AM, vkommadi@redhat.com wrote:
>
>> From: venkata anil <vkommadi@redhat.com>
>>
>> Previous patches in the series doesn't address issue 1 explained in [1]
>> i.e
>> 1) removal of router gateway port MAC address on external switches
>>     after expiring of aging time.
>> 2) then external switches unable to learn the gateway MAC as
>>     reply packets carry router internal port MAC address as source
>>
>> To fix this, router on gateway node will use router gateway MAC address
>> instead of router internal port MAC address as source for reply packets,
>> so that external switches can learn gateway MAC address.
>> This is done only for reply packets from router gateway to tenant VLAN
>> switch ports.
>> Later before delivering the packet to the port, ovn-controller will
>> replace the gateway MAC with router internal port MAC in table 33.
>>
>> [1] //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
>>
>> Reported-by: Miguel Angel Ajo <majopela@redhat.com>
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349
>> 803.html
>> Signed-off-by: Venkata Anil <vkommadi@redhat.com>
>> ---
>>
>> v6->v7:
>> * Added this patch
>>
>>
>>   ovn/controller/physical.c   | 60 ++++++++++++++++++++++++++++++
>> ++++++++++++---
>>   ovn/northd/ovn-northd.8.xml | 10 ++++++++
>>   ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>>   ovn/ovn-architecture.7.xml  |  4 ++-
>>   4 files changed, 99 insertions(+), 4 deletions(-)
>>
>> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>> index f269a1d..1f41f59 100644
>> --- a/ovn/controller/physical.c
>> +++ b/ovn/controller/physical.c
>> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>   static void
>>   put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>>                          bool nested_container, const struct zone_ids
>> *zone_ids,
>> -                       struct ofpbuf *ofpacts_p, struct hmap *flow_table)
>> +                       struct ofpbuf *ofpacts_p, struct hmap *flow_table,
>> +                       struct local_datapath *ld,
>> +                       const struct hmap *local_datapaths)
>>   {
>>       struct match match;
>>   @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, uint32_t
>> port_key,
>>           }
>>       }
>>   +    struct ofpbuf *clone = NULL;
>> +    clone = ofpbuf_clone(ofpacts_p);
>> +
>>
>
> I don't understand the use of the cloned ofpbuf here. You could just
> ofpbuf_clear(ofpacts_p) in each iteration of the for loop below and reuse
> ofpacts_p where you've used clone below.
>
>
>

I need to add additional flow with priority 150 along with default flow
which exists with prioirty 100.

1) table=33,  priority=100,reg15=0x5,metadata=0x2 actions=load:0x2->NXM_NX_
REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
2) table=33,  priority=150,reg15=0x5,metadata=0x2,dl_src=fa:16:3e:48:66:e7
actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_
REG12[],mod_dl_src:fa:16:3e:65:f2:ae,resubmit(,34)

The first flow with priority 100 is the default flow. This flow is added
for each output port residing on the current chassis, sets some registers
and resubmits to table 34.
But when the packet has router gateway MAC as source MAC address, I want to
replace the MAC before resubmitting to table 34. Second flow has same match
conditions as first flow and also same actions, additionally it checks for
the source MAC and then modifies the source MAC address.

Before I construct the second flow, I have the registers and resubmit as
part of actions i.e
actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_
REG12[],resubmit(,34)

If I simply add MAC to actions with put_load(mac64, MFF_ETH_SRC, 0, 48,
ofpacts_p); then actions in 2nd flow(with 150 priority) will look like
actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_
REG12[],resubmit(,34),mod_dl_src:fa:16:3e:65:f2:ae
i.e modifying the MAC will happen after resubmit, which is not the expected
behaviour.

So I am cloning the ofpacts_p before resubmit is added to it and using the
cloned one in constructing flow with priority 150.
With this changes, actions will look like
actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_
REG12[],mod_dl_src:fa:16:3e:65:f2:ae,resubmit(,34)



>       /* Resubmit to table 34. */
>>       put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>>       ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>>                       &match, ofpacts_p);
>>   +    /* For a reply packet from gateway with VLAN switch port as
>> destination
>> +     * (excluding localnet_port and external VLAN networks), gateway
>> router
>> +     * will use gateway MAC address as source MAC instead of router
>> internal
>> +     * port MAC, so that external switches can learn gateway MAC address.
>> +     * Here (before packet is given to the port) we replace router
>> gateway
>> +     * MAC address with router internal port MAC. */
>> +    if (ld->localnet_port && (port_key != ld->localnet_port->tunnel_key))
>> {
>> +        for (int i = 0; i < ld->n_peer_dps; i++) {
>> +            struct local_datapath *peer_ldp = get_local_datapath(
>> +                local_datapaths, ld->peer_dps[i]->peer_dp->tunnel_key);
>> +            const struct sbrec_port_binding *crp;
>> +            crp = peer_ldp->chassisredirect_port;
>> +            if (!crp) {
>> +                continue;
>> +            }
>> +
>> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
>> +                       ld->peer_dps[i]->peer->logical_port) &&
>> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
>> +                for (int j = 0; j < crp->n_mac; j++) {
>> +                    struct lport_addresses laddrs;
>> +                    if (!extract_lsp_addresses(crp->mac[j], &laddrs)) {
>> +                        continue;
>> +                    }
>> +                    match_set_dl_src(&match, laddrs.ea);
>> +                    destroy_lport_addresses(&laddrs);
>> +                    break;
>> +                }
>> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; j++) {
>> +                    struct lport_addresses laddrs;
>> +                    uint64_t mac64;
>> +                    if (!extract_lsp_addresses(
>> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
>> +                        continue;
>> +                    }
>> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
>> +                    put_load(mac64,
>> +                             MFF_ETH_SRC, 0, 48, clone);
>> +                    destroy_lport_addresses(&laddrs);
>> +                    break;
>> +                }
>> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
>> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 150, 0,
>> +                    &match, clone);
>> +            }
>> +        }
>> +    }
>> +    ofpbuf_delete(clone);
>> +
>>       /* Table 34, Priority 100.
>>        * =======================
>>        *
>> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>             struct zone_ids binding_zones = get_zone_ids(binding,
>> ct_zones);
>>           put_local_common_flows(dp_key, port_key, false, &binding_zones,
>> -                               ofpacts_p, flow_table);
>> +                               ofpacts_p, flow_table, ld,
>> local_datapaths);
>>             match_init_catchall(&match);
>>           ofpbuf_clear(ofpacts_p);
>> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>             struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
>>           put_local_common_flows(dp_key, port_key, nested_container,
>> &zone_ids,
>> -                               ofpacts_p, flow_table);
>> +                               ofpacts_p, flow_table, ld,
>> local_datapaths);
>>             /* Table 0, Priority 200, 150 and 100.
>>            * ==============================
>> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> index 8fa5272..876c121 100644
>> --- a/ovn/northd/ovn-northd.8.xml
>> +++ b/ovn/northd/ovn-northd.8.xml
>> @@ -2013,6 +2013,16 @@ next;
>>         </li>
>>           <li>
>> +        A priority-100 logical flow with match
>> +        <code>inport == <var>GW</var> &amp;&amp;
>> +        flags.rcv_from_vlan == 1</code> has actions
>> +        <code>eth.dst = <var>E</var>; next;</code>, where
>> +        <var>GW</var> is the logical router distributed gateway
>> +        port and <var>E</var> is the MAC address of router
>> +        distributed gateway port.
>> +      </li>
>> +
>> +      <li>
>>           For each NAT rule in the OVN Northbound database that can
>>           be handled in a distributed manner, a priority-100 logical
>>           flow with match <code>ip4.src == <var>B</var> &amp;&amp;
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index bcf0b66..d012bb8 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
>> ovn_port *op,
>>       } else {
>>           ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>>       }
>> +
>> +    if (op->peer && op->peer->od->localnet_port &&
>> +        op->od->l3dgw_port && op->od->l3redirect_port &&
>> +        (op != op->od->l3redirect_port) &&
>> +        (op != op->od->l3dgw_port)) {
>> +        ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                      op->od->l3redirect_port->json_key);
>> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
>> +    }
>>       ds_put_format(&actions, "; "
>>                     "%sreg1 = %s; "
>>                     "eth.src = %s; "
>> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths, struct
>> hmap *ports,
>>                         op->lrp_networks.ipv6_addrs[i].network_s,
>>                         op->lrp_networks.ipv6_addrs[i].plen, NULL, NULL);
>>           }
>> +
>> +        /* For a reply packet from gateway with VLAN switch port as
>> +         * destination, replace router internal port MAC with router
>> gateway
>> +         * MAC address, so that external switches can learn gateway MAC
>> +         * address. Later before delivering the packet to the port,
>> +         * controller will replace the gateway MAC with router internal
>> port
>> +         * MAC in table 33. */
>> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
>> +            op->od->l3redirect_port) {
>> +            ds_clear(&actions);
>> +            ds_clear(&match);
>> +            ds_put_format(&match, "inport == %s", op->json_key);
>> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                          op->od->l3redirect_port->json_key);
>> +            ds_put_format(&actions,
>> +                          "eth.src = %s; next;", op->lrp_networks.ea_s);
>> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, 100,
>> +                          ds_cstr(&match), ds_cstr(&actions));
>> +        }
>>       }
>>         /* Convert the static routes to flows. */
>> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>> index ad2101c..0de41d2 100644
>> --- a/ovn/ovn-architecture.7.xml
>> +++ b/ovn/ovn-architecture.7.xml
>> @@ -1067,7 +1067,9 @@
>>           <p>
>>           Flows in table 33 resemble those in table 32 but for logical
>> ports that
>> -        reside locally rather than remotely.  For unicast logical output
>> ports
>> +        reside locally rather than remotely.  If these are VLAN ports and
>> +        packet has router gateway port MAC address as source, replace it
>> with
>> +        router internal port MAC address. For unicast logical output
>> ports
>>           on the local hypervisor, the actions just resubmit to table
>> 34.  For
>>           multicast output ports that include one or more logical ports
>> on the
>>           local hypervisor, for each such logical port <var>P</var>, the
>> actions
>>
>>
>
Mark Michelson Aug. 6, 2018, 7:19 p.m. | #3
On 08/06/2018 01:58 PM, Anil Venkata wrote:
> Thanks Mark. Kindly look at my comment inline.
> 
> On Fri, Aug 3, 2018 at 2:17 AM, Mark Michelson <mmichels@redhat.com 
> <mailto:mmichels@redhat.com>> wrote:
> 
>     On 08/01/2018 08:16 AM, vkommadi@redhat.com
>     <mailto:vkommadi@redhat.com> wrote:
> 
>         From: venkata anil <vkommadi@redhat.com
>         <mailto:vkommadi@redhat.com>>
> 
>         Previous patches in the series doesn't address issue 1 explained
>         in [1]
>         i.e
>         1) removal of router gateway port MAC address on external switches
>              after expiring of aging time.
>         2) then external switches unable to learn the gateway MAC as
>              reply packets carry router internal port MAC address as source
> 
>         To fix this, router on gateway node will use router gateway MAC
>         address
>         instead of router internal port MAC address as source for reply
>         packets,
>         so that external switches can learn gateway MAC address.
>         This is done only for reply packets from router gateway to
>         tenant VLAN
>         switch ports.
>         Later before delivering the packet to the port, ovn-controller will
>         replace the gateway MAC with router internal port MAC in table 33.
> 
>         [1]
>         //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
>         <http://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html>
> 
>         Reported-by: Miguel Angel Ajo <majopela@redhat.com
>         <mailto:majopela@redhat.com>>
>         Reported-at:
>         https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
>         <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html>
>         Signed-off-by: Venkata Anil <vkommadi@redhat.com
>         <mailto:vkommadi@redhat.com>>
>         ---
> 
>         v6->v7:
>         * Added this patch
> 
> 
>            ovn/controller/physical.c   | 60
>         ++++++++++++++++++++++++++++++++++++++++++---
>            ovn/northd/ovn-northd.8.xml | 10 ++++++++
>            ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>            ovn/ovn-architecture.7.xml  |  4 ++-
>            4 files changed, 99 insertions(+), 4 deletions(-)
> 
>         diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>         index f269a1d..1f41f59 100644
>         --- a/ovn/controller/physical.c
>         +++ b/ovn/controller/physical.c
>         @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
>         *binding,
>            static void
>            put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>                                   bool nested_container, const struct
>         zone_ids *zone_ids,
>         -                       struct ofpbuf *ofpacts_p, struct hmap
>         *flow_table)
>         +                       struct ofpbuf *ofpacts_p, struct hmap
>         *flow_table,
>         +                       struct local_datapath *ld,
>         +                       const struct hmap *local_datapaths)
>            {
>                struct match match;
>            @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key,
>         uint32_t port_key,
>                    }
>                }
>            +    struct ofpbuf *clone = NULL;
>         +    clone = ofpbuf_clone(ofpacts_p);
>         +
> 
> 
>     I don't understand the use of the cloned ofpbuf here. You could just
>     ofpbuf_clear(ofpacts_p) in each iteration of the for loop below and
>     reuse ofpacts_p where you've used clone below.
> 
> 
> 
> 
> I need to add additional flow with priority 150 along with default flow 
> which exists with prioirty 100.
> 
> 1) table=33,  priority=100,reg15=0x5,metadata=0x2 
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> 2) table=33, 
>   priority=150,reg15=0x5,metadata=0x2,dl_src=fa:16:3e:48:66:e7 
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:65:f2:ae,resubmit(,34)
> 
> The first flow with priority 100 is the default flow. This flow is added 
> for each output port residing on the current chassis, sets some 
> registers and resubmits to table 34.
> But when the packet has router gateway MAC as source MAC address, I want 
> to replace the MAC before resubmitting to table 34. Second flow has same 
> match conditions as first flow and also same actions, additionally it 
> checks for the source MAC and then modifies the source MAC address.
> 
> Before I construct the second flow, I have the registers and resubmit as 
> part of actions i.e
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> 
> If I simply add MAC to actions with put_load(mac64, MFF_ETH_SRC, 0, 48, 
> ofpacts_p); then actions in 2nd flow(with 150 priority) will look like
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34),mod_dl_src:fa:16:3e:65:f2:ae
> i.e modifying the MAC will happen after resubmit, which is not the 
> expected behaviour.
> 
> So I am cloning the ofpacts_p before resubmit is added to it and using 
> the cloned one in constructing flow with priority 150.
> With this changes, actions will look like
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:65:f2:ae,resubmit(,34)

Thank you very much for the explanation. It makes a lot more sense now.

> 
> 
>                /* Resubmit to table 34. */
>                put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>                                &match, ofpacts_p);
>            +    /* For a reply packet from gateway with VLAN switch port
>         as destination
>         +     * (excluding localnet_port and external VLAN networks),
>         gateway router
>         +     * will use gateway MAC address as source MAC instead of
>         router internal
>         +     * port MAC, so that external switches can learn gateway
>         MAC address.
>         +     * Here (before packet is given to the port) we replace
>         router gateway
>         +     * MAC address with router internal port MAC. */
>         +    if (ld->localnet_port && (port_key !=
>         ld->localnet_port->tunnel_key)) {
>         +        for (int i = 0; i < ld->n_peer_dps; i++) {
>         +            struct local_datapath *peer_ldp = get_local_datapath(
>         +                local_datapaths,
>         ld->peer_dps[i]->peer_dp->tunnel_key);
>         +            const struct sbrec_port_binding *crp;
>         +            crp = peer_ldp->chassisredirect_port;
>         +            if (!crp) {
>         +                continue;
>         +            }
>         +
>         +            if (strcmp(smap_get(&crp->options, "distributed-port"),
>         +                       ld->peer_dps[i]->peer->logical_port) &&
>         +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
>         +                for (int j = 0; j < crp->n_mac; j++) {
>         +                    struct lport_addresses laddrs;
>         +                    if (!extract_lsp_addresses(crp->mac[j],
>         &laddrs)) {
>         +                        continue;
>         +                    }
>         +                    match_set_dl_src(&match, laddrs.ea);
>         +                    destroy_lport_addresses(&laddrs);
>         +                    break;
>         +                }
>         +                for (int j = 0; j <
>         ld->peer_dps[i]->peer->n_mac; j++) {
>         +                    struct lport_addresses laddrs;
>         +                    uint64_t mac64;
>         +                    if (!extract_lsp_addresses(
>         +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
>         +                        continue;
>         +                    }
>         +                    mac64 = eth_addr_to_uint64(laddrs.ea);
>         +                    put_load(mac64,
>         +                             MFF_ETH_SRC, 0, 48, clone);
>         +                    destroy_lport_addresses(&laddrs);
>         +                    break;
>         +                }
>         +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
>         +                ofctrl_add_flow(flow_table,
>         OFTABLE_LOCAL_OUTPUT, 150, 0,
>         +                    &match, clone);
>         +            }
>         +        }
>         +    }
>         +    ofpbuf_delete(clone);
>         +
>                /* Table 34, Priority 100.
>                 * =======================
>                 *
>         @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
>         *sbrec_chassis_by_name,
>                      struct zone_ids binding_zones =
>         get_zone_ids(binding, ct_zones);
>                    put_local_common_flows(dp_key, port_key, false,
>         &binding_zones,
>         -                               ofpacts_p, flow_table);
>         +                               ofpacts_p, flow_table, ld,
>         local_datapaths);
>                      match_init_catchall(&match);
>                    ofpbuf_clear(ofpacts_p);
>         @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
>         *sbrec_chassis_by_name,
>                      struct zone_ids zone_ids = get_zone_ids(binding,
>         ct_zones);
>                    put_local_common_flows(dp_key, port_key,
>         nested_container, &zone_ids,
>         -                               ofpacts_p, flow_table);
>         +                               ofpacts_p, flow_table, ld,
>         local_datapaths);
>                      /* Table 0, Priority 200, 150 and 100.
>                     * ==============================
>         diff --git a/ovn/northd/ovn-northd.8.xml
>         b/ovn/northd/ovn-northd.8.xml
>         index 8fa5272..876c121 100644
>         --- a/ovn/northd/ovn-northd.8.xml
>         +++ b/ovn/northd/ovn-northd.8.xml
>         @@ -2013,6 +2013,16 @@ next;
>                  </li>
>                    <li>
>         +        A priority-100 logical flow with match
>         +        <code>inport == <var>GW</var> &amp;&amp;
>         +        flags.rcv_from_vlan == 1</code> has actions
>         +        <code>eth.dst = <var>E</var>; next;</code>, where
>         +        <var>GW</var> is the logical router distributed gateway
>         +        port and <var>E</var> is the MAC address of router
>         +        distributed gateway port.
>         +      </li>
>         +
>         +      <li>
>                    For each NAT rule in the OVN Northbound database that can
>                    be handled in a distributed manner, a priority-100
>         logical
>                    flow with match <code>ip4.src == <var>B</var> &amp;&amp;
>         diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>         index bcf0b66..d012bb8 100644
>         --- a/ovn/northd/ovn-northd.c
>         +++ b/ovn/northd/ovn-northd.c
>         @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const
>         struct ovn_port *op,
>                } else {
>                    ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>                }
>         +
>         +    if (op->peer && op->peer->od->localnet_port &&
>         +        op->od->l3dgw_port && op->od->l3redirect_port &&
>         +        (op != op->od->l3redirect_port) &&
>         +        (op != op->od->l3dgw_port)) {
>         +        ds_put_format(&match, " && is_chassis_resident(%s)",
>         +                      op->od->l3redirect_port->json_key);
>         +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
>         +    }
>                ds_put_format(&actions, "; "
>                              "%sreg1 = %s; "
>                              "eth.src = %s; "
>         @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap
>         *datapaths, struct hmap *ports,
>                                  op->lrp_networks.ipv6_addrs[i].network_s,
>                                  op->lrp_networks.ipv6_addrs[i].plen,
>         NULL, NULL);
>                    }
>         +
>         +        /* For a reply packet from gateway with VLAN switch port as
>         +         * destination, replace router internal port MAC with
>         router gateway
>         +         * MAC address, so that external switches can learn
>         gateway MAC
>         +         * address. Later before delivering the packet to the port,
>         +         * controller will replace the gateway MAC with router
>         internal port
>         +         * MAC in table 33. */
>         +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
>         +            op->od->l3redirect_port) {
>         +            ds_clear(&actions);
>         +            ds_clear(&match);
>         +            ds_put_format(&match, "inport == %s", op->json_key);
>         +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>         +            ds_put_format(&match, " && is_chassis_resident(%s)",
>         +                          op->od->l3redirect_port->json_key);
>         +            ds_put_format(&actions,
>         +                          "eth.src = %s; next;",
>         op->lrp_networks.ea_s);
>         +            ovn_lflow_add(lflows, op->od,
>         S_ROUTER_IN_GW_REDIRECT, 100,
>         +                          ds_cstr(&match), ds_cstr(&actions));
>         +        }
>                }
>                  /* Convert the static routes to flows. */
>         diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>         index ad2101c..0de41d2 100644
>         --- a/ovn/ovn-architecture.7.xml
>         +++ b/ovn/ovn-architecture.7.xml
>         @@ -1067,7 +1067,9 @@
>                    <p>
>                    Flows in table 33 resemble those in table 32 but for
>         logical ports that
>         -        reside locally rather than remotely.  For unicast
>         logical output ports
>         +        reside locally rather than remotely.  If these are VLAN
>         ports and
>         +        packet has router gateway port MAC address as source,
>         replace it with
>         +        router internal port MAC address. For unicast logical
>         output ports
>                    on the local hypervisor, the actions just resubmit to
>         table 34.  For
>                    multicast output ports that include one or more
>         logical ports on the
>                    local hypervisor, for each such logical port
>         <var>P</var>, the actions
> 
> 
>
Miguel Angel Ajo Pelayo Aug. 14, 2018, 2:15 p.m. | #4
Thank you very much,

I wasn't able to perform a proper code review, but you can add the
Tested-By: Miguel Angel Ajo <majopela@redhat.com>

The issue I described previously seems to be fixed on the v7 of the series.


On Mon, Aug 6, 2018 at 9:19 PM Mark Michelson <mmichels@redhat.com> wrote:

> On 08/06/2018 01:58 PM, Anil Venkata wrote:
> > Thanks Mark. Kindly look at my comment inline.
> >
> > On Fri, Aug 3, 2018 at 2:17 AM, Mark Michelson <mmichels@redhat.com
> > <mailto:mmichels@redhat.com>> wrote:
> >
> >     On 08/01/2018 08:16 AM, vkommadi@redhat.com
> >     <mailto:vkommadi@redhat.com> wrote:
> >
> >         From: venkata anil <vkommadi@redhat.com
> >         <mailto:vkommadi@redhat.com>>
> >
> >         Previous patches in the series doesn't address issue 1 explained
> >         in [1]
> >         i.e
> >         1) removal of router gateway port MAC address on external
> switches
> >              after expiring of aging time.
> >         2) then external switches unable to learn the gateway MAC as
> >              reply packets carry router internal port MAC address as
> source
> >
> >         To fix this, router on gateway node will use router gateway MAC
> >         address
> >         instead of router internal port MAC address as source for reply
> >         packets,
> >         so that external switches can learn gateway MAC address.
> >         This is done only for reply packets from router gateway to
> >         tenant VLAN
> >         switch ports.
> >         Later before delivering the packet to the port, ovn-controller
> will
> >         replace the gateway MAC with router internal port MAC in table
> 33.
> >
> >         [1]
> >         //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
> >         <
> http://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html>
> >
> >         Reported-by: Miguel Angel Ajo <majopela@redhat.com
> >         <mailto:majopela@redhat.com>>
> >         Reported-at:
> >
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
> >         <
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html>
> >         Signed-off-by: Venkata Anil <vkommadi@redhat.com
> >         <mailto:vkommadi@redhat.com>>
> >         ---
> >
> >         v6->v7:
> >         * Added this patch
> >
> >
> >            ovn/controller/physical.c   | 60
> >         ++++++++++++++++++++++++++++++++++++++++++---
> >            ovn/northd/ovn-northd.8.xml | 10 ++++++++
> >            ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
> >            ovn/ovn-architecture.7.xml  |  4 ++-
> >            4 files changed, 99 insertions(+), 4 deletions(-)
> >
> >         diff --git a/ovn/controller/physical.c
> b/ovn/controller/physical.c
> >         index f269a1d..1f41f59 100644
> >         --- a/ovn/controller/physical.c
> >         +++ b/ovn/controller/physical.c
> >         @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
> >         *binding,
> >            static void
> >            put_local_common_flows(uint32_t dp_key, uint32_t port_key,
> >                                   bool nested_container, const struct
> >         zone_ids *zone_ids,
> >         -                       struct ofpbuf *ofpacts_p, struct hmap
> >         *flow_table)
> >         +                       struct ofpbuf *ofpacts_p, struct hmap
> >         *flow_table,
> >         +                       struct local_datapath *ld,
> >         +                       const struct hmap *local_datapaths)
> >            {
> >                struct match match;
> >            @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key,
> >         uint32_t port_key,
> >                    }
> >                }
> >            +    struct ofpbuf *clone = NULL;
> >         +    clone = ofpbuf_clone(ofpacts_p);
> >         +
> >
> >
> >     I don't understand the use of the cloned ofpbuf here. You could just
> >     ofpbuf_clear(ofpacts_p) in each iteration of the for loop below and
> >     reuse ofpacts_p where you've used clone below.
> >
> >
> >
> >
> > I need to add additional flow with priority 150 along with default flow
> > which exists with prioirty 100.
> >
> > 1) table=33,  priority=100,reg15=0x5,metadata=0x2
> >
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> > 2) table=33,
> >   priority=150,reg15=0x5,metadata=0x2,dl_src=fa:16:3e:48:66:e7
> >
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:65:f2:ae,resubmit(,34)
> >
> > The first flow with priority 100 is the default flow. This flow is added
> > for each output port residing on the current chassis, sets some
> > registers and resubmits to table 34.
> > But when the packet has router gateway MAC as source MAC address, I want
> > to replace the MAC before resubmitting to table 34. Second flow has same
> > match conditions as first flow and also same actions, additionally it
> > checks for the source MAC and then modifies the source MAC address.
> >
> > Before I construct the second flow, I have the registers and resubmit as
> > part of actions i.e
> >
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> >
> > If I simply add MAC to actions with put_load(mac64, MFF_ETH_SRC, 0, 48,
> > ofpacts_p); then actions in 2nd flow(with 150 priority) will look like
> >
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34),mod_dl_src:fa:16:3e:65:f2:ae
> > i.e modifying the MAC will happen after resubmit, which is not the
> > expected behaviour.
> >
> > So I am cloning the ofpacts_p before resubmit is added to it and using
> > the cloned one in constructing flow with priority 150.
> > With this changes, actions will look like
> >
> actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:65:f2:ae,resubmit(,34)
>
> Thank you very much for the explanation. It makes a lot more sense now.
>
> >
> >
> >                /* Resubmit to table 34. */
> >                put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> >                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
> >                                &match, ofpacts_p);
> >            +    /* For a reply packet from gateway with VLAN switch port
> >         as destination
> >         +     * (excluding localnet_port and external VLAN networks),
> >         gateway router
> >         +     * will use gateway MAC address as source MAC instead of
> >         router internal
> >         +     * port MAC, so that external switches can learn gateway
> >         MAC address.
> >         +     * Here (before packet is given to the port) we replace
> >         router gateway
> >         +     * MAC address with router internal port MAC. */
> >         +    if (ld->localnet_port && (port_key !=
> >         ld->localnet_port->tunnel_key)) {
> >         +        for (int i = 0; i < ld->n_peer_dps; i++) {
> >         +            struct local_datapath *peer_ldp =
> get_local_datapath(
> >         +                local_datapaths,
> >         ld->peer_dps[i]->peer_dp->tunnel_key);
> >         +            const struct sbrec_port_binding *crp;
> >         +            crp = peer_ldp->chassisredirect_port;
> >         +            if (!crp) {
> >         +                continue;
> >         +            }
> >         +
> >         +            if (strcmp(smap_get(&crp->options,
> "distributed-port"),
> >         +                       ld->peer_dps[i]->peer->logical_port) &&
> >         +                (port_key !=
> ld->peer_dps[i]->patch->tunnel_key)) {
> >         +                for (int j = 0; j < crp->n_mac; j++) {
> >         +                    struct lport_addresses laddrs;
> >         +                    if (!extract_lsp_addresses(crp->mac[j],
> >         &laddrs)) {
> >         +                        continue;
> >         +                    }
> >         +                    match_set_dl_src(&match, laddrs.ea);
> >         +                    destroy_lport_addresses(&laddrs);
> >         +                    break;
> >         +                }
> >         +                for (int j = 0; j <
> >         ld->peer_dps[i]->peer->n_mac; j++) {
> >         +                    struct lport_addresses laddrs;
> >         +                    uint64_t mac64;
> >         +                    if (!extract_lsp_addresses(
> >         +                        ld->peer_dps[i]->peer->mac[j],
> &laddrs)) {
> >         +                        continue;
> >         +                    }
> >         +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> >         +                    put_load(mac64,
> >         +                             MFF_ETH_SRC, 0, 48, clone);
> >         +                    destroy_lport_addresses(&laddrs);
> >         +                    break;
> >         +                }
> >         +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
> >         +                ofctrl_add_flow(flow_table,
> >         OFTABLE_LOCAL_OUTPUT, 150, 0,
> >         +                    &match, clone);
> >         +            }
> >         +        }
> >         +    }
> >         +    ofpbuf_delete(clone);
> >         +
> >                /* Table 34, Priority 100.
> >                 * =======================
> >                 *
> >         @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
> >         *sbrec_chassis_by_name,
> >                      struct zone_ids binding_zones =
> >         get_zone_ids(binding, ct_zones);
> >                    put_local_common_flows(dp_key, port_key, false,
> >         &binding_zones,
> >         -                               ofpacts_p, flow_table);
> >         +                               ofpacts_p, flow_table, ld,
> >         local_datapaths);
> >                      match_init_catchall(&match);
> >                    ofpbuf_clear(ofpacts_p);
> >         @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
> >         *sbrec_chassis_by_name,
> >                      struct zone_ids zone_ids = get_zone_ids(binding,
> >         ct_zones);
> >                    put_local_common_flows(dp_key, port_key,
> >         nested_container, &zone_ids,
> >         -                               ofpacts_p, flow_table);
> >         +                               ofpacts_p, flow_table, ld,
> >         local_datapaths);
> >                      /* Table 0, Priority 200, 150 and 100.
> >                     * ==============================
> >         diff --git a/ovn/northd/ovn-northd.8.xml
> >         b/ovn/northd/ovn-northd.8.xml
> >         index 8fa5272..876c121 100644
> >         --- a/ovn/northd/ovn-northd.8.xml
> >         +++ b/ovn/northd/ovn-northd.8.xml
> >         @@ -2013,6 +2013,16 @@ next;
> >                  </li>
> >                    <li>
> >         +        A priority-100 logical flow with match
> >         +        <code>inport == <var>GW</var> &amp;&amp;
> >         +        flags.rcv_from_vlan == 1</code> has actions
> >         +        <code>eth.dst = <var>E</var>; next;</code>, where
> >         +        <var>GW</var> is the logical router distributed gateway
> >         +        port and <var>E</var> is the MAC address of router
> >         +        distributed gateway port.
> >         +      </li>
> >         +
> >         +      <li>
> >                    For each NAT rule in the OVN Northbound database that
> can
> >                    be handled in a distributed manner, a priority-100
> >         logical
> >                    flow with match <code>ip4.src == <var>B</var>
> &amp;&amp;
> >         diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >         index bcf0b66..d012bb8 100644
> >         --- a/ovn/northd/ovn-northd.c
> >         +++ b/ovn/northd/ovn-northd.c
> >         @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const
> >         struct ovn_port *op,
> >                } else {
> >                    ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" :
> "6");
> >                }
> >         +
> >         +    if (op->peer && op->peer->od->localnet_port &&
> >         +        op->od->l3dgw_port && op->od->l3redirect_port &&
> >         +        (op != op->od->l3redirect_port) &&
> >         +        (op != op->od->l3dgw_port)) {
> >         +        ds_put_format(&match, " && is_chassis_resident(%s)",
> >         +                      op->od->l3redirect_port->json_key);
> >         +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
> >         +    }
> >                ds_put_format(&actions, "; "
> >                              "%sreg1 = %s; "
> >                              "eth.src = %s; "
> >         @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap
> >         *datapaths, struct hmap *ports,
> >
> op->lrp_networks.ipv6_addrs[i].network_s,
> >                                  op->lrp_networks.ipv6_addrs[i].plen,
> >         NULL, NULL);
> >                    }
> >         +
> >         +        /* For a reply packet from gateway with VLAN switch
> port as
> >         +         * destination, replace router internal port MAC with
> >         router gateway
> >         +         * MAC address, so that external switches can learn
> >         gateway MAC
> >         +         * address. Later before delivering the packet to the
> port,
> >         +         * controller will replace the gateway MAC with router
> >         internal port
> >         +         * MAC in table 33. */
> >         +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
> >         +            op->od->l3redirect_port) {
> >         +            ds_clear(&actions);
> >         +            ds_clear(&match);
> >         +            ds_put_format(&match, "inport == %s", op->json_key);
> >         +            ds_put_format(&match, " && flags.rcv_from_vlan ==
> 1");
> >         +            ds_put_format(&match, " && is_chassis_resident(%s)",
> >         +                          op->od->l3redirect_port->json_key);
> >         +            ds_put_format(&actions,
> >         +                          "eth.src = %s; next;",
> >         op->lrp_networks.ea_s);
> >         +            ovn_lflow_add(lflows, op->od,
> >         S_ROUTER_IN_GW_REDIRECT, 100,
> >         +                          ds_cstr(&match), ds_cstr(&actions));
> >         +        }
> >                }
> >                  /* Convert the static routes to flows. */
> >         diff --git a/ovn/ovn-architecture.7.xml
> b/ovn/ovn-architecture.7.xml
> >         index ad2101c..0de41d2 100644
> >         --- a/ovn/ovn-architecture.7.xml
> >         +++ b/ovn/ovn-architecture.7.xml
> >         @@ -1067,7 +1067,9 @@
> >                    <p>
> >                    Flows in table 33 resemble those in table 32 but for
> >         logical ports that
> >         -        reside locally rather than remotely.  For unicast
> >         logical output ports
> >         +        reside locally rather than remotely.  If these are VLAN
> >         ports and
> >         +        packet has router gateway port MAC address as source,
> >         replace it with
> >         +        router internal port MAC address. For unicast logical
> >         output ports
> >                    on the local hypervisor, the actions just resubmit to
> >         table 34.  For
> >                    multicast output ports that include one or more
> >         logical ports on the
> >                    local hypervisor, for each such logical port
> >         <var>P</var>, the actions
> >
> >
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Anil Venkata Aug. 14, 2018, 3:18 p.m. | #5
Gentle reminder requesting more reviews..

Mark has reviewed the patch series and Miguel has already tested the patch
series.

Thanks
Anil

On Tue, Aug 14, 2018 at 7:45 PM, Miguel Angel Ajo Pelayo <
majopela@redhat.com> wrote:

> Thank you very much,
>
> I wasn't able to perform a proper code review, but you can add the
> Tested-By: Miguel Angel Ajo <majopela@redhat.com>
>
> The issue I described previously seems to be fixed on the v7 of the series.
>
>
> On Mon, Aug 6, 2018 at 9:19 PM Mark Michelson <mmichels@redhat.com> wrote:
>
>> On 08/06/2018 01:58 PM, Anil Venkata wrote:
>> > Thanks Mark. Kindly look at my comment inline.
>> >
>> > On Fri, Aug 3, 2018 at 2:17 AM, Mark Michelson <mmichels@redhat.com
>> > <mailto:mmichels@redhat.com>> wrote:
>> >
>> >     On 08/01/2018 08:16 AM, vkommadi@redhat.com
>> >     <mailto:vkommadi@redhat.com> wrote:
>> >
>> >         From: venkata anil <vkommadi@redhat.com
>> >         <mailto:vkommadi@redhat.com>>
>> >
>> >         Previous patches in the series doesn't address issue 1 explained
>> >         in [1]
>> >         i.e
>> >         1) removal of router gateway port MAC address on external
>> switches
>> >              after expiring of aging time.
>> >         2) then external switches unable to learn the gateway MAC as
>> >              reply packets carry router internal port MAC address as
>> source
>> >
>> >         To fix this, router on gateway node will use router gateway MAC
>> >         address
>> >         instead of router internal port MAC address as source for reply
>> >         packets,
>> >         so that external switches can learn gateway MAC address.
>> >         This is done only for reply packets from router gateway to
>> >         tenant VLAN
>> >         switch ports.
>> >         Later before delivering the packet to the port, ovn-controller
>> will
>> >         replace the gateway MAC with router internal port MAC in table
>> 33.
>> >
>> >         [1]
>> >         //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
>> >         <http://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
>> 349803.html>
>> >
>> >         Reported-by: Miguel Angel Ajo <majopela@redhat.com
>> >         <mailto:majopela@redhat.com>>
>> >         Reported-at:
>> >         https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
>> 349803.html
>> >         <https://mail.openvswitch.org/pipermail/ovs-dev/2018-
>> July/349803.html>
>> >         Signed-off-by: Venkata Anil <vkommadi@redhat.com
>> >         <mailto:vkommadi@redhat.com>>
>> >         ---
>> >
>> >         v6->v7:
>> >         * Added this patch
>> >
>> >
>> >            ovn/controller/physical.c   | 60
>> >         ++++++++++++++++++++++++++++++++++++++++++---
>> >            ovn/northd/ovn-northd.8.xml | 10 ++++++++
>> >            ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>> >            ovn/ovn-architecture.7.xml  |  4 ++-
>> >            4 files changed, 99 insertions(+), 4 deletions(-)
>> >
>> >         diff --git a/ovn/controller/physical.c
>> b/ovn/controller/physical.c
>> >         index f269a1d..1f41f59 100644
>> >         --- a/ovn/controller/physical.c
>> >         +++ b/ovn/controller/physical.c
>> >         @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
>> >         *binding,
>> >            static void
>> >            put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>> >                                   bool nested_container, const struct
>> >         zone_ids *zone_ids,
>> >         -                       struct ofpbuf *ofpacts_p, struct hmap
>> >         *flow_table)
>> >         +                       struct ofpbuf *ofpacts_p, struct hmap
>> >         *flow_table,
>> >         +                       struct local_datapath *ld,
>> >         +                       const struct hmap *local_datapaths)
>> >            {
>> >                struct match match;
>> >            @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t
>> dp_key,
>> >         uint32_t port_key,
>> >                    }
>> >                }
>> >            +    struct ofpbuf *clone = NULL;
>> >         +    clone = ofpbuf_clone(ofpacts_p);
>> >         +
>> >
>> >
>> >     I don't understand the use of the cloned ofpbuf here. You could just
>> >     ofpbuf_clear(ofpacts_p) in each iteration of the for loop below and
>> >     reuse ofpacts_p where you've used clone below.
>> >
>> >
>> >
>> >
>> > I need to add additional flow with priority 150 along with default flow
>> > which exists with prioirty 100.
>> >
>> > 1) table=33,  priority=100,reg15=0x5,metadata=0x2
>> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
>> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
>> > 2) table=33,
>> >   priority=150,reg15=0x5,metadata=0x2,dl_src=fa:16:3e:48:66:e7
>> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
>> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
>> 65:f2:ae,resubmit(,34)
>> >
>> > The first flow with priority 100 is the default flow. This flow is
>> added
>> > for each output port residing on the current chassis, sets some
>> > registers and resubmits to table 34.
>> > But when the packet has router gateway MAC as source MAC address, I
>> want
>> > to replace the MAC before resubmitting to table 34. Second flow has
>> same
>> > match conditions as first flow and also same actions, additionally it
>> > checks for the source MAC and then modifies the source MAC address.
>> >
>> > Before I construct the second flow, I have the registers and resubmit
>> as
>> > part of actions i.e
>> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
>> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
>> >
>> > If I simply add MAC to actions with put_load(mac64, MFF_ETH_SRC, 0, 48,
>> > ofpacts_p); then actions in 2nd flow(with 150 priority) will look like
>> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
>> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34),mod_dl_
>> src:fa:16:3e:65:f2:ae
>> > i.e modifying the MAC will happen after resubmit, which is not the
>> > expected behaviour.
>> >
>> > So I am cloning the ofpacts_p before resubmit is added to it and using
>> > the cloned one in constructing flow with priority 150.
>> > With this changes, actions will look like
>> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
>> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
>> 65:f2:ae,resubmit(,34)
>>
>> Thank you very much for the explanation. It makes a lot more sense now.
>>
>> >
>> >
>> >                /* Resubmit to table 34. */
>> >                put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>> >                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>> >                                &match, ofpacts_p);
>> >            +    /* For a reply packet from gateway with VLAN switch port
>> >         as destination
>> >         +     * (excluding localnet_port and external VLAN networks),
>> >         gateway router
>> >         +     * will use gateway MAC address as source MAC instead of
>> >         router internal
>> >         +     * port MAC, so that external switches can learn gateway
>> >         MAC address.
>> >         +     * Here (before packet is given to the port) we replace
>> >         router gateway
>> >         +     * MAC address with router internal port MAC. */
>> >         +    if (ld->localnet_port && (port_key !=
>> >         ld->localnet_port->tunnel_key)) {
>> >         +        for (int i = 0; i < ld->n_peer_dps; i++) {
>> >         +            struct local_datapath *peer_ldp =
>> get_local_datapath(
>> >         +                local_datapaths,
>> >         ld->peer_dps[i]->peer_dp->tunnel_key);
>> >         +            const struct sbrec_port_binding *crp;
>> >         +            crp = peer_ldp->chassisredirect_port;
>> >         +            if (!crp) {
>> >         +                continue;
>> >         +            }
>> >         +
>> >         +            if (strcmp(smap_get(&crp->options,
>> "distributed-port"),
>> >         +                       ld->peer_dps[i]->peer->logical_port) &&
>> >         +                (port_key != ld->peer_dps[i]->patch->tunnel_key))
>> {
>> >         +                for (int j = 0; j < crp->n_mac; j++) {
>> >         +                    struct lport_addresses laddrs;
>> >         +                    if (!extract_lsp_addresses(crp->mac[j],
>> >         &laddrs)) {
>> >         +                        continue;
>> >         +                    }
>> >         +                    match_set_dl_src(&match, laddrs.ea);
>> >         +                    destroy_lport_addresses(&laddrs);
>> >         +                    break;
>> >         +                }
>> >         +                for (int j = 0; j <
>> >         ld->peer_dps[i]->peer->n_mac; j++) {
>> >         +                    struct lport_addresses laddrs;
>> >         +                    uint64_t mac64;
>> >         +                    if (!extract_lsp_addresses(
>> >         +                        ld->peer_dps[i]->peer->mac[j],
>> &laddrs)) {
>> >         +                        continue;
>> >         +                    }
>> >         +                    mac64 = eth_addr_to_uint64(laddrs.ea);
>> >         +                    put_load(mac64,
>> >         +                             MFF_ETH_SRC, 0, 48, clone);
>> >         +                    destroy_lport_addresses(&laddrs);
>> >         +                    break;
>> >         +                }
>> >         +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
>> >         +                ofctrl_add_flow(flow_table,
>> >         OFTABLE_LOCAL_OUTPUT, 150, 0,
>> >         +                    &match, clone);
>> >         +            }
>> >         +        }
>> >         +    }
>> >         +    ofpbuf_delete(clone);
>> >         +
>> >                /* Table 34, Priority 100.
>> >                 * =======================
>> >                 *
>> >         @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
>> >         *sbrec_chassis_by_name,
>> >                      struct zone_ids binding_zones =
>> >         get_zone_ids(binding, ct_zones);
>> >                    put_local_common_flows(dp_key, port_key, false,
>> >         &binding_zones,
>> >         -                               ofpacts_p, flow_table);
>> >         +                               ofpacts_p, flow_table, ld,
>> >         local_datapaths);
>> >                      match_init_catchall(&match);
>> >                    ofpbuf_clear(ofpacts_p);
>> >         @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
>> >         *sbrec_chassis_by_name,
>> >                      struct zone_ids zone_ids = get_zone_ids(binding,
>> >         ct_zones);
>> >                    put_local_common_flows(dp_key, port_key,
>> >         nested_container, &zone_ids,
>> >         -                               ofpacts_p, flow_table);
>> >         +                               ofpacts_p, flow_table, ld,
>> >         local_datapaths);
>> >                      /* Table 0, Priority 200, 150 and 100.
>> >                     * ==============================
>> >         diff --git a/ovn/northd/ovn-northd.8.xml
>> >         b/ovn/northd/ovn-northd.8.xml
>> >         index 8fa5272..876c121 100644
>> >         --- a/ovn/northd/ovn-northd.8.xml
>> >         +++ b/ovn/northd/ovn-northd.8.xml
>> >         @@ -2013,6 +2013,16 @@ next;
>> >                  </li>
>> >                    <li>
>> >         +        A priority-100 logical flow with match
>> >         +        <code>inport == <var>GW</var> &amp;&amp;
>> >         +        flags.rcv_from_vlan == 1</code> has actions
>> >         +        <code>eth.dst = <var>E</var>; next;</code>, where
>> >         +        <var>GW</var> is the logical router distributed gateway
>> >         +        port and <var>E</var> is the MAC address of router
>> >         +        distributed gateway port.
>> >         +      </li>
>> >         +
>> >         +      <li>
>> >                    For each NAT rule in the OVN Northbound database
>> that can
>> >                    be handled in a distributed manner, a priority-100
>> >         logical
>> >                    flow with match <code>ip4.src == <var>B</var>
>> &amp;&amp;
>> >         diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> >         index bcf0b66..d012bb8 100644
>> >         --- a/ovn/northd/ovn-northd.c
>> >         +++ b/ovn/northd/ovn-northd.c
>> >         @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const
>> >         struct ovn_port *op,
>> >                } else {
>> >                    ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" :
>> "6");
>> >                }
>> >         +
>> >         +    if (op->peer && op->peer->od->localnet_port &&
>> >         +        op->od->l3dgw_port && op->od->l3redirect_port &&
>> >         +        (op != op->od->l3redirect_port) &&
>> >         +        (op != op->od->l3dgw_port)) {
>> >         +        ds_put_format(&match, " && is_chassis_resident(%s)",
>> >         +                      op->od->l3redirect_port->json_key);
>> >         +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
>> >         +    }
>> >                ds_put_format(&actions, "; "
>> >                              "%sreg1 = %s; "
>> >                              "eth.src = %s; "
>> >         @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap
>> >         *datapaths, struct hmap *ports,
>> >                                  op->lrp_networks.ipv6_addrs[i]
>> .network_s,
>> >                                  op->lrp_networks.ipv6_addrs[i].plen,
>> >         NULL, NULL);
>> >                    }
>> >         +
>> >         +        /* For a reply packet from gateway with VLAN switch
>> port as
>> >         +         * destination, replace router internal port MAC with
>> >         router gateway
>> >         +         * MAC address, so that external switches can learn
>> >         gateway MAC
>> >         +         * address. Later before delivering the packet to the
>> port,
>> >         +         * controller will replace the gateway MAC with router
>> >         internal port
>> >         +         * MAC in table 33. */
>> >         +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
>> >         +            op->od->l3redirect_port) {
>> >         +            ds_clear(&actions);
>> >         +            ds_clear(&match);
>> >         +            ds_put_format(&match, "inport == %s",
>> op->json_key);
>> >         +            ds_put_format(&match, " && flags.rcv_from_vlan ==
>> 1");
>> >         +            ds_put_format(&match, " &&
>> is_chassis_resident(%s)",
>> >         +                          op->od->l3redirect_port->json_key);
>> >         +            ds_put_format(&actions,
>> >         +                          "eth.src = %s; next;",
>> >         op->lrp_networks.ea_s);
>> >         +            ovn_lflow_add(lflows, op->od,
>> >         S_ROUTER_IN_GW_REDIRECT, 100,
>> >         +                          ds_cstr(&match), ds_cstr(&actions));
>> >         +        }
>> >                }
>> >                  /* Convert the static routes to flows. */
>> >         diff --git a/ovn/ovn-architecture.7.xml
>> b/ovn/ovn-architecture.7.xml
>> >         index ad2101c..0de41d2 100644
>> >         --- a/ovn/ovn-architecture.7.xml
>> >         +++ b/ovn/ovn-architecture.7.xml
>> >         @@ -1067,7 +1067,9 @@
>> >                    <p>
>> >                    Flows in table 33 resemble those in table 32 but for
>> >         logical ports that
>> >         -        reside locally rather than remotely.  For unicast
>> >         logical output ports
>> >         +        reside locally rather than remotely.  If these are VLAN
>> >         ports and
>> >         +        packet has router gateway port MAC address as source,
>> >         replace it with
>> >         +        router internal port MAC address. For unicast logical
>> >         output ports
>> >                    on the local hypervisor, the actions just resubmit to
>> >         table 34.  For
>> >                    multicast output ports that include one or more
>> >         logical ports on the
>> >                    local hypervisor, for each such logical port
>> >         <var>P</var>, the actions
>> >
>> >
>> >
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Ben Pfaff Aug. 15, 2018, 1:10 a.m. | #6
I've asked Justin to take a look at this series.

On Tue, Aug 14, 2018 at 08:48:25PM +0530, Anil Venkata wrote:
> Gentle reminder requesting more reviews..
> 
> Mark has reviewed the patch series and Miguel has already tested the patch
> series.
> 
> Thanks
> Anil
> 
> On Tue, Aug 14, 2018 at 7:45 PM, Miguel Angel Ajo Pelayo <
> majopela@redhat.com> wrote:
> 
> > Thank you very much,
> >
> > I wasn't able to perform a proper code review, but you can add the
> > Tested-By: Miguel Angel Ajo <majopela@redhat.com>
> >
> > The issue I described previously seems to be fixed on the v7 of the series.
> >
> >
> > On Mon, Aug 6, 2018 at 9:19 PM Mark Michelson <mmichels@redhat.com> wrote:
> >
> >> On 08/06/2018 01:58 PM, Anil Venkata wrote:
> >> > Thanks Mark. Kindly look at my comment inline.
> >> >
> >> > On Fri, Aug 3, 2018 at 2:17 AM, Mark Michelson <mmichels@redhat.com
> >> > <mailto:mmichels@redhat.com>> wrote:
> >> >
> >> >     On 08/01/2018 08:16 AM, vkommadi@redhat.com
> >> >     <mailto:vkommadi@redhat.com> wrote:
> >> >
> >> >         From: venkata anil <vkommadi@redhat.com
> >> >         <mailto:vkommadi@redhat.com>>
> >> >
> >> >         Previous patches in the series doesn't address issue 1 explained
> >> >         in [1]
> >> >         i.e
> >> >         1) removal of router gateway port MAC address on external
> >> switches
> >> >              after expiring of aging time.
> >> >         2) then external switches unable to learn the gateway MAC as
> >> >              reply packets carry router internal port MAC address as
> >> source
> >> >
> >> >         To fix this, router on gateway node will use router gateway MAC
> >> >         address
> >> >         instead of router internal port MAC address as source for reply
> >> >         packets,
> >> >         so that external switches can learn gateway MAC address.
> >> >         This is done only for reply packets from router gateway to
> >> >         tenant VLAN
> >> >         switch ports.
> >> >         Later before delivering the packet to the port, ovn-controller
> >> will
> >> >         replace the gateway MAC with router internal port MAC in table
> >> 33.
> >> >
> >> >         [1]
> >> >         //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
> >> >         <http://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> >> 349803.html>
> >> >
> >> >         Reported-by: Miguel Angel Ajo <majopela@redhat.com
> >> >         <mailto:majopela@redhat.com>>
> >> >         Reported-at:
> >> >         https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> >> 349803.html
> >> >         <https://mail.openvswitch.org/pipermail/ovs-dev/2018-
> >> July/349803.html>
> >> >         Signed-off-by: Venkata Anil <vkommadi@redhat.com
> >> >         <mailto:vkommadi@redhat.com>>
> >> >         ---
> >> >
> >> >         v6->v7:
> >> >         * Added this patch
> >> >
> >> >
> >> >            ovn/controller/physical.c   | 60
> >> >         ++++++++++++++++++++++++++++++++++++++++++---
> >> >            ovn/northd/ovn-northd.8.xml | 10 ++++++++
> >> >            ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
> >> >            ovn/ovn-architecture.7.xml  |  4 ++-
> >> >            4 files changed, 99 insertions(+), 4 deletions(-)
> >> >
> >> >         diff --git a/ovn/controller/physical.c
> >> b/ovn/controller/physical.c
> >> >         index f269a1d..1f41f59 100644
> >> >         --- a/ovn/controller/physical.c
> >> >         +++ b/ovn/controller/physical.c
> >> >         @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
> >> >         *binding,
> >> >            static void
> >> >            put_local_common_flows(uint32_t dp_key, uint32_t port_key,
> >> >                                   bool nested_container, const struct
> >> >         zone_ids *zone_ids,
> >> >         -                       struct ofpbuf *ofpacts_p, struct hmap
> >> >         *flow_table)
> >> >         +                       struct ofpbuf *ofpacts_p, struct hmap
> >> >         *flow_table,
> >> >         +                       struct local_datapath *ld,
> >> >         +                       const struct hmap *local_datapaths)
> >> >            {
> >> >                struct match match;
> >> >            @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t
> >> dp_key,
> >> >         uint32_t port_key,
> >> >                    }
> >> >                }
> >> >            +    struct ofpbuf *clone = NULL;
> >> >         +    clone = ofpbuf_clone(ofpacts_p);
> >> >         +
> >> >
> >> >
> >> >     I don't understand the use of the cloned ofpbuf here. You could just
> >> >     ofpbuf_clear(ofpacts_p) in each iteration of the for loop below and
> >> >     reuse ofpacts_p where you've used clone below.
> >> >
> >> >
> >> >
> >> >
> >> > I need to add additional flow with priority 150 along with default flow
> >> > which exists with prioirty 100.
> >> >
> >> > 1) table=33,  priority=100,reg15=0x5,metadata=0x2
> >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> >> > 2) table=33,
> >> >   priority=150,reg15=0x5,metadata=0x2,dl_src=fa:16:3e:48:66:e7
> >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> >> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
> >> 65:f2:ae,resubmit(,34)
> >> >
> >> > The first flow with priority 100 is the default flow. This flow is
> >> added
> >> > for each output port residing on the current chassis, sets some
> >> > registers and resubmits to table 34.
> >> > But when the packet has router gateway MAC as source MAC address, I
> >> want
> >> > to replace the MAC before resubmitting to table 34. Second flow has
> >> same
> >> > match conditions as first flow and also same actions, additionally it
> >> > checks for the source MAC and then modifies the source MAC address.
> >> >
> >> > Before I construct the second flow, I have the registers and resubmit
> >> as
> >> > part of actions i.e
> >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> >> >
> >> > If I simply add MAC to actions with put_load(mac64, MFF_ETH_SRC, 0, 48,
> >> > ofpacts_p); then actions in 2nd flow(with 150 priority) will look like
> >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34),mod_dl_
> >> src:fa:16:3e:65:f2:ae
> >> > i.e modifying the MAC will happen after resubmit, which is not the
> >> > expected behaviour.
> >> >
> >> > So I am cloning the ofpacts_p before resubmit is added to it and using
> >> > the cloned one in constructing flow with priority 150.
> >> > With this changes, actions will look like
> >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> >> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
> >> 65:f2:ae,resubmit(,34)
> >>
> >> Thank you very much for the explanation. It makes a lot more sense now.
> >>
> >> >
> >> >
> >> >                /* Resubmit to table 34. */
> >> >                put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> >> >                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
> >> >                                &match, ofpacts_p);
> >> >            +    /* For a reply packet from gateway with VLAN switch port
> >> >         as destination
> >> >         +     * (excluding localnet_port and external VLAN networks),
> >> >         gateway router
> >> >         +     * will use gateway MAC address as source MAC instead of
> >> >         router internal
> >> >         +     * port MAC, so that external switches can learn gateway
> >> >         MAC address.
> >> >         +     * Here (before packet is given to the port) we replace
> >> >         router gateway
> >> >         +     * MAC address with router internal port MAC. */
> >> >         +    if (ld->localnet_port && (port_key !=
> >> >         ld->localnet_port->tunnel_key)) {
> >> >         +        for (int i = 0; i < ld->n_peer_dps; i++) {
> >> >         +            struct local_datapath *peer_ldp =
> >> get_local_datapath(
> >> >         +                local_datapaths,
> >> >         ld->peer_dps[i]->peer_dp->tunnel_key);
> >> >         +            const struct sbrec_port_binding *crp;
> >> >         +            crp = peer_ldp->chassisredirect_port;
> >> >         +            if (!crp) {
> >> >         +                continue;
> >> >         +            }
> >> >         +
> >> >         +            if (strcmp(smap_get(&crp->options,
> >> "distributed-port"),
> >> >         +                       ld->peer_dps[i]->peer->logical_port) &&
> >> >         +                (port_key != ld->peer_dps[i]->patch->tunnel_key))
> >> {
> >> >         +                for (int j = 0; j < crp->n_mac; j++) {
> >> >         +                    struct lport_addresses laddrs;
> >> >         +                    if (!extract_lsp_addresses(crp->mac[j],
> >> >         &laddrs)) {
> >> >         +                        continue;
> >> >         +                    }
> >> >         +                    match_set_dl_src(&match, laddrs.ea);
> >> >         +                    destroy_lport_addresses(&laddrs);
> >> >         +                    break;
> >> >         +                }
> >> >         +                for (int j = 0; j <
> >> >         ld->peer_dps[i]->peer->n_mac; j++) {
> >> >         +                    struct lport_addresses laddrs;
> >> >         +                    uint64_t mac64;
> >> >         +                    if (!extract_lsp_addresses(
> >> >         +                        ld->peer_dps[i]->peer->mac[j],
> >> &laddrs)) {
> >> >         +                        continue;
> >> >         +                    }
> >> >         +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> >> >         +                    put_load(mac64,
> >> >         +                             MFF_ETH_SRC, 0, 48, clone);
> >> >         +                    destroy_lport_addresses(&laddrs);
> >> >         +                    break;
> >> >         +                }
> >> >         +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
> >> >         +                ofctrl_add_flow(flow_table,
> >> >         OFTABLE_LOCAL_OUTPUT, 150, 0,
> >> >         +                    &match, clone);
> >> >         +            }
> >> >         +        }
> >> >         +    }
> >> >         +    ofpbuf_delete(clone);
> >> >         +
> >> >                /* Table 34, Priority 100.
> >> >                 * =======================
> >> >                 *
> >> >         @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
> >> >         *sbrec_chassis_by_name,
> >> >                      struct zone_ids binding_zones =
> >> >         get_zone_ids(binding, ct_zones);
> >> >                    put_local_common_flows(dp_key, port_key, false,
> >> >         &binding_zones,
> >> >         -                               ofpacts_p, flow_table);
> >> >         +                               ofpacts_p, flow_table, ld,
> >> >         local_datapaths);
> >> >                      match_init_catchall(&match);
> >> >                    ofpbuf_clear(ofpacts_p);
> >> >         @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
> >> >         *sbrec_chassis_by_name,
> >> >                      struct zone_ids zone_ids = get_zone_ids(binding,
> >> >         ct_zones);
> >> >                    put_local_common_flows(dp_key, port_key,
> >> >         nested_container, &zone_ids,
> >> >         -                               ofpacts_p, flow_table);
> >> >         +                               ofpacts_p, flow_table, ld,
> >> >         local_datapaths);
> >> >                      /* Table 0, Priority 200, 150 and 100.
> >> >                     * ==============================
> >> >         diff --git a/ovn/northd/ovn-northd.8.xml
> >> >         b/ovn/northd/ovn-northd.8.xml
> >> >         index 8fa5272..876c121 100644
> >> >         --- a/ovn/northd/ovn-northd.8.xml
> >> >         +++ b/ovn/northd/ovn-northd.8.xml
> >> >         @@ -2013,6 +2013,16 @@ next;
> >> >                  </li>
> >> >                    <li>
> >> >         +        A priority-100 logical flow with match
> >> >         +        <code>inport == <var>GW</var> &amp;&amp;
> >> >         +        flags.rcv_from_vlan == 1</code> has actions
> >> >         +        <code>eth.dst = <var>E</var>; next;</code>, where
> >> >         +        <var>GW</var> is the logical router distributed gateway
> >> >         +        port and <var>E</var> is the MAC address of router
> >> >         +        distributed gateway port.
> >> >         +      </li>
> >> >         +
> >> >         +      <li>
> >> >                    For each NAT rule in the OVN Northbound database
> >> that can
> >> >                    be handled in a distributed manner, a priority-100
> >> >         logical
> >> >                    flow with match <code>ip4.src == <var>B</var>
> >> &amp;&amp;
> >> >         diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >> >         index bcf0b66..d012bb8 100644
> >> >         --- a/ovn/northd/ovn-northd.c
> >> >         +++ b/ovn/northd/ovn-northd.c
> >> >         @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const
> >> >         struct ovn_port *op,
> >> >                } else {
> >> >                    ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" :
> >> "6");
> >> >                }
> >> >         +
> >> >         +    if (op->peer && op->peer->od->localnet_port &&
> >> >         +        op->od->l3dgw_port && op->od->l3redirect_port &&
> >> >         +        (op != op->od->l3redirect_port) &&
> >> >         +        (op != op->od->l3dgw_port)) {
> >> >         +        ds_put_format(&match, " && is_chassis_resident(%s)",
> >> >         +                      op->od->l3redirect_port->json_key);
> >> >         +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
> >> >         +    }
> >> >                ds_put_format(&actions, "; "
> >> >                              "%sreg1 = %s; "
> >> >                              "eth.src = %s; "
> >> >         @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap
> >> >         *datapaths, struct hmap *ports,
> >> >                                  op->lrp_networks.ipv6_addrs[i]
> >> .network_s,
> >> >                                  op->lrp_networks.ipv6_addrs[i].plen,
> >> >         NULL, NULL);
> >> >                    }
> >> >         +
> >> >         +        /* For a reply packet from gateway with VLAN switch
> >> port as
> >> >         +         * destination, replace router internal port MAC with
> >> >         router gateway
> >> >         +         * MAC address, so that external switches can learn
> >> >         gateway MAC
> >> >         +         * address. Later before delivering the packet to the
> >> port,
> >> >         +         * controller will replace the gateway MAC with router
> >> >         internal port
> >> >         +         * MAC in table 33. */
> >> >         +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
> >> >         +            op->od->l3redirect_port) {
> >> >         +            ds_clear(&actions);
> >> >         +            ds_clear(&match);
> >> >         +            ds_put_format(&match, "inport == %s",
> >> op->json_key);
> >> >         +            ds_put_format(&match, " && flags.rcv_from_vlan ==
> >> 1");
> >> >         +            ds_put_format(&match, " &&
> >> is_chassis_resident(%s)",
> >> >         +                          op->od->l3redirect_port->json_key);
> >> >         +            ds_put_format(&actions,
> >> >         +                          "eth.src = %s; next;",
> >> >         op->lrp_networks.ea_s);
> >> >         +            ovn_lflow_add(lflows, op->od,
> >> >         S_ROUTER_IN_GW_REDIRECT, 100,
> >> >         +                          ds_cstr(&match), ds_cstr(&actions));
> >> >         +        }
> >> >                }
> >> >                  /* Convert the static routes to flows. */
> >> >         diff --git a/ovn/ovn-architecture.7.xml
> >> b/ovn/ovn-architecture.7.xml
> >> >         index ad2101c..0de41d2 100644
> >> >         --- a/ovn/ovn-architecture.7.xml
> >> >         +++ b/ovn/ovn-architecture.7.xml
> >> >         @@ -1067,7 +1067,9 @@
> >> >                    <p>
> >> >                    Flows in table 33 resemble those in table 32 but for
> >> >         logical ports that
> >> >         -        reside locally rather than remotely.  For unicast
> >> >         logical output ports
> >> >         +        reside locally rather than remotely.  If these are VLAN
> >> >         ports and
> >> >         +        packet has router gateway port MAC address as source,
> >> >         replace it with
> >> >         +        router internal port MAC address. For unicast logical
> >> >         output ports
> >> >                    on the local hypervisor, the actions just resubmit to
> >> >         table 34.  For
> >> >                    multicast output ports that include one or more
> >> >         logical ports on the
> >> >                    local hypervisor, for each such logical port
> >> >         <var>P</var>, the actions
> >> >
> >> >
> >> >
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Anil Venkata Aug. 16, 2018, 8:35 a.m. | #7
Thanks Ben

On Wed, Aug 15, 2018 at 6:40 AM, Ben Pfaff <blp@ovn.org> wrote:

> I've asked Justin to take a look at this series.
>
> On Tue, Aug 14, 2018 at 08:48:25PM +0530, Anil Venkata wrote:
> > Gentle reminder requesting more reviews..
> >
> > Mark has reviewed the patch series and Miguel has already tested the
> patch
> > series.
> >
> > Thanks
> > Anil
> >
> > On Tue, Aug 14, 2018 at 7:45 PM, Miguel Angel Ajo Pelayo <
> > majopela@redhat.com> wrote:
> >
> > > Thank you very much,
> > >
> > > I wasn't able to perform a proper code review, but you can add the
> > > Tested-By: Miguel Angel Ajo <majopela@redhat.com>
> > >
> > > The issue I described previously seems to be fixed on the v7 of the
> series.
> > >
> > >
> > > On Mon, Aug 6, 2018 at 9:19 PM Mark Michelson <mmichels@redhat.com>
> wrote:
> > >
> > >> On 08/06/2018 01:58 PM, Anil Venkata wrote:
> > >> > Thanks Mark. Kindly look at my comment inline.
> > >> >
> > >> > On Fri, Aug 3, 2018 at 2:17 AM, Mark Michelson <mmichels@redhat.com
> > >> > <mailto:mmichels@redhat.com>> wrote:
> > >> >
> > >> >     On 08/01/2018 08:16 AM, vkommadi@redhat.com
> > >> >     <mailto:vkommadi@redhat.com> wrote:
> > >> >
> > >> >         From: venkata anil <vkommadi@redhat.com
> > >> >         <mailto:vkommadi@redhat.com>>
> > >> >
> > >> >         Previous patches in the series doesn't address issue 1
> explained
> > >> >         in [1]
> > >> >         i.e
> > >> >         1) removal of router gateway port MAC address on external
> > >> switches
> > >> >              after expiring of aging time.
> > >> >         2) then external switches unable to learn the gateway MAC as
> > >> >              reply packets carry router internal port MAC address as
> > >> source
> > >> >
> > >> >         To fix this, router on gateway node will use router gateway
> MAC
> > >> >         address
> > >> >         instead of router internal port MAC address as source for
> reply
> > >> >         packets,
> > >> >         so that external switches can learn gateway MAC address.
> > >> >         This is done only for reply packets from router gateway to
> > >> >         tenant VLAN
> > >> >         switch ports.
> > >> >         Later before delivering the packet to the port,
> ovn-controller
> > >> will
> > >> >         replace the gateway MAC with router internal port MAC in
> table
> > >> 33.
> > >> >
> > >> >         [1]
> > >> >         //mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> 349803.html
> > >> >         <http://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> > >> 349803.html>
> > >> >
> > >> >         Reported-by: Miguel Angel Ajo <majopela@redhat.com
> > >> >         <mailto:majopela@redhat.com>>
> > >> >         Reported-at:
> > >> >         https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> > >> 349803.html
> > >> >         <https://mail.openvswitch.org/pipermail/ovs-dev/2018-
> > >> July/349803.html>
> > >> >         Signed-off-by: Venkata Anil <vkommadi@redhat.com
> > >> >         <mailto:vkommadi@redhat.com>>
> > >> >         ---
> > >> >
> > >> >         v6->v7:
> > >> >         * Added this patch
> > >> >
> > >> >
> > >> >            ovn/controller/physical.c   | 60
> > >> >         ++++++++++++++++++++++++++++++++++++++++++---
> > >> >            ovn/northd/ovn-northd.8.xml | 10 ++++++++
> > >> >            ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
> > >> >            ovn/ovn-architecture.7.xml  |  4 ++-
> > >> >            4 files changed, 99 insertions(+), 4 deletions(-)
> > >> >
> > >> >         diff --git a/ovn/controller/physical.c
> > >> b/ovn/controller/physical.c
> > >> >         index f269a1d..1f41f59 100644
> > >> >         --- a/ovn/controller/physical.c
> > >> >         +++ b/ovn/controller/physical.c
> > >> >         @@ -190,7 +190,9 @@ get_zone_ids(const struct
> sbrec_port_binding
> > >> >         *binding,
> > >> >            static void
> > >> >            put_local_common_flows(uint32_t dp_key, uint32_t
> port_key,
> > >> >                                   bool nested_container, const
> struct
> > >> >         zone_ids *zone_ids,
> > >> >         -                       struct ofpbuf *ofpacts_p, struct
> hmap
> > >> >         *flow_table)
> > >> >         +                       struct ofpbuf *ofpacts_p, struct
> hmap
> > >> >         *flow_table,
> > >> >         +                       struct local_datapath *ld,
> > >> >         +                       const struct hmap *local_datapaths)
> > >> >            {
> > >> >                struct match match;
> > >> >            @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t
> > >> dp_key,
> > >> >         uint32_t port_key,
> > >> >                    }
> > >> >                }
> > >> >            +    struct ofpbuf *clone = NULL;
> > >> >         +    clone = ofpbuf_clone(ofpacts_p);
> > >> >         +
> > >> >
> > >> >
> > >> >     I don't understand the use of the cloned ofpbuf here. You could
> just
> > >> >     ofpbuf_clear(ofpacts_p) in each iteration of the for loop below
> and
> > >> >     reuse ofpacts_p where you've used clone below.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > I need to add additional flow with priority 150 along with default
> flow
> > >> > which exists with prioirty 100.
> > >> >
> > >> > 1) table=33,  priority=100,reg15=0x5,metadata=0x2
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> > >> > 2) table=33,
> > >> >   priority=150,reg15=0x5,metadata=0x2,dl_src=fa:16:3e:48:66:e7
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
> > >> 65:f2:ae,resubmit(,34)
> > >> >
> > >> > The first flow with priority 100 is the default flow. This flow is
> > >> added
> > >> > for each output port residing on the current chassis, sets some
> > >> > registers and resubmits to table 34.
> > >> > But when the packet has router gateway MAC as source MAC address, I
> > >> want
> > >> > to replace the MAC before resubmitting to table 34. Second flow has
> > >> same
> > >> > match conditions as first flow and also same actions, additionally
> it
> > >> > checks for the source MAC and then modifies the source MAC address.
> > >> >
> > >> > Before I construct the second flow, I have the registers and
> resubmit
> > >> as
> > >> > part of actions i.e
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> > >> >
> > >> > If I simply add MAC to actions with put_load(mac64, MFF_ETH_SRC, 0,
> 48,
> > >> > ofpacts_p); then actions in 2nd flow(with 150 priority) will look
> like
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34),mod_dl_
> > >> src:fa:16:3e:65:f2:ae
> > >> > i.e modifying the MAC will happen after resubmit, which is not the
> > >> > expected behaviour.
> > >> >
> > >> > So I am cloning the ofpacts_p before resubmit is added to it and
> using
> > >> > the cloned one in constructing flow with priority 150.
> > >> > With this changes, actions will look like
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
> > >> 65:f2:ae,resubmit(,34)
> > >>
> > >> Thank you very much for the explanation. It makes a lot more sense
> now.
> > >>
> > >> >
> > >> >
> > >> >                /* Resubmit to table 34. */
> > >> >                put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> > >> >                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT,
> 100, 0,
> > >> >                                &match, ofpacts_p);
> > >> >            +    /* For a reply packet from gateway with VLAN switch
> port
> > >> >         as destination
> > >> >         +     * (excluding localnet_port and external VLAN
> networks),
> > >> >         gateway router
> > >> >         +     * will use gateway MAC address as source MAC instead
> of
> > >> >         router internal
> > >> >         +     * port MAC, so that external switches can learn
> gateway
> > >> >         MAC address.
> > >> >         +     * Here (before packet is given to the port) we replace
> > >> >         router gateway
> > >> >         +     * MAC address with router internal port MAC. */
> > >> >         +    if (ld->localnet_port && (port_key !=
> > >> >         ld->localnet_port->tunnel_key)) {
> > >> >         +        for (int i = 0; i < ld->n_peer_dps; i++) {
> > >> >         +            struct local_datapath *peer_ldp =
> > >> get_local_datapath(
> > >> >         +                local_datapaths,
> > >> >         ld->peer_dps[i]->peer_dp->tunnel_key);
> > >> >         +            const struct sbrec_port_binding *crp;
> > >> >         +            crp = peer_ldp->chassisredirect_port;
> > >> >         +            if (!crp) {
> > >> >         +                continue;
> > >> >         +            }
> > >> >         +
> > >> >         +            if (strcmp(smap_get(&crp->options,
> > >> "distributed-port"),
> > >> >         +                       ld->peer_dps[i]->peer->logical_port)
> &&
> > >> >         +                (port_key != ld->peer_dps[i]->patch->
> tunnel_key))
> > >> {
> > >> >         +                for (int j = 0; j < crp->n_mac; j++) {
> > >> >         +                    struct lport_addresses laddrs;
> > >> >         +                    if (!extract_lsp_addresses(crp->
> mac[j],
> > >> >         &laddrs)) {
> > >> >         +                        continue;
> > >> >         +                    }
> > >> >         +                    match_set_dl_src(&match, laddrs.ea);
> > >> >         +                    destroy_lport_addresses(&laddrs);
> > >> >         +                    break;
> > >> >         +                }
> > >> >         +                for (int j = 0; j <
> > >> >         ld->peer_dps[i]->peer->n_mac; j++) {
> > >> >         +                    struct lport_addresses laddrs;
> > >> >         +                    uint64_t mac64;
> > >> >         +                    if (!extract_lsp_addresses(
> > >> >         +                        ld->peer_dps[i]->peer->mac[j],
> > >> &laddrs)) {
> > >> >         +                        continue;
> > >> >         +                    }
> > >> >         +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> > >> >         +                    put_load(mac64,
> > >> >         +                             MFF_ETH_SRC, 0, 48, clone);
> > >> >         +                    destroy_lport_addresses(&laddrs);
> > >> >         +                    break;
> > >> >         +                }
> > >> >         +                put_resubmit(OFTABLE_CHECK_LOOPBACK,
> clone);
> > >> >         +                ofctrl_add_flow(flow_table,
> > >> >         OFTABLE_LOCAL_OUTPUT, 150, 0,
> > >> >         +                    &match, clone);
> > >> >         +            }
> > >> >         +        }
> > >> >         +    }
> > >> >         +    ofpbuf_delete(clone);
> > >> >         +
> > >> >                /* Table 34, Priority 100.
> > >> >                 * =======================
> > >> >                 *
> > >> >         @@ -330,7 +384,7 @@ consider_port_binding(struct
> ovsdb_idl_index
> > >> >         *sbrec_chassis_by_name,
> > >> >                      struct zone_ids binding_zones =
> > >> >         get_zone_ids(binding, ct_zones);
> > >> >                    put_local_common_flows(dp_key, port_key, false,
> > >> >         &binding_zones,
> > >> >         -                               ofpacts_p, flow_table);
> > >> >         +                               ofpacts_p, flow_table, ld,
> > >> >         local_datapaths);
> > >> >                      match_init_catchall(&match);
> > >> >                    ofpbuf_clear(ofpacts_p);
> > >> >         @@ -531,7 +585,7 @@ consider_port_binding(struct
> ovsdb_idl_index
> > >> >         *sbrec_chassis_by_name,
> > >> >                      struct zone_ids zone_ids =
> get_zone_ids(binding,
> > >> >         ct_zones);
> > >> >                    put_local_common_flows(dp_key, port_key,
> > >> >         nested_container, &zone_ids,
> > >> >         -                               ofpacts_p, flow_table);
> > >> >         +                               ofpacts_p, flow_table, ld,
> > >> >         local_datapaths);
> > >> >                      /* Table 0, Priority 200, 150 and 100.
> > >> >                     * ==============================
> > >> >         diff --git a/ovn/northd/ovn-northd.8.xml
> > >> >         b/ovn/northd/ovn-northd.8.xml
> > >> >         index 8fa5272..876c121 100644
> > >> >         --- a/ovn/northd/ovn-northd.8.xml
> > >> >         +++ b/ovn/northd/ovn-northd.8.xml
> > >> >         @@ -2013,6 +2013,16 @@ next;
> > >> >                  </li>
> > >> >                    <li>
> > >> >         +        A priority-100 logical flow with match
> > >> >         +        <code>inport == <var>GW</var> &amp;&amp;
> > >> >         +        flags.rcv_from_vlan == 1</code> has actions
> > >> >         +        <code>eth.dst = <var>E</var>; next;</code>, where
> > >> >         +        <var>GW</var> is the logical router distributed
> gateway
> > >> >         +        port and <var>E</var> is the MAC address of router
> > >> >         +        distributed gateway port.
> > >> >         +      </li>
> > >> >         +
> > >> >         +      <li>
> > >> >                    For each NAT rule in the OVN Northbound database
> > >> that can
> > >> >                    be handled in a distributed manner, a
> priority-100
> > >> >         logical
> > >> >                    flow with match <code>ip4.src == <var>B</var>
> > >> &amp;&amp;
> > >> >         diff --git a/ovn/northd/ovn-northd.c
> b/ovn/northd/ovn-northd.c
> > >> >         index bcf0b66..d012bb8 100644
> > >> >         --- a/ovn/northd/ovn-northd.c
> > >> >         +++ b/ovn/northd/ovn-northd.c
> > >> >         @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const
> > >> >         struct ovn_port *op,
> > >> >                } else {
> > >> >                    ds_put_format(&actions, "ip%s.dst", is_ipv4 ?
> "4" :
> > >> "6");
> > >> >                }
> > >> >         +
> > >> >         +    if (op->peer && op->peer->od->localnet_port &&
> > >> >         +        op->od->l3dgw_port && op->od->l3redirect_port &&
> > >> >         +        (op != op->od->l3redirect_port) &&
> > >> >         +        (op != op->od->l3dgw_port)) {
> > >> >         +        ds_put_format(&match, " &&
> is_chassis_resident(%s)",
> > >> >         +                      op->od->l3redirect_port->json_key);
> > >> >         +        ds_put_format(&actions, "; flags.rcv_from_vlan =
> 1");
> > >> >         +    }
> > >> >                ds_put_format(&actions, "; "
> > >> >                              "%sreg1 = %s; "
> > >> >                              "eth.src = %s; "
> > >> >         @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap
> > >> >         *datapaths, struct hmap *ports,
> > >> >                                  op->lrp_networks.ipv6_addrs[i]
> > >> .network_s,
> > >> >                                  op->lrp_networks.ipv6_addrs[i]
> .plen,
> > >> >         NULL, NULL);
> > >> >                    }
> > >> >         +
> > >> >         +        /* For a reply packet from gateway with VLAN switch
> > >> port as
> > >> >         +         * destination, replace router internal port MAC
> with
> > >> >         router gateway
> > >> >         +         * MAC address, so that external switches can learn
> > >> >         gateway MAC
> > >> >         +         * address. Later before delivering the packet to
> the
> > >> port,
> > >> >         +         * controller will replace the gateway MAC with
> router
> > >> >         internal port
> > >> >         +         * MAC in table 33. */
> > >> >         +        if (op->od->l3dgw_port && (op ==
> op->od->l3dgw_port) &&
> > >> >         +            op->od->l3redirect_port) {
> > >> >         +            ds_clear(&actions);
> > >> >         +            ds_clear(&match);
> > >> >         +            ds_put_format(&match, "inport == %s",
> > >> op->json_key);
> > >> >         +            ds_put_format(&match, " && flags.rcv_from_vlan
> ==
> > >> 1");
> > >> >         +            ds_put_format(&match, " &&
> > >> is_chassis_resident(%s)",
> > >> >         +                          op->od->l3redirect_port->json_
> key);
> > >> >         +            ds_put_format(&actions,
> > >> >         +                          "eth.src = %s; next;",
> > >> >         op->lrp_networks.ea_s);
> > >> >         +            ovn_lflow_add(lflows, op->od,
> > >> >         S_ROUTER_IN_GW_REDIRECT, 100,
> > >> >         +                          ds_cstr(&match),
> ds_cstr(&actions));
> > >> >         +        }
> > >> >                }
> > >> >                  /* Convert the static routes to flows. */
> > >> >         diff --git a/ovn/ovn-architecture.7.xml
> > >> b/ovn/ovn-architecture.7.xml
> > >> >         index ad2101c..0de41d2 100644
> > >> >         --- a/ovn/ovn-architecture.7.xml
> > >> >         +++ b/ovn/ovn-architecture.7.xml
> > >> >         @@ -1067,7 +1067,9 @@
> > >> >                    <p>
> > >> >                    Flows in table 33 resemble those in table 32 but
> for
> > >> >         logical ports that
> > >> >         -        reside locally rather than remotely.  For unicast
> > >> >         logical output ports
> > >> >         +        reside locally rather than remotely.  If these are
> VLAN
> > >> >         ports and
> > >> >         +        packet has router gateway port MAC address as
> source,
> > >> >         replace it with
> > >> >         +        router internal port MAC address. For unicast
> logical
> > >> >         output ports
> > >> >                    on the local hypervisor, the actions just
> resubmit to
> > >> >         table 34.  For
> > >> >                    multicast output ports that include one or more
> > >> >         logical ports on the
> > >> >                    local hypervisor, for each such logical port
> > >> >         <var>P</var>, the actions
> > >> >
> > >> >
> > >> >
> > >>
> > >> _______________________________________________
> > >> dev mailing list
> > >> dev@openvswitch.org
> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >>
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Justin Pettit Sept. 21, 2018, 4:40 p.m. | #8
Yes, sorry.  I'll take a look early next week.

--Justin


> On Sep 21, 2018, at 5:47 AM, Anil Venkata <anilvenkata@redhat.com> wrote:
> 
> Gentle ping :) Thanks
> 
> On Thu, Aug 16, 2018 at 2:05 PM Anil Venkata <anilvenkata@redhat.com> wrote:
> Thanks Ben
> 
> On Wed, Aug 15, 2018 at 6:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> I've asked Justin to take a look at this series.
> 
> On Tue, Aug 14, 2018 at 08:48:25PM +0530, Anil Venkata wrote:
> > Gentle reminder requesting more reviews..
> > 
> > Mark has reviewed the patch series and Miguel has already tested the patch
> > series.
> > 
> > Thanks
> > Anil
> > 
> > On Tue, Aug 14, 2018 at 7:45 PM, Miguel Angel Ajo Pelayo <
> > majopela@redhat.com> wrote:
> > 
> > > Thank you very much,
> > >
> > > I wasn't able to perform a proper code review, but you can add the
> > > Tested-By: Miguel Angel Ajo <majopela@redhat.com>
> > >
> > > The issue I described previously seems to be fixed on the v7 of the series.
> > >
> > >
> > > On Mon, Aug 6, 2018 at 9:19 PM Mark Michelson <mmichels@redhat.com> wrote:
> > >
> > >> On 08/06/2018 01:58 PM, Anil Venkata wrote:
> > >> > Thanks Mark. Kindly look at my comment inline.
> > >> >
> > >> > On Fri, Aug 3, 2018 at 2:17 AM, Mark Michelson <mmichels@redhat.com
> > >> > <mailto:mmichels@redhat.com>> wrote:
> > >> >
> > >> >     On 08/01/2018 08:16 AM, vkommadi@redhat.com
> > >> >     <mailto:vkommadi@redhat.com> wrote:
> > >> >
> > >> >         From: venkata anil <vkommadi@redhat.com
> > >> >         <mailto:vkommadi@redhat.com>>
> > >> >
> > >> >         Previous patches in the series doesn't address issue 1 explained
> > >> >         in [1]
> > >> >         i.e
> > >> >         1) removal of router gateway port MAC address on external
> > >> switches
> > >> >              after expiring of aging time.
> > >> >         2) then external switches unable to learn the gateway MAC as
> > >> >              reply packets carry router internal port MAC address as
> > >> source
> > >> >
> > >> >         To fix this, router on gateway node will use router gateway MAC
> > >> >         address
> > >> >         instead of router internal port MAC address as source for reply
> > >> >         packets,
> > >> >         so that external switches can learn gateway MAC address.
> > >> >         This is done only for reply packets from router gateway to
> > >> >         tenant VLAN
> > >> >         switch ports.
> > >> >         Later before delivering the packet to the port, ovn-controller
> > >> will
> > >> >         replace the gateway MAC with router internal port MAC in table
> > >> 33.
> > >> >
> > >> >         [1]
> > >> >         //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
> > >> >         <http://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> > >> 349803.html>
> > >> >
> > >> >         Reported-by: Miguel Angel Ajo <majopela@redhat.com
> > >> >         <mailto:majopela@redhat.com>>
> > >> >         Reported-at:
> > >> >         https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> > >> 349803.html
> > >> >         <https://mail.openvswitch.org/pipermail/ovs-dev/2018-
> > >> July/349803.html>
> > >> >         Signed-off-by: Venkata Anil <vkommadi@redhat.com
> > >> >         <mailto:vkommadi@redhat.com>>
> > >> >         ---
> > >> >
> > >> >         v6->v7:
> > >> >         * Added this patch
> > >> >
> > >> >
> > >> >            ovn/controller/physical.c   | 60
> > >> >         ++++++++++++++++++++++++++++++++++++++++++---
> > >> >            ovn/northd/ovn-northd.8.xml | 10 ++++++++
> > >> >            ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
> > >> >            ovn/ovn-architecture.7.xml  |  4 ++-
> > >> >            4 files changed, 99 insertions(+), 4 deletions(-)
> > >> >
> > >> >         diff --git a/ovn/controller/physical.c
> > >> b/ovn/controller/physical.c
> > >> >         index f269a1d..1f41f59 100644
> > >> >         --- a/ovn/controller/physical.c
> > >> >         +++ b/ovn/controller/physical.c
> > >> >         @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
> > >> >         *binding,
> > >> >            static void
> > >> >            put_local_common_flows(uint32_t dp_key, uint32_t port_key,
> > >> >                                   bool nested_container, const struct
> > >> >         zone_ids *zone_ids,
> > >> >         -                       struct ofpbuf *ofpacts_p, struct hmap
> > >> >         *flow_table)
> > >> >         +                       struct ofpbuf *ofpacts_p, struct hmap
> > >> >         *flow_table,
> > >> >         +                       struct local_datapath *ld,
> > >> >         +                       const struct hmap *local_datapaths)
> > >> >            {
> > >> >                struct match match;
> > >> >            @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t
> > >> dp_key,
> > >> >         uint32_t port_key,
> > >> >                    }
> > >> >                }
> > >> >            +    struct ofpbuf *clone = NULL;
> > >> >         +    clone = ofpbuf_clone(ofpacts_p);
> > >> >         +
> > >> >
> > >> >
> > >> >     I don't understand the use of the cloned ofpbuf here. You could just
> > >> >     ofpbuf_clear(ofpacts_p) in each iteration of the for loop below and
> > >> >     reuse ofpacts_p where you've used clone below.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > I need to add additional flow with priority 150 along with default flow
> > >> > which exists with prioirty 100.
> > >> >
> > >> > 1) table=33,  priority=100,reg15=0x5,metadata=0x2
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> > >> > 2) table=33,
> > >> >   priority=150,reg15=0x5,metadata=0x2,dl_src=fa:16:3e:48:66:e7
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
> > >> 65:f2:ae,resubmit(,34)
> > >> >
> > >> > The first flow with priority 100 is the default flow. This flow is
> > >> added
> > >> > for each output port residing on the current chassis, sets some
> > >> > registers and resubmits to table 34.
> > >> > But when the packet has router gateway MAC as source MAC address, I
> > >> want
> > >> > to replace the MAC before resubmitting to table 34. Second flow has
> > >> same
> > >> > match conditions as first flow and also same actions, additionally it
> > >> > checks for the source MAC and then modifies the source MAC address.
> > >> >
> > >> > Before I construct the second flow, I have the registers and resubmit
> > >> as
> > >> > part of actions i.e
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34)
> > >> >
> > >> > If I simply add MAC to actions with put_load(mac64, MFF_ETH_SRC, 0, 48,
> > >> > ofpacts_p); then actions in 2nd flow(with 150 priority) will look like
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],resubmit(,34),mod_dl_
> > >> src:fa:16:3e:65:f2:ae
> > >> > i.e modifying the MAC will happen after resubmit, which is not the
> > >> > expected behaviour.
> > >> >
> > >> > So I am cloning the ofpacts_p before resubmit is added to it and using
> > >> > the cloned one in constructing flow with priority 150.
> > >> > With this changes, actions will look like
> > >> > actions=load:0x2->NXM_NX_REG13[],load:0xc->NXM_NX_
> > >> REG11[],load:0xb->NXM_NX_REG12[],mod_dl_src:fa:16:3e:
> > >> 65:f2:ae,resubmit(,34)
> > >>
> > >> Thank you very much for the explanation. It makes a lot more sense now.
> > >>
> > >> >
> > >> >
> > >> >                /* Resubmit to table 34. */
> > >> >                put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> > >> >                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
> > >> >                                &match, ofpacts_p);
> > >> >            +    /* For a reply packet from gateway with VLAN switch port
> > >> >         as destination
> > >> >         +     * (excluding localnet_port and external VLAN networks),
> > >> >         gateway router
> > >> >         +     * will use gateway MAC address as source MAC instead of
> > >> >         router internal
> > >> >         +     * port MAC, so that external switches can learn gateway
> > >> >         MAC address.
> > >> >         +     * Here (before packet is given to the port) we replace
> > >> >         router gateway
> > >> >         +     * MAC address with router internal port MAC. */
> > >> >         +    if (ld->localnet_port && (port_key !=
> > >> >         ld->localnet_port->tunnel_key)) {
> > >> >         +        for (int i = 0; i < ld->n_peer_dps; i++) {
> > >> >         +            struct local_datapath *peer_ldp =
> > >> get_local_datapath(
> > >> >         +                local_datapaths,
> > >> >         ld->peer_dps[i]->peer_dp->tunnel_key);
> > >> >         +            const struct sbrec_port_binding *crp;
> > >> >         +            crp = peer_ldp->chassisredirect_port;
> > >> >         +            if (!crp) {
> > >> >         +                continue;
> > >> >         +            }
> > >> >         +
> > >> >         +            if (strcmp(smap_get(&crp->options,
> > >> "distributed-port"),
> > >> >         +                       ld->peer_dps[i]->peer->logical_port) &&
> > >> >         +                (port_key != ld->peer_dps[i]->patch->tunnel_key))
> > >> {
> > >> >         +                for (int j = 0; j < crp->n_mac; j++) {
> > >> >         +                    struct lport_addresses laddrs;
> > >> >         +                    if (!extract_lsp_addresses(crp->mac[j],
> > >> >         &laddrs)) {
> > >> >         +                        continue;
> > >> >         +                    }
> > >> >         +                    match_set_dl_src(&match, laddrs.ea);
> > >> >         +                    destroy_lport_addresses(&laddrs);
> > >> >         +                    break;
> > >> >         +                }
> > >> >         +                for (int j = 0; j <
> > >> >         ld->peer_dps[i]->peer->n_mac; j++) {
> > >> >         +                    struct lport_addresses laddrs;
> > >> >         +                    uint64_t mac64;
> > >> >         +                    if (!extract_lsp_addresses(
> > >> >         +                        ld->peer_dps[i]->peer->mac[j],
> > >> &laddrs)) {
> > >> >         +                        continue;
> > >> >         +                    }
> > >> >         +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> > >> >         +                    put_load(mac64,
> > >> >         +                             MFF_ETH_SRC, 0, 48, clone);
> > >> >         +                    destroy_lport_addresses(&laddrs);
> > >> >         +                    break;
> > >> >         +                }
> > >> >         +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
> > >> >         +                ofctrl_add_flow(flow_table,
> > >> >         OFTABLE_LOCAL_OUTPUT, 150, 0,
> > >> >         +                    &match, clone);
> > >> >         +            }
> > >> >         +        }
> > >> >         +    }
> > >> >         +    ofpbuf_delete(clone);
> > >> >         +
> > >> >                /* Table 34, Priority 100.
> > >> >                 * =======================
> > >> >                 *
> > >> >         @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
> > >> >         *sbrec_chassis_by_name,
> > >> >                      struct zone_ids binding_zones =
> > >> >         get_zone_ids(binding, ct_zones);
> > >> >                    put_local_common_flows(dp_key, port_key, false,
> > >> >         &binding_zones,
> > >> >         -                               ofpacts_p, flow_table);
> > >> >         +                               ofpacts_p, flow_table, ld,
> > >> >         local_datapaths);
> > >> >                      match_init_catchall(&match);
> > >> >                    ofpbuf_clear(ofpacts_p);
> > >> >         @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
> > >> >         *sbrec_chassis_by_name,
> > >> >                      struct zone_ids zone_ids = get_zone_ids(binding,
> > >> >         ct_zones);
> > >> >                    put_local_common_flows(dp_key, port_key,
> > >> >         nested_container, &zone_ids,
> > >> >         -                               ofpacts_p, flow_table);
> > >> >         +                               ofpacts_p, flow_table, ld,
> > >> >         local_datapaths);
> > >> >                      /* Table 0, Priority 200, 150 and 100.
> > >> >                     * ==============================
> > >> >         diff --git a/ovn/northd/ovn-northd.8.xml
> > >> >         b/ovn/northd/ovn-northd.8.xml
> > >> >         index 8fa5272..876c121 100644
> > >> >         --- a/ovn/northd/ovn-northd.8.xml
> > >> >         +++ b/ovn/northd/ovn-northd.8.xml
> > >> >         @@ -2013,6 +2013,16 @@ next;
> > >> >                  </li>
> > >> >                    <li>
> > >> >         +        A priority-100 logical flow with match
> > >> >         +        <code>inport == <var>GW</var> &amp;&amp;
> > >> >         +        flags.rcv_from_vlan == 1</code> has actions
> > >> >         +        <code>eth.dst = <var>E</var>; next;</code>, where
> > >> >         +        <var>GW</var> is the logical router distributed gateway
> > >> >         +        port and <var>E</var> is the MAC address of router
> > >> >         +        distributed gateway port.
> > >> >         +      </li>
> > >> >         +
> > >> >         +      <li>
> > >> >                    For each NAT rule in the OVN Northbound database
> > >> that can
> > >> >                    be handled in a distributed manner, a priority-100
> > >> >         logical
> > >> >                    flow with match <code>ip4.src == <var>B</var>
> > >> &amp;&amp;
> > >> >         diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > >> >         index bcf0b66..d012bb8 100644
> > >> >         --- a/ovn/northd/ovn-northd.c
> > >> >         +++ b/ovn/northd/ovn-northd.c
> > >> >         @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const
> > >> >         struct ovn_port *op,
> > >> >                } else {
> > >> >                    ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" :
> > >> "6");
> > >> >                }
> > >> >         +
> > >> >         +    if (op->peer && op->peer->od->localnet_port &&
> > >> >         +        op->od->l3dgw_port && op->od->l3redirect_port &&
> > >> >         +        (op != op->od->l3redirect_port) &&
> > >> >         +        (op != op->od->l3dgw_port)) {
> > >> >         +        ds_put_format(&match, " && is_chassis_resident(%s)",
> > >> >         +                      op->od->l3redirect_port->json_key);
> > >> >         +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
> > >> >         +    }
> > >> >                ds_put_format(&actions, "; "
> > >> >                              "%sreg1 = %s; "
> > >> >                              "eth.src = %s; "
> > >> >         @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap
> > >> >         *datapaths, struct hmap *ports,
> > >> >                                  op->lrp_networks.ipv6_addrs[i]
> > >> .network_s,
> > >> >                                  op->lrp_networks.ipv6_addrs[i].plen,
> > >> >         NULL, NULL);
> > >> >                    }
> > >> >         +
> > >> >         +        /* For a reply packet from gateway with VLAN switch
> > >> port as
> > >> >         +         * destination, replace router internal port MAC with
> > >> >         router gateway
> > >> >         +         * MAC address, so that external switches can learn
> > >> >         gateway MAC
> > >> >         +         * address. Later before delivering the packet to the
> > >> port,
> > >> >         +         * controller will replace the gateway MAC with router
> > >> >         internal port
> > >> >         +         * MAC in table 33. */
> > >> >         +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
> > >> >         +            op->od->l3redirect_port) {
> > >> >         +            ds_clear(&actions);
> > >> >         +            ds_clear(&match);
> > >> >         +            ds_put_format(&match, "inport == %s",
> > >> op->json_key);
> > >> >         +            ds_put_format(&match, " && flags.rcv_from_vlan ==
> > >> 1");
> > >> >         +            ds_put_format(&match, " &&
> > >> is_chassis_resident(%s)",
> > >> >         +                          op->od->l3redirect_port->json_key);
> > >> >         +            ds_put_format(&actions,
> > >> >         +                          "eth.src = %s; next;",
> > >> >         op->lrp_networks.ea_s);
> > >> >         +            ovn_lflow_add(lflows, op->od,
> > >> >         S_ROUTER_IN_GW_REDIRECT, 100,
> > >> >         +                          ds_cstr(&match), ds_cstr(&actions));
> > >> >         +        }
> > >> >                }
> > >> >                  /* Convert the static routes to flows. */
> > >> >         diff --git a/ovn/ovn-architecture.7.xml
> > >> b/ovn/ovn-architecture.7.xml
> > >> >         index ad2101c..0de41d2 100644
> > >> >         --- a/ovn/ovn-architecture.7.xml
> > >> >         +++ b/ovn/ovn-architecture.7.xml
> > >> >         @@ -1067,7 +1067,9 @@
> > >> >                    <p>
> > >> >                    Flows in table 33 resemble those in table 32 but for
> > >> >         logical ports that
> > >> >         -        reside locally rather than remotely.  For unicast
> > >> >         logical output ports
> > >> >         +        reside locally rather than remotely.  If these are VLAN
> > >> >         ports and
> > >> >         +        packet has router gateway port MAC address as source,
> > >> >         replace it with
> > >> >         +        router internal port MAC address. For unicast logical
> > >> >         output ports
> > >> >                    on the local hypervisor, the actions just resubmit to
> > >> >         table 34.  For
> > >> >                    multicast output ports that include one or more
> > >> >         logical ports on the
> > >> >                    local hypervisor, for each such logical port
> > >> >         <var>P</var>, the actions
> > >> >
> > >> >
> > >> >
> > >>
> > >> _______________________________________________
> > >> dev mailing list
> > >> dev@openvswitch.org
> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >>
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Guru Shetty Sept. 21, 2018, 4:54 p.m. | #9
On Wed, 1 Aug 2018 at 05:18, <vkommadi@redhat.com> wrote:

> From: venkata anil <vkommadi@redhat.com>
>
> Previous patches in the series doesn't address issue 1 explained in [1]
> i.e
> 1) removal of router gateway port MAC address on external switches
>    after expiring of aging time.
> 2) then external switches unable to learn the gateway MAC as
>    reply packets carry router internal port MAC address as source
>
> To fix this, router on gateway node will use router gateway MAC address
> instead of router internal port MAC address as source for reply packets,
> so that external switches can learn gateway MAC address.
> This is done only for reply packets from router gateway to tenant VLAN
> switch ports.
> Later before delivering the packet to the port, ovn-controller will
> replace the gateway MAC with router internal port MAC in table 33.
>
> [1] //mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
>
> Reported-by: Miguel Angel Ajo <majopela@redhat.com>
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/349803.html
> Signed-off-by: Venkata Anil <vkommadi@redhat.com>
>

I have tried to make sense of this patch series a few times. I think adding
increasing complications like this will make gateway code unmaintainable.
The whole gateway redirect chassis already makes it un-understandable and
now this will mean that no one will be able to understand it as we go
forward. [/rant off]


> ---
>
> v6->v7:
> * Added this patch
>
>
>  ovn/controller/physical.c   | 60
> ++++++++++++++++++++++++++++++++++++++++++---
>  ovn/northd/ovn-northd.8.xml | 10 ++++++++
>  ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>  ovn/ovn-architecture.7.xml  |  4 ++-
>  4 files changed, 99 insertions(+), 4 deletions(-)
>
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index f269a1d..1f41f59 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>  static void
>  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>                         bool nested_container, const struct zone_ids
> *zone_ids,
> -                       struct ofpbuf *ofpacts_p, struct hmap *flow_table)
> +                       struct ofpbuf *ofpacts_p, struct hmap *flow_table,
> +                       struct local_datapath *ld,
> +                       const struct hmap *local_datapaths)
>  {
>      struct match match;
>
> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, uint32_t
> port_key,
>          }
>      }
>
> +    struct ofpbuf *clone = NULL;
> +    clone = ofpbuf_clone(ofpacts_p);
> +
>      /* Resubmit to table 34. */
>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>      ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>                      &match, ofpacts_p);
>
> +    /* For a reply packet from gateway with VLAN switch port as
> destination
> +     * (excluding localnet_port and external VLAN networks), gateway
> router
> +     * will use gateway MAC address as source MAC instead of router
> internal
> +     * port MAC, so that external switches can learn gateway MAC address.
> +     * Here (before packet is given to the port) we replace router gateway
> +     * MAC address with router internal port MAC. */
> +    if (ld->localnet_port && (port_key != ld->localnet_port->tunnel_key))
> {
> +        for (int i = 0; i < ld->n_peer_dps; i++) {
> +            struct local_datapath *peer_ldp = get_local_datapath(
> +                local_datapaths, ld->peer_dps[i]->peer_dp->tunnel_key);
> +            const struct sbrec_port_binding *crp;
> +            crp = peer_ldp->chassisredirect_port;
> +            if (!crp) {
> +                continue;
> +            }
> +
> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
> +                       ld->peer_dps[i]->peer->logical_port) &&
> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
> +                for (int j = 0; j < crp->n_mac; j++) {
> +                    struct lport_addresses laddrs;
> +                    if (!extract_lsp_addresses(crp->mac[j], &laddrs)) {
> +                        continue;
> +                    }
> +                    match_set_dl_src(&match, laddrs.ea);
> +                    destroy_lport_addresses(&laddrs);
> +                    break;
> +                }
> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; j++) {
> +                    struct lport_addresses laddrs;
> +                    uint64_t mac64;
> +                    if (!extract_lsp_addresses(
> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
> +                        continue;
> +                    }
> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> +                    put_load(mac64,
> +                             MFF_ETH_SRC, 0, 48, clone);
> +                    destroy_lport_addresses(&laddrs);
> +                    break;
> +                }
> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 150, 0,
> +                    &match, clone);
> +            }
> +        }
> +    }
> +    ofpbuf_delete(clone);
> +
>      /* Table 34, Priority 100.
>       * =======================
>       *
> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
> *sbrec_chassis_by_name,
>
>          struct zone_ids binding_zones = get_zone_ids(binding, ct_zones);
>          put_local_common_flows(dp_key, port_key, false, &binding_zones,
> -                               ofpacts_p, flow_table);
> +                               ofpacts_p, flow_table, ld,
> local_datapaths);
>
>          match_init_catchall(&match);
>          ofpbuf_clear(ofpacts_p);
> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
> *sbrec_chassis_by_name,
>
>          struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
>          put_local_common_flows(dp_key, port_key, nested_container,
> &zone_ids,
> -                               ofpacts_p, flow_table);
> +                               ofpacts_p, flow_table, ld,
> local_datapaths);
>
>          /* Table 0, Priority 200, 150 and 100.
>           * ==============================
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 8fa5272..876c121 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -2013,6 +2013,16 @@ next;
>        </li>
>
>        <li>
> +        A priority-100 logical flow with match
> +        <code>inport == <var>GW</var> &amp;&amp;
> +        flags.rcv_from_vlan == 1</code> has actions
> +        <code>eth.dst = <var>E</var>; next;</code>, where
> +        <var>GW</var> is the logical router distributed gateway
> +        port and <var>E</var> is the MAC address of router
> +        distributed gateway port.
> +      </li>
> +
> +      <li>
>          For each NAT rule in the OVN Northbound database that can
>          be handled in a distributed manner, a priority-100 logical
>          flow with match <code>ip4.src == <var>B</var> &amp;&amp;
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index bcf0b66..d012bb8 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
> ovn_port *op,
>      } else {
>          ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>      }
> +
> +    if (op->peer && op->peer->od->localnet_port &&
> +        op->od->l3dgw_port && op->od->l3redirect_port &&
> +        (op != op->od->l3redirect_port) &&
> +        (op != op->od->l3dgw_port)) {
> +        ds_put_format(&match, " && is_chassis_resident(%s)",
> +                      op->od->l3redirect_port->json_key);
> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
> +    }
>      ds_put_format(&actions, "; "
>                    "%sreg1 = %s; "
>                    "eth.src = %s; "
> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths, struct
> hmap *ports,
>                        op->lrp_networks.ipv6_addrs[i].network_s,
>                        op->lrp_networks.ipv6_addrs[i].plen, NULL, NULL);
>          }
> +
> +        /* For a reply packet from gateway with VLAN switch port as
> +         * destination, replace router internal port MAC with router
> gateway
> +         * MAC address, so that external switches can learn gateway MAC
> +         * address. Later before delivering the packet to the port,
> +         * controller will replace the gateway MAC with router internal
> port
> +         * MAC in table 33. */
> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
> +            op->od->l3redirect_port) {
> +            ds_clear(&actions);
> +            ds_clear(&match);
> +            ds_put_format(&match, "inport == %s", op->json_key);
> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
> +            ds_put_format(&match, " && is_chassis_resident(%s)",
> +                          op->od->l3redirect_port->json_key);
> +            ds_put_format(&actions,
> +                          "eth.src = %s; next;", op->lrp_networks.ea_s);
> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, 100,
> +                          ds_cstr(&match), ds_cstr(&actions));
> +        }
>      }
>
>      /* Convert the static routes to flows. */
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index ad2101c..0de41d2 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1067,7 +1067,9 @@
>
>        <p>
>          Flows in table 33 resemble those in table 32 but for logical
> ports that
> -        reside locally rather than remotely.  For unicast logical output
> ports
> +        reside locally rather than remotely.  If these are VLAN ports and
> +        packet has router gateway port MAC address as source, replace it
> with
> +        router internal port MAC address. For unicast logical output ports
>          on the local hypervisor, the actions just resubmit to table 34.
> For
>          multicast output ports that include one or more logical ports on
> the
>          local hypervisor, for each such logical port <var>P</var>, the
> actions
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Guru Shetty Sept. 21, 2018, 5:35 p.m. | #10
>
>
> I have tried to make sense of this patch series a few times. I think
> adding increasing complications like this will make gateway code
> unmaintainable. The whole gateway redirect chassis already makes it
> un-understandable and now this will mean that no one will be able to
> understand it as we go forward. [/rant off]
>

Sorry about the "rant" if it came out as rude - was not my intention. I
will try to offer something more constructive. When you initially looked at
the gateway redirect code, how long did it take for you to understand it?
My main concern is not with your code, but the underlying code. I initially
reviewed the underlying redirect code. And it took me a good 2 days to
understand it. Back then, we wanted something working and it looked like
something that can eventually be simplified. But, we have been adding more
things to it. And right now, if this patch series is added, it will likely
get very very hard to understand it as we go forward. Do you reckon that a
year from now, you will be able to debug a corner case in the code?

If there are others who have understood the "redirect" gateway code easily,
please do chime in.

My suggestion is to re-look at the redirect gateway design and see whether
the code can be simplified. Or whether we can write detailed documentation
about it to make it easy to understand. Only once we are satisfied with it,
we can attempt to add more stuff to it.


>
>
>> ---
>>
>> v6->v7:
>> * Added this patch
>>
>>
>>  ovn/controller/physical.c   | 60
>> ++++++++++++++++++++++++++++++++++++++++++---
>>  ovn/northd/ovn-northd.8.xml | 10 ++++++++
>>  ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>>  ovn/ovn-architecture.7.xml  |  4 ++-
>>  4 files changed, 99 insertions(+), 4 deletions(-)
>>
>> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>> index f269a1d..1f41f59 100644
>> --- a/ovn/controller/physical.c
>> +++ b/ovn/controller/physical.c
>> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>  static void
>>  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>>                         bool nested_container, const struct zone_ids
>> *zone_ids,
>> -                       struct ofpbuf *ofpacts_p, struct hmap *flow_table)
>> +                       struct ofpbuf *ofpacts_p, struct hmap *flow_table,
>> +                       struct local_datapath *ld,
>> +                       const struct hmap *local_datapaths)
>>  {
>>      struct match match;
>>
>> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, uint32_t
>> port_key,
>>          }
>>      }
>>
>> +    struct ofpbuf *clone = NULL;
>> +    clone = ofpbuf_clone(ofpacts_p);
>> +
>>      /* Resubmit to table 34. */
>>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>>      ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>>                      &match, ofpacts_p);
>>
>> +    /* For a reply packet from gateway with VLAN switch port as
>> destination
>> +     * (excluding localnet_port and external VLAN networks), gateway
>> router
>> +     * will use gateway MAC address as source MAC instead of router
>> internal
>> +     * port MAC, so that external switches can learn gateway MAC address.
>> +     * Here (before packet is given to the port) we replace router
>> gateway
>> +     * MAC address with router internal port MAC. */
>> +    if (ld->localnet_port && (port_key !=
>> ld->localnet_port->tunnel_key)) {
>> +        for (int i = 0; i < ld->n_peer_dps; i++) {
>> +            struct local_datapath *peer_ldp = get_local_datapath(
>> +                local_datapaths, ld->peer_dps[i]->peer_dp->tunnel_key);
>> +            const struct sbrec_port_binding *crp;
>> +            crp = peer_ldp->chassisredirect_port;
>> +            if (!crp) {
>> +                continue;
>> +            }
>> +
>> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
>> +                       ld->peer_dps[i]->peer->logical_port) &&
>> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
>> +                for (int j = 0; j < crp->n_mac; j++) {
>> +                    struct lport_addresses laddrs;
>> +                    if (!extract_lsp_addresses(crp->mac[j], &laddrs)) {
>> +                        continue;
>> +                    }
>> +                    match_set_dl_src(&match, laddrs.ea);
>> +                    destroy_lport_addresses(&laddrs);
>> +                    break;
>> +                }
>> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; j++) {
>> +                    struct lport_addresses laddrs;
>> +                    uint64_t mac64;
>> +                    if (!extract_lsp_addresses(
>> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
>> +                        continue;
>> +                    }
>> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
>> +                    put_load(mac64,
>> +                             MFF_ETH_SRC, 0, 48, clone);
>> +                    destroy_lport_addresses(&laddrs);
>> +                    break;
>> +                }
>> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
>> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 150, 0,
>> +                    &match, clone);
>> +            }
>> +        }
>> +    }
>> +    ofpbuf_delete(clone);
>> +
>>      /* Table 34, Priority 100.
>>       * =======================
>>       *
>> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>
>>          struct zone_ids binding_zones = get_zone_ids(binding, ct_zones);
>>          put_local_common_flows(dp_key, port_key, false, &binding_zones,
>> -                               ofpacts_p, flow_table);
>> +                               ofpacts_p, flow_table, ld,
>> local_datapaths);
>>
>>          match_init_catchall(&match);
>>          ofpbuf_clear(ofpacts_p);
>> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>
>>          struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
>>          put_local_common_flows(dp_key, port_key, nested_container,
>> &zone_ids,
>> -                               ofpacts_p, flow_table);
>> +                               ofpacts_p, flow_table, ld,
>> local_datapaths);
>>
>>          /* Table 0, Priority 200, 150 and 100.
>>           * ==============================
>> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> index 8fa5272..876c121 100644
>> --- a/ovn/northd/ovn-northd.8.xml
>> +++ b/ovn/northd/ovn-northd.8.xml
>> @@ -2013,6 +2013,16 @@ next;
>>        </li>
>>
>>        <li>
>> +        A priority-100 logical flow with match
>> +        <code>inport == <var>GW</var> &amp;&amp;
>> +        flags.rcv_from_vlan == 1</code> has actions
>> +        <code>eth.dst = <var>E</var>; next;</code>, where
>> +        <var>GW</var> is the logical router distributed gateway
>> +        port and <var>E</var> is the MAC address of router
>> +        distributed gateway port.
>> +      </li>
>> +
>> +      <li>
>>          For each NAT rule in the OVN Northbound database that can
>>          be handled in a distributed manner, a priority-100 logical
>>          flow with match <code>ip4.src == <var>B</var> &amp;&amp;
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index bcf0b66..d012bb8 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
>> ovn_port *op,
>>      } else {
>>          ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>>      }
>> +
>> +    if (op->peer && op->peer->od->localnet_port &&
>> +        op->od->l3dgw_port && op->od->l3redirect_port &&
>> +        (op != op->od->l3redirect_port) &&
>> +        (op != op->od->l3dgw_port)) {
>> +        ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                      op->od->l3redirect_port->json_key);
>> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
>> +    }
>>      ds_put_format(&actions, "; "
>>                    "%sreg1 = %s; "
>>                    "eth.src = %s; "
>> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths, struct
>> hmap *ports,
>>                        op->lrp_networks.ipv6_addrs[i].network_s,
>>                        op->lrp_networks.ipv6_addrs[i].plen, NULL, NULL);
>>          }
>> +
>> +        /* For a reply packet from gateway with VLAN switch port as
>> +         * destination, replace router internal port MAC with router
>> gateway
>> +         * MAC address, so that external switches can learn gateway MAC
>> +         * address. Later before delivering the packet to the port,
>> +         * controller will replace the gateway MAC with router internal
>> port
>> +         * MAC in table 33. */
>> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
>> +            op->od->l3redirect_port) {
>> +            ds_clear(&actions);
>> +            ds_clear(&match);
>> +            ds_put_format(&match, "inport == %s", op->json_key);
>> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                          op->od->l3redirect_port->json_key);
>> +            ds_put_format(&actions,
>> +                          "eth.src = %s; next;", op->lrp_networks.ea_s);
>> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, 100,
>> +                          ds_cstr(&match), ds_cstr(&actions));
>> +        }
>>      }
>>
>>      /* Convert the static routes to flows. */
>> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>> index ad2101c..0de41d2 100644
>> --- a/ovn/ovn-architecture.7.xml
>> +++ b/ovn/ovn-architecture.7.xml
>> @@ -1067,7 +1067,9 @@
>>
>>        <p>
>>          Flows in table 33 resemble those in table 32 but for logical
>> ports that
>> -        reside locally rather than remotely.  For unicast logical output
>> ports
>> +        reside locally rather than remotely.  If these are VLAN ports and
>> +        packet has router gateway port MAC address as source, replace it
>> with
>> +        router internal port MAC address. For unicast logical output
>> ports
>>          on the local hypervisor, the actions just resubmit to table 34.
>> For
>>          multicast output ports that include one or more logical ports on
>> the
>>          local hypervisor, for each such logical port <var>P</var>, the
>> actions
>> --
>> 1.8.3.1
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Numan Siddique Sept. 21, 2018, 7:27 p.m. | #11
On Fri, Sep 21, 2018 at 11:06 PM Guru Shetty <guru@ovn.org> wrote:

> >
> >
> > I have tried to make sense of this patch series a few times. I think
> > adding increasing complications like this will make gateway code
> > unmaintainable. The whole gateway redirect chassis already makes it
> > un-understandable and now this will mean that no one will be able to
> > understand it as we go forward. [/rant off]
> >
>
> Sorry about the "rant" if it came out as rude - was not my intention. I
> will try to offer something more constructive. When you initially looked at
> the gateway redirect code, how long did it take for you to understand it?
> My main concern is not with your code, but the underlying code. I initially
> reviewed the underlying redirect code. And it took me a good 2 days to
> understand it. Back then, we wanted something working and it looked like
> something that can eventually be simplified. But, we have been adding more
> things to it. And right now, if this patch series is added, it will likely
> get very very hard to understand it as we go forward. Do you reckon that a
> year from now, you will be able to debug a corner case in the code?
>
> If there are others who have understood the "redirect" gateway code easily,
> please do chime in.
>
> My suggestion is to re-look at the redirect gateway design and see whether
> the code can be simplified. Or whether we can write detailed documentation
> about it to make it easy to understand. Only once we are satisfied with it,
> we can attempt to add more stuff to it.
>
>
I always forget the gateway code and the design. (May be because I haven't
worked
on this part of the code much) and I have to refresh my memory everytime.
I agree that we need to have a detailed documentation. May be "a journey of
a packet "
kind of documentation in ovn-architecture would also help.

The proposed solution in this patch series, If I understand it correctly
runs the router pipeline
in the source chassis and then resumes it on the gateway chassis. Recently
I was looking into
SR-IOV kind of scenarios, where the VM traffic by passes the host kernel,
and I think the
proposed solution will not work. In order to support N/S traffic for SR-IOV
instances
we have to find probably another way and that would also complicate the
code further.

However I think it is possible to have one solution to support gateway
traffic for
VLAN tenant networks which covers SR-IOV and the normal case. I think the
router pipeline
should be executed only on the gateway chassis. I will have a discussion
with
Anil on this, this monday and see if it's possible.

But unfortunately, supporting these use cases will add some amount of
complexity :(.

Thanks
Numan


>
> >
> >> ---
> >>
> >> v6->v7:
> >> * Added this patch
> >>
> >>
> >>  ovn/controller/physical.c   | 60
> >> ++++++++++++++++++++++++++++++++++++++++++---
> >>  ovn/northd/ovn-northd.8.xml | 10 ++++++++
> >>  ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
> >>  ovn/ovn-architecture.7.xml  |  4 ++-
> >>  4 files changed, 99 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> >> index f269a1d..1f41f59 100644
> >> --- a/ovn/controller/physical.c
> >> +++ b/ovn/controller/physical.c
> >> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
> *binding,
> >>  static void
> >>  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
> >>                         bool nested_container, const struct zone_ids
> >> *zone_ids,
> >> -                       struct ofpbuf *ofpacts_p, struct hmap
> *flow_table)
> >> +                       struct ofpbuf *ofpacts_p, struct hmap
> *flow_table,
> >> +                       struct local_datapath *ld,
> >> +                       const struct hmap *local_datapaths)
> >>  {
> >>      struct match match;
> >>
> >> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, uint32_t
> >> port_key,
> >>          }
> >>      }
> >>
> >> +    struct ofpbuf *clone = NULL;
> >> +    clone = ofpbuf_clone(ofpacts_p);
> >> +
> >>      /* Resubmit to table 34. */
> >>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> >>      ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
> >>                      &match, ofpacts_p);
> >>
> >> +    /* For a reply packet from gateway with VLAN switch port as
> >> destination
> >> +     * (excluding localnet_port and external VLAN networks), gateway
> >> router
> >> +     * will use gateway MAC address as source MAC instead of router
> >> internal
> >> +     * port MAC, so that external switches can learn gateway MAC
> address.
> >> +     * Here (before packet is given to the port) we replace router
> >> gateway
> >> +     * MAC address with router internal port MAC. */
> >> +    if (ld->localnet_port && (port_key !=
> >> ld->localnet_port->tunnel_key)) {
> >> +        for (int i = 0; i < ld->n_peer_dps; i++) {
> >> +            struct local_datapath *peer_ldp = get_local_datapath(
> >> +                local_datapaths, ld->peer_dps[i]->peer_dp->tunnel_key);
> >> +            const struct sbrec_port_binding *crp;
> >> +            crp = peer_ldp->chassisredirect_port;
> >> +            if (!crp) {
> >> +                continue;
> >> +            }
> >> +
> >> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
> >> +                       ld->peer_dps[i]->peer->logical_port) &&
> >> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
> >> +                for (int j = 0; j < crp->n_mac; j++) {
> >> +                    struct lport_addresses laddrs;
> >> +                    if (!extract_lsp_addresses(crp->mac[j], &laddrs)) {
> >> +                        continue;
> >> +                    }
> >> +                    match_set_dl_src(&match, laddrs.ea);
> >> +                    destroy_lport_addresses(&laddrs);
> >> +                    break;
> >> +                }
> >> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; j++)
> {
> >> +                    struct lport_addresses laddrs;
> >> +                    uint64_t mac64;
> >> +                    if (!extract_lsp_addresses(
> >> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
> >> +                        continue;
> >> +                    }
> >> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> >> +                    put_load(mac64,
> >> +                             MFF_ETH_SRC, 0, 48, clone);
> >> +                    destroy_lport_addresses(&laddrs);
> >> +                    break;
> >> +                }
> >> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
> >> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 150,
> 0,
> >> +                    &match, clone);
> >> +            }
> >> +        }
> >> +    }
> >> +    ofpbuf_delete(clone);
> >> +
> >>      /* Table 34, Priority 100.
> >>       * =======================
> >>       *
> >> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
> >> *sbrec_chassis_by_name,
> >>
> >>          struct zone_ids binding_zones = get_zone_ids(binding,
> ct_zones);
> >>          put_local_common_flows(dp_key, port_key, false, &binding_zones,
> >> -                               ofpacts_p, flow_table);
> >> +                               ofpacts_p, flow_table, ld,
> >> local_datapaths);
> >>
> >>          match_init_catchall(&match);
> >>          ofpbuf_clear(ofpacts_p);
> >> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
> >> *sbrec_chassis_by_name,
> >>
> >>          struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
> >>          put_local_common_flows(dp_key, port_key, nested_container,
> >> &zone_ids,
> >> -                               ofpacts_p, flow_table);
> >> +                               ofpacts_p, flow_table, ld,
> >> local_datapaths);
> >>
> >>          /* Table 0, Priority 200, 150 and 100.
> >>           * ==============================
> >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> >> index 8fa5272..876c121 100644
> >> --- a/ovn/northd/ovn-northd.8.xml
> >> +++ b/ovn/northd/ovn-northd.8.xml
> >> @@ -2013,6 +2013,16 @@ next;
> >>        </li>
> >>
> >>        <li>
> >> +        A priority-100 logical flow with match
> >> +        <code>inport == <var>GW</var> &amp;&amp;
> >> +        flags.rcv_from_vlan == 1</code> has actions
> >> +        <code>eth.dst = <var>E</var>; next;</code>, where
> >> +        <var>GW</var> is the logical router distributed gateway
> >> +        port and <var>E</var> is the MAC address of router
> >> +        distributed gateway port.
> >> +      </li>
> >> +
> >> +      <li>
> >>          For each NAT rule in the OVN Northbound database that can
> >>          be handled in a distributed manner, a priority-100 logical
> >>          flow with match <code>ip4.src == <var>B</var> &amp;&amp;
> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >> index bcf0b66..d012bb8 100644
> >> --- a/ovn/northd/ovn-northd.c
> >> +++ b/ovn/northd/ovn-northd.c
> >> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
> >> ovn_port *op,
> >>      } else {
> >>          ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
> >>      }
> >> +
> >> +    if (op->peer && op->peer->od->localnet_port &&
> >> +        op->od->l3dgw_port && op->od->l3redirect_port &&
> >> +        (op != op->od->l3redirect_port) &&
> >> +        (op != op->od->l3dgw_port)) {
> >> +        ds_put_format(&match, " && is_chassis_resident(%s)",
> >> +                      op->od->l3redirect_port->json_key);
> >> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
> >> +    }
> >>      ds_put_format(&actions, "; "
> >>                    "%sreg1 = %s; "
> >>                    "eth.src = %s; "
> >> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths,
> struct
> >> hmap *ports,
> >>                        op->lrp_networks.ipv6_addrs[i].network_s,
> >>                        op->lrp_networks.ipv6_addrs[i].plen, NULL, NULL);
> >>          }
> >> +
> >> +        /* For a reply packet from gateway with VLAN switch port as
> >> +         * destination, replace router internal port MAC with router
> >> gateway
> >> +         * MAC address, so that external switches can learn gateway MAC
> >> +         * address. Later before delivering the packet to the port,
> >> +         * controller will replace the gateway MAC with router internal
> >> port
> >> +         * MAC in table 33. */
> >> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
> >> +            op->od->l3redirect_port) {
> >> +            ds_clear(&actions);
> >> +            ds_clear(&match);
> >> +            ds_put_format(&match, "inport == %s", op->json_key);
> >> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
> >> +            ds_put_format(&match, " && is_chassis_resident(%s)",
> >> +                          op->od->l3redirect_port->json_key);
> >> +            ds_put_format(&actions,
> >> +                          "eth.src = %s; next;",
> op->lrp_networks.ea_s);
> >> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, 100,
> >> +                          ds_cstr(&match), ds_cstr(&actions));
> >> +        }
> >>      }
> >>
> >>      /* Convert the static routes to flows. */
> >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> >> index ad2101c..0de41d2 100644
> >> --- a/ovn/ovn-architecture.7.xml
> >> +++ b/ovn/ovn-architecture.7.xml
> >> @@ -1067,7 +1067,9 @@
> >>
> >>        <p>
> >>          Flows in table 33 resemble those in table 32 but for logical
> >> ports that
> >> -        reside locally rather than remotely.  For unicast logical
> output
> >> ports
> >> +        reside locally rather than remotely.  If these are VLAN ports
> and
> >> +        packet has router gateway port MAC address as source, replace
> it
> >> with
> >> +        router internal port MAC address. For unicast logical output
> >> ports
> >>          on the local hypervisor, the actions just resubmit to table 34.
> >> For
> >>          multicast output ports that include one or more logical ports
> on
> >> the
> >>          local hypervisor, for each such logical port <var>P</var>, the
> >> actions
> >> --
> >> 1.8.3.1
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Miguel Angel Ajo Pelayo Sept. 24, 2018, 10:10 a.m. | #12
No worries Guru, I understand your feeling, I worked with Anil on
developing this feature, and it's indeed rather complex (we are actually
replacing keepalived + VRRP with openflow and BFD).

I'm happy to work on a more detailed documentation, I guess that Numan,
Anil and I could team up to help that put documentation in shape where [1]
should be a good starting point for the admin side, and may be after that
Numan can give it a thought for making the code simpler where there's room
for that. It's easier to have others help simplify once we have good
documentation.

I agree with Numans that, may be we should take also this opportunity (this
patch Anil has been working on to make VLAN private networks possible with
l3ha) to also support SR-IOV, since it's actually not far from there at all.

Guru, what's your biggest concern, the design documentation? the user side
? both?. My worry with design documentaiton is that tends to rot when new
patches comes, do we have any example of internals documentation that talk
about code in tree?


[1]
https://github.com/openvswitch/ovs/blob/master/Documentation/topics/high-availability.rst



On Fri, Sep 21, 2018 at 9:28 PM Numan Siddique <nusiddiq@redhat.com> wrote:

> On Fri, Sep 21, 2018 at 11:06 PM Guru Shetty <guru@ovn.org> wrote:
>
> > >
> > >
> > > I have tried to make sense of this patch series a few times. I think
> > > adding increasing complications like this will make gateway code
> > > unmaintainable. The whole gateway redirect chassis already makes it
> > > un-understandable and now this will mean that no one will be able to
> > > understand it as we go forward. [/rant off]
> > >
> >
> > Sorry about the "rant" if it came out as rude - was not my intention. I
> > will try to offer something more constructive. When you initially looked
> at
> > the gateway redirect code, how long did it take for you to understand it?
> > My main concern is not with your code, but the underlying code. I
> initially
> > reviewed the underlying redirect code. And it took me a good 2 days to
> > understand it. Back then, we wanted something working and it looked like
> > something that can eventually be simplified. But, we have been adding
> more
> > things to it. And right now, if this patch series is added, it will
> likely
> > get very very hard to understand it as we go forward. Do you reckon that
> a
> > year from now, you will be able to debug a corner case in the code?
> >
> > If there are others who have understood the "redirect" gateway code
> easily,
> > please do chime in.
> >
> > My suggestion is to re-look at the redirect gateway design and see
> whether
> > the code can be simplified. Or whether we can write detailed
> documentation
> > about it to make it easy to understand. Only once we are satisfied with
> it,
> > we can attempt to add more stuff to it.
> >
> >
> I always forget the gateway code and the design. (May be because I haven't
> worked
> on this part of the code much) and I have to refresh my memory everytime.
> I agree that we need to have a detailed documentation. May be "a journey of
> a packet "
> kind of documentation in ovn-architecture would also help.
>
> The proposed solution in this patch series, If I understand it correctly
> runs the router pipeline
> in the source chassis and then resumes it on the gateway chassis. Recently
> I was looking into
> SR-IOV kind of scenarios, where the VM traffic by passes the host kernel,
> and I think the
> proposed solution will not work. In order to support N/S traffic for SR-IOV
> instances
> we have to find probably another way and that would also complicate the
> code further.
>
> However I think it is possible to have one solution to support gateway
> traffic for
> VLAN tenant networks which covers SR-IOV and the normal case. I think the
> router pipeline
> should be executed only on the gateway chassis. I will have a discussion
> with
> Anil on this, this monday and see if it's possible.
>
> But unfortunately, supporting these use cases will add some amount of
> complexity :(.
>
> Thanks
> Numan
>
>
> >
> > >
> > >> ---
> > >>
> > >> v6->v7:
> > >> * Added this patch
> > >>
> > >>
> > >>  ovn/controller/physical.c   | 60
> > >> ++++++++++++++++++++++++++++++++++++++++++---
> > >>  ovn/northd/ovn-northd.8.xml | 10 ++++++++
> > >>  ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
> > >>  ovn/ovn-architecture.7.xml  |  4 ++-
> > >>  4 files changed, 99 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> > >> index f269a1d..1f41f59 100644
> > >> --- a/ovn/controller/physical.c
> > >> +++ b/ovn/controller/physical.c
> > >> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
> > *binding,
> > >>  static void
> > >>  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
> > >>                         bool nested_container, const struct zone_ids
> > >> *zone_ids,
> > >> -                       struct ofpbuf *ofpacts_p, struct hmap
> > *flow_table)
> > >> +                       struct ofpbuf *ofpacts_p, struct hmap
> > *flow_table,
> > >> +                       struct local_datapath *ld,
> > >> +                       const struct hmap *local_datapaths)
> > >>  {
> > >>      struct match match;
> > >>
> > >> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, uint32_t
> > >> port_key,
> > >>          }
> > >>      }
> > >>
> > >> +    struct ofpbuf *clone = NULL;
> > >> +    clone = ofpbuf_clone(ofpacts_p);
> > >> +
> > >>      /* Resubmit to table 34. */
> > >>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> > >>      ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
> > >>                      &match, ofpacts_p);
> > >>
> > >> +    /* For a reply packet from gateway with VLAN switch port as
> > >> destination
> > >> +     * (excluding localnet_port and external VLAN networks), gateway
> > >> router
> > >> +     * will use gateway MAC address as source MAC instead of router
> > >> internal
> > >> +     * port MAC, so that external switches can learn gateway MAC
> > address.
> > >> +     * Here (before packet is given to the port) we replace router
> > >> gateway
> > >> +     * MAC address with router internal port MAC. */
> > >> +    if (ld->localnet_port && (port_key !=
> > >> ld->localnet_port->tunnel_key)) {
> > >> +        for (int i = 0; i < ld->n_peer_dps; i++) {
> > >> +            struct local_datapath *peer_ldp = get_local_datapath(
> > >> +                local_datapaths,
> ld->peer_dps[i]->peer_dp->tunnel_key);
> > >> +            const struct sbrec_port_binding *crp;
> > >> +            crp = peer_ldp->chassisredirect_port;
> > >> +            if (!crp) {
> > >> +                continue;
> > >> +            }
> > >> +
> > >> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
> > >> +                       ld->peer_dps[i]->peer->logical_port) &&
> > >> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
> > >> +                for (int j = 0; j < crp->n_mac; j++) {
> > >> +                    struct lport_addresses laddrs;
> > >> +                    if (!extract_lsp_addresses(crp->mac[j],
> &laddrs)) {
> > >> +                        continue;
> > >> +                    }
> > >> +                    match_set_dl_src(&match, laddrs.ea);
> > >> +                    destroy_lport_addresses(&laddrs);
> > >> +                    break;
> > >> +                }
> > >> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac;
> j++)
> > {
> > >> +                    struct lport_addresses laddrs;
> > >> +                    uint64_t mac64;
> > >> +                    if (!extract_lsp_addresses(
> > >> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
> > >> +                        continue;
> > >> +                    }
> > >> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
> > >> +                    put_load(mac64,
> > >> +                             MFF_ETH_SRC, 0, 48, clone);
> > >> +                    destroy_lport_addresses(&laddrs);
> > >> +                    break;
> > >> +                }
> > >> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
> > >> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT,
> 150,
> > 0,
> > >> +                    &match, clone);
> > >> +            }
> > >> +        }
> > >> +    }
> > >> +    ofpbuf_delete(clone);
> > >> +
> > >>      /* Table 34, Priority 100.
> > >>       * =======================
> > >>       *
> > >> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
> > >> *sbrec_chassis_by_name,
> > >>
> > >>          struct zone_ids binding_zones = get_zone_ids(binding,
> > ct_zones);
> > >>          put_local_common_flows(dp_key, port_key, false,
> &binding_zones,
> > >> -                               ofpacts_p, flow_table);
> > >> +                               ofpacts_p, flow_table, ld,
> > >> local_datapaths);
> > >>
> > >>          match_init_catchall(&match);
> > >>          ofpbuf_clear(ofpacts_p);
> > >> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
> > >> *sbrec_chassis_by_name,
> > >>
> > >>          struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
> > >>          put_local_common_flows(dp_key, port_key, nested_container,
> > >> &zone_ids,
> > >> -                               ofpacts_p, flow_table);
> > >> +                               ofpacts_p, flow_table, ld,
> > >> local_datapaths);
> > >>
> > >>          /* Table 0, Priority 200, 150 and 100.
> > >>           * ==============================
> > >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> > >> index 8fa5272..876c121 100644
> > >> --- a/ovn/northd/ovn-northd.8.xml
> > >> +++ b/ovn/northd/ovn-northd.8.xml
> > >> @@ -2013,6 +2013,16 @@ next;
> > >>        </li>
> > >>
> > >>        <li>
> > >> +        A priority-100 logical flow with match
> > >> +        <code>inport == <var>GW</var> &amp;&amp;
> > >> +        flags.rcv_from_vlan == 1</code> has actions
> > >> +        <code>eth.dst = <var>E</var>; next;</code>, where
> > >> +        <var>GW</var> is the logical router distributed gateway
> > >> +        port and <var>E</var> is the MAC address of router
> > >> +        distributed gateway port.
> > >> +      </li>
> > >> +
> > >> +      <li>
> > >>          For each NAT rule in the OVN Northbound database that can
> > >>          be handled in a distributed manner, a priority-100 logical
> > >>          flow with match <code>ip4.src == <var>B</var> &amp;&amp;
> > >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > >> index bcf0b66..d012bb8 100644
> > >> --- a/ovn/northd/ovn-northd.c
> > >> +++ b/ovn/northd/ovn-northd.c
> > >> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
> > >> ovn_port *op,
> > >>      } else {
> > >>          ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
> > >>      }
> > >> +
> > >> +    if (op->peer && op->peer->od->localnet_port &&
> > >> +        op->od->l3dgw_port && op->od->l3redirect_port &&
> > >> +        (op != op->od->l3redirect_port) &&
> > >> +        (op != op->od->l3dgw_port)) {
> > >> +        ds_put_format(&match, " && is_chassis_resident(%s)",
> > >> +                      op->od->l3redirect_port->json_key);
> > >> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
> > >> +    }
> > >>      ds_put_format(&actions, "; "
> > >>                    "%sreg1 = %s; "
> > >>                    "eth.src = %s; "
> > >> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths,
> > struct
> > >> hmap *ports,
> > >>                        op->lrp_networks.ipv6_addrs[i].network_s,
> > >>                        op->lrp_networks.ipv6_addrs[i].plen, NULL,
> NULL);
> > >>          }
> > >> +
> > >> +        /* For a reply packet from gateway with VLAN switch port as
> > >> +         * destination, replace router internal port MAC with router
> > >> gateway
> > >> +         * MAC address, so that external switches can learn gateway
> MAC
> > >> +         * address. Later before delivering the packet to the port,
> > >> +         * controller will replace the gateway MAC with router
> internal
> > >> port
> > >> +         * MAC in table 33. */
> > >> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
> > >> +            op->od->l3redirect_port) {
> > >> +            ds_clear(&actions);
> > >> +            ds_clear(&match);
> > >> +            ds_put_format(&match, "inport == %s", op->json_key);
> > >> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
> > >> +            ds_put_format(&match, " && is_chassis_resident(%s)",
> > >> +                          op->od->l3redirect_port->json_key);
> > >> +            ds_put_format(&actions,
> > >> +                          "eth.src = %s; next;",
> > op->lrp_networks.ea_s);
> > >> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT,
> 100,
> > >> +                          ds_cstr(&match), ds_cstr(&actions));
> > >> +        }
> > >>      }
> > >>
> > >>      /* Convert the static routes to flows. */
> > >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> > >> index ad2101c..0de41d2 100644
> > >> --- a/ovn/ovn-architecture.7.xml
> > >> +++ b/ovn/ovn-architecture.7.xml
> > >> @@ -1067,7 +1067,9 @@
> > >>
> > >>        <p>
> > >>          Flows in table 33 resemble those in table 32 but for logical
> > >> ports that
> > >> -        reside locally rather than remotely.  For unicast logical
> > output
> > >> ports
> > >> +        reside locally rather than remotely.  If these are VLAN ports
> > and
> > >> +        packet has router gateway port MAC address as source, replace
> > it
> > >> with
> > >> +        router internal port MAC address. For unicast logical output
> > >> ports
> > >>          on the local hypervisor, the actions just resubmit to table
> 34.
> > >> For
> > >>          multicast output ports that include one or more logical ports
> > on
> > >> the
> > >>          local hypervisor, for each such logical port <var>P</var>,
> the
> > >> actions
> > >> --
> > >> 1.8.3.1
> > >>
> > >> _______________________________________________
> > >> dev mailing list
> > >> dev@openvswitch.org
> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >>
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Guru Shetty Sept. 27, 2018, 7:05 p.m. | #13
On Mon, 24 Sep 2018 at 03:10, Miguel Angel Ajo Pelayo <majopela@redhat.com>
wrote:

>
>
> No worries Guru, I understand your feeling, I worked with Anil on
> developing this feature, and it's indeed rather complex (we are actually
> replacing keepalived + VRRP with openflow and BFD).
>
> I'm happy to work on a more detailed documentation, I guess that Numan,
> Anil and I could team up to help that put documentation in shape where [1]
> should be a good starting point for the admin side, and may be after that
> Numan can give it a thought for making the code simpler where there's room
> for that. It's easier to have others help simplify once we have good
> documentation.
>
> I agree with Numans that, may be we should take also this opportunity
> (this patch Anil has been working on to make VLAN private networks possible
> with l3ha) to also support SR-IOV, since it's actually not far from there
> at all.
>
> Guru, what's your biggest concern, the design documentation? the user side
> ? both?. My worry with design documentaiton is that tends to rot when new
> patches comes, do we have any example of internals documentation that talk
> about code in tree?
>

My biggest concern is that reading the code does not give out a story.
Everything in ovn-northd can be understood by just reading the code. But
when it comes to gateway redirect, it is hard.

For e.g., I am pasting a snippet of code from one of the patches in this
series.

++    if (op->peer && op->peer->od->localnet_port &&+
op->od->l3dgw_port && op->od->l3redirect_port &&+        (op !=
op->od->l3redirect_port) &&+        (op != op->od->l3dgw_port)) {+
   ds_put_format(&match, " && is_chassis_resident(%s)",+
       op->od->l3redirect_port->json_key);+
ds_put_format(&actions, "; flags.rcv_from_vlan = 1");+    }

You need to go back and read the first patches of the feature (gateway
redirect) and its commit messages to understand the implications of the
above if condition. The primary reason why it is hard is because, we try to
move back and forth with packet flow and force it to move in non-natural
directions.

I feel that writing a documentation (A hybrid of man pages of
ovn-architecture and ovn-northd) would make it easy to follow. For e.g., we
follow a packet when it leaves the VM and heads to the gateway and gets
SNATted.  Similarly, we follow the packet for DNAT. We can then enhance it
with other use cases (for e.g this patch series).

On a larger note, OVN is a network virtualization project. We implement the
functionality of a router, switch, firewall etc and then when we
interconnect them, things just work well. Code around Switches, routers,
load-balancers etc are easy to understand because they are all in logical
space. But gateways transcend the logical boundary to physical boundary.
The reason why the "gateway redirect" code is hard to understand is because
we are trying to implement it in the same router that is also used in
logical space. A clean separation of boundaries does not exist. I tried to
implement the clean separation with gateway routers, wherein the there is a
separate router that connects to the physical world and is chassis bound.
But OpenStack-OVN as a client did not want to handle the complication of
maintaining 2 routers.






>
> [1]
> https://github.com/openvswitch/ovs/blob/master/Documentation/topics/high-availability.rst
>
>
>
> On Fri, Sep 21, 2018 at 9:28 PM Numan Siddique <nusiddiq@redhat.com>
> wrote:
>
>> On Fri, Sep 21, 2018 at 11:06 PM Guru Shetty <guru@ovn.org> wrote:
>>
>> > >
>> > >
>> > > I have tried to make sense of this patch series a few times. I think
>> > > adding increasing complications like this will make gateway code
>> > > unmaintainable. The whole gateway redirect chassis already makes it
>> > > un-understandable and now this will mean that no one will be able to
>> > > understand it as we go forward. [/rant off]
>> > >
>> >
>> > Sorry about the "rant" if it came out as rude - was not my intention. I
>> > will try to offer something more constructive. When you initially
>> looked at
>> > the gateway redirect code, how long did it take for you to understand
>> it?
>> > My main concern is not with your code, but the underlying code. I
>> initially
>> > reviewed the underlying redirect code. And it took me a good 2 days to
>> > understand it. Back then, we wanted something working and it looked like
>> > something that can eventually be simplified. But, we have been adding
>> more
>> > things to it. And right now, if this patch series is added, it will
>> likely
>> > get very very hard to understand it as we go forward. Do you reckon
>> that a
>> > year from now, you will be able to debug a corner case in the code?
>> >
>> > If there are others who have understood the "redirect" gateway code
>> easily,
>> > please do chime in.
>> >
>> > My suggestion is to re-look at the redirect gateway design and see
>> whether
>> > the code can be simplified. Or whether we can write detailed
>> documentation
>> > about it to make it easy to understand. Only once we are satisfied with
>> it,
>> > we can attempt to add more stuff to it.
>> >
>> >
>> I always forget the gateway code and the design. (May be because I haven't
>> worked
>> on this part of the code much) and I have to refresh my memory everytime.
>> I agree that we need to have a detailed documentation. May be "a journey
>> of
>> a packet "
>> kind of documentation in ovn-architecture would also help.
>>
>> The proposed solution in this patch series, If I understand it correctly
>> runs the router pipeline
>> in the source chassis and then resumes it on the gateway chassis. Recently
>> I was looking into
>> SR-IOV kind of scenarios, where the VM traffic by passes the host kernel,
>> and I think the
>> proposed solution will not work. In order to support N/S traffic for
>> SR-IOV
>> instances
>> we have to find probably another way and that would also complicate the
>> code further.
>>
>> However I think it is possible to have one solution to support gateway
>> traffic for
>> VLAN tenant networks which covers SR-IOV and the normal case. I think the
>> router pipeline
>> should be executed only on the gateway chassis. I will have a discussion
>> with
>> Anil on this, this monday and see if it's possible.
>>
>> But unfortunately, supporting these use cases will add some amount of
>> complexity :(.
>>
>> Thanks
>> Numan
>>
>>
>> >
>> > >
>> > >> ---
>> > >>
>> > >> v6->v7:
>> > >> * Added this patch
>> > >>
>> > >>
>> > >>  ovn/controller/physical.c   | 60
>> > >> ++++++++++++++++++++++++++++++++++++++++++---
>> > >>  ovn/northd/ovn-northd.8.xml | 10 ++++++++
>> > >>  ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>> > >>  ovn/ovn-architecture.7.xml  |  4 ++-
>> > >>  4 files changed, 99 insertions(+), 4 deletions(-)
>> > >>
>> > >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>> > >> index f269a1d..1f41f59 100644
>> > >> --- a/ovn/controller/physical.c
>> > >> +++ b/ovn/controller/physical.c
>> > >> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
>> > *binding,
>> > >>  static void
>> > >>  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>> > >>                         bool nested_container, const struct zone_ids
>> > >> *zone_ids,
>> > >> -                       struct ofpbuf *ofpacts_p, struct hmap
>> > *flow_table)
>> > >> +                       struct ofpbuf *ofpacts_p, struct hmap
>> > *flow_table,
>> > >> +                       struct local_datapath *ld,
>> > >> +                       const struct hmap *local_datapaths)
>> > >>  {
>> > >>      struct match match;
>> > >>
>> > >> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key,
>> uint32_t
>> > >> port_key,
>> > >>          }
>> > >>      }
>> > >>
>> > >> +    struct ofpbuf *clone = NULL;
>> > >> +    clone = ofpbuf_clone(ofpacts_p);
>> > >> +
>> > >>      /* Resubmit to table 34. */
>> > >>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>> > >>      ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>> > >>                      &match, ofpacts_p);
>> > >>
>> > >> +    /* For a reply packet from gateway with VLAN switch port as
>> > >> destination
>> > >> +     * (excluding localnet_port and external VLAN networks), gateway
>> > >> router
>> > >> +     * will use gateway MAC address as source MAC instead of router
>> > >> internal
>> > >> +     * port MAC, so that external switches can learn gateway MAC
>> > address.
>> > >> +     * Here (before packet is given to the port) we replace router
>> > >> gateway
>> > >> +     * MAC address with router internal port MAC. */
>> > >> +    if (ld->localnet_port && (port_key !=
>> > >> ld->localnet_port->tunnel_key)) {
>> > >> +        for (int i = 0; i < ld->n_peer_dps; i++) {
>> > >> +            struct local_datapath *peer_ldp = get_local_datapath(
>> > >> +                local_datapaths,
>> ld->peer_dps[i]->peer_dp->tunnel_key);
>> > >> +            const struct sbrec_port_binding *crp;
>> > >> +            crp = peer_ldp->chassisredirect_port;
>> > >> +            if (!crp) {
>> > >> +                continue;
>> > >> +            }
>> > >> +
>> > >> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
>> > >> +                       ld->peer_dps[i]->peer->logical_port) &&
>> > >> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
>> > >> +                for (int j = 0; j < crp->n_mac; j++) {
>> > >> +                    struct lport_addresses laddrs;
>> > >> +                    if (!extract_lsp_addresses(crp->mac[j],
>> &laddrs)) {
>> > >> +                        continue;
>> > >> +                    }
>> > >> +                    match_set_dl_src(&match, laddrs.ea);
>> > >> +                    destroy_lport_addresses(&laddrs);
>> > >> +                    break;
>> > >> +                }
>> > >> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac;
>> j++)
>> > {
>> > >> +                    struct lport_addresses laddrs;
>> > >> +                    uint64_t mac64;
>> > >> +                    if (!extract_lsp_addresses(
>> > >> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
>> > >> +                        continue;
>> > >> +                    }
>> > >> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
>> > >> +                    put_load(mac64,
>> > >> +                             MFF_ETH_SRC, 0, 48, clone);
>> > >> +                    destroy_lport_addresses(&laddrs);
>> > >> +                    break;
>> > >> +                }
>> > >> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
>> > >> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT,
>> 150,
>> > 0,
>> > >> +                    &match, clone);
>> > >> +            }
>> > >> +        }
>> > >> +    }
>> > >> +    ofpbuf_delete(clone);
>> > >> +
>> > >>      /* Table 34, Priority 100.
>> > >>       * =======================
>> > >>       *
>> > >> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
>> > >> *sbrec_chassis_by_name,
>> > >>
>> > >>          struct zone_ids binding_zones = get_zone_ids(binding,
>> > ct_zones);
>> > >>          put_local_common_flows(dp_key, port_key, false,
>> &binding_zones,
>> > >> -                               ofpacts_p, flow_table);
>> > >> +                               ofpacts_p, flow_table, ld,
>> > >> local_datapaths);
>> > >>
>> > >>          match_init_catchall(&match);
>> > >>          ofpbuf_clear(ofpacts_p);
>> > >> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
>> > >> *sbrec_chassis_by_name,
>> > >>
>> > >>          struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
>> > >>          put_local_common_flows(dp_key, port_key, nested_container,
>> > >> &zone_ids,
>> > >> -                               ofpacts_p, flow_table);
>> > >> +                               ofpacts_p, flow_table, ld,
>> > >> local_datapaths);
>> > >>
>> > >>          /* Table 0, Priority 200, 150 and 100.
>> > >>           * ==============================
>> > >> diff --git a/ovn/northd/ovn-northd.8.xml
>> b/ovn/northd/ovn-northd.8.xml
>> > >> index 8fa5272..876c121 100644
>> > >> --- a/ovn/northd/ovn-northd.8.xml
>> > >> +++ b/ovn/northd/ovn-northd.8.xml
>> > >> @@ -2013,6 +2013,16 @@ next;
>> > >>        </li>
>> > >>
>> > >>        <li>
>> > >> +        A priority-100 logical flow with match
>> > >> +        <code>inport == <var>GW</var> &amp;&amp;
>> > >> +        flags.rcv_from_vlan == 1</code> has actions
>> > >> +        <code>eth.dst = <var>E</var>; next;</code>, where
>> > >> +        <var>GW</var> is the logical router distributed gateway
>> > >> +        port and <var>E</var> is the MAC address of router
>> > >> +        distributed gateway port.
>> > >> +      </li>
>> > >> +
>> > >> +      <li>
>> > >>          For each NAT rule in the OVN Northbound database that can
>> > >>          be handled in a distributed manner, a priority-100 logical
>> > >>          flow with match <code>ip4.src == <var>B</var> &amp;&amp;
>> > >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> > >> index bcf0b66..d012bb8 100644
>> > >> --- a/ovn/northd/ovn-northd.c
>> > >> +++ b/ovn/northd/ovn-northd.c
>> > >> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
>> > >> ovn_port *op,
>> > >>      } else {
>> > >>          ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>> > >>      }
>> > >> +
>> > >> +    if (op->peer && op->peer->od->localnet_port &&
>> > >> +        op->od->l3dgw_port && op->od->l3redirect_port &&
>> > >> +        (op != op->od->l3redirect_port) &&
>> > >> +        (op != op->od->l3dgw_port)) {
>> > >> +        ds_put_format(&match, " && is_chassis_resident(%s)",
>> > >> +                      op->od->l3redirect_port->json_key);
>> > >> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
>> > >> +    }
>> > >>      ds_put_format(&actions, "; "
>> > >>                    "%sreg1 = %s; "
>> > >>                    "eth.src = %s; "
>> > >> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths,
>> > struct
>> > >> hmap *ports,
>> > >>                        op->lrp_networks.ipv6_addrs[i].network_s,
>> > >>                        op->lrp_networks.ipv6_addrs[i].plen, NULL,
>> NULL);
>> > >>          }
>> > >> +
>> > >> +        /* For a reply packet from gateway with VLAN switch port as
>> > >> +         * destination, replace router internal port MAC with router
>> > >> gateway
>> > >> +         * MAC address, so that external switches can learn gateway
>> MAC
>> > >> +         * address. Later before delivering the packet to the port,
>> > >> +         * controller will replace the gateway MAC with router
>> internal
>> > >> port
>> > >> +         * MAC in table 33. */
>> > >> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
>> > >> +            op->od->l3redirect_port) {
>> > >> +            ds_clear(&actions);
>> > >> +            ds_clear(&match);
>> > >> +            ds_put_format(&match, "inport == %s", op->json_key);
>> > >> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>> > >> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>> > >> +                          op->od->l3redirect_port->json_key);
>> > >> +            ds_put_format(&actions,
>> > >> +                          "eth.src = %s; next;",
>> > op->lrp_networks.ea_s);
>> > >> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT,
>> 100,
>> > >> +                          ds_cstr(&match), ds_cstr(&actions));
>> > >> +        }
>> > >>      }
>> > >>
>> > >>      /* Convert the static routes to flows. */
>> > >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>> > >> index ad2101c..0de41d2 100644
>> > >> --- a/ovn/ovn-architecture.7.xml
>> > >> +++ b/ovn/ovn-architecture.7.xml
>> > >> @@ -1067,7 +1067,9 @@
>> > >>
>> > >>        <p>
>> > >>          Flows in table 33 resemble those in table 32 but for logical
>> > >> ports that
>> > >> -        reside locally rather than remotely.  For unicast logical
>> > output
>> > >> ports
>> > >> +        reside locally rather than remotely.  If these are VLAN
>> ports
>> > and
>> > >> +        packet has router gateway port MAC address as source,
>> replace
>> > it
>> > >> with
>> > >> +        router internal port MAC address. For unicast logical output
>> > >> ports
>> > >>          on the local hypervisor, the actions just resubmit to table
>> 34.
>> > >> For
>> > >>          multicast output ports that include one or more logical
>> ports
>> > on
>> > >> the
>> > >>          local hypervisor, for each such logical port <var>P</var>,
>> the
>> > >> actions
>> > >> --
>> > >> 1.8.3.1
>> > >>
>> > >> _______________________________________________
>> > >> dev mailing list
>> > >> dev@openvswitch.org
>> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> > >>
>> > >
>> > _______________________________________________
>> > dev mailing list
>> > dev@openvswitch.org
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> >
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
>
> --
> Miguel Ángel Ajo
> OSP / Networking DFG, OVN Squad Engineering
>
Numan Siddique Oct. 9, 2018, 9:14 a.m. | #14
On Fri, Sep 28, 2018 at 12:36 AM Guru Shetty <guru@ovn.org> wrote:

>
>
> On Mon, 24 Sep 2018 at 03:10, Miguel Angel Ajo Pelayo <majopela@redhat.com>
> wrote:
>
>>
>>
>> No worries Guru, I understand your feeling, I worked with Anil on
>> developing this feature, and it's indeed rather complex (we are actually
>> replacing keepalived + VRRP with openflow and BFD).
>>
>> I'm happy to work on a more detailed documentation, I guess that Numan,
>> Anil and I could team up to help that put documentation in shape where [1]
>> should be a good starting point for the admin side, and may be after that
>> Numan can give it a thought for making the code simpler where there's room
>> for that. It's easier to have others help simplify once we have good
>> documentation.
>>
>> I agree with Numans that, may be we should take also this opportunity
>> (this patch Anil has been working on to make VLAN private networks possible
>> with l3ha) to also support SR-IOV, since it's actually not far from there
>> at all.
>>
>> Guru, what's your biggest concern, the design documentation? the user
>> side ? both?. My worry with design documentaiton is that tends to rot when
>> new patches comes, do we have any example of internals documentation that
>> talk about code in tree?
>>
>
> My biggest concern is that reading the code does not give out a story.
> Everything in ovn-northd can be understood by just reading the code. But
> when it comes to gateway redirect, it is hard.
>
> For e.g., I am pasting a snippet of code from one of the patches in this
> series.
>
> ++    if (op->peer && op->peer->od->localnet_port &&+        op->od->l3dgw_port && op->od->l3redirect_port &&+        (op != op->od->l3redirect_port) &&+        (op != op->od->l3dgw_port)) {+        ds_put_format(&match, " && is_chassis_resident(%s)",+                      op->od->l3redirect_port->json_key);+        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");+    }
>
> You need to go back and read the first patches of the feature (gateway
> redirect) and its commit messages to understand the implications of the
> above if condition. The primary reason why it is hard is because, we try to
> move back and forth with packet flow and force it to move in non-natural
> directions.
>
> I feel that writing a documentation (A hybrid of man pages of
> ovn-architecture and ovn-northd) would make it easy to follow. For e.g., we
> follow a packet when it leaves the VM and heads to the gateway and gets
> SNATted.  Similarly, we follow the packet for DNAT. We can then enhance it
> with other use cases (for e.g this patch series).
>
> On a larger note, OVN is a network virtualization project. We implement
> the functionality of a router, switch, firewall etc and then when we
> interconnect them, things just work well. Code around Switches, routers,
> load-balancers etc are easy to understand because they are all in logical
> space. But gateways transcend the logical boundary to physical boundary.
> The reason why the "gateway redirect" code is hard to understand is because
> we are trying to implement it in the same router that is also used in
> logical space. A clean separation of boundaries does not exist. I tried to
> implement the clean separation with gateway routers, wherein the there is a
> separate router that connects to the physical world and is chassis bound.
> But OpenStack-OVN as a client did not want to handle the complication of
> maintaining 2 routers.
>
>
I was involved in moving away from the Gateway router to a distributed
gateway router port once Mickey Spiegel added the redirect chassis. It's
true that we didn't want to maintain 2 logical routers in networking-ovn.
But we moved mainly because we were co relating with the default neutron
reference implementation and wanted to bridge gaps between it (with various
agents ) and neutron + OVN. With the new redirect chassis support, we could
add the floating ip support for the VMs with distributed routing (instead
of centralized routing) and also later- HA support for centralized routing.

I have submitted a patch series as  an alternate solution to solve this
issue. Can you please take a look at it -
https://patchwork.ozlabs.org/project/openvswitch/list/?series=69280 and
provide your comments.
I will also work on having a documentation as you have suggested above.

Thanks
Numan


>
>
>
>
>>
>> [1]
>> https://github.com/openvswitch/ovs/blob/master/Documentation/topics/high-availability.rst
>>
>>
>>
>> On Fri, Sep 21, 2018 at 9:28 PM Numan Siddique <nusiddiq@redhat.com>
>> wrote:
>>
>>> On Fri, Sep 21, 2018 at 11:06 PM Guru Shetty <guru@ovn.org> wrote:
>>>
>>> > >
>>> > >
>>> > > I have tried to make sense of this patch series a few times. I think
>>> > > adding increasing complications like this will make gateway code
>>> > > unmaintainable. The whole gateway redirect chassis already makes it
>>> > > un-understandable and now this will mean that no one will be able to
>>> > > understand it as we go forward. [/rant off]
>>> > >
>>> >
>>> > Sorry about the "rant" if it came out as rude - was not my intention. I
>>> > will try to offer something more constructive. When you initially
>>> looked at
>>> > the gateway redirect code, how long did it take for you to understand
>>> it?
>>> > My main concern is not with your code, but the underlying code. I
>>> initially
>>> > reviewed the underlying redirect code. And it took me a good 2 days to
>>> > understand it. Back then, we wanted something working and it looked
>>> like
>>> > something that can eventually be simplified. But, we have been adding
>>> more
>>> > things to it. And right now, if this patch series is added, it will
>>> likely
>>> > get very very hard to understand it as we go forward. Do you reckon
>>> that a
>>> > year from now, you will be able to debug a corner case in the code?
>>> >
>>> > If there are others who have understood the "redirect" gateway code
>>> easily,
>>> > please do chime in.
>>> >
>>> > My suggestion is to re-look at the redirect gateway design and see
>>> whether
>>> > the code can be simplified. Or whether we can write detailed
>>> documentation
>>> > about it to make it easy to understand. Only once we are satisfied
>>> with it,
>>> > we can attempt to add more stuff to it.
>>> >
>>> >
>>> I always forget the gateway code and the design. (May be because I
>>> haven't
>>> worked
>>> on this part of the code much) and I have to refresh my memory everytime.
>>> I agree that we need to have a detailed documentation. May be "a journey
>>> of
>>> a packet "
>>> kind of documentation in ovn-architecture would also help.
>>>
>>> The proposed solution in this patch series, If I understand it correctly
>>> runs the router pipeline
>>> in the source chassis and then resumes it on the gateway chassis.
>>> Recently
>>> I was looking into
>>> SR-IOV kind of scenarios, where the VM traffic by passes the host kernel,
>>> and I think the
>>> proposed solution will not work. In order to support N/S traffic for
>>> SR-IOV
>>> instances
>>> we have to find probably another way and that would also complicate the
>>> code further.
>>>
>>> However I think it is possible to have one solution to support gateway
>>> traffic for
>>> VLAN tenant networks which covers SR-IOV and the normal case. I think the
>>> router pipeline
>>> should be executed only on the gateway chassis. I will have a discussion
>>> with
>>> Anil on this, this monday and see if it's possible.
>>>
>>> But unfortunately, supporting these use cases will add some amount of
>>> complexity :(.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> >
>>> > >
>>> > >> ---
>>> > >>
>>> > >> v6->v7:
>>> > >> * Added this patch
>>> > >>
>>> > >>
>>> > >>  ovn/controller/physical.c   | 60
>>> > >> ++++++++++++++++++++++++++++++++++++++++++---
>>> > >>  ovn/northd/ovn-northd.8.xml | 10 ++++++++
>>> > >>  ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>>> > >>  ovn/ovn-architecture.7.xml  |  4 ++-
>>> > >>  4 files changed, 99 insertions(+), 4 deletions(-)
>>> > >>
>>> > >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>>> > >> index f269a1d..1f41f59 100644
>>> > >> --- a/ovn/controller/physical.c
>>> > >> +++ b/ovn/controller/physical.c
>>> > >> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
>>> > *binding,
>>> > >>  static void
>>> > >>  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>>> > >>                         bool nested_container, const struct zone_ids
>>> > >> *zone_ids,
>>> > >> -                       struct ofpbuf *ofpacts_p, struct hmap
>>> > *flow_table)
>>> > >> +                       struct ofpbuf *ofpacts_p, struct hmap
>>> > *flow_table,
>>> > >> +                       struct local_datapath *ld,
>>> > >> +                       const struct hmap *local_datapaths)
>>> > >>  {
>>> > >>      struct match match;
>>> > >>
>>> > >> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key,
>>> uint32_t
>>> > >> port_key,
>>> > >>          }
>>> > >>      }
>>> > >>
>>> > >> +    struct ofpbuf *clone = NULL;
>>> > >> +    clone = ofpbuf_clone(ofpacts_p);
>>> > >> +
>>> > >>      /* Resubmit to table 34. */
>>> > >>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>>> > >>      ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>>> > >>                      &match, ofpacts_p);
>>> > >>
>>> > >> +    /* For a reply packet from gateway with VLAN switch port as
>>> > >> destination
>>> > >> +     * (excluding localnet_port and external VLAN networks),
>>> gateway
>>> > >> router
>>> > >> +     * will use gateway MAC address as source MAC instead of router
>>> > >> internal
>>> > >> +     * port MAC, so that external switches can learn gateway MAC
>>> > address.
>>> > >> +     * Here (before packet is given to the port) we replace router
>>> > >> gateway
>>> > >> +     * MAC address with router internal port MAC. */
>>> > >> +    if (ld->localnet_port && (port_key !=
>>> > >> ld->localnet_port->tunnel_key)) {
>>> > >> +        for (int i = 0; i < ld->n_peer_dps; i++) {
>>> > >> +            struct local_datapath *peer_ldp = get_local_datapath(
>>> > >> +                local_datapaths,
>>> ld->peer_dps[i]->peer_dp->tunnel_key);
>>> > >> +            const struct sbrec_port_binding *crp;
>>> > >> +            crp = peer_ldp->chassisredirect_port;
>>> > >> +            if (!crp) {
>>> > >> +                continue;
>>> > >> +            }
>>> > >> +
>>> > >> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
>>> > >> +                       ld->peer_dps[i]->peer->logical_port) &&
>>> > >> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
>>> > >> +                for (int j = 0; j < crp->n_mac; j++) {
>>> > >> +                    struct lport_addresses laddrs;
>>> > >> +                    if (!extract_lsp_addresses(crp->mac[j],
>>> &laddrs)) {
>>> > >> +                        continue;
>>> > >> +                    }
>>> > >> +                    match_set_dl_src(&match, laddrs.ea);
>>> > >> +                    destroy_lport_addresses(&laddrs);
>>> > >> +                    break;
>>> > >> +                }
>>> > >> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac;
>>> j++)
>>> > {
>>> > >> +                    struct lport_addresses laddrs;
>>> > >> +                    uint64_t mac64;
>>> > >> +                    if (!extract_lsp_addresses(
>>> > >> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
>>> > >> +                        continue;
>>> > >> +                    }
>>> > >> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
>>> > >> +                    put_load(mac64,
>>> > >> +                             MFF_ETH_SRC, 0, 48, clone);
>>> > >> +                    destroy_lport_addresses(&laddrs);
>>> > >> +                    break;
>>> > >> +                }
>>> > >> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
>>> > >> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT,
>>> 150,
>>> > 0,
>>> > >> +                    &match, clone);
>>> > >> +            }
>>> > >> +        }
>>> > >> +    }
>>> > >> +    ofpbuf_delete(clone);
>>> > >> +
>>> > >>      /* Table 34, Priority 100.
>>> > >>       * =======================
>>> > >>       *
>>> > >> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
>>> > >> *sbrec_chassis_by_name,
>>> > >>
>>> > >>          struct zone_ids binding_zones = get_zone_ids(binding,
>>> > ct_zones);
>>> > >>          put_local_common_flows(dp_key, port_key, false,
>>> &binding_zones,
>>> > >> -                               ofpacts_p, flow_table);
>>> > >> +                               ofpacts_p, flow_table, ld,
>>> > >> local_datapaths);
>>> > >>
>>> > >>          match_init_catchall(&match);
>>> > >>          ofpbuf_clear(ofpacts_p);
>>> > >> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
>>> > >> *sbrec_chassis_by_name,
>>> > >>
>>> > >>          struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
>>> > >>          put_local_common_flows(dp_key, port_key, nested_container,
>>> > >> &zone_ids,
>>> > >> -                               ofpacts_p, flow_table);
>>> > >> +                               ofpacts_p, flow_table, ld,
>>> > >> local_datapaths);
>>> > >>
>>> > >>          /* Table 0, Priority 200, 150 and 100.
>>> > >>           * ==============================
>>> > >> diff --git a/ovn/northd/ovn-northd.8.xml
>>> b/ovn/northd/ovn-northd.8.xml
>>> > >> index 8fa5272..876c121 100644
>>> > >> --- a/ovn/northd/ovn-northd.8.xml
>>> > >> +++ b/ovn/northd/ovn-northd.8.xml
>>> > >> @@ -2013,6 +2013,16 @@ next;
>>> > >>        </li>
>>> > >>
>>> > >>        <li>
>>> > >> +        A priority-100 logical flow with match
>>> > >> +        <code>inport == <var>GW</var> &amp;&amp;
>>> > >> +        flags.rcv_from_vlan == 1</code> has actions
>>> > >> +        <code>eth.dst = <var>E</var>; next;</code>, where
>>> > >> +        <var>GW</var> is the logical router distributed gateway
>>> > >> +        port and <var>E</var> is the MAC address of router
>>> > >> +        distributed gateway port.
>>> > >> +      </li>
>>> > >> +
>>> > >> +      <li>
>>> > >>          For each NAT rule in the OVN Northbound database that can
>>> > >>          be handled in a distributed manner, a priority-100 logical
>>> > >>          flow with match <code>ip4.src == <var>B</var> &amp;&amp;
>>> > >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>>> > >> index bcf0b66..d012bb8 100644
>>> > >> --- a/ovn/northd/ovn-northd.c
>>> > >> +++ b/ovn/northd/ovn-northd.c
>>> > >> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
>>> > >> ovn_port *op,
>>> > >>      } else {
>>> > >>          ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>>> > >>      }
>>> > >> +
>>> > >> +    if (op->peer && op->peer->od->localnet_port &&
>>> > >> +        op->od->l3dgw_port && op->od->l3redirect_port &&
>>> > >> +        (op != op->od->l3redirect_port) &&
>>> > >> +        (op != op->od->l3dgw_port)) {
>>> > >> +        ds_put_format(&match, " && is_chassis_resident(%s)",
>>> > >> +                      op->od->l3redirect_port->json_key);
>>> > >> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
>>> > >> +    }
>>> > >>      ds_put_format(&actions, "; "
>>> > >>                    "%sreg1 = %s; "
>>> > >>                    "eth.src = %s; "
>>> > >> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths,
>>> > struct
>>> > >> hmap *ports,
>>> > >>                        op->lrp_networks.ipv6_addrs[i].network_s,
>>> > >>                        op->lrp_networks.ipv6_addrs[i].plen, NULL,
>>> NULL);
>>> > >>          }
>>> > >> +
>>> > >> +        /* For a reply packet from gateway with VLAN switch port as
>>> > >> +         * destination, replace router internal port MAC with
>>> router
>>> > >> gateway
>>> > >> +         * MAC address, so that external switches can learn
>>> gateway MAC
>>> > >> +         * address. Later before delivering the packet to the port,
>>> > >> +         * controller will replace the gateway MAC with router
>>> internal
>>> > >> port
>>> > >> +         * MAC in table 33. */
>>> > >> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
>>> > >> +            op->od->l3redirect_port) {
>>> > >> +            ds_clear(&actions);
>>> > >> +            ds_clear(&match);
>>> > >> +            ds_put_format(&match, "inport == %s", op->json_key);
>>> > >> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>>> > >> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>>> > >> +                          op->od->l3redirect_port->json_key);
>>> > >> +            ds_put_format(&actions,
>>> > >> +                          "eth.src = %s; next;",
>>> > op->lrp_networks.ea_s);
>>> > >> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT,
>>> 100,
>>> > >> +                          ds_cstr(&match), ds_cstr(&actions));
>>> > >> +        }
>>> > >>      }
>>> > >>
>>> > >>      /* Convert the static routes to flows. */
>>> > >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>>> > >> index ad2101c..0de41d2 100644
>>> > >> --- a/ovn/ovn-architecture.7.xml
>>> > >> +++ b/ovn/ovn-architecture.7.xml
>>> > >> @@ -1067,7 +1067,9 @@
>>> > >>
>>> > >>        <p>
>>> > >>          Flows in table 33 resemble those in table 32 but for
>>> logical
>>> > >> ports that
>>> > >> -        reside locally rather than remotely.  For unicast logical
>>> > output
>>> > >> ports
>>> > >> +        reside locally rather than remotely.  If these are VLAN
>>> ports
>>> > and
>>> > >> +        packet has router gateway port MAC address as source,
>>> replace
>>> > it
>>> > >> with
>>> > >> +        router internal port MAC address. For unicast logical
>>> output
>>> > >> ports
>>> > >>          on the local hypervisor, the actions just resubmit to
>>> table 34.
>>> > >> For
>>> > >>          multicast output ports that include one or more logical
>>> ports
>>> > on
>>> > >> the
>>> > >>          local hypervisor, for each such logical port <var>P</var>,
>>> the
>>> > >> actions
>>> > >> --
>>> > >> 1.8.3.1
>>> > >>
>>> > >> _______________________________________________
>>> > >> dev mailing list
>>> > >> dev@openvswitch.org
>>> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> > >>
>>> > >
>>> > _______________________________________________
>>> > dev mailing list
>>> > dev@openvswitch.org
>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> >
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
>>
>> --
>> Miguel Ángel Ajo
>> OSP / Networking DFG, OVN Squad Engineering
>>
>

Patch

diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index f269a1d..1f41f59 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -190,7 +190,9 @@  get_zone_ids(const struct sbrec_port_binding *binding,
 static void
 put_local_common_flows(uint32_t dp_key, uint32_t port_key,
                        bool nested_container, const struct zone_ids *zone_ids,
-                       struct ofpbuf *ofpacts_p, struct hmap *flow_table)
+                       struct ofpbuf *ofpacts_p, struct hmap *flow_table,
+                       struct local_datapath *ld,
+                       const struct hmap *local_datapaths)
 {
     struct match match;
 
@@ -221,11 +223,63 @@  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
         }
     }
 
+    struct ofpbuf *clone = NULL;
+    clone = ofpbuf_clone(ofpacts_p);
+
     /* Resubmit to table 34. */
     put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
     ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
                     &match, ofpacts_p);
 
+    /* For a reply packet from gateway with VLAN switch port as destination
+     * (excluding localnet_port and external VLAN networks), gateway router
+     * will use gateway MAC address as source MAC instead of router internal
+     * port MAC, so that external switches can learn gateway MAC address.
+     * Here (before packet is given to the port) we replace router gateway
+     * MAC address with router internal port MAC. */
+    if (ld->localnet_port && (port_key != ld->localnet_port->tunnel_key)) {
+        for (int i = 0; i < ld->n_peer_dps; i++) {
+            struct local_datapath *peer_ldp = get_local_datapath(
+                local_datapaths, ld->peer_dps[i]->peer_dp->tunnel_key);
+            const struct sbrec_port_binding *crp;
+            crp = peer_ldp->chassisredirect_port;
+            if (!crp) {
+                continue;
+            }
+
+            if (strcmp(smap_get(&crp->options, "distributed-port"),
+                       ld->peer_dps[i]->peer->logical_port) &&
+                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
+                for (int j = 0; j < crp->n_mac; j++) {
+                    struct lport_addresses laddrs;
+                    if (!extract_lsp_addresses(crp->mac[j], &laddrs)) {
+                        continue;
+                    }
+                    match_set_dl_src(&match, laddrs.ea);
+                    destroy_lport_addresses(&laddrs);
+                    break;
+                }
+                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; j++) {
+                    struct lport_addresses laddrs;
+                    uint64_t mac64;
+                    if (!extract_lsp_addresses(
+                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
+                        continue;
+                    }
+                    mac64 = eth_addr_to_uint64(laddrs.ea);
+                    put_load(mac64,
+                             MFF_ETH_SRC, 0, 48, clone);
+                    destroy_lport_addresses(&laddrs);
+                    break;
+                }
+                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
+                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 150, 0,
+                    &match, clone);
+            }
+        }
+    }
+    ofpbuf_delete(clone);
+
     /* Table 34, Priority 100.
      * =======================
      *
@@ -330,7 +384,7 @@  consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
 
         struct zone_ids binding_zones = get_zone_ids(binding, ct_zones);
         put_local_common_flows(dp_key, port_key, false, &binding_zones,
-                               ofpacts_p, flow_table);
+                               ofpacts_p, flow_table, ld, local_datapaths);
 
         match_init_catchall(&match);
         ofpbuf_clear(ofpacts_p);
@@ -531,7 +585,7 @@  consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
 
         struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
         put_local_common_flows(dp_key, port_key, nested_container, &zone_ids,
-                               ofpacts_p, flow_table);
+                               ofpacts_p, flow_table, ld, local_datapaths);
 
         /* Table 0, Priority 200, 150 and 100.
          * ==============================
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 8fa5272..876c121 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -2013,6 +2013,16 @@  next;
       </li>
 
       <li>
+        A priority-100 logical flow with match
+        <code>inport == <var>GW</var> &amp;&amp;
+        flags.rcv_from_vlan == 1</code> has actions
+        <code>eth.dst = <var>E</var>; next;</code>, where
+        <var>GW</var> is the logical router distributed gateway
+        port and <var>E</var> is the MAC address of router
+        distributed gateway port.
+      </li>
+
+      <li>
         For each NAT rule in the OVN Northbound database that can
         be handled in a distributed manner, a priority-100 logical
         flow with match <code>ip4.src == <var>B</var> &amp;&amp;
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index bcf0b66..d012bb8 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4419,6 +4419,15 @@  add_route(struct hmap *lflows, const struct ovn_port *op,
     } else {
         ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
     }
+
+    if (op->peer && op->peer->od->localnet_port &&
+        op->od->l3dgw_port && op->od->l3redirect_port &&
+        (op != op->od->l3redirect_port) &&
+        (op != op->od->l3dgw_port)) {
+        ds_put_format(&match, " && is_chassis_resident(%s)",
+                      op->od->l3redirect_port->json_key);
+        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
+    }
     ds_put_format(&actions, "; "
                   "%sreg1 = %s; "
                   "eth.src = %s; "
@@ -6131,6 +6140,26 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                       op->lrp_networks.ipv6_addrs[i].network_s,
                       op->lrp_networks.ipv6_addrs[i].plen, NULL, NULL);
         }
+
+        /* For a reply packet from gateway with VLAN switch port as
+         * destination, replace router internal port MAC with router gateway
+         * MAC address, so that external switches can learn gateway MAC
+         * address. Later before delivering the packet to the port,
+         * controller will replace the gateway MAC with router internal port
+         * MAC in table 33. */
+        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
+            op->od->l3redirect_port) {
+            ds_clear(&actions);
+            ds_clear(&match);
+            ds_put_format(&match, "inport == %s", op->json_key);
+            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
+            ds_put_format(&match, " && is_chassis_resident(%s)",
+                          op->od->l3redirect_port->json_key);
+            ds_put_format(&actions,
+                          "eth.src = %s; next;", op->lrp_networks.ea_s);
+            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, 100,
+                          ds_cstr(&match), ds_cstr(&actions));
+        }
     }
 
     /* Convert the static routes to flows. */
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index ad2101c..0de41d2 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -1067,7 +1067,9 @@ 
 
       <p>
         Flows in table 33 resemble those in table 32 but for logical ports that
-        reside locally rather than remotely.  For unicast logical output ports
+        reside locally rather than remotely.  If these are VLAN ports and
+        packet has router gateway port MAC address as source, replace it with
+        router internal port MAC address. For unicast logical output ports
         on the local hypervisor, the actions just resubmit to table 34.  For
         multicast output ports that include one or more logical ports on the
         local hypervisor, for each such logical port <var>P</var>, the actions