diff mbox series

[ovs-dev] Add support to make fdb table local to the chassis.

Message ID 20231129064550.96265-1-naveen.yerramneni@nutanix.com
State Changes Requested
Delegated to: Dumitru Ceara
Headers show
Series [ovs-dev] Add support to make fdb table local to the chassis. | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Naveen Yerramneni Nov. 29, 2023, 6:45 a.m. UTC
This functionality can be enabled at the logical switch level:
  - "other_config:fdb_local" can be used to enable/disable this
    functionality, it is disabled by default.
  - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
    for locally learned fdb flows, default timeout is 300 secs.

If enabled, below lflow is added for each port that has unknown addr set.
  - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
    action=(commit_fdb_local(timeout=<timeout>); next;

New OVN action: "commit_fdb_local". This sets following OVS action.
  - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
          NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])

This is useful when OVN is managing VLAN network that has multiple ports
set with unknown addr and localnet_learn_fdb is enabled. With this config,
if there is east-west traffic flowing between VMs part of same VLAN
deployed on different hypervisors then, MAC addrs of the source and
destination VMs keeps flapping between VM port and localnet port in 
Southbound FDB table. Enabling fdb_local config makes fdb table local to
the chassis and avoids MAC flapping.

Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
---
 include/ovn/actions.h |   7 +++
 lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
 northd/northd.c       |  26 ++++++++++
 ovn-nb.xml            |  14 ++++++
 tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
 utilities/ovn-trace.c |   2 +
 6 files changed, 251 insertions(+)

Comments

0-day Robot Nov. 29, 2023, 6:58 a.m. UTC | #1
References:  <20231129064550.96265-1-naveen.yerramneni@nutanix.com>
 

Bleep bloop.  Greetings naveen.yerramneni, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author naveen.yerramneni <naveen.yerramneni@nutanix.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
WARNING: Line is 82 characters long (recommended limit is 79)
#100 FILE: lib/actions.c:5268:
ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)

WARNING: Line is 80 characters long (recommended limit is 79)
#160 FILE: lib/actions.c:5328:
     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);

ERROR: Inappropriate bracing around statement
#206 FILE: northd/northd.c:7055:
        if (ls_is_fdb_local(op->od->nbs))

Lines checked: 386, Warnings: 3, Errors: 2


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Dumitru Ceara Nov. 29, 2023, 8:54 a.m. UTC | #2
On 11/29/23 07:45, naveen.yerramneni wrote:
> This functionality can be enabled at the logical switch level:
>   - "other_config:fdb_local" can be used to enable/disable this
>     functionality, it is disabled by default.
>   - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>     for locally learned fdb flows, default timeout is 300 secs.
> 
> If enabled, below lflow is added for each port that has unknown addr set.
>   - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>     action=(commit_fdb_local(timeout=<timeout>); next;
> 
> New OVN action: "commit_fdb_local". This sets following OVS action.
>   - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>           NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
> 
> This is useful when OVN is managing VLAN network that has multiple ports
> set with unknown addr and localnet_learn_fdb is enabled. With this config,
> if there is east-west traffic flowing between VMs part of same VLAN
> deployed on different hypervisors then, MAC addrs of the source and
> destination VMs keeps flapping between VM port and localnet port in 
> Southbound FDB table. Enabling fdb_local config makes fdb table local to
> the chassis and avoids MAC flapping.
> 
> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
> ---

Hi Naveen,

Thanks a lot for the patch!

Just a note, we already have a fix for the east-west traffic that causes
FDB flapping when localnet is used:

https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e

https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14

In general, however, I think it's a very good idea to move the FDB away
from the Southbound and make it local to each hypervisor.  That reduces
load on the Southbound among other things.

>  include/ovn/actions.h |   7 +++
>  lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>  northd/northd.c       |  26 ++++++++++
>  ovn-nb.xml            |  14 ++++++
>  tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>  utilities/ovn-trace.c |   2 +
>  6 files changed, 251 insertions(+)
> 
> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> index 49cfe0624..85ac92cd3 100644
> --- a/include/ovn/actions.h
> +++ b/include/ovn/actions.h
> @@ -127,6 +127,7 @@ struct collector_set_ids;
>      OVNACT(CHK_LB_AFF,        ovnact_result)          \
>      OVNACT(SAMPLE,            ovnact_sample)          \
>      OVNACT(MAC_CACHE_USE,     ovnact_null)            \
> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>  
>  /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>  enum OVS_PACKED_ENUM ovnact_type {
> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>      uint16_t timeout;
>  };
>  
> +/* OVNACT_COMMIT_FBD_LOCAL. */
> +struct ovnact_commit_fdb_local{
> +    struct ovnact ovnact;
> +    uint16_t timeout;  /* fdb_local flow timeout */
> +};
> +
>  /* Internal use by the helpers below. */
>  void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>  void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
> diff --git a/lib/actions.c b/lib/actions.c
> index a73fe1a1e..f5aa78db1 100644
> --- a/lib/actions.c
> +++ b/lib/actions.c
> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>      ds_put_cstr(s, "mac_cache_use;");
>  }
>  
> +static void
> +parse_commit_fdb_local(struct action_context *ctx,
> +                     struct ovnact_commit_fdb_local *fdb_local)
> +{
> +    uint16_t timeout = 0;
> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
> +        return;
> +    }
> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
> +        return;
> +    }
> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
> +        return;
> +    }
> +    fdb_local->timeout = timeout;
> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
> +}
> +
> +static void
> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
> +                      struct ds *s)
> +{
> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
> +}
> +
> +static void
> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
> +{
> +}
> +
> +static void
> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
> +                        struct ofpbuf *ofpacts, uint32_t cookie)
> +{
> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> +    struct ofpact_learn_spec *ol_spec;
> +    unsigned int imm_bytes;
> +    uint8_t *src_imm;
> +
> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> +    ol->idle_timeout = fdb_local->timeout;
> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> +    ol->priority = OFP_DEFAULT_PRIORITY;
> +    ol->table_id = OFTABLE_GET_FDB;
> +    ol->cookie = htonll(cookie);
> +
> +    /* Match on metadata of the packet that created the new table. */
> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
> +    ol_spec->dst.ofs = 0;
> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
> +
> +    /* Match on metadata of the packet. */
> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
> +    ol_spec->dst.ofs = 0;
> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
> +
> +
> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
> +    ol_spec->dst.ofs = 0;
> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
> +
> +    ofpact_finish_LEARN(ofpacts, &ol);
> +}

A difference from today's SB.FDB centralized approach is that when
ovn-controller restarts these flows will be cleared, I think.

Are we OK with that?  I think so but if not what are the options to
avoid clearing the local fdb cache on restart?

Another difference with today's approach is that this avoids a
controller action, that's great!

> +
> +static void
> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
> +                      const struct ovnact_encode_params *ep,
> +                      struct ofpbuf *ofpacts)
> +{
> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
> +}
> +
>  static void
>  encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>                       const struct ovnact_encode_params *ep,
> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>          parse_sample(ctx);
>      } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>          ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>      } else {
>          lexer_syntax_error(ctx->lexer, "expecting action");
>      }
> diff --git a/northd/northd.c b/northd/northd.c
> index d1465ddf7..de18694a0 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>      return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>  }
>  
> +static bool
> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
> +{
> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
> +}
> +

Personally, I'd prefer if we don't add another config knob and we just
make this the only way FDB works.  We could also document that the FDB
SB table should be deprecated.

>  static bool
>  lsp_is_type_changed(const struct sbrec_port_binding *sb,
>                  const struct nbrec_logical_switch_port *nbsp,
> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows,
>      }
>  }
>  
> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
> +

This, on the other hand, might be a good candidate for a config option.

>  static void
>  build_lswitch_learn_fdb_op(
>          struct ovn_port *op, struct hmap *lflows,
> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op(
>  
>      if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") ||
>          (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) {
> +
> +        if (ls_is_fdb_local(op->od->nbs))
> +        {
> +            uint32_t idle_timeout =smap_get_uint(
> +                        &op->od->nbs->other_config, "fdb_local_idle_timeout",
> +                        FDB_LOCAL_DEF_IDLE_TIMEOUT_S);
> +            ds_clear(match);
> +            ds_clear(actions);
> +            ds_put_format(match, "inport == %s", op->json_key);
> +            ds_put_format(actions, "commit_fdb_local(timeout=%u); next;",
> +                                    idle_timeout);
> +            ovn_lflow_add_with_lport_and_hint(lflows, op->od,
> +                                              S_SWITCH_IN_LOOKUP_FDB, 100,
> +                                              ds_cstr(match), ds_cstr(actions),
> +                                              op->key, &op->nbsp->header_);
> +            return;
> +        }
> +
>          ds_clear(match);
>          ds_clear(actions);
>          ds_put_format(match, "inport == %s", op->json_key);
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index fcb1c6ecc..3547ec4a6 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -803,6 +803,20 @@
>        </column>
>      </group>
>  
> +    <group title="Local FDB options">
> +      <column name="other_config" key="fdb_local"
> +              type='{"type": "boolean"}'>
> +        If set to <code>true</code>, FDB flows are commited only to the
> +        local chassis instead of southbound DB. Default is false.
> +      </column>
> +      <column name="other_config" key="fdb_local_idle_timeout"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
> +        Local FDB flows <code>idle_timeout</code> value in seconds. FDB local
> +        flows exceeding this timeout will be automatically removed. The value
> +        defaults to 300, 0 means disabled.
> +      </column>
> +    </group>
> +
>      <column name="copp">
>        <p>
>          The control plane protection policy from table <ref table="Copp"/>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 92cf27581..dff50364a 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
>  
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([Local FDB MAC learning])
> +ovn_start
> +net_add n1
> +
> +AT_CHECK([ovn-nbctl ls-add ls0])
> +
> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0])
> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"])
> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2])
> +
> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1])
> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"])
> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3])
> +
> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port])
> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true])
> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1])
> +
> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true])
> +
> +sim_add hv1
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +ovs-vsctl -- add-port br-int vif0 -- \
> +    set interface vif0 external-ids:iface-id=vif0 \
> +    options:tx_pcap=hv1/vif0-tx.pcap \
> +    options:rxq_pcap=hv1/vif0-rx.pcap \
> +    ofport-request=1
> +ovs-vsctl -- add-port br-int vif1 -- \
> +    set interface vif1 external-ids:iface-id=vif1 \
> +    options:tx_pcap=hv1/vif1-tx.pcap \
> +    options:rxq_pcap=hv1/vif1-rx.pcap \
> +    ofport-request=2
> +ovs-vsctl -- add-port br-phys ext0 -- \
> +    set interface ext0 \
> +    options:tx_pcap=hv1/ext0-tx.pcap \
> +    options:rxq_pcap=hv1/ext0-rx.pcap \
> +    ofport-request=3
> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys
> +
> +wait_for_ports_up
> +AT_CHECK([ovn-nbctl --wait=hv sync])
> +
> +send_packet() {
> +    src_mac=$1
> +    src_ip=$2
> +    dst_mac=$3
> +    dst_ip=$4
> +    iface=$5
> +    packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
> +    ovs-appctl netdev-dummy/receive $iface $packet
> +}
> +
> +
> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet
> +ovn-sbctl dump-flows ls0 > sw0flows
> +AT_CAPTURE_FILE([sw0flows])
> +
> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl
> +  table=2 (ls_in_lookup_fdb   ), priority=0    , dnl
> +match=(1), action=(next;)
> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;)
> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;)
> +])
> +
> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl
> +  table=3 (ls_in_put_fdb      ), priority=0    , dnl
> +match=(1), action=(next;)
> +])
> +
> +
> +src_mac="505400000003"
> +src_ip=`ip_to_hex 10.0.0.3`
> +dst_mac="505400000004"
> +dst_ip=`ip_to_hex 10.0.0.4`
> +
> +# send packet from vif0(which has unknown addr set) to vif1
> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0
> +
> +# send packet from vif1 to vif0(which has unknown addr set)
> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1
> +
> +# send packet from underlay to vif1
> +src_mac="505400000064"
> +src_ip=`ip_to_hex 10.0.0.100`
> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0
> +AT_CHECK([ovn-nbctl --wait=hv sync])
> +
> +# Make sure that OVS table 71 is populated on hv1.
> +AS_BOX([Check that ovn-controller programs the flows for FDB])
> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt
> +AT_CAPTURE_FILE([hv1_offlows_table71.txt])
> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl
> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]]
> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]]
> +])
> +
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
> +])
> +
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([MAC binding aging])
>  AT_SKIP_IF([test $HAVE_SCAPY = no])
> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
> index 0b86eae7b..354f84a4b 100644
> --- a/utilities/ovn-trace.c
> +++ b/utilities/ovn-trace.c
> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
>              break;
>          case OVNACT_MAC_CACHE_USE:
>              break;
> +        case OVNACT_COMMIT_FDB_LOCAL:
> +            break;
>          }
>      }
>      ofpbuf_uninit(&stack);

Regards,
Dumitru
Naveen Yerramneni Nov. 30, 2023, 8:45 a.m. UTC | #3
> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 11/29/23 07:45, naveen.yerramneni wrote:
>> This functionality can be enabled at the logical switch level:
>>  - "other_config:fdb_local" can be used to enable/disable this
>>    functionality, it is disabled by default.
>>  - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>    for locally learned fdb flows, default timeout is 300 secs.
>> 
>> If enabled, below lflow is added for each port that has unknown addr set.
>>  - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>    action=(commit_fdb_local(timeout=<timeout>); next;
>> 
>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>  - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>          NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>> 
>> This is useful when OVN is managing VLAN network that has multiple ports
>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>> if there is east-west traffic flowing between VMs part of same VLAN
>> deployed on different hypervisors then, MAC addrs of the source and
>> destination VMs keeps flapping between VM port and localnet port in 
>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>> the chassis and avoids MAC flapping.
>> 
>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>> ---
> 
> Hi Naveen,
> 
> Thanks a lot for the patch!
> 
> Just a note, we already have a fix for the east-west traffic that causes
> FDB flapping when localnet is used:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
> 
> In general, however, I think it's a very good idea to move the FDB away
> from the Southbound and make it local to each hypervisor.  That reduces
> load on the Southbound among other things.
> 

Hi Dumitru,

Thanks for informing about the patches.
Yes, local FDB reduces load on southbound.


>> include/ovn/actions.h |   7 +++
>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>> northd/northd.c       |  26 ++++++++++
>> ovn-nb.xml            |  14 ++++++
>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>> utilities/ovn-trace.c |   2 +
>> 6 files changed, 251 insertions(+)
>> 
>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>> index 49cfe0624..85ac92cd3 100644
>> --- a/include/ovn/actions.h
>> +++ b/include/ovn/actions.h
>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>     OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>     OVNACT(SAMPLE,            ovnact_sample)          \
>>     OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>> 
>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>> enum OVS_PACKED_ENUM ovnact_type {
>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>     uint16_t timeout;
>> };
>> 
>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>> +struct ovnact_commit_fdb_local{
>> +    struct ovnact ovnact;
>> +    uint16_t timeout;  /* fdb_local flow timeout */
>> +};
>> +
>> /* Internal use by the helpers below. */
>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>> diff --git a/lib/actions.c b/lib/actions.c
>> index a73fe1a1e..f5aa78db1 100644
>> --- a/lib/actions.c
>> +++ b/lib/actions.c
>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>     ds_put_cstr(s, "mac_cache_use;");
>> }
>> 
>> +static void
>> +parse_commit_fdb_local(struct action_context *ctx,
>> +                     struct ovnact_commit_fdb_local *fdb_local)
>> +{
>> +    uint16_t timeout = 0;
>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>> +        return;
>> +    }
>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>> +        return;
>> +    }
>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>> +        return;
>> +    }
>> +    fdb_local->timeout = timeout;
>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>> +}
>> +
>> +static void
>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>> +                      struct ds *s)
>> +{
>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>> +}
>> +
>> +static void
>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>> +{
>> +}
>> +
>> +static void
>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>> +{
>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>> +    struct ofpact_learn_spec *ol_spec;
>> +    unsigned int imm_bytes;
>> +    uint8_t *src_imm;
>> +
>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>> +    ol->idle_timeout = fdb_local->timeout;
>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>> +    ol->table_id = OFTABLE_GET_FDB;
>> +    ol->cookie = htonll(cookie);
>> +
>> +    /* Match on metadata of the packet that created the new table. */
>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>> +    ol_spec->dst.ofs = 0;
>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>> +
>> +    /* Match on metadata of the packet. */
>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>> +    ol_spec->dst.ofs = 0;
>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>> +
>> +
>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>> +    ol_spec->dst.ofs = 0;
>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>> +
>> +    ofpact_finish_LEARN(ofpacts, &ol);
>> +}
> 
> A difference from today's SB.FDB centralized approach is that when
> ovn-controller restarts these flows will be cleared, I think.
> 
> Are we OK with that?  I think so but if not what are the options to
> avoid clearing the local fdb cache on restart?
> 

OVS has to relearn the FDB  flows whenever tables are cleared.
During this time, packets gets flooded. I need to think about possible 
options if we want to retain FDB table.

Can we take this up as an enhancement in a separate patch
once we identify a solution for this ?

> Another difference with today's approach is that this avoids a
> controller action, that's great!
> 
>> +
>> +static void
>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>> +                      const struct ovnact_encode_params *ep,
>> +                      struct ofpbuf *ofpacts)
>> +{
>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>> +}
>> +
>> static void
>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>                      const struct ovnact_encode_params *ep,
>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>         parse_sample(ctx);
>>     } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>         ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>     } else {
>>         lexer_syntax_error(ctx->lexer, "expecting action");
>>     }
>> diff --git a/northd/northd.c b/northd/northd.c
>> index d1465ddf7..de18694a0 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>     return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>> }
>> 
>> +static bool
>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>> +{
>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>> +}
>> +
> 
> Personally, I'd prefer if we don't add another config knob and we just
> make this the only way FDB works.  We could also document that the FDB
> SB table should be deprecated.
> 

If we want to make local FDB as default then, I think we need to handle overlay 
use case as well. Probably, we might have to add a new stage in logical switch
egress pipeline to learn FDB entries for packets coming over tunnel (or)
something similar.

Can we take this up in a separate patch ?
  

>> static bool
>> lsp_is_type_changed(const struct sbrec_port_binding *sb,
>>                 const struct nbrec_logical_switch_port *nbsp,
>> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows,
>>     }
>> }
>> 
>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
>> +
> 
> This, on the other hand, might be a good candidate for a config option.

“fdb_local_idle_timeout” option is added in this patch to configure the timeout.
Default value is 300 secs.

> 
>> static void
>> build_lswitch_learn_fdb_op(
>>         struct ovn_port *op, struct hmap *lflows,
>> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op(
>> 
>>     if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") ||
>>         (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) {
>> +
>> +        if (ls_is_fdb_local(op->od->nbs))
>> +        {
>> +            uint32_t idle_timeout =smap_get_uint(
>> +                        &op->od->nbs->other_config, "fdb_local_idle_timeout",
>> +                        FDB_LOCAL_DEF_IDLE_TIMEOUT_S);
>> +            ds_clear(match);
>> +            ds_clear(actions);
>> +            ds_put_format(match, "inport == %s", op->json_key);
>> +            ds_put_format(actions, "commit_fdb_local(timeout=%u); next;",
>> +                                    idle_timeout);
>> +            ovn_lflow_add_with_lport_and_hint(lflows, op->od,
>> +                                              S_SWITCH_IN_LOOKUP_FDB, 100,
>> +                                              ds_cstr(match), ds_cstr(actions),
>> +                                              op->key, &op->nbsp->header_);
>> +            return;
>> +        }
>> +
>>         ds_clear(match);
>>         ds_clear(actions);
>>         ds_put_format(match, "inport == %s", op->json_key);
>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>> index fcb1c6ecc..3547ec4a6 100644
>> --- a/ovn-nb.xml
>> +++ b/ovn-nb.xml
>> @@ -803,6 +803,20 @@
>>       </column>
>>     </group>
>> 
>> +    <group title="Local FDB options">
>> +      <column name="other_config" key="fdb_local"
>> +              type='{"type": "boolean"}'>
>> +        If set to <code>true</code>, FDB flows are commited only to the
>> +        local chassis instead of southbound DB. Default is false.
>> +      </column>
>> +      <column name="other_config" key="fdb_local_idle_timeout"
>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
>> +        Local FDB flows <code>idle_timeout</code> value in seconds. FDB local
>> +        flows exceeding this timeout will be automatically removed. The value
>> +        defaults to 300, 0 means disabled.
>> +      </column>
>> +    </group>
>> +
>>     <column name="copp">
>>       <p>
>>         The control plane protection policy from table <ref table="Copp"/>
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index 92cf27581..dff50364a 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1])
>> AT_CLEANUP
>> ])
>> 
>> +OVN_FOR_EACH_NORTHD([
>> +AT_SETUP([Local FDB MAC learning])
>> +ovn_start
>> +net_add n1
>> +
>> +AT_CHECK([ovn-nbctl ls-add ls0])
>> +
>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0])
>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"])
>> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2])
>> +
>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1])
>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"])
>> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3])
>> +
>> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port])
>> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
>> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
>> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true])
>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1])
>> +
>> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true])
>> +
>> +sim_add hv1
>> +as hv1
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +ovs-vsctl -- add-port br-int vif0 -- \
>> +    set interface vif0 external-ids:iface-id=vif0 \
>> +    options:tx_pcap=hv1/vif0-tx.pcap \
>> +    options:rxq_pcap=hv1/vif0-rx.pcap \
>> +    ofport-request=1
>> +ovs-vsctl -- add-port br-int vif1 -- \
>> +    set interface vif1 external-ids:iface-id=vif1 \
>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>> +    options:rxq_pcap=hv1/vif1-rx.pcap \
>> +    ofport-request=2
>> +ovs-vsctl -- add-port br-phys ext0 -- \
>> +    set interface ext0 \
>> +    options:tx_pcap=hv1/ext0-tx.pcap \
>> +    options:rxq_pcap=hv1/ext0-rx.pcap \
>> +    ofport-request=3
>> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys
>> +
>> +wait_for_ports_up
>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>> +
>> +send_packet() {
>> +    src_mac=$1
>> +    src_ip=$2
>> +    dst_mac=$3
>> +    dst_ip=$4
>> +    iface=$5
>> +    packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>> +    ovs-appctl netdev-dummy/receive $iface $packet
>> +}
>> +
>> +
>> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet
>> +ovn-sbctl dump-flows ls0 > sw0flows
>> +AT_CAPTURE_FILE([sw0flows])
>> +
>> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl
>> +  table=2 (ls_in_lookup_fdb   ), priority=0    , dnl
>> +match=(1), action=(next;)
>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;)
>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;)
>> +])
>> +
>> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl
>> +  table=3 (ls_in_put_fdb      ), priority=0    , dnl
>> +match=(1), action=(next;)
>> +])
>> +
>> +
>> +src_mac="505400000003"
>> +src_ip=`ip_to_hex 10.0.0.3`
>> +dst_mac="505400000004"
>> +dst_ip=`ip_to_hex 10.0.0.4`
>> +
>> +# send packet from vif0(which has unknown addr set) to vif1
>> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0
>> +
>> +# send packet from vif1 to vif0(which has unknown addr set)
>> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1
>> +
>> +# send packet from underlay to vif1
>> +src_mac="505400000064"
>> +src_ip=`ip_to_hex 10.0.0.100`
>> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0
>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>> +
>> +# Make sure that OVS table 71 is populated on hv1.
>> +AS_BOX([Check that ovn-controller programs the flows for FDB])
>> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt
>> +AT_CAPTURE_FILE([hv1_offlows_table71.txt])
>> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl
>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]]
>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]]
>> +])
>> +
>> +OVN_CLEANUP([hv1])
>> +AT_CLEANUP
>> +])
>> +
>> OVN_FOR_EACH_NORTHD([
>> AT_SETUP([MAC binding aging])
>> AT_SKIP_IF([test $HAVE_SCAPY = no])
>> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
>> index 0b86eae7b..354f84a4b 100644
>> --- a/utilities/ovn-trace.c
>> +++ b/utilities/ovn-trace.c
>> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
>>             break;
>>         case OVNACT_MAC_CACHE_USE:
>>             break;
>> +        case OVNACT_COMMIT_FDB_LOCAL:
>> +            break;
>>         }
>>     }
>>     ofpbuf_uninit(&stack);
> 
> Regards,
> Dumitru
Dumitru Ceara Nov. 30, 2023, 12:36 p.m. UTC | #4
On 11/30/23 09:45, Naveen Yerramneni wrote:
> 
> 
>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>> This functionality can be enabled at the logical switch level:
>>>  - "other_config:fdb_local" can be used to enable/disable this
>>>    functionality, it is disabled by default.
>>>  - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>    for locally learned fdb flows, default timeout is 300 secs.
>>>
>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>  - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>    action=(commit_fdb_local(timeout=<timeout>); next;
>>>
>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>  - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>          NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>
>>> This is useful when OVN is managing VLAN network that has multiple ports
>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>> if there is east-west traffic flowing between VMs part of same VLAN
>>> deployed on different hypervisors then, MAC addrs of the source and
>>> destination VMs keeps flapping between VM port and localnet port in 
>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>> the chassis and avoids MAC flapping.
>>>
>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>> ---
>>
>> Hi Naveen,
>>
>> Thanks a lot for the patch!
>>
>> Just a note, we already have a fix for the east-west traffic that causes
>> FDB flapping when localnet is used:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>
>> In general, however, I think it's a very good idea to move the FDB away
>> from the Southbound and make it local to each hypervisor.  That reduces
>> load on the Southbound among other things.
>>
> 
> Hi Dumitru,
> 
> Thanks for informing about the patches.
> Yes, local FDB reduces load on southbound.
> 
> 
>>> include/ovn/actions.h |   7 +++
>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>> northd/northd.c       |  26 ++++++++++
>>> ovn-nb.xml            |  14 ++++++
>>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>> utilities/ovn-trace.c |   2 +
>>> 6 files changed, 251 insertions(+)
>>>
>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>> index 49cfe0624..85ac92cd3 100644
>>> --- a/include/ovn/actions.h
>>> +++ b/include/ovn/actions.h
>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>     OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>     OVNACT(SAMPLE,            ovnact_sample)          \
>>>     OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>
>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>> enum OVS_PACKED_ENUM ovnact_type {
>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>     uint16_t timeout;
>>> };
>>>
>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>> +struct ovnact_commit_fdb_local{
>>> +    struct ovnact ovnact;
>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>> +};
>>> +
>>> /* Internal use by the helpers below. */
>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>> diff --git a/lib/actions.c b/lib/actions.c
>>> index a73fe1a1e..f5aa78db1 100644
>>> --- a/lib/actions.c
>>> +++ b/lib/actions.c
>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>     ds_put_cstr(s, "mac_cache_use;");
>>> }
>>>
>>> +static void
>>> +parse_commit_fdb_local(struct action_context *ctx,
>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>> +{
>>> +    uint16_t timeout = 0;
>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>> +        return;
>>> +    }
>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>> +        return;
>>> +    }
>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>> +        return;
>>> +    }
>>> +    fdb_local->timeout = timeout;
>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>> +}
>>> +
>>> +static void
>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>> +                      struct ds *s)
>>> +{
>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>> +}
>>> +
>>> +static void
>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>> +{
>>> +}
>>> +
>>> +static void
>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>> +{
>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>> +    struct ofpact_learn_spec *ol_spec;
>>> +    unsigned int imm_bytes;
>>> +    uint8_t *src_imm;
>>> +
>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>> +    ol->idle_timeout = fdb_local->timeout;
>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>> +    ol->table_id = OFTABLE_GET_FDB;
>>> +    ol->cookie = htonll(cookie);
>>> +
>>> +    /* Match on metadata of the packet that created the new table. */
>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>> +    ol_spec->dst.ofs = 0;
>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>> +
>>> +    /* Match on metadata of the packet. */
>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>> +    ol_spec->dst.ofs = 0;
>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>> +
>>> +
>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>> +    ol_spec->dst.ofs = 0;
>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>> +
>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>> +}
>>
>> A difference from today's SB.FDB centralized approach is that when
>> ovn-controller restarts these flows will be cleared, I think.
>>
>> Are we OK with that?  I think so but if not what are the options to
>> avoid clearing the local fdb cache on restart?
>>
> 
> OVS has to relearn the FDB  flows whenever tables are cleared.
> During this time, packets gets flooded. I need to think about possible 
> options if we want to retain FDB table.
> 
> Can we take this up as an enhancement in a separate patch
> once we identify a solution for this ?
> 

Sounds good to me.

>> Another difference with today's approach is that this avoids a
>> controller action, that's great!
>>
>>> +
>>> +static void
>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>> +                      const struct ovnact_encode_params *ep,
>>> +                      struct ofpbuf *ofpacts)
>>> +{
>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>> +}
>>> +
>>> static void
>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>                      const struct ovnact_encode_params *ep,
>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>         parse_sample(ctx);
>>>     } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>         ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>     } else {
>>>         lexer_syntax_error(ctx->lexer, "expecting action");
>>>     }
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index d1465ddf7..de18694a0 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>     return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>> }
>>>
>>> +static bool
>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>> +{
>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>> +}
>>> +
>>
>> Personally, I'd prefer if we don't add another config knob and we just
>> make this the only way FDB works.  We could also document that the FDB
>> SB table should be deprecated.
>>
> 
> If we want to make local FDB as default then, I think we need to handle overlay 
> use case as well. Probably, we might have to add a new stage in logical switch
> egress pipeline to learn FDB entries for packets coming over tunnel (or)
> something similar.
> 
> Can we take this up in a separate patch ?
>   
> 

Given that we have a fix already for the original problem you were
trying to address I would prefer that we avoid adding new config knobs
and handle both the localnet and overlay cases at the same time.

OTOH, why is there a difference?

The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
learn() action) into MFF_LOG_INPORT:

With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
VXLAN that's not the case but that's already a documented limitation,
LOG_INPORT is not available after VXLAN tunneling therefore features
that need it (like egress ACLs matching against ingress port
identifiers) are not supported:

https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842

One more thing we need to take care of in order to be able to make local
FDB the default is "FDB refresh":

551527a5e68e ("controller: Update FDB timestamp")

https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc

>>> static bool
>>> lsp_is_type_changed(const struct sbrec_port_binding *sb,
>>>                 const struct nbrec_logical_switch_port *nbsp,
>>> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows,
>>>     }
>>> }
>>>
>>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
>>> +
>>
>> This, on the other hand, might be a good candidate for a config option.
> 
> “fdb_local_idle_timeout” option is added in this patch to configure the timeout.
> Default value is 300 secs.
> 

True, I missed that you do that below.  However, we already have
"LS.other_config:fdb_age_threshold" in the NB database to control
exactly the same thing.  I'd just use that one.

>>
>>> static void
>>> build_lswitch_learn_fdb_op(
>>>         struct ovn_port *op, struct hmap *lflows,
>>> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op(
>>>
>>>     if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") ||
>>>         (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) {
>>> +
>>> +        if (ls_is_fdb_local(op->od->nbs))
>>> +        {
>>> +            uint32_t idle_timeout =smap_get_uint(
>>> +                        &op->od->nbs->other_config, "fdb_local_idle_timeout",
>>> +                        FDB_LOCAL_DEF_IDLE_TIMEOUT_S);
>>> +            ds_clear(match);
>>> +            ds_clear(actions);
>>> +            ds_put_format(match, "inport == %s", op->json_key);
>>> +            ds_put_format(actions, "commit_fdb_local(timeout=%u); next;",
>>> +                                    idle_timeout);
>>> +            ovn_lflow_add_with_lport_and_hint(lflows, op->od,
>>> +                                              S_SWITCH_IN_LOOKUP_FDB, 100,
>>> +                                              ds_cstr(match), ds_cstr(actions),
>>> +                                              op->key, &op->nbsp->header_);
>>> +            return;
>>> +        }
>>> +
>>>         ds_clear(match);
>>>         ds_clear(actions);
>>>         ds_put_format(match, "inport == %s", op->json_key);
>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>> index fcb1c6ecc..3547ec4a6 100644
>>> --- a/ovn-nb.xml
>>> +++ b/ovn-nb.xml
>>> @@ -803,6 +803,20 @@
>>>       </column>
>>>     </group>
>>>
>>> +    <group title="Local FDB options">
>>> +      <column name="other_config" key="fdb_local"
>>> +              type='{"type": "boolean"}'>
>>> +        If set to <code>true</code>, FDB flows are commited only to the
>>> +        local chassis instead of southbound DB. Default is false.
>>> +      </column>
>>> +      <column name="other_config" key="fdb_local_idle_timeout"
>>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
>>> +        Local FDB flows <code>idle_timeout</code> value in seconds. FDB local
>>> +        flows exceeding this timeout will be automatically removed. The value
>>> +        defaults to 300, 0 means disabled.
>>> +      </column>
>>> +    </group>
>>> +
>>>     <column name="copp">
>>>       <p>
>>>         The control plane protection policy from table <ref table="Copp"/>
>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>> index 92cf27581..dff50364a 100644
>>> --- a/tests/ovn.at
>>> +++ b/tests/ovn.at
>>> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1])
>>> AT_CLEANUP
>>> ])
>>>
>>> +OVN_FOR_EACH_NORTHD([
>>> +AT_SETUP([Local FDB MAC learning])
>>> +ovn_start
>>> +net_add n1
>>> +
>>> +AT_CHECK([ovn-nbctl ls-add ls0])
>>> +
>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0])
>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"])
>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2])
>>> +
>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1])
>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"])
>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3])
>>> +
>>> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port])
>>> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
>>> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
>>> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true])
>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1])
>>> +
>>> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true])
>>> +
>>> +sim_add hv1
>>> +as hv1
>>> +ovs-vsctl add-br br-phys
>>> +ovn_attach n1 br-phys 192.168.0.1
>>> +ovs-vsctl -- add-port br-int vif0 -- \
>>> +    set interface vif0 external-ids:iface-id=vif0 \
>>> +    options:tx_pcap=hv1/vif0-tx.pcap \
>>> +    options:rxq_pcap=hv1/vif0-rx.pcap \
>>> +    ofport-request=1
>>> +ovs-vsctl -- add-port br-int vif1 -- \
>>> +    set interface vif1 external-ids:iface-id=vif1 \
>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>>> +    options:rxq_pcap=hv1/vif1-rx.pcap \
>>> +    ofport-request=2
>>> +ovs-vsctl -- add-port br-phys ext0 -- \
>>> +    set interface ext0 \
>>> +    options:tx_pcap=hv1/ext0-tx.pcap \
>>> +    options:rxq_pcap=hv1/ext0-rx.pcap \
>>> +    ofport-request=3
>>> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys
>>> +
>>> +wait_for_ports_up
>>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>>> +
>>> +send_packet() {
>>> +    src_mac=$1
>>> +    src_ip=$2
>>> +    dst_mac=$3
>>> +    dst_ip=$4
>>> +    iface=$5
>>> +    packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>>> +    ovs-appctl netdev-dummy/receive $iface $packet
>>> +}
>>> +
>>> +
>>> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet
>>> +ovn-sbctl dump-flows ls0 > sw0flows
>>> +AT_CAPTURE_FILE([sw0flows])
>>> +
>>> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl
>>> +  table=2 (ls_in_lookup_fdb   ), priority=0    , dnl
>>> +match=(1), action=(next;)
>>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>>> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;)
>>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>>> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;)
>>> +])
>>> +
>>> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl
>>> +  table=3 (ls_in_put_fdb      ), priority=0    , dnl
>>> +match=(1), action=(next;)
>>> +])
>>> +
>>> +
>>> +src_mac="505400000003"
>>> +src_ip=`ip_to_hex 10.0.0.3`
>>> +dst_mac="505400000004"
>>> +dst_ip=`ip_to_hex 10.0.0.4`
>>> +
>>> +# send packet from vif0(which has unknown addr set) to vif1
>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0
>>> +
>>> +# send packet from vif1 to vif0(which has unknown addr set)
>>> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1
>>> +
>>> +# send packet from underlay to vif1
>>> +src_mac="505400000064"
>>> +src_ip=`ip_to_hex 10.0.0.100`
>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0
>>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>>> +
>>> +# Make sure that OVS table 71 is populated on hv1.
>>> +AS_BOX([Check that ovn-controller programs the flows for FDB])
>>> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt
>>> +AT_CAPTURE_FILE([hv1_offlows_table71.txt])
>>> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl
>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]]
>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]]
>>> +])
>>> +
>>> +OVN_CLEANUP([hv1])
>>> +AT_CLEANUP
>>> +])
>>> +
>>> OVN_FOR_EACH_NORTHD([
>>> AT_SETUP([MAC binding aging])
>>> AT_SKIP_IF([test $HAVE_SCAPY = no])
>>> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
>>> index 0b86eae7b..354f84a4b 100644
>>> --- a/utilities/ovn-trace.c
>>> +++ b/utilities/ovn-trace.c
>>> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
>>>             break;
>>>         case OVNACT_MAC_CACHE_USE:
>>>             break;
>>> +        case OVNACT_COMMIT_FDB_LOCAL:
>>> +            break;
>>>         }
>>>     }
>>>     ofpbuf_uninit(&stack);
>>

Regards,
Dumitru
Naveen Yerramneni Nov. 30, 2023, 2:54 p.m. UTC | #5
> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 11/30/23 09:45, Naveen Yerramneni wrote:
>> 
>> 
>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>> This functionality can be enabled at the logical switch level:
>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>   functionality, it is disabled by default.
>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>   for locally learned fdb flows, default timeout is 300 secs.
>>>> 
>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>   action=(commit_fdb_local(timeout=<timeout>); next;
>>>> 
>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>         NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>> 
>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>> the chassis and avoids MAC flapping.
>>>> 
>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>> ---
>>> 
>>> Hi Naveen,
>>> 
>>> Thanks a lot for the patch!
>>> 
>>> Just a note, we already have a fix for the east-west traffic that causes
>>> FDB flapping when localnet is used:
>>> 
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>> 
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>> 
>>> In general, however, I think it's a very good idea to move the FDB away
>>> from the Southbound and make it local to each hypervisor.  That reduces
>>> load on the Southbound among other things.
>>> 
>> 
>> Hi Dumitru,
>> 
>> Thanks for informing about the patches.
>> Yes, local FDB reduces load on southbound.
>> 
>> 
>>>> include/ovn/actions.h |   7 +++
>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>> northd/northd.c       |  26 ++++++++++
>>>> ovn-nb.xml            |  14 ++++++
>>>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>> utilities/ovn-trace.c |   2 +
>>>> 6 files changed, 251 insertions(+)
>>>> 
>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>> index 49cfe0624..85ac92cd3 100644
>>>> --- a/include/ovn/actions.h
>>>> +++ b/include/ovn/actions.h
>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>    OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>    OVNACT(SAMPLE,            ovnact_sample)          \
>>>>    OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>> 
>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>    uint16_t timeout;
>>>> };
>>>> 
>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>> +struct ovnact_commit_fdb_local{
>>>> +    struct ovnact ovnact;
>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>> +};
>>>> +
>>>> /* Internal use by the helpers below. */
>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>> index a73fe1a1e..f5aa78db1 100644
>>>> --- a/lib/actions.c
>>>> +++ b/lib/actions.c
>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>    ds_put_cstr(s, "mac_cache_use;");
>>>> }
>>>> 
>>>> +static void
>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>> +{
>>>> +    uint16_t timeout = 0;
>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>> +        return;
>>>> +    }
>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>> +        return;
>>>> +    }
>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>> +        return;
>>>> +    }
>>>> +    fdb_local->timeout = timeout;
>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>> +}
>>>> +
>>>> +static void
>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>> +                      struct ds *s)
>>>> +{
>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>> +}
>>>> +
>>>> +static void
>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>> +{
>>>> +}
>>>> +
>>>> +static void
>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>> +{
>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>> +    struct ofpact_learn_spec *ol_spec;
>>>> +    unsigned int imm_bytes;
>>>> +    uint8_t *src_imm;
>>>> +
>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>> +    ol->cookie = htonll(cookie);
>>>> +
>>>> +    /* Match on metadata of the packet that created the new table. */
>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>> +    ol_spec->dst.ofs = 0;
>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>> +
>>>> +    /* Match on metadata of the packet. */
>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>> +    ol_spec->dst.ofs = 0;
>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>> +
>>>> +
>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>> +    ol_spec->dst.ofs = 0;
>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>> +
>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>> +}
>>> 
>>> A difference from today's SB.FDB centralized approach is that when
>>> ovn-controller restarts these flows will be cleared, I think.
>>> 
>>> Are we OK with that?  I think so but if not what are the options to
>>> avoid clearing the local fdb cache on restart?
>>> 
>> 
>> OVS has to relearn the FDB  flows whenever tables are cleared.
>> During this time, packets gets flooded. I need to think about possible 
>> options if we want to retain FDB table.
>> 
>> Can we take this up as an enhancement in a separate patch
>> once we identify a solution for this ?
>> 
> 
> Sounds good to me.
> 
>>> Another difference with today's approach is that this avoids a
>>> controller action, that's great!
>>> 
>>>> +
>>>> +static void
>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>> +                      const struct ovnact_encode_params *ep,
>>>> +                      struct ofpbuf *ofpacts)
>>>> +{
>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>> +}
>>>> +
>>>> static void
>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>                     const struct ovnact_encode_params *ep,
>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>        parse_sample(ctx);
>>>>    } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>        ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>    } else {
>>>>        lexer_syntax_error(ctx->lexer, "expecting action");
>>>>    }
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index d1465ddf7..de18694a0 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>    return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>> }
>>>> 
>>>> +static bool
>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>> +{
>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>> +}
>>>> +
>>> 
>>> Personally, I'd prefer if we don't add another config knob and we just
>>> make this the only way FDB works.  We could also document that the FDB
>>> SB table should be deprecated.
>>> 
>> 
>> If we want to make local FDB as default then, I think we need to handle overlay 
>> use case as well. Probably, we might have to add a new stage in logical switch
>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>> something similar.
>> 
>> Can we take this up in a separate patch ?
>> 
>> 
> 
> Given that we have a fix already for the original problem you were
> trying to address I would prefer that we avoid adding new config knobs
> and handle both the localnet and overlay cases at the same time.
> 

Sure.

> OTOH, why is there a difference?
> 
> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
> learn() action) into MFF_LOG_INPORT:
> 
> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
> VXLAN that's not the case but that's already a documented limitation,
> LOG_INPORT is not available after VXLAN tunneling therefore features
> that need it (like egress ACLs matching against ingress port
> identifiers) are not supported:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
> 

In case of overlay, ingress and egress pipelines are processed on different nodes
assuming source and dest VMs are on different nodes. I think this makes local FDB
to learn only local VMs MACs and it never learn remote VMs MACs.
 
Example: 
2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.


> One more thing we need to take care of in order to be able to make local
> FDB the default is "FDB refresh":
> 
> 551527a5e68e ("controller: Update FDB timestamp")
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
> 

Sure.

>>>> static bool
>>>> lsp_is_type_changed(const struct sbrec_port_binding *sb,
>>>>                const struct nbrec_logical_switch_port *nbsp,
>>>> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows,
>>>>    }
>>>> }
>>>> 
>>>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
>>>> +
>>> 
>>> This, on the other hand, might be a good candidate for a config option.
>> 
>> “fdb_local_idle_timeout” option is added in this patch to configure the timeout.
>> Default value is 300 secs.
>> 
> 
> True, I missed that you do that below.  However, we already have
> "LS.other_config:fdb_age_threshold" in the NB database to control
> exactly the same thing.  I'd just use that one.
> 

Yes, we can reuse the same.

>>> 
>>>> static void
>>>> build_lswitch_learn_fdb_op(
>>>>        struct ovn_port *op, struct hmap *lflows,
>>>> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op(
>>>> 
>>>>    if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") ||
>>>>        (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) {
>>>> +
>>>> +        if (ls_is_fdb_local(op->od->nbs))
>>>> +        {
>>>> +            uint32_t idle_timeout =smap_get_uint(
>>>> +                        &op->od->nbs->other_config, "fdb_local_idle_timeout",
>>>> +                        FDB_LOCAL_DEF_IDLE_TIMEOUT_S);
>>>> +            ds_clear(match);
>>>> +            ds_clear(actions);
>>>> +            ds_put_format(match, "inport == %s", op->json_key);
>>>> +            ds_put_format(actions, "commit_fdb_local(timeout=%u); next;",
>>>> +                                    idle_timeout);
>>>> +            ovn_lflow_add_with_lport_and_hint(lflows, op->od,
>>>> +                                              S_SWITCH_IN_LOOKUP_FDB, 100,
>>>> +                                              ds_cstr(match), ds_cstr(actions),
>>>> +                                              op->key, &op->nbsp->header_);
>>>> +            return;
>>>> +        }
>>>> +
>>>>        ds_clear(match);
>>>>        ds_clear(actions);
>>>>        ds_put_format(match, "inport == %s", op->json_key);
>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>>> index fcb1c6ecc..3547ec4a6 100644
>>>> --- a/ovn-nb.xml
>>>> +++ b/ovn-nb.xml
>>>> @@ -803,6 +803,20 @@
>>>>      </column>
>>>>    </group>
>>>> 
>>>> +    <group title="Local FDB options">
>>>> +      <column name="other_config" key="fdb_local"
>>>> +              type='{"type": "boolean"}'>
>>>> +        If set to <code>true</code>, FDB flows are commited only to the
>>>> +        local chassis instead of southbound DB. Default is false.
>>>> +      </column>
>>>> +      <column name="other_config" key="fdb_local_idle_timeout"
>>>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
>>>> +        Local FDB flows <code>idle_timeout</code> value in seconds. FDB local
>>>> +        flows exceeding this timeout will be automatically removed. The value
>>>> +        defaults to 300, 0 means disabled.
>>>> +      </column>
>>>> +    </group>
>>>> +
>>>>    <column name="copp">
>>>>      <p>
>>>>        The control plane protection policy from table <ref table="Copp"/>
>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>> index 92cf27581..dff50364a 100644
>>>> --- a/tests/ovn.at
>>>> +++ b/tests/ovn.at
>>>> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1])
>>>> AT_CLEANUP
>>>> ])
>>>> 
>>>> +OVN_FOR_EACH_NORTHD([
>>>> +AT_SETUP([Local FDB MAC learning])
>>>> +ovn_start
>>>> +net_add n1
>>>> +
>>>> +AT_CHECK([ovn-nbctl ls-add ls0])
>>>> +
>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0])
>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"])
>>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2])
>>>> +
>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1])
>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"])
>>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3])
>>>> +
>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port])
>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
>>>> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
>>>> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
>>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true])
>>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1])
>>>> +
>>>> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true])
>>>> +
>>>> +sim_add hv1
>>>> +as hv1
>>>> +ovs-vsctl add-br br-phys
>>>> +ovn_attach n1 br-phys 192.168.0.1
>>>> +ovs-vsctl -- add-port br-int vif0 -- \
>>>> +    set interface vif0 external-ids:iface-id=vif0 \
>>>> +    options:tx_pcap=hv1/vif0-tx.pcap \
>>>> +    options:rxq_pcap=hv1/vif0-rx.pcap \
>>>> +    ofport-request=1
>>>> +ovs-vsctl -- add-port br-int vif1 -- \
>>>> +    set interface vif1 external-ids:iface-id=vif1 \
>>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>>>> +    options:rxq_pcap=hv1/vif1-rx.pcap \
>>>> +    ofport-request=2
>>>> +ovs-vsctl -- add-port br-phys ext0 -- \
>>>> +    set interface ext0 \
>>>> +    options:tx_pcap=hv1/ext0-tx.pcap \
>>>> +    options:rxq_pcap=hv1/ext0-rx.pcap \
>>>> +    ofport-request=3
>>>> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys
>>>> +
>>>> +wait_for_ports_up
>>>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>>>> +
>>>> +send_packet() {
>>>> +    src_mac=$1
>>>> +    src_ip=$2
>>>> +    dst_mac=$3
>>>> +    dst_ip=$4
>>>> +    iface=$5
>>>> +    packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>>>> +    ovs-appctl netdev-dummy/receive $iface $packet
>>>> +}
>>>> +
>>>> +
>>>> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet
>>>> +ovn-sbctl dump-flows ls0 > sw0flows
>>>> +AT_CAPTURE_FILE([sw0flows])
>>>> +
>>>> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl
>>>> +  table=2 (ls_in_lookup_fdb   ), priority=0    , dnl
>>>> +match=(1), action=(next;)
>>>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>>>> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;)
>>>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>>>> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;)
>>>> +])
>>>> +
>>>> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl
>>>> +  table=3 (ls_in_put_fdb      ), priority=0    , dnl
>>>> +match=(1), action=(next;)
>>>> +])
>>>> +
>>>> +
>>>> +src_mac="505400000003"
>>>> +src_ip=`ip_to_hex 10.0.0.3`
>>>> +dst_mac="505400000004"
>>>> +dst_ip=`ip_to_hex 10.0.0.4`
>>>> +
>>>> +# send packet from vif0(which has unknown addr set) to vif1
>>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0
>>>> +
>>>> +# send packet from vif1 to vif0(which has unknown addr set)
>>>> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1
>>>> +
>>>> +# send packet from underlay to vif1
>>>> +src_mac="505400000064"
>>>> +src_ip=`ip_to_hex 10.0.0.100`
>>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0
>>>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>>>> +
>>>> +# Make sure that OVS table 71 is populated on hv1.
>>>> +AS_BOX([Check that ovn-controller programs the flows for FDB])
>>>> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt
>>>> +AT_CAPTURE_FILE([hv1_offlows_table71.txt])
>>>> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl
>>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]]
>>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]]
>>>> +])
>>>> +
>>>> +OVN_CLEANUP([hv1])
>>>> +AT_CLEANUP
>>>> +])
>>>> +
>>>> OVN_FOR_EACH_NORTHD([
>>>> AT_SETUP([MAC binding aging])
>>>> AT_SKIP_IF([test $HAVE_SCAPY = no])
>>>> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
>>>> index 0b86eae7b..354f84a4b 100644
>>>> --- a/utilities/ovn-trace.c
>>>> +++ b/utilities/ovn-trace.c
>>>> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
>>>>            break;
>>>>        case OVNACT_MAC_CACHE_USE:
>>>>            break;
>>>> +        case OVNACT_COMMIT_FDB_LOCAL:
>>>> +            break;
>>>>        }
>>>>    }
>>>>    ofpbuf_uninit(&stack);
>>> 
> 
> Regards,
> Dumitru
Dumitru Ceara Nov. 30, 2023, 3:32 p.m. UTC | #6
On 11/30/23 15:54, Naveen Yerramneni wrote:
> 
> 
>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 11/30/23 09:45, Naveen Yerramneni wrote:
>>>
>>>
>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>>> This functionality can be enabled at the logical switch level:
>>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>>   functionality, it is disabled by default.
>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>>   for locally learned fdb flows, default timeout is 300 secs.
>>>>>
>>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>>   action=(commit_fdb_local(timeout=<timeout>); next;
>>>>>
>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>>         NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>>>
>>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>>> the chassis and avoids MAC flapping.
>>>>>
>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>>> ---
>>>>
>>>> Hi Naveen,
>>>>
>>>> Thanks a lot for the patch!
>>>>
>>>> Just a note, we already have a fix for the east-west traffic that causes
>>>> FDB flapping when localnet is used:
>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>>>
>>>> In general, however, I think it's a very good idea to move the FDB away
>>>> from the Southbound and make it local to each hypervisor.  That reduces
>>>> load on the Southbound among other things.
>>>>
>>>
>>> Hi Dumitru,
>>>
>>> Thanks for informing about the patches.
>>> Yes, local FDB reduces load on southbound.
>>>
>>>
>>>>> include/ovn/actions.h |   7 +++
>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>>> northd/northd.c       |  26 ++++++++++
>>>>> ovn-nb.xml            |  14 ++++++
>>>>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>>> utilities/ovn-trace.c |   2 +
>>>>> 6 files changed, 251 insertions(+)
>>>>>
>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>> index 49cfe0624..85ac92cd3 100644
>>>>> --- a/include/ovn/actions.h
>>>>> +++ b/include/ovn/actions.h
>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>>    OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>>    OVNACT(SAMPLE,            ovnact_sample)          \
>>>>>    OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>>>
>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>>    uint16_t timeout;
>>>>> };
>>>>>
>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>>> +struct ovnact_commit_fdb_local{
>>>>> +    struct ovnact ovnact;
>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>>> +};
>>>>> +
>>>>> /* Internal use by the helpers below. */
>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>> index a73fe1a1e..f5aa78db1 100644
>>>>> --- a/lib/actions.c
>>>>> +++ b/lib/actions.c
>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>>    ds_put_cstr(s, "mac_cache_use;");
>>>>> }
>>>>>
>>>>> +static void
>>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>>> +{
>>>>> +    uint16_t timeout = 0;
>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>> +        return;
>>>>> +    }
>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>> +        return;
>>>>> +    }
>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>>> +        return;
>>>>> +    }
>>>>> +    fdb_local->timeout = timeout;
>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>> +                      struct ds *s)
>>>>> +{
>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>>> +{
>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>>> +    struct ofpact_learn_spec *ol_spec;
>>>>> +    unsigned int imm_bytes;
>>>>> +    uint8_t *src_imm;
>>>>> +
>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>>> +    ol->cookie = htonll(cookie);
>>>>> +
>>>>> +    /* Match on metadata of the packet that created the new table. */
>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>>> +    ol_spec->dst.ofs = 0;
>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>>> +
>>>>> +    /* Match on metadata of the packet. */
>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>>> +    ol_spec->dst.ofs = 0;
>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>>> +
>>>>> +
>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>>> +    ol_spec->dst.ofs = 0;
>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>>> +
>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>>> +}
>>>>
>>>> A difference from today's SB.FDB centralized approach is that when
>>>> ovn-controller restarts these flows will be cleared, I think.
>>>>
>>>> Are we OK with that?  I think so but if not what are the options to
>>>> avoid clearing the local fdb cache on restart?
>>>>
>>>
>>> OVS has to relearn the FDB  flows whenever tables are cleared.
>>> During this time, packets gets flooded. I need to think about possible 
>>> options if we want to retain FDB table.
>>>
>>> Can we take this up as an enhancement in a separate patch
>>> once we identify a solution for this ?
>>>
>>
>> Sounds good to me.
>>
>>>> Another difference with today's approach is that this avoids a
>>>> controller action, that's great!
>>>>
>>>>> +
>>>>> +static void
>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>> +                      const struct ovnact_encode_params *ep,
>>>>> +                      struct ofpbuf *ofpacts)
>>>>> +{
>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>>> +}
>>>>> +
>>>>> static void
>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>>                     const struct ovnact_encode_params *ep,
>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>>        parse_sample(ctx);
>>>>>    } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>>        ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>>    } else {
>>>>>        lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>    }
>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>> index d1465ddf7..de18694a0 100644
>>>>> --- a/northd/northd.c
>>>>> +++ b/northd/northd.c
>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>>    return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>>> }
>>>>>
>>>>> +static bool
>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>>> +{
>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>>> +}
>>>>> +
>>>>
>>>> Personally, I'd prefer if we don't add another config knob and we just
>>>> make this the only way FDB works.  We could also document that the FDB
>>>> SB table should be deprecated.
>>>>
>>>
>>> If we want to make local FDB as default then, I think we need to handle overlay 
>>> use case as well. Probably, we might have to add a new stage in logical switch
>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>> something similar.
>>>
>>> Can we take this up in a separate patch ?
>>>
>>>
>>
>> Given that we have a fix already for the original problem you were
>> trying to address I would prefer that we avoid adding new config knobs
>> and handle both the localnet and overlay cases at the same time.
>>
> 
> Sure.
> 
>> OTOH, why is there a difference?
>>
>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>> learn() action) into MFF_LOG_INPORT:
>>
>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>> VXLAN that's not the case but that's already a documented limitation,
>> LOG_INPORT is not available after VXLAN tunneling therefore features
>> that need it (like egress ACLs matching against ingress port
>> identifiers) are not supported:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
>>
> 
> In case of overlay, ingress and egress pipelines are processed on different nodes
> assuming source and dest VMs are on different nodes. I think this makes local FDB
> to learn only local VMs MACs and it never learn remote VMs MACs.
>  
> Example: 
> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
> 
> 

Ah, I see your point, you're right.  So it does look like we'd need a
dedicated fdb learning stage in the egress pipeline.  But that is
probably not that terrible.

>> One more thing we need to take care of in order to be able to make local
>> FDB the default is "FDB refresh":
>>
>> 551527a5e68e ("controller: Update FDB timestamp")
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
>>
> 
> Sure.
> 
>>>>> static bool
>>>>> lsp_is_type_changed(const struct sbrec_port_binding *sb,
>>>>>                const struct nbrec_logical_switch_port *nbsp,
>>>>> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows,
>>>>>    }
>>>>> }
>>>>>
>>>>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
>>>>> +
>>>>
>>>> This, on the other hand, might be a good candidate for a config option.
>>>
>>> “fdb_local_idle_timeout” option is added in this patch to configure the timeout.
>>> Default value is 300 secs.
>>>
>>
>> True, I missed that you do that below.  However, we already have
>> "LS.other_config:fdb_age_threshold" in the NB database to control
>> exactly the same thing.  I'd just use that one.
>>
> 
> Yes, we can reuse the same.
> 
>>>>
>>>>> static void
>>>>> build_lswitch_learn_fdb_op(
>>>>>        struct ovn_port *op, struct hmap *lflows,
>>>>> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op(
>>>>>
>>>>>    if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") ||
>>>>>        (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) {
>>>>> +
>>>>> +        if (ls_is_fdb_local(op->od->nbs))
>>>>> +        {
>>>>> +            uint32_t idle_timeout =smap_get_uint(
>>>>> +                        &op->od->nbs->other_config, "fdb_local_idle_timeout",
>>>>> +                        FDB_LOCAL_DEF_IDLE_TIMEOUT_S);
>>>>> +            ds_clear(match);
>>>>> +            ds_clear(actions);
>>>>> +            ds_put_format(match, "inport == %s", op->json_key);
>>>>> +            ds_put_format(actions, "commit_fdb_local(timeout=%u); next;",
>>>>> +                                    idle_timeout);
>>>>> +            ovn_lflow_add_with_lport_and_hint(lflows, op->od,
>>>>> +                                              S_SWITCH_IN_LOOKUP_FDB, 100,
>>>>> +                                              ds_cstr(match), ds_cstr(actions),
>>>>> +                                              op->key, &op->nbsp->header_);
>>>>> +            return;
>>>>> +        }
>>>>> +
>>>>>        ds_clear(match);
>>>>>        ds_clear(actions);
>>>>>        ds_put_format(match, "inport == %s", op->json_key);
>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>>>> index fcb1c6ecc..3547ec4a6 100644
>>>>> --- a/ovn-nb.xml
>>>>> +++ b/ovn-nb.xml
>>>>> @@ -803,6 +803,20 @@
>>>>>      </column>
>>>>>    </group>
>>>>>
>>>>> +    <group title="Local FDB options">
>>>>> +      <column name="other_config" key="fdb_local"
>>>>> +              type='{"type": "boolean"}'>
>>>>> +        If set to <code>true</code>, FDB flows are commited only to the
>>>>> +        local chassis instead of southbound DB. Default is false.
>>>>> +      </column>
>>>>> +      <column name="other_config" key="fdb_local_idle_timeout"
>>>>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
>>>>> +        Local FDB flows <code>idle_timeout</code> value in seconds. FDB local
>>>>> +        flows exceeding this timeout will be automatically removed. The value
>>>>> +        defaults to 300, 0 means disabled.
>>>>> +      </column>
>>>>> +    </group>
>>>>> +
>>>>>    <column name="copp">
>>>>>      <p>
>>>>>        The control plane protection policy from table <ref table="Copp"/>
>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>> index 92cf27581..dff50364a 100644
>>>>> --- a/tests/ovn.at
>>>>> +++ b/tests/ovn.at
>>>>> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1])
>>>>> AT_CLEANUP
>>>>> ])
>>>>>
>>>>> +OVN_FOR_EACH_NORTHD([
>>>>> +AT_SETUP([Local FDB MAC learning])
>>>>> +ovn_start
>>>>> +net_add n1
>>>>> +
>>>>> +AT_CHECK([ovn-nbctl ls-add ls0])
>>>>> +
>>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0])
>>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"])
>>>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2])
>>>>> +
>>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1])
>>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"])
>>>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3])
>>>>> +
>>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port])
>>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
>>>>> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
>>>>> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
>>>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true])
>>>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1])
>>>>> +
>>>>> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true])
>>>>> +
>>>>> +sim_add hv1
>>>>> +as hv1
>>>>> +ovs-vsctl add-br br-phys
>>>>> +ovn_attach n1 br-phys 192.168.0.1
>>>>> +ovs-vsctl -- add-port br-int vif0 -- \
>>>>> +    set interface vif0 external-ids:iface-id=vif0 \
>>>>> +    options:tx_pcap=hv1/vif0-tx.pcap \
>>>>> +    options:rxq_pcap=hv1/vif0-rx.pcap \
>>>>> +    ofport-request=1
>>>>> +ovs-vsctl -- add-port br-int vif1 -- \
>>>>> +    set interface vif1 external-ids:iface-id=vif1 \
>>>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>>>>> +    options:rxq_pcap=hv1/vif1-rx.pcap \
>>>>> +    ofport-request=2
>>>>> +ovs-vsctl -- add-port br-phys ext0 -- \
>>>>> +    set interface ext0 \
>>>>> +    options:tx_pcap=hv1/ext0-tx.pcap \
>>>>> +    options:rxq_pcap=hv1/ext0-rx.pcap \
>>>>> +    ofport-request=3
>>>>> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys
>>>>> +
>>>>> +wait_for_ports_up
>>>>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>>>>> +
>>>>> +send_packet() {
>>>>> +    src_mac=$1
>>>>> +    src_ip=$2
>>>>> +    dst_mac=$3
>>>>> +    dst_ip=$4
>>>>> +    iface=$5
>>>>> +    packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>>>>> +    ovs-appctl netdev-dummy/receive $iface $packet
>>>>> +}
>>>>> +
>>>>> +
>>>>> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet
>>>>> +ovn-sbctl dump-flows ls0 > sw0flows
>>>>> +AT_CAPTURE_FILE([sw0flows])
>>>>> +
>>>>> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl
>>>>> +  table=2 (ls_in_lookup_fdb   ), priority=0    , dnl
>>>>> +match=(1), action=(next;)
>>>>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>>>>> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;)
>>>>> +  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
>>>>> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;)
>>>>> +])
>>>>> +
>>>>> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl
>>>>> +  table=3 (ls_in_put_fdb      ), priority=0    , dnl
>>>>> +match=(1), action=(next;)
>>>>> +])
>>>>> +
>>>>> +
>>>>> +src_mac="505400000003"
>>>>> +src_ip=`ip_to_hex 10.0.0.3`
>>>>> +dst_mac="505400000004"
>>>>> +dst_ip=`ip_to_hex 10.0.0.4`
>>>>> +
>>>>> +# send packet from vif0(which has unknown addr set) to vif1
>>>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0
>>>>> +
>>>>> +# send packet from vif1 to vif0(which has unknown addr set)
>>>>> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1
>>>>> +
>>>>> +# send packet from underlay to vif1
>>>>> +src_mac="505400000064"
>>>>> +src_ip=`ip_to_hex 10.0.0.100`
>>>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0
>>>>> +AT_CHECK([ovn-nbctl --wait=hv sync])
>>>>> +
>>>>> +# Make sure that OVS table 71 is populated on hv1.
>>>>> +AS_BOX([Check that ovn-controller programs the flows for FDB])
>>>>> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt
>>>>> +AT_CAPTURE_FILE([hv1_offlows_table71.txt])
>>>>> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl
>>>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]]
>>>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]]
>>>>> +])
>>>>> +
>>>>> +OVN_CLEANUP([hv1])
>>>>> +AT_CLEANUP
>>>>> +])
>>>>> +
>>>>> OVN_FOR_EACH_NORTHD([
>>>>> AT_SETUP([MAC binding aging])
>>>>> AT_SKIP_IF([test $HAVE_SCAPY = no])
>>>>> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
>>>>> index 0b86eae7b..354f84a4b 100644
>>>>> --- a/utilities/ovn-trace.c
>>>>> +++ b/utilities/ovn-trace.c
>>>>> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
>>>>>            break;
>>>>>        case OVNACT_MAC_CACHE_USE:
>>>>>            break;
>>>>> +        case OVNACT_COMMIT_FDB_LOCAL:
>>>>> +            break;
>>>>>        }
>>>>>    }
>>>>>    ofpbuf_uninit(&stack);
>>>>
>>
>> Regards,
>> Dumitru
>
Aaron Conole Dec. 6, 2023, 5:52 p.m. UTC | #7
"naveen.yerramneni" <naveen.yerramneni@nutanix.com> writes:

> This functionality can be enabled at the logical switch level:
>   - "other_config:fdb_local" can be used to enable/disable this
>     functionality, it is disabled by default.
>   - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>     for locally learned fdb flows, default timeout is 300 secs.
>
> If enabled, below lflow is added for each port that has unknown addr set.
>   - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>     action=(commit_fdb_local(timeout=<timeout>); next;
>
> New OVN action: "commit_fdb_local". This sets following OVS action.
>   - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>           NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>
> This is useful when OVN is managing VLAN network that has multiple ports
> set with unknown addr and localnet_learn_fdb is enabled. With this config,
> if there is east-west traffic flowing between VMs part of same VLAN
> deployed on different hypervisors then, MAC addrs of the source and
> destination VMs keeps flapping between VM port and localnet port in 
> Southbound FDB table. Enabling fdb_local config makes fdb table local to
> the chassis and avoids MAC flapping.
>
> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
> ---

Recheck-request: github-robot-_ovn-kubernetes
Dumitru Ceara Dec. 18, 2023, 1:56 p.m. UTC | #8
On 11/30/23 16:32, Dumitru Ceara wrote:
> On 11/30/23 15:54, Naveen Yerramneni wrote:
>>
>>
>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>
>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
>>>>
>>>>
>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>
>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>>>> This functionality can be enabled at the logical switch level:
>>>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>>>   functionality, it is disabled by default.
>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>>>   for locally learned fdb flows, default timeout is 300 secs.
>>>>>>
>>>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>>>   action=(commit_fdb_local(timeout=<timeout>); next;
>>>>>>
>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>>>         NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>>>>
>>>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>>>> the chassis and avoids MAC flapping.
>>>>>>
>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>>>> ---
>>>>>
>>>>> Hi Naveen,
>>>>>
>>>>> Thanks a lot for the patch!
>>>>>
>>>>> Just a note, we already have a fix for the east-west traffic that causes
>>>>> FDB flapping when localnet is used:
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>>>>
>>>>> In general, however, I think it's a very good idea to move the FDB away
>>>>> from the Southbound and make it local to each hypervisor.  That reduces
>>>>> load on the Southbound among other things.
>>>>>
>>>>
>>>> Hi Dumitru,
>>>>
>>>> Thanks for informing about the patches.
>>>> Yes, local FDB reduces load on southbound.
>>>>
>>>>
>>>>>> include/ovn/actions.h |   7 +++
>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>>>> northd/northd.c       |  26 ++++++++++
>>>>>> ovn-nb.xml            |  14 ++++++
>>>>>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>>>> utilities/ovn-trace.c |   2 +
>>>>>> 6 files changed, 251 insertions(+)
>>>>>>
>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>>> index 49cfe0624..85ac92cd3 100644
>>>>>> --- a/include/ovn/actions.h
>>>>>> +++ b/include/ovn/actions.h
>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>>>    OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>>>    OVNACT(SAMPLE,            ovnact_sample)          \
>>>>>>    OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>>>>
>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>>>    uint16_t timeout;
>>>>>> };
>>>>>>
>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>>>> +struct ovnact_commit_fdb_local{
>>>>>> +    struct ovnact ovnact;
>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>>>> +};
>>>>>> +
>>>>>> /* Internal use by the helpers below. */
>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>>> index a73fe1a1e..f5aa78db1 100644
>>>>>> --- a/lib/actions.c
>>>>>> +++ b/lib/actions.c
>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>>>    ds_put_cstr(s, "mac_cache_use;");
>>>>>> }
>>>>>>
>>>>>> +static void
>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>>>> +{
>>>>>> +    uint16_t timeout = 0;
>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>> +        return;
>>>>>> +    }
>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>> +        return;
>>>>>> +    }
>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +    fdb_local->timeout = timeout;
>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>>>> +}
>>>>>> +
>>>>>> +static void
>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>> +                      struct ds *s)
>>>>>> +{
>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>>>> +}
>>>>>> +
>>>>>> +static void
>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>> +static void
>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>>>> +{
>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>>>> +    struct ofpact_learn_spec *ol_spec;
>>>>>> +    unsigned int imm_bytes;
>>>>>> +    uint8_t *src_imm;
>>>>>> +
>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>>>> +    ol->cookie = htonll(cookie);
>>>>>> +
>>>>>> +    /* Match on metadata of the packet that created the new table. */
>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>>>> +
>>>>>> +    /* Match on metadata of the packet. */
>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>>>> +
>>>>>> +
>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>>>> +
>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>>>> +}
>>>>>
>>>>> A difference from today's SB.FDB centralized approach is that when
>>>>> ovn-controller restarts these flows will be cleared, I think.
>>>>>
>>>>> Are we OK with that?  I think so but if not what are the options to
>>>>> avoid clearing the local fdb cache on restart?
>>>>>
>>>>
>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
>>>> During this time, packets gets flooded. I need to think about possible 
>>>> options if we want to retain FDB table.
>>>>
>>>> Can we take this up as an enhancement in a separate patch
>>>> once we identify a solution for this ?
>>>>
>>>
>>> Sounds good to me.
>>>
>>>>> Another difference with today's approach is that this avoids a
>>>>> controller action, that's great!
>>>>>
>>>>>> +
>>>>>> +static void
>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>> +                      const struct ovnact_encode_params *ep,
>>>>>> +                      struct ofpbuf *ofpacts)
>>>>>> +{
>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>>>> +}
>>>>>> +
>>>>>> static void
>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>>>                     const struct ovnact_encode_params *ep,
>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>>>        parse_sample(ctx);
>>>>>>    } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>>>        ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>>>    } else {
>>>>>>        lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>>    }
>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>> index d1465ddf7..de18694a0 100644
>>>>>> --- a/northd/northd.c
>>>>>> +++ b/northd/northd.c
>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>>>    return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>>>> }
>>>>>>
>>>>>> +static bool
>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>>>> +{
>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> Personally, I'd prefer if we don't add another config knob and we just
>>>>> make this the only way FDB works.  We could also document that the FDB
>>>>> SB table should be deprecated.
>>>>>
>>>>
>>>> If we want to make local FDB as default then, I think we need to handle overlay 
>>>> use case as well. Probably, we might have to add a new stage in logical switch
>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>>> something similar.
>>>>
>>>> Can we take this up in a separate patch ?
>>>>
>>>>
>>>
>>> Given that we have a fix already for the original problem you were
>>> trying to address I would prefer that we avoid adding new config knobs
>>> and handle both the localnet and overlay cases at the same time.
>>>
>>
>> Sure.
>>
>>> OTOH, why is there a difference?
>>>
>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>>> learn() action) into MFF_LOG_INPORT:
>>>
>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>>> VXLAN that's not the case but that's already a documented limitation,
>>> LOG_INPORT is not available after VXLAN tunneling therefore features
>>> that need it (like egress ACLs matching against ingress port
>>> identifiers) are not supported:
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
>>>
>>
>> In case of overlay, ingress and egress pipelines are processed on different nodes
>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>  
>> Example: 
>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>
>>
> 
> Ah, I see your point, you're right.  So it does look like we'd need a
> dedicated fdb learning stage in the egress pipeline.  But that is
> probably not that terrible.
> 
>>> One more thing we need to take care of in order to be able to make local
>>> FDB the default is "FDB refresh":
>>>
>>> 551527a5e68e ("controller: Update FDB timestamp")
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
>>>
>>
>> Sure.
>>

Hi Naveen,

Just touching base on this again, I was curious if you're still planning
to work on making the FDB table local; I still think it's beneficial in
general.

Thanks,
Dumitru
Naveen Yerramneni Dec. 18, 2023, 3:17 p.m. UTC | #9
> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 11/30/23 16:32, Dumitru Ceara wrote:
>> On 11/30/23 15:54, Naveen Yerramneni wrote:
>>> 
>>> 
>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>> 
>>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
>>>>> 
>>>>> 
>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>> 
>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>>>>> This functionality can be enabled at the logical switch level:
>>>>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>>>>  functionality, it is disabled by default.
>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>>>>  for locally learned fdb flows, default timeout is 300 secs.
>>>>>>> 
>>>>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>>>>  action=(commit_fdb_local(timeout=<timeout>); next;
>>>>>>> 
>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>>>>        NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>>>>> 
>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>>>>> the chassis and avoids MAC flapping.
>>>>>>> 
>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>>>>> ---
>>>>>> 
>>>>>> Hi Naveen,
>>>>>> 
>>>>>> Thanks a lot for the patch!
>>>>>> 
>>>>>> Just a note, we already have a fix for the east-west traffic that causes
>>>>>> FDB flapping when localnet is used:
>>>>>> 
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>>>>> 
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>>>>> 
>>>>>> In general, however, I think it's a very good idea to move the FDB away
>>>>>> from the Southbound and make it local to each hypervisor.  That reduces
>>>>>> load on the Southbound among other things.
>>>>>> 
>>>>> 
>>>>> Hi Dumitru,
>>>>> 
>>>>> Thanks for informing about the patches.
>>>>> Yes, local FDB reduces load on southbound.
>>>>> 
>>>>> 
>>>>>>> include/ovn/actions.h |   7 +++
>>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>>>>> northd/northd.c       |  26 ++++++++++
>>>>>>> ovn-nb.xml            |  14 ++++++
>>>>>>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>> utilities/ovn-trace.c |   2 +
>>>>>>> 6 files changed, 251 insertions(+)
>>>>>>> 
>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>>>> index 49cfe0624..85ac92cd3 100644
>>>>>>> --- a/include/ovn/actions.h
>>>>>>> +++ b/include/ovn/actions.h
>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>>>>   OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>>>>   OVNACT(SAMPLE,            ovnact_sample)          \
>>>>>>>   OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>>>>> 
>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>>>>   uint16_t timeout;
>>>>>>> };
>>>>>>> 
>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>>>>> +struct ovnact_commit_fdb_local{
>>>>>>> +    struct ovnact ovnact;
>>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>>>>> +};
>>>>>>> +
>>>>>>> /* Internal use by the helpers below. */
>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>>>> index a73fe1a1e..f5aa78db1 100644
>>>>>>> --- a/lib/actions.c
>>>>>>> +++ b/lib/actions.c
>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>>>>   ds_put_cstr(s, "mac_cache_use;");
>>>>>>> }
>>>>>>> 
>>>>>>> +static void
>>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>>>>> +{
>>>>>>> +    uint16_t timeout = 0;
>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +    fdb_local->timeout = timeout;
>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>> +                      struct ds *s)
>>>>>>> +{
>>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>>>>> +{
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>>>>> +{
>>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>>>>> +    struct ofpact_learn_spec *ol_spec;
>>>>>>> +    unsigned int imm_bytes;
>>>>>>> +    uint8_t *src_imm;
>>>>>>> +
>>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>>>>> +    ol->cookie = htonll(cookie);
>>>>>>> +
>>>>>>> +    /* Match on metadata of the packet that created the new table. */
>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>>>>> +
>>>>>>> +    /* Match on metadata of the packet. */
>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>>>>> +
>>>>>>> +
>>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>>>>> +
>>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>>>>> +}
>>>>>> 
>>>>>> A difference from today's SB.FDB centralized approach is that when
>>>>>> ovn-controller restarts these flows will be cleared, I think.
>>>>>> 
>>>>>> Are we OK with that?  I think so but if not what are the options to
>>>>>> avoid clearing the local fdb cache on restart?
>>>>>> 
>>>>> 
>>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
>>>>> During this time, packets gets flooded. I need to think about possible 
>>>>> options if we want to retain FDB table.
>>>>> 
>>>>> Can we take this up as an enhancement in a separate patch
>>>>> once we identify a solution for this ?
>>>>> 
>>>> 
>>>> Sounds good to me.
>>>> 
>>>>>> Another difference with today's approach is that this avoids a
>>>>>> controller action, that's great!
>>>>>> 
>>>>>>> +
>>>>>>> +static void
>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>> +                      const struct ovnact_encode_params *ep,
>>>>>>> +                      struct ofpbuf *ofpacts)
>>>>>>> +{
>>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>>>>> +}
>>>>>>> +
>>>>>>> static void
>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>>>>                    const struct ovnact_encode_params *ep,
>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>>>>       parse_sample(ctx);
>>>>>>>   } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>>>>       ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>>>>   } else {
>>>>>>>       lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>>>   }
>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>> index d1465ddf7..de18694a0 100644
>>>>>>> --- a/northd/northd.c
>>>>>>> +++ b/northd/northd.c
>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>>>>   return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>>>>> }
>>>>>>> 
>>>>>>> +static bool
>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>>>>> +{
>>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>>>>> +}
>>>>>>> +
>>>>>> 
>>>>>> Personally, I'd prefer if we don't add another config knob and we just
>>>>>> make this the only way FDB works.  We could also document that the FDB
>>>>>> SB table should be deprecated.
>>>>>> 
>>>>> 
>>>>> If we want to make local FDB as default then, I think we need to handle overlay 
>>>>> use case as well. Probably, we might have to add a new stage in logical switch
>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>>>> something similar.
>>>>> 
>>>>> Can we take this up in a separate patch ?
>>>>> 
>>>>> 
>>>> 
>>>> Given that we have a fix already for the original problem you were
>>>> trying to address I would prefer that we avoid adding new config knobs
>>>> and handle both the localnet and overlay cases at the same time.
>>>> 
>>> 
>>> Sure.
>>> 
>>>> OTOH, why is there a difference?
>>>> 
>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>>>> learn() action) into MFF_LOG_INPORT:
>>>> 
>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>>>> VXLAN that's not the case but that's already a documented limitation,
>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
>>>> that need it (like egress ACLs matching against ingress port
>>>> identifiers) are not supported:
>>>> 
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
>>>> 
>>> 
>>> In case of overlay, ingress and egress pipelines are processed on different nodes
>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>> 
>>> Example: 
>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>>> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>> 
>>> 
>> 
>> Ah, I see your point, you're right.  So it does look like we'd need a
>> dedicated fdb learning stage in the egress pipeline.  But that is
>> probably not that terrible.
>> 
>>>> One more thing we need to take care of in order to be able to make local
>>>> FDB the default is "FDB refresh":
>>>> 
>>>> 551527a5e68e ("controller: Update FDB timestamp")
>>>> 
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
>>>> 
>>> 
>>> Sure.
>>> 
> 
> Hi Naveen,
> 
> Just touching base on this again, I was curious if you're still planning
> to work on making the FDB table local; I still think it's beneficial in
> general.
> 
> Thanks,
> Dumitru


Hi Dumitru,

Yes, I will work on this. I will try to send out the patch by Jan 1st week.

Thanks,
Naveen
Dumitru Ceara Dec. 18, 2023, 3:23 p.m. UTC | #10
On 12/18/23 16:17, Naveen Yerramneni wrote:
> 
> 
>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 11/30/23 16:32, Dumitru Ceara wrote:
>>> On 11/30/23 15:54, Naveen Yerramneni wrote:
>>>>
>>>>
>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>
>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
>>>>>>
>>>>>>
>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>
>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>>>>>> This functionality can be enabled at the logical switch level:
>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>>>>>  functionality, it is disabled by default.
>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>>>>>  for locally learned fdb flows, default timeout is 300 secs.
>>>>>>>>
>>>>>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>>>>>  action=(commit_fdb_local(timeout=<timeout>); next;
>>>>>>>>
>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>>>>>        NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>>>>>>
>>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>>>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>>>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>>>>>> the chassis and avoids MAC flapping.
>>>>>>>>
>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>>>>>> ---
>>>>>>>
>>>>>>> Hi Naveen,
>>>>>>>
>>>>>>> Thanks a lot for the patch!
>>>>>>>
>>>>>>> Just a note, we already have a fix for the east-west traffic that causes
>>>>>>> FDB flapping when localnet is used:
>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>>>>>>
>>>>>>> In general, however, I think it's a very good idea to move the FDB away
>>>>>>> from the Southbound and make it local to each hypervisor.  That reduces
>>>>>>> load on the Southbound among other things.
>>>>>>>
>>>>>>
>>>>>> Hi Dumitru,
>>>>>>
>>>>>> Thanks for informing about the patches.
>>>>>> Yes, local FDB reduces load on southbound.
>>>>>>
>>>>>>
>>>>>>>> include/ovn/actions.h |   7 +++
>>>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>>>>>> northd/northd.c       |  26 ++++++++++
>>>>>>>> ovn-nb.xml            |  14 ++++++
>>>>>>>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>> utilities/ovn-trace.c |   2 +
>>>>>>>> 6 files changed, 251 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>>>>> index 49cfe0624..85ac92cd3 100644
>>>>>>>> --- a/include/ovn/actions.h
>>>>>>>> +++ b/include/ovn/actions.h
>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>>>>>   OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>>>>>   OVNACT(SAMPLE,            ovnact_sample)          \
>>>>>>>>   OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>>>>>>
>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>>>>>   uint16_t timeout;
>>>>>>>> };
>>>>>>>>
>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>>>>>> +struct ovnact_commit_fdb_local{
>>>>>>>> +    struct ovnact ovnact;
>>>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> /* Internal use by the helpers below. */
>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>>>>> index a73fe1a1e..f5aa78db1 100644
>>>>>>>> --- a/lib/actions.c
>>>>>>>> +++ b/lib/actions.c
>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>>>>>   ds_put_cstr(s, "mac_cache_use;");
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static void
>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>>>>>> +{
>>>>>>>> +    uint16_t timeout = 0;
>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +    fdb_local->timeout = timeout;
>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>> +                      struct ds *s)
>>>>>>>> +{
>>>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>>>>>> +{
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>>>>>> +{
>>>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>>>>>> +    struct ofpact_learn_spec *ol_spec;
>>>>>>>> +    unsigned int imm_bytes;
>>>>>>>> +    uint8_t *src_imm;
>>>>>>>> +
>>>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>>>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>>>>>> +    ol->cookie = htonll(cookie);
>>>>>>>> +
>>>>>>>> +    /* Match on metadata of the packet that created the new table. */
>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>>>>>> +
>>>>>>>> +    /* Match on metadata of the packet. */
>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>>>>>> +
>>>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>>>>>> +}
>>>>>>>
>>>>>>> A difference from today's SB.FDB centralized approach is that when
>>>>>>> ovn-controller restarts these flows will be cleared, I think.
>>>>>>>
>>>>>>> Are we OK with that?  I think so but if not what are the options to
>>>>>>> avoid clearing the local fdb cache on restart?
>>>>>>>
>>>>>>
>>>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
>>>>>> During this time, packets gets flooded. I need to think about possible 
>>>>>> options if we want to retain FDB table.
>>>>>>
>>>>>> Can we take this up as an enhancement in a separate patch
>>>>>> once we identify a solution for this ?
>>>>>>
>>>>>
>>>>> Sounds good to me.
>>>>>
>>>>>>> Another difference with today's approach is that this avoids a
>>>>>>> controller action, that's great!
>>>>>>>
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>> +                      const struct ovnact_encode_params *ep,
>>>>>>>> +                      struct ofpbuf *ofpacts)
>>>>>>>> +{
>>>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static void
>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>>>>>                    const struct ovnact_encode_params *ep,
>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>>>>>       parse_sample(ctx);
>>>>>>>>   } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>>>>>       ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>>>>>   } else {
>>>>>>>>       lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>>>>   }
>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>>> index d1465ddf7..de18694a0 100644
>>>>>>>> --- a/northd/northd.c
>>>>>>>> +++ b/northd/northd.c
>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>>>>>   return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static bool
>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>>>>>> +{
>>>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>
>>>>>>> Personally, I'd prefer if we don't add another config knob and we just
>>>>>>> make this the only way FDB works.  We could also document that the FDB
>>>>>>> SB table should be deprecated.
>>>>>>>
>>>>>>
>>>>>> If we want to make local FDB as default then, I think we need to handle overlay 
>>>>>> use case as well. Probably, we might have to add a new stage in logical switch
>>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>>>>> something similar.
>>>>>>
>>>>>> Can we take this up in a separate patch ?
>>>>>>
>>>>>>
>>>>>
>>>>> Given that we have a fix already for the original problem you were
>>>>> trying to address I would prefer that we avoid adding new config knobs
>>>>> and handle both the localnet and overlay cases at the same time.
>>>>>
>>>>
>>>> Sure.
>>>>
>>>>> OTOH, why is there a difference?
>>>>>
>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>>>>> learn() action) into MFF_LOG_INPORT:
>>>>>
>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>>>>> VXLAN that's not the case but that's already a documented limitation,
>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
>>>>> that need it (like egress ACLs matching against ingress port
>>>>> identifiers) are not supported:
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
>>>>>
>>>>
>>>> In case of overlay, ingress and egress pipelines are processed on different nodes
>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>>>
>>>> Example: 
>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>>>> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>>>
>>>>
>>>
>>> Ah, I see your point, you're right.  So it does look like we'd need a
>>> dedicated fdb learning stage in the egress pipeline.  But that is
>>> probably not that terrible.
>>>
>>>>> One more thing we need to take care of in order to be able to make local
>>>>> FDB the default is "FDB refresh":
>>>>>
>>>>> 551527a5e68e ("controller: Update FDB timestamp")
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
>>>>>
>>>>
>>>> Sure.
>>>>
>>
>> Hi Naveen,
>>
>> Just touching base on this again, I was curious if you're still planning
>> to work on making the FDB table local; I still think it's beneficial in
>> general.
>>
>> Thanks,
>> Dumitru
> 
> 
> Hi Dumitru,
> 
> Yes, I will work on this. I will try to send out the patch by Jan 1st week.
> 

Awesome, thanks Naveen!  No rush though, I just wanted to confirm
whether you were interested in pursuing this.

Best regards,
Dumitru
Naveen Yerramneni March 6, 2024, 7:24 p.m. UTC | #11
> On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 12/18/23 16:17, Naveen Yerramneni wrote:
>> 
>> 
>>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> On 11/30/23 16:32, Dumitru Ceara wrote:
>>>> On 11/30/23 15:54, Naveen Yerramneni wrote:
>>>>> 
>>>>> 
>>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>> 
>>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>>>>>>> This functionality can be enabled at the logical switch level:
>>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>>>>>> functionality, it is disabled by default.
>>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>>>>>> for locally learned fdb flows, default timeout is 300 secs.
>>>>>>>>> 
>>>>>>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next;
>>>>>>>>> 
>>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>>>>>>       NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>>>>>>> 
>>>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>>>>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>>>>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>>>>>>> the chassis and avoids MAC flapping.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>>>>>>> ---
>>>>>>>> 
>>>>>>>> Hi Naveen,
>>>>>>>> 
>>>>>>>> Thanks a lot for the patch!
>>>>>>>> 
>>>>>>>> Just a note, we already have a fix for the east-west traffic that causes
>>>>>>>> FDB flapping when localnet is used:
>>>>>>>> 
>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>>>>>>> 
>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>>>>>>> 
>>>>>>>> In general, however, I think it's a very good idea to move the FDB away
>>>>>>>> from the Southbound and make it local to each hypervisor.  That reduces
>>>>>>>> load on the Southbound among other things.
>>>>>>>> 
>>>>>>> 
>>>>>>> Hi Dumitru,
>>>>>>> 
>>>>>>> Thanks for informing about the patches.
>>>>>>> Yes, local FDB reduces load on southbound.
>>>>>>> 
>>>>>>> 
>>>>>>>>> include/ovn/actions.h |   7 +++
>>>>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>>>>>>> northd/northd.c       |  26 ++++++++++
>>>>>>>>> ovn-nb.xml            |  14 ++++++
>>>>>>>>> tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> utilities/ovn-trace.c |   2 +
>>>>>>>>> 6 files changed, 251 insertions(+)
>>>>>>>>> 
>>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>>>>>> index 49cfe0624..85ac92cd3 100644
>>>>>>>>> --- a/include/ovn/actions.h
>>>>>>>>> +++ b/include/ovn/actions.h
>>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>>>>>>  OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>>>>>>  OVNACT(SAMPLE,            ovnact_sample)          \
>>>>>>>>>  OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>>>>>>> 
>>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>>>>>>  uint16_t timeout;
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>>>>>>> +struct ovnact_commit_fdb_local{
>>>>>>>>> +    struct ovnact ovnact;
>>>>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> /* Internal use by the helpers below. */
>>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>>>>>> index a73fe1a1e..f5aa78db1 100644
>>>>>>>>> --- a/lib/actions.c
>>>>>>>>> +++ b/lib/actions.c
>>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>>>>>>  ds_put_cstr(s, "mac_cache_use;");
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> +static void
>>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>>>>>>> +{
>>>>>>>>> +    uint16_t timeout = 0;
>>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>>> +        return;
>>>>>>>>> +    }
>>>>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>>> +        return;
>>>>>>>>> +    }
>>>>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>>>>>>> +        return;
>>>>>>>>> +    }
>>>>>>>>> +    fdb_local->timeout = timeout;
>>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void
>>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>> +                      struct ds *s)
>>>>>>>>> +{
>>>>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void
>>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>>>>>>> +{
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void
>>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>>>>>>> +{
>>>>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>>>>>>> +    struct ofpact_learn_spec *ol_spec;
>>>>>>>>> +    unsigned int imm_bytes;
>>>>>>>>> +    uint8_t *src_imm;
>>>>>>>>> +
>>>>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>>>>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>>>>>>> +    ol->cookie = htonll(cookie);
>>>>>>>>> +
>>>>>>>>> +    /* Match on metadata of the packet that created the new table. */
>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>>>>>>> +
>>>>>>>>> +    /* Match on metadata of the packet. */
>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>>>>>>> +
>>>>>>>>> +
>>>>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>>>>>>> +
>>>>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>>>>>>> +}
>>>>>>>> 
>>>>>>>> A difference from today's SB.FDB centralized approach is that when
>>>>>>>> ovn-controller restarts these flows will be cleared, I think.
>>>>>>>> 
>>>>>>>> Are we OK with that?  I think so but if not what are the options to
>>>>>>>> avoid clearing the local fdb cache on restart?
>>>>>>>> 
>>>>>>> 
>>>>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
>>>>>>> During this time, packets gets flooded. I need to think about possible 
>>>>>>> options if we want to retain FDB table.
>>>>>>> 
>>>>>>> Can we take this up as an enhancement in a separate patch
>>>>>>> once we identify a solution for this ?
>>>>>>> 
>>>>>> 
>>>>>> Sounds good to me.
>>>>>> 
>>>>>>>> Another difference with today's approach is that this avoids a
>>>>>>>> controller action, that's great!
>>>>>>>> 
>>>>>>>>> +
>>>>>>>>> +static void
>>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>> +                      const struct ovnact_encode_params *ep,
>>>>>>>>> +                      struct ofpbuf *ofpacts)
>>>>>>>>> +{
>>>>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> static void
>>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>>>>>>                   const struct ovnact_encode_params *ep,
>>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>>>>>>      parse_sample(ctx);
>>>>>>>>>  } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>>>>>>      ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>>>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>>>>>>  } else {
>>>>>>>>>      lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>>>>>  }
>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>>>> index d1465ddf7..de18694a0 100644
>>>>>>>>> --- a/northd/northd.c
>>>>>>>>> +++ b/northd/northd.c
>>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>>>>>>  return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> +static bool
>>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>>>>>>> +{
>>>>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>> 
>>>>>>>> Personally, I'd prefer if we don't add another config knob and we just
>>>>>>>> make this the only way FDB works.  We could also document that the FDB
>>>>>>>> SB table should be deprecated.
>>>>>>>> 
>>>>>>> 
>>>>>>> If we want to make local FDB as default then, I think we need to handle overlay 
>>>>>>> use case as well. Probably, we might have to add a new stage in logical switch
>>>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>>>>>> something similar.
>>>>>>> 
>>>>>>> Can we take this up in a separate patch ?
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> Given that we have a fix already for the original problem you were
>>>>>> trying to address I would prefer that we avoid adding new config knobs
>>>>>> and handle both the localnet and overlay cases at the same time.
>>>>>> 
>>>>> 
>>>>> Sure.
>>>>> 
>>>>>> OTOH, why is there a difference?
>>>>>> 
>>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>>>>>> learn() action) into MFF_LOG_INPORT:
>>>>>> 
>>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>>>>>> VXLAN that's not the case but that's already a documented limitation,
>>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
>>>>>> that need it (like egress ACLs matching against ingress port
>>>>>> identifiers) are not supported:
>>>>>> 
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
>>>>>> 
>>>>> 
>>>>> In case of overlay, ingress and egress pipelines are processed on different nodes
>>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>>>> 
>>>>> Example: 
>>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>>>>> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
>>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
>>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>>>> 
>>>>> 
>>>> 
>>>> Ah, I see your point, you're right.  So it does look like we'd need a
>>>> dedicated fdb learning stage in the egress pipeline.  But that is
>>>> probably not that terrible.
>>>> 


Hi Dumitru,

I started working on this patch, sorry for the delay.

On further investigation, I realised that we cannot permanently remove FDB table from southbound even
if we add new FDB leaning stage in egress pipeline.

Following is the example:

Two VMs part of different overlay subnets have unknown addr set whereVM1 is connected to 
subnet-1 (LS1) and is running on hv1  and VM2 isconnected to subnet-2 (LS2) and is running
on hv2. Both subnets are connected to same logical router(LR).

When VM1 is sending ICMP packet to VM2, pipeline stages gets processed as below.
  - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets
    flooded in LS2.
  - ICMP Req on HV2: LS2-out.
  - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in.  HV2 learns VM2 MAC, packet gets
    flooded in LS1.
  - ICMP Resp on HV1: LS1-out.
  - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC.


I am thinking either we can make FDB local implementation only applicable for VLAN subnets with default disabled
(or) implement is for both VLAN and overlay subnets with default disabled by documenting the limitation and
explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to same LR
have ports with unknown addr configured.

Please let me know your thoughts.

Thanks,
Naveen

>>>>>> One more thing we need to take care of in order to be able to make local
>>>>>> FDB the default is "FDB refresh":
>>>>>> 
>>>>>> 551527a5e68e ("controller: Update FDB timestamp")
>>>>>> 
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
>>>>>> 
>>>>> 
>>>>> Sure.
>>>>> 
>>> 
>>> Hi Naveen,
>>> 
>>> Just touching base on this again, I was curious if you're still planning
>>> to work on making the FDB table local; I still think it's beneficial in
>>> general.
>>> 
>>> Thanks,
>>> Dumitru
>> 
>> 
>> Hi Dumitru,
>> 
>> Yes, I will work on this. I will try to send out the patch by Jan 1st week.
>> 
> 
> Awesome, thanks Naveen!  No rush though, I just wanted to confirm
> whether you were interested in pursuing this.
> 
> Best regards,
> Dumitru
> 
>
Ales Musil March 8, 2024, 9:07 a.m. UTC | #12
On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni <
naveen.yerramneni@nutanix.com> wrote:

>
>
> > On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> >
> > On 12/18/23 16:17, Naveen Yerramneni wrote:
> >>
> >>
> >>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> >>>
> >>> On 11/30/23 16:32, Dumitru Ceara wrote:
> >>>> On 11/30/23 15:54, Naveen Yerramneni wrote:
> >>>>>
> >>>>>
> >>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com>
> wrote:
> >>>>>>
> >>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com>
> wrote:
> >>>>>>>>
> >>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
> >>>>>>>>> This functionality can be enabled at the logical switch level:
> >>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this
> >>>>>>>>> functionality, it is disabled by default.
> >>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
> >>>>>>>>> for locally learned fdb flows, default timeout is 300 secs.
> >>>>>>>>>
> >>>>>>>>> If enabled, below lflow is added for each port that has unknown
> addr set.
> >>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport ==
> <in_port>),
> >>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next;
> >>>>>>>>>
> >>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS
> action.
> >>>>>>>>> -
> learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
> >>>>>>>>>
>  NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
> >>>>>>>>>
> >>>>>>>>> This is useful when OVN is managing VLAN network that has
> multiple ports
> >>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With
> this config,
> >>>>>>>>> if there is east-west traffic flowing between VMs part of same
> VLAN
> >>>>>>>>> deployed on different hypervisors then, MAC addrs of the source
> and
> >>>>>>>>> destination VMs keeps flapping between VM port and localnet port
> in
> >>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table
> local to
> >>>>>>>>> the chassis and avoids MAC flapping.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
> >>>>>>>>> ---
> >>>>>>>>
> >>>>>>>> Hi Naveen,
> >>>>>>>>
> >>>>>>>> Thanks a lot for the patch!
> >>>>>>>>
> >>>>>>>> Just a note, we already have a fix for the east-west traffic that
> causes
> >>>>>>>> FDB flapping when localnet is used:
> >>>>>>>>
> >>>>>>>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e=
> >>>>>>>>
> >>>>>>>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e=
> >>>>>>>>
> >>>>>>>> In general, however, I think it's a very good idea to move the
> FDB away
> >>>>>>>> from the Southbound and make it local to each hypervisor.  That
> reduces
> >>>>>>>> load on the Southbound among other things.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi Dumitru,
> >>>>>>>
> >>>>>>> Thanks for informing about the patches.
> >>>>>>> Yes, local FDB reduces load on southbound.
> >>>>>>>
> >>>>>>>
> >>>>>>>>> include/ovn/actions.h |   7 +++
> >>>>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
> >>>>>>>>> northd/northd.c       |  26 ++++++++++
> >>>>>>>>> ovn-nb.xml            |  14 ++++++
> >>>>>>>>> tests/ovn.at          | 108
> ++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>> utilities/ovn-trace.c |   2 +
> >>>>>>>>> 6 files changed, 251 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> >>>>>>>>> index 49cfe0624..85ac92cd3 100644
> >>>>>>>>> --- a/include/ovn/actions.h
> >>>>>>>>> +++ b/include/ovn/actions.h
> >>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
> >>>>>>>>>  OVNACT(CHK_LB_AFF,        ovnact_result)          \
> >>>>>>>>>  OVNACT(SAMPLE,            ovnact_sample)          \
> >>>>>>>>>  OVNACT(MAC_CACHE_USE,     ovnact_null)            \
> >>>>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
> >>>>>>>>>
> >>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each
> action. */
> >>>>>>>>> enum OVS_PACKED_ENUM ovnact_type {
> >>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
> >>>>>>>>>  uint16_t timeout;
> >>>>>>>>> };
> >>>>>>>>>
> >>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
> >>>>>>>>> +struct ovnact_commit_fdb_local{
> >>>>>>>>> +    struct ovnact ovnact;
> >>>>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> /* Internal use by the helpers below. */
> >>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
> >>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
> >>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c
> >>>>>>>>> index a73fe1a1e..f5aa78db1 100644
> >>>>>>>>> --- a/lib/actions.c
> >>>>>>>>> +++ b/lib/actions.c
> >>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct
> ovnact_null *null OVS_UNUSED, struct ds *s)
> >>>>>>>>>  ds_put_cstr(s, "mac_cache_use;");
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> +static void
> >>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
> >>>>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
> >>>>>>>>> +{
> >>>>>>>>> +    uint16_t timeout = 0;
> >>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
> >>>>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
> >>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
> >>>>>>>>> +        return;
> >>>>>>>>> +    }
> >>>>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
> >>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
> >>>>>>>>> +        return;
> >>>>>>>>> +    }
> >>>>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow
> timeout")) {
> >>>>>>>>> +        return;
> >>>>>>>>> +    }
> >>>>>>>>> +    fdb_local->timeout = timeout;
> >>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local
> *fdb_local,
> >>>>>>>>> +                      struct ds *s)
> >>>>>>>>> +{
> >>>>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);",
> fdb_local->timeout);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local
> *fdb_local OVS_UNUSED)
> >>>>>>>>> +{
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local
> *fdb_local,
> >>>>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
> >>>>>>>>> +{
> >>>>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> >>>>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> >>>>>>>>> +    struct ofpact_learn_spec *ol_spec;
> >>>>>>>>> +    unsigned int imm_bytes;
> >>>>>>>>> +    uint8_t *src_imm;
> >>>>>>>>> +
> >>>>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> >>>>>>>>> +    ol->idle_timeout = fdb_local->timeout;
> >>>>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> >>>>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
> >>>>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
> >>>>>>>>> +    ol->cookie = htonll(cookie);
> >>>>>>>>> +
> >>>>>>>>> +    /* Match on metadata of the packet that created the new
> table. */
> >>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
> >>>>>>>>> +    ol_spec->dst.ofs = 0;
> >>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
> >>>>>>>>> +
> >>>>>>>>> +    /* Match on metadata of the packet. */
> >>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
> >>>>>>>>> +    ol_spec->dst.ofs = 0;
> >>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
> >>>>>>>>> +
> >>>>>>>>> +
> >>>>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
> >>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
> >>>>>>>>> +    ol_spec->dst.ofs = 0;
> >>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
> >>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
> >>>>>>>>> +
> >>>>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
> >>>>>>>>> +}
> >>>>>>>>
> >>>>>>>> A difference from today's SB.FDB centralized approach is that when
> >>>>>>>> ovn-controller restarts these flows will be cleared, I think.
> >>>>>>>>
> >>>>>>>> Are we OK with that?  I think so but if not what are the options
> to
> >>>>>>>> avoid clearing the local fdb cache on restart?
> >>>>>>>>
> >>>>>>>
> >>>>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
> >>>>>>> During this time, packets gets flooded. I need to think about
> possible
> >>>>>>> options if we want to retain FDB table.
> >>>>>>>
> >>>>>>> Can we take this up as an enhancement in a separate patch
> >>>>>>> once we identify a solution for this ?
> >>>>>>>
> >>>>>>
> >>>>>> Sounds good to me.
> >>>>>>
> >>>>>>>> Another difference with today's approach is that this avoids a
> >>>>>>>> controller action, that's great!
> >>>>>>>>
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local
> *fdb_local,
> >>>>>>>>> +                      const struct ovnact_encode_params *ep,
> >>>>>>>>> +                      struct ofpbuf *ofpacts)
> >>>>>>>>> +{
> >>>>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts,
> ep->lflow_uuid.parts[0]);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> static void
> >>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
> >>>>>>>>>                   const struct ovnact_encode_params *ep,
> >>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
> >>>>>>>>>      parse_sample(ctx);
> >>>>>>>>>  } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
> >>>>>>>>>      ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
> >>>>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
> >>>>>>>>> +        parse_commit_fdb_local(ctx,
> ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
> >>>>>>>>>  } else {
> >>>>>>>>>      lexer_syntax_error(ctx->lexer, "expecting action");
> >>>>>>>>>  }
> >>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
> >>>>>>>>> index d1465ddf7..de18694a0 100644
> >>>>>>>>> --- a/northd/northd.c
> >>>>>>>>> +++ b/northd/northd.c
> >>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct
> nbrec_logical_switch_port *nbsp)
> >>>>>>>>>  return smap_get_bool(&nbsp->options, "localnet_learn_fdb",
> false);
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> +static bool
> >>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
> >>>>>>>>> +{
> >>>>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local",
> false);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>
> >>>>>>>> Personally, I'd prefer if we don't add another config knob and we
> just
> >>>>>>>> make this the only way FDB works.  We could also document that
> the FDB
> >>>>>>>> SB table should be deprecated.
> >>>>>>>>
> >>>>>>>
> >>>>>>> If we want to make local FDB as default then, I think we need to
> handle overlay
> >>>>>>> use case as well. Probably, we might have to add a new stage in
> logical switch
> >>>>>>> egress pipeline to learn FDB entries for packets coming over
> tunnel (or)
> >>>>>>> something similar.
> >>>>>>>
> >>>>>>> Can we take this up in a separate patch ?
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> Given that we have a fix already for the original problem you were
> >>>>>> trying to address I would prefer that we avoid adding new config
> knobs
> >>>>>> and handle both the localnet and overlay cases at the same time.
> >>>>>>
> >>>>>
> >>>>> Sure.
> >>>>>
> >>>>>> OTOH, why is there a difference?
> >>>>>>
> >>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that
> triggers the
> >>>>>> learn() action) into MFF_LOG_INPORT:
> >>>>>>
> >>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.
> With
> >>>>>> VXLAN that's not the case but that's already a documented
> limitation,
> >>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
> >>>>>> that need it (like egress ACLs matching against ingress port
> >>>>>> identifiers) are not supported:
> >>>>>>
> >>>>>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e=
> >>>>>>
> >>>>>
> >>>>> In case of overlay, ingress and egress pipelines are processed on
> different nodes
> >>>>> assuming source and dest VMs are on different nodes. I think this
> makes local FDB
> >>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
> >>>>>
> >>>>> Example:
> >>>>> 2 VMs with unknown addr set are connected to same logical switch
> where vm1 is on hv1
> >>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2,
> ingress pipeline happens
> >>>>> on hv1 and it learns vm1-mac and  this packet gets flooded since
> vm2-mac is not yet learned.
> >>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on
> hv2. Now, when vm2 is
> >>>>> responding back, ingress pipeline happens on hv2 and it learns
> vm2-mac and this packet also
> >>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet
> is received.
> >>>>>
> >>>>>
> >>>>
> >>>> Ah, I see your point, you're right.  So it does look like we'd need a
> >>>> dedicated fdb learning stage in the egress pipeline.  But that is
> >>>> probably not that terrible.
> >>>>
>
>
> Hi Dumitru,
>
> I started working on this patch, sorry for the delay.
>
> On further investigation, I realised that we cannot permanently remove FDB
> table from southbound even
> if we add new FDB leaning stage in egress pipeline.
>
> Following is the example:
>
> Two VMs part of different overlay subnets have unknown addr set whereVM1
> is connected to
> subnet-1 (LS1) and is running on hv1  and VM2 isconnected to subnet-2
> (LS2) and is running
> on hv2. Both subnets are connected to same logical router(LR).
>
> When VM1 is sending ICMP packet to VM2, pipeline stages gets processed as
> below.
>   - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1
> MAC, packet gets
>     flooded in LS2.
>   - ICMP Req on HV2: LS2-out.
>   - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in.  HV2 learns VM2
> MAC, packet gets
>     flooded in LS1.
>   - ICMP Resp on HV1: LS1-out.
>   - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never
> learns VM1 MAC.
>
>
> I am thinking either we can make FDB local implementation only applicable
> for VLAN subnets with default disabled
> (or) implement is for both VLAN and overlay subnets with default disabled
> by documenting the limitation and
> explore the possibility of ignoring this setting in northd when multiple
> overlay subnets connected to same LR
> have ports with unknown addr configured.
>
> Please let me know your thoughts.
>
> Thanks,
> Naveen
>
> >>>>>> One more thing we need to take care of in order to be able to make
> local
> >>>>>> FDB the default is "FDB refresh":
> >>>>>>
> >>>>>> 551527a5e68e ("controller: Update FDB timestamp")
> >>>>>>
> >>>>>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e=
> >>>>>>
> >>>>>
> >>>>> Sure.
> >>>>>
> >>>
> >>> Hi Naveen,
> >>>
> >>> Just touching base on this again, I was curious if you're still
> planning
> >>> to work on making the FDB table local; I still think it's beneficial in
> >>> general.
> >>>
> >>> Thanks,
> >>> Dumitru
> >>
> >>
> >> Hi Dumitru,
> >>
> >> Yes, I will work on this. I will try to send out the patch by Jan 1st
> week.
> >>
> >
> > Awesome, thanks Naveen!  No rush though, I just wanted to confirm
> > whether you were interested in pursuing this.
> >
> > Best regards,
> > Dumitru
> >
> >
>
>
Hi Naveen,

I did run into a similar issue with MAC binding, with distributed routers
there is currently no other way to share the MAC binding. That made me
wonder if it's worth pursuing, as the complexity of the code would
significantly increase having half in-memory, half DB solution.
With the aging option the DB size shouldn't be that problematic anymore not
sure if there are any other concerns with the current implementation.

From my point of view this might be a dead end, however others might have
slightly different opinions.

Thanks,
Ales
Naveen Yerramneni March 14, 2024, 2:21 p.m. UTC | #13
> On 08-Mar-2024, at 2:37 PM, Ales Musil <amusil@redhat.com> wrote:
> 
> 
> 
> On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni <naveen.yerramneni@nutanix.com> wrote:
> 
> 
> > On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> > 
> > On 12/18/23 16:17, Naveen Yerramneni wrote:
> >> 
> >> 
> >>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> >>> 
> >>> On 11/30/23 16:32, Dumitru Ceara wrote:
> >>>> On 11/30/23 15:54, Naveen Yerramneni wrote:
> >>>>> 
> >>>>> 
> >>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> >>>>>> 
> >>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> >>>>>>>> 
> >>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
> >>>>>>>>> This functionality can be enabled at the logical switch level:
> >>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this
> >>>>>>>>> functionality, it is disabled by default.
> >>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
> >>>>>>>>> for locally learned fdb flows, default timeout is 300 secs.
> >>>>>>>>> 
> >>>>>>>>> If enabled, below lflow is added for each port that has unknown addr set.
> >>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
> >>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next;
> >>>>>>>>> 
> >>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
> >>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
> >>>>>>>>>       NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
> >>>>>>>>> 
> >>>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports
> >>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
> >>>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN
> >>>>>>>>> deployed on different hypervisors then, MAC addrs of the source and
> >>>>>>>>> destination VMs keeps flapping between VM port and localnet port in 
> >>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
> >>>>>>>>> the chassis and avoids MAC flapping.
> >>>>>>>>> 
> >>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
> >>>>>>>>> ---
> >>>>>>>> 
> >>>>>>>> Hi Naveen,
> >>>>>>>> 
> >>>>>>>> Thanks a lot for the patch!
> >>>>>>>> 
> >>>>>>>> Just a note, we already have a fix for the east-west traffic that causes
> >>>>>>>> FDB flapping when localnet is used:
> >>>>>>>> 
> >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
> >>>>>>>> 
> >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
> >>>>>>>> 
> >>>>>>>> In general, however, I think it's a very good idea to move the FDB away
> >>>>>>>> from the Southbound and make it local to each hypervisor.  That reduces
> >>>>>>>> load on the Southbound among other things.
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> Hi Dumitru,
> >>>>>>> 
> >>>>>>> Thanks for informing about the patches.
> >>>>>>> Yes, local FDB reduces load on southbound.
> >>>>>>> 
> >>>>>>> 
> >>>>>>>>> include/ovn/actions.h |   7 +++
> >>>>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
> >>>>>>>>> northd/northd.c       |  26 ++++++++++
> >>>>>>>>> ovn-nb.xml            |  14 ++++++
> >>>>>>>>> tests/ovn.at [ovn.at]          | 108 ++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>> utilities/ovn-trace.c |   2 +
> >>>>>>>>> 6 files changed, 251 insertions(+)
> >>>>>>>>> 
> >>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> >>>>>>>>> index 49cfe0624..85ac92cd3 100644
> >>>>>>>>> --- a/include/ovn/actions.h
> >>>>>>>>> +++ b/include/ovn/actions.h
> >>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
> >>>>>>>>>  OVNACT(CHK_LB_AFF,        ovnact_result)          \
> >>>>>>>>>  OVNACT(SAMPLE,            ovnact_sample)          \
> >>>>>>>>>  OVNACT(MAC_CACHE_USE,     ovnact_null)            \
> >>>>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
> >>>>>>>>> 
> >>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
> >>>>>>>>> enum OVS_PACKED_ENUM ovnact_type {
> >>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
> >>>>>>>>>  uint16_t timeout;
> >>>>>>>>> };
> >>>>>>>>> 
> >>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
> >>>>>>>>> +struct ovnact_commit_fdb_local{
> >>>>>>>>> +    struct ovnact ovnact;
> >>>>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> /* Internal use by the helpers below. */
> >>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
> >>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
> >>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c
> >>>>>>>>> index a73fe1a1e..f5aa78db1 100644
> >>>>>>>>> --- a/lib/actions.c
> >>>>>>>>> +++ b/lib/actions.c
> >>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
> >>>>>>>>>  ds_put_cstr(s, "mac_cache_use;");
> >>>>>>>>> }
> >>>>>>>>> 
> >>>>>>>>> +static void
> >>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
> >>>>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
> >>>>>>>>> +{
> >>>>>>>>> +    uint16_t timeout = 0;
> >>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
> >>>>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
> >>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
> >>>>>>>>> +        return;
> >>>>>>>>> +    }
> >>>>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
> >>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
> >>>>>>>>> +        return;
> >>>>>>>>> +    }
> >>>>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
> >>>>>>>>> +        return;
> >>>>>>>>> +    }
> >>>>>>>>> +    fdb_local->timeout = timeout;
> >>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
> >>>>>>>>> +                      struct ds *s)
> >>>>>>>>> +{
> >>>>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
> >>>>>>>>> +{
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
> >>>>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
> >>>>>>>>> +{
> >>>>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> >>>>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> >>>>>>>>> +    struct ofpact_learn_spec *ol_spec;
> >>>>>>>>> +    unsigned int imm_bytes;
> >>>>>>>>> +    uint8_t *src_imm;
> >>>>>>>>> +
> >>>>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> >>>>>>>>> +    ol->idle_timeout = fdb_local->timeout;
> >>>>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> >>>>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
> >>>>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
> >>>>>>>>> +    ol->cookie = htonll(cookie);
> >>>>>>>>> +
> >>>>>>>>> +    /* Match on metadata of the packet that created the new table. */
> >>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
> >>>>>>>>> +    ol_spec->dst.ofs = 0;
> >>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
> >>>>>>>>> +
> >>>>>>>>> +    /* Match on metadata of the packet. */
> >>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
> >>>>>>>>> +    ol_spec->dst.ofs = 0;
> >>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
> >>>>>>>>> +
> >>>>>>>>> +
> >>>>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
> >>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
> >>>>>>>>> +    ol_spec->dst.ofs = 0;
> >>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
> >>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
> >>>>>>>>> +
> >>>>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
> >>>>>>>>> +}
> >>>>>>>> 
> >>>>>>>> A difference from today's SB.FDB centralized approach is that when
> >>>>>>>> ovn-controller restarts these flows will be cleared, I think.
> >>>>>>>> 
> >>>>>>>> Are we OK with that?  I think so but if not what are the options to
> >>>>>>>> avoid clearing the local fdb cache on restart?
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
> >>>>>>> During this time, packets gets flooded. I need to think about possible 
> >>>>>>> options if we want to retain FDB table.
> >>>>>>> 
> >>>>>>> Can we take this up as an enhancement in a separate patch
> >>>>>>> once we identify a solution for this ?
> >>>>>>> 
> >>>>>> 
> >>>>>> Sounds good to me.
> >>>>>> 
> >>>>>>>> Another difference with today's approach is that this avoids a
> >>>>>>>> controller action, that's great!
> >>>>>>>> 
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
> >>>>>>>>> +                      const struct ovnact_encode_params *ep,
> >>>>>>>>> +                      struct ofpbuf *ofpacts)
> >>>>>>>>> +{
> >>>>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> static void
> >>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
> >>>>>>>>>                   const struct ovnact_encode_params *ep,
> >>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
> >>>>>>>>>      parse_sample(ctx);
> >>>>>>>>>  } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
> >>>>>>>>>      ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
> >>>>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
> >>>>>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
> >>>>>>>>>  } else {
> >>>>>>>>>      lexer_syntax_error(ctx->lexer, "expecting action");
> >>>>>>>>>  }
> >>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
> >>>>>>>>> index d1465ddf7..de18694a0 100644
> >>>>>>>>> --- a/northd/northd.c
> >>>>>>>>> +++ b/northd/northd.c
> >>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
> >>>>>>>>>  return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
> >>>>>>>>> }
> >>>>>>>>> 
> >>>>>>>>> +static bool
> >>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
> >>>>>>>>> +{
> >>>>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>> 
> >>>>>>>> Personally, I'd prefer if we don't add another config knob and we just
> >>>>>>>> make this the only way FDB works.  We could also document that the FDB
> >>>>>>>> SB table should be deprecated.
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> If we want to make local FDB as default then, I think we need to handle overlay 
> >>>>>>> use case as well. Probably, we might have to add a new stage in logical switch
> >>>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
> >>>>>>> something similar.
> >>>>>>> 
> >>>>>>> Can we take this up in a separate patch ?
> >>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> Given that we have a fix already for the original problem you were
> >>>>>> trying to address I would prefer that we avoid adding new config knobs
> >>>>>> and handle both the localnet and overlay cases at the same time.
> >>>>>> 
> >>>>> 
> >>>>> Sure.
> >>>>> 
> >>>>>> OTOH, why is there a difference?
> >>>>>> 
> >>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
> >>>>>> learn() action) into MFF_LOG_INPORT:
> >>>>>> 
> >>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
> >>>>>> VXLAN that's not the case but that's already a documented limitation,
> >>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
> >>>>>> that need it (like egress ACLs matching against ingress port
> >>>>>> identifiers) are not supported:
> >>>>>> 
> >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
> >>>>>> 
> >>>>> 
> >>>>> In case of overlay, ingress and egress pipelines are processed on different nodes
> >>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
> >>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
> >>>>> 
> >>>>> Example: 
> >>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
> >>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
> >>>>> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
> >>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
> >>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
> >>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
> >>>>> 
> >>>>> 
> >>>> 
> >>>> Ah, I see your point, you're right.  So it does look like we'd need a
> >>>> dedicated fdb learning stage in the egress pipeline.  But that is
> >>>> probably not that terrible.
> >>>> 
> 
> 
> Hi Dumitru,
> 
> I started working on this patch, sorry for the delay.
> 
> On further investigation, I realised that we cannot permanently remove FDB table from southbound even
> if we add new FDB leaning stage in egress pipeline.
> 
> Following is the example:
> 
> Two VMs part of different overlay subnets have unknown addr set whereVM1 is connected to 
> subnet-1 (LS1) and is running on hv1  and VM2 isconnected to subnet-2 (LS2) and is running
> on hv2. Both subnets are connected to same logical router(LR).
> 
> When VM1 is sending ICMP packet to VM2, pipeline stages gets processed as below.
>   - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets
>     flooded in LS2.
>   - ICMP Req on HV2: LS2-out.
>   - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in.  HV2 learns VM2 MAC, packet gets
>     flooded in LS1.
>   - ICMP Resp on HV1: LS1-out.
>   - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC.
> 
> 
> I am thinking either we can make FDB local implementation only applicable for VLAN subnets with default disabled
> (or) implement is for both VLAN and overlay subnets with default disabled by documenting the limitation and
> explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to same LR
> have ports with unknown addr configured.
> 
> Please let me know your thoughts.
> 
> Thanks,
> Naveen
> 
> >>>>>> One more thing we need to take care of in order to be able to make local
> >>>>>> FDB the default is "FDB refresh":
> >>>>>> 
> >>>>>> 551527a5e68e ("controller: Update FDB timestamp")
> >>>>>> 
> >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
> >>>>>> 
> >>>>> 
> >>>>> Sure.
> >>>>> 
> >>> 
> >>> Hi Naveen,
> >>> 
> >>> Just touching base on this again, I was curious if you're still planning
> >>> to work on making the FDB table local; I still think it's beneficial in
> >>> general.
> >>> 
> >>> Thanks,
> >>> Dumitru
> >> 
> >> 
> >> Hi Dumitru,
> >> 
> >> Yes, I will work on this. I will try to send out the patch by Jan 1st week.
> >> 
> > 
> > Awesome, thanks Naveen!  No rush though, I just wanted to confirm
> > whether you were interested in pursuing this.
> > 
> > Best regards,
> > Dumitru
> > 
> > 
> 
> 
> Hi Naveen,
> 
> I did run into a similar issue with MAC binding, with distributed routers there is currently no other way to share the MAC binding. That made me wonder if it's worth pursuing, as the complexity of the code would significantly increase having half in-memory, half DB solution.
> With the aging option the DB size shouldn't be that problematic anymore not sure if there are any other concerns with the current implementation. 
> 
> From my point of view this might be a dead end, however others might have slightly different opinions.
> 
> Thanks,
> Ales
> 
> -- 
> Ales Musil
> Senior Software Engineer - OVN Core
> Red Hat EMEA [redhat.com]
> amusil@redhat.com 
>   [red.ht]


Hi Ales,

Thanks for the reply.

I definitely agree with you on the additional code complexity that we have to deal with when we have both implementations.
I think this FDB local helps to reduce the overhead on SB in case of VLAN networks depending on the size of the network. 

Also, one more issue we have noticed with FDB  table when fdb learning is enabled on localnet port:
   -  If underlay TOR switch floods some packets then, DB conflicts (OVSDB reports constraint violation errors) are occuring
      when controllers are trying to insert entries to FDB table. This is happening because when underlay TOR switch floods
      a packet then, all controllers receives packets at the same time and try to install FDB entry at the same time. Only first one
      succeeds and others report error. This cause controllers to recompute all the flows from SB. On scale networks, this is making
      OVN controllers busy.

If we are ok to support FDB local implementation only for VLAN subnets then, I can look at possible options to reduce the code complexity.
Any other suggestions are welcome.


Thanks,
Naveen
Dumitru Ceara March 14, 2024, 3:37 p.m. UTC | #14
On 3/14/24 15:21, Naveen Yerramneni wrote:
> 
> 
>> On 08-Mar-2024, at 2:37 PM, Ales Musil <amusil@redhat.com> wrote:
>>
>>
>>
>> On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni <naveen.yerramneni@nutanix.com> wrote:
>>
>>
>>> On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>
>>> On 12/18/23 16:17, Naveen Yerramneni wrote:
>>>>
>>>>
>>>>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>
>>>>> On 11/30/23 16:32, Dumitru Ceara wrote:
>>>>>> On 11/30/23 15:54, Naveen Yerramneni wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>>>>>>>>> This functionality can be enabled at the logical switch level:
>>>>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>>>>>>>> functionality, it is disabled by default.
>>>>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>>>>>>>> for locally learned fdb flows, default timeout is 300 secs.
>>>>>>>>>>>
>>>>>>>>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next;
>>>>>>>>>>>
>>>>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>>>>>>>>       NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>>>>>>>>>
>>>>>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>>>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>>>>>>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>>>>>>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>>>>>>>>> the chassis and avoids MAC flapping.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> Hi Naveen,
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for the patch!
>>>>>>>>>>
>>>>>>>>>> Just a note, we already have a fix for the east-west traffic that causes
>>>>>>>>>> FDB flapping when localnet is used:
>>>>>>>>>>
>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>>>>>>>>>
>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>>>>>>>>>
>>>>>>>>>> In general, however, I think it's a very good idea to move the FDB away
>>>>>>>>>> from the Southbound and make it local to each hypervisor.  That reduces
>>>>>>>>>> load on the Southbound among other things.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Dumitru,
>>>>>>>>>
>>>>>>>>> Thanks for informing about the patches.
>>>>>>>>> Yes, local FDB reduces load on southbound.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> include/ovn/actions.h |   7 +++
>>>>>>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>> northd/northd.c       |  26 ++++++++++
>>>>>>>>>>> ovn-nb.xml            |  14 ++++++
>>>>>>>>>>> tests/ovn.at [ovn.at]          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>> utilities/ovn-trace.c |   2 +
>>>>>>>>>>> 6 files changed, 251 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>>>>>>>> index 49cfe0624..85ac92cd3 100644
>>>>>>>>>>> --- a/include/ovn/actions.h
>>>>>>>>>>> +++ b/include/ovn/actions.h
>>>>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>>>>>>>>  OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>>>>>>>>  OVNACT(SAMPLE,            ovnact_sample)          \
>>>>>>>>>>>  OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>>>>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>>>>>>>>>
>>>>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>>>>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>>>>>>>>  uint16_t timeout;
>>>>>>>>>>> };
>>>>>>>>>>>
>>>>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>>>>>>>>> +struct ovnact_commit_fdb_local{
>>>>>>>>>>> +    struct ovnact ovnact;
>>>>>>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>>>>>>>>> +};
>>>>>>>>>>> +
>>>>>>>>>>> /* Internal use by the helpers below. */
>>>>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>>>>>>>> index a73fe1a1e..f5aa78db1 100644
>>>>>>>>>>> --- a/lib/actions.c
>>>>>>>>>>> +++ b/lib/actions.c
>>>>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>>>>>>>>  ds_put_cstr(s, "mac_cache_use;");
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> +static void
>>>>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>>>>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>>>>>>>>> +{
>>>>>>>>>>> +    uint16_t timeout = 0;
>>>>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>>>>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>>>>> +        return;
>>>>>>>>>>> +    }
>>>>>>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>>>>> +        return;
>>>>>>>>>>> +    }
>>>>>>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>>>>>>>>> +        return;
>>>>>>>>>>> +    }
>>>>>>>>>>> +    fdb_local->timeout = timeout;
>>>>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static void
>>>>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>>>> +                      struct ds *s)
>>>>>>>>>>> +{
>>>>>>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static void
>>>>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>>>>>>>>> +{
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static void
>>>>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>>>>>>>>> +{
>>>>>>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>>>>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>>>>>>>>> +    struct ofpact_learn_spec *ol_spec;
>>>>>>>>>>> +    unsigned int imm_bytes;
>>>>>>>>>>> +    uint8_t *src_imm;
>>>>>>>>>>> +
>>>>>>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>>>>>>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>>>>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>>>>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>>>>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>>>>>>>>> +    ol->cookie = htonll(cookie);
>>>>>>>>>>> +
>>>>>>>>>>> +    /* Match on metadata of the packet that created the new table. */
>>>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>>>>>>>>> +
>>>>>>>>>>> +    /* Match on metadata of the packet. */
>>>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>>>>>>>>> +
>>>>>>>>>>> +
>>>>>>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>>>>>>>>> +
>>>>>>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>>>>>>>>> +}
>>>>>>>>>>
>>>>>>>>>> A difference from today's SB.FDB centralized approach is that when
>>>>>>>>>> ovn-controller restarts these flows will be cleared, I think.
>>>>>>>>>>
>>>>>>>>>> Are we OK with that?  I think so but if not what are the options to
>>>>>>>>>> avoid clearing the local fdb cache on restart?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
>>>>>>>>> During this time, packets gets flooded. I need to think about possible 
>>>>>>>>> options if we want to retain FDB table.
>>>>>>>>>
>>>>>>>>> Can we take this up as an enhancement in a separate patch
>>>>>>>>> once we identify a solution for this ?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sounds good to me.
>>>>>>>>
>>>>>>>>>> Another difference with today's approach is that this avoids a
>>>>>>>>>> controller action, that's great!
>>>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +static void
>>>>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>>>> +                      const struct ovnact_encode_params *ep,
>>>>>>>>>>> +                      struct ofpbuf *ofpacts)
>>>>>>>>>>> +{
>>>>>>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> static void
>>>>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>>>>>>>>                   const struct ovnact_encode_params *ep,
>>>>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>>>>>>>>      parse_sample(ctx);
>>>>>>>>>>>  } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>>>>>>>>      ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>>>>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>>>>>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>>>>>>>>  } else {
>>>>>>>>>>>      lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>>>>>>>  }
>>>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>>>>>> index d1465ddf7..de18694a0 100644
>>>>>>>>>>> --- a/northd/northd.c
>>>>>>>>>>> +++ b/northd/northd.c
>>>>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>>>>>>>>  return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> +static bool
>>>>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>>>>>>>>> +{
>>>>>>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>
>>>>>>>>>> Personally, I'd prefer if we don't add another config knob and we just
>>>>>>>>>> make this the only way FDB works.  We could also document that the FDB
>>>>>>>>>> SB table should be deprecated.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If we want to make local FDB as default then, I think we need to handle overlay 
>>>>>>>>> use case as well. Probably, we might have to add a new stage in logical switch
>>>>>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>>>>>>>> something similar.
>>>>>>>>>
>>>>>>>>> Can we take this up in a separate patch ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Given that we have a fix already for the original problem you were
>>>>>>>> trying to address I would prefer that we avoid adding new config knobs
>>>>>>>> and handle both the localnet and overlay cases at the same time.
>>>>>>>>
>>>>>>>
>>>>>>> Sure.
>>>>>>>
>>>>>>>> OTOH, why is there a difference?
>>>>>>>>
>>>>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>>>>>>>> learn() action) into MFF_LOG_INPORT:
>>>>>>>>
>>>>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>>>>>>>> VXLAN that's not the case but that's already a documented limitation,
>>>>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
>>>>>>>> that need it (like egress ACLs matching against ingress port
>>>>>>>> identifiers) are not supported:
>>>>>>>>
>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
>>>>>>>>
>>>>>>>
>>>>>>> In case of overlay, ingress and egress pipelines are processed on different nodes
>>>>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>>>>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>>>>>>
>>>>>>> Example: 
>>>>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>>>>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>>>>>>> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
>>>>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
>>>>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>>>>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Ah, I see your point, you're right.  So it does look like we'd need a
>>>>>> dedicated fdb learning stage in the egress pipeline.  But that is
>>>>>> probably not that terrible.
>>>>>>
>>
>>
>> Hi Dumitru,
>>

Hi Naveen,

>> I started working on this patch, sorry for the delay.
>>

Thanks for following up!

>> On further investigation, I realised that we cannot permanently remove FDB table from southbound even
>> if we add new FDB leaning stage in egress pipeline.
>>
>> Following is the example:
>>
>> Two VMs part of different overlay subnets have unknown addr set whereVM1 is connected to 
>> subnet-1 (LS1) and is running on hv1  and VM2 isconnected to subnet-2 (LS2) and is running
>> on hv2. Both subnets are connected to same logical router(LR).
>>
>> When VM1 is sending ICMP packet to VM2, pipeline stages gets processed as below.
>>   - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets
>>     flooded in LS2.
>>   - ICMP Req on HV2: LS2-out.
>>   - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in.  HV2 learns VM2 MAC, packet gets
>>     flooded in LS1.
>>   - ICMP Resp on HV1: LS1-out.
>>   - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC.
>>
>>
>> I am thinking either we can make FDB local implementation only applicable for VLAN subnets with default disabled
>> (or) implement is for both VLAN and overlay subnets with default disabled by documenting the limitation and
>> explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to same LR
>> have ports with unknown addr configured.
>>
>> Please let me know your thoughts.

My first impression is that it might become confusing to users if the
FDB local implementation is only applicablel for VLAN subnets.  I'd
prefer to avoid having two different implementations (centralized - in
SB - vs decentralized - local).

>>
>> Thanks,
>> Naveen
>>
>>>>>>>> One more thing we need to take care of in order to be able to make local
>>>>>>>> FDB the default is "FDB refresh":
>>>>>>>>
>>>>>>>> 551527a5e68e ("controller: Update FDB timestamp")
>>>>>>>>
>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
>>>>>>>>
>>>>>>>
>>>>>>> Sure.
>>>>>>>
>>>>>
>>>>> Hi Naveen,
>>>>>
>>>>> Just touching base on this again, I was curious if you're still planning
>>>>> to work on making the FDB table local; I still think it's beneficial in
>>>>> general.
>>>>>
>>>>> Thanks,
>>>>> Dumitru
>>>>
>>>>
>>>> Hi Dumitru,
>>>>
>>>> Yes, I will work on this. I will try to send out the patch by Jan 1st week.
>>>>
>>>
>>> Awesome, thanks Naveen!  No rush though, I just wanted to confirm
>>> whether you were interested in pursuing this.
>>>
>>> Best regards,
>>> Dumitru
>>>
>>>
>>
>>
>> Hi Naveen,
>>
>> I did run into a similar issue with MAC binding, with distributed routers there is currently no other way to share the MAC binding. That made me wonder if it's worth pursuing, as the complexity of the code would significantly increase having half in-memory, half DB solution.
>> With the aging option the DB size shouldn't be that problematic anymore not sure if there are any other concerns with the current implementation. 
>>
>> From my point of view this might be a dead end, however others might have slightly different opinions.
>>
>> Thanks,
>> Ales
>>
>> -- 
>> Ales Musil
>> Senior Software Engineer - OVN Core
>> Red Hat EMEA [redhat.com]
>> amusil@redhat.com 
>>   [red.ht]
> 
> 
> Hi Ales,
> 
> Thanks for the reply.
> 
> I definitely agree with you on the additional code complexity that we have to deal with when we have both implementations.
> I think this FDB local helps to reduce the overhead on SB in case of VLAN networks depending on the size of the network. 
> 
> Also, one more issue we have noticed with FDB  table when fdb learning is enabled on localnet port:
>    -  If underlay TOR switch floods some packets then, DB conflicts (OVSDB reports constraint violation errors) are occuring
>       when controllers are trying to insert entries to FDB table. This is happening because when underlay TOR switch floods
>       a packet then, all controllers receives packets at the same time and try to install FDB entry at the same time. Only first one
>       succeeds and others report error. This cause controllers to recompute all the flows from SB. On scale networks, this is making
>       OVN controllers busy.
> 

Can't we address this in a similar way to what we did for MAC_Bindings
and add some sort of randomness to when the SB write happens, e.g.:

b416f6f65d71 ("controller: Add delay after multicast ARP packet")
9411dd300d5a ("mac-learn: Make the mac_binding struct more flexible")

https://github.com/ovn-org/ovn/commit/b416f6f65d71
https://github.com/ovn-org/ovn/commit/9411dd300d5a

> If we are ok to support FDB local implementation only for VLAN subnets then, I can look at possible options to reduce the code complexity.

Like I mentioned above, if we can address existing issues in the
centralized (in SB) implementation, I'd prefer avoiding having two
different implementations.

> Any other suggestions are welcome.
> 
> 
> Thanks,
> Naveen

Regards,
Dumitru
Naveen Yerramneni March 14, 2024, 3:42 p.m. UTC | #15
> On 14-Mar-2024, at 9:07 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 3/14/24 15:21, Naveen Yerramneni wrote:
>> 
>> 
>>> On 08-Mar-2024, at 2:37 PM, Ales Musil <amusil@redhat.com> wrote:
>>> 
>>> 
>>> 
>>> On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni <naveen.yerramneni@nutanix.com> wrote:
>>> 
>>> 
>>>> On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>> 
>>>> On 12/18/23 16:17, Naveen Yerramneni wrote:
>>>>> 
>>>>> 
>>>>>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>> 
>>>>>> On 11/30/23 16:32, Dumitru Ceara wrote:
>>>>>>> On 11/30/23 15:54, Naveen Yerramneni wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>>> 
>>>>>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote:
>>>>>>>>>>>> This functionality can be enabled at the logical switch level:
>>>>>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this
>>>>>>>>>>>> functionality, it is disabled by default.
>>>>>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout
>>>>>>>>>>>> for locally learned fdb flows, default timeout is 300 secs.
>>>>>>>>>>>> 
>>>>>>>>>>>> If enabled, below lflow is added for each port that has unknown addr set.
>>>>>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
>>>>>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next;
>>>>>>>>>>>> 
>>>>>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action.
>>>>>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
>>>>>>>>>>>>      NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>>>>>>>>>>>> 
>>>>>>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports
>>>>>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config,
>>>>>>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN
>>>>>>>>>>>> deployed on different hypervisors then, MAC addrs of the source and
>>>>>>>>>>>> destination VMs keeps flapping between VM port and localnet port in 
>>>>>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to
>>>>>>>>>>>> the chassis and avoids MAC flapping.
>>>>>>>>>>>> 
>>>>>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
>>>>>>>>>>>> ---
>>>>>>>>>>> 
>>>>>>>>>>> Hi Naveen,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks a lot for the patch!
>>>>>>>>>>> 
>>>>>>>>>>> Just a note, we already have a fix for the east-west traffic that causes
>>>>>>>>>>> FDB flapping when localnet is used:
>>>>>>>>>>> 
>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= 
>>>>>>>>>>> 
>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= 
>>>>>>>>>>> 
>>>>>>>>>>> In general, however, I think it's a very good idea to move the FDB away
>>>>>>>>>>> from the Southbound and make it local to each hypervisor.  That reduces
>>>>>>>>>>> load on the Southbound among other things.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi Dumitru,
>>>>>>>>>> 
>>>>>>>>>> Thanks for informing about the patches.
>>>>>>>>>> Yes, local FDB reduces load on southbound.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> include/ovn/actions.h |   7 +++
>>>>>>>>>>>> lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>> northd/northd.c       |  26 ++++++++++
>>>>>>>>>>>> ovn-nb.xml            |  14 ++++++
>>>>>>>>>>>> tests/ovn.at [ovn.at]          | 108 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>> utilities/ovn-trace.c |   2 +
>>>>>>>>>>>> 6 files changed, 251 insertions(+)
>>>>>>>>>>>> 
>>>>>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>>>>>>>>> index 49cfe0624..85ac92cd3 100644
>>>>>>>>>>>> --- a/include/ovn/actions.h
>>>>>>>>>>>> +++ b/include/ovn/actions.h
>>>>>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids;
>>>>>>>>>>>> OVNACT(CHK_LB_AFF,        ovnact_result)          \
>>>>>>>>>>>> OVNACT(SAMPLE,            ovnact_sample)          \
>>>>>>>>>>>> OVNACT(MAC_CACHE_USE,     ovnact_null)            \
>>>>>>>>>>>> +    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
>>>>>>>>>>>> 
>>>>>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>>>>>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
>>>>>>>>>>>> uint16_t timeout;
>>>>>>>>>>>> };
>>>>>>>>>>>> 
>>>>>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */
>>>>>>>>>>>> +struct ovnact_commit_fdb_local{
>>>>>>>>>>>> +    struct ovnact ovnact;
>>>>>>>>>>>> +    uint16_t timeout;  /* fdb_local flow timeout */
>>>>>>>>>>>> +};
>>>>>>>>>>>> +
>>>>>>>>>>>> /* Internal use by the helpers below. */
>>>>>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>>>>>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>>>>>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>>>>>>>>> index a73fe1a1e..f5aa78db1 100644
>>>>>>>>>>>> --- a/lib/actions.c
>>>>>>>>>>>> +++ b/lib/actions.c
>>>>>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
>>>>>>>>>>>> ds_put_cstr(s, "mac_cache_use;");
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> +static void
>>>>>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx,
>>>>>>>>>>>> +                     struct ovnact_commit_fdb_local *fdb_local)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    uint16_t timeout = 0;
>>>>>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
>>>>>>>>>>>> +    if (!lexer_match_id(ctx->lexer, "timeout")) {
>>>>>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>>>>>> +        return;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
>>>>>>>>>>>> +        lexer_syntax_error(ctx->lexer, "invalid parameter");
>>>>>>>>>>>> +        return;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
>>>>>>>>>>>> +        return;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    fdb_local->timeout = timeout;
>>>>>>>>>>>> +    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void
>>>>>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>>>>> +                      struct ds *s)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void
>>>>>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void
>>>>>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>>>>> +                        struct ofpbuf *ofpacts, uint32_t cookie)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
>>>>>>>>>>>> +    struct match match = MATCH_CATCHALL_INITIALIZER;
>>>>>>>>>>>> +    struct ofpact_learn_spec *ol_spec;
>>>>>>>>>>>> +    unsigned int imm_bytes;
>>>>>>>>>>>> +    uint8_t *src_imm;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
>>>>>>>>>>>> +    ol->idle_timeout = fdb_local->timeout;
>>>>>>>>>>>> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
>>>>>>>>>>>> +    ol->priority = OFP_DEFAULT_PRIORITY;
>>>>>>>>>>>> +    ol->table_id = OFTABLE_GET_FDB;
>>>>>>>>>>>> +    ol->cookie = htonll(cookie);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    /* Match on metadata of the packet that created the new table. */
>>>>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
>>>>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    /* Match on metadata of the packet. */
>>>>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
>>>>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
>>>>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
>>>>>>>>>>>> +
>>>>>>>>>>>> +
>>>>>>>>>>>> +    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
>>>>>>>>>>>> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
>>>>>>>>>>>> +    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
>>>>>>>>>>>> +    ol_spec->dst.ofs = 0;
>>>>>>>>>>>> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
>>>>>>>>>>>> +    ol_spec->n_bits = ol_spec->dst.n_bits;
>>>>>>>>>>>> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
>>>>>>>>>>>> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
>>>>>>>>>>>> +    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    ofpact_finish_LEARN(ofpacts, &ol);
>>>>>>>>>>>> +}
>>>>>>>>>>> 
>>>>>>>>>>> A difference from today's SB.FDB centralized approach is that when
>>>>>>>>>>> ovn-controller restarts these flows will be cleared, I think.
>>>>>>>>>>> 
>>>>>>>>>>> Are we OK with that?  I think so but if not what are the options to
>>>>>>>>>>> avoid clearing the local fdb cache on restart?
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> OVS has to relearn the FDB  flows whenever tables are cleared.
>>>>>>>>>> During this time, packets gets flooded. I need to think about possible 
>>>>>>>>>> options if we want to retain FDB table.
>>>>>>>>>> 
>>>>>>>>>> Can we take this up as an enhancement in a separate patch
>>>>>>>>>> once we identify a solution for this ?
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Sounds good to me.
>>>>>>>>> 
>>>>>>>>>>> Another difference with today's approach is that this avoids a
>>>>>>>>>>> controller action, that's great!
>>>>>>>>>>> 
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void
>>>>>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
>>>>>>>>>>>> +                      const struct ovnact_encode_params *ep,
>>>>>>>>>>>> +                      struct ofpbuf *ofpacts)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> static void
>>>>>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
>>>>>>>>>>>>                  const struct ovnact_encode_params *ep,
>>>>>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
>>>>>>>>>>>>     parse_sample(ctx);
>>>>>>>>>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
>>>>>>>>>>>>     ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
>>>>>>>>>>>> +    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
>>>>>>>>>>>> +        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
>>>>>>>>>>>> } else {
>>>>>>>>>>>>     lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>>>>>>>> }
>>>>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>>>>>>> index d1465ddf7..de18694a0 100644
>>>>>>>>>>>> --- a/northd/northd.c
>>>>>>>>>>>> +++ b/northd/northd.c
>>>>>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
>>>>>>>>>>>> return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> +static bool
>>>>>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return smap_get_bool(&nbs->other_config, "fdb_local", false);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>> 
>>>>>>>>>>> Personally, I'd prefer if we don't add another config knob and we just
>>>>>>>>>>> make this the only way FDB works.  We could also document that the FDB
>>>>>>>>>>> SB table should be deprecated.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If we want to make local FDB as default then, I think we need to handle overlay 
>>>>>>>>>> use case as well. Probably, we might have to add a new stage in logical switch
>>>>>>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>>>>>>>>> something similar.
>>>>>>>>>> 
>>>>>>>>>> Can we take this up in a separate patch ?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Given that we have a fix already for the original problem you were
>>>>>>>>> trying to address I would prefer that we avoid adding new config knobs
>>>>>>>>> and handle both the localnet and overlay cases at the same time.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Sure.
>>>>>>>> 
>>>>>>>>> OTOH, why is there a difference?
>>>>>>>>> 
>>>>>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>>>>>>>>> learn() action) into MFF_LOG_INPORT:
>>>>>>>>> 
>>>>>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>>>>>>>>> VXLAN that's not the case but that's already a documented limitation,
>>>>>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
>>>>>>>>> that need it (like egress ACLs matching against ingress port
>>>>>>>>> identifiers) are not supported:
>>>>>>>>> 
>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> In case of overlay, ingress and egress pipelines are processed on different nodes
>>>>>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>>>>>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>>>>>>> 
>>>>>>>> Example: 
>>>>>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>>>>>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>>>>>>>> on hv1 and it learns vm1-mac and  this packet gets flooded since vm2-mac is not yet learned.
>>>>>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
>>>>>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>>>>>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> Ah, I see your point, you're right.  So it does look like we'd need a
>>>>>>> dedicated fdb learning stage in the egress pipeline.  But that is
>>>>>>> probably not that terrible.
>>>>>>> 
>>> 
>>> 
>>> Hi Dumitru,
>>> 
> 
> Hi Naveen,
> 
>>> I started working on this patch, sorry for the delay.
>>> 
> 
> Thanks for following up!
> 
>>> On further investigation, I realised that we cannot permanently remove FDB table from southbound even
>>> if we add new FDB leaning stage in egress pipeline.
>>> 
>>> Following is the example:
>>> 
>>> Two VMs part of different overlay subnets have unknown addr set whereVM1 is connected to 
>>> subnet-1 (LS1) and is running on hv1  and VM2 isconnected to subnet-2 (LS2) and is running
>>> on hv2. Both subnets are connected to same logical router(LR).
>>> 
>>> When VM1 is sending ICMP packet to VM2, pipeline stages gets processed as below.
>>>  - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets
>>>    flooded in LS2.
>>>  - ICMP Req on HV2: LS2-out.
>>>  - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in.  HV2 learns VM2 MAC, packet gets
>>>    flooded in LS1.
>>>  - ICMP Resp on HV1: LS1-out.
>>>  - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC.
>>> 
>>> 
>>> I am thinking either we can make FDB local implementation only applicable for VLAN subnets with default disabled
>>> (or) implement is for both VLAN and overlay subnets with default disabled by documenting the limitation and
>>> explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to same LR
>>> have ports with unknown addr configured.
>>> 
>>> Please let me know your thoughts.
> 
> My first impression is that it might become confusing to users if the
> FDB local implementation is only applicablel for VLAN subnets.  I'd
> prefer to avoid having two different implementations (centralized - in
> SB - vs decentralized - local).
> 
>>> 
>>> Thanks,
>>> Naveen
>>> 
>>>>>>>>> One more thing we need to take care of in order to be able to make local
>>>>>>>>> FDB the default is "FDB refresh":
>>>>>>>>> 
>>>>>>>>> 551527a5e68e ("controller: Update FDB timestamp")
>>>>>>>>> 
>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e= 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Sure.
>>>>>>>> 
>>>>>> 
>>>>>> Hi Naveen,
>>>>>> 
>>>>>> Just touching base on this again, I was curious if you're still planning
>>>>>> to work on making the FDB table local; I still think it's beneficial in
>>>>>> general.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dumitru
>>>>> 
>>>>> 
>>>>> Hi Dumitru,
>>>>> 
>>>>> Yes, I will work on this. I will try to send out the patch by Jan 1st week.
>>>>> 
>>>> 
>>>> Awesome, thanks Naveen!  No rush though, I just wanted to confirm
>>>> whether you were interested in pursuing this.
>>>> 
>>>> Best regards,
>>>> Dumitru
>>>> 
>>>> 
>>> 
>>> 
>>> Hi Naveen,
>>> 
>>> I did run into a similar issue with MAC binding, with distributed routers there is currently no other way to share the MAC binding. That made me wonder if it's worth pursuing, as the complexity of the code would significantly increase having half in-memory, half DB solution.
>>> With the aging option the DB size shouldn't be that problematic anymore not sure if there are any other concerns with the current implementation. 
>>> 
>>> From my point of view this might be a dead end, however others might have slightly different opinions.
>>> 
>>> Thanks,
>>> Ales
>>> 
>>> -- 
>>> Ales Musil
>>> Senior Software Engineer - OVN Core
>>> Red Hat EMEA [redhat.com]
>>> amusil@redhat.com 
>>>  [red.ht]
>> 
>> 
>> Hi Ales,
>> 
>> Thanks for the reply.
>> 
>> I definitely agree with you on the additional code complexity that we have to deal with when we have both implementations.
>> I think this FDB local helps to reduce the overhead on SB in case of VLAN networks depending on the size of the network. 
>> 
>> Also, one more issue we have noticed with FDB  table when fdb learning is enabled on localnet port:
>>   -  If underlay TOR switch floods some packets then, DB conflicts (OVSDB reports constraint violation errors) are occuring
>>      when controllers are trying to insert entries to FDB table. This is happening because when underlay TOR switch floods
>>      a packet then, all controllers receives packets at the same time and try to install FDB entry at the same time. Only first one
>>      succeeds and others report error. This cause controllers to recompute all the flows from SB. On scale networks, this is making
>>      OVN controllers busy.
>> 
> 
> Can't we address this in a similar way to what we did for MAC_Bindings
> and add some sort of randomness to when the SB write happens, e.g.:
> 
> b416f6f65d71 ("controller: Add delay after multicast ARP packet")
> 9411dd300d5a ("mac-learn: Make the mac_binding struct more flexible")
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_b416f6f65d71&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=c0ceNH31esimVUy0sy65PbuIkZ0t3-hk9pq8tlyPKdZClTRxKXF0G2yiKgxtVzpU&s=sEl-2YZAwckbT5tjW6O64qWcD2iaCGy1fdcg53CQs44&e= 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_9411dd300d5a&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=c0ceNH31esimVUy0sy65PbuIkZ0t3-hk9pq8tlyPKdZClTRxKXF0G2yiKgxtVzpU&s=F-0lA-Wm0hPorf8JdI8chPm_nYff8IXatt4F6hE5qpA&e= 

Hi Dumitru,

Thanks for pointing me to these, I will go through this and get back.

> 
>> If we are ok to support FDB local implementation only for VLAN subnets then, I can look at possible options to reduce the code complexity.
> 
> Like I mentioned above, if we can address existing issues in the
> centralized (in SB) implementation, I'd prefer avoiding having two
> different implementations.

ACK

> 
>> Any other suggestions are welcome.
>> 
>> 
>> Thanks,
>> Naveen
> 
> Regards,
> Dumitru
diff mbox series

Patch

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 49cfe0624..85ac92cd3 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -127,6 +127,7 @@  struct collector_set_ids;
     OVNACT(CHK_LB_AFF,        ovnact_result)          \
     OVNACT(SAMPLE,            ovnact_sample)          \
     OVNACT(MAC_CACHE_USE,     ovnact_null)            \
+    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
 
 /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -514,6 +515,12 @@  struct ovnact_commit_lb_aff {
     uint16_t timeout;
 };
 
+/* OVNACT_COMMIT_FBD_LOCAL. */
+struct ovnact_commit_fdb_local{
+    struct ovnact ovnact;
+    uint16_t timeout;  /* fdb_local flow timeout */
+};
+
 /* Internal use by the helpers below. */
 void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
 void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
diff --git a/lib/actions.c b/lib/actions.c
index a73fe1a1e..f5aa78db1 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -5236,6 +5236,98 @@  format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
     ds_put_cstr(s, "mac_cache_use;");
 }
 
+static void
+parse_commit_fdb_local(struct action_context *ctx,
+                     struct ovnact_commit_fdb_local *fdb_local)
+{
+    uint16_t timeout = 0;
+    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
+    if (!lexer_match_id(ctx->lexer, "timeout")) {
+        lexer_syntax_error(ctx->lexer, "invalid parameter");
+        return;
+    }
+    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
+        lexer_syntax_error(ctx->lexer, "invalid parameter");
+        return;
+    }
+    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
+        return;
+    }
+    fdb_local->timeout = timeout;
+    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
+}
+
+static void
+format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
+                      struct ds *s)
+{
+    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
+}
+
+static void
+ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
+{
+}
+
+static void
+commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
+                        struct ofpbuf *ofpacts, uint32_t cookie)
+{
+    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
+    struct match match = MATCH_CATCHALL_INITIALIZER;
+    struct ofpact_learn_spec *ol_spec;
+    unsigned int imm_bytes;
+    uint8_t *src_imm;
+
+    ol->flags = NX_LEARN_F_DELETE_LEARNED;
+    ol->idle_timeout = fdb_local->timeout;
+    ol->hard_timeout = OFP_FLOW_PERMANENT;
+    ol->priority = OFP_DEFAULT_PRIORITY;
+    ol->table_id = OFTABLE_GET_FDB;
+    ol->cookie = htonll(cookie);
+
+    /* Match on metadata of the packet that created the new table. */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+    ol_spec->dst.field = mf_from_id(MFF_METADATA);
+    ol_spec->dst.ofs = 0;
+    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
+    ol_spec->n_bits = ol_spec->dst.n_bits;
+    ol_spec->dst_type = NX_LEARN_DST_MATCH;
+    ol_spec->src_type = NX_LEARN_SRC_FIELD;
+    ol_spec->src.field = mf_from_id(MFF_METADATA);
+
+    /* Match on metadata of the packet. */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
+    ol_spec->dst.ofs = 0;
+    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
+    ol_spec->n_bits = ol_spec->dst.n_bits;
+    ol_spec->dst_type = NX_LEARN_DST_MATCH;
+    ol_spec->src_type = NX_LEARN_SRC_FIELD;
+    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
+
+
+    /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
+    ol_spec->dst.ofs = 0;
+    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
+    ol_spec->n_bits = ol_spec->dst.n_bits;
+    ol_spec->dst_type = NX_LEARN_DST_LOAD;
+    ol_spec->src_type = NX_LEARN_SRC_FIELD;
+    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
+
+    ofpact_finish_LEARN(ofpacts, &ol);
+}
+
+static void
+encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
+                      const struct ovnact_encode_params *ep,
+                      struct ofpbuf *ofpacts)
+{
+     commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
+}
+
 static void
 encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
                      const struct ovnact_encode_params *ep,
@@ -5451,6 +5543,8 @@  parse_action(struct action_context *ctx)
         parse_sample(ctx);
     } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
         ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
+    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
+        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
     } else {
         lexer_syntax_error(ctx->lexer, "expecting action");
     }
diff --git a/northd/northd.c b/northd/northd.c
index d1465ddf7..de18694a0 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -1834,6 +1834,12 @@  localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
     return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
 }
 
+static bool
+ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
+{
+    return smap_get_bool(&nbs->other_config, "fdb_local", false);
+}
+
 static bool
 lsp_is_type_changed(const struct sbrec_port_binding *sb,
                 const struct nbrec_logical_switch_port *nbsp,
@@ -7033,6 +7039,8 @@  build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows,
     }
 }
 
+#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
+
 static void
 build_lswitch_learn_fdb_op(
         struct ovn_port *op, struct hmap *lflows,
@@ -7042,6 +7050,24 @@  build_lswitch_learn_fdb_op(
 
     if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") ||
         (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) {
+
+        if (ls_is_fdb_local(op->od->nbs))
+        {
+            uint32_t idle_timeout =smap_get_uint(
+                        &op->od->nbs->other_config, "fdb_local_idle_timeout",
+                        FDB_LOCAL_DEF_IDLE_TIMEOUT_S);
+            ds_clear(match);
+            ds_clear(actions);
+            ds_put_format(match, "inport == %s", op->json_key);
+            ds_put_format(actions, "commit_fdb_local(timeout=%u); next;",
+                                    idle_timeout);
+            ovn_lflow_add_with_lport_and_hint(lflows, op->od,
+                                              S_SWITCH_IN_LOOKUP_FDB, 100,
+                                              ds_cstr(match), ds_cstr(actions),
+                                              op->key, &op->nbsp->header_);
+            return;
+        }
+
         ds_clear(match);
         ds_clear(actions);
         ds_put_format(match, "inport == %s", op->json_key);
diff --git a/ovn-nb.xml b/ovn-nb.xml
index fcb1c6ecc..3547ec4a6 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -803,6 +803,20 @@ 
       </column>
     </group>
 
+    <group title="Local FDB options">
+      <column name="other_config" key="fdb_local"
+              type='{"type": "boolean"}'>
+        If set to <code>true</code>, FDB flows are commited only to the
+        local chassis instead of southbound DB. Default is false.
+      </column>
+      <column name="other_config" key="fdb_local_idle_timeout"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
+        Local FDB flows <code>idle_timeout</code> value in seconds. FDB local
+        flows exceeding this timeout will be automatically removed. The value
+        defaults to 300, 0 means disabled.
+      </column>
+    </group>
+
     <column name="copp">
       <p>
         The control plane protection policy from table <ref table="Copp"/>
diff --git a/tests/ovn.at b/tests/ovn.at
index 92cf27581..dff50364a 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -34504,6 +34504,114 @@  OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([Local FDB MAC learning])
+ovn_start
+net_add n1
+
+AT_CHECK([ovn-nbctl ls-add ls0])
+
+AT_CHECK([ovn-nbctl lsp-add ls0 vif0])
+AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"])
+AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2])
+
+AT_CHECK([ovn-nbctl lsp-add ls0 vif1])
+AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"])
+AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3])
+
+AT_CHECK([ovn-nbctl lsp-add ls0 ln_port])
+AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
+AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
+AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
+AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true])
+AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1])
+
+AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true])
+
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int vif0 -- \
+    set interface vif0 external-ids:iface-id=vif0 \
+    options:tx_pcap=hv1/vif0-tx.pcap \
+    options:rxq_pcap=hv1/vif0-rx.pcap \
+    ofport-request=1
+ovs-vsctl -- add-port br-int vif1 -- \
+    set interface vif1 external-ids:iface-id=vif1 \
+    options:tx_pcap=hv1/vif1-tx.pcap \
+    options:rxq_pcap=hv1/vif1-rx.pcap \
+    ofport-request=2
+ovs-vsctl -- add-port br-phys ext0 -- \
+    set interface ext0 \
+    options:tx_pcap=hv1/ext0-tx.pcap \
+    options:rxq_pcap=hv1/ext0-rx.pcap \
+    ofport-request=3
+ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys
+
+wait_for_ports_up
+AT_CHECK([ovn-nbctl --wait=hv sync])
+
+send_packet() {
+    src_mac=$1
+    src_ip=$2
+    dst_mac=$3
+    dst_ip=$4
+    iface=$5
+    packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+    ovs-appctl netdev-dummy/receive $iface $packet
+}
+
+
+# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet
+ovn-sbctl dump-flows ls0 > sw0flows
+AT_CAPTURE_FILE([sw0flows])
+
+AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl
+  table=2 (ls_in_lookup_fdb   ), priority=0    , dnl
+match=(1), action=(next;)
+  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
+match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;)
+  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
+match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;)
+])
+
+AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl
+  table=3 (ls_in_put_fdb      ), priority=0    , dnl
+match=(1), action=(next;)
+])
+
+
+src_mac="505400000003"
+src_ip=`ip_to_hex 10.0.0.3`
+dst_mac="505400000004"
+dst_ip=`ip_to_hex 10.0.0.4`
+
+# send packet from vif0(which has unknown addr set) to vif1
+send_packet $src_mac $src_ip $dst_mac $dst_ip vif0
+
+# send packet from vif1 to vif0(which has unknown addr set)
+send_packet $dst_mac $dst_ip $src_mac $src_ip vif1
+
+# send packet from underlay to vif1
+src_mac="505400000064"
+src_ip=`ip_to_hex 10.0.0.100`
+send_packet $src_mac $src_ip $dst_mac $dst_ip ext0
+AT_CHECK([ovn-nbctl --wait=hv sync])
+
+# Make sure that OVS table 71 is populated on hv1.
+AS_BOX([Check that ovn-controller programs the flows for FDB])
+as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt
+AT_CAPTURE_FILE([hv1_offlows_table71.txt])
+AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl
+idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]]
+idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]]
+])
+
+OVN_CLEANUP([hv1])
+AT_CLEANUP
+])
+
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([MAC binding aging])
 AT_SKIP_IF([test $HAVE_SCAPY = no])
diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
index 0b86eae7b..354f84a4b 100644
--- a/utilities/ovn-trace.c
+++ b/utilities/ovn-trace.c
@@ -3355,6 +3355,8 @@  trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
             break;
         case OVNACT_MAC_CACHE_USE:
             break;
+        case OVNACT_COMMIT_FDB_LOCAL:
+            break;
         }
     }
     ofpbuf_uninit(&stack);