Message ID | 20231129064550.96265-1-naveen.yerramneni@nutanix.com |
---|---|
State | Changes Requested |
Delegated to: | Dumitru Ceara |
Series | [ovs-dev] Add support to make fdb table local to the chassis. |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | warning | apply and check: warning |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
References: <20231129064550.96265-1-naveen.yerramneni@nutanix.com>

Bleep bloop. Greetings naveen.yerramneni, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting. See the details below.

checkpatch:
ERROR: Author naveen.yerramneni <naveen.yerramneni@nutanix.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
WARNING: Line is 82 characters long (recommended limit is 79)
#100 FILE: lib/actions.c:5268:
ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)

WARNING: Line is 80 characters long (recommended limit is 79)
#160 FILE: lib/actions.c:5328:
commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);

ERROR: Inappropriate bracing around statement
#206 FILE: northd/northd.c:7055:
if (ls_is_fdb_local(op->od->nbs))

Lines checked: 386, Warnings: 3, Errors: 2

Please check this out. If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
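checkpatch's bracing ERROR refers to the `if (ls_is_fdb_local(op->od->nbs))` line in `build_lswitch_learn_fdb_op()`, whose opening brace sits alone on the following line; OVS/OVN coding style keeps the brace on the same line as the condition. The sketch below re-expresses the logic of that northd.c block in that style, with every line under 80 columns: it produces the `inport == <port>` match and the `commit_fdb_local(timeout=<timeout>); next;` action. It is a standalone approximation rather than the patch's code: `fdb_local_build_lflow()` is a hypothetical name, and plain `snprintf()` buffers stand in for OVN's `struct ds` helpers and `ovn_lflow_add_with_lport_and_hint()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper mirroring the flagged northd.c block: fills 'match'
 * and 'actions' for the ls_in_lookup_fdb flow of one port.  'json_key' is
 * the already-quoted port name, e.g. "\"vif0\"".  Returns false if local
 * FDB is disabled for the switch or if either buffer is too small. */
bool
fdb_local_build_lflow(bool fdb_local, const char *json_key,
                      unsigned int idle_timeout,
                      char *match, size_t match_len,
                      char *actions, size_t actions_len)
{
    if (!fdb_local) {           /* Brace on the same line as the 'if'. */
        return false;
    }

    int n = snprintf(match, match_len, "inport == %s", json_key);
    if (n < 0 || (size_t) n >= match_len) {
        return false;
    }

    n = snprintf(actions, actions_len,
                 "commit_fdb_local(timeout=%u); next;", idle_timeout);
    return n >= 0 && (size_t) n < actions_len;
}
```

When this returns false, a caller would fall through to the existing Southbound-based `lookup_fdb`/`put_fdb` flows, mirroring the patch, which returns right after adding the local flow.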
On 11/29/23 07:45, naveen.yerramneni wrote:
> This functionality can be enabled at the logical switch level:
> - "other_config:fdb_local" can be used to enable/disable this
> functionality, it is disabled by default.
> - "other_config:fdb_local_idle_timeout" specifies idle timeout
> for locally learned fdb flows, default timeout is 300 secs.
>
> If enabled, below lflow is added for each port that has unknown addr set.
> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
> action=(commit_fdb_local(timeout=<timeout>); next;)
>
> New OVN action: "commit_fdb_local". This sets following OVS action.
> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])
>
> This is useful when OVN is managing VLAN network that has multiple ports
> set with unknown addr and localnet_learn_fdb is enabled. With this config,
> if there is east-west traffic flowing between VMs part of same VLAN
> deployed on different hypervisors then, MAC addrs of the source and
> destination VMs keep flapping between VM port and localnet port in
> Southbound FDB table. Enabling fdb_local config makes fdb table local to
> the chassis and avoids MAC flapping.
>
> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
> ---

Hi Naveen,

Thanks a lot for the patch!

Just a note, we already have a fix for the east-west traffic that causes
FDB flapping when localnet is used:

https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e

https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14

In general, however, I think it's a very good idea to move the FDB away
from the Southbound and make it local to each hypervisor. That reduces
load on the Southbound among other things.
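The `learn()` action in the commit message makes every packet arriving on a learning port install (or refresh) a table-71 flow that matches the packet's datapath `metadata` plus `eth.dst == <the packet's eth.src>`, and whose action loads the inport key (REG14) into the outport (REG15); the flow goes away after `idle_timeout` seconds without hits. The toy C model below reproduces only those semantics with a fixed-size array so the effect of the idle timer is easy to check. It is an illustration under stated assumptions, not OVS code: the struct and function names are hypothetical, time is an explicit seconds counter, the timer is refreshed both on re-learn and on a lookup hit (my understanding of how OVS treats learned flows), `delete_learned` is not modeled, and a miss is reported as port key 0, which the model reserves for "no entry".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FDB_MODEL_SIZE 64       /* Arbitrary capacity for the model. */

/* One modeled table-71 flow: match (metadata, eth.dst), action "load the
 * learned port key into the outport register". */
struct fdb_model_entry {
    bool in_use;
    uint64_t metadata;          /* Logical datapath (OXM_OF_METADATA). */
    uint64_t mac;               /* MAC address in the low 48 bits. */
    uint32_t port_key;          /* Inport key (REG14) at learn time. */
    uint64_t last_used;         /* Seconds; refreshed on learn and hit. */
};

struct fdb_model {
    uint16_t idle_timeout;      /* Seconds; 0 means entries never age. */
    struct fdb_model_entry entries[FDB_MODEL_SIZE];
};

static bool
fdb_model_expired(const struct fdb_model *fdb,
                  const struct fdb_model_entry *e, uint64_t now)
{
    return fdb->idle_timeout && now - e->last_used > fdb->idle_timeout;
}

/* Models learn(): the packet's eth.src, seen on 'inport_key', becomes an
 * eth.dst match that resolves to 'inport_key'.  A MAC seen on a new port
 * simply overwrites the old entry.  Returns false if the model is full. */
bool
fdb_model_learn(struct fdb_model *fdb, uint64_t metadata, uint64_t src_mac,
                uint32_t inport_key, uint64_t now)
{
    struct fdb_model_entry *free_slot = NULL;

    for (size_t i = 0; i < FDB_MODEL_SIZE; i++) {
        struct fdb_model_entry *e = &fdb->entries[i];

        if (e->in_use && fdb_model_expired(fdb, e, now)) {
            e->in_use = false;                  /* Aged out. */
        }
        if (e->in_use && e->metadata == metadata && e->mac == src_mac) {
            e->port_key = inport_key;
            e->last_used = now;
            return true;
        }
        if (!e->in_use && !free_slot) {
            free_slot = e;
        }
    }
    if (!free_slot) {
        return false;
    }
    *free_slot = (struct fdb_model_entry) {
        .in_use = true, .metadata = metadata, .mac = src_mac,
        .port_key = inport_key, .last_used = now,
    };
    return true;
}

/* Models a table-71 lookup for 'dst_mac'.  Returns the learned port key
 * (refreshing the idle timer) or 0, reserved here for "no entry". */
uint32_t
fdb_model_lookup(struct fdb_model *fdb, uint64_t metadata, uint64_t dst_mac,
                 uint64_t now)
{
    for (size_t i = 0; i < FDB_MODEL_SIZE; i++) {
        struct fdb_model_entry *e = &fdb->entries[i];

        if (e->in_use && fdb_model_expired(fdb, e, now)) {
            e->in_use = false;
        }
        if (e->in_use && e->metadata == metadata && e->mac == dst_mac) {
            e->last_used = now;
            return e->port_key;
        }
    }
    return 0;
}
```

For example, learning 50:54:00:00:00:03 on port key 2 at t=0 and looking it up at t=100 returns 2; with a 300-second timeout and no further traffic the entry is gone at t=401, and the MAC has to be relearned before traffic to it is unicast again.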
> include/ovn/actions.h | 7 +++ > lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ > northd/northd.c | 26 ++++++++++ > ovn-nb.xml | 14 ++++++ > tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ > utilities/ovn-trace.c | 2 + > 6 files changed, 251 insertions(+) > > diff --git a/include/ovn/actions.h b/include/ovn/actions.h > index 49cfe0624..85ac92cd3 100644 > --- a/include/ovn/actions.h > +++ b/include/ovn/actions.h > @@ -127,6 +127,7 @@ struct collector_set_ids; > OVNACT(CHK_LB_AFF, ovnact_result) \ > OVNACT(SAMPLE, ovnact_sample) \ > OVNACT(MAC_CACHE_USE, ovnact_null) \ > + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ > > /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */ > enum OVS_PACKED_ENUM ovnact_type { > @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { > uint16_t timeout; > }; > > +/* OVNACT_COMMIT_FBD_LOCAL. */ > +struct ovnact_commit_fdb_local{ > + struct ovnact ovnact; > + uint16_t timeout; /* fdb_local flow timeout */ > +}; > + > /* Internal use by the helpers below. */ > void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); > void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); > diff --git a/lib/actions.c b/lib/actions.c > index a73fe1a1e..f5aa78db1 100644 > --- a/lib/actions.c > +++ b/lib/actions.c > @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) > ds_put_cstr(s, "mac_cache_use;"); > } > > +static void > +parse_commit_fdb_local(struct action_context *ctx, > + struct ovnact_commit_fdb_local *fdb_local) > +{ > + uint16_t timeout = 0; > + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. 
*/ > + if (!lexer_match_id(ctx->lexer, "timeout")) { > + lexer_syntax_error(ctx->lexer, "invalid parameter"); > + return; > + } > + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { > + lexer_syntax_error(ctx->lexer, "invalid parameter"); > + return; > + } > + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { > + return; > + } > + fdb_local->timeout = timeout; > + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */ > +} > + > +static void > +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, > + struct ds *s) > +{ > + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); > +} > + > +static void > +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) > +{ > +} > + > +static void > +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, > + struct ofpbuf *ofpacts, uint32_t cookie) > +{ > + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); > + struct match match = MATCH_CATCHALL_INITIALIZER; > + struct ofpact_learn_spec *ol_spec; > + unsigned int imm_bytes; > + uint8_t *src_imm; > + > + ol->flags = NX_LEARN_F_DELETE_LEARNED; > + ol->idle_timeout = fdb_local->timeout; > + ol->hard_timeout = OFP_FLOW_PERMANENT; > + ol->priority = OFP_DEFAULT_PRIORITY; > + ol->table_id = OFTABLE_GET_FDB; > + ol->cookie = htonll(cookie); > + > + /* Match on metadata of the packet that created the new table. */ > + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > + ol_spec->dst.field = mf_from_id(MFF_METADATA); > + ol_spec->dst.ofs = 0; > + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > + ol_spec->n_bits = ol_spec->dst.n_bits; > + ol_spec->dst_type = NX_LEARN_DST_MATCH; > + ol_spec->src_type = NX_LEARN_SRC_FIELD; > + ol_spec->src.field = mf_from_id(MFF_METADATA); > + > + /* Match on metadata of the packet. 
*/ > + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); > + ol_spec->dst.ofs = 0; > + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > + ol_spec->n_bits = ol_spec->dst.n_bits; > + ol_spec->dst_type = NX_LEARN_DST_MATCH; > + ol_spec->src_type = NX_LEARN_SRC_FIELD; > + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); > + > + > + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */ > + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); > + ol_spec->dst.ofs = 0; > + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > + ol_spec->n_bits = ol_spec->dst.n_bits; > + ol_spec->dst_type = NX_LEARN_DST_LOAD; > + ol_spec->src_type = NX_LEARN_SRC_FIELD; > + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); > + > + ofpact_finish_LEARN(ofpacts, &ol); > +} A difference from today's SB.FDB centralized approach is that when ovn-controller restarts these flows will be cleared, I think. Are we OK with that? I think so but if not what are the options to avoid clearing the local fdb cache on restart? Another difference with today's approach is that this avoids a controller action, that's great! 
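On the parsing side, `parse_commit_fdb_local()` in the quoted hunk accepts a single form, `commit_fdb_local(timeout=N)`, and reads N with `action_parse_uint16()`, so the timeout must fit in 16 bits, matching the `uint16_t timeout` field of `struct ovnact_commit_fdb_local` and the learned flow's `idle_timeout`. The standalone sketch below approximates that grammar with the C library instead of OVN's lexer, which can be handy when reasoning about which inputs are accepted. Its name is hypothetical, it accepts decimal values only, and it does not handle the `;` that terminates the action inside a logical flow.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static const char *
skip_ws(const char *p)
{
    while (isspace((unsigned char) *p)) {
        p++;
    }
    return p;
}

/* Hypothetical approximation of parse_commit_fdb_local(): accepts
 * "commit_fdb_local(timeout=N)", with optional whitespace between tokens,
 * where N is a decimal number that fits in 16 bits.  On success stores N
 * in '*timeout' and returns true; otherwise leaves it untouched. */
bool
parse_commit_fdb_local_str(const char *s, uint16_t *timeout)
{
    static const char action[] = "commit_fdb_local";
    static const char param[] = "timeout";
    const char *p = skip_ws(s);

    if (strncmp(p, action, sizeof action - 1)) {
        return false;
    }
    p = skip_ws(p + sizeof action - 1);
    if (*p != '(') {
        return false;
    }
    p = skip_ws(p + 1);
    if (strncmp(p, param, sizeof param - 1)) {
        return false;
    }
    p = skip_ws(p + sizeof param - 1);
    if (*p != '=') {
        return false;
    }
    p = skip_ws(p + 1);
    if (!isdigit((unsigned char) *p)) {
        return false;                   /* No sign, no empty value. */
    }

    char *end;
    errno = 0;
    unsigned long value = strtoul(p, &end, 10);
    if (errno || value > UINT16_MAX) {
        return false;                   /* Does not fit in 16 bits. */
    }
    p = skip_ws(end);
    if (*p != ')' || *skip_ws(p + 1) != '\0') {
        return false;
    }
    *timeout = (uint16_t) value;
    return true;
}
```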
> + > +static void > +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, > + const struct ovnact_encode_params *ep, > + struct ofpbuf *ofpacts) > +{ > + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); > +} > + > static void > encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, > const struct ovnact_encode_params *ep, > @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) > parse_sample(ctx); > } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { > ovnact_put_MAC_CACHE_USE(ctx->ovnacts); > + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { > + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); > } else { > lexer_syntax_error(ctx->lexer, "expecting action"); > } > diff --git a/northd/northd.c b/northd/northd.c > index d1465ddf7..de18694a0 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) > return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false); > } > > +static bool > +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) > +{ > + return smap_get_bool(&nbs->other_config, "fdb_local", false); > +} > + Personally, I'd prefer if we don't add another config knob and we just make this the only way FDB works. We could also document that the FDB SB table should be deprecated. > static bool > lsp_is_type_changed(const struct sbrec_port_binding *sb, > const struct nbrec_logical_switch_port *nbsp, > @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows, > } > } > > +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300 > + This, on the other hand, might be a good candidate for a config option.
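The patch already exposes this knob as `other_config:fdb_local_idle_timeout`, read with `smap_get_uint()` and defaulting to `FDB_LOCAL_DEF_IDLE_TIMEOUT_S` (300). One detail worth checking is the range: ovn-nb.xml documents 0 to 65535 and the action parser expects a 16-bit value, so, assuming nothing else validates the string stored in `other_config`, a larger number would presumably yield a logical flow whose action fails to parse. The sketch below shows one way northd could guard against that. The helper and `FDB_LOCAL_MAX_IDLE_TIMEOUT_S` are hypothetical, and the helper takes the raw option string rather than an `smap`.

```c
#include <errno.h>
#include <stdlib.h>

#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300    /* Default used by the patch. */
#define FDB_LOCAL_MAX_IDLE_TIMEOUT_S 65535  /* Upper bound in ovn-nb.xml. */

/* Hypothetical helper: 'value' is the raw other_config string, or NULL if
 * the key is absent.  Missing, malformed and out-of-range values fall back
 * to the default, so the emitted commit_fdb_local(timeout=N) always fits
 * the 16-bit field the action parser expects. */
unsigned int
fdb_local_idle_timeout_from_config(const char *value)
{
    if (!value || *value < '0' || *value > '9') {
        return FDB_LOCAL_DEF_IDLE_TIMEOUT_S;    /* Absent or no digits. */
    }

    char *end;
    errno = 0;
    unsigned long timeout = strtoul(value, &end, 10);
    if (errno || *end != '\0' || timeout > FDB_LOCAL_MAX_IDLE_TIMEOUT_S) {
        return FDB_LOCAL_DEF_IDLE_TIMEOUT_S;
    }
    return (unsigned int) timeout;  /* 0 disables the idle timeout. */
}
```

Falling back to the default is only one option; clamping to 65535 or logging a rate-limited warning would be equally reasonable choices.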
> static void > build_lswitch_learn_fdb_op( > struct ovn_port *op, struct hmap *lflows, > @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op( > > if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") || > (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) { > + > + if (ls_is_fdb_local(op->od->nbs)) > + { > + uint32_t idle_timeout =smap_get_uint( > + &op->od->nbs->other_config, "fdb_local_idle_timeout", > + FDB_LOCAL_DEF_IDLE_TIMEOUT_S); > + ds_clear(match); > + ds_clear(actions); > + ds_put_format(match, "inport == %s", op->json_key); > + ds_put_format(actions, "commit_fdb_local(timeout=%u); next;", > + idle_timeout); > + ovn_lflow_add_with_lport_and_hint(lflows, op->od, > + S_SWITCH_IN_LOOKUP_FDB, 100, > + ds_cstr(match), ds_cstr(actions), > + op->key, &op->nbsp->header_); > + return; > + } > + > ds_clear(match); > ds_clear(actions); > ds_put_format(match, "inport == %s", op->json_key); > diff --git a/ovn-nb.xml b/ovn-nb.xml > index fcb1c6ecc..3547ec4a6 100644 > --- a/ovn-nb.xml > +++ b/ovn-nb.xml > @@ -803,6 +803,20 @@ > </column> > </group> > > + <group title="Local FDB options"> > + <column name="other_config" key="fdb_local" > + type='{"type": "boolean"}'> > + If set to <code>true</code>, FDB flows are commited only to the > + local chassis instead of southbound DB. Default is false. > + </column> > + <column name="other_config" key="fdb_local_idle_timeout" > + type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'> > + Local FDB flows <code>idle_timeout</code> value in seconds. FDB local > + flows exceeding this timeout will be automatically removed. The value > + defaults to 300, 0 means disabled. 
> + </column> > + </group> > + > <column name="copp"> > <p> > The control plane protection policy from table <ref table="Copp"/> > diff --git a/tests/ovn.at b/tests/ovn.at > index 92cf27581..dff50364a 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1]) > AT_CLEANUP > ]) > > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([Local FDB MAC learning]) > +ovn_start > +net_add n1 > + > +AT_CHECK([ovn-nbctl ls-add ls0]) > + > +AT_CHECK([ovn-nbctl lsp-add ls0 vif0]) > +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"]) > +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2]) > + > +AT_CHECK([ovn-nbctl lsp-add ls0 vif1]) > +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"]) > +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3]) > + > +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port]) > +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown]) > +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet]) > +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1]) > +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true]) > +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1]) > + > +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true]) > + > +sim_add hv1 > +as hv1 > +ovs-vsctl add-br br-phys > +ovn_attach n1 br-phys 192.168.0.1 > +ovs-vsctl -- add-port br-int vif0 -- \ > + set interface vif0 external-ids:iface-id=vif0 \ > + options:tx_pcap=hv1/vif0-tx.pcap \ > + options:rxq_pcap=hv1/vif0-rx.pcap \ > + ofport-request=1 > +ovs-vsctl -- add-port br-int vif1 -- \ > + set interface vif1 external-ids:iface-id=vif1 \ > + options:tx_pcap=hv1/vif1-tx.pcap \ > + options:rxq_pcap=hv1/vif1-rx.pcap \ > + ofport-request=2 > +ovs-vsctl -- add-port br-phys ext0 -- \ > + set interface ext0 \ > + options:tx_pcap=hv1/ext0-tx.pcap \ > + options:rxq_pcap=hv1/ext0-rx.pcap \ > + 
ofport-request=3 > +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys > + > +wait_for_ports_up > +AT_CHECK([ovn-nbctl --wait=hv sync]) > + > +send_packet() { > + src_mac=$1 > + src_ip=$2 > + dst_mac=$3 > + dst_ip=$4 > + iface=$5 > + packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 > + ovs-appctl netdev-dummy/receive $iface $packet > +} > + > + > +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet > +ovn-sbctl dump-flows ls0 > sw0flows > +AT_CAPTURE_FILE([sw0flows]) > + > +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl > + table=2 (ls_in_lookup_fdb ), priority=0 , dnl > +match=(1), action=(next;) > + table=2 (ls_in_lookup_fdb ), priority=100 , dnl > +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;) > + table=2 (ls_in_lookup_fdb ), priority=100 , dnl > +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;) > +]) > + > +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl > + table=3 (ls_in_put_fdb ), priority=0 , dnl > +match=(1), action=(next;) > +]) > + > + > +src_mac="505400000003" > +src_ip=`ip_to_hex 10.0.0.3` > +dst_mac="505400000004" > +dst_ip=`ip_to_hex 10.0.0.4` > + > +# send packet from vif0(which has unknown addr set) to vif1 > +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0 > + > +# send packet from vif1 to vif0(which has unknown addr set) > +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1 > + > +# send packet from underlay to vif1 > +src_mac="505400000064" > +src_ip=`ip_to_hex 10.0.0.100` > +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0 > +AT_CHECK([ovn-nbctl --wait=hv sync]) > + > +# Make sure that OVS table 71 is populated on hv1. 
> +AS_BOX([Check that ovn-controller programs the flows for FDB]) > +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt > +AT_CAPTURE_FILE([hv1_offlows_table71.txt]) > +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl > +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]] > +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]] > +]) > + > +OVN_CLEANUP([hv1]) > +AT_CLEANUP > +]) > + > OVN_FOR_EACH_NORTHD([ > AT_SETUP([MAC binding aging]) > AT_SKIP_IF([test $HAVE_SCAPY = no]) > diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c > index 0b86eae7b..354f84a4b 100644 > --- a/utilities/ovn-trace.c > +++ b/utilities/ovn-trace.c > @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len, > break; > case OVNACT_MAC_CACHE_USE: > break; > + case OVNACT_COMMIT_FDB_LOCAL: > + break; > } > } > ofpbuf_uninit(&stack); Regards, Dumitru
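The `send_packet()` helper in the quoted test concatenates a hex string for a 42-byte frame: a 14-byte Ethernet header (EtherType 0x0800), a 20-byte IPv4 header (total length 0x001c = 28, TTL 0x40, protocol 0x11 = UDP, checksum left as zero), and an 8-byte UDP header (source port 0x0035 = 53, destination port 0x1111 = 4369, length 8, checksum zero). The C sketch below builds the same string from its parts so those field boundaries are explicit. The function name is hypothetical; like the test, it computes no checksums, and it assumes the IPv4 addresses are passed as 8 hex digits.

```c
#include <stdio.h>

/* Hypothetical C rendering of the test's send_packet() payload.  MACs are
 * 12 hex digits and IPv4 addresses 8 hex digits.  Returns the number of
 * hex digits written (84, i.e. 42 bytes), or -1 if 'buf' is too small. */
int
build_test_frame_hex(char *buf, size_t len,
                     const char *src_mac, const char *src_ip,
                     const char *dst_mac, const char *dst_ip)
{
    int n = snprintf(buf, len,
                     "%s%s"                      /* Eth dst, Eth src. */
                     "0800"                      /* EtherType: IPv4. */
                     "4500001c0000000040110000"  /* IPv4: len 28, TTL 64,
                                                  * proto 17, csum 0. */
                     "%s%s"                      /* IPv4 src, IPv4 dst. */
                     "0035111100080000",         /* UDP 53 -> 4369, len 8. */
                     dst_mac, src_mac, src_ip, dst_ip);
    return (n >= 0 && (size_t) n < len) ? n : -1;
}
```

Read this way, the test's table-71 expectations follow directly: vif0 (tunnel key 2, with "unknown") and ln_port (key 1, with localnet_learn_fdb) each get a commit_fdb_local flow, so the frames with source MACs 50:54:00:00:00:03 and 50:54:00:00:00:64 produce the two entries loading 0x2 and 0x1 into REG15. vif1 has no "unknown" address, gets no such flow, and its source MAC 50:54:00:00:00:04 is therefore not learned.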
> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 11/29/23 07:45, naveen.yerramneni wrote:
[...]
>
> Hi Naveen,
>
> Thanks a lot for the patch!
>
> Just a note, we already have a fix for the east-west traffic that causes
> FDB flapping when localnet is used:
>
> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e
>
> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14
>
> In general, however, I think it's a very good idea to move the FDB away
> from the Southbound and make it local to each hypervisor. That reduces
> load on the Southbound among other things.
>

Hi Dumitru,

Thanks for letting me know about these patches.
Yes, a local FDB reduces the load on the Southbound.

[...]
>> + ofpact_finish_LEARN(ofpacts, &ol);
>> +}
>
> A difference from today's SB.FDB centralized approach is that when
> ovn-controller restarts these flows will be cleared, I think.
>
> Are we OK with that? I think so but if not what are the options to
> avoid clearing the local fdb cache on restart?
>

OVS has to relearn the FDB flows whenever the tables are cleared, and
packets get flooded in the meantime. I need to think about possible
options if we want to retain the FDB table. Can we take this up as an
enhancement in a separate patch once we identify a solution?

> Another difference with today's approach is that this avoids a controller action, that's great!
>
[...]
>> +static bool
>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
>> +{
>> + return smap_get_bool(&nbs->other_config, "fdb_local", false);
>> +}
>> +
>
> Personally, I'd prefer if we don't add another config knob and we just
> make this the only way FDB works. We could also document that the FDB
> SB table should be deprecated.
>

If we want to make the local FDB the default, then I think we need to
handle the overlay use case as well. We would probably have to add a new
stage to the logical switch egress pipeline (or something similar) to
learn FDB entries for packets arriving over a tunnel. Can we take this up
in a separate patch?

[...]
>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
>> +
>
> This, on the other hand, might be a good candidate for a config option.

The “fdb_local_idle_timeout” option added in this patch configures the
timeout. The default value is 300 seconds.

[...]
> Regards,
> Dumitru
On 11/30/23 09:45, Naveen Yerramneni wrote: > > >> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 11/29/23 07:45, naveen.yerramneni wrote: >>> This functionality can be enabled at the logical switch level: >>> - "other_config:fdb_local" can be used to enable/disable this >>> functionality, it is disabled by default. >>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout >>> for locally learned fdb flows, default timeout is 300 secs. >>> >>> If enabled, below lflow is added for each port that has unknown addr set. >>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), >>> action=(commit_fdb_local(timeout=<timeout>); next; >>> >>> New OVN action: "commit_fdb_local". This sets following OVS action. >>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], >>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) >>> >>> This is useful when OVN is managing VLAN network that has multiple ports >>> set with unknown addr and localnet_learn_fdb is enabled. With this config, >>> if there is east-west traffic flowing between VMs part of same VLAN >>> deployed on different hypervisors then, MAC addrs of the source and >>> destination VMs keeps flapping between VM port and localnet port in >>> Southbound FDB table. Enabling fdb_local config makes fdb table local to >>> the chassis and avoids MAC flapping. >>> >>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> >>> --- >> >> Hi Naveen, >> >> Thanks a lot for the patch! 
>> >> Just a note, we already have a fix for the east-west traffic that causes >> FDB flapping when localnet is used: >> >> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e >> >> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14 >> >> In general, however, I think it's a very good idea to move the FDB away >> from the Southbound and make it local to each hypervisor. That reduces >> load on the Southbound among other things. >> > > Hi Dumitru, > > Thanks for pointing me to these patches. > Yes, a local FDB reduces the load on the Southbound. > > >>> include/ovn/actions.h | 7 +++ >>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ >>> northd/northd.c | 26 ++++++++++ >>> ovn-nb.xml | 14 ++++++ >>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ >>> utilities/ovn-trace.c | 2 + >>> 6 files changed, 251 insertions(+) >>> >>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h >>> index 49cfe0624..85ac92cd3 100644 >>> --- a/include/ovn/actions.h >>> +++ b/include/ovn/actions.h >>> @@ -127,6 +127,7 @@ struct collector_set_ids; >>> OVNACT(CHK_LB_AFF, ovnact_result) \ >>> OVNACT(SAMPLE, ovnact_sample) \ >>> OVNACT(MAC_CACHE_USE, ovnact_null) \ >>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ >>> >>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action.
*/ >>> enum OVS_PACKED_ENUM ovnact_type { >>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { >>> uint16_t timeout; >>> }; >>> >>> +/* OVNACT_COMMIT_FBD_LOCAL. */ >>> +struct ovnact_commit_fdb_local{ >>> + struct ovnact ovnact; >>> + uint16_t timeout; /* fdb_local flow timeout */ >>> +}; >>> + >>> /* Internal use by the helpers below. */ >>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); >>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); >>> diff --git a/lib/actions.c b/lib/actions.c >>> index a73fe1a1e..f5aa78db1 100644 >>> --- a/lib/actions.c >>> +++ b/lib/actions.c >>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) >>> ds_put_cstr(s, "mac_cache_use;"); >>> } >>> >>> +static void >>> +parse_commit_fdb_local(struct action_context *ctx, >>> + struct ovnact_commit_fdb_local *fdb_local) >>> +{ >>> + uint16_t timeout = 0; >>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ >>> + if (!lexer_match_id(ctx->lexer, "timeout")) { >>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>> + return; >>> + } >>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { >>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>> + return; >>> + } >>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { >>> + return; >>> + } >>> + fdb_local->timeout = timeout; >>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ >>> +} >>> + >>> +static void >>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>> + struct ds *s) >>> +{ >>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); >>> +} >>> + >>> +static void >>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) >>> +{ >>> +} >>> + >>> +static void >>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, >>> + struct ofpbuf *ofpacts, uint32_t cookie) >>> +{ >>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); >>> + struct match match = MATCH_CATCHALL_INITIALIZER; >>> + struct ofpact_learn_spec *ol_spec; >>> + unsigned int imm_bytes; >>> + uint8_t *src_imm; >>> + >>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; >>> + ol->idle_timeout = fdb_local->timeout; >>> + ol->hard_timeout = OFP_FLOW_PERMANENT; >>> + ol->priority = OFP_DEFAULT_PRIORITY; >>> + ol->table_id = OFTABLE_GET_FDB; >>> + ol->cookie = htonll(cookie); >>> + >>> + /* Match on metadata of the packet that created the new table. */ >>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); >>> + ol_spec->dst.ofs = 0; >>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>> + ol_spec->src.field = mf_from_id(MFF_METADATA); >>> + >>> + /* Match on metadata of the packet. */ >>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); >>> + ol_spec->dst.ofs = 0; >>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); >>> + >>> + >>> + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. 
*/ >>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); >>> + ol_spec->dst.ofs = 0; >>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; >>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); >>> + >>> + ofpact_finish_LEARN(ofpacts, &ol); >>> +} >> >> A difference from today's SB.FDB centralized approach is that when >> ovn-controller restarts these flows will be cleared, I think. >> >> Are we OK with that? I think so, but if not, what are the options to >> avoid clearing the local fdb cache on restart? >> > > OVS has to relearn the FDB flows whenever the tables are cleared. > During this time, packets get flooded. I need to think about possible > options if we want to retain the FDB table. > > Can we take this up as an enhancement in a separate patch > once we identify a solution for this? > Sounds good to me. >> Another difference with today's approach is that this avoids a >> controller action, that's great!
>> >>> + >>> +static void >>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>> + const struct ovnact_encode_params *ep, >>> + struct ofpbuf *ofpacts) >>> +{ >>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); >>> +} >>> + >>> static void >>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, >>> const struct ovnact_encode_params *ep, >>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) >>> parse_sample(ctx); >>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { >>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); >>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { >>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); >>> } else { >>> lexer_syntax_error(ctx->lexer, "expecting action"); >>> } >>> diff --git a/northd/northd.c b/northd/northd.c >>> index d1465ddf7..de18694a0 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) >>> return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false); >>> } >>> >>> +static bool >>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) >>> +{ >>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); >>> +} >>> + >> >> Personally, I'd prefer if we don't add another config knob and we just >> make this the only way FDB works. We could also document that the FDB >> SB table should be deprecated. >> > > If we want to make local FDB the default, I think we need to handle the overlay > use case as well. We might have to add a new stage in the logical switch > egress pipeline to learn FDB entries for packets coming over a tunnel (or) > something similar. > > Can we take this up in a separate patch?
> > Given that we have a fix already for the original problem you were trying to address, I would prefer that we avoid adding new config knobs and handle both the localnet and overlay cases at the same time. OTOH, why is there a difference? The learned flow loads MFF_LOG_INPORT (from the packet that triggers the learn() action) into MFF_LOG_OUTPORT: With Geneve and STT overlay MFF_LOG_INPORT is still correctly set. With VXLAN that's not the case, but that's already a documented limitation; LOG_INPORT is not available after VXLAN tunneling, therefore features that need it (like egress ACLs matching against ingress port identifiers) are not supported: https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842 One more thing we need to take care of in order to be able to make local FDB the default is "FDB refresh": 551527a5e68e ("controller: Update FDB timestamp") https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc >>> static bool >>> lsp_is_type_changed(const struct sbrec_port_binding *sb, >>> const struct nbrec_logical_switch_port *nbsp, >>> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows, >>> } >>> } >>> >>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300 >>> + >> >> This, on the other hand, might be a good candidate for a config option. > > The “fdb_local_idle_timeout” option added in this patch configures the timeout. > The default value is 300 seconds. > True, I missed that you do that below. However, we already have "LS.other_config:fdb_age_threshold" in the NB database to control exactly the same thing. I'd just use that one.
>> >>> static void >>> build_lswitch_learn_fdb_op( >>> struct ovn_port *op, struct hmap *lflows, >>> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op( >>> >>> if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") || >>> (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) { >>> + >>> + if (ls_is_fdb_local(op->od->nbs)) >>> + { >>> + uint32_t idle_timeout =smap_get_uint( >>> + &op->od->nbs->other_config, "fdb_local_idle_timeout", >>> + FDB_LOCAL_DEF_IDLE_TIMEOUT_S); >>> + ds_clear(match); >>> + ds_clear(actions); >>> + ds_put_format(match, "inport == %s", op->json_key); >>> + ds_put_format(actions, "commit_fdb_local(timeout=%u); next;", >>> + idle_timeout); >>> + ovn_lflow_add_with_lport_and_hint(lflows, op->od, >>> + S_SWITCH_IN_LOOKUP_FDB, 100, >>> + ds_cstr(match), ds_cstr(actions), >>> + op->key, &op->nbsp->header_); >>> + return; >>> + } >>> + >>> ds_clear(match); >>> ds_clear(actions); >>> ds_put_format(match, "inport == %s", op->json_key); >>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>> index fcb1c6ecc..3547ec4a6 100644 >>> --- a/ovn-nb.xml >>> +++ b/ovn-nb.xml >>> @@ -803,6 +803,20 @@ >>> </column> >>> </group> >>> >>> + <group title="Local FDB options"> >>> + <column name="other_config" key="fdb_local" >>> + type='{"type": "boolean"}'> >>> + If set to <code>true</code>, FDB flows are commited only to the >>> + local chassis instead of southbound DB. Default is false. >>> + </column> >>> + <column name="other_config" key="fdb_local_idle_timeout" >>> + type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'> >>> + Local FDB flows <code>idle_timeout</code> value in seconds. FDB local >>> + flows exceeding this timeout will be automatically removed. The value >>> + defaults to 300, 0 means disabled. 
>>> + </column> >>> + </group> >>> + >>> <column name="copp"> >>> <p> >>> The control plane protection policy from table <ref table="Copp"/> >>> diff --git a/tests/ovn.at b/tests/ovn.at >>> index 92cf27581..dff50364a 100644 >>> --- a/tests/ovn.at >>> +++ b/tests/ovn.at >>> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1]) >>> AT_CLEANUP >>> ]) >>> >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([Local FDB MAC learning]) >>> +ovn_start >>> +net_add n1 >>> + >>> +AT_CHECK([ovn-nbctl ls-add ls0]) >>> + >>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0]) >>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"]) >>> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2]) >>> + >>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1]) >>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"]) >>> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3]) >>> + >>> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port]) >>> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown]) >>> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet]) >>> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1]) >>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true]) >>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1]) >>> + >>> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true]) >>> + >>> +sim_add hv1 >>> +as hv1 >>> +ovs-vsctl add-br br-phys >>> +ovn_attach n1 br-phys 192.168.0.1 >>> +ovs-vsctl -- add-port br-int vif0 -- \ >>> + set interface vif0 external-ids:iface-id=vif0 \ >>> + options:tx_pcap=hv1/vif0-tx.pcap \ >>> + options:rxq_pcap=hv1/vif0-rx.pcap \ >>> + ofport-request=1 >>> +ovs-vsctl -- add-port br-int vif1 -- \ >>> + set interface vif1 external-ids:iface-id=vif1 \ >>> + options:tx_pcap=hv1/vif1-tx.pcap \ >>> + options:rxq_pcap=hv1/vif1-rx.pcap \ >>> + ofport-request=2 >>> +ovs-vsctl -- add-port br-phys ext0 -- \ >>> + set 
interface ext0 \ >>> + options:tx_pcap=hv1/ext0-tx.pcap \ >>> + options:rxq_pcap=hv1/ext0-rx.pcap \ >>> + ofport-request=3 >>> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys >>> + >>> +wait_for_ports_up >>> +AT_CHECK([ovn-nbctl --wait=hv sync]) >>> + >>> +send_packet() { >>> + src_mac=$1 >>> + src_ip=$2 >>> + dst_mac=$3 >>> + dst_ip=$4 >>> + iface=$5 >>> + packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>> + ovs-appctl netdev-dummy/receive $iface $packet >>> +} >>> + >>> + >>> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet >>> +ovn-sbctl dump-flows ls0 > sw0flows >>> +AT_CAPTURE_FILE([sw0flows]) >>> + >>> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl >>> + table=2 (ls_in_lookup_fdb ), priority=0 , dnl >>> +match=(1), action=(next;) >>> + table=2 (ls_in_lookup_fdb ), priority=100 , dnl >>> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;) >>> + table=2 (ls_in_lookup_fdb ), priority=100 , dnl >>> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;) >>> +]) >>> + >>> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl >>> + table=3 (ls_in_put_fdb ), priority=0 , dnl >>> +match=(1), action=(next;) >>> +]) >>> + >>> + >>> +src_mac="505400000003" >>> +src_ip=`ip_to_hex 10.0.0.3` >>> +dst_mac="505400000004" >>> +dst_ip=`ip_to_hex 10.0.0.4` >>> + >>> +# send packet from vif0(which has unknown addr set) to vif1 >>> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0 >>> + >>> +# send packet from vif1 to vif0(which has unknown addr set) >>> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1 >>> + >>> +# send packet from underlay to vif1 >>> +src_mac="505400000064" >>> +src_ip=`ip_to_hex 10.0.0.100` >>> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0 >>> +AT_CHECK([ovn-nbctl --wait=hv sync]) >>> + >>> +# Make sure that OVS table 71 is populated on hv1. 
>>> +AS_BOX([Check that ovn-controller programs the flows for FDB]) >>> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt >>> +AT_CAPTURE_FILE([hv1_offlows_table71.txt]) >>> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl >>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]] >>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]] >>> +]) >>> + >>> +OVN_CLEANUP([hv1]) >>> +AT_CLEANUP >>> +]) >>> + >>> OVN_FOR_EACH_NORTHD([ >>> AT_SETUP([MAC binding aging]) >>> AT_SKIP_IF([test $HAVE_SCAPY = no]) >>> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c >>> index 0b86eae7b..354f84a4b 100644 >>> --- a/utilities/ovn-trace.c >>> +++ b/utilities/ovn-trace.c >>> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len, >>> break; >>> case OVNACT_MAC_CACHE_USE: >>> break; >>> + case OVNACT_COMMIT_FDB_LOCAL: >>> + break; >>> } >>> } >>> ofpbuf_uninit(&stack); >> Regards, Dumitru
> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote: > > On 11/30/23 09:45, Naveen Yerramneni wrote: >> >> >>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> On 11/29/23 07:45, naveen.yerramneni wrote: >>>> This functionality can be enabled at the logical switch level: >>>> - "other_config:fdb_local" can be used to enable/disable this >>>> functionality, it is disabled by default. >>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout >>>> for locally learned fdb flows, default timeout is 300 secs. >>>> >>>> If enabled, below lflow is added for each port that has unknown addr set. >>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), >>>> action=(commit_fdb_local(timeout=<timeout>); next; >>>> >>>> New OVN action: "commit_fdb_local". This sets following OVS action. >>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], >>>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) >>>> >>>> This is useful when OVN is managing VLAN network that has multiple ports >>>> set with unknown addr and localnet_learn_fdb is enabled. With this config, >>>> if there is east-west traffic flowing between VMs part of same VLAN >>>> deployed on different hypervisors then, MAC addrs of the source and >>>> destination VMs keeps flapping between VM port and localnet port in >>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to >>>> the chassis and avoids MAC flapping. >>>> >>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> >>>> --- >>> >>> Hi Naveen, >>> >>> Thanks a lot for the patch! 
>>> >>> Just a note, we already have a fix for the east-west traffic that causes >>> FDB flapping when localnet is used: >>> >>> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e >>> >>> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14 >>> >>> In general, however, I think it's a very good idea to move the FDB away >>> from the Southbound and make it local to each hypervisor. That reduces >>> load on the Southbound among other things. >>> >> >> Hi Dumitru, >> >> Thanks for pointing me to these patches. >> Yes, a local FDB reduces the load on the Southbound. >> >> >>>> include/ovn/actions.h | 7 +++ >>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ >>>> northd/northd.c | 26 ++++++++++ >>>> ovn-nb.xml | 14 ++++++ >>>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ >>>> utilities/ovn-trace.c | 2 + >>>> 6 files changed, 251 insertions(+) >>>> >>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h >>>> index 49cfe0624..85ac92cd3 100644 >>>> --- a/include/ovn/actions.h >>>> +++ b/include/ovn/actions.h >>>> @@ -127,6 +127,7 @@ struct collector_set_ids; >>>> OVNACT(CHK_LB_AFF, ovnact_result) \ >>>> OVNACT(SAMPLE, ovnact_sample) \ >>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ >>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ >>>> >>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action.
*/ >>>> enum OVS_PACKED_ENUM ovnact_type { >>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { >>>> uint16_t timeout; >>>> }; >>>> >>>> +/* OVNACT_COMMIT_FBD_LOCAL. */ >>>> +struct ovnact_commit_fdb_local{ >>>> + struct ovnact ovnact; >>>> + uint16_t timeout; /* fdb_local flow timeout */ >>>> +}; >>>> + >>>> /* Internal use by the helpers below. */ >>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); >>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); >>>> diff --git a/lib/actions.c b/lib/actions.c >>>> index a73fe1a1e..f5aa78db1 100644 >>>> --- a/lib/actions.c >>>> +++ b/lib/actions.c >>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) >>>> ds_put_cstr(s, "mac_cache_use;"); >>>> } >>>> >>>> +static void >>>> +parse_commit_fdb_local(struct action_context *ctx, >>>> + struct ovnact_commit_fdb_local *fdb_local) >>>> +{ >>>> + uint16_t timeout = 0; >>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ >>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { >>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>> + return; >>>> + } >>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { >>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>> + return; >>>> + } >>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { >>>> + return; >>>> + } >>>> + fdb_local->timeout = timeout; >>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ >>>> +} >>>> + >>>> +static void >>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>> + struct ds *s) >>>> +{ >>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); >>>> +} >>>> + >>>> +static void >>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) >>>> +{ >>>> +} >>>> + >>>> +static void >>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, >>>> + struct ofpbuf *ofpacts, uint32_t cookie) >>>> +{ >>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); >>>> + struct match match = MATCH_CATCHALL_INITIALIZER; >>>> + struct ofpact_learn_spec *ol_spec; >>>> + unsigned int imm_bytes; >>>> + uint8_t *src_imm; >>>> + >>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; >>>> + ol->idle_timeout = fdb_local->timeout; >>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; >>>> + ol->priority = OFP_DEFAULT_PRIORITY; >>>> + ol->table_id = OFTABLE_GET_FDB; >>>> + ol->cookie = htonll(cookie); >>>> + >>>> + /* Match on metadata of the packet that created the new table. */ >>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); >>>> + ol_spec->dst.ofs = 0; >>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); >>>> + >>>> + /* Match on metadata of the packet. */ >>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); >>>> + ol_spec->dst.ofs = 0; >>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); >>>> + >>>> + >>>> + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. 
*/ >>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); >>>> + ol_spec->dst.ofs = 0; >>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; >>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); >>>> + >>>> + ofpact_finish_LEARN(ofpacts, &ol); >>>> +} >>> >>> A difference from today's SB.FDB centralized approach is that when >>> ovn-controller restarts these flows will be cleared, I think. >>> >>> Are we OK with that? I think so, but if not, what are the options to >>> avoid clearing the local fdb cache on restart? >>> >> >> OVS has to relearn the FDB flows whenever the tables are cleared. >> During this time, packets get flooded. I need to think about possible >> options if we want to retain the FDB table. >> >> Can we take this up as an enhancement in a separate patch >> once we identify a solution for this? >> > > Sounds good to me. > >>> Another difference with today's approach is that this avoids a >>> controller action, that's great!
>>> >>>> + >>>> +static void >>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>> + const struct ovnact_encode_params *ep, >>>> + struct ofpbuf *ofpacts) >>>> +{ >>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); >>>> +} >>>> + >>>> static void >>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, >>>> const struct ovnact_encode_params *ep, >>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) >>>> parse_sample(ctx); >>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { >>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); >>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { >>>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); >>>> } else { >>>> lexer_syntax_error(ctx->lexer, "expecting action"); >>>> } >>>> diff --git a/northd/northd.c b/northd/northd.c >>>> index d1465ddf7..de18694a0 100644 >>>> --- a/northd/northd.c >>>> +++ b/northd/northd.c >>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) >>>> return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false); >>>> } >>>> >>>> +static bool >>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) >>>> +{ >>>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); >>>> +} >>>> + >>> >>> Personally, I'd prefer if we don't add another config knob and we just >>> make this the only way FDB works. We could also document that the FDB >>> SB table should be deprecated. >>> >> >> If we want to make local FDB the default, I think we need to handle the overlay >> use case as well. We might have to add a new stage in the logical switch >> egress pipeline to learn FDB entries for packets coming over a tunnel (or) >> something similar. >> >> Can we take this up in a separate patch?
>> >> > > Given that we have a fix already for the original problem you were > trying to address, I would prefer that we avoid adding new config knobs > and handle both the localnet and overlay cases at the same time. > Sure. > OTOH, why is there a difference? > > The learned flow loads MFF_LOG_INPORT (from the packet that triggers the > learn() action) into MFF_LOG_OUTPORT: > > With Geneve and STT overlay MFF_LOG_INPORT is still correctly set. With > VXLAN that's not the case, but that's already a documented limitation; > LOG_INPORT is not available after VXLAN tunneling, therefore features > that need it (like egress ACLs matching against ingress port > identifiers) are not supported: > > https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842 > In the overlay case, when the source and destination VMs are on different nodes, the ingress and egress pipelines are processed on different nodes. I think this means the local FDB learns only the MACs of local VMs and never learns the MACs of remote VMs. Example: two VMs with unknown addr set are connected to the same logical switch, vm1 on hv1 and vm2 on hv2. When vm1 sends an ICMP request to vm2, the ingress pipeline runs on hv1, which learns vm1-mac, and the packet is flooded because vm2-mac has not been learned yet. The packet reaches hv2 over the tunnel, where only the egress pipeline runs, so hv2 does not learn vm1-mac. When vm2 replies, the ingress pipeline runs on hv2, which learns vm2-mac, and this packet is also flooded because vm1-mac was not learned on hv2 when the ICMP request was received.
> One more thing we need to take care of in order to be able to make local > FDB the default is "FDB refresh": > > 551527a5e68e ("controller: Update FDB timestamp") > > https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc > Sure. >>>> static bool >>>> lsp_is_type_changed(const struct sbrec_port_binding *sb, >>>> const struct nbrec_logical_switch_port *nbsp, >>>> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows, >>>> } >>>> } >>>> >>>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300 >>>> + >>> >>> This, on the other hand, might be a good candidate for a config option. >> >> The “fdb_local_idle_timeout” option added in this patch configures the timeout. >> The default value is 300 seconds. >> > > True, I missed that you do that below. However, we already have > "LS.other_config:fdb_age_threshold" in the NB database to control > exactly the same thing. I'd just use that one. > Yes, we can reuse that option.
>>> >>>> static void >>>> build_lswitch_learn_fdb_op( >>>> struct ovn_port *op, struct hmap *lflows, >>>> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op( >>>> >>>> if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") || >>>> (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) { >>>> + >>>> + if (ls_is_fdb_local(op->od->nbs)) >>>> + { >>>> + uint32_t idle_timeout =smap_get_uint( >>>> + &op->od->nbs->other_config, "fdb_local_idle_timeout", >>>> + FDB_LOCAL_DEF_IDLE_TIMEOUT_S); >>>> + ds_clear(match); >>>> + ds_clear(actions); >>>> + ds_put_format(match, "inport == %s", op->json_key); >>>> + ds_put_format(actions, "commit_fdb_local(timeout=%u); next;", >>>> + idle_timeout); >>>> + ovn_lflow_add_with_lport_and_hint(lflows, op->od, >>>> + S_SWITCH_IN_LOOKUP_FDB, 100, >>>> + ds_cstr(match), ds_cstr(actions), >>>> + op->key, &op->nbsp->header_); >>>> + return; >>>> + } >>>> + >>>> ds_clear(match); >>>> ds_clear(actions); >>>> ds_put_format(match, "inport == %s", op->json_key); >>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>> index fcb1c6ecc..3547ec4a6 100644 >>>> --- a/ovn-nb.xml >>>> +++ b/ovn-nb.xml >>>> @@ -803,6 +803,20 @@ >>>> </column> >>>> </group> >>>> >>>> + <group title="Local FDB options"> >>>> + <column name="other_config" key="fdb_local" >>>> + type='{"type": "boolean"}'> >>>> + If set to <code>true</code>, FDB flows are commited only to the >>>> + local chassis instead of southbound DB. Default is false. >>>> + </column> >>>> + <column name="other_config" key="fdb_local_idle_timeout" >>>> + type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'> >>>> + Local FDB flows <code>idle_timeout</code> value in seconds. FDB local >>>> + flows exceeding this timeout will be automatically removed. The value >>>> + defaults to 300, 0 means disabled. 
>>>> + </column> >>>> + </group> >>>> + >>>> <column name="copp"> >>>> <p> >>>> The control plane protection policy from table <ref table="Copp"/> >>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>> index 92cf27581..dff50364a 100644 >>>> --- a/tests/ovn.at >>>> +++ b/tests/ovn.at >>>> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1]) >>>> AT_CLEANUP >>>> ]) >>>> >>>> +OVN_FOR_EACH_NORTHD([ >>>> +AT_SETUP([Local FDB MAC learning]) >>>> +ovn_start >>>> +net_add n1 >>>> + >>>> +AT_CHECK([ovn-nbctl ls-add ls0]) >>>> + >>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0]) >>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"]) >>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2]) >>>> + >>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1]) >>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"]) >>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3]) >>>> + >>>> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port]) >>>> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown]) >>>> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet]) >>>> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1]) >>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true]) >>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1]) >>>> + >>>> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true]) >>>> + >>>> +sim_add hv1 >>>> +as hv1 >>>> +ovs-vsctl add-br br-phys >>>> +ovn_attach n1 br-phys 192.168.0.1 >>>> +ovs-vsctl -- add-port br-int vif0 -- \ >>>> + set interface vif0 external-ids:iface-id=vif0 \ >>>> + options:tx_pcap=hv1/vif0-tx.pcap \ >>>> + options:rxq_pcap=hv1/vif0-rx.pcap \ >>>> + ofport-request=1 >>>> +ovs-vsctl -- add-port br-int vif1 -- \ >>>> + set interface vif1 external-ids:iface-id=vif1 \ >>>> + options:tx_pcap=hv1/vif1-tx.pcap \ >>>> + options:rxq_pcap=hv1/vif1-rx.pcap \ >>>> + ofport-request=2 >>>> 
+ovs-vsctl -- add-port br-phys ext0 -- \ >>>> + set interface ext0 \ >>>> + options:tx_pcap=hv1/ext0-tx.pcap \ >>>> + options:rxq_pcap=hv1/ext0-rx.pcap \ >>>> + ofport-request=3 >>>> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys >>>> + >>>> +wait_for_ports_up >>>> +AT_CHECK([ovn-nbctl --wait=hv sync]) >>>> + >>>> +send_packet() { >>>> + src_mac=$1 >>>> + src_ip=$2 >>>> + dst_mac=$3 >>>> + dst_ip=$4 >>>> + iface=$5 >>>> + packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>>> + ovs-appctl netdev-dummy/receive $iface $packet >>>> +} >>>> + >>>> + >>>> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet >>>> +ovn-sbctl dump-flows ls0 > sw0flows >>>> +AT_CAPTURE_FILE([sw0flows]) >>>> + >>>> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl >>>> + table=2 (ls_in_lookup_fdb ), priority=0 , dnl >>>> +match=(1), action=(next;) >>>> + table=2 (ls_in_lookup_fdb ), priority=100 , dnl >>>> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;) >>>> + table=2 (ls_in_lookup_fdb ), priority=100 , dnl >>>> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;) >>>> +]) >>>> + >>>> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl >>>> + table=3 (ls_in_put_fdb ), priority=0 , dnl >>>> +match=(1), action=(next;) >>>> +]) >>>> + >>>> + >>>> +src_mac="505400000003" >>>> +src_ip=`ip_to_hex 10.0.0.3` >>>> +dst_mac="505400000004" >>>> +dst_ip=`ip_to_hex 10.0.0.4` >>>> + >>>> +# send packet from vif0(which has unknown addr set) to vif1 >>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0 >>>> + >>>> +# send packet from vif1 to vif0(which has unknown addr set) >>>> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1 >>>> + >>>> +# send packet from underlay to vif1 >>>> +src_mac="505400000064" >>>> +src_ip=`ip_to_hex 10.0.0.100` >>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0 >>>> +AT_CHECK([ovn-nbctl --wait=hv 
sync]) >>>> + >>>> +# Make sure that OVS table 71 is populated on hv1. >>>> +AS_BOX([Check that ovn-controller programs the flows for FDB]) >>>> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt >>>> +AT_CAPTURE_FILE([hv1_offlows_table71.txt]) >>>> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl >>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]] >>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]] >>>> +]) >>>> + >>>> +OVN_CLEANUP([hv1]) >>>> +AT_CLEANUP >>>> +]) >>>> + >>>> OVN_FOR_EACH_NORTHD([ >>>> AT_SETUP([MAC binding aging]) >>>> AT_SKIP_IF([test $HAVE_SCAPY = no]) >>>> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c >>>> index 0b86eae7b..354f84a4b 100644 >>>> --- a/utilities/ovn-trace.c >>>> +++ b/utilities/ovn-trace.c >>>> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len, >>>> break; >>>> case OVNACT_MAC_CACHE_USE: >>>> break; >>>> + case OVNACT_COMMIT_FDB_LOCAL: >>>> + break; >>>> } >>>> } >>>> ofpbuf_uninit(&stack); >>> > > Regards, > Dumitru
On 11/30/23 15:54, Naveen Yerramneni wrote: > > >> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 11/30/23 09:45, Naveen Yerramneni wrote: >>> >>> >>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> On 11/29/23 07:45, naveen.yerramneni wrote: >>>>> This functionality can be enabled at the logical switch level: >>>>> - "other_config:fdb_local" can be used to enable/disable this >>>>> functionality, it is disabled by default. >>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout >>>>> for locally learned fdb flows, default timeout is 300 secs. >>>>> >>>>> If enabled, below lflow is added for each port that has unknown addr set. >>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), >>>>> action=(commit_fdb_local(timeout=<timeout>); next; >>>>> >>>>> New OVN action: "commit_fdb_local". This sets following OVS action. >>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], >>>>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) >>>>> >>>>> This is useful when OVN is managing VLAN network that has multiple ports >>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config, >>>>> if there is east-west traffic flowing between VMs part of same VLAN >>>>> deployed on different hypervisors then, MAC addrs of the source and >>>>> destination VMs keeps flapping between VM port and localnet port in >>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to >>>>> the chassis and avoids MAC flapping. >>>>> >>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> >>>>> --- >>>> >>>> Hi Naveen, >>>> >>>> Thanks a lot for the patch! 
>>>> >>>> Just a note, we already have a fix for the east-west traffic that causes >>>> FDB flapping when localnet is used: >>>> >>>> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e >>>> >>>> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14 >>>> >>>> In general, however, I think it's a very good idea to move the FDB away >>>> from the Southbound and make it local to each hypervisor. That reduces >>>> load on the Southbound among other things. >>>> >>> >>> Hi Dumitru, >>> >>> Thanks for informing about the patches. >>> Yes, local FDB reduces load on southbound. >>> >>> >>>>> include/ovn/actions.h | 7 +++ >>>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ >>>>> northd/northd.c | 26 ++++++++++ >>>>> ovn-nb.xml | 14 ++++++ >>>>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ >>>>> utilities/ovn-trace.c | 2 + >>>>> 6 files changed, 251 insertions(+) >>>>> >>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h >>>>> index 49cfe0624..85ac92cd3 100644 >>>>> --- a/include/ovn/actions.h >>>>> +++ b/include/ovn/actions.h >>>>> @@ -127,6 +127,7 @@ struct collector_set_ids; >>>>> OVNACT(CHK_LB_AFF, ovnact_result) \ >>>>> OVNACT(SAMPLE, ovnact_sample) \ >>>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ >>>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ >>>>> >>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action.
*/ >>>>> enum OVS_PACKED_ENUM ovnact_type { >>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { >>>>> uint16_t timeout; >>>>> }; >>>>> >>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */ >>>>> +struct ovnact_commit_fdb_local{ >>>>> + struct ovnact ovnact; >>>>> + uint16_t timeout; /* fdb_local flow timeout */ >>>>> +}; >>>>> + >>>>> /* Internal use by the helpers below. */ >>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); >>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); >>>>> diff --git a/lib/actions.c b/lib/actions.c >>>>> index a73fe1a1e..f5aa78db1 100644 >>>>> --- a/lib/actions.c >>>>> +++ b/lib/actions.c >>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) >>>>> ds_put_cstr(s, "mac_cache_use;"); >>>>> } >>>>> >>>>> +static void >>>>> +parse_commit_fdb_local(struct action_context *ctx, >>>>> + struct ovnact_commit_fdb_local *fdb_local) >>>>> +{ >>>>> + uint16_t timeout = 0; >>>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ >>>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { >>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>> + return; >>>>> + } >>>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { >>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>> + return; >>>>> + } >>>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { >>>>> + return; >>>>> + } >>>>> + fdb_local->timeout = timeout; >>>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ >>>>> +} >>>>> + >>>>> +static void >>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>> + struct ds *s) >>>>> +{ >>>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); >>>>> +} >>>>> + >>>>> +static void >>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) >>>>> +{ >>>>> +} >>>>> + >>>>> +static void >>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, >>>>> + struct ofpbuf *ofpacts, uint32_t cookie) >>>>> +{ >>>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); >>>>> + struct match match = MATCH_CATCHALL_INITIALIZER; >>>>> + struct ofpact_learn_spec *ol_spec; >>>>> + unsigned int imm_bytes; >>>>> + uint8_t *src_imm; >>>>> + >>>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; >>>>> + ol->idle_timeout = fdb_local->timeout; >>>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; >>>>> + ol->priority = OFP_DEFAULT_PRIORITY; >>>>> + ol->table_id = OFTABLE_GET_FDB; >>>>> + ol->cookie = htonll(cookie); >>>>> + >>>>> + /* Match on metadata of the packet that created the new table. */ >>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); >>>>> + ol_spec->dst.ofs = 0; >>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); >>>>> + >>>>> + /* Match on metadata of the packet. 
*/ >>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); >>>>> + ol_spec->dst.ofs = 0; >>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); >>>>> + >>>>> + >>>>> + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */ >>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); >>>>> + ol_spec->dst.ofs = 0; >>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; >>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); >>>>> + >>>>> + ofpact_finish_LEARN(ofpacts, &ol); >>>>> +} >>>> >>>> A difference from today's SB.FDB centralized approach is that when >>>> ovn-controller restarts these flows will be cleared, I think. >>>> >>>> Are we OK with that? I think so but if not what are the options to >>>> avoid clearing the local fdb cache on restart? >>>> >>> >>> OVS has to relearn the FDB flows whenever tables are cleared. >>> During this time, packets gets flooded. I need to think about possible >>> options if we want to retain FDB table. >>> >>> Can we take this up as an enhancement in a separate patch >>> once we identify a solution for this ? >>> >> >> Sounds good to me. >> >>>> Another difference with today's approach is that this avoids a >>>> controller action, that's great! 
>>>> >>>>> + >>>>> +static void >>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>> + const struct ovnact_encode_params *ep, >>>>> + struct ofpbuf *ofpacts) >>>>> +{ >>>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); >>>>> +} >>>>> + >>>>> static void >>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, >>>>> const struct ovnact_encode_params *ep, >>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) >>>>> parse_sample(ctx); >>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { >>>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); >>>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { >>>>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); >>>>> } else { >>>>> lexer_syntax_error(ctx->lexer, "expecting action"); >>>>> } >>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>> index d1465ddf7..de18694a0 100644 >>>>> --- a/northd/northd.c >>>>> +++ b/northd/northd.c >>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) >>>>> return smap_get_bool( ->options, "localnet_learn_fdb", false); >>>>> } >>>>> >>>>> +static bool >>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) >>>>> +{ >>>>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); >>>>> +} >>>>> + >>>> >>>> Personally, I'd prefer if we don't add another config knob and we just >>>> make this the only way FDB works. We could also document that the FDB >>>> SB table should be deprecated. >>>> >>> >>> If we want to make local FDB as default then, I think we need to handle overlay >>> use case as well. Probably, we might have to add a new stage in logical switch >>> egress pipeline to learn FDB entries for packets coming over tunnel (or) >>> something similar. >>> >>> Can we take this up in a separate patch ? 
>>> >>> >> >> Given that we have a fix already for the original problem you were >> trying to address I would prefer that we avoid adding new config knobs >> and handle both the localnet and overlay cases at the same time. >> > > Sure. > >> OTOH, why is there a difference? >> >> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the >> learn() action) into MFF_LOG_OUTPORT: >> >> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set. With >> VXLAN that's not the case but that's already a documented limitation, >> LOG_INPORT is not available after VXLAN tunneling therefore features >> that need it (like egress ACLs matching against ingress port >> identifiers) are not supported: >> >> https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842 >> > > In case of overlay, ingress and egress pipelines are processed on different nodes > assuming source and dest VMs are on different nodes. I think this means the local FDB > learns only local VMs' MACs and never learns remote VMs' MACs. > > Example: > 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1 > and vm2 is on hv2. When vm1 sends an ICMP request to vm2, the ingress pipeline runs > on hv1, learns vm1-mac, and the packet gets flooded since vm2-mac is not yet learned. > The packet reaches hv2 over the tunnel and the egress pipeline runs on hv2. Now, when vm2 > replies, the ingress pipeline runs on hv2, learns vm2-mac, and this packet also > gets flooded since vm1-mac was not learned on hv2 when the ICMP request was received. > > Ah, I see your point, you're right. So it does look like we'd need a dedicated fdb learning stage in the egress pipeline. But that is probably not that terrible.
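Naveen's ICMP example can be replayed with a toy C model of two chassis that each keep a purely local FDB. With learning only in the ingress pipeline, hv2 never learns vm1-mac, so packets in both directions keep flooding. A hypothetical egress learning stage on the receiving chassis, as suggested above, fixes the reply path. The model assumes Geneve or STT, where the logical inport is carried in the tunnel metadata; as noted in the thread, that does not hold for VXLAN. All names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FDB 8

/* Per-chassis local FDB: MAC -> logical port name. */
struct chassis {
    const char *mac[MAX_FDB];
    const char *port[MAX_FDB];
    int n;
};

static void
learn(struct chassis *c, const char *mac, const char *port)
{
    for (int i = 0; i < c->n; i++) {
        if (!strcmp(c->mac[i], mac)) {
            c->port[i] = port;     /* MAC moved: update the port. */
            return;
        }
    }
    if (c->n < MAX_FDB) {
        c->mac[c->n] = mac;
        c->port[c->n] = port;
        c->n++;
    }
}

static bool
knows(const struct chassis *c, const char *mac)
{
    for (int i = 0; i < c->n; i++) {
        if (!strcmp(c->mac[i], mac)) {
            return true;
        }
    }
    return false;
}

/* A packet from 'src_mac' on logical port 'inport' enters on 'in_hv' and
 * is delivered to the destination VM on 'out_hv'.  The ingress pipeline
 * (where the patch runs commit_fdb_local) executes on in_hv and the
 * egress pipeline on out_hv.  Returns true if the ingress FDB lookup
 * missed, i.e. the packet was flooded. */
static bool
deliver(struct chassis *in_hv, struct chassis *out_hv, const char *src_mac,
        const char *dst_mac, const char *inport, bool egress_learning)
{
    learn(in_hv, src_mac, inport);              /* Ingress learning. */
    bool flooded = !knows(in_hv, dst_mac);      /* Ingress lookup. */
    if (egress_learning && out_hv != in_hv) {
        learn(out_hv, src_mac, inport);         /* Hypothetical egress stage. */
    }
    return flooded;
}
```

With egress_learning set to false, every call in the request/reply exchange returns true (flooded); with it set to true, only the very first request floods.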
>> One more thing we need to take care of in order to be able to make local >> FDB the default is "FDB refresh": >> >> 551527a5e68e ("controller: Update FDB timestamp") >> >> https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc >> > > Sure. > >>>>> static bool >>>>> lsp_is_type_changed(const struct sbrec_port_binding *sb, >>>>> const struct nbrec_logical_switch_port *nbsp, >>>>> @@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows, >>>>> } >>>>> } >>>>> >>>>> +#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300 >>>>> + >>>> >>>> This, on the other hand, might be a good candidate for a config option. >>> >>> “fdb_local_idle_timeout” option is added in this patch to configure the timeout. >>> Default value is 300 secs. >>> >> >> True, I missed that you do that below. However, we already have >> "LS.other_config:fdb_age_threshold" in the NB database to control >> exactly the same thing. I'd just use that one. >> > > Yes, we can reuse the same.
> >>>> >>>>> static void >>>>> build_lswitch_learn_fdb_op( >>>>> struct ovn_port *op, struct hmap *lflows, >>>>> @@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op( >>>>> >>>>> if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, "") || >>>>> (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) { >>>>> + >>>>> + if (ls_is_fdb_local(op->od->nbs)) >>>>> + { >>>>> + uint32_t idle_timeout =smap_get_uint( >>>>> + &op->od->nbs->other_config, "fdb_local_idle_timeout", >>>>> + FDB_LOCAL_DEF_IDLE_TIMEOUT_S); >>>>> + ds_clear(match); >>>>> + ds_clear(actions); >>>>> + ds_put_format(match, "inport == %s", op->json_key); >>>>> + ds_put_format(actions, "commit_fdb_local(timeout=%u); next;", >>>>> + idle_timeout); >>>>> + ovn_lflow_add_with_lport_and_hint(lflows, op->od, >>>>> + S_SWITCH_IN_LOOKUP_FDB, 100, >>>>> + ds_cstr(match), ds_cstr(actions), >>>>> + op->key, &op->nbsp->header_); >>>>> + return; >>>>> + } >>>>> + >>>>> ds_clear(match); >>>>> ds_clear(actions); >>>>> ds_put_format(match, "inport == %s", op->json_key); >>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>> index fcb1c6ecc..3547ec4a6 100644 >>>>> --- a/ovn-nb.xml >>>>> +++ b/ovn-nb.xml >>>>> @@ -803,6 +803,20 @@ >>>>> </column> >>>>> </group> >>>>> >>>>> + <group title="Local FDB options"> >>>>> + <column name="other_config" key="fdb_local" >>>>> + type='{"type": "boolean"}'> >>>>> + If set to <code>true</code>, FDB flows are commited only to the >>>>> + local chassis instead of southbound DB. Default is false. >>>>> + </column> >>>>> + <column name="other_config" key="fdb_local_idle_timeout" >>>>> + type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'> >>>>> + Local FDB flows <code>idle_timeout</code> value in seconds. FDB local >>>>> + flows exceeding this timeout will be automatically removed. The value >>>>> + defaults to 300, 0 means disabled. 
>>>>> + </column> >>>>> + </group> >>>>> + >>>>> <column name="copp"> >>>>> <p> >>>>> The control plane protection policy from table <ref table="Copp"/> >>>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>>> index 92cf27581..dff50364a 100644 >>>>> --- a/tests/ovn.at >>>>> +++ b/tests/ovn.at >>>>> @@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1]) >>>>> AT_CLEANUP >>>>> ]) >>>>> >>>>> +OVN_FOR_EACH_NORTHD([ >>>>> +AT_SETUP([Local FDB MAC learning]) >>>>> +ovn_start >>>>> +net_add n1 >>>>> + >>>>> +AT_CHECK([ovn-nbctl ls-add ls0]) >>>>> + >>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif0]) >>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"]) >>>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2]) >>>>> + >>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 vif1]) >>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"]) >>>>> +AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3]) >>>>> + >>>>> +AT_CHECK([ovn-nbctl lsp-add ls0 ln_port]) >>>>> +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown]) >>>>> +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet]) >>>>> +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1]) >>>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true]) >>>>> +AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1]) >>>>> + >>>>> +AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true]) >>>>> + >>>>> +sim_add hv1 >>>>> +as hv1 >>>>> +ovs-vsctl add-br br-phys >>>>> +ovn_attach n1 br-phys 192.168.0.1 >>>>> +ovs-vsctl -- add-port br-int vif0 -- \ >>>>> + set interface vif0 external-ids:iface-id=vif0 \ >>>>> + options:tx_pcap=hv1/vif0-tx.pcap \ >>>>> + options:rxq_pcap=hv1/vif0-rx.pcap \ >>>>> + ofport-request=1 >>>>> +ovs-vsctl -- add-port br-int vif1 -- \ >>>>> + set interface vif1 external-ids:iface-id=vif1 \ >>>>> + options:tx_pcap=hv1/vif1-tx.pcap \ >>>>> + 
options:rxq_pcap=hv1/vif1-rx.pcap \ >>>>> + ofport-request=2 >>>>> +ovs-vsctl -- add-port br-phys ext0 -- \ >>>>> + set interface ext0 \ >>>>> + options:tx_pcap=hv1/ext0-tx.pcap \ >>>>> + options:rxq_pcap=hv1/ext0-rx.pcap \ >>>>> + ofport-request=3 >>>>> +ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys >>>>> + >>>>> +wait_for_ports_up >>>>> +AT_CHECK([ovn-nbctl --wait=hv sync]) >>>>> + >>>>> +send_packet() { >>>>> + src_mac=$1 >>>>> + src_ip=$2 >>>>> + dst_mac=$3 >>>>> + dst_ip=$4 >>>>> + iface=$5 >>>>> + packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>>>> + ovs-appctl netdev-dummy/receive $iface $packet >>>>> +} >>>>> + >>>>> + >>>>> +# Check that there is commit_fdb_local_fdb() flow added by ovn-northd for vif0 and localnet >>>>> +ovn-sbctl dump-flows ls0 > sw0flows >>>>> +AT_CAPTURE_FILE([sw0flows]) >>>>> + >>>>> +AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl >>>>> + table=2 (ls_in_lookup_fdb ), priority=0 , dnl >>>>> +match=(1), action=(next;) >>>>> + table=2 (ls_in_lookup_fdb ), priority=100 , dnl >>>>> +match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;) >>>>> + table=2 (ls_in_lookup_fdb ), priority=100 , dnl >>>>> +match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;) >>>>> +]) >>>>> + >>>>> +AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl >>>>> + table=3 (ls_in_put_fdb ), priority=0 , dnl >>>>> +match=(1), action=(next;) >>>>> +]) >>>>> + >>>>> + >>>>> +src_mac="505400000003" >>>>> +src_ip=`ip_to_hex 10.0.0.3` >>>>> +dst_mac="505400000004" >>>>> +dst_ip=`ip_to_hex 10.0.0.4` >>>>> + >>>>> +# send packet from vif0(which has unknown addr set) to vif1 >>>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip vif0 >>>>> + >>>>> +# send packet from vif1 to vif0(which has unknown addr set) >>>>> +send_packet $dst_mac $dst_ip $src_mac $src_ip vif1 >>>>> + >>>>> +# send packet from underlay to vif1 >>>>> +src_mac="505400000064" >>>>> 
+src_ip=`ip_to_hex 10.0.0.100` >>>>> +send_packet $src_mac $src_ip $dst_mac $dst_ip ext0 >>>>> +AT_CHECK([ovn-nbctl --wait=hv sync]) >>>>> + >>>>> +# Make sure that OVS table 71 is populated on hv1. >>>>> +AS_BOX([Check that ovn-controller programs the flows for FDB]) >>>>> +as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt >>>>> +AT_CAPTURE_FILE([hv1_offlows_table71.txt]) >>>>> +AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl >>>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]] >>>>> +idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]] >>>>> +]) >>>>> + >>>>> +OVN_CLEANUP([hv1]) >>>>> +AT_CLEANUP >>>>> +]) >>>>> + >>>>> OVN_FOR_EACH_NORTHD([ >>>>> AT_SETUP([MAC binding aging]) >>>>> AT_SKIP_IF([test $HAVE_SCAPY = no]) >>>>> diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c >>>>> index 0b86eae7b..354f84a4b 100644 >>>>> --- a/utilities/ovn-trace.c >>>>> +++ b/utilities/ovn-trace.c >>>>> @@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len, >>>>> break; >>>>> case OVNACT_MAC_CACHE_USE: >>>>> break; >>>>> + case OVNACT_COMMIT_FDB_LOCAL: >>>>> + break; >>>>> } >>>>> } >>>>> ofpbuf_uninit(&stack); >>>> >> >> Regards, >> Dumitru >
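parse_commit_fdb_local() in the quoted patch accepts exactly one argument, timeout=, read as a 16-bit value, which is consistent with the maxInteger of 65535 in the proposed ovn-nb.xml entry. The standalone C sketch below mirrors that contract without the OVN lexer. The function name is hypothetical. It does no whitespace skipping, and it expects the text without the trailing ';' that format_COMMIT_FDB_LOCAL() appends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone parser for "commit_fdb_local(timeout=N)".
 * Returns false on a wrong keyword, a missing or non-decimal N, a value
 * above 65535 (the patch delegates this range check to
 * action_parse_uint16()), or any text after the closing ')'. */
static bool
parse_commit_fdb_local_text(const char *s, uint16_t *timeout)
{
    static const char prefix[] = "commit_fdb_local(timeout=";
    size_t plen = sizeof prefix - 1;

    if (strncmp(s, prefix, plen)) {
        return false;
    }
    const char *p = s + plen;
    if (*p < '0' || *p > '9') {     /* Reject empty, signed or spaced N. */
        return false;
    }

    char *end;
    unsigned long v = strtoul(p, &end, 10);
    if (v > UINT16_MAX || strcmp(end, ")")) {
        return false;
    }
    *timeout = (uint16_t) v;
    return true;
}
```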
"naveen.yerramneni" <naveen.yerramneni@nutanix.com> writes: > This functionality can be enabled at the logical switch level: > - "other_config:fdb_local" can be used to enable/disable this > functionality, it is disabled by default. > - "other_config:fdb_local_idle_timeout" sepcifies idle timeout > for locally learned fdb flows, default timeout is 300 secs. > > If enabled, below lflow is added for each port that has unknown addr set. > - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), > action=(commit_fdb_local(timeout=<timeout>); next; > > New OVN action: "commit_fdb_local". This sets following OVS action. > - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], > NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) > > This is useful when OVN is managing VLAN network that has multiple ports > set with unknown addr and localnet_learn_fdb is enabled. With this config, > if there is east-west traffic flowing between VMs part of same VLAN > deployed on different hypervisors then, MAC addrs of the source and > destination VMs keeps flapping between VM port and localnet port in > Southbound FDB table. Enabling fdb_local config makes fdb table local to > the chassis and avoids MAC flapping. > > Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> > --- Recheck-request: github-robot-_ovn-kubernetes
On 11/30/23 16:32, Dumitru Ceara wrote: > On 11/30/23 15:54, Naveen Yerramneni wrote: >> >> >>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> On 11/30/23 09:45, Naveen Yerramneni wrote: >>>> >>>> >>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>> >>>>> On 11/29/23 07:45, naveen.yerramneni wrote: >>>>>> This functionality can be enabled at the logical switch level: >>>>>> - "other_config:fdb_local" can be used to enable/disable this >>>>>> functionality, it is disabled by default. >>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout >>>>>> for locally learned fdb flows, default timeout is 300 secs. >>>>>> >>>>>> If enabled, below lflow is added for each port that has unknown addr set. >>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), >>>>>> action=(commit_fdb_local(timeout=<timeout>); next; >>>>>> >>>>>> New OVN action: "commit_fdb_local". This sets following OVS action. >>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], >>>>>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) >>>>>> >>>>>> This is useful when OVN is managing VLAN network that has multiple ports >>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config, >>>>>> if there is east-west traffic flowing between VMs part of same VLAN >>>>>> deployed on different hypervisors then, MAC addrs of the source and >>>>>> destination VMs keeps flapping between VM port and localnet port in >>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to >>>>>> the chassis and avoids MAC flapping. >>>>>> >>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> >>>>>> --- >>>>> >>>>> Hi Naveen, >>>>> >>>>> Thanks a lot for the patch! 
>>>>> >>>>> Just a note, we already have a fix for the east-west traffic that causes >>>>> FDB flapping when localnet is used: >>>>> >>>>> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e >>>>> >>>>> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14 >>>>> >>>>> In general, however, I think it's a very good idea to move the FDB away >>>>> from the Southbound and make it local to each hypervisor. That reduces >>>>> load on the Southbound among other things. >>>>> >>>> >>>> Hi Dumitru, >>>> >>>> Thanks for informing about the patches. >>>> Yes, local FDB reduces load on southbound. >>>> >>>> >>>>>> include/ovn/actions.h | 7 +++ >>>>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ >>>>>> northd/northd.c | 26 ++++++++++ >>>>>> ovn-nb.xml | 14 ++++++ >>>>>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ >>>>>> utilities/ovn-trace.c | 2 + >>>>>> 6 files changed, 251 insertions(+) >>>>>> >>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h >>>>>> index 49cfe0624..85ac92cd3 100644 >>>>>> --- a/include/ovn/actions.h >>>>>> +++ b/include/ovn/actions.h >>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids; >>>>>> OVNACT(CHK_LB_AFF, ovnact_result) \ >>>>>> OVNACT(SAMPLE, ovnact_sample) \ >>>>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ >>>>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ >>>>>> >>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action.
*/ >>>>>> enum OVS_PACKED_ENUM ovnact_type { >>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { >>>>>> uint16_t timeout; >>>>>> }; >>>>>> >>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */ >>>>>> +struct ovnact_commit_fdb_local{ >>>>>> + struct ovnact ovnact; >>>>>> + uint16_t timeout; /* fdb_local flow timeout */ >>>>>> +}; >>>>>> + >>>>>> /* Internal use by the helpers below. */ >>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); >>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); >>>>>> diff --git a/lib/actions.c b/lib/actions.c >>>>>> index a73fe1a1e..f5aa78db1 100644 >>>>>> --- a/lib/actions.c >>>>>> +++ b/lib/actions.c >>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) >>>>>> ds_put_cstr(s, "mac_cache_use;"); >>>>>> } >>>>>> >>>>>> +static void >>>>>> +parse_commit_fdb_local(struct action_context *ctx, >>>>>> + struct ovnact_commit_fdb_local *fdb_local) >>>>>> +{ >>>>>> + uint16_t timeout = 0; >>>>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ >>>>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { >>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>> + return; >>>>>> + } >>>>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { >>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>> + return; >>>>>> + } >>>>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { >>>>>> + return; >>>>>> + } >>>>>> + fdb_local->timeout = timeout; >>>>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ >>>>>> +} >>>>>> + >>>>>> +static void >>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>> + struct ds *s) >>>>>> +{ >>>>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); >>>>>> +} >>>>>> + >>>>>> +static void >>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) >>>>>> +{ >>>>>> +} >>>>>> + >>>>>> +static void >>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, >>>>>> + struct ofpbuf *ofpacts, uint32_t cookie) >>>>>> +{ >>>>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); >>>>>> + struct match match = MATCH_CATCHALL_INITIALIZER; >>>>>> + struct ofpact_learn_spec *ol_spec; >>>>>> + unsigned int imm_bytes; >>>>>> + uint8_t *src_imm; >>>>>> + >>>>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; >>>>>> + ol->idle_timeout = fdb_local->timeout; >>>>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; >>>>>> + ol->priority = OFP_DEFAULT_PRIORITY; >>>>>> + ol->table_id = OFTABLE_GET_FDB; >>>>>> + ol->cookie = htonll(cookie); >>>>>> + >>>>>> + /* Match on metadata of the packet that created the new table. */ >>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); >>>>>> + ol_spec->dst.ofs = 0; >>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); >>>>>> + >>>>>> + /* Match on eth.dst, using the eth.src of the packet.
*/ >>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); >>>>>> + ol_spec->dst.ofs = 0; >>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); >>>>>> + >>>>>> + >>>>>> + /* Load MFF_LOG_OUTPORT from MFF_LOG_INPORT. */ >>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); >>>>>> + ol_spec->dst.ofs = 0; >>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; >>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); >>>>>> + >>>>>> + ofpact_finish_LEARN(ofpacts, &ol); >>>>>> +} >>>>> >>>>> A difference from today's SB.FDB centralized approach is that when >>>>> ovn-controller restarts these flows will be cleared, I think. >>>>> >>>>> Are we OK with that? I think so but if not what are the options to >>>>> avoid clearing the local fdb cache on restart? >>>>> >>>> >>>> OVS has to relearn the FDB flows whenever the tables are cleared. >>>> During this time, packets get flooded. I need to think about possible >>>> options if we want to retain the FDB table. >>>> >>>> Can we take this up as an enhancement in a separate patch >>>> once we identify a solution for this? >>>> >>> >>> Sounds good to me. >>> >>>>> Another difference with today's approach is that this avoids a >>>>> controller action, that's great!
>>>>> >>>>>> + >>>>>> +static void >>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>> + const struct ovnact_encode_params *ep, >>>>>> + struct ofpbuf *ofpacts) >>>>>> +{ >>>>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); >>>>>> +} >>>>>> + >>>>>> static void >>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, >>>>>> const struct ovnact_encode_params *ep, >>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) >>>>>> parse_sample(ctx); >>>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { >>>>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); >>>>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { >>>>>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); >>>>>> } else { >>>>>> lexer_syntax_error(ctx->lexer, "expecting action"); >>>>>> } >>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>> index d1465ddf7..de18694a0 100644 >>>>>> --- a/northd/northd.c >>>>>> +++ b/northd/northd.c >>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) >>>>>> return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false); >>>>>> } >>>>>> >>>>>> +static bool >>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) >>>>>> +{ >>>>>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); >>>>>> +} >>>>>> + >>>>> >>>>> Personally, I'd prefer if we don't add another config knob and we just >>>>> make this the only way FDB works. We could also document that the FDB >>>>> SB table should be deprecated. >>>>> >>>> >>>> If we want to make local FDB the default, then I think we need to handle the overlay >>>> use case as well. We might have to add a new stage in the logical switch >>>> egress pipeline to learn FDB entries for packets coming over a tunnel, or >>>> something similar. >>>> >>>> Can we take this up in a separate patch?
>>>> >>>> >>> >>> Given that we already have a fix for the original problem you were >>> trying to address, I would prefer that we avoid adding new config knobs >>> and handle both the localnet and overlay cases at the same time. >>> >> >> Sure. >> >>> OTOH, why is there a difference? >>> >>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the >>> learn() action) into MFF_LOG_OUTPORT: >>> >>> With Geneve and STT overlays, MFF_LOG_INPORT is still correctly set. With >>> VXLAN that's not the case, but that's already a documented limitation: >>> LOG_INPORT is not available after VXLAN tunneling, therefore features >>> that need it (like egress ACLs matching against ingress port >>> identifiers) are not supported: >>> >>> https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842 >>> >> >> In the overlay case, the ingress and egress pipelines are processed on different nodes >> when the source and destination VMs are on different nodes. I think this makes the local FDB >> learn only local VMs' MACs; it never learns remote VMs' MACs. >> >> Example: >> 2 VMs with unknown addr set are connected to the same logical switch; vm1 is on hv1 >> and vm2 is on hv2. When vm1 sends an ICMP request to vm2, the ingress pipeline runs >> on hv1, which learns vm1-mac, and the packet gets flooded since vm2-mac is not yet learned. >> The packet reaches hv2 over the tunnel and the egress pipeline is executed on hv2. When vm2 >> responds, the ingress pipeline runs on hv2, which learns vm2-mac, and this reply also >> gets flooded, since vm1-mac was not learned on hv2 when the ICMP request was received. >> >> > > Ah, I see your point, you're right.
But that is > probably not that terrible. > >>> One more thing we need to take care of in order to be able to make local >>> FDB the default is "FDB refresh": >>> >>> 551527a5e68e ("controller: Update FDB timestamp") >>> >>> https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc >>> >> >> Sure. >> Hi Naveen, Just touching base on this again, I was curious if you're still planning to work on making the FDB table local; I still think it's beneficial in general. Thanks, Dumitru
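The ingress-only learning problem in the overlay example above can be reproduced with a small toy model. The C sketch below is not OVN code: `fdb_learn()`, `fdb_lookup()`, `send_pkt()` and `run_scenario()` are hypothetical names, each table entry merely stands in for one flow that the proposed `learn()` action would install (match on datapath metadata and `eth.dst`, load the learned port into the outport), and the idle timeout is checked lazily on lookup rather than enforced by OVS. It assumes, as in the example, that a flooded packet still reaches the remote VM and runs the egress pipeline on the remote chassis; the `egress_learn` flag models the hypothetical extra egress learning stage discussed in the thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model (not OVN code) of a chassis-local FDB.  Each entry stands in
 * for one flow installed by learn(): match (metadata, eth.dst), then load
 * the learned logical port into the outport. */
#define FDB_SIZE 16
#define PORT_FLOOD 0        /* "No FDB hit": the packet is flooded. */
#define IDLE_TIMEOUT 300    /* Seconds; the patch's proposed default. */

struct fdb_entry {
    bool in_use;
    uint32_t dp_key;        /* Logical switch (metadata). */
    uint64_t mac;           /* Learned from eth.src, matched as eth.dst. */
    uint32_t port_key;      /* Learned inport, loaded into outport. */
    unsigned int last_hit;  /* Time of last use, for the idle timeout. */
};

/* Models learn() on one chassis: add or refresh (dp_key, mac). */
static void
fdb_learn(struct fdb_entry fdb[], uint32_t dp_key, uint64_t mac,
          uint32_t port_key, unsigned int now)
{
    struct fdb_entry *free_slot = NULL;

    for (size_t i = 0; i < FDB_SIZE; i++) {
        struct fdb_entry *e = &fdb[i];
        if (e->in_use && e->dp_key == dp_key && e->mac == mac) {
            e->port_key = port_key;
            e->last_hit = now;
            return;
        }
        if (!e->in_use && !free_slot) {
            free_slot = e;
        }
    }
    if (free_slot) {
        *free_slot = (struct fdb_entry) {
            .in_use = true, .dp_key = dp_key, .mac = mac,
            .port_key = port_key, .last_hit = now,
        };
    }
}

/* Models the FDB table lookup; idle entries expire lazily here. */
static uint32_t
fdb_lookup(struct fdb_entry fdb[], uint32_t dp_key, uint64_t mac,
           unsigned int now)
{
    for (size_t i = 0; i < FDB_SIZE; i++) {
        struct fdb_entry *e = &fdb[i];
        if (!e->in_use || e->dp_key != dp_key || e->mac != mac) {
            continue;
        }
        if (now - e->last_hit >= IDLE_TIMEOUT) {
            e->in_use = false;
            return PORT_FLOOD;
        }
        e->last_hit = now;
        return e->port_key;
    }
    return PORT_FLOOD;
}

/* One packet between VMs on different chassis: ingress (learn + lookup)
 * runs on 'src', egress on 'dst'.  Returns true if it was flooded. */
static bool
send_pkt(struct fdb_entry src[], struct fdb_entry dst[], uint32_t dp_key,
         uint64_t eth_src, uint32_t inport, uint64_t eth_dst,
         bool egress_learn, unsigned int now)
{
    fdb_learn(src, dp_key, eth_src, inport, now);
    bool flooded = fdb_lookup(src, dp_key, eth_dst, now) == PORT_FLOOD;
    if (egress_learn) {     /* Hypothetical egress learning stage. */
        fdb_learn(dst, dp_key, eth_src, inport, now);
    }
    return flooded;
}

/* vm1 (port 1, on hv1) and vm2 (port 2, on hv2) exchange a request, a
 * reply and a second request.  Returns how many packets were flooded. */
static int
run_scenario(bool egress_learn)
{
    struct fdb_entry hv1[FDB_SIZE], hv2[FDB_SIZE];
    const uint32_t ls = 1;
    const uint64_t vm1 = 0x0a0000000001, vm2 = 0x0a0000000002;
    int floods = 0;

    memset(hv1, 0, sizeof hv1);
    memset(hv2, 0, sizeof hv2);
    floods += send_pkt(hv1, hv2, ls, vm1, 1, vm2, egress_learn, 1);
    floods += send_pkt(hv2, hv1, ls, vm2, 2, vm1, egress_learn, 2);
    floods += send_pkt(hv1, hv2, ls, vm1, 1, vm2, egress_learn, 3);
    return floods;
}
```

In this model `run_scenario(false)` returns 3: the request, the reply and the second request are all flooded, because each chassis only ever learns the MAC of its local VM. `run_scenario(true)` returns 1: only the first request is flooded. Zeroing a chassis' table, which roughly models the clearing of learned flows on an ovn-controller restart mentioned in the thread, puts that chassis back into the flood-until-relearned state.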
> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote: > > [...] > > Hi Naveen, > > Just touching base on this again, I was curious if you're still planning > to work on making the FDB table local; I still think it's beneficial in > general. > > Thanks, > Dumitru Hi Dumitru, Yes, I will work on this. I will try to send out the patch by Jan 1st week. Thanks, Naveen
On 12/18/23 16:17, Naveen Yerramneni wrote: > > >> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 11/30/23 16:32, Dumitru Ceara wrote: >>> On 11/30/23 15:54, Naveen Yerramneni wrote: >>>> >>>> >>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>> >>>>> On 11/30/23 09:45, Naveen Yerramneni wrote: >>>>>> >>>>>> >>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>> >>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote: >>>>>>>> This functionality can be enabled at the logical switch level: >>>>>>>> - "other_config:fdb_local" can be used to enable/disable this >>>>>>>> functionality, it is disabled by default. >>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout >>>>>>>> for locally learned fdb flows, default timeout is 300 secs. >>>>>>>> >>>>>>>> If enabled, below lflow is added for each port that has unknown addr set. >>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), >>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next; >>>>>>>> >>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action. >>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], >>>>>>>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) >>>>>>>> >>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports >>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. With this config, >>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN >>>>>>>> deployed on different hypervisors then, MAC addrs of the source and >>>>>>>> destination VMs keeps flapping between VM port and localnet port in >>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to >>>>>>>> the chassis and avoids MAC flapping. 
>>>>>>>> >>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> >>>>>>>> --- >>>>>>> >>>>>>> Hi Naveen, >>>>>>> >>>>>>> Thanks a lot for the patch! >>>>>>> >>>>>>> Just a note, we already have a fix for the east-west traffic that causes >>>>>>> FDB flapping when localnet is used: >>>>>>> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= >>>>>>> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= >>>>>>> >>>>>>> In general, however, I think it's a very good idea to move the FDB away >>>>>>> from the Southbound and make it local to each hypervisor. That reduces >>>>>>> load on the Southbound among other things. >>>>>>> >>>>>> >>>>>> Hi Dumitru, >>>>>> >>>>>> Thanks for informing about the patches. >>>>>> Yes, local FDB reduces load on southbound. 
>>>>>> >>>>>> >>>>>>>> include/ovn/actions.h | 7 +++ >>>>>>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ >>>>>>>> northd/northd.c | 26 ++++++++++ >>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ >>>>>>>> utilities/ovn-trace.c | 2 + >>>>>>>> 6 files changed, 251 insertions(+) >>>>>>>> >>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h >>>>>>>> index 49cfe0624..85ac92cd3 100644 >>>>>>>> --- a/include/ovn/actions.h >>>>>>>> +++ b/include/ovn/actions.h >>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids; >>>>>>>> OVNACT(CHK_LB_AFF, ovnact_result) \ >>>>>>>> OVNACT(SAMPLE, ovnact_sample) \ >>>>>>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ >>>>>>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ >>>>>>>> >>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */ >>>>>>>> enum OVS_PACKED_ENUM ovnact_type { >>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { >>>>>>>> uint16_t timeout; >>>>>>>> }; >>>>>>>> >>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */ >>>>>>>> +struct ovnact_commit_fdb_local{ >>>>>>>> + struct ovnact ovnact; >>>>>>>> + uint16_t timeout; /* fdb_local flow timeout */ >>>>>>>> +}; >>>>>>>> + >>>>>>>> /* Internal use by the helpers below. */ >>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); >>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); >>>>>>>> diff --git a/lib/actions.c b/lib/actions.c >>>>>>>> index a73fe1a1e..f5aa78db1 100644 >>>>>>>> --- a/lib/actions.c >>>>>>>> +++ b/lib/actions.c >>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) >>>>>>>> ds_put_cstr(s, "mac_cache_use;"); >>>>>>>> } >>>>>>>> >>>>>>>> +static void >>>>>>>> +parse_commit_fdb_local(struct action_context *ctx, >>>>>>>> + struct ovnact_commit_fdb_local *fdb_local) >>>>>>>> +{ >>>>>>>> + uint16_t timeout = 0; >>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. 
*/ >>>>>>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { >>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { >>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + fdb_local->timeout = timeout; >>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */ >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void >>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>>>> + struct ds *s) >>>>>>>> +{ >>>>>>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void >>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) >>>>>>>> +{ >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void >>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, >>>>>>>> + struct ofpbuf *ofpacts, uint32_t cookie) >>>>>>>> +{ >>>>>>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); >>>>>>>> + struct match match = MATCH_CATCHALL_INITIALIZER; >>>>>>>> + struct ofpact_learn_spec *ol_spec; >>>>>>>> + unsigned int imm_bytes; >>>>>>>> + uint8_t *src_imm; >>>>>>>> + >>>>>>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; >>>>>>>> + ol->idle_timeout = fdb_local->timeout; >>>>>>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; >>>>>>>> + ol->priority = OFP_DEFAULT_PRIORITY; >>>>>>>> + ol->table_id = OFTABLE_GET_FDB; >>>>>>>> + ol->cookie = htonll(cookie); >>>>>>>> + >>>>>>>> + /* Match on metadata of the packet that created the new table. 
*/ >>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); >>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); >>>>>>>> + >>>>>>>> + /* Match on metadata of the packet. */ >>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); >>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); >>>>>>>> + >>>>>>>> + >>>>>>>> + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */ >>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); >>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; >>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); >>>>>>>> + >>>>>>>> + ofpact_finish_LEARN(ofpacts, &ol); >>>>>>>> +} >>>>>>> >>>>>>> A difference from today's SB.FDB centralized approach is that when >>>>>>> ovn-controller restarts these flows will be cleared, I think. >>>>>>> >>>>>>> Are we OK with that? I think so but if not what are the options to >>>>>>> avoid clearing the local fdb cache on restart? >>>>>>> >>>>>> >>>>>> OVS has to relearn the FDB flows whenever tables are cleared. >>>>>> During this time, packets gets flooded. I need to think about possible >>>>>> options if we want to retain FDB table. 
>>>>>> >>>>>> Can we take this up as an enhancement in a separate patch >>>>>> once we identify a solution for this ? >>>>>> >>>>> >>>>> Sounds good to me. >>>>> >>>>>>> Another difference with today's approach is that this avoids a >>>>>>> controller action, that's great! >>>>>>> >>>>>>>> + >>>>>>>> +static void >>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>>>> + const struct ovnact_encode_params *ep, >>>>>>>> + struct ofpbuf *ofpacts) >>>>>>>> +{ >>>>>>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); >>>>>>>> +} >>>>>>>> + >>>>>>>> static void >>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, >>>>>>>> const struct ovnact_encode_params *ep, >>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) >>>>>>>> parse_sample(ctx); >>>>>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { >>>>>>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); >>>>>>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { >>>>>>>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); >>>>>>>> } else { >>>>>>>> lexer_syntax_error(ctx->lexer, "expecting action"); >>>>>>>> } >>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>> index d1465ddf7..de18694a0 100644 >>>>>>>> --- a/northd/northd.c >>>>>>>> +++ b/northd/northd.c >>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) >>>>>>>> return smap_get_bool( ->options, "localnet_learn_fdb", false); >>>>>>>> } >>>>>>>> >>>>>>>> +static bool >>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) >>>>>>>> +{ >>>>>>>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); >>>>>>>> +} >>>>>>>> + >>>>>>> >>>>>>> Personally, I'd prefer if we don't add another config knob and we just >>>>>>> make this the only way FDB works. We could also document that the FDB >>>>>>> SB table should be deprecated. 
>>>>>>>
>>>>>>
>>>>>> If we want to make local FDB as default then, I think we need to handle overlay
>>>>>> use case as well. Probably, we might have to add a new stage in logical switch
>>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or)
>>>>>> something similar.
>>>>>>
>>>>>> Can we take this up in a separate patch?
>>>>>>
>>>>>>
>>>>>
>>>>> Given that we have a fix already for the original problem you were
>>>>> trying to address I would prefer that we avoid adding new config knobs
>>>>> and handle both the localnet and overlay cases at the same time.
>>>>>
>>>>
>>>> Sure.
>>>>
>>>>> OTOH, why is there a difference?
>>>>>
>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the
>>>>> learn() action) into MFF_LOG_OUTPORT:
>>>>>
>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set.  With
>>>>> VXLAN that's not the case but that's already a documented limitation,
>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features
>>>>> that need it (like egress ACLs matching against ingress port
>>>>> identifiers) are not supported:
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e=
>>>>>
>>>>
>>>> In case of overlay, ingress and egress pipelines are processed on different nodes
>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>>>
>>>> Example:
>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>>>> and vm2 is on hv2.
>>>> When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>>>> on hv1 and it learns vm1-mac and this packet gets flooded since vm2-mac is not yet learned.
>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is
>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>>>
>>>>
>>>
>>> Ah, I see your point, you're right.  So it does look like we'd need a
>>> dedicated fdb learning stage in the egress pipeline.  But that is
>>> probably not that terrible.
>>>
>>>>> One more thing we need to take care of in order to be able to make local
>>>>> FDB the default is "FDB refresh":
>>>>>
>>>>> 551527a5e68e ("controller: Update FDB timestamp")
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_551527a5e68e7233ad80d212d549df98f13e37bc&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=dNv1oya7S00oKbz1E9i5IyhqDOsEajuutcKSV6bIAfs&e=
>>>>>
>>>>
>>>> Sure.
>>>>
>>
>> Hi Naveen,
>>
>> Just touching base on this again, I was curious if you're still planning
>> to work on making the FDB table local; I still think it's beneficial in
>> general.
>>
>> Thanks,
>> Dumitru
>
>
> Hi Dumitru,
>
> Yes, I will work on this. I will try to send out the patch by Jan 1st week.
>

Awesome, thanks Naveen!  No rush though, I just wanted to confirm
whether you were interested in pursuing this.

Best regards,
Dumitru
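The sketch below makes the proposed `commit_fdb_local` behaviour concrete. It is a hypothetical user-space model of the per-chassis table that the learn() action above would populate. Each entry is keyed by (logical datapath metadata, MAC) and stores the logical inport. An entry idles out after `timeout` seconds without a hit, and a timeout of 0 means "never", as in OpenFlow. `struct fdb`, `fdb_learn()` and `fdb_lookup()` are invented names for illustration, not OVN or OVS code. The model also assumes that port key 0 is never a valid logical port.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FDB_MAX 64 /* Hypothetical fixed capacity. */

/* One learned entry.  The key mirrors the learned flow's match
 * (OXM_OF_METADATA[], NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[]); the value mirrors
 * its action (load:NXM_NX_REG14[]->NXM_NX_REG15[]). */
struct fdb_entry {
    bool in_use;
    uint64_t metadata;  /* Logical datapath of the learning packet. */
    uint8_t mac[6];     /* Learned from eth.src, later matched as eth.dst. */
    uint32_t port_key;  /* Logical inport that becomes the outport. */
    uint64_t last_hit;  /* Seconds; refreshed by learning and by lookups. */
};

struct fdb {
    struct fdb_entry entries[FDB_MAX];
    uint16_t idle_timeout; /* Like learn(idle_timeout=N); 0 = never expire. */
};

static struct fdb_entry *
fdb_find(struct fdb *fdb, uint64_t metadata, const uint8_t mac[6])
{
    for (size_t i = 0; i < FDB_MAX; i++) {
        struct fdb_entry *e = &fdb->entries[i];
        if (e->in_use && e->metadata == metadata && !memcmp(e->mac, mac, 6)) {
            return e;
        }
    }
    return NULL;
}

/* Ingress, for packets from a port with "unknown" addresses.  Re-learning
 * the same (metadata, MAC) replaces the port, much as learn() re-adding a
 * flow with an identical match replaces the old one. */
static bool
fdb_learn(struct fdb *fdb, uint64_t metadata, const uint8_t src_mac[6],
          uint32_t inport, uint64_t now)
{
    struct fdb_entry *e = fdb_find(fdb, metadata, src_mac);
    for (size_t i = 0; !e && i < FDB_MAX; i++) {
        if (!fdb->entries[i].in_use) {
            e = &fdb->entries[i];
        }
    }
    if (!e) {
        return false; /* Table full. */
    }
    e->in_use = true;
    e->metadata = metadata;
    memcpy(e->mac, src_mac, 6);
    e->port_key = inport;
    e->last_hit = now;
    return true;
}

/* Returns the learned port for dst_mac, or 0 when unknown (flood). */
static uint32_t
fdb_lookup(struct fdb *fdb, uint64_t metadata, const uint8_t dst_mac[6],
           uint64_t now)
{
    struct fdb_entry *e = fdb_find(fdb, metadata, dst_mac);
    if (!e) {
        return 0;
    }
    if (fdb->idle_timeout && now - e->last_hit >= fdb->idle_timeout) {
        e->in_use = false; /* Idled out, like an expired learned flow. */
        return 0;
    }
    e->last_hit = now; /* A hit resets the idle timer. */
    return e->port_key;
}
```

Each chassis keeps its own copy of this table. Two hypervisors that see the same MAC behind different ports (for example a VM port and the localnet port) therefore update separate tables instead of overwriting one shared Southbound row. The trade-off, raised earlier in the thread, is that such a table is lost whenever the flows are cleared, for example when ovn-controller restarts.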
> On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote:
> [...]
>>>>> In case of overlay, ingress and egress pipelines are processed on different nodes
>>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB
>>>>> to learn only local VMs MACs and it never learn remote VMs MACs.
>>>>>
>>>>> Example:
>>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1
>>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens
>>>>> on hv1 and it learns vm1-mac and this packet gets flooded since vm2-mac is not yet learned.
>>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2.
>>>>> Now, when vm2 is
>>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also
>>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received.
>>>>>
>>>>>
>>>>
>>>> Ah, I see your point, you're right.  So it does look like we'd need a
>>>> dedicated fdb learning stage in the egress pipeline.  But that is
>>>> probably not that terrible.
>>>>

Hi Dumitru,

I started working on this patch, sorry for the delay.

On further investigation, I realised that we cannot completely remove the FDB
table from the Southbound, even if we add a new FDB learning stage in the
egress pipeline.

Here is an example:

Two VMs on different overlay subnets have unknown addresses set. VM1 is
connected to subnet-1 (LS1) and runs on hv1; VM2 is connected to subnet-2
(LS2) and runs on hv2. Both subnets are connected to the same logical
router (LR).

When VM1 sends an ICMP packet to VM2, the pipeline stages are processed as
follows:
- ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns the VM1
  MAC, and the packet gets flooded in LS2.
- ICMP Req on HV2: LS2-out.
- ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in. HV2 learns the VM2
  MAC, and the packet gets flooded in LS1.
- ICMP Resp on HV1: LS1-out.
- Every packet gets flooded, since HV1 never learns the VM2 MAC and HV2 never
  learns the VM1 MAC.

I see two options. One is to make the local FDB implementation applicable
only to VLAN subnets, disabled by default. The other is to implement it for
both VLAN and overlay subnets, disabled by default, and document the
limitation. In that case we could also explore ignoring this setting in
northd when multiple overlay subnets connected to the same LR have ports
with unknown addresses configured.

Please let me know your thoughts.

Thanks,
Naveen
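The routed two-switch scenario above can be reproduced with a toy model. This hypothetical C sketch is not OVN code, and it rests on three assumptions:

- A chassis-local FDB is written only in the source switch's ingress pipeline.
- Every ingress stage of a routed packet (source switch, router, destination switch) runs on the sending chassis.
- Only the destination switch's egress stage runs on the remote chassis.

The router rewrites eth.src to its own port MAC. An egress learning stage on the remote chassis could therefore only learn the router MAC, so the model learns nothing there. Timeouts, localnet ports and tunnel encapsulations are ignored.

```c
#include <stdbool.h>
#include <string.h>

enum { HV1, HV2, N_HV };  /* Chassis indices. */
enum { LS1, LS2, N_LS };  /* Logical switch indices. */
#define MAX_MACS 4

/* Chassis-local FDB: MACs learned per logical switch. */
struct local_fdb {
    const char *macs[N_LS][MAX_MACS];
    int n[N_LS];
};

static bool
fdb_has(const struct local_fdb *fdb, int ls, const char *mac)
{
    for (int i = 0; i < fdb->n[ls]; i++) {
        if (!strcmp(fdb->macs[ls][i], mac)) {
            return true;
        }
    }
    return false;
}

static void
fdb_add(struct local_fdb *fdb, int ls, const char *mac)
{
    if (!fdb_has(fdb, ls, mac) && fdb->n[ls] < MAX_MACS) {
        fdb->macs[ls][fdb->n[ls]++] = mac;
    }
}

/* One routed packet from a VM on (src_hv, src_ls) to a VM on dst_ls.
 * Returns true if the destination switch's ingress lookup misses, i.e.
 * the packet has to be flooded. */
static bool
send_routed(struct local_fdb fdb[N_HV], int src_hv, int src_ls,
            const char *src_mac, int dst_ls, const char *dst_mac)
{
    struct local_fdb *local = &fdb[src_hv];

    fdb_add(local, src_ls, src_mac);         /* Source LS ingress: learn. */
    /* LR ingress/egress: rewrites eth.src, learns nothing. */
    return !fdb_has(local, dst_ls, dst_mac); /* Destination LS ingress. */
}
```

Repeating the request/reply exchange any number of times with per-chassis tables keeps returning "flood" in both directions. Passing the same chassis index for both directions instead models a single table shared by all chassis, which is what the Southbound FDB provides today. In that case only the first request floods.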
On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni < naveen.yerramneni@nutanix.com> wrote: > > > > On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote: > > > > On 12/18/23 16:17, Naveen Yerramneni wrote: > >> > >> > >>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote: > >>> > >>> On 11/30/23 16:32, Dumitru Ceara wrote: > >>>> On 11/30/23 15:54, Naveen Yerramneni wrote: > >>>>> > >>>>> > >>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> > wrote: > >>>>>> > >>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote: > >>>>>>> > >>>>>>> > >>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> > wrote: > >>>>>>>> > >>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote: > >>>>>>>>> This functionality can be enabled at the logical switch level: > >>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this > >>>>>>>>> functionality, it is disabled by default. > >>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout > >>>>>>>>> for locally learned fdb flows, default timeout is 300 secs. > >>>>>>>>> > >>>>>>>>> If enabled, below lflow is added for each port that has unknown > addr set. > >>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == > <in_port>), > >>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next; > >>>>>>>>> > >>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS > action. > >>>>>>>>> - > learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], > >>>>>>>>> > NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) > >>>>>>>>> > >>>>>>>>> This is useful when OVN is managing VLAN network that has > multiple ports > >>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. 
With > this config, > >>>>>>>>> if there is east-west traffic flowing between VMs part of same > VLAN > >>>>>>>>> deployed on different hypervisors then, MAC addrs of the source > and > >>>>>>>>> destination VMs keeps flapping between VM port and localnet port > in > >>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table > local to > >>>>>>>>> the chassis and avoids MAC flapping. > >>>>>>>>> > >>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> > >>>>>>>>> --- > >>>>>>>> > >>>>>>>> Hi Naveen, > >>>>>>>> > >>>>>>>> Thanks a lot for the patch! > >>>>>>>> > >>>>>>>> Just a note, we already have a fix for the east-west traffic that > causes > >>>>>>>> FDB flapping when localnet is used: > >>>>>>>> > >>>>>>>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=LP9_zs2Rj34vMx20ntbu-A3taXqKMJNVH2TLQyOXCh0&e= > >>>>>>>> > >>>>>>>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_commit_f3a14907fe2b1ecdcfddfbed595cd097b6efbe14&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=kPuq992rikXYk63APGxlIpfqY3lPpreN9f4ha9pZKpodnVgE9KfjEUNozpPUFzUu&s=gsUGtjyf9gSOr1LkcCH0O6MB1_tjXi9fuTgwEFgbRx8&e= > >>>>>>>> > >>>>>>>> In general, however, I think it's a very good idea to move the > FDB away > >>>>>>>> from the Southbound and make it local to each hypervisor. That > reduces > >>>>>>>> load on the Southbound among other things. > >>>>>>>> > >>>>>>> > >>>>>>> Hi Dumitru, > >>>>>>> > >>>>>>> Thanks for informing about the patches. > >>>>>>> Yes, local FDB reduces load on southbound. 
> >>>>>>> > >>>>>>> > >>>>>>>>> include/ovn/actions.h | 7 +++ > >>>>>>>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ > >>>>>>>>> northd/northd.c | 26 ++++++++++ > >>>>>>>>> ovn-nb.xml | 14 ++++++ > >>>>>>>>> tests/ovn.at | 108 > ++++++++++++++++++++++++++++++++++++++++++ > >>>>>>>>> utilities/ovn-trace.c | 2 + > >>>>>>>>> 6 files changed, 251 insertions(+) > >>>>>>>>> > >>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h > >>>>>>>>> index 49cfe0624..85ac92cd3 100644 > >>>>>>>>> --- a/include/ovn/actions.h > >>>>>>>>> +++ b/include/ovn/actions.h > >>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids; > >>>>>>>>> OVNACT(CHK_LB_AFF, ovnact_result) \ > >>>>>>>>> OVNACT(SAMPLE, ovnact_sample) \ > >>>>>>>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ > >>>>>>>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ > >>>>>>>>> > >>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each > action. */ > >>>>>>>>> enum OVS_PACKED_ENUM ovnact_type { > >>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { > >>>>>>>>> uint16_t timeout; > >>>>>>>>> }; > >>>>>>>>> > >>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */ > >>>>>>>>> +struct ovnact_commit_fdb_local{ > >>>>>>>>> + struct ovnact ovnact; > >>>>>>>>> + uint16_t timeout; /* fdb_local flow timeout */ > >>>>>>>>> +}; > >>>>>>>>> + > >>>>>>>>> /* Internal use by the helpers below. 
*/ > >>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); > >>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); > >>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c > >>>>>>>>> index a73fe1a1e..f5aa78db1 100644 > >>>>>>>>> --- a/lib/actions.c > >>>>>>>>> +++ b/lib/actions.c > >>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct > ovnact_null *null OVS_UNUSED, struct ds *s) > >>>>>>>>> ds_put_cstr(s, "mac_cache_use;"); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +static void > >>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx, > >>>>>>>>> + struct ovnact_commit_fdb_local *fdb_local) > >>>>>>>>> +{ > >>>>>>>>> + uint16_t timeout = 0; > >>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ > >>>>>>>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { > >>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); > >>>>>>>>> + return; > >>>>>>>>> + } > >>>>>>>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { > >>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); > >>>>>>>>> + return; > >>>>>>>>> + } > >>>>>>>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow > timeout")) { > >>>>>>>>> + return; > >>>>>>>>> + } > >>>>>>>>> + fdb_local->timeout = timeout; > >>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local > *fdb_local, > >>>>>>>>> + struct ds *s) > >>>>>>>>> +{ > >>>>>>>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", > fdb_local->timeout); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local > *fdb_local OVS_UNUSED) > >>>>>>>>> +{ > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local > *fdb_local, > >>>>>>>>> + struct ofpbuf *ofpacts, uint32_t cookie) > >>>>>>>>> +{ > >>>>>>>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); > >>>>>>>>> + struct match match = MATCH_CATCHALL_INITIALIZER; > >>>>>>>>> + struct ofpact_learn_spec *ol_spec; > >>>>>>>>> + unsigned int imm_bytes; > >>>>>>>>> + uint8_t *src_imm; > >>>>>>>>> + > >>>>>>>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; > >>>>>>>>> + ol->idle_timeout = fdb_local->timeout; > >>>>>>>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; > >>>>>>>>> + ol->priority = OFP_DEFAULT_PRIORITY; > >>>>>>>>> + ol->table_id = OFTABLE_GET_FDB; > >>>>>>>>> + ol->cookie = htonll(cookie); > >>>>>>>>> + > >>>>>>>>> + /* Match on metadata of the packet that created the new > table. */ > >>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > >>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); > >>>>>>>>> + ol_spec->dst.ofs = 0; > >>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > >>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; > >>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; > >>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; > >>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); > >>>>>>>>> + > >>>>>>>>> + /* Match on metadata of the packet. 
*/ > >>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > >>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); > >>>>>>>>> + ol_spec->dst.ofs = 0; > >>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > >>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; > >>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; > >>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; > >>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); > >>>>>>>>> + > >>>>>>>>> + > >>>>>>>>> + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */ > >>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > >>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); > >>>>>>>>> + ol_spec->dst.ofs = 0; > >>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > >>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; > >>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; > >>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; > >>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); > >>>>>>>>> + > >>>>>>>>> + ofpact_finish_LEARN(ofpacts, &ol); > >>>>>>>>> +} > >>>>>>>> > >>>>>>>> A difference from today's SB.FDB centralized approach is that when > >>>>>>>> ovn-controller restarts these flows will be cleared, I think. > >>>>>>>> > >>>>>>>> Are we OK with that? I think so but if not what are the options > to > >>>>>>>> avoid clearing the local fdb cache on restart? > >>>>>>>> > >>>>>>> > >>>>>>> OVS has to relearn the FDB flows whenever tables are cleared. > >>>>>>> During this time, packets gets flooded. I need to think about > possible > >>>>>>> options if we want to retain FDB table. > >>>>>>> > >>>>>>> Can we take this up as an enhancement in a separate patch > >>>>>>> once we identify a solution for this ? > >>>>>>> > >>>>>> > >>>>>> Sounds good to me. > >>>>>> > >>>>>>>> Another difference with today's approach is that this avoids a > >>>>>>>> controller action, that's great! 
> >>>>>>>> > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local > *fdb_local, > >>>>>>>>> + const struct ovnact_encode_params *ep, > >>>>>>>>> + struct ofpbuf *ofpacts) > >>>>>>>>> +{ > >>>>>>>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, > ep->lflow_uuid.parts[0]); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> static void > >>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, > >>>>>>>>> const struct ovnact_encode_params *ep, > >>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) > >>>>>>>>> parse_sample(ctx); > >>>>>>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { > >>>>>>>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); > >>>>>>>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { > >>>>>>>>> + parse_commit_fdb_local(ctx, > ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); > >>>>>>>>> } else { > >>>>>>>>> lexer_syntax_error(ctx->lexer, "expecting action"); > >>>>>>>>> } > >>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c > >>>>>>>>> index d1465ddf7..de18694a0 100644 > >>>>>>>>> --- a/northd/northd.c > >>>>>>>>> +++ b/northd/northd.c > >>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct > nbrec_logical_switch_port *nbsp) > >>>>>>>>> return smap_get_bool( ->options, "localnet_learn_fdb", > false); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +static bool > >>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) > >>>>>>>>> +{ > >>>>>>>>> + return smap_get_bool(&nbs->other_config, "fdb_local", > false); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>> > >>>>>>>> Personally, I'd prefer if we don't add another config knob and we > just > >>>>>>>> make this the only way FDB works. We could also document that > the FDB > >>>>>>>> SB table should be deprecated. > >>>>>>>> > >>>>>>> > >>>>>>> If we want to make local FDB as default then, I think we need to > handle overlay > >>>>>>> use case as well. 
Probably, we might have to add a new stage in > logical switch > >>>>>>> egress pipeline to learn FDB entries for packets coming over > tunnel (or) > >>>>>>> something similar. > >>>>>>> > >>>>>>> Can we take this up in a separate patch ? > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> Given that we have a fix already for the original problem you were > >>>>>> trying to address I would prefer that we avoid adding new config > knobs > >>>>>> and handle both the localnet and overlay cases at the same time. > >>>>>> > >>>>> > >>>>> Sure. > >>>>> > >>>>>> OTOH, why is there a difference? > >>>>>> > >>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that > triggers the > >>>>>> learn() action) into MFF_LOG_INPORT: > >>>>>> > >>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set. > With > >>>>>> VXLAN that's not the case but that's already a documented > limitation, > >>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features > >>>>>> that need it (like egress ACLs matching against ingress port > >>>>>> identifiers) are not supported: > >>>>>> > >>>>>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovn-2Dorg_ovn_blob_main_ovn-2Darchitecture.7.xml-23L2842&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=LO8-TXDG00EgnhB_oBbKGhuFWEv1FgwWbp-oMgCaMjsrN1ow1XIgiNdpeQ2FpNyQ&s=AyiJd4NaZd3P4HGvFbMO9No0cO8bux35PJNhtr8ZGe4&e= > >>>>>> > >>>>> > >>>>> In case of overlay, ingress and egress pipelines are processed on > different nodes > >>>>> assuming source and dest VMs are on different nodes. I think this > makes local FDB > >>>>> to learn only local VMs MACs and it never learn remote VMs MACs. > >>>>> > >>>>> Example: > >>>>> 2 VMs with unknown addr set are connected to same logical switch > where vm1 is on hv1 > >>>>> and vm2 is on hv2. 
When vm1 is sending ICMP req packet to vm2, ingress pipeline happens > >>>>> on hv1 and it learns vm1-mac and this packet gets flooded since vm2-mac is not yet learned. > >>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is > >>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also > >>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received. > >>>>> > >>>>> > >>>> > >>>> Ah, I see your point, you're right. So it does look like we'd need a > >>>> dedicated fdb learning stage in the egress pipeline. But that is > >>>> probably not that terrible. > >>>> > > > Hi Dumitru, > > I started working on this patch, sorry for the delay. > > On further investigation, I realised that we cannot permanently remove FDB table from southbound even > if we add a new FDB learning stage in egress pipeline. > > Following is the example: > > Two VMs part of different overlay subnets have unknown addr set where VM1 is connected to > subnet-1 (LS1) and is running on hv1 and VM2 is connected to subnet-2 (LS2) and is running > on hv2. Both subnets are connected to same logical router (LR). > > When VM1 is sending ICMP packet to VM2, pipeline stages get processed as below. > - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets > flooded in LS2. > - ICMP Req on HV2: LS2-out. > - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in. HV2 learns VM2 MAC, packet gets > flooded in LS1. > - ICMP Resp on HV1: LS1-out. > - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC.
> > > I am thinking either we can make FDB local implementation only applicable for VLAN subnets with default disabled > (or) implement it for both VLAN and overlay subnets with default disabled by documenting the limitation and > explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to same LR > have ports with unknown addr configured. > > Please let me know your thoughts. > > Thanks, > Naveen > > >>>>>> One more thing we need to take care of in order to be able to make local > >>>>>> FDB the default is "FDB refresh": > >>>>>> > >>>>>> 551527a5e68e ("controller: Update FDB timestamp") > >>>>>> > >>>>>> https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc > >>>>>> > >>>>> > >>>>> Sure. > >>>>> > >>> > >>> Hi Naveen, > >>> > >>> Just touching base on this again, I was curious if you're still planning > >>> to work on making the FDB table local; I still think it's beneficial in > >>> general. > >>> > >>> Thanks, > >>> Dumitru > >> > >> > >> Hi Dumitru, > >> > >> Yes, I will work on this. I will try to send out the patch by Jan 1st week. > >> > > > > Awesome, thanks Naveen! No rush though, I just wanted to confirm > > whether you were interested in pursuing this. > > > > Best regards, > > Dumitru > > > > > > Hi Naveen, I did run into a similar issue with MAC binding, with distributed routers there is currently no other way to share the MAC binding. That made me wonder if it's worth pursuing, as the complexity of the code would significantly increase with a half in-memory, half DB solution. With the aging option the DB size shouldn't be that problematic anymore; not sure if there are any other concerns with the current implementation.
From my point of view this might be a dead end; however, others might have slightly different opinions. Thanks, Ales
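The routed overlay scenario quoted above (VM1 on LS1 at hv1, VM2 on LS2 at hv2, both switches attached to one LR) can be checked with a toy C model. This is a hypothetical sketch, not OVN code: `struct chassis`, `fdb_learn()`, `fdb_lookup()` and `simulate_routed_exchange()` are invented for illustration. It assumes, as in the patch's `ls_in_lookup_fdb` flow, that only ports with unknown addresses commit FDB entries (so the router port's MAC is never learned) and that learning happens only in a switch's ingress pipeline, on the chassis where that pipeline runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-chassis FDB (hypothetical, not OVN code): each chassis only
 * holds the (datapath, MAC) -> port entries its own ingress pipelines
 * committed. */
#define FDB_MAX 16

struct fdb_entry {
    uint32_t dp;      /* Logical switch (datapath) key. */
    uint64_t mac;     /* Learned source MAC. */
    uint32_t port;    /* Logical inport the MAC was seen on. */
};

struct chassis {
    struct fdb_entry fdb[FDB_MAX];
    int n;
};

static void
fdb_learn(struct chassis *ch, uint32_t dp, uint64_t mac, uint32_t port)
{
    for (int i = 0; i < ch->n; i++) {
        if (ch->fdb[i].dp == dp && ch->fdb[i].mac == mac) {
            ch->fdb[i].port = port;   /* Refresh the existing entry. */
            return;
        }
    }
    assert(ch->n < FDB_MAX);
    ch->fdb[ch->n++] = (struct fdb_entry) { dp, mac, port };
}

/* Returns the learned port, or -1 on a miss (the packet is flooded). */
static int64_t
fdb_lookup(const struct chassis *ch, uint32_t dp, uint64_t mac)
{
    for (int i = 0; i < ch->n; i++) {
        if (ch->fdb[i].dp == dp && ch->fdb[i].mac == mac) {
            return ch->fdb[i].port;
        }
    }
    return -1;
}

enum { LS1 = 1, LS2 = 2, VM1_PORT = 10, VM2_PORT = 20 };
#define VM1_MAC 0x0a0000000001ULL
#define VM2_MAC 0x0a0000000002ULL

/* Runs 'rounds' request/response exchanges between VM1 (LS1 on hv1) and
 * VM2 (LS2 on hv2) routed through one LR and returns how many packets
 * were flooded.  With 'shared_fdb' both chassis use one table, roughly
 * like the Southbound FDB; otherwise each chassis has its own. */
static int
simulate_routed_exchange(int rounds, bool shared_fdb)
{
    struct chassis hv1 = { .n = 0 };
    struct chassis hv2_local = { .n = 0 };
    struct chassis *hv2 = shared_fdb ? &hv1 : &hv2_local;
    int floods = 0;

    for (int r = 0; r < rounds; r++) {
        /* Request on hv1: LS1-in -> LR-in -> LR-out -> LS2-in. */
        fdb_learn(&hv1, LS1, VM1_MAC, VM1_PORT);
        if (fdb_lookup(&hv1, LS2, VM2_MAC) < 0) {
            floods++;
        }
        /* hv2 only runs LS2-out; the source MAC there is the LR port's,
         * which has no unknown address and so is never committed. */

        /* Response on hv2: LS2-in -> LR-in -> LR-out -> LS1-in. */
        fdb_learn(hv2, LS2, VM2_MAC, VM2_PORT);
        if (fdb_lookup(hv2, LS1, VM1_MAC) < 0) {
            floods++;
        }
        /* hv1 only runs LS1-out: again nothing is learned there. */
    }
    return floods;
}
```

Under these assumptions, `simulate_routed_exchange(3, false)` returns 6 (every request and every response is flooded), while `simulate_routed_exchange(3, true)`, where both chassis share one table, returns 1 (only the first request floods). Adding an egress learning stage to the model would not change the per-chassis result, because the only MAC visible in LS2-out on hv2 (and LS1-out on hv1) is the router port's.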
> On 08-Mar-2024, at 2:37 PM, Ales Musil <amusil@redhat.com> wrote: > > > > On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni <naveen.yerramneni@nutanix.com> wrote: > > > > On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote: > > > > On 12/18/23 16:17, Naveen Yerramneni wrote: > >> > >> > >>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote: > >>> > >>> On 11/30/23 16:32, Dumitru Ceara wrote: > >>>> On 11/30/23 15:54, Naveen Yerramneni wrote: > >>>>> > >>>>> > >>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote: > >>>>>> > >>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote: > >>>>>>> > >>>>>>> > >>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: > >>>>>>>> > >>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote: > >>>>>>>>> This functionality can be enabled at the logical switch level: > >>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this > >>>>>>>>> functionality, it is disabled by default. > >>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout > >>>>>>>>> for locally learned fdb flows, default timeout is 300 secs. > >>>>>>>>> > >>>>>>>>> If enabled, below lflow is added for each port that has unknown addr set. > >>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), > >>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next; > >>>>>>>>> > >>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action. > >>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], > >>>>>>>>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) > >>>>>>>>> > >>>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports > >>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. 
With this config, > >>>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN > >>>>>>>>> deployed on different hypervisors then, MAC addrs of the source and > >>>>>>>>> destination VMs keeps flapping between VM port and localnet port in > >>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to > >>>>>>>>> the chassis and avoids MAC flapping. > >>>>>>>>> > >>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> > >>>>>>>>> --- > >>>>>>>> > >>>>>>>> Hi Naveen, > >>>>>>>> > >>>>>>>> Thanks a lot for the patch! > >>>>>>>> > >>>>>>>> Just a note, we already have a fix for the east-west traffic that causes > >>>>>>>> FDB flapping when localnet is used: > >>>>>>>> > >>>>>>>> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e > >>>>>>>> > >>>>>>>> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14 > >>>>>>>> > >>>>>>>> In general, however, I think it's a very good idea to move the FDB away > >>>>>>>> from the Southbound and make it local to each hypervisor. That reduces > >>>>>>>> load on the Southbound among other things. > >>>>>>>> > >>>>>>> > >>>>>>> Hi Dumitru, > >>>>>>> > >>>>>>> Thanks for informing about the patches. > >>>>>>> Yes, local FDB reduces load on southbound.
> >>>>>>> > >>>>>>> > >>>>>>>>> include/ovn/actions.h | 7 +++ > >>>>>>>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ > >>>>>>>>> northd/northd.c | 26 ++++++++++ > >>>>>>>>> ovn-nb.xml | 14 ++++++ > >>>>>>>>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ > >>>>>>>>> utilities/ovn-trace.c | 2 + > >>>>>>>>> 6 files changed, 251 insertions(+) > >>>>>>>>> > >>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h > >>>>>>>>> index 49cfe0624..85ac92cd3 100644 > >>>>>>>>> --- a/include/ovn/actions.h > >>>>>>>>> +++ b/include/ovn/actions.h > >>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids; > >>>>>>>>> OVNACT(CHK_LB_AFF, ovnact_result) \ > >>>>>>>>> OVNACT(SAMPLE, ovnact_sample) \ > >>>>>>>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ > >>>>>>>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ > >>>>>>>>> > >>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */ > >>>>>>>>> enum OVS_PACKED_ENUM ovnact_type { > >>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { > >>>>>>>>> uint16_t timeout; > >>>>>>>>> }; > >>>>>>>>> > >>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */ > >>>>>>>>> +struct ovnact_commit_fdb_local{ > >>>>>>>>> + struct ovnact ovnact; > >>>>>>>>> + uint16_t timeout; /* fdb_local flow timeout */ > >>>>>>>>> +}; > >>>>>>>>> + > >>>>>>>>> /* Internal use by the helpers below.
*/ > >>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); > >>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); > >>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c > >>>>>>>>> index a73fe1a1e..f5aa78db1 100644 > >>>>>>>>> --- a/lib/actions.c > >>>>>>>>> +++ b/lib/actions.c > >>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) > >>>>>>>>> ds_put_cstr(s, "mac_cache_use;"); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +static void > >>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx, > >>>>>>>>> + struct ovnact_commit_fdb_local *fdb_local) > >>>>>>>>> +{ > >>>>>>>>> + uint16_t timeout = 0; > >>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ > >>>>>>>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { > >>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); > >>>>>>>>> + return; > >>>>>>>>> + } > >>>>>>>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { > >>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); > >>>>>>>>> + return; > >>>>>>>>> + } > >>>>>>>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { > >>>>>>>>> + return; > >>>>>>>>> + } > >>>>>>>>> + fdb_local->timeout = timeout; > >>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, > >>>>>>>>> + struct ds *s) > >>>>>>>>> +{ > >>>>>>>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) > >>>>>>>>> +{ > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, > >>>>>>>>> + struct ofpbuf *ofpacts, uint32_t cookie) > >>>>>>>>> +{ > >>>>>>>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); > >>>>>>>>> + struct match match = MATCH_CATCHALL_INITIALIZER; > >>>>>>>>> + struct ofpact_learn_spec *ol_spec; > >>>>>>>>> + unsigned int imm_bytes; > >>>>>>>>> + uint8_t *src_imm; > >>>>>>>>> + > >>>>>>>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; > >>>>>>>>> + ol->idle_timeout = fdb_local->timeout; > >>>>>>>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; > >>>>>>>>> + ol->priority = OFP_DEFAULT_PRIORITY; > >>>>>>>>> + ol->table_id = OFTABLE_GET_FDB; > >>>>>>>>> + ol->cookie = htonll(cookie); > >>>>>>>>> + > >>>>>>>>> + /* Match on metadata of the packet that created the new table. */ > >>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > >>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); > >>>>>>>>> + ol_spec->dst.ofs = 0; > >>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > >>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; > >>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; > >>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; > >>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); > >>>>>>>>> + > >>>>>>>>> + /* Match on metadata of the packet. 
*/ > >>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > >>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); > >>>>>>>>> + ol_spec->dst.ofs = 0; > >>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > >>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; > >>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; > >>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; > >>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); > >>>>>>>>> + > >>>>>>>>> + > >>>>>>>>> + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */ > >>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); > >>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); > >>>>>>>>> + ol_spec->dst.ofs = 0; > >>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; > >>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; > >>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; > >>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; > >>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); > >>>>>>>>> + > >>>>>>>>> + ofpact_finish_LEARN(ofpacts, &ol); > >>>>>>>>> +} > >>>>>>>> > >>>>>>>> A difference from today's SB.FDB centralized approach is that when > >>>>>>>> ovn-controller restarts these flows will be cleared, I think. > >>>>>>>> > >>>>>>>> Are we OK with that? I think so but if not what are the options to > >>>>>>>> avoid clearing the local fdb cache on restart? > >>>>>>>> > >>>>>>> > >>>>>>> OVS has to relearn the FDB flows whenever tables are cleared. > >>>>>>> During this time, packets gets flooded. I need to think about possible > >>>>>>> options if we want to retain FDB table. > >>>>>>> > >>>>>>> Can we take this up as an enhancement in a separate patch > >>>>>>> once we identify a solution for this ? > >>>>>>> > >>>>>> > >>>>>> Sounds good to me. > >>>>>> > >>>>>>>> Another difference with today's approach is that this avoids a > >>>>>>>> controller action, that's great! 
> >>>>>>>> > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, > >>>>>>>>> + const struct ovnact_encode_params *ep, > >>>>>>>>> + struct ofpbuf *ofpacts) > >>>>>>>>> +{ > >>>>>>>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> static void > >>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, > >>>>>>>>> const struct ovnact_encode_params *ep, > >>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) > >>>>>>>>> parse_sample(ctx); > >>>>>>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { > >>>>>>>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); > >>>>>>>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { > >>>>>>>>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); > >>>>>>>>> } else { > >>>>>>>>> lexer_syntax_error(ctx->lexer, "expecting action"); > >>>>>>>>> } > >>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c > >>>>>>>>> index d1465ddf7..de18694a0 100644 > >>>>>>>>> --- a/northd/northd.c > >>>>>>>>> +++ b/northd/northd.c > >>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) > >>>>>>>>> return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +static bool > >>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) > >>>>>>>>> +{ > >>>>>>>>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>> > >>>>>>>> Personally, I'd prefer if we don't add another config knob and we just > >>>>>>>> make this the only way FDB works. We could also document that the FDB > >>>>>>>> SB table should be deprecated. > >>>>>>>> > >>>>>>> > >>>>>>> If we want to make local FDB as default then, I think we need to handle overlay > >>>>>>> use case as well.
Probably, we might have to add a new stage in logical switch > >>>>>>> egress pipeline to learn FDB entries for packets coming over tunnel (or) > >>>>>>> something similar. > >>>>>>> > >>>>>>> Can we take this up in a separate patch? > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> Given that we have a fix already for the original problem you were > >>>>>> trying to address I would prefer that we avoid adding new config knobs > >>>>>> and handle both the localnet and overlay cases at the same time. > >>>>>> > >>>>> > >>>>> Sure. > >>>>> > >>>>>> OTOH, why is there a difference? > >>>>>> > >>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the > >>>>>> learn() action) into MFF_LOG_OUTPORT: > >>>>>> > >>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set. With > >>>>>> VXLAN that's not the case but that's already a documented limitation, > >>>>>> LOG_INPORT is not available after VXLAN tunneling therefore features > >>>>>> that need it (like egress ACLs matching against ingress port > >>>>>> identifiers) are not supported: > >>>>>> > >>>>>> https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842 > >>>>>> > >>>>> > >>>>> In case of overlay, ingress and egress pipelines are processed on different nodes > >>>>> assuming source and dest VMs are on different nodes. I think this makes local FDB > >>>>> learn only local VMs' MACs and it never learns remote VMs' MACs. > >>>>> > >>>>> Example: > >>>>> 2 VMs with unknown addr set are connected to same logical switch where vm1 is on hv1 > >>>>> and vm2 is on hv2. When vm1 is sending ICMP req packet to vm2, ingress pipeline happens > >>>>> on hv1 and it learns vm1-mac and this packet gets flooded since vm2-mac is not yet learned.
> >>>>> Packet reaches hv2 over tunnel and egress pipeline is exercised on hv2. Now, when vm2 is > >>>>> responding back, ingress pipeline happens on hv2 and it learns vm2-mac and this packet also > >>>>> gets flooded since vm1-mac is not learnt on hv2 when ICMP req packet is received. > >>>>> > >>>>> > >>>> > >>>> Ah, I see your point, you're right. So it does look like we'd need a > >>>> dedicated fdb learning stage in the egress pipeline. But that is > >>>> probably not that terrible. > >>>> > > > Hi Dumitru, > > I started working on this patch, sorry for the delay. > > On further investigation, I realised that we cannot permanently remove FDB table from southbound even > if we add a new FDB learning stage in egress pipeline. > > Following is the example: > > Two VMs part of different overlay subnets have unknown addr set where VM1 is connected to > subnet-1 (LS1) and is running on hv1 and VM2 is connected to subnet-2 (LS2) and is running > on hv2. Both subnets are connected to same logical router (LR). > > When VM1 is sending ICMP packet to VM2, pipeline stages get processed as below. > - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets > flooded in LS2. > - ICMP Req on HV2: LS2-out. > - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in. HV2 learns VM2 MAC, packet gets > flooded in LS1. > - ICMP Resp on HV1: LS1-out. > - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC. > > > I am thinking either we can make FDB local implementation only applicable for VLAN subnets with default disabled > (or) implement it for both VLAN and overlay subnets with default disabled by documenting the limitation and > explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to same LR > have ports with unknown addr configured. > > Please let me know your thoughts.
> > Thanks, > Naveen > > >>>>>> One more thing we need to take care of in order to be able to make local > >>>>>> FDB the default is "FDB refresh": > >>>>>> > >>>>>> 551527a5e68e ("controller: Update FDB timestamp") > >>>>>> > >>>>>> https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc > >>>>>> > >>>>> > >>>>> Sure. > >>>>> > >>> > >>> Hi Naveen, > >>> > >>> Just touching base on this again, I was curious if you're still planning > >>> to work on making the FDB table local; I still think it's beneficial in > >>> general. > >>> > >>> Thanks, > >>> Dumitru > >> > >> > >> Hi Dumitru, > >> > >> Yes, I will work on this. I will try to send out the patch by Jan 1st week. > >> > > > > Awesome, thanks Naveen! No rush though, I just wanted to confirm > > whether you were interested in pursuing this. > > > > Best regards, > > Dumitru > > > > > > > Hi Naveen, > > I did run into a similar issue with MAC binding, with distributed routers there is currently no other way to share the MAC binding. That made me wonder if it's worth pursuing, as the complexity of the code would significantly increase with a half in-memory, half DB solution. > With the aging option the DB size shouldn't be that problematic anymore; not sure if there are any other concerns with the current implementation. > > From my point of view this might be a dead end; however, others might have slightly different opinions. > > Thanks, > Ales > > -- > Ales Musil > Senior Software Engineer - OVN Core > Red Hat EMEA > amusil@redhat.com Hi Ales, Thanks for the reply. I definitely agree with you on the additional code complexity that we have to deal with when we have both implementations.
I think local FDB helps reduce the overhead on the SB for VLAN networks, depending on the size of the network. Also, we have noticed one more issue with the FDB table when FDB learning is enabled on the localnet port: - If the underlay TOR switch floods some packets, DB conflicts (OVSDB reports constraint violation errors) occur when controllers try to insert entries into the FDB table. This happens because when the underlay TOR switch floods a packet, all controllers receive it at the same time and try to install the same FDB entry at the same time. Only the first one succeeds and the others report an error. This causes the controllers to recompute all flows from the SB. In large-scale networks, this keeps the OVN controllers busy. If we are OK with supporting the local FDB implementation only for VLAN subnets, I can look at possible options to reduce the code complexity. Any other suggestions are welcome. Thanks, Naveen
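The constraint-violation race described above can be reduced to a small C model. It is a hypothetical sketch: `struct fdb_table`, `fdb_insert()` and `count_fdb_conflicts()` are invented names, and it assumes the shared table rejects a second row with the same `(dp_key, mac)`, which is how the reported constraint-violation errors are interpreted here. It serialises the concurrent commits, which is enough to show that all but the first collide; it does not model OVSDB transactions, retries, or the full recompute that follows each failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy table (hypothetical, not OVN/OVSDB code) whose rows must be unique
 * per (dp_key, mac), modelling an FDB table with such an index. */
#define TABLE_MAX 64
#define MAX_CHASSIS 8

struct fdb_row {
    uint32_t dp_key;
    uint64_t mac;
    uint32_t port_key;
};

struct fdb_table {
    struct fdb_row rows[TABLE_MAX];
    int n;
};

/* Returns false, standing in for a constraint-violation error, when a row
 * with the same (dp_key, mac) already exists. */
static bool
fdb_insert(struct fdb_table *t, uint32_t dp_key, uint64_t mac,
           uint32_t port_key)
{
    for (int i = 0; i < t->n; i++) {
        if (t->rows[i].dp_key == dp_key && t->rows[i].mac == mac) {
            return false;
        }
    }
    assert(t->n < TABLE_MAX);
    t->rows[t->n++] = (struct fdb_row) { dp_key, mac, port_key };
    return true;
}

/* Every one of 'n_chassis' controllers receives the same packet flooded
 * by the underlay switch through its localnet port and tries to record the
 * source MAC.  With a shared table all inserts after the first fail; with
 * per-chassis tables none do.  Returns the number of failed inserts. */
static int
count_fdb_conflicts(int n_chassis, bool shared_table)
{
    struct fdb_table tables[MAX_CHASSIS] = { 0 };
    int conflicts = 0;

    assert(n_chassis > 0 && n_chassis <= MAX_CHASSIS);
    for (int ch = 0; ch < n_chassis; ch++) {
        struct fdb_table *t = shared_table ? &tables[0] : &tables[ch];

        /* Same datapath, same flooded source MAC, same localnet port. */
        if (!fdb_insert(t, 1, 0x0a0000000099ULL, 5)) {
            conflicts++;
        }
    }
    return conflicts;
}
```

With four chassis, `count_fdb_conflicts(4, true)` returns 3 and `count_fdb_conflicts(4, false)` returns 0, which is the reasoning behind keeping the FDB local to each chassis for VLAN (localnet) networks.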
On 3/14/24 15:21, Naveen Yerramneni wrote: > > >> On 08-Mar-2024, at 2:37 PM, Ales Musil <amusil@redhat.com> wrote: >> >> >> >> On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni <naveen.yerramneni@nutanix.com> wrote: >> >> >>> On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> On 12/18/23 16:17, Naveen Yerramneni wrote: >>>> >>>> >>>>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>> >>>>> On 11/30/23 16:32, Dumitru Ceara wrote: >>>>>> On 11/30/23 15:54, Naveen Yerramneni wrote: >>>>>>> >>>>>>> >>>>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>> >>>>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote: >>>>>>>>>>> This functionality can be enabled at the logical switch level: >>>>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this >>>>>>>>>>> functionality, it is disabled by default. >>>>>>>>>>> - "other_config:fdb_local_idle_timeout" sepcifies idle timeout >>>>>>>>>>> for locally learned fdb flows, default timeout is 300 secs. >>>>>>>>>>> >>>>>>>>>>> If enabled, below lflow is added for each port that has unknown addr set. >>>>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), >>>>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next; >>>>>>>>>>> >>>>>>>>>>> New OVN action: "commit_fdb_local". This sets following OVS action. >>>>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], >>>>>>>>>>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) >>>>>>>>>>> >>>>>>>>>>> This is useful when OVN is managing VLAN network that has multiple ports >>>>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled. 
With this config, >>>>>>>>>>> if there is east-west traffic flowing between VMs part of same VLAN >>>>>>>>>>> deployed on different hypervisors then, MAC addrs of the source and >>>>>>>>>>> destination VMs keeps flapping between VM port and localnet port in >>>>>>>>>>> Southbound FDB table. Enabling fdb_local config makes fdb table local to >>>>>>>>>>> the chassis and avoids MAC flapping. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> >>>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> Hi Naveen, >>>>>>>>>> >>>>>>>>>> Thanks a lot for the patch! >>>>>>>>>> >>>>>>>>>> Just a note, we already have a fix for the east-west traffic that causes >>>>>>>>>> FDB flapping when localnet is used: >>>>>>>>>> >>>>>>>>>> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e >>>>>>>>>> >>>>>>>>>> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14 >>>>>>>>>> >>>>>>>>>> In general, however, I think it's a very good idea to move the FDB away >>>>>>>>>> from the Southbound and make it local to each hypervisor. That reduces >>>>>>>>>> load on the Southbound among other things. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Dumitru, >>>>>>>>> >>>>>>>>> Thanks for informing about the patches. >>>>>>>>> Yes, local FDB reduces load on southbound.
>>>>>>>>> >>>>>>>>> >>>>>>>>>>> include/ovn/actions.h | 7 +++ >>>>>>>>>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ >>>>>>>>>>> northd/northd.c | 26 ++++++++++ >>>>>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>>>>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ >>>>>>>>>>> utilities/ovn-trace.c | 2 + >>>>>>>>>>> 6 files changed, 251 insertions(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h >>>>>>>>>>> index 49cfe0624..85ac92cd3 100644 >>>>>>>>>>> --- a/include/ovn/actions.h >>>>>>>>>>> +++ b/include/ovn/actions.h >>>>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids; >>>>>>>>>>> OVNACT(CHK_LB_AFF, ovnact_result) \ >>>>>>>>>>> OVNACT(SAMPLE, ovnact_sample) \ >>>>>>>>>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ >>>>>>>>>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ >>>>>>>>>>> >>>>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */ >>>>>>>>>>> enum OVS_PACKED_ENUM ovnact_type { >>>>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { >>>>>>>>>>> uint16_t timeout; >>>>>>>>>>> }; >>>>>>>>>>> >>>>>>>>>>> +/* OVNACT_COMMIT_FBD_LOCAL. */ >>>>>>>>>>> +struct ovnact_commit_fdb_local{ >>>>>>>>>>> + struct ovnact ovnact; >>>>>>>>>>> + uint16_t timeout; /* fdb_local flow timeout */ >>>>>>>>>>> +}; >>>>>>>>>>> + >>>>>>>>>>> /* Internal use by the helpers below.
*/ >>>>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); >>>>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); >>>>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c >>>>>>>>>>> index a73fe1a1e..f5aa78db1 100644 >>>>>>>>>>> --- a/lib/actions.c >>>>>>>>>>> +++ b/lib/actions.c >>>>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) >>>>>>>>>>> ds_put_cstr(s, "mac_cache_use;"); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> +static void >>>>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx, >>>>>>>>>>> + struct ovnact_commit_fdb_local *fdb_local) >>>>>>>>>>> +{ >>>>>>>>>>> + uint16_t timeout = 0; >>>>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ >>>>>>>>>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { >>>>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { >>>>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + fdb_local->timeout = timeout; >>>>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> +static void >>>>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>>>>>>> + struct ds *s) >>>>>>>>>>> +{ >>>>>>>>>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> +static void >>>>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) >>>>>>>>>>> +{ >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> +static void >>>>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, >>>>>>>>>>> + struct ofpbuf *ofpacts, uint32_t cookie) >>>>>>>>>>> +{ >>>>>>>>>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); >>>>>>>>>>> + struct match match = MATCH_CATCHALL_INITIALIZER; >>>>>>>>>>> + struct ofpact_learn_spec *ol_spec; >>>>>>>>>>> + unsigned int imm_bytes; >>>>>>>>>>> + uint8_t *src_imm; >>>>>>>>>>> + >>>>>>>>>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; >>>>>>>>>>> + ol->idle_timeout = fdb_local->timeout; >>>>>>>>>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; >>>>>>>>>>> + ol->priority = OFP_DEFAULT_PRIORITY; >>>>>>>>>>> + ol->table_id = OFTABLE_GET_FDB; >>>>>>>>>>> + ol->cookie = htonll(cookie); >>>>>>>>>>> + >>>>>>>>>>> + /* Match on metadata of the packet that created the new table. */ >>>>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); >>>>>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); >>>>>>>>>>> + >>>>>>>>>>> + /* Match on metadata of the packet. 
*/ >>>>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); >>>>>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); >>>>>>>>>>> + >>>>>>>>>>> + >>>>>>>>>>> + /* Load MFF_LOG_OUTPORT from MFF_IN_PORT. */ >>>>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); >>>>>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; >>>>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); >>>>>>>>>>> + >>>>>>>>>>> + ofpact_finish_LEARN(ofpacts, &ol); >>>>>>>>>>> +} >>>>>>>>>> >>>>>>>>>> A difference from today's SB.FDB centralized approach is that when >>>>>>>>>> ovn-controller restarts these flows will be cleared, I think. >>>>>>>>>> >>>>>>>>>> Are we OK with that? I think so but if not what are the options to >>>>>>>>>> avoid clearing the local fdb cache on restart? >>>>>>>>>> >>>>>>>>> >>>>>>>>> OVS has to relearn the FDB flows whenever tables are cleared. >>>>>>>>> During this time, packets gets flooded. I need to think about possible >>>>>>>>> options if we want to retain FDB table. >>>>>>>>> >>>>>>>>> Can we take this up as an enhancement in a separate patch >>>>>>>>> once we identify a solution for this ? >>>>>>>>> >>>>>>>> >>>>>>>> Sounds good to me. >>>>>>>> >>>>>>>>>> Another difference with today's approach is that this avoids a >>>>>>>>>> controller action, that's great! 
>>>>>>>>>> >>>>>>>>>>> + >>>>>>>>>>> +static void >>>>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>>>>>>> + const struct ovnact_encode_params *ep, >>>>>>>>>>> + struct ofpbuf *ofpacts) >>>>>>>>>>> +{ >>>>>>>>>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> static void >>>>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, >>>>>>>>>>> const struct ovnact_encode_params *ep, >>>>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) >>>>>>>>>>> parse_sample(ctx); >>>>>>>>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { >>>>>>>>>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); >>>>>>>>>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { >>>>>>>>>>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); >>>>>>>>>>> } else { >>>>>>>>>>> lexer_syntax_error(ctx->lexer, "expecting action"); >>>>>>>>>>> } >>>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>>>>> index d1465ddf7..de18694a0 100644 >>>>>>>>>>> --- a/northd/northd.c >>>>>>>>>>> +++ b/northd/northd.c >>>>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) >>>>>>>>>>> return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> +static bool >>>>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) >>>>>>>>>>> +{ >>>>>>>>>>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>> >>>>>>>>>> Personally, I'd prefer if we don't add another config knob and we just >>>>>>>>>> make this the only way FDB works. We could also document that the FDB >>>>>>>>>> SB table should be deprecated. >>>>>>>>>> >>>>>>>>> >>>>>>>>> If we want to make local FDB the default, I think we need to handle the overlay >>>>>>>>> use case as well.
We would probably have to add a new stage in the logical switch >>>>>>>>> egress pipeline to learn FDB entries for packets coming over the tunnel (or) >>>>>>>>> something similar. >>>>>>>>> >>>>>>>>> Can we take this up in a separate patch? >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> Given that we have a fix already for the original problem you were >>>>>>>> trying to address, I would prefer that we avoid adding new config knobs >>>>>>>> and handle both the localnet and overlay cases at the same time. >>>>>>>> >>>>>>> >>>>>>> Sure. >>>>>>> >>>>>>>> OTOH, why is there a difference? >>>>>>>> >>>>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the >>>>>>>> learn() action) into MFF_LOG_OUTPORT. >>>>>>>> >>>>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set. With >>>>>>>> VXLAN that's not the case, but that's already a documented limitation: >>>>>>>> LOG_INPORT is not available after VXLAN tunneling, therefore features >>>>>>>> that need it (like egress ACLs matching against ingress port >>>>>>>> identifiers) are not supported: >>>>>>>> >>>>>>>> https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842 >>>>>>>> >>>>>>> >>>>>>> In the overlay case, ingress and egress pipelines are processed on different nodes >>>>>>> assuming source and dest VMs are on different nodes. I think this makes the local FDB >>>>>>> learn only local VMs' MACs; it never learns remote VMs' MACs. >>>>>>> >>>>>>> Example: >>>>>>> 2 VMs with unknown addr set are connected to the same logical switch, where vm1 is on hv1 >>>>>>> and vm2 is on hv2. When vm1 sends an ICMP request to vm2, the ingress pipeline runs >>>>>>> on hv1; it learns vm1-mac and this packet gets flooded since vm2-mac is not yet learned.
>>>>>>> The packet reaches hv2 over the tunnel and the egress pipeline is exercised on hv2. Now, when vm2 >>>>>>> responds, the ingress pipeline runs on hv2; it learns vm2-mac and this packet also >>>>>>> gets flooded since vm1-mac was not learned on hv2 when the ICMP request was received. >>>>>>> >>>>>>> >>>>>> >>>>>> Ah, I see your point, you're right. So it does look like we'd need a >>>>>> dedicated fdb learning stage in the egress pipeline. But that is >>>>>> probably not that terrible. >>>>>> >> >> >> Hi Dumitru, >> Hi Naveen, >> I started working on this patch, sorry for the delay. >> Thanks for following up! >> On further investigation, I realised that we cannot permanently remove the FDB table from the Southbound even >> if we add a new FDB learning stage in the egress pipeline. >> >> Here is an example: >> >> Two VMs in different overlay subnets have unknown addr set, where VM1 is connected to >> subnet-1 (LS1) and runs on hv1, and VM2 is connected to subnet-2 (LS2) and runs >> on hv2. Both subnets are connected to the same logical router (LR). >> >> When VM1 sends an ICMP packet to VM2, the pipeline stages are processed as follows: >> - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets >> flooded in LS2. >> - ICMP Req on HV2: LS2-out. >> - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in. HV2 learns VM2 MAC, packet gets >> flooded in LS1. >> - ICMP Resp on HV1: LS1-out. >> - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC. >> >> >> I am thinking we can either make the FDB local implementation applicable only to VLAN subnets, disabled by default, >> (or) implement it for both VLAN and overlay subnets, disabled by default, document the limitation, and >> explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to the same LR >> have ports with unknown addr configured. >> >> Please let me know your thoughts.
My first impression is that it might become confusing to users if the FDB local implementation is only applicable for VLAN subnets. I'd prefer to avoid having two different implementations (centralized - in SB - vs decentralized - local). >> >> Thanks, >> Naveen >> >>>>>>>> One more thing we need to take care of in order to be able to make local >>>>>>>> FDB the default is "FDB refresh": >>>>>>>> >>>>>>>> 551527a5e68e ("controller: Update FDB timestamp") >>>>>>>> >>>>>>>> https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc >>>>>>>> >>>>>>> >>>>>>> Sure. >>>>>>> >>>>> >>>>> Hi Naveen, >>>>> >>>>> Just touching base on this again: I was curious if you're still planning >>>>> to work on making the FDB table local; I still think it's beneficial in >>>>> general. >>>>> >>>>> Thanks, >>>>> Dumitru >>>> >>>> >>>> Hi Dumitru, >>>> >>>> Yes, I will work on this. I will try to send out the patch by the first week of January. >>>> >>> >>> Awesome, thanks Naveen! No rush though, I just wanted to confirm >>> whether you were interested in pursuing this. >>> >>> Best regards, >>> Dumitru >>> >>> >> >> >> Hi Naveen, >> >> I did run into a similar issue with MAC binding: with distributed routers there is currently no other way to share the MAC binding. That made me wonder if it's worth pursuing, as the complexity of the code would increase significantly with a half in-memory, half DB solution. >> With the aging option the DB size shouldn't be that problematic anymore; I'm not sure if there are any other concerns with the current implementation. >> >> From my point of view this might be a dead end; however, others might have slightly different opinions.
>> >> Thanks, >> Ales >> >> -- >> Ales Musil >> Senior Software Engineer - OVN Core >> Red Hat EMEA >> amusil@redhat.com > > > Hi Ales, > > Thanks for the reply. > > I definitely agree with you on the additional code complexity that we have to deal with when we have both implementations. > I think local FDB helps reduce the overhead on SB for VLAN networks, depending on the size of the network. > > Also, one more issue we have noticed with the FDB table when FDB learning is enabled on the localnet port: > - If the underlay TOR switch floods some packets, DB conflicts (OVSDB reports constraint violation errors) occur > when controllers try to insert entries into the FDB table. This happens because when the underlay TOR switch floods > a packet, all controllers receive it at the same time and try to install the FDB entry at the same time. Only the first one > succeeds and the others report an error. This causes controllers to recompute all the flows from SB. On large-scale networks, this keeps > OVN controllers busy. > Can't we address this in a similar way to what we did for MAC_Bindings and add some sort of randomness to when the SB write happens, e.g.: b416f6f65d71 ("controller: Add delay after multicast ARP packet") 9411dd300d5a ("mac-learn: Make the mac_binding struct more flexible") https://github.com/ovn-org/ovn/commit/b416f6f65d71 https://github.com/ovn-org/ovn/commit/9411dd300d5a > If we are OK with supporting the FDB local implementation only for VLAN subnets, I can look at possible options to reduce the code complexity. As I mentioned above, if we can address existing issues in the centralized (in SB) implementation, I'd prefer to avoid having two different implementations. > Any other suggestions are welcome. > > > Thanks, > Naveen Regards, Dumitru
> On 14-Mar-2024, at 9:07 PM, Dumitru Ceara <dceara@redhat.com> wrote: > > On 3/14/24 15:21, Naveen Yerramneni wrote: >> >> >>> On 08-Mar-2024, at 2:37 PM, Ales Musil <amusil@redhat.com> wrote: >>> >>> >>> >>> On Wed, Mar 6, 2024 at 8:24 PM Naveen Yerramneni <naveen.yerramneni@nutanix.com> wrote: >>> >>> >>>> On 18-Dec-2023, at 8:53 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> On 12/18/23 16:17, Naveen Yerramneni wrote: >>>>> >>>>> >>>>>> On 18-Dec-2023, at 7:26 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>> >>>>>> On 11/30/23 16:32, Dumitru Ceara wrote: >>>>>>> On 11/30/23 15:54, Naveen Yerramneni wrote: >>>>>>>> >>>>>>>> >>>>>>>>> On 30-Nov-2023, at 6:06 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On 11/30/23 09:45, Naveen Yerramneni wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 29-Nov-2023, at 2:24 PM, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>> On 11/29/23 07:45, naveen.yerramneni wrote: >>>>>>>>>>>> This functionality can be enabled at the logical switch level: >>>>>>>>>>>> - "other_config:fdb_local" can be used to enable/disable this >>>>>>>>>>>> functionality; it is disabled by default. >>>>>>>>>>>> - "other_config:fdb_local_idle_timeout" specifies the idle timeout >>>>>>>>>>>> for locally learned fdb flows; the default timeout is 300 secs. >>>>>>>>>>>> >>>>>>>>>>>> If enabled, the lflow below is added for each port that has unknown addr set. >>>>>>>>>>>> - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>), >>>>>>>>>>>> action=(commit_fdb_local(timeout=<timeout>); next;) >>>>>>>>>>>> >>>>>>>>>>>> New OVN action: "commit_fdb_local". This generates the following OVS action: >>>>>>>>>>>> - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[], >>>>>>>>>>>> NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[]) >>>>>>>>>>>> >>>>>>>>>>>> This is useful when OVN is managing a VLAN network that has multiple ports >>>>>>>>>>>> set with unknown addr and localnet_learn_fdb is enabled.
With this config, >>>>>>>>>>>> if there is east-west traffic flowing between VMs that are part of the same VLAN and >>>>>>>>>>>> deployed on different hypervisors, the MAC addrs of the source and >>>>>>>>>>>> destination VMs keep flapping between the VM port and the localnet port in >>>>>>>>>>>> the Southbound FDB table. Enabling the fdb_local config makes the fdb table local to >>>>>>>>>>>> the chassis and avoids MAC flapping. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> >>>>>>>>>>>> --- >>>>>>>>>>> >>>>>>>>>>> Hi Naveen, >>>>>>>>>>> >>>>>>>>>>> Thanks a lot for the patch! >>>>>>>>>>> >>>>>>>>>>> Just a note, we already have a fix for the east-west traffic that causes >>>>>>>>>>> FDB flapping when localnet is used: >>>>>>>>>>> >>>>>>>>>>> https://github.com/ovn-org/ovn/commit/2acf91e9628e9481c48e4a6cec8ad5159fdd6d2e >>>>>>>>>>> >>>>>>>>>>> https://github.com/ovn-org/ovn/commit/f3a14907fe2b1ecdcfddfbed595cd097b6efbe14 >>>>>>>>>>> >>>>>>>>>>> In general, however, I think it's a very good idea to move the FDB away >>>>>>>>>>> from the Southbound and make it local to each hypervisor. That reduces >>>>>>>>>>> load on the Southbound among other things. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Dumitru, >>>>>>>>>> >>>>>>>>>> Thanks for letting me know about the patches. >>>>>>>>>> Yes, local FDB reduces the load on the Southbound.
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> include/ovn/actions.h | 7 +++ >>>>>>>>>>>> lib/actions.c | 94 ++++++++++++++++++++++++++++++++++++ >>>>>>>>>>>> northd/northd.c | 26 ++++++++++ >>>>>>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>>>>>> tests/ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++ >>>>>>>>>>>> utilities/ovn-trace.c | 2 + >>>>>>>>>>>> 6 files changed, 251 insertions(+) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h >>>>>>>>>>>> index 49cfe0624..85ac92cd3 100644 >>>>>>>>>>>> --- a/include/ovn/actions.h >>>>>>>>>>>> +++ b/include/ovn/actions.h >>>>>>>>>>>> @@ -127,6 +127,7 @@ struct collector_set_ids; >>>>>>>>>>>> OVNACT(CHK_LB_AFF, ovnact_result) \ >>>>>>>>>>>> OVNACT(SAMPLE, ovnact_sample) \ >>>>>>>>>>>> OVNACT(MAC_CACHE_USE, ovnact_null) \ >>>>>>>>>>>> + OVNACT(COMMIT_FDB_LOCAL, ovnact_commit_fdb_local) \ >>>>>>>>>>>> >>>>>>>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */ >>>>>>>>>>>> enum OVS_PACKED_ENUM ovnact_type { >>>>>>>>>>>> @@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff { >>>>>>>>>>>> uint16_t timeout; >>>>>>>>>>>> }; >>>>>>>>>>>> >>>>>>>>>>>> +/* OVNACT_COMMIT_FDB_LOCAL. */ >>>>>>>>>>>> +struct ovnact_commit_fdb_local { >>>>>>>>>>>> + struct ovnact ovnact; >>>>>>>>>>>> + uint16_t timeout; /* fdb_local flow timeout */ >>>>>>>>>>>> +}; >>>>>>>>>>>> + >>>>>>>>>>>> /* Internal use by the helpers below.
*/ >>>>>>>>>>>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); >>>>>>>>>>>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); >>>>>>>>>>>> diff --git a/lib/actions.c b/lib/actions.c >>>>>>>>>>>> index a73fe1a1e..f5aa78db1 100644 >>>>>>>>>>>> --- a/lib/actions.c >>>>>>>>>>>> +++ b/lib/actions.c >>>>>>>>>>>> @@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s) >>>>>>>>>>>> ds_put_cstr(s, "mac_cache_use;"); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +static void >>>>>>>>>>>> +parse_commit_fdb_local(struct action_context *ctx, >>>>>>>>>>>> + struct ovnact_commit_fdb_local *fdb_local) >>>>>>>>>>>> +{ >>>>>>>>>>>> + uint16_t timeout = 0; >>>>>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */ >>>>>>>>>>>> + if (!lexer_match_id(ctx->lexer, "timeout")) { >>>>>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>>>>>>>> + return; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { >>>>>>>>>>>> + lexer_syntax_error(ctx->lexer, "invalid parameter"); >>>>>>>>>>>> + return; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) { >>>>>>>>>>>> + return; >>>>>>>>>>>> + } >>>>>>>>>>>> + fdb_local->timeout = timeout; >>>>>>>>>>>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. 
*/ >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void >>>>>>>>>>>> +format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>>>>>>>> + struct ds *s) >>>>>>>>>>>> +{ >>>>>>>>>>>> + ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void >>>>>>>>>>>> +ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED) >>>>>>>>>>>> +{ >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void >>>>>>>>>>>> +commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local, >>>>>>>>>>>> + struct ofpbuf *ofpacts, uint32_t cookie) >>>>>>>>>>>> +{ >>>>>>>>>>>> + struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts); >>>>>>>>>>>> + struct match match = MATCH_CATCHALL_INITIALIZER; >>>>>>>>>>>> + struct ofpact_learn_spec *ol_spec; >>>>>>>>>>>> + unsigned int imm_bytes; >>>>>>>>>>>> + uint8_t *src_imm; >>>>>>>>>>>> + >>>>>>>>>>>> + ol->flags = NX_LEARN_F_DELETE_LEARNED; >>>>>>>>>>>> + ol->idle_timeout = fdb_local->timeout; >>>>>>>>>>>> + ol->hard_timeout = OFP_FLOW_PERMANENT; >>>>>>>>>>>> + ol->priority = OFP_DEFAULT_PRIORITY; >>>>>>>>>>>> + ol->table_id = OFTABLE_GET_FDB; >>>>>>>>>>>> + ol->cookie = htonll(cookie); >>>>>>>>>>>> + >>>>>>>>>>>> + /* Match on metadata of the packet that created the new table. */ >>>>>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_METADATA); >>>>>>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_METADATA); >>>>>>>>>>>> + >>>>>>>>>>>> + /* Match eth.dst against eth.src of the packet.
*/ >>>>>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_ETH_DST); >>>>>>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_MATCH; >>>>>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_ETH_SRC); >>>>>>>>>>>> + >>>>>>>>>>>> + >>>>>>>>>>>> + /* Load MFF_LOG_OUTPORT from MFF_LOG_INPORT. */ >>>>>>>>>>>> + ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec); >>>>>>>>>>>> + ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT); >>>>>>>>>>>> + ol_spec->dst.ofs = 0; >>>>>>>>>>>> + ol_spec->dst.n_bits = ol_spec->dst.field->n_bits; >>>>>>>>>>>> + ol_spec->n_bits = ol_spec->dst.n_bits; >>>>>>>>>>>> + ol_spec->dst_type = NX_LEARN_DST_LOAD; >>>>>>>>>>>> + ol_spec->src_type = NX_LEARN_SRC_FIELD; >>>>>>>>>>>> + ol_spec->src.field = mf_from_id(MFF_LOG_INPORT); >>>>>>>>>>>> + >>>>>>>>>>>> + ofpact_finish_LEARN(ofpacts, &ol); >>>>>>>>>>>> +} >>>>>>>>>>> >>>>>>>>>>> A difference from today's SB.FDB centralized approach is that when >>>>>>>>>>> ovn-controller restarts these flows will be cleared, I think. >>>>>>>>>>> >>>>>>>>>>> Are we OK with that? I think so, but if not, what are the options to >>>>>>>>>>> avoid clearing the local fdb cache on restart? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> OVS has to relearn the FDB flows whenever tables are cleared. >>>>>>>>>> During this time, packets get flooded. I need to think about possible >>>>>>>>>> options if we want to retain the FDB table. >>>>>>>>>> >>>>>>>>>> Can we take this up as an enhancement in a separate patch >>>>>>>>>> once we identify a solution for this? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Sounds good to me. >>>>>>>>> >>>>>>>>>>> Another difference with today's approach is that this avoids a >>>>>>>>>>> controller action, that's great!
>>>>>>>>>>> >>>>>>>>>>>> + >>>>>>>>>>>> +static void >>>>>>>>>>>> +encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local, >>>>>>>>>>>> + const struct ovnact_encode_params *ep, >>>>>>>>>>>> + struct ofpbuf *ofpacts) >>>>>>>>>>>> +{ >>>>>>>>>>>> + commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> static void >>>>>>>>>>>> encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, >>>>>>>>>>>> const struct ovnact_encode_params *ep, >>>>>>>>>>>> @@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx) >>>>>>>>>>>> parse_sample(ctx); >>>>>>>>>>>> } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) { >>>>>>>>>>>> ovnact_put_MAC_CACHE_USE(ctx->ovnacts); >>>>>>>>>>>> + } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) { >>>>>>>>>>>> + parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts)); >>>>>>>>>>>> } else { >>>>>>>>>>>> lexer_syntax_error(ctx->lexer, "expecting action"); >>>>>>>>>>>> } >>>>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>>>>>> index d1465ddf7..de18694a0 100644 >>>>>>>>>>>> --- a/northd/northd.c >>>>>>>>>>>> +++ b/northd/northd.c >>>>>>>>>>>> @@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp) >>>>>>>>>>>> return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +static bool >>>>>>>>>>>> +ls_is_fdb_local(const struct nbrec_logical_switch *nbs) >>>>>>>>>>>> +{ >>>>>>>>>>>> + return smap_get_bool(&nbs->other_config, "fdb_local", false); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>> >>>>>>>>>>> Personally, I'd prefer if we don't add another config knob and we just >>>>>>>>>>> make this the only way FDB works. We could also document that the FDB >>>>>>>>>>> SB table should be deprecated. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If we want to make local FDB the default, I think we need to handle the overlay >>>>>>>>>> use case as well.
We would probably have to add a new stage in the logical switch >>>>>>>>>> egress pipeline to learn FDB entries for packets coming over the tunnel (or) >>>>>>>>>> something similar. >>>>>>>>>> >>>>>>>>>> Can we take this up in a separate patch? >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> Given that we have a fix already for the original problem you were >>>>>>>>> trying to address, I would prefer that we avoid adding new config knobs >>>>>>>>> and handle both the localnet and overlay cases at the same time. >>>>>>>>> >>>>>>>> >>>>>>>> Sure. >>>>>>>> >>>>>>>>> OTOH, why is there a difference? >>>>>>>>> >>>>>>>>> The learned flow loads MFF_LOG_INPORT (from the packet that triggers the >>>>>>>>> learn() action) into MFF_LOG_OUTPORT. >>>>>>>>> >>>>>>>>> With Geneve and STT overlay MFF_LOG_INPORT is still correctly set. With >>>>>>>>> VXLAN that's not the case, but that's already a documented limitation: >>>>>>>>> LOG_INPORT is not available after VXLAN tunneling, therefore features >>>>>>>>> that need it (like egress ACLs matching against ingress port >>>>>>>>> identifiers) are not supported: >>>>>>>>> >>>>>>>>> https://github.com/ovn-org/ovn/blob/main/ovn-architecture.7.xml#L2842 >>>>>>>>> >>>>>>>> >>>>>>>> In the overlay case, ingress and egress pipelines are processed on different nodes >>>>>>>> assuming source and dest VMs are on different nodes. I think this makes the local FDB >>>>>>>> learn only local VMs' MACs; it never learns remote VMs' MACs. >>>>>>>> >>>>>>>> Example: >>>>>>>> 2 VMs with unknown addr set are connected to the same logical switch, where vm1 is on hv1 >>>>>>>> and vm2 is on hv2.
When vm1 sends an ICMP request to vm2, the ingress pipeline runs >>>>>>>> on hv1; it learns vm1-mac and this packet gets flooded since vm2-mac is not yet learned. >>>>>>>> The packet reaches hv2 over the tunnel and the egress pipeline is exercised on hv2. Now, when vm2 >>>>>>>> responds, the ingress pipeline runs on hv2; it learns vm2-mac and this packet also >>>>>>>> gets flooded since vm1-mac was not learned on hv2 when the ICMP request was received. >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> Ah, I see your point, you're right. So it does look like we'd need a >>>>>>> dedicated fdb learning stage in the egress pipeline. But that is >>>>>>> probably not that terrible. >>>>>>> >>> >>> >>> Hi Dumitru, >>> > > Hi Naveen, > >>> I started working on this patch, sorry for the delay. >>> > > Thanks for following up! > >>> On further investigation, I realised that we cannot permanently remove the FDB table from the Southbound even >>> if we add a new FDB learning stage in the egress pipeline. >>> >>> Here is an example: >>> >>> Two VMs in different overlay subnets have unknown addr set, where VM1 is connected to >>> subnet-1 (LS1) and runs on hv1, and VM2 is connected to subnet-2 (LS2) and runs >>> on hv2. Both subnets are connected to the same logical router (LR). >>> >>> When VM1 sends an ICMP packet to VM2, the pipeline stages are processed as follows: >>> - ICMP Req on HV1: LS1-in -> LR-in -> LR-out -> LS2-in. HV1 learns VM1 MAC, packet gets >>> flooded in LS2. >>> - ICMP Req on HV2: LS2-out. >>> - ICMP Resp on HV2: LS2-in -> LR-in -> LR-out -> LS1-in. HV2 learns VM2 MAC, packet gets >>> flooded in LS1. >>> - ICMP Resp on HV1: LS1-out. >>> - Every packet gets flooded since HV1 never learns VM2 MAC and HV2 never learns VM1 MAC.
>>> >>> >>> I am thinking we can either make the FDB local implementation applicable only to VLAN subnets, disabled by default, >>> (or) implement it for both VLAN and overlay subnets, disabled by default, document the limitation, and >>> explore the possibility of ignoring this setting in northd when multiple overlay subnets connected to the same LR >>> have ports with unknown addr configured. >>> >>> Please let me know your thoughts. > > My first impression is that it might become confusing to users if the > FDB local implementation is only applicable for VLAN subnets. I'd > prefer to avoid having two different implementations (centralized - in > SB - vs decentralized - local). > >>> >>> Thanks, >>> Naveen >>> >>>>>>>>> One more thing we need to take care of in order to be able to make local >>>>>>>>> FDB the default is "FDB refresh": >>>>>>>>> >>>>>>>>> 551527a5e68e ("controller: Update FDB timestamp") >>>>>>>>> >>>>>>>>> https://github.com/ovn-org/ovn/commit/551527a5e68e7233ad80d212d549df98f13e37bc >>>>>>>>> >>>>>>>> >>>>>>>> Sure. >>>>>>>> >>>>>> >>>>>> Hi Naveen, >>>>>> >>>>>> Just touching base on this again: I was curious if you're still planning >>>>>> to work on making the FDB table local; I still think it's beneficial in >>>>>> general. >>>>>> >>>>>> Thanks, >>>>>> Dumitru >>>>> >>>>> >>>>> Hi Dumitru, >>>>> >>>>> Yes, I will work on this. I will try to send out the patch by the first week of January. >>>>> >>>> >>>> Awesome, thanks Naveen! No rush though, I just wanted to confirm >>>> whether you were interested in pursuing this.
>>>> >>>> Best regards, >>>> Dumitru >>>> >>>> >>> >>> >>> Hi Naveen, >>> >>> I did run into a similar issue with MAC binding: with distributed routers there is currently no other way to share the MAC binding. That made me wonder if it's worth pursuing, as the complexity of the code would increase significantly with a half in-memory, half DB solution. >>> With the aging option the DB size shouldn't be that problematic anymore; I'm not sure if there are any other concerns with the current implementation. >>> >>> From my point of view this might be a dead end; however, others might have slightly different opinions. >>> >>> Thanks, >>> Ales >>> >>> -- >>> Ales Musil >>> Senior Software Engineer - OVN Core >>> Red Hat EMEA >>> amusil@redhat.com >> >> >> Hi Ales, >> >> Thanks for the reply. >> >> I definitely agree with you on the additional code complexity that we have to deal with when we have both implementations. >> I think local FDB helps reduce the overhead on SB for VLAN networks, depending on the size of the network. >> >> Also, one more issue we have noticed with the FDB table when FDB learning is enabled on the localnet port: >> - If the underlay TOR switch floods some packets, DB conflicts (OVSDB reports constraint violation errors) occur >> when controllers try to insert entries into the FDB table. This happens because when the underlay TOR switch floods >> a packet, all controllers receive it at the same time and try to install the FDB entry at the same time. Only the first one >> succeeds and the others report an error. This causes controllers to recompute all the flows from SB. On large-scale networks, this keeps >> OVN controllers busy.
>> > > Can't we address this in a similar way to what we did for MAC_Bindings > and add some sort of randomness to when the SB write happens, e.g.: > > b416f6f65d71 ("controller: Add delay after multicast ARP packet") > 9411dd300d5a ("mac-learn: Make the mac_binding struct more flexible") > > https://github.com/ovn-org/ovn/commit/b416f6f65d71 > https://github.com/ovn-org/ovn/commit/9411dd300d5a Hi Dumitru, Thanks for pointing me to these; I will go through them and get back to you. > >> If we are OK with supporting the FDB local implementation only for VLAN subnets, I can look at possible options to reduce the code complexity. > > As I mentioned above, if we can address existing issues in the > centralized (in SB) implementation, I'd prefer to avoid having two > different implementations. ACK > >> Any other suggestions are welcome. >> >> >> Thanks, >> Naveen > > Regards, > Dumitru
diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 49cfe0624..85ac92cd3 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -127,6 +127,7 @@ struct collector_set_ids;
     OVNACT(CHK_LB_AFF,        ovnact_result)          \
     OVNACT(SAMPLE,            ovnact_sample)          \
     OVNACT(MAC_CACHE_USE,     ovnact_null)            \
+    OVNACT(COMMIT_FDB_LOCAL,  ovnact_commit_fdb_local) \
 
 /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -514,6 +515,12 @@ struct ovnact_commit_lb_aff {
     uint16_t timeout;
 };
 
+/* OVNACT_COMMIT_FDB_LOCAL. */
+struct ovnact_commit_fdb_local {
+    struct ovnact ovnact;
+    uint16_t timeout;           /* fdb_local flow timeout */
+};
+
 /* Internal use by the helpers below. */
 void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
 void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
diff --git a/lib/actions.c b/lib/actions.c
index a73fe1a1e..f5aa78db1 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -5236,6 +5236,98 @@ format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
     ds_put_cstr(s, "mac_cache_use;");
 }
 
+static void
+parse_commit_fdb_local(struct action_context *ctx,
+                       struct ovnact_commit_fdb_local *fdb_local)
+{
+    uint16_t timeout = 0;
+    lexer_force_match(ctx->lexer, LEX_T_LPAREN); /* Skip '('. */
+    if (!lexer_match_id(ctx->lexer, "timeout")) {
+        lexer_syntax_error(ctx->lexer, "invalid parameter");
+        return;
+    }
+    if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
+        lexer_syntax_error(ctx->lexer, "invalid parameter");
+        return;
+    }
+    if (!action_parse_uint16(ctx, &timeout, "fdb_local flow timeout")) {
+        return;
+    }
+    fdb_local->timeout = timeout;
+    lexer_force_match(ctx->lexer, LEX_T_RPAREN); /* Skip ')'. */
+}
+
+static void
+format_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
+                        struct ds *s)
+{
+    ds_put_format(s, "commit_fdb_local(timeout=%u);", fdb_local->timeout);
+}
+
+static void
+ovnact_commit_fdb_local_free(struct ovnact_commit_fdb_local *fdb_local OVS_UNUSED)
+{
+}
+
+static void
+commit_fdb_local_learn_action(struct ovnact_commit_fdb_local *fdb_local,
+                              struct ofpbuf *ofpacts, uint32_t cookie)
+{
+    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
+    struct match match = MATCH_CATCHALL_INITIALIZER;
+    struct ofpact_learn_spec *ol_spec;
+    unsigned int imm_bytes;
+    uint8_t *src_imm;
+
+    ol->flags = NX_LEARN_F_DELETE_LEARNED;
+    ol->idle_timeout = fdb_local->timeout;
+    ol->hard_timeout = OFP_FLOW_PERMANENT;
+    ol->priority = OFP_DEFAULT_PRIORITY;
+    ol->table_id = OFTABLE_GET_FDB;
+    ol->cookie = htonll(cookie);
+
+    /* Match on the metadata of the packet that triggered the learn. */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+    ol_spec->dst.field = mf_from_id(MFF_METADATA);
+    ol_spec->dst.ofs = 0;
+    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
+    ol_spec->n_bits = ol_spec->dst.n_bits;
+    ol_spec->dst_type = NX_LEARN_DST_MATCH;
+    ol_spec->src_type = NX_LEARN_SRC_FIELD;
+    ol_spec->src.field = mf_from_id(MFF_METADATA);
+
+    /* Match on eth.dst, using the packet's eth.src as the value. */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+    ol_spec->dst.field = mf_from_id(MFF_ETH_DST);
+    ol_spec->dst.ofs = 0;
+    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
+    ol_spec->n_bits = ol_spec->dst.n_bits;
+    ol_spec->dst_type = NX_LEARN_DST_MATCH;
+    ol_spec->src_type = NX_LEARN_SRC_FIELD;
+    ol_spec->src.field = mf_from_id(MFF_ETH_SRC);
+
+
+    /* Load MFF_LOG_OUTPORT from MFF_LOG_INPORT. */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+    ol_spec->dst.field = mf_from_id(MFF_LOG_OUTPORT);
+    ol_spec->dst.ofs = 0;
+    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
+    ol_spec->n_bits = ol_spec->dst.n_bits;
+    ol_spec->dst_type = NX_LEARN_DST_LOAD;
+    ol_spec->src_type = NX_LEARN_SRC_FIELD;
+    ol_spec->src.field = mf_from_id(MFF_LOG_INPORT);
+
+    ofpact_finish_LEARN(ofpacts, &ol);
+}
+
+static void
+encode_COMMIT_FDB_LOCAL(const struct ovnact_commit_fdb_local *fdb_local,
+                        const struct ovnact_encode_params *ep,
+                        struct ofpbuf *ofpacts)
+{
+    commit_fdb_local_learn_action(fdb_local, ofpacts, ep->lflow_uuid.parts[0]);
+}
+
 static void
 encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
                      const struct ovnact_encode_params *ep,
@@ -5451,6 +5543,8 @@ parse_action(struct action_context *ctx)
         parse_sample(ctx);
     } else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
         ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
+    } else if (lexer_match_id(ctx->lexer, "commit_fdb_local")) {
+        parse_commit_fdb_local(ctx, ovnact_put_COMMIT_FDB_LOCAL(ctx->ovnacts));
     } else {
         lexer_syntax_error(ctx->lexer, "expecting action");
     }
diff --git a/northd/northd.c b/northd/northd.c
index d1465ddf7..de18694a0 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -1834,6 +1834,12 @@ localnet_can_learn_mac(const struct nbrec_logical_switch_port *nbsp)
     return smap_get_bool(&nbsp->options, "localnet_learn_fdb", false);
 }
 
+static bool
+ls_is_fdb_local(const struct nbrec_logical_switch *nbs)
+{
+    return smap_get_bool(&nbs->other_config, "fdb_local", false);
+}
+
 static bool
 lsp_is_type_changed(const struct sbrec_port_binding *sb,
                     const struct nbrec_logical_switch_port *nbsp,
@@ -7033,6 +7039,8 @@ build_lswitch_port_sec_op(struct ovn_port *op, struct hmap *lflows,
     }
 }
 
+#define FDB_LOCAL_DEF_IDLE_TIMEOUT_S 300
+
 static void
 build_lswitch_learn_fdb_op(
         struct ovn_port *op, struct hmap *lflows,
@@ -7042,6 +7050,24 @@ build_lswitch_learn_fdb_op(
     if (!op->n_ps_addrs && op->has_unknown &&
         (!strcmp(op->nbsp->type, "") ||
          (lsp_is_localnet(op->nbsp) && localnet_can_learn_mac(op->nbsp)))) {
+
+        if (ls_is_fdb_local(op->od->nbs))
+        {
+            uint32_t idle_timeout = smap_get_uint(
+                    &op->od->nbs->other_config, "fdb_local_idle_timeout",
+                    FDB_LOCAL_DEF_IDLE_TIMEOUT_S);
+            ds_clear(match);
+            ds_clear(actions);
+            ds_put_format(match, "inport == %s", op->json_key);
+            ds_put_format(actions, "commit_fdb_local(timeout=%u); next;",
+                          idle_timeout);
+            ovn_lflow_add_with_lport_and_hint(lflows, op->od,
+                    S_SWITCH_IN_LOOKUP_FDB, 100,
+                    ds_cstr(match), ds_cstr(actions),
+                    op->key, &op->nbsp->header_);
+            return;
+        }
+
         ds_clear(match);
         ds_clear(actions);
         ds_put_format(match, "inport == %s", op->json_key);
diff --git a/ovn-nb.xml b/ovn-nb.xml
index fcb1c6ecc..3547ec4a6 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -803,6 +803,20 @@
       </column>
     </group>
 
+    <group title="Local FDB options">
+      <column name="other_config" key="fdb_local"
+              type='{"type": "boolean"}'>
+        If set to <code>true</code>, FDB entries are committed only to the
+        local chassis instead of the Southbound DB.  Default is false.
+      </column>
+      <column name="other_config" key="fdb_local_idle_timeout"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
+        The <code>idle_timeout</code>, in seconds, of locally learned FDB
+        flows.  Flows that stay idle longer than this are removed
+        automatically.  Defaults to 300; 0 disables the timeout.
+      </column>
+    </group>
+
     <column name="copp">
       <p>
         The control plane protection policy from table <ref table="Copp"/>
diff --git a/tests/ovn.at b/tests/ovn.at
index 92cf27581..dff50364a 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -34504,6 +34504,114 @@ OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([Local FDB MAC learning])
+ovn_start
+net_add n1
+
+AT_CHECK([ovn-nbctl ls-add ls0])
+
+AT_CHECK([ovn-nbctl lsp-add ls0 vif0])
+AT_CHECK([ovn-nbctl lsp-set-addresses vif0 "50:54:00:00:00:03 10.0.0.3" "unknown"])
+AT_CHECK([ovn-nbctl set logical_switch_port vif0 options:requested-tnl-key=2])
+
+AT_CHECK([ovn-nbctl lsp-add ls0 vif1])
+AT_CHECK([ovn-nbctl lsp-set-addresses vif1 "50:54:00:00:00:04 10.0.0.4"])
+AT_CHECK([ovn-nbctl set logical_switch_port vif1 options:requested-tnl-key=3])
+
+AT_CHECK([ovn-nbctl lsp-add ls0 ln_port])
+AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
+AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
+AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
+AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:localnet_learn_fdb=true])
+AT_CHECK([ovn-nbctl set logical_switch_port ln_port options:requested-tnl-key=1])
+
+AT_CHECK([ovn-nbctl set logical_switch ls0 other_config:fdb_local=true])
+
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int vif0 -- \
+    set interface vif0 external-ids:iface-id=vif0 \
+    options:tx_pcap=hv1/vif0-tx.pcap \
+    options:rxq_pcap=hv1/vif0-rx.pcap \
+    ofport-request=1
+ovs-vsctl -- add-port br-int vif1 -- \
+    set interface vif1 external-ids:iface-id=vif1 \
+    options:tx_pcap=hv1/vif1-tx.pcap \
+    options:rxq_pcap=hv1/vif1-rx.pcap \
+    ofport-request=2
+ovs-vsctl -- add-port br-phys ext0 -- \
+    set interface ext0 \
+    options:tx_pcap=hv1/ext0-tx.pcap \
+    options:rxq_pcap=hv1/ext0-rx.pcap \
+    ofport-request=3
+ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-phys
+
+wait_for_ports_up
+AT_CHECK([ovn-nbctl --wait=hv sync])
+
+send_packet() {
+    src_mac=$1
+    src_ip=$2
+    dst_mac=$3
+    dst_ip=$4
+    iface=$5
+    packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+    ovs-appctl netdev-dummy/receive $iface $packet
+}
+
+
+# Check that ovn-northd adds commit_fdb_local() flows for vif0 and ln_port.
+ovn-sbctl dump-flows ls0 > sw0flows
+AT_CAPTURE_FILE([sw0flows])
+
+AT_CHECK([grep "ls_in_lookup_fdb" sw0flows | sort], [0], [dnl
+  table=2 (ls_in_lookup_fdb   ), priority=0    , dnl
+match=(1), action=(next;)
+  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
+match=(inport == "ln_port"), action=(commit_fdb_local(timeout=300); next;)
+  table=2 (ls_in_lookup_fdb   ), priority=100  , dnl
+match=(inport == "vif0"), action=(commit_fdb_local(timeout=300); next;)
+])
+
+AT_CHECK([grep "ls_in_put_fdb" sw0flows | sort], [0], [dnl
+  table=3 (ls_in_put_fdb      ), priority=0    , dnl
+match=(1), action=(next;)
+])
+
+
+src_mac="505400000003"
+src_ip=`ip_to_hex 10.0.0.3`
+dst_mac="505400000004"
+dst_ip=`ip_to_hex 10.0.0.4`
+
+# Send a packet from vif0 (which has the "unknown" address set) to vif1.
+send_packet $src_mac $src_ip $dst_mac $dst_ip vif0
+
+# Send a packet from vif1 to vif0 (which has the "unknown" address set).
+send_packet $dst_mac $dst_ip $src_mac $src_ip vif1
+
+# Send a packet from the underlay to vif1.
+src_mac="505400000064"
+src_ip=`ip_to_hex 10.0.0.100`
+send_packet $src_mac $src_ip $dst_mac $dst_ip ext0
+AT_CHECK([ovn-nbctl --wait=hv sync])
+
+# Make sure that OVS table 71 is populated on hv1.
+AS_BOX([Check that ovn-controller programs the flows for FDB])
+as hv1 ovs-ofctl dump-flows br-int table=71 > hv1_offlows_table71.txt
+AT_CAPTURE_FILE([hv1_offlows_table71.txt])
+AT_CHECK([cat hv1_offlows_table71.txt | grep -v NXST | cut -d ' ' -f7- | sort], [0], [dnl
+idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:03 actions=load:0x2->NXM_NX_REG15[[]]
+idle_timeout=300, idle_age=0, metadata=0x1,dl_dst=50:54:00:00:00:64 actions=load:0x1->NXM_NX_REG15[[]]
+])
+
+OVN_CLEANUP([hv1])
+AT_CLEANUP
+])
+
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([MAC binding aging])
 AT_SKIP_IF([test $HAVE_SCAPY = no])
diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
index 0b86eae7b..354f84a4b 100644
--- a/utilities/ovn-trace.c
+++ b/utilities/ovn-trace.c
@@ -3355,6 +3355,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
             break;
         case OVNACT_MAC_CACHE_USE:
             break;
+        case OVNACT_COMMIT_FDB_LOCAL:
+            break;
         }
     }
     ofpbuf_uninit(&stack);
This functionality can be enabled at the logical switch level:
  - "other_config:fdb_local" enables or disables this functionality;
    it is disabled by default.
  - "other_config:fdb_local_idle_timeout" specifies the idle timeout
    for locally learned FDB flows; the default is 300 seconds.

If enabled, the following lflow is added for each port that has the
"unknown" address set:
  - table=2 (ls_in_lookup_fdb), priority=100, match=(inport == <in_port>),
    action=(commit_fdb_local(timeout=<timeout>); next;)

New OVN action: "commit_fdb_local". It is translated into the following
OVS action:
  - learn(table=71,idle_timeout=<timeout>,delete_learned,OXM_OF_METADATA[],
    NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_REG14[]->NXM_NX_REG15[])

This is useful when OVN manages a VLAN network that has multiple ports
with the "unknown" address set and localnet_learn_fdb enabled. With this
configuration, east-west traffic between VMs on the same VLAN but on
different hypervisors makes the MAC addresses of the source and
destination VMs keep flapping between the VM port and the localnet port
in the Southbound FDB table. Enabling fdb_local makes the FDB table local
to each chassis and avoids the MAC flapping.

Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com>
---
 include/ovn/actions.h |   7 +++
 lib/actions.c         |  94 ++++++++++++++++++++++++++++++++++++
 northd/northd.c       |  26 ++++++++++
 ovn-nb.xml            |  14 ++++++
 tests/ovn.at          | 108 ++++++++++++++++++++++++++++++++++++++++++
 utilities/ovn-trace.c |   2 +
 6 files changed, 251 insertions(+)