Message ID: 20201112014559.1494128-9-blp@ovn.org
State: Superseded
Series: Add DDlog implementation of ovn-northd
Bleep bloop. Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting. See the details below.

checkpatch:
WARNING: Comment with 'xxx' marker
#4656 FILE: northd/ovn-northd-ddlog.c:1273:
 * XXX If the transaction we're sending to the database fails, then

WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#4860 FILE: northd/ovn-northd-ddlog.c:1477:
 --ovnnb-db=DATABASE connect to ovn-nb database at DATABASE\n\

WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#4862 FILE: northd/ovn-northd-ddlog.c:1479:
 --ovnsb-db=DATABASE connect to ovn-sb database at DATABASE\n\

WARNING: Line lacks whitespace around operator
#4864 FILE: northd/ovn-northd-ddlog.c:1481:
 --unixctl=SOCKET override default control socket name\n\

Lines checked: 14412, Warnings: 9, Errors: 0

Please check this out. If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
> On Nov 11, 2020, at 8:45 PM, Ben Pfaff <blp@ovn.org> wrote: > > From: Leonid Ryzhyk <lryzhyk@vmware.com> > > This implementation is incremental, meaning that it only recalculates > what is needed for the southbound database when northbound changes > occur. It is expected to scale better than the C implementation, > for large deployments. (This may take testing and tuning to be > effective.) > > There are three tests that I'm having mysterious trouble getting > to work with DDlog. For now, I've marked the testsuite to skip > them unless RUN_ANYWAY=yes is set in the environment. > > Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com> > Co-authored-by: Justin Pettit <jpettit@ovn.org> > Signed-off-by: Justin Pettit <jpettit@ovn.org> > Co-authored-by: Ben Pfaff <blp@ovn.org> > Signed-off-by: Ben Pfaff <blp@ovn.org> > --- > Documentation/automake.mk | 2 + > Documentation/intro/install/general.rst | 31 +- > Documentation/topics/debugging-ddlog.rst | 280 + > Documentation/topics/index.rst | 1 + > Documentation/tutorials/ddlog-new-feature.rst | 362 + > Documentation/tutorials/index.rst | 1 + > NEWS | 6 + > acinclude.m4 | 43 + > configure.ac | 5 + > m4/ovn.m4 | 16 + > northd/.gitignore | 4 + > northd/automake.mk | 104 + > northd/helpers.dl | 128 + > northd/ipam.dl | 506 ++ > northd/lrouter.dl | 715 ++ > northd/lswitch.dl | 643 ++ > northd/multicast.dl | 259 + > northd/ovn-nb.dlopts | 13 + > northd/ovn-northd-ddlog.c | 1752 ++++ > northd/ovn-sb.dlopts | 28 + > northd/ovn.dl | 387 + > northd/ovn.rs | 857 ++ > northd/ovn.toml | 2 + > northd/ovn_northd.dl | 7500 +++++++++++++++++ > northd/ovsdb2ddlog2c | 127 + > tests/atlocal.in | 7 + > tests/ovn-macros.at | 3 + > tests/ovn-northd.at | 97 + > tests/ovn.at | 12 + > tests/ovs-macros.at | 5 +- > tutorial/ovs-sandbox | 24 +- Sorry for making more work for you but.... Could we also do something for the "make sandbox" target, where we could have the ovn_start function optionally use ovn-northd-ddlog ? 
Something like: make sandbox --ddlog -- flaviof > utilities/checkpatch.py | 2 +- > utilities/ovn-ctl | 20 +- > 33 files changed, 13929 insertions(+), 13 deletions(-) > create mode 100644 Documentation/topics/debugging-ddlog.rst > create mode 100644 Documentation/tutorials/ddlog-new-feature.rst > create mode 100644 northd/helpers.dl > create mode 100644 northd/ipam.dl > create mode 100644 northd/lrouter.dl > create mode 100644 northd/lswitch.dl > create mode 100644 northd/multicast.dl > create mode 100644 northd/ovn-nb.dlopts > create mode 100644 northd/ovn-northd-ddlog.c > create mode 100644 northd/ovn-sb.dlopts > create mode 100644 northd/ovn.dl > create mode 100644 northd/ovn.rs > create mode 100644 northd/ovn.toml > create mode 100644 northd/ovn_northd.dl > create mode 100755 northd/ovsdb2ddlog2c > > diff --git a/Documentation/automake.mk b/Documentation/automake.mk > index e0f39b33fdf4..b3fd3d62b33b 100644 > --- a/Documentation/automake.mk > +++ b/Documentation/automake.mk > @@ -20,12 +20,14 @@ DOC_SOURCE = \ > Documentation/tutorials/ovn-ipsec.rst \ > Documentation/tutorials/ovn-rbac.rst \ > Documentation/tutorials/ovn-interconnection.rst \ > + Documentation/tutorials/ddlog-new-feature.rst \ > Documentation/topics/index.rst \ > Documentation/topics/testing.rst \ > Documentation/topics/high-availability.rst \ > Documentation/topics/integration.rst \ > Documentation/topics/ovn-news-2.8.rst \ > Documentation/topics/role-based-access-control.rst \ > + Documentation/topics/debugging-ddlog.rst \ > Documentation/howto/index.rst \ > Documentation/howto/docker.rst \ > Documentation/howto/firewalld.rst \ > diff --git a/Documentation/intro/install/general.rst b/Documentation/intro/install/general.rst > index 65b1f4a40e8a..e748ab430eae 100644 > --- a/Documentation/intro/install/general.rst > +++ b/Documentation/intro/install/general.rst > @@ -89,6 +89,13 @@ need the following software: > The environment variable OVS_RESOLV_CONF can be used to specify DNS server > 
configuration file (the default file on Linux is /etc/resolv.conf). > > +- `DDlog <https://github.com/vmware/differential-datalog>`_, if you > + want to build ``ovn-northd-ddlog``, an alternate implementation of > + ``ovn-northd`` that scales better to large deployments. The NEWS > + file specifies the right version of DDlog to use with this release. > + Building with DDlog support requires Rust to be installed (see > + https://www.rust-lang.org/tools/install). > + > If you are working from a Git tree or snapshot (instead of from a distribution > tarball), or if you modify the OVN build system or the database > schema, you will also need the following software: > @@ -176,6 +183,14 @@ the default database directory, add options as shown here:: > ``yum install`` or ``rpm -ivh``) and .deb (e.g. via > ``apt-get install`` or ``dpkg -i``) use the above configure options. > > +To build with DDlog support, add ``--with-ddlog=<path to ddlog>/lib`` > +to the ``configure`` command line. Building with DDlog adds a few > +minutes to the build because the Rust compiler is slow. To speed this > +up by about 2x, also add ``--enable-ddlog-fast-build``. This disables > +some Rust compiler optimizations, making a much slower > +``ovn-northd-ddlog`` executable, so it should not be used for > +production builds or for profiling. > + > By default, static libraries are built and linked against.
If you want to use > shared libraries instead:: > > @@ -353,6 +368,14 @@ An example after install might be:: > $ ovn-ctl start_northd > $ ovn-ctl start_controller > > +If you built with DDlog support, then you can start > +``ovn-northd-ddlog`` instead of ``ovn-northd`` by adding > +``--ovn-northd-ddlog=yes``, e.g.:: > + > + $ export PATH=$PATH:/usr/local/share/ovn/scripts > + $ ovn-ctl --ovn-northd-ddlog=yes start_northd > + $ ovn-ctl start_controller > + > Starting OVN Central services > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > @@ -403,11 +426,15 @@ it at any time is harmless:: > $ ovn-nbctl --no-wait init > $ ovn-sbctl --no-wait init > > -Start the ovn-northd, telling it to connect to the OVN db servers same Unix > -domain socket:: > +Start ``ovn-northd``, telling it to connect to the OVN db servers same > +Unix domain socket:: > > $ ovn-northd --pidfile --detach --log-file > > +If you built with DDlog support, you can start ``ovn-northd-ddlog`` > +instead, the same way:: > + > + $ ovn-northd-ddlog --pidfile --detach --log-file > > Starting OVN Central services in containers > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > diff --git a/Documentation/topics/debugging-ddlog.rst b/Documentation/topics/debugging-ddlog.rst > new file mode 100644 > index 000000000000..046419b995f1 > --- /dev/null > +++ b/Documentation/topics/debugging-ddlog.rst > @@ -0,0 +1,280 @@ > +.. > + Licensed under the Apache License, Version 2.0 (the "License"); you may > + not use this file except in compliance with the License. You may obtain > + a copy of the License at > + > + http://www.apache.org/licenses/LICENSE-2.0 > + > + Unless required by applicable law or agreed to in writing, software > + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT > + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the > + License for the specific language governing permissions and limitations > + under the License. 
> + > + Convention for heading levels in OVN documentation: > + > + ======= Heading 0 (reserved for the title in a document) > + ------- Heading 1 > + ~~~~~~~ Heading 2 > + +++++++ Heading 3 > + ''''''' Heading 4 > + > + Avoid deeper levels because they do not render well. > + > +========================================= > +Debugging the DDlog version of ovn-northd > +========================================= > + > +This document gives some tips for debugging correctness issues in the > +DDlog implementation of ``ovn-northd``. To keep things concrete, we > +assume here that a failure occurred in one of the test cases in > +``ovn-e2e.at``, but the same methodology applies in any other > +environment. If none of these methods helps, ask for assistance or > +submit a bug report. > + > +Before trying these methods, you may want to check the northd log > +file, ``tests/testsuite.dir/<test_number>/northd/ovn-northd.log``, for > +error messages that might explain the failure. > + > +Compare OVSDB tables generated by DDlog vs C > +-------------------------------------------- > + > +The first thing I typically want to check when ``ovn-northd-ddlog`` > +does not behave as expected is how the OVSDB tables computed by DDlog > +differ from what the C implementation produces. Fortunately, all the > +infrastructure needed to do this already exists in OVN. > + > +First, let's modify the test script, e.g., ``ovn.at`` to dump the > +contents of OVSDB right before the failure. The most common issue is > +a difference between the logical flows generated by the two > +implementations. To make it easy to compare the generated flows, make > +sure that the test contains something like this in the right place:: > + > + ovn-sbctl dump-flows > sbflows > + AT_CAPTURE_FILE([sbflows]) > + > +The first line above dumps the OVN logical flow table to a file named > +``sbflows``. The second line ensures that, if the test fails, > +``sbflows`` gets logged to ``testsuite.log``.
That is not particularly > +useful for us right now, but it means that if someone later submits a > +bug report, that's one more piece of data that we don't have to ask > +them to submit along with it. > + > +Next, we want to run the test twice, with the C and DDlog versions of > +northd, e.g., ``make check -j6 TESTSUITEFLAGS="-d 111 112"`` if 111 > +and 112 are the C and DDlog versions of the same test. The ``-d`` in > +this command line makes the test driver keep test directories around > +even for tests that succeed, since by default it deletes them. > + > +Now you can look at ``sbflows`` in each test log directory. The > +``ovn-northd-ddlog`` developers have gone to some trouble to make the > +DDlog flows as similar as possible to the C ones, right down to white > +space and other formatting. Thus, the DDlog output is often identical > +to C aside from logical datapath UUIDs. > + > +Usually, this means that one can get informative results by running > +``diff``, e.g.:: > + > + diff -u tests/testsuite.dir/111/sbflows tests/testsuite.dir/112/sbflows > + > +Running the input through the ``uuidfilt`` utility from OVS will > +generally get rid of the logical datapath UUID differences as well:: > + > + diff -u <(uuidfilt tests/testsuite.dir/111/sbflows) <(uuidfilt tests/testsuite.dir/112/sbflows) > + > +If there are nontrivial differences, this often identifies your bug. > + > +Often, once you have identified the difference between the two OVSDB > +dumps, this will immediately lead you to the root cause of the bug, > +but if you are not this lucky then the next method may help. > + > +Record and replay DDlog execution > +--------------------------------- > + > +DDlog offers a way to record all input table updates throughout the > +execution of northd and replay them against DDlog running as a > +standalone executable without all other OVN components. This has two > +advantages. First, this allows one to easily tweak the inputs, e.g.
> +to simplify the test scenario. Second, the recorded execution can be > +easily replayed anywhere without having to reproduce your OVN setup. > + > +Use the ``--ddlog-record`` option to record updates, > +e.g. ``--ddlog-record=replay.dat`` to record to ``replay.dat``. > +(OVN's built-in tests automatically do this.) The file contains the > +log of transactions in the DDlog command format (see > +https://github.com/vmware/differential-datalog/blob/master/doc/command_reference/command_reference.md). > + > +To replay the log, you will need the standalone DDlog executable. By > +default, the build system does not compile this program, because it > +increases the already long Rust compilation time. To build it, add > +``NORTHD_CLI=1`` to the ``make`` command line, e.g. ``make > +NORTHD_CLI=1``. > + > +You can modify the log before replaying it, e.g., adding ``dump > +<table>`` commands to dump the contents of relations at various points > +during execution. The ``<table>`` name must be fully qualified based on > +the file in which it is declared, e.g. ``OVN_Southbound::<table>`` for > +southbound tables or ``lrouter::<table>`` for ``lrouter.dl``. You > +can also use ``dump`` without an argument to dump the contents of all > +tables.
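As a sketch of this kind of log editing, here is a hypothetical Python helper (not part of OVN). It assumes the textual DDlog command format described above, with one command per line and transactions ended by ``commit;`` or ``commit dump_changes;``, and inserts a ``dump`` command after every commit:

```python
def add_dump_commands(log_text, table):
    """Insert a 'dump <table>;' command after every committed
    transaction in a DDlog text command log, so that a replay
    prints the relation's contents at each step.

    Assumes one command per line and commits spelled either
    'commit;' or 'commit dump_changes;'.
    """
    out = []
    for line in log_text.splitlines():
        out.append(line)
        stripped = line.strip().rstrip(';').strip()
        if stripped in ('commit', 'commit dump_changes'):
            out.append('dump %s;' % table)
    return '\n'.join(out) + '\n'
```

One might run this over ``replay.dat`` and feed the result to the CLI, instead of editing the log by hand; the table name passed in must be fully qualified, as explained above.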
> + > +The following command replays the log generated by OVN test number > +112 and dumps the output of DDlog to ``replay.dump``:: > + > + ovn/northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/112/northd/replay.dat > replay.dump > + > +Or, to dump table contents following the run, without having to edit > +``replay.dat``:: > + > + (cat tests/testsuite.dir/112/northd/replay.dat; echo 'dump;') | ovn/northd/ovn_northd_ddlog/target/release/ovn_northd_cli --no-init-snapshot > replay.dump > + > +Depending on whether and how you installed OVS and OVN, you might need > +to point ``LD_LIBRARY_PATH`` to library build directories to get the > +CLI to run, e.g.:: > + > + export LD_LIBRARY_PATH=$HOME/ovn/_build/lib/.libs:$HOME/ovs/_build/lib/.libs > + > +.. note:: > + > + The replay output may be less informative than you expect because > + DDlog does not, by default, keep around enough information to > + include input relations and intermediate relations in the output. > + These relations are often critical to understanding what is going > + on. To include them, add the options > + ``--output-internal-relations --output-input-relations=In_`` to > + ``DDLOG_EXTRA_FLAGS`` for building ``ovn-northd-ddlog``. For > + example, ``configure`` as:: > + > + ./configure DDLOG_EXTRA_FLAGS='--output-internal-relations --output-input-relations=In_' > + > +Debugging by Logging > +-------------------- > + > +One limitation of the previous method is that it allows one to inspect > +inputs and outputs of a rule, but not the (sometimes fairly > +complicated) computation that goes on inside the rule. You can of > +course break up the rule into several rules and dump the intermediate > +outputs. > + > +There are at least two alternatives for generating log messages. > +First, you can write rules to add strings to the Warning relation > +declared in ``ovn_northd.dl``.
Code in ``ovn-northd-ddlog.c`` will log > +any given string in this relation just once, when it is first added to > +the relation. (If it is removed from the relation and then added back > +later, it will be logged again.) > + > +Second, you can call the ``warn()`` function declared in > +``ovn.dl`` from a DDlog rule. It's not straightforward to know > +exactly when this function will be called, like it would be in an > +imperative language like C, since DDlog is a declarative language > +where the user doesn't directly control when rules are triggered. You > +might, for example, see the rule being triggered multiple times with > +the same input. Nevertheless, this debugging technique is useful in > +practice. > + > +You will find many examples of the use of Warning and ``warn`` in > +``ovn_northd.dl``, where it is frequently used to report non-critical > +errors. > + > +Debugging panics > +---------------- > + > +**TODO**: update these instructions as DDlog's internal handling of panics > +is improved. > + > +DDlog is a safe language, so DDlog programs normally do not crash, > +except for the following three cases: > + > +- A panic in a Rust function imported to DDlog as ``extern function``. > + > +- A panic in a C function imported to DDlog as ``extern function``. > + > +- A bug in the DDlog runtime or libraries. > + > +Below we walk through the steps involved in debugging such failures. > +In this scenario, there is an array-index-out-of-bounds error in the > +``ovn_scan_static_dynamic_ip6()`` function, which is written in Rust > +and imported to DDlog as an ``extern function``. When invoked from a > +DDlog rule, this function causes a panic in one of the DDlog worker > +threads. > + > +**Step 1: Check for error messages in the northd log.** A panic can > +generally lead to unpredictable outcomes, so one cannot count on a > +clean error message showing up in the log. (Other outcomes include > +crashing the entire process and even deadlocks.
We are working to > +eliminate the latter possibility). In this case we are lucky to > +observe a bunch of error messages like the following in the ``northd`` > +log: > + > + ``2019-09-23T16:23:24.549Z|00011|ovn_northd|ERR|ddlog_transaction_commit(): > + error: failed to receive flush ack message from timely dataflow > + thread`` > + > +These messages are telling us that something is broken inside the > +DDlog runtime. > + > +**Step 2: Record and replay the failing scenario.** We use DDlog's > +record/replay capabilities (see above) to capture the faulty scenario. > +We replay the recorded trace:: > + > + northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat > + > +This generates a bunch of output ending with:: > + > + thread 'worker thread 2' panicked at 'index out of bounds: the len is 1 but the index is 1', /rustc/eae3437dfe991621e8afdc82734f4a172d7ddf9b/src/libcore/slice/mod.rs:2681:10 > + note: run with RUST_BACKTRACE=1 environment variable to display a backtrace. 
> + > +We re-run the CLI with backtrace enabled (as suggested by the > +error message):: > + > + RUST_BACKTRACE=1 northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat > + > +This finally yields the following stack trace, which suggests an array > +bounds violation in ``ovn_scan_static_dynamic_ip6``:: > + > + 0: backtrace::backtrace::libunwind::trace > + at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29 10: core::panicking::panic_bounds_check > + at src/libcore/panicking.rs:61 > + [SKIPPED] > + 11: ovn_northd_ddlog::__ovn::ovn_scan_static_dynamic_ip6 > + 12: ovn_northd_ddlog::prog::__f > + [SKIPPED] > + > +Finally, looking at the source code of > +``ovn_scan_static_dynamic_ip6``, we identify the following line, > +containing an unsafe array indexing operator, as the culprit:: > + > + ovn_ipv6_parse(&f[1].to_string()) > + > +Clean build > +~~~~~~~~~~~ > + > +Occasionally it's desirable to do a full and complete build of the > +DDlog-generated code. To trigger that, delete the generated > +``ovn_northd_ddlog`` directory and the ``ddlog.stamp`` witness file, > +like this:: > + > + rm -rf northd/ovn_northd_ddlog northd/ddlog.stamp > + > +or:: > + > + make clean-ddlog > + > +Submitting a bug report > +----------------------- > + > +If you are having trouble with DDlog and the above methods do not > +help, please submit a bug report to ``bugs@openvswitch.org``, CC > +``ryzhyk@gmail.com``. > + > +In addition to the problem description, please provide as many of the > +following as possible: > + > +- Are you running with the right DDlog for the version of OVN? OVN > + and DDlog are both evolving and OVN needs to build against a > + specific version of DDlog.
> + > +- ``replay.dat`` file generated as described above > + > +- Logs: ``ovn-northd.log`` and ``testsuite.log``, if you are running > + the OVN test suite > diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst > index 3b689cf53eae..d58d5618b2db 100644 > --- a/Documentation/topics/index.rst > +++ b/Documentation/topics/index.rst > @@ -36,6 +36,7 @@ OVN > .. toctree:: > :maxdepth: 2 > > + debugging-ddlog > integration.rst > high-availability > role-based-access-control > diff --git a/Documentation/tutorials/ddlog-new-feature.rst b/Documentation/tutorials/ddlog-new-feature.rst > new file mode 100644 > index 000000000000..02876db66d74 > --- /dev/null > +++ b/Documentation/tutorials/ddlog-new-feature.rst > @@ -0,0 +1,362 @@ > +.. > + Licensed under the Apache License, Version 2.0 (the "License"); you may > + not use this file except in compliance with the License. You may obtain > + a copy of the License at > + > + http://www.apache.org/licenses/LICENSE-2.0 > + > + Unless required by applicable law or agreed to in writing, software > + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT > + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the > + License for the specific language governing permissions and limitations > + under the License. > + > + Convention for heading levels in OVN documentation: > + > + ======= Heading 0 (reserved for the title in a document) > + ------- Heading 1 > + ~~~~~~~ Heading 2 > + +++++++ Heading 3 > + ''''''' Heading 4 > + > + Avoid deeper levels because they do not render well. > + > +=========================================================== > +Adding a new OVN feature to the DDlog version of ovn-northd > +=========================================================== > + > +This document describes the usual steps an OVN developer should go > +through when adding a new feature to ``ovn-northd-ddlog``. 
To > +make things less abstract, we will use the IP Multicast > +``ovn-northd-ddlog`` implementation as an example. Even though the > +document is structured as a tutorial, there may still be > +feature-specific aspects that are not covered here. > + > +Overview > +-------- > + > +DDlog is a dataflow system: it receives data from a data source (a set > +of "input relations"), processes it through "intermediate relations" > +according to the rules specified in the DDlog program, and sends the > +processed "output relations" to a data sink. In OVN, the input > +relations primarily come from the OVN Northbound database and the > +output relations primarily go to the OVN Southbound database. The > +process looks like this:: > + > + from NBDB +----------+ +-----------------+ +-----------+ to SBDB > + ---------->|Input rels|-->|Intermediate rels|-->|Output rels|----------> > + +----------+ +-----------------+ +-----------+ > + > +Adding a new feature to ``ovn-northd-ddlog`` usually involves the > +following steps: > + > +1. Update northbound and/or southbound OVSDB schemas. > + > +2. Configure DDlog/OVSDB bindings. > + > +3. Define intermediate DDlog relations and rules to compute them. > + > +4. Write rules to update output relations. > + > +5. Generate ``Logical_Flow``s and/or other forwarding records (e.g., > + ``Multicast_Group``) that will control the dataplane operations. > + > +Update NB and/or SB OVSDB schemas > +--------------------------------- > + > +This step is no different from the normal development flow in C. > + > +Most of the time, a developer chooses between two ways of configuring > +a new feature: > + > +1. Adding a set of columns to tables in the NB and/or SB database (or > + adding key-value pairs to existing columns). > + > +2. Adding new tables to the NB and/or SB database.
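As an illustration of option 1, here is a rough Python sketch (key names and defaults are illustrative, not OVN code) of how boolean feature settings might be derived from key-value pairs in an existing column such as a logical switch's ``other_config``:

```python
def mcast_switch_config(other_config):
    """Sketch of option 1 above: feature settings stored as key-value
    string pairs in an existing OVSDB column.  Key names follow the
    mcast_* convention; the defaults here are illustrative."""
    def get_bool(key, default):
        value = other_config.get(key, '')
        # OVSDB stores map values as strings, so interpret common
        # boolean spellings; fall back to the default when unset.
        return value.lower() in ('true', 'yes', '1') if value else default

    return {
        'enabled': get_bool('mcast_snoop', False),
        'querier': get_bool('mcast_querier', True),
    }
```

The appeal of this style is that no schema change is needed to add a knob, at the cost of losing OVSDB type checking for the individual settings.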
> + > +Looking at IP Multicast, there are two ``OVN Northbound`` tables where > +configuration information is stored: > + > +- ``Logical_Switch``, column ``other_config``, keys ``mcast_*``. > + > +- ``Logical_Router``, column ``options``, keys ``mcast_*``. > + > +These tables become inputs to the DDlog pipeline. > + > +In addition we add a new table ``IP_Multicast`` to the SB database. > +DDlog will update this table, that is, ``IP_Multicast`` receives > +output from the above pipeline. > + > +Configuring DDlog/OVSDB bindings > +-------------------------------- > + > +Configuring ``northd/automake.mk`` > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The OVN build process uses DDlog's ``ovsdb2ddlog`` utility to parse > +``ovn-nb.ovsschema`` and ``ovn-sb.ovsschema`` and then automatically > +populate ``OVN_Northbound.dl`` and ``OVN_Southbound.dl``. For each > +OVN Northbound and Southbound table, it generates one or more > +corresponding DDlog relations. > + > +We need to supply ``ovsdb2ddlog`` with some information that it can't > +infer from the OVSDB schemas. This information must be specified as > +``ovsdb2ddlog`` arguments, which are read from > +``northd/ovn-nb.dlopts`` and ``northd/ovn-sb.dlopts``. > + > +The main choice for each new table is whether it is used for output. > +Output tables can also be used for input, but the converse is not > +true. If the table is used for output at all, we add ``-o <table>`` > +to the option file. Our new table ``IP_Multicast`` is an output > +table, so we add ``-o IP_Multicast`` to ``ovn-sb.dlopts``. > + > +For input-only tables, ``ovsdb2ddlog`` generates a DDlog input > +relation with the same name. For output tables, it generates this > +table plus an output relation named ``Out_<table>``. 
Thus, > +``OVN_Southbound.dl`` has two relations for ``IP_Multicast``:: > + > + input relation IP_Multicast ( > + _uuid: uuid, > + datapath: string, > + enabled: Set<bool>, > + querier: Set<bool> > + ) > + output relation Out_IP_Multicast ( > + _uuid: uuid, > + datapath: string, > + enabled: Set<bool>, > + querier: Set<bool> > + ) > + > +For an output table, consider whether only some of the columns are > +used for output, that is, some of the columns are effectively > +input-only. This is common in OVN for OVSDB columns that are managed > +externally (e.g. by a CMS). For each input-only column, we add ``--ro > +<table>.<column>``. Alternatively, if most of the columns are > +input-only but a few are output columns, add ``--rw <table>.<column>`` > +for each of the output columns. In our case, all of the columns are > +used for output, so we do not need to add anything. > + > +Finally, in some cases ``ovn-northd-ddlog`` shouldn't change values in > +certain columns. One such case is the ``seq_no`` column in the > +``IP_Multicast`` table. To do that we need to instruct ``ovsdb2ddlog`` > +to treat the column as read-only by using the ``--ro`` switch. > + > +``ovsdb2ddlog`` generates a number of additional DDlog relations, for > +use by auto-generated OVSDB adapter logic. These are irrelevant to > +most DDlog developers, although sometimes they can be handy for > +debugging. See the appendix_ for details. > + > +Define intermediate DDlog relations and rules to compute them. > +-------------------------------------------------------------- > + > +Obviously there will be a one-to-one relationship between logical > +switches/routers and IP multicast configuration. One way to represent > +this relationship is to create multicast configuration DDlog relations > +to be referenced by ``&Switch`` and ``&Router`` DDlog records::
*/ > + relation &McastSwitchCfg( > + datapath : uuid, > + enabled : bool, > + querier : bool > + ) > + > + &McastSwitchCfg( > + .datapath = ls_uuid, > + .enabled = map_get_bool_def(other_config, "mcast_snoop", false), > + .querier = map_get_bool_def(other_config, "mcast_querier", true)) :- > + nb.Logical_Switch(._uuid = ls_uuid, > + .other_config = other_config). > + > +Then reference these relations in ``&Switch`` and ``&Router``. For > +example, in ``lswitch.dl``, the ``&Switch`` relation definition now > +contains:: > + > + relation &Switch( > + ls: nb.Logical_Switch, > + [...] > + mcast_cfg: Ref<McastSwitchCfg> > + ) > + > +And is populated by the following rule which references the correct > +``McastSwitchCfg`` based on the logical switch uuid:: > + > + &Switch(.ls = ls, > + [...] > + .mcast_cfg = mcast_cfg) :- > + nb.Logical_Switch[ls], > + [...] > + mcast_cfg in &McastSwitchCfg(.datapath = ls._uuid). > + > +Build state based on information dynamically updated by ``ovn-controller`` > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +Some OVN features rely on information learned by ``ovn-controller`` to > +generate ``Logical_Flow`` or other records that control the dataplane. > +In the case of IP Multicast, ``ovn-controller`` uses IGMP to learn > +multicast groups that are joined by hosts. > + > +Each ``ovn-controller`` maintains its own set of records to avoid > +ownership and concurrency issues with other controllers. If two hosts that > +are connected to the same logical switch but reside on different > +hypervisors (different ``ovn-controller`` processes) join the same > +multicast group G, each of the controllers will create an > +``IGMP_Group`` record in the ``OVN Southbound`` database which will > +contain a set of ports to which the interested hosts are connected. > + > +At this point ``ovn-northd-ddlog`` needs to aggregate the per-chassis > +IGMP records to generate a single ``Logical_Flow`` for group G.
> +Moreover, the ports to which the hosts are connected are represented > +as references to ``Port_Binding`` records in the database. These also > +need to be translated to ``&SwitchPort`` DDlog relations. The > +corresponding DDlog operations that need to be performed are: > + > +- Flatten the ``<IGMP group, ports>`` mapping in order to be able to > + do the translation from ``Port_Binding`` to ``&SwitchPort``. For > + each ``IGMP_Group`` record in the ``OVN Southbound`` database > + generate an individual record of type ``IgmpSwitchGroupPort`` for > + each ``Port_Binding`` in the set of ports that joined the > + group. Also, translate the ``Port_Binding`` uuid to the > + corresponding ``Logical_Switch_Port`` uuid:: > + > + relation IgmpSwitchGroupPort( > + address: string, > + switch : Ref<Switch>, > + port : uuid > + ) > + > + IgmpSwitchGroupPort(address, switch, lsp_uuid) :- > + sb::IGMP_Group(.address = address, .datapath = igmp_dp_set, > + .ports = pb_ports), > + var pb_port_uuid = FlatMap(pb_ports), > + sb::Port_Binding(._uuid = pb_port_uuid, .logical_port = lsp_name), > + &SwitchPort( > + .lsp = nb.Logical_Switch_Port{._uuid = lsp_uuid, .name = lsp_name}, > + .sw = switch). > + > +- Aggregate the flattened IgmpSwitchGroupPort (implicitly from all > + ``ovn-controller`` instances) grouping by address and logical > + switch:: > + > + relation IgmpSwitchMulticastGroup( > + address: string, > + switch : Ref<Switch>, > + ports : Set<uuid> > + ) > + > + IgmpSwitchMulticastGroup(address, switch, ports) :- > + IgmpSwitchGroupPort(address, switch, port), > + var ports = port.group_by((address, switch)).to_set(). > + > +At this point we have all the relevant feature configuration > +information stored in DDlog relations in ``ovn-northd-ddlog`` memory.
For IP Multicast this means:: > + > + /* IP_Multicast table (only applicable for Switches). */ > + sb::Out_IP_Multicast(._uuid = hash128(cfg.datapath), > + .datapath = cfg.datapath, > + .enabled = set_singleton(cfg.enabled), > + .querier = set_singleton(cfg.querier)) :- > + &McastSwitchCfg[cfg]. > + > +.. note:: ``OVN_Southbound.dl`` also contains an ``IP_Multicast`` > + relation with ``input`` qualifier. This relation stores the > + current snapshot of the OVSDB table and cannot be written to. > + > +Generate ``Logical_Flow`` and/or other forwarding records > +--------------------------------------------------------- > + > +At this point we have defined all DDlog relations required to generate > +``Logical_Flow``s. All we have to do is write the rules to do so. > +For each ``IgmpSwitchMulticastGroup`` we generate a ``Flow`` that has > +as action ``"outport = <Multicast_Group>; output;"``:: > + > + /* Ingress table 17: Add IP multicast flows learnt from IGMP (priority 90). */ > + for (IgmpSwitchMulticastGroup(.address = address, .switch = &sw)) { > + Flow(.logical_datapath = sw.dpname, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 90, > + .__match = "eth.mcast && ip4 && ip4.dst == ${address}", > + .actions = "outport = \"${address}\"; output;", > + .external_ids = map_empty()) > + } > + > +In some cases generating a logical flow is not enough. For IGMP we > +also need to maintain OVN southbound ``Multicast_Group`` records, > +one per IGMP group storing the corresponding ``Port_Binding`` uuids of > +ports where multicast traffic should be sent. This is also relatively > +straightforward:: > + > + /* Create a multicast group for each IGMP group learned by a Switch. > + * 'tunnel_key' == 0 triggers an ID allocation later. > + */ > + sb::Out_Multicast_Group (.datapath = switch.dpname, > + .name = address, > + .tunnel_key = 0, > + .ports = set_map_uuid2name(port_ids)) :- > + IgmpSwitchMulticastGroup(address, &switch, port_ids). 
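The flatten-then-aggregate pattern used by the two IGMP rules above can also be sketched imperatively. This Python analogy (made-up input shapes, not OVN code) mirrors what ``FlatMap`` followed by ``group_by`` computes:

```python
from collections import defaultdict

def igmp_switch_multicast_groups(igmp_groups, pb_to_lsp):
    """Rough analogy of the two DDlog rules above.

    igmp_groups: iterable of (address, switch, set_of_port_binding_uuids),
        standing in for sb::IGMP_Group rows reported by all chassis.
    pb_to_lsp: dict mapping Port_Binding uuid -> Logical_Switch_Port uuid,
        standing in for the Port_Binding/&SwitchPort join.
    """
    # Step 1 (IgmpSwitchGroupPort): flatten each group's port set and
    # translate Port_Binding uuids to Logical_Switch_Port uuids.
    flattened = [
        (address, switch, pb_to_lsp[pb])
        for address, switch, ports in igmp_groups
        for pb in ports
        if pb in pb_to_lsp
    ]
    # Step 2 (IgmpSwitchMulticastGroup): group by (address, switch),
    # collecting the translated port uuids into one set per group.
    grouped = defaultdict(set)
    for address, switch, lsp in flattened:
        grouped[(address, switch)].add(lsp)
    return dict(grouped)
```

The key point the analogy shows is that records from different chassis for the same (address, switch) pair merge into a single aggregated set, which is what lets northd emit one ``Logical_Flow`` and one ``Multicast_Group`` per IGMP group; unlike this sketch, DDlog maintains the result incrementally as inputs change.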
> + > +We must also define DDlog relations that will allocate ``tunnel_key`` > +values. There are two cases: tunnel keys for records that already > +existed in the database are preserved to implement stable id > +allocation; new multicast groups need new keys. This kind of > +allocation can be tricky, especially for new users of DDlog. OVN > +contains multiple instances of allocation, so it's probably worth > +reading through the existing cases and following their pattern, and, > +if it's still tricky, asking for assistance. > + > +Appendix A. Additional relations generated by ``ovsdb2ddlog`` > +------------------------------------------------------------- > + > +.. _appendix: > + > +ovsdb2ddlog generates some extra relations to manage communication > +with the OVSDB server. It generates records in the following > +relations when rows in OVSDB output tables need to be added, deleted, > +or updated. > + > +In the steady state, when everything is working well, a given record > +stays in any one of these relations only briefly: just long enough for > +``ovn-northd-ddlog`` to send a transaction to the OVSDB server. When > +the OVSDB server applies the update and sends an acknowledgement, this > +ordinarily means that these relations become empty, because there are > +no longer any further changes to send. > + > +Thus, records that persist in one of these relations are a sign of a > +problem. One example of such a problem is the database server > +rejecting the transactions sent by ``ovn-northd-ddlog``, which might > +happen if, for example, a bug in a ``.dl`` file caused some OVSDB > +constraint or relational integrity rule to be violated. (Such a > +problem can often be diagnosed by looking in the OVSDB server's log.)
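Side note for other reviewers: the DeltaPlus/DeltaMinus/Update split described above is easy to model. Here is a small Python toy I used while reviewing; the names mirror the generated relations, but none of this is the real ovsdb2ddlog output:

```python
def compute_deltas(desired, actual):
    """Toy model of the generated Delta relations: 'desired' and 'actual'
    map a record uuid to its column values.  DeltaPlus holds rows to add,
    DeltaMinus the uuids of rows to remove, Update rows whose columns
    differ.  In steady state all three come out empty."""
    delta_plus = {k: v for k, v in desired.items() if k not in actual}
    delta_minus = sorted(k for k in actual if k not in desired)
    update = {k: v for k, v in desired.items()
              if k in actual and actual[k] != v}
    return delta_plus, delta_minus, update
```

With identical inputs all three results are empty, which matches the "nothing left to send" steady state the tutorial describes; a record lingering in one of them is the problem signature mentioned above.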
> + > +- ``DeltaPlus_IP_Multicast`` used by the DDlog program to track new > + records that are not yet added to the database:: > + > + output relation DeltaPlus_IP_Multicast ( > + datapath: uuid_or_string_t, > + enabled: Set<bool>, > + querier: Set<bool> > + ) > + > +- ``DeltaMinus_IP_Multicast`` used by the DDlog program to track > + records that are no longer needed in the database and need to be > + removed:: > + > + output relation DeltaMinus_IP_Multicast ( > + _uuid: uuid > + ) > + > +- ``Update_IP_Multicast`` used by the DDlog program to track records > + whose fields need to be updated in the database:: > + > + output relation Update_IP_Multicast ( > + _uuid: uuid, > + enabled: Set<bool>, > + querier: Set<bool> > + ) > diff --git a/Documentation/tutorials/index.rst b/Documentation/tutorials/index.rst > index 4ff6e16f84cd..d1f4fda9df1e 100644 > --- a/Documentation/tutorials/index.rst > +++ b/Documentation/tutorials/index.rst > @@ -44,3 +44,4 @@ vSwitch. > ovn-rbac > ovn-ipsec > ovn-interconnection > + ddlog-new-feature > diff --git a/NEWS b/NEWS > index 601023067996..04b75e68c6a1 100644 > --- a/NEWS > +++ b/NEWS > @@ -1,5 +1,11 @@ > Post-v20.09.0 > --------------------- > + - ovn-northd-ddlog: New implementation of northd, based on DDlog. This > + implementation is incremental, meaning that it only recalculates what is > + needed for the southbound database when northbound changes occur. It is > + expected to scale better than the C implementation, for large deployments. > + (This may take testing and tuning to be effective.) This version of OVN > + requires DDlog 0.30. > - The "datapath" argument to ovn-trace is now optional, since the > datapath can be inferred from the inport (which is required).
> - The obsolete "redirect-chassis" way to configure gateways has been > diff --git a/acinclude.m4 b/acinclude.m4 > index a797adc826c9..83d1d13bfb86 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -42,6 +42,49 @@ AC_DEFUN([OVS_ENABLE_WERROR], > fi > AC_SUBST([SPARSE_WERROR])]) > > +dnl OVS_CHECK_DDLOG > +dnl > +dnl Configure ddlog source tree > +AC_DEFUN([OVS_CHECK_DDLOG], [ > + AC_ARG_WITH([ddlog], > + [AC_HELP_STRING([--with-ddlog=.../differential-datalog/lib], > + [Enables DDlog by pointing to its library dir])], > + [DDLOGLIBDIR=$withval], [DDLOGLIBDIR=no]) > + > + AC_MSG_CHECKING([for DDlog library directory]) > + if test "$DDLOGLIBDIR" != no; then > + if test ! -d "$DDLOGLIBDIR"; then > + AC_MSG_ERROR([ddlog library dir "$DDLOGLIBDIR" doesn't exist]) > + elif test ! -f "$DDLOGLIBDIR"/ddlog_std.dl; then > + AC_MSG_ERROR([ddlog library dir "$DDLOGLIBDIR" lacks ddlog_std.dl]) > + fi > + > + AC_ARG_VAR([DDLOG]) > + AC_CHECK_PROGS([DDLOG], [ddlog], [none]) > + if test X"$DDLOG" = X"none"; then > + AC_MSG_ERROR([ddlog is required to build with DDlog]) > + fi > + > + AC_ARG_VAR([CARGO]) > + AC_CHECK_PROGS([CARGO], [cargo], [none]) > + if test X"$CARGO" = X"none"; then > + AC_MSG_ERROR([cargo is required to build with DDlog]) > + fi > + > + AC_ARG_VAR([RUSTC]) > + AC_CHECK_PROGS([RUSTC], [rustc], [none]) > + if test X"$RUSTC" = X"none"; then > + AC_MSG_ERROR([rustc is required to build with DDlog]) > + fi > + > + AC_SUBST([DDLOGLIBDIR]) > + AC_DEFINE([DDLOG], [1], [Build OVN daemons with ddlog.]) > + fi > + AC_MSG_RESULT([$DDLOGLIBDIR]) > + > + AM_CONDITIONAL([DDLOG], [test "$DDLOGLIBDIR" != no]) > +]) > + > dnl Checks for net/if_dl.h. 
> dnl > dnl (We use this as a proxy for checking whether we're building on FreeBSD > diff --git a/configure.ac b/configure.ac > index 0b17f05b9c77..40ab87f691b2 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -131,6 +131,7 @@ OVS_LIBTOOL_VERSIONS > OVS_CHECK_CXX > AX_FUNC_POSIX_MEMALIGN > OVN_CHECK_UNBOUND > +OVS_CHECK_DDLOG_FAST_BUILD > > OVS_CHECK_INCLUDE_NEXT([stdio.h string.h]) > AC_CONFIG_FILES([lib/libovn.sym]) > @@ -167,11 +168,15 @@ OVS_CONDITIONAL_CC_OPTION([-Wno-unused-parameter], [HAVE_WNO_UNUSED_PARAMETER]) > OVS_ENABLE_WERROR > OVS_ENABLE_SPARSE > > +OVS_CHECK_DDLOG > OVS_CHECK_PRAGMA_MESSAGE > OVN_CHECK_OVS > OVS_CTAGS_IDENTIFIERS > AC_SUBST([OVS_CFLAGS]) > AC_SUBST([OVS_LDFLAGS]) > +AC_SUBST([DDLOG_EXTRA_FLAGS]) > +AC_SUBST([DDLOG_EXTRA_RUSTFLAGS]) > +AC_SUBST([DDLOG_NORTHD_LIB_ONLY]) > > AC_SUBST([ovs_srcdir], ['${OVSDIR}']) > AC_SUBST([ovs_builddir], ['${OVSBUILDDIR}']) > diff --git a/m4/ovn.m4 b/m4/ovn.m4 > index dacfabb2a140..2909914fb87a 100644 > --- a/m4/ovn.m4 > +++ b/m4/ovn.m4 > @@ -576,3 +576,19 @@ AC_DEFUN([OVN_CHECK_UNBOUND], > fi > AM_CONDITIONAL([HAVE_UNBOUND], [test "$HAVE_UNBOUND" = yes]) > AC_SUBST([HAVE_UNBOUND])]) > + > +dnl Checks for --enable-ddlog-fast-build and updates DDLOG_EXTRA_RUSTFLAGS. 
> +AC_DEFUN([OVS_CHECK_DDLOG_FAST_BUILD], > + [AC_ARG_ENABLE( > + [ddlog_fast_build], > + [AC_HELP_STRING([--enable-ddlog-fast-build], > + [Build ddlog programs faster, but generate slower code])], > + [case "${enableval}" in > + (yes) ddlog_fast_build=true ;; > + (no) ddlog_fast_build=false ;; > + (*) AC_MSG_ERROR([bad value ${enableval} for --enable-ddlog-fast-build]) ;; > + esac], > + [ddlog_fast_build=false]) > + if $ddlog_fast_build; then > + DDLOG_EXTRA_RUSTFLAGS="-C opt-level=z" > + fi]) > diff --git a/northd/.gitignore b/northd/.gitignore > index 97a59801be9f..0f2b33ae7d01 100644 > --- a/northd/.gitignore > +++ b/northd/.gitignore > @@ -1,2 +1,6 @@ > /ovn-northd > +/ovn-northd-ddlog > /ovn-northd.8 > +/OVN_Northbound.dl > +/OVN_Southbound.dl > +/ovn_northd_ddlog/ > diff --git a/northd/automake.mk b/northd/automake.mk > index 69657e77e400..2717f59c5f3a 100644 > --- a/northd/automake.mk > +++ b/northd/automake.mk > @@ -8,3 +8,107 @@ northd_ovn_northd_LDADD = \ > man_MANS += northd/ovn-northd.8 > EXTRA_DIST += northd/ovn-northd.8.xml > CLEANFILES += northd/ovn-northd.8 > + > +EXTRA_DIST += \ > + northd/ovn-northd northd/ovn-northd.8.xml \ > + northd/ovn_northd.dl northd/ovn.dl northd/ovn.rs \ > + northd/ovn.toml northd/lswitch.dl northd/lrouter.dl \ > + northd/helpers.dl northd/ipam.dl northd/multicast.dl \ > + northd/ovn-nb.dlopts northd/ovn-sb.dlopts \ > + northd/ovsdb2ddlog2c > + > +if DDLOG > +bin_PROGRAMS += northd/ovn-northd-ddlog > +northd_ovn_northd_ddlog_SOURCES = \ > + northd/ovn-northd-ddlog.c \ > + northd/ovn-northd-ddlog-sb.inc \ > + northd/ovn-northd-ddlog-nb.inc \ > + northd/ovn_northd_ddlog/ddlog.h > +northd_ovn_northd_ddlog_LDADD = \ > + northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.la \ > + lib/libovn.la \ > + $(OVSDB_LIBDIR)/libovsdb.la \ > + $(OVS_LIBDIR)/libopenvswitch.la > + > +nb_opts = $$(cat $(srcdir)/northd/ovn-nb.dlopts) > +northd/OVN_Northbound.dl: ovn-nb.ovsschema northd/ovn-nb.dlopts > + $(AM_V_GEN)ovsdb2ddlog -f $< 
--output-file $@ $(nb_opts) > +northd/ovn-northd-ddlog-nb.inc: ovn-nb.ovsschema northd/ovn-nb.dlopts northd/ovsdb2ddlog2c > + $(AM_V_GEN)$(run_python) $(srcdir)/northd/ovsdb2ddlog2c -p nb_ -f $< --output-file $@ $(nb_opts) > + > +sb_opts = $$(cat $(srcdir)/northd/ovn-sb.dlopts) > +northd/OVN_Southbound.dl: ovn-sb.ovsschema northd/ovn-sb.dlopts > + $(AM_V_GEN)ovsdb2ddlog -f $< --output-file $@ $(sb_opts) > +northd/ovn-northd-ddlog-sb.inc: ovn-sb.ovsschema northd/ovn-sb.dlopts northd/ovsdb2ddlog2c > + $(AM_V_GEN)$(run_python) $(srcdir)/northd/ovsdb2ddlog2c -p sb_ -f $< --output-file $@ $(sb_opts) > + > +BUILT_SOURCES += \ > + northd/ovn-northd-ddlog-sb.inc \ > + northd/ovn-northd-ddlog-nb.inc > + > +northd/ovn_northd_ddlog/ddlog.h: northd/ddlog.stamp > + > +CARGO_VERBOSE = $(cargo_verbose_$(V)) > +cargo_verbose_ = $(cargo_verbose_$(AM_DEFAULT_VERBOSITY)) > +cargo_verbose_0 = > +cargo_verbose_1 = --verbose > + > +DDLOGFLAGS = -L $(DDLOGLIBDIR) -L $(builddir)/northd $(DDLOG_EXTRA_FLAGS) > + > +RUSTFLAGS = \ > + -L ../../lib/.libs \ > + -L $(OVS_LIBDIR)/.libs \ > + $$LIBOPENVSWITCH_DEPS \ > + $$LIBOVN_DEPS \ > + -Awarnings $(DDLOG_EXTRA_RUSTFLAGS) > + > +ddlog_sources = \ > + northd/ovn_northd.dl \ > + northd/lswitch.dl \ > + northd/lrouter.dl \ > + northd/ipam.dl \ > + northd/multicast.dl \ > + northd/ovn.dl \ > + northd/ovn.rs \ > + northd/helpers.dl \ > + northd/OVN_Northbound.dl \ > + northd/OVN_Southbound.dl > +northd/ddlog.stamp: $(ddlog_sources) > + $(AM_V_GEN)$(DDLOG) -i $< -o $(builddir)/northd $(DDLOGFLAGS) > + $(AM_V_at)touch $@ > + > +NORTHD_LIB = 1 > +NORTHD_CLI = 0 > + > +ddlog_targets = $(northd_lib_$(NORTHD_LIB)) $(northd_cli_$(NORTHD_CLI)) > +northd_lib_1 = northd/ovn_northd_ddlog/target/release/libovn_%_ddlog.la > +northd_cli_1 = northd/ovn_northd_ddlog/target/release/ovn_%_cli > +EXTRA_northd_ovn_northd_DEPENDENCIES = $(northd_cli_$(NORTHD_CLI)) > + > +cargo_build = $(cargo_build_$(NORTHD_LIB)$(NORTHD_CLI)) > +cargo_build_01 = --features command-line 
--bin ovn_northd_cli > +cargo_build_10 = --lib > +cargo_build_11 = --features command-line > + > +$(ddlog_targets): northd/ddlog.stamp lib/libovn.la $(OVS_LIBDIR)/libopenvswitch.la > + $(AM_V_GEN)LIBOVN_DEPS=`. lib/libovn.la && echo "$$dependency_libs"` && \ > + LIBOPENVSWITCH_DEPS=`. $(OVS_LIBDIR)/libopenvswitch.la && echo "$$dependency_libs"` && \ > + cd northd/ovn_northd_ddlog && \ > + RUSTC='$(RUSTC)' RUSTFLAGS="$(RUSTFLAGS)" \ > + cargo build --release $(CARGO_VERBOSE) $(cargo_build) --no-default-features --features ovsdb > +endif > + > +CLEAN_LOCAL += clean-ddlog > +clean-ddlog: > + rm -rf northd/ovn_northd_ddlog northd/ddlog.stamp > + > +CLEANFILES += \ > + northd/ddlog.stamp \ > + northd/ovn_northd_ddlog/ddlog.h \ > + northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.a \ > + northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.la \ > + northd/ovn_northd_ddlog/target/release/ovn_northd_cli \ > + northd/OVN_Northbound.dl \ > + northd/OVN_Southbound.dl \ > + northd/ovn-northd-ddlog-nb.inc \ > + northd/ovn-northd-ddlog-sb.inc > diff --git a/northd/helpers.dl b/northd/helpers.dl > new file mode 100644 > index 000000000000..d8d818c0ffb9 > --- /dev/null > +++ b/northd/helpers.dl > @@ -0,0 +1,128 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. 
> + */ > + > +import OVN_Northbound as nb > +import OVN_Southbound as sb > +import ovsdb > +import ovn > + > +/* ACLRef: reference to nb::ACL */ > +relation &ACLRef[nb::ACL] > +&ACLRef[acl] :- nb::ACL[acl]. > + > +/* DHCP_Options: reference to nb::DHCP_Options */ > +relation &DHCP_OptionsRef[nb::DHCP_Options] > +&DHCP_OptionsRef[options] :- nb::DHCP_Options[options]. > + > +/* QoS: reference to nb::QoS */ > +relation &QoSRef[nb::QoS] > +&QoSRef[qos] :- nb::QoS[qos]. > + > +/* LoadBalancerRef: reference to nb::Load_Balancer */ > +relation &LoadBalancerRef[nb::Load_Balancer] > +&LoadBalancerRef[lb] :- nb::Load_Balancer[lb]. > + > +/* LoadBalancerHealthCheckRef: reference to nb::Load_Balancer_Health_Check */ > +relation &LoadBalancerHealthCheckRef[nb::Load_Balancer_Health_Check] > +&LoadBalancerHealthCheckRef[lbhc] :- nb::Load_Balancer_Health_Check[lbhc]. > + > +/* NATRef: reference to nb::NAT*/ > +relation &NATRef[nb::NAT] > +&NATRef[nat] :- nb::NAT[nat]. > + > +/* AddressSetRef: reference to nb::Address_Set */ > +relation &AddressSetRef[nb::Address_Set] > +&AddressSetRef[__as] :- nb::Address_Set[__as]. > + > +/* ServiceMonitor: reference to sb::Service_Monitor */ > +relation &ServiceMonitorRef[sb::Service_Monitor] > +&ServiceMonitorRef[sm] :- sb::Service_Monitor[sm]. > + > +/* Switch-to-router logical port connections */ > +relation SwitchRouterPeer(lsp: uuid, lsp_name: string, lrp: uuid) > +SwitchRouterPeer(lsp, lsp_name, lrp) :- > + nb::Logical_Switch_Port(._uuid = lsp, .name = lsp_name, .__type = "router", .options = options), > + Some{var router_port} = map_get(options, "router-port"), > + nb::Logical_Router_Port(.name = router_port, ._uuid = lrp). 
> + > +function map_get_bool_def(m: Map<string, string>, > + k: string, def: bool): bool = { > + match (map_get(m, k)) { > + None -> def, > + Some{x} -> { > + if (def) { > + str_to_lower(x) != "false" > + } else { > + str_to_lower(x) == "true" > + } > + } > + } > +} > + > +function map_get_uint_def(m: Map<string, string>, k: string, > + def: integer): integer = { > + match (map_get(m, k)) { > + None -> def, > + Some{x} -> { > + match (str_to_uint(x, 10)) { > + Some{v} -> v, > + None -> def > + } > + } > + } > +} > + > +function map_get_int_def(m: Map<string, string>, k: string, > + def: integer): integer = { > + match (map_get(m, k)) { > + None -> def, > + Some{x} -> { > + match (str_to_int(x, 10)) { > + Some{v} -> v, > + None -> def > + } > + } > + } > +} > + > +function map_get_int_def_limit(m: Map<string, string>, k: string, def: integer, > + min: integer, max: integer): integer = { > + var v = map_get_int_def(m, k, def); > + var v1 = { > + if (v < min) min else v > + }; > + if (v1 > max) max else v1 > +} > + > +function map_get_str_def(m: Map<string, string>, k: string, > + def: string): string = { > + match (map_get(m, k)) { > + None -> def, > + Some{x} -> x > + } > +} > + > +function vec_nth_def(vector: Vec<'A>, index: bit<64>, def: 'A): 'A { > + match (vec_nth(vector, index)) { > + Some{value} -> value, > + None -> def > + } > +} > + > +function ha_chassis_group_uuid(uuid: uuid): uuid { hash128("hacg" ++ uuid) } > +function ha_chassis_uuid(chassis_name: string, nb_chassis_uuid: uuid): uuid { hash128("hac" ++ chassis_name ++ nb_chassis_uuid) } > + > +/* Dummy relation with one empty row, useful for putting into antijoins. */ > +relation Unit() > +Unit(). > diff --git a/northd/ipam.dl b/northd/ipam.dl > new file mode 100644 > index 000000000000..cc0f7989a7dd > --- /dev/null > +++ b/northd/ipam.dl > @@ -0,0 +1,506 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. 
> + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +/* > + * IPAM (IP address management) and MACAM (MAC address management) > + * > + * IPAM generally stands for IP address management. In the non-virtualized > + * world, MAC addresses come with the hardware. But with virtualized > + * workloads, they need to be assigned and managed. This module > + * does both IP address management (ipam) and MAC address management > + * (macam). > + */ > + > +import OVN_Northbound as nb > +import ovsdb > +import allocate > +import helpers > +import ovn > +import ovn_northd > +import lswitch > +import lrouter > + > +function mAC_ADDR_SPACE(): bit<64> = 64'hffffff > + > +/* > + * IPv4 dynamic address allocation. > + */ > + > +/* > + * The fixed portions of a request for a dynamic LSP address.
> + */ > +typedef dynamic_address_request = DynamicAddressRequest{ > + mac: Option<eth_addr>, > + ip4: Option<in_addr>, > + ip6: Option<in6_addr> > +} > +function parse_dynamic_address_request(s: string): Option<dynamic_address_request> { > + var tokens = string_split(s, " "); > + var n = vec_len(tokens); > + if (n < 1 or n > 3) { > + return None > + }; > + > + var t0 = vec_nth_def(tokens, 0, ""); > + var t1 = vec_nth_def(tokens, 1, ""); > + var t2 = vec_nth_def(tokens, 2, ""); > + if (t0 == "dynamic") { > + if (n == 1) { > + Some{DynamicAddressRequest{None, None, None}} > + } else if (n == 2) { > + match (ip46_parse(t1)) { > + Some{IPv4{ipv4}} -> Some{DynamicAddressRequest{None, Some{ipv4}, None}}, > + Some{IPv6{ipv6}} -> Some{DynamicAddressRequest{None, None, Some{ipv6}}}, > + _ -> None > + } > + } else if (n == 3) { > + match ((ip_parse(t1), ipv6_parse(t2))) { > + (Some{ipv4}, Some{ipv6}) -> Some{DynamicAddressRequest{None, Some{ipv4}, Some{ipv6}}}, > + _ -> None > + } > + } else { > + None > + } > + } else if (n == 2 and t1 == "dynamic") { > + match (eth_addr_from_string(t0)) { > + Some{mac} -> Some{DynamicAddressRequest{Some{mac}, None, None}}, > + _ -> None > + } > + } else { > + None > + } > +} > + > +/* SwitchIPv4ReservedAddress - keeps track of statically reserved IPv4 addresses > + * for each switch whose subnet option is set, including: > + * (1) first and last (multicast) address in the subnet range > + * (2) addresses from `other_config.exclude_ips` > + * (3) port addresses in lsp.addresses, except "unknown" addresses, addresses of > + * "router" ports, dynamic addresses > + * (4) addresses associated with router ports peered with the switch. > + * (5) static IP component of "dynamic" `lsp.addresses`. > + * > + * Addresses are kept in host-endian format (i.e., bit<32> vs in_addr). > + */ > +relation SwitchIPv4ReservedAddress(lswitch: uuid, addr: bit<32>) > + > +/* Add reserved address groups (1) and (2). 
*/ > +SwitchIPv4ReservedAddress(.lswitch = ls._uuid, > + .addr = addr) :- > + &Switch(.ls = ls, > + .subnet = Some{(_, _, start_ipv4, total_ipv4s)}), > + var exclude_ips = { > + var exclude_ips = set_singleton(start_ipv4); > + set_insert(exclude_ips, start_ipv4 + total_ipv4s - 1); > + match (map_get(ls.other_config, "exclude_ips")) { > + None -> exclude_ips, > + Some{exclude_ip_list} -> match (parse_ip_list(exclude_ip_list)) { > + Left{err} -> { > + warn("logical switch ${uuid2str(ls._uuid)}: bad exclude_ips (${err})"); > + exclude_ips > + }, > + Right{ranges} -> { > + for (range in ranges) { > + (var ip_start, var ip_end) = range; > + var start = iptohl(ip_start); > + var end = match (ip_end) { > + None -> start, > + Some{ip} -> iptohl(ip) > + }; > + start = max(start_ipv4, start); > + end = min(start_ipv4 + total_ipv4s - 1, end); > + if (end >= start) { > + for (addr in range_vec(start, end+1, 1)) { > + set_insert(exclude_ips, addr) > + } > + } else { > + warn("logical switch ${uuid2str(ls._uuid)}: excluded addresses not in subnet") > + } > + }; > + exclude_ips > + } > + } > + } > + }, > + var addr = FlatMap(exclude_ips). > + > +/* Add reserved address group (3). */ > +SwitchIPv4ReservedAddress(.lswitch = ls._uuid, > + .addr = addr) :- > + SwitchPortStaticAddresses( > + .port = &SwitchPort{ > + .sw = &Switch{.ls = ls, > + .subnet = Some{(_, _, start_ipv4, total_ipv4s)}}, > + .peer = None}, > + .addrs = lport_addrs > + ), > + var addrs = { > + var addrs = set_empty(); > + for (addr in lport_addrs.ipv4_addrs) { > + var addr_host_endian = iptohl(addr.addr); > + if (addr_host_endian >= start_ipv4 and addr_host_endian < start_ipv4 + total_ipv4s) { > + set_insert(addrs, addr_host_endian) > + } else () > + }; > + addrs > + }, > + var addr = FlatMap(addrs). 
> + > +/* Add reserved address group (4) */ > +SwitchIPv4ReservedAddress(.lswitch = ls._uuid, > + .addr = addr) :- > + &SwitchPort( > + .sw = &Switch{.ls = ls, > + .subnet = Some{(_, _, start_ipv4, total_ipv4s)}}, > + .peer = Some{&rport}), > + var addrs = { > + var addrs = set_empty(); > + for (addr in rport.networks.ipv4_addrs) { > + var addr_host_endian = iptohl(addr.addr); > + if (addr_host_endian >= start_ipv4 and addr_host_endian < start_ipv4 + total_ipv4s) { > + set_insert(addrs, addr_host_endian) > + } else () > + }; > + addrs > + }, > + var addr = FlatMap(addrs). > + > +/* Add reserved address group (5) */ > +SwitchIPv4ReservedAddress(.lswitch = sw.ls._uuid, > + .addr = iptohl(ip_addr)) :- > + &SwitchPort(.sw = &sw, .lsp = lsp, .static_dynamic_ipv4 = Some{ip_addr}). > + > +/* Aggregate all reserved addresses for each switch. */ > +relation SwitchIPv4ReservedAddresses(lswitch: uuid, addrs: Set<bit<32>>) > + > +SwitchIPv4ReservedAddresses(lswitch, addrs) :- > + SwitchIPv4ReservedAddress(lswitch, addr), > + var addrs = addr.group_by(lswitch).to_set(). > + > +SwitchIPv4ReservedAddresses(lswitch_uuid, set_empty()) :- > + nb::Logical_Switch(._uuid = lswitch_uuid), > + not SwitchIPv4ReservedAddress(lswitch_uuid, _). > + > +/* Allocate dynamic IP addresses for ports that require them: > + */ > +relation SwitchPortAllocatedIPv4DynAddress(lsport: uuid, dyn_addr: Option<in_addr>) > + > +SwitchPortAllocatedIPv4DynAddress(lsport, dyn_addr) :- > + /* Aggregate all ports of a switch that need a dynamic IP address */ > + port in &SwitchPort(.needs_dynamic_ipv4address = true, > + .sw = &sw), > + var switch_id = sw.ls._uuid, > + var ports = port.group_by(switch_id).to_vec(), > + SwitchIPv4ReservedAddresses(switch_id, reserved_addrs), > + /* Allocate dynamic addresses only for ports that don't have a dynamic address > + * or have one that is no longer valid. 
*/ > + var dyn_addresses = { > + var used_addrs = reserved_addrs; > + var assigned_addrs = vec_empty(); > + var need_addr = vec_empty(); > + (var start_ipv4, var total_ipv4s) = match (vec_nth(ports, 0)) { > + None -> { (0, 0) } /* no ports with dynamic addresses */, > + Some{port0} -> { > + match (port0.sw.subnet) { > + None -> { > + abort("needs_dynamic_ipv4address is true, but subnet is undefined in port ${uuid2str(deref(port0).lsp._uuid)}"); > + (0, 0) > + }, > + Some{(_, _, start_ipv4, total_ipv4s)} -> (start_ipv4, total_ipv4s) > + } > + } > + }; > + for (port in ports) { > + //warn("port(${deref(port).lsp._uuid})"); > + match (deref(port).dynamic_address) { > + None -> { > + /* no dynamic address yet -- allocate one now */ > + //warn("need_addr(${deref(port).lsp._uuid})"); > + vec_push(need_addr, deref(port).lsp._uuid) > + }, > + Some{dynaddr} -> { > + match (vec_nth(dynaddr.ipv4_addrs, 0)) { > + None -> { > + /* dynamic address does not have IPv4 component -- allocate one now */ > + //warn("need_addr(${deref(port).lsp._uuid})"); > + vec_push(need_addr, deref(port).lsp._uuid) > + }, > + Some{addr} -> { > + var haddr = iptohl(addr.addr); > + if (haddr < start_ipv4 or haddr >= start_ipv4 + total_ipv4s) { > + vec_push(need_addr, deref(port).lsp._uuid) > + } else if (set_contains(used_addrs, haddr)) { > + vec_push(need_addr, deref(port).lsp._uuid); > + warn("Duplicate IP set on switch ${deref(port).lsp.name}: ${addr.addr}") > + } else { > + /* has valid dynamic address -- record it in used_addrs */ > + set_insert(used_addrs, haddr); > + assigned_addrs.push((port.lsp._uuid, Some{haddr})) > + } > + } > + } > + } > + } > + }; > + assigned_addrs.append(allocate_opt(used_addrs, need_addr, start_ipv4, start_ipv4 + total_ipv4s - 1)); > + assigned_addrs > + }, > + var port_address = FlatMap(dyn_addresses), > + (var lsport, var dyn_addr_bits) = port_address, > + var dyn_addr = dyn_addr_bits.map(hltoip). 
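To double-check my reading of the allocation step above: each port left in need_addr gets a free host-endian address from the subnet range, or nothing if the range is exhausted. A rough Python model (illustrative only; the real allocate_opt() lives in the `allocate` library and I have not verified it scans in exactly lowest-first order):

```python
def allocate_ipv4(used, need, lo, hi):
    """Hand each port in 'need' the lowest free host-endian address in
    [lo, hi], mutating 'used' as we go; a port gets None when the range
    is exhausted.  Sketch of the allocate_opt() call, not the real code."""
    out = []
    for port in need:
        addr = next((a for a in range(lo, hi + 1) if a not in used), None)
        if addr is not None:
            used.add(addr)
        out.append((port, addr))
    return out
```

This also makes the preceding loop's job clearer: it only pre-seeds `used_addrs` with reserved addresses and still-valid existing assignments, so the allocator never reissues them.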
> + > +/* Compute new dynamic IPv4 address assignment: > + * - port does not need dynamic IP - use static_dynamic_ip if any > + * - a new address has been allocated for port - use this address > + * - otherwise, use existing dynamic IP > + */ > +relation SwitchPortNewIPv4DynAddress(lsport: uuid, dyn_addr: Option<in_addr>) > + > +SwitchPortNewIPv4DynAddress(lsp._uuid, ip_addr) :- > + &SwitchPort(.sw = &sw, > + .needs_dynamic_ipv4address = false, > + .static_dynamic_ipv4 = static_dynamic_ipv4, > + .lsp = lsp), > + var ip_addr = { > + match (static_dynamic_ipv4) { > + None -> { None }, > + Some{addr} -> { > + match (sw.subnet) { > + None -> { None }, > + Some{(_, _, start_ipv4, total_ipv4s)} -> { > + var haddr = iptohl(addr); > + if (haddr < start_ipv4 or haddr >= start_ipv4 + total_ipv4s) { > + /* new static ip is not valid */ > + None > + } else { > + Some{addr} > + } > + } > + } > + } > + } > + }. > + > +SwitchPortNewIPv4DynAddress(lsport, addr) :- > + SwitchPortAllocatedIPv4DynAddress(lsport, addr). > + > +/* > + * Dynamic MAC address allocation. > + */ > + > +function get_mac_prefix(options: Map<string,string>, uuid: uuid) : bit<64> = > +{ > + var existing_prefix = match (map_get(options, "mac_prefix")) { > + Some{prefix} -> scan_eth_addr_prefix(prefix), > + None -> None > + }; > + match (existing_prefix) { > + Some{prefix} -> prefix, > + None -> pseudorandom_mac(uuid, 16'h1234) & 64'hffffff000000 > + } > +} > +function put_mac_prefix(options: Map<string,string>, mac_prefix: bit<64>) > + : Map<string,string> = > +{ > + map_insert_imm(options, "mac_prefix", > + string_substr(to_string(eth_addr_from_uint64(mac_prefix)), 0, 8)) > +} > +relation MacPrefix(mac_prefix: bit<64>) > +MacPrefix(get_mac_prefix(options, uuid)) :- > + nb::NB_Global(._uuid = uuid, .options = options). > + > +/* ReservedMACAddress - keeps track of statically reserved MAC addresses. > + * (1) static addresses in `lsp.addresses` > + * (2) static MAC component of "dynamic" `lsp.addresses`. 
> + * (3) addresses associated with router ports peered with the switch. > + * > + * Addresses are kept in 64-bit host-endian format. > + */ > +relation ReservedMACAddress(addr: bit<64>) > + > +/* Add reserved address group (1). */ > +ReservedMACAddress(.addr = eth_addr_to_uint64(lport_addrs.ea)) :- > + SwitchPortStaticAddresses(.addrs = lport_addrs). > + > +/* Add reserved address group (2). */ > +ReservedMACAddress(.addr = eth_addr_to_uint64(mac_addr)) :- > + &SwitchPort(.lsp = lsp, .static_dynamic_mac = Some{mac_addr}). > + > +/* Add reserved address group (3). */ > +ReservedMACAddress(.addr = eth_addr_to_uint64(rport.networks.ea)) :- > + &SwitchPort(.peer = Some{&rport}). > + > +/* Aggregate all reserved MAC addresses. */ > +relation ReservedMACAddresses(addrs: Set<bit<64>>) > + > +ReservedMACAddresses(addrs) :- > + ReservedMACAddress(addr), > + var addrs = addr.group_by(()).to_set(). > + > +/* Handle case when `ReservedMACAddress` is empty */ > +ReservedMACAddresses(set_empty()) :- > + // NB_Global should have exactly one record, so we can > + // use it as a base for antijoin. > + nb::NB_Global(), > + not ReservedMACAddress(_). > + > +/* Allocate dynamic MAC addresses for ports that require them: > + * Case 1: port doesn't need dynamic MAC (i.e., does not have dynamic address or > + * has a dynamic address with a static MAC). 
> + * Case 2: needs a dynamic MAC and already has a valid one (an existing > + * dynamic MAC with the right prefix): keep it. > + * Case 3: needs a dynamic MAC and does not have a valid one (no existing > + * dynamic MAC, or one with the wrong prefix): allocate a new one. > + */ > +relation SwitchPortAllocatedMACDynAddress(lsport: uuid, dyn_addr: bit<64>) > + > +SwitchPortAllocatedMACDynAddress(lsport, dyn_addr), > +SwitchPortDuplicateMACAddress(dup_addrs) :- > + /* Group all ports that need a dynamic MAC address */ > + port in &SwitchPort(.needs_dynamic_macaddress = true, .lsp = lsp), > + SwitchPortNewIPv4DynAddress(lsp._uuid, ipv4_addr), > + var ports = (port, ipv4_addr).group_by(()).to_vec(), > + ReservedMACAddresses(reserved_addrs), > + MacPrefix(mac_prefix), > + (var dyn_addresses, var dup_addrs) = { > + var used_addrs = reserved_addrs; > + var need_addr = vec_empty(); > + var dup_addrs = set_empty(); > + for (port_with_addr in ports) { > + (var port, var ipv4_addr) = port_with_addr; > + var hint = match (ipv4_addr) { > + None -> Some { mac_prefix | 1 }, > + Some{addr} -> { > + /* The tentative MAC's suffix will be in the interval (1, 0xfffffe).
*/ > + var mac_suffix: bit<24> = iptohl(addr)[23:0] % ((mAC_ADDR_SPACE() - 1)[23:0]) + 1; > + Some{ mac_prefix | (40'd0 ++ mac_suffix) } > + } > + }; > + match (port.dynamic_address) { > + None -> { > + /* no dynamic address yet -- allocate one now */ > + vec_push(need_addr, (port.lsp._uuid, hint)) > + }, > + Some{dynaddr} -> { > + var haddr = eth_addr_to_uint64(dynaddr.ea); > + if ((haddr ^ mac_prefix) >> 24 != 0) { > + /* existing dynamic address is no longer valid */ > + vec_push(need_addr, (port.lsp._uuid, hint)) > + } else if (set_contains(used_addrs, haddr)) { > + set_insert(dup_addrs, dynaddr.ea); > + } else { > + /* has valid dynamic address -- record it in used_addrs */ > + set_insert(used_addrs, haddr) > + } > + } > + } > + }; > + // FIXME: if a port has a dynamic address that is no longer valid, and > + // we are unable to allocate a new address, the current behavior is to > + // keep the old invalid address. It should probably be changed to > + // removing the old address. > + // FIXME: OVN allocates MAC addresses by seeding them with IPv4 address. > + // Implement a custom allocation function that simulates this behavior. > + var res = allocate_with_hint(used_addrs, need_addr, mac_prefix + 1, mac_prefix + mAC_ADDR_SPACE() - 1); > + var res_strs = vec_empty(); > + for (x in res) { > + (var uuid, var addr) = x; > + vec_push(res_strs, "${uuid2str(uuid)}: ${eth_addr_from_uint64(addr)}") > + }; > + (res, dup_addrs) > + }, > + var port_address = FlatMap(dyn_addresses), > + (var lsport, var dyn_addr) = port_address. > + > +relation SwitchPortDuplicateMACAddress(dup_addrs: Set<eth_addr>) > +Warning["Duplicate MAC set: ${ea}"] :- > + SwitchPortDuplicateMACAddress(dup_addrs), > + var ea = FlatMap(dup_addrs). 
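The hint computation above took me a minute to parse, so for the record: the tentative MAC suffix is seeded from the low 24 bits of the port's IPv4 address and kept in [1, 0xfffffe], so the all-zeros and broadcast-ish suffixes are never produced. In Python terms (my own sketch, not OVN code):

```python
MAC_ADDR_SPACE = 0xFFFFFF  # mirrors mAC_ADDR_SPACE() in ipam.dl

def mac_hint(mac_prefix, ipv4_host_endian=None):
    """Tentative dynamic MAC: OR the 24-bit suffix into the OUI prefix.
    Without an IPv4 seed the suffix is 1; with one, it is the low 24 bits
    of the address mapped into [1, 0xfffffe]."""
    if ipv4_host_endian is None:
        return mac_prefix | 1
    suffix = (ipv4_host_endian & 0xFFFFFF) % (MAC_ADDR_SPACE - 1) + 1
    return mac_prefix | suffix
```

The `% (MAC_ADDR_SPACE - 1) + 1` is the interesting bit: it wraps 0xfffffe and 0xffffff back to small suffixes instead of producing 0 or 0xffffff.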
> + > +/* Compute new dynamic MAC address assignment: > + * - port does not need dynamic MAC - use `static_dynamic_mac` > + * - a new address has been allocated for port - use this address > + * - otherwise, use existing dynamic MAC > + */ > +relation SwitchPortNewMACDynAddress(lsport: uuid, dyn_addr: Option<eth_addr>) > + > +SwitchPortNewMACDynAddress(lsp._uuid, mac_addr) :- > + &SwitchPort(.needs_dynamic_macaddress = false, > + .lsp = lsp, > + .sw = &sw, > + .static_dynamic_mac = static_dynamic_mac), > + var mac_addr = match (static_dynamic_mac) { > + None -> None, > + Some{addr} -> { > + if (is_some(sw.subnet) or is_some(sw.ipv6_prefix) or > + map_get(sw.ls.other_config, "mac_only") == Some{"true"}) { > + Some{addr} > + } else { > + None > + } > + } > + }. > + > +SwitchPortNewMACDynAddress(lsport, Some{eth_addr_from_uint64(addr)}) :- > + SwitchPortAllocatedMACDynAddress(lsport, addr). > + > +SwitchPortNewMACDynAddress(lsp._uuid, addr) :- > + &SwitchPort(.needs_dynamic_macaddress = true, .lsp = lsp, .dynamic_address = cur_address), > + not SwitchPortAllocatedMACDynAddress(lsp._uuid, _), > + var addr = match (cur_address) { > + None -> None, > + Some{dynaddr} -> Some{dynaddr.ea} > + }. > + > +/* > + * Dynamic IPv6 address allocation. > + * `needs_dynamic_ipv6address` -> in6_generate_eui64(mac, ipv6_prefix) > + */ > +relation SwitchPortNewDynamicAddress(port: Ref<SwitchPort>, address: Option<lport_addresses>) > + > +SwitchPortNewDynamicAddress(port, None) :- > + port in &SwitchPort(.lsp = lsp), > + SwitchPortNewMACDynAddress(lsp._uuid, None). 
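Since the IPv6 path below leans on in6_generate_eui64(), here is a quick reviewer's sketch of the standard RFC 4291 modified EUI-64 construction I assume it implements (insert ff:fe in the middle of the MAC, flip the universal/local bit). My Python, not OVN code:

```python
def eui64_interface_id(mac):
    """Derive the 64-bit interface identifier from a 48-bit MAC string
    like "00:11:22:33:44:55": split the MAC in half, insert ff:fe, and
    flip bit 0x02 of the first byte (universal/local)."""
    b = bytes(int(x, 16) for x in mac.split(":"))
    assert len(b) == 6, "expected a 48-bit MAC"
    return bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
```

The full address is then just this identifier appended to the switch's configured ipv6_prefix, which matches how the rule below formats `in6_generate_eui64(mac_addr, prefix)` into the address string.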
> + > +SwitchPortNewDynamicAddress(port, lport_address) :- > + port in &SwitchPort(.lsp = lsp, > + .sw = &sw, > + .needs_dynamic_ipv6address = needs_dynamic_ipv6address, > + .static_dynamic_ipv6 = static_dynamic_ipv6), > + SwitchPortNewMACDynAddress(lsp._uuid, Some{mac_addr}), > + SwitchPortNewIPv4DynAddress(lsp._uuid, opt_ip4_addr), > + var ip6_addr = match ((static_dynamic_ipv6, needs_dynamic_ipv6address, sw.ipv6_prefix)) { > + (Some{ipv6}, _, _) -> " ${ipv6}", > + (_, true, Some{prefix}) -> " ${in6_generate_eui64(mac_addr, prefix)}", > + _ -> "" > + }, > + var ip4_addr = match (opt_ip4_addr) { > + None -> "", > + Some{ip4} -> " ${ip4}" > + }, > + var addr_string = "${mac_addr}${ip6_addr}${ip4_addr}", > + var lport_address = extract_addresses(addr_string). > + > + > +///* If there's more than one dynamic address in port->addresses, log a warning > +// and only allocate the first dynamic address */ > +// > +// VLOG_WARN_RL(&rl, "More than one dynamic address " > +// "configured for logical switch port '%s'", > +// nbsp->name); > +// > +////>> * MAC address suffixes in OUIs managed by OVN's MACAM (MAC Address > +////>> Management) system, in the range 1...0xfffffe. > +////>> * IPv4 addresses in ranges managed by OVN's IPAM (IP Address Management) > +////>> system. The range varies depending on the size of the subnet. > +////>> > +////>> Are these `dynamic_addresses` in `OVN_Northbound.Logical_Switch_Port`? > diff --git a/northd/lrouter.dl b/northd/lrouter.dl > new file mode 100644 > index 000000000000..5ef54fb761e3 > --- /dev/null > +++ b/northd/lrouter.dl > @@ -0,0 +1,715 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. 
> + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +import OVN_Northbound as nb > +import OVN_Southbound as sb > +import multicast > +import ovsdb > +import ovn > +import helpers > +import lswitch > +import ovn_northd > + > +function is_enabled(lr: nb::Logical_Router): bool { is_enabled(lr.enabled) } > +function is_enabled(lrp: nb::Logical_Router_Port): bool { is_enabled(lrp.enabled) } > +function is_enabled(rp: RouterPort): bool { rp.lrp.is_enabled() } > +function is_enabled(rp: Ref<RouterPort>): bool { rp.lrp.is_enabled() } > + > +/* default logical flow priority for distributed routes */ > +function dROUTE_PRIO(): bit<32> = 400 > + > +/* LogicalRouterPortCandidate. > + * > + * Each row pairs a logical router port with its logical router, but without > + * checking that the logical router port is on only one logical router. > + * > + * (Use LogicalRouterPort instead, which guarantees uniqueness.) */ > +relation LogicalRouterPortCandidate(lrp_uuid: uuid, lr_uuid: uuid) > +LogicalRouterPortCandidate(lrp_uuid, lr_uuid) :- > + nb::Logical_Router(._uuid = lr_uuid, .ports = ports), > + var lrp_uuid = FlatMap(ports). > +Warning[message] :- > + LogicalRouterPortCandidate(lrp_uuid, lr_uuid), > + var lrs = lr_uuid.group_by(lrp_uuid).to_set(), > + set_size(lrs) > 1, > + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), > + var message = "Bad configuration: logical router port ${lrp.name} belongs " > + "to more than one logical router". > + > +/* Each row means 'lport' is in 'lrouter' (and only that lrouter). 
*/ > +relation LogicalRouterPort(lport: uuid, lrouter: uuid) > +LogicalRouterPort(lrp_uuid, lr_uuid) :- > + LogicalRouterPortCandidate(lrp_uuid, lr_uuid), > + var lrs = lr_uuid.group_by(lrp_uuid).to_set(), > + set_size(lrs) == 1, > + Some{var lr_uuid} = set_nth(lrs, 0). > + > +/* > + * Peer routers. > + * > + * Each row in the relation indicates that routers 'a' and 'b' can reach > + * each other directly through router ports. > + * > + * This relation is symmetric: if (a,b) then (b,a). > + * This relation is antireflexive: if (a,b) then a != b. > + * > + * Routers aren't peers if they can reach each other only through logical > + * switch ports (that's the ReachableLogicalRouter table). > + */ > +relation PeerLogicalRouter(a: uuid, b: uuid) > +PeerLogicalRouter(lrp_uuid, peer._uuid) :- > + LogicalRouterPort(lrp_uuid, _), > + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), > + Some{var peer_name} = lrp.peer, > + peer in nb::Logical_Router_Port(.name = peer_name), > + peer.peer == Some{lrp.name}, // 'peer' must point back to 'lrp' > + lrp_uuid != peer._uuid. // No reflexive pointers. > + > +/* > + * First-hop routers. > + * > + * Each row indicates that 'lrouter' is a first-hop logical router for > + * 'lswitch', that is, that a "cable" directly connects 'lrouter' and > + * 'lswitch'. > + * > + * A switch can have multiple first-hop routers. */ > +relation FirstHopLogicalRouter(lrouter: uuid, lswitch: uuid) > +FirstHopLogicalRouter(lrouter, lswitch) :- > + LogicalRouterPort(lrp_uuid, lrouter), > + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), > + LogicalSwitchPort(lsp_uuid, lswitch), > + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), > + lsp.__type == "router", > + map_get(lsp.options, "router-port") == Some{lrp.name}, > + is_none(lrp.peer). > + > +/* > + * Reachable routers. > + * > + * Each row in the relation indicates that routers 'a' and 'b' can reach each > + * other directly or indirectly through any chain of logical routers and > + * switches. 
> + * > + * This relation is symmetric: if (a,b) then (b,a). > + * This relation is reflexive: (a,a) is always true. > + */ > +relation ReachableLogicalRouter(a: uuid, b: uuid) > +ReachableLogicalRouter(a, b) :- > + PeerLogicalRouter(a, c), > + ReachableLogicalRouter(c, b). > +ReachableLogicalRouter(a, b) :- > + FirstHopLogicalRouter(a, ls), > + FirstHopLogicalRouter(b, ls). > +ReachableLogicalRouter(a, b) :- > + ReachableLogicalRouter(a, c), > + ReachableLogicalRouter(c, b). > +ReachableLogicalRouter(a, a) :- ReachableLogicalRouter(a, _). > + > +// ha_chassis_group and gateway_chassis may not both be present. > +Warning[message] :- > + lrp in nb::Logical_Router_Port(), > + is_some(lrp.ha_chassis_group), > + not set_is_empty(lrp.gateway_chassis), > + var message = "Both ha_chassis_group and gateway_chassis configured on " > + "port ${lrp.name}; ignoring the latter". > + > +// A distributed gateway port cannot also be an L3 gateway router. > +Warning[message] :- > + lrp in nb::Logical_Router_Port(), > + is_some(lrp.ha_chassis_group) > + or not set_is_empty(lrp.gateway_chassis), > + map_contains_key(lrp.options, "chassis"), > + var message = "Bad configuration: distributed gateway port configured on " > + "port ${lrp.name} on L3 gateway router". > + > +/* DistributedGatewayPortCandidate. > + * > + * Each row pairs a logical router with its distributed gateway port, > + * but without checking that there is at most one DGP per LR. > + * > + * (Use DistributedGatewayPort instead, since it guarantees uniqueness.) */ > +relation DistributedGatewayPortCandidate(lr_uuid: uuid, lrp_uuid: uuid) > +DistributedGatewayPortCandidate(lr_uuid, lrp_uuid) :- > + lr in nb::Logical_Router(._uuid = lr_uuid), > + LogicalRouterPort(lrp_uuid, lr._uuid), > + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), > + not map_contains_key(lrp.options, "chassis"), > + var has_hcg = is_some(lrp.ha_chassis_group), > + var has_gc = not set_is_empty(lrp.gateway_chassis), > + has_hcg or has_gc. 
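The closure that the `ReachableLogicalRouter` rules compute can be modeled as a naive fixed-point iteration. The sketch below is an illustrative Python analogue (function and argument names are mine, not from the patch); DDlog's incremental engine evaluates the same logic far more efficiently:

```python
def reachable_routers(peers, first_hop):
    """peers: set of (a, b) router pairs (assumed symmetric, like
    PeerLogicalRouter); first_hop: set of (router, switch) pairs, like
    FirstHopLogicalRouter. Returns the reachable (a, b) pairs."""
    reach = set(peers)
    # Routers attached to the same switch reach each other.
    for (a, s1) in first_hop:
        for (b, s2) in first_hop:
            if s1 == s2:
                reach.add((a, b))
    # Transitive closure to a fixed point, as Datalog evaluation would do.
    changed = True
    while changed:
        changed = False
        for (a, c) in list(reach):
            for (c2, b) in list(reach):
                if c == c2 and (a, b) not in reach:
                    reach.add((a, b))
                    changed = True
    # Reflexivity: any router that appears is reachable from itself.
    for (a, _) in list(reach):
        reach.add((a, a))
    return reach
```

Because the first-hop base case already yields symmetric pairs, symmetry is preserved through the transitive step without a dedicated rule.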
> +Warning[message] :- > + DistributedGatewayPortCandidate(lr_uuid, lrp_uuid), > + var lrps = lrp_uuid.group_by(lr_uuid).to_set(), > + set_size(lrps) > 1, > + lr in nb::Logical_Router(._uuid = lr_uuid), > + var message = "Bad configuration: multiple distributed gateway ports on " > + "logical router ${lr.name}; ignoring all of them". > + > +/* Distributed gateway ports. > + * > + * Each row means 'lrp' is the distributed gateway port on 'lr_uuid'. > + * > + * There is at most one distributed gateway port per logical router. */ > +relation DistributedGatewayPort(lrp: nb::Logical_Router_Port, lr_uuid: uuid) > +DistributedGatewayPort(lrp, lr_uuid) :- > + DistributedGatewayPortCandidate(lr_uuid, lrp_uuid), > + var lrps = lrp_uuid.group_by(lr_uuid).to_set(), > + set_size(lrps) == 1, > + Some{var lrp_uuid} = set_nth(lrps, 0), > + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid). > + > +/* HAChassis is an abstraction over nb::Gateway_Chassis and nb::HA_Chassis, which > + * are different ways to represent the same configuration. Each row is > + * effectively one HA_Chassis record. (Usually, we could associate each > + * row with a particular 'lr_uuid', but it's permissible for more than one > + * logical router to use an HA chassis group, so we omit it so that multiple > + * references get merged.) > + * > + * nb::Gateway_Chassis has an "options" column that this omits because > + * nb::HA_Chassis doesn't have anything similar. That's OK because no options > + * were ever defined. 
*/ > +relation HAChassis(hacg_uuid: uuid, > + hac_uuid: uuid, > + chassis_name: string, > + priority: integer, > + external_ids: Map<string,string>) > +HAChassis(ha_chassis_group_uuid(lrp._uuid), gw_chassis_uuid, > + chassis_name, priority, external_ids) :- > + DistributedGatewayPort(.lrp = lrp), > + is_none(lrp.ha_chassis_group), > + var gw_chassis_uuid = FlatMap(lrp.gateway_chassis), > + nb::Gateway_Chassis(._uuid = gw_chassis_uuid, > + .chassis_name = chassis_name, > + .priority = priority, > + .external_ids = eids), > + var external_ids = map_insert_imm(eids, "chassis-name", chassis_name). > +HAChassis(ha_chassis_group_uuid(ha_chassis_group._uuid), ha_chassis_uuid, > + chassis_name, priority, external_ids) :- > + DistributedGatewayPort(.lrp = lrp), > + Some{var hac_group_uuid} = lrp.ha_chassis_group, > + ha_chassis_group in nb::HA_Chassis_Group(._uuid = hac_group_uuid), > + var ha_chassis_uuid = FlatMap(ha_chassis_group.ha_chassis), > + nb::HA_Chassis(._uuid = ha_chassis_uuid, > + .chassis_name = chassis_name, > + .priority = priority, > + .external_ids = eids), > + var external_ids = map_insert_imm(eids, "chassis-name", chassis_name). > + > +/* HAChassisGroup is an abstraction for sb::HA_Chassis_Group that papers over > + * the two northbound ways to configure it via nb::Gateway_Chassis and > + * nb::HA_Chassis. The former configuration method does not provide a name or > + * external_ids for the group (only for individual chassis), so we generate > + * them. > + * > + * (Usually, we could associate each row with a particular 'lr_uuid', but it's > + * permissible for more than one logical router to use an HA chassis group, so > + * we omit it so that multiple references get merged.) 
> + */ > +relation HAChassisGroup(uuid: uuid, > + name: string, > + external_ids: Map<string,string>) > +HAChassisGroup(ha_chassis_group_uuid(lrp._uuid), lrp.name, map_empty()) :- > + DistributedGatewayPort(.lrp = lrp), > + is_none(lrp.ha_chassis_group), > + not set_is_empty(lrp.gateway_chassis). > +HAChassisGroup(ha_chassis_group_uuid(hac_group_uuid), > + name, external_ids) :- > + DistributedGatewayPort(.lrp = lrp), > + Some{var hac_group_uuid} = lrp.ha_chassis_group, > + nb::HA_Chassis_Group(._uuid = hac_group_uuid, > + .name = name, > + .external_ids = external_ids). > + > +/* Each row maps from a logical router to the uuid of its HAChassisGroup. > + * This level of indirection is needed because multiple logical routers > + * are allowed to reference a given HAChassisGroup. */ > +relation LogicalRouterHAChassisGroup(lr_uuid: uuid, > + hacg_uuid: uuid) > +LogicalRouterHAChassisGroup(lr_uuid, ha_chassis_group_uuid(lrp._uuid)) :- > + DistributedGatewayPort(lrp, lr_uuid), > + is_none(lrp.ha_chassis_group), > + set_size(lrp.gateway_chassis) > 0. > +LogicalRouterHAChassisGroup(lr_uuid, > + ha_chassis_group_uuid(hac_group_uuid)) :- > + DistributedGatewayPort(lrp, lr_uuid), > + Some{var hac_group_uuid} = lrp.ha_chassis_group, > + nb::HA_Chassis_Group(._uuid = hac_group_uuid). > + > + > +/* For each router port, tracks whether it's a redirect port of its router */ > +relation RouterPortIsRedirect(lrp: uuid, is_redirect: bool) > +RouterPortIsRedirect(lrp, true) :- DistributedGatewayPort(nb::Logical_Router_Port{._uuid = lrp}, _). > +RouterPortIsRedirect(lrp, false) :- > + nb::Logical_Router_Port(._uuid = lrp), > + not DistributedGatewayPort(nb::Logical_Router_Port{._uuid = lrp}, _). > + > +relation LogicalRouterRedirectPort(lr: uuid, has_redirect_port: Option<nb::Logical_Router_Port>) > + > +LogicalRouterRedirectPort(lr, Some{lrp}) :- > + DistributedGatewayPort(lrp, lr). 
> + > +LogicalRouterRedirectPort(lr, None) :- > + nb::Logical_Router(._uuid = lr), > + not DistributedGatewayPort(_, lr). > + > +typedef ExceptionalExtIps = AllowedExtIps{ips: Ref<nb::Address_Set>} > + | ExemptedExtIps{ips: Ref<nb::Address_Set>} > + > +typedef NAT = NAT{ > + nat: Ref<nb::NAT>, > + external_ip: v46_ip, > + external_mac: Option<eth_addr>, > + exceptional_ext_ips: Option<ExceptionalExtIps> > +} > + > +relation LogicalRouterNAT0( > + lr: uuid, > + nat: Ref<nb::NAT>, > + external_ip: v46_ip, > + external_mac: Option<eth_addr>) > +LogicalRouterNAT0(lr, nat, external_ip, external_mac) :- > + nb::Logical_Router(._uuid = lr, .nat = nats), > + var nat_uuid = FlatMap(nats), > + nat in &NATRef[nb::NAT{._uuid = nat_uuid}], > + Some{var external_ip} = ip46_parse(nat.external_ip), > + var external_mac = match (nat.external_mac) { > + Some{s} -> eth_addr_from_string(s), > + None -> None > + }. > +Warning["Bad ip address ${nat.external_ip} in nat configuration for router ${lr_name}."] :- > + nb::Logical_Router(._uuid = lr, .nat = nats, .name = lr_name), > + var nat_uuid = FlatMap(nats), > + nat in &NATRef[nb::NAT{._uuid = nat_uuid}], > + None = ip46_parse(nat.external_ip). > +Warning["Bad MAC address ${s} in nat configuration for router ${lr_name}."] :- > + nb::Logical_Router(._uuid = lr, .nat = nats, .name = lr_name), > + var nat_uuid = FlatMap(nats), > + nat in &NATRef[nb::NAT{._uuid = nat_uuid}], > + Some{var s} = nat.external_mac, > + None = eth_addr_from_string(s). > + > +relation LogicalRouterNAT(lr: uuid, nat: NAT) > +LogicalRouterNAT(lr, NAT{nat, external_ip, external_mac, None}) :- > + LogicalRouterNAT0(lr, nat, external_ip, external_mac), > + nat.allowed_ext_ips.is_none(), > + nat.exempted_ext_ips.is_none(). 
> +LogicalRouterNAT(lr, NAT{nat, external_ip, external_mac, Some{AllowedExtIps{__as}}}) :- > + LogicalRouterNAT0(lr, nat, external_ip, external_mac), > + nat.exempted_ext_ips.is_none(), > + Some{var __as_uuid} = nat.allowed_ext_ips, > + __as in &AddressSetRef[nb::Address_Set{._uuid = __as_uuid}]. > +LogicalRouterNAT(lr, NAT{nat, external_ip, external_mac, Some{ExemptedExtIps{__as}}}) :- > + LogicalRouterNAT0(lr, nat, external_ip, external_mac), > + nat.allowed_ext_ips.is_none(), > + Some{var __as_uuid} = nat.exempted_ext_ips, > + __as in &AddressSetRef[nb::Address_Set{._uuid = __as_uuid}]. > +Warning["NAT rule: ${nat._uuid} not applied, since " > + "both allowed and exempt external ips set"] :- > + LogicalRouterNAT0(lr, nat, _, _), > + nat.allowed_ext_ips.is_some() and nat.exempted_ext_ips.is_some(). > + > +relation LogicalRouterNATs(lr: uuid, nat: Vec<NAT>) > + > +LogicalRouterNATs(lr, nats) :- > + LogicalRouterNAT(lr, nat), > + var nats = nat.group_by(lr).to_vec(). > + > +LogicalRouterNATs(lr, vec_empty()) :- > + nb::Logical_Router(._uuid = lr), > + not LogicalRouterNAT(lr, _). > + > +/* For each router, collect the set of IPv4 and IPv6 addresses used for SNAT, > + * which includes: > + * > + * - dnat_force_snat_addrs > + * - lb_force_snat_addrs > + * - IP addresses used in the router's attached NAT rules > + * > + * This is like init_nat_entries() in ovn-northd.c. */ > +relation LogicalRouterSnatIP(lr: uuid, snat_ip: v46_ip, nat: Option<NAT>) > +LogicalRouterSnatIP(lr._uuid, force_snat_ip, None) :- > + lr in nb::Logical_Router(), > + var dnat_force_snat_ips = get_force_snat_ip(lr, "dnat"), > + var lb_force_snat_ips = get_force_snat_ip(lr, "lb"), > + var force_snat_ip = FlatMap(dnat_force_snat_ips.union(lb_force_snat_ips)). > +LogicalRouterSnatIP(lr, snat_ip, Some{nat}) :- > + LogicalRouterNAT(lr, nat@NAT{.nat = &nb::NAT{.__type = "snat"}, .external_ip = snat_ip}). 
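The SNAT-IP collection described above (grouping per-router SNAT IPs into a map from IP to the set of NAT rules that use it, with force-SNAT IPs mapping to an empty set) can be sketched in plain Python. This is an illustrative model with made-up names, not the DDlog or C implementation:

```python
from collections import defaultdict

def collect_snat_ips(entries):
    """entries: iterable of (router, snat_ip, nat_rule_or_None) tuples,
    analogous to LogicalRouterSnatIP rows.
    Returns {router: {snat_ip: set_of_nat_rules}}; force-SNAT IPs coming
    from router options (nat_rule is None) map to an empty set."""
    result = defaultdict(lambda: defaultdict(set))
    for router, ip, nat in entries:
        bucket = result[router][ip]  # creates the entry even when nat is None
        if nat is not None:
            bucket.add(nat)
    return result
```

The union-merge of sets per key is what `group_to_setunionmap` below performs inside the DDlog aggregation.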
> + > +function group_to_setunionmap(g: Group<'K1, ('K2,Set<'V>)>): Map<'K2,Set<'V>> { > + var map = map_empty(); > + for (entry in g) { > + (var key, var value) = entry; > + match (map.get(key)) { > + None -> map.insert(key, value), > + Some{old_value} -> map.insert(key, old_value.union(value)) > + } > + }; > + map > +} > +relation LogicalRouterSnatIPs(lr: uuid, snat_ips: Map<v46_ip, Set<NAT>>) > +LogicalRouterSnatIPs(lr, snat_ips) :- > + LogicalRouterSnatIP(lr, snat_ip, nat), > + var snat_ips = (snat_ip, nat.to_set()).group_by(lr).group_to_setunionmap(). > +LogicalRouterSnatIPs(lr._uuid, map_empty()) :- > + lr in nb::Logical_Router(), > + not LogicalRouterSnatIP(.lr = lr._uuid). > + > +relation LogicalRouterLB(lr: uuid, nat: Ref<nb::Load_Balancer>) > + > +LogicalRouterLB(lr, lb) :- > + nb::Logical_Router(._uuid = lr, .load_balancer = lbs), > + var lb_uuid = FlatMap(lbs), > + lb in &LoadBalancerRef[nb::Load_Balancer{._uuid = lb_uuid}]. > + > +relation LogicalRouterLBs(lr: uuid, nat: Vec<Ref<nb::Load_Balancer>>) > + > +LogicalRouterLBs(lr, lbs) :- > + LogicalRouterLB(lr, lb), > + var lbs = lb.group_by(lr).to_vec(). > + > +LogicalRouterLBs(lr, vec_empty()) :- > + nb::Logical_Router(._uuid = lr), > + not LogicalRouterLB(lr, _). > + > +/* Router relation collects all attributes of a logical router. > + * > + * `lr` - Logical_Router record from the NB database > + * `l3dgw_port` - optional redirect port (see `DistributedGatewayPort`) > + * `redirect_port_name` - derived redirect port name (or empty string if > + * router does not have a redirect port) > + * `is_gateway` - true iff the router is a gateway router. Together with > + * `l3dgw_port`, this flag affects the generation of various flows > + * related to NAT and load balancing. 
> + * `learn_from_arp_request` - whether ARP requests to addresses on the router > + * should always be learned > + */ > + > +function chassis_redirect_name(port_name: string): string = "cr-${port_name}" > + > +relation &Router( > + lr: nb::Logical_Router, > + l3dgw_port: Option<nb::Logical_Router_Port>, > + redirect_port_name: string, > + is_gateway: bool, > + nats: Vec<NAT>, > + snat_ips: Map<v46_ip, Set<NAT>>, > + lbs: Vec<Ref<nb::Load_Balancer>>, > + mcast_cfg: Ref<McastRouterCfg>, > + learn_from_arp_request: bool > +) > + > +&Router(.lr = lr, > + .l3dgw_port = l3dgw_port, > + .redirect_port_name = > + match (l3dgw_port) { > + Some{rport} -> json_string_escape(chassis_redirect_name(rport.name)), > + _ -> "" > + }, > + .is_gateway = is_some(map_get(lr.options, "chassis")), > + .nats = nats, > + .snat_ips = snat_ips, > + .lbs = lbs, > + .mcast_cfg = mcast_cfg, > + .learn_from_arp_request = learn_from_arp_request) :- > + lr in nb::Logical_Router(), > + lr.is_enabled(), > + LogicalRouterRedirectPort(lr._uuid, l3dgw_port), > + LogicalRouterNATs(lr._uuid, nats), > + LogicalRouterLBs(lr._uuid, lbs), > + LogicalRouterSnatIPs(lr._uuid, snat_ips), > + mcast_cfg in &McastRouterCfg(.datapath = lr._uuid), > + var learn_from_arp_request = map_get_bool_def(lr.options, "always_learn_from_arp_request", true). > + > +/* RouterLB: many-to-many relation between logical routers and nb::LB */ > +relation RouterLB(router: Ref<Router>, lb: Ref<nb::Load_Balancer>) > + > +RouterLB(router, lb) :- > + router in &Router(.lbs = lbs), > + var lb = FlatMap(lbs). > + > +/* Load balancer VIPs associated with routers */ > +relation RouterLBVIP( > + router: Ref<Router>, > + lb: Ref<nb::Load_Balancer>, > + vip: string, > + backends: string) > + > +RouterLBVIP(router, lb, vip, backends) :- > + RouterLB(router, lb@(&nb::Load_Balancer{.vips = vips})), > + var kv = FlatMap(vips), > + (var vip, var backends) = kv. 
> + > +/* Router-to-router logical port connections */ > +relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: string) > + > +RouterRouterPeer(rport1, rport2, peer_name) :- > + nb::Logical_Router_Port(._uuid = rport1, .peer = peer), > + Some{var peer_name} = peer, > + nb::Logical_Router_Port(._uuid = rport2, .name = peer_name). > + > +/* Router port can peer with another router port, a switch port, or have > + * no peer. > + */ > +typedef RouterPeer = PeerRouter{rport: uuid, name: string} > + | PeerSwitch{sport: uuid, name: string} > + | PeerNone > + > +function router_peer_name(peer: RouterPeer): Option<string> = { > + match (peer) { > + PeerRouter{_, n} -> Some{n}, > + PeerSwitch{_, n} -> Some{n}, > + PeerNone -> None > + } > +} > + > +relation RouterPortPeer(rport: uuid, peer: RouterPeer) > + > +/* Router-to-router logical port connections */ > +RouterPortPeer(rport, PeerSwitch{sport, sport_name}) :- > + SwitchRouterPeer(sport, sport_name, rport). > + > +RouterPortPeer(rport1, PeerRouter{rport2, rport2_name}) :- > + RouterRouterPeer(rport1, rport2, rport2_name). > + > +RouterPortPeer(rport, PeerNone) :- > + nb::Logical_Router_Port(._uuid = rport), > + not SwitchRouterPeer(_, _, rport), > + not RouterRouterPeer(rport, _, _). > + > +/* Each row maps from a Logical_Router port to the input options in its > + * corresponding Port_Binding (if any). This is because northd preserves > + * most of the options in that column. (northd unconditionally sets the > + * ipv6_prefix_delegation and ipv6_prefix options, so we remove them for > + * faster convergence.) */ > +relation RouterPortSbOptions(lrp_uuid: uuid, options: Map<string,string>) > +RouterPortSbOptions(lrp._uuid, options) :- > + lrp in nb::Logical_Router_Port(), > + pb in sb::Port_Binding(._uuid = lrp._uuid), > + var options = { > + var options = pb.options; > + map_remove(options, "ipv6_prefix"); > + map_remove(options, "ipv6_prefix_delegation"); > + options > + }. 
> +RouterPortSbOptions(lrp._uuid, map_empty()) :- > + lrp in nb::Logical_Router_Port(), > + not sb::Port_Binding(._uuid = lrp._uuid). > + > +/* FIXME: what should happen when extract_lrp_networks fails? */ > +/* RouterPort relation collects all attributes of a logical router port */ > +relation &RouterPort( > + lrp: nb::Logical_Router_Port, > + json_name: string, > + networks: lport_addresses, > + router: Ref<Router>, > + is_redirect: bool, > + peer: RouterPeer, > + mcast_cfg: Ref<McastPortCfg>, > + sb_options: Map<string,string>) > + > +&RouterPort(.lrp = lrp, > + .json_name = json_string_escape(lrp.name), > + .networks = networks, > + .router = router, > + .is_redirect = is_redirect, > + .peer = peer, > + .mcast_cfg = mcast_cfg, > + .sb_options = sb_options) :- > + nb::Logical_Router_Port[lrp], > + Some{var networks} = extract_lrp_networks(lrp.mac, lrp.networks), > + LogicalRouterPort(lrp._uuid, lrouter_uuid), > + router in &Router(.lr = nb::Logical_Router{._uuid = lrouter_uuid}), > + RouterPortIsRedirect(lrp._uuid, is_redirect), > + RouterPortPeer(lrp._uuid, peer), > + mcast_cfg in &McastPortCfg(.port = lrp._uuid, .router_port = true), > + RouterPortSbOptions(lrp._uuid, sb_options). > + > +relation RouterPortNetworksIPv4Addr(port: Ref<RouterPort>, addr: ipv4_netaddr) > + > +RouterPortNetworksIPv4Addr(port, addr) :- > + port in &RouterPort(.networks = networks), > + var addr = FlatMap(networks.ipv4_addrs). > + > +relation RouterPortNetworksIPv6Addr(port: Ref<RouterPort>, addr: ipv6_netaddr) > + > +RouterPortNetworksIPv6Addr(port, addr) :- > + port in &RouterPort(.networks = networks), > + var addr = FlatMap(networks.ipv6_addrs). > + > +/* StaticRoute: Collects and parses attributes of a static route. 
*/ > +typedef route_policy = SrcIp | DstIp > +function route_policy_from_string(s: Option<string>): route_policy = { > + match (s) { > + Some{"src-ip"} -> SrcIp, > + _ -> DstIp > + } > +} > +function to_string(policy: route_policy): string = { > + match (policy) { > + SrcIp -> "src-ip", > + DstIp -> "dst-ip" > + } > +} > + > +typedef route_key = RouteKey { > + policy: route_policy, > + ip_prefix: v46_ip, > + plen: bit<32> > +} > + > +relation &StaticRoute(lrsr: nb::Logical_Router_Static_Route, > + key: route_key, > + nexthop: v46_ip, > + output_port: Option<string>, > + ecmp_symmetric_reply: bool) > + > +&StaticRoute(.lrsr = lrsr, > + .key = RouteKey{policy, ip_prefix, plen}, > + .nexthop = nexthop, > + .output_port = lrsr.output_port, > + .ecmp_symmetric_reply = esr) :- > + lrsr in nb::Logical_Router_Static_Route(), > + var policy = route_policy_from_string(lrsr.policy), > + Some{(var nexthop, var nexthop_plen)} = ip46_parse_cidr(lrsr.nexthop), > + match (nexthop) { > + IPv4{_} -> nexthop_plen == 32, > + IPv6{_} -> nexthop_plen == 128 > + }, > + Some{(var ip_prefix, var plen)} = ip46_parse_cidr(lrsr.ip_prefix), > + match ((nexthop, ip_prefix)) { > + (IPv4{_}, IPv4{_}) -> true, > + (IPv6{_}, IPv6{_}) -> true, > + _ -> false > + }, > + var esr = map_get_bool_def(lrsr.options, "ecmp_symmetric_reply", false). > + > +/* Returns the IP address of the router port 'op' that > + * overlaps with 'ip'. If one is not found, returns None. */ > +function find_lrp_member_ip(networks: lport_addresses, ip: v46_ip): Option<v46_ip> = > +{ > + match (ip) { > + IPv4{ip4} -> { > + for (na in networks.ipv4_addrs) { > + if (ip_same_network((na.addr, ip4), ipv4_netaddr_mask(na))) { > + /* There should be only 1 interface that matches the > + * supplied IP. Otherwise, it's a configuration error, > + * because subnets of a router's interfaces should NOT > + * overlap. 
*/ > + return Some{IPv4{na.addr}} > + } > + }; > + return None > + }, > + IPv6{ip6} -> { > + for (na in networks.ipv6_addrs) { > + if (ipv6_same_network((na.addr, ip6), ipv6_netaddr_mask(na))) { > + /* There should be only 1 interface that matches the > + * supplied IP. Otherwise, it's a configuration error, > + * because subnets of a router's interfaces should NOT > + * overlap. */ > + return Some{IPv6{na.addr}} > + } > + }; > + return None > + } > + } > +} > + > + > +/* Step 1: compute router-route pairs */ > +relation RouterStaticRoute_( > + router : Ref<Router>, > + key : route_key, > + nexthop : v46_ip, > + output_port : Option<string>, > + ecmp_symmetric_reply : bool) > + > +RouterStaticRoute_(.router = router, > + .key = route.key, > + .nexthop = route.nexthop, > + .output_port = route.output_port, > + .ecmp_symmetric_reply = route.ecmp_symmetric_reply) :- > + router in &Router(.lr = nb::Logical_Router{.static_routes = routes}), > + var route_id = FlatMap(routes), > + route in &StaticRoute(.lrsr = nb::Logical_Router_Static_Route{._uuid = route_id}). > + > +/* Step-2: compute output_port for each pair */ > +typedef route_dst = RouteDst { > + nexthop: v46_ip, > + src_ip: v46_ip, > + port: Ref<RouterPort>, > + ecmp_symmetric_reply: bool > +} > + > +relation RouterStaticRoute( > + router : Ref<Router>, > + key : route_key, > + dsts : Set<route_dst>) > + > +RouterStaticRoute(router, key, dsts) :- > + RouterStaticRoute_(.router = router, > + .key = key, > + .nexthop = nexthop, > + .output_port = None, > + .ecmp_symmetric_reply = ecmp_symmetric_reply), > + /* output_port is not specified, find the > + * router port matching the next hop. */ > + port in &RouterPort(.router = &Router{.lr = nb::Logical_Router{._uuid = router.lr._uuid}}, > + .networks = networks), > + Some{var src_ip} = find_lrp_member_ip(networks, nexthop), > + var dst = RouteDst{nexthop, src_ip, port, ecmp_symmetric_reply}, > + var dsts = dst.group_by((router, key)).to_set(). 
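The `find_lrp_member_ip()` helper above has a simple semantic: return the router-port address whose configured subnet contains the given IP. A rough Python analogue using the standard `ipaddress` module (names here are illustrative, not from the patch):

```python
import ipaddress

def find_member_ip(networks, nexthop):
    """networks: list of 'addr/plen' strings configured on one router port;
    nexthop: IP address string. Returns the matching port address or None."""
    hop = ipaddress.ip_address(nexthop)
    for cidr in networks:
        iface = ipaddress.ip_interface(cidr)
        if hop.version == iface.version and hop in iface.network:
            # Subnets of a router's interfaces should not overlap, so the
            # first match is the only match.
            return str(iface.ip)
    return None
```

This mirrors the per-family loop in the DDlog function: the IPv4 and IPv6 branches each compare the candidate against the masked network of every configured address.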
> + > +RouterStaticRoute(router, key, dsts) :- > + RouterStaticRoute_(.router = router, > + .key = key, > + .nexthop = nexthop, > + .output_port = Some{oport}, > + .ecmp_symmetric_reply = ecmp_symmetric_reply), > + /* output_port specified */ > + port in &RouterPort(.lrp = nb::Logical_Router_Port{.name = oport}, > + .networks = networks), > + Some{var src_ip} = match (find_lrp_member_ip(networks, nexthop)) { > + Some{src_ip} -> Some{src_ip}, > + None -> { > + /* There are no IP networks configured on the router's port via > + * which 'route->nexthop' is theoretically reachable. But since > + * 'out_port' has been specified, we honor it by trying to reach > + * 'route->nexthop' via the first IP address of 'out_port'. > + * (There are cases, e.g in GCE, where each VM gets a /32 IP > + * address and the default gateway is still reachable from it.) */ > + match (key.ip_prefix) { > + IPv4{_} -> match (vec_nth(networks.ipv4_addrs, 0)) { > + Some{addr} -> Some{IPv4{addr.addr}}, > + None -> { > + warn("No path for static route ${key.ip_prefix}; next hop ${nexthop}"); > + None > + } > + }, > + IPv6{_} -> match (vec_nth(networks.ipv6_addrs, 0)) { > + Some{addr} -> Some{IPv6{addr.addr}}, > + None -> { > + warn("No path for static route ${key.ip_prefix}; next hop ${nexthop}"); > + None > + } > + } > + } > + } > + }, > + var dsts = set_singleton(RouteDst{nexthop, src_ip, port, ecmp_symmetric_reply}). > + > +Warning[message] :- > + RouterStaticRoute_(.router = router, .key = key, .nexthop = nexthop), > + not RouterStaticRoute(.router = router, .key = key), > + var message = "No path for ${key.policy} static route ${key.ip_prefix}/${key.plen} with next hop ${nexthop}". 
> diff --git a/northd/lswitch.dl b/northd/lswitch.dl > new file mode 100644 > index 000000000000..9a2d4c1c8d4b > --- /dev/null > +++ b/northd/lswitch.dl > @@ -0,0 +1,643 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +import OVN_Northbound as nb > +import OVN_Southbound as sb > +import ovsdb > +import ovn > +import lrouter > +import multicast > +import helpers > +import ipam > + > +function is_enabled(lsp: nb::Logical_Switch_Port): bool { is_enabled(lsp.enabled) } > +function is_enabled(lsp: Ref<nb::Logical_Switch_Port>): bool { lsp.deref().is_enabled() } > +function is_enabled(sp: SwitchPort): bool { sp.lsp.is_enabled() } > +function is_enabled(sp: Ref<SwitchPort>): bool { sp.lsp.is_enabled() } > + > +relation SwitchRouterPeerRef(lsp: uuid, rport: Option<Ref<RouterPort>>) > + > +SwitchRouterPeerRef(lsp, Some{rport}) :- > + SwitchRouterPeer(lsp, _, lrp), > + rport in &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp}). > + > +SwitchRouterPeerRef(lsp, None) :- > + nb::Logical_Switch_Port(._uuid = lsp), > + not SwitchRouterPeer(lsp, _, _). > + > +/* map logical ports to logical switches */ > +relation LogicalSwitchPort(lport: uuid, lswitch: uuid) > + > +LogicalSwitchPort(lport, lswitch) :- > + nb::Logical_Switch(._uuid = lswitch, .ports = ports), > + var lport = FlatMap(ports). 
> + > +/* Logical switches that have enabled ports with "unknown" address */ > +relation LogicalSwitchUnknownPorts(ls: uuid, port_ids: Set<uuid>) > + > +LogicalSwitchUnknownPorts(ls_uuid, port_ids) :- > + &SwitchPort(.lsp = lsp, .sw = &Switch{.ls = ls}), > + lsp.is_enabled() and set_contains(lsp.addresses, "unknown"), > + var ls_uuid = ls._uuid, > + var port_ids = lsp._uuid.group_by(ls_uuid).to_set(). > + > +/* PortStaticAddresses: static IP addresses associated with each Logical_Switch_Port */ > +relation PortStaticAddresses(lsport: uuid, ip4addrs: Set<string>, ip6addrs: Set<string>) > + > +PortStaticAddresses(.lsport = port_uuid, > + .ip4addrs = set_unions(ip4_addrs), > + .ip6addrs = set_unions(ip6_addrs)) :- > + nb::Logical_Switch_Port(._uuid = port_uuid, .addresses = addresses), > + var address = FlatMap(if (set_is_empty(addresses)) { set_singleton("") } else { addresses }), > + (var ip4addrs, var ip6addrs) = if (not is_dynamic_lsp_address(address)) { > + split_addresses(address) > + } else { (set_empty(), set_empty()) }, > + var static_addrs = (ip4addrs, ip6addrs).group_by(port_uuid).group_unzip(), > + (var ip4_addrs, var ip6_addrs) = static_addrs. > + > +relation PortInGroup(port: uuid, group: uuid) > + > +PortInGroup(port, group) :- > + nb::Port_Group(._uuid = group, .ports = ports), > + var port = FlatMap(ports). > + > +/* All ACLs associated with logical switch */ > +relation LogicalSwitchACL(ls: uuid, acl: uuid) > + > +LogicalSwitchACL(ls, acl) :- > + nb::Logical_Switch(._uuid = ls, .acls = acls), > + var acl = FlatMap(acls). > + > +LogicalSwitchACL(ls, acl) :- > + nb::Logical_Switch(._uuid = ls, .ports = ports), > + var port_id = FlatMap(ports), > + PortInGroup(port_id, group_id), > + nb::Port_Group(._uuid = group_id, .acls = acls), > + var acl = FlatMap(acls). > + > +relation LogicalSwitchStatefulACL(ls: uuid, acl: uuid) > + > +LogicalSwitchStatefulACL(ls, acl) :- > + LogicalSwitchACL(ls, acl), > + nb::ACL(._uuid = acl, .action = "allow-related"). 
> + > +relation LogicalSwitchHasStatefulACL(ls: uuid, has_stateful_acl: bool) > + > +LogicalSwitchHasStatefulACL(ls, true) :- > + LogicalSwitchStatefulACL(ls, _). > + > +LogicalSwitchHasStatefulACL(ls, false) :- > + nb::Logical_Switch(._uuid = ls), > + not LogicalSwitchStatefulACL(ls, _). > + > +relation LogicalSwitchLocalnetPort0(ls_uuid: uuid, lsp_name: string) > +LogicalSwitchLocalnetPort0(ls_uuid, lsp_name) :- > + ls in nb::Logical_Switch(._uuid = ls_uuid), > + var lsp_uuid = FlatMap(ls.ports), > + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), > + lsp.__type == "localnet", > + var lsp_name = lsp.name. > + > +relation LogicalSwitchLocalnetPorts(ls_uuid: uuid, localnet_port_names: Vec<string>) > +LogicalSwitchLocalnetPorts(ls_uuid, localnet_port_names) :- > + LogicalSwitchLocalnetPort0(ls_uuid, lsp_name), > + var localnet_port_names = lsp_name.group_by(ls_uuid).to_vec(). > +LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :- > + ls in nb::Logical_Switch(), > + var ls_uuid = ls._uuid, > + not LogicalSwitchLocalnetPort0(ls_uuid, _). > + > +/* Flatten the list of dns_records in Logical_Switch */ > +relation LogicalSwitchDNS(ls_uuid: uuid, dns_uuid: uuid) > + > +LogicalSwitchDNS(ls._uuid, dns_uuid) :- > + nb::Logical_Switch[ls], > + var dns_uuid = FlatMap(ls.dns_records), > + nb::DNS(._uuid = dns_uuid). > + > +relation LogicalSwitchWithDNSRecords(ls: uuid) > + > +LogicalSwitchWithDNSRecords(ls) :- > + LogicalSwitchDNS(ls, dns_uuid), > + nb::DNS(._uuid = dns_uuid, .records = records), > + not map_is_empty(records). > + > +relation LogicalSwitchHasDNSRecords(ls: uuid, has_dns_records: bool) > + > +LogicalSwitchHasDNSRecords(ls, true) :- > + LogicalSwitchWithDNSRecords(ls). > + > +LogicalSwitchHasDNSRecords(ls, false) :- > + nb::Logical_Switch(._uuid = ls), > + not LogicalSwitchWithDNSRecords(ls). 
> + > +relation LogicalSwitchHasNonRouterPort0(ls: uuid) > +LogicalSwitchHasNonRouterPort0(ls_uuid) :- > + ls in nb::Logical_Switch(._uuid = ls_uuid), > + var lsp_uuid = FlatMap(ls.ports), > + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), > + lsp.__type != "router". > + > +relation LogicalSwitchHasNonRouterPort(ls: uuid, has_non_router_port: bool) > +LogicalSwitchHasNonRouterPort(ls, true) :- > + LogicalSwitchHasNonRouterPort0(ls). > +LogicalSwitchHasNonRouterPort(ls, false) :- > + nb::Logical_Switch(._uuid = ls), > + not LogicalSwitchHasNonRouterPort0(ls). > + > +/* Switch relation collects all attributes of a logical switch */ > + > +relation &Switch( > + ls: nb::Logical_Switch, > + has_stateful_acl: bool, > + has_lb_vip: bool, > + has_dns_records: bool, > + localnet_port_names: Vec<string>, > + subnet: Option<(in_addr/*subnet*/, in_addr/*mask*/, bit<32>/*start_ipv4*/, bit<32>/*total_ipv4s*/)>, > + ipv6_prefix: Option<in6_addr>, > + mcast_cfg: Ref<McastSwitchCfg>, > + is_vlan_transparent: bool, > + > + /* Does this switch have at least one port with type != "router"? 
*/ > + has_non_router_port: bool > +) > + > +function ipv6_parse_prefix(s: string): Option<in6_addr> { > + if (string_contains(s, "/")) { > + match (ipv6_parse_cidr(s)) { > + Right{(addr, 64)} -> Some{addr}, > + _ -> None > + } > + } else { > + ipv6_parse(s) > + } > +} > + > +&Switch(.ls = ls, > + .has_stateful_acl = has_stateful_acl, > + .has_lb_vip = has_lb_vip, > + .has_dns_records = has_dns_records, > + .localnet_port_names = localnet_port_names, > + .subnet = subnet, > + .ipv6_prefix = ipv6_prefix, > + .mcast_cfg = mcast_cfg, > + .has_non_router_port = has_non_router_port, > + .is_vlan_transparent = is_vlan_transparent) :- > + nb::Logical_Switch[ls], > + LogicalSwitchHasStatefulACL(ls._uuid, has_stateful_acl), > + LogicalSwitchHasLBVIP(ls._uuid, has_lb_vip), > + LogicalSwitchHasDNSRecords(ls._uuid, has_dns_records), > + LogicalSwitchLocalnetPorts(ls._uuid, localnet_port_names), > + LogicalSwitchHasNonRouterPort(ls._uuid, has_non_router_port), > + mcast_cfg in &McastSwitchCfg(.datapath = ls._uuid), > + var subnet = > + match (map_get(ls.other_config, "subnet")) { > + None -> None, > + Some{subnet_str} -> { > + match (ip_parse_masked(subnet_str)) { > + Left{err} -> { > + warn("bad 'subnet' ${subnet_str}"); > + None > + }, > + Right{(subnet, mask)} -> { > + if (ip_count_cidr_bits(mask) == Some{32} > + or not ip_is_cidr(mask)) { > + warn("bad 'subnet' ${subnet_str}"); > + None > + } else { > + Some{(subnet, mask, (iptohl(subnet) & iptohl(mask)) + 1, ~iptohl(mask))} > + } > + } > + } > + } > + }, > + var ipv6_prefix = > + match (map_get(ls.other_config, "ipv6_prefix")) { > + None -> None, > + Some{prefix} -> ipv6_parse_prefix(prefix) > + }, > + var is_vlan_transparent = map_get_bool_def(ls.other_config, "vlan-passthru", false). 
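The `subnet` arithmetic in the &Switch rule above is compact; for reference, here is the same computation in Python (the function name and use of the stdlib `ipaddress` module are my own, not part of the patch). It shows why `start_ipv4` is `(subnet & mask) + 1` (first allocatable host) and `total_ipv4s` is `~mask` (size of the host space), and why /32 and non-CIDR masks are rejected:

```python
import ipaddress

def parse_switch_subnet(other_config):
    """Sketch of the 'subnet' other_config parsing above.  Returns
    (subnet, mask, start_ipv4, total_ipv4s) as host-order integers,
    or None for the warn("bad 'subnet' ...") branches."""
    subnet_str = other_config.get("subnet")
    if subnet_str is None:
        return None
    try:
        net = ipaddress.ip_network(subnet_str, strict=False)
    except ValueError:
        return None  # non-CIDR or unparsable mask, as in ip_is_cidr()
    if net.prefixlen == 32:
        return None  # a /32 leaves no addresses to allocate
    subnet = int(net.network_address)
    mask = int(net.netmask)
    start_ipv4 = (subnet & mask) + 1       # first allocatable host address
    total_ipv4s = (~mask) & 0xffffffff     # number of host addresses
    return subnet, mask, start_ipv4, total_ipv4s
```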
> + > +/* SwitchLB: many-to-many relation between logical switches and nb::LB */ > +relation SwitchLB(sw_uuid: uuid, lb: Ref<nb::Load_Balancer>) > +SwitchLB(sw_uuid, lb) :- > + nb::Logical_Switch(._uuid = sw_uuid, .load_balancer = lb_ids), > + var lb_id = FlatMap(lb_ids), > + lb in &LoadBalancerRef[nb::Load_Balancer{._uuid = lb_id}]. > + > +/* Load balancer VIPs associated with switch */ > +relation SwitchLBVIP(sw_uuid: uuid, lb: Ref<nb::Load_Balancer>, vip: string, backends: string) > +SwitchLBVIP(sw_uuid, lb, vip, backends) :- > + SwitchLB(sw_uuid, lb@(&nb::Load_Balancer{.vips = vips})), > + var kv = FlatMap(vips), > + (var vip, var backends) = kv. > + > +relation LogicalSwitchHasLBVIP(sw_uuid: uuid, has_lb_vip: bool) > +LogicalSwitchHasLBVIP(sw_uuid, true) :- > + SwitchLBVIP(.sw_uuid = sw_uuid). > +LogicalSwitchHasLBVIP(sw_uuid, false) :- > + nb::Logical_Switch(._uuid = sw_uuid), > + not SwitchLBVIP(.sw_uuid = sw_uuid). > + > +relation &LBVIP( > + lb: Ref<nb::Load_Balancer>, > + vip_key: string, > + vip_addr: v46_ip, > + vip_port: bit<16>, > + backend_ips: string) > + > +&LBVIP(.lb = lb, > + .vip_key = vip_key, > + .vip_addr = vip_addr, > + .vip_port = vip_port, > + .backend_ips = backend_ips) :- > + LoadBalancerRef[lb], > + var vip = FlatMap(lb.vips), > + (var vip_key, var backend_ips) = vip, > + Some{(var vip_addr, var vip_port)} = ip_address_and_port_from_lb_key(vip_key). > + > +typedef svc_monitor = SvcMonitor{ > + port_name: string, // Might name a switch or router port. 
> + src_ip: string > +} > + > +relation &LBVIPBackend( > + lbvip: Ref<LBVIP>, > + ip: v46_ip, > + port: bit<16>, > + svc_monitor: Option<svc_monitor>) > + > +function parse_ip_port_mapping(mappings: Map<string,string>, ip: v46_ip) > + : Option<svc_monitor> { > + for (kv in mappings) { > + (var key, var value) = kv; > + if (ip46_parse(key) == Some{ip}) { > + var strs = string_split(value, ":"); > + if (vec_len(strs) != 2) { > + return None > + }; > + > + return match ((vec_nth(strs, 0), vec_nth(strs, 1))) { > + (Some{port_name}, Some{src_ip}) -> Some{SvcMonitor{port_name, src_ip}}, > + _ -> None > + } > + } > + }; > + return None > +} > + > +&LBVIPBackend(.lbvip = lbvip, > + .ip = ip, > + .port = port, > + .svc_monitor = svc_monitor) :- > + LBVIP[lbvip], > + var backend = FlatMap(string_split(lbvip.backend_ips, ",")), > + Some{(var ip, var port)} = ip_address_and_port_from_lb_key(backend), > + (var svc_monitor) = parse_ip_port_mapping(lbvip.lb.ip_port_mappings, ip). > + > +function is_online(status: Option<string>): bool = { > + match (status) { > + Some{s} -> s == "online", > + _ -> true > + } > +} > +function default_protocol(protocol: Option<string>): string = { > + match (protocol) { > + Some{x} -> x, > + None -> "tcp" > + } > +} > +relation &LBVIPBackendStatus( > + port: bit<16>, > + ip: v46_ip, > + protocol: string, > + logical_port: string, > + up: bool) > +&LBVIPBackendStatus(port, ip, protocol, logical_port, up) :- > + sm in sb::Service_Monitor(), > + var port = sm.port as bit<16>, > + Some{var ip} = ip46_parse(sm.ip), > + var protocol = default_protocol(sm.protocol), > + var logical_port = sm.logical_port, > + var up = is_online(sm.status). 
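The `parse_ip_port_mapping()` function above reads more naturally in an imperative language; a Python equivalent of exactly the logic shown (returning a tuple in place of `Option<svc_monitor>`; the simple string comparison stands in for the parsed-address comparison via `ip46_parse()`):

```python
def parse_ip_port_mapping(mappings, ip):
    """Find the ip_port_mappings entry whose key is the backend IP and
    split its "port_name:src_ip" value, mirroring the DDlog function
    above.  Returns (port_name, src_ip) or None."""
    for key, value in mappings.items():
        if key == ip:
            parts = value.split(":")
            if len(parts) != 2:
                return None  # malformed value ends the search, as above
            return parts[0], parts[1]
    return None
```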
> +&LBVIPBackendStatus(port, ip, protocol, logical_port, true) :-
> +    LBVIPBackend[lbvipbackend],
> +    var port = lbvipbackend.port as bit<16>,
> +    var ip = lbvipbackend.ip,
> +    var protocol = default_protocol(lbvipbackend.lbvip.lb.protocol),
> +    Some{var svc_monitor} = lbvipbackend.svc_monitor,
> +    var logical_port = svc_monitor.port_name,
> +    not sb::Service_Monitor(.port = port as bit<64>,
> +                            .ip = "${ip}",
> +                            .protocol = Some{protocol},
> +                            .logical_port = logical_port).
> +
> +/* SwitchPortDHCPv4Options: many-to-one relation between logical switch ports and DHCPv4 options */
> +relation SwitchPortDHCPv4Options(
> +    port: Ref<SwitchPort>,
> +    dhcpv4_options: Ref<nb::DHCP_Options>)
> +
> +SwitchPortDHCPv4Options(port, options) :-
> +    port in &SwitchPort(.lsp = lsp),
> +    port.lsp.__type != "external",
> +    Some{var dhcpv4_uuid} = lsp.dhcpv4_options,
> +    options in &DHCP_OptionsRef[nb::DHCP_Options{._uuid = dhcpv4_uuid}].
> +
> +/* SwitchPortDHCPv6Options: many-to-one relation between logical switch ports and DHCPv6 options */
> +relation SwitchPortDHCPv6Options(
> +    port: Ref<SwitchPort>,
> +    dhcpv6_options: Ref<nb::DHCP_Options>)
> +
> +SwitchPortDHCPv6Options(port, options) :-
> +    port in &SwitchPort(.lsp = lsp),
> +    port.lsp.__type != "external",
> +    Some{var dhcpv6_uuid} = lsp.dhcpv6_options,
> +    options in &DHCP_OptionsRef[nb::DHCP_Options{._uuid = dhcpv6_uuid}].
> +
> +/* SwitchQoS: many-to-one relation between logical switches and nb::QoS */
> +relation SwitchQoS(sw: Ref<Switch>, qos: Ref<nb::QoS>)
> +
> +SwitchQoS(sw, qos) :-
> +    sw in &Switch(.ls = nb::Logical_Switch{.qos_rules = qos_rules}),
> +    var qos_rule = FlatMap(qos_rules),
> +    qos in &QoSRef[nb::QoS{._uuid = qos_rule}].
> + > +/* SwitchACL: many-to-many relation between logical switches and ACLs */ > +relation &SwitchACL(sw: Ref<Switch>, > + acl: Ref<nb::ACL>) > + > +&SwitchACL(.sw = sw, .acl = acl) :- > + LogicalSwitchACL(sw_uuid, acl_uuid), > + sw in &Switch(.ls = nb::Logical_Switch{._uuid = sw_uuid}), > + acl in &ACLRef[nb::ACL{._uuid = acl_uuid}]. > + > +relation SwitchPortUp(lsp: uuid, up: bool) > + > +SwitchPortUp(lsp, up) :- > + nb::Logical_Switch_Port(._uuid = lsp, .name = lsp_name, .__type = __type), > + sb::Port_Binding(.logical_port = lsp_name, .chassis = chassis), > + var up = > + if (__type == "router") { > + true > + } else if (is_none(chassis)) { > + false > + } else { > + true > + }. > + > +SwitchPortUp(lsp, up) :- > + nb::Logical_Switch_Port(._uuid = lsp, .name = lsp_name, .__type = __type), > + not sb::Port_Binding(.logical_port = lsp_name), > + var up = __type == "router". > + > +relation SwitchPortHAChassisGroup0(lsp_uuid: uuid, hac_group_uuid: uuid) > +SwitchPortHAChassisGroup0(lsp_uuid, ha_chassis_group_uuid(ls_uuid)) :- > + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), > + lsp.__type == "external", > + Some{var hac_group_uuid} = lsp.ha_chassis_group, > + ha_chassis_group in nb::HA_Chassis_Group(._uuid = hac_group_uuid), > + /* If the group is empty, then HA_Chassis_Group record will not be created in SB, > + * and so we should not create a reference to the group in Port_Binding table, > + * to avoid integrity violation. */ > + not set_is_empty(ha_chassis_group.ha_chassis), > + LogicalSwitchPort(.lport = lsp_uuid, .lswitch = ls_uuid). > +relation SwitchPortHAChassisGroup(lsp_uuid: uuid, hac_group_uuid: Option<uuid>) > +SwitchPortHAChassisGroup(lsp_uuid, Some{hac_group_uuid}) :- > + SwitchPortHAChassisGroup0(lsp_uuid, hac_group_uuid). > +SwitchPortHAChassisGroup(lsp_uuid, None) :- > + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), > + not SwitchPortHAChassisGroup0(lsp_uuid, _). 
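The two SwitchPortUp rules above (one for ports with a Port_Binding row, one for ports without) collapse to a single predicate, which may help reviewers check them; this Python sketch conflates "no Port_Binding row" with "binding row with no chassis", which the two rules treat identically:

```python
def switch_port_up(port_type, chassis):
    """Sketch of the combined SwitchPortUp rules: "router" ports are
    considered up unconditionally; any other port is up iff its
    Port_Binding row exists and has a chassis (pass the chassis value,
    or None when absent)."""
    if port_type == "router":
        return True
    return chassis is not None
```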
> + > +/* SwitchPort relation collects all attributes of a logical switch port > + * - `peer` - peer router port, if any > + * - `static_dynamic_mac` - port has a "dynamic" address that contains a static MAC, > + * e.g., "80:fa:5b:06:72:b7 dynamic" > + * - `static_dynamic_ipv4`, `static_dynamic_ipv6` - port has a "dynamic" address that contains a static IP, > + * e.g., "dynamic 192.168.1.2" > + * - `needs_dynamic_ipv4address` - port requires a dynamically allocated IPv4 address > + * - `needs_dynamic_macaddress` - port requires a dynamically allocated MAC address > + * - `needs_dynamic_tag` - port requires a dynamically allocated tag > + * - `up` - true if the port is bound to a chassis or has type "" > + * - 'hac_group_uuid' - uuid of sb::HA_Chassis_Group, only for "external" ports > + */ > +relation &SwitchPort( > + lsp: nb::Logical_Switch_Port, > + json_name: string, > + sw: Ref<Switch>, > + peer: Option<Ref<RouterPort>>, > + static_addresses: Vec<lport_addresses>, > + dynamic_address: Option<lport_addresses>, > + static_dynamic_mac: Option<eth_addr>, > + static_dynamic_ipv4: Option<in_addr>, > + static_dynamic_ipv6: Option<in6_addr>, > + ps_addresses: Vec<lport_addresses>, > + ps_eth_addresses: Vec<string>, > + parent_name: Option<string>, > + needs_dynamic_ipv4address: bool, > + needs_dynamic_macaddress: bool, > + needs_dynamic_ipv6address: bool, > + needs_dynamic_tag: bool, > + up: bool, > + mcast_cfg: Ref<McastPortCfg>, > + hac_group_uuid: Option<uuid> > +) > + > +&SwitchPort(.lsp = lsp, > + .json_name = json_string_escape(lsp.name), > + .sw = sw, > + .peer = peer, > + .static_addresses = static_addresses, > + .dynamic_address = dynamic_address, > + .static_dynamic_mac = static_dynamic_mac, > + .static_dynamic_ipv4 = static_dynamic_ipv4, > + .static_dynamic_ipv6 = static_dynamic_ipv6, > + .ps_addresses = ps_addresses, > + .ps_eth_addresses = ps_eth_addresses, > + .parent_name = parent_name, > + .needs_dynamic_ipv4address = needs_dynamic_ipv4address, > + 
.needs_dynamic_macaddress = needs_dynamic_macaddress, > + .needs_dynamic_ipv6address = needs_dynamic_ipv6address, > + .needs_dynamic_tag = needs_dynamic_tag, > + .up = up, > + .mcast_cfg = mcast_cfg, > + .hac_group_uuid = hac_group_uuid) :- > + nb::Logical_Switch_Port[lsp], > + LogicalSwitchPort(lsp._uuid, lswitch_uuid), > + sw in &Switch(.ls = nb::Logical_Switch{._uuid = lswitch_uuid, .other_config = other_config}, > + .subnet = subnet, > + .ipv6_prefix = ipv6_prefix), > + SwitchRouterPeerRef(lsp._uuid, peer), > + SwitchPortUp(lsp._uuid, up), > + mcast_cfg in &McastPortCfg(.port = lsp._uuid, .router_port = false), > + var static_addresses = { > + var static_addresses = vec_empty(); > + for (addr in lsp.addresses) { > + if ((addr != "router") and (not is_dynamic_lsp_address(addr))) { > + match (extract_lsp_addresses(addr)) { > + None -> (), > + Some{lport_addr} -> vec_push(static_addresses, lport_addr) > + } > + } else () > + }; > + static_addresses > + }, > + var ps_addresses = { > + var ps_addresses = vec_empty(); > + for (addr in lsp.port_security) { > + match (extract_lsp_addresses(addr)) { > + None -> (), > + Some{lport_addr} -> vec_push(ps_addresses, lport_addr) > + } > + }; > + ps_addresses > + }, > + var ps_eth_addresses = { > + var ps_eth_addresses = vec_empty(); > + for (ps_addr in ps_addresses) { > + vec_push(ps_eth_addresses, "${ps_addr.ea}") > + }; > + ps_eth_addresses > + }, > + var dynamic_address = match (lsp.dynamic_addresses) { > + None -> None, > + Some{lport_addr} -> extract_lsp_addresses(lport_addr) > + }, > + (var static_dynamic_mac, > + var static_dynamic_ipv4, > + var static_dynamic_ipv6, > + var has_dyn_lsp_addr) = { > + var dynamic_address_request = None; > + for (addr in lsp.addresses) { > + dynamic_address_request = parse_dynamic_address_request(addr); > + if (is_some(dynamic_address_request)) { > + break > + } > + }; > + > + match (dynamic_address_request) { > + Some{DynamicAddressRequest{mac, ipv4, ipv6}} -> (mac, ipv4, ipv6, true), > 
+ None -> (None, None, None, false) > + } > + }, > + var needs_dynamic_ipv4address = has_dyn_lsp_addr and is_none(peer) and is_some(subnet) and > + is_none(static_dynamic_ipv4), > + var needs_dynamic_macaddress = has_dyn_lsp_addr and is_none(peer) and is_none(static_dynamic_mac) and > + (is_some(subnet) or is_some(ipv6_prefix) or > + map_get(other_config, "mac_only") == Some{"true"}), > + var needs_dynamic_ipv6address = has_dyn_lsp_addr and is_none(peer) and is_some(ipv6_prefix) and is_none(static_dynamic_ipv6), > + var parent_name = match (lsp.parent_name) { > + None -> None, > + Some{pname} -> if (pname == "") { None } else { Some{pname} } > + }, > + /* Port needs dynamic tag if it has a parent and its `tag_request` is 0. */ > + var needs_dynamic_tag = is_some(parent_name) and > + lsp.tag_request == Some{0}, > + SwitchPortHAChassisGroup(.lsp_uuid = lsp._uuid, > + .hac_group_uuid = hac_group_uuid). > + > +/* Switch port port security addresses */ > +relation SwitchPortPSAddresses(port: Ref<SwitchPort>, > + ps_addrs: lport_addresses) > + > +SwitchPortPSAddresses(port, ps_addrs) :- > + port in &SwitchPort(.ps_addresses = ps_addresses), > + var ps_addrs = FlatMap(ps_addresses). > + > +/* All static addresses associated with a port parsed into > + * the lport_addresses data structure */ > +relation SwitchPortStaticAddresses(port: Ref<SwitchPort>, > + addrs: lport_addresses) > +SwitchPortStaticAddresses(port, addrs) :- > + port in &SwitchPort(.static_addresses = static_addresses), > + var addrs = FlatMap(static_addresses). > + > +/* All static and dynamic addresses associated with a port parsed into > + * the lport_addresses data structure */ > +relation SwitchPortAddresses(port: Ref<SwitchPort>, > + addrs: lport_addresses) > + > +SwitchPortAddresses(port, addrs) :- SwitchPortStaticAddresses(port, addrs). > + > +SwitchPortAddresses(port, dynamic_address) :- > + SwitchPortNewDynamicAddress(port, Some{dynamic_address}). 
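The dynamic-address scan above keys off `parse_dynamic_address_request()`; a simplified Python model of the two forms called out in the SwitchPort comment ("MAC dynamic" and "dynamic IP"), plus plain "dynamic". The real parser also accepts a static IPv6 alongside the request; this sketch omits that:

```python
import re

MAC_RE = re.compile(r"^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$")

def parse_dynamic_address_request(addr):
    """Illustrative stand-in, not the patch's parser.  Returns
    (mac, ipv4) with None for the parts to be allocated dynamically,
    or None if addr is not a dynamic-address request at all."""
    tokens = addr.split()
    if tokens == ["dynamic"]:
        return (None, None)           # allocate both MAC and IP
    if len(tokens) == 2 and tokens[1] == "dynamic" and MAC_RE.match(tokens[0]):
        return (tokens[0], None)      # static MAC, dynamic IPs
    if len(tokens) == 2 and tokens[0] == "dynamic":
        return (None, tokens[1])      # dynamic MAC, static IPv4
    return None
```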
> + > +/* "router" is a special Logical_Switch_Port address value that indicates that the Ethernet, IPv4, and IPv6 > + * this port should be obtained from the connected logical router port, as specified by router-port in > + * options. > + * > + * The resulting addresses are used to populate the logical switch’s destination lookup, and also for the > + * logical switch to generate ARP and ND replies. > + * > + * If the connected logical router port is a distributed gateway port and the logical router has rules > + * specified in nat with external_mac, then those addresses are also used to populate the switch’s destination > + * lookup. */ > +SwitchPortAddresses(port, addrs) :- > + port in &SwitchPort(.lsp = lsp, .peer = Some{&rport}), > + Some{var addrs} = { > + var opt_addrs = None; > + for (addr in lsp.addresses) { > + if (addr == "router") { > + opt_addrs = Some{rport.networks} > + } else () > + }; > + opt_addrs > + }. > + > +/* All static and dynamic IPv4 addresses associated with a port */ > +relation SwitchPortIPv4Address(port: Ref<SwitchPort>, > + ea: eth_addr, > + addr: ipv4_netaddr) > + > +SwitchPortIPv4Address(port, ea, addr) :- > + SwitchPortAddresses(port, LPortAddress{.ea = ea, .ipv4_addrs = addrs}), > + var addr = FlatMap(addrs). > + > +/* All static and dynamic IPv6 addresses associated with a port */ > +relation SwitchPortIPv6Address(port: Ref<SwitchPort>, > + ea: eth_addr, > + addr: ipv6_netaddr) > + > +SwitchPortIPv6Address(port, ea, addr) :- > + SwitchPortAddresses(port, LPortAddress{.ea = ea, .ipv6_addrs = addrs}), > + var addr = FlatMap(addrs). > + > +/* Service monitoring. */ > + > +/* MAC allocated for service monitor usage. Just one mac is allocated > + * for this purpose and ovn-controller's on each chassis will make use > + * of this mac when sending out the packets to monitor the services > + * defined in Service_Monitor Southbound table. Since these packets > + * all locally handled, having just one mac is good enough. 
*/ > +function get_svc_monitor_mac(options: Map<string,string>, uuid: uuid) > + : eth_addr = > +{ > + var existing_mac = match ( > + map_get(options, "svc_monitor_mac")) > + { > + Some{mac} -> scan_eth_addr(mac), > + None -> None > + }; > + match (existing_mac) { > + Some{mac} -> mac, > + None -> eth_addr_from_uint64(pseudorandom_mac(uuid, 'h5678)) > + } > +} > +function put_svc_monitor_mac(options: Map<string,string>, > + svc_monitor_mac: eth_addr) : Map<string,string> = > +{ > + map_insert_imm(options, "svc_monitor_mac", to_string(svc_monitor_mac)) > +} > +relation SvcMonitorMac(mac: eth_addr) > +SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :- > + nb::NB_Global(._uuid = uuid, .options = options). > diff --git a/northd/multicast.dl b/northd/multicast.dl > new file mode 100644 > index 000000000000..3f108c85ef7d > --- /dev/null > +++ b/northd/multicast.dl > @@ -0,0 +1,259 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. 
> + */ > + > +import OVN_Northbound as nb > +import OVN_Southbound as sb > +import ovn > +import ovsdb > +import helpers > +import lswitch > +import lrouter > + > +function mCAST_DEFAULT_MAX_ENTRIES(): integer = 2048 > + > +function mCAST_DEFAULT_IDLE_TIMEOUT_S(): integer = 300 > +function mCAST_DEFAULT_MIN_IDLE_TIMEOUT_S(): integer = 15 > +function mCAST_DEFAULT_MAX_IDLE_TIMEOUT_S(): integer = 3600 > + > +function mCAST_DEFAULT_MIN_QUERY_INTERVAL_S(): integer = 1 > +function mCAST_DEFAULT_MAX_QUERY_INTERVAL_S(): integer = > + mCAST_DEFAULT_MAX_IDLE_TIMEOUT_S() > + > +function mCAST_DEFAULT_QUERY_MAX_RESPONSE_S(): integer = 1 > + > +/* IP Multicast per switch configuration. */ > +relation &McastSwitchCfg( > + datapath : uuid, > + enabled : bool, > + querier : bool, > + flood_unreg : bool, > + eth_src : string, > + ip4_src : string, > + ip6_src : string, > + table_size : integer, > + idle_timeout : integer, > + query_interval: integer, > + query_max_resp: integer > +) > + > + /* FIXME: Right now table_size is enforced only in ovn-controller but in > + * the ovn-northd C version we enforce it on the aggregate groups too. 
> + */ > + > +&McastSwitchCfg( > + .datapath = ls_uuid, > + .enabled = map_get_bool_def(other_config, "mcast_snoop", > + false), > + .querier = map_get_bool_def(other_config, "mcast_querier", > + true), > + .flood_unreg = map_get_bool_def(other_config, > + "mcast_flood_unregistered", > + false), > + .eth_src = map_get_str_def(other_config, "mcast_eth_src", ""), > + .ip4_src = map_get_str_def(other_config, "mcast_ip4_src", ""), > + .ip6_src = map_get_str_def(other_config, "mcast_ip6_src", ""), > + .table_size = map_get_int_def(other_config, > + "mcast_table_size", > + mCAST_DEFAULT_MAX_ENTRIES()), > + .idle_timeout = idle_timeout, > + .query_interval = query_interval, > + .query_max_resp = query_max_resp) :- > + nb::Logical_Switch(._uuid = ls_uuid, > + .other_config = other_config), > + var idle_timeout = > + map_get_int_def_limit(other_config, "mcast_idle_timeout", > + mCAST_DEFAULT_IDLE_TIMEOUT_S(), > + mCAST_DEFAULT_MIN_IDLE_TIMEOUT_S(), > + mCAST_DEFAULT_MAX_IDLE_TIMEOUT_S()), > + var query_interval = > + map_get_int_def_limit(other_config, "mcast_query_interval", > + idle_timeout / 2, > + mCAST_DEFAULT_MIN_QUERY_INTERVAL_S(), > + mCAST_DEFAULT_MAX_QUERY_INTERVAL_S()), > + var query_max_resp = > + map_get_int_def(other_config, "mcast_query_max_response", > + mCAST_DEFAULT_QUERY_MAX_RESPONSE_S()). > + > +/* IP Multicast per router configuration. */ > +relation &McastRouterCfg( > + datapath: uuid, > + relay : bool > +) > + > +&McastRouterCfg(lr_uuid, mcast_relay) :- > + nb::Logical_Router(._uuid = lr_uuid, .options = options), > + var mcast_relay = map_get_bool_def(options, "mcast_relay", false). > + > +/* IP Multicast port configuration. 
*/ > +relation &McastPortCfg( > + port : uuid, > + router_port : bool, > + flood : bool, > + flood_reports : bool > +) > + > +&McastPortCfg(lsp_uuid, false, flood, flood_reports) :- > + nb::Logical_Switch_Port(._uuid = lsp_uuid, .options = options), > + var flood = map_get_bool_def(options, "mcast_flood", false), > + var flood_reports = map_get_bool_def(options, "mcast_flood_reports", > + false). > + > +&McastPortCfg(lrp_uuid, true, flood, flood) :- > + nb::Logical_Router_Port(._uuid = lrp_uuid, .options = options), > + var flood = map_get_bool_def(options, "mcast_flood", false). > + > +/* Mapping between Switch and the set of router port uuids on which to flood > + * IP multicast for relay. > + */ > +relation SwitchMcastFloodRelayPorts(sw: Ref<Switch>, ports: Set<uuid>) > + > +SwitchMcastFloodRelayPorts(switch, relay_ports) :- > + &SwitchPort( > + .lsp = lsp, > + .sw = switch, > + .peer = Some{&RouterPort{.router = &Router{.mcast_cfg = &mcast_cfg}}} > + ), mcast_cfg.relay, > + var relay_ports = lsp._uuid.group_by(switch).to_set(). > + > +SwitchMcastFloodRelayPorts(switch, set_empty()) :- > + Switch[switch], > + not &SwitchPort( > + .sw = switch, > + .peer = Some{ > + &RouterPort{ > + .router = &Router{.mcast_cfg = &McastRouterCfg{.relay=true}} > + } > + } > + ). > + > +/* Mapping between Switch and the set of port uuids on which to > + * flood IP multicast statically. > + */ > +relation SwitchMcastFloodPorts(sw: Ref<Switch>, ports: Set<uuid>) > + > +SwitchMcastFloodPorts(switch, flood_ports) :- > + &SwitchPort( > + .lsp = lsp, > + .sw = switch, > + .mcast_cfg = &McastPortCfg{.flood = true}), > + var flood_ports = lsp._uuid.group_by(switch).to_set(). > + > +SwitchMcastFloodPorts(switch, set_empty()) :- > + Switch[switch], > + not &SwitchPort( > + .sw = switch, > + .mcast_cfg = &McastPortCfg{.flood = true}). > + > +/* Mapping between Switch and the set of port uuids on which to > + * flood IP multicast reports statically. 
> + */ > +relation SwitchMcastFloodReportPorts(sw: Ref<Switch>, ports: Set<uuid>) > + > +SwitchMcastFloodReportPorts(switch, flood_ports) :- > + &SwitchPort( > + .lsp = lsp, > + .sw = switch, > + .mcast_cfg = &McastPortCfg{.flood_reports = true}), > + var flood_ports = lsp._uuid.group_by(switch).to_set(). > + > +SwitchMcastFloodReportPorts(switch, set_empty()) :- > + Switch[switch], > + not &SwitchPort( > + .sw = switch, > + .mcast_cfg = &McastPortCfg{.flood_reports = true}). > + > +/* Mapping between Router and the set of port uuids on which to > + * flood IP multicast reports statically. > + */ > +relation RouterMcastFloodPorts(sw: Ref<Router>, ports: Set<uuid>) > + > +RouterMcastFloodPorts(router, flood_ports) :- > + &RouterPort( > + .lrp = lrp, > + .router = router, > + .mcast_cfg = &McastPortCfg{.flood = true} > + ), > + var flood_ports = lrp._uuid.group_by(router).to_set(). > + > +RouterMcastFloodPorts(router, set_empty()) :- > + Router[router], > + not &RouterPort( > + .router = router, > + .mcast_cfg = &McastPortCfg{.flood = true}). > + > +/* Flattened IGMP group. One record per address-port tuple. */ > +relation IgmpSwitchGroupPort( > + address: string, > + switch : Ref<Switch>, > + port : uuid > +) > + > +IgmpSwitchGroupPort(address, switch, lsp_uuid) :- > + sb::IGMP_Group(.address = address, .datapath = igmp_dp_set, > + .ports = pb_ports), > + var pb_port_uuid = FlatMap(pb_ports), > + sb::Port_Binding(._uuid = pb_port_uuid, .logical_port = lsp_name), > + &SwitchPort( > + .lsp = nb::Logical_Switch_Port{._uuid = lsp_uuid, .name = lsp_name}, > + .sw = switch). > + > +/* Aggregated IGMP group: merges all IgmpSwitchGroupPort for a given > + * address-switch tuple from all chassis. 
> + */ > +relation IgmpSwitchMulticastGroup( > + address: string, > + switch : Ref<Switch>, > + ports : Set<uuid> > +) > + > +IgmpSwitchMulticastGroup(address, switch, ports) :- > + IgmpSwitchGroupPort(address, switch, port), > + var ports = port.group_by((address, switch)).to_set(). > + > +/* Flattened IGMP group representation for routers with relay enabled. One > + * record per address-port tuple for all IGMP groups learned by switches > + * connected to the router. > + */ > +relation IgmpRouterGroupPort( > + address: string, > + router : Ref<Router>, > + port : uuid > +) > + > +IgmpRouterGroupPort(address, rtr_port.router, rtr_port.lrp._uuid) :- > + SwitchMcastFloodRelayPorts(switch, sw_flood_ports), > + IgmpSwitchMulticastGroup(address, switch, _), > + /* For IPv6 only relay routable multicast groups > + * (RFC 4291 2.7). > + */ > + match (ipv6_parse(address)) { > + Some{ipv6} -> ipv6_is_routable_multicast(ipv6), > + None -> true > + }, > + var flood_port = FlatMap(sw_flood_ports), > + &SwitchPort(.lsp = nb::Logical_Switch_Port{._uuid = flood_port}, > + .peer = Some{&rtr_port}). > + > +/* Aggregated IGMP group for routers: merges all IgmpRouterGroupPort for > + * a given address-router tuple from all connected switches. > + */ > +relation IgmpRouterMulticastGroup( > + address: string, > + router : Ref<Router>, > + ports : Set<uuid> > +) > + > +IgmpRouterMulticastGroup(address, router, ports) :- > + IgmpRouterGroupPort(address, router, port), > + var ports = port.group_by((address, router)).to_set(). 
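For reviewers new to DDlog's `group_by`, the IGMP aggregation rules above are an ordinary group-and-collect; the IgmpSwitchMulticastGroup rule, for example, is equivalent to this Python sketch over (address, switch, port) tuples:

```python
from collections import defaultdict

def aggregate_igmp_groups(flat_records):
    """Sketch of port.group_by((address, switch)).to_set(): collapse the
    per-chassis flattened IGMP records into one set of ports per
    (address, switch) key."""
    groups = defaultdict(set)
    for address, switch, port in flat_records:
        groups[(address, switch)].add(port)
    return dict(groups)
```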
> diff --git a/northd/ovn-nb.dlopts b/northd/ovn-nb.dlopts > new file mode 100644 > index 000000000000..0682c14cf406 > --- /dev/null > +++ b/northd/ovn-nb.dlopts > @@ -0,0 +1,13 @@ > +-o Logical_Router_Port > +--rw Logical_Router_Port.ipv6_prefix > +-o Logical_Switch_Port > +--rw Logical_Switch_Port.tag > +--rw Logical_Switch_Port.dynamic_addresses > +--rw Logical_Switch_Port.up > +-o NB_Global > +--rw NB_Global.sb_cfg > +--rw NB_Global.hv_cfg > +--rw NB_Global.options > +--rw NB_Global.ipsec > +--rw NB_Global.nb_cfg_timestamp > +--rw NB_Global.hv_cfg_timestamp > diff --git a/northd/ovn-northd-ddlog.c b/northd/ovn-northd-ddlog.c > new file mode 100644 > index 000000000000..c929afa46258 > --- /dev/null > +++ b/northd/ovn-northd-ddlog.c > @@ -0,0 +1,1752 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. 
> + */ > + > +#include <config.h> > + > +#include <getopt.h> > +#include <stdlib.h> > +#include <stdio.h> > +#include <fcntl.h> > +#include <unistd.h> > + > +#include "command-line.h" > +#include "daemon.h" > +#include "fatal-signal.h" > +#include "hash.h" > +#include "jsonrpc.h" > +#include "lib/ovn-util.h" > +#include "openvswitch/hmap.h" > +#include "openvswitch/json.h" > +#include "openvswitch/poll-loop.h" > +#include "openvswitch/vlog.h" > +#include "ovsdb-data.h" > +#include "ovsdb-error.h" > +#include "ovsdb-parser.h" > +#include "ovsdb-types.h" > +#include "ovsdb/ovsdb.h" > +#include "ovsdb/table.h" > +#include "stream-ssl.h" > +#include "stream.h" > +#include "unixctl.h" > +#include "util.h" > +#include "uuid.h" > + > +#include "northd/ovn_northd_ddlog/ddlog.h" > + > +VLOG_DEFINE_THIS_MODULE(ovn_northd); > + > +#include "northd/ovn-northd-ddlog-nb.inc" > +#include "northd/ovn-northd-ddlog-sb.inc" > + > +struct northd_status { > + bool locked; > + bool pause; > +}; > + > +static unixctl_cb_func ovn_northd_exit; > +static unixctl_cb_func ovn_northd_pause; > +static unixctl_cb_func ovn_northd_resume; > +static unixctl_cb_func ovn_northd_is_paused; > +static unixctl_cb_func ovn_northd_status; > + > +/* --ddlog-record: The name of a file to which to record DDlog commands for > + * later replay. Useful for debugging. If null (by default), DDlog commands > + * are not recorded. */ > +static const char *record_file; > + > +static const char *ovnnb_db; > +static const char *ovnsb_db; > +static const char *unixctl_path; > + > +/* Frequently used table ids. */ > +static table_id WARNING_TABLE_ID; > +static table_id NB_CFG_TIMESTAMP_ID; > + > +/* Initialize frequently used table ids. */ > +static void init_table_ids(void) > +{ > + WARNING_TABLE_ID = ddlog_get_table_id("Warning"); > + NB_CFG_TIMESTAMP_ID = ddlog_get_table_id("NbCfgTimestamp"); > +} > + > +/* > + * Accumulates DDlog delta to be sent to OVSDB. 
> + * > + * FIXME: There is currently no global northd state descriptor shared by NB and > + * SB connections. We should probably introduce it and move this variable there > + * instead of declaring it as a global variable. > + */ > +static ddlog_delta *delta; > + > + > +/* Connection state machine. > + * > + * When a JSON-RPC session connects, sends a "get_schema" request > + * and transitions to S_SCHEMA_REQUESTED. */ > +#define STATES \ > + /* Waiting for "get_schema" reply. Once received, sends \ > + * "monitor" request whose details are informed by the \ > + * schema, and transitions to S_MONITOR_REQUESTED. */ \ > + STATE(S_SCHEMA_REQUESTED) \ > + \ > + /* Waits for "monitor" reply. On failure, transitions to \ > + * S_ERROR. If successful, replaces our snapshot of database \ > + * contents by the data carried in the reply and: \ > + * \ > + * - If this database needs a lock: \ > + * \ > + * + If northd is not paused, sends a lock request and \ > + * transitions to S_LOCK_REQUESTED. \ > + * \ > + * + If northd is paused, transition to S_PAUSED. \ > + * \ > + * - Otherwise, if there are any output-only tables, sends \ > + * "transact" request for their data and transitions to \ > + * S_OUTPUT_ONLY_DATA_REQUESTED. \ > + * \ > + * - Otherwise, transitions to S_MONITORING. */ \ > + STATE(S_MONITOR_REQUESTED) \ > + \ > + /* We need the lock and we're paused. We haven't requested \ > + * the lock (or we unlocked it). \ > + * \ > + * Waits for northd to be un-paused. Then, sends a lock \ > + * request and transitions to S_LOCK_REQUESTED. */ \ > + STATE(S_PAUSED) \ > + \ > + /* We're waiting for a reply for our lock request. Once we \ > + * get the reply: \ > + * \ > + * - If we did get the lock: \ > + * \ > + * + If there are any output-only tables, send \ > + * "transact" request for their data and transition \ > + * to S_OUTPUT_ONLY_DATA_REQUESTED. \ > + * \ > + * + Otherwise, transition to S_MONITORING. 
\ > + * \ > + * - If we didn't get the lock, transition to S_LOCK_CONTENDED. \ > + * \ > + * (We must ignore notifications that we got or lost the lock \ > + * when we're in this state, because they must be old.) */ \ > + STATE(S_LOCK_REQUESTED) \ > + \ > + /* We got a negative reply to our lock request. We're \ > + * waiting for a notification that we got the lock. \ > + * \ > + * (It's important that we ignore notifications that we got \ > + * the lock when we're not in this state, because they must \ > + * be old.) \ > + * \ > + * When we get the lock: \ > + * \ > + * - If there are any output-only tables, send "transact" \ > + * request for their data and transition to \ > + * S_OUTPUT_ONLY_DATA_REQUESTED. \ > + * \ > + * - Otherwise, transition to S_MONITORING. */ \ > + STATE(S_LOCK_CONTENDED) \ > + \ > + /* Waits for reply to "transact" request for data in output-only \ > + * tables. Once received, uses the data to initialize the local \ > + * idea of what's in those tables, and transitions to \ > + * S_MONITORING. \ > + * \ > + * If we get a notification that we lost the lock, transition \ > + * to S_LOCK_CONTENDED. */ \ > + STATE(S_OUTPUT_ONLY_DATA_REQUESTED) \ > + \ > + /* State that just processes "update" notifications for the \ > + * database. \ > + * \ > + * If we get a notification that we lost the lock, transition \ > + * to S_LOCK_CONTENDED. */ \ > + STATE(S_MONITORING) \ > + \ > + /* Terminal error state that indicates that nothing useful can be \ > + * done, for example because the database server doesn't actually \ > + * have the desired database. We maintain the session with the \ > + * database server anyway. If it starts serving the database \ > + * that we want, or if someone fixes and restarts the database, \ > + * then it will kill the session and we will automatically \ > + * reconnect and try again. */ \ > + STATE(S_ERROR) \ > + \ > + /* Terminal state that indicates we connected to a useless server \ > + * in a cluster, e.g. 
one that is partitioned from the rest of \ > + * the cluster. We're waiting to retry. */ \ > + STATE(S_RETRY) > + > +enum northd_state { > +#define STATE(NAME) NAME, > + STATES > +#undef STATE > +}; > + > +static const char * > +northd_state_to_string(enum northd_state state) > +{ > + switch (state) { > +#define STATE(NAME) case NAME: return #NAME; > + STATES > +#undef STATE > + default: return "<unknown>"; > + } > +} > + > +enum northd_monitoring { > + NORTHD_NOT_MONITORING, /* Database is not being monitored. */ > + NORTHD_MONITORING, /* Database has "monitor" outstanding. */ > + NORTHD_MONITORING_COND, /* Database has "monitor_cond" outstanding. */ > +}; > + > +struct northd_ctx { > + ddlog_prog ddlog; > + char *prefix; > + const char **input_relations; > + const char **output_relations; > + const char **output_only_relations; > + > + bool has_timestamp_columns; > + > + /* Session state. > + * > + *'state_seqno' is a snapshot of the session's sequence number as returned > + * jsonrpc_session_get_seqno(session), so if it differs from the value that > + * function currently returns then the session has reconnected and the > + * state machine must restart. */ > + struct jsonrpc_session *session; /* Connection to the server. */ > + enum northd_state state; /* Current session state. */ > + unsigned int state_seqno; /* See above. */ > + struct json *request_id; /* JSON ID for request awaiting reply. */ > + > + /* Database info. */ > + char *db_name; > + struct json *monitor_id; > + struct json *schema; > + struct json *output_only_data; > + enum northd_monitoring monitoring; > + > + /* Database locking. */ > + const char *lock_name; /* Name of lock we need, NULL if none. */ > + bool paused; > +}; > + > +enum lock_status { > + NOT_LOCKED, /* We don't have the lock and we didn't ask for it. */ > + REQUESTED_LOCK, /* We asked for the lock but we didn't get it yet. */ > + HAS_LOCK, /* We have the lock. 
*/ > +}; > + > +static enum lock_status northd_lock_status(const struct northd_ctx *); > + > +static void northd_send_unlock_request(struct northd_ctx *); > + > +static bool northd_parse_lock_reply(const struct json *result); > + > +static void northd_handle_update(struct northd_ctx *, bool clear, > + const struct json *table_updates); > +static struct json *get_database_ops(struct northd_ctx *); > +static int ddlog_clear(struct northd_ctx *); > + > +static void > +northd_ctx_connection_status(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *ctx_) > +{ > + const struct northd_ctx *ctx = ctx_; > + bool connected = jsonrpc_session_is_connected(ctx->session); > + unixctl_command_reply(conn, connected ? "connected" : "not connected"); > +} > + > +static void > +northd_ctx_cluster_state_reset(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *ctx OVS_UNUSED) > +{ > + VLOG_INFO("XXX cluster state tracking not yet implemented"); > + unixctl_command_reply(conn, NULL); > +} > + > +static struct northd_ctx * > +northd_ctx_create(const char *server, const char *database, > + const char *unixctl_command_prefix, > + const char *lock_name, > + ddlog_prog ddlog, > + const char **input_relations, > + const char **output_relations, > + const char **output_only_relations) > +{ > + struct northd_ctx *ctx; > + > + ctx = xzalloc(sizeof *ctx); > + ctx->prefix = xasprintf("%s::", database); > + ctx->session = jsonrpc_session_open(server, true); > + ctx->state_seqno = UINT_MAX; > + ctx->request_id = NULL; > + > + ctx->input_relations = input_relations; > + ctx->output_relations = output_relations; > + ctx->output_only_relations = output_only_relations; > + > + ctx->db_name = xstrdup(database); > + ctx->monitor_id = json_array_create_2(json_string_create("monid"), > + json_string_create(database)); > + ctx->lock_name = lock_name; > + > + ctx->ddlog = ddlog; > + > + char *cmd = xasprintf("%s-connection-status", 
unixctl_command_prefix);
> +    unixctl_command_register(cmd, "", 0, 0,
> +                             northd_ctx_connection_status, ctx);
> +    free(cmd);
> +
> +    cmd = xasprintf("%s-cluster-state-reset", unixctl_command_prefix);
> +    unixctl_command_register(cmd, "", 0, 0,
> +                             northd_ctx_cluster_state_reset, NULL);
> +    free(cmd);
> +
> +    return ctx;
> +}
> +
> +static void
> +northd_ctx_destroy(struct northd_ctx *ctx)
> +{
> +    if (ctx) {
> +        jsonrpc_session_close(ctx->session);
> +
> +        json_destroy(ctx->monitor_id);
> +        json_destroy(ctx->schema);
> +        json_destroy(ctx->output_only_data);
> +
> +        json_destroy(ctx->request_id);
> +        free(ctx);
> +    }
> +}
> +
> +/* Forces 'ctx' to drop its connection to the database and reconnect. */
> +static void
> +northd_force_reconnect(struct northd_ctx *ctx)
> +{
> +    if (ctx->session) {
> +        jsonrpc_session_force_reconnect(ctx->session);
> +    }
> +}
> +
> +static void northd_transition_at(struct northd_ctx *, enum northd_state,
> +                                 const char *where);
> +#define northd_transition(CTX, STATE) \
> +    northd_transition_at(CTX, STATE, OVS_SOURCE_LOCATOR)
> +
> +static void
> +northd_transition_at(struct northd_ctx *ctx, enum northd_state new_state,
> +                     const char *where)
> +{
> +    VLOG_DBG("%s: %s -> %s at %s",
> +             ctx->session ? jsonrpc_session_get_name(ctx->session) : "void",
> +             northd_state_to_string(ctx->state),
> +             northd_state_to_string(new_state),
> +             where);
> +    ctx->state = new_state;
> +}
> +
> +#define northd_retry(CTX) northd_retry_at(CTX, OVS_SOURCE_LOCATOR)
> +static void
> +northd_retry_at(struct northd_ctx *ctx, const char *where)
> +{
> +    northd_send_unlock_request(ctx);
> +
> +    if (ctx->session && jsonrpc_session_get_n_remotes(ctx->session) > 1) {
> +        northd_force_reconnect(ctx);
> +        northd_transition_at(ctx, S_RETRY, where);
> +    } else {
> +        northd_transition_at(ctx, S_ERROR, where);
> +    }
> +}
> +
> +/* Returns the lock status of 'ctx': whether it is configured to obtain a
> + * lock and, if so, whether it has requested or currently owns that lock.
> + * > + * Locking and unlocking happens asynchronously from the database client's > + * point of view, so the information is only useful for optimization (e.g. if > + * the client doesn't have the lock then there's no point in trying to write to > + * the database). */ > +static enum lock_status > +northd_lock_status(const struct northd_ctx *ctx) > +{ > + if (!ctx->lock_name) { > + return NOT_LOCKED; > + } > + > + switch (ctx->state) { > + case S_SCHEMA_REQUESTED: > + case S_MONITOR_REQUESTED: > + case S_PAUSED: > + case S_ERROR: > + case S_RETRY: > + return NOT_LOCKED; > + > + case S_LOCK_REQUESTED: > + case S_LOCK_CONTENDED: > + return REQUESTED_LOCK; > + > + case S_OUTPUT_ONLY_DATA_REQUESTED: > + case S_MONITORING: > + return HAS_LOCK; > + } > + > + OVS_NOT_REACHED(); > +} > + > +static void > +northd_send_request(struct northd_ctx *ctx, struct jsonrpc_msg *request) > +{ > + json_destroy(ctx->request_id); > + ctx->request_id = json_clone(request->id); > + if (ctx->session) { > + jsonrpc_session_send(ctx->session, request); > + } > +} > + > +static void > +northd_send_schema_request(struct northd_ctx *ctx) > +{ > + northd_send_request(ctx, jsonrpc_create_request( > + "get_schema", > + json_array_create_1(json_string_create( > + ctx->db_name)), > + NULL)); > +} > + > +static void > +northd_send_transact(struct northd_ctx *ctx, struct json *ddlog_ops) > +{ > + struct json *comment = json_object_create(); > + json_object_put_string(comment, "op", "comment"); > + json_object_put_string(comment, "comment", "ovn-northd-ddlog"); > + json_array_add(ddlog_ops, comment); > + > + if (ctx->lock_name) { > + struct json *assertion = json_object_create(); > + json_object_put_string(assertion, "op", "assert"); > + json_object_put_string(assertion, "lock", ctx->lock_name); > + json_array_add(ddlog_ops, assertion); > + } > + > + northd_send_request(ctx, jsonrpc_create_request("transact", ddlog_ops, > + NULL)); > +} > + > +static bool > +northd_send_monitor_request(struct 
northd_ctx *ctx) > +{ > + struct ovsdb_schema *schema; > + struct ovsdb_error *error = ovsdb_schema_from_json(ctx->schema, &schema); > + if (error) { > + VLOG_ERR("couldn't parse schema (%s)", ovsdb_error_to_string(error)); > + return false; > + } > + > + const struct ovsdb_table_schema *nb_global = shash_find_data( > + &schema->tables, "NB_Global"); > + ctx->has_timestamp_columns > + = (nb_global > + && shash_find_data(&nb_global->columns, "nb_cfg_timestamp") > + && shash_find_data(&nb_global->columns, "sb_cfg_timestamp")); > + > + struct json *monitor_requests = json_object_create(); > + > + /* This should be smarter about ignoring not needed ones. There's a lot > + * more logic for this in ovsdb_idl_send_monitor_request(). */ > + size_t n = shash_count(&schema->tables); > + const struct shash_node **nodes = shash_sort(&schema->tables); > + for (int i = 0; i < n; i++) { > + struct ovsdb_table_schema *table = nodes[i]->data; > + > + /* Only subscribe to input relations we care about. */ > + for (const char **p = ctx->input_relations; *p; p++) { > + if (!strcmp(table->name, *p)) { > + json_object_put(monitor_requests, table->name, > + json_array_create_1(json_object_create())); > + break; > + } > + } > + } > + free(nodes); > + > + ovsdb_schema_destroy(schema); > + > + northd_send_request( > + ctx, > + jsonrpc_create_request( > + "monitor", > + json_array_create_3(json_string_create(ctx->db_name), > + json_clone(ctx->monitor_id), monitor_requests), > + NULL)); > + return true; > +} > + > +/* Sends the database server a request for all the row UUIDs in output-only > + * tables. 
*/ > +static void > +northd_send_output_only_data_request(struct northd_ctx *ctx) > +{ > + json_destroy(ctx->output_only_data); > + ctx->output_only_data = NULL; > + > + struct json *ops = json_array_create_1(json_string_create(ctx->db_name)); > + for (size_t i = 0; ctx->output_only_relations[i]; i++) { > + const char *table = ctx->output_only_relations[i]; > + struct json *op = json_object_create(); > + json_object_put_string(op, "op", "select"); > + json_object_put_string(op, "table", table); > + json_object_put(op, "columns", > + json_array_create_1(json_string_create("_uuid"))); > + json_object_put(op, "where", json_array_create_empty()); > + json_array_add(ops, op); > + } > + VLOG_WARN("sending output-only data request"); > + > + northd_send_request(ctx, > + jsonrpc_create_request("transact", ops, NULL)); > +} > + > +static struct jsonrpc_msg * > +northd_compose_lock_request__(struct northd_ctx *ctx, const char *method) > +{ > + struct json *params = json_array_create_1(json_string_create( > + ctx->lock_name)); > + return jsonrpc_create_request(method, params, NULL); > +} > + > +static void > +northd_send_lock_request(struct northd_ctx *ctx) > +{ > + northd_send_request(ctx, northd_compose_lock_request__(ctx, "lock")); > +} > + > +/* This sends an unlock request, if 'ctx' has a defined lock and > + * is in a state that holds a lock or has requested a lock. > + * > + * When this sends an unlock request, the caller needs to > + * transition 'ctx' to some other state (because otherwise the > + * current state is still defined as holding or requesting a > + * lock). */ > +static void > +northd_send_unlock_request(struct northd_ctx *ctx) > +{ > + if (ctx->lock_name && northd_lock_status(ctx) != NOT_LOCKED) { > + northd_send_request(ctx, northd_compose_lock_request__(ctx, "unlock")); > + > + /* We don't care to track the unlock reply. 
*/
> +        json_destroy(ctx->request_id);
> +        ctx->request_id = NULL;
> +    }
> +}
> +
> +static bool
> +northd_process_response(struct northd_ctx *ctx, struct jsonrpc_msg *msg)
> +{
> +    if (msg->type != JSONRPC_REPLY && msg->type != JSONRPC_ERROR) {
> +        return false;
> +    }
> +
> +    if (!json_equal(ctx->request_id, msg->id)) {
> +        return false;
> +    }
> +    json_destroy(ctx->request_id);
> +    ctx->request_id = NULL;
> +
> +    if (msg->type == JSONRPC_ERROR) {
> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
> +        char *s = jsonrpc_msg_to_string(msg);
> +        VLOG_INFO_RL(&rl, "%s: received unexpected %s response in "
> +                     "%s state: %s", jsonrpc_session_get_name(ctx->session),
> +                     jsonrpc_msg_type_to_string(msg->type),
> +                     northd_state_to_string(ctx->state),
> +                     s);
> +        free(s);
> +        northd_retry(ctx);
> +        return true;
> +    }
> +
> +    switch (ctx->state) {
> +    case S_SCHEMA_REQUESTED:
> +        json_destroy(ctx->schema);
> +        ctx->schema = json_clone(msg->result);
> +        if (northd_send_monitor_request(ctx)) {
> +            northd_transition(ctx, S_MONITOR_REQUESTED);
> +        } else {
> +            northd_retry(ctx);
> +        }
> +        break;
> +
> +    case S_MONITOR_REQUESTED:
> +        ctx->monitoring = NORTHD_MONITORING;
> +        northd_handle_update(ctx, true, msg->result);
> +        if (ctx->paused) {
> +            northd_transition(ctx, S_PAUSED);
> +        } else if (ctx->lock_name) {
> +            northd_send_lock_request(ctx);
> +            northd_transition(ctx, S_LOCK_REQUESTED);
> +        } else if (ctx->output_only_relations[0]) {
> +            northd_send_output_only_data_request(ctx);
> +            northd_transition(ctx, S_OUTPUT_ONLY_DATA_REQUESTED);
> +        } else {
> +            northd_transition(ctx, S_MONITORING);
> +        }
> +        break;
> +
> +    case S_PAUSED:
> +        /* (No outstanding requests.) */
> +        break;
> +
> +    case S_LOCK_REQUESTED:
> +        if (northd_parse_lock_reply(msg->result)) {
> +            /* We got the lock.
*/ > + if (ctx->output_only_relations[0]) { > + northd_send_output_only_data_request(ctx); > + northd_transition(ctx, S_OUTPUT_ONLY_DATA_REQUESTED); > + } else { > + northd_transition(ctx, S_MONITORING); > + } > + } else { > + /* We did not get the lock. */ > + northd_transition(ctx, S_LOCK_CONTENDED); > + } > + break; > + > + case S_LOCK_CONTENDED: > + /* (No outstanding requests.) */ > + break; > + > + case S_OUTPUT_ONLY_DATA_REQUESTED: > + ctx->output_only_data = msg->result; > + msg->result = NULL; > + northd_transition(ctx, S_MONITORING); > + break; > + > + case S_MONITORING: > + break; > + > + case S_ERROR: > + case S_RETRY: > + /* Nothing to do in this state. */ > + break; > + > + default: > + OVS_NOT_REACHED(); > + } > + > + return true; > +} > + > +static bool > +northd_handle_update_rpc(struct northd_ctx *ctx, > + const struct jsonrpc_msg *msg) > +{ > + if (msg->type == JSONRPC_NOTIFY) { > + if (!strcmp(msg->method, "update") > + && msg->params->type == JSON_ARRAY > + && msg->params->array.n == 2 > + && json_equal(msg->params->array.elems[0], ctx->monitor_id)) { > + northd_handle_update(ctx, false, msg->params->array.elems[1]); > + return true; > + } > + } > + return false; > +} > + > +static void > +northd_pause(struct northd_ctx *ctx) > +{ > + if (!ctx->paused && ctx->lock_name && ctx->state != S_PAUSED) { > + ctx->paused = true; > + VLOG_INFO("This ovn-northd instance is now paused."); > + if (northd_lock_status(ctx) != NOT_LOCKED) { > + northd_send_unlock_request(ctx); > + } > + if (ctx->state > S_PAUSED) { > + northd_transition(ctx, S_PAUSED); > + } > + } > +} > + > +static void > +northd_unpause(struct northd_ctx *ctx) > +{ > + if (ctx->paused) { > + ovs_assert(ctx->lock_name); > + > + switch (ctx->state) { > + case S_SCHEMA_REQUESTED: > + case S_MONITOR_REQUESTED: > + /* Nothing to do. 
*/ > + break; > + > + case S_PAUSED: > + northd_send_lock_request(ctx); > + northd_transition(ctx, S_LOCK_REQUESTED); > + break; > + > + case S_LOCK_REQUESTED: > + case S_LOCK_CONTENDED: > + case S_OUTPUT_ONLY_DATA_REQUESTED: > + case S_MONITORING: > + case S_ERROR: > + case S_RETRY: > + OVS_NOT_REACHED(); > + } > + > + ctx->paused = false; > + } > + > +} > + > +static bool > +northd_process_lock_notify(struct northd_ctx *ctx, > + const struct jsonrpc_msg *msg) > +{ > + if (msg->type != JSONRPC_NOTIFY) { > + return false; > + } > + > + int got_lock = (!strcmp(msg->method, "locked") ? true > + : !strcmp(msg->method, "stolen") ? false > + : -1); > + if (got_lock < 0) { > + return false; > + } > + > + if (!ctx->lock_name > + || msg->params->type != JSON_ARRAY > + || json_array(msg->params)->n != 1 > + || json_array(msg->params)->elems[0]->type != JSON_STRING) { > + return false; > + } > + > + const char *lock_name = json_string(json_array(msg->params)->elems[0]); > + if (strcmp(ctx->lock_name, lock_name)) { > + return false; > + } > + > + switch (ctx->state) { > + case S_SCHEMA_REQUESTED: > + case S_MONITOR_REQUESTED: > + case S_PAUSED: > + case S_LOCK_REQUESTED: > + case S_ERROR: > + case S_RETRY: > + /* Ignore lock notification. It must be stale, resulting > + * from an old "lock" request. */ > + VLOG_DBG("received stale lock notification \"%s\" in state %s", > + msg->method, northd_state_to_string(ctx->state)); > + return true; > + > + case S_LOCK_CONTENDED: > + if (got_lock) { > + if (ctx->output_only_relations[0]) { > + northd_send_output_only_data_request(ctx); > + northd_transition(ctx, S_OUTPUT_ONLY_DATA_REQUESTED); > + } else { > + northd_transition(ctx, S_MONITORING); > + } > + } else { > + /* Should not be possible: we know that we received a > + * reply to our lock request, which means that there > + * should be no outstanding stale lock > + * notifications. 
*/
> +            VLOG_WARN("\"stolen\" notification in LOCK_CONTENDED state");
> +        }
> +        return true;
> +
> +    case S_OUTPUT_ONLY_DATA_REQUESTED:
> +    case S_MONITORING:
> +        if (!got_lock) {
> +            VLOG_INFO("northd lock stolen by another client");
> +            northd_transition(ctx, S_LOCK_CONTENDED);
> +        } else {
> +            /* Should not be possible: we already had the
> +             * lock. */
> +            VLOG_WARN("\"locked\" notification in %s state",
> +                      northd_state_to_string(ctx->state));
> +        }
> +        return true;
> +    }
> +    OVS_NOT_REACHED();
> +}
> +
> +static bool
> +northd_parse_lock_reply(const struct json *result)
> +{
> +    if (result->type == JSON_OBJECT) {
> +        const struct json *locked
> +            = shash_find_data(json_object(result), "locked");
> +        return locked && locked->type == JSON_TRUE;
> +    } else {
> +        return false;
> +    }
> +}
> +
> +static void
> +northd_process_msg(struct northd_ctx *ctx, struct jsonrpc_msg *msg)
> +{
> +    if (!northd_process_response(ctx, msg)
> +        && !northd_process_lock_notify(ctx, msg)
> +        && !northd_handle_update_rpc(ctx, msg)) {
> +        /* Unknown message. Log at debug level because this can
> +         * happen if northd_txn_destroy() is called to destroy a
> +         * transaction before we receive the reply, or in other
> +         * corner cases. */
> +        char *s = jsonrpc_msg_to_string(msg);
> +        VLOG_DBG("%s: received unexpected %s message: %s",
> +                 jsonrpc_session_get_name(ctx->session),
> +                 jsonrpc_msg_type_to_string(msg->type), s);
> +        free(s);
> +    }
> +}
> +
> +/* Processes a batch of messages from the database server on 'ctx'.
*/ > +static void > +northd_run(struct northd_ctx *ctx, bool run_deltas) > +{ > + if (!ctx->session) { > + return; > + } > + > + for (int i = 0; jsonrpc_session_is_connected(ctx->session) && i < 50; > + i++) { > + struct jsonrpc_msg *msg; > + unsigned int seqno; > + > + seqno = jsonrpc_session_get_seqno(ctx->session); > + if (ctx->state_seqno != seqno) { > + ctx->state_seqno = seqno; > + > + if (ctx->state != S_PAUSED) { > + northd_send_schema_request(ctx); > + ctx->state = S_SCHEMA_REQUESTED; > + } > + } > + > + msg = jsonrpc_session_recv(ctx->session); > + if (!msg) { > + break; > + } > + northd_process_msg(ctx, msg); > + jsonrpc_msg_destroy(msg); > + } > + jsonrpc_session_run(ctx->session); > + > + if (run_deltas && !ctx->request_id) { > + struct json *ops = get_database_ops(ctx); > + if (ops) { > + northd_send_transact(ctx, ops); > + } > + } > +} > + > +static void > +northd_update_probe_interval_cb( > + uintptr_t probe_intervalp_, > + table_id table OVS_UNUSED, > + const ddlog_record *rec, > + ssize_t weight OVS_UNUSED) > +{ > + int *probe_intervalp = (int *) probe_intervalp_; > + > + uint64_t x = ddlog_get_u64(rec); > + if (x > 1000) { > + *probe_intervalp = x; > + } > +} > + > +static void > +set_probe_interval(struct jsonrpc_session *session, int override_interval) > +{ > +#define DEFAULT_PROBE_INTERVAL_MSEC 5000 > + const char *name = jsonrpc_session_get_name(session); > + int default_interval = (!stream_or_pstream_needs_probes(name) > + ? 0 : DEFAULT_PROBE_INTERVAL_MSEC); > + jsonrpc_session_set_probe_interval(session, > + MAX(override_interval, default_interval)); > +} > + > +static void > +northd_update_probe_interval(struct northd_ctx *nb, struct northd_ctx *sb) > +{ > + /* -1 means the default probe interval. 
*/
> +    int probe_interval = -1;
> +    table_id tid = ddlog_get_table_id("Northd_Probe_Interval");
> +    ddlog_delta *probe_delta = ddlog_delta_get_table(delta, tid);
> +    ddlog_delta_enumerate(probe_delta, northd_update_probe_interval_cb,
> +                          (uintptr_t) &probe_interval);
> +
> +    set_probe_interval(nb->session, probe_interval);
> +    set_probe_interval(sb->session, probe_interval);
> +}
> +
> +/* Arranges for poll_block() to wake up when northd_run() has something to
> + * do or when activity occurs on a transaction on 'ctx'. */
> +static void
> +northd_wait(struct northd_ctx *ctx)
> +{
> +    if (!ctx->session) {
> +        return;
> +    }
> +    jsonrpc_session_wait(ctx->session);
> +    jsonrpc_session_recv_wait(ctx->session);
> +}
> +
> +/* ddlog-specific actions. */
> +
> +/* Generate OVSDB update command for delta-plus, delta-minus, and delta-update
> + * tables. */
> +static void
> +ddlog_table_update_deltas(struct ds *ds, ddlog_prog ddlog,
> +                          const char *db, const char *table)
> +{
> +    int error;
> +    char *updates;
> +
> +    error = ddlog_dump_ovsdb_delta_tables(ddlog, delta, db, table, &updates);
> +    if (error) {
> +        VLOG_INFO("DDlog error %d dumping delta for table %s", error, table);
> +        return;
> +    }
> +
> +    if (!updates[0]) {
> +        ddlog_free_json(updates);
> +        return;
> +    }
> +
> +    ds_put_cstr(ds, updates);
> +    ds_put_char(ds, ',');
> +    ddlog_free_json(updates);
> +}
> +
> +/* Generate OVSDB update command for an output-only table.
*/ > +static void > +ddlog_table_update_output(struct ds *ds, ddlog_prog ddlog, > + const char *db, const char *table) > +{ > + int error; > + char *updates; > + > + error = ddlog_dump_ovsdb_output_table(ddlog, delta, db, table, &updates); > + if (error) { > + VLOG_WARN("%s: failed to generate update commands for " > + "output-only table (error %d)", table, error); > + return; > + } > + char *table_name = xasprintf("%s::Out_%s", db, table); > + ddlog_delta_clear_table(delta, ddlog_get_table_id(table_name)); > + free(table_name); > + > + if (!updates[0]) { > + ddlog_free_json(updates); > + return; > + } > + > + ds_put_cstr(ds, updates); > + ds_put_char(ds, ','); > + ddlog_free_json(updates); > +} > + > +/* A set of UUIDs. > + * > + * Not fully abstracted: the client still uses plain struct hmap, for > + * example. */ > + > +/* A node within a set of uuids. */ > +struct uuidset_node { > + struct hmap_node hmap_node; > + struct uuid uuid; > +}; > + > +static void uuidset_delete(struct hmap *uuidset, struct uuidset_node *); > + > +static void > +uuidset_destroy(struct hmap *uuidset) > +{ > + if (uuidset) { > + struct uuidset_node *node, *next; > + > + HMAP_FOR_EACH_SAFE (node, next, hmap_node, uuidset) { > + uuidset_delete(uuidset, node); > + } > + hmap_destroy(uuidset); > + } > +} > + > +static struct uuidset_node * > +uuidset_find(struct hmap *uuidset, const struct uuid *uuid) > +{ > + struct uuidset_node *node; > + > + HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(uuid), uuidset) { > + if (uuid_equals(uuid, &node->uuid)) { > + return node; > + } > + } > + > + return NULL; > +} > + > +static void > +uuidset_insert(struct hmap *uuidset, const struct uuid *uuid) > +{ > + if (!uuidset_find(uuidset, uuid)) { > + struct uuidset_node *node = xmalloc(sizeof *node); > + node->uuid = *uuid; > + hmap_insert(uuidset, &node->hmap_node, uuid_hash(&node->uuid)); > + } > +} > + > +static void > +uuidset_delete(struct hmap *uuidset, struct uuidset_node *node) > +{ > + 
hmap_remove(uuidset, &node->hmap_node);
> +    free(node);
> +}
> +
> +static struct ovsdb_error *
> +parse_output_only_data(const struct json *txn_result, size_t index,
> +                       struct hmap *uuidset)
> +{
> +    if (txn_result->type != JSON_ARRAY || txn_result->array.n <= index) {
> +        return ovsdb_syntax_error(txn_result, NULL,
> +                                  "transaction result missing for "
> +                                  "output-only relation %"PRIuSIZE, index);
> +    }
> +
> +    struct ovsdb_parser p;
> +    ovsdb_parser_init(&p, txn_result->array.elems[index], "select result");
> +    const struct json *rows = ovsdb_parser_member(&p, "rows", OP_ARRAY);
> +    struct ovsdb_error *error = ovsdb_parser_finish(&p);
> +    if (error) {
> +        return error;
> +    }
> +
> +    for (size_t i = 0; i < rows->array.n; i++) {
> +        const struct json *row = rows->array.elems[i];
> +
> +        ovsdb_parser_init(&p, row, "row");
> +        const struct json *uuid = ovsdb_parser_member(&p, "_uuid", OP_ARRAY);
> +        error = ovsdb_parser_finish(&p);
> +        if (error) {
> +            return error;
> +        }
> +
> +        struct ovsdb_base_type base_type = OVSDB_BASE_UUID_INIT;
> +        union ovsdb_atom atom;
> +        error = ovsdb_atom_from_json(&atom, &base_type, uuid, NULL);
> +        if (error) {
> +            return error;
> +        }
> +        uuidset_insert(uuidset, &atom.uuid);
> +    }
> +
> +    return NULL;
> +}
> +
> +static bool
> +get_ddlog_uuid(const ddlog_record *rec, struct uuid *uuid)
> +{
> +    if (!ddlog_is_int(rec)) {
> +        return false;
> +    }
> +
> +    __uint128_t u128 = ddlog_get_u128(rec);
> +    uuid->parts[0] = u128 >> 96;
> +    uuid->parts[1] = u128 >> 64;
> +    uuid->parts[2] = u128 >> 32;
> +    uuid->parts[3] = u128;
> +    return true;
> +}
> +
> +struct dump_index_data {
> +    ddlog_prog prog;
> +    struct hmap *rows_present;
> +    const char *table;
> +    struct ds *ops_s;
> +};
> +
> +static void OVS_UNUSED
> +index_cb(uintptr_t data_, const ddlog_record *rec)
> +{
> +    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
> +    struct dump_index_data *data = (struct dump_index_data *) data_;
> +
> +    /* Extract the rec's row UUID as 'uuid'.
*/
> +    const ddlog_record *rec_uuid = ddlog_get_named_struct_field(rec, "_uuid");
> +    if (!rec_uuid) {
> +        VLOG_WARN_RL(&rl, "%s: row has no _uuid column", data->table);
> +        return;
> +    }
> +    struct uuid uuid;
> +    if (!get_ddlog_uuid(rec_uuid, &uuid)) {
> +        VLOG_WARN_RL(&rl, "%s: _uuid column has unexpected type", data->table);
> +        return;
> +    }
> +
> +    /* If a row with the given UUID was already in the database, then
> +     * send an operation to update it; otherwise, send an operation to
> +     * insert it. */
> +    struct uuidset_node *node = uuidset_find(data->rows_present, &uuid);
> +    char *s = NULL;
> +    int ret;
> +    if (node) {
> +        uuidset_delete(data->rows_present, node);
> +        ret = ddlog_into_ovsdb_update_str(data->prog, data->table, rec, &s);
> +    } else {
> +        ret = ddlog_into_ovsdb_insert_str(data->prog, data->table, rec, &s);
> +    }
> +    if (ret) {
> +        VLOG_WARN_RL(&rl, "%s: ddlog could not convert row into database op",
> +                     data->table);
> +        return;
> +    }
> +    ds_put_format(data->ops_s, "%s,", s);
> +    ddlog_free_json(s);
> +}
> +
> +static struct json *
> +where_uuid_equals(const struct uuid *uuid)
> +{
> +    return
> +        json_array_create_1(
> +            json_array_create_3(
> +                json_string_create("_uuid"),
> +                json_string_create("=="),
> +                json_array_create_2(
> +                    json_string_create("uuid"),
> +                    json_string_create_nocopy(
> +                        xasprintf(UUID_FMT, UUID_ARGS(uuid))))));
> +}
> +
> +static void
> +add_delete_row_op(const char *table, const struct uuid *uuid, struct ds *ops_s)
> +{
> +    struct json *op = json_object_create();
> +    json_object_put_string(op, "op", "delete");
> +    json_object_put_string(op, "table", table);
> +    json_object_put(op, "where", where_uuid_equals(uuid));
> +    json_to_ds(op, 0, ops_s);
> +    json_destroy(op);
> +    ds_put_char(ops_s, ',');
> +}
> +
> +static void
> +northd_update_sb_cfg_cb(
> +    uintptr_t new_sb_cfgp_,
> +    table_id table OVS_UNUSED,
> +    const ddlog_record *rec,
> +    ssize_t weight)
> +{
> +    int64_t *new_sb_cfgp = (int64_t *) new_sb_cfgp_;
> +
> +    if
(weight < 0) { > + return; > + } > + > + if (ddlog_get_int(rec, NULL, 0) <= sizeof *new_sb_cfgp) { > + *new_sb_cfgp = ddlog_get_i64(rec); > + } > +} > + > +static struct json * > +get_database_ops(struct northd_ctx *ctx) > +{ > + struct ds ops_s = DS_EMPTY_INITIALIZER; > + ds_put_char(&ops_s, '['); > + json_string_escape(ctx->db_name, &ops_s); > + ds_put_char(&ops_s, ','); > + size_t start_len = ops_s.length; > + > + for (const char **p = ctx->output_relations; *p; p++) { > + ddlog_table_update_deltas(&ops_s, ctx->ddlog, ctx->db_name, *p); > + } > + > + if (ctx->output_only_data) { > + /* > + * We just reconnected to the database (or connected for the first time > + * in this execution). We assume that the contents of the output-only > + * tables might have changed (this is especially true the first time we > + * connect to the database a given execution, of course; we can't > + * assume that the tables have any particular contents in this case). > + * > + * ctx->output_only_data is a database reply that tells us the > + * UUIDs of the rows that exist in the database. Our strategy is to > + * compare these UUIDs to the UUIDs of the rows that exist in the DDlog > + * analogues of these tables, and then add, delete, or update rows as > + * necessary. > + * > + * (ctx->output_only_data only gives row UUIDs, not full row > + * contents. That means that for rows that exist in OVSDB and in > + * DDLog, we always send an update to set all the columns. It wouldn't > + * save bandwidth to do anything else, since we'd always have to send > + * the full row contents in one direction and if there were differences > + * we'd have to send the contents in both directions. With this > + * strategy we only send them in one direction even in the worst case.) 
> + * > + * (We can't just send an operation to delete all the rows and then > + * re-add them all in the same transaction, because ovsdb-server > + * rejects deleting a row with a given UUID and then adding the same > + * UUID back in a single transaction.) > + */ > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 2); > + > + for (size_t i = 0; ctx->output_only_relations[i]; i++) { > + const char *table = ctx->output_only_relations[i]; > + > + /* Parse the list of row UUIDs received from OVSDB. */ > + struct hmap rows_present = HMAP_INITIALIZER(&rows_present); > + struct ovsdb_error *error = parse_output_only_data( > + ctx->output_only_data, i, &rows_present); > + if (error) { > + char *s = ovsdb_error_to_string_free(error); > + VLOG_WARN_RL(&rl, "%s", s); > + free(s); > + uuidset_destroy(&rows_present); > + continue; > + } > + > + /* Get the index_id for the DDlog table. > + * > + * We require output-only tables to have an accompanying index > + * named <table>_Index. */ > + char *index = xasprintf("%s_Index", table); > + index_id idxid = ddlog_get_index_id(index); > + if (idxid == -1) { > + VLOG_WARN_RL(&rl, "%s: unknown index", index); > + free(index); > + uuidset_destroy(&rows_present); > + continue; > + } > + free(index); > + > + /* For each row in the index, update a corresponding OVSDB row, if > + * there is one, otherwise insert a new row. */ > + struct dump_index_data cbdata = { > + ctx->ddlog, &rows_present, table, &ops_s > + }; > + ddlog_dump_index(ctx->ddlog, idxid, index_cb, (uintptr_t) &cbdata); > + > + /* Any uuids remaining in 'rows_present' are rows that are in OVSDB > + * but not DDlog. Delete them from OVSDB. */ > + struct uuidset_node *node; > + HMAP_FOR_EACH (node, hmap_node, &rows_present) { > + add_delete_row_op(table, &node->uuid, &ops_s); > + } > + uuidset_destroy(&rows_present); > + > + /* Discard any queued output to this table, since we just > + * did a full sync to it. 
*/ > + struct ds tmp = DS_EMPTY_INITIALIZER; > + ddlog_table_update_output(&tmp, ctx->ddlog, ctx->db_name, table); > + ds_destroy(&tmp); > + } > + > + json_destroy(ctx->output_only_data); > + ctx->output_only_data = NULL; > + } else { > + for (const char **p = ctx->output_only_relations; *p; p++) { > + ddlog_table_update_output(&ops_s, ctx->ddlog, ctx->db_name, *p); > + } > + } > + > + /* If we're updating nb::NB_Global.sb_cfg, then also update > + * sb_cfg_timestamp. > + * > + * XXX If the transaction we're sending to the database fails, then > + * currently as written we'll never find out about it and sb_cfg_timestamp > + * will not be updated. > + */ > + static int64_t old_sb_cfg = INT64_MIN; > + static int64_t old_sb_cfg_timestamp = INT64_MIN; > + int64_t new_sb_cfg = old_sb_cfg; > + if (ctx->has_timestamp_columns) { > + table_id sb_cfg_tid = ddlog_get_table_id("SbCfg"); > + ddlog_delta *sb_cfg_delta = ddlog_delta_get_table(delta, sb_cfg_tid); > + ddlog_delta_enumerate(sb_cfg_delta, northd_update_sb_cfg_cb, > + (uintptr_t) &new_sb_cfg); > + ddlog_free_delta(sb_cfg_delta); > + > + if (new_sb_cfg != old_sb_cfg) { > + old_sb_cfg = new_sb_cfg; > + old_sb_cfg_timestamp = time_wall_msec(); > + ds_put_format(&ops_s, "{\"op\":\"update\",\"table\":\"NB_Global\",\"where\":[]," > + "\"row\":{\"sb_cfg_timestamp\":%"PRId64"}},", old_sb_cfg_timestamp); > + } > + } > + > + struct json *ops; > + if (ops_s.length > start_len) { > + ds_chomp(&ops_s, ','); > + ds_put_char(&ops_s, ']'); > + ops = json_from_string(ds_cstr(&ops_s)); > + } else { > + ops = NULL; > + } > + > + ds_destroy(&ops_s); > + > + return ops; > +} > + > +static void > +warning_cb(uintptr_t arg OVS_UNUSED, > + table_id table OVS_UNUSED, > + const ddlog_record *rec, > + ssize_t weight) > +{ > + size_t len; > + const char *s = ddlog_get_str_with_length(rec, &len); > + if (weight > 0) { > + VLOG_WARN("New warning: %.*s", (int)len, s); > + } else { > + VLOG_WARN("Warning cleared: %.*s", (int)len, s); > + } > +} > + 
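
[Reviewer note, not part of the patch: where_uuid_equals() and add_delete_row_op() above build a standard OVSDB "transact" delete operation as defined by RFC 7047. A minimal sketch of the same JSON in Python; the helper names mirror the patch but this code is illustrative only:]

```python
import json
import uuid

def where_uuid_equals(u):
    # Same "where" clause that where_uuid_equals() builds in the patch:
    # match the row whose _uuid equals 'u' (RFC 7047 <condition> syntax).
    return [["_uuid", "==", ["uuid", str(u)]]]

def delete_row_op(table, u):
    # Same shape as the operation emitted by add_delete_row_op().
    return {"op": "delete", "table": table, "where": where_uuid_equals(u)}

op = delete_row_op("Logical_Flow", uuid.UUID(int=1))
print(json.dumps(op))
```

[get_database_ops() appends a comma after each such operation because it accumulates all of them into a single JSON array, i.e. one OVSDB transaction.]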
> +static int > +ddlog_commit(ddlog_prog ddlog) > +{ > + ddlog_delta *new_delta = ddlog_transaction_commit_dump_changes(ddlog); > + if (!new_delta) { > + VLOG_WARN("Transaction commit failed"); > + return -1; > + } > + > + /* Remove warnings from delta and output them straight away. */ > + ddlog_delta *warnings = ddlog_delta_remove_table(new_delta, WARNING_TABLE_ID); > + ddlog_delta_enumerate(warnings, warning_cb, 0); > + ddlog_free_delta(warnings); > + > + /* Merge changes into `delta`. */ > + ddlog_delta_union(delta, new_delta); > + > + return 0; > +} > + > +static const struct json * > +json_object_get(const struct json *json, const char *member_name) > +{ > + return (json && json->type == JSON_OBJECT > + ? shash_find_data(json_object(json), member_name) > + : NULL); > +} > + > +/* Returns the new value of NB_Global::nb_cfg, if any, from the updates in > + * <table-updates> provided by the caller, or INT64_MIN if none is present. */ > +static int64_t > +get_nb_cfg(const struct json *table_updates) > +{ > + const struct json *nb_global = json_object_get(table_updates, "NB_Global"); > + if (nb_global) { > + struct shash_node *row; > + SHASH_FOR_EACH (row, json_object(nb_global)) { > + const struct json *value = row->data; > + const struct json *new = json_object_get(value, "new"); > + const struct json *nb_cfg = json_object_get(new, "nb_cfg"); > + if (nb_cfg && nb_cfg->type == JSON_INTEGER) { > + return json_integer(nb_cfg); > + } > + } > + } > + return INT64_MIN; > +} > + > +static void > +northd_handle_update(struct northd_ctx *ctx, bool clear, > + const struct json *table_updates) > +{ > + if (!table_updates) { > + return; > + } > + > + if (ddlog_transaction_start(ctx->ddlog)) { > + VLOG_WARN("DDlog failed to start transaction"); > + return; > + } > + > + if (clear && ddlog_clear(ctx)) { > + goto error; > + } > + char *updates_s = json_to_string(table_updates, 0); > + if (ddlog_apply_ovsdb_updates(ctx->ddlog, ctx->prefix, updates_s)) { > + VLOG_WARN("DDlog failed 
to apply updates"); > + free(updates_s); > + goto error; > + } > + free(updates_s); > + > + /* Whenever a new 'nb_cfg' value comes in, take the current time and push > + * it into the NbCfgTimestamp relation for the DDlog program to put into > + * nb::NB_Global.nb_cfg_timestamp. */ > + static int64_t old_nb_cfg = INT64_MIN; > + static int64_t old_nb_cfg_timestamp = INT64_MIN; > + int64_t new_nb_cfg = old_nb_cfg; > + int64_t new_nb_cfg_timestamp = old_nb_cfg_timestamp; > + if (ctx->has_timestamp_columns) { > + new_nb_cfg = get_nb_cfg(table_updates); > + if (new_nb_cfg == INT64_MIN) { > + new_nb_cfg = old_nb_cfg == INT64_MIN ? 0 : old_nb_cfg; > + } > + if (new_nb_cfg != old_nb_cfg) { > + new_nb_cfg_timestamp = time_wall_msec(); > + > + ddlog_cmd *updates[2]; > + int n_updates = 0; > + if (old_nb_cfg_timestamp != INT64_MIN) { > + updates[n_updates++] = ddlog_delete_val_cmd( > + NB_CFG_TIMESTAMP_ID, ddlog_i64(old_nb_cfg_timestamp)); > + } > + updates[n_updates++] = ddlog_insert_cmd( > + NB_CFG_TIMESTAMP_ID, ddlog_i64(new_nb_cfg_timestamp)); > + if (ddlog_apply_updates(ctx->ddlog, updates, n_updates) < 0) { > + goto error; > + } > + } > + } > + > + /* Commit changes to DDlog. */ > + if (ddlog_commit(ctx->ddlog)) { > + goto error; > + } > + old_nb_cfg = new_nb_cfg; > + old_nb_cfg_timestamp = new_nb_cfg_timestamp; > + > + /* This update may have implications for the other side, so > + * immediately wake to check for more changes to be applied. 
*/ > + poll_immediate_wake(); > + > + return; > + > +error: > + ddlog_transaction_rollback(ctx->ddlog); > +} > + > +static int > +ddlog_clear(struct northd_ctx *ctx) > +{ > + int n_failures = 0; > + for (int i = 0; ctx->input_relations[i]; i++) { > + char *table = xasprintf("%s%s", ctx->prefix, ctx->input_relations[i]); > + if (ddlog_clear_relation(ctx->ddlog, ddlog_get_table_id(table))) { > + n_failures++; > + } > + free(table); > + } > + if (n_failures) { > + VLOG_WARN("failed to clear %d tables in %s database", > + n_failures, ctx->db_name); > + } > + return n_failures; > +} > + > +/* Callback used by the ddlog engine to print error messages. Note that > + * this is only used by the ddlog runtime, as opposed to the application > + * code in ovn_northd.dl, which uses the vlog facility directly. */ > +static void > +ddlog_print_error(const char *msg) > +{ > + VLOG_ERR("%s", msg); > +} > + > +static void > +usage(void) > +{ > + printf("\ > +%s: OVN northbound management daemon\n\ > +usage: %s [OPTIONS]\n\ > +\n\ > +Options:\n\ > + --ovnnb-db=DATABASE connect to ovn-nb database at DATABASE\n\ > + (default: %s)\n\ > + --ovnsb-db=DATABASE connect to ovn-sb database at DATABASE\n\ > + (default: %s)\n\ > + --unixctl=SOCKET override default control socket name\n\ > + -h, --help display this help message\n\ > + -o, --options list available options\n\ > + -V, --version display version information\n\ > +", program_name, program_name, default_nb_db(), default_sb_db()); > + daemon_usage(); > + vlog_usage(); > + stream_usage("database", true, true, false); > +} > + > +static void > +parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED) > +{ > + enum { > + OVN_DAEMON_OPTION_ENUMS, > + VLOG_OPTION_ENUMS, > + SSL_OPTION_ENUMS, > + OPT_DDLOG_RECORD > + }; > + static const struct option long_options[] = { > + {"ddlog-record", required_argument, NULL, OPT_DDLOG_RECORD}, > + {"ovnsb-db", required_argument, NULL, 'd'}, > + {"ovnnb-db", required_argument, NULL, 'D'}, > + 
{"unixctl", required_argument, NULL, 'u'}, > + {"help", no_argument, NULL, 'h'}, > + {"options", no_argument, NULL, 'o'}, > + {"version", no_argument, NULL, 'V'}, > + OVN_DAEMON_LONG_OPTIONS, > + VLOG_LONG_OPTIONS, > + STREAM_SSL_LONG_OPTIONS, > + {NULL, 0, NULL, 0}, > + }; > + char *short_options = ovs_cmdl_long_options_to_short_options(long_options); > + > + for (;;) { > + int c; > + > + c = getopt_long(argc, argv, short_options, long_options, NULL); > + if (c == -1) { > + break; > + } > + > + switch (c) { > + OVN_DAEMON_OPTION_HANDLERS; > + VLOG_OPTION_HANDLERS; > + STREAM_SSL_OPTION_HANDLERS; > + > + case OPT_DDLOG_RECORD: > + record_file = optarg; > + break; > + > + case 'd': > + ovnsb_db = optarg; > + break; > + > + case 'D': > + ovnnb_db = optarg; > + break; > + > + case 'u': > + unixctl_path = optarg; > + break; > + > + case 'h': > + usage(); > + exit(EXIT_SUCCESS); > + > + case 'o': > + ovs_cmdl_print_options(long_options); > + exit(EXIT_SUCCESS); > + > + case 'V': > + ovs_print_version(0, 0); > + exit(EXIT_SUCCESS); > + > + default: > + break; > + } > + } > + > + if (!ovnsb_db || !ovnsb_db[0]) { > + ovnsb_db = default_sb_db(); > + } > + > + if (!ovnnb_db || !ovnnb_db[0]) { > + ovnnb_db = default_nb_db(); > + } > + > + free(short_options); > +} > + > +int > +main(int argc, char *argv[]) > +{ > + int res = EXIT_SUCCESS; > + struct unixctl_server *unixctl; > + int retval; > + bool exiting; > + > + init_table_ids(); > + > + fatal_ignore_sigpipe(); > + ovs_cmdl_proctitle_init(argc, argv); > + set_program_name(argv[0]); > + service_start(&argc, &argv); > + parse_options(argc, argv); > + > + daemonize_start(false); > + > + char *abs_unixctl_path = get_abs_unix_ctl_path(unixctl_path); > + retval = unixctl_server_create(abs_unixctl_path, &unixctl); > + free(abs_unixctl_path); > + > + if (retval) { > + exit(EXIT_FAILURE); > + } > + > + struct northd_status status = { > + .locked = false, > + .pause = false, > + }; > + unixctl_command_register("exit", "", 0, 0, 
ovn_northd_exit, &exiting); > + unixctl_command_register("status", "", 0, 0, ovn_northd_status, &status); > + > + > + ddlog_prog ddlog; > + ddlog = ddlog_run(1, false, NULL, 0, ddlog_print_error, &delta); > + if (!ddlog) { > + ovs_fatal(0, "DDlog instance could not be created"); > + } > + > + int replay_fd = -1; > + if (record_file) { > + replay_fd = open(record_file, O_CREAT | O_WRONLY | O_TRUNC, 0666); > + if (replay_fd < 0) { > + ovs_fatal(errno, "%s: could not create DDlog record file", > + record_file); > + } > + > + if (ddlog_record_commands(ddlog, replay_fd)) { > + ovs_fatal(0, "could not enable DDlog command recording"); > + } > + } > + > + struct northd_ctx *nb_ctx = northd_ctx_create( > + ovnnb_db, "OVN_Northbound", "nb", NULL, ddlog, > + nb_input_relations, nb_output_relations, nb_output_only_relations); > + struct northd_ctx *sb_ctx = northd_ctx_create( > + ovnsb_db, "OVN_Southbound", "sb", "ovn_northd", ddlog, > + sb_input_relations, sb_output_relations, sb_output_only_relations); > + > + unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, sb_ctx); > + unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, sb_ctx); > + unixctl_command_register("is-paused", "", 0, 0, ovn_northd_is_paused, > + sb_ctx); > + > + daemonize_complete(); > + > + /* Main loop. */ > + exiting = false; > + while (!exiting) { > + bool has_lock = northd_lock_status(sb_ctx) == HAS_LOCK; > + if (!sb_ctx->paused) { > + if (has_lock && !status.locked) { > + VLOG_INFO("ovn-northd lock acquired. " > + "This ovn-northd instance is now active."); > + } else if (!has_lock && status.locked) { > + VLOG_INFO("ovn-northd lock lost. 
" > + "This ovn-northd instance is now on standby."); > + } > + } > + status.locked = has_lock; > + status.pause = sb_ctx->paused; > + > + bool run_deltas = (northd_lock_status(sb_ctx) == HAS_LOCK && > + nb_ctx->state == S_MONITORING && > + sb_ctx->state == S_MONITORING); > + > + northd_run(nb_ctx, run_deltas); > + northd_wait(nb_ctx); > + > + northd_run(sb_ctx, run_deltas); > + northd_wait(sb_ctx); > + > + northd_update_probe_interval(nb_ctx, sb_ctx); > + > + unixctl_server_run(unixctl); > + unixctl_server_wait(unixctl); > + if (exiting) { > + poll_immediate_wake(); > + } > + > + poll_block(); > + if (should_service_stop()) { > + exiting = true; > + } > + } > + > + northd_ctx_destroy(nb_ctx); > + northd_ctx_destroy(sb_ctx); > + > + ddlog_stop(ddlog); > + > + if (replay_fd >= 0) { > + fsync(replay_fd); > + close(replay_fd); > + } > + > + unixctl_server_destroy(unixctl); > + service_stop(); > + > + exit(res); > +} > + > +static void > +ovn_northd_exit(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *exiting_) > +{ > + bool *exiting = exiting_; > + *exiting = true; > + > + unixctl_command_reply(conn, NULL); > +} > + > +static void > +ovn_northd_pause(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *sb_ctx_) > +{ > + struct northd_ctx *sb_ctx = sb_ctx_; > + northd_pause(sb_ctx); > + unixctl_command_reply(conn, NULL); > +} > + > +static void > +ovn_northd_resume(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *sb_ctx_) > +{ > + struct northd_ctx *sb_ctx = sb_ctx_; > + northd_unpause(sb_ctx); > + unixctl_command_reply(conn, NULL); > +} > + > +static void > +ovn_northd_is_paused(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *sb_ctx_) > +{ > + struct northd_ctx *sb_ctx = sb_ctx_; > + if (sb_ctx->paused) { > + unixctl_command_reply(conn, "true"); > + } else { > + unixctl_command_reply(conn, "false"); > + } > +} > 
+ > +static void > +ovn_northd_status(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *status_) > +{ > + struct northd_status *status = status_; > + char *status_string; > + > + if (status->pause) { > + status_string = "paused"; > + } else { > + status_string = status->locked ? "active" : "standby"; > + } > + > + /* > + * Use a labelled formatted output so we can add more to the status command > + * later without breaking any consuming scripts > + */ > + struct ds s = DS_EMPTY_INITIALIZER; > + ds_put_format(&s, "Status: %s\n", status_string); > + unixctl_command_reply(conn, ds_cstr(&s)); > + ds_destroy(&s); > +} > diff --git a/northd/ovn-sb.dlopts b/northd/ovn-sb.dlopts > new file mode 100644 > index 000000000000..41cf201d6536 > --- /dev/null > +++ b/northd/ovn-sb.dlopts > @@ -0,0 +1,28 @@ > +--output-only Logical_Flow > +-o SB_Global > +-o Multicast_Group > +-o Meter > +-o Meter_Band > +-o Datapath_Binding > +-o Port_Binding > +-o Gateway_Chassis > +-o HA_Chassis > +-o HA_Chassis_Group > +-o Port_Group > +-o MAC_Binding > +-o DHCP_Options > +-o DHCPv6_Options > +-o Address_Set > +-o DNS > +-o RBAC_Role > +-o RBAC_Permission > +-o IP_Multicast > +-o Service_Monitor > +--ro Port_Binding.chassis > +--ro Port_Binding.virtual_parent > +--ro Port_Binding.encap > +--ro IP_Multicast.seq_no > +--ro SB_Global.ssl > +--ro SB_Global.connections > +--ro SB_Global.external_ids > +--ro Service_Monitor.status > diff --git a/northd/ovn.dl b/northd/ovn.dl > new file mode 100644 > index 000000000000..e91a4e8a10d0 > --- /dev/null > +++ b/northd/ovn.dl > @@ -0,0 +1,387 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. 
> + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +import ovsdb > + > + > +/* Logical port is enabled if it does not have an enabled flag or the flag is true */ > +function is_enabled(s: Option<bool>): bool = { > + s != Some{false} > +} > + > +/* > + * Ethernet addresses > + */ > +extern type eth_addr > + > +extern function eth_addr_zero(): eth_addr > +extern function eth_addr2string(addr: eth_addr): string > +function to_string(addr: eth_addr): string { > + eth_addr2string(addr) > +} > +extern function scan_eth_addr(s: string): Option<eth_addr> > +extern function scan_eth_addr_prefix(s: string): Option<bit<64>> > +extern function eth_addr_from_string(s: string): Option<eth_addr> > +extern function eth_addr_to_uint64(ea: eth_addr): bit<64> > +extern function eth_addr_from_uint64(x: bit<64>): eth_addr > +extern function eth_addr_mark_random(ea: eth_addr): eth_addr > + > +function pseudorandom_mac(seed: uuid, variant: bit<16>) : bit<64> = { > + eth_addr_to_uint64(eth_addr_mark_random(eth_addr_from_uint64(hash64(seed ++ variant)))) > +} > + > +/* > + * IPv4 addresses > + */ > + > +extern type in_addr > + > +function to_string(ip: in_addr): string = { > + var x = iptohl(ip); > + "${x >> 24}.${(x >> 16) & 'hff}.${(x >> 8) & 'hff}.${x & 'hff}" > +} > + > +function ip_is_cidr(netmask: in_addr): bool { > + var x = ~iptohl(netmask); > + (x & (x + 1)) == 0 > +} > +function ip_is_local_multicast(ip: in_addr): bool { > + (iptohl(ip) & 32'hffffff00) == 32'he0000000 > +} > + > +function ip_create_mask(plen: bit<32>): in_addr { > + hltoip((64'h00000000ffffffff << (32 - 
plen))[31:0]) > +} > + > +function ip_bitxor(a: in_addr, b: in_addr): in_addr { > + hltoip(iptohl(a) ^ iptohl(b)) > +} > + > +function ip_bitand(a: in_addr, b: in_addr): in_addr { > + hltoip(iptohl(a) & iptohl(b)) > +} > + > +function ip_network(addr: in_addr, mask: in_addr): in_addr { > + hltoip(iptohl(addr) & iptohl(mask)) > +} > + > +function ip_host(addr: in_addr, mask: in_addr): in_addr { > + hltoip(iptohl(addr) & ~iptohl(mask)) > +} > + > +function ip_host_is_zero(addr: in_addr, mask: in_addr): bool { > + ip_is_zero(ip_host(addr, mask)) > +} > + > +function ip_is_zero(a: in_addr): bool { > + iptohl(a) == 0 > +} > + > +function ip_bcast(addr: in_addr, mask: in_addr): in_addr { > + hltoip(iptohl(addr) | ~iptohl(mask)) > +} > + > +extern function ip_parse(s: string): Option<in_addr> > +extern function ip_parse_masked(s: string): Either<string/*err*/, (in_addr/*host_ip*/, in_addr/*mask*/)> > +extern function ip_parse_cidr(s: string): Either<string/*err*/, (in_addr/*ip*/, bit<32>/*plen*/)> > +extern function ip_count_cidr_bits(ip: in_addr): Option<bit<8>> > + > +/* True if both 'ips' are in the same network as defined by netmask 'mask', > + * false otherwise. 
*/ > +function ip_same_network(ips: (in_addr, in_addr), mask: in_addr): bool { > + ((iptohl(ips.0) ^ iptohl(ips.1)) & iptohl(mask)) == 0 > +} > + > +extern function iptohl(addr: in_addr): bit<32> > +extern function hltoip(addr: bit<32>): in_addr > +extern function scan_static_dynamic_ip(s: string): Option<in_addr> > + > +/* > + * parse IPv4 address list of the form: > + * "10.0.0.4 10.0.0.10 10.0.0.20..10.0.0.50 10.0.0.100..10.0.0.110" > + */ > +extern function parse_ip_list(ips: string): Either<string, Vec<(in_addr, Option<in_addr>)>> > + > +/* > + * IPv6 addresses > + */ > +extern type in6_addr > + > +extern function in6_generate_lla(ea: eth_addr): in6_addr > +extern function in6_generate_eui64(ea: eth_addr, prefix: in6_addr): in6_addr > +extern function in6_is_lla(addr: in6_addr): bool > +extern function in6_addr_solicited_node(ip6: in6_addr): in6_addr > + > +extern function ipv6_string_mapped(addr: in6_addr): string > +extern function ipv6_parse_masked(s: string): Either<string/*err*/, (in6_addr/*ip*/, in6_addr/*mask*/)> > +extern function ipv6_parse(s: string): Option<in6_addr> > +extern function ipv6_parse_cidr(s: string): Either<string/*err*/, (in6_addr/*ip*/, bit<32>/*plen*/)> > +extern function ipv6_bitxor(a: in6_addr, b: in6_addr): in6_addr > +extern function ipv6_bitand(a: in6_addr, b: in6_addr): in6_addr > +extern function ipv6_bitnot(a: in6_addr): in6_addr > +extern function ipv6_create_mask(mask: bit<32>): in6_addr > +extern function ipv6_is_zero(a: in6_addr): bool > +extern function ipv6_is_v4mapped(a: in6_addr): bool > +extern function ipv6_is_routable_multicast(a: in6_addr): bool > +extern function ipv6_is_all_hosts(a: in6_addr): bool > + > +function ipv6_network(addr: in6_addr, mask: in6_addr): in6_addr { > + ipv6_bitand(addr, mask) > +} > + > +function ipv6_host(addr: in6_addr, mask: in6_addr): in6_addr { > + ipv6_bitand(addr, ipv6_bitnot(mask)) > +} > + > +/* True if both 'ips' are in the same network as defined by netmask 'mask', > + * false 
otherwise. */ > +function ipv6_same_network(ips: (in6_addr, in6_addr), mask: in6_addr): bool { > + ipv6_network(ips.0, mask) == ipv6_network(ips.1, mask) > +} > + > +extern function ipv6_host_is_zero(addr: in6_addr, mask: in6_addr): bool > +extern function ipv6_multicast_to_ethernet(ip6: in6_addr): eth_addr > +extern function ipv6_is_cidr(ip6: in6_addr): bool > +extern function ipv6_count_cidr_bits(ip6: in6_addr): Option<bit<8>> > + > +extern function inet6_ntop(addr: in6_addr): string > +function to_string(addr: in6_addr): string = { > + inet6_ntop(addr) > +} > + > +/* > + * IPv4 | IPv6 addresses > + */ > + > +typedef v46_ip = IPv4 { ipv4: in_addr } | IPv6 { ipv6: in6_addr } > + > +function ip46_parse_cidr(s: string) : Option<(v46_ip, bit<32>)> = { > + match (ip_parse_cidr(s)) { > + Right{(ipv4, plen)} -> return Some{(IPv4{ipv4}, plen)}, > + _ -> () > + }; > + match (ipv6_parse_cidr(s)) { > + Right{(ipv6, plen)} -> return Some{(IPv6{ipv6}, plen)}, > + _ -> () > + }; > + None > +} > +function ip46_parse_masked(s: string) : Option<(v46_ip, v46_ip)> = { > + match (ip_parse_masked(s)) { > + Right{(ipv4, mask)} -> return Some{(IPv4{ipv4}, IPv4{mask})}, > + _ -> () > + }; > + match (ipv6_parse_masked(s)) { > + Right{(ipv6, mask)} -> return Some{(IPv6{ipv6}, IPv6{mask})}, > + _ -> () > + }; > + None > +} > +function ip46_parse(s: string) : Option<v46_ip> = { > + match (ip_parse(s)) { > + Some{ipv4} -> return Some{IPv4{ipv4}}, > + _ -> () > + }; > + match (ipv6_parse(s)) { > + Some{ipv6} -> return Some{IPv6{ipv6}}, > + _ -> () > + }; > + None > +} > +function to_string(ip46: v46_ip) : string = { > + match (ip46) { > + IPv4{ipv4} -> "${ipv4}", > + IPv6{ipv6} -> "${ipv6}" > + } > +} > +function to_bracketed_string(ip46: v46_ip) : string = { > + match (ip46) { > + IPv4{ipv4} -> "${ipv4}", > + IPv6{ipv6} -> "[${ipv6}]" > + } > +} > + > +function ip46_get_network(ip46: v46_ip, plen: bit<32>) : v46_ip { > + match (ip46) { > + IPv4{ipv4} -> IPv4{ip_bitand(ipv4, 
ip_create_mask(plen))}, > + IPv6{ipv6} -> IPv6{ipv6_bitand(ipv6, ipv6_create_mask(plen))} > + } > +} > + > +function ip46_is_all_ones(ip46: v46_ip) : bool { > + match (ip46) { > + IPv4{ipv4} -> ipv4 == ip_create_mask(32), > + IPv6{ipv6} -> ipv6 == ipv6_create_mask(128) > + } > +} > + > +function ip46_count_cidr_bits(ip46: v46_ip) : Option<bit<8>> { > + match (ip46) { > + IPv4{ipv4} -> ip_count_cidr_bits(ipv4), > + IPv6{ipv6} -> ipv6_count_cidr_bits(ipv6) > + } > +} > + > +function ip46_ipX(ip46: v46_ip) : string { > + match (ip46) { > + IPv4{_} -> "ip4", > + IPv6{_} -> "ip6" > + } > +} > + > +function ip46_xxreg(ip46: v46_ip) : string { > + match (ip46) { > + IPv4{_} -> "", > + IPv6{_} -> "xx" > + } > +} > + > +typedef ipv4_netaddr = IPV4NetAddr { > + addr: in_addr, /* 192.168.10.123 */ > + plen: bit<32> /* CIDR Prefix: 24. */ > +} > + > +/* Returns the netmask. */ > +function ipv4_netaddr_mask(na: ipv4_netaddr): in_addr { > + ip_create_mask(na.plen) > +} > + > +/* Returns the broadcast address. */ > +function ipv4_netaddr_bcast(na: ipv4_netaddr): in_addr { > + ip_bcast(na.addr, ipv4_netaddr_mask(na)) > +} > + > +/* Returns the network (with the host bits zeroed). */ > +function ipv4_netaddr_network(na: ipv4_netaddr): in_addr { > + ip_network(na.addr, ipv4_netaddr_mask(na)) > +} > + > +/* Returns the host (with the network bits zeroed). */ > +function ipv4_netaddr_host(na: ipv4_netaddr): in_addr { > + ip_host(na.addr, ipv4_netaddr_mask(na)) > +} > + > +/* Match on the host, if the host part is nonzero, or on the network > + * otherwise. */ > +function ipv4_netaddr_match_host_or_network(na: ipv4_netaddr): string { > + if (na.plen < 32 and ip_is_zero(ipv4_netaddr_host(na))) { > + "${na.addr}/${na.plen}" > + } else { > + "${na.addr}" > + } > +} > + > +/* Match on the network. 
*/ > +function ipv4_netaddr_match_network(na: ipv4_netaddr): string { > + if (na.plen < 32) { > + "${ipv4_netaddr_network(na)}/${na.plen}" > + } else { > + "${na.addr}" > + } > +} > + > +typedef ipv6_netaddr = IPV6NetAddr { > + addr: in6_addr, /* fc00::1 */ > + plen: bit<32> /* CIDR Prefix: 64 */ > +} > + > +/* Returns the netmask. */ > +function ipv6_netaddr_mask(na: ipv6_netaddr): in6_addr { > + ipv6_create_mask(na.plen) > +} > + > +/* Returns the network (with the host bits zeroed). */ > +function ipv6_netaddr_network(na: ipv6_netaddr): in6_addr { > + ipv6_network(na.addr, ipv6_netaddr_mask(na)) > +} > + > +/* Returns the host (with the network bits zeroed). */ > +function ipv6_netaddr_host(na: ipv6_netaddr): in6_addr { > + ipv6_host(na.addr, ipv6_netaddr_mask(na)) > +} > + > +function ipv6_netaddr_solicited_node(na: ipv6_netaddr): in6_addr { > + in6_addr_solicited_node(na.addr) > +} > + > +function ipv6_netaddr_is_lla(na: ipv6_netaddr): bool { > + return in6_is_lla(ipv6_netaddr_network(na)) > +} > + > +/* Match on the network. 
*/ > +function ipv6_netaddr_match_network(na: ipv6_netaddr): string { > + if (na.plen < 128) { > + "${ipv6_netaddr_network(na)}/${na.plen}" > + } else { > + "${na.addr}" > + } > +} > + > +typedef lport_addresses = LPortAddress { > + ea: eth_addr, > + ipv4_addrs: Vec<ipv4_netaddr>, > + ipv6_addrs: Vec<ipv6_netaddr> > +} > + > +function to_string(addr: lport_addresses): string = { > + var addrs = ["${addr.ea}"]; > + for (ip4 in addr.ipv4_addrs) { > + vec_push(addrs, "${ip4.addr}") > + }; > + > + for (ip6 in addr.ipv6_addrs) { > + vec_push(addrs, "${ip6.addr}") > + }; > + > + string_join(addrs, " ") > +} > + > +/* > + * Packet header lengths > + */ > +function eTH_HEADER_LEN(): integer = 14 > +function vLAN_HEADER_LEN(): integer = 4 > +function vLAN_ETH_HEADER_LEN(): integer = eTH_HEADER_LEN() + vLAN_HEADER_LEN() > + > +/* > + * Logging > + */ > +extern function warn(msg: string): () > +extern function err(msg: string): () > +extern function abort(msg: string): () > + > +/* > + * C functions imported from OVN > + */ > +extern function is_dynamic_lsp_address(addr: string): bool > +extern function extract_lsp_addresses(address: string): Option<lport_addresses> > +extern function extract_addresses(address: string): Option<lport_addresses> > +extern function extract_lrp_networks(mac: string, networks: Set<string>): Option<lport_addresses> > + > +extern function split_addresses(addr: string): (Set<string>, Set<string>) > + > +/* > + * C functions imported from OVS > + */ > +extern function json_string_escape(s: string): string > + > +/* Returns the number of 1-bits in `x`, between 0 and 64 inclusive */ > +extern function count_1bits(x: bit<64>): bit<8> > + > +/* For a 'key' of the form "IP:port" or just "IP", returns > + * (v46_ip, port) tuple. 
*/ > +extern function ip_address_and_port_from_lb_key(k: string): Option<(v46_ip, bit<16>)> > + > +extern function str_to_int(s: string, base: bit<16>): Option<integer> > +extern function str_to_uint(s: string, base: bit<16>): Option<integer> > diff --git a/northd/ovn.rs b/northd/ovn.rs > new file mode 100644 > index 000000000000..e8d899951da8 > --- /dev/null > +++ b/northd/ovn.rs > @@ -0,0 +1,857 @@ > +/* > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +use ::nom::*; > +use ::differential_datalog::record; > +use ::std::ffi; > +use ::std::ptr; > +use ::std::default; > +use ::std::process; > +use ::std::os::raw; > +use ::libc; > + > +use crate::ddlog_std; > + > +pub fn warn(msg: &String) { > + warn_(msg.as_str()) > +} > + > +pub fn warn_(msg: &str) { > + unsafe { > + ddlog_warn(ffi::CString::new(msg).unwrap().as_ptr()); > + } > +} > + > +pub fn err_(msg: &str) { > + unsafe { > + ddlog_err(ffi::CString::new(msg).unwrap().as_ptr()); > + } > +} > + > +pub fn abort(msg: &String) { > + abort_(msg.as_str()) > +} > + > +fn abort_(msg: &str) { > + err_(format!("DDlog error: {}.", msg).as_ref()); > + process::abort(); > +} > + > +const ETH_ADDR_SIZE: usize = 6; > +const IN6_ADDR_SIZE: usize = 16; > +const INET6_ADDRSTRLEN: usize = 46; > +const INET_ADDRSTRLEN: usize = 16; > +const ETH_ADDR_STRLEN: usize = 17; > + > +const AF_INET: usize = 2; > +const AF_INET6: usize = 10; > + > +/* Implementation for externs declared in ovn.dl */ > + > 
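
[Reviewer note, not part of the patch: the DDlog helpers ip_create_mask() and ip_same_network() in ovn.dl are plain host-byte-order bit arithmetic. A Python sketch of the same math, assuming 32-bit unsigned integer addresses; the function names are copied from the .dl for readability:]

```python
import ipaddress

def ip_create_mask(plen):
    # Same arithmetic as ip_create_mask() in ovn.dl, truncated to 32 bits.
    return (0xffffffff << (32 - plen)) & 0xffffffff

def ip_same_network(a, b, mask):
    # Mirrors ip_same_network(): the two addresses agree on all masked bits.
    return ((a ^ b) & mask) == 0

a = int(ipaddress.IPv4Address("10.0.0.4"))
b = int(ipaddress.IPv4Address("10.0.0.200"))
print(ip_same_network(a, b, ip_create_mask(24)))  # True: both in 10.0.0.0/24
```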
+#[repr(C)] > +#[derive(Default, PartialEq, Eq, PartialOrd, Ord, Clone, Hash, Serialize, Deserialize, Debug)] > +pub struct eth_addr { > + x: [u8; ETH_ADDR_SIZE] > +} > + > +pub fn eth_addr_zero() -> eth_addr { > + eth_addr { x: [0; ETH_ADDR_SIZE] } > +} > + > +pub fn eth_addr2string(addr: &eth_addr) -> String { > + format!("{:02x}:{:02x}:{:02x}:{:02x}:{:02x}:{:02x}", > + addr.x[0], addr.x[1], addr.x[2], addr.x[3], addr.x[4], addr.x[5]) > +} > + > +pub fn eth_addr_from_string(s: &String) -> ddlog_std::Option<eth_addr> { > + let mut ea: eth_addr = Default::default(); > + unsafe { > + if ovs::eth_addr_from_string(string2cstr(s).as_ptr(), &mut ea as *mut eth_addr) { > + ddlog_std::Option::Some{x: ea} > + } else { > + ddlog_std::Option::None > + } > + } > +} > + > +pub fn eth_addr_from_uint64(x: &u64) -> eth_addr { > + let mut ea: eth_addr = Default::default(); > + unsafe { > + ovs::eth_addr_from_uint64(*x as libc::uint64_t, &mut ea as *mut eth_addr); > + ea > + } > +} > + > +pub fn eth_addr_mark_random(ea: &eth_addr) -> eth_addr { > + unsafe { > + let mut ea_new = ea.clone(); > + ovs::eth_addr_mark_random(&mut ea_new as *mut eth_addr); > + ea_new > + } > +} > + > +pub fn eth_addr_to_uint64(ea: &eth_addr) -> u64 { > + unsafe { > + ovs::eth_addr_to_uint64(ea.clone()) as u64 > + } > +} > + > + > +impl FromRecord for eth_addr { > + fn from_record(val: &record::Record) -> Result<Self, String> { > + Ok(eth_addr{x: <[u8; ETH_ADDR_SIZE]>::from_record(val)?}) > + } > +} > + > +::differential_datalog::decl_struct_into_record!(eth_addr, <>, x); > +::differential_datalog::decl_record_mutator_struct!(eth_addr, <>, x: [u8; ETH_ADDR_SIZE]); > + > + > +#[repr(C)] > +#[derive(Default, PartialEq, Eq, PartialOrd, Ord, Clone, Hash, Serialize, Deserialize, Debug)] > +pub struct in6_addr { > + x: [u8; IN6_ADDR_SIZE] > +} > + > +pub const in6addr_any: in6_addr = in6_addr{x: [0; IN6_ADDR_SIZE]}; > +pub const in6addr_all_hosts: in6_addr = in6_addr{x: [ > + 0xff,0x02,0x00,0x00,0x00,0x00,0x00,0x00, > + 
> +    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01 ]};
> +
> +impl FromRecord for in6_addr {
> +    fn from_record(val: &record::Record) -> Result<Self, String> {
> +        Ok(in6_addr{x: <[u8; IN6_ADDR_SIZE]>::from_record(val)?})
> +    }
> +}
> +
> +::differential_datalog::decl_struct_into_record!(in6_addr, <>, x);
> +::differential_datalog::decl_record_mutator_struct!(in6_addr, <>, x: [u8; IN6_ADDR_SIZE]);
> +
> +pub fn in6_generate_lla(ea: &eth_addr) -> in6_addr {
> +    let mut addr: in6_addr = Default::default();
> +    unsafe {ovs::in6_generate_lla(ea.clone(), &mut addr as *mut in6_addr)};
> +    addr
> +}
> +
> +pub fn in6_generate_eui64(ea: &eth_addr, prefix: &in6_addr) -> in6_addr {
> +    let mut addr: in6_addr = Default::default();
> +    unsafe {ovs::in6_generate_eui64(ea.clone(),
> +                                    prefix as *const in6_addr,
> +                                    &mut addr as *mut in6_addr)};
> +    addr
> +}
> +
> +pub fn in6_is_lla(addr: &in6_addr) -> bool {
> +    unsafe {ovs::in6_is_lla(addr as *const in6_addr)}
> +}
> +
> +pub fn in6_addr_solicited_node(ip6: &in6_addr) -> in6_addr
> +{
> +    let mut res: in6_addr = Default::default();
> +    unsafe {
> +        ovs::in6_addr_solicited_node(&mut res as *mut in6_addr, ip6 as *const in6_addr);
> +    }
> +    res
> +}
> +
> +pub fn ipv6_bitand(a: &in6_addr, b: &in6_addr) -> in6_addr {
> +    unsafe {
> +        ovs::ipv6_addr_bitand(a as *const in6_addr, b as *const in6_addr)
> +    }
> +}
> +
> +pub fn ipv6_bitxor(a: &in6_addr, b: &in6_addr) -> in6_addr {
> +    unsafe {
> +        ovs::ipv6_addr_bitxor(a as *const in6_addr, b as *const in6_addr)
> +    }
> +}
> +
> +pub fn ipv6_bitnot(a: &in6_addr) -> in6_addr {
> +    let mut result: in6_addr = Default::default();
> +    for i in 0..16 {
> +        result.x[i] = !a.x[i]
> +    }
> +    result
> +}
> +
> +pub fn ipv6_string_mapped(addr: &in6_addr) -> String {
> +    let mut addr_str = [0 as i8; INET6_ADDRSTRLEN];
> +    unsafe {
> +        ovs::ipv6_string_mapped(&mut addr_str[0] as *mut raw::c_char, addr as *const in6_addr);
> +        cstr2string(&addr_str as *const raw::c_char)
> +    }
> +}
> +
> +pub fn ipv6_is_zero(addr: &in6_addr) -> bool {
> +    *addr == in6addr_any
> +}
> +
> +pub fn ipv6_count_cidr_bits(ip6: &in6_addr) -> ddlog_std::Option<u8> {
> +    unsafe {
> +        match (ipv6_is_cidr(ip6)) {
> +            true => ddlog_std::Option::Some{x: ovs::ipv6_count_cidr_bits(ip6 as *const in6_addr) as u8},
> +            false => ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn json_string_escape(s: &String) -> String {
> +    let mut ds = ovs_ds::new();
> +    unsafe {
> +        ovs::json_string_escape(ffi::CString::new(s.as_str()).unwrap().as_ptr() as *const raw::c_char,
> +                                &mut ds as *mut ovs_ds);
> +    };
> +    unsafe{ds.into_string()}
> +}
> +
> +pub fn extract_lsp_addresses(address: &String) -> ddlog_std::Option<lport_addresses> {
> +    unsafe {
> +        let mut laddrs: lport_addresses_c = Default::default();
> +        if ovn_c::extract_lsp_addresses(string2cstr(address).as_ptr(),
> +                                        &mut laddrs as *mut lport_addresses_c) {
> +            ddlog_std::Option::Some{x: laddrs.into_ddlog()}
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn extract_addresses(address: &String) -> ddlog_std::Option<lport_addresses> {
> +    unsafe {
> +        let mut laddrs: lport_addresses_c = Default::default();
> +        let mut ofs: raw::c_int = 0;
> +        if ovn_c::extract_addresses(string2cstr(address).as_ptr(),
> +                                    &mut laddrs as *mut lport_addresses_c,
> +                                    &mut ofs as *mut raw::c_int) {
> +            ddlog_std::Option::Some{x: laddrs.into_ddlog()}
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn extract_lrp_networks(mac: &String, networks: &ddlog_std::Set<String>) -> ddlog_std::Option<lport_addresses>
> +{
> +    unsafe {
> +        let mut laddrs: lport_addresses_c = Default::default();
> +        let mut networks_cstrs = Vec::with_capacity(networks.x.len());
> +        let mut networks_ptrs = Vec::with_capacity(networks.x.len());
> +        for net in networks.x.iter() {
> +            networks_cstrs.push(string2cstr(net));
> +            networks_ptrs.push(networks_cstrs.last().unwrap().as_ptr());
> +        };
> +        if ovn_c::extract_lrp_networks__(string2cstr(mac).as_ptr(),
> +                                         networks_ptrs.as_ptr() as *const *const raw::c_char,
> +                                         networks_ptrs.len(), &mut laddrs as *mut lport_addresses_c) {
> +            ddlog_std::Option::Some{x: laddrs.into_ddlog()}
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn ipv6_parse_masked(s: &String) -> ddlog_std::Either<String, ddlog_std::tuple2<in6_addr, in6_addr>>
> +{
> +    unsafe {
> +        let mut ip: in6_addr = Default::default();
> +        let mut mask: in6_addr = Default::default();
> +        let err = ovs::ipv6_parse_masked(string2cstr(s).as_ptr(), &mut ip as *mut in6_addr, &mut mask as *mut in6_addr);
> +        if (err != ptr::null_mut()) {
> +            let errstr = cstr2string(err);
> +            free(err as *mut raw::c_void);
> +            ddlog_std::Either::Left{l: errstr}
> +        } else {
> +            ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, mask)}
> +        }
> +    }
> +}
> +
> +pub fn ipv6_parse_cidr(s: &String) -> ddlog_std::Either<String, ddlog_std::tuple2<in6_addr, u32>>
> +{
> +    unsafe {
> +        let mut ip: in6_addr = Default::default();
> +        let mut plen: raw::c_uint = 0;
> +        let err = ovs::ipv6_parse_cidr(string2cstr(s).as_ptr(), &mut ip as *mut in6_addr, &mut plen as *mut raw::c_uint);
> +        if (err != ptr::null_mut()) {
> +            let errstr = cstr2string(err);
> +            free(err as *mut raw::c_void);
> +            ddlog_std::Either::Left{l: errstr}
> +        } else {
> +            ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, plen as u32)}
> +        }
> +    }
> +}
> +
> +pub fn ipv6_parse(s: &String) -> ddlog_std::Option<in6_addr>
> +{
> +    unsafe {
> +        let mut ip: in6_addr = Default::default();
> +        let res = ovs::ipv6_parse(string2cstr(s).as_ptr(), &mut ip as *mut in6_addr);
> +        if (res) {
> +            ddlog_std::Option::Some{x: ip}
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn ipv6_create_mask(mask: &u32) -> in6_addr
> +{
> +    unsafe {ovs::ipv6_create_mask(*mask as raw::c_uint)}
> +}
> +
> +
> +pub fn ipv6_is_routable_multicast(a: &in6_addr) -> bool
> +{
> +    unsafe{ovn_c::ipv6_addr_is_routable_multicast(a as *const in6_addr)}
> +}
> +
> +pub fn ipv6_is_all_hosts(a: &in6_addr) -> bool
> +{
> +    return *a == in6addr_all_hosts;
> +}
> +
> +pub fn ipv6_is_cidr(a: &in6_addr) -> bool
> +{
> +    unsafe{ovs::ipv6_is_cidr(a as *const in6_addr)}
> +}
> +
> +pub fn ipv6_multicast_to_ethernet(ip6: &in6_addr) -> eth_addr
> +{
> +    let mut eth: eth_addr = Default::default();
> +    unsafe{
> +        ovs::ipv6_multicast_to_ethernet(&mut eth as *mut eth_addr, ip6 as *const in6_addr);
> +    }
> +    eth
> +}
> +
> +pub type in_addr = u32;
> +pub type ovs_be32 = u32;
> +
> +pub fn iptohl(addr: &in_addr) -> u32 {
> +    ddlog_std::ntohl(addr)
> +}
> +pub fn hltoip(addr: &u32) -> in_addr {
> +    ddlog_std::htonl(addr)
> +}
> +
> +pub fn ip_parse_masked(s: &String) -> ddlog_std::Either<String, ddlog_std::tuple2<in_addr, in_addr>>
> +{
> +    unsafe {
> +        let mut ip: ovs_be32 = 0;
> +        let mut mask: ovs_be32 = 0;
> +        let err = ovs::ip_parse_masked(string2cstr(s).as_ptr(), &mut ip as *mut ovs_be32, &mut mask as *mut ovs_be32);
> +        if (err != ptr::null_mut()) {
> +            let errstr = cstr2string(err);
> +            free(err as *mut raw::c_void);
> +            ddlog_std::Either::Left{l: errstr}
> +        } else {
> +            ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, mask)}
> +        }
> +    }
> +}
> +
> +pub fn ip_parse_cidr(s: &String) -> ddlog_std::Either<String, ddlog_std::tuple2<in_addr, u32>>
> +{
> +    unsafe {
> +        let mut ip: ovs_be32 = 0;
> +        let mut plen: raw::c_uint = 0;
> +        let err = ovs::ip_parse_cidr(string2cstr(s).as_ptr(), &mut ip as *mut ovs_be32, &mut plen as *mut raw::c_uint);
> +        if (err != ptr::null_mut()) {
> +            let errstr = cstr2string(err);
> +            free(err as *mut raw::c_void);
> +            ddlog_std::Either::Left{l: errstr}
> +        } else {
> +            ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, plen as u32)}
> +        }
> +    }
> +}
> +
> +pub fn ip_parse(s: &String) -> ddlog_std::Option<in_addr>
> +{
> +    unsafe {
> +        let mut ip: ovs_be32 = 0;
> +        if (ovs::ip_parse(string2cstr(s).as_ptr(), &mut ip as *mut ovs_be32)) {
> +            ddlog_std::Option::Some{x:ip}
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn ip_count_cidr_bits(address: &in_addr) -> ddlog_std::Option<u8> {
> +    unsafe {
> +        match (ip_is_cidr(address)) {
> +            true => ddlog_std::Option::Some{x: ovs::ip_count_cidr_bits(*address) as u8},
> +            false => ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn is_dynamic_lsp_address(address: &String) -> bool {
> +    unsafe {
> +        ovn_c::is_dynamic_lsp_address(string2cstr(address).as_ptr())
> +    }
> +}
> +
> +pub fn split_addresses(addresses: &String) -> ddlog_std::tuple2<ddlog_std::Set<String>, ddlog_std::Set<String>> {
> +    let mut ip4_addrs = ovs_svec::new();
> +    let mut ip6_addrs = ovs_svec::new();
> +    unsafe {
> +        ovn_c::split_addresses(string2cstr(addresses).as_ptr(), &mut ip4_addrs as *mut ovs_svec, &mut ip6_addrs as *mut ovs_svec);
> +        ddlog_std::tuple2(ip4_addrs.into_strings(), ip6_addrs.into_strings())
> +    }
> +}
> +
> +pub fn scan_eth_addr(s: &String) -> ddlog_std::Option<eth_addr> {
> +    let mut ea = eth_addr_zero();
> +    unsafe {
> +        if ovs::ovs_scan(string2cstr(s).as_ptr(), b"%hhx:%hhx:%hhx:%hhx:%hhx:%hhx\0".as_ptr() as *const raw::c_char,
> +                         &mut ea.x[0] as *mut u8, &mut ea.x[1] as *mut u8,
> +                         &mut ea.x[2] as *mut u8, &mut ea.x[3] as *mut u8,
> +                         &mut ea.x[4] as *mut u8, &mut ea.x[5] as *mut u8)
> +        {
> +            ddlog_std::Option::Some{x: ea}
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn scan_eth_addr_prefix(s: &String) -> ddlog_std::Option<u64> {
> +    let mut b2: u8 = 0;
> +    let mut b1: u8 = 0;
> +    let mut b0: u8 = 0;
> +    unsafe {
> +        if ovs::ovs_scan(string2cstr(s).as_ptr(), b"%hhx:%hhx:%hhx\0".as_ptr() as *const raw::c_char,
> +                         &mut b2 as *mut u8, &mut b1 as *mut u8, &mut b0 as *mut u8)
> +        {
> +            ddlog_std::Option::Some{x: ((b2 as u64) << 40) | ((b1 as u64) << 32) | ((b0 as u64) << 24) }
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn scan_static_dynamic_ip(s: &String) -> ddlog_std::Option<in_addr> {
> +    let mut ip0: u8 = 0;
> +    let mut ip1: u8 = 0;
> +    let mut ip2: u8 = 0;
> +    let mut ip3: u8 = 0;
> +    let mut n: raw::c_uint = 0;
> +    unsafe {
> +        if ovs::ovs_scan(string2cstr(s).as_ptr(), b"dynamic %hhu.%hhu.%hhu.%hhu%n\0".as_ptr() as *const raw::c_char,
> +                         &mut ip0 as *mut u8,
> +                         &mut ip1 as *mut u8,
> +                         &mut ip2 as *mut u8,
> +                         &mut ip3 as *mut u8,
> +                         &mut n) && s.len() == (n as usize)
> +        {
> +            ddlog_std::Option::Some{x: ddlog_std::htonl(&(((ip0 as u32) << 24) | ((ip1 as u32) << 16) | ((ip2 as u32) << 8) | (ip3 as u32)))}
> +        } else {
> +            ddlog_std::Option::None
> +        }
> +    }
> +}
> +
> +pub fn ip_address_and_port_from_lb_key(k: &String) ->
> +    ddlog_std::Option<ddlog_std::tuple2<v46_ip, u16>>
> +{
> +    unsafe {
> +        let mut ip_address: *mut raw::c_char = ptr::null_mut();
> +        let mut port: libc::uint16_t = 0;
> +        let mut addr_family: raw::c_int = 0;
> +
> +        ovn_c::ip_address_and_port_from_lb_key(string2cstr(k).as_ptr(), &mut ip_address as *mut *mut raw::c_char,
> +                                               &mut port as *mut libc::uint16_t, &mut addr_family as *mut raw::c_int);
> +        if (ip_address != ptr::null_mut()) {
> +            match (ip46_parse(&cstr2string(ip_address))) {
> +                ddlog_std::Option::Some{x: ip46} => {
> +                    let res = ddlog_std::tuple2(ip46, port as u16);
> +                    free(ip_address as *mut raw::c_void);
> +                    return ddlog_std::Option::Some{x: res}
> +                },
> +                _ => ()
> +            }
> +        }
> +        ddlog_std::Option::None
> +    }
> +}
> +
> +pub fn count_1bits(x: &u64) -> u8 {
> +    x.count_ones() as u8
> +}
> +
> +
> +pub fn str_to_int(s: &String, base: &u16) -> ddlog_std::Option<u64> {
> +    let mut i: raw::c_int = 0;
> +    let ok = unsafe {
> +        ovs::str_to_int(string2cstr(s).as_ptr(), *base as raw::c_int, &mut i as *mut raw::c_int)
> +    };
> +    if ok {
> +        ddlog_std::Option::Some{x: i as u64}
> +    } else {
> +        ddlog_std::Option::None
> +    }
> +}
> +
> +pub fn str_to_uint(s: &String, base: &u16) -> ddlog_std::Option<u64> {
> +    let mut i: raw::c_uint = 0;
> +    let ok = unsafe {
> +        ovs::str_to_uint(string2cstr(s).as_ptr(), *base as raw::c_int, &mut i as *mut raw::c_uint)
> +    };
> +    if ok {
> +        ddlog_std::Option::Some{x: i as u64}
> +    } else
> +    {
> +        ddlog_std::Option::None
> +    }
> +}
> +
> +pub fn inet6_ntop(addr: &in6_addr) -> String {
> +    let mut buf = [0 as i8; INET6_ADDRSTRLEN];
> +    unsafe {
> +        let res = inet_ntop(AF_INET6 as raw::c_int, addr as *const in6_addr as *const raw::c_void,
> +                            &mut buf[0] as *mut raw::c_char, INET6_ADDRSTRLEN as libc::socklen_t);
> +        if res == ptr::null() {
> +            warn(&format!("inet_ntop({:?}) failed", *addr));
> +            "".to_owned()
> +        } else {
> +            cstr2string(&buf as *const raw::c_char)
> +        }
> +    }
> +}
> +
> +/* Internals */
> +
> +unsafe fn cstr2string(s: *const raw::c_char) -> String {
> +    ffi::CStr::from_ptr(s).to_owned().into_string().
> +        unwrap_or_else(|e|{ warn(&format!("cstr2string: {}", e)); "".to_owned() })
> +}
> +
> +fn string2cstr(s: &String) -> ffi::CString {
> +    ffi::CString::new(s.as_str()).unwrap()
> +}
> +
> +/* OVS dynamic string type */
> +#[repr(C)]
> +struct ovs_ds {
> +    s: *mut raw::c_char,       /* Null-terminated string. */
> +    length: libc::size_t,      /* Bytes used, not including null terminator. */
> +    allocated: libc::size_t    /* Bytes allocated, not including null terminator. */
> +}
> +
> +impl ovs_ds {
> +    pub fn new() -> ovs_ds {
> +        ovs_ds{s: ptr::null_mut(), length: 0, allocated: 0}
> +    }
> +
> +    pub unsafe fn into_string(mut self) -> String {
> +        let res = cstr2string(ovs::ds_cstr(&self as *const ovs_ds));
> +        ovs::ds_destroy(&mut self as *mut ovs_ds);
> +        res
> +    }
> +}
> +
> +/* OVS string vector type */
> +#[repr(C)]
> +struct ovs_svec {
> +    names: *mut *mut raw::c_char,
> +    n: libc::size_t,
> +    allocated: libc::size_t
> +}
> +
> +impl ovs_svec {
> +    pub fn new() -> ovs_svec {
> +        ovs_svec{names: ptr::null_mut(), n: 0, allocated: 0}
> +    }
> +
> +    pub unsafe fn into_strings(mut self) -> ddlog_std::Set<String> {
> +        let mut res: ddlog_std::Set<String> = ddlog_std::Set::new();
> +        unsafe {
> +            for i in 0..self.n {
> +                res.insert(cstr2string(*self.names.offset(i as isize)));
> +            }
> +            ovs::svec_destroy(&mut self as *mut ovs_svec);
> +        }
> +        res
> +    }
> +}
> +
> +
> +// ovn/lib/ovn-util.h
> +#[repr(C)]
> +struct ipv4_netaddr_c {
> +    addr: libc::uint32_t,
> +    mask: libc::uint32_t,
> +    network: libc::uint32_t,
> +    plen: raw::c_uint,
> +
> +    addr_s: [raw::c_char; INET_ADDRSTRLEN + 1],    /* "192.168.10.123" */
> +    network_s: [raw::c_char; INET_ADDRSTRLEN + 1], /* "192.168.10.0" */
> +    bcast_s: [raw::c_char; INET_ADDRSTRLEN + 1]    /* "192.168.10.255" */
> +}
> +
> +impl Default for ipv4_netaddr_c {
> +    fn default() -> Self {
> +        ipv4_netaddr_c {
> +            addr: 0,
> +            mask: 0,
> +            network: 0,
> +            plen: 0,
> +            addr_s: [0; INET_ADDRSTRLEN + 1],
> +            network_s: [0; INET_ADDRSTRLEN + 1],
> +            bcast_s: [0; INET_ADDRSTRLEN + 1]
> +        }
> +    }
> +}
> +
> +impl ipv4_netaddr_c {
> +    pub unsafe fn to_ddlog(&self) -> ipv4_netaddr {
> +        ipv4_netaddr{
> +            addr: self.addr,
> +            plen: self.plen,
> +        }
> +    }
> +}
> +
> +#[repr(C)]
> +struct ipv6_netaddr_c {
> +    addr: in6_addr,     /* fc00::1 */
> +    mask: in6_addr,     /* ffff:ffff:ffff:ffff:: */
> +    sn_addr: in6_addr,  /* ff02:1:ff00::1 */
> +    network: in6_addr,  /* fc00:: */
> +    plen: raw::c_uint,  /* CIDR Prefix: 64 */
> +
> +    addr_s: [raw::c_char; INET6_ADDRSTRLEN + 1],    /* "fc00::1" */
> +    sn_addr_s: [raw::c_char; INET6_ADDRSTRLEN + 1], /* "ff02:1:ff00::1" */
> +    network_s: [raw::c_char; INET6_ADDRSTRLEN + 1]  /* "fc00::" */
> +}
> +
> +impl Default for ipv6_netaddr_c {
> +    fn default() -> Self {
> +        ipv6_netaddr_c {
> +            addr: Default::default(),
> +            mask: Default::default(),
> +            sn_addr: Default::default(),
> +            network: Default::default(),
> +            plen: 0,
> +            addr_s: [0; INET6_ADDRSTRLEN + 1],
> +            sn_addr_s: [0; INET6_ADDRSTRLEN + 1],
> +            network_s: [0; INET6_ADDRSTRLEN + 1]
> +        }
> +    }
> +}
> +
> +impl ipv6_netaddr_c {
> +    pub unsafe fn to_ddlog(&self) -> ipv6_netaddr {
> +        ipv6_netaddr{
> +            addr: self.addr.clone(),
> +            plen: self.plen
> +        }
> +    }
> +}
> +
> +
> +// ovn-util.h
> +#[repr(C)]
> +struct lport_addresses_c {
> +    ea_s: [raw::c_char; ETH_ADDR_STRLEN + 1],
> +    ea: eth_addr,
> +    n_ipv4_addrs: libc::size_t,
> +    ipv4_addrs: *mut ipv4_netaddr_c,
> +    n_ipv6_addrs: libc::size_t,
> +    ipv6_addrs: *mut ipv6_netaddr_c
> +}
> +
> +impl Default for lport_addresses_c {
> +    fn default() -> Self {
> +        lport_addresses_c {
> +            ea_s: [0; ETH_ADDR_STRLEN + 1],
> +            ea: Default::default(),
> +            n_ipv4_addrs: 0,
> +            ipv4_addrs: ptr::null_mut(),
> +            n_ipv6_addrs: 0,
> +            ipv6_addrs: ptr::null_mut()
> +        }
> +    }
> +}
> +
> +impl lport_addresses_c {
> +    pub unsafe fn into_ddlog(mut self) -> lport_addresses {
> +        let mut ipv4_addrs = ddlog_std::Vec::with_capacity(self.n_ipv4_addrs);
> +        for i in 0..self.n_ipv4_addrs {
> +            ipv4_addrs.push((&*self.ipv4_addrs.offset(i as isize)).to_ddlog())
> +        }
> +        let mut ipv6_addrs = ddlog_std::Vec::with_capacity(self.n_ipv6_addrs);
> +        for i in 0..self.n_ipv6_addrs {
> +            ipv6_addrs.push((&*self.ipv6_addrs.offset(i as isize)).to_ddlog())
> +        }
> +        let res = lport_addresses {
> +            ea: self.ea.clone(),
> +            ipv4_addrs: ipv4_addrs,
> +            ipv6_addrs: ipv6_addrs
> +        };
> +        ovn_c::destroy_lport_addresses(&mut self as *mut lport_addresses_c);
> +        res
> +    }
> +}
> +
> +/* functions imported from ovn-northd.c */
> +extern "C" {
> +    fn ddlog_warn(msg: *const raw::c_char);
> +    fn ddlog_err(msg: *const raw::c_char);
> +}
> +
> +/* functions imported from libovn */
> +mod ovn_c {
> +    use ::std::os::raw;
> +    use ::libc;
> +    use super::lport_addresses_c;
> +    use super::ovs_svec;
> +    use super::in6_addr;
> +
> +    #[link(name = "ovn")]
> +    extern "C" {
> +        // ovn/lib/ovn-util.h
> +        pub fn extract_lsp_addresses(address: *const raw::c_char, laddrs: *mut lport_addresses_c) -> bool;
> +        pub fn extract_addresses(address: *const raw::c_char, laddrs: *mut lport_addresses_c, ofs: *mut raw::c_int) -> bool;
> +        pub fn extract_lrp_networks__(mac: *const raw::c_char, networks: *const *const raw::c_char,
> +                                      n_networks: libc::size_t, laddrs: *mut lport_addresses_c) -> bool;
> +        pub fn destroy_lport_addresses(addrs: *mut lport_addresses_c);
> +        pub fn is_dynamic_lsp_address(address: *const raw::c_char) -> bool;
> +        pub fn split_addresses(addresses: *const raw::c_char, ip4_addrs: *mut ovs_svec, ipv6_addrs: *mut ovs_svec);
> +        pub fn ip_address_and_port_from_lb_key(key: *const raw::c_char, ip_address: *mut *mut raw::c_char,
> +                                               port: *mut libc::uint16_t, addr_family: *mut raw::c_int);
> +        pub fn ipv6_addr_is_routable_multicast(ip: *const in6_addr) -> bool;
> +    }
> +}
> +
> +mod ovs {
> +    use ::std::os::raw;
> +    use ::libc;
> +    use super::in6_addr;
> +    use super::ovs_be32;
> +    use super::ovs_ds;
> +    use super::eth_addr;
> +    use super::ovs_svec;
> +
> +    /* functions imported from libopenvswitch */
> +    #[link(name = "openvswitch")]
> +    extern "C" {
> +        // lib/packets.h
> +        pub fn ipv6_string_mapped(addr_str: *mut raw::c_char, addr: *const in6_addr) -> *const raw::c_char;
> +        pub fn ipv6_parse_masked(s: *const raw::c_char, ip: *mut in6_addr, mask: *mut in6_addr) -> *mut raw::c_char;
> +        pub fn ipv6_parse_cidr(s: *const raw::c_char, ip: *mut in6_addr, plen: *mut raw::c_uint) -> *mut raw::c_char;
> +        pub fn ipv6_parse(s: *const raw::c_char, ip: *mut in6_addr) -> bool;
> +        pub fn ipv6_mask_is_any(mask: *const in6_addr) -> bool;
> +        pub fn ipv6_count_cidr_bits(mask: *const in6_addr) -> raw::c_int;
> +        pub fn ipv6_is_cidr(mask: *const in6_addr) -> bool;
> +        pub fn ipv6_addr_bitxor(a: *const in6_addr, b: *const in6_addr) -> in6_addr;
> +        pub fn ipv6_addr_bitand(a: *const in6_addr, b: *const in6_addr) -> in6_addr;
> +        pub fn ipv6_create_mask(mask: raw::c_uint) -> in6_addr;
> +        pub fn ipv6_is_zero(a: *const in6_addr) -> bool;
> +        pub fn ipv6_multicast_to_ethernet(eth: *mut eth_addr, ip6: *const in6_addr);
> +        pub fn ip_parse_masked(s: *const raw::c_char, ip: *mut ovs_be32, mask: *mut ovs_be32) -> *mut raw::c_char;
> +        pub fn ip_parse_cidr(s: *const raw::c_char, ip: *mut ovs_be32, plen: *mut raw::c_uint) -> *mut raw::c_char;
> +        pub fn ip_parse(s: *const raw::c_char, ip: *mut ovs_be32) -> bool;
> +        pub fn ip_count_cidr_bits(mask: ovs_be32) -> raw::c_int;
> +        pub fn eth_addr_from_string(s: *const raw::c_char, ea: *mut eth_addr) -> bool;
> +        pub fn eth_addr_to_uint64(ea: eth_addr) -> libc::uint64_t;
> +        pub fn eth_addr_from_uint64(x: libc::uint64_t, ea: *mut eth_addr);
> +        pub fn eth_addr_mark_random(ea: *mut eth_addr);
> +        pub fn in6_generate_eui64(ea: eth_addr, prefix: *const in6_addr, lla: *mut in6_addr);
> +        pub fn in6_generate_lla(ea: eth_addr, lla: *mut in6_addr);
> +        pub fn in6_is_lla(addr: *const in6_addr) -> bool;
> +        pub fn in6_addr_solicited_node(addr: *mut in6_addr, ip6: *const in6_addr);
> +
> +        // include/openvswitch/json.h
> +        pub fn json_string_escape(str: *const raw::c_char, out: *mut ovs_ds);
> +        // openvswitch/dynamic-string.h
> +        pub fn ds_destroy(ds: *mut ovs_ds);
> +        pub fn ds_cstr(ds: *const ovs_ds) -> *const raw::c_char;
> +        pub fn svec_destroy(v: *mut ovs_svec);
> +        pub fn ovs_scan(s: *const raw::c_char, format: *const raw::c_char, ...)
> +            -> bool;
> +        pub fn str_to_int(s: *const raw::c_char, base: raw::c_int, i: *mut raw::c_int) -> bool;
> +        pub fn str_to_uint(s: *const raw::c_char, base: raw::c_int, i: *mut raw::c_uint) -> bool;
> +    }
> +}
> +
> +/* functions imported from libc */
> +#[link(name = "c")]
> +extern "C" {
> +    fn free(ptr: *mut raw::c_void);
> +}
> +
> +/* functions imported from arp/inet6 */
> +extern "C" {
> +    fn inet_ntop(af: raw::c_int, cp: *const raw::c_void,
> +                 buf: *mut raw::c_char, len: libc::socklen_t) -> *const raw::c_char;
> +}
> +
> +/*
> + * Parse IPv4 address list.
> + */
> +
> +named!(parse_spaces<nom::types::CompleteStr, ()>,
> +       do_parse!(many1!(one_of!(&" \t\n\r\x0c\x0b")) >> (()) )
> +);
> +
> +named!(parse_opt_spaces<nom::types::CompleteStr, ()>,
> +       do_parse!(opt!(parse_spaces) >> (()))
> +);
> +
> +named!(parse_ipv4_range<nom::types::CompleteStr, (String, Option<String>)>,
> +       do_parse!(addr1: many_till!(complete!(nom::anychar), alt!(do_parse!(eof!() >> (nom::types::CompleteStr(""))) | peek!(tag!("..")) | tag!(" ") )) >>
> +                 parse_opt_spaces >>
> +                 addr2: opt!(do_parse!(tag!("..") >>
> +                                       parse_opt_spaces >>
> +                                       addr2: many_till!(complete!(nom::anychar), alt!(do_parse!(eof!() >> (' ')) | char!(' ')) ) >>
> +                                       (addr2) )) >>
> +                 parse_opt_spaces >>
> +                 (addr1.0.into_iter().collect(), addr2.map(|x|x.0.into_iter().collect())) )
> +);
> +
> +named!(parse_ipv4_address_list<nom::types::CompleteStr, Vec<(String, Option<String>)>>,
> +       do_parse!(parse_opt_spaces >>
> +                 ranges: many0!(parse_ipv4_range) >>
> +                 (ranges)));
> +
> +pub fn parse_ip_list(ips: &String) -> ddlog_std::Either<String, ddlog_std::Vec<ddlog_std::tuple2<in_addr, ddlog_std::Option<in_addr>>>>
> +{
> +    match parse_ipv4_address_list(nom::types::CompleteStr(ips.as_str())) {
> +        Err(e) => {
> +            ddlog_std::Either::Left{l: format!("invalid IP list format: \"{}\"", ips.as_str())}
> +        },
> +        Ok((nom::types::CompleteStr(""), ranges)) => {
> +            let mut res = vec![];
> +            for (ip1, ip2) in ranges.iter() {
> +                let start
> +                    = match ip_parse(&ip1) {
> +                    ddlog_std::Option::None => return ddlog_std::Either::Left{l: format!("invalid IP address: \"{}\"", *ip1)},
> +                    ddlog_std::Option::Some{x: ip} => ip
> +                };
> +                let end = match ip2 {
> +                    None => ddlog_std::Option::None,
> +                    Some(ip_str) => match ip_parse(&ip_str.clone()) {
> +                        ddlog_std::Option::None => return ddlog_std::Either::Left{l: format!("invalid IP address: \"{}\"", *ip_str)},
> +                        x => x
> +                    }
> +                };
> +                res.push(ddlog_std::tuple2(start, end));
> +            };
> +            ddlog_std::Either::Right{r: ddlog_std::Vec{x: res}}
> +        },
> +        Ok((suffix, _)) => {
> +            ddlog_std::Either::Left{l: format!("IP address list contains trailing characters: \"{}\"", suffix)}
> +        }
> +    }
> +}
> diff --git a/northd/ovn.toml b/northd/ovn.toml
> new file mode 100644
> index 000000000000..64108996edae
> --- /dev/null
> +++ b/northd/ovn.toml
> @@ -0,0 +1,2 @@
> +[dependencies.nom]
> +version = "4.0"
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> new file mode 100644
> index 000000000000..3fbe67b31909
> --- /dev/null
> +++ b/northd/ovn_northd.dl
> @@ -0,0 +1,7500 @@
> +/*
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +import OVN_Northbound as nb
> +import OVN_Southbound as sb
> +import ovsdb
> +import allocate
> +import ovn
> +import lswitch
> +import lrouter
> +import multicast
> +import helpers
> +import ipam
> +
> +output relation Warning[string]
> +
> +index Logical_Flow_Index() on sb::Out_Logical_Flow()
> +
> +/* Meter_Band table */
> +for (mb in nb::Meter_Band) {
> +    sb::Out_Meter_Band(._uuid = mb._uuid,
> +                       .action = mb.action,
> +                       .rate = mb.rate,
> +                       .burst_size = mb.burst_size)
> +}
> +
> +/* Meter table */
> +for (meter in nb::Meter) {
> +    sb::Out_Meter(._uuid = meter._uuid,
> +                  .name = meter.name,
> +                  .unit = meter.unit,
> +                  .bands = meter.bands)
> +}
> +
> +/* Proxy table for Out_Datapath_Binding: contains all Datapath_Binding fields,
> + * except tunnel id, which is allocated separately (see TunKeyAllocation). */
> +relation OutProxy_Datapath_Binding (
> +    _uuid: uuid,
> +    external_ids: Map<string,string>
> +)
> +
> +/* Datapath_Binding table */
> +OutProxy_Datapath_Binding(uuid, external_ids) :-
> +    nb::Logical_Switch(._uuid = uuid, .name = name, .external_ids = ids,
> +                       .other_config = other_config),
> +    var uuid_str = uuid2str(uuid),
> +    var external_ids = {
> +        var eids = ["logical-switch" -> uuid_str, "name" -> name];
> +        match (map_get(ids, "neutron:network_name")) {
> +            None -> (),
> +            Some{nnn} -> map_insert(eids, "name2", nnn)
> +        };
> +        match (map_get(other_config, "interconn-ts")) {
> +            None -> (),
> +            Some{value} -> map_insert(eids, "interconn-ts", value)
> +        };
> +        eids
> +    }.
> +
> +OutProxy_Datapath_Binding(uuid, external_ids) :-
> +    lr in nb::Logical_Router(._uuid = uuid, .name = name, .external_ids = ids),
> +    lr.is_enabled(),
> +    var uuid_str = uuid2str(uuid),
> +    var external_ids = {
> +        var eids = ["logical-router" -> uuid_str, "name" -> name];
> +        match (map_get(ids, "neutron:router_name")) {
> +            None -> (),
> +            Some{nnn} -> map_insert(eids, "name2", nnn)
> +        };
> +        eids
> +    }.
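The two `OutProxy_Datapath_Binding` rules above build `external_ids` the same way: a base map keyed by datapath kind and name, plus an optional Neutron-derived `"name2"` entry. A plain-Rust sketch of that construction (the helper name and signature are illustrative, not part of the patch):

```rust
use std::collections::BTreeMap;

/// Illustrative sketch (not patch code): build an external_ids map the
/// way the OutProxy_Datapath_Binding rules do. `kind` is
/// "logical-switch" or "logical-router"; when the northbound row's
/// external_ids carry the given Neutron key, its value is copied to
/// "name2".
pub fn datapath_external_ids(
    kind: &str,
    uuid: &str,
    name: &str,
    nb_ids: &BTreeMap<String, String>,
    neutron_key: &str,
) -> BTreeMap<String, String> {
    let mut eids = BTreeMap::new();
    eids.insert(kind.to_string(), uuid.to_string());
    eids.insert("name".to_string(), name.to_string());
    if let Some(n) = nb_ids.get(neutron_key) {
        eids.insert("name2".to_string(), n.clone());
    }
    eids
}
```

The DDlog rules do the same thing declaratively, with the map expression evaluated once per northbound row.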
> +
> +sb::Out_Datapath_Binding(uuid, tunkey, external_ids) :-
> +    OutProxy_Datapath_Binding(uuid, external_ids),
> +    TunKeyAllocation(uuid, tunkey).
> +
> +
> +/* Proxy table for Out_Port_Binding: contains all Port_Binding fields,
> + * except tunnel id, which is allocated separately (see PortTunKeyAllocation). */
> +relation OutProxy_Port_Binding (
> +    _uuid: uuid,
> +    logical_port: string,
> +    __type: string,
> +    gateway_chassis: Set<uuid>,
> +    ha_chassis_group: Option<uuid>,
> +    options: Map<string,string>,
> +    datapath: uuid,
> +    parent_port: Option<string>,
> +    tag: Option<integer>,
> +    mac: Set<string>,
> +    nat_addresses: Set<string>,
> +    external_ids: Map<string,string>
> +)
> +
> +/* Case 1: Create a Port_Binding per logical switch port that is not of type "router" */
> +OutProxy_Port_Binding(._uuid = lsp._uuid,
> +                      .logical_port = lsp.name,
> +                      .__type = lsp.__type,
> +                      .gateway_chassis = set_empty(),
> +                      .ha_chassis_group = sp.hac_group_uuid,
> +                      .options = lsp.options,
> +                      .datapath = sw.ls._uuid,
> +                      .parent_port = lsp.parent_name,
> +                      .tag = tag,
> +                      .mac = lsp.addresses,
> +                      .nat_addresses = set_empty(),
> +                      .external_ids = eids) :-
> +    sp in &SwitchPort(.lsp = lsp, .sw = &sw),
> +    SwitchPortNewDynamicTag(lsp._uuid, opt_tag),
> +    var tag = match (opt_tag) {
> +        None -> lsp.tag,
> +        Some{t} -> Some{t}
> +    },
> +    lsp.__type != "router",
> +    var eids = {
> +        var eids = lsp.external_ids;
> +        match (map_get(lsp.external_ids, "neutron:port_name")) {
> +            None -> (),
> +            Some{name} -> map_insert(eids, "name", name)
> +        };
> +        eids
> +    }.
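The tag selection in Case 1 prefers a freshly allocated dynamic tag and falls back to the statically configured one. In plain Rust terms this is just `Option::or` (illustrative sketch, not patch code):

```rust
/// Illustrative sketch (not patch code): mirror of the `match (opt_tag)`
/// expression in Case 1 above. A dynamic tag from
/// SwitchPortNewDynamicTag wins; otherwise the port keeps its static
/// `lsp.tag`, which may itself be absent.
pub fn effective_tag(dynamic_tag: Option<i64>, static_tag: Option<i64>) -> Option<i64> {
    // match: Some{t} -> Some{t}, None -> lsp.tag
    dynamic_tag.or(static_tag)
}
```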
> +
> +
> +/* Case 2: Create a Port_Binding per logical switch port of type "router" */
> +OutProxy_Port_Binding(._uuid = lsp._uuid,
> +                      .logical_port = lsp.name,
> +                      .__type = __type,
> +                      .gateway_chassis = set_empty(),
> +                      .ha_chassis_group = None,
> +                      .options = options,
> +                      .datapath = sw.ls._uuid,
> +                      .parent_port = lsp.parent_name,
> +                      .tag = None,
> +                      .mac = lsp.addresses,
> +                      .nat_addresses = nat_addresses,
> +                      .external_ids = eids) :-
> +    &SwitchPort(.lsp = lsp, .sw = &sw, .peer = peer),
> +    var eids = {
> +        var eids = lsp.external_ids;
> +        match (map_get(lsp.external_ids, "neutron:port_name")) {
> +            None -> (),
> +            Some{name} -> map_insert(eids, "name", name)
> +        };
> +        eids
> +    },
> +    Some{var router_port} = map_get(lsp.options, "router-port"),
> +    var opt_chassis = match (peer) {
> +        Some{rport} -> map_get(rport.router.lr.options, "chassis"),
> +        None -> None
> +    },
> +    var l3dgw_port = match (peer) {
> +        Some{rport} -> rport.router.l3dgw_port,
> +        None -> None
> +    },
> +    (var __type, var options) = {
> +        var options = ["peer" -> router_port];
> +        match (opt_chassis) {
> +            None -> {
> +                ("patch", options)
> +            },
> +            Some{chassis} -> {
> +                map_insert(options, "l3gateway-chassis", chassis);
> +                ("l3gateway", options)
> +            }
> +        }
> +    },
> +    var base_nat_addresses = {
> +        match (map_get(lsp.options, "nat-addresses")) {
> +            None -> { set_empty() },
> +            Some{"router"} -> match ((l3dgw_port, opt_chassis, peer)) {
> +                (None, None, _) -> set_empty(),
> +                (_, _, None) -> set_empty(),
> +                (_, _, Some{rport}) -> get_nat_addresses(deref(rport))
> +            },
> +            Some{nat_addresses} -> {
> +                /* Only accept manual specification of ethernet address
> +                 * followed by IPv4 addresses on type "l3gateway" ports.
> +                 */
> +                if (is_some(opt_chassis)) {
> +                    match (extract_lsp_addresses(nat_addresses)) {
> +                        None -> {
> +                            warn("Error extracting nat-addresses.");
> +                            set_empty()
> +                        },
> +                        Some{_} -> { set_singleton(nat_addresses) }
> +                    }
> +                } else { set_empty() }
> +            }
> +        }
> +    },
> +    /* Add the router mac and IPv4 addresses to
> +     * Port_Binding.nat_addresses so that GARP is sent for these
> +     * IPs by the ovn-controller on which the distributed gateway
> +     * router port resides if:
> +     *
> +     * 1. The peer has 'reside-on-redirect-chassis' set and the
> +     *    logical router datapath has a distributed router port.
> +     *
> +     * 2. The peer is a distributed gateway router port.
> +     *
> +     * 3. The peer's router is a gateway router and the port has a localnet
> +     *    port.
> +     *
> +     * Note: The Port_Binding.nat_addresses column is also used for
> +     * sending the GARPs for the router port IPs.
> +     */
> +    var garp_nat_addresses = match (peer) {
> +        Some{rport} -> match (
> +            (map_get_bool_def(rport.lrp.options, "reside-on-redirect-chassis",
> +                              false)
> +             and is_some(l3dgw_port)) or
> +            Some{rport.lrp} == l3dgw_port or
> +            (is_some(map_get(rport.router.lr.options, "chassis")) and
> +             not sw.localnet_port_names.is_empty())) {
> +            false -> set_empty(),
> +            true -> set_singleton(get_garp_nat_addresses(deref(rport)))
> +        },
> +        None -> set_empty()
> +    },
> +    var nat_addresses = set_union(base_nat_addresses, garp_nat_addresses).
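The type/options decision in Case 2 hinges on whether the peered router is pinned to a chassis. A plain-Rust sketch of that branch (illustrative, not patch code):

```rust
use std::collections::BTreeMap;

/// Illustrative sketch (not patch code): mirror of the
/// `(var __type, var options)` block in Case 2 above. Every "router"
/// switch port records its peer router port under "peer"; when the
/// peered router has an options:chassis, the binding becomes type
/// "l3gateway" and additionally records that chassis.
pub fn port_type_and_options(
    router_port: &str,
    router_chassis: Option<&str>,
) -> (&'static str, BTreeMap<String, String>) {
    let mut options = BTreeMap::new();
    options.insert("peer".to_string(), router_port.to_string());
    match router_chassis {
        None => ("patch", options),
        Some(chassis) => {
            options.insert("l3gateway-chassis".to_string(), chassis.to_string());
            ("l3gateway", options)
        }
    }
}
```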
> +
> +/* Case 3: Port_Binding per logical router port */
> +OutProxy_Port_Binding(._uuid = lrp._uuid,
> +                      .logical_port = lrp.name,
> +                      .__type = __type,
> +                      .gateway_chassis = set_empty(),
> +                      .ha_chassis_group = None,
> +                      .options = options,
> +                      .datapath = router.lr._uuid,
> +                      .parent_port = None,
> +                      .tag = None, // always empty for router ports
> +                      .mac = set_singleton("${lrp.mac} ${lrp.networks.join(\" \")}"),
> +                      .nat_addresses = set_empty(),
> +                      .external_ids = lrp.external_ids) :-
> +    rp in &RouterPort(.lrp = lrp, .router = &router, .peer = peer),
> +    RouterPortRAOptionsComplete(lrp._uuid, options0),
> +    (var __type, var options1) = match (map_get(router.lr.options, "chassis")) {
> +        /* TODO: derived ports */
> +        None -> ("patch", map_empty()),
> +        Some{lrchassis} -> ("l3gateway", ["l3gateway-chassis" -> lrchassis])
> +    },
> +    var options2 = match (router_peer_name(peer)) {
> +        None -> map_empty(),
> +        Some{peer_name} -> ["peer" -> peer_name]
> +    },
> +    var options3 = match ((peer, vec_is_empty(rp.networks.ipv6_addrs))) {
> +        (PeerSwitch{_, _}, false) -> {
> +            var enabled = lrp.is_enabled();
> +            var pd = map_get_bool_def(lrp.options, "prefix_delegation", false);
> +            var p = map_get_bool_def(lrp.options, "prefix", false);
> +            ["ipv6_prefix_delegation" -> "${pd and enabled}",
> +             "ipv6_prefix" -> "${p and enabled}"]
> +        },
> +        _ -> map_empty()
> +    },
> +    PreserveIPv6RAPDList(lrp._uuid, ipv6_ra_pd_list),
> +    var options4 = match (ipv6_ra_pd_list) {
> +        None -> map_empty(),
> +        Some{value} -> ["ipv6_ra_pd_list" -> value]
> +    },
> +    var options = map_union(options0,
> +                            map_union(options1,
> +                                      map_union(options2,
> +                                                map_union(options3, options4)))),
> +    var eids = {
> +        var eids = lrp.external_ids;
> +        match (map_get(lrp.external_ids, "neutron:port_name")) {
> +            None -> (),
> +            Some{name} -> map_insert(eids, "name", name)
> +        };
> +        eids
> +    }.
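Case 3 assembles the port's options by folding several independently computed fragments (RA options, gateway chassis, peer name, IPv6 prefix flags, preserved `ipv6_ra_pd_list`) with nested `map_union` calls. A plain-Rust sketch of that layered merge (illustrative, not patch code; the fragments in the rule use disjoint keys, so merge precedence does not come into play there):

```rust
use std::collections::BTreeMap;

/// Illustrative sketch (not patch code): fold a list of option-map
/// fragments into one map, the way the nested map_union calls in
/// Case 3 above do. With BTreeMap::extend, later fragments would win
/// on duplicate keys; the rule's fragments have disjoint keys.
pub fn merge_options(fragments: Vec<BTreeMap<String, String>>) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for frag in fragments {
        merged.extend(frag);
    }
    merged
}
```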
> +/* Returns the sets of IPv4 and IPv6 addresses used by load balancer > + * VIPs configured on 'router'. */ > +function get_router_load_balancer_ips(router: Router) : > + (Set<string>, Set<string>) = > +{ > + var all_ips_v4 = set_empty(); > + var all_ips_v6 = set_empty(); > + for (lb in router.lbs) { > + for (kv in deref(lb).vips) { > + (var vip, _) = kv; > + /* 'vip' contains IP:port or just IP. */ > + match (ip_address_and_port_from_lb_key(vip)) { > + None -> (), > + Some{(IPv4{ipv4}, _)} -> set_insert(all_ips_v4, "${ipv4}"), > + Some{(IPv6{ipv6}, _)} -> set_insert(all_ips_v6, "${ipv6}") > + } > + } > + }; > + (all_ips_v4, all_ips_v6) > +} > + > +/* Returns a set of strings, each consisting of a MAC address followed > + * by one or more IP addresses, and if the port is a distributed gateway > + * port, followed by 'is_chassis_resident("LPORT_NAME")', where the > + * LPORT_NAME is the name of the L3 redirect port or the name of the > + * logical_port specified in a NAT rule. These strings include the > + * external IP addresses of all NAT rules defined on that router, and all > + * of the IP addresses used in load balancer VIPs defined on that router. > + */ > +function get_nat_addresses(rport: RouterPort): Set<string> = > +{ > + var addresses = set_empty(); > + var router = deref(rport.router); > + var has_redirect = is_some(router.l3dgw_port); > + match (eth_addr_from_string(rport.lrp.mac)) { > + None -> addresses, > + Some{mac} -> { > + var c_addresses = "${mac}"; > + var central_ip_address = false; > + > + /* Get NAT IP addresses. */ > + for (nat in router.nats) { > + /* Determine whether this NAT rule satisfies the conditions for > + * distributed NAT processing. */ > + if (has_redirect and nat.nat.__type == "dnat_and_snat" and > + is_some(nat.nat.logical_port) and is_some(nat.external_mac)) { > + /* Distributed NAT rule. 
*/ > + var logical_port = option_unwrap_or_default(nat.nat.logical_port); > + var external_mac = option_unwrap_or_default(nat.external_mac); > + set_insert(addresses, > + "${external_mac} ${nat.external_ip} " > + "is_chassis_resident(${json_string_escape(logical_port)})") > + } else { > + /* Centralized NAT rule, either on gateway router or distributed > + * router. > + * Check if external_ip is same as router ip. If so, then there > + * is no need to add this to the nat_addresses. The router IPs > + * will be added separately. */ > + var is_router_ip = false; > + match (nat.external_ip) { > + IPv4{ei} -> { > + for (ipv4 in rport.networks.ipv4_addrs) { > + if (ei == ipv4.addr) { > + is_router_ip = true; > + break > + } > + } > + }, > + IPv6{ei} -> { > + for (ipv6 in rport.networks.ipv6_addrs) { > + if (ei == ipv6.addr) { > + is_router_ip = true; > + break > + } > + } > + } > + }; > + if (not is_router_ip) { > + c_addresses = c_addresses ++ " ${nat.external_ip}"; > + central_ip_address = true > + } > + } > + }; > + > + /* A set to hold all load-balancer vips. */ > + (var all_ips_v4, var all_ips_v6) = get_router_load_balancer_ips(router); > + > + for (ip_address in set_union(all_ips_v4, all_ips_v6)) { > + c_addresses = c_addresses ++ " ${ip_address}"; > + central_ip_address = true > + }; > + > + if (central_ip_address) { > + /* Gratuitous ARP for centralized NAT rules on distributed gateway > + * ports should be restricted to the gateway chassis. 
*/ > + if (has_redirect) { > + c_addresses = c_addresses ++ " is_chassis_resident(${router.redirect_port_name})" > + } else (); > + > + set_insert(addresses, c_addresses) > + } else (); > + addresses > + } > + } > +} > + > +function get_garp_nat_addresses(rport: RouterPort): string = { > + var garp_info = ["${rport.networks.ea}"]; > + for (ipv4_addr in rport.networks.ipv4_addrs) { > + vec_push(garp_info, "${ipv4_addr.addr}") > + }; > + if (rport.router.redirect_port_name != "") { > + vec_push(garp_info, > + "is_chassis_resident(${rport.router.redirect_port_name})") > + }; > + string_join(garp_info, " ") > +} > + > +/* Extra options computed for router ports by the logical flow generation code */ > +relation RouterPortRAOptions(lrp: uuid, options: Map<string, string>) > + > +relation RouterPortRAOptionsComplete(lrp: uuid, options: Map<string, string>) > + > +RouterPortRAOptionsComplete(lrp, options) :- > + RouterPortRAOptions(lrp, options). > +RouterPortRAOptionsComplete(lrp, map_empty()) :- > + nb::Logical_Router_Port(._uuid = lrp), > + not RouterPortRAOptions(lrp, _). > + > + > +/* > + * Create derived port for Logical_Router_Ports with non-empty 'gateway_chassis' column. > + */ > + > +/* Create derived ports */ > +OutProxy_Port_Binding(// lrp._uuid is already in use; generate a new UUID by > + // hashing it. 
> + ._uuid = hash128(lrp._uuid), > + .logical_port = chassis_redirect_name(lrp.name), > + .__type = "chassisredirect", > + .gateway_chassis = set_empty(), > + .ha_chassis_group = Some{hacg_uuid}, > + .options = options, > + .datapath = lr_uuid, > + .parent_port = None, > + .tag = None, //always empty for router ports > + .mac = set_singleton("${lrp.mac} ${lrp.networks.join(\" \")}"), > + .nat_addresses = set_empty(), > + .external_ids = lrp.external_ids) :- > + DistributedGatewayPort(lrp, lr_uuid), > + LogicalRouterHAChassisGroup(lr_uuid, hacg_uuid), > + var redirect_type = match (map_get(lrp.options, "redirect-type")) { > + Some{var value} -> ["redirect-type" -> value], > + _ -> map_empty() > + }, > + var options = map_insert_imm(redirect_type, "distributed-port", lrp.name). > + > + > +/* Add allocated qdisc_queue_id and tunnel key to Port_Binding. > + */ > +sb::Out_Port_Binding(._uuid = pbinding._uuid, > + .logical_port = pbinding.logical_port, > + .__type = pbinding.__type, > + .gateway_chassis = pbinding.gateway_chassis, > + .ha_chassis_group = pbinding.ha_chassis_group, > + .options = options0, > + .datapath = pbinding.datapath, > + .tunnel_key = tunkey, > + .parent_port = pbinding.parent_port, > + .tag = pbinding.tag, > + .mac = pbinding.mac, > + .nat_addresses = pbinding.nat_addresses, > + .external_ids = pbinding.external_ids) :- > + pbinding in OutProxy_Port_Binding(), > + PortTunKeyAllocation(pbinding._uuid, tunkey), > + QueueIDAllocation(pbinding._uuid, qid), > + var options0 = match (qid) { > + None -> pbinding.options, > + Some{id} -> map_insert_imm(pbinding.options, "qdisc_queue_id", "${id}") > + }. > + > +/* Referenced chassis. > + * > + * These tables track the sb::Chassis that a packet that traverses logical > + * router 'lr_uuid' can end up at (or start from). This is used for > + * sb::Out_HA_Chassis_Group's ref_chassis column. > + * > + * RefChassisSet0 has a row for each logical router that actually references a > + * chassis. 
RefChassisSet has a row for every logical router. */ > +relation RefChassis(lr_uuid: uuid, chassis_uuid: uuid) > +RefChassis(lr_uuid, chassis_uuid) :- > + ReachableLogicalRouter(lr_uuid, lr2_uuid), > + FirstHopLogicalRouter(lr2_uuid, ls_uuid), > + LogicalSwitchPort(lsp_uuid, ls_uuid), > + nb::Logical_Switch_Port(._uuid = lsp_uuid, .name = lsp_name), > + sb::Port_Binding(.logical_port = lsp_name, .chassis = chassis_uuids), > + Some{var chassis_uuid} = chassis_uuids. > +relation RefChassisSet0(lr_uuid: uuid, chassis_uuids: Set<uuid>) > +RefChassisSet0(lr_uuid, chassis_uuids) :- > + RefChassis(lr_uuid, chassis_uuid), > + var chassis_uuids = chassis_uuid.group_by(lr_uuid).to_set(). > +relation RefChassisSet(lr_uuid: uuid, chassis_uuids: Set<uuid>) > +RefChassisSet(lr_uuid, chassis_uuids) :- > + RefChassisSet0(lr_uuid, chassis_uuids). > +RefChassisSet(lr_uuid, set_empty()) :- > + nb::Logical_Router(._uuid = lr_uuid), > + not RefChassisSet0(lr_uuid, _). > + > +/* Referenced chassis for an HA chassis group. > + * > + * Multiple logical routers can reference an HA chassis group so we merge the > + * referenced chassis across all of them. > + */ > +relation HAChassisGroupRefChassisSet(hacg_uuid: uuid, > + chassis_uuids: Set<uuid>) > +HAChassisGroupRefChassisSet(hacg_uuid, chassis_uuids) :- > + LogicalRouterHAChassisGroup(lr_uuid, hacg_uuid), > + RefChassisSet(lr_uuid, chassis_uuids), > + var chassis_uuids = chassis_uuids.group_by(hacg_uuid).union(). > + > +/* HA_Chassis_Group and HA_Chassis. */ > +sb::Out_HA_Chassis_Group(hacg_uuid, hacg_name, ha_chassis, ref_chassis, eids) :- > + HAChassis(hacg_uuid, hac_uuid, chassis_name, _, _), > + var chassis_uuid = ha_chassis_uuid(chassis_name, hac_uuid), > + var ha_chassis = chassis_uuid.group_by(hacg_uuid).to_set(), > + HAChassisGroup(hacg_uuid, hacg_name, eids), > + HAChassisGroupRefChassisSet(hacg_uuid, ref_chassis). 
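The HAChassisGroupRefChassisSet rule above unions the per-router referenced-chassis sets across all logical routers that share an HA chassis group. A rough Python model of that group_by/union aggregation (data shapes are made up for illustration):

```python
def merge_ref_chassis(lr_to_hacg, lr_to_chassis):
    """Union per-logical-router chassis sets by HA chassis group.

    lr_to_hacg:    logical router -> HA chassis group
    lr_to_chassis: logical router -> set of referenced chassis
    """
    merged = {}
    for lr, hacg in lr_to_hacg.items():
        merged.setdefault(hacg, set()).update(lr_to_chassis.get(lr, set()))
    return merged
```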
> + > +sb::Out_HA_Chassis(ha_chassis_uuid(chassis_name, hac_uuid), chassis, priority, eids) :- > + HAChassis(_, hac_uuid, chassis_name, priority, eids), > + chassis_rec in sb::Chassis(.name = chassis_name), > + var chassis = Some{chassis_rec._uuid}. > +sb::Out_HA_Chassis(ha_chassis_uuid(chassis_name, hac_uuid), None, priority, eids) :- > + HAChassis(_, hac_uuid, chassis_name, priority, eids), > + not chassis_rec in sb::Chassis(.name = chassis_name). > + > +relation HAChassisToChassis(name: string, chassis: Option<uuid>) > +HAChassisToChassis(name, Some{chassis}) :- > + sb::Chassis(._uuid = chassis, .name = name). > +HAChassisToChassis(name, None) :- > + nb::HA_Chassis(.chassis_name = name), > + not sb::Chassis(.name = name). > +sb::Out_HA_Chassis(ha_chassis_uuid(ha_chassis.chassis_name, hac_uuid), chassis, priority, eids) :- > + sp in &SwitchPort(), > + sp.lsp.__type == "external", > + Some{var ha_chassis_group_uuid} = sp.lsp.ha_chassis_group, > + ha_chassis_group in nb::HA_Chassis_Group(._uuid = ha_chassis_group_uuid), > + var hac_uuid = FlatMap(ha_chassis_group.ha_chassis), > + ha_chassis in nb::HA_Chassis(._uuid = hac_uuid, .priority = priority, .external_ids = eids), > + HAChassisToChassis(ha_chassis.chassis_name, chassis). > +sb::Out_HA_Chassis_Group(_uuid, name, ha_chassis, set_empty() /* XXX? */, eids) :- > + sp in &SwitchPort(), > + sp.lsp.__type == "external", > + var ls_uuid = sp.sw.ls._uuid, > + Some{var ha_chassis_group_uuid} = sp.lsp.ha_chassis_group, > + ha_chassis_group in nb::HA_Chassis_Group(._uuid = ha_chassis_group_uuid, .name = name, > + .external_ids = eids), > + var hac_uuid = FlatMap(ha_chassis_group.ha_chassis), > + ha_chassis in nb::HA_Chassis(._uuid = hac_uuid), > + var ha_chassis_uuid_name = ha_chassis_uuid(ha_chassis.chassis_name, hac_uuid), > + var ha_chassis = ha_chassis_uuid_name.group_by((ls_uuid, name, eids)).to_set(), > + var _uuid = ha_chassis_group_uuid(ls_uuid). > + > +/* > + * SB_Global: copy nb_cfg and options from NB. 
> + * If NB_Global does not exist yet, just keep the current value of SB_Global, > + * if any. > + */ > +for (nb_global in nb::NB_Global) { > + sb::Out_SB_Global(._uuid = nb_global._uuid, > + .nb_cfg = nb_global.nb_cfg, > + .options = nb_global.options, > + .ipsec = nb_global.ipsec) > +} > + > +sb::Out_SB_Global(._uuid = sb_global._uuid, > + .nb_cfg = sb_global.nb_cfg, > + .options = sb_global.options, > + .ipsec = sb_global.ipsec) :- > + sb_global in sb::SB_Global(), > + not nb::NB_Global(). > + > +/* sb::Chassis_Private joined with is_remote from sb::Chassis, > + * including a record even for a null Chassis ref. */ > +relation ChassisPrivate( > + cp: sb::Chassis_Private, > + is_remote: bool) > +ChassisPrivate(cp, map_get_bool_def(c.other_config, "is-remote", false)) :- > + cp in sb::Chassis_Private(.chassis = Some{uuid}), > + c in sb::Chassis(._uuid = uuid). > +ChassisPrivate(cp, false), > +Warning["Chassis does not exist for Chassis_Private record, name: ${cp.name}"] :- > + cp in sb::Chassis_Private(.chassis = Some{uuid}), > + not sb::Chassis(._uuid = uuid). > +ChassisPrivate(cp, false), > +Warning["Chassis does not exist for Chassis_Private record, name: ${cp.name}"] :- > + cp in sb::Chassis_Private(.chassis = None). > + > +/* Track minimum hv_cfg across all the (non-remote) chassis. */ > +relation HvCfg0(hv_cfg: integer) > +HvCfg0(hv_cfg) :- > + ChassisPrivate(.cp = sb::Chassis_Private{.nb_cfg = chassis_cfg}, .is_remote = false), > + var hv_cfg = chassis_cfg.group_by(()).min(). > +relation HvCfg(hv_cfg: integer) > +HvCfg(hv_cfg) :- HvCfg0(hv_cfg). > +HvCfg(hv_cfg) :- > + nb::NB_Global(.nb_cfg = hv_cfg), > + not HvCfg0(). > + > +/* Track maximum nb_cfg_timestamp among all the (non-remote) chassis > + * that have the minimum nb_cfg. 
*/ > +relation HvCfgTimestamp0(hv_cfg_timestamp: integer) > +HvCfgTimestamp0(hv_cfg_timestamp) :- > + HvCfg(hv_cfg), > + ChassisPrivate(.cp = sb::Chassis_Private{.nb_cfg = hv_cfg, > + .nb_cfg_timestamp = chassis_cfg_timestamp}, > + .is_remote = false), > + var hv_cfg_timestamp = chassis_cfg_timestamp.group_by(()).max(). > +relation HvCfgTimestamp(hv_cfg_timestamp: integer) > +HvCfgTimestamp(hv_cfg_timestamp) :- HvCfgTimestamp0(hv_cfg_timestamp). > +HvCfgTimestamp(hv_cfg_timestamp) :- > + nb::NB_Global(.hv_cfg_timestamp = hv_cfg_timestamp), > + not HvCfgTimestamp0(). > + > +/* > + * NB_Global: > + * - set `sb_cfg` to the value of `SB_Global.nb_cfg`. > + * - set `hv_cfg` to the smallest value of `nb_cfg` across all `Chassis`. > + * - FIXME: we use ipsec as unique key to make sure that we don't create multiple `NB_Global` > + * instances. There is a potential race condition if this field is modified at the same > + * time northd is updating `sb_cfg` or `hv_cfg`. > + */ > +input relation NbCfgTimestamp[integer] > +nb::Out_NB_Global(._uuid = _uuid, > + .sb_cfg = sb_cfg, > + .hv_cfg = hv_cfg, > + .nb_cfg_timestamp = nb_cfg_timestamp, > + .hv_cfg_timestamp = hv_cfg_timestamp, > + .ipsec = ipsec, > + .options = options) :- > + NbCfgTimestamp[nb_cfg_timestamp], > + HvCfgTimestamp(hv_cfg_timestamp), > + nbg in nb::NB_Global(._uuid = _uuid, .ipsec = ipsec), > + sb::SB_Global(.nb_cfg = sb_cfg), > + HvCfg(hv_cfg), > + MacPrefix(mac_prefix), > + SvcMonitorMac(svc_monitor_mac), > + OvnMaxDpKeyLocal[max_tunid], > + var options0 = put_mac_prefix(nbg.options, mac_prefix), > + var options1 = put_svc_monitor_mac(options0, svc_monitor_mac), > + var options = map_insert_imm(options1, "max_tunid", "${max_tunid}"). 
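The HvCfg/HvCfgTimestamp relations above compute, over non-remote chassis, the minimum nb_cfg and the maximum nb_cfg_timestamp among chassis at that minimum. A small Python sketch of the same aggregation (the tuple record shape here is hypothetical):

```python
def hv_cfg_and_timestamp(chassis):
    """chassis: list of (nb_cfg, nb_cfg_timestamp, is_remote) tuples.

    Returns (min nb_cfg over non-remote chassis,
             max timestamp among chassis at that minimum).
    """
    local = [(cfg, ts) for cfg, ts, remote in chassis if not remote]
    if not local:
        return None, None
    hv_cfg = min(cfg for cfg, _ in local)
    hv_ts = max(ts for cfg, ts in local if cfg == hv_cfg)
    return hv_cfg, hv_ts
```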
> + > + > +/* SB_Global does not exist yet -- just keep the old value of NB_Global */ > +nb::Out_NB_Global(._uuid = nbg._uuid, > + .sb_cfg = nbg.sb_cfg, > + .hv_cfg = nbg.hv_cfg, > + .ipsec = nbg.ipsec, > + .options = nbg.options, > + .nb_cfg_timestamp = nb_cfg_timestamp, > + .hv_cfg_timestamp = hv_cfg_timestamp) :- > + NbCfgTimestamp[nb_cfg_timestamp], > + HvCfgTimestamp(hv_cfg_timestamp), > + nbg in nb::NB_Global(), > + not sb::SB_Global(). > + > +output relation SbCfg[integer] > +SbCfg[sb_cfg] :- nb::Out_NB_Global(.sb_cfg = sb_cfg). > + > +output relation Northd_Probe_Interval[integer] > +Northd_Probe_Interval[interval] :- > + nb in nb::NB_Global(), > + var interval = map_get_int_def(nb.options, "northd_probe_interval", 0). > + > +relation CheckLspIsUp[bool] > +CheckLspIsUp[check_lsp_is_up] :- > + nb in nb::NB_Global(), > + var check_lsp_is_up = not map_get_bool_def(nb.options, "ignore_lsp_down", false). > +CheckLspIsUp[true] :- > + Unit(), > + not nb in nb::NB_Global(). > + > +/* > + * Address_Set: copy from NB + additional records generated from NB Port_Group (two records for each > + * Port_Group for IPv4 and IPv6 addresses). > + * > + * There can be name collisions between the two types of Address_Set records. User-defined records > + * take precedence. > + */ > +sb::Out_Address_Set(._uuid = nb_as._uuid, > + .name = nb_as.name, > + .addresses = nb_as.addresses) :- > + AddressSetRef[nb_as]. > + > +sb::Out_Address_Set(._uuid = hash128("svc_monitor_mac"), > + .name = "svc_monitor_mac", > + .addresses = set_singleton("${svc_monitor_mac}")) :- > + SvcMonitorMac(svc_monitor_mac). 
> + > +sb::Out_Address_Set(hash128(as_name), as_name, set_unions(pg_ip4addrs)) :- > + nb::Port_Group(.ports = pg_ports, .name = pg_name), > + var as_name = pg_name ++ "_ip4", > + // avoid name collisions with user-defined Address_Sets > + not nb::Address_Set(.name = as_name), > + var port_uuid = FlatMap(pg_ports), > + PortStaticAddresses(.lsport = port_uuid, .ip4addrs = stat), > + SwitchPortNewDynamicAddress(&SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = port_uuid}}, > + dyn_addr), > + var dynamic = match (dyn_addr) { > + None -> set_empty(), > + Some{lpaddress} -> match (vec_nth(lpaddress.ipv4_addrs, 0)) { > + None -> set_empty(), > + Some{addr} -> set_singleton("${addr.addr}") > + } > + }, > + //PortDynamicAddresses(.lsport = port_uuid, .ip4addrs = dynamic), > + var port_ip4addrs = set_union(stat, dynamic), > + var pg_ip4addrs = port_ip4addrs.group_by(as_name).to_vec(). > + > +sb::Out_Address_Set(hash128(as_name), as_name, set_empty()) :- > + nb::Port_Group(.ports = set_empty(), .name = pg_name), > + var as_name = pg_name ++ "_ip4", > + // avoid name collisions with user-defined Address_Sets > + not nb::Address_Set(.name = as_name). 
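The rules above derive an Address_Set named "<port-group>_ip4" from each member port's static addresses plus at most the first dynamic IPv4 address. A rough Python model of that collection step (the data shapes are invented for illustration, not the DDlog relations):

```python
def port_group_ipv4_address_set(pg_name, static_by_port, dynamic_by_port):
    """Union static IPv4 addresses with each port's first dynamic address."""
    as_name = pg_name + "_ip4"
    addresses = set()
    for port, static in static_by_port.items():
        addresses |= set(static)
        dynamic = dynamic_by_port.get(port) or []
        if dynamic:                     # only the first dynamic address
            addresses.add(dynamic[0])
    return as_name, addresses
```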
> + > +sb::Out_Address_Set(hash128(as_name), as_name, set_unions(pg_ip6addrs)) :- > + nb::Port_Group(.ports = pg_ports, .name = pg_name), > + var as_name = pg_name ++ "_ip6", > + // avoid name collisions with user-defined Address_Sets > + not nb::Address_Set(.name = as_name), > + var port_uuid = FlatMap(pg_ports), > + PortStaticAddresses(.lsport = port_uuid, .ip6addrs = stat), > + SwitchPortNewDynamicAddress(&SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = port_uuid}}, > + dyn_addr), > + var dynamic = match (dyn_addr) { > + None -> set_empty(), > + Some{lpaddress} -> match (vec_nth(lpaddress.ipv6_addrs, 0)) { > + None -> set_empty(), > + Some{addr} -> set_singleton("${addr.addr}") > + } > + }, > + //PortDynamicAddresses(.lsport = port_uuid, .ip6addrs = dynamic), > + var port_ip6addrs = set_union(stat, dynamic), > + var pg_ip6addrs = port_ip6addrs.group_by(as_name).to_vec(). > + > +sb::Out_Address_Set(hash128(as_name), as_name, set_empty()) :- > + nb::Port_Group(.ports = set_empty(), .name = pg_name), > + var as_name = pg_name ++ "_ip6", > + // avoid name collisions with user-defined Address_Sets > + not nb::Address_Set(.name = as_name). > + > +/* > + * Port_Group > + * > + * Create one SB Port_Group record for every datapath that has ports > + * referenced by the NB Port_Group.ports field. In order to maintain the > + * SB Port_Group.name uniqueness constraint, ovn-northd populates the field > + * with the value: <SB.Logical_Datapath.tunnel_key>_<NB.Port_Group.name>. 
> + */ > +sb::Out_Port_Group(._uuid = hash128(sb_name), .name = sb_name, .ports = port_names) :- > + nb::Port_Group(._uuid = _uuid, .name = nb_name, .ports = pg_ports), > + var port_uuid = FlatMap(pg_ports), > + &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{._uuid = port_uuid, > + .name = port_name}, > + .sw = &Switch{.ls = nb::Logical_Switch{._uuid = ls_uuid}}), > + TunKeyAllocation(.datapath = ls_uuid, .tunkey = tunkey), > + var sb_name = "${tunkey}_${nb_name}", > + var port_names = port_name.group_by((_uuid, sb_name)).to_set(). > + > +/* > + * Multicast_Group: > + * - three static rows per logical switch: one for flooding, one for packets > + * with unknown destinations, one for flooding IP multicast known traffic to > + * mrouters. > + * - dynamically created rows based on IGMP groups learned by controllers. > + */ > + > +function mC_FLOOD(): (string, integer) = > + ("_MC_flood", 32768) > + > +function mC_UNKNOWN(): (string, integer) = > + ("_MC_unknown", 32769) > + > +function mC_MROUTER_FLOOD(): (string, integer) = > + ("_MC_mrouter_flood", 32770) > + > +function mC_MROUTER_STATIC(): (string, integer) = > + ("_MC_mrouter_static", 32771) > + > +function mC_STATIC(): (string, integer) = > + ("_MC_static", 32772) > + > +function mC_FLOOD_L2(): (string, integer) = > + ("_MC_flood_l2", 32773) > + > +function mC_IP_MCAST_MIN(): (string, integer) = > + ("_MC_ip_mcast_min", 32774) > + > +function mC_IP_MCAST_MAX(): (string, integer) = > + ("_MC_ip_mcast_max", 65535) > + > + > +// TODO: check that Multicast_Group.ports should not include derived ports > + > +/* Proxy table for Out_Multicast_Group: contains all Multicast_Group fields, > + * except `_uuid`, which will be computed by hashing the remaining fields, > + * and tunnel key, which is allocated separately (see > + * MulticastGroupTunKeyAllocation). 
*/ > +relation OutProxy_Multicast_Group ( > + datapath: uuid, > + name: string, > + ports: Set<uuid> > +) > + > +/* Only create flood group if the switch has enabled ports */ > +sb::Out_Multicast_Group (._uuid = hash128((datapath,name)), > + .datapath = datapath, > + .name = name, > + .tunnel_key = tunnel_key, > + .ports = port_ids) :- > + &SwitchPort(.lsp = lsp, .sw = &Switch{.ls = ls}), > + lsp.is_enabled(), > + var datapath = ls._uuid, > + var port_ids = lsp._uuid.group_by((datapath)).to_set(), > + (var name, var tunnel_key) = mC_FLOOD(). > + > +/* Create a multicast group to flood to all switch ports except router ports. > + */ > +sb::Out_Multicast_Group (._uuid = hash128((datapath,name)), > + .datapath = datapath, > + .name = name, > + .tunnel_key = tunnel_key, > + .ports = port_ids) :- > + &SwitchPort(.lsp = lsp, .sw = &Switch{.ls = ls}), > + lsp.is_enabled(), > + lsp.__type != "router", > + var datapath = ls._uuid, > + var port_ids = lsp._uuid.group_by((datapath)).to_set(), > + (var name, var tunnel_key) = mC_FLOOD_L2(). > + > +/* Only create unknown group if the switch has ports with "unknown" address */ > +sb::Out_Multicast_Group (._uuid = hash128((ls,name)), > + .datapath = ls, > + .name = name, > + .tunnel_key = tunnel_key, > + .ports = port_ids) :- > + LogicalSwitchUnknownPorts(ls, port_ids), > + (var name, var tunnel_key) = mC_UNKNOWN(). > + > +/* Create a multicast group to flood multicast traffic to routers with > + * multicast relay enabled. > + */ > +sb::Out_Multicast_Group (._uuid = hash128((sw.ls._uuid,name)), > + .datapath = sw.ls._uuid, > + .name = name, > + .tunnel_key = tunnel_key, > + .ports = port_ids) :- > + SwitchMcastFloodRelayPorts(&sw, port_ids), not set_is_empty(port_ids), > + (var name, var tunnel_key) = mC_MROUTER_FLOOD(). > + > +/* Create a multicast group to flood traffic (no reports) to ports with > + * multicast flood enabled. 
> + */ > +sb::Out_Multicast_Group (._uuid = hash128((sw.ls._uuid,name)), > + .datapath = sw.ls._uuid, > + .name = name, > + .tunnel_key = tunnel_key, > + .ports = port_ids) :- > + SwitchMcastFloodPorts(&sw, port_ids), not set_is_empty(port_ids), > + (var name, var tunnel_key) = mC_STATIC(). > + > +/* Create a multicast group to flood reports to ports with > + * multicast flood_reports enabled. > + */ > +sb::Out_Multicast_Group (._uuid = hash128((sw.ls._uuid,name)), > + .datapath = sw.ls._uuid, > + .name = name, > + .tunnel_key = tunnel_key, > + .ports = port_ids) :- > + SwitchMcastFloodReportPorts(&sw, port_ids), not set_is_empty(port_ids), > + (var name, var tunnel_key) = mC_MROUTER_STATIC(). > + > +/* Create a multicast group to flood traffic and reports to router ports with > + * multicast flood enabled. > + */ > +sb::Out_Multicast_Group (._uuid = hash128((rtr.lr._uuid,name)), > + .datapath = rtr.lr._uuid, > + .name = name, > + .tunnel_key = tunnel_key, > + .ports = port_ids) :- > + RouterMcastFloodPorts(&rtr, port_ids), not set_is_empty(port_ids), > + (var name, var tunnel_key) = mC_STATIC(). > + > +/* Create a multicast group for each IGMP group learned by a Switch. > + * 'tunnel_key' == 0 triggers an ID allocation later. > + */ > +OutProxy_Multicast_Group (.datapath = switch.ls._uuid, > + .name = address, > + .ports = port_ids) :- > + IgmpSwitchMulticastGroup(address, &switch, port_ids). > + > +/* Create a multicast group for each IGMP group learned by a Router. > + * 'tunnel_key' == 0 triggers an ID allocation later. > + */ > +OutProxy_Multicast_Group (.datapath = router.lr._uuid, > + .name = address, > + .ports = port_ids) :- > + IgmpRouterMulticastGroup(address, &router, port_ids). > + > +/* Allocate a 'tunnel_key' for dynamic multicast groups. 
*/ > +sb::Out_Multicast_Group(._uuid = hash128((mcgroup.datapath,mcgroup.name)), > + .datapath = mcgroup.datapath, > + .name = mcgroup.name, > + .tunnel_key = tunnel_key, > + .ports = mcgroup.ports) :- > + mcgroup in OutProxy_Multicast_Group(), > + MulticastGroupTunKeyAllocation(mcgroup.datapath, mcgroup.name, tunnel_key). > + > +/* > + * MAC binding: records inserted by hypervisors; northd removes records for deleted logical ports and datapaths. > + */ > +sb::Out_MAC_Binding (._uuid = mb._uuid, > + .logical_port = mb.logical_port, > + .ip = mb.ip, > + .mac = mb.mac, > + .datapath = mb.datapath) :- > + sb::MAC_Binding[mb], > + sb::Out_Port_Binding(.logical_port = mb.logical_port), > + sb::Out_Datapath_Binding(._uuid = mb.datapath). > + > +/* > + * DHCP options: fixed table > + */ > +sb::Out_DHCP_Options ( > + ._uuid = 128'h7d9d898a_179b_4898_8382_b73bec391f23, > + .name = "offerip", > + .code = 0, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hea5e7d14_fd97_491c_8004_a120bdbc4306, > + .name = "netmask", > + .code = 1, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hdab5e39b_6702_4245_9573_6c142aa3724c, > + .name = "router", > + .code = 3, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h340b4bc5_c5c3_43d1_ae77_564da69c8fcc, > + .name = "dns_server", > + .code = 6, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hcd1ab302_cbb2_4eab_9ec5_ec1c8541bd82, > + .name = "log_server", > + .code = 7, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h1c7ea6a0_fe6b_48c1_a920_302583c1ff08, > + .name = "lpr_server", > + .code = 9, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hae35e575_226a_4ab5_a1c4_166f426dd999, > + .name = "domain_name", > + .code = 15, > + .__type = "str" > +). 
> + > +sb::Out_DHCP_Options ( > + ._uuid = 128'had0ec3e0_8be9_4c77_bceb_f8954a34c7ba, > + .name = "swap_server", > + .code = 16, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h884c2e02_6e99_4d12_aef7_8454ebf8a3b7, > + .name = "policy_filter", > + .code = 21, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h57cc2c61_fd2a_41c6_b6b1_6ce9a8901f86, > + .name = "router_solicitation", > + .code = 32, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h48249097_03f0_46c1_a32a_2dd57cd4d0f8, > + .name = "nis_server", > + .code = 41, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h333fe07e_bdd1_4371_aa4f_a412bc60f3a2, > + .name = "ntp_server", > + .code = 42, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h6207109c_49d0_4348_8238_dd92afb69bf0, > + .name = "server_id", > + .code = 54, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h2090b783_26d3_4c1d_830c_54c1b6c5d846, > + .name = "tftp_server", > + .code = 66, > + .__type = "host_id" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'ha18ff399_caea_406e_af7e_321c6f74e581, > + .name = "classless_static_route", > + .code = 121, > + .__type = "static_routes" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hb81ad7b4_62f0_40c7_a9a3_f96677628767, > + .name = "ms_classless_static_route", > + .code = 249, > + .__type = "static_routes" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h0c2e144e_4b5f_4e21_8978_0e20bac9a6ea, > + .name = "ip_forward_enable", > + .code = 19, > + .__type = "bool" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h6feb1926_9469_4b40_bfbf_478b9888cd3a, > + .name = "router_discovery", > + .code = 31, > + .__type = "bool" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hcb776249_e8b1_4502_b33b_fa294d44077d, > + .name = "ethernet_encap", > + .code = 36, > + .__type = "bool" > +). 
> + > +sb::Out_DHCP_Options ( > + ._uuid = 128'ha2df9eaa_aea9_497f_b339_0c8ec3e39a07, > + .name = "default_ttl", > + .code = 23, > + .__type = "uint8" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hb44b45a9_5004_4ef5_8e6a_aa8629e1afb1, > + .name = "tcp_ttl", > + .code = 37, > + .__type = "uint8" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h50f01ca7_c650_46f0_8f50_39a67ec657da, > + .name = "mtu", > + .code = 26, > + .__type = "uint16" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h9d31c057_6085_4810_96af_eeac7d3c5308, > + .name = "lease_time", > + .code = 51, > + .__type = "uint32" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hea1e2e7a_9585_46ee_ad49_adfdefc0c4ef, > + .name = "T1", > + .code = 58, > + .__type = "uint32" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hbc83a233_554b_453a_afca_1eadf76810d2, > + .name = "T2", > + .code = 59, > + .__type = "uint32" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h1ab3eeca_0523_4101_9076_eea77d0232f4, > + .name = "bootfile_name", > + .code = 67, > + .__type = "str" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'ha5c20b69_f7f3_4fa8_b550_8697aec6cbb7, > + .name = "wpad", > + .code = 252, > + .__type = "str" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h1516bcb6_cc93_4233_a63f_bd29c8601831, > + .name = "path_prefix", > + .code = 210, > + .__type = "str" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hc98e13cd_f653_473c_85c1_850dcad685fc, > + .name = "tftp_server_address", > + .code = 150, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hfbe06e70_b43d_4dd9_9b21_2f27eb5da5df, > + .name = "arp_cache_timeout", > + .code = 35, > + .__type = "uint32" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h2af54a3c_545c_4104_ae1c_432caa3e085e, > + .name = "tcp_keepalive_interval", > + .code = 38, > + .__type = "uint32" > +). 
> + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h4b2144e8_8d3f_4d96_9032_fe23c1866cd4, > + .name = "domain_search_list", > + .code = 119, > + .__type = "domains" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'hb7236164_eea4_4bf2_9306_8619a9e3ad1d, > + .name = "broadcast_address", > + .code = 28, > + .__type = "ipv4" > +). > + > +sb::Out_DHCP_Options ( > + ._uuid = 128'h2d738583_96f4_4a78_99a1_f8f7fe328f3f, > + .name = "bootfile_name_alt", > + .code = 254, > + .__type = "str" > +). > + > + > +/* > + * DHCPv6 options: fixed table > + */ > +sb::Out_DHCPv6_Options ( > + ._uuid = 128'h100b2659_0ec0_4da7_9ec3_25997f92dc00, > + .name = "server_id", > + .code = 2, > + .__type = "mac" > +). > + > +sb::Out_DHCPv6_Options ( > + ._uuid = 128'h53f49b50_db75_4b0d_83df_50d31009ca9c, > + .name = "ia_addr", > + .code = 5, > + .__type = "ipv6" > +). > + > +sb::Out_DHCPv6_Options ( > + ._uuid = 128'he3619685_d4f7_42ad_936b_4f4440b7eeb4, > + .name = "dns_server", > + .code = 23, > + .__type = "ipv6" > +). > + > +sb::Out_DHCPv6_Options ( > + ._uuid = 128'hcb8a4e7f_a312_4cb1_a846_e474d9f0c531, > + .name = "domain_search", > + .code = 24, > + .__type = "str" > +). > + > + > +/* > + * DNS: copied from NB + datapaths column pointer to LS datapaths that use the record > + */ > + > +function map_to_lowercase(m_in: Map<string,string>): Map<string,string> { > + var m_out = map_empty(); > + for (node in m_in) { > + (var k, var v) = node; > + map_insert(m_out, string_to_lowercase(k), string_to_lowercase(v)) > + }; > + m_out > +} > + > +sb::Out_DNS(._uuid = nbdns._uuid, > + .records = map_to_lowercase(nbdns.records), > + .datapaths = datapaths, > + .external_ids = map_insert_imm(nbdns.external_ids, "dns_id", uuid2str(nbdns._uuid))) :- > + nb::DNS[nbdns], > + LogicalSwitchDNS(ls_uuid, nbdns._uuid), > + var datapaths = ls_uuid.group_by(nbdns).to_set(). 
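The map_to_lowercase helper above normalizes both keys and values of the DNS records map before copying it into the southbound DNS table. A direct Python equivalent of that helper:

```python
def map_to_lowercase(m_in):
    """Lowercase both keys and values, mirroring the DDlog helper above."""
    return {k.lower(): v.lower() for k, v in m_in.items()}
```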
> + > +/* > + * RBAC_Permission: fixed > + */ > + > +sb::Out_RBAC_Permission ( > + ._uuid = 128'h7df3749a_1754_4a78_afa4_3abf526fe510, > + .table = "Chassis", > + .authorization = set_singleton("name"), > + .insert_delete = true, > + .update = ["nb_cfg", "external_ids", "encaps", > + "vtep_logical_switches", "other_config", "name"].to_set() > +). > + > +sb::Out_RBAC_Permission ( > + ._uuid = 128'h07e623f7_137c_4a11_9084_3b3f89cb4a54, > + .table = "Chassis_Private", > + .authorization = set_singleton("name"), > + .insert_delete = true, > + .update = ["nb_cfg", "nb_cfg_timestamp", "chassis", "name"].to_set() > +). > + > +sb::Out_RBAC_Permission ( > + ._uuid = 128'h94bec860_431e_4d95_82e7_3b75d8997241, > + .table = "Encap", > + .authorization = set_singleton("chassis_name"), > + .insert_delete = true, > + .update = ["type", "options", "ip", "chassis_name"].to_set() > +). > + > +sb::Out_RBAC_Permission ( > + ._uuid = 128'hd8ceff1a_2b11_48bd_802f_4a991aa4e908, > + .table = "Port_Binding", > + .authorization = set_singleton(""), > + .insert_delete = false, > + .update = set_singleton("chassis") > +). > + > +sb::Out_RBAC_Permission ( > + ._uuid = 128'h6ffdc696_8bfb_4d82_b620_a00d39270b2f, > + .table = "MAC_Binding", > + .authorization = set_singleton(""), > + .insert_delete = true, > + .update = ["logical_port", "ip", "mac", "datapath"].to_set() > +). > + > +sb::Out_RBAC_Permission ( > + ._uuid = 128'h39231c7e_4bf1_41d0_ada4_1d8a319c0da3, > + .table = "Service_Monitor", > + .authorization = set_singleton(""), > + .insert_delete = false, > + .update = set_singleton("status") > +). 
> + > +/* > + * RBAC_Role: fixed > + */ > +sb::Out_RBAC_Role ( > + ._uuid = 128'ha406b472_5de8_4456_9f38_bf344c911b22, > + .name = "ovn-controller", > + .permissions = [ > + "Chassis" -> 128'h7df3749a_1754_4a78_afa4_3abf526fe510, > + "Chassis_Private" -> 128'h07e623f7_137c_4a11_9084_3b3f89cb4a54, > + "Encap" -> 128'h94bec860_431e_4d95_82e7_3b75d8997241, > + "Port_Binding" -> 128'hd8ceff1a_2b11_48bd_802f_4a991aa4e908, > + "MAC_Binding" -> 128'h6ffdc696_8bfb_4d82_b620_a00d39270b2f, > + "Service_Monitor"-> 128'h39231c7e_4bf1_41d0_ada4_1d8a319c0da3] > + > +). > + > +/* Output modified Logical_Switch_Port table with dynamic address updated */ > +nb::Out_Logical_Switch_Port(._uuid = lsp._uuid, > + .tag = tag, > + .dynamic_addresses = dynamic_addresses, > + .up = Some{up}) :- > + SwitchPortNewDynamicAddress(&SwitchPort{.lsp = lsp, .up = up}, opt_dyn_addr), > + var dynamic_addresses = match (opt_dyn_addr) { > + None -> None, > + Some{dyn_addr} -> Some{"${dyn_addr}"} > + }, > + SwitchPortNewDynamicTag(lsp._uuid, opt_tag), > + var tag = match (opt_tag) { > + None -> lsp.tag, > + Some{t} -> Some{t} > + }. > + > +relation LRPIPv6Prefix0(lrp_uuid: uuid, ipv6_prefix: string) > +LRPIPv6Prefix0(lrp._uuid, ipv6_prefix) :- > + lrp in nb::Logical_Router_Port(), > + map_get_bool_def(lrp.options, "prefix", false), > + sb::Port_Binding(.logical_port = lrp.name, .options = options), > + Some{var ipv6_ra_pd_list} = map_get(options, "ipv6_ra_pd_list"), > + var parts = string_split(ipv6_ra_pd_list, ","), > + Some{var ipv6_prefix} = vec_nth(parts, 1). > + > +relation LRPIPv6Prefix(lrp_uuid: uuid, ipv6_prefix: Option<string>) > +LRPIPv6Prefix(lrp_uuid, Some{ipv6_prefix}) :- > + LRPIPv6Prefix0(lrp_uuid, ipv6_prefix). > +LRPIPv6Prefix(lrp_uuid, None) :- > + nb::Logical_Router_Port(._uuid = lrp_uuid), > + not LRPIPv6Prefix0(lrp_uuid, _). 
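The `LRPIPv6Prefix0` rule above extracts the delegated IPv6 prefix from the `Port_Binding`'s `ipv6_ra_pd_list` option by splitting on commas and taking the element at index 1 (the second field); when the option is absent or malformed, no fact is derived, and `LRPIPv6Prefix` supplies `None`. A Python sketch of that parsing (the field layout is an assumption read off the rule, not documented behavior):

```python
def lrp_ipv6_prefix(pb_options):
    """Sketch of the LRPIPv6Prefix0 parsing: the 'ipv6_ra_pd_list'
    option is comma-separated and the prefix is taken from index 1
    (the second field). Return None when the option is missing or has
    no second field, matching the DDlog rule deriving no fact."""
    ra_pd_list = pb_options.get("ipv6_ra_pd_list")
    if ra_pd_list is None:
        return None
    parts = ra_pd_list.split(",")
    return parts[1] if len(parts) > 1 else None
```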
> + > +nb::Out_Logical_Router_Port(._uuid = _uuid, > + .ipv6_prefix = to_set(ipv6_prefix)) :- > + nb::Logical_Router_Port(._uuid = _uuid, .name = name), > + LRPIPv6Prefix(_uuid, ipv6_prefix). > + > +typedef Direction = IN | OUT > + > +typedef PipelineStage = PORT_SEC_L2 > + | PORT_SEC_IP > + | PORT_SEC_ND > + | PRE_ACL > + | PRE_LB > + | PRE_STATEFUL > + | ACL_HINT > + | ACL > + | QOS_MARK > + | QOS_METER > + | LB > + | STATEFUL > + | PRE_HAIRPIN > + | HAIRPIN > + | ARP_ND_RSP > + | DHCP_OPTIONS > + | DHCP_RESPONSE > + | DNS_LOOKUP > + | DNS_RESPONSE > + | EXTERNAL_PORT > + | L2_LKUP > + | ADMISSION > + | LOOKUP_NEIGHBOR > + | LEARN_NEIGHBOR > + | IP_INPUT > + | DEFRAG > + | UNSNAT > + | DNAT > + | ECMP_STATEFUL > + | ND_RA_OPTIONS > + | ND_RA_RESPONSE > + | IP_ROUTING > + | IP_ROUTING_ECMP > + | POLICY > + | ARP_RESOLVE > + | CHK_PKT_LEN > + | LARGER_PKTS > + | GW_REDIRECT > + | ARP_REQUEST > + | UNDNAT > + | SNAT > + | EGR_LOOP > + | DELIVERY > + > +typedef DatapathType = LSwitch | LRouter > + > +typedef Stage = Stage{ > + datapath : DatapathType, > + direction : Direction, > + stage : PipelineStage > +} > + > +function switch_stage(direction: Direction, stage: PipelineStage): Stage = { > + Stage{LSwitch, direction, stage} > +} > + > +function router_stage(direction: Direction, stage: PipelineStage): Stage = { > + Stage{LRouter, direction, stage} > +} > + > +function stage_id(stage: Stage): (integer, string) = > +{ > + match ((stage.datapath, stage.direction, stage.stage)) { > + /* Logical switch ingress stages. 
*/ > + (LSwitch, IN, PORT_SEC_L2) -> (0, "ls_in_port_sec_l2"), > + (LSwitch, IN, PORT_SEC_IP) -> (1, "ls_in_port_sec_ip"), > + (LSwitch, IN, PORT_SEC_ND) -> (2, "ls_in_port_sec_nd"), > + (LSwitch, IN, PRE_ACL) -> (3, "ls_in_pre_acl"), > + (LSwitch, IN, PRE_LB) -> (4, "ls_in_pre_lb"), > + (LSwitch, IN, PRE_STATEFUL) -> (5, "ls_in_pre_stateful"), > + (LSwitch, IN, ACL_HINT) -> (6, "ls_in_acl_hint"), > + (LSwitch, IN, ACL) -> (7, "ls_in_acl"), > + (LSwitch, IN, QOS_MARK) -> (8, "ls_in_qos_mark"), > + (LSwitch, IN, QOS_METER) -> (9, "ls_in_qos_meter"), > + (LSwitch, IN, LB) -> (10, "ls_in_lb"), > + (LSwitch, IN, STATEFUL) -> (11, "ls_in_stateful"), > + (LSwitch, IN, PRE_HAIRPIN) -> (12, "ls_in_pre_hairpin"), > + (LSwitch, IN, HAIRPIN) -> (13, "ls_in_hairpin"), > + (LSwitch, IN, ARP_ND_RSP) -> (14, "ls_in_arp_rsp"), > + (LSwitch, IN, DHCP_OPTIONS) -> (15, "ls_in_dhcp_options"), > + (LSwitch, IN, DHCP_RESPONSE) -> (16, "ls_in_dhcp_response"), > + (LSwitch, IN, DNS_LOOKUP) -> (17, "ls_in_dns_lookup"), > + (LSwitch, IN, DNS_RESPONSE) -> (18, "ls_in_dns_response"), > + (LSwitch, IN, EXTERNAL_PORT) -> (19, "ls_in_external_port"), > + (LSwitch, IN, L2_LKUP) -> (20, "ls_in_l2_lkup"), > + > + /* Logical switch egress stages. */ > + (LSwitch, OUT, PRE_LB) -> (0, "ls_out_pre_lb"), > + (LSwitch, OUT, PRE_ACL) -> (1, "ls_out_pre_acl"), > + (LSwitch, OUT, PRE_STATEFUL) -> (2, "ls_out_pre_stateful"), > + (LSwitch, OUT, LB) -> (3, "ls_out_lb"), > + (LSwitch, OUT, ACL_HINT) -> (4, "ls_out_acl_hint"), > + (LSwitch, OUT, ACL) -> (5, "ls_out_acl"), > + (LSwitch, OUT, QOS_MARK) -> (6, "ls_out_qos_mark"), > + (LSwitch, OUT, QOS_METER) -> (7, "ls_out_qos_meter"), > + (LSwitch, OUT, STATEFUL) -> (8, "ls_out_stateful"), > + (LSwitch, OUT, PORT_SEC_IP) -> (9, "ls_out_port_sec_ip"), > + (LSwitch, OUT, PORT_SEC_L2) -> (10, "ls_out_port_sec_l2"), > + > + /* Logical router ingress stages. 
*/ > + (LRouter, IN, ADMISSION) -> (0, "lr_in_admission"), > + (LRouter, IN, LOOKUP_NEIGHBOR) -> (1, "lr_in_lookup_neighbor"), > + (LRouter, IN, LEARN_NEIGHBOR) -> (2, "lr_in_learn_neighbor"), > + (LRouter, IN, IP_INPUT) -> (3, "lr_in_ip_input"), > + (LRouter, IN, DEFRAG) -> (4, "lr_in_defrag"), > + (LRouter, IN, UNSNAT) -> (5, "lr_in_unsnat"), > + (LRouter, IN, DNAT) -> (6, "lr_in_dnat"), > + (LRouter, IN, ECMP_STATEFUL) -> (7, "lr_in_ecmp_stateful"), > + (LRouter, IN, ND_RA_OPTIONS) -> (8, "lr_in_nd_ra_options"), > + (LRouter, IN, ND_RA_RESPONSE)-> (9, "lr_in_nd_ra_response"), > + (LRouter, IN, IP_ROUTING) -> (10, "lr_in_ip_routing"), > + (LRouter, IN, IP_ROUTING_ECMP) -> (11, "lr_in_ip_routing_ecmp"), > + (LRouter, IN, POLICY) -> (12, "lr_in_policy"), > + (LRouter, IN, ARP_RESOLVE) -> (13, "lr_in_arp_resolve"), > + (LRouter, IN, CHK_PKT_LEN) -> (14, "lr_in_chk_pkt_len"), > + (LRouter, IN, LARGER_PKTS) -> (15, "lr_in_larger_pkts"), > + (LRouter, IN, GW_REDIRECT) -> (16, "lr_in_gw_redirect"), > + (LRouter, IN, ARP_REQUEST) -> (17, "lr_in_arp_request"), > + > + /* Logical router egress stages. */ > + (LRouter, OUT, UNDNAT) -> (0, "lr_out_undnat"), > + (LRouter, OUT, SNAT) -> (1, "lr_out_snat"), > + (LRouter, OUT, EGR_LOOP) -> (2, "lr_out_egr_loop"), > + (LRouter, OUT, DELIVERY) -> (3, "lr_out_delivery"), > + > + _ -> (64'hffffffffffffffff, "") /* alternatively crash? 
*/ > + } > +} > + > +/* > + * OVS register usage: > + * > + * Logical Switch pipeline: > + * +---------+----------------------------------------------+ > + * | R0 | REGBIT_{CONNTRACK/DHCP/DNS/HAIRPIN} | > + * | | REGBIT_ACL_HINT_{ALLOW_NEW/ALLOW/DROP/BLOCK} | > + * +---------+----------------------------------------------+ > + * | R1 - R9 | UNUSED | > + * +---------+----------------------------------------------+ > + * > + * Logical Router pipeline: > + * +-----+--------------------------+---+-----------------+---+---------------+ > + * | R0 | REGBIT_ND_RA_OPTS_RESULT | | | | | > + * | | (= IN_ND_RA_OPTIONS) | X | | | | > + * | | NEXT_HOP_IPV4 | R | | | | > + * | | (>= IP_INPUT) | E | INPORT_ETH_ADDR | X | | > + * +-----+--------------------------+ G | (< IP_INPUT) | X | | > + * | R1 | SRC_IPV4 for ARP-REQ | 0 | | R | | > + * | | (>= IP_INPUT) | | | E | NEXT_HOP_IPV6 | > + * +-----+--------------------------+---+-----------------+ G | (>= IP_INPUT) | > + * | R2 | UNUSED | X | | 0 | | > + * | | | R | | | | > + * +-----+--------------------------+ E | UNUSED | | | > + * | R3 | UNUSED | G | | | | > + * | | | 1 | | | | > + * +-----+--------------------------+---+-----------------+---+---------------+ > + * | R4 | UNUSED | X | | | | > + * | | | R | | | | > + * +-----+--------------------------+ E | UNUSED | X | | > + * | R5 | UNUSED | G | | X | | > + * | | | 2 | | R |SRC_IPV6 for NS| > + * +-----+--------------------------+---+-----------------+ E | (>= IP_INPUT) | > + * | R6 | UNUSED | X | | G | | > + * | | | R | | 1 | | > + * +-----+--------------------------+ E | UNUSED | | | > + * | R7 | UNUSED | G | | | | > + * | | | 3 | | | | > + * +-----+--------------------------+---+-----------------+---+---------------+ > + * | R8 | ECMP_GROUP_ID | | | > + * | | ECMP_MEMBER_ID | X | | > + * +-----+--------------------------+ R | | > + * | | REGBIT_{ | E | | > + * | | EGRESS_LOOPBACK/ | G | UNUSED | > + * | R9 | PKT_LARGER/ | 4 | | > + * | | LOOKUP_NEIGHBOR_RESULT/| | | > + * | 
| SKIP_LOOKUP_NEIGHBOR} | | |
> + +-----+--------------------------+---+-----------------+
> + *
> + */
> +
> +/* Register definitions specific to routers. */
> +function rEG_NEXT_HOP(): string = "reg0" /* reg0 for IPv4, xxreg0 for IPv6 */
> +function rEG_SRC(): string = "reg1" /* reg1 for IPv4, xxreg1 for IPv6 */
> +
> +/* Register definitions specific to switches. */
> +function rEGBIT_CONNTRACK_DEFRAG() : string = "reg0[0]"
> +function rEGBIT_CONNTRACK_COMMIT() : string = "reg0[1]"
> +function rEGBIT_CONNTRACK_NAT() : string = "reg0[2]"
> +function rEGBIT_DHCP_OPTS_RESULT() : string = "reg0[3]"
> +function rEGBIT_DNS_LOOKUP_RESULT(): string = "reg0[4]"
> +function rEGBIT_ND_RA_OPTS_RESULT(): string = "reg0[5]"
> +function rEGBIT_HAIRPIN() : string = "reg0[6]"
> +function rEGBIT_ACL_HINT_ALLOW_NEW(): string = "reg0[7]"
> +function rEGBIT_ACL_HINT_ALLOW() : string = "reg0[8]"
> +function rEGBIT_ACL_HINT_DROP() : string = "reg0[9]"
> +function rEGBIT_ACL_HINT_BLOCK() : string = "reg0[10]"
> +
> +/* Register definitions for switches and routers. */
> +
> +/* Indicate that this packet has been recirculated using egress
> + * loopback. This allows certain checks to be bypassed, such as a
> + * logical router dropping packets whose source IP address equals
> + * one of the logical router's own IP addresses. */
> +function rEGBIT_EGRESS_LOOPBACK() : string = "reg9[0]"
> +/* Register to store the result of the check_pkt_larger action. */
> +function rEGBIT_PKT_LARGER() : string = "reg9[1]"
> +function rEGBIT_LOOKUP_NEIGHBOR_RESULT() : string = "reg9[2]"
> +function rEGBIT_LOOKUP_NEIGHBOR_IP_RESULT() : string = "reg9[3]"
> +
> +/* Register to store the eth address associated with a router port for packets
> + * received in S_ROUTER_IN_ADMISSION.
> + */
> +function rEG_INPORT_ETH_ADDR() : string = "xreg0[0..47]"
> +
> +/* Register for ECMP bucket selection.
*/ > +function rEG_ECMP_GROUP_ID() : string = "reg8[0..15]" > +function rEG_ECMP_MEMBER_ID() : string = "reg8[16..31]" > + > +function fLAGBIT_NOT_VXLAN() : string = "flags[1] == 0" > + > +function mFF_N_LOG_REGS() : bit<32> = 10 > + > +/* > + * Logical_Flow > + relation Out_Logical_Flow ( > + logical_datapath: string, > + pipeline: string, > + table_id: integer, > + priority: integer, > + __match: string, > + actions: string, > + external_ids: Map<string,string>) > + */ > + > +relation Flow ( > + logical_datapath: uuid, > + stage: Stage, > + priority: integer, > + __match: string, > + actions: string, > + external_ids: Map<string,string> > +) > + > +sb::Out_Logical_Flow(._uuid = hash128((f.logical_datapath, f.stage, f.priority, f.__match, f.actions, f.external_ids)), > + .logical_datapath = f.logical_datapath, > + .pipeline = if (f.stage.direction == IN) "ingress" else "egress", > + .table_id = table_id, > + .priority = f.priority, > + .__match = f.__match, > + .actions = f.actions, > + .external_ids = map_insert_imm(f.external_ids, "stage-name", table_name)) :- > + Flow[f], > + (var table_id, var table_name) = stage_id(f.stage). > + > +/* Logical flows for forwarding groups. */ > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(fg_uuid)) :- > + sw in &Switch(), > + var fg_uuid = FlatMap(sw.ls.forwarding_groups), > + fg in nb::Forwarding_Group(._uuid = fg_uuid), > + not set_is_empty(fg.child_port), > + var __match = "arp.tpa == ${fg.vip} && arp.op == 1", > + var actions = "eth.dst = eth.src; " > + "eth.src = ${fg.vmac}; " > + "arp.op = 2; /* ARP reply */ " > + "arp.tha = arp.sha; " > + "arp.sha = ${fg.vmac}; " > + "arp.tpa = arp.spa; " > + "arp.spa = ${fg.vip}; " > + "outport = inport; " > + "flags.loopback = 1; " > + "output;". 
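The `sb::Out_Logical_Flow` rule above derives the row `_uuid` by hashing the flow's own fields, so re-running the rules over the same northbound contents yields stable UUIDs and unchanged flows cause no southbound churn. A Python sketch of that determinism property (DDlog's `hash128()` is a different hash function; `md5` here is only a stand-in to get 128 bits):

```python
import hashlib
import uuid

def flow_uuid(logical_datapath, stage, priority, match, actions):
    """Derive a stable 128-bit row UUID from a logical flow's fields.
    Identical inputs must always give identical UUIDs."""
    key = repr((logical_datapath, stage, priority, match, actions)).encode()
    return uuid.UUID(bytes=hashlib.md5(key).digest())

a = flow_uuid("dp1", "ls_in_acl", 0, "1", "next;")
b = flow_uuid("dp1", "ls_in_acl", 0, "1", "next;")
assert a == b  # same flow fields, same UUID across runs
```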
>
> +function escape_child_ports(child_port: Set<string>): string {
> + var escaped = vec_with_capacity(set_size(child_port));
> + for (s in child_port) {
> + vec_push(escaped, json_string_escape(s))
> + };
> + string_join(escaped, ",")
> +}
> +Flow(.logical_datapath = sw.ls._uuid,
> + .stage = switch_stage(IN, L2_LKUP),
> + .priority = 50,
> + .__match = __match,
> + .actions = actions,
> + .external_ids = map_empty()) :-
> + sw in &Switch(),
> + var fg_uuid = FlatMap(sw.ls.forwarding_groups),
> + fg in nb::Forwarding_Group(._uuid = fg_uuid),
> + not set_is_empty(fg.child_port),
> + var __match = "eth.dst == ${fg.vmac}",
> + var actions = "fwd_group(" ++
> + if (fg.liveness) { "liveness=\"true\"," } else { "" } ++
> + "childports=" ++ escape_child_ports(fg.child_port) ++ ");".
> +
> +/* Logical switch ingress table PORT_SEC_L2: admission control framework
> + * (priority 100) */
> +for (sw in &Switch()) {
> + if (not sw.is_vlan_transparent) {
> + /* Block logical VLANs. */
> + Flow(.logical_datapath = sw.ls._uuid,
> + .stage = switch_stage(IN, PORT_SEC_L2),
> + .priority = 100,
> + .__match = "vlan.present",
> + .actions = "drop;",
> + .external_ids = map_empty() /*TODO: check*/)
> + };
> +
> + /* Broadcast/multicast source address is invalid */
> + Flow(.logical_datapath = sw.ls._uuid,
> + .stage = switch_stage(IN, PORT_SEC_L2),
> + .priority = 100,
> + .__match = "eth.src[40]",
> + .actions = "drop;",
> + .external_ids = map_empty() /*TODO: check*/)
> + /* Port security flows have priority 50 (see below) and will continue to the next table
> + if the packet's source is acceptable. */
> +}
> +
> +// Join a set of strings using the given separator.
> +function join(strings: Set<string>, sep: string): string {
> + strings.to_vec().join(sep)
> +}
> +
> +function build_port_security_ipv6_flow(
> + pipeline: Direction,
> + ea: eth_addr,
> + ipv6_addrs: Vec<ipv6_netaddr>): string =
> +{
> + var ip6_addrs = vec_empty();
> +
> + /* Allow link-local address.
*/
> + vec_push(ip6_addrs, ipv6_string_mapped(in6_generate_lla(ea)));
> +
> + /* Allow ip6.dst=ff00::/8 for multicast packets */
> + if (pipeline == OUT) {
> + vec_push(ip6_addrs, "ff00::/8")
> + };
> + for (addr in ipv6_addrs) {
> + vec_push(ip6_addrs, ipv6_netaddr_match_network(addr))
> + };
> +
> + var dir = if (pipeline == IN) { "src" } else { "dst" };
> + " && ip6.${dir} == {" ++ ip6_addrs.join(", ") ++ "}"
> +}
> +
> +function build_port_security_ipv6_nd_flow(
> + ea: eth_addr,
> + ipv6_addrs: Vec<ipv6_netaddr>): string =
> +{
> + var __match = " && ip6 && nd && ((nd.sll == ${eth_addr_zero()} || "
> + "nd.sll == ${ea}) || ((nd.tll == ${eth_addr_zero()} || "
> + "nd.tll == ${ea})";
> + if (vec_is_empty(ipv6_addrs)) {
> + __match ++ "))"
> + } else {
> + var ip6_str = ipv6_string_mapped(in6_generate_lla(ea));
> + __match = __match ++ " && (nd.target == ${ip6_str}";
> +
> + for (addr in ipv6_addrs) {
> + ip6_str = ipv6_netaddr_match_network(addr);
> + __match = __match ++ " || nd.target == ${ip6_str}"
> + };
> + __match ++ ")))"
> + }
> +}
> +
> +/* Pre-ACL */
> +for (&Switch(.ls = ls)) {
> + /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are
> + * allowed by default.
*/
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, PRE_ACL),
> + .priority = 0,
> + .__match = "1",
> + .actions = "next;",
> + .external_ids = map_empty());
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(OUT, PRE_ACL),
> + .priority = 0,
> + .__match = "1",
> + .actions = "next;",
> + .external_ids = map_empty());
> +
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, PRE_ACL),
> + .priority = 110,
> + .__match = "eth.dst == $svc_monitor_mac",
> + .actions = "next;",
> + .external_ids = map_empty());
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(OUT, PRE_ACL),
> + .priority = 110,
> + .__match = "eth.src == $svc_monitor_mac",
> + .actions = "next;",
> + .external_ids = map_empty())
> +}
> +
> +
> +/* If there are any stateful ACL rules in this datapath, we must
> + * send all IP packets through the conntrack action, which handles
> + * defragmentation, in order to match L4 headers. */
> +
> +for (&SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "router"},
> + .json_name = lsp_name,
> + .sw = &Switch{.ls = ls, .has_stateful_acl = true})) {
> + /* Can't use ct() for router ports. Consider the
> + * following configuration: lp1(10.0.0.2) on
> + * hostA--ls1--lr0--ls2--lp2(10.0.1.2) on hostB. For a
> + * ping from lp1 to lp2, the response will first go
> + * through ct() with a zone for lp2 in the ls2 ingress
> + * pipeline on hostB. That ct zone knows about this
> + * connection. Next, it goes through ct() with the zone
> + * for the router port in the egress pipeline of ls2 on
> + * hostB. This zone does not know about the connection,
> + * as the icmp request went through the logical router
> + * on hostA, not hostB. This would only work with
> + * distributed conntrack state across all chassis.
*/
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, PRE_ACL),
> + .priority = 110,
> + .__match = "ip && inport == ${lsp_name}",
> + .actions = "next;",
> + .external_ids = stage_hint(lsp._uuid));
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(OUT, PRE_ACL),
> + .priority = 110,
> + .__match = "ip && outport == ${lsp_name}",
> + .actions = "next;",
> + .external_ids = stage_hint(lsp._uuid))
> +}
> +
> +for (&SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "localnet"},
> + .json_name = lsp_name,
> + .sw = &Switch{.ls = ls, .has_stateful_acl = true})) {
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, PRE_ACL),
> + .priority = 110,
> + .__match = "ip && inport == ${lsp_name}",
> + .actions = "next;",
> + .external_ids = stage_hint(lsp._uuid));
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(OUT, PRE_ACL),
> + .priority = 110,
> + .__match = "ip && outport == ${lsp_name}",
> + .actions = "next;",
> + .external_ids = stage_hint(lsp._uuid))
> +}
> +
> +for (&Switch(.ls = ls, .has_stateful_acl = true)) {
> + /* Ingress and Egress Pre-ACL Table (Priority 110).
> + *
> + * Don't send ND packets or ICMP destination
> + * unreachable packets to conntrack. */
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, PRE_ACL),
> + .priority = 110,
> + .__match = "nd || nd_rs || nd_ra || mldv1 || mldv2 || "
> + "(udp && udp.src == 546 && udp.dst == 547)",
> + .actions = "next;",
> + .external_ids = map_empty());
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(OUT, PRE_ACL),
> + .priority = 110,
> + .__match = "nd || nd_rs || nd_ra || mldv1 || mldv2 || "
> + "(udp && udp.src == 546 && udp.dst == 547)",
> + .actions = "next;",
> + .external_ids = map_empty());
> +
> + /* Ingress and Egress Pre-ACL Table (Priority 100.
> + * > + * Regardless of whether the ACL is "from-lport" or "to-lport", > + * we need rules in both the ingress and egress table, because > + * the return traffic needs to be followed. > + * > + * 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send > + * it to conntrack for tracking and defragmentation. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PRE_ACL), > + .priority = 100, > + .__match = "ip", > + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PRE_ACL), > + .priority = 100, > + .__match = "ip", > + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", > + .external_ids = map_empty()) > +} > + > +/* Pre-LB */ > +for (&Switch(.ls = ls)) { > + /* Do not send ND packets to conntrack */ > + var __match = "nd || nd_rs || nd_ra || mldv1 || mldv2" in { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PRE_LB), > + .priority = 110, > + .__match = __match, > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PRE_LB), > + .priority = 110, > + .__match = __match, > + .actions = "next;", > + .external_ids = map_empty()) > + }; > + > + /* Do not send service monitor packets to conntrack. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PRE_LB), > + .priority = 110, > + .__match = "eth.dst == $svc_monitor_mac", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PRE_LB), > + .priority = 110, > + .__match = "eth.src == $svc_monitor_mac", > + .actions = "next;", > + .external_ids = map_empty()); > + > + /* Allow all packets to go to next tables by default. 
*/ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PRE_LB), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PRE_LB), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +for (&SwitchPort(.lsp = lsp, .json_name = lsp_name, .sw = &Switch{.ls = ls})) > +if (lsp.__type == "router" or lsp.__type == "localnet") { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PRE_LB), > + .priority = 110, > + .__match = "ip && inport == ${lsp_name}", > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PRE_LB), > + .priority = 110, > + .__match = "ip && outport == ${lsp_name}", > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)) > +} > + > +relation HasEventElbMeter(has_meter: bool) > + > +HasEventElbMeter(true) :- > + nb::Meter(.name = "event-elb"). > + > +HasEventElbMeter(false) :- > + Unit(), > + not nb::Meter(.name = "event-elb"). 
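`HasEventElbMeter` above uses a common Datalog idiom for defaults: one rule derives `true` when a `Meter` named "event-elb" exists, and a second rule with negation (seeded from the singleton `Unit()` relation) derives `false` otherwise, so the relation always holds exactly one row. Imperatively, the two rules collapse to a single membership test, as in this illustrative sketch:

```python
def has_event_elb_meter(meter_names):
    """Imperative equivalent of the HasEventElbMeter relation.
    In DDlog, relations hold only derived facts, so the 'false'
    default needs an explicit negation rule:
      Rule 1: a Meter row named "event-elb" exists -> true.
      Rule 2: Unit(), not nb::Meter(.name = "event-elb") -> false."""
    return "event-elb" in meter_names
```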
> + > +/* Empty LoadBalancer Controller event */ > +function build_empty_lb_event_flow(key: string, lb: nb::Load_Balancer, > + meter: bool): Option<(string, string)> { > + (var ip, var port) = match (ip_address_and_port_from_lb_key(key)) { > + Some{(ip, port)} -> (ip, port), > + _ -> return None > + }; > + > + var protocol = match (lb.protocol) { > + Some{"tcp"} -> "tcp", > + _ -> "udp" > + }; > + var meter = match (meter) { > + true -> "event-elb", > + _ -> "" > + }; > + var vip = match (port) { > + 0 -> "${ip}", > + _ -> "${ip.to_bracketed_string()}:${port}" > + }; > + > + var __match = vec_with_capacity(2); > + __match.push("${ip46_ipX(ip)}.dst == ${ip}"); > + if (port != 0) { > + __match.push("${protocol}.dst == ${port}"); > + }; > + > + var action = "trigger_event(" > + "event = \"empty_lb_backends\", " > + "meter = \"${meter}\", " > + "vip = \"${vip}\", " > + "protocol = \"${protocol}\", " > + "load_balancer = \"${uuid2str(lb._uuid)}\");"; > + > + Some{(__match.join(" && "), action)} > +} > + > +/* ControllerEventEn has exactly one row, either 'true' to enable controller > + * events or 'false' to disable them. */ > +relation ControllerEventEn(enable: bool) > +ControllerEventEn(map_get_bool_def(options, "controller_event", false)) :- > + nb::NB_Global(.options = options). > +ControllerEventEn(false) :- Unit(), not nb::NB_Global(). > + > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PRE_LB), > + .priority = 130, > + .__match = __match, > + .actions = __action, > + .external_ids = stage_hint(lb._uuid)) :- > + ControllerEventEn(true), > + SwitchLBVIP(.sw_uuid = sw_uuid, .lb = &lb, .vip = vip, .backends = backends), > + sw in &Switch(.ls = nb::Logical_Switch{._uuid = sw_uuid}), > + backends == "", > + HasEventElbMeter(has_elb_meter), > + Some {(var __match, var __action)} = build_empty_lb_event_flow( > + vip, lb, has_elb_meter). 
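`build_empty_lb_event_flow()` above turns a backend-less VIP into a match on the VIP address (and port, when nonzero) plus a `trigger_event` action that asks ovn-controller to raise an "empty_lb_backends" event. A simplified Python sketch of the string construction, assuming IPv4-only `ip` or `ip:port` keys (the real `ip_address_and_port_from_lb_key()` also handles bracketed IPv6 and invalid keys):

```python
def build_empty_lb_event_flow(key, protocol, lb_uuid, has_meter):
    """Build the (match, action) pair for a load-balancer VIP with no
    backends. Only IPv4 'ip' and 'ip:port' key forms are handled."""
    if ":" in key:
        ip, port_s = key.rsplit(":", 1)
        port = int(port_s)
    else:
        ip, port = key, 0
    match = f"ip4.dst == {ip}"
    if port != 0:
        match += f" && {protocol}.dst == {port}"
    vip = ip if port == 0 else f"{ip}:{port}"
    meter = "event-elb" if has_meter else ""
    action = (f'trigger_event(event = "empty_lb_backends", '
              f'meter = "{meter}", vip = "{vip}", '
              f'protocol = "{protocol}", '
              f'load_balancer = "{lb_uuid}");')
    return match, action
```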
>
> +/* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send
> + * the packet to conntrack for defragmentation.
> + *
> + * Send all the packets to conntrack in the ingress pipeline if the
> + * logical switch has a load balancer with a VIP configured. Earlier
> + * we used to set the REGBIT_CONNTRACK_DEFRAG flag in the ingress pipeline
> + * only if the IP destination matched the VIP, but this causes a few
> + * issues when a logical switch has no ACLs configured with allow-related.
> + * To understand the issue, let's take a TCP load balancer -
> + * 10.0.0.10:80=10.0.0.3:80.
> + * If a logical port - p1 with IP - 10.0.0.5 opens a TCP connection with
> + * the VIP - 10.0.0.10, then the packet in the ingress pipeline of 'p1'
> + * is sent to p1's conntrack zone id and the packet is load balanced
> + * to the backend - 10.0.0.3. The reply packet from the backend lport
> + * is not sent to the conntrack of the backend lport's zone id. This is
> + * fine as long as the packet is valid. But suppose the backend lport
> + * sends an invalid TCP packet (like an incorrect sequence number); the
> + * packet gets delivered to the lport 'p1' without unDNATing the packet
> + * to the VIP - 10.0.0.10, and this causes the connection to be reset by
> + * lport p1's VIF.
> + *
> + * We can't fix this issue by adding a logical flow to drop ct.inv packets
> + * in the egress pipeline since it would drop all other connections not
> + * destined to the load balancers.
> + *
> + * To fix this issue, we send all the packets to the conntrack in the
> + * ingress pipeline if a load balancer is configured. We can now
> + * add a lflow to drop ct.inv packets.
> + */ > +for (sw in &Switch(.has_lb_vip = true)) { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PRE_LB), > + .priority = 100, > + .__match = "ip", > + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(OUT, PRE_LB), > + .priority = 100, > + .__match = "ip", > + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", > + .external_ids = map_empty()) > +} > + > +/* Pre-stateful */ > +for (&Switch(.ls = ls)) { > + /* Ingress and Egress pre-stateful Table (Priority 0): Packets are > + * allowed by default. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PRE_STATEFUL), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PRE_STATEFUL), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + > + /* If REGBIT_CONNTRACK_DEFRAG is set as 1, then the packets should be > + * sent to conntrack for tracking and defragmentation. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PRE_STATEFUL), > + .priority = 100, > + .__match = "${rEGBIT_CONNTRACK_DEFRAG()} == 1", > + .actions = "ct_next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PRE_STATEFUL), > + .priority = 100, > + .__match = "${rEGBIT_CONNTRACK_DEFRAG()} == 1", > + .actions = "ct_next;", > + .external_ids = map_empty()) > +} > + > +function build_acl_log(acl: nb::ACL): string = > +{ > + if (not acl.log) { > + "" > + } else { > + var strs = vec_empty(); > + match (acl.name) { > + None -> (), > + Some{name} -> vec_push(strs, "name=${json_string_escape(name)}") > + }; > + /* If a severity level isn't specified, default to "info". 
*/
> + match (acl.severity) {
> + None -> vec_push(strs, "severity=info"),
> + Some{severity} -> vec_push(strs, "severity=${severity}")
> + };
> + match (acl.action) {
> + "drop" -> {
> + vec_push(strs, "verdict=drop")
> + },
> + "reject" -> {
> + vec_push(strs, "verdict=reject")
> + },
> + "allow" -> {
> + vec_push(strs, "verdict=allow")
> + },
> + "allow-related" -> {
> + vec_push(strs, "verdict=allow")
> + },
> + _ -> ()
> + };
> + match (acl.meter) {
> + None -> (),
> + Some{meter} -> vec_push(strs, "meter=${json_string_escape(meter)}")
> + };
> + "log(${string_join(strs, \", \")}); "
> + }
> +}
> +
> +/* Due to the various hard-coded priorities needed to implement ACLs, the
> + * northbound database supports a smaller range of ACL priorities than
> + * are available to logical flows. This value is added to an ACL
> + * priority to determine the ACL's logical flow priority. */
> +function oVN_ACL_PRI_OFFSET(): integer = 1000
> +
> +/* Intermediate relation that stores reject ACLs.
> + * The following rules generate logical flows for these ACLs.
> + */
> +relation Reject(lsuuid: uuid, pipeline: string, stage: Stage, acl: nb::ACL, extra_match: string, extra_actions: string)
> +
> +/* build_reject_acl_rules() */
> +for (Reject(lsuuid, pipeline, stage, acl, extra_match_, extra_actions_)) {
> + var extra_match = match (extra_match_) {
> + "" -> "",
> + s -> "(${s}) && "
> + } in
> + var extra_actions = match (extra_actions_) {
> + "" -> "",
> + s -> "${s} "
> + } in
> + var next = match (pipeline == "ingress") {
> + true -> "next(pipeline=egress,table=${stage_id(switch_stage(OUT, QOS_MARK)).0})",
> + false -> "next(pipeline=ingress,table=${stage_id(switch_stage(IN, L2_LKUP)).0})"
> + } in
> + var acl_log = build_acl_log(acl) in {
> + var __match = extra_match ++ acl.__match in
> + var actions = acl_log ++ extra_actions ++ "reg0 = 0; "
> + "reject { "
> + "/* eth.dst <-> eth.src; ip.dst <-> ip.src; is implicit.
*/ "
> + "outport <-> inport; ${next}; };" in
> + Flow(.logical_datapath = lsuuid,
> + .stage = stage,
> + .priority = acl.priority + oVN_ACL_PRI_OFFSET(),
> + .__match = __match,
> + .actions = actions,
> + .external_ids = stage_hint(acl._uuid))
> + }
> +}
> +
> +/* build_acls */
> +for (sw in &Switch(.ls = ls))
> +var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> +{
> + /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
> + * default. A related rule at priority 1 is added below if there
> + * are any stateful ACLs in this datapath. */
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, ACL),
> + .priority = 0,
> + .__match = "1",
> + .actions = "next;",
> + .external_ids = map_empty());
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(OUT, ACL),
> + .priority = 0,
> + .__match = "1",
> + .actions = "next;",
> + .external_ids = map_empty());
> +
> + if (has_stateful) {
> + /* Ingress and Egress ACL Table (Priority 1).
> + *
> + * By default, traffic is allowed. This is partially handled by
> + * the priority 0 ACL flows added earlier, but we also need to
> + * commit IP flows. This is because, while the initiator's
> + * direction may not have any stateful rules, the server's may
> + * and then its return traffic would not have an associated
> + * conntrack entry and would return "+invalid".
> + *
> + * We use "ct_commit" for a connection that is not already known
> + * by the connection tracker. Once a connection is committed,
> + * subsequent packets will hit the flow at priority 0 that just
> + * uses "next;".
> + *
> + * We also check for established connections that have ct_label.blocked
> + * set on them. That's a connection that was disallowed, but is
> + * now allowed by policy again since it hit this default-allow flow.
> + * We need to set ct_label.blocked=0 to let the connection continue,
> + * which will be done by ct_commit() in the "stateful" stage.
> + * Subsequent packets will hit the flow at priority 0 that just > + * uses "next;". */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, ACL), > + .priority = 1, > + .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))", > + .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 1, > + .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))", > + .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;", > + .external_ids = map_empty()); > + > + /* Ingress and Egress ACL Table (Priority 65535). > + * > + * Always drop traffic that's in an invalid state. Also drop > + * reply direction packets for connections that have been marked > + * for deletion (bit 0 of ct_label is set). > + * > + * This is enforced at a higher priority than ACLs can be defined. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, ACL), > + .priority = 65535, > + .__match = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)", > + .actions = "drop;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 65535, > + .__match = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)", > + .actions = "drop;", > + .external_ids = map_empty()); > + > + /* Ingress and Egress ACL Table (Priority 65535). > + * > + * Allow reply traffic that is part of an established > + * conntrack entry that has not been marked for deletion > + * (bit 0 of ct_label). We only match traffic in the > + * reply direction because we want traffic in the request > + * direction to hit the currently defined policy from ACLs. > + * > + * This is enforced at a higher priority than ACLs can be defined. 
*/ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, ACL), > + .priority = 65535, > + .__match = "ct.est && !ct.rel && !ct.new && !ct.inv " > + "&& ct.rpl && ct_label.blocked == 0", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 65535, > + .__match = "ct.est && !ct.rel && !ct.new && !ct.inv " > + "&& ct.rpl && ct_label.blocked == 0", > + .actions = "next;", > + .external_ids = map_empty()); > + > + /* Ingress and Egress ACL Table (Priority 65535). > + * > + * Allow traffic that is related to an existing conntrack entry that > + * has not been marked for deletion (bit 0 of ct_label). > + * > + * This is enforced at a higher priority than ACLs can be defined. > + * > + * NOTE: This does not support related data sessions (eg, > + * a dynamically negotiated FTP data channel), but will allow > + * related traffic such as an ICMP Port Unreachable through > + * that's generated from a non-listening UDP port. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, ACL), > + .priority = 65535, > + .__match = "!ct.est && ct.rel && !ct.new && !ct.inv " > + "&& ct_label.blocked == 0", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 65535, > + .__match = "!ct.est && ct.rel && !ct.new && !ct.inv " > + "&& ct_label.blocked == 0", > + .actions = "next;", > + .external_ids = map_empty()); > + > + /* Ingress and Egress ACL Table (Priority 65535). > + * > + * Not to do conntrack on ND packets. 
*/ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, ACL), > + .priority = 65535, > + .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 65535, > + .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2", > + .actions = "next;", > + .external_ids = map_empty()) > + }; > + > + /* Add a 34000 priority flow to advance the DNS reply from ovn-controller, > + * if the CMS has configured DNS records for the datapath. > + */ > + if (sw.has_dns_records) { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 34000, > + .__match = "udp.src == 53", > + .actions = if has_stateful "ct_commit; next;" else "next;", > + .external_ids = map_empty()) > + }; > + > + /* Add a 34000 priority flow to advance the service monitor reply > + * packets to skip applying ingress ACLs. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, ACL), > + .priority = 34000, > + .__match = "eth.dst == $svc_monitor_mac", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 34000, > + .__match = "eth.src == $svc_monitor_mac", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +/* This stage builds hints for the IN/OUT_ACL stage. Based on various > + * combinations of ct flags packets may hit only a subset of the logical > + * flows in the IN/OUT_ACL stage. > + * > + * Populating ACL hints first and storing them in registers simplifies > + * the logical flow match expressions in the IN/OUT_ACL stage and > + * generates less openflows. > + * > + * Certain combinations of ct flags might be valid matches for multiple > + * types of ACL logical flows (e.g., allow/drop). In such cases hints > + * corresponding to all potential matches are set. 
> + */ > +input relation AclHintStages[Stage] > +AclHintStages[switch_stage(IN, ACL_HINT)]. > +AclHintStages[switch_stage(OUT, ACL_HINT)]. > +for (&Switch(.ls = ls)) { > + for (AclHintStages[stage]) { > + /* New, not already established connections, may hit either allow > + * or drop ACLs. For allow ACLs, the connection must also be committed > + * to conntrack so we set REGBIT_ACL_HINT_ALLOW_NEW. > + */ > + Flow(ls._uuid, stage, 7, "ct.new && !ct.est", > + "${rEGBIT_ACL_HINT_ALLOW_NEW()} = 1; " > + "${rEGBIT_ACL_HINT_DROP()} = 1; " > + "next;", map_empty()); > + > + /* Already established connections in the "request" direction that > + * are already marked as "blocked" may hit either: > + * - allow ACLs for connections that were previously allowed by a > + * policy that was deleted and is being readded now. In this case > + * the connection should be recommitted so we set > + * REGBIT_ACL_HINT_ALLOW_NEW. > + * - drop ACLs. > + */ > + Flow(ls._uuid, stage, 6, "!ct.new && ct.est && !ct.rpl && ct_label.blocked == 1", > + "${rEGBIT_ACL_HINT_ALLOW_NEW()} = 1; " > + "${rEGBIT_ACL_HINT_DROP()} = 1; " > + "next;", map_empty()); > + > + /* Not tracked traffic can either be allowed or dropped. */ > + Flow(ls._uuid, stage, 5, "!ct.trk", > + "${rEGBIT_ACL_HINT_ALLOW()} = 1; " > + "${rEGBIT_ACL_HINT_DROP()} = 1; " > + "next;", map_empty()); > + > + /* Already established connections in the "request" direction may hit > + * either: > + * - allow ACLs in which case the traffic should be allowed so we set > + * REGBIT_ACL_HINT_ALLOW. > + * - drop ACLs in which case the traffic should be blocked and the > + * connection must be committed with ct_label.blocked set so we set > + * REGBIT_ACL_HINT_BLOCK. 
> + */ > + Flow(ls._uuid, stage, 4, "!ct.new && ct.est && !ct.rpl && ct_label.blocked == 0", > + "${rEGBIT_ACL_HINT_ALLOW()} = 1; " > + "${rEGBIT_ACL_HINT_BLOCK()} = 1; " > + "next;", map_empty()); > + > + /* Not established or established and already blocked connections may > + * hit drop ACLs. > + */ > + Flow(ls._uuid, stage, 3, "!ct.est", > + "${rEGBIT_ACL_HINT_DROP()} = 1; " > + "next;", map_empty()); > + Flow(ls._uuid, stage, 2, "ct.est && ct_label.blocked == 1", > + "${rEGBIT_ACL_HINT_DROP()} = 1; " > + "next;", map_empty()); > + > + /* Established connections that were previously allowed might hit > + * drop ACLs in which case the connection must be committed with > + * ct_label.blocked set. > + */ > + Flow(ls._uuid, stage, 1, "ct.est && ct_label.blocked == 0", > + "${rEGBIT_ACL_HINT_BLOCK()} = 1; " > + "next;", map_empty()); > + > + /* In any case, advance to the next stage. */ > + Flow(ls._uuid, stage, 0, "1", "next;", map_empty()) > + } > +} > + > +/* Ingress or Egress ACL Table (Various priorities). */ > +for (&SwitchACL(.sw = &Switch{.ls = ls, .has_stateful_acl = has_stateful}, .acl = &acl)) { > + /* consider_acl */ > + var ingress = acl.direction == "from-lport" in > + var stage = if (ingress) { switch_stage(IN, ACL) } else { switch_stage(OUT, ACL) } in > + var pipeline = if ingress "ingress" else "egress" in > + var stage_hint = stage_hint(acl._uuid) in > + if (acl.action == "allow" or acl.action == "allow-related") { > + /* If there are any stateful flows, we must even commit "allow" > + * actions. This is because, while the initiater's > + * direction may not have any stateful rules, the server's > + * may and then its return traffic would not have an > + * associated conntrack entry and would return "+invalid". 
*/ > + if (not has_stateful) { > + Flow(.logical_datapath = ls._uuid, > + .stage = stage, > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = acl.__match, > + .actions = "${build_acl_log(acl)}next;", > + .external_ids = stage_hint) > + } else { > + /* Commit the connection tracking entry if it's a new > + * connection that matches this ACL. After this commit, > + * the reply traffic is allowed by a flow we create at > + * priority 65535, defined earlier. > + * > + * It's also possible that a known connection was marked for > + * deletion after a policy was deleted, but the policy was > + * re-added while that connection is still known. We catch > + * that case here and un-set ct_label.blocked (which will be done > + * by ct_commit in the "stateful" stage) to indicate that the > + * connection should be allowed to resume. > + */ > + Flow(.logical_datapath = ls._uuid, > + .stage = stage, > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = "${rEGBIT_ACL_HINT_ALLOW_NEW()} == 1 && (${acl.__match})", > + .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; ${build_acl_log(acl)}next;", > + .external_ids = stage_hint); > + > + /* Match on traffic in the request direction for an established > + * connection tracking entry that has not been marked for > + * deletion. There is no need to commit here, so we can just > + * proceed to the next table. We use this to ensure that this > + * connection is still allowed by the currently defined > + * policy. Match untracked packets too. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = stage, > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = "${rEGBIT_ACL_HINT_ALLOW()} == 1 && (${acl.__match})", > + .actions = "${build_acl_log(acl)}next;", > + .external_ids = stage_hint) > + } > + } else if (acl.action == "drop" or acl.action == "reject") { > + /* The implementation of "drop" differs if stateful ACLs are in > + * use for this datapath. 
In that case, the actions differ > + * depending on whether the connection was previously committed > + * to the connection tracker with ct_commit. */ > + if (has_stateful) { > + /* If the packet is not tracked or not part of an established > + * connection, then we can simply reject/drop it. */ > + var __match = "${rEGBIT_ACL_HINT_DROP()} == 1" in > + if (acl.action == "reject") { > + Reject(ls._uuid, pipeline, stage, acl, __match, "") > + } else { > + Flow(.logical_datapath = ls._uuid, > + .stage = stage, > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = __match ++ " && (${acl.__match})", > + .actions = "${build_acl_log(acl)}/* drop */", > + .external_ids = stage_hint) > + }; > + /* For an existing connection without ct_label set, we've > + * encountered a policy change. ACLs previously allowed > + * this connection and we committed the connection tracking > + * entry. Current policy says that we should drop this > + * connection. First, we set bit 0 of ct_label to indicate > + * that this connection is set for deletion. By not > + * specifying "next;", we implicitly drop the packet after > + * updating conntrack state. We would normally defer > + * ct_commit() to the "stateful" stage, but since we're > + * rejecting/dropping the packet, we go ahead and do it here. > + */ > + var __match = "${rEGBIT_ACL_HINT_BLOCK()} == 1" in > + var actions = "ct_commit { ct_label.blocked = 1; }; " in > + if (acl.action == "reject") { > + Reject(ls._uuid, pipeline, stage, acl, __match, actions) > + } else { > + Flow(.logical_datapath = ls._uuid, > + .stage = stage, > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = __match ++ " && (${acl.__match})", > + .actions = "${actions}${build_acl_log(acl)}/* drop */", > + .external_ids = stage_hint) > + } > + } else { > + /* There are no stateful ACLs in use on this datapath, > + * so a "reject/drop" ACL is simply the "reject/drop" > + * logical flow action in all cases. 
*/ > + if (acl.action == "reject") { > + Reject(ls._uuid, pipeline, stage, acl, "", "") > + } else { > + Flow(.logical_datapath = ls._uuid, > + .stage = stage, > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = acl.__match, > + .actions = "${build_acl_log(acl)}/* drop */", > + .external_ids = stage_hint) > + } > + } > + } > +} > + > +/* Add 34000 priority flow to allow DHCP reply from ovn-controller to all > + * logical ports of the datapath if the CMS has configured DHCPv4 options. > + * */ > +for (SwitchPortDHCPv4Options(.port = &SwitchPort{.lsp = lsp, .sw = &sw}, > + .dhcpv4_options = dhcpv4_options@&nb::DHCP_Options{.options = options}) > + if lsp.__type != "external") { > + (Some{var server_id}, Some{var server_mac}, Some{var lease_time}) = > + (map_get(options, "server_id"), map_get(options, "server_mac"), map_get(options, "lease_time")) in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 34000, > + .__match = "outport == ${json_string_escape(lsp.name)} " > + "&& eth.src == ${server_mac} " > + "&& ip4.src == ${server_id} && udp && udp.src == 67 " > + "&& udp.dst == 68", > + .actions = if (sw.has_stateful_acl) "ct_commit; next;" else "next;", > + .external_ids = stage_hint(dhcpv4_options._uuid)) > +} > + > +for (SwitchPortDHCPv6Options(.port = &SwitchPort{.lsp = lsp, .sw = &sw}, > + .dhcpv6_options = dhcpv6_options@&nb::DHCP_Options{.options=options} ) > + if lsp.__type != "external") { > + Some{var server_mac} = map_get(options, "server_id") in > + Some{var ea} = eth_addr_from_string(server_mac) in > + var server_ip = ipv6_string_mapped(in6_generate_lla(ea)) in > + /* Get the link local IP of the DHCPv6 server from the > + * server MAC. 
*/ > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(OUT, ACL), > + .priority = 34000, > + .__match = "outport == ${json_string_escape(lsp.name)} " > + "&& eth.src == ${server_mac} " > + "&& ip6.src == ${server_ip} && udp && udp.src == 547 " > + "&& udp.dst == 546", > + .actions = if (sw.has_stateful_acl) "ct_commit; next;" else "next;", > + .external_ids = stage_hint(dhcpv6_options._uuid)) > +} > + > +relation QoSAction(qos: uuid, key_action: string, value_action: integer) > + > +QoSAction(qos, k, v) :- > + nb::QoS(._uuid = qos, .action = actions), > + var action = FlatMap(actions), > + (var k, var v) = action. > + > +/* QoS rules */ > +for (&Switch(.ls = ls)) { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, QOS_MARK), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, QOS_MARK), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, QOS_METER), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, QOS_METER), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +for (SwitchQoS(.sw = &sw, .qos = &qos)) { > + var ingress = if (qos.direction == "from-lport") true else false in > + var pipeline = if ingress "ingress" else "egress" in { > + var stage = if (ingress) { switch_stage(IN, QOS_MARK) } else { switch_stage(OUT, QOS_MARK) } in > + /* FIXME: Can value_action be negative? 
*/ > + for (QoSAction(qos._uuid, key_action, value_action)) { > + if (key_action == "dscp") { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = stage, > + .priority = qos.priority, > + .__match = qos.__match, > + .actions = "ip.dscp = ${value_action}; next;", > + .external_ids = stage_hint(qos._uuid)) > + } > + }; > + > + (var burst, var rate) = { > + var rate = 0; > + var burst = 0; > + for (bw in qos.bandwidth) { > + /* FIXME: Can value_bandwidth be negative? */ > + (var key_bandwidth, var value_bandwidth) = bw; > + if (key_bandwidth == "rate") { > + rate = value_bandwidth > + } else if (key_bandwidth == "burst") { > + burst = value_bandwidth > + } else () > + }; > + (burst, rate) > + } in > + if (rate != 0) { > + var stage = if (ingress) { switch_stage(IN, QOS_METER) } else { switch_stage(OUT, QOS_METER) } in > + var meter_action = if (burst != 0) { > + "set_meter(${rate}, ${burst}); next;" > + } else { > + "set_meter(${rate}); next;" > + } in > + /* Ingress and Egress QoS Meter Table. > + * > + * We limit the bandwidth of this flow by adding a meter table. > + */ > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = stage, > + .priority = qos.priority, > + .__match = qos.__match, > + .actions = meter_action, > + .external_ids = stage_hint(qos._uuid)) > + } > + } > +} > + > +/* LB rules */ > +for (&Switch(.ls = ls, .has_lb_vip = has_lb_vip)) { > + /* Ingress and Egress LB Table (Priority 0): Packets are allowed by > + * default. 
*/ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, LB), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, LB), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + > + if (not ls.load_balancer.is_empty()) { > + for (&SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "router"}, > + .json_name = lsp_name, > + .sw = &Switch{.ls = ls})) { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, LB), > + .priority = 65535, > + .__match = "ip && inport == ${lsp_name}", > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, LB), > + .priority = 65535, > + .__match = "ip && outport == ${lsp_name}", > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)) > + } > + }; > + > + if (has_lb_vip) { > + /* Ingress and Egress LB Table (Priority 65534). > + * > + * Send established traffic through conntrack for just NAT. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, LB), > + .priority = 65534, > + .__match = "ct.est && !ct.rel && !ct.new && !ct.inv && ct_label.natted == 1", > + .actions = "${rEGBIT_CONNTRACK_NAT()} = 1; next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, LB), > + .priority = 65534, > + .__match = "ct.est && !ct.rel && !ct.new && !ct.inv && ct_label.natted == 1", > + .actions = "${rEGBIT_CONNTRACK_NAT()} = 1; next;", > + .external_ids = map_empty()) > + } > +} > + > +/* stateful rules */ > +for (&Switch(.ls = ls)) { > + /* Ingress and Egress stateful Table (Priority 0): Packets are > + * allowed by default. 
*/ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, STATEFUL), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, STATEFUL), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + > + /* If REGBIT_CONNTRACK_COMMIT is set as 1, then the packets should be > + * committed to conntrack. We always set ct_label.blocked to 0 here as > + * any packet that makes it this far is part of a connection we > + * want to allow to continue. */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, STATEFUL), > + .priority = 100, > + .__match = "${rEGBIT_CONNTRACK_COMMIT()} == 1", > + .actions = "ct_commit { ct_label.blocked = 0; }; next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, STATEFUL), > + .priority = 100, > + .__match = "${rEGBIT_CONNTRACK_COMMIT()} == 1", > + .actions = "ct_commit { ct_label.blocked = 0; }; next;", > + .external_ids = map_empty()); > + > + /* If REGBIT_CONNTRACK_NAT is set as 1, then packets should just be sent > + * through nat (without committing). > + * > + * REGBIT_CONNTRACK_COMMIT is set for new connections and > + * REGBIT_CONNTRACK_NAT is set for established connections. So they > + * don't overlap. > + */ > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, STATEFUL), > + .priority = 100, > + .__match = "${rEGBIT_CONNTRACK_NAT()} == 1", > + .actions = "ct_lb;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, STATEFUL), > + .priority = 100, > + .__match = "${rEGBIT_CONNTRACK_NAT()} == 1", > + .actions = "ct_lb;", > + .external_ids = map_empty()) > +} > + > +/* Load balancing rules for new connections get committed to conntrack > + * table. 
So even if REGBIT_CONNTRACK_COMMIT is set in a previous table > + * a higher priority rule for load balancing below also commits the > + * connection, so it is okay if we do not hit the above match on > + * REGBIT_CONNTRACK_COMMIT. */ > +function get_match_for_lb_key(ip_address: v46_ip, > + port: bit<16>, > + protocol: Option<string>, > + redundancy: bool): string = { > + var port_match = if (port != 0) { > + var proto = if (protocol == Some{"udp"}) { > + "udp" > + } else { > + "tcp" > + }; > + if (redundancy) { " && ${proto}" } else { "" } ++ > + " && ${proto}.dst == ${port}" > + } else { > + "" > + }; > + > + var ip_match = match (ip_address) { > + IPv4{ipv4} -> "ip4.dst == ${ipv4}", > + IPv6{ipv6} -> "ip6.dst == ${ipv6}" > + }; > + > + if (redundancy) { "ip && " } else { "" } ++ ip_match ++ port_match > +} > +/* New connections in Ingress table. */ > + > +function ct_lb(backends: string, > + selection_fields: Set<string>, protocol: Option<string>): string { > + var args = vec_with_capacity(2); > + args.push("backends=${backends}"); > + > + if (not selection_fields.is_empty()) { > + var hash_fields = vec_with_capacity(selection_fields.size()); > + for (sf in selection_fields) { > + var hf = match ((sf, protocol)) { > + ("tp_src", Some{p}) -> "${p}_src", > + ("tp_dst", Some{p}) -> "${p}_dst", > + _ -> sf > + }; > + hash_fields.push(hf); > + }; > + args.push("hash_fields=" ++ json_string_escape(hash_fields.join(","))); > + }; > + > + "ct_lb(" ++ args.join("; ") ++ ");" > +} > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, STATEFUL), > + .priority = priority, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lb._uuid)) :- > + sw in &Switch(), > + LBVIPBackend[lbvipbackend], > + Some{var svc_monitor} = lbvipbackend.svc_monitor, > + var lbvip = lbvipbackend.lbvip, > + var lb = lbvip.lb, > + set_contains(sw.ls.load_balancer, lb._uuid), > + bs in &LBVIPBackendStatus(.port = lbvipbackend.port, > + .ip = lbvipbackend.ip, 
> + .protocol = default_protocol(lb.protocol), > + .logical_port = svc_monitor.port_name), > + var bses = bs.group_by((sw, lbvip, lb)).to_set(), > + var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false), > + var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, > + var up_backends = { > + var up_backends = set_empty(); > + for (bs in bses) { > + if (bs.up) { > + set_insert(up_backends, "${bs.ip}:${bs.port}") > + } > + }; > + up_backends > + }, > + var actions = if (set_is_empty(up_backends)) { > + "drop;" > + } else { > + ct_lb(string_join(set_to_vec(up_backends), ","), > + lb.selection_fields, lb.protocol) > + }. > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, STATEFUL), > + .priority = priority, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lb._uuid)) :- > + sw in &Switch(), > + LBVIPBackend[lbvipbackend], > + None = lbvipbackend.svc_monitor, > + var lbvip = lbvipbackend.lbvip, > + var lb = lbvip.lb, > + set_contains(sw.ls.load_balancer, lb._uuid), > + var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false), > + var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, > + var actions = ct_lb(lbvip.backend_ips, lb.selection_fields, lb.protocol). > + > +/* Also install flows that allow hairpinning of traffic (i.e., if > + * a load balancer VIP is DNAT-ed to a backend that happens to be > + * the source of the traffic). 
> + */ > + > +function get_hairpin_match(lbvipbackend: Ref<LBVIPBackend>, > + l4_dir: string, l3_dst: Option<v46_ip>): string = { > + var lbvip = lbvipbackend.lbvip; > + var lb = lbvip.lb; > + var ipX = ip46_ipX(lbvip.vip_addr); > + > + var __match = vec_with_capacity(3); > + > + vec_push(__match, "${ipX}.src == ${lbvipbackend.ip}"); > + > + match (l3_dst) { > + Some{s} -> vec_push(__match, "${ipX}.dst == ${s}"), > + _ -> () > + }; > + > + if (lbvip.vip_port != 0) { > + var proto = match (lb.protocol) { > + Some{value} -> value, > + None -> "tcp" > + }; > + vec_push(__match, "${proto}.${l4_dir} == ${lbvipbackend.port}") > + }; > + > + "(" ++ string_join(__match, " && ") ++ ")" > +} > + > +/* Ingress Pre-Hairpin table. > + * - Priority 2: SNAT load balanced traffic that needs to be hairpinned: > + * - Both SRC and DST IP match backend->ip and destination port > + * matches backend->port. > + * - Priority 1: unSNAT replies to hairpinned load balanced traffic. > + * - SRC IP matches backend->ip, DST IP matches LB VIP and source port > + * matches backend->port. > + */ > +/* Packets that after load balancing have equal source and > + * destination IPs should be hairpinned. > + */ > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PRE_HAIRPIN), > + .priority = 2, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lb._uuid)) :- > + sw in &Switch(), > + LBVIPBackend[lbvipbackend], > + var lbvip = lbvipbackend.lbvip, > + var lb = lbvip.lb, > + set_contains(sw.ls.load_balancer, lb._uuid), > + var __match = get_hairpin_match(lbvipbackend, "dst", Some{lbvipbackend.ip}), > + var matches = __match.group_by((lbvip, lb, sw)).to_vec(), > + var __match = string_join(matches, " || "), > + var actions = "${rEGBIT_HAIRPIN()} = 1; ct_snat(${lbvip.vip_addr});". > +/* If the packets are replies for hairpinned traffic, UNSNAT them. 
*/ > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PRE_HAIRPIN), > + .priority = 1, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lb._uuid)) :- > + sw in &Switch(), > + LBVIPBackend[lbvipbackend], > + var lbvip = lbvipbackend.lbvip, > + var lb = lbvip.lb, > + set_contains(sw.ls.load_balancer, lb._uuid), > + var __match = get_hairpin_match(lbvipbackend, "src", None), > + var matches = __match.group_by((lbvip, lb, sw)).to_vec(), > + var ipX = ip46_ipX(lbvip.vip_addr), > + var __match = "(" ++ string_join(matches, " || ") ++ ") && " > + "${ipX}.dst == ${lbvip.vip_addr}", > + var actions = "${rEGBIT_HAIRPIN()} = 1; ct_snat;". > + > + > +/* Ingress Pre-Hairpin table (Priority 0). Packets that don't need > + * hairpinning should continue processing. > + */ > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PRE_HAIRPIN), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) :- > + sw in &Switch(). > + > +/* Ingress Hairpin table. > + * - Priority 0: Packets that don't need hairpinning should continue > + * processing. > + * - Priority 1: Packets that were SNAT-ed for hairpinning should be > + * looped back (i.e., swap ETH addresses and send back on inport). > + */ > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, HAIRPIN), > + .priority = 1, > + .__match = "${rEGBIT_HAIRPIN()} == 1", > + .actions = "eth.dst <-> eth.src;" > + "outport = inport;" > + "flags.loopback = 1;" > + "output;", > + .external_ids = map_empty()) :- > + sw in &Switch(). > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, HAIRPIN), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) :- > + sw in &Switch(). 
> + > + > +/* Logical switch ingress table PORT_SEC_L2: ingress port security - L2 (priority 50) > + ingress table PORT_SEC_IP: ingress port security - IP (priority 90 and 80) > + ingress table PORT_SEC_ND: ingress port security - ND (priority 90 and 80) */ > +for (&SwitchPort(.lsp = lsp, .sw = &sw, .json_name = json_name, .ps_eth_addresses = ps_eth_addresses) > + if lsp.is_enabled() and lsp.__type != "external") { > + for (pbinding in sb::Out_Port_Binding(.logical_port = lsp.name)) { > + var __match = if (vec_is_empty(ps_eth_addresses)) { > + "inport == ${json_name}" > + } else { > + "inport == ${json_name} && eth.src == {${ps_eth_addresses.join(\" \")}}" > + } in > + var actions = match (map_get(pbinding.options, "qdisc_queue_id")) { > + None -> "next;", > + Some{id} -> "set_queue(${id}); next;" > + } in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_L2), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lsp._uuid)) > + } > +} > + > +/** > +* Build port security constraints on IPv4 and IPv6 src and dst fields > +* and add logical flows to S_SWITCH_(IN/OUT)_PORT_SEC_IP stage. 
> +* > +* For each port security of the logical port, following > +* logical flows are added > +* - If the port security has IPv4 addresses, > +* - Priority 90 flow to allow IPv4 packets for known IPv4 addresses > +* > +* - If the port security has IPv6 addresses, > +* - Priority 90 flow to allow IPv6 packets for known IPv6 addresses > +* > +* - If the port security has IPv4 addresses or IPv6 addresses or both > +* - Priority 80 flow to drop all IPv4 and IPv6 traffic > +*/ > +for (SwitchPortPSAddresses(.port = &port@SwitchPort{.sw = &sw}, .ps_addrs = ps) > + if port.is_enabled() and > + (vec_len(ps.ipv4_addrs) > 0 or vec_len(ps.ipv6_addrs) > 0) and > + port.lsp.__type != "external") > +{ > + if (vec_len(ps.ipv4_addrs) > 0) { > + var dhcp_match = "inport == ${port.json_name}" > + " && eth.src == ${ps.ea}" > + " && ip4.src == 0.0.0.0" > + " && ip4.dst == 255.255.255.255" > + " && udp.src == 68 && udp.dst == 67" in { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_IP), > + .priority = 90, > + .__match = dhcp_match, > + .actions = "next;", > + .external_ids = stage_hint(port.lsp._uuid)) > + }; > + var addrs = { > + var addrs = vec_empty(); > + for (addr in ps.ipv4_addrs) { > + /* When the netmask is applied, if the host portion is > + * non-zero, the host can only use the specified > + * address. If zero, the host is allowed to use any > + * address in the subnet. 
> + */ > + vec_push(addrs, ipv4_netaddr_match_host_or_network(addr)) > + }; > + addrs > + } in > + var __match = > + "inport == ${port.json_name} && eth.src == ${ps.ea} && ip4.src == {" ++ > + string_join(addrs, ", ") ++ "}" in > + { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_IP), > + .priority = 90, > + .__match = __match, > + .actions = "next;", > + .external_ids = stage_hint(port.lsp._uuid)) > + } > + }; > + if (vec_len(ps.ipv6_addrs) > 0) { > + var dad_match = "inport == ${port.json_name}" > + " && eth.src == ${ps.ea}" > + " && ip6.src == ::" > + " && ip6.dst == ff02::/16" > + " && icmp6.type == {131, 135, 143}" in > + { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_IP), > + .priority = 90, > + .__match = dad_match, > + .actions = "next;", > + .external_ids = stage_hint(port.lsp._uuid)) > + }; > + var __match = "inport == ${port.json_name} && eth.src == ${ps.ea}" ++ > + build_port_security_ipv6_flow(IN, ps.ea, ps.ipv6_addrs) in > + { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_IP), > + .priority = 90, > + .__match = __match, > + .actions = "next;", > + .external_ids = stage_hint(port.lsp._uuid)) > + } > + }; > + var __match = "inport == ${port.json_name} && eth.src == ${ps.ea} && ip" in > + { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_IP), > + .priority = 80, > + .__match = __match, > + .actions = "drop;", > + .external_ids = stage_hint(port.lsp._uuid)) > + } > +} > + > +/** > + * Build port security constraints on ARP and IPv6 ND fields > + * and add logical flows to S_SWITCH_IN_PORT_SEC_ND stage. > + * > + * For each port security of the logical port, following > + * logical flows are added > + * - If the port security has no IP (both IPv4 and IPv6) or > + * if it has IPv4 address(es) > + * - Priority 90 flow to allow ARP packets for known MAC addresses > + * in the eth.src and arp.spa fields. 
If the port security > + * has IPv4 addresses, allow known IPv4 addresses in the arp.tpa field. > + * > + * - If the port security has no IP (both IPv4 and IPv6) or > + * if it has IPv6 address(es) > + * - Priority 90 flow to allow IPv6 ND packets for known MAC addresses > + * in the eth.src and nd.sll/nd.tll fields. If the port security > + * has IPv6 addresses, allow known IPv6 addresses in the nd.target field > + * for IPv6 Neighbor Advertisement packet. > + * > + * - Priority 80 flow to drop ARP and IPv6 ND packets. > + */ > +for (SwitchPortPSAddresses(.port = &port@SwitchPort{.sw = &sw}, .ps_addrs = ps) > + if port.is_enabled() and port.lsp.__type != "external") > +{ > + var no_ip = vec_is_empty(ps.ipv4_addrs) and vec_is_empty(ps.ipv6_addrs) in > + { > + if (not vec_is_empty(ps.ipv4_addrs) or no_ip) { > + var __match = { > + var prefix = "inport == ${port.json_name} && eth.src == ${ps.ea} && arp.sha == ${ps.ea}"; > + if (not vec_is_empty(ps.ipv4_addrs)) { > + var spas = vec_empty(); > + for (addr in ps.ipv4_addrs) { > + vec_push(spas, ipv4_netaddr_match_host_or_network(addr)) > + }; > + prefix ++ " && arp.spa == {${string_join(spas, \", \")}}" > + } else { > + prefix > + } > + } in { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_ND), > + .priority = 90, > + .__match = __match, > + .actions = "next;", > + .external_ids = stage_hint(port.lsp._uuid)) > + } > + }; > + if (not vec_is_empty(ps.ipv6_addrs) or no_ip) { > + var __match = "inport == ${port.json_name} && eth.src == ${ps.ea}" ++ > + build_port_security_ipv6_nd_flow(ps.ea, ps.ipv6_addrs) in > + { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_ND), > + .priority = 90, > + .__match = __match, > + .actions = "next;", > + .external_ids = stage_hint(port.lsp._uuid)) > + } > + }; > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_ND), > + .priority = 80, > + .__match = "inport == ${port.json_name} && (arp || nd)", 
> + .actions = "drop;", > + .external_ids = stage_hint(port.lsp._uuid)) > + } > +} > + > +/* Ingress table PORT_SEC_ND and PORT_SEC_IP: Port security - IP and ND, by > + * default goto next. (priority 0)*/ > +for (&Switch(.ls = ls)) { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_ND), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, PORT_SEC_IP), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +/* Ingress table ARP_ND_RSP: ARP/ND responder, skip requests coming from > + * localnet and vtep ports. (priority 100); see ovn-northd.8.xml for the > + * rationale. */ > +for (&SwitchPort(.lsp = lsp, .sw = &sw, .json_name = json_name) > + if lsp.is_enabled() and > + (lsp.__type == "localnet" or lsp.__type == "vtep")) > +{ > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 100, > + .__match = "inport == ${json_name}", > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)) > +} > + > +function lsp_is_up(lsp: nb::Logical_Switch_Port): bool = { > + lsp.up == Some{true} > +} > + > +/* Ingress table ARP_ND_RSP: ARP/ND responder, reply for known IPs. > + * (priority 50). */ > +/* Handle > + * - GARPs for virtual ip which belongs to a logical port > + * of type 'virtual' and bind that port. > + * > + * - ARP reply from the virtual ip which belongs to a logical > + * port of type 'virtual' and bind that port. 
> + * */ > + Flow(.logical_datapath = sp.sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 100, > + .__match = "inport == ${vp.json_name} && " > + "((arp.op == 1 && arp.spa == ${virtual_ip} && arp.tpa == ${virtual_ip}) || " > + "(arp.op == 2 && arp.spa == ${virtual_ip}))", > + .actions = "bind_vport(${sp.json_name}, inport); next;", > + .external_ids = stage_hint(lsp._uuid)) :- > + sp in &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "virtual"}), > + Some{var virtual_ip} = map_get(lsp.options, "virtual-ip"), > + Some{var virtual_parents} = map_get(lsp.options, "virtual-parents"), > + Some{var ip} = ip_parse(virtual_ip), > + var vparent = FlatMap(string_split(virtual_parents, ",")), > + vp in &SwitchPort(.lsp = nb::Logical_Switch_Port{.name = vparent}), > + vp.sw == sp.sw. > + > +/* > + * Add ARP/ND reply flows if either the > + * - port is up and it doesn't have 'unknown' address defined or > + * - port type is router or > + * - port type is localport > + */ > +for (CheckLspIsUp[check_lsp_is_up]) { > + for (SwitchPortIPv4Address(.port = &SwitchPort{.lsp = lsp, .sw = &sw, .json_name = json_name}, > + .ea = ea, .addr = addr) > + if lsp.is_enabled() and > + ((lsp_is_up(lsp) or not check_lsp_is_up) > + or lsp.__type == "router" or lsp.__type == "localport") and > + lsp.__type != "external" and lsp.__type != "virtual" and > + not set_contains(lsp.addresses, "unknown")) > + { > + var __match = "arp.tpa == ${addr.addr} && arp.op == 1" in > + { > + var actions = "eth.dst = eth.src; " > + "eth.src = ${ea}; " > + "arp.op = 2; /* ARP reply */ " > + "arp.tha = arp.sha; " > + "arp.sha = ${ea}; " > + "arp.tpa = arp.spa; " > + "arp.spa = ${addr.addr}; " > + "outport = inport; " > + "flags.loopback = 1; " > + "output;" in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lsp._uuid)); > + > + /* Do not reply to an ARP 
request from the port that owns the > + * address (otherwise a DHCP client that ARPs to check for a > + * duplicate address will fail). Instead, forward it the usual > + * way. > + * > + * (Another alternative would be to simply drop the packet. If > + * everything is working as it is configured, then this would > + * produce equivalent results, since no one should reply to the > + * request. But ARPing for one's own IP address is intended to > + * detect situations where the network is not working as > + * configured, so dropping the request would frustrate that > + * intent.) */ > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 100, > + .__match = __match ++ " && inport == ${json_name}", > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)) > + } > + } > +} > + > +/* For ND solicitations, we need to listen for both the > + * unicast IPv6 address and its all-nodes multicast address, > + * but always respond with the unicast IPv6 address. 
*/ > +for (SwitchPortIPv6Address(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw}, > + .ea = ea, .addr = addr) > + if lsp.is_enabled() and > + (lsp_is_up(lsp) or lsp.__type == "router" or lsp.__type == "localport") and > + lsp.__type != "external" and lsp.__type != "virtual") > +{ > + var __match = "nd_ns && ip6.dst == {${addr.addr}, ${ipv6_netaddr_solicited_node(addr)}} && nd.target == ${addr.addr}" in > + var actions = "${if (lsp.__type == \"router\") \"nd_na_router\" else \"nd_na\"} { " > + "eth.src = ${ea}; " > + "ip6.src = ${addr.addr}; " > + "nd.target = ${addr.addr}; " > + "nd.tll = ${ea}; " > + "outport = inport; " > + "flags.loopback = 1; " > + "output; " > + "};" in > + { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lsp._uuid)); > + > + /* Do not reply to a solicitation from the port that owns the > + * address (otherwise DAD detection will fail). */ > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 100, > + .__match = __match ++ " && inport == ${json_name}", > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)) > + } > +} > + > +/* Ingress table ARP_ND_RSP: ARP/ND responder, by default goto next. > + * (priority 0)*/ > +for (ls in nb::Logical_Switch) { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +/* Ingress table ARP_ND_RSP: ARP/ND responder for service monitor source ip. 
> + * (priority 110)*/ > +Flow(.logical_datapath = sp.sw.ls._uuid, > + .stage = switch_stage(IN, ARP_ND_RSP), > + .priority = 110, > + .__match = "arp.tpa == ${svc_mon_src_ip} && arp.op == 1", > + .actions = "eth.dst = eth.src; " > + "eth.src = ${svc_monitor_mac}; " > + "arp.op = 2; /* ARP reply */ " > + "arp.tha = arp.sha; " > + "arp.sha = ${svc_monitor_mac}; " > + "arp.tpa = arp.spa; " > + "arp.spa = ${svc_mon_src_ip}; " > + "outport = inport; " > + "flags.loopback = 1; " > + "output;", > + .external_ids = stage_hint(lbvipbackend.lbvip.lb._uuid)) :- > + LBVIPBackend[lbvipbackend], > + Some{var svc_monitor} = lbvipbackend.svc_monitor, > + sp in &SwitchPort( > + .lsp = nb::Logical_Switch_Port{.name = svc_monitor.port_name}), > + var svc_mon_src_ip = svc_monitor.src_ip, > + SvcMonitorMac(svc_monitor_mac). > + > +function build_dhcpv4_action( > + lsp_json_key: string, > + dhcpv4_options: nb::DHCP_Options, > + offer_ip: in_addr) : Option<(string, string, string)> = > +{ > + match (ip_parse_masked(dhcpv4_options.cidr)) { > + Left{err} -> { > + /* cidr defined is invalid */ > + None > + }, > + Right{(var host_ip, var mask)} -> { > + if (not ip_same_network((offer_ip, host_ip), mask)) { > + /* the offer ip of the logical port doesn't belong to the cidr > + * defined in the DHCPv4 options. > + */ > + None > + } else { > + match ((map_get(dhcpv4_options.options, "server_id"), > + map_get(dhcpv4_options.options, "server_mac"), > + map_get(dhcpv4_options.options, "lease_time"))) > + { > + (Some{var server_ip}, Some{var server_mac}, Some{var lease_time}) -> { > + var options_map = dhcpv4_options.options; > + > + /* server_mac is not DHCPv4 option, delete it from the smap. 
 */
> + map_remove(options_map, "server_mac");
> + map_insert(options_map, "netmask", "${mask}");
> +
> + /* We're not using SMAP_FOR_EACH because we want a consistent order of the
> + * options on different architectures (big or little endian, SSE4.2) */
> + var options = vec_empty();
> + for (node in options_map) {
> + (var k, var v) = node;
> + vec_push(options, "${k} = ${v}")
> + };
> + var options_action = "${rEGBIT_DHCP_OPTS_RESULT()} = put_dhcp_opts(offerip = ${offer_ip}, " ++
> + string_join(options, ", ") ++ "); next;";
> + var response_action = "eth.dst = eth.src; eth.src = ${server_mac}; "
> + "ip4.src = ${server_ip}; udp.src = 67; "
> + "udp.dst = 68; outport = inport; flags.loopback = 1; "
> + "output;";
> +
> + var ipv4_addr_match = "ip4.src == ${offer_ip} && ip4.dst == {${server_ip}, 255.255.255.255}";
> + Some{(options_action, response_action, ipv4_addr_match)}
> + },
> + _ -> {
> + /* "server_id", "server_mac" and "lease_time" should be
> + * present in the dhcp_options. */
> + //static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> + warn("Required DHCPv4 options not defined for lport - ${lsp_json_key}");
> + None
> + }
> + }
> + }
> + }
> + }
> +}
> +
> +function build_dhcpv6_action(
> + lsp_json_key: string,
> + dhcpv6_options: nb::DHCP_Options,
> + offer_ip: in6_addr): Option<(string, string)> =
> +{
> + match (ipv6_parse_masked(dhcpv6_options.cidr)) {
> + Left{err} -> {
> + /* cidr defined is invalid */
> + //warn("cidr is invalid - ${err}");
> + None
> + },
> + Right{(var host_ip, var mask)} -> {
> + if (not ipv6_same_network((offer_ip, host_ip), mask)) {
> + /* offer_ip doesn't belong to the cidr defined in lport's DHCPv6
> + * options.*/
> + //warn("ip does not belong to cidr");
> + None
> + } else {
> + /* "server_id" should be the MAC address. 
 */
> + match (map_get(dhcpv6_options.options, "server_id")) {
> + None -> {
> + warn("server_id not present in the DHCPv6 options for lport ${lsp_json_key}");
> + None
> + },
> + Some{server_mac} -> {
> + match (eth_addr_from_string(server_mac)) {
> + None -> {
> + warn("server_id is not a valid MAC address in the DHCPv6 options for lport ${lsp_json_key}");
> + None
> + },
> + Some{ea} -> {
> + /* Get the link local IP of the DHCPv6 server from the server MAC. */
> + var server_ip = ipv6_string_mapped(in6_generate_lla(ea));
> + var ia_addr = ipv6_string_mapped(offer_ip);
> + var options = vec_empty();
> +
> + /* Check whether the dhcpv6 options should be configured as stateful.
> + * Only reply with ia_addr option for dhcpv6 stateful address mode. */
> + if (map_get_bool_def(dhcpv6_options.options, "dhcpv6_stateless", false) == false) {
> + vec_push(options, "ia_addr = ${ia_addr}")
> + } else ();
> +
> + /* We're not using SMAP_FOR_EACH because we want a consistent order of the
> + * options on different architectures (big or little endian, SSE4.2) */
> + // FIXME: enumerate map in ascending order of keys. Is this good enough?
> + for (node in dhcpv6_options.options) {
> + (var k, var v) = node;
> + if (k != "dhcpv6_stateless") {
> + vec_push(options, "${k} = ${v}")
> + } else ()
> + };
> +
> + var options_action = "${rEGBIT_DHCP_OPTS_RESULT()} = put_dhcpv6_opts(" ++
> + string_join(options, ", ") ++
> + "); next;";
> + var response_action = "eth.dst = eth.src; eth.src = ${server_mac}; "
> + "ip6.dst = ip6.src; ip6.src = ${server_ip}; udp.src = 547; "
> + "udp.dst = 546; outport = inport; flags.loopback = 1; "
> + "output;";
> + Some{(options_action, response_action)}
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> +}
> +
> +/* If 'names' has one element, returns json_string_escape() for it.
> + * Otherwise, returns json_string_escape() of all of its elements inside "{...}". 
> + */ > +function json_string_escape_vec(names: Vec<string>): string > +{ > + match ((names.len(), names.nth(0))) { > + (1, Some{name}) -> json_string_escape(name), > + _ -> { > + var json_names = vec_with_capacity(names.len()); > + for (name in names) { > + json_names.push(json_string_escape(name)); > + }; > + "{" ++ json_names.join(", ") ++ "}" > + } > + } > +} > + > +/* > + * Ordinarily, returns a single match against 'lsp'. > + * > + * If 'lsp' is an external port, returns a match against the localnet port(s) on > + * its switch along with a condition that it only operate if 'lsp' is > + * chassis-resident. This makes sense as a condition for sending DHCP replies > + * to external ports because only one chassis should send such a reply. > + * > + * Returns a prefix and a suffix string. There is no reason for this except > + * that it makes it possible to exactly mimic the format used by ovn-northd.c > + * so that text-based comparisons do not show differences. (This fails if > + * there's more than one localnet port since the C version uses multiple flows > + * in that case.) > + */ > +function match_dhcp_input(lsp: Ref<SwitchPort>): (string, string) = > +{ > + if (lsp.lsp.__type == "external" and not lsp.sw.localnet_port_names.is_empty()) { > + ("inport == " ++ json_string_escape_vec(lsp.sw.localnet_port_names) ++ " && ", > + " && is_chassis_resident(${lsp.json_name})") > + } else { > + ("inport == ${lsp.json_name} && ", "") > + } > +} > + > +/* Logical switch ingress tables DHCP_OPTIONS and DHCP_RESPONSE: DHCP options > + * and response priority 100 flows. */ > +for (lsp in &SwitchPort > + /* Don't add the DHCP flows if the port is not enabled or if the > + * port is a router port. */ > + if (lsp.is_enabled() and lsp.lsp.__type != "router") > + /* If it's an external port and there is no localnet port > + * and if it doesn't belong to an HA chassis group ignore it. 
*/ > + and (lsp.lsp.__type != "external" > + or (not lsp.sw.localnet_port_names.is_empty() > + and is_some(lsp.lsp.ha_chassis_group)))) > +{ > + for (lps in LogicalSwitchPort(.lport = lsp.lsp._uuid, .lswitch = lsuuid)) { > + var json_key = json_string_escape(lsp.lsp.name) in > + (var pfx, var sfx) = match_dhcp_input(lsp) in > + { > + /* DHCPv4 options enabled for this port */ > + Some{var dhcpv4_options_uuid} = lsp.lsp.dhcpv4_options in > + { > + for (dhcpv4_options in nb::DHCP_Options(._uuid = dhcpv4_options_uuid)) { > + for (SwitchPortIPv4Address(.port = &SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = lsp.lsp._uuid}}, .ea = ea, .addr = addr)) { > + Some{(var options_action, var response_action, var ipv4_addr_match)} = > + build_dhcpv4_action(json_key, dhcpv4_options, addr.addr) in > + { > + var __match = > + pfx ++ "eth.src == ${ea} && " > + "ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && " > + "udp.src == 68 && udp.dst == 67" ++ sfx > + in > + Flow(.logical_datapath = lsuuid, > + .stage = switch_stage(IN, DHCP_OPTIONS), > + .priority = 100, > + .__match = __match, > + .actions = options_action, > + .external_ids = stage_hint(lsp.lsp._uuid)); > + > + /* Allow ip4.src = OFFER_IP and > + * ip4.dst = {SERVER_IP, 255.255.255.255} for the below > + * cases > + * - When the client wants to renew the IP by sending > + * the DHCPREQUEST to the server ip. > + * - When the client wants to renew the IP by > + * broadcasting the DHCPREQUEST. > + */ > + var __match = pfx ++ "eth.src == ${ea} && " > + "${ipv4_addr_match} && udp.src == 68 && udp.dst == 67" ++ sfx in > + Flow(.logical_datapath = lsuuid, > + .stage = switch_stage(IN, DHCP_OPTIONS), > + .priority = 100, > + .__match = __match, > + .actions = options_action, > + .external_ids = stage_hint(lsp.lsp._uuid)); > + > + /* If REGBIT_DHCP_OPTS_RESULT is set, it means the > + * put_dhcp_opts action is successful. 
*/ > + var __match = pfx ++ "eth.src == ${ea} && " > + "ip4 && udp.src == 68 && udp.dst == 67 && " ++ > + rEGBIT_DHCP_OPTS_RESULT() ++ sfx in > + Flow(.logical_datapath = lsuuid, > + .stage = switch_stage(IN, DHCP_RESPONSE), > + .priority = 100, > + .__match = __match, > + .actions = response_action, > + .external_ids = stage_hint(lsp.lsp._uuid)) > + // FIXME: is there a constraint somewhere that guarantees that build_dhcpv4_action > + // returns Some() for at most 1 address in lsp_addrs? Otherwise, simulate this break > + // by computing an aggregate that returns the first element of a group. > + //break; > + } > + } > + } > + }; > + > + /* DHCPv6 options enabled for this port */ > + Some{var dhcpv6_options_uuid} = lsp.lsp.dhcpv6_options in > + { > + for (dhcpv6_options in nb::DHCP_Options(._uuid = dhcpv6_options_uuid)) { > + for (SwitchPortIPv6Address(.port = &SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = lsp.lsp._uuid}}, .ea = ea, .addr = addr)) { > + Some{(var options_action, var response_action)} = > + build_dhcpv6_action(json_key, dhcpv6_options, addr.addr) in > + { > + var __match = pfx ++ "eth.src == ${ea}" > + " && ip6.dst == ff02::1:2 && udp.src == 546 &&" > + " udp.dst == 547" ++ sfx in > + { > + Flow(.logical_datapath = lsuuid, > + .stage = switch_stage(IN, DHCP_OPTIONS), > + .priority = 100, > + .__match = __match, > + .actions = options_action, > + .external_ids = stage_hint(lsp.lsp._uuid)); > + > + /* If REGBIT_DHCP_OPTS_RESULT is set to 1, it means the > + * put_dhcpv6_opts action is successful */ > + Flow(.logical_datapath = lsuuid, > + .stage = switch_stage(IN, DHCP_RESPONSE), > + .priority = 100, > + .__match = __match ++ " && ${rEGBIT_DHCP_OPTS_RESULT()}", > + .actions = response_action, > + .external_ids = stage_hint(lsp.lsp._uuid)) > + // FIXME: is there a constraint somewhere that guarantees that build_dhcpv4_action > + // returns Some() for at most 1 address in lsp_addrs? 
Otherwise, simulate this break
> + // by computing an aggregate that returns the first element of a group.
> + //break;
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> +}
> +
> +/* Logical switch ingress tables DNS_LOOKUP and DNS_RESPONSE: DNS lookup and
> + * response priority 100 flows.
> + */
> +for (LogicalSwitchHasDNSRecords(ls, true))
> +{
> + Flow(.logical_datapath = ls,
> + .stage = switch_stage(IN, DNS_LOOKUP),
> + .priority = 100,
> + .__match = "udp.dst == 53",
> + .actions = "${rEGBIT_DNS_LOOKUP_RESULT()} = dns_lookup(); next;",
> + .external_ids = map_empty());
> +
> + var action = "eth.dst <-> eth.src; ip4.src <-> ip4.dst; "
> + "udp.dst = udp.src; udp.src = 53; outport = inport; "
> + "flags.loopback = 1; output;" in
> + Flow(.logical_datapath = ls,
> + .stage = switch_stage(IN, DNS_RESPONSE),
> + .priority = 100,
> + .__match = "udp.dst == 53 && ${rEGBIT_DNS_LOOKUP_RESULT()}",
> + .actions = action,
> + .external_ids = map_empty());
> +
> + var action = "eth.dst <-> eth.src; ip6.src <-> ip6.dst; "
> + "udp.dst = udp.src; udp.src = 53; outport = inport; "
> + "flags.loopback = 1; output;" in
> + Flow(.logical_datapath = ls,
> + .stage = switch_stage(IN, DNS_RESPONSE),
> + .priority = 100,
> + .__match = "udp.dst == 53 && ${rEGBIT_DNS_LOOKUP_RESULT()}",
> + .actions = action,
> + .external_ids = map_empty())
> +}
> +
> +/* Ingress table DHCP_OPTIONS and DHCP_RESPONSE: DHCP options and response, by
> + * default goto next. (priority 0).
> + *
> + * Ingress table DNS_LOOKUP and DNS_RESPONSE: DNS lookup and response, by
> + * default goto next. (priority 0).
> + *
> + * Ingress table EXTERNAL_PORT - External port handling, by default goto next.
> + * (priority 0). 
*/ > +for (ls in nb::Logical_Switch) { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, DHCP_OPTIONS), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, DHCP_RESPONSE), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, DNS_LOOKUP), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, DNS_RESPONSE), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, EXTERNAL_PORT), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 110, > + .__match = "eth.dst == $svc_monitor_mac", > + .actions = "handle_svc_check(inport);", > + .external_ids = map_empty()) :- > + sw in &Switch(). > + > +for (sw in &Switch(.ls = ls, .mcast_cfg = &mcast_cfg) > + if (mcast_cfg.enabled)) { > + for (SwitchMcastFloodRelayPorts(sw, relay_ports)) { > + for (SwitchMcastFloodReportPorts(sw, flood_report_ports)) { > + for (SwitchMcastFloodPorts(sw, flood_ports)) { > + var flood_relay = not set_is_empty(relay_ports) in > + var flood_reports = not set_is_empty(flood_report_ports) in > + var flood_static = not set_is_empty(flood_ports) in > + var igmp_act = { > + if (flood_reports) { > + var mrouter_static = json_string_escape(mC_MROUTER_STATIC().0); > + "clone { " > + "outport = ${mrouter_static}; " > + "output; " > + "};igmp;" > + } else { > + "igmp;" > + } > + } in { > + /* Punt IGMP traffic to controller. 
 */
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, L2_LKUP),
> + .priority = 100,
> + .__match = "ip4 && ip.proto == 2",
> + .actions = "${igmp_act}",
> + .external_ids = map_empty());
> +
> + /* Punt MLD traffic to controller. */
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, L2_LKUP),
> + .priority = 100,
> + .__match = "mldv1 || mldv2",
> + .actions = "${igmp_act}",
> + .external_ids = map_empty());
> +
> + /* Flood all IP multicast traffic destined to 224.0.0.X to
> + * all ports - RFC 4541, section 2.1.2, item 2.
> + */
> + var flood = json_string_escape(mC_FLOOD().0) in
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, L2_LKUP),
> + .priority = 85,
> + .__match = "ip4.mcast && ip4.dst == 224.0.0.0/24",
> + .actions = "outport = ${flood}; output;",
> + .external_ids = map_empty());
> +
> + /* Flood all IPv6 multicast traffic destined to reserved
> + * multicast IPs (RFC 4291, 2.7.1).
> + */
> + var flood = json_string_escape(mC_FLOOD().0) in
> + Flow(.logical_datapath = ls._uuid,
> + .stage = switch_stage(IN, L2_LKUP),
> + .priority = 85,
> + .__match = "ip6.mcast_flood",
> + .actions = "outport = ${flood}; output;",
> + .external_ids = map_empty());
> +
> + /* Forward unregistered IP multicast to routers with relay
> + * enabled and to any ports configured to flood IP
> + * multicast traffic. If configured to flood unregistered
> + * traffic this will be handled by the L2 multicast flow. 
> + */ > + if (not mcast_cfg.flood_unreg) { > + var relay_act = { > + if (flood_relay) { > + var rtr_flood = json_string_escape(mC_MROUTER_FLOOD().0); > + "clone { " > + "outport = ${rtr_flood}; " > + "output; " > + "}; " > + } else { > + "" > + } > + } in > + var static_act = { > + if (flood_static) { > + var mc_static = json_string_escape(mC_STATIC().0); > + "outport =${mc_static}; output;" > + } else { > + "" > + } > + } in > + var drop_act = { > + if (not flood_relay and not flood_static) { > + "drop;" > + } else { > + "" > + } > + } in > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 80, > + .__match = "ip4.mcast || ip6.mcast", > + .actions = > + "${relay_act}${static_act}${drop_act}", > + .external_ids = map_empty()) > + } > + } > + } > + } > + } > +} > + > +/* Ingress table L2_LKUP: Add IP multicast flows learnt from IGMP/MLD (priority > + * 90). */ > +for (IgmpSwitchMulticastGroup(.address = address, .switch = &sw)) { > + /* RFC 4541, section 2.1.2, item 2: Skip groups in the 224.0.0.X > + * range. > + * > + * RFC 4291, section 2.7.1: Skip groups that correspond to all > + * hosts. 
> + */
> + Some{var ip} = ip46_parse(address) in
> + (var skip_address) = match (ip) {
> + IPv4{ipv4} -> ip_is_local_multicast(ipv4),
> + IPv6{ipv6} -> ipv6_is_all_hosts(ipv6)
> + } in
> + var ipX = ip46_ipX(ip) in
> + for (SwitchMcastFloodRelayPorts(&sw, relay_ports) if not skip_address) {
> + for (SwitchMcastFloodPorts(&sw, flood_ports)) {
> + var flood_relay = not set_is_empty(relay_ports) in
> + var flood_static = not set_is_empty(flood_ports) in
> + var mc_rtr_flood = json_string_escape(mC_MROUTER_FLOOD().0) in
> + var mc_static = json_string_escape(mC_STATIC().0) in
> + var relay_act = {
> + if (flood_relay) {
> + "clone { "
> + "outport = ${mc_rtr_flood}; output; "
> + "};"
> + } else {
> + ""
> + }
> + } in
> + var static_act = {
> + if (flood_static) {
> + "clone { "
> + "outport =${mc_static}; "
> + "output; "
> + "};"
> + } else {
> + ""
> + }
> + } in
> + Flow(.logical_datapath = sw.ls._uuid,
> + .stage = switch_stage(IN, L2_LKUP),
> + .priority = 90,
> + .__match = "eth.mcast && ${ipX} && ${ipX}.dst == ${address}",
> + .actions =
> + "${relay_act} ${static_act} outport = \"${address}\"; "
> + "output;",
> + .external_ids = map_empty())
> + }
> + }
> +}
> +
> +/* Table EXTERNAL_PORT: External port. Drop ARP requests for router IPs from
> + * external ports on chassis not binding those ports. This ensures that the
> + * router pipeline runs only on the chassis binding the external ports.
> + *
> + * For an external port X on logical switch LS, if X is not resident on this
> + * chassis, drop ARP requests arriving on localnet ports from X's Ethernet
> + * address, if the ARP request is asking to translate the IP address of a
> + * router port on LS. 
*/ > +Flow(.logical_datapath = sp.sw.ls._uuid, > + .stage = switch_stage(IN, EXTERNAL_PORT), > + .priority = 100, > + .__match = ("inport == ${json_string_escape(localnet_port_name)} && " > + "eth.src == ${lp_addr.ea} && " > + "!is_chassis_resident(${sp.json_name}) && " > + "arp.tpa == ${rp_addr.addr} && arp.op == 1"), > + .actions = "drop;", > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(), > + sp.lsp.__type == "external", > + var localnet_port_name = FlatMap(sp.sw.localnet_port_names), > + var lp_addr = FlatMap(sp.static_addresses), > + rp in &SwitchPort(.sw = sp.sw), > + rp.lsp.__type == "router", > + SwitchPortIPv4Address(.port = rp, .addr = rp_addr). > +Flow(.logical_datapath = sp.sw.ls._uuid, > + .stage = switch_stage(IN, EXTERNAL_PORT), > + .priority = 100, > + .__match = ("inport == ${json_string_escape(localnet_port_name)} && " > + "eth.src == ${lp_addr.ea} && " > + "!is_chassis_resident(${sp.json_name}) && " > + "nd_ns && ip6.dst == {${rp_addr.addr}, ${ipv6_netaddr_solicited_node(rp_addr)}} && " > + "nd.target == ${rp_addr.addr}"), > + .actions = "drop;", > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(), > + sp.lsp.__type == "external", > + var localnet_port_name = FlatMap(sp.sw.localnet_port_names), > + var lp_addr = FlatMap(sp.static_addresses), > + rp in &SwitchPort(.sw = sp.sw), > + rp.lsp.__type == "router", > + SwitchPortIPv6Address(.port = rp, .addr = rp_addr). 
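[The ND drop flow above matches both the router port's unicast IPv6 address and its solicited-node multicast group, since neighbor solicitations are normally sent to the latter. As a rough illustration of what ipv6_netaddr_solicited_node() computes (per RFC 4291: ff02::1:ff00:0/104 plus the low 24 bits of the unicast address), here is a Python sketch; the function name and helper code are illustrative, not part of the patch:]

```python
import ipaddress

def solicited_node(addr: str) -> str:
    """Map a unicast IPv6 address to its solicited-node multicast
    address: ff02::1:ff00:0/104 with the low 24 bits of 'addr'
    appended (RFC 4291, section 2.7.1)."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(base | low24))

# e.g. fe80::1 maps to ff02::1:ff00:1
```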
> +Flow(.logical_datapath = sp.sw.ls._uuid, > + .stage = switch_stage(IN, EXTERNAL_PORT), > + .priority = 100, > + .__match = ("inport == ${json_string_escape(localnet_port_name)} && " > + "eth.src == ${lp_addr.ea} && " > + "eth.dst == ${ea} && " > + "!is_chassis_resident(${sp.json_name})"), > + .actions = "drop;", > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(), > + sp.lsp.__type == "external", > + var localnet_port_name = FlatMap(sp.sw.localnet_port_names), > + var lp_addr = FlatMap(sp.static_addresses), > + rp in &SwitchPort(.sw = sp.sw), > + rp.lsp.__type == "router", > + SwitchPortAddresses(.port = rp, .addrs = LPortAddress{.ea = ea}). > + > +/* Ingress table L2_LKUP: Destination lookup, broadcast and multicast handling > + * (priority 100). */ > +for (ls in nb::Logical_Switch) { > + var mc_flood = json_string_escape(mC_FLOOD().0) in > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 70, > + .__match = "eth.mcast", > + .actions = "outport = ${mc_flood}; output;", > + .external_ids = map_empty()) > +} > + > +/* Ingress table L2_LKUP: Destination lookup, unicast handling (priority 50). > +*/ > +for (SwitchPortStaticAddresses(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw}, > + .addrs = addrs) > + if lsp.__type != "external") { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 50, > + .__match = "eth.dst == ${addrs.ea}", > + .actions = "outport = ${json_name}; output;", > + .external_ids = stage_hint(lsp._uuid)) > +} > + > +/* > + * Ingress table L2_LKUP: Flows that flood self originated ARP/ND packets in the > + * switching domain. > + */ > +/* Self originated ARP requests/ND need to be flooded to the L2 domain > + * (except on router ports). Determine that packets are self originated > + * by also matching on source MAC. Matching on ingress port is not > + * reliable in case this is a VLAN-backed network. > + * Priority: 75. 
> + */ > + > +/* Returns 'true' if the IP 'addr' is on the same subnet with one of the > + * IPs configured on the router port. > + */ > +function lrouter_port_ip_reachable(rp: Ref<RouterPort>, addr: v46_ip): bool { > + match (addr) { > + IPv4{ipv4} -> { > + for (na in rp.networks.ipv4_addrs) { > + if (ip_same_network((ipv4, na.addr), ipv4_netaddr_mask(na))) { > + return true > + } > + } > + }, > + IPv6{ipv6} -> { > + for (na in rp.networks.ipv6_addrs) { > + if (ipv6_same_network((ipv6, na.addr), ipv6_netaddr_mask(na))) { > + return true > + } > + } > + } > + }; > + false > +} > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 75, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(.sw = sw, .peer = Some{rp}), > + rp.is_enabled(), > + var eth_src_set = { > + var eth_src_set = set_singleton("${rp.networks.ea}"); > + for (nat in rp.router.nats) { > + match (nat.nat.external_mac) { > + Some{mac} -> > + if (lrouter_port_ip_reachable(rp, nat.external_ip)) { > + set_insert(eth_src_set, mac) > + } else (), > + _ -> () > + } > + }; > + eth_src_set > + }, > + var eth_src = "{" ++ string_join(eth_src_set.to_vec(), ", ") ++ "}", > + var __match = "eth.src == ${eth_src} && (arp.op == 1 || nd_ns)", > + var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0), > + var actions = "outport = ${mc_flood_l2}; output;". > + > +/* Forward ARP requests for owned IP addresses (L3, VIP, NAT) only to this > + * router port. > + * Priority: 80. > + */ > +function get_arp_forward_ips(rp: Ref<RouterPort>): (Set<string>, Set<string>) = { > + var all_ips_v4 = set_empty(); > + var all_ips_v6 = set_empty(); > + > + (var lb_ips_v4, var lb_ips_v6) > + = get_router_load_balancer_ips(deref(rp.router)); > + for (a in lb_ips_v4) { > + /* Check if the ovn port has a network configured on which we could > + * expect ARP requests for the LB VIP. 
> + */ > + match (ip_parse(a)) { > + Some{ipv4} -> if (lrouter_port_ip_reachable(rp, IPv4{ipv4})) { > + set_insert(all_ips_v4, a) > + }, > + _ -> () > + } > + }; > + for (a in lb_ips_v6) { > + /* Check if the ovn port has a network configured on which we could > + * expect NS requests for the LB VIP. > + */ > + match (ipv6_parse(a)) { > + Some{ipv6} -> if (lrouter_port_ip_reachable(rp, IPv6{ipv6})) { > + set_insert(all_ips_v6, a) > + }, > + _ -> () > + } > + }; > + > + for (nat in rp.router.nats) { > + if (nat.nat.__type != "snat") { > + /* Check if the ovn port has a network configured on which we could > + * expect ARP requests/NS for the DNAT external_ip. > + */ > + if (lrouter_port_ip_reachable(rp, nat.external_ip)) { > + match (nat.external_ip) { > + IPv4{_} -> set_insert(all_ips_v4, nat.nat.external_ip), > + IPv6{_} -> set_insert(all_ips_v6, nat.nat.external_ip) > + } > + } > + } > + }; > + > + for (a in rp.networks.ipv4_addrs) { > + set_insert(all_ips_v4, "${a.addr}") > + }; > + for (a in rp.networks.ipv6_addrs) { > + set_insert(all_ips_v6, "${a.addr}") > + }; > + > + (all_ips_v4, all_ips_v6) > +} > +/* Packets received from VXLAN tunnels have already been through the > + * router pipeline so we should skip them. Normally this is done by the > + * multicast_group implementation (VXLAN packets skip table 32 which > + * delivers to patch ports) but we're bypassing multicast_groups. > + * (This is why we match against fLAGBIT_NOT_VXLAN() here.) 
> + */ > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 80, > + .__match = fLAGBIT_NOT_VXLAN() ++ > + " && arp.op == 1 && arp.tpa == { " ++ > + string_join(set_to_vec(all_ips_v4), ", ") ++ "}", > + .actions = if (sw.has_non_router_port) { > + "clone {outport = ${sp.json_name}; output; }; " > + "outport = ${mc_flood_l2}; output;" > + } else { > + "outport = ${sp.json_name}; output;" > + }, > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(.sw = sw, .peer = Some{rp}), > + rp.is_enabled(), > + (var all_ips_v4, _) = get_arp_forward_ips(rp), > + not set_is_empty(all_ips_v4), > + var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0). > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 80, > + .__match = fLAGBIT_NOT_VXLAN() ++ > + " && nd_ns && nd.target == { " ++ > + string_join(set_to_vec(all_ips_v6), ", ") ++ "}", > + .actions = if (sw.has_non_router_port) { > + "clone {outport = ${sp.json_name}; output; }; " > + "outport = ${mc_flood_l2}; output;" > + } else { > + "outport = ${sp.json_name}; output;" > + }, > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(.sw = sw, .peer = Some{rp}), > + rp.is_enabled(), > + (_, var all_ips_v6) = get_arp_forward_ips(rp), > + not set_is_empty(all_ips_v6), > + var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0). 
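For readers less familiar with DDlog, the `lrouter_port_ip_reachable()` helper that gates these ARP/ND forwarding flows is just a same-subnet membership test over the port's configured v4/v6 networks. A rough Python sketch using the stdlib `ipaddress` module (the function and parameter names here are ours, not anything from the patch):

```python
import ipaddress

def port_ip_reachable(port_networks, addr):
    """Same-subnet test: True if 'addr' falls inside any (ip, prefix_len)
    network configured on the router port, matching address families."""
    for ip, plen in port_networks:
        net = ipaddress.ip_network((ip, plen), strict=False)
        # Skip networks of the other address family before testing membership.
        if addr.version == net.version and addr in net:
            return True
    return False
```

This is the test that decides whether a given external address (LB VIP or NAT external_ip) earns an ARP-forwarding flow on a particular router port.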
> + > +for (SwitchPortNewDynamicAddress(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw}, > + .address = Some{addrs}) > + if lsp.__type != "external") { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 50, > + .__match = "eth.dst == ${addrs.ea}", > + .actions = "outport = ${json_name}; output;", > + .external_ids = stage_hint(lsp._uuid)) > +} > + > +for (&SwitchPort(.lsp = lsp, > + .json_name = json_name, > + .sw = &sw, > + .peer = Some{&RouterPort{.lrp = lrp, > + .is_redirect = is_redirect, > + .router = &Router{.lr = lr, > + .redirect_port_name = redirect_port_name}}}) > + if (set_contains(lsp.addresses, "router") and lsp.__type != "external")) > +{ > + Some{var mac} = scan_eth_addr(lrp.mac) in { > + var add_chassis_resident_check = > + not sw.localnet_port_names.is_empty() and > + (/* The peer of this port represents a distributed > + * gateway port. The destination lookup flow for the > + * router's distributed gateway port MAC address should > + * only be programmed on the "redirect-chassis". */ > + is_redirect or > + /* Check if the option 'reside-on-redirect-chassis' > + * is set to true on the peer port. If set to true > + * and if the logical switch has a localnet port, it > + * means the router pipeline for the packets from > + * this logical switch should be run on the chassis > + * hosting the gateway port. > + */ > + map_get_bool_def(lrp.options, "reside-on-redirect-chassis", false)) in > + var __match = if (add_chassis_resident_check) { > + /* The destination lookup flow for the router's > + * distributed gateway port MAC address should only be > + * programmed on the "redirect-chassis". 
*/ > + "eth.dst == ${mac} && is_chassis_resident(${redirect_port_name})" > + } else { > + "eth.dst == ${mac}" > + } in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 50, > + .__match = __match, > + .actions = "outport = ${json_name}; output;", > + .external_ids = stage_hint(lsp._uuid)); > + > + /* Add ethernet addresses specified in NAT rules on > + * distributed logical routers. */ > + if (is_redirect) { > + for (LogicalRouterNAT(.lr = lr._uuid, .nat = nat)) { > + if (nat.nat.__type == "dnat_and_snat") { > + Some{var lport} = nat.nat.logical_port in > + Some{var emac} = nat.nat.external_mac in > + Some{var nat_mac} = eth_addr_from_string(emac) in > + var __match = "eth.dst == ${nat_mac} && is_chassis_resident(${json_string_escape(lport)})" in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 50, > + .__match = __match, > + .actions = "outport = ${json_name}; output;", > + .external_ids = stage_hint(nat.nat._uuid)) > + } > + } > + } > + } > +} > +// FIXME: do we care about this? > +/* } else { > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); > + > + VLOG_INFO_RL(&rl, > + "%s: invalid syntax '%s' in addresses column", > + op->nbsp->name, op->nbsp->addresses[i]); > + }*/ > + > +/* Ingress table L2_LKUP: Destination lookup for unknown MACs (priority 0). */ > +for (LogicalSwitchUnknownPorts(.ls = ls_uuid)) { > + var mc_unknown = json_string_escape(mC_UNKNOWN().0) in > + Flow(.logical_datapath = ls_uuid, > + .stage = switch_stage(IN, L2_LKUP), > + .priority = 0, > + .__match = "1", > + .actions = "outport = ${mc_unknown}; output;", > + .external_ids = map_empty()) > +} > + > +/* Egress tables PORT_SEC_IP: Egress port security - IP (priority 0) > + * Egress table PORT_SEC_L2: Egress port security L2 - multicast/broadcast (priority 100). 
*/ > +for (&Switch(.ls = ls)) { > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PORT_SEC_IP), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = ls._uuid, > + .stage = switch_stage(OUT, PORT_SEC_L2), > + .priority = 100, > + .__match = "eth.mcast", > + .actions = "output;", > + .external_ids = map_empty()) > +} > + > +/* Egress table PORT_SEC_IP: Egress port security - IP (priorities 90 and 80) > + * if port security enabled. > + * > + * Egress table PORT_SEC_L2: Egress port security - L2 (priorities 50 and 150). > + * > + * Priority 50 rules implement port security for enabled logical port. > + * > + * Priority 150 rules drop packets to disabled logical ports, so that they > + * don't even receive multicast or broadcast packets. */ > +Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(OUT, PORT_SEC_L2), > + .priority = 50, > + .__match = __match, > + .actions = queue_action ++ "output;", > + .external_ids = stage_hint(lsp._uuid)) :- > + &SwitchPort(.sw = &sw, .lsp = lsp, .json_name = json_name, .ps_eth_addresses = ps_eth_addresses), > + lsp.is_enabled(), > + lsp.__type != "external", > + var __match = if (vec_is_empty(ps_eth_addresses)) { > + "outport == ${json_name}" > + } else { > + "outport == ${json_name} && eth.dst == {${ps_eth_addresses.join(\" \")}}" > + }, > + pbinding in sb::Out_Port_Binding(.logical_port = lsp.name), > + var queue_action = match ((lsp.__type, > + map_get(pbinding.options, "qdisc_queue_id"))) { > + ("localnet", Some{queue_id}) -> "set_queue(${queue_id});", > + _ -> "" > + }. 
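The match construction in the priority-50 egress port-security rule above is plain string assembly: match only on `outport` when no port-security MACs are configured, otherwise also constrain `eth.dst` to the configured set. A Python sketch of just that string logic (names are ours, not part of the patch):

```python
def port_sec_l2_match(json_name, ps_eth_addresses):
    """Build the egress L2 port-security match for one logical port.

    json_name: JSON-escaped logical port name, e.g. '"lp1"'.
    ps_eth_addresses: list of MAC strings from the port_security column.
    """
    if not ps_eth_addresses:
        # No port security: any destination MAC may leave via this port.
        return "outport == " + json_name
    # Space-joined MAC set, as in the DDlog interpolation.
    return ("outport == " + json_name +
            " && eth.dst == {" + " ".join(ps_eth_addresses) + "}")
```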
> + > +for (&SwitchPort(.lsp = lsp, .json_name = json_name, .sw = &sw) > + if not lsp.is_enabled() and lsp.__type != "external") { > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(OUT, PORT_SEC_L2), > + .priority = 150, > + .__match = "outport == ${json_name}", > + .actions = "drop;", > + .external_ids = stage_hint(lsp._uuid)) > +} > + > +for (SwitchPortPSAddresses(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw}, > + .ps_addrs = ps) > + if (vec_len(ps.ipv4_addrs) > 0 or vec_len(ps.ipv6_addrs) > 0) > + and lsp.__type != "external") > +{ > + if (vec_len(ps.ipv4_addrs) > 0) { > + var addrs = { > + var addrs = vec_empty(); > + for (addr in ps.ipv4_addrs) { > + /* When the netmask is applied, if the host portion is > + * non-zero, the host can only use the specified > + * address. If zero, the host is allowed to use any > + * address in the subnet. > + */ > + vec_push(addrs, ipv4_netaddr_match_host_or_network(addr)); > + if (addr.plen < 32 and not ip_is_zero(ipv4_netaddr_host(addr))) { > + vec_push(addrs, "${ipv4_netaddr_bcast(addr)}") > + } > + }; > + addrs > + } in > + var __match = > + "outport == ${json_name} && eth.dst == ${ps.ea} && ip4.dst == {255.255.255.255, 224.0.0.0/4, " ++ > + string_join(addrs, ", ") ++ "}" in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(OUT, PORT_SEC_IP), > + .priority = 90, > + .__match = __match, > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)) > + }; > + if (vec_len(ps.ipv6_addrs) > 0) { > + var __match = "outport == ${json_name} && eth.dst == ${ps.ea}" ++ > + build_port_security_ipv6_flow(OUT, ps.ea, ps.ipv6_addrs) in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = switch_stage(OUT, PORT_SEC_IP), > + .priority = 90, > + .__match = __match, > + .actions = "next;", > + .external_ids = stage_hint(lsp._uuid)) > + }; > + var __match = "outport == ${json_name} && eth.dst == ${ps.ea} && ip" in > + Flow(.logical_datapath = sw.ls._uuid, > + .stage = 
switch_stage(OUT, PORT_SEC_IP), > + .priority = 80, > + .__match = __match, > + .actions = "drop;", > + .external_ids = stage_hint(lsp._uuid)) > +} > + > +/* Logical router ingress table ADMISSION: Admission control framework. */ > +for (&Router(.lr = lr)) { > + /* Logical VLANs not supported. > + * Broadcast/multicast source address is invalid. */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ADMISSION), > + .priority = 100, > + .__match = "vlan.present || eth.src[40]", > + .actions = "drop;", > + .external_ids = map_empty()) > +} > + > +/* Logical router ingress table ADMISSION: match (priority 50). */ > +for (&RouterPort(.lrp = lrp, > + .json_name = json_name, > + .networks = lrp_networks, > + .router = &router, > + .is_redirect = is_redirect) > + /* Drop packets from disabled logical ports (since logical flow > + * tables are default-drop). */ > + if lrp.is_enabled()) > +{ > + //if (op->derived) { > + // /* No ingress packets should be received on a chassisredirect > + // * port. */ > + // continue; > + //} > + > + /* Store the ethernet address of the port receiving the packet. > + * This will save us from having to match on inport further down in > + * the pipeline. > + */ > + var actions = "${rEG_INPORT_ETH_ADDR()} = ${lrp_networks.ea}; next;" in { > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ADMISSION), > + .priority = 50, > + .__match = "eth.mcast && inport == ${json_name}", > + .actions = actions, > + .external_ids = stage_hint(lrp._uuid)); > + > + var __match = > + "eth.dst == ${lrp_networks.ea} && inport == ${json_name}" ++ > + if is_redirect { > + /* Traffic with eth.dst = l3dgw_port->lrp_networks.ea > + * should only be received on the "redirect-chassis". 
*/ > + " && is_chassis_resident(${json_string_escape(chassis_redirect_name(lrp.name))})" > + } else { "" } in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ADMISSION), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lrp._uuid)) > + } > +} > + > + > +/* Logical router ingress table LOOKUP_NEIGHBOR and > + * table LEARN_NEIGHBOR. */ > +/* Learn MAC bindings from ARP/IPv6 ND. > + * > + * For ARP packets, table LOOKUP_NEIGHBOR does a lookup for the > + * (arp.spa, arp.sha) in the mac binding table using the 'lookup_arp' > + * action and stores the result in REGBIT_LOOKUP_NEIGHBOR_RESULT bit. > + * If "always_learn_from_arp_request" is set to false, it will also > + * lookup for the (arp.spa) in the mac binding table using the > + * "lookup_arp_ip" action for ARP request packets, and stores the > + * result in REGBIT_LOOKUP_NEIGHBOR_IP_RESULT bit; or set that bit > + * to "1" directly for ARP response packets. > + * > + * For IPv6 ND NA packets, table LOOKUP_NEIGHBOR does a lookup > + * for the (nd.target, nd.tll) in the mac binding table using the > + * 'lookup_nd' action and stores the result in > + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. If > + * "always_learn_from_arp_request" is set to false, > + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT bit is set. > + * > + * For IPv6 ND NS packets, table LOOKUP_NEIGHBOR does a lookup > + * for the (ip6.src, nd.sll) in the mac binding table using the > + * 'lookup_nd' action and stores the result in > + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. If > + * "always_learn_from_arp_request" is set to false, it will also lookup > + * for the (ip6.src) in the mac binding table using the "lookup_nd_ip" > + * action and stores the result in REGBIT_LOOKUP_NEIGHBOR_IP_RESULT > + * bit. > + * > + * Table LEARN_NEIGHBOR learns the mac-binding using the action > + * - 'put_arp/put_nd'. 
Learning mac-binding is skipped if > + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit is set or > + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT is not set. > + * > + * */ > + > +/* Flows for LOOKUP_NEIGHBOR. */ > +for (&Router(.lr = lr, .learn_from_arp_request = learn_from_arp_request)) > +var rLNR = rEGBIT_LOOKUP_NEIGHBOR_RESULT() in > +var rLNIR = rEGBIT_LOOKUP_NEIGHBOR_IP_RESULT() in > +{ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LOOKUP_NEIGHBOR), > + .priority = 100, > + .__match = "arp.op == 2", > + .actions = > + "${rLNR} = lookup_arp(inport, arp.spa, arp.sha); " ++ > + { if (learn_from_arp_request) "" else "${rLNIR} = 1; " } ++ > + "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LOOKUP_NEIGHBOR), > + .priority = 100, > + .__match = "nd_na", > + .actions = > + "${rLNR} = lookup_nd(inport, nd.target, nd.tll); " ++ > + { if (learn_from_arp_request) "" else "${rLNIR} = 1; " } ++ > + "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LOOKUP_NEIGHBOR), > + .priority = 100, > + .__match = "nd_ns", > + .actions = > + "${rLNR} = lookup_nd(inport, ip6.src, nd.sll); " ++ > + { if (learn_from_arp_request) "" else > + "${rLNIR} = lookup_nd_ip(inport, ip6.src); " } ++ > + "next;", > + .external_ids = map_empty()); > + > + /* For other packet types, we can skip neighbor learning. > + * So set REGBIT_LOOKUP_NEIGHBOR_RESULT to 1. */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LOOKUP_NEIGHBOR), > + .priority = 0, > + .__match = "1", > + .actions = "${rLNR} = 1; next;", > + .external_ids = map_empty()); > + > + /* Flows for LEARN_NEIGHBOR. */ > + /* Skip Neighbor learning if not required. 
*/ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LEARN_NEIGHBOR), > + .priority = 100, > + .__match = > + "${rLNR} == 1" ++ > + { if (learn_from_arp_request) "" else " || ${rLNIR} == 0" }, > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LEARN_NEIGHBOR), > + .priority = 90, > + .__match = "arp", > + .actions = "put_arp(inport, arp.spa, arp.sha); next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LEARN_NEIGHBOR), > + .priority = 90, > + .__match = "nd_na", > + .actions = "put_nd(inport, nd.target, nd.tll); next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LEARN_NEIGHBOR), > + .priority = 90, > + .__match = "nd_ns", > + .actions = "put_nd(inport, ip6.src, nd.sll); next;", > + .external_ids = map_empty()) > +} > + > +/* Check if we need to learn mac-binding from ARP requests. 
*/ > +for (RouterPortNetworksIPv4Addr(rp@&RouterPort{.router = router}, addr)) { > + var is_l3dgw_port = match (router.l3dgw_port) { > + Some{l3dgw_lrp} -> l3dgw_lrp._uuid == rp.lrp._uuid, > + None -> false > + } in > + var has_redirect_port = router.redirect_port_name != "" in > + var chassis_residence = match (is_l3dgw_port and has_redirect_port) { > + true -> " && is_chassis_resident(${router.redirect_port_name})", > + false -> "" > + } in > + var rLNR = rEGBIT_LOOKUP_NEIGHBOR_RESULT() in > + var rLNIR = rEGBIT_LOOKUP_NEIGHBOR_IP_RESULT() in > + var match0 = "inport == ${rp.json_name} && " > + "arp.spa == ${ipv4_netaddr_match_network(addr)}" in > + var match1 = "arp.op == 1" ++ chassis_residence in > + var learn_from_arp_request = router.learn_from_arp_request in { > + if (not learn_from_arp_request) { > + /* ARP request to this address should always get learned, > + * so add a priority-110 flow to set > + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT to 1. */ > + var __match = [match0, "arp.tpa == ${addr.addr}", match1] in > + var actions = "${rLNR} = lookup_arp(inport, arp.spa, arp.sha); " > + "${rLNIR} = 1; " > + "next;" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, LOOKUP_NEIGHBOR), > + .priority = 110, > + .__match = __match.join(" && "), > + .actions = actions, > + .external_ids = stage_hint(rp.lrp._uuid)) > + }; > + > + var actions = "${rLNR} = lookup_arp(inport, arp.spa, arp.sha); " ++ > + { if (learn_from_arp_request) "" else > + "${rLNIR} = lookup_arp_ip(inport, arp.spa); " } ++ > + "next;" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, LOOKUP_NEIGHBOR), > + .priority = 100, > + .__match = "${match0} && ${match1}", > + .actions = actions, > + .external_ids = stage_hint(rp.lrp._uuid)) > + } > +} > + > + > +/* Logical router ingress table IP_INPUT: IP Input. 
*/ > +for (router in &Router(.lr = lr, .mcast_cfg = &mcast_cfg)) { > + /* L3 admission control: drop multicast and broadcast source, localhost > + * source or destination, and zero network source or destination > + * (priority 100). */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 100, > + .__match = "ip4.src_mcast ||" > + "ip4.src == 255.255.255.255 || " > + "ip4.src == 127.0.0.0/8 || " > + "ip4.dst == 127.0.0.0/8 || " > + "ip4.src == 0.0.0.0/8 || " > + "ip4.dst == 0.0.0.0/8", > + .actions = "drop;", > + .external_ids = map_empty()); > + > + /* Drop ARP packets (priority 85). ARP request packets for router's own > + * IPs are handled with priority-90 flows. > + * Drop IPv6 ND packets (priority 85). ND NA packets for router's own > + * IPs are handled with priority-90 flows. > + */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 85, > + .__match = "arp || nd", > + .actions = "drop;", > + .external_ids = map_empty()); > + > + /* Allow IPv6 multicast traffic that's supposed to reach the > + * router pipeline (e.g., router solicitations). > + */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 84, > + .__match = "nd_rs || nd_ra", > + .actions = "next;", > + .external_ids = map_empty()); > + > + /* Drop other reserved multicast. */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 83, > + .__match = "ip6.mcast_rsvd", > + .actions = "drop;", > + .external_ids = map_empty()); > + > + /* Allow other multicast if relay enabled (priority 82). */ > + var mcast_action = { if (mcast_cfg.relay) { "next;" } else { "drop;" } } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 82, > + .__match = "ip4.mcast || ip6.mcast", > + .actions = mcast_action, > + .external_ids = map_empty()); > + > + /* Drop Ethernet local broadcast. 
By definition this traffic should > + * not be forwarded.*/ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 50, > + .__match = "eth.bcast", > + .actions = "drop;", > + .external_ids = map_empty()); > + > + /* TTL discard */ > + Flow( > + .logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 30, > + .__match = "ip4 && ip.ttl == {0, 1}", > + .actions = "drop;", > + .external_ids = map_empty()); > + > + /* Pass other traffic not already handled to the next table for > + * routing. */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +function format_v4_networks(networks: lport_addresses, add_bcast: bool): string = > +{ > + var addrs = vec_empty(); > + for (addr in networks.ipv4_addrs) { > + vec_push(addrs, "${addr.addr}"); > + if (add_bcast) { > + vec_push(addrs, "${ipv4_netaddr_bcast(addr)}") > + } else () > + }; > + if (vec_len(addrs) == 1) { > + string_join(addrs , ", ") > + } else { > + "{" ++ string_join(addrs , ", ") ++ "}" > + } > +} > + > +function format_v6_networks(networks: lport_addresses): string = > +{ > + var addrs = vec_empty(); > + for (addr in networks.ipv6_addrs) { > + vec_push(addrs, "${addr.addr}") > + }; > + if (vec_len(addrs) == 1) { > + string_join(addrs, ", ") > + } else { > + "{" ++ string_join(addrs , ", ") ++ "}" > + } > +} > + > +/* The following relation is used in ARP reply flow generation to determine whether > + * the is_chassis_resident check must be added to the flow. 
> + */ > +relation AddChassisResidentCheck_(lrp: uuid, add_check: bool) > + > +AddChassisResidentCheck_(lrp._uuid, res) :- > + &SwitchPort(.peer = Some{&RouterPort{.lrp = lrp, .router = &router, .is_redirect = is_redirect}}, > + .sw = sw), > + is_some(router.l3dgw_port), > + not sw.localnet_port_names.is_empty(), > + var res = if (is_redirect) { > + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea > + * should only be sent from the "redirect-chassis", so that > + * upstream MAC learning points to the "redirect-chassis". > + * Also need to avoid generation of multiple ARP responses > + * from different chassis. */ > + true > + } else { > + /* Check if the option 'reside-on-redirect-chassis' > + * is set to true on the router port. If set to true > + * and if the peer's logical switch has a localnet port, it > + * means the router pipeline for the packets from > + * the peer's logical switch should be run on the chassis > + * hosting the gateway port and it should reply to the > + * ARP requests for the router port IPs. > + */ > + map_get_bool_def(lrp.options, "reside-on-redirect-chassis", false) > + }. > + > + > +relation AddChassisResidentCheck(lrp: uuid, add_check: bool) > + > +AddChassisResidentCheck(lrp, add_check) :- > + AddChassisResidentCheck_(lrp, add_check). > + > +AddChassisResidentCheck(lrp, false) :- > + nb::Logical_Router_Port(._uuid = lrp), > + not AddChassisResidentCheck_(lrp, _). 
> + > + > +function get_force_snat_ip(lr: nb::Logical_Router, key_type: string): Set<v46_ip> = > +{ > + var ips = set_empty(); > + match (map_get(lr.options, key_type ++ "_force_snat_ip")) { > + None -> (), > + Some{s} -> { > + for (token in s.split(" ")) { > + match (ip46_parse(token)) { > + Some{ip} -> set_insert(ips, ip), > + _ -> () // XXX warn > + } > + }; > + } > + }; > + ips > +} > + > +function has_force_snat_ip(lr: nb::Logical_Router, key_type: string): bool { > + not get_force_snat_ip(lr, key_type).is_empty() > +} > + > +/* Logical router ingress table IP_INPUT: IP Input for IPv4. */ > +for (&RouterPort(.router = &router, .networks = networks, .lrp = lrp) > + if (not vec_is_empty(networks.ipv4_addrs))) > +{ > + /* L3 admission control: drop packets that originate from an > + * IPv4 address owned by the router or a broadcast address > + * known to the router (priority 100). */ > + var __match = "ip4.src == " ++ > + format_v4_networks(networks, true) ++ > + " && ${rEGBIT_EGRESS_LOOPBACK()} == 0" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 100, > + .__match = __match, > + .actions = "drop;", > + .external_ids = stage_hint(lrp._uuid)); > + > + /* ICMP echo reply. These flows reply to ICMP echo requests > + * received for the router's IP address. Since packets only > + * get here as part of the logical router datapath, the inport > + * (i.e. the incoming locally attached net) does not matter. 
> + * The ip.ttl also does not matter (RFC1812 section 4.2.2.9) */ > + var __match = "ip4.dst == " ++ > + format_v4_networks(networks, false) ++ > + " && icmp4.type == 8 && icmp4.code == 0" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 90, > + .__match = __match, > + .actions = "ip4.dst <-> ip4.src; " > + "ip.ttl = 255; " > + "icmp4.type = 0; " > + "flags.loopback = 1; " > + "next; ", > + .external_ids = stage_hint(lrp._uuid)) > +} > + > +/* Priority-90-92 flows handle ARP requests and ND packets. Most are > + * per logical port but DNAT addresses can be handled per datapath > + * for non gateway router ports. > + * > + * Priority 91 and 92 flows are added for each gateway router > + * port to handle the special cases. In case we get the packet > + * on a regular port, just reply with the port's ETH address. > + */ > +LogicalRouterNatArpNdFlow(router, nat) :- > + router in &Router(.lr = nb::Logical_Router{._uuid = lr}), > + LogicalRouterNAT(.lr = lr, .nat = nat@NAT{.nat = &nb::NAT{.__type = __type}}), > + /* Skip SNAT entries for now, we handle unique SNAT IPs separately > + * below. > + */ > + __type != "snat". > +/* Now handle SNAT entries too, one per unique SNAT IP. */ > +LogicalRouterNatArpNdFlow(router, nat) :- > + router in &Router(.snat_ips = snat_ips), > + var snat_ip = FlatMap(snat_ips), > + (var ip, var nats) = snat_ip, > + Some{var nat} = nats.nth(0). > + > +relation LogicalRouterNatArpNdFlow(router: Ref<Router>, nat: NAT) > +LogicalRouterArpNdFlow(router, nat, None, rEG_INPORT_ETH_ADDR(), None, false, 90) :- > + LogicalRouterNatArpNdFlow(router, nat). > + > +/* ARP / ND handling for external IP addresses. > + * > + * DNAT and SNAT IP addresses are external IP addresses that need ARP > + * handling. > + * > + * These are already taken care globally, per router. The only > + * exception is on the l3dgw_port where we might need to use a > + * different ETH address. 
> + */ > +LogicalRouterPortNatArpNdFlow(router, nat, l3dgw_port) :- > + router in &Router(.lr = lr, .l3dgw_port = Some{l3dgw_port}), > + LogicalRouterNAT(lr._uuid, nat), > + /* Skip SNAT entries for now, we handle unique SNAT IPs separately > + * below. > + */ > + nat.nat.__type != "snat". > +/* Now handle SNAT entries too, one per unique SNAT IP. */ > +LogicalRouterPortNatArpNdFlow(router, nat, l3dgw_port) :- > + router in &Router(.l3dgw_port = Some{l3dgw_port}, .snat_ips = snat_ips), > + var snat_ip = FlatMap(snat_ips), > + (var ip, var nats) = snat_ip, > + Some{var nat} = nats.nth(0). > + > +/* Respond to ARP/NS requests on the chassis that binds the gw > + * port. Drop the ARP/NS requests on other chassis. > + */ > +relation LogicalRouterPortNatArpNdFlow(router: Ref<Router>, nat: NAT, lrp: nb::Logical_Router_Port) > +LogicalRouterArpNdFlow(router, nat, Some{lrp}, mac, Some{extra_match}, false, 92), > +LogicalRouterArpNdFlow(router, nat, Some{lrp}, mac, None, true, 91) :- > + LogicalRouterPortNatArpNdFlow(router, nat, lrp), > + (var mac, var extra_match) = match ((nat.external_mac, nat.nat.logical_port)) { > + (Some{external_mac}, Some{logical_port}) -> ( > + /* distributed NAT case, use nat->external_mac */ > + external_mac.to_string(), > + /* Traffic with eth.src = nat->external_mac should only be > + * sent from the chassis where nat->logical_port is > + * resident, so that upstream MAC learning points to the > + * correct chassis. Also need to avoid generation of > + * multiple ARP responses from different chassis. */ > + "is_chassis_resident(${json_string_escape(logical_port)})" > + ), > + _ -> ( > + rEG_INPORT_ETH_ADDR(), > + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s > + * should only be sent from the gateway chassis, so that > + * upstream MAC learning points to the gateway chassis. > + * Also need to avoid generation of multiple ARP responses > + * from different chassis. 
*/ > + match (router.redirect_port_name) { > + "" -> "", > + s -> "is_chassis_resident(${s})" > + } > + ) > + }. > + > +/* Now divide the ARP/ND flows into ARP and ND. */ > +relation LogicalRouterArpNdFlow( > + router: Ref<Router>, > + nat: NAT, > + lrp: Option<nb::Logical_Router_Port>, > + mac: string, > + extra_match: Option<string>, > + drop: bool, > + priority: integer) > +LogicalRouterArpFlow(router, lrp, ipv4, mac, extra_match, drop, priority, > + stage_hint(nat.nat._uuid)) :- > + LogicalRouterArpNdFlow(router, nat@NAT{.external_ip = IPv4{ipv4}}, lrp, > + mac, extra_match, drop, priority). > +LogicalRouterNdFlow(router, lrp, "nd_na", ipv6, true, mac, extra_match, drop, priority, > + stage_hint(nat.nat._uuid)) :- > + LogicalRouterArpNdFlow(router, nat@NAT{.external_ip = IPv6{ipv6}}, lrp, > + mac, extra_match, drop, priority). > + > +relation LogicalRouterArpFlow( > + lr: Ref<Router>, > + lrp: Option<nb::Logical_Router_Port>, > + ip: in_addr, > + mac: string, > + extra_match: Option<string>, > + drop: bool, > + priority: integer, > + external_ids: Map<string,string>) > +Flow(.logical_datapath = lr.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = priority, > + .__match = __match, > + .actions = actions, > + .external_ids = external_ids) :- > + LogicalRouterArpFlow(.lr = lr, .lrp = lrp, .ip = ip, .mac = mac, > + .extra_match = extra_match, .drop = drop, > + .priority = priority, .external_ids = external_ids), > + var __match = { > + var clauses = vec_with_capacity(3); > + match (lrp) { > + Some{p} -> clauses.push("inport == ${json_string_escape(p.name)}"), > + None -> () > + }; > + clauses.push("arp.op == 1 && arp.tpa == ${ip}"); > + clauses.append(extra_match.to_vec()); > + clauses.join(" && ") > + }, > + var actions = if (drop) { > + "drop;" > + } else { > + "eth.dst = eth.src; " > + "eth.src = ${mac}; " > + "arp.op = 2; /* ARP reply */ " > + "arp.tha = arp.sha; " > + "arp.sha = ${mac}; " > + "arp.tpa = arp.spa; " > + "arp.spa = ${ip}; " > + 
"outport = inport; " > + "flags.loopback = 1; " > + "output;" > + }. > + > +relation LogicalRouterNdFlow( > + lr: Ref<Router>, > + lrp: Option<nb::Logical_Router_Port>, > + action: string, > + ip: in6_addr, > + sn_ip: bool, > + mac: string, > + extra_match: Option<string>, > + drop: bool, > + priority: integer, > + external_ids: Map<string,string>) > +Flow(.logical_datapath = lr.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = priority, > + .__match = __match, > + .actions = actions, > + .external_ids = external_ids) :- > + LogicalRouterNdFlow(.lr = lr, .lrp = lrp, .action = action, .ip = ip, > + .sn_ip = sn_ip, .mac = mac, .extra_match = extra_match, > + .drop = drop, .priority = priority, > + .external_ids = external_ids), > + var __match = { > + var clauses = vec_with_capacity(4); > + match (lrp) { > + Some{p} -> clauses.push("inport == ${json_string_escape(p.name)}"), > + None -> () > + }; > + if (sn_ip) { > + clauses.push("ip6.dst == {${ip}, ${in6_addr_solicited_node(ip)}}") > + }; > + clauses.push("nd_ns && nd.target == ${ip}"); > + clauses.append(extra_match.to_vec()); > + clauses.join(" && ") > + }, > + var actions = if (drop) { > + "drop;" > + } else { > + "${action} { " > + "eth.src = ${mac}; " > + "ip6.src = ${ip}; " > + "nd.target = ${ip}; " > + "nd.tll = ${mac}; " > + "outport = inport; " > + "flags.loopback = 1; " > + "output; " > + "};" > + }. 
> + > +/* ICMP time exceeded */ > +for (RouterPortNetworksIPv4Addr(.port = &RouterPort{.lrp = lrp, > + .json_name = json_name, > + .router = router, > + .networks = networks, > + .is_redirect = is_redirect}, > + .addr = addr)) > +{ > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 40, > + .__match = "inport == ${json_name} && ip4 && " > + "ip.ttl == {0, 1} && !ip.later_frag", > + .actions = "icmp4 {" > + "eth.dst <-> eth.src; " > + "icmp4.type = 11; /* Time exceeded */ " > + "icmp4.code = 0; /* TTL exceeded in transit */ " > + "ip4.dst = ip4.src; " > + "ip4.src = ${addr.addr}; " > + "ip.ttl = 255; " > + "next; };", > + .external_ids = stage_hint(lrp._uuid)); > + > + /* ARP reply. These flows reply to ARP requests for the router's own > + * IP address. */ > + for (AddChassisResidentCheck(lrp._uuid, add_chassis_resident_check)) { > + var __match = > + "arp.spa == ${ipv4_netaddr_match_network(addr)}" ++ > + if (add_chassis_resident_check) { > + " && is_chassis_resident(${router.redirect_port_name})" > + } else "" in > + LogicalRouterArpFlow(.lr = router, > + .lrp = Some{lrp}, > + .ip = addr.addr, > + .mac = rEG_INPORT_ETH_ADDR(), > + .extra_match = Some{__match}, > + .drop = false, > + .priority = 90, > + .external_ids = stage_hint(lrp._uuid)) > + } > +} > + > +for (&RouterPort(.lrp = lrp, > + .router = router@&Router{.lr = lr}, > + .json_name = json_name, > + .networks = networks, > + .is_redirect = is_redirect)) > +var residence_check = match (is_redirect) { > + true -> Some{"is_chassis_resident(${router.redirect_port_name})"}, > + false -> None > +} in { > + for (RouterLBVIP(.router = &Router{.lr = nb::Logical_Router{._uuid= lr._uuid}}, .vip = vip)) { > + Some{(var ip_address, _)} = ip_address_and_port_from_lb_key(vip) in { > + IPv4{var ipv4} = ip_address in > + LogicalRouterArpFlow(.lr = router, > + .lrp = Some{lrp}, > + .ip = ipv4, > + .mac = rEG_INPORT_ETH_ADDR(), > + .extra_match = residence_check, > + .drop 
= false, > + .priority = 90, > + .external_ids = map_empty()); > + > + IPv6{var ipv6} = ip_address in > + LogicalRouterNdFlow(.lr = router, > + .lrp = Some{lrp}, > + .action = "nd_na", > + .ip = ipv6, > + .sn_ip = false, > + .mac = rEG_INPORT_ETH_ADDR(), > + .extra_match = residence_check, > + .drop = false, > + .priority = 90, > + .external_ids = map_empty()) > + } > + } > +} > + > +/* Drop IP traffic destined to router owned IPs except if the IP is > + * also a SNAT IP. Those are dropped later, in stage > + * "lr_in_arp_resolve", if unSNAT was unsuccessful. > + * > + * Priority 60. > + */ > +Flow(.logical_datapath = lr_uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 60, > + .__match = "ip4.dst == {" ++ match_ips.join(", ") ++ "}", > + .actions = "drop;", > + .external_ids = stage_hint(lrp_uuid)) :- > + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > + .router = &Router{.snat_ips = snat_ips, > + .lr = nb::Logical_Router{._uuid = lr_uuid}}, > + .networks = networks), > + var addr = FlatMap(networks.ipv4_addrs), > + not snat_ips.contains_key(IPv4{addr.addr}), > + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). > +Flow(.logical_datapath = lr_uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 60, > + .__match = "ip6.dst == {" ++ match_ips.join(", ") ++ "}", > + .actions = "drop;", > + .external_ids = stage_hint(lrp_uuid)) :- > + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > + .router = &Router{.snat_ips = snat_ips, > + .lr = nb::Logical_Router{._uuid = lr_uuid}}, > + .networks = networks), > + var addr = FlatMap(networks.ipv6_addrs), > + not snat_ips.contains_key(IPv6{addr.addr}), > + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). > + > +for (RouterPortNetworksIPv4Addr( > + .port = &RouterPort{ > + .router = &Router{.lr = lr, > + .l3dgw_port = None, > + .is_gateway = false}, > + .lrp = lrp}, > + .addr = addr)) > +{ > + /* UDP/TCP port unreachable. 
*/ > + var __match = "ip4 && ip4.dst == ${addr.addr} && !ip.later_frag && udp" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 80, > + .__match = __match, > + .actions = "icmp4 {" > + "eth.dst <-> eth.src; " > + "ip4.dst <-> ip4.src; " > + "ip.ttl = 255; " > + "icmp4.type = 3; " > + "icmp4.code = 3; " > + "next; };", > + .external_ids = stage_hint(lrp._uuid)); > + > + var __match = "ip4 && ip4.dst == ${addr.addr} && !ip.later_frag && tcp" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 80, > + .__match = __match, > + .actions = "tcp_reset {" > + "eth.dst <-> eth.src; " > + "ip4.dst <-> ip4.src; " > + "next; };", > + .external_ids = stage_hint(lrp._uuid)); > + > + var __match = "ip4 && ip4.dst == ${addr.addr} && !ip.later_frag" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 70, > + .__match = __match, > + .actions = "icmp4 {" > + "eth.dst <-> eth.src; " > + "ip4.dst <-> ip4.src; " > + "ip.ttl = 255; " > + "icmp4.type = 3; " > + "icmp4.code = 2; " > + "next; };", > + .external_ids = stage_hint(lrp._uuid)) > +} > + > +/* DHCPv6 reply handling */ > +Flow(.logical_datapath = rp.router.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 100, > + .__match = "ip6.dst == ${ipv6_addr.addr} " > + "&& udp.src == 547 && udp.dst == 546", > + .actions = "reg0 = 0; handle_dhcpv6_reply;", > + .external_ids = stage_hint(rp.lrp._uuid)) :- > + rp in &RouterPort(), > + var ipv6_addr = FlatMap(rp.networks.ipv6_addrs). > + > +/* Logical router ingress table IP_INPUT: IP Input for IPv6. */ > +for (&RouterPort(.router = &router, .networks = networks, .lrp = lrp) > + if (not vec_is_empty(networks.ipv6_addrs))) > +{ > + //if (op->derived) { > + // /* No ingress packets are accepted on a chassisredirect > + // * port, so no need to program flows for that port. */ > + // continue; > + //} > + > + /* ICMPv6 echo reply. 
These flows reply to echo requests > + * received for the router's IP address. */ > + var __match = "ip6.dst == " ++ > + format_v6_networks(networks) ++ > + " && icmp6.type == 128 && icmp6.code == 0" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 90, > + .__match = __match, > + .actions = "ip6.dst <-> ip6.src; " > + "ip.ttl = 255; " > + "icmp6.type = 129; " > + "flags.loopback = 1; " > + "next; ", > + .external_ids = stage_hint(lrp._uuid)) > +} > + > +/* ND reply. These flows reply to ND solicitations for the > + * router's own IP address. */ > +for (RouterPortNetworksIPv6Addr(.port = &RouterPort{.lrp = lrp, > + .is_redirect = is_redirect, > + .router = router, > + .networks = networks, > + .json_name = json_name}, > + .addr = addr)) > +{ > + var extra_match = if (is_redirect) { > + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea > + * should only be sent from the gateway chassis, so that > + * upstream MAC learning points to the gateway chassis. > + * Also need to avoid generation of multiple ND replies > + * from different chassis. 
*/ > + Some{"is_chassis_resident(${json_string_escape(chassis_redirect_name(lrp.name))})"} > + } else None in > + LogicalRouterNdFlow(.lr = router, > + .lrp = Some{lrp}, > + .action = "nd_na_router", > + .ip = addr.addr, > + .sn_ip = true, > + .mac = rEG_INPORT_ETH_ADDR(), > + .extra_match = extra_match, > + .drop = false, > + .priority = 90, > + .external_ids = stage_hint(lrp._uuid)) > +} > + > +/* UDP/TCP port unreachable */ > +for (RouterPortNetworksIPv6Addr( > + .port = &RouterPort{.router = &Router{.lr = lr, > + .l3dgw_port = None, > + .is_gateway = false}, > + .lrp = lrp, > + .json_name = json_name}, > + .addr = addr)) > +{ > + var __match = "ip6 && ip6.dst == ${addr.addr} && !ip.later_frag && tcp" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 80, > + .__match = __match, > + .actions = "tcp_reset {" > + "eth.dst <-> eth.src; " > + "ip6.dst <-> ip6.src; " > + "next; };", > + .external_ids = stage_hint(lrp._uuid)); > + > + var __match = "ip6 && ip6.dst == ${addr.addr} && !ip.later_frag && udp" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 80, > + .__match = __match, > + .actions = "icmp6 {" > + "eth.dst <-> eth.src; " > + "ip6.dst <-> ip6.src; " > + "ip.ttl = 255; " > + "icmp6.type = 1; " > + "icmp6.code = 4; " > + "next; };", > + .external_ids = stage_hint(lrp._uuid)); > + > + var __match = "ip6 && ip6.dst == ${addr.addr} && !ip.later_frag" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 70, > + .__match = __match, > + .actions = "icmp6 {" > + "eth.dst <-> eth.src; " > + "ip6.dst <-> ip6.src; " > + "ip.ttl = 255; " > + "icmp6.type = 1; " > + "icmp6.code = 3; " > + "next; };", > + .external_ids = stage_hint(lrp._uuid)) > +} > + > +/* ICMPv6 time exceeded */ > +for (RouterPortNetworksIPv6Addr(.port = &RouterPort{.router = &router, > + .lrp = lrp, > + .json_name = json_name}, > + .addr = addr) > + /* skip 
link-local address */ > + if (not ipv6_netaddr_is_lla(addr))) > +{ > + var __match = "inport == ${json_name} && ip6 && " > + "ip6.src == ${ipv6_netaddr_match_network(addr)} && " > + "ip.ttl == {0, 1} && !ip.later_frag" in > + var actions = "icmp6 {" > + "eth.dst <-> eth.src; " > + "ip6.dst = ip6.src; " > + "ip6.src = ${addr.addr}; " > + "ip.ttl = 255; " > + "icmp6.type = 3; /* Time exceeded */ " > + "icmp6.code = 0; /* TTL exceeded in transit */ " > + "next; };" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 40, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lrp._uuid)) > +} > + > +/* NAT, Defrag and load balancing. */ > + > +function default_allow_flow(datapath: uuid, stage: Stage): Flow { > + Flow{.logical_datapath = datapath, > + .stage = stage, > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()} > +} > +for (&Router(.lr = lr)) { > + /* Packets are allowed by default. */ > + Flow[default_allow_flow(lr._uuid, router_stage(IN, DEFRAG))]; > + Flow[default_allow_flow(lr._uuid, router_stage(IN, UNSNAT))]; > + Flow[default_allow_flow(lr._uuid, router_stage(OUT, SNAT))]; > + Flow[default_allow_flow(lr._uuid, router_stage(IN, DNAT))]; > + Flow[default_allow_flow(lr._uuid, router_stage(OUT, UNDNAT))]; > + Flow[default_allow_flow(lr._uuid, router_stage(OUT, EGR_LOOP))]; > + Flow[default_allow_flow(lr._uuid, router_stage(IN, ECMP_STATEFUL))]; > + > + /* Send the IPv6 NS packets to next table. When ovn-controller > + * generates IPv6 NS (for the action - nd_ns{}), the injected > + * packet would go through conntrack - which is not required. 
*/ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, SNAT), > + .priority = 120, > + .__match = "nd_ns", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +function lrouter_nat_is_stateless(nat: NAT): bool = { > + Some{"true"} == map_get(nat.nat.options, "stateless") > +} > + > +/* Handles the match criteria and actions in logical flow > + * based on external ip based NAT rule filter. > + * > + * For ALLOWED_EXT_IPs, we will add an additional match criteria > + * of comparing ip*.src/dst with the allowed external ip address set. > + * > + * For EXEMPTED_EXT_IPs, we will have an additional logical flow > + * where we compare ip*.src/dst with the exempted external ip address set > + * and action says "next" instead of ct*. > + */ > +function lrouter_nat_add_ext_ip_match( > + router: Ref<Router>, > + nat: NAT, > + __match: string, > + ipX: string, > + is_src: bool, > + mask: v46_ip): (string, Option<Flow>) > +{ > + var dir = if (is_src) "src" else "dst"; > + match (nat.exceptional_ext_ips) { > + None -> ("", None), > + Some{AllowedExtIps{__as}} -> (" && ${ipX}.${dir} == $${__as.name}", None), > + Some{ExemptedExtIps{__as}} -> { > + /* Priority of logical flows corresponding to exempted_ext_ips is > + * +1 of the corresponding regular NAT rule. > + * For example, if we have following NAT rule and we associate > + * exempted external ips to it: > + * "ovn-nbctl lr-nat-add router dnat_and_snat 10.15.24.139 50.0.0.11" > + * > + * And now we associate exempted external ip address set to it.
> + * Now corresponding to above rule we will have following logical > + * flows: > + * lr_out_snat...priority=162, match=(..ip4.dst == $exempt_range), > + * action=(next;) > + * lr_out_snat...priority=161, match=(..), action=(ct_snat(....);) > + * > + */ > + var priority = match (is_src) { > + true -> { > + /* S_ROUTER_IN_DNAT uses priority 100 */ > + 100 + 1 > + }, > + false -> { > + /* S_ROUTER_OUT_SNAT uses priority (mask + 1 + 128 + 1) */ > + var is_gw_router = router.l3dgw_port.is_none(); > + var mask_1bits = ip46_count_cidr_bits(mask).unwrap_or(8'd0) as integer; > + mask_1bits + 2 + { if (not is_gw_router) 128 else 0 } > + } > + }; > + > + ("", > + Some{Flow{.logical_datapath = router.lr._uuid, > + .stage = if (is_src) { router_stage(IN, DNAT) } else { router_stage(OUT, SNAT) }, > + .priority = priority, > + .__match = "${__match} && ${ipX}.${dir} == $${__as.name}", > + .actions = "next;", > + .external_ids = stage_hint(nat.nat._uuid)}}) > + } > + } > +} > + > +relation LogicalRouterForceSnatFlows( > + logical_router: uuid, > + ips: Set<v46_ip>, > + context: string) > +Flow(.logical_datapath = logical_router, > + .stage = router_stage(IN, UNSNAT), > + .priority = 110, > + .__match = "${ipX} && ${ipX}.dst == ${ip}", > + .actions = "ct_snat;", > + .external_ids = map_empty()), > +/* Higher priority rules to force SNAT with the IP addresses > + * configured in the Gateway router. This only takes effect > + * when the packet has already been DNATed or load balanced once. */ > +Flow(.logical_datapath = logical_router, > + .stage = router_stage(OUT, SNAT), > + .priority = 100, > + .__match = "flags.force_snat_for_${context} == 1 && ${ipX}", > + .actions = "ct_snat(${ip});", > + .external_ids = map_empty()) :- > + LogicalRouterForceSnatFlows(.logical_router = logical_router, > + .ips = ips, > + .context = context), > + var ip = FlatMap(ips), > + var ipX = ip46_ipX(ip).
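(Aside, not part of the patch: the exempted-ext-IP priority arithmetic in `lrouter_nat_add_ext_ip_match` above is easy to misread, so here is a rough Python sketch of it. The function name and parameters are made up for illustration; the rule is simply that the exempt flow sits one priority level above the regular NAT flow it shadows.)

```python
# Sketch of the priority computed for Some{ExemptedExtIps{...}} above
# (illustration only; names here are invented, not from the patch).
def exempt_flow_priority(is_src, mask_1bits, is_gw_router):
    if is_src:
        # lr_in_dnat: NAT flows use priority 100; exempt flow is one higher.
        return 100 + 1
    # lr_out_snat: NAT flows use (mask bits + 1), plus 128 on a distributed
    # router; the exempt flow again sits one higher.
    return mask_1bits + 2 + (0 if is_gw_router else 128)
```

For the /32 dnat_and_snat example in the comment on a distributed router this gives 32 + 2 + 128 = 162, one above the regular SNAT flow at 161.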
> + > +/* NAT rules are only valid on Gateway routers and routers with > + * l3dgw_port (router has a port with "redirect-chassis" > + * specified). */ > +for (r in &Router(.lr = lr, > + .l3dgw_port = l3dgw_port, > + .redirect_port_name = redirect_port_name, > + .is_gateway = is_gateway) > + if is_some(l3dgw_port) or is_gateway) > +{ > + for (LogicalRouterNAT(.lr = lr._uuid, .nat = nat)) { > + var ipX = ip46_ipX(nat.external_ip) in > + var xx = ip46_xxreg(nat.external_ip) in > + /* Check the validity of nat->logical_ip. 'logical_ip' can > + * be a subnet when the type is "snat". */ > + Some{(_, var mask)} = ip46_parse_masked(nat.nat.logical_ip) in > + true == match ((ip46_is_all_ones(mask), nat.nat.__type)) { > + (_, "snat") -> true, > + (false, _) -> { > + warn("bad ip ${nat.nat.logical_ip} for dnat in router ${uuid2str(lr._uuid)}"); > + false > + }, > + _ -> true > + } in > + /* For distributed router NAT, determine whether this NAT rule > + * satisfies the conditions for distributed NAT processing. */ > + var mac = match ((is_some(l3dgw_port) and nat.nat.__type == "dnat_and_snat", > + nat.nat.logical_port, nat.external_mac)) { > + (true, Some{_}, Some{mac}) -> Some{mac}, > + _ -> None > + } in > + var stateless = (lrouter_nat_is_stateless(nat) > + and nat.nat.__type == "dnat_and_snat") in > + { > + /* Ingress UNSNAT table: It is for already established connections' > + * reverse traffic. i.e., SNAT has already been done in egress > + * pipeline and now the packet has entered the ingress pipeline as > + * part of a reply. We undo the SNAT here. > + * > + * Undoing SNAT has to happen before DNAT processing. This is > + * because when the packet was DNATed in ingress pipeline, it did > + * not know about the possibility of eventual additional SNAT in > + * egress pipeline. */ > + if (nat.nat.__type == "snat" or nat.nat.__type == "dnat_and_snat") { > + if (l3dgw_port == None) { > + /* Gateway router. 
*/ > + var actions = if (stateless) { > + "${ipX}.dst=${nat.nat.logical_ip}; next;" > + } else { > + "ct_snat;" > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, UNSNAT), > + .priority = 90, > + .__match = "ip && ${ipX}.dst == ${nat.nat.external_ip}", > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + }; > + Some{var gwport} = l3dgw_port in { > + /* Distributed router. */ > + > + /* Traffic received on l3dgw_port is subject to NAT. */ > + var __match = > + "ip && ${ipX}.dst == ${nat.nat.external_ip}" > + " && inport == ${json_string_escape(gwport.name)}" ++ > + if (mac == None) { > + /* Flows for NAT rules that are centralized are only > + * programmed on the "redirect-chassis". */ > + " && is_chassis_resident(${redirect_port_name})" > + } else { "" } in > + var actions = if (stateless) { > + "${ipX}.dst=${nat.nat.logical_ip}; next;" > + } else { > + "ct_snat;" > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, UNSNAT), > + .priority = 100, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + } > + }; > + > + /* Ingress DNAT table: Packets enter the pipeline with destination > + * IP address that needs to be DNATted from an external IP address > + * to a logical IP address. */ > + var ip_and_ports = "${nat.nat.logical_ip}" ++ > + if (nat.nat.external_port_range != "") { > + " ${nat.nat.external_port_range}" > + } else { > + "" > + } in > + if (nat.nat.__type == "dnat" or nat.nat.__type == "dnat_and_snat") { > + None = l3dgw_port in > + var __match = "ip && ip4.dst == ${nat.nat.external_ip}" in > + (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match( > + r, nat, __match, ipX, true, mask) in > + { > + /* Gateway router. */ > + /* Packet when it goes from the initiator to destination. > + * We need to set flags.loopback because the router can > + * send the packet back through the same interface.
*/ > + Some{var f} = ext_flow in Flow[f]; > + > + var flag_action = > + if (has_force_snat_ip(lr, "dnat")) { > + /* Indicate to the future tables that a DNAT has taken > + * place and a force SNAT needs to be done in the > + * Egress SNAT table. */ > + "flags.force_snat_for_dnat = 1; " > + } else { "" } in > + var nat_actions = if (stateless) { > + "${ipX}.dst=${nat.nat.logical_ip}; next;" > + } else { > + "flags.loopback = 1; " > + "ct_dnat(${ip_and_ports});" > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, DNAT), > + .priority = 100, > + .__match = __match ++ ext_ip_match, > + .actions = flag_action ++ nat_actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + }; > + > + Some{var gwport} = l3dgw_port in > + var __match = > + "ip && ${ipX}.dst == ${nat.nat.external_ip}" > + " && inport == ${json_string_escape(gwport.name)}" ++ > + if (mac == None) { > + /* Flows for NAT rules that are centralized are only > + * programmed on the "redirect-chassis". */ > + " && is_chassis_resident(${redirect_port_name})" > + } else { "" } in > + (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match( > + r, nat, __match, ipX, true, mask) in > + { > + /* Distributed router. */ > + /* Traffic received on l3dgw_port is subject to NAT. */ > + Some{var f} = ext_flow in Flow[f]; > + > + var actions = if (stateless) { > + "${ipX}.dst=${nat.nat.logical_ip}; next;" > + } else { > + "ct_dnat(${ip_and_ports});" > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, DNAT), > + .priority = 100, > + .__match = __match ++ ext_ip_match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + } > + }; > + > + /* ARP resolve for NAT IPs. 
*/ > + Some{var gwport} = l3dgw_port in { > + var gwport_name = json_string_escape(gwport.name) in { > + if (nat.nat.__type == "snat") { > + var __match = "inport == ${gwport_name} && " > + "${ipX}.src == ${nat.nat.external_ip}" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, IP_INPUT), > + .priority = 120, > + .__match = __match, > + .actions = "next;", > + .external_ids = stage_hint(nat.nat._uuid)) > + }; > + > + var nexthop_reg = "${xx}${rEG_NEXT_HOP()}" in > + var __match = "outport == ${gwport_name} && " > + "${nexthop_reg} == ${nat.nat.external_ip}" in > + var dst_mac = match (mac) { > + Some{value} -> "${value}", > + None -> gwport.mac > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = __match, > + .actions = "eth.dst = ${dst_mac}; next;", > + .external_ids = stage_hint(nat.nat._uuid)) > + } > + }; > + > + /* Egress UNDNAT table: It is for already established connections' > + * reverse traffic. i.e., DNAT has already been done in ingress > + * pipeline and now the packet has entered the egress pipeline as > + * part of a reply. We undo the DNAT here. > + * > + * Note that this only applies for NAT on a distributed router. > + * Undo DNAT on a gateway router is done in the ingress DNAT > + * pipeline stage. */ > + if ((nat.nat.__type == "dnat" or nat.nat.__type == "dnat_and_snat")) { > + Some{var gwport} = l3dgw_port in > + var __match = > + "ip && ${ipX}.src == ${nat.nat.logical_ip}" > + " && outport == ${json_string_escape(gwport.name)}" ++ > + if (mac == None) { > + /* Flows for NAT rules that are centralized are only > + * programmed on the "redirect-chassis". 
*/ > + " && is_chassis_resident(${redirect_port_name})" > + } else { "" } in > + var actions = > + match (mac) { > + Some{mac_addr} -> "eth.src = ${mac_addr}; ", > + None -> "" > + } ++ > + if (stateless) { > + "${ipX}.src=${nat.nat.external_ip}; next;" > + } else { > + "ct_dnat;" > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, UNDNAT), > + .priority = 100, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + }; > + > + /* Egress SNAT table: Packets enter the egress pipeline with > + * source ip address that needs to be SNATted to an external ip > + * address. */ > + var ip_and_ports = "${nat.nat.external_ip}" ++ > + if (nat.nat.external_port_range != "") { > + " ${nat.nat.external_port_range}" > + } else { > + "" > + } in > + if (nat.nat.__type == "snat" or nat.nat.__type == "dnat_and_snat") { > + None = l3dgw_port in > + var __match = "ip && ${ipX}.src == ${nat.nat.logical_ip}" in > + (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match( > + r, nat, __match, ipX, false, mask) in > + { > + /* Gateway router. */ > + Some{var f} = ext_flow in Flow[f]; > + > + /* The priority here is calculated such that the > + * nat->logical_ip with the longest mask gets a higher > + * priority. */ > + var actions = if (stateless) { > + "${ipX}.src=${nat.nat.external_ip}; next;" > + } else { > + "ct_snat(${ip_and_ports});" > + } in > + Some{var plen} = ip46_count_cidr_bits(mask) in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, SNAT), > + .priority = plen as bit<64> + 1, > + .__match = __match ++ ext_ip_match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + }; > + > + Some{var gwport} = l3dgw_port in > + var __match = > + "ip && ${ipX}.src == ${nat.nat.logical_ip}" > + " && outport == ${json_string_escape(gwport.name)}" ++ > + if (mac == None) { > + /* Flows for NAT rules that are centralized are only > + * programmed on the "redirect-chassis".
*/ > + " && is_chassis_resident(${redirect_port_name})" > + } else { "" } in > + (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match( > + r, nat, __match, ipX, false, mask) in > + { > + /* Distributed router. */ > + Some{var f} = ext_flow in Flow[f]; > + > + var actions = > + match (mac) { > + Some{mac_addr} -> "eth.src = ${mac_addr}; ", > + _ -> "" > + } ++ if (stateless) { > + "${ipX}.src=${nat.nat.external_ip}; next;" > + } else { > + "ct_snat(${ip_and_ports});" > + } in > + /* The priority here is calculated such that the > + * nat->logical_ip with the longest mask gets a higher > + * priority. */ > + Some{var plen} = ip46_count_cidr_bits(mask) in > + var priority = (plen as bit<64>) + 1 in > + var centralized_boost = if (mac == None) 128 else 0 in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, SNAT), > + .priority = priority + centralized_boost, > + .__match = __match ++ ext_ip_match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + } > + }; > + > + /* Logical router ingress table ADMISSION: > + * For NAT on a distributed router, add rules allowing > + * ingress traffic with eth.dst matching nat->external_mac > + * on the l3dgw_port instance where nat->logical_port is > + * resident. */ > + Some{var mac_addr} = mac in > + Some{var gwport} = l3dgw_port in > + Some{var logical_port} = nat.nat.logical_port in > + var __match = > + "eth.dst == ${mac_addr} && inport == ${json_string_escape(gwport.name)}" > + " && is_chassis_resident(${json_string_escape(logical_port)})" in > + /* Store the ethernet address of the port receiving the packet. > + * This will save us from having to match on inport further > + * down in the pipeline. 
> + */ > + var actions = "${rEG_INPORT_ETH_ADDR()} = ${gwport.mac}; next;" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ADMISSION), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)); > + > + /* Ingress Gateway Redirect Table: For NAT on a distributed > + * router, add flows that are specific to a NAT rule. These > + * flows indicate the presence of an applicable NAT rule that > + * can be applied in a distributed manner. > + * In particular, the IP src register and eth.src are set to NAT external IP and > + * NAT external mac so the ARP request generated in the following > + * stage is sent out with proper IP/MAC src addresses > + */ > + Some{var mac_addr} = mac in > + Some{var gwport} = l3dgw_port in > + Some{var logical_port} = nat.nat.logical_port in > + Some{var external_mac} = nat.nat.external_mac in > + var __match = > + "${ipX}.src == ${nat.nat.logical_ip} && " > + "outport == ${json_string_escape(gwport.name)} && " > + "is_chassis_resident(${json_string_escape(logical_port)})" in > + var actions = > + "eth.src = ${external_mac}; " > + "${xx}${rEG_SRC()} = ${nat.nat.external_ip}; " > + "next;" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, GW_REDIRECT), > + .priority = 100, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)); > + > + /* Egress Loopback table: For NAT on a distributed router. > + * If packets in the egress pipeline on the distributed > + * gateway port have ip.dst matching a NAT external IP, then > + * loop a clone of the packet back to the beginning of the > + * ingress pipeline with inport = outport. */ > + Some{var gwport} = l3dgw_port in > + /* Distributed router.
*/ > + Some{var port} = match (mac) { > + Some{_} -> match (nat.nat.logical_port) { > + Some{name} -> Some{json_string_escape(name)}, > + None -> None: Option<string> > + }, > + None -> Some{redirect_port_name} > + } in > + var __match = "${ipX}.dst == ${nat.nat.external_ip} && outport == ${json_string_escape(gwport.name)} && is_chassis_resident(${port})" in > + var regs = { > + var regs = vec_empty(); > + for (j in range_vec(0, mFF_N_LOG_REGS(), 1)) { > + vec_push(regs, "reg${j} = 0; ") > + }; > + regs > + } in > + var actions = > + "clone { ct_clear; " > + "inport = outport; outport = \"\"; " > + "flags = 0; flags.loopback = 1; " ++ > + string_join(regs, "") ++ > + "${rEGBIT_EGRESS_LOOPBACK()} = 1; " > + "next(pipeline=ingress, table=0); };" in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, EGR_LOOP), > + .priority = 100, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(nat.nat._uuid)) > + } > + }; > + > + /* Handle force SNAT options set in the gateway router. */ > + if (l3dgw_port == None) { > + var dnat_force_snat_ips = get_force_snat_ip(lr, "dnat") in > + if (not dnat_force_snat_ips.is_empty()) > + LogicalRouterForceSnatFlows(.logical_router = lr._uuid, > + .ips = dnat_force_snat_ips, > + .context = "dnat"); > + > + var lb_force_snat_ips = get_force_snat_ip(lr, "lb") in > + if (not lb_force_snat_ips.is_empty()) > + LogicalRouterForceSnatFlows(.logical_router = lr._uuid, > + .ips = lb_force_snat_ips, > + .context = "lb"); > + > + /* For gateway router, re-circulate every packet through > + * the DNAT zone. This helps with the following. > + * > + * Any packet that needs to be unDNATed in the reverse > + * direction gets unDNATed. Ideally this could be done in > + * the egress pipeline. But since the gateway router > + * does not have any feature that depends on the source > + * ip address being external IP address for IP routing, > + * we can do it here, saving a future re-circulation.
*/ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, DNAT), > + .priority = 50, > + .__match = "ip", > + .actions = "flags.loopback = 1; ct_dnat;", > + .external_ids = map_empty()) > + } > +} > + > +function nats_contain_vip(nats: Vec<NAT>, vip: v46_ip): bool { > + for (nat in nats) { > + if (nat.external_ip == vip) { > + return true > + } > + }; > + return false > +} > + > +/* Load balancing and packet defrag are only valid on > + * Gateway routers or router with gateway port. */ > +for (RouterLBVIP( > + .router = &Router{.lr = lr, > + .l3dgw_port = l3dgw_port, > + .redirect_port_name = redirect_port_name, > + .is_gateway = is_gateway, > + .nats = nats}, > + .lb = &lb, > + .vip = vip, > + .backends = backends) > + if is_some(l3dgw_port) or is_gateway) > +{ > + if (backends == "") { > + for (ControllerEventEn(true)) { > + for (HasEventElbMeter(has_elb_meter)) { > + Some {(var __match, var __action)} = > + build_empty_lb_event_flow(vip, lb, has_elb_meter) in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, DNAT), > + .priority = 130, > + .__match = __match, > + .actions = __action, > + .external_ids = stage_hint(lb._uuid)) > + } > + } > + }; > + > + /* A set to hold all ips that need defragmentation and tracking. */ > + > + /* vip contains IP:port or just IP. */ > + Some{(var ip_address, var port)} = ip_address_and_port_from_lb_key(vip) in > + var ipX = ip46_ipX(ip_address) in > + var proto = match (lb.protocol) { > + Some{proto} -> proto, > + _ -> "tcp" > + } in { > + /* If there are any load balancing rules, we should send > + * the packet to conntrack for defragmentation and > + * tracking. This helps with two things. > + * > + * 1. With tracking, we can send only new connections to > + * pick a DNAT ip address from a group. > + * 2. If there are L4 ports in load balancing rules, we > + * need the defragmentation to match on L4 ports. 
*/ > + var __match = "ip && ${ipX}.dst == ${ip_address}" in > + /* One of these flows must be created for each unique LB VIP address. > + * We create one for each VIP:port pair; flows with the same IP and > + * different port numbers will produce identical flows that will > + * get merged by DDlog. */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, DEFRAG), > + .priority = 100, > + .__match = __match, > + .actions = "ct_next;", > + .external_ids = stage_hint(lb._uuid)); > + > + /* Higher priority rules are added for load-balancing in DNAT > + * table. For every match (on a VIP[:port]), we add two flows > + * via add_router_lb_flow(). One flow is for specific matching > + * on ct.new with an action of "ct_lb($targets);". The other > + * flow is for ct.est with an action of "ct_dnat;". */ > + var match1 = "ip && ${ipX}.dst == ${ip_address}" in > + (var prio, var match2) = > + if (port != 0) { > + (120, " && ${proto} && ${proto}.dst == ${port}") > + } else { > + (110, "") > + } in > + var __match = match1 ++ match2 ++ > + match (l3dgw_port) { > + Some{gwport} -> " && is_chassis_resident(${redirect_port_name})", > + _ -> "" > + } in > + var has_force_snat_ip = has_force_snat_ip(lr, "lb") in > + { > + /* A match and actions for established connections. */ > + var est_match = "ct.est && " ++ __match in > + var actions = > + match (has_force_snat_ip) { > + true -> "flags.force_snat_for_lb = 1; ct_dnat;", > + false -> "ct_dnat;" > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, DNAT), > + .priority = prio, > + .__match = est_match, > + .actions = actions, > + .external_ids = stage_hint(lb._uuid)); > + > + if (nats_contain_vip(nats, ip_address)) { > + /* The load balancer vip is also present in the NAT entries. > + * So add a high priority lflow to advance the packet > + * destined to the vip (and the vip port if defined) > + * in the S_ROUTER_IN_UNSNAT stage. > + * There seems to be an issue with ovs-vswitchd.
When the new > + * connection packet destined for the lb vip is received, > + * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat > + * conntrack zone. For the next packet, if it goes through > + * unsnat stage, the conntrack flags are not set properly, and > + * it doesn't hit the established state flows in > + * S_ROUTER_IN_DNAT stage. */ > + var match3 = "${ipX} && ${ipX}.dst == ${ip_address} && ${proto}" ++ > + if (port != 0) { " && ${proto}.dst == ${port}" } > + else { "" } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, UNSNAT), > + .priority = 120, > + .__match = match3, > + .actions = "next;", > + .external_ids = stage_hint(lb._uuid)) > + }; > + > + Some{var gwport} = l3dgw_port in > + /* Add logical flows to UNDNAT the load balanced reverse traffic in > + * the router egress pipeline stage - S_ROUTER_OUT_UNDNAT if the logical > + * router has a gateway router port associated. > + */ > + var conds = { > + var conds = vec_empty(); > + for (ip_str in string_split(backends, ",")) { > + match (ip_address_and_port_from_lb_key(ip_str)) { > + None -> () /* FIXME: put a break here */, > + Some{(ip_address_, port_)} -> vec_push(conds, > + "(${ipX}.src == ${ip_address_}" ++ > + if (port_ != 0) { > + " && ${proto}.src == ${port_})" > + } else { > + ")" > + }) > + } > + }; > + conds > + } in > + not vec_is_empty(conds) in > + var undnat_match = > + "${ip46_ipX(ip_address)} && (" ++ string_join(conds, " || ") ++ > + ") && outport == ${json_string_escape(gwport.name)} && " > + "is_chassis_resident(${redirect_port_name})" in > + var action = > + match (has_force_snat_ip) { > + true -> "flags.force_snat_for_lb = 1; ct_dnat;", > + false -> "ct_dnat;" > + } in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, UNDNAT), > + .priority = 120, > + .__match = undnat_match, > + .actions = action, > + .external_ids = stage_hint(lb._uuid)) > + } > + } > +} > + > +/* Higher priority rules are added for load-balancing in DNAT > + * 
table. For every match (on a VIP[:port]), we add two flows > + * via add_router_lb_flow(). One flow is for specific matching > + * on ct.new with an action of "ct_lb($targets);". The other > + * flow is for ct.est with an action of "ct_dnat;". */ > +Flow(.logical_datapath = r.lr._uuid, > + .stage = router_stage(IN, DNAT), > + .priority = priority, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lb._uuid)) :- > + r in &Router(), > + is_some(r.l3dgw_port) or r.is_gateway, > + LBVIPBackend[lbvipbackend], > + Some{var svc_monitor} = lbvipbackend.svc_monitor, > + var lbvip = lbvipbackend.lbvip, > + var lb = lbvip.lb, > + set_contains(r.lr.load_balancer, lb._uuid), > + bs in &LBVIPBackendStatus(.port = lbvipbackend.port, > + .ip = lbvipbackend.ip, > + .protocol = default_protocol(lb.protocol), > + .logical_port = svc_monitor.port_name), > + var bses = bs.group_by((r, lbvip, lb)).to_set(), > + var __match > + = "ct.new && " ++ > + get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, true) ++ > + match (r.l3dgw_port) { > + Some{gwport} -> " && is_chassis_resident(${r.redirect_port_name})", > + _ -> "" > + }, > + var priority = if (lbvip.vip_port != 0) 120 else 110, > + var up_backends = { > + var up_backends = set_empty(); > + for (bs in bses) { > + if (bs.up) { > + set_insert(up_backends, "${bs.ip}:${bs.port}") > + } > + }; > + up_backends > + }, > + var actions = if (set_is_empty(up_backends)) { > + "drop;" > + } else { > + match (has_force_snat_ip(r.lr, "lb")) { > + true -> "flags.force_snat_for_lb = 1; ", > + false -> "" > + } ++ ct_lb(string_join(set_to_vec(up_backends), ","), lb.selection_fields, > + lb.protocol) > + }. 
> +Flow(.logical_datapath = r.lr._uuid, > + .stage = router_stage(IN, DNAT), > + .priority = priority, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lb._uuid)) :- > + r in &Router(), > + is_some(r.l3dgw_port) or r.is_gateway, > + LBVIPBackend[lbvipbackend], > + None = lbvipbackend.svc_monitor, > + var lbvip = lbvipbackend.lbvip, > + var lb = lbvip.lb, > + set_contains(r.lr.load_balancer, lb._uuid), > + var __match > + = "ct.new && " ++ > + get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, true) ++ > + match (r.l3dgw_port) { > + Some{gwport} -> " && is_chassis_resident(${r.redirect_port_name})", > + _ -> "" > + }, > + var priority = if (lbvip.vip_port != 0) 120 else 110, > + var actions = ct_lb(lbvip.backend_ips, lb.selection_fields, lb.protocol). > + > + > +/* Defaults based on MaxRtrInterval and MinRtrInterval from RFC 4861 section > + * 6.2.1 > + */ > +function nD_RA_MAX_INTERVAL_DEFAULT(): integer = 600 > + > +function nd_ra_min_interval_default(max: integer): integer = > +{ > + if (max >= 9) { max / 3 } else { max * 3 / 4 } > +} > + > +function nD_RA_MAX_INTERVAL_MAX(): integer = 1800 > +function nD_RA_MAX_INTERVAL_MIN(): integer = 4 > + > +function nD_RA_MIN_INTERVAL_MAX(max: integer): integer = ((max * 3) / 4) > +function nD_RA_MIN_INTERVAL_MIN(): integer = 3 > + > +function nD_MTU_DEFAULT(): integer = 0 > + > +function copy_ra_to_sb(port: RouterPort, address_mode: string): Map<string, string> = > +{ > + var options = port.sb_options; > + > + map_insert(options, "ipv6_ra_send_periodic", "true"); > + map_insert(options, "ipv6_ra_address_mode", address_mode); > + > + var max_interval = map_get_int_def(port.lrp.ipv6_ra_configs, "max_interval", > + nD_RA_MAX_INTERVAL_DEFAULT()); > + > + if (max_interval > nD_RA_MAX_INTERVAL_MAX()) { > + max_interval = nD_RA_MAX_INTERVAL_MAX() > + } else (); > + > + if (max_interval < nD_RA_MAX_INTERVAL_MIN()) { > + max_interval = nD_RA_MAX_INTERVAL_MIN() > + } else (); > + > + 
map_insert(options, "ipv6_ra_max_interval", "${max_interval}"); > + > + var min_interval = map_get_int_def(port.lrp.ipv6_ra_configs, > + "min_interval", nd_ra_min_interval_default(max_interval)); > + > + if (min_interval > nD_RA_MIN_INTERVAL_MAX(max_interval)) { > + min_interval = nD_RA_MIN_INTERVAL_MAX(max_interval) > + } else (); > + > + if (min_interval < nD_RA_MIN_INTERVAL_MIN()) { > + min_interval = nD_RA_MIN_INTERVAL_MIN() > + } else (); > + > + map_insert(options, "ipv6_ra_min_interval", "${min_interval}"); > + > + var mtu = map_get_int_def(port.lrp.ipv6_ra_configs, "mtu", nD_MTU_DEFAULT()); > + > + /* RFC 2460 requires the MTU for IPv6 to be at least 1280 */ > + if (mtu != 0 and mtu >= 1280) { > + map_insert(options, "ipv6_ra_mtu", "${mtu}") > + } else (); > + > + var prefixes = vec_empty(); > + for (addrs in port.networks.ipv6_addrs) { > + if (ipv6_netaddr_is_lla(addrs)) { > + map_insert(options, "ipv6_ra_src_addr", "${addrs.addr}") > + } else { > + vec_push(prefixes, ipv6_netaddr_match_network(addrs)) > + } > + }; > + match (map_get(port.sb_options, "ipv6_ra_pd_list")) { > + Some{value} -> vec_push(prefixes, value), > + _ -> () > + }; > + map_insert(options, "ipv6_ra_prefixes", string_join(prefixes, " ")); > + > + match (map_get(port.lrp.ipv6_ra_configs, "rdnss")) { > + Some{value} -> map_insert(options, "ipv6_ra_rdnss", value), > + _ -> () > + }; > + > + match (map_get(port.lrp.ipv6_ra_configs, "dnssl")) { > + Some{value} -> map_insert(options, "ipv6_ra_dnssl", value), > + _ -> () > + }; > + > + map_insert(options, "ipv6_ra_src_eth", "${port.networks.ea}"); > + > + var prf = match (map_get(port.lrp.ipv6_ra_configs, "router_preference")) { > + Some{prf} -> if (prf == "HIGH" or prf == "LOW") prf else "MEDIUM", > + _ -> "MEDIUM" > + }; > + map_insert(options, "ipv6_ra_prf", prf); > + > + match (map_get(port.lrp.ipv6_ra_configs, "route_info")) { > + Some{s} -> map_insert(options, "ipv6_ra_route_info", s), > + _ -> () > + }; > + > + options > +} > + > +/* 
Logical router ingress table ND_RA_OPTIONS and ND_RA_RESPONSE: IPv6 Router > + * Adv (RA) options and response. */ > +// FIXME: do these rules apply to derived ports? > +for (&RouterPort[port@RouterPort{.lrp = lrp@nb::Logical_Router_Port{.peer = None}, > + .router = &router, > + .json_name = json_name, > + .networks = networks, > + .peer = PeerSwitch{}}] > + if (not vec_is_empty(networks.ipv6_addrs))) > +{ > + Some{var address_mode} = map_get(lrp.ipv6_ra_configs, "address_mode") in > + /* FIXME: we need a nicer way to write this */ > + true == > + if ((address_mode != "slaac") and > + (address_mode != "dhcpv6_stateful") and > + (address_mode != "dhcpv6_stateless")) { > + warn("Invalid address mode [${address_mode}] defined"); > + false > + } else { true } in > + { > + if (map_get_bool_def(lrp.ipv6_ra_configs, "send_periodic", false)) { > + RouterPortRAOptions(lrp._uuid, copy_ra_to_sb(port, address_mode)) > + }; > + > + (true, var prefix) = > + { > + var add_rs_response_flow = false; > + var prefix = ""; > + for (addr in networks.ipv6_addrs) { > + if (not ipv6_netaddr_is_lla(addr)) { > + prefix = prefix ++ ", prefix = ${ipv6_netaddr_match_network(addr)}"; > + add_rs_response_flow = true > + } else () > + }; > + (add_rs_response_flow, prefix) > + } in > + { > + var __match = "inport == ${json_name} && ip6.dst == ff02::2 && nd_rs" in > + /* As per RFC 2460, 1280 is minimum IPv6 MTU.
*/ > + var mtu = match(map_get(lrp.ipv6_ra_configs, "mtu")) { > + Some{mtu_s} -> { > + match (str_to_int(mtu_s, 10)) { > + None -> 0, > + Some{mtu} -> if (mtu >= 1280) mtu else 0 > + } > + }, > + None -> 0 > + } in > + var actions0 = > + "${rEGBIT_ND_RA_OPTS_RESULT()} = put_nd_ra_opts(" > + "addr_mode = ${json_string_escape(address_mode)}, " > + "slla = ${networks.ea}" ++ > + if (mtu > 0) { ", mtu = ${mtu}" } else { "" } in > + var router_preference = match (map_get(lrp.ipv6_ra_configs, "router_preference")) { > + Some{"MEDIUM"} -> "", > + None -> "", > + Some{prf} -> ", router_preference = \"${prf}\"" > + } in > + var actions = actions0 ++ router_preference ++ prefix ++ "); next;" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ND_RA_OPTIONS), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lrp._uuid)); > + > + var __match = "inport == ${json_name} && ip6.dst == ff02::2 && " > + "nd_ra && ${rEGBIT_ND_RA_OPTS_RESULT()}" in > + var ip6_str = ipv6_string_mapped(in6_generate_lla(networks.ea)) in > + var actions = "eth.dst = eth.src; eth.src = ${networks.ea}; " > + "ip6.dst = ip6.src; ip6.src = ${ip6_str}; " > + "outport = inport; flags.loopback = 1; " > + "output;" in > + Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ND_RA_RESPONSE), > + .priority = 50, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(lrp._uuid)) > + } > + } > +} > + > + > +/* Logical router ingress table ND_RA_OPTIONS, ND_RA_RESPONSE: RS responder, by > + * default goto next. 
(priority 0)*/ > +for (&Router(.lr = lr)) > +{ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ND_RA_OPTIONS), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ND_RA_RESPONSE), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +/* Proxy table that stores per-port routes. > + * These routes get converted into logical flows by > + * the following rule. > + */ > +relation Route(key: route_key, // matching criteria > + port: Ref<RouterPort>, // output port > + src_ip: v46_ip, // source IP address for output > + gateway: Option<v46_ip>) // next hop (unless being delivered) > + > +function build_route_match(key: route_key) : (string, bit<32>) = > +{ > + var ipX = ip46_ipX(key.ip_prefix); > + > + (var dir, var priority) = match (key.policy) { > + SrcIp -> ("src", key.plen * 2), > + DstIp -> ("dst", (key.plen * 2) + 1) > + }; > + > + var network = ip46_get_network(key.ip_prefix, key.plen); > + var __match = "${ipX}.${dir} == ${network}/${key.plen}"; > + > + (__match, priority) > +} > +for (Route(.port = port, > + .key = key, > + .src_ip = src_ip, > + .gateway = gateway)) > +{ > + var ipX = ip46_ipX(key.ip_prefix) in > + var xx = ip46_xxreg(key.ip_prefix) in > + /* IPv6 link-local addresses must be scoped to the local router port.
*/ > + var inport_match = match (key.ip_prefix) { > + IPv6{prefix} -> if (in6_is_lla(prefix)) { > + "inport == ${port.json_name} && " > + } else "", > + _ -> "" > + } in > + (var ip_match, var priority) = build_route_match(key) in > + var __match = inport_match ++ ip_match in > + var nexthop = match (gateway) { > + Some{gw} -> "${gw}", > + None -> "${ipX}.dst" > + } in > + var actions = > + "ip.ttl--; " > + "${rEG_ECMP_GROUP_ID()} = 0; " > + "${xx}${rEG_NEXT_HOP()} = ${nexthop}; " > + "${xx}${rEG_SRC()} = ${src_ip}; " > + "eth.src = ${port.networks.ea}; " > + "outport = ${port.json_name}; " > + "flags.loopback = 1; " > + "next;" in > + /* The priority here is calculated to implement longest-prefix-match > + * routing. */ > + Flow(.logical_datapath = port.router.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING), > + .priority = 32'd0 ++ priority, > + .__match = __match, > + .actions = actions, > + .external_ids = stage_hint(port.lrp._uuid)) > +} > + > +/* Logical router ingress table IP_ROUTING & IP_ROUTING_ECMP: IP Routing. > + * > + * A packet that arrives at this table is an IP packet that should be > + * routed to the address in 'ip[46].dst'. > + * > + * For regular routes without ECMP, table IP_ROUTING sets outport to the > + * correct output port, eth.src to the output port's MAC address, and > + * '[xx]${rEG_NEXT_HOP()}' to the next-hop IP address (leaving 'ip[46].dst', the > + * packet’s final destination, unchanged), and advances to the next table. > + * > + * For ECMP routes, i.e. multiple routes with same policy and prefix, table > + * IP_ROUTING remembers ECMP group id and selects a member id, and advances > + * to table IP_ROUTING_ECMP, which sets outport, eth.src, and the appropriate > + * next-hop register for the selected ECMP member. > + * */ > +Route(key, port, src_ip, None) :- > + RouterPortNetworksIPv4Addr(.port = port, .addr = addr), > + var key = RouteKey{DstIp, IPv4{addr.addr}, addr.plen}, > + var src_ip = IPv4{addr.addr}. 
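A side note on the priority arithmetic in build_route_match() above, since it's easy to miss: doubling the prefix length and adding 1 for dst-ip policies makes longest-prefix-match routing fall out of a plain integer comparison, with a dst-ip route winning ties against a src-ip route of the same length. A rough Python model of that arithmetic (purely illustrative; `route_priority` is not a function in this patch):

```python
def route_priority(policy: str, plen: int) -> int:
    """Model of build_route_match()'s priority: longer prefixes win,
    and a dst-ip route outranks a src-ip route of equal length."""
    return plen * 2 + (1 if policy == "dst" else 0)

# A /25 route beats any /24, and dst-ip beats src-ip at equal length.
assert route_priority("src", 25) > route_priority("dst", 24)
assert route_priority("dst", 24) > route_priority("src", 24)
```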
> + > +Route(key, port, src_ip, None) :- > + RouterPortNetworksIPv6Addr(.port = port, .addr = addr), > + var key = RouteKey{DstIp, IPv6{addr.addr}, addr.plen}, > + var src_ip = IPv6{addr.addr}. > + > +Flow(.logical_datapath = r.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING_ECMP), > + .priority = 150, > + .__match = "${rEG_ECMP_GROUP_ID()} == 0", > + .actions = "next;", > + .external_ids = map_empty()) :- > + r in &Router(). > + > +/* Convert the static routes to flows. */ > +Route(key, dst.port, dst.src_ip, Some{dst.nexthop}) :- > + RouterStaticRoute(.router = &router, .key = key, .dsts = dsts), > + set_size(dsts) == 1, > + Some{var dst} = set_nth(dsts, 0). > + > +/* Return a vector of pairs (1, set[0]), ... (n, set[n - 1]). */ > +function numbered_vec(set: Set<'A>) : Vec<(bit<16>, 'A)> = { > + var vec = vec_with_capacity(set_size(set)); > + var i = 1; > + for (x in set) { > + vec_push(vec, (i, x)); > + i = i + 1 > + }; > + vec > +} > + > +relation EcmpGroup( > + group_id: bit<16>, > + router: Ref<Router>, > + key: route_key, > + dsts: Set<route_dst>, > + route_match: string, // This is build_route_match(key).0 > + route_priority: integer) // This is build_route_match(key).1 > + > +EcmpGroup(group_id, router, key, dsts, route_match, route_priority) :- > + r in RouterStaticRoute(.router = router, .key = key, .dsts = dsts), > + set_size(dsts) > 1, > + var groups = (router, key, dsts).group_by(()).to_set(), > + var group_id_and_group = FlatMap(numbered_vec(groups)), > + (var group_id, (var router, var key, var dsts)) = group_id_and_group, > + (var route_match, var route_priority0) = build_route_match(key), > + var route_priority = route_priority0 as integer. 
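For readers less familiar with DDlog: numbered_vec() above just pairs each set element with a 1-based index, so that ECMP members get stable ids to feed into select(). In Python terms it is simply (illustrative only, not part of the patch):

```python
def numbered_vec(items):
    """1-based (index, element) pairs, mirroring numbered_vec() above."""
    return list(enumerate(items, start=1))

numbered_vec(["192.0.2.1", "192.0.2.2"])  # [(1, '192.0.2.1'), (2, '192.0.2.2')]
```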
> + > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING), > + .priority = route_priority, > + .__match = route_match, > + .actions = actions, > + .external_ids = map_empty()) :- > + EcmpGroup(group_id, router, key, dsts, route_match, route_priority), > + var all_member_ids = { > + var member_ids = vec_with_capacity(set_size(dsts)); > + for (i in range_vec(1, set_size(dsts)+1, 1)) { > + vec_push(member_ids, "${i}") > + }; > + string_join(member_ids, ", ") > + }, > + var actions = > + "ip.ttl--; " > + "flags.loopback = 1; " > + "${rEG_ECMP_GROUP_ID()} = ${group_id}; " /* XXX */ > + "${rEG_ECMP_MEMBER_ID()} = select(${all_member_ids});". > + > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING_ECMP), > + .priority = 100, > + .__match = __match, > + .actions = actions, > + .external_ids = map_empty()) :- > + EcmpGroup(group_id, router, key, dsts, _, _), > + var member_id_and_dst = FlatMap(numbered_vec(dsts)), > + (var member_id, var dst) = member_id_and_dst, > + var xx = ip46_xxreg(dst.nexthop), > + var __match = "${rEG_ECMP_GROUP_ID()} == ${group_id} && " > + "${rEG_ECMP_MEMBER_ID()} == ${member_id}", > + var actions = "${xx}${rEG_NEXT_HOP()} = ${dst.nexthop}; " > + "${xx}${rEG_SRC()} = ${dst.src_ip}; " > + "eth.src = ${dst.port.networks.ea}; " > + "outport = ${dst.port.json_name}; " > + "next;". > + > +/* If symmetric ECMP replies are enabled, then packets that arrive over > + * an ECMP route need to go through conntrack. > + */ > +relation EcmpSymmetricReply( > + router: Ref<Router>, > + dst: route_dst, > + route_match: string, > + tunkey: integer) > +EcmpSymmetricReply(router, dst, route_match, tunkey) :- > + EcmpGroup(.router = router, .dsts = dsts, .route_match = route_match), > + router.is_gateway, > + var dst = FlatMap(dsts), > + dst.ecmp_symmetric_reply, > + PortTunKeyAllocation(.port = dst.port.lrp._uuid, .tunkey = tunkey). 
> + > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, DEFRAG), > + .priority = 100, > + .__match = __match, > + .actions = "ct_next;", > + .external_ids = map_empty()) :- > + EcmpSymmetricReply(router, dst, route_match, _), > + var __match = "inport == ${dst.port.json_name} && ${route_match}". > + > +/* And packets that go out over an ECMP route need conntrack. > + XXX this seems to exactly duplicate the above flow? */ > + > +/* Save src eth and inport in ct_label for packets that arrive over > + * an ECMP route. > + */ > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ECMP_STATEFUL), > + .priority = 100, > + .__match = __match, > + .actions = actions, > + .external_ids = map_empty()) :- > + EcmpSymmetricReply(router, dst, route_match, tunkey), > + var __match = "inport == ${dst.port.json_name} && ${route_match} && " > + "(ct.new && !ct.est)", > + var actions = "ct_commit { ct_label.ecmp_reply_eth = eth.src;" > + " ct_label.ecmp_reply_port = ${tunkey};}; next;". > + > +/* Bypass ECMP selection if we already have ct_label information > + * for where to route the packet. > + */ > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING), > + .priority = 100, > + .__match = "${ecmp_reply} && ${route_match}", > + .actions = "ip.ttl--; " > + "flags.loopback = 1; " > + "eth.src = ${dst.port.networks.ea}; " > + "${xx}reg1 = ${dst.src_ip}; " > + "outport = ${dst.port.json_name}; " > + "next;", > + .external_ids = map_empty()), > +/* Egress reply traffic for symmetric ECMP routes skips router policies. 
*/ > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, POLICY), > + .priority = 65535, > + .__match = ecmp_reply, > + .actions = "next;", > + .external_ids = map_empty()), > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 200, > + .__match = ecmp_reply, > + .actions = "eth.dst = ct_label.ecmp_reply_eth; next;", > + .external_ids = map_empty()) :- > + EcmpSymmetricReply(router, dst, route_match, tunkey), > + var ecmp_reply = "ct.rpl && ct_label.ecmp_reply_port == ${tunkey}", > + var xx = ip46_xxreg(dst.nexthop). > + > + > +/* IP Multicast lookup. Here we set the output port, adjust TTL and advance > + * to next table (priority 500). > + */ > +/* Drop IPv6 multicast traffic that shouldn't be forwarded, > + * i.e., router solicitation and router advertisement. > + */ > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING), > + .priority = 550, > + .__match = "nd_rs || nd_ra", > + .actions = "drop;", > + .external_ids = map_empty()) :- > + router in &Router(). > + > +for (IgmpRouterMulticastGroup(address, &rtr, ports)) { > + for (RouterMcastFloodPorts(&rtr, flood_ports) if rtr.mcast_cfg.relay) { > + var flood_static = not set_is_empty(flood_ports) in > + var mc_static = json_string_escape(mC_STATIC().0) in > + var static_act = { > + if (flood_static) { > + "clone { " > + "outport = ${mc_static}; " > + "ip.ttl--; " > + "next; " > + "};" > + } else { > + "" > + } > + } in > + Some{var ip} = ip46_parse(address) in > + var ipX = ip46_ipX(ip) in > + Flow(.logical_datapath = rtr.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING), > + .priority = 500, > + .__match = "${ipX} && ${ipX}.dst == ${address}", > + .actions = > + "${static_act} outport = ${json_string_escape(address)}; " > + "ip.ttl--; next;", > + .external_ids = map_empty()) > + } > +} > + > +/* If needed, flood unregistered multicast on statically configured ports. > + * Priority 450. 
Otherwise drop any multicast traffic. > + */ > +for (RouterMcastFloodPorts(&rtr, flood_ports) if rtr.mcast_cfg.relay) { > + var mc_static = json_string_escape(mC_STATIC().0) in > + var flood_static = not set_is_empty(flood_ports) in > + var actions = if (flood_static) { > + "clone { " > + "outport = ${mc_static}; " > + "ip.ttl--; " > + "next; " > + "};" > + } else { > + "drop;" > + } in > + Flow(.logical_datapath = rtr.lr._uuid, > + .stage = router_stage(IN, IP_ROUTING), > + .priority = 450, > + .__match = "ip4.mcast || ip6.mcast", > + .actions = actions, > + .external_ids = map_empty()) > +} > + > +/* Logical router ingress table POLICY: Policy. > + * > + * A packet that arrives at this table is an IP packet that should be > + * permitted/denied/rerouted to the address in the rule's nexthop. > + * This table sets outport to the correct out_port, > + * eth.src to the output port's MAC address, > + * the appropriate register to the next-hop IP address (leaving > + * 'ip[46].dst', the packet’s final destination, unchanged), and > + * advances to the next table for ARP/ND resolution. */ > +for (&Router(.lr = lr)) { > + /* This is a catch-all rule. It has the lowest priority (0) > + * does a match-all("1") and pass-through (next) */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, POLICY), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +function stage_hint(_uuid: uuid): Map<string,string> = { > + ["stage-hint" -> "${hex(_uuid[127:96])}"] > +} > + > + > +/* Convert routing policies to flows. 
*/ > +function pkt_mark_policy(options: Map<string,string>): string { > + var pkt_mark = map_get_uint_def(options, "pkt_mark", 0); > + if (pkt_mark > 0) { > + "pkt.mark = ${pkt_mark}; " > + } else { > + "" > + } > +} > +Flow(.logical_datapath = r.lr._uuid, > + .stage = router_stage(IN, POLICY), > + .priority = policy.priority, > + .__match = policy.__match, > + .actions = actions, > + .external_ids = stage_hint(policy._uuid)) :- > + r in &Router(), > + var policy_uuid = FlatMap(r.lr.policies), > + policy in nb::Logical_Router_Policy(._uuid = policy_uuid), > + policy.action == "reroute", > + out_port in &RouterPort(.router = r), > + Some{var nexthop_s} = policy.nexthop, > + Some{var nexthop} = ip46_parse(nexthop_s), > + Some{var src_ip} = find_lrp_member_ip(out_port.networks, nexthop), > + /* > + None: > + VLOG_WARN_RL(&rl, "lrp_addr not found for routing policy " > + " priority %"PRId64" nexthop %s", > + rule->priority, rule->nexthop); > + */ > + var xx = ip46_xxreg(src_ip), > + var actions = (pkt_mark_policy(policy.options) ++ > + "${xx}${rEG_NEXT_HOP()} = ${nexthop}; " > + "${xx}${rEG_SRC()} = ${src_ip}; " > + "eth.src = ${out_port.networks.ea}; " > + "outport = ${out_port.json_name}; " > + "flags.loopback = 1; " > + "next;"). > +Flow(.logical_datapath = r.lr._uuid, > + .stage = router_stage(IN, POLICY), > + .priority = policy.priority, > + .__match = policy.__match, > + .actions = "drop;", > + .external_ids = stage_hint(policy._uuid)) :- > + r in &Router(), > + var policy_uuid = FlatMap(r.lr.policies), > + policy in nb::Logical_Router_Policy(._uuid = policy_uuid), > + policy.action == "drop". 
> +Flow(.logical_datapath = r.lr._uuid, > + .stage = router_stage(IN, POLICY), > + .priority = policy.priority, > + .__match = policy.__match, > + .actions = pkt_mark_policy(policy.options) ++ "next;", > + .external_ids = stage_hint(policy._uuid)) :- > + r in &Router(), > + var policy_uuid = FlatMap(r.lr.policies), > + policy in nb::Logical_Router_Policy(._uuid = policy_uuid), > + policy.action == "allow". > + > +/* XXX destination unreachable */ > + > +/* Local router ingress table ARP_RESOLVE: ARP Resolution. > + * > + * Multicast packets already have the outport set so just advance to next > + * table (priority 500). > + */ > +for (&Router(.lr = lr)) { > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 500, > + .__match = "ip4.mcast || ip6.mcast", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +/* Local router ingress table ARP_RESOLVE: ARP Resolution. > + * > + * Any packet that reaches this table is an IP packet whose next-hop IP > + * address is in the next-hop register. (ip4.dst is the final destination.) This table > + * resolves the IP address in the next-hop register into an output port in outport and an > + * Ethernet address in eth.dst. */ > +// FIXME: does this apply to redirect ports? > +for (rp in &RouterPort(.peer = PeerRouter{peer_port, _}, > + .router = &router, > + .networks = networks)) > +{ > + for (&RouterPort(.lrp = nb::Logical_Router_Port{._uuid = peer_port}, > + .json_name = peer_json_name, > + .router = &peer_router)) > + { > + /* This is a logical router port. If next-hop IP address in > + * the next-hop register matches IP address of this router port, then > + * the packet is intended to eventually be sent to this > + * logical port. Set the destination mac address using this > + * port's mac address. > + * > + * The packet is still in peer's logical pipeline. So the match > + * should be on peer's outport. 
*/ > + if (not vec_is_empty(networks.ipv4_addrs)) { > + var __match = "outport == ${peer_json_name} && " > + "${rEG_NEXT_HOP()} == " ++ > + format_v4_networks(networks, false) in > + Flow(.logical_datapath = peer_router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = __match, > + .actions = "eth.dst = ${networks.ea}; next;", > + .external_ids = stage_hint(rp.lrp._uuid)) > + }; > + > + if (not vec_is_empty(networks.ipv6_addrs)) { > + var __match = "outport == ${peer_json_name} && " > + "xx${rEG_NEXT_HOP()} == " ++ > + format_v6_networks(networks) in > + Flow(.logical_datapath = peer_router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = __match, > + .actions = "eth.dst = ${networks.ea}; next;", > + .external_ids = stage_hint(rp.lrp._uuid)) > + } > + } > +} > + > +/* Packet is on a non gateway chassis and > + * has an unresolved ARP on a network behind gateway > + * chassis attached router port. Since, redirect type > + * is "bridged", instead of calling "get_arp" > + * on this node, we will redirect the packet to gateway > + * chassis, by setting destination mac router port mac.*/ > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 50, > + .__match = "outport == ${rp.json_name} && " > + "!is_chassis_resident(${router.redirect_port_name})", > + .actions = "eth.dst = ${rp.networks.ea}; next;", > + .external_ids = stage_hint(lrp._uuid)) :- > + rp in &RouterPort(.lrp = lrp, .router = router), > + router.redirect_port_name != "", > + Some{"bridged"} = map_get(lrp.options, "redirect-type"). > + > + > +/* Drop IP traffic destined to router owned IPs. Part of it is dropped > + * in stage "lr_in_ip_input" but traffic that could have been unSNATed > + * but didn't match any existing session might still end up here. > + * > + * Priority 1. 
> + */ > +Flow(.logical_datapath = lr_uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 1, > + .__match = "ip4.dst == {" ++ match_ips.join(", ") ++ "}", > + .actions = "drop;", > + .external_ids = stage_hint(lrp_uuid)) :- > + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > + .router = &Router{.snat_ips = snat_ips, > + .lr = nb::Logical_Router{._uuid = lr_uuid}}, > + .networks = networks), > + var addr = FlatMap(networks.ipv4_addrs), > + snat_ips.contains_key(IPv4{addr.addr}), > + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). > +Flow(.logical_datapath = lr_uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 1, > + .__match = "ip6.dst == {" ++ match_ips.join(", ") ++ "}", > + .actions = "drop;", > + .external_ids = stage_hint(lrp_uuid)) :- > + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, > + .router = &Router{.snat_ips = snat_ips, > + .lr = nb::Logical_Router{._uuid = lr_uuid}}, > + .networks = networks), > + var addr = FlatMap(networks.ipv6_addrs), > + snat_ips.contains_key(IPv6{addr.addr}), > + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). > + > +/* This is a logical switch port that backs a VM or a container. > + * Extract its addresses. For each of the address, go through all > + * the router ports attached to the switch (to which this port > + * connects) and if the address in question is reachable from the > + * router port, add an ARP/ND entry in that router's pipeline. 
*/ > +for (SwitchPortIPv4Address( > + .port = &SwitchPort{.lsp = lsp, .sw = &sw}, > + .ea = ea, > + .addr = addr) > + if lsp.__type != "router" and lsp.__type != "virtual" and lsp.is_enabled()) > +{ > + for (&SwitchPort(.sw = &Switch{.ls = nb::Logical_Switch{._uuid = sw.ls._uuid}}, > + .peer = Some{&peer@RouterPort{.router = &peer_router}})) > + { > + Some{_} = find_lrp_member_ip(peer.networks, IPv4{addr.addr}) in > + Flow(.logical_datapath = peer_router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = "outport == ${peer.json_name} && " > + "${rEG_NEXT_HOP()} == ${addr.addr}", > + .actions = "eth.dst = ${ea}; next;", > + .external_ids = stage_hint(lsp._uuid)) > + } > +} > + > +for (SwitchPortIPv6Address( > + .port = &SwitchPort{.lsp = lsp, .sw = &sw}, > + .ea = ea, > + .addr = addr) > + if lsp.__type != "router" and lsp.__type != "virtual" and lsp.is_enabled()) > +{ > + for (&SwitchPort(.sw = &Switch{.ls = nb::Logical_Switch{._uuid = sw.ls._uuid}}, > + .peer = Some{&peer@RouterPort{.router = &peer_router}})) > + { > + Some{_} = find_lrp_member_ip(peer.networks, IPv6{addr.addr}) in > + Flow(.logical_datapath = peer_router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = "outport == ${peer.json_name} && " > + "xx${rEG_NEXT_HOP()} == ${addr.addr}", > + .actions = "eth.dst = ${ea}; next;", > + .external_ids = stage_hint(lsp._uuid)) > + } > +} > + > +/* True if 's' is an empty set or a set that contains just an empty string, > + * false otherwise. > + * > + * This is meant for sets of 0 or 1 elements, like the OVSDB integration > + * with DDlog uses. */ > +function is_empty_set_or_string(s: Option<string>): bool = { > + match (s) { > + None -> true, > + Some{""} -> true, > + _ -> false > + } > +} > + > +/* This is a virtual port. Add ARP replies for the virtual ip with > + * the mac of the present active virtual parent. 
> + * If the logical port doesn't have virtual parent set in > + * Port_Binding table, then add the flow to set eth.dst to > + * 00:00:00:00:00:00 and advance to next table so that ARP is > + * resolved by router pipeline using the arp{} action. > + * The MAC_Binding entry for the virtual ip might be invalid. */ > +Flow(.logical_datapath = peer.router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = "outport == ${peer.json_name} && " > + "${rEG_NEXT_HOP()} == ${virtual_ip}", > + .actions = "eth.dst = 00:00:00:00:00:00; next;", > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "virtual"}), > + Some{var virtual_ip_s} = map_get(lsp.options, "virtual-ip"), > + Some{var virtual_parents} = map_get(lsp.options, "virtual-parents"), > + Some{var virtual_ip} = ip_parse(virtual_ip_s), > + pb in sb::Port_Binding(.logical_port = sp.lsp.name), > + is_empty_set_or_string(pb.virtual_parent) or is_none(pb.chassis), > + sp2 in &SwitchPort(.sw = sp.sw, .peer = Some{peer}), > + Some{_} = find_lrp_member_ip(peer.networks, IPv4{virtual_ip}). 
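The is_empty_set_or_string() helper defined a bit further up captures how the OVSDB/DDlog integration represents optional columns: a 0-element set and an empty string both mean "unset". A quick Python rendering of that check, just to make the semantics concrete (illustrative, not part of the patch):

```python
def is_empty_set_or_string(s):
    """True for an unset optional column: None (empty set) or "" both count."""
    return s is None or s == ""

assert is_empty_set_or_string(None)
assert is_empty_set_or_string("")
assert not is_empty_set_or_string("lp1")
```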
> +Flow(.logical_datapath = peer.router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = "outport == ${peer.json_name} && " > + "${rEG_NEXT_HOP()} == ${virtual_ip}", > + .actions = "eth.dst = ${address.ea}; next;", > + .external_ids = stage_hint(sp.lsp._uuid)) :- > + sp in &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "virtual"}), > + Some{var virtual_ip_s} = map_get(lsp.options, "virtual-ip"), > + Some{var virtual_parents} = map_get(lsp.options, "virtual-parents"), > + Some{var virtual_ip} = ip_parse(virtual_ip_s), > + pb in sb::Port_Binding(.logical_port = sp.lsp.name), > + not (is_empty_set_or_string(pb.virtual_parent) or is_none(pb.chassis)), > + Some{var virtual_parent} = pb.virtual_parent, > + vp in &SwitchPort(.lsp = nb::Logical_Switch_Port{.name = virtual_parent}), > + var address = FlatMap(vp.static_addresses), > + sp2 in &SwitchPort(.sw = sp.sw, .peer = Some{peer}), > + Some{_} = find_lrp_member_ip(peer.networks, IPv4{virtual_ip}). > + > +/* This is a logical switch port that connects to a router. */ > + > +/* The peer of this switch port is the router port for which > + * we need to add logical flows such that it can resolve > + * ARP entries for all the other router ports connected to > + * the switch in question. */ > +for (&SwitchPort(.lsp = lsp1, > + .peer = Some{&peer1@RouterPort{.router = &peer_router}}, > + .sw = &sw) > + if lsp1.is_enabled() and > + not map_get_bool_def(peer_router.lr.options, "dynamic_neigh_routers", false)) > +{ > + for (&SwitchPort(.lsp = lsp2, .peer = Some{&peer2}, > + .sw = &Switch{.ls = nb::Logical_Switch{._uuid = sw.ls._uuid}}) > + /* Skip the router port under consideration. 
*/ > + if peer2.lrp._uuid != peer1.lrp._uuid) > + { > + if (not vec_is_empty(peer2.networks.ipv4_addrs)) { > + Flow(.logical_datapath = peer_router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = "outport == ${peer1.json_name} && " > + "${rEG_NEXT_HOP()} == ${format_v4_networks(peer2.networks, false)}", > + .actions = "eth.dst = ${peer2.networks.ea}; next;", > + .external_ids = stage_hint(lsp1._uuid)) > + }; > + > + if (not vec_is_empty(peer2.networks.ipv6_addrs)) { > + Flow(.logical_datapath = peer_router.lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 100, > + .__match = "outport == ${peer1.json_name} && " > + "xx${rEG_NEXT_HOP()} == ${format_v6_networks(peer2.networks)}", > + .actions = "eth.dst = ${peer2.networks.ea}; next;", > + .external_ids = stage_hint(lsp1._uuid)) > + } > + } > +} > + > +for (&Router(.lr = lr)) > +{ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 0, > + .__match = "ip4", > + .actions = "get_arp(outport, ${rEG_NEXT_HOP()}); next;", > + .external_ids = map_empty()); > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ARP_RESOLVE), > + .priority = 0, > + .__match = "ip6", > + .actions = "get_nd(outport, xx${rEG_NEXT_HOP()}); next;", > + .external_ids = map_empty()) > +} > + > +/* Local router ingress table CHK_PKT_LEN: Check packet length. > + * > + * Any IPv4 packet with outport set to the distributed gateway > + * router port, check the packet length and store the result in the > + * 'REGBIT_PKT_LARGER' register bit. > + * > + * Local router ingress table LARGER_PKTS: Handle larger packets. > + * > + * Any IPv4 packet with outport set to the distributed gateway > + * router port and the 'REGBIT_PKT_LARGER' register bit is set, > + * generate ICMPv4 packet with type 3 (Destination Unreachable) and > + * code 4 (Fragmentation needed). 
> + * */ > +Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, CHK_PKT_LEN), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) :- > + &Router(.lr = lr). > +Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LARGER_PKTS), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) :- > + &Router(.lr = lr). > +Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, CHK_PKT_LEN), > + .priority = 50, > + .__match = "outport == ${l3dgw_port_json_name}", > + .actions = "${rEGBIT_PKT_LARGER()} = check_pkt_larger(${mtu}); " > + "next;", > + .external_ids = stage_hint(l3dgw_port._uuid)) :- > + r in &Router(.lr = lr), > + Some{var l3dgw_port} = r.l3dgw_port, > + var l3dgw_port_json_name = json_string_escape(l3dgw_port.name), > + r.redirect_port_name != "", > + var gw_mtu = map_get_int_def(l3dgw_port.options, "gateway_mtu", 0), > + gw_mtu > 0, > + var mtu = gw_mtu + vLAN_ETH_HEADER_LEN(). > +Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LARGER_PKTS), > + .priority = 50, > + .__match = "inport == ${rp.json_name} && outport == ${l3dgw_port_json_name} && " > + "ip4 && ${rEGBIT_PKT_LARGER()}", > + .actions = "icmp4_error {" > + "${rEGBIT_EGRESS_LOOPBACK()} = 1; " > + "eth.dst = ${rp.networks.ea}; " > + "ip4.dst = ip4.src; " > + "ip4.src = ${first_ipv4.addr}; " > + "ip.ttl = 255; " > + "icmp4.type = 3; /* Destination Unreachable. */ " > + "icmp4.code = 4; /* Frag Needed and DF was Set. 
*/ " > + /* Set icmp4.frag_mtu to gw_mtu */ > + "icmp4.frag_mtu = ${gw_mtu}; " > + "next(pipeline=ingress, table=0); " > + "};", > + .external_ids = stage_hint(rp.lrp._uuid)) :- > + r in &Router(.lr = lr), > + Some{var l3dgw_port} = r.l3dgw_port, > + var l3dgw_port_json_name = json_string_escape(l3dgw_port.name), > + r.redirect_port_name != "", > + var gw_mtu = map_get_int_def(l3dgw_port.options, "gateway_mtu", 0), > + gw_mtu > 0, > + rp in &RouterPort(.router = r), > + rp.lrp != l3dgw_port, > + Some{var first_ipv4} = vec_nth(rp.networks.ipv4_addrs, 0). > +Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, LARGER_PKTS), > + .priority = 50, > + .__match = "inport == ${rp.json_name} && outport == ${l3dgw_port_json_name} && " > + "ip6 && ${rEGBIT_PKT_LARGER()}", > + .actions = "icmp6_error {" > + "${rEGBIT_EGRESS_LOOPBACK()} = 1; " > + "eth.dst = ${rp.networks.ea}; " > + "ip6.dst = ip6.src; " > + "ip6.src = ${first_ipv6.addr}; " > + "ip.ttl = 255; " > + "icmp6.type = 2; /* Packet Too Big. */ " > + "icmp6.code = 0; " > + /* Set icmp6.frag_mtu to gw_mtu */ > + "icmp6.frag_mtu = ${gw_mtu}; " > + "next(pipeline=ingress, table=0); " > + "};", > + .external_ids = stage_hint(rp.lrp._uuid)) :- > + r in &Router(.lr = lr), > + Some{var l3dgw_port} = r.l3dgw_port, > + var l3dgw_port_json_name = json_string_escape(l3dgw_port.name), > + r.redirect_port_name != "", > + var gw_mtu = map_get_int_def(l3dgw_port.options, "gateway_mtu", 0), > + gw_mtu > 0, > + rp in &RouterPort(.router = r), > + rp.lrp != l3dgw_port, > + Some{var first_ipv6} = vec_nth(rp.networks.ipv6_addrs, 0). > + > +/* Logical router ingress table GW_REDIRECT: Gateway redirect. > + * > + * For traffic with outport equal to the l3dgw_port > + * on a distributed router, this table redirects a subset > + * of the traffic to the l3redirect_port which represents > + * the central instance of the l3dgw_port. 
> + */ > +for (&Router(.lr = lr, > + .l3dgw_port = l3dgw_port, > + .redirect_port_name = redirect_port_name)) > +{ > + /* For traffic with outport == l3dgw_port, if the > + * packet did not match any higher priority redirect > + * rule, then the traffic is redirected to the central > + * instance of the l3dgw_port. */ > + Some{var gwport} = l3dgw_port in > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, GW_REDIRECT), > + .priority = 50, > + .__match = "outport == ${json_string_escape(gwport.name)}", > + .actions = "outport = ${redirect_port_name}; next;", > + .external_ids = stage_hint(gwport._uuid)); > + > + /* Packets are allowed by default. */ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, GW_REDIRECT), > + .priority = 0, > + .__match = "1", > + .actions = "next;", > + .external_ids = map_empty()) > +} > + > +/* Local router ingress table ARP_REQUEST: ARP request. > + * > + * In the common case where the Ethernet destination has been resolved, > + * this table outputs the packet (priority 0). Otherwise, it composes > + * and sends an ARP/IPv6 NA request (priority 100). */ > +Flow(.logical_datapath = router.lr._uuid, > + .stage = router_stage(IN, ARP_REQUEST), > + .priority = 200, > + .__match = __match, > + .actions = actions, > + .external_ids = map_empty()) :- > + rsr in RouterStaticRoute(.router = &router), > + var dst = FlatMap(rsr.dsts), > + IPv6{var gw_ip6} = dst.nexthop, > + var __match = "eth.dst == 00:00:00:00:00:00 && " > + "ip6 && xx${rEG_NEXT_HOP()} == ${dst.nexthop}", > + var sn_addr = in6_addr_solicited_node(gw_ip6), > + var eth_dst = ipv6_multicast_to_ethernet(sn_addr), > + var sn_addr_s = ipv6_string_mapped(sn_addr), > + var actions = "nd_ns { " > + "eth.dst = ${eth_dst}; " > + "ip6.dst = ${sn_addr_s}; " > + "nd.target = ${dst.nexthop}; " > + "output; " > + "};". 
> + > +for (&Router(.lr = lr)) > +{ > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ARP_REQUEST), > + .priority = 100, > + .__match = "eth.dst == 00:00:00:00:00:00 && ip4", > + .actions = "arp { " > + "eth.dst = ff:ff:ff:ff:ff:ff; " > + "arp.spa = ${rEG_SRC()}; " > + "arp.tpa = ${rEG_NEXT_HOP()}; " > + "arp.op = 1; " /* ARP request */ > + "output; " > + "};", > + .external_ids = map_empty()); > + > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ARP_REQUEST), > + .priority = 100, > + .__match = "eth.dst == 00:00:00:00:00:00 && ip6", > + .actions = "nd_ns { " > + "nd.target = xx${rEG_NEXT_HOP()}; " > + "output; " > + "};", > + .external_ids = map_empty()); > + > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(IN, ARP_REQUEST), > + .priority = 0, > + .__match = "1", > + .actions = "output;", > + .external_ids = map_empty()) > +} > + > + > +/* Logical router egress table DELIVERY: Delivery (priority 100). > + * > + * Priority 100 rules deliver packets to enabled logical ports. */ > +for (&RouterPort(.lrp = lrp, > + .json_name = json_name, > + .networks = lrp_networks, > + .router = &Router{.lr = lr, .mcast_cfg = &mcast_cfg}) > + /* Drop packets to disabled logical ports (since logical flow > + * tables are default-drop). */ > + if lrp.is_enabled()) > +{ > + /* If multicast relay is enabled then also adjust source mac for IP > + * multicast traffic. > + */ > + if (mcast_cfg.relay) { > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, DELIVERY), > + .priority = 110, > + .__match = "(ip4.mcast || ip6.mcast) && " > + "outport == ${json_name}", > + .actions = "eth.src = ${lrp_networks.ea}; output;", > + .external_ids = stage_hint(lrp._uuid)) > + }; > + /* No egress packets should be processed in the context of > + * a chassisredirect port. The chassisredirect port should > + * be replaced by the l3dgw port in the local output > + * pipeline stage before egress processing. 
*/ > + > + Flow(.logical_datapath = lr._uuid, > + .stage = router_stage(OUT, DELIVERY), > + .priority = 100, > + .__match = "outport == ${json_name}", > + .actions = "output;", > + .external_ids = stage_hint(lrp._uuid)) > +} > + > +/* > + * Datapath tunnel key allocation: > + * > + * Allocates a globally unique tunnel id in the range 1...2**24-1 for > + * each Logical_Switch and Logical_Router. > + */ > + > +function oVN_MAX_DP_KEY(): integer { (64'd1 << 24) - 1 } > +function oVN_MAX_DP_GLOBAL_NUM(): integer { (64'd1 << 16) - 1 } > +function oVN_MIN_DP_KEY_LOCAL(): integer { 1 } > +function oVN_MAX_DP_KEY_LOCAL(): integer { oVN_MAX_DP_KEY() - oVN_MAX_DP_GLOBAL_NUM() } > +function oVN_MIN_DP_KEY_GLOBAL(): integer { oVN_MAX_DP_KEY_LOCAL() + 1 } > +function oVN_MAX_DP_KEY_GLOBAL(): integer { oVN_MAX_DP_KEY() } > + > +function oVN_MAX_DP_VXLAN_KEY(): integer { (64'd1 << 12) - 1 } > +function oVN_MAX_DP_VXLAN_KEY_LOCAL(): integer { oVN_MAX_DP_KEY() - oVN_MAX_DP_GLOBAL_NUM() } > + > +/* If any chassis uses VXLAN encapsulation, then the entire deployment is in VXLAN mode. */ > +relation IsVxlanMode0() > +IsVxlanMode0() :- > + sb::Chassis(.encaps = encaps), > + var encap_uuid = FlatMap(encaps), > + sb::Encap(._uuid = encap_uuid, .__type = "vxlan"). > + > +relation IsVxlanMode[bool] > +IsVxlanMode[true] :- > + IsVxlanMode0(). > +IsVxlanMode[false] :- > + Unit(), > + not IsVxlanMode0(). > + > +/* The maximum datapath tunnel key that may be used. */ > +relation OvnMaxDpKeyLocal[integer] > +/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ > +OvnMaxDpKeyLocal[oVN_MAX_DP_VXLAN_KEY()] :- IsVxlanMode[true]. > +OvnMaxDpKeyLocal[oVN_MAX_DP_KEY() - oVN_MAX_DP_GLOBAL_NUM()] :- IsVxlanMode[false]. 
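Side note for reviewers: the key-space constants above are easiest to check numerically. A small Python mirror of the DDlog definitions (values only; the DDlog constants are authoritative):

```python
# Mirror of the datapath tunnel key constants defined above.
OVN_MAX_DP_KEY = (1 << 24) - 1           # full 24-bit Geneve key space
OVN_MAX_DP_GLOBAL_NUM = (1 << 16) - 1    # keys reserved at the top of the range
OVN_MAX_DP_KEY_LOCAL = OVN_MAX_DP_KEY - OVN_MAX_DP_GLOBAL_NUM
OVN_MAX_DP_VXLAN_KEY = (1 << 12) - 1     # VXLAN VNIs only have 12 usable bits

print(OVN_MAX_DP_KEY_LOCAL)   # 16711680
print(OVN_MAX_DP_VXLAN_KEY)   # 4095
```

So if any chassis uses VXLAN encapsulation, the local datapath key space shrinks from 16711680 keys to 4095.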
> + > +function get_dp_tunkey(map: Map<string,string>, key: string): Option<integer> { > + match (map_get(map, key)) { > + Some{value} -> match (str_to_int(value, 10)) { > + Some{x} -> if (x > 0 and x < (2<<24)) { > + Some{x} > + } else { > + None > + }, > + _ -> None > + }, > + _ -> None > + } > +} > + > +// Tunnel keys requested by datapaths. > +relation RequestedTunKey(datapath: uuid, tunkey: integer) > +RequestedTunKey(uuid, tunkey) :- > + ls in nb::Logical_Switch(._uuid = uuid), > + Some{var tunkey} = get_dp_tunkey(ls.other_config, "requested-tnl-key"). > +RequestedTunKey(uuid, tunkey) :- > + lr in nb::Logical_Router(._uuid = uuid), > + Some{var tunkey} = get_dp_tunkey(lr.options, "requested-tnl-key"). > +Warning[message] :- > + RequestedTunKey(datapath, tunkey), > + var count = datapath.group_by((tunkey)).size(), > + count > 1, > + var message = "${count} logical switches or routers request " > + "datapath tunnel key ${tunkey}". > + > +// Assign tunnel keys: > +// - First priority to requested tunnel keys. > +// - Second priority to already assigned tunnel keys. > +// In either case, make an arbitrary choice in case of conflicts within a > +// priority level. > +relation AssignedTunKey(datapath: uuid, tunkey: integer) > +AssignedTunKey(datapath, tunkey) :- > + RequestedTunKey(datapath, tunkey), > + var datapath = datapath.group_by(tunkey).first(). > +AssignedTunKey(datapath, tunkey) :- > + sb::Datapath_Binding(._uuid = datapath, .tunnel_key = tunkey), > + not RequestedTunKey(_, tunkey), > + not RequestedTunKey(datapath, _), > + var datapath = datapath.group_by(tunkey).first(). > + > +// all tunnel keys already in use in the Realized table > +relation AllocatedTunKeys(keys: Set<integer>) > +AllocatedTunKeys(keys) :- > + AssignedTunKey(.tunkey = tunkey), > + var keys = tunkey.group_by(()).to_set(). 
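For readers less familiar with DDlog, `get_dp_tunkey()` above corresponds to roughly the following Python (a sketch, not code from the patch). One aside: the DDlog guard `x < (2<<24)` accepts values up to 2**25-1, while the surrounding comments document the range as 1...2**24-1, so perhaps `1<<24` was intended; the sketch follows the documented range:

```python
def get_dp_tunkey(options, key):
    """Parse options[key] as a decimal datapath tunnel key;
    accept only the documented range 1...2**24-1."""
    value = options.get(key)
    if value is None:
        return None                  # option not present
    try:
        x = int(value, 10)
    except ValueError:
        return None                  # not a decimal integer
    return x if 0 < x < (1 << 24) else None

print(get_dp_tunkey({"requested-tnl-key": "42"}, "requested-tnl-key"))  # 42
print(get_dp_tunkey({"requested-tnl-key": "0"}, "requested-tnl-key"))   # None
```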
> + > +// Datapath_Binding's not yet in the Realized table > +relation NotYetAllocatedTunKeys(datapaths: Vec<uuid>) > + > +NotYetAllocatedTunKeys(datapaths) :- > + OutProxy_Datapath_Binding(._uuid = datapath), > + not AssignedTunKey(datapath, _), > + var datapaths = datapath.group_by(()).to_vec(). > + > +// Perform the allocation > +relation TunKeyAllocation(datapath: uuid, tunkey: integer) > + > +TunKeyAllocation(datapath, tunkey) :- AssignedTunKey(datapath, tunkey). > + > +// Case 1: AllocatedTunKeys relation is not empty (i.e., contains > +// a single record that stores a set of allocated keys) > +TunKeyAllocation(datapath, tunkey) :- > + NotYetAllocatedTunKeys(unallocated), > + AllocatedTunKeys(allocated), > + OvnMaxDpKeyLocal[max_dp_key_local], > + var allocation = FlatMap(allocate(allocated, unallocated, 1, max_dp_key_local)), > + (var datapath, var tunkey) = allocation. > + > +// Case 2: AllocatedTunKeys relation is empty > +TunKeyAllocation(datapath, tunkey) :- > + NotYetAllocatedTunKeys(unallocated), > + not AllocatedTunKeys(_), > + OvnMaxDpKeyLocal[max_dp_key_local], > + var allocation = FlatMap(allocate(set_empty(), unallocated, 1, max_dp_key_local)), > + (var datapath, var tunkey) = allocation. > + > +/* > + * Port id allocation: > + * > + * Port IDs in a per-datapath space in the range 1...2**15-1 > + */ > + > +function get_port_tunkey(map: Map<string,string>, key: string): Option<integer> { > + match (map_get(map, key)) { > + Some{value} -> match (str_to_int(value, 10)) { > + Some{x} -> if (x > 0 and x < (2<<15)) { > + Some{x} > + } else { > + None > + }, > + _ -> None > + }, > + _ -> None > + } > +} > + > +// Tunnel keys requested by port bindings. > +relation RequestedPortTunKey(datapath: uuid, port: uuid, tunkey: integer) > +RequestedPortTunKey(datapath, port, tunkey) :- > + sp in &SwitchPort(), > + var datapath = sp.sw.ls._uuid, > + var port = sp.lsp._uuid, > + Some{var tunkey} = get_port_tunkey(sp.lsp.options, "requested-tnl-key"). 
> +RequestedPortTunKey(datapath, port, tunkey) :- > + rp in &RouterPort(), > + var datapath = rp.router.lr._uuid, > + var port = rp.lrp._uuid, > + Some{var tunkey} = get_port_tunkey(rp.lrp.options, "requested-tnl-key"). > +Warning[message] :- > + RequestedPortTunKey(datapath, port, tunkey), > + var count = port.group_by((datapath, tunkey)).size(), > + count > 1, > + var message = "${count} logical ports in the same datapath " > + "request port tunnel key ${tunkey}". > + > +// Assign tunnel keys: > +// - First priority to requested tunnel keys. > +// - Second priority to already assigned tunnel keys. > +// In either case, make an arbitrary choice in case of conflicts within a > +// priority level. > +relation AssignedPortTunKey(datapath: uuid, port: uuid, tunkey: integer) > +AssignedPortTunKey(datapath, port, tunkey) :- > + RequestedPortTunKey(datapath, port, tunkey), > + var port = port.group_by((datapath, tunkey)).first(). > +AssignedPortTunKey(datapath, port, tunkey) :- > + sb::Port_Binding(._uuid = port_uuid, > + .datapath = datapath, > + .tunnel_key = tunkey), > + not RequestedPortTunKey(datapath, _, tunkey), > + not RequestedPortTunKey(datapath, port_uuid, _), > + var port = port_uuid.group_by((datapath, tunkey)).first(). > + > +// all tunnel keys already in use in the Realized table > +relation AllocatedPortTunKeys(datapath: uuid, keys: Set<integer>) > + > +AllocatedPortTunKeys(datapath, keys) :- > + AssignedPortTunKey(datapath, port, tunkey), > + var keys = tunkey.group_by(datapath).to_set(). > + > +// Port_Binding's not yet in the Realized table > +relation NotYetAllocatedPortTunKeys(datapath: uuid, all_logical_ids: Vec<uuid>) > + > +NotYetAllocatedPortTunKeys(datapath, all_names) :- > + OutProxy_Port_Binding(._uuid = port_uuid, .datapath = datapath), > + not AssignedPortTunKey(datapath, port_uuid, _), > + var all_names = port_uuid.group_by(datapath).to_vec(). > + > +// Perform the allocation. 
> +relation PortTunKeyAllocation(port: uuid, tunkey: integer) > + > +// Transfer existing allocations from the realized table. > +PortTunKeyAllocation(port, tunkey) :- AssignedPortTunKey(_, port, tunkey). > + > +// Case 1: AllocatedPortTunKeys(datapath) is not empty (i.e., contains > +// a single record that stores a set of allocated keys). > +PortTunKeyAllocation(port, tunkey) :- > + AllocatedPortTunKeys(datapath, allocated), > + NotYetAllocatedPortTunKeys(datapath, unallocated), > + var allocation = FlatMap(allocate(allocated, unallocated, 1, 64'hffff)), > + (var port, var tunkey) = allocation. > + > +// Case 2: PortAllocatedTunKeys(datapath) relation is empty > +PortTunKeyAllocation(port, tunkey) :- > + NotYetAllocatedPortTunKeys(datapath, unallocated), > + not AllocatedPortTunKeys(datapath, _), > + var allocation = FlatMap(allocate(set_empty(), unallocated, 1, 64'hffff)), > + (var port, var tunkey) = allocation. > + > +/* > + * Multicast group tunnel_key allocation: > + * > + * Tunnel-keys in a per-datapath space in the range 32770...65535 > + */ > + > +// All tunnel keys already in use in the Realized table. > +relation AllocatedMulticastGroupTunKeys(datapath_uuid: uuid, keys: Set<integer>) > + > +AllocatedMulticastGroupTunKeys(datapath_uuid, keys) :- > + sb::Multicast_Group(.datapath = datapath_uuid, .tunnel_key = tunkey), > + //sb::UUIDMap_Datapath_Binding(datapath, Left{datapath_uuid}), > + var keys = tunkey.group_by(datapath_uuid).to_set(). > + > +// Multicast_Group's not yet in the Realized table. > +relation NotYetAllocatedMulticastGroupTunKeys(datapath_uuid: uuid, > + all_logical_ids: Vec<string>) > + > +NotYetAllocatedMulticastGroupTunKeys(datapath_uuid, all_names) :- > + OutProxy_Multicast_Group(.name = name, .datapath = datapath_uuid), > + not sb::Multicast_Group(.name = name, .datapath = datapath_uuid), > + var all_names = name.group_by(datapath_uuid).to_vec(). 
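The `allocate()` helper that the datapath, port, and multicast-group tunnel key rules call is not shown in this hunk. Judging purely from its call sites (a set of already-allocated keys, a vector of items needing keys, and an inclusive min/max), it presumably behaves along the lines of this hypothetical Python sketch, not the actual Rust implementation:

```python
def allocate(allocated, unallocated, min_key, max_key):
    """Assign each item the lowest free key in [min_key, max_key],
    skipping keys already in `allocated`.  Once the key space is
    exhausted, remaining items get no entry at all."""
    result = []
    key = min_key
    for item in unallocated:
        while key in allocated and key <= max_key:
            key += 1
        if key > max_key:
            break                    # key space exhausted
        result.append((item, key))
        key += 1
    return result

# Keys 1 and 3 are taken, so "a" gets 2 and "b" gets 4.
print(allocate({1, 3}, ["a", "b"], 1, 10))   # [('a', 2), ('b', 4)]
```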
> + > +// Perform the allocation > +relation MulticastGroupTunKeyAllocation(datapath_uuid: uuid, group: string, tunkey: integer) > + > +// transfer existing allocations from the realized table > +MulticastGroupTunKeyAllocation(datapath_uuid, group, tunkey) :- > + //sb::UUIDMap_Datapath_Binding(_, datapath_uuid), > + sb::Multicast_Group(.name = group, > + .datapath = datapath_uuid, > + .tunnel_key = tunkey). > + > +// Case 1: AllocatedMulticastGroupTunKeys(datapath) is not empty (i.e., > +// contains a single record that stores a set of allocated keys) > +MulticastGroupTunKeyAllocation(datapath_uuid, group, tunkey) :- > + AllocatedMulticastGroupTunKeys(datapath_uuid, allocated), > + NotYetAllocatedMulticastGroupTunKeys(datapath_uuid, unallocated), > + (_, var min_key) = mC_IP_MCAST_MIN(), > + (_, var max_key) = mC_IP_MCAST_MAX(), > + var allocation = FlatMap(allocate(allocated, unallocated, > + min_key, max_key)), > + (var group, var tunkey) = allocation. > + > +// Case 2: AllocatedMulticastGroupTunKeys(datapath) relation is empty > +MulticastGroupTunKeyAllocation(datapath_uuid, group, tunkey) :- > + NotYetAllocatedMulticastGroupTunKeys(datapath_uuid, unallocated), > + not AllocatedMulticastGroupTunKeys(datapath_uuid, _), > + (_, var min_key) = mC_IP_MCAST_MIN(), > + (_, var max_key) = mC_IP_MCAST_MAX(), > + var allocation = FlatMap(allocate(set_empty(), unallocated, > + min_key, max_key)), > + (var group, var tunkey) = allocation. > + > +/* > + * Queue ID allocation > + * > + * Queue IDs on a chassis, for routers that have QoS enabled, in a per-chassis > + * space in the range 1...0xf000. It looks to me like there'd only be a small > + * number of these per chassis, and probably a small number overall, in case it > + * matters. > + * > + * Queue ID may also need to be deallocated if port loses QoS attributes > + * > + * This logic applies mainly to sb::Port_Binding records bound to a chassis > + * (i.e. 
with the chassis column nonempty) but "localnet" ports can also > + * have a queue ID. For those we use the port's own UUID as the chassis UUID. > + */ > + > +function port_has_qos_params(opts: Map<string, string>): bool = { > + map_contains_key(opts, "qos_max_rate") or > + map_contains_key(opts, "qos_burst") > +} > + > + > +// ports in Out_Port_Binding that require queue ID on chassis > +relation PortRequiresQID(port: uuid, chassis: uuid) > + > +PortRequiresQID(pb._uuid, chassis) :- > + pb in OutProxy_Port_Binding(), > + pb.__type != "localnet", > + port_has_qos_params(pb.options), > + sb::Port_Binding(._uuid = pb._uuid, .chassis = chassis_set), > + Some{var chassis} = chassis_set. > +PortRequiresQID(pb._uuid, pb._uuid) :- > + pb in OutProxy_Port_Binding(), > + pb.__type == "localnet", > + port_has_qos_params(pb.options), > + sb::Port_Binding(._uuid = pb._uuid). > + > +relation AggPortRequiresQID(chassis: uuid, ports: Vec<uuid>) > + > +AggPortRequiresQID(chassis, ports) :- > + PortRequiresQID(port, chassis), > + var ports = port.group_by(chassis).to_vec(). > + > +relation AllocatedQIDs(chassis: uuid, allocated_ids: Map<uuid, integer>) > + > +AllocatedQIDs(chassis, allocated_ids) :- > + pb in sb::Port_Binding(), > + pb.__type != "localnet", > + Some{var chassis} = pb.chassis, > + Some{var qid_str} = map_get(pb.options, "qdisc_queue_id"), > + Some{var qid} = parse_dec_u64(qid_str), > + var allocated_ids = (pb._uuid, qid).group_by(chassis).to_map(). > +AllocatedQIDs(chassis, allocated_ids) :- > + pb in sb::Port_Binding(), > + pb.__type == "localnet", > + var chassis = pb._uuid, > + Some{var qid_str} = map_get(pb.options, "qdisc_queue_id"), > + Some{var qid} = parse_dec_u64(qid_str), > + var allocated_ids = (pb._uuid, qid).group_by(chassis).to_map(). 
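`adjust_allocation()` is likewise not in this hunk. From its use above (an existing uuid-to-queue-ID map, the ports that need IDs, and the 1...0xf000 range), it presumably preserves existing assignments and fills in the rest, along these lines (hypothetical Python sketch):

```python
def adjust_allocation(allocated_ids, ports, min_id, max_id):
    """Ports already present in allocated_ids keep their queue ID;
    the remaining ports get the lowest free IDs in [min_id, max_id]."""
    in_use = set(allocated_ids.values())
    result = []
    next_id = min_id
    for port in ports:
        if port in allocated_ids:
            result.append((port, allocated_ids[port]))
            continue
        while next_id in in_use and next_id <= max_id:
            next_id += 1
        if next_id > max_id:
            break                    # no free queue IDs left
        in_use.add(next_id)
        result.append((port, next_id))
    return result

# "p1" keeps its existing ID 5; "p2" gets the first free ID, 1.
print(adjust_allocation({"p1": 5}, ["p1", "p2"], 1, 0xf000))
```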
> + > +// allocate queue IDs to ports > +relation QueueIDAllocation(port: uuid, qids: Option<integer>) > + > +// None for ports that do not require a queue > +QueueIDAllocation(port, None) :- > + OutProxy_Port_Binding(._uuid = port), > + not PortRequiresQID(port, _). > + > +QueueIDAllocation(port, Some{qid}) :- > + AggPortRequiresQID(chassis, ports), > + AllocatedQIDs(chassis, allocated_ids), > + var allocations = FlatMap(adjust_allocation(allocated_ids, ports, 1, 64'hf000)), > + (var port, var qid) = allocations. > + > +QueueIDAllocation(port, Some{qid}) :- > + AggPortRequiresQID(chassis, ports), > + not AllocatedQIDs(chassis, _), > + var allocations = FlatMap(adjust_allocation(map_empty(), ports, 1, 64'hf000)), > + (var port, var qid) = allocations. > + > +/* > + * This allows ovn-northd to preserve options:ipv6_ra_pd_list, which is set by > + * ovn-controller. > + */ > +relation PreserveIPv6RAPDList(lrp_uuid: uuid, ipv6_ra_pd_list: Option<string>) > +PreserveIPv6RAPDList(lrp_uuid, ipv6_ra_pd_list) :- > + sb::Port_Binding(._uuid = lrp_uuid, .options = options), > + var ipv6_ra_pd_list = map_get(options, "ipv6_ra_pd_list"). > +PreserveIPv6RAPDList(lrp_uuid, None) :- > + nb::Logical_Router_Port(._uuid = lrp_uuid), > + not sb::Port_Binding(._uuid = lrp_uuid). > + > +/* > + * Tag allocation for nested containers. > + */ > + > +/* Reserved tags for each parent port, including: > + * 1. For ports that need a dynamically allocated tag, existing tag, if any, > + * 2. For ports that have a statically assigned tag (via `tag_request`), the > + * `tag_request` value. > + * 3. For ports that do not have a tag_request, but have a tag statically assigned > + * by directly setting the `tag` field, use this value. 
> + */ > +relation SwitchPortReservedTag(parent_name: string, tags: integer) > + > +SwitchPortReservedTag(parent_name, tag) :- > + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = needs_dynamic_tag, .parent_name = Some{parent_name}), > + Some{var tag} = if (needs_dynamic_tag) { > + lsp.tag > + } else { > + match (lsp.tag_request) { > + Some{req} -> Some{req}, > + None -> lsp.tag > + } > + }. > + > +relation SwitchPortReservedTags(parent_name: string, tags: Set<integer>) > + > +SwitchPortReservedTags(parent_name, tags) :- > + SwitchPortReservedTag(parent_name, tag), > + var tags = tag.group_by(parent_name).to_set(). > + > +SwitchPortReservedTags(parent_name, set_empty()) :- > + nb::Logical_Switch_Port(.name = parent_name), > + not SwitchPortReservedTag(.parent_name = parent_name). > + > +/* Allocate tags for ports that require dynamically allocated tags and do not > + * have any yet. > + */ > +relation SwitchPortAllocatedTags(lsp_uuid: uuid, tag: Option<integer>) > + > +SwitchPortAllocatedTags(lsp_uuid, tag) :- > + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = true, .parent_name = Some{parent_name}), > + is_none(lsp.tag), > + var lsps_need_tag = lsp._uuid.group_by(parent_name).to_vec(), > + SwitchPortReservedTags(parent_name, reserved), > + var dyn_tags = allocate_opt(reserved, > + lsps_need_tag, > + 1, /* Tag 0 is invalid for nested containers. */ > + 4095), > + var lsp_tag = FlatMap(dyn_tags), > + (var lsp_uuid, var tag) = lsp_tag. > + > +/* New tag-to-port assignment: > + * Case 1. Statically reserved tag (via `tag_request`), if any. > + * Case 2. Existing tag for ports that require a dynamically allocated tag and already have one. > + * Case 3. Use newly allocated tags (from `SwitchPortAllocatedTags`) for all other ports. 
> + */ > +relation SwitchPortNewDynamicTag(port: uuid, tag: Option<integer>) > + > +/* Case 1 */ > +SwitchPortNewDynamicTag(lsp._uuid, tag) :- > + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = false), > + var tag = match (lsp.tag_request) { > + Some{0} -> None, > + treq -> treq > + }. > + > +/* Case 2 */ > +SwitchPortNewDynamicTag(lsp._uuid, Some{tag}) :- > + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = true), > + Some{var tag} = lsp.tag. > + > +/* Case 3 */ > +SwitchPortNewDynamicTag(lsp._uuid, tag) :- > + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = true), > + is_none(lsp.tag), > + SwitchPortAllocatedTags(lsp._uuid, tag). > + > +/* IP_Multicast table (only applicable for Switches). */ > +sb::Out_IP_Multicast(._uuid = cfg.datapath, > + .datapath = cfg.datapath, > + .enabled = Some{cfg.enabled}, > + .querier = Some{cfg.querier}, > + .eth_src = cfg.eth_src, > + .ip4_src = cfg.ip4_src, > + .ip6_src = cfg.ip6_src, > + .table_size = Some{cfg.table_size}, > + .idle_timeout = Some{cfg.idle_timeout}, > + .query_interval = Some{cfg.query_interval}, > + .query_max_resp = Some{cfg.query_max_resp}) :- > + &McastSwitchCfg[cfg]. > + > + > +relation PortExists(name: string) > +PortExists(name) :- nb::Logical_Switch_Port(.name = name). > +PortExists(name) :- nb::Logical_Router_Port(.name = name). 
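`allocate_opt()`, used by `SwitchPortAllocatedTags`, appears to be the `Option`-returning variant of `allocate()`: every requested port shows up in the result, paired with `None` once the tag range is exhausted, which is what lets the relation record a port that got no tag. A hypothetical Python sketch of that contract:

```python
def allocate_opt(reserved, items, min_tag, max_tag):
    """Like allocate(), but every item appears in the result;
    once tags in [min_tag, max_tag] run out, items get None."""
    result = []
    tag = min_tag
    for item in items:
        while tag in reserved and tag <= max_tag:
            tag += 1
        if tag > max_tag:
            result.append((item, None))   # tag space exhausted
        else:
            result.append((item, tag))
            tag += 1
    return result

# Tag 1 is reserved by an existing port, so the new port gets tag 2.
print(allocate_opt({1}, ["lsp-uuid"], 1, 4095))   # [('lsp-uuid', 2)]
```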
> + > +sb::Out_Service_Monitor(._uuid = hash128((svc_monitor.port_name, lbvipbackend.ip, lbvipbackend.port, protocol)), > + .ip = "${lbvipbackend.ip}", > + .protocol = Some{protocol}, > + .port = lbvipbackend.port as integer, > + .logical_port = svc_monitor.port_name, > + .src_mac = to_string(svc_monitor_mac), > + .src_ip = svc_monitor.src_ip, > + .options = lbhc.options, > + .external_ids = map_empty()) :- > + SvcMonitorMac(svc_monitor_mac), > + LBVIPBackend[lbvipbackend], > + Some{var svc_monitor} = lbvipbackend.svc_monitor, > + LoadBalancerHealthCheckRef[lbhc], > + PortExists(svc_monitor.port_name), > + set_contains(lbvipbackend.lbvip.lb.health_check, lbhc._uuid), > + lbhc.vip == lbvipbackend.lbvip.vip_key, > + var protocol = default_protocol(lbvipbackend.lbvip.lb.protocol), > + protocol != "sctp". > + > +Warning["SCTP load balancers do not currently support " > + "health checks. Not creating health checks for " > + "load balancer ${uuid2str(lbvipbackend.lbvip.lb._uuid)}"] :- > + LBVIPBackend[lbvipbackend], > + default_protocol(lbvipbackend.lbvip.lb.protocol) == "sctp", > + Some{var svc_monitor} = lbvipbackend.svc_monitor, > + LoadBalancerHealthCheckRef[lbhc], > + set_contains(lbvipbackend.lbvip.lb.health_check, lbhc._uuid), > + lbhc.vip == lbvipbackend.lbvip.vip_key. > diff --git a/northd/ovsdb2ddlog2c b/northd/ovsdb2ddlog2c > new file mode 100755 > index 000000000000..c66ad81073e1 > --- /dev/null > +++ b/northd/ovsdb2ddlog2c > @@ -0,0 +1,127 @@ > +#!/usr/bin/env python3 > +# Copyright (c) 2020 Nicira, Inc. > +# > +# Licensed under the Apache License, Version 2.0 (the "License"); > +# you may not use this file except in compliance with the License. 
> +# You may obtain a copy of the License at: > +# > +# http://www.apache.org/licenses/LICENSE-2.0 > +# > +# Unless required by applicable law or agreed to in writing, software > +# distributed under the License is distributed on an "AS IS" BASIS, > +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > +# See the License for the specific language governing permissions and > +# limitations under the License. > + > +import getopt > +import sys > + > +import ovs.json > +import ovs.db.error > +import ovs.db.schema > + > +argv0 = sys.argv[0] > + > +def usage(): > + print("""\ > +%(argv0)s: ovsdb schema compiler for northd > +usage: %(argv0)s [OPTIONS] > + > +The following option must be specified: > + -p, --prefix=PREFIX Prefix for declarations in output. > + > +The following ovsdb2ddlog options are supported: > + -f, --schema-file=FILE OVSDB schema file. > + -o, --output-table=TABLE Mark TABLE as output. > + --output-only-table=TABLE Mark TABLE as output-only. DDlog will send updates to this table directly to OVSDB without comparing it with current OVSDB state. > + --ro=TABLE.COLUMN Ignored. > + --rw=TABLE.COLUMN Ignored. > + --output-file=FILE.inc Write output to FILE.inc. If this option is not specified, output will be written to stdout. 
> + > +The following options are also available: > + -h, --help display this help message > + -V, --version display version information\ > +""" % {'argv0': argv0}) > + sys.exit(0) > + > +if __name__ == "__main__": > + try: > + try: > + options, args = getopt.gnu_getopt(sys.argv[1:], 'p:f:o:hV', > + ['prefix=', > + 'schema-file=', > + 'output-table=', > + 'output-only-table=', > + 'ro=', > + 'rw=', > + 'output-file=']) > + except getopt.GetoptError as geo: > + sys.stderr.write("%s: %s\n" % (argv0, geo.msg)) > + sys.exit(1) > + > + prefix = None > + schema_file = None > + output_tables = set() > + output_only_tables = set() > + output_file = None > + for key, value in options: > + if key in ['-h', '--help']: > + usage() > + elif key in ['-V', '--version']: > + print("ovsdb2ddlog2c (OVN) @VERSION@") > + elif key in ['-p', '--prefix']: > + prefix = value > + elif key in ['-f', '--schema-file']: > + schema_file = value > + elif key in ['-o', '--output-table']: > + output_tables.add(value) > + elif key == '--output-only-table': > + output_only_tables.add(value) > + elif key in ['--ro', '--rw']: > + pass > + elif key == '--output-file': > + output_file = value > + else: > + sys.exit(0) > + > + if schema_file is None: > + sys.stderr.write("%s: missing -f or --schema-file option\n" % argv0) > + sys.exit(1) > + if prefix is None: > + sys.stderr.write("%s: missing -p or --prefix option\n" % argv0) > + sys.exit(1) > + if not output_tables.isdisjoint(output_only_tables): > + example = next(iter(output_tables.intersect(output_only_tables))) > + sys.stderr.write("%s: %s may not be both an output table and " > + "an output-only table\n" % (argv0, example)) > + sys.exit(1) > + > + schema = ovs.db.schema.DbSchema.from_json(ovs.json.from_file( > + schema_file)) > + > + all_tables = set(schema.tables.keys()) > + missing_tables = (output_tables | output_only_tables) - all_tables > + if missing_tables: > + sys.stderr.write("%s: %s is not the name of a table\n" > + % (argv0, 
next(iter(missing_tables)))) > + sys.exit(1) > + > + f = sys.stdout if output_file is None else open(output_file, "w") > + for name, tables in ( > + ("input_relations", all_tables - output_only_tables), > + ("output_relations", output_tables), > + ("output_only_relations", output_only_tables)): > + f.write("static const char *%s%s[] = {\n" % (prefix, name)) > + for table in sorted(tables): > + f.write(" \"%s\",\n" % table) > + f.write(" NULL,\n") > + f.write("};\n\n") > + if output_file is not None: > + f.close() > + except ovs.db.error.Error as e: > + sys.stderr.write("%s: %s\n" % (argv0, e)) > + sys.exit(1) > + > +# Local variables: > +# mode: python > +# End: > diff --git a/tests/atlocal.in b/tests/atlocal.in > index 4517ebf72fab..8a3907d65a20 100644 > --- a/tests/atlocal.in > +++ b/tests/atlocal.in > @@ -210,3 +210,10 @@ export OVS_CTL_TIMEOUT > # matter break everything. > ASAN_OPTIONS=detect_leaks=0:abort_on_error=true:log_path=asan:$ASAN_OPTIONS > export ASAN_OPTIONS > + > +# Check whether we should run ddlog tests. > +if test '@DDLOGLIBDIR@' != no; then > + TEST_DDLOG="yes" > +else > + TEST_DDLOG="no" > +fi > diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > index b4dc387e54a4..7e7015380758 100644 > --- a/tests/ovn-macros.at > +++ b/tests/ovn-macros.at > @@ -460,4 +460,7 @@ m4_define([OVN_FOR_EACH_NORTHD], [dnl > m4_pushdef([NORTHD_TYPE], [ovn-northd])dnl > $1 > m4_popdef([NORTHD_TYPE])dnl > +m4_pushdef([NORTHD_TYPE], [ovn-northd-ddlog])dnl > +$1 > +m4_popdef([NORTHD_TYPE])dnl > ]) > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index 972ff5c626a3..7d73b0b835a1 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -704,6 +704,103 @@ check_row_count Datapath_Binding 1 > AT_CLEANUP > ]) > > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([ovn -- ovn-northd restart]) > +ovn_start --no-backup-northd > + > +# Check that ovn-northd is active, by verifying that it creates and > +# destroys southbound datapaths as one would expect.
> +check_row_count Datapath_Binding 0 > +check ovn-nbctl --wait=sb ls-add sw0 > +check_row_count Datapath_Binding 1 > + > +# Kill northd. > +as northd > +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) > + > +# With ovn-northd gone, changes to nbdb won't be reflected into sbdb. > +# Make sure. > +check ovn-nbctl ls-add sw1 > +sleep 5 > +check_row_count Datapath_Binding 1 > + > +# Now resume ovn-northd. Changes should catch up. > +ovn_start_northd primary > +wait_row_count Datapath_Binding 2 > + > +AT_CLEANUP > +]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([ovn -- northbound database reconnection]) > +ovn_start --no-backup-northd > + > +# Check that ovn-northd is active, by verifying that it creates and > +# destroys southbound datapaths as one would expect. > +check_row_count Datapath_Binding 0 > +check ovn-nbctl --wait=sb ls-add sw0 > +check_row_count Datapath_Binding 1 > +lf=$(count_rows Logical_Flow) > + > +# Make nbdb ovsdb-server drop connection from ovn-northd. > +conn=$(as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/list-remotes) > +check as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/remove-remote "$conn" > +conn2=punix:`pwd`/special.sock > +check as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn2" > + > +# ovn-northd won't respond to changes (because the nbdb connection dropped). > +check ovn-nbctl --db="${conn2#p}" ls-add sw1 > +sleep 5 > +check_row_count Datapath_Binding 1 > +check_row_count Logical_Flow $lf > + > +# Now re-enable the nbdb connection and observe ovn-northd catch up. > +# > +# It's important to check both Datapath_Binding and Logical_Flow because > +# ovn-northd-ddlog implements them in different ways that might go wrong > +# differently on reconnection. 
> +check as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn" > +wait_row_count Datapath_Binding 2 > +wait_row_count Logical_Flow $(expr 2 \* $lf) > + > +AT_CLEANUP > +]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([ovn -- southbound database reconnection]) > +ovn_start --no-backup-northd > + > +# Check that ovn-northd is active, by verifying that it creates and > +# destroys southbound datapaths as one would expect. > +check_row_count Datapath_Binding 0 > +check ovn-nbctl --wait=sb ls-add sw0 > +check_row_count Datapath_Binding 1 > +lf=$(count_rows Logical_Flow) > + > +# Make sbdb ovsdb-server drop connection from ovn-northd. > +conn=$(as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/list-remotes) > +check as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/remove-remote "$conn" > +conn2=punix:`pwd`/special.sock > +check as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn2" > + > +# ovn-northd can't respond to changes (because the sbdb connection dropped). > +check ovn-nbctl ls-add sw1 > +sleep 5 > +OVN_SB_DB=${conn2#p} check_row_count Datapath_Binding 1 > +OVN_SB_DB=${conn2#p} check_row_count Logical_Flow $lf > + > +# Now re-enable the sbdb connection and observe ovn-northd catch up. > +# > +# It's important to check both Datapath_Binding and Logical_Flow because > +# ovn-northd-ddlog implements them in different ways that might go wrong > +# differently on reconnection. 
> +check as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn" > +wait_row_count Datapath_Binding 2 > +wait_row_count Logical_Flow $(expr 2 \* $lf) > + > +AT_CLEANUP > +]) > + > OVN_FOR_EACH_NORTHD([ > AT_SETUP([ovn -- check Redirect Chassis propagation from NB to SB]) > ovn_start > diff --git a/tests/ovn.at b/tests/ovn.at > index 3d2b7a7989a7..8274d2185b10 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -16820,6 +16820,10 @@ AT_CLEANUP > > OVN_FOR_EACH_NORTHD([ > AT_SETUP([ovn -- IGMP snoop/querier/relay]) > + > +dnl This test has problems with ovn-northd-ddlog. > +AT_SKIP_IF([test NORTHD_TYPE = ovn-northd-ddlog && test "$RUN_ANYWAY" != yes]) > + > ovn_start > > # Logical network: > @@ -17486,6 +17490,10 @@ AT_CLEANUP > > OVN_FOR_EACH_NORTHD([ > AT_SETUP([ovn -- MLD snoop/querier/relay]) > + > +dnl This test has problems with ovn-northd-ddlog. > +AT_SKIP_IF([test NORTHD_TYPE = ovn-northd-ddlog && test "$RUN_ANYWAY" != yes]) > + > ovn_start > > # Logical network: > @@ -20187,6 +20195,10 @@ AT_CLEANUP > > OVN_FOR_EACH_NORTHD([ > AT_SETUP([ovn -- interconnection]) > + > +dnl This test has problems with ovn-northd-ddlog. > +AT_SKIP_IF([test NORTHD_TYPE = ovn-northd-ddlog && test "$RUN_ANYWAY" != yes]) > + > ovn_init_ic_db > n_az=5 > n_ts=5 > diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at > index 8cdc0d640cc2..a1727f9d3fd8 100644 > --- a/tests/ovs-macros.at > +++ b/tests/ovs-macros.at > @@ -7,11 +7,14 @@ dnl Make AT_SETUP automatically do some things for us: > dnl - Run the ovs_init() shell function as the first step in every test. > dnl - If NORTHD_TYPE is defined, then append it to the test name and > dnl set it as a shell variable as well. > +dnl - Skip the test if it's for ovn-northd-ddlog but it didn't get built. 
> m4_rename([AT_SETUP], [OVS_AT_SETUP]) > m4_define([AT_SETUP], > [OVS_AT_SETUP($@[]m4_ifdef([NORTHD_TYPE], [ -- NORTHD_TYPE])) > m4_ifdef([NORTHD_TYPE], [[NORTHD_TYPE]=NORTHD_TYPE > -AT_SKIP_IF([test $NORTHD_TYPE = ovn-northd-ddlog && test $TEST_DDLOG = no]) > +])dnl > +m4_if(NORTHD_TYPE, [ovn-northd-ddlog], [dnl > +AT_SKIP_IF([test $TEST_DDLOG = no]) > ])dnl > ovs_init > ]) > diff --git a/tutorial/ovs-sandbox b/tutorial/ovs-sandbox > index 1841776a476d..676314b21151 100755 > --- a/tutorial/ovs-sandbox > +++ b/tutorial/ovs-sandbox > @@ -72,6 +72,7 @@ schema= > installed=false > built=false > ovn=true > +ddlog=false > ovnsb_schema= > ovnnb_schema= > ic_sb_schema= > @@ -143,6 +144,7 @@ General options: > -S, --schema=FILE use FILE as vswitch.ovsschema > > OVN options: > + --ddlog use ovn-northd-ddlog > --no-ovn-rbac disable role-based access control for OVN > --n-northds=NUMBER run NUMBER copies of northd (default: 1) > --n-ics=NUMBER run NUMBER copies of ic (default: 1) > @@ -234,6 +236,9 @@ EOF > --gdb-ovn-controller-vtep) > gdb_ovn_controller_vtep=true > ;; > + --ddlog) > + ddlog=true > + ;; > --no-ovn-rbac) > ovn_rbac=false > ;; > @@ -609,12 +614,23 @@ for i in $(seq $n_ics); do > --ovnsb-db="$OVN_SB_DB" --ovnnb-db="$OVN_NB_DB" \ > --ic-sb-db="$OVN_IC_SB_DB" --ic-nb-db="$OVN_IC_NB_DB" > done > + > +northd_args= > +if $ddlog; then > + OVN_NORTHD=ovn-northd-ddlog > +else > + OVN_NORTHD=ovn-northd > +fi > + > for i in $(seq $n_northds); do > if [ $i -eq 1 ]; then inst=""; else inst=$i; fi > - rungdb $gdb_ovn_northd $gdb_ovn_northd_ex ovn-northd --detach \ > - --no-chdir --pidfile=ovn-northd${inst}.pid -vconsole:off \ > - --log-file=ovn-northd${inst}.log -vsyslog:off \ > - --ovnsb-db="$OVN_SB_DB" --ovnnb-db="$OVN_NB_DB" > + if $ddlog; then > + northd_args=--ddlog-record=replay$inst.txt > + fi > + rungdb $gdb_ovn_northd $gdb_ovn_northd_ex $OVN_NORTHD --detach \ > + --no-chdir --pidfile=$OVN_NORTHD$inst.pid -vconsole:off \ > + --log-file=$OVN_NORTHD$inst.log 
-vsyslog:off \ > + --ovnsb-db="$OVN_SB_DB" --ovnnb-db="$OVN_NB_DB" $northd_args > done > for i in $(seq $n_controllers); do > if [ $i -eq 1 ]; then inst=""; else inst=$i; fi > diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py > index 981a433be9cc..fa2a382f1d14 100755 > --- a/utilities/checkpatch.py > +++ b/utilities/checkpatch.py > @@ -184,7 +184,7 @@ skip_signoff_check = False > # > # Python isn't checked as flake8 performs these checks during build. > line_length_blacklist = re.compile( > - r'\.(am|at|etc|in|m4|mk|patch|py)$|debian/rules') > + r'\.(am|at|etc|in|m4|mk|patch|py|dl)$|debian/rules') > > # Don't enforce a requirement that leading whitespace be all spaces on > # files that include these characters in their name, since these kinds > diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl > index c44201ccfb3e..92f03815fa57 100755 > --- a/utilities/ovn-ctl > +++ b/utilities/ovn-ctl > @@ -458,10 +458,10 @@ start_northd () { > ovn_northd_params="`cat $ovn_northd_db_conf_file`" > fi > > - if daemon_is_running ovn-northd; then > - log_success_msg "ovn-northd is already running" > + if daemon_is_running $OVN_NORTHD_BIN; then > + log_success_msg "$OVN_NORTHD_BIN is already running" > else > - set ovn-northd > + set $OVN_NORTHD_BIN > if test X"$OVN_NORTHD_LOGFILE" != X; then > set "$@" --log-file=$OVN_NORTHD_LOGFILE > fi > @@ -571,7 +571,7 @@ start_controller_vtep () { > ## ---- ## > > stop_northd () { > - OVS_RUNDIR=${OVS_RUNDIR} stop_ovn_daemon ovn-northd > + OVS_RUNDIR=${OVS_RUNDIR} stop_ovn_daemon $OVN_NORTHD_BIN > > if [ !
-e $ovn_northd_db_conf_file ]; then > if test X"$OVN_MANAGE_OVSDB" = Xyes; then > @@ -714,6 +714,7 @@ set_defaults () { > OVN_CONTROLLER_WRAPPER= > OVSDB_NB_WRAPPER= > OVSDB_SB_WRAPPER= > + OVN_NORTHD_DDLOG=no > > OVN_USER= > > @@ -932,6 +933,8 @@ Options: > --ovs-user="user[:group]" pass the --user flag to ovs daemons > --ovsdb-nb-wrapper=WRAPPER run with a wrapper like valgrind for debugging > --ovsdb-sb-wrapper=WRAPPER run with a wrapper like valgrind for debugging > + --ovn-northd-ddlog=yes|no whether we should run the DDlog version > + of ovn-northd. The default is "no". > -h, --help display this help message > > File location options: > @@ -1087,6 +1090,13 @@ do > ;; > esac > done > + > +if test X"$OVN_NORTHD_DDLOG" = Xyes; then > + OVN_NORTHD_BIN=ovn-northd-ddlog > +else > + OVN_NORTHD_BIN=ovn-northd > +fi > + > case $command in > start_northd) > start_northd > @@ -1179,7 +1189,7 @@ case $command in > restart_ic_sb_ovsdb > ;; > status_northd) > - daemon_status ovn-northd || exit 1 > + daemon_status $OVN_NORTHD_BIN || exit 1 > ;; > status_ovsdb) > status_ovsdb > -- > 2.26.2 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
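For context, the table-classification logic at the heart of the ovsdb2ddlog2c helper in this patch reduces to a few Python set operations. The sketch below is illustrative only: the table names are invented, and the "nb_" prefix stands in for whatever would be passed via -p/--prefix (the real script reads the table list from the parsed OVSDB schema and its command-line options):

```python
# Sketch of ovsdb2ddlog2c's table classification, with invented table names.
# The real script derives all_tables from the parsed OVSDB schema and the
# output sets from its -o/--output-only-table options.
all_tables = {"Logical_Switch", "Logical_Router", "Logical_Flow", "Meter"}
output_tables = {"Logical_Flow"}
output_only_tables = {"Meter"}

# A table may not be both a regular output table and an output-only table.
assert output_tables.isdisjoint(output_only_tables)

# Every table named on the command line must exist in the schema.
assert (output_tables | output_only_tables) <= all_tables

# Input relations are every table except the output-only ones; each group
# is emitted as a NULL-terminated array of C string literals.
for name, tables in (
        ("input_relations", all_tables - output_only_tables),
        ("output_relations", output_tables),
        ("output_only_relations", output_only_tables)):
    print("static const char *nb_%s[] = {" % name)
    for table in sorted(tables):
        print('    "%s",' % table)
    print("    NULL,")
    print("};")
```

Running this prints three NULL-terminated C string arrays, the same shape of output that the generated .inc files have.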
Hm.. Could it be that you inadvertently removed this change in V5 ? diff --git a/northd/automake.mk b/northd/automake.mk index 2717f59c5f3a..157b5d0df487 100644 --- a/northd/automake.mk +++ b/northd/automake.mk @@ -22,8 +22,7 @@ bin_PROGRAMS += northd/ovn-northd-ddlog northd_ovn_northd_ddlog_SOURCES = \ northd/ovn-northd-ddlog.c \ northd/ovn-northd-ddlog-sb.inc \ - northd/ovn-northd-ddlog-nb.inc \ - northd/ovn_northd_ddlog/ddlog.h + northd/ovn-northd-ddlog-nb.inc northd_ovn_northd_ddlog_LDADD = \ northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.la \ lib/libovn.la \ @@ -46,6 +45,7 @@ BUILT_SOURCES += \ northd/ovn-northd-ddlog-sb.inc \ northd/ovn-northd-ddlog-nb.inc +northd/ovn-northd-ddlog.$(OBJEXT): northd/ovn_northd_ddlog/ddlog.h northd/ovn_northd_ddlog/ddlog.h: northd/ddlog.stamp CARGO_VERBOSE = $(cargo_verbose_$(V)) -- flaviof > On Nov 11, 2020, at 8:45 PM, Ben Pfaff <blp@ovn.org> wrote: > > From: Leonid Ryzhyk <lryzhyk@vmware.com> > > This implementation is incremental, meaning that it only recalculates > what is needed for the southbound database when northbound changes > occur. It is expected to scale better than the C implementation, > for large deployments. (This may take testing and tuning to be > effective.) > > There are three tests that I'm having mysterious trouble getting > to work with DDlog. For now, I've marked the testsuite to skip > them unless RUN_ANYWAY=yes is set in the environment. > > Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com> > Co-authored-by: Justin Pettit <jpettit@ovn.org> > Signed-off-by: Justin Pettit <jpettit@ovn.org> > Co-authored-by: Ben Pfaff <blp@ovn.org> > Signed-off-by: Ben Pfaff <blp@ovn.org> > ---
Can you explain further? I guess you must be saying that there is a dependency problem in v5, but I don't see the issue yet. On Thu, Nov 12, 2020 at 05:35:48PM -0500, Flavio Fernandes wrote: > Hm.. > > Could it be that you inadvertently removed this change in V5 ? > > > diff --git a/northd/automake.mk b/northd/automake.mk > index 2717f59c5f3a..157b5d0df487 100644 > --- a/northd/automake.mk > +++ b/northd/automake.mk > @@ -22,8 +22,7 @@ bin_PROGRAMS += northd/ovn-northd-ddlog > northd_ovn_northd_ddlog_SOURCES = \ > northd/ovn-northd-ddlog.c \ > northd/ovn-northd-ddlog-sb.inc \ > - northd/ovn-northd-ddlog-nb.inc \ > - northd/ovn_northd_ddlog/ddlog.h > + northd/ovn-northd-ddlog-nb.inc > northd_ovn_northd_ddlog_LDADD = \ > northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.la \ > lib/libovn.la \ > @@ -46,6 +45,7 @@ BUILT_SOURCES += \ > northd/ovn-northd-ddlog-sb.inc \ > northd/ovn-northd-ddlog-nb.inc > > +northd/ovn-northd-ddlog.$(OBJEXT): northd/ovn_northd_ddlog/ddlog.h > northd/ovn_northd_ddlog/ddlog.h: northd/ddlog.stamp > > CARGO_VERBOSE = $(cargo_verbose_$(V)) > > > -- flaviof > > > > > On Nov 11, 2020, at 8:45 PM, Ben Pfaff <blp@ovn.org> wrote: > > > > From: Leonid Ryzhyk <lryzhyk@vmware.com> > > > > This implementation is incremental, meaning that it only recalculates > > what is needed for the southbound database when northbound changes > > occur. It is expected to scale better than the C implementation, > > for large deployments. (This may take testing and tuning to be > > effective.) > > > > There are three tests that I'm having mysterious trouble getting > > to work with DDlog. For now, I've marked the testsuite to skip > > them unless RUN_ANYWAY=yes is set in the environment.
> > > > Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com> > > Co-authored-by: Justin Pettit <jpettit@ovn.org> > > Signed-off-by: Justin Pettit <jpettit@ovn.org> > > Co-authored-by: Ben Pfaff <blp@ovn.org> > > Signed-off-by: Ben Pfaff <blp@ovn.org> > > --- >
On Thu, Nov 12, 2020 at 05:15:45PM -0500, Flavio Fernandes wrote: > Sorry for making more work for you but.... > Could we also do something for the "make sandbox" target, where > we could have the ovn_start function optionally use ovn-northd-ddlog ? > Something like: > > make sandbox --ddlog That is there already, use: make sandbox SANDBOXFLAGS=--ddlog
diff --git a/Documentation/automake.mk b/Documentation/automake.mk index e0f39b33fdf4..b3fd3d62b33b 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -20,12 +20,14 @@ DOC_SOURCE = \ Documentation/tutorials/ovn-ipsec.rst \ Documentation/tutorials/ovn-rbac.rst \ Documentation/tutorials/ovn-interconnection.rst \ + Documentation/tutorials/ddlog-new-feature.rst \ Documentation/topics/index.rst \ Documentation/topics/testing.rst \ Documentation/topics/high-availability.rst \ Documentation/topics/integration.rst \ Documentation/topics/ovn-news-2.8.rst \ Documentation/topics/role-based-access-control.rst \ + Documentation/topics/debugging-ddlog.rst \ Documentation/howto/index.rst \ Documentation/howto/docker.rst \ Documentation/howto/firewalld.rst \ diff --git a/Documentation/intro/install/general.rst b/Documentation/intro/install/general.rst index 65b1f4a40e8a..e748ab430eae 100644 --- a/Documentation/intro/install/general.rst +++ b/Documentation/intro/install/general.rst @@ -89,6 +89,13 @@ need the following software: The environment variable OVS_RESOLV_CONF can be used to specify DNS server configuration file (the default file on Linux is /etc/resolv.conf). +- `DDlog <https://github.com/vmware/differential-datalog>`_, if you + want to build ``ovn-northd-ddlog``, an alternate implementation of + ``ovn-northd`` that scales better to large deployments. The NEWS + file specifies the right version of DDlog to use with this release. + Building with DDlog support requires Rust to be installed (see + https://www.rust-lang.org/tools/install). + If you are working from a Git tree or snapshot (instead of from a distribution tarball), or if you modify the OVN build system or the database schema, you will also need the following software: @@ -176,6 +183,14 @@ the default database directory, add options as shown here:: ``yum install`` or ``rpm -ivh``) and .deb (e.g. via ``apt-get install`` or ``dpkg -i``) use the above configure options.
+To build with DDlog support, add ``--with-ddlog=<path to ddlog>/lib`` +to the ``configure`` command line. Building with DDlog adds a few +minutes to the build because the Rust compiler is slow. To speed this +up by about 2x, also add ``--enable-ddlog-fast-build``. This disables +some Rust compiler optimizations, making a much slower +``ovn-northd-ddlog`` executable, so it should not be used for +production builds or for profiling. + By default, static libraries are built and linked against. If you want to use shared libraries instead:: @@ -353,6 +368,14 @@ An example after install might be:: $ ovn-ctl start_northd $ ovn-ctl start_controller +If you built with DDlog support, then you can start +``ovn-northd-ddlog`` instead of ``ovn-northd`` by adding +``--ovn-northd-ddlog=yes``, e.g.:: + + $ export PATH=$PATH:/usr/local/share/ovn/scripts + $ ovn-ctl --ovn-northd-ddlog=yes start_northd + $ ovn-ctl start_controller + Starting OVN Central services ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -403,11 +426,15 @@ it at any time is harmless:: $ ovn-nbctl --no-wait init $ ovn-sbctl --no-wait init -Start the ovn-northd, telling it to connect to the OVN db servers same Unix -domain socket:: +Start ``ovn-northd``, telling it to connect to the OVN db servers same +Unix domain socket:: $ ovn-northd --pidfile --detach --log-file +If you built with DDlog support, you can start ``ovn-northd-ddlog`` +instead, the same way:: + + $ ovn-northd-ddlog --pidfile --detach --log-file Starting OVN Central services in containers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/topics/debugging-ddlog.rst b/Documentation/topics/debugging-ddlog.rst new file mode 100644 index 000000000000..046419b995f1 --- /dev/null +++ b/Documentation/topics/debugging-ddlog.rst @@ -0,0 +1,280 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License.
You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in OVN documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +========================================= +Debugging the DDlog version of ovn-northd +========================================= + +This document gives some tips for debugging correctness issues in the +DDlog implementation of ``ovn-northd``. To keep things concrete, we +assume here that a failure occurred in one of the test cases in +``ovn-e2e.at``, but the same methodology applies in any other +environment. If none of these methods helps, ask for assistance or +submit a bug report. + +Before trying these methods, you may want to check the northd log +file, ``tests/testsuite.dir/<test_number>/northd/ovn-northd.log``, for +error messages that might explain the failure. + +Compare OVSDB tables generated by DDlog vs C +-------------------------------------------- + +The first thing I typically want to check when ``ovn-northd-ddlog`` +does not behave as expected is how the OVSDB tables computed by DDlog +differ from what the C implementation produces. Fortunately, all the +infrastructure needed to do this already exists in OVN. + +First, let's modify the test script, e.g., ``ovn.at``, to dump the +contents of OVSDB right before the failure. The most common issue is +a difference between the logical flows generated by the two +implementations.
To make it easy to compare the generated flows, make +sure that the test contains something like this in the right place:: + + ovn-sbctl dump-flows > sbflows + AT_CAPTURE_FILE([sbflows]) + +The first line above dumps the OVN logical flow table to a file named +``sbflows``. The second line ensures that, if the test fails, +``sbflows`` gets logged to ``testsuite.log``. That is not particularly +useful for us right now, but it means that if someone later submits a +bug report, that's one more piece of data that we don't have to ask +them to submit along with it. + +Next, we want to run the test twice, with the C and DDlog versions of +northd, e.g., ``make check -j6 TESTSUITEFLAGS="-d 111 112"`` if 111 +and 112 are the C and DDlog versions of the same test. The ``-d`` in +this command line makes the test driver keep test directories around +even for tests that succeed, since by default it deletes them. + +Now you can look at ``sbflows`` in each test log directory. The +``ovn-northd-ddlog`` developers have gone to some trouble to make the +DDlog flows as similar as possible to the C ones, right down to white +space and other formatting. Thus, the DDlog output is often identical +to C aside from logical datapath UUIDs. + +Usually, this means that one can get informative results by running +``diff``, e.g.:: + + diff -u tests/testsuite.dir/111/sbflows tests/testsuite.dir/112/sbflows + +Running the input through the ``uuidfilt`` utility from OVS will +generally get rid of the logical datapath UUID differences as well:: + + diff -u <(uuidfilt tests/testsuite.dir/111/sbflows) <(uuidfilt tests/testsuite.dir/112/sbflows) + +If there are nontrivial differences, this often identifies your bug. + +Often, once you have identified the difference between the two OVSDB +dumps, this will immediately lead you to the root cause of the bug, +but if you are not this lucky then the next method may help.
+ +Record and replay DDlog execution +--------------------------------- + +DDlog offers a way to record all input table updates throughout the +execution of northd and replay them against DDlog running as a +standalone executable without all other OVN components. This has two +advantages. First, this allows one to easily tweak the inputs, e.g. +to simplify the test scenario. Second, the recorded execution can be +easily replayed anywhere without having to reproduce your OVN setup. + +Use the ``--ddlog-record`` option to record updates, +e.g. ``--ddlog-record=replay.dat`` to record to ``replay.dat``. +(OVN's built-in tests automatically do this.) The file contains the +log of transactions in the DDlog command format (see +https://github.com/vmware/differential-datalog/blob/master/doc/command_reference/command_reference.md). + +To replay the log, you will need the standalone DDlog executable. By +default, the build system does not compile this program, because it +increases the already long Rust compilation time. To build it, add +``NORTHD_CLI=1`` to the ``make`` command line, e.g. ``make +NORTHD_CLI=1``. + +You can modify the log before replaying it, e.g., adding ``dump +<table>`` commands to dump the contents of relations at various points +during execution. The <table> name must be fully qualified based on +the file in which it is declared, e.g. ``OVN_Southbound::<table>`` for +southbound tables or ``lrouter::<table>.`` for ``lrouter.dl``. You +can also use ``dump`` without an argument to dump the contents of all +tables. 
+ +The following command replays the log generated by OVN test number +112 and dumps the output of DDlog to ``replay.dump``:: + + ovn/northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/112/northd/replay.dat > replay.dump + +Or, to dump table contents following the run, without having to edit +``replay.dat``:: + + (cat tests/testsuite.dir/112/northd/replay.dat; echo 'dump;') | ovn/northd/ovn_northd_ddlog/target/release/ovn_northd_cli --no-init-snapshot > replay.dump + +Depending on whether and how you installed OVS and OVN, you might need +to point ``LD_LIBRARY_PATH`` to library build directories to get the +CLI to run, e.g.:: + + export LD_LIBRARY_PATH=$HOME/ovn/_build/lib/.libs:$HOME/ovs/_build/lib/.libs + +.. note:: + + The replay output may be less informative than you expect because + DDlog does not, by default, keep around enough information to + include input relations and intermediate relations in the output. + These relations are often critical to understanding what is going + on. To include them, add the options + ``--output-internal-relations --output-input-relations=In_`` to + ``DDLOG_EXTRA_FLAGS`` for building ``ovn-northd-ddlog``. For + example, ``configure`` as:: + + ./configure DDLOG_EXTRA_FLAGS='--output-internal-relations --output-input-relations=In_' + +Debugging by Logging +-------------------- + +One limitation of the previous method is that it allows one to inspect +inputs and outputs of a rule, but not the (sometimes fairly +complicated) computation that goes on inside the rule. You can of +course break up the rule into several rules and dump the intermediate +outputs. + +There are at least two alternatives for generating log messages. +First, you can write rules to add strings to the Warning relation +declared in ``ovn_northd.dl``. Code in ``ovn-northd-ddlog.c`` will log +any given string in this relation just once, when it is first added to +the relation.
(If it is removed from the relation and then added back +later, it will be logged again.) + +Second, you can call the ``warn()`` function declared in +``ovn.dl`` from a DDlog rule. It's not straightforward to know +exactly when this function will be called, like it would be in an +imperative language like C, since DDlog is a declarative language +where the user doesn't directly control when rules are triggered. You +might, for example, see the rule being triggered multiple times with +the same input. Nevertheless, this debugging technique is useful in +practice. + +You will find many examples of the use of Warning and ``warn`` in +``ovn_northd.dl``, where it is frequently used to report non-critical +errors. + +Debugging panics +---------------- + +**TODO**: update these instructions as DDlog's internal handling of panics +is improved. + +DDlog is a safe language, so DDlog programs normally do not crash, +except for the following three cases: + +- A panic in a Rust function imported to DDlog as ``extern function``. + +- A panic in a C function imported to DDlog as ``extern function``. + +- A bug in the DDlog runtime or libraries. + +Below we walk through the steps involved in debugging such failures. +In this scenario, there is an array-index-out-of-bounds error in the +``ovn_scan_static_dynamic_ip6()`` function, which is written in Rust +and imported to DDlog as an ``extern function``. When invoked from a +DDlog rule, this function causes a panic in one of the DDlog worker +threads. + +**Step 1: Check for error messages in the northd log.** A panic can +generally lead to unpredictable outcomes, so one cannot count on a +clean error message showing up in the log. (Other outcomes include +crashing the entire process and even deadlocks; we are working to +eliminate the latter possibility.)
In this case we are lucky to +observe a bunch of error messages like the following in the ``northd`` +log: + + ``2019-09-23T16:23:24.549Z|00011|ovn_northd|ERR|ddlog_transaction_commit(): + error: failed to receive flush ack message from timely dataflow + thread`` + +These messages are telling us that something is broken inside the +DDlog runtime. + +**Step 2: Record and replay the failing scenario.** We use DDlog's +record/replay capabilities (see above) to capture the faulty scenario. +We replay the recorded trace:: + + northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat + +This generates a bunch of output ending with:: + + thread 'worker thread 2' panicked at 'index out of bounds: the len is 1 but the index is 1', /rustc/eae3437dfe991621e8afdc82734f4a172d7ddf9b/src/libcore/slice/mod.rs:2681:10 + note: run with RUST_BACKTRACE=1 environment variable to display a backtrace. + +We re-run the CLI with backtrace enabled (as suggested by the +error message):: + + RUST_BACKTRACE=1 northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat + +This finally yields the following stack trace, which suggests an array +bounds violation in ``ovn_scan_static_dynamic_ip6``:: + + 0: backtrace::backtrace::libunwind::trace + at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29 10: core::panicking::panic_bounds_check + at src/libcore/panicking.rs:61 + [SKIPPED] + 11: ovn_northd_ddlog::__ovn::ovn_scan_static_dynamic_ip6 + 12: ovn_northd_ddlog::prog::__f + [SKIPPED] + +Finally, looking at the source code of +``ovn_scan_static_dynamic_ip6``, we identify the following line, +containing an unsafe array indexing operator, as the culprit:: + + ovn_ipv6_parse(&f[1].to_string()) + +Clean build +~~~~~~~~~~~ + +Occasionally it's desirable to do a full and complete build of the +DDlog-generated code.
To trigger that, delete the generated +``ovn_northd_ddlog`` directory and the ``ddlog.stamp`` witness file, +like this:: + + rm -rf northd/ovn_northd_ddlog northd/ddlog.stamp + +or:: + + make clean-ddlog + +Submitting a bug report +----------------------- + +If you are having trouble with DDlog and the above methods do not +help, please submit a bug report to ``bugs@openvswitch.org``, CC +``ryzhyk@gmail.com``. + +In addition to problem description, please provide as many of the +following as possible: + +- Are you running with the right DDlog for the version of OVN? OVN + and DDlog are both evolving and OVN needs to build against a + specific version of DDlog. + +- ``replay.dat`` file generated as described above + +- Logs: ``ovn-northd.log`` and ``testsuite.log``, if you are running + the OVN test suite diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst index 3b689cf53eae..d58d5618b2db 100644 --- a/Documentation/topics/index.rst +++ b/Documentation/topics/index.rst @@ -36,6 +36,7 @@ OVN .. toctree:: :maxdepth: 2 + debugging-ddlog integration.rst high-availability role-based-access-control diff --git a/Documentation/tutorials/ddlog-new-feature.rst b/Documentation/tutorials/ddlog-new-feature.rst new file mode 100644 index 000000000000..02876db66d74 --- /dev/null +++ b/Documentation/tutorials/ddlog-new-feature.rst @@ -0,0 +1,362 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. 
+
+      Convention for heading levels in OVN documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===========================================================
+Adding a new OVN feature to the DDlog version of ovn-northd
+===========================================================
+
+This document describes the usual steps an OVN developer should go
+through when adding a new feature to ``ovn-northd-ddlog``. In order to
+make things less abstract, we will use the IP Multicast
+``ovn-northd-ddlog`` implementation as an example. Even though the
+document is structured as a tutorial, there may still be
+feature-specific aspects that are not covered here.
+
+Overview
+--------
+
+DDlog is a dataflow system: it receives data from a data source (a set
+of "input relations"), processes it through "intermediate relations"
+according to the rules specified in the DDlog program, and sends the
+processed "output relations" to a data sink. In OVN, the input
+relations primarily come from the OVN Northbound database and the
+output relations primarily go to the OVN Southbound database. The
+process looks like this::
+
+    from NBDB  +----------+   +-----------------+   +-----------+  to SBDB
+    ---------->|Input rels|-->|Intermediate rels|-->|Output rels|---------->
+               +----------+   +-----------------+   +-----------+
+
+Adding a new feature to ``ovn-northd-ddlog`` usually involves the
+following steps:
+
+1. Update northbound and/or southbound OVSDB schemas.
+
+2. Configure DDlog/OVSDB bindings.
+
+3. Define intermediate DDlog relations and rules to compute them.
+
+4. Write rules to update output relations.
+
+5. Generate ``Logical_Flow``s and/or other forwarding records (e.g.,
+   ``Multicast_Group``) that will control the dataplane operations.
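As a toy illustration of this pipeline, the following Rust sketch (hypothetical code, not part of OVN or DDlog) models the IP Multicast example used throughout this tutorial: an NB ``Logical_Switch`` row flows into an intermediate per-switch multicast config record, which is what the output rules later turn into SB ``IP_Multicast`` rows. The ``mcast_snoop``/``mcast_querier`` defaults follow the tutorial text.

```rust
use std::collections::HashMap;

// Intermediate record derived from an NB Logical_Switch row
// (roughly what &McastSwitchCfg holds in the DDlog program).
#[derive(Debug)]
struct McastCfg {
    datapath: String,
    enabled: bool,
    querier: bool,
}

// One "rule": map input rows to intermediate rows. The defaults
// (mcast_snoop=false, mcast_querier=true) follow the tutorial text.
fn mcast_switch_cfg(switches: &[(String, HashMap<String, String>)]) -> Vec<McastCfg> {
    switches
        .iter()
        .map(|(uuid, other_config)| McastCfg {
            datapath: uuid.clone(),
            enabled: other_config.get("mcast_snoop").map_or(false, |v| v == "true"),
            querier: other_config.get("mcast_querier").map_or(true, |v| v != "false"),
        })
        .collect()
}

fn main() {
    let mut oc = HashMap::new();
    oc.insert("mcast_snoop".to_string(), "true".to_string());
    let cfgs = mcast_switch_cfg(&[("ls1".to_string(), oc)]);
    // This record is what the output rules would send to the SB database.
    assert!(cfgs[0].enabled && cfgs[0].querier);
}
```

The real system differs in one crucial way: DDlog evaluates such rules incrementally, recomputing only the rows affected by an input change rather than re-running the whole map.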
+
+Update NB and/or SB OVSDB schemas
+---------------------------------
+
+This step is no different from the normal development flow in C.
+
+Most of the time a developer chooses between two ways of configuring
+a new feature:
+
+1. Adding a set of columns to tables in the NB and/or SB database (or
+   adding key-value pairs to existing columns).
+
+2. Adding new tables to the NB and/or SB database.
+
+Looking at IP Multicast, there are two ``OVN Northbound`` tables where
+configuration information is stored:
+
+- ``Logical_Switch``, column ``other_config``, keys ``mcast_*``.
+
+- ``Logical_Router``, column ``options``, keys ``mcast_*``.
+
+These tables become inputs to the DDlog pipeline.
+
+In addition, we add a new table ``IP_Multicast`` to the SB database.
+DDlog will update this table; that is, ``IP_Multicast`` receives
+output from the above pipeline.
+
+Configuring DDlog/OVSDB bindings
+--------------------------------
+
+Configuring ``northd/automake.mk``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The OVN build process uses DDlog's ``ovsdb2ddlog`` utility to parse
+``ovn-nb.ovsschema`` and ``ovn-sb.ovsschema`` and then automatically
+populate ``OVN_Northbound.dl`` and ``OVN_Southbound.dl``. For each
+OVN Northbound and Southbound table, it generates one or more
+corresponding DDlog relations.
+
+We need to supply ``ovsdb2ddlog`` with some information that it can't
+infer from the OVSDB schemas. This information must be specified as
+``ovsdb2ddlog`` arguments, which are read from
+``northd/ovn-nb.dlopts`` and ``northd/ovn-sb.dlopts``.
+
+The main choice for each new table is whether it is used for output.
+Output tables can also be used for input, but the converse is not
+true. If the table is used for output at all, we add ``-o <table>``
+to the option file. Our new table ``IP_Multicast`` is an output
+table, so we add ``-o IP_Multicast`` to ``ovn-sb.dlopts``.
+
+For input-only tables, ``ovsdb2ddlog`` generates a DDlog input
+relation with the same name.
For output tables, it generates this
+input relation plus an output relation named ``Out_<table>``. Thus,
+``OVN_Southbound.dl`` has two relations for ``IP_Multicast``::
+
+    input relation IP_Multicast (
+        _uuid: uuid,
+        datapath: string,
+        enabled: Set<bool>,
+        querier: Set<bool>
+    )
+    output relation Out_IP_Multicast (
+        _uuid: uuid,
+        datapath: string,
+        enabled: Set<bool>,
+        querier: Set<bool>
+    )
+
+For an output table, consider whether only some of the columns are
+used for output, that is, whether some of the columns are effectively
+input-only. This is common in OVN for OVSDB columns that are managed
+externally (e.g. by a CMS). For each input-only column, we add ``--ro
+<table>.<column>``. Alternatively, if most of the columns are
+input-only but a few are output columns, add ``--rw <table>.<column>``
+for each of the output columns. In our case, all of the columns are
+used for output, so we do not need to add anything.
+
+Finally, in some cases ``ovn-northd-ddlog`` shouldn't change the
+values in a particular column. One such case is the ``seq_no`` column
+in the ``IP_Multicast`` table. To arrange that, we instruct
+``ovsdb2ddlog`` to treat the column as read-only by using the
+``--ro`` switch.
+
+``ovsdb2ddlog`` generates a number of additional DDlog relations, for
+use by auto-generated OVSDB adapter logic. These are irrelevant to
+most DDlog developers, although sometimes they can be handy for
+debugging. See the appendix_ for details.
+
+Define intermediate DDlog relations and rules to compute them
+--------------------------------------------------------------
+
+Obviously there will be a one-to-one relationship between logical
+switches/routers and IP multicast configuration. One way to represent
+this relationship is to create multicast configuration DDlog relations
+to be referenced by ``&Switch`` and ``&Router`` DDlog records::
+
+    /* IP Multicast per switch configuration.
*/
+    relation &McastSwitchCfg(
+        datapath : uuid,
+        enabled  : bool,
+        querier  : bool
+    )
+
+    &McastSwitchCfg(
+        .datapath = ls_uuid,
+        .enabled  = map_get_bool_def(other_config, "mcast_snoop", false),
+        .querier  = map_get_bool_def(other_config, "mcast_querier", true)) :-
+        nb.Logical_Switch(._uuid = ls_uuid,
+                          .other_config = other_config).
+
+Then reference these relations in ``&Switch`` and ``&Router``. For
+example, in ``lswitch.dl``, the ``&Switch`` relation definition now
+contains::
+
+    relation &Switch(
+        ls: nb.Logical_Switch,
+        [...]
+        mcast_cfg: Ref<McastSwitchCfg>
+    )
+
+It is populated by the following rule, which references the correct
+``McastSwitchCfg`` based on the logical switch uuid::
+
+    &Switch(.ls = ls,
+            [...]
+            .mcast_cfg = mcast_cfg) :-
+        nb.Logical_Switch[ls],
+        [...]
+        mcast_cfg in &McastSwitchCfg(.datapath = ls._uuid).
+
+Build state based on information dynamically updated by ``ovn-controller``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some OVN features rely on information learned by ``ovn-controller`` to
+generate ``Logical_Flow`` or other records that control the dataplane.
+In the case of IP Multicast, ``ovn-controller`` uses IGMP to learn
+multicast groups that are joined by hosts.
+
+Each ``ovn-controller`` maintains its own set of records to avoid
+ownership and concurrency issues with other controllers. If two hosts
+that are connected to the same logical switch but reside on different
+hypervisors (different ``ovn-controller`` processes) join the same
+multicast group G, each of the controllers will create an
+``IGMP_Group`` record in the ``OVN Southbound`` database which will
+contain a set of ports to which the interested hosts are connected.
+
+At this point ``ovn-northd-ddlog`` needs to aggregate the per-chassis
+IGMP records to generate a single ``Logical_Flow`` for group G.
+Moreover, the ports on which the hosts are connected are represented
+as references to ``Port_Binding`` records in the database. These also
+need to be translated to ``&SwitchPort`` DDlog relations. The
+corresponding DDlog operations that need to be performed are:
+
+- Flatten the ``<IGMP group, ports>`` mapping in order to be able to
+  do the translation from ``Port_Binding`` to ``&SwitchPort``. For
+  each ``IGMP_Group`` record in the ``OVN Southbound`` database,
+  generate an individual record of type ``IgmpSwitchGroupPort`` for
+  each ``Port_Binding`` in the set of ports that joined the
+  group. Also, translate the ``Port_Binding`` uuid to the
+  corresponding ``Logical_Switch_Port`` uuid::
+
+    relation IgmpSwitchGroupPort(
+        address: string,
+        switch : Ref<Switch>,
+        port   : uuid
+    )
+
+    IgmpSwitchGroupPort(address, switch, lsp_uuid) :-
+        sb::IGMP_Group(.address = address, .datapath = igmp_dp_set,
+                       .ports = pb_ports),
+        var pb_port_uuid = FlatMap(pb_ports),
+        sb::Port_Binding(._uuid = pb_port_uuid, .logical_port = lsp_name),
+        &SwitchPort(
+            .lsp = nb.Logical_Switch_Port{._uuid = lsp_uuid, .name = lsp_name},
+            .sw = switch).
+
+- Aggregate the flattened ``IgmpSwitchGroupPort`` records (implicitly
+  from all ``ovn-controller`` instances), grouping by address and
+  logical switch::
+
+    relation IgmpSwitchMulticastGroup(
+        address: string,
+        switch : Ref<Switch>,
+        ports  : Set<uuid>
+    )
+
+    IgmpSwitchMulticastGroup(address, switch, ports) :-
+        IgmpSwitchGroupPort(address, switch, port),
+        var ports = port.group_by((address, switch)).to_set().
+
+At this point we have all the configuration information relevant to
+the feature stored in DDlog relations in ``ovn-northd-ddlog`` memory.
+
+Write rules to update output relations
+--------------------------------------
+
+The developer updates output tables by writing rules that generate
+``Out_*`` relations. For IP Multicast this means::
+
+    /* IP_Multicast table (only applicable for Switches).
*/
+    sb::Out_IP_Multicast(._uuid = hash128(cfg.datapath),
+                         .datapath = cfg.datapath,
+                         .enabled = set_singleton(cfg.enabled),
+                         .querier = set_singleton(cfg.querier)) :-
+        &McastSwitchCfg[cfg].
+
+.. note:: ``OVN_Southbound.dl`` also contains an ``IP_Multicast``
+   relation with the ``input`` qualifier. This relation stores the
+   current snapshot of the OVSDB table and cannot be written to.
+
+Generate ``Logical_Flow`` and/or other forwarding records
+---------------------------------------------------------
+
+At this point we have defined all the DDlog relations required to
+generate ``Logical_Flow``s. All we have to do is write the rules to
+do so. For each ``IgmpSwitchMulticastGroup`` we generate a ``Flow``
+whose action is ``"outport = <Multicast_Group>; output;"``::
+
+    /* Ingress table 17: Add IP multicast flows learnt from IGMP (priority 90). */
+    for (IgmpSwitchMulticastGroup(.address = address, .switch = &sw)) {
+        Flow(.logical_datapath = sw.dpname,
+             .stage = switch_stage(IN, L2_LKUP),
+             .priority = 90,
+             .__match = "eth.mcast && ip4 && ip4.dst == ${address}",
+             .actions = "outport = \"${address}\"; output;",
+             .external_ids = map_empty())
+    }
+
+In some cases generating a logical flow is not enough. For IGMP we
+also need to maintain OVN southbound ``Multicast_Group`` records, one
+per IGMP group, storing the corresponding ``Port_Binding`` uuids of
+ports where multicast traffic should be sent. This is also relatively
+straightforward::
+
+    /* Create a multicast group for each IGMP group learned by a Switch.
+     * 'tunnel_key' == 0 triggers an ID allocation later.
+     */
+    sb::Out_Multicast_Group(.datapath = switch.dpname,
+                            .name = address,
+                            .tunnel_key = 0,
+                            .ports = set_map_uuid2name(port_ids)) :-
+        IgmpSwitchMulticastGroup(address, &switch, port_ids).
+
+We must also define DDlog relations that will allocate ``tunnel_key``
+values.
There are two cases: tunnel keys for records that already
+existed in the database are preserved, to implement stable id
+allocation; new multicast groups need new keys. This kind of
+allocation can be tricky, especially for new users of DDlog. OVN
+contains multiple instances of allocation, so it's probably worth
+reading through the existing cases and following their pattern, and,
+if it's still tricky, asking for assistance.
+
+Appendix A. Additional relations generated by ``ovsdb2ddlog``
+-------------------------------------------------------------
+
+.. _appendix:
+
+``ovsdb2ddlog`` generates some extra relations to manage communication
+with the OVSDB server. It generates records in the following
+relations when rows in OVSDB output tables need to be added, deleted,
+or updated.
+
+In the steady state, when everything is working well, a given record
+stays in any one of these relations only briefly: just long enough for
+``ovn-northd-ddlog`` to send a transaction to the OVSDB server. When
+the OVSDB server applies the update and sends an acknowledgement, these
+relations ordinarily become empty, because there are no longer any
+further changes to send.
+
+Thus, a record that persists in one of these relations is a sign of a
+problem. One example of such a problem is the database server
+rejecting the transactions sent by ``ovn-northd-ddlog``, which might
+happen if, for example, a bug in a ``.dl`` file caused some OVSDB
+constraint or referential integrity rule to be violated. (Such a
+problem can often be diagnosed by looking in the OVSDB server's log.)
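To make the role of these delta relations concrete, here is a hedged Rust sketch (illustrative only, not OVN code; the diff is computed imperatively here, whereas DDlog derives it incrementally). Given the rows the program wants in a table and the current database snapshot, it computes which row uuids belong in the "insert", "delete", and "update" buckets:

```rust
use std::collections::BTreeMap;

// Returns (DeltaPlus, DeltaMinus, Update): uuids of rows to insert
// into, delete from, or modify in an OVSDB table, given the rows the
// program wants ("desired") and the current database snapshot.
fn deltas(
    desired: &BTreeMap<u32, String>,
    snapshot: &BTreeMap<u32, String>,
) -> (Vec<u32>, Vec<u32>, Vec<u32>) {
    let mut plus = Vec::new();
    let mut minus = Vec::new();
    let mut update = Vec::new();
    for (uuid, row) in desired {
        match snapshot.get(uuid) {
            None => plus.push(*uuid),                      // row missing: insert
            Some(old) if old != row => update.push(*uuid), // row differs: update
            Some(_) => {}                                  // row matches: nothing to send
        }
    }
    for uuid in snapshot.keys() {
        if !desired.contains_key(uuid) {
            minus.push(*uuid); // row no longer wanted: delete
        }
    }
    (plus, minus, update)
}

fn main() {
    let desired = BTreeMap::from([(1, "a".to_string()), (2, "b".to_string())]);
    let snapshot = BTreeMap::from([(2, "x".to_string()), (3, "c".to_string())]);
    // Once OVSDB applies the resulting transaction, the snapshot matches
    // the desired state and all three buckets become empty again -- the
    // steady state described above.
    assert_eq!(deltas(&desired, &snapshot), (vec![1], vec![3], vec![2]));
}
```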
+
+- ``DeltaPlus_IP_Multicast``, used by the DDlog program to track new
+  records that have not yet been added to the database::
+
+    output relation DeltaPlus_IP_Multicast (
+        datapath: uuid_or_string_t,
+        enabled: Set<bool>,
+        querier: Set<bool>
+    )
+
+- ``DeltaMinus_IP_Multicast``, used by the DDlog program to track
+  records that are no longer needed in the database and need to be
+  removed::
+
+    output relation DeltaMinus_IP_Multicast (
+        _uuid: uuid
+    )
+
+- ``Update_IP_Multicast``, used by the DDlog program to track records
+  whose fields need to be updated in the database::
+
+    output relation Update_IP_Multicast (
+        _uuid: uuid,
+        enabled: Set<bool>,
+        querier: Set<bool>
+    )
diff --git a/Documentation/tutorials/index.rst b/Documentation/tutorials/index.rst
index 4ff6e16f84cd..d1f4fda9df1e 100644
--- a/Documentation/tutorials/index.rst
+++ b/Documentation/tutorials/index.rst
@@ -44,3 +44,4 @@ vSwitch.
    ovn-rbac
    ovn-ipsec
    ovn-interconnection
+   ddlog-new-feature
diff --git a/NEWS b/NEWS
index 601023067996..04b75e68c6a1 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,11 @@
 Post-v20.09.0
 ---------------------
+  - ovn-northd-ddlog: New implementation of northd, based on DDlog. This
+    implementation is incremental, meaning that it only recalculates what is
+    needed for the southbound database when northbound changes occur. It is
+    expected to scale better than the C implementation, for large deployments.
+    (This may take testing and tuning to be effective.) This version of OVN
+    requires DDlog 0.30.
   - The "datapath" argument to ovn-trace is now optional, since the datapath
     can be inferred from the inport (which is required).
- The obsolete "redirect-chassis" way to configure gateways has been diff --git a/acinclude.m4 b/acinclude.m4 index a797adc826c9..83d1d13bfb86 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -42,6 +42,49 @@ AC_DEFUN([OVS_ENABLE_WERROR], fi AC_SUBST([SPARSE_WERROR])]) +dnl OVS_CHECK_DDLOG +dnl +dnl Configure ddlog source tree +AC_DEFUN([OVS_CHECK_DDLOG], [ + AC_ARG_WITH([ddlog], + [AC_HELP_STRING([--with-ddlog=.../differential-datalog/lib], + [Enables DDlog by pointing to its library dir])], + [DDLOGLIBDIR=$withval], [DDLOGLIBDIR=no]) + + AC_MSG_CHECKING([for DDlog library directory]) + if test "$DDLOGLIBDIR" != no; then + if test ! -d "$DDLOGLIBDIR"; then + AC_MSG_ERROR([ddlog library dir "$DDLOGLIBDIR" doesn't exist]) + elif test ! -f "$DDLOGLIBDIR"/ddlog_std.dl; then + AC_MSG_ERROR([ddlog library dir "$DDLOGLIBDIR" lacks ddlog_std.dl]) + fi + + AC_ARG_VAR([DDLOG]) + AC_CHECK_PROGS([DDLOG], [ddlog], [none]) + if test X"$DDLOG" = X"none"; then + AC_MSG_ERROR([ddlog is required to build with DDlog]) + fi + + AC_ARG_VAR([CARGO]) + AC_CHECK_PROGS([CARGO], [cargo], [none]) + if test X"$CARGO" = X"none"; then + AC_MSG_ERROR([cargo is required to build with DDlog]) + fi + + AC_ARG_VAR([RUSTC]) + AC_CHECK_PROGS([RUSTC], [rustc], [none]) + if test X"$RUSTC" = X"none"; then + AC_MSG_ERROR([rustc is required to build with DDlog]) + fi + + AC_SUBST([DDLOGLIBDIR]) + AC_DEFINE([DDLOG], [1], [Build OVN daemons with ddlog.]) + fi + AC_MSG_RESULT([$DDLOGLIBDIR]) + + AM_CONDITIONAL([DDLOG], [test "$DDLOGLIBDIR" != no]) +]) + dnl Checks for net/if_dl.h. 
dnl dnl (We use this as a proxy for checking whether we're building on FreeBSD diff --git a/configure.ac b/configure.ac index 0b17f05b9c77..40ab87f691b2 100644 --- a/configure.ac +++ b/configure.ac @@ -131,6 +131,7 @@ OVS_LIBTOOL_VERSIONS OVS_CHECK_CXX AX_FUNC_POSIX_MEMALIGN OVN_CHECK_UNBOUND +OVS_CHECK_DDLOG_FAST_BUILD OVS_CHECK_INCLUDE_NEXT([stdio.h string.h]) AC_CONFIG_FILES([lib/libovn.sym]) @@ -167,11 +168,15 @@ OVS_CONDITIONAL_CC_OPTION([-Wno-unused-parameter], [HAVE_WNO_UNUSED_PARAMETER]) OVS_ENABLE_WERROR OVS_ENABLE_SPARSE +OVS_CHECK_DDLOG OVS_CHECK_PRAGMA_MESSAGE OVN_CHECK_OVS OVS_CTAGS_IDENTIFIERS AC_SUBST([OVS_CFLAGS]) AC_SUBST([OVS_LDFLAGS]) +AC_SUBST([DDLOG_EXTRA_FLAGS]) +AC_SUBST([DDLOG_EXTRA_RUSTFLAGS]) +AC_SUBST([DDLOG_NORTHD_LIB_ONLY]) AC_SUBST([ovs_srcdir], ['${OVSDIR}']) AC_SUBST([ovs_builddir], ['${OVSBUILDDIR}']) diff --git a/m4/ovn.m4 b/m4/ovn.m4 index dacfabb2a140..2909914fb87a 100644 --- a/m4/ovn.m4 +++ b/m4/ovn.m4 @@ -576,3 +576,19 @@ AC_DEFUN([OVN_CHECK_UNBOUND], fi AM_CONDITIONAL([HAVE_UNBOUND], [test "$HAVE_UNBOUND" = yes]) AC_SUBST([HAVE_UNBOUND])]) + +dnl Checks for --enable-ddlog-fast-build and updates DDLOG_EXTRA_RUSTFLAGS. 
+AC_DEFUN([OVS_CHECK_DDLOG_FAST_BUILD], + [AC_ARG_ENABLE( + [ddlog_fast_build], + [AC_HELP_STRING([--enable-ddlog-fast-build], + [Build ddlog programs faster, but generate slower code])], + [case "${enableval}" in + (yes) ddlog_fast_build=true ;; + (no) ddlog_fast_build=false ;; + (*) AC_MSG_ERROR([bad value ${enableval} for --enable-ddlog-fast-build]) ;; + esac], + [ddlog_fast_build=false]) + if $ddlog_fast_build; then + DDLOG_EXTRA_RUSTFLAGS="-C opt-level=z" + fi]) diff --git a/northd/.gitignore b/northd/.gitignore index 97a59801be9f..0f2b33ae7d01 100644 --- a/northd/.gitignore +++ b/northd/.gitignore @@ -1,2 +1,6 @@ /ovn-northd +/ovn-northd-ddlog /ovn-northd.8 +/OVN_Northbound.dl +/OVN_Southbound.dl +/ovn_northd_ddlog/ diff --git a/northd/automake.mk b/northd/automake.mk index 69657e77e400..2717f59c5f3a 100644 --- a/northd/automake.mk +++ b/northd/automake.mk @@ -8,3 +8,107 @@ northd_ovn_northd_LDADD = \ man_MANS += northd/ovn-northd.8 EXTRA_DIST += northd/ovn-northd.8.xml CLEANFILES += northd/ovn-northd.8 + +EXTRA_DIST += \ + northd/ovn-northd northd/ovn-northd.8.xml \ + northd/ovn_northd.dl northd/ovn.dl northd/ovn.rs \ + northd/ovn.toml northd/lswitch.dl northd/lrouter.dl \ + northd/helpers.dl northd/ipam.dl northd/multicast.dl \ + northd/ovn-nb.dlopts northd/ovn-sb.dlopts \ + northd/ovsdb2ddlog2c + +if DDLOG +bin_PROGRAMS += northd/ovn-northd-ddlog +northd_ovn_northd_ddlog_SOURCES = \ + northd/ovn-northd-ddlog.c \ + northd/ovn-northd-ddlog-sb.inc \ + northd/ovn-northd-ddlog-nb.inc \ + northd/ovn_northd_ddlog/ddlog.h +northd_ovn_northd_ddlog_LDADD = \ + northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.la \ + lib/libovn.la \ + $(OVSDB_LIBDIR)/libovsdb.la \ + $(OVS_LIBDIR)/libopenvswitch.la + +nb_opts = $$(cat $(srcdir)/northd/ovn-nb.dlopts) +northd/OVN_Northbound.dl: ovn-nb.ovsschema northd/ovn-nb.dlopts + $(AM_V_GEN)ovsdb2ddlog -f $< --output-file $@ $(nb_opts) +northd/ovn-northd-ddlog-nb.inc: ovn-nb.ovsschema northd/ovn-nb.dlopts 
northd/ovsdb2ddlog2c + $(AM_V_GEN)$(run_python) $(srcdir)/northd/ovsdb2ddlog2c -p nb_ -f $< --output-file $@ $(nb_opts) + +sb_opts = $$(cat $(srcdir)/northd/ovn-sb.dlopts) +northd/OVN_Southbound.dl: ovn-sb.ovsschema northd/ovn-sb.dlopts + $(AM_V_GEN)ovsdb2ddlog -f $< --output-file $@ $(sb_opts) +northd/ovn-northd-ddlog-sb.inc: ovn-sb.ovsschema northd/ovn-sb.dlopts northd/ovsdb2ddlog2c + $(AM_V_GEN)$(run_python) $(srcdir)/northd/ovsdb2ddlog2c -p sb_ -f $< --output-file $@ $(sb_opts) + +BUILT_SOURCES += \ + northd/ovn-northd-ddlog-sb.inc \ + northd/ovn-northd-ddlog-nb.inc + +northd/ovn_northd_ddlog/ddlog.h: northd/ddlog.stamp + +CARGO_VERBOSE = $(cargo_verbose_$(V)) +cargo_verbose_ = $(cargo_verbose_$(AM_DEFAULT_VERBOSITY)) +cargo_verbose_0 = +cargo_verbose_1 = --verbose + +DDLOGFLAGS = -L $(DDLOGLIBDIR) -L $(builddir)/northd $(DDLOG_EXTRA_FLAGS) + +RUSTFLAGS = \ + -L ../../lib/.libs \ + -L $(OVS_LIBDIR)/.libs \ + $$LIBOPENVSWITCH_DEPS \ + $$LIBOVN_DEPS \ + -Awarnings $(DDLOG_EXTRA_RUSTFLAGS) + +ddlog_sources = \ + northd/ovn_northd.dl \ + northd/lswitch.dl \ + northd/lrouter.dl \ + northd/ipam.dl \ + northd/multicast.dl \ + northd/ovn.dl \ + northd/ovn.rs \ + northd/helpers.dl \ + northd/OVN_Northbound.dl \ + northd/OVN_Southbound.dl +northd/ddlog.stamp: $(ddlog_sources) + $(AM_V_GEN)$(DDLOG) -i $< -o $(builddir)/northd $(DDLOGFLAGS) + $(AM_V_at)touch $@ + +NORTHD_LIB = 1 +NORTHD_CLI = 0 + +ddlog_targets = $(northd_lib_$(NORTHD_LIB)) $(northd_cli_$(NORTHD_CLI)) +northd_lib_1 = northd/ovn_northd_ddlog/target/release/libovn_%_ddlog.la +northd_cli_1 = northd/ovn_northd_ddlog/target/release/ovn_%_cli +EXTRA_northd_ovn_northd_DEPENDENCIES = $(northd_cli_$(NORTHD_CLI)) + +cargo_build = $(cargo_build_$(NORTHD_LIB)$(NORTHD_CLI)) +cargo_build_01 = --features command-line --bin ovn_northd_cli +cargo_build_10 = --lib +cargo_build_11 = --features command-line + +$(ddlog_targets): northd/ddlog.stamp lib/libovn.la $(OVS_LIBDIR)/libopenvswitch.la + $(AM_V_GEN)LIBOVN_DEPS=`. 
lib/libovn.la && echo "$$dependency_libs"` && \ + LIBOPENVSWITCH_DEPS=`. $(OVS_LIBDIR)/libopenvswitch.la && echo "$$dependency_libs"` && \ + cd northd/ovn_northd_ddlog && \ + RUSTC='$(RUSTC)' RUSTFLAGS="$(RUSTFLAGS)" \ + cargo build --release $(CARGO_VERBOSE) $(cargo_build) --no-default-features --features ovsdb +endif + +CLEAN_LOCAL += clean-ddlog +clean-ddlog: + rm -rf northd/ovn_northd_ddlog northd/ddlog.stamp + +CLEANFILES += \ + northd/ddlog.stamp \ + northd/ovn_northd_ddlog/ddlog.h \ + northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.a \ + northd/ovn_northd_ddlog/target/release/libovn_northd_ddlog.la \ + northd/ovn_northd_ddlog/target/release/ovn_northd_cli \ + northd/OVN_Northbound.dl \ + northd/OVN_Southbound.dl \ + northd/ovn-northd-ddlog-nb.inc \ + northd/ovn-northd-ddlog-sb.inc diff --git a/northd/helpers.dl b/northd/helpers.dl new file mode 100644 index 000000000000..d8d818c0ffb9 --- /dev/null +++ b/northd/helpers.dl @@ -0,0 +1,128 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import OVN_Northbound as nb +import OVN_Southbound as sb +import ovsdb +import ovn + +/* ACLRef: reference to nb::ACL */ +relation &ACLRef[nb::ACL] +&ACLRef[acl] :- nb::ACL[acl]. + +/* DHCP_Options: reference to nb::DHCP_Options */ +relation &DHCP_OptionsRef[nb::DHCP_Options] +&DHCP_OptionsRef[options] :- nb::DHCP_Options[options]. + +/* QoS: reference to nb::QoS */ +relation &QoSRef[nb::QoS] +&QoSRef[qos] :- nb::QoS[qos]. 
+ +/* LoadBalancerRef: reference to nb::Load_Balancer */ +relation &LoadBalancerRef[nb::Load_Balancer] +&LoadBalancerRef[lb] :- nb::Load_Balancer[lb]. + +/* LoadBalancerHealthCheckRef: reference to nb::Load_Balancer_Health_Check */ +relation &LoadBalancerHealthCheckRef[nb::Load_Balancer_Health_Check] +&LoadBalancerHealthCheckRef[lbhc] :- nb::Load_Balancer_Health_Check[lbhc]. + +/* NATRef: reference to nb::NAT*/ +relation &NATRef[nb::NAT] +&NATRef[nat] :- nb::NAT[nat]. + +/* AddressSetRef: reference to nb::Address_Set */ +relation &AddressSetRef[nb::Address_Set] +&AddressSetRef[__as] :- nb::Address_Set[__as]. + +/* ServiceMonitor: reference to sb::Service_Monitor */ +relation &ServiceMonitorRef[sb::Service_Monitor] +&ServiceMonitorRef[sm] :- sb::Service_Monitor[sm]. + +/* Switch-to-router logical port connections */ +relation SwitchRouterPeer(lsp: uuid, lsp_name: string, lrp: uuid) +SwitchRouterPeer(lsp, lsp_name, lrp) :- + nb::Logical_Switch_Port(._uuid = lsp, .name = lsp_name, .__type = "router", .options = options), + Some{var router_port} = map_get(options, "router-port"), + nb::Logical_Router_Port(.name = router_port, ._uuid = lrp). 
+ +function map_get_bool_def(m: Map<string, string>, + k: string, def: bool): bool = { + match (map_get(m, k)) { + None -> def, + Some{x} -> { + if (def) { + str_to_lower(x) != "false" + } else { + str_to_lower(x) == "true" + } + } + } +} + +function map_get_uint_def(m: Map<string, string>, k: string, + def: integer): integer = { + match (map_get(m, k)) { + None -> def, + Some{x} -> { + match (str_to_uint(x, 10)) { + Some{v} -> v, + None -> def + } + } + } +} + +function map_get_int_def(m: Map<string, string>, k: string, + def: integer): integer = { + match (map_get(m, k)) { + None -> def, + Some{x} -> { + match (str_to_int(x, 10)) { + Some{v} -> v, + None -> def + } + } + } +} + +function map_get_int_def_limit(m: Map<string, string>, k: string, def: integer, + min: integer, max: integer): integer = { + var v = map_get_int_def(m, k, def); + var v1 = { + if (v < min) min else v + }; + if (v1 > max) max else v1 +} + +function map_get_str_def(m: Map<string, string>, k: string, + def: string): string = { + match (map_get(m, k)) { + None -> def, + Some{x} -> x + } +} + +function vec_nth_def(vector: Vec<'A>, index: bit<64>, def: 'A): 'A { + match (vec_nth(vector, index)) { + Some{value} -> value, + None -> def + } +} + +function ha_chassis_group_uuid(uuid: uuid): uuid { hash128("hacg" ++ uuid) } +function ha_chassis_uuid(chassis_name: string, nb_chassis_uuid: uuid): uuid { hash128("hac" ++ chassis_name ++ nb_chassis_uuid) } + +/* Dummy relation with one empty row, useful for putting into antijoins. */ +relation Unit() +Unit(). diff --git a/northd/ipam.dl b/northd/ipam.dl new file mode 100644 index 000000000000..cc0f7989a7dd --- /dev/null +++ b/northd/ipam.dl @@ -0,0 +1,506 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * IPAM (IP address management) and MACAM (MAC address management) + * + * IPAM generally stands for IP address management. In non-virtualized + * world, MAC addresses come with the hardware. But, with virtualized + * workloads, they need to be assigned and managed. This function + * does both IP address management (ipam) and MAC address management + * (macam). + */ + +import OVN_Northbound as nb +import ovsdb +import allocate +import helpers +import ovn +import ovn_northd +import lswitch +import lrouter + +function mAC_ADDR_SPACE(): bit<64> = 64'hffffff + +/* + * IPv4 dynamic address allocation. + */ + +/* + * The fixed portions of a request for a dynamic LSP address. 
+ */ +typedef dynamic_address_request = DynamicAddressRequest{ + mac: Option<eth_addr>, + ip4: Option<in_addr>, + ip6: Option<in6_addr> +} +function parse_dynamic_address_request(s: string): Option<dynamic_address_request> { + var tokens = string_split(s, " "); + var n = vec_len(tokens); + if (n < 1 or n > 3) { + return None + }; + + var t0 = vec_nth_def(tokens, 0, ""); + var t1 = vec_nth_def(tokens, 1, ""); + var t2 = vec_nth_def(tokens, 2, ""); + if (t0 == "dynamic") { + if (n == 1) { + Some{DynamicAddressRequest{None, None, None}} + } else if (n == 2) { + match (ip46_parse(t1)) { + Some{IPv4{ipv4}} -> Some{DynamicAddressRequest{None, Some{ipv4}, None}}, + Some{IPv6{ipv6}} -> Some{DynamicAddressRequest{None, None, Some{ipv6}}}, + _ -> None + } + } else if (n == 3) { + match ((ip_parse(t1), ipv6_parse(t2))) { + (Some{ipv4}, Some{ipv6}) -> Some{DynamicAddressRequest{None, Some{ipv4}, Some{ipv6}}}, + _ -> None + } + } else { + None + } + } else if (n == 2 and t1 == "dynamic") { + match (eth_addr_from_string(t0)) { + Some{mac} -> Some{DynamicAddressRequest{Some{mac}, None, None}}, + _ -> None + } + } else { + None + } +} + +/* SwitchIPv4ReservedAddress - keeps track of statically reserved IPv4 addresses + * for each switch whose subnet option is set, including: + * (1) first and last (multicast) address in the subnet range + * (2) addresses from `other_config.exclude_ips` + * (3) port addresses in lsp.addresses, except "unknown" addresses, addresses of + * "router" ports, dynamic addresses + * (4) addresses associated with router ports peered with the switch. + * (5) static IP component of "dynamic" `lsp.addresses`. + * + * Addresses are kept in host-endian format (i.e., bit<32> vs in_addr). + */ +relation SwitchIPv4ReservedAddress(lswitch: uuid, addr: bit<32>) + +/* Add reserved address groups (1) and (2). 
*/ +SwitchIPv4ReservedAddress(.lswitch = ls._uuid, + .addr = addr) :- + &Switch(.ls = ls, + .subnet = Some{(_, _, start_ipv4, total_ipv4s)}), + var exclude_ips = { + var exclude_ips = set_singleton(start_ipv4); + set_insert(exclude_ips, start_ipv4 + total_ipv4s - 1); + match (map_get(ls.other_config, "exclude_ips")) { + None -> exclude_ips, + Some{exclude_ip_list} -> match (parse_ip_list(exclude_ip_list)) { + Left{err} -> { + warn("logical switch ${uuid2str(ls._uuid)}: bad exclude_ips (${err})"); + exclude_ips + }, + Right{ranges} -> { + for (range in ranges) { + (var ip_start, var ip_end) = range; + var start = iptohl(ip_start); + var end = match (ip_end) { + None -> start, + Some{ip} -> iptohl(ip) + }; + start = max(start_ipv4, start); + end = min(start_ipv4 + total_ipv4s - 1, end); + if (end >= start) { + for (addr in range_vec(start, end+1, 1)) { + set_insert(exclude_ips, addr) + } + } else { + warn("logical switch ${uuid2str(ls._uuid)}: excluded addresses not in subnet") + } + }; + exclude_ips + } + } + } + }, + var addr = FlatMap(exclude_ips). + +/* Add reserved address group (3). */ +SwitchIPv4ReservedAddress(.lswitch = ls._uuid, + .addr = addr) :- + SwitchPortStaticAddresses( + .port = &SwitchPort{ + .sw = &Switch{.ls = ls, + .subnet = Some{(_, _, start_ipv4, total_ipv4s)}}, + .peer = None}, + .addrs = lport_addrs + ), + var addrs = { + var addrs = set_empty(); + for (addr in lport_addrs.ipv4_addrs) { + var addr_host_endian = iptohl(addr.addr); + if (addr_host_endian >= start_ipv4 and addr_host_endian < start_ipv4 + total_ipv4s) { + set_insert(addrs, addr_host_endian) + } else () + }; + addrs + }, + var addr = FlatMap(addrs). 
+ +/* Add reserved address group (4) */ +SwitchIPv4ReservedAddress(.lswitch = ls._uuid, + .addr = addr) :- + &SwitchPort( + .sw = &Switch{.ls = ls, + .subnet = Some{(_, _, start_ipv4, total_ipv4s)}}, + .peer = Some{&rport}), + var addrs = { + var addrs = set_empty(); + for (addr in rport.networks.ipv4_addrs) { + var addr_host_endian = iptohl(addr.addr); + if (addr_host_endian >= start_ipv4 and addr_host_endian < start_ipv4 + total_ipv4s) { + set_insert(addrs, addr_host_endian) + } else () + }; + addrs + }, + var addr = FlatMap(addrs). + +/* Add reserved address group (5) */ +SwitchIPv4ReservedAddress(.lswitch = sw.ls._uuid, + .addr = iptohl(ip_addr)) :- + &SwitchPort(.sw = &sw, .lsp = lsp, .static_dynamic_ipv4 = Some{ip_addr}). + +/* Aggregate all reserved addresses for each switch. */ +relation SwitchIPv4ReservedAddresses(lswitch: uuid, addrs: Set<bit<32>>) + +SwitchIPv4ReservedAddresses(lswitch, addrs) :- + SwitchIPv4ReservedAddress(lswitch, addr), + var addrs = addr.group_by(lswitch).to_set(). + +SwitchIPv4ReservedAddresses(lswitch_uuid, set_empty()) :- + nb::Logical_Switch(._uuid = lswitch_uuid), + not SwitchIPv4ReservedAddress(lswitch_uuid, _). + +/* Allocate dynamic IP addresses for ports that require them: + */ +relation SwitchPortAllocatedIPv4DynAddress(lsport: uuid, dyn_addr: Option<in_addr>) + +SwitchPortAllocatedIPv4DynAddress(lsport, dyn_addr) :- + /* Aggregate all ports of a switch that need a dynamic IP address */ + port in &SwitchPort(.needs_dynamic_ipv4address = true, + .sw = &sw), + var switch_id = sw.ls._uuid, + var ports = port.group_by(switch_id).to_vec(), + SwitchIPv4ReservedAddresses(switch_id, reserved_addrs), + /* Allocate dynamic addresses only for ports that don't have a dynamic address + * or have one that is no longer valid. 
*/ + var dyn_addresses = { + var used_addrs = reserved_addrs; + var assigned_addrs = vec_empty(); + var need_addr = vec_empty(); + (var start_ipv4, var total_ipv4s) = match (vec_nth(ports, 0)) { + None -> { (0, 0) } /* no ports with dynamic addresses */, + Some{port0} -> { + match (port0.sw.subnet) { + None -> { + abort("needs_dynamic_ipv4address is true, but subnet is undefined in port ${uuid2str(deref(port0).lsp._uuid)}"); + (0, 0) + }, + Some{(_, _, start_ipv4, total_ipv4s)} -> (start_ipv4, total_ipv4s) + } + } + }; + for (port in ports) { + //warn("port(${deref(port).lsp._uuid})"); + match (deref(port).dynamic_address) { + None -> { + /* no dynamic address yet -- allocate one now */ + //warn("need_addr(${deref(port).lsp._uuid})"); + vec_push(need_addr, deref(port).lsp._uuid) + }, + Some{dynaddr} -> { + match (vec_nth(dynaddr.ipv4_addrs, 0)) { + None -> { + /* dynamic address does not have IPv4 component -- allocate one now */ + //warn("need_addr(${deref(port).lsp._uuid})"); + vec_push(need_addr, deref(port).lsp._uuid) + }, + Some{addr} -> { + var haddr = iptohl(addr.addr); + if (haddr < start_ipv4 or haddr >= start_ipv4 + total_ipv4s) { + vec_push(need_addr, deref(port).lsp._uuid) + } else if (set_contains(used_addrs, haddr)) { + vec_push(need_addr, deref(port).lsp._uuid); + warn("Duplicate IP set on switch ${deref(port).lsp.name}: ${addr.addr}") + } else { + /* has valid dynamic address -- record it in used_addrs */ + set_insert(used_addrs, haddr); + assigned_addrs.push((port.lsp._uuid, Some{haddr})) + } + } + } + } + } + }; + assigned_addrs.append(allocate_opt(used_addrs, need_addr, start_ipv4, start_ipv4 + total_ipv4s - 1)); + assigned_addrs + }, + var port_address = FlatMap(dyn_addresses), + (var lsport, var dyn_addr_bits) = port_address, + var dyn_addr = dyn_addr_bits.map(hltoip). 
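The pass above walks each port once: an existing dynamic address is kept only if it falls inside the subnet and is not already reserved or claimed, and everything else (missing, out-of-range, or duplicate addresses) is queued for fresh allocation. A rough Python sketch of that bookkeeping — the helper name and data shapes are hypothetical, and the final loop stands in for the patch's `allocate_opt`:

```python
def assign_dynamic_ipv4(ports, reserved, start, total):
    """Sketch of the allocation pass: 'ports' is a list of
    (name, addr_or_None) with addresses as host-endian ints.
    Keep valid existing addresses; reallocate the rest."""
    # Network and "last" address are reserved, as in group (2) above.
    used = set(reserved) | {start, start + total - 1}
    assigned, need = [], []
    for name, addr in ports:
        if addr is not None and start <= addr < start + total and addr not in used:
            used.add(addr)                 # valid existing address: keep it
            assigned.append((name, addr))
        else:
            need.append(name)              # missing, out-of-range, or duplicate
    for name in need:                      # stand-in for allocate_opt(): first free wins
        addr = next((a for a in range(start + 1, start + total - 1)
                     if a not in used), None)
        if addr is not None:
            used.add(addr)
        assigned.append((name, addr))
    return assigned
```

Note that a duplicate address sends the *second* claimant back through allocation, matching the "Duplicate IP set" branch above.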
+ +/* Compute new dynamic IPv4 address assignment: + * - port does not need dynamic IP - use static_dynamic_ip if any + * - a new address has been allocated for port - use this address + * - otherwise, use existing dynamic IP + */ +relation SwitchPortNewIPv4DynAddress(lsport: uuid, dyn_addr: Option<in_addr>) + +SwitchPortNewIPv4DynAddress(lsp._uuid, ip_addr) :- + &SwitchPort(.sw = &sw, + .needs_dynamic_ipv4address = false, + .static_dynamic_ipv4 = static_dynamic_ipv4, + .lsp = lsp), + var ip_addr = { + match (static_dynamic_ipv4) { + None -> { None }, + Some{addr} -> { + match (sw.subnet) { + None -> { None }, + Some{(_, _, start_ipv4, total_ipv4s)} -> { + var haddr = iptohl(addr); + if (haddr < start_ipv4 or haddr >= start_ipv4 + total_ipv4s) { + /* new static ip is not valid */ + None + } else { + Some{addr} + } + } + } + } + } + }. + +SwitchPortNewIPv4DynAddress(lsport, addr) :- + SwitchPortAllocatedIPv4DynAddress(lsport, addr). + +/* + * Dynamic MAC address allocation. + */ + +function get_mac_prefix(options: Map<string,string>, uuid: uuid) : bit<64> = +{ + var existing_prefix = match (map_get(options, "mac_prefix")) { + Some{prefix} -> scan_eth_addr_prefix(prefix), + None -> None + }; + match (existing_prefix) { + Some{prefix} -> prefix, + None -> pseudorandom_mac(uuid, 16'h1234) & 64'hffffff000000 + } +} +function put_mac_prefix(options: Map<string,string>, mac_prefix: bit<64>) + : Map<string,string> = +{ + map_insert_imm(options, "mac_prefix", + string_substr(to_string(eth_addr_from_uint64(mac_prefix)), 0, 8)) +} +relation MacPrefix(mac_prefix: bit<64>) +MacPrefix(get_mac_prefix(options, uuid)) :- + nb::NB_Global(._uuid = uuid, .options = options). + +/* ReservedMACAddress - keeps track of statically reserved MAC addresses. + * (1) static addresses in `lsp.addresses` + * (2) static MAC component of "dynamic" `lsp.addresses`. + * (3) addresses associated with router ports peered with the switch. + * + * Addresses are kept in 64-bit host-endian format. 
+ */ +relation ReservedMACAddress(addr: bit<64>) + +/* Add reserved address group (1). */ +ReservedMACAddress(.addr = eth_addr_to_uint64(lport_addrs.ea)) :- + SwitchPortStaticAddresses(.addrs = lport_addrs). + +/* Add reserved address group (2). */ +ReservedMACAddress(.addr = eth_addr_to_uint64(mac_addr)) :- + &SwitchPort(.lsp = lsp, .static_dynamic_mac = Some{mac_addr}). + +/* Add reserved address group (3). */ +ReservedMACAddress(.addr = eth_addr_to_uint64(rport.networks.ea)) :- + &SwitchPort(.peer = Some{&rport}). + +/* Aggregate all reserved MAC addresses. */ +relation ReservedMACAddresses(addrs: Set<bit<64>>) + +ReservedMACAddresses(addrs) :- + ReservedMACAddress(addr), + var addrs = addr.group_by(()).to_set(). + +/* Handle case when `ReservedMACAddress` is empty */ +ReservedMACAddresses(set_empty()) :- + // NB_Global should have exactly one record, so we can + // use it as a base for antijoin. + nb::NB_Global(), + not ReservedMACAddress(_). + +/* Allocate dynamic MAC addresses for ports that require them: + * Case 1: port doesn't need dynamic MAC (i.e., does not have dynamic address or + * has a dynamic address with a static MAC). 
+ * Case 2: needs dynamic MAC, has dynamic MAC, has existing dynamic MAC with the right prefix + * needs dynamic MAC, does not have fixed dynamic MAC, doesn't have existing dynamic MAC with correct prefix + */ +relation SwitchPortAllocatedMACDynAddress(lsport: uuid, dyn_addr: bit<64>) + +SwitchPortAllocatedMACDynAddress(lsport, dyn_addr), +SwitchPortDuplicateMACAddress(dup_addrs) :- + /* Group all ports that need a dynamic IP address */ + port in &SwitchPort(.needs_dynamic_macaddress = true, .lsp = lsp), + SwitchPortNewIPv4DynAddress(lsp._uuid, ipv4_addr), + var ports = (port, ipv4_addr).group_by(()).to_vec(), + ReservedMACAddresses(reserved_addrs), + MacPrefix(mac_prefix), + (var dyn_addresses, var dup_addrs) = { + var used_addrs = reserved_addrs; + var need_addr = vec_empty(); + var dup_addrs = set_empty(); + for (port_with_addr in ports) { + (var port, var ipv4_addr) = port_with_addr; + var hint = match (ipv4_addr) { + None -> Some { mac_prefix | 1 }, + Some{addr} -> { + /* The tentative MAC's suffix will be in the interval (1, 0xfffffe). */ + var mac_suffix: bit<24> = iptohl(addr)[23:0] % ((mAC_ADDR_SPACE() - 1)[23:0]) + 1; + Some{ mac_prefix | (40'd0 ++ mac_suffix) } + } + }; + match (port.dynamic_address) { + None -> { + /* no dynamic address yet -- allocate one now */ + vec_push(need_addr, (port.lsp._uuid, hint)) + }, + Some{dynaddr} -> { + var haddr = eth_addr_to_uint64(dynaddr.ea); + if ((haddr ^ mac_prefix) >> 24 != 0) { + /* existing dynamic address is no longer valid */ + vec_push(need_addr, (port.lsp._uuid, hint)) + } else if (set_contains(used_addrs, haddr)) { + set_insert(dup_addrs, dynaddr.ea); + } else { + /* has valid dynamic address -- record it in used_addrs */ + set_insert(used_addrs, haddr) + } + } + } + }; + // FIXME: if a port has a dynamic address that is no longer valid, and + // we are unable to allocate a new address, the current behavior is to + // keep the old invalid address. 
It should probably be changed to + // removing the old address. + // FIXME: OVN allocates MAC addresses by seeding them with IPv4 address. + // Implement a custom allocation function that simulates this behavior. + var res = allocate_with_hint(used_addrs, need_addr, mac_prefix + 1, mac_prefix + mAC_ADDR_SPACE() - 1); + var res_strs = vec_empty(); + for (x in res) { + (var uuid, var addr) = x; + vec_push(res_strs, "${uuid2str(uuid)}: ${eth_addr_from_uint64(addr)}") + }; + (res, dup_addrs) + }, + var port_address = FlatMap(dyn_addresses), + (var lsport, var dyn_addr) = port_address. + +relation SwitchPortDuplicateMACAddress(dup_addrs: Set<eth_addr>) +Warning["Duplicate MAC set: ${ea}"] :- + SwitchPortDuplicateMACAddress(dup_addrs), + var ea = FlatMap(dup_addrs). + +/* Compute new dynamic MAC address assignment: + * - port does not need dynamic MAC - use `static_dynamic_mac` + * - a new address has been allocated for port - use this address + * - otherwise, use existing dynamic MAC + */ +relation SwitchPortNewMACDynAddress(lsport: uuid, dyn_addr: Option<eth_addr>) + +SwitchPortNewMACDynAddress(lsp._uuid, mac_addr) :- + &SwitchPort(.needs_dynamic_macaddress = false, + .lsp = lsp, + .sw = &sw, + .static_dynamic_mac = static_dynamic_mac), + var mac_addr = match (static_dynamic_mac) { + None -> None, + Some{addr} -> { + if (is_some(sw.subnet) or is_some(sw.ipv6_prefix) or + map_get(sw.ls.other_config, "mac_only") == Some{"true"}) { + Some{addr} + } else { + None + } + } + }. + +SwitchPortNewMACDynAddress(lsport, Some{eth_addr_from_uint64(addr)}) :- + SwitchPortAllocatedMACDynAddress(lsport, addr). + +SwitchPortNewMACDynAddress(lsp._uuid, addr) :- + &SwitchPort(.needs_dynamic_macaddress = true, .lsp = lsp, .dynamic_address = cur_address), + not SwitchPortAllocatedMACDynAddress(lsp._uuid, _), + var addr = match (cur_address) { + None -> None, + Some{dynaddr} -> Some{dynaddr.ea} + }. + +/* + * Dynamic IPv6 address allocation. 
+ * `needs_dynamic_ipv6address` -> in6_generate_eui64(mac, ipv6_prefix) + */ +relation SwitchPortNewDynamicAddress(port: Ref<SwitchPort>, address: Option<lport_addresses>) + +SwitchPortNewDynamicAddress(port, None) :- + port in &SwitchPort(.lsp = lsp), + SwitchPortNewMACDynAddress(lsp._uuid, None). + +SwitchPortNewDynamicAddress(port, lport_address) :- + port in &SwitchPort(.lsp = lsp, + .sw = &sw, + .needs_dynamic_ipv6address = needs_dynamic_ipv6address, + .static_dynamic_ipv6 = static_dynamic_ipv6), + SwitchPortNewMACDynAddress(lsp._uuid, Some{mac_addr}), + SwitchPortNewIPv4DynAddress(lsp._uuid, opt_ip4_addr), + var ip6_addr = match ((static_dynamic_ipv6, needs_dynamic_ipv6address, sw.ipv6_prefix)) { + (Some{ipv6}, _, _) -> " ${ipv6}", + (_, true, Some{prefix}) -> " ${in6_generate_eui64(mac_addr, prefix)}", + _ -> "" + }, + var ip4_addr = match (opt_ip4_addr) { + None -> "", + Some{ip4} -> " ${ip4}" + }, + var addr_string = "${mac_addr}${ip6_addr}${ip4_addr}", + var lport_address = extract_addresses(addr_string). + + +///* If there's more than one dynamic addresses in port->addresses, log a warning +// and only allocate the first dynamic address */ +// +// VLOG_WARN_RL(&rl, "More than one dynamic address " +// "configured for logical switch port '%s'", +// nbsp->name); +// +////>> * MAC addresses suffixes in OUIs managed by OVN"s MACAM (MAC Address +////>> Management) system, in the range 1...0xfffffe. +////>> * IPv4 addresses in ranges managed by OVN's IPAM (IP Address Management) +////>> system. The range varies depending on the size of the subnet. +////>> +////>> Are these `dynamic_addresses` in OVN_Northbound.Logical_Switch_Port`? diff --git a/northd/lrouter.dl b/northd/lrouter.dl new file mode 100644 index 000000000000..5ef54fb761e3 --- /dev/null +++ b/northd/lrouter.dl @@ -0,0 +1,715 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import OVN_Northbound as nb +import OVN_Southbound as sb +import multicast +import ovsdb +import ovn +import helpers +import lswitch +import ovn_northd + +function is_enabled(lr: nb::Logical_Router): bool { is_enabled(lr.enabled) } +function is_enabled(lrp: nb::Logical_Router_Port): bool { is_enabled(lrp.enabled) } +function is_enabled(rp: RouterPort): bool { rp.lrp.is_enabled() } +function is_enabled(rp: Ref<RouterPort>): bool { rp.lrp.is_enabled() } + +/* default logical flow priority for distributed routes */ +function dROUTE_PRIO(): bit<32> = 400 + +/* LogicalRouterPortCandidate. + * + * Each row pairs a logical router port with its logical router, but without + * checking that the logical router port is on only one logical router. + * + * (Use LogicalRouterPort instead, which guarantees uniqueness.) */ +relation LogicalRouterPortCandidate(lrp_uuid: uuid, lr_uuid: uuid) +LogicalRouterPortCandidate(lrp_uuid, lr_uuid) :- + nb::Logical_Router(._uuid = lr_uuid, .ports = ports), + var lrp_uuid = FlatMap(ports). +Warning[message] :- + LogicalRouterPortCandidate(lrp_uuid, lr_uuid), + var lrs = lr_uuid.group_by(lrp_uuid).to_set(), + set_size(lrs) > 1, + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), + var message = "Bad configuration: logical router port ${lrp.name} belongs " + "to more than one logical router". + +/* Each row means 'lport' is in 'lrouter' (and only that lrouter).
*/ +relation LogicalRouterPort(lport: uuid, lrouter: uuid) +LogicalRouterPort(lrp_uuid, lr_uuid) :- + LogicalRouterPortCandidate(lrp_uuid, lr_uuid), + var lrs = lr_uuid.group_by(lrp_uuid).to_set(), + set_size(lrs) == 1, + Some{var lr_uuid} = set_nth(lrs, 0). + +/* + * Peer routers. + * + * Each row in the relation indicates that routers 'a' and 'b' can reach + * each other directly through router ports. + * + * This relation is symmetric: if (a,b) then (b,a). + * This relation is antireflexive: if (a,b) then a != b. + * + * Routers aren't peers if they can reach each other only through logical + * switch ports (that's the ReachableLogicalRouter table). + */ +relation PeerLogicalRouter(a: uuid, b: uuid) +PeerLogicalRouter(lrp_uuid, peer._uuid) :- + LogicalRouterPort(lrp_uuid, _), + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), + Some{var peer_name} = lrp.peer, + peer in nb::Logical_Router_Port(.name = peer_name), + peer.peer == Some{lrp.name}, // 'peer' must point back to 'lrp' + lrp_uuid != peer._uuid. // No reflexive pointers. + +/* + * First-hop routers. + * + * Each row indicates that 'lrouter' is a first-hop logical router for + * 'lswitch', that is, that a "cable" directly connects 'lrouter' and + * 'lswitch'. + * + * A switch can have multiple first-hop routers. */ +relation FirstHopLogicalRouter(lrouter: uuid, lswitch: uuid) +FirstHopLogicalRouter(lrouter, lswitch) :- + LogicalRouterPort(lrp_uuid, lrouter), + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), + LogicalSwitchPort(lsp_uuid, lswitch), + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), + lsp.__type == "router", + map_get(lsp.options, "router-port") == Some{lrp.name}, + is_none(lrp.peer). + +/* + * Reachable routers. + * + * Each row in the relation indicates that routers 'a' and 'b' can reach each + * other directly or indirectly through any chain of logical routers and + * switches. + * + * This relation is symmetric: if (a,b) then (b,a). 
+ * This relation is reflexive: (a,a) is always true. + */ +relation ReachableLogicalRouter(a: uuid, b: uuid) +ReachableLogicalRouter(a, b) :- + PeerLogicalRouter(a, c), + ReachableLogicalRouter(c, b). +ReachableLogicalRouter(a, b) :- + FirstHopLogicalRouter(a, ls), + FirstHopLogicalRouter(b, ls). +ReachableLogicalRouter(a, b) :- + ReachableLogicalRouter(a, c), + ReachableLogicalRouter(c, b). +ReachableLogicalRouter(a, a) :- ReachableLogicalRouter(a, _). + +// ha_chassis_group and gateway_chassis may not both be present. +Warning[message] :- + lrp in nb::Logical_Router_Port(), + is_some(lrp.ha_chassis_group), + not set_is_empty(lrp.gateway_chassis), + var message = "Both ha_chassis_group and gateway_chassis configured on " + "port ${lrp.name}; ignoring the latter". + +// A distributed gateway port cannot also be an L3 gateway router. +Warning[message] :- + lrp in nb::Logical_Router_Port(), + is_some(lrp.ha_chassis_group) + or not set_is_empty(lrp.gateway_chassis), + map_contains_key(lrp.options, "chassis"), + var message = "Bad configuration: distributed gateway port configured on " + "port ${lrp.name} on L3 gateway router". + +/* DistributedGatewayPortCandidate. + * + * Each row pairs a logical router with its distributed gateway port, + * but without checking that there is at most one DGP per LR. + * + * (Use DistributedGatewayPort instead, since it guarantees uniqueness.) */ +relation DistributedGatewayPortCandidate(lr_uuid: uuid, lrp_uuid: uuid) +DistributedGatewayPortCandidate(lr_uuid, lrp_uuid) :- + lr in nb::Logical_Router(._uuid = lr_uuid), + LogicalRouterPort(lrp_uuid, lr._uuid), + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid), + not map_contains_key(lrp.options, "chassis"), + var has_hcg = is_some(lrp.ha_chassis_group), + var has_gc = not set_is_empty(lrp.gateway_chassis), + has_hcg or has_gc. 
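ReachableLogicalRouter is the symmetric, reflexive, transitive closure over two edge sets: direct router peerings and "share a first-hop switch". A small Python fixpoint sketch of the same closure (illustrative names; DDlog evaluates these rules incrementally rather than by re-running a loop):

```python
def reachable_routers(peers, first_hops):
    """peers: [(router_a, router_b)] direct peerings;
    first_hops: [(router, switch)] first-hop attachments.
    Returns the set of reachable (a, b) pairs."""
    edges = set()
    for a, b in peers:                    # peerings are symmetric
        edges |= {(a, b), (b, a)}
    by_switch = {}
    for r, ls in first_hops:              # routers sharing a switch reach each other
        by_switch.setdefault(ls, set()).add(r)
    for rs in by_switch.values():
        edges |= {(a, b) for a in rs for b in rs}
    changed = True
    while changed:                        # transitive closure to fixpoint
        new = {(a, d) for a, b in edges for c, d in edges if b == c}
        changed = not new <= edges
        edges |= new
    return edges | {(a, a) for pair in edges for a in pair}  # reflexive
```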
+Warning[message] :- + DistributedGatewayPortCandidate(lr_uuid, lrp_uuid), + var lrps = lrp_uuid.group_by(lr_uuid).to_set(), + set_size(lrps) > 1, + lr in nb::Logical_Router(._uuid = lr_uuid), + var message = "Bad configuration: multiple distributed gateway ports on " + "logical router ${lr.name}; ignoring all of them". + +/* Distributed gateway ports. + * + * Each row means 'lrp' is the distributed gateway port on 'lr_uuid'. + * + * There is at most one distributed gateway port per logical router. */ +relation DistributedGatewayPort(lrp: nb::Logical_Router_Port, lr_uuid: uuid) +DistributedGatewayPort(lrp, lr_uuid) :- + DistributedGatewayPortCandidate(lr_uuid, lrp_uuid), + var lrps = lrp_uuid.group_by(lr_uuid).to_set(), + set_size(lrps) == 1, + Some{var lrp_uuid} = set_nth(lrps, 0), + lrp in nb::Logical_Router_Port(._uuid = lrp_uuid). + +/* HAChassis is an abstraction over nb::Gateway_Chassis and nb::HA_Chassis, which + * are different ways to represent the same configuration. Each row is + * effectively one HA_Chassis record. (Usually, we could associate each + * row with a particular 'lr_uuid', but it's permissible for more than one + * logical router to use an HA chassis group, so we omit it so that multiple + * references get merged.) + * + * nb::Gateway_Chassis has an "options" column that this omits because + * nb::HA_Chassis doesn't have anything similar. That's OK because no options + * were ever defined.
*/ +relation HAChassis(hacg_uuid: uuid, + hac_uuid: uuid, + chassis_name: string, + priority: integer, + external_ids: Map<string,string>) +HAChassis(ha_chassis_group_uuid(lrp._uuid), gw_chassis_uuid, + chassis_name, priority, external_ids) :- + DistributedGatewayPort(.lrp = lrp), + is_none(lrp.ha_chassis_group), + var gw_chassis_uuid = FlatMap(lrp.gateway_chassis), + nb::Gateway_Chassis(._uuid = gw_chassis_uuid, + .chassis_name = chassis_name, + .priority = priority, + .external_ids = eids), + var external_ids = map_insert_imm(eids, "chassis-name", chassis_name). +HAChassis(ha_chassis_group_uuid(ha_chassis_group._uuid), ha_chassis_uuid, + chassis_name, priority, external_ids) :- + DistributedGatewayPort(.lrp = lrp), + Some{var hac_group_uuid} = lrp.ha_chassis_group, + ha_chassis_group in nb::HA_Chassis_Group(._uuid = hac_group_uuid), + var ha_chassis_uuid = FlatMap(ha_chassis_group.ha_chassis), + nb::HA_Chassis(._uuid = ha_chassis_uuid, + .chassis_name = chassis_name, + .priority = priority, + .external_ids = eids), + var external_ids = map_insert_imm(eids, "chassis-name", chassis_name). + +/* HAChassisGroup is an abstraction for sb::HA_Chassis_Group that papers over + * the two southbound ways to configure it via nb::Gateway_Chassis and + * nb::HA_Chassis. The former configuration method does not provide a name or + * external_ids for the group (only for individual chassis), so we generate + * them. + * + * (Usually, we could associate each row with a particular 'lr_uuid', but it's + * permissible for more than one logical router to use an HA chassis group, so + * we omit it so that multiple references get merged.) + */ +relation HAChassisGroup(uuid: uuid, + name: string, + external_ids: Map<string,string>) +HAChassisGroup(ha_chassis_group_uuid(lrp._uuid), lrp.name, map_empty()) :- + DistributedGatewayPort(.lrp = lrp), + is_none(lrp.ha_chassis_group), + not set_is_empty(lrp.gateway_chassis).
+HAChassisGroup(ha_chassis_group_uuid(hac_group_uuid), + name, external_ids) :- + DistributedGatewayPort(.lrp = lrp), + Some{var hac_group_uuid} = lrp.ha_chassis_group, + nb::HA_Chassis_Group(._uuid = hac_group_uuid, + .name = name, + .external_ids = external_ids). + +/* Each row maps from a logical router to the name of its HAChassisGroup. + * This level of indirection is needed because multiple logical routers + * are allowed to reference a given HAChassisGroup. */ +relation LogicalRouterHAChassisGroup(lr_uuid: uuid, + hacg_uuid: uuid) +LogicalRouterHAChassisGroup(lr_uuid, ha_chassis_group_uuid(lrp._uuid)) :- + DistributedGatewayPort(lrp, lr_uuid), + is_none(lrp.ha_chassis_group), + set_size(lrp.gateway_chassis) > 0. +LogicalRouterHAChassisGroup(lr_uuid, + ha_chassis_group_uuid(hac_group_uuid)) :- + DistributedGatewayPort(lrp, lr_uuid), + Some{var hac_group_uuid} = lrp.ha_chassis_group, + nb::HA_Chassis_Group(._uuid = hac_group_uuid). + + +/* For each router port, tracks whether it's a redirect port of its router */ +relation RouterPortIsRedirect(lrp: uuid, is_redirect: bool) +RouterPortIsRedirect(lrp, true) :- DistributedGatewayPort(nb::Logical_Router_Port{._uuid = lrp}, _). +RouterPortIsRedirect(lrp, false) :- + nb::Logical_Router_Port(._uuid = lrp), + not DistributedGatewayPort(nb::Logical_Router_Port{._uuid = lrp}, _). + +relation LogicalRouterRedirectPort(lr: uuid, has_redirect_port: Option<nb::Logical_Router_Port>) + +LogicalRouterRedirectPort(lr, Some{lrp}) :- + DistributedGatewayPort(lrp, lr). + +LogicalRouterRedirectPort(lr, None) :- + nb::Logical_Router(._uuid = lr), + not DistributedGatewayPort(_, lr).
+ +typedef ExceptionalExtIps = AllowedExtIps{ips: Ref<nb::Address_Set>} + | ExemptedExtIps{ips: Ref<nb::Address_Set>} + +typedef NAT = NAT{ + nat: Ref<nb::NAT>, + external_ip: v46_ip, + external_mac: Option<eth_addr>, + exceptional_ext_ips: Option<ExceptionalExtIps> +} + +relation LogicalRouterNAT0( + lr: uuid, + nat: Ref<nb::NAT>, + external_ip: v46_ip, + external_mac: Option<eth_addr>) +LogicalRouterNAT0(lr, nat, external_ip, external_mac) :- + nb::Logical_Router(._uuid = lr, .nat = nats), + var nat_uuid = FlatMap(nats), + nat in &NATRef[nb::NAT{._uuid = nat_uuid}], + Some{var external_ip} = ip46_parse(nat.external_ip), + var external_mac = match (nat.external_mac) { + Some{s} -> eth_addr_from_string(s), + None -> None + }. +Warning["Bad ip address ${nat.external_ip} in nat configuration for router ${lr_name}."] :- + nb::Logical_Router(._uuid = lr, .nat = nats, .name = lr_name), + var nat_uuid = FlatMap(nats), + nat in &NATRef[nb::NAT{._uuid = nat_uuid}], + None = ip46_parse(nat.external_ip). +Warning["Bad MAC address ${s} in nat configuration for router ${lr_name}."] :- + nb::Logical_Router(._uuid = lr, .nat = nats, .name = lr_name), + var nat_uuid = FlatMap(nats), + nat in &NATRef[nb::NAT{._uuid = nat_uuid}], + Some{var s} = nat.external_mac, + None = eth_addr_from_string(s). + +relation LogicalRouterNAT(lr: uuid, nat: NAT) +LogicalRouterNAT(lr, NAT{nat, external_ip, external_mac, None}) :- + LogicalRouterNAT0(lr, nat, external_ip, external_mac), + nat.allowed_ext_ips.is_none(), + nat.exempted_ext_ips.is_none(). +LogicalRouterNAT(lr, NAT{nat, external_ip, external_mac, Some{AllowedExtIps{__as}}}) :- + LogicalRouterNAT0(lr, nat, external_ip, external_mac), + nat.exempted_ext_ips.is_none(), + Some{var __as_uuid} = nat.allowed_ext_ips, + __as in &AddressSetRef[nb::Address_Set{._uuid = __as_uuid}]. 
+LogicalRouterNAT(lr, NAT{nat, external_ip, external_mac, Some{ExemptedExtIps{__as}}}) :- + LogicalRouterNAT0(lr, nat, external_ip, external_mac), + nat.allowed_ext_ips.is_none(), + Some{var __as_uuid} = nat.exempted_ext_ips, + __as in &AddressSetRef[nb::Address_Set{._uuid = __as_uuid}]. +Warning["NAT rule: ${nat._uuid} not applied, since " + "both allowed and exempt external ips set"] :- + LogicalRouterNAT0(lr, nat, _, _), + nat.allowed_ext_ips.is_some() and nat.exempted_ext_ips.is_some(). + +relation LogicalRouterNATs(lr: uuid, nat: Vec<NAT>) + +LogicalRouterNATs(lr, nats) :- + LogicalRouterNAT(lr, nat), + var nats = nat.group_by(lr).to_vec(). + +LogicalRouterNATs(lr, vec_empty()) :- + nb::Logical_Router(._uuid = lr), + not LogicalRouterNAT(lr, _). + +/* For each router, collect the set of IPv4 and IPv6 addresses used for SNAT, + * which includes: + * + * - dnat_force_snat_addrs + * - lb_force_snat_addrs + * - IP addresses used in the router's attached NAT rules + * + * This is like init_nat_entries() in ovn-northd.c. */ +relation LogicalRouterSnatIP(lr: uuid, snat_ip: v46_ip, nat: Option<NAT>) +LogicalRouterSnatIP(lr._uuid, force_snat_ip, None) :- + lr in nb::Logical_Router(), + var dnat_force_snat_ips = get_force_snat_ip(lr, "dnat"), + var lb_force_snat_ips = get_force_snat_ip(lr, "lb"), + var force_snat_ip = FlatMap(dnat_force_snat_ips.union(lb_force_snat_ips)). +LogicalRouterSnatIP(lr, snat_ip, Some{nat}) :- + LogicalRouterNAT(lr, nat@NAT{.nat = &nb::NAT{.__type = "snat"}, .external_ip = snat_ip}).
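The LogicalRouterSnatIP rows are then folded into a per-router map from SNAT IP to the set of NAT rules using it, merging sets when the same IP appears more than once. In Python terms, the set-union grouping looks roughly like this (illustrative only):

```python
def group_to_setunionmap(rows):
    """rows: iterable of (key, set) pairs.
    Collapse them into one map, unioning sets per key."""
    out = {}
    for key, value in rows:
        out[key] = out.get(key, set()) | value
    return out
```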
+ +function group_to_setunionmap(g: Group<'K1, ('K2,Set<'V>)>): Map<'K2,Set<'V>> { + var map = map_empty(); + for (entry in g) { + (var key, var value) = entry; + match (map.get(key)) { + None -> map.insert(key, value), + Some{old_value} -> map.insert(key, old_value.union(value)) + } + }; + map +} +relation LogicalRouterSnatIPs(lr: uuid, snat_ips: Map<v46_ip, Set<NAT>>) +LogicalRouterSnatIPs(lr, snat_ips) :- + LogicalRouterSnatIP(lr, snat_ip, nat), + var snat_ips = (snat_ip, nat.to_set()).group_by(lr).group_to_setunionmap(). +LogicalRouterSnatIPs(lr._uuid, map_empty()) :- + lr in nb::Logical_Router(), + not LogicalRouterSnatIP(.lr = lr._uuid). + +relation LogicalRouterLB(lr: uuid, nat: Ref<nb::Load_Balancer>) + +LogicalRouterLB(lr, lb) :- + nb::Logical_Router(._uuid = lr, .load_balancer = lbs), + var lb_uuid = FlatMap(lbs), + lb in &LoadBalancerRef[nb::Load_Balancer{._uuid = lb_uuid}]. + +relation LogicalRouterLBs(lr: uuid, nat: Vec<Ref<nb::Load_Balancer>>) + +LogicalRouterLBs(lr, lbs) :- + LogicalRouterLB(lr, lb), + var lbs = lb.group_by(lr).to_vec(). + +LogicalRouterLBs(lr, vec_empty()) :- + nb::Logical_Router(._uuid = lr), + not LogicalRouterLB(lr, _). + +/* Router relation collects all attributes of a logical router. + * + * `lr` - Logical_Router record from the NB database + * `l3dgw_port` - optional redirect port (see `DistributedGatewayPort`) + * `redirect_port_name` - derived redirect port name (or empty string if + * router does not have a redirect port) + * `is_gateway` - true iff the router is a gateway router. Together with + * `l3dgw_port`, this flag affects the generation of various flows + * related to NAT and load balancing. 
+ * `learn_from_arp_request` - whether ARP requests to addresses on the router + * should always be learned + */ + +function chassis_redirect_name(port_name: string): string = "cr-${port_name}" + +relation &Router( + lr: nb::Logical_Router, + l3dgw_port: Option<nb::Logical_Router_Port>, + redirect_port_name: string, + is_gateway: bool, + nats: Vec<NAT>, + snat_ips: Map<v46_ip, Set<NAT>>, + lbs: Vec<Ref<nb::Load_Balancer>>, + mcast_cfg: Ref<McastRouterCfg>, + learn_from_arp_request: bool +) + +&Router(.lr = lr, + .l3dgw_port = l3dgw_port, + .redirect_port_name = + match (l3dgw_port) { + Some{rport} -> json_string_escape(chassis_redirect_name(rport.name)), + _ -> "" + }, + .is_gateway = is_some(map_get(lr.options, "chassis")), + .nats = nats, + .snat_ips = snat_ips, + .lbs = lbs, + .mcast_cfg = mcast_cfg, + .learn_from_arp_request = learn_from_arp_request) :- + lr in nb::Logical_Router(), + lr.is_enabled(), + LogicalRouterRedirectPort(lr._uuid, l3dgw_port), + LogicalRouterNATs(lr._uuid, nats), + LogicalRouterLBs(lr._uuid, lbs), + LogicalRouterSnatIPs(lr._uuid, snat_ips), + mcast_cfg in &McastRouterCfg(.datapath = lr._uuid), + var learn_from_arp_request = map_get_bool_def(lr.options, "always_learn_from_arp_request", true). + +/* RouterLB: many-to-many relation between logical routers and nb::LB */ +relation RouterLB(router: Ref<Router>, lb: Ref<nb::Load_Balancer>) + +RouterLB(router, lb) :- + router in &Router(.lbs = lbs), + var lb = FlatMap(lbs). + +/* Load balancer VIPs associated with routers */ +relation RouterLBVIP( + router: Ref<Router>, + lb: Ref<nb::Load_Balancer>, + vip: string, + backends: string) + +RouterLBVIP(router, lb, vip, backends) :- + RouterLB(router, lb@(&nb::Load_Balancer{.vips = vips})), + var kv = FlatMap(vips), + (var vip, var backends) = kv. 
+ +/* Router-to-router logical port connections */ +relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: string) + +RouterRouterPeer(rport1, rport2, peer_name) :- + nb::Logical_Router_Port(._uuid = rport1, .peer = peer), + Some{var peer_name} = peer, + nb::Logical_Router_Port(._uuid = rport2, .name = peer_name). + +/* Router port can peer with another router port, a switch port, or have + * no peer. + */ +typedef RouterPeer = PeerRouter{rport: uuid, name: string} + | PeerSwitch{sport: uuid, name: string} + | PeerNone + +function router_peer_name(peer: RouterPeer): Option<string> = { + match (peer) { + PeerRouter{_, n} -> Some{n}, + PeerSwitch{_, n} -> Some{n}, + PeerNone -> None + } +} + +relation RouterPortPeer(rport: uuid, peer: RouterPeer) + +/* Switch-to-router logical port connections */ +RouterPortPeer(rport, PeerSwitch{sport, sport_name}) :- + SwitchRouterPeer(sport, sport_name, rport). + +RouterPortPeer(rport1, PeerRouter{rport2, rport2_name}) :- + RouterRouterPeer(rport1, rport2, rport2_name). + +RouterPortPeer(rport, PeerNone) :- + nb::Logical_Router_Port(._uuid = rport), + not SwitchRouterPeer(_, _, rport), + not RouterRouterPeer(rport, _, _). + +/* Each row maps from a Logical_Router port to the input options in its + * corresponding Port_Binding (if any). This is because northd preserves + * most of the options in that column. (northd unconditionally sets the + * ipv6_prefix_delegation and ipv6_prefix options, so we remove them for + * faster convergence.) */ +relation RouterPortSbOptions(lrp_uuid: uuid, options: Map<string,string>) +RouterPortSbOptions(lrp._uuid, options) :- + lrp in nb::Logical_Router_Port(), + pb in sb::Port_Binding(._uuid = lrp._uuid), + var options = { + var options = pb.options; + map_remove(options, "ipv6_prefix"); + map_remove(options, "ipv6_prefix_delegation"); + options + }. +RouterPortSbOptions(lrp._uuid, map_empty()) :- + lrp in nb::Logical_Router_Port(), + not sb::Port_Binding(._uuid = lrp._uuid).
+ +/* FIXME: what should happen when extract_lrp_networks fails? */ +/* RouterPort relation collects all attributes of a logical router port */ +relation &RouterPort( + lrp: nb::Logical_Router_Port, + json_name: string, + networks: lport_addresses, + router: Ref<Router>, + is_redirect: bool, + peer: RouterPeer, + mcast_cfg: Ref<McastPortCfg>, + sb_options: Map<string,string>) + +&RouterPort(.lrp = lrp, + .json_name = json_string_escape(lrp.name), + .networks = networks, + .router = router, + .is_redirect = is_redirect, + .peer = peer, + .mcast_cfg = mcast_cfg, + .sb_options = sb_options) :- + nb::Logical_Router_Port[lrp], + Some{var networks} = extract_lrp_networks(lrp.mac, lrp.networks), + LogicalRouterPort(lrp._uuid, lrouter_uuid), + router in &Router(.lr = nb::Logical_Router{._uuid = lrouter_uuid}), + RouterPortIsRedirect(lrp._uuid, is_redirect), + RouterPortPeer(lrp._uuid, peer), + mcast_cfg in &McastPortCfg(.port = lrp._uuid, .router_port = true), + RouterPortSbOptions(lrp._uuid, sb_options). + +relation RouterPortNetworksIPv4Addr(port: Ref<RouterPort>, addr: ipv4_netaddr) + +RouterPortNetworksIPv4Addr(port, addr) :- + port in &RouterPort(.networks = networks), + var addr = FlatMap(networks.ipv4_addrs). + +relation RouterPortNetworksIPv6Addr(port: Ref<RouterPort>, addr: ipv6_netaddr) + +RouterPortNetworksIPv6Addr(port, addr) :- + port in &RouterPort(.networks = networks), + var addr = FlatMap(networks.ipv6_addrs). + +/* StaticRoute: Collects and parses attributes of a static route. 
*/ +typedef route_policy = SrcIp | DstIp +function route_policy_from_string(s: Option<string>): route_policy = { + match (s) { + Some{"src-ip"} -> SrcIp, + _ -> DstIp + } +} +function to_string(policy: route_policy): string = { + match (policy) { + SrcIp -> "src-ip", + DstIp -> "dst-ip" + } +} + +typedef route_key = RouteKey { + policy: route_policy, + ip_prefix: v46_ip, + plen: bit<32> +} + +relation &StaticRoute(lrsr: nb::Logical_Router_Static_Route, + key: route_key, + nexthop: v46_ip, + output_port: Option<string>, + ecmp_symmetric_reply: bool) + +&StaticRoute(.lrsr = lrsr, + .key = RouteKey{policy, ip_prefix, plen}, + .nexthop = nexthop, + .output_port = lrsr.output_port, + .ecmp_symmetric_reply = esr) :- + lrsr in nb::Logical_Router_Static_Route(), + var policy = route_policy_from_string(lrsr.policy), + Some{(var nexthop, var nexthop_plen)} = ip46_parse_cidr(lrsr.nexthop), + match (nexthop) { + IPv4{_} -> nexthop_plen == 32, + IPv6{_} -> nexthop_plen == 128 + }, + Some{(var ip_prefix, var plen)} = ip46_parse_cidr(lrsr.ip_prefix), + match ((nexthop, ip_prefix)) { + (IPv4{_}, IPv4{_}) -> true, + (IPv6{_}, IPv6{_}) -> true, + _ -> false + }, + var esr = map_get_bool_def(lrsr.options, "ecmp_symmetric_reply", false). + +/* Returns the IP address of the router port 'op' that + * overlaps with 'ip'. If one is not found, returns None. */ +function find_lrp_member_ip(networks: lport_addresses, ip: v46_ip): Option<v46_ip> = +{ + match (ip) { + IPv4{ip4} -> { + for (na in networks.ipv4_addrs) { + if (ip_same_network((na.addr, ip4), ipv4_netaddr_mask(na))) { + /* There should be only 1 interface that matches the + * supplied IP. Otherwise, it's a configuration error, + * because subnets of a router's interfaces should NOT + * overlap. 
*/ + return Some{IPv4{na.addr}} + } + }; + return None + }, + IPv6{ip6} -> { + for (na in networks.ipv6_addrs) { + if (ipv6_same_network((na.addr, ip6), ipv6_netaddr_mask(na))) { + /* There should be only 1 interface that matches the + * supplied IP. Otherwise, it's a configuration error, + * because subnets of a router's interfaces should NOT + * overlap. */ + return Some{IPv6{na.addr}} + } + }; + return None + } + } +} + + +/* Step 1: compute router-route pairs */ +relation RouterStaticRoute_( + router : Ref<Router>, + key : route_key, + nexthop : v46_ip, + output_port : Option<string>, + ecmp_symmetric_reply : bool) + +RouterStaticRoute_(.router = router, + .key = route.key, + .nexthop = route.nexthop, + .output_port = route.output_port, + .ecmp_symmetric_reply = route.ecmp_symmetric_reply) :- + router in &Router(.lr = nb::Logical_Router{.static_routes = routes}), + var route_id = FlatMap(routes), + route in &StaticRoute(.lrsr = nb::Logical_Router_Static_Route{._uuid = route_id}). + +/* Step-2: compute output_port for each pair */ +typedef route_dst = RouteDst { + nexthop: v46_ip, + src_ip: v46_ip, + port: Ref<RouterPort>, + ecmp_symmetric_reply: bool +} + +relation RouterStaticRoute( + router : Ref<Router>, + key : route_key, + dsts : Set<route_dst>) + +RouterStaticRoute(router, key, dsts) :- + RouterStaticRoute_(.router = router, + .key = key, + .nexthop = nexthop, + .output_port = None, + .ecmp_symmetric_reply = ecmp_symmetric_reply), + /* output_port is not specified, find the + * router port matching the next hop. */ + port in &RouterPort(.router = &Router{.lr = nb::Logical_Router{._uuid = router.lr._uuid}}, + .networks = networks), + Some{var src_ip} = find_lrp_member_ip(networks, nexthop), + var dst = RouteDst{nexthop, src_ip, port, ecmp_symmetric_reply}, + var dsts = dst.group_by((router, key)).to_set(). 
+ +RouterStaticRoute(router, key, dsts) :- + RouterStaticRoute_(.router = router, + .key = key, + .nexthop = nexthop, + .output_port = Some{oport}, + .ecmp_symmetric_reply = ecmp_symmetric_reply), + /* output_port specified */ + port in &RouterPort(.lrp = nb::Logical_Router_Port{.name = oport}, + .networks = networks), + Some{var src_ip} = match (find_lrp_member_ip(networks, nexthop)) { + Some{src_ip} -> Some{src_ip}, + None -> { + /* There are no IP networks configured on the router's port via + * which 'route->nexthop' is theoretically reachable. But since + * 'out_port' has been specified, we honor it by trying to reach + * 'route->nexthop' via the first IP address of 'out_port'. + * (There are cases, e.g in GCE, where each VM gets a /32 IP + * address and the default gateway is still reachable from it.) */ + match (key.ip_prefix) { + IPv4{_} -> match (vec_nth(networks.ipv4_addrs, 0)) { + Some{addr} -> Some{IPv4{addr.addr}}, + None -> { + warn("No path for static route ${key.ip_prefix}; next hop ${nexthop}"); + None + } + }, + IPv6{_} -> match (vec_nth(networks.ipv6_addrs, 0)) { + Some{addr} -> Some{IPv6{addr.addr}}, + None -> { + warn("No path for static route ${key.ip_prefix}; next hop ${nexthop}"); + None + } + } + } + } + }, + var dsts = set_singleton(RouteDst{nexthop, src_ip, port, ecmp_symmetric_reply}). + +Warning[message] :- + RouterStaticRoute_(.router = router, .key = key, .nexthop = nexthop), + not RouterStaticRoute(.router = router, .key = key), + var message = "No path for ${key.policy} static route ${key.ip_prefix}/${key.plen} with next hop ${nexthop}". diff --git a/northd/lswitch.dl b/northd/lswitch.dl new file mode 100644 index 000000000000..9a2d4c1c8d4b --- /dev/null +++ b/northd/lswitch.dl @@ -0,0 +1,643 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import OVN_Northbound as nb +import OVN_Southbound as sb +import ovsdb +import ovn +import lrouter +import multicast +import helpers +import ipam + +function is_enabled(lsp: nb::Logical_Switch_Port): bool { is_enabled(lsp.enabled) } +function is_enabled(lsp: Ref<nb::Logical_Switch_Port>): bool { lsp.deref().is_enabled() } +function is_enabled(sp: SwitchPort): bool { sp.lsp.is_enabled() } +function is_enabled(sp: Ref<SwitchPort>): bool { sp.lsp.is_enabled() } + +relation SwitchRouterPeerRef(lsp: uuid, rport: Option<Ref<RouterPort>>) + +SwitchRouterPeerRef(lsp, Some{rport}) :- + SwitchRouterPeer(lsp, _, lrp), + rport in &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp}). + +SwitchRouterPeerRef(lsp, None) :- + nb::Logical_Switch_Port(._uuid = lsp), + not SwitchRouterPeer(lsp, _, _). + +/* map logical ports to logical switches */ +relation LogicalSwitchPort(lport: uuid, lswitch: uuid) + +LogicalSwitchPort(lport, lswitch) :- + nb::Logical_Switch(._uuid = lswitch, .ports = ports), + var lport = FlatMap(ports). + +/* Logical switches that have enabled ports with "unknown" address */ +relation LogicalSwitchUnknownPorts(ls: uuid, port_ids: Set<uuid>) + +LogicalSwitchUnknownPorts(ls_uuid, port_ids) :- + &SwitchPort(.lsp = lsp, .sw = &Switch{.ls = ls}), + lsp.is_enabled() and set_contains(lsp.addresses, "unknown"), + var ls_uuid = ls._uuid, + var port_ids = lsp._uuid.group_by(ls_uuid).to_set(). 
+ +/* PortStaticAddresses: static IP addresses associated with each Logical_Switch_Port */ +relation PortStaticAddresses(lsport: uuid, ip4addrs: Set<string>, ip6addrs: Set<string>) + +PortStaticAddresses(.lsport = port_uuid, + .ip4addrs = set_unions(ip4_addrs), + .ip6addrs = set_unions(ip6_addrs)) :- + nb::Logical_Switch_Port(._uuid = port_uuid, .addresses = addresses), + var address = FlatMap(if (set_is_empty(addresses)) { set_singleton("") } else { addresses }), + (var ip4addrs, var ip6addrs) = if (not is_dynamic_lsp_address(address)) { + split_addresses(address) + } else { (set_empty(), set_empty()) }, + var static_addrs = (ip4addrs, ip6addrs).group_by(port_uuid).group_unzip(), + (var ip4_addrs, var ip6_addrs) = static_addrs. + +relation PortInGroup(port: uuid, group: uuid) + +PortInGroup(port, group) :- + nb::Port_Group(._uuid = group, .ports = ports), + var port = FlatMap(ports). + +/* All ACLs associated with logical switch */ +relation LogicalSwitchACL(ls: uuid, acl: uuid) + +LogicalSwitchACL(ls, acl) :- + nb::Logical_Switch(._uuid = ls, .acls = acls), + var acl = FlatMap(acls). + +LogicalSwitchACL(ls, acl) :- + nb::Logical_Switch(._uuid = ls, .ports = ports), + var port_id = FlatMap(ports), + PortInGroup(port_id, group_id), + nb::Port_Group(._uuid = group_id, .acls = acls), + var acl = FlatMap(acls). + +relation LogicalSwitchStatefulACL(ls: uuid, acl: uuid) + +LogicalSwitchStatefulACL(ls, acl) :- + LogicalSwitchACL(ls, acl), + nb::ACL(._uuid = acl, .action = "allow-related"). + +relation LogicalSwitchHasStatefulACL(ls: uuid, has_stateful_acl: bool) + +LogicalSwitchHasStatefulACL(ls, true) :- + LogicalSwitchStatefulACL(ls, _). + +LogicalSwitchHasStatefulACL(ls, false) :- + nb::Logical_Switch(._uuid = ls), + not LogicalSwitchStatefulACL(ls, _). 
+ +relation LogicalSwitchLocalnetPort0(ls_uuid: uuid, lsp_name: string) +LogicalSwitchLocalnetPort0(ls_uuid, lsp_name) :- + ls in nb::Logical_Switch(._uuid = ls_uuid), + var lsp_uuid = FlatMap(ls.ports), + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), + lsp.__type == "localnet", + var lsp_name = lsp.name. + +relation LogicalSwitchLocalnetPorts(ls_uuid: uuid, localnet_port_names: Vec<string>) +LogicalSwitchLocalnetPorts(ls_uuid, localnet_port_names) :- + LogicalSwitchLocalnetPort0(ls_uuid, lsp_name), + var localnet_port_names = lsp_name.group_by(ls_uuid).to_vec(). +LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :- + ls in nb::Logical_Switch(), + var ls_uuid = ls._uuid, + not LogicalSwitchLocalnetPort0(ls_uuid, _). + +/* Flatten the list of dns_records in Logical_Switch */ +relation LogicalSwitchDNS(ls_uuid: uuid, dns_uuid: uuid) + +LogicalSwitchDNS(ls._uuid, dns_uuid) :- + nb::Logical_Switch[ls], + var dns_uuid = FlatMap(ls.dns_records), + nb::DNS(._uuid = dns_uuid). + +relation LogicalSwitchWithDNSRecords(ls: uuid) + +LogicalSwitchWithDNSRecords(ls) :- + LogicalSwitchDNS(ls, dns_uuid), + nb::DNS(._uuid = dns_uuid, .records = records), + not map_is_empty(records). + +relation LogicalSwitchHasDNSRecords(ls: uuid, has_dns_records: bool) + +LogicalSwitchHasDNSRecords(ls, true) :- + LogicalSwitchWithDNSRecords(ls). + +LogicalSwitchHasDNSRecords(ls, false) :- + nb::Logical_Switch(._uuid = ls), + not LogicalSwitchWithDNSRecords(ls). + +relation LogicalSwitchHasNonRouterPort0(ls: uuid) +LogicalSwitchHasNonRouterPort0(ls_uuid) :- + ls in nb::Logical_Switch(._uuid = ls_uuid), + var lsp_uuid = FlatMap(ls.ports), + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), + lsp.__type != "router". + +relation LogicalSwitchHasNonRouterPort(ls: uuid, has_non_router_port: bool) +LogicalSwitchHasNonRouterPort(ls, true) :- + LogicalSwitchHasNonRouterPort0(ls). +LogicalSwitchHasNonRouterPort(ls, false) :- + nb::Logical_Switch(._uuid = ls), + not LogicalSwitchHasNonRouterPort0(ls). 
+ +/* Switch relation collects all attributes of a logical switch */ + +relation &Switch( + ls: nb::Logical_Switch, + has_stateful_acl: bool, + has_lb_vip: bool, + has_dns_records: bool, + localnet_port_names: Vec<string>, + subnet: Option<(in_addr/*subnet*/, in_addr/*mask*/, bit<32>/*start_ipv4*/, bit<32>/*total_ipv4s*/)>, + ipv6_prefix: Option<in6_addr>, + mcast_cfg: Ref<McastSwitchCfg>, + is_vlan_transparent: bool, + + /* Does this switch have at least one port with type != "router"? */ + has_non_router_port: bool +) + +function ipv6_parse_prefix(s: string): Option<in6_addr> { + if (string_contains(s, "/")) { + match (ipv6_parse_cidr(s)) { + Right{(addr, 64)} -> Some{addr}, + _ -> None + } + } else { + ipv6_parse(s) + } +} + +&Switch(.ls = ls, + .has_stateful_acl = has_stateful_acl, + .has_lb_vip = has_lb_vip, + .has_dns_records = has_dns_records, + .localnet_port_names = localnet_port_names, + .subnet = subnet, + .ipv6_prefix = ipv6_prefix, + .mcast_cfg = mcast_cfg, + .has_non_router_port = has_non_router_port, + .is_vlan_transparent = is_vlan_transparent) :- + nb::Logical_Switch[ls], + LogicalSwitchHasStatefulACL(ls._uuid, has_stateful_acl), + LogicalSwitchHasLBVIP(ls._uuid, has_lb_vip), + LogicalSwitchHasDNSRecords(ls._uuid, has_dns_records), + LogicalSwitchLocalnetPorts(ls._uuid, localnet_port_names), + LogicalSwitchHasNonRouterPort(ls._uuid, has_non_router_port), + mcast_cfg in &McastSwitchCfg(.datapath = ls._uuid), + var subnet = + match (map_get(ls.other_config, "subnet")) { + None -> None, + Some{subnet_str} -> { + match (ip_parse_masked(subnet_str)) { + Left{err} -> { + warn("bad 'subnet' ${subnet_str}"); + None + }, + Right{(subnet, mask)} -> { + if (ip_count_cidr_bits(mask) == Some{32} + or not ip_is_cidr(mask)) { + warn("bad 'subnet' ${subnet_str}"); + None + } else { + Some{(subnet, mask, (iptohl(subnet) & iptohl(mask)) + 1, ~iptohl(mask))} + } + } + } + } + }, + var ipv6_prefix = + match (map_get(ls.other_config, "ipv6_prefix")) { + None -> None, + 
Some{prefix} -> ipv6_parse_prefix(prefix) + }, + var is_vlan_transparent = map_get_bool_def(ls.other_config, "vlan-passthru", false). + +/* SwitchLB: many-to-many relation between logical switches and nb::LB */ +relation SwitchLB(sw_uuid: uuid, lb: Ref<nb::Load_Balancer>) +SwitchLB(sw_uuid, lb) :- + nb::Logical_Switch(._uuid = sw_uuid, .load_balancer = lb_ids), + var lb_id = FlatMap(lb_ids), + lb in &LoadBalancerRef[nb::Load_Balancer{._uuid = lb_id}]. + +/* Load balancer VIPs associated with switch */ +relation SwitchLBVIP(sw_uuid: uuid, lb: Ref<nb::Load_Balancer>, vip: string, backends: string) +SwitchLBVIP(sw_uuid, lb, vip, backends) :- + SwitchLB(sw_uuid, lb@(&nb::Load_Balancer{.vips = vips})), + var kv = FlatMap(vips), + (var vip, var backends) = kv. + +relation LogicalSwitchHasLBVIP(sw_uuid: uuid, has_lb_vip: bool) +LogicalSwitchHasLBVIP(sw_uuid, true) :- + SwitchLBVIP(.sw_uuid = sw_uuid). +LogicalSwitchHasLBVIP(sw_uuid, false) :- + nb::Logical_Switch(._uuid = sw_uuid), + not SwitchLBVIP(.sw_uuid = sw_uuid). + +relation &LBVIP( + lb: Ref<nb::Load_Balancer>, + vip_key: string, + vip_addr: v46_ip, + vip_port: bit<16>, + backend_ips: string) + +&LBVIP(.lb = lb, + .vip_key = vip_key, + .vip_addr = vip_addr, + .vip_port = vip_port, + .backend_ips = backend_ips) :- + LoadBalancerRef[lb], + var vip = FlatMap(lb.vips), + (var vip_key, var backend_ips) = vip, + Some{(var vip_addr, var vip_port)} = ip_address_and_port_from_lb_key(vip_key). + +typedef svc_monitor = SvcMonitor{ + port_name: string, // Might name a switch or router port. 
+ src_ip: string +} + +relation &LBVIPBackend( + lbvip: Ref<LBVIP>, + ip: v46_ip, + port: bit<16>, + svc_monitor: Option<svc_monitor>) + +function parse_ip_port_mapping(mappings: Map<string,string>, ip: v46_ip) + : Option<svc_monitor> { + for (kv in mappings) { + (var key, var value) = kv; + if (ip46_parse(key) == Some{ip}) { + var strs = string_split(value, ":"); + if (vec_len(strs) != 2) { + return None + }; + + return match ((vec_nth(strs, 0), vec_nth(strs, 1))) { + (Some{port_name}, Some{src_ip}) -> Some{SvcMonitor{port_name, src_ip}}, + _ -> None + } + } + }; + return None +} + +&LBVIPBackend(.lbvip = lbvip, + .ip = ip, + .port = port, + .svc_monitor = svc_monitor) :- + LBVIP[lbvip], + var backend = FlatMap(string_split(lbvip.backend_ips, ",")), + Some{(var ip, var port)} = ip_address_and_port_from_lb_key(backend), + (var svc_monitor) = parse_ip_port_mapping(lbvip.lb.ip_port_mappings, ip). + +function is_online(status: Option<string>): bool = { + match (status) { + Some{s} -> s == "online", + _ -> true + } +} +function default_protocol(protocol: Option<string>): string = { + match (protocol) { + Some{x} -> x, + None -> "tcp" + } +} +relation &LBVIPBackendStatus( + port: bit<16>, + ip: v46_ip, + protocol: string, + logical_port: string, + up: bool) +&LBVIPBackendStatus(port, ip, protocol, logical_port, up) :- + sm in sb::Service_Monitor(), + var port = sm.port as bit<16>, + Some{var ip} = ip46_parse(sm.ip), + var protocol = default_protocol(sm.protocol), + var logical_port = sm.logical_port, + var up = is_online(sm.status). 
+&LBVIPBackendStatus(port, ip, protocol, logical_port, true) :- + LBVIPBackend[lbvipbackend], + var port = lbvipbackend.port as bit<16>, + var ip = lbvipbackend.ip, + var protocol = default_protocol(lbvipbackend.lbvip.lb.protocol), + Some{var svc_monitor} = lbvipbackend.svc_monitor, + var logical_port = svc_monitor.port_name, + not sb::Service_Monitor(.port = port as bit<64>, + .ip = "${ip}", + .protocol = Some{protocol}, + .logical_port = logical_port). + +/* SwitchPortDHCPv4Options: many-to-one relation between logical switch ports and DHCPv4 options */ +relation SwitchPortDHCPv4Options( + port: Ref<SwitchPort>, + dhcpv4_options: Ref<nb::DHCP_Options>) + +SwitchPortDHCPv4Options(port, options) :- + port in &SwitchPort(.lsp = lsp), + port.lsp.__type != "external", + Some{var dhcpv4_uuid} = lsp.dhcpv4_options, + options in &DHCP_OptionsRef[nb::DHCP_Options{._uuid = dhcpv4_uuid}]. + +/* SwitchPortDHCPv6Options: many-to-one relation between logical switch ports and DHCPv6 options */ +relation SwitchPortDHCPv6Options( + port: Ref<SwitchPort>, + dhcpv6_options: Ref<nb::DHCP_Options>) + +SwitchPortDHCPv6Options(port, options) :- + port in &SwitchPort(.lsp = lsp), + port.lsp.__type != "external", + Some{var dhcpv6_uuid} = lsp.dhcpv6_options, + options in &DHCP_OptionsRef[nb::DHCP_Options{._uuid = dhcpv6_uuid}]. + +/* SwitchQoS: many-to-one relation between logical switches and nb::QoS */ +relation SwitchQoS(sw: Ref<Switch>, qos: Ref<nb::QoS>) + +SwitchQoS(sw, qos) :- + sw in &Switch(.ls = nb::Logical_Switch{.qos_rules = qos_rules}), + var qos_rule = FlatMap(qos_rules), + qos in &QoSRef[nb::QoS{._uuid = qos_rule}]. + +/* SwitchACL: many-to-many relation between logical switches and ACLs */ +relation &SwitchACL(sw: Ref<Switch>, + acl: Ref<nb::ACL>) + +&SwitchACL(.sw = sw, .acl = acl) :- + LogicalSwitchACL(sw_uuid, acl_uuid), + sw in &Switch(.ls = nb::Logical_Switch{._uuid = sw_uuid}), + acl in &ACLRef[nb::ACL{._uuid = acl_uuid}].
+ +relation SwitchPortUp(lsp: uuid, up: bool) + +SwitchPortUp(lsp, up) :- + nb::Logical_Switch_Port(._uuid = lsp, .name = lsp_name, .__type = __type), + sb::Port_Binding(.logical_port = lsp_name, .chassis = chassis), + var up = + if (__type == "router") { + true + } else if (is_none(chassis)) { + false + } else { + true + }. + +SwitchPortUp(lsp, up) :- + nb::Logical_Switch_Port(._uuid = lsp, .name = lsp_name, .__type = __type), + not sb::Port_Binding(.logical_port = lsp_name), + var up = __type == "router". + +relation SwitchPortHAChassisGroup0(lsp_uuid: uuid, hac_group_uuid: uuid) +SwitchPortHAChassisGroup0(lsp_uuid, ha_chassis_group_uuid(ls_uuid)) :- + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), + lsp.__type == "external", + Some{var hac_group_uuid} = lsp.ha_chassis_group, + ha_chassis_group in nb::HA_Chassis_Group(._uuid = hac_group_uuid), + /* If the group is empty, then HA_Chassis_Group record will not be created in SB, + * and so we should not create a reference to the group in Port_Binding table, + * to avoid integrity violation. */ + not set_is_empty(ha_chassis_group.ha_chassis), + LogicalSwitchPort(.lport = lsp_uuid, .lswitch = ls_uuid). +relation SwitchPortHAChassisGroup(lsp_uuid: uuid, hac_group_uuid: Option<uuid>) +SwitchPortHAChassisGroup(lsp_uuid, Some{hac_group_uuid}) :- + SwitchPortHAChassisGroup0(lsp_uuid, hac_group_uuid). +SwitchPortHAChassisGroup(lsp_uuid, None) :- + lsp in nb::Logical_Switch_Port(._uuid = lsp_uuid), + not SwitchPortHAChassisGroup0(lsp_uuid, _). 
+ +/* SwitchPort relation collects all attributes of a logical switch port + * - `peer` - peer router port, if any + * - `static_dynamic_mac` - port has a "dynamic" address that contains a static MAC, + * e.g., "80:fa:5b:06:72:b7 dynamic" + * - `static_dynamic_ipv4`, `static_dynamic_ipv6` - port has a "dynamic" address that contains a static IP, + * e.g., "dynamic 192.168.1.2" + * - `needs_dynamic_ipv4address` - port requires a dynamically allocated IPv4 address + * - `needs_dynamic_ipv6address` - port requires a dynamically allocated IPv6 address + * - `needs_dynamic_macaddress` - port requires a dynamically allocated MAC address + * - `needs_dynamic_tag` - port requires a dynamically allocated tag + * - `up` - true if the port is bound to a chassis or has type "router" + * - 'hac_group_uuid' - uuid of sb::HA_Chassis_Group, only for "external" ports + */ +relation &SwitchPort( + lsp: nb::Logical_Switch_Port, + json_name: string, + sw: Ref<Switch>, + peer: Option<Ref<RouterPort>>, + static_addresses: Vec<lport_addresses>, + dynamic_address: Option<lport_addresses>, + static_dynamic_mac: Option<eth_addr>, + static_dynamic_ipv4: Option<in_addr>, + static_dynamic_ipv6: Option<in6_addr>, + ps_addresses: Vec<lport_addresses>, + ps_eth_addresses: Vec<string>, + parent_name: Option<string>, + needs_dynamic_ipv4address: bool, + needs_dynamic_macaddress: bool, + needs_dynamic_ipv6address: bool, + needs_dynamic_tag: bool, + up: bool, + mcast_cfg: Ref<McastPortCfg>, + hac_group_uuid: Option<uuid> +) + +&SwitchPort(.lsp = lsp, + .json_name = json_string_escape(lsp.name), + .sw = sw, + .peer = peer, + .static_addresses = static_addresses, + .dynamic_address = dynamic_address, + .static_dynamic_mac = static_dynamic_mac, + .static_dynamic_ipv4 = static_dynamic_ipv4, + .static_dynamic_ipv6 = static_dynamic_ipv6, + .ps_addresses = ps_addresses, + .ps_eth_addresses = ps_eth_addresses, + .parent_name = parent_name, + .needs_dynamic_ipv4address = needs_dynamic_ipv4address, + .needs_dynamic_macaddress = needs_dynamic_macaddress, + .needs_dynamic_ipv6address = 
needs_dynamic_ipv6address, + .needs_dynamic_tag = needs_dynamic_tag, + .up = up, + .mcast_cfg = mcast_cfg, + .hac_group_uuid = hac_group_uuid) :- + nb::Logical_Switch_Port[lsp], + LogicalSwitchPort(lsp._uuid, lswitch_uuid), + sw in &Switch(.ls = nb::Logical_Switch{._uuid = lswitch_uuid, .other_config = other_config}, + .subnet = subnet, + .ipv6_prefix = ipv6_prefix), + SwitchRouterPeerRef(lsp._uuid, peer), + SwitchPortUp(lsp._uuid, up), + mcast_cfg in &McastPortCfg(.port = lsp._uuid, .router_port = false), + var static_addresses = { + var static_addresses = vec_empty(); + for (addr in lsp.addresses) { + if ((addr != "router") and (not is_dynamic_lsp_address(addr))) { + match (extract_lsp_addresses(addr)) { + None -> (), + Some{lport_addr} -> vec_push(static_addresses, lport_addr) + } + } else () + }; + static_addresses + }, + var ps_addresses = { + var ps_addresses = vec_empty(); + for (addr in lsp.port_security) { + match (extract_lsp_addresses(addr)) { + None -> (), + Some{lport_addr} -> vec_push(ps_addresses, lport_addr) + } + }; + ps_addresses + }, + var ps_eth_addresses = { + var ps_eth_addresses = vec_empty(); + for (ps_addr in ps_addresses) { + vec_push(ps_eth_addresses, "${ps_addr.ea}") + }; + ps_eth_addresses + }, + var dynamic_address = match (lsp.dynamic_addresses) { + None -> None, + Some{lport_addr} -> extract_lsp_addresses(lport_addr) + }, + (var static_dynamic_mac, + var static_dynamic_ipv4, + var static_dynamic_ipv6, + var has_dyn_lsp_addr) = { + var dynamic_address_request = None; + for (addr in lsp.addresses) { + dynamic_address_request = parse_dynamic_address_request(addr); + if (is_some(dynamic_address_request)) { + break + } + }; + + match (dynamic_address_request) { + Some{DynamicAddressRequest{mac, ipv4, ipv6}} -> (mac, ipv4, ipv6, true), + None -> (None, None, None, false) + } + }, + var needs_dynamic_ipv4address = has_dyn_lsp_addr and is_none(peer) and is_some(subnet) and + is_none(static_dynamic_ipv4), + var needs_dynamic_macaddress = 
has_dyn_lsp_addr and is_none(peer) and is_none(static_dynamic_mac) and + (is_some(subnet) or is_some(ipv6_prefix) or + map_get(other_config, "mac_only") == Some{"true"}), + var needs_dynamic_ipv6address = has_dyn_lsp_addr and is_none(peer) and is_some(ipv6_prefix) and is_none(static_dynamic_ipv6), + var parent_name = match (lsp.parent_name) { + None -> None, + Some{pname} -> if (pname == "") { None } else { Some{pname} } + }, + /* Port needs dynamic tag if it has a parent and its `tag_request` is 0. */ + var needs_dynamic_tag = is_some(parent_name) and + lsp.tag_request == Some{0}, + SwitchPortHAChassisGroup(.lsp_uuid = lsp._uuid, + .hac_group_uuid = hac_group_uuid). + +/* Switch port port security addresses */ +relation SwitchPortPSAddresses(port: Ref<SwitchPort>, + ps_addrs: lport_addresses) + +SwitchPortPSAddresses(port, ps_addrs) :- + port in &SwitchPort(.ps_addresses = ps_addresses), + var ps_addrs = FlatMap(ps_addresses). + +/* All static addresses associated with a port parsed into + * the lport_addresses data structure */ +relation SwitchPortStaticAddresses(port: Ref<SwitchPort>, + addrs: lport_addresses) +SwitchPortStaticAddresses(port, addrs) :- + port in &SwitchPort(.static_addresses = static_addresses), + var addrs = FlatMap(static_addresses). + +/* All static and dynamic addresses associated with a port parsed into + * the lport_addresses data structure */ +relation SwitchPortAddresses(port: Ref<SwitchPort>, + addrs: lport_addresses) + +SwitchPortAddresses(port, addrs) :- SwitchPortStaticAddresses(port, addrs). + +SwitchPortAddresses(port, dynamic_address) :- + SwitchPortNewDynamicAddress(port, Some{dynamic_address}). + +/* "router" is a special Logical_Switch_Port address value that indicates that the Ethernet, IPv4, and IPv6 + * this port should be obtained from the connected logical router port, as specified by router-port in + * options. 
+ * + * The resulting addresses are used to populate the logical switch’s destination lookup, and also for the + * logical switch to generate ARP and ND replies. + * + * If the connected logical router port is a distributed gateway port and the logical router has rules + * specified in nat with external_mac, then those addresses are also used to populate the switch’s destination + * lookup. */ +SwitchPortAddresses(port, addrs) :- + port in &SwitchPort(.lsp = lsp, .peer = Some{&rport}), + Some{var addrs} = { + var opt_addrs = None; + for (addr in lsp.addresses) { + if (addr == "router") { + opt_addrs = Some{rport.networks} + } else () + }; + opt_addrs + }. + +/* All static and dynamic IPv4 addresses associated with a port */ +relation SwitchPortIPv4Address(port: Ref<SwitchPort>, + ea: eth_addr, + addr: ipv4_netaddr) + +SwitchPortIPv4Address(port, ea, addr) :- + SwitchPortAddresses(port, LPortAddress{.ea = ea, .ipv4_addrs = addrs}), + var addr = FlatMap(addrs). + +/* All static and dynamic IPv6 addresses associated with a port */ +relation SwitchPortIPv6Address(port: Ref<SwitchPort>, + ea: eth_addr, + addr: ipv6_netaddr) + +SwitchPortIPv6Address(port, ea, addr) :- + SwitchPortAddresses(port, LPortAddress{.ea = ea, .ipv6_addrs = addrs}), + var addr = FlatMap(addrs). + +/* Service monitoring. */ + +/* MAC allocated for service monitor usage. Just one MAC is allocated + * for this purpose, and the ovn-controller on each chassis makes use + * of this MAC when sending out packets to monitor the services + * defined in the Service_Monitor southbound table. Since these packets + * are all handled locally, having just one MAC is good enough.
*/ +function get_svc_monitor_mac(options: Map<string,string>, uuid: uuid) + : eth_addr = +{ + var existing_mac = match ( + map_get(options, "svc_monitor_mac")) + { + Some{mac} -> scan_eth_addr(mac), + None -> None + }; + match (existing_mac) { + Some{mac} -> mac, + None -> eth_addr_from_uint64(pseudorandom_mac(uuid, 'h5678)) + } +} +function put_svc_monitor_mac(options: Map<string,string>, + svc_monitor_mac: eth_addr) : Map<string,string> = +{ + map_insert_imm(options, "svc_monitor_mac", to_string(svc_monitor_mac)) +} +relation SvcMonitorMac(mac: eth_addr) +SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :- + nb::NB_Global(._uuid = uuid, .options = options). diff --git a/northd/multicast.dl b/northd/multicast.dl new file mode 100644 index 000000000000..3f108c85ef7d --- /dev/null +++ b/northd/multicast.dl @@ -0,0 +1,259 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import OVN_Northbound as nb +import OVN_Southbound as sb +import ovn +import ovsdb +import helpers +import lswitch +import lrouter + +function mCAST_DEFAULT_MAX_ENTRIES(): integer = 2048 + +function mCAST_DEFAULT_IDLE_TIMEOUT_S(): integer = 300 +function mCAST_DEFAULT_MIN_IDLE_TIMEOUT_S(): integer = 15 +function mCAST_DEFAULT_MAX_IDLE_TIMEOUT_S(): integer = 3600 + +function mCAST_DEFAULT_MIN_QUERY_INTERVAL_S(): integer = 1 +function mCAST_DEFAULT_MAX_QUERY_INTERVAL_S(): integer = + mCAST_DEFAULT_MAX_IDLE_TIMEOUT_S() + +function mCAST_DEFAULT_QUERY_MAX_RESPONSE_S(): integer = 1 + +/* IP Multicast per switch configuration. */ +relation &McastSwitchCfg( + datapath : uuid, + enabled : bool, + querier : bool, + flood_unreg : bool, + eth_src : string, + ip4_src : string, + ip6_src : string, + table_size : integer, + idle_timeout : integer, + query_interval: integer, + query_max_resp: integer +) + + /* FIXME: Right now table_size is enforced only in ovn-controller but in + * the ovn-northd C version we enforce it on the aggregate groups too. 
+ */ + +&McastSwitchCfg( + .datapath = ls_uuid, + .enabled = map_get_bool_def(other_config, "mcast_snoop", + false), + .querier = map_get_bool_def(other_config, "mcast_querier", + true), + .flood_unreg = map_get_bool_def(other_config, + "mcast_flood_unregistered", + false), + .eth_src = map_get_str_def(other_config, "mcast_eth_src", ""), + .ip4_src = map_get_str_def(other_config, "mcast_ip4_src", ""), + .ip6_src = map_get_str_def(other_config, "mcast_ip6_src", ""), + .table_size = map_get_int_def(other_config, + "mcast_table_size", + mCAST_DEFAULT_MAX_ENTRIES()), + .idle_timeout = idle_timeout, + .query_interval = query_interval, + .query_max_resp = query_max_resp) :- + nb::Logical_Switch(._uuid = ls_uuid, + .other_config = other_config), + var idle_timeout = + map_get_int_def_limit(other_config, "mcast_idle_timeout", + mCAST_DEFAULT_IDLE_TIMEOUT_S(), + mCAST_DEFAULT_MIN_IDLE_TIMEOUT_S(), + mCAST_DEFAULT_MAX_IDLE_TIMEOUT_S()), + var query_interval = + map_get_int_def_limit(other_config, "mcast_query_interval", + idle_timeout / 2, + mCAST_DEFAULT_MIN_QUERY_INTERVAL_S(), + mCAST_DEFAULT_MAX_QUERY_INTERVAL_S()), + var query_max_resp = + map_get_int_def(other_config, "mcast_query_max_response", + mCAST_DEFAULT_QUERY_MAX_RESPONSE_S()). + +/* IP Multicast per router configuration. */ +relation &McastRouterCfg( + datapath: uuid, + relay : bool +) + +&McastRouterCfg(lr_uuid, mcast_relay) :- + nb::Logical_Router(._uuid = lr_uuid, .options = options), + var mcast_relay = map_get_bool_def(options, "mcast_relay", false). + +/* IP Multicast port configuration. */ +relation &McastPortCfg( + port : uuid, + router_port : bool, + flood : bool, + flood_reports : bool +) + +&McastPortCfg(lsp_uuid, false, flood, flood_reports) :- + nb::Logical_Switch_Port(._uuid = lsp_uuid, .options = options), + var flood = map_get_bool_def(options, "mcast_flood", false), + var flood_reports = map_get_bool_def(options, "mcast_flood_reports", + false). 
+ +&McastPortCfg(lrp_uuid, true, flood, flood) :- + nb::Logical_Router_Port(._uuid = lrp_uuid, .options = options), + var flood = map_get_bool_def(options, "mcast_flood", false). + +/* Mapping between Switch and the set of router port uuids on which to flood + * IP multicast for relay. + */ +relation SwitchMcastFloodRelayPorts(sw: Ref<Switch>, ports: Set<uuid>) + +SwitchMcastFloodRelayPorts(switch, relay_ports) :- + &SwitchPort( + .lsp = lsp, + .sw = switch, + .peer = Some{&RouterPort{.router = &Router{.mcast_cfg = &mcast_cfg}}} + ), mcast_cfg.relay, + var relay_ports = lsp._uuid.group_by(switch).to_set(). + +SwitchMcastFloodRelayPorts(switch, set_empty()) :- + Switch[switch], + not &SwitchPort( + .sw = switch, + .peer = Some{ + &RouterPort{ + .router = &Router{.mcast_cfg = &McastRouterCfg{.relay=true}} + } + } + ). + +/* Mapping between Switch and the set of port uuids on which to + * flood IP multicast statically. + */ +relation SwitchMcastFloodPorts(sw: Ref<Switch>, ports: Set<uuid>) + +SwitchMcastFloodPorts(switch, flood_ports) :- + &SwitchPort( + .lsp = lsp, + .sw = switch, + .mcast_cfg = &McastPortCfg{.flood = true}), + var flood_ports = lsp._uuid.group_by(switch).to_set(). + +SwitchMcastFloodPorts(switch, set_empty()) :- + Switch[switch], + not &SwitchPort( + .sw = switch, + .mcast_cfg = &McastPortCfg{.flood = true}). + +/* Mapping between Switch and the set of port uuids on which to + * flood IP multicast reports statically. + */ +relation SwitchMcastFloodReportPorts(sw: Ref<Switch>, ports: Set<uuid>) + +SwitchMcastFloodReportPorts(switch, flood_ports) :- + &SwitchPort( + .lsp = lsp, + .sw = switch, + .mcast_cfg = &McastPortCfg{.flood_reports = true}), + var flood_ports = lsp._uuid.group_by(switch).to_set(). + +SwitchMcastFloodReportPorts(switch, set_empty()) :- + Switch[switch], + not &SwitchPort( + .sw = switch, + .mcast_cfg = &McastPortCfg{.flood_reports = true}). 
+ +/* Mapping between Router and the set of port uuids on which to + * flood IP multicast reports statically. + */ +relation RouterMcastFloodPorts(sw: Ref<Router>, ports: Set<uuid>) + +RouterMcastFloodPorts(router, flood_ports) :- + &RouterPort( + .lrp = lrp, + .router = router, + .mcast_cfg = &McastPortCfg{.flood = true} + ), + var flood_ports = lrp._uuid.group_by(router).to_set(). + +RouterMcastFloodPorts(router, set_empty()) :- + Router[router], + not &RouterPort( + .router = router, + .mcast_cfg = &McastPortCfg{.flood = true}). + +/* Flattened IGMP group. One record per address-port tuple. */ +relation IgmpSwitchGroupPort( + address: string, + switch : Ref<Switch>, + port : uuid +) + +IgmpSwitchGroupPort(address, switch, lsp_uuid) :- + sb::IGMP_Group(.address = address, .datapath = igmp_dp_set, + .ports = pb_ports), + var pb_port_uuid = FlatMap(pb_ports), + sb::Port_Binding(._uuid = pb_port_uuid, .logical_port = lsp_name), + &SwitchPort( + .lsp = nb::Logical_Switch_Port{._uuid = lsp_uuid, .name = lsp_name}, + .sw = switch). + +/* Aggregated IGMP group: merges all IgmpSwitchGroupPort for a given + * address-switch tuple from all chassis. + */ +relation IgmpSwitchMulticastGroup( + address: string, + switch : Ref<Switch>, + ports : Set<uuid> +) + +IgmpSwitchMulticastGroup(address, switch, ports) :- + IgmpSwitchGroupPort(address, switch, port), + var ports = port.group_by((address, switch)).to_set(). + +/* Flattened IGMP group representation for routers with relay enabled. One + * record per address-port tuple for all IGMP groups learned by switches + * connected to the router. + */ +relation IgmpRouterGroupPort( + address: string, + router : Ref<Router>, + port : uuid +) + +IgmpRouterGroupPort(address, rtr_port.router, rtr_port.lrp._uuid) :- + SwitchMcastFloodRelayPorts(switch, sw_flood_ports), + IgmpSwitchMulticastGroup(address, switch, _), + /* For IPv6 only relay routable multicast groups + * (RFC 4291 2.7). 
+ */ + match (ipv6_parse(address)) { + Some{ipv6} -> ipv6_is_routable_multicast(ipv6), + None -> true + }, + var flood_port = FlatMap(sw_flood_ports), + &SwitchPort(.lsp = nb::Logical_Switch_Port{._uuid = flood_port}, + .peer = Some{&rtr_port}). + +/* Aggregated IGMP group for routers: merges all IgmpRouterGroupPort for + * a given address-router tuple from all connected switches. + */ +relation IgmpRouterMulticastGroup( + address: string, + router : Ref<Router>, + ports : Set<uuid> +) + +IgmpRouterMulticastGroup(address, router, ports) :- + IgmpRouterGroupPort(address, router, port), + var ports = port.group_by((address, router)).to_set(). diff --git a/northd/ovn-nb.dlopts b/northd/ovn-nb.dlopts new file mode 100644 index 000000000000..0682c14cf406 --- /dev/null +++ b/northd/ovn-nb.dlopts @@ -0,0 +1,13 @@ +-o Logical_Router_Port +--rw Logical_Router_Port.ipv6_prefix +-o Logical_Switch_Port +--rw Logical_Switch_Port.tag +--rw Logical_Switch_Port.dynamic_addresses +--rw Logical_Switch_Port.up +-o NB_Global +--rw NB_Global.sb_cfg +--rw NB_Global.hv_cfg +--rw NB_Global.options +--rw NB_Global.ipsec +--rw NB_Global.nb_cfg_timestamp +--rw NB_Global.hv_cfg_timestamp diff --git a/northd/ovn-northd-ddlog.c b/northd/ovn-northd-ddlog.c new file mode 100644 index 000000000000..c929afa46258 --- /dev/null +++ b/northd/ovn-northd-ddlog.c @@ -0,0 +1,1752 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include <config.h> + +#include <getopt.h> +#include <stdlib.h> +#include <stdio.h> +#include <fcntl.h> +#include <unistd.h> + +#include "command-line.h" +#include "daemon.h" +#include "fatal-signal.h" +#include "hash.h" +#include "jsonrpc.h" +#include "lib/ovn-util.h" +#include "openvswitch/hmap.h" +#include "openvswitch/json.h" +#include "openvswitch/poll-loop.h" +#include "openvswitch/vlog.h" +#include "ovsdb-data.h" +#include "ovsdb-error.h" +#include "ovsdb-parser.h" +#include "ovsdb-types.h" +#include "ovsdb/ovsdb.h" +#include "ovsdb/table.h" +#include "stream-ssl.h" +#include "stream.h" +#include "unixctl.h" +#include "util.h" +#include "uuid.h" + +#include "northd/ovn_northd_ddlog/ddlog.h" + +VLOG_DEFINE_THIS_MODULE(ovn_northd); + +#include "northd/ovn-northd-ddlog-nb.inc" +#include "northd/ovn-northd-ddlog-sb.inc" + +struct northd_status { + bool locked; + bool pause; +}; + +static unixctl_cb_func ovn_northd_exit; +static unixctl_cb_func ovn_northd_pause; +static unixctl_cb_func ovn_northd_resume; +static unixctl_cb_func ovn_northd_is_paused; +static unixctl_cb_func ovn_northd_status; + +/* --ddlog-record: The name of a file to which to record DDlog commands for + * later replay. Useful for debugging. If null (by default), DDlog commands + * are not recorded. */ +static const char *record_file; + +static const char *ovnnb_db; +static const char *ovnsb_db; +static const char *unixctl_path; + +/* Frequently used table ids. */ +static table_id WARNING_TABLE_ID; +static table_id NB_CFG_TIMESTAMP_ID; + +/* Initialize frequently used table ids. */ +static void init_table_ids(void) +{ + WARNING_TABLE_ID = ddlog_get_table_id("Warning"); + NB_CFG_TIMESTAMP_ID = ddlog_get_table_id("NbCfgTimestamp"); +} + +/* + * Accumulates DDlog delta to be sent to OVSDB. + * + * FIXME: There is currently no global northd state descriptor shared by NB and + * SB connections. 
We should probably introduce it and move this variable there + * instead of declaring it as a global variable. + */ +static ddlog_delta *delta; + + +/* Connection state machine. + * + * When a JSON-RPC session connects, sends a "get_schema" request + * and transitions to S_SCHEMA_REQUESTED. */ +#define STATES \ + /* Waiting for "get_schema" reply. Once received, sends \ + * "monitor" request whose details are informed by the \ + * schema, and transitions to S_MONITOR_REQUESTED. */ \ + STATE(S_SCHEMA_REQUESTED) \ + \ + /* Waits for "monitor" reply. On failure, transitions to \ + * S_ERROR. If successful, replaces our snapshot of database \ + * contents by the data carried in the reply and: \ + * \ + * - If this database needs a lock: \ + * \ + * + If northd is not paused, sends a lock request and \ + * transitions to S_LOCK_REQUESTED. \ + * \ + * + If northd is paused, transition to S_PAUSED. \ + * \ + * - Otherwise, if there are any output-only tables, sends \ + * "transact" request for their data and transitions to \ + * S_OUTPUT_ONLY_DATA_REQUESTED. \ + * \ + * - Otherwise, transitions to S_MONITORING. */ \ + STATE(S_MONITOR_REQUESTED) \ + \ + /* We need the lock and we're paused. We haven't requested \ + * the lock (or we unlocked it). \ + * \ + * Waits for northd to be un-paused. Then, sends a lock \ + * request and transitions to S_LOCK_REQUESTED. */ \ + STATE(S_PAUSED) \ + \ + /* We're waiting for a reply for our lock request. Once we \ + * get the reply: \ + * \ + * - If we did get the lock: \ + * \ + * + If there are any output-only tables, send \ + * "transact" request for their data and transition \ + * to S_OUTPUT_ONLY_DATA_REQUESTED. \ + * \ + * + Otherwise, transition to S_MONITORING. \ + * \ + * - If we didn't get the lock, transition to S_LOCK_CONTENDED. \ + * \ + * (We must ignore notifications that we got or lost the lock \ + * when we're in this state, because they must be old.) 
*/ \ + STATE(S_LOCK_REQUESTED) \ + \ + /* We got a negative reply to our lock request. We're \ + * waiting for a notification that we got the lock. \ + * \ + * (It's important that we ignore notifications that we got \ + * the lock when we're not in this state, because they must \ + * be old.) \ + * \ + * When we get the lock: \ + * \ + * - If there are any output-only tables, send "transact" \ + * request for their data and transition to \ + * S_OUTPUT_ONLY_DATA_REQUESTED. \ + * \ + * - Otherwise, transition to S_MONITORING. */ \ + STATE(S_LOCK_CONTENDED) \ + \ + /* Waits for reply to "transact" request for data in output-only \ + * tables. Once received, uses the data to initialize the local \ + * idea of what's in those tables, and transitions to \ + * S_MONITORING. \ + * \ + * If we get a notification that we lost the lock, transition \ + * to S_LOCK_CONTENDED. */ \ + STATE(S_OUTPUT_ONLY_DATA_REQUESTED) \ + \ + /* State that just processes "update" notifications for the \ + * database. \ + * \ + * If we get a notification that we lost the lock, transition \ + * to S_LOCK_CONTENDED. */ \ + STATE(S_MONITORING) \ + \ + /* Terminal error state that indicates that nothing useful can be \ + * done, for example because the database server doesn't actually \ + * have the desired database. We maintain the session with the \ + * database server anyway. If it starts serving the database \ + * that we want, or if someone fixes and restarts the database, \ + * then it will kill the session and we will automatically \ + * reconnect and try again. */ \ + STATE(S_ERROR) \ + \ + /* Terminal state that indicates we connected to a useless server \ + * in a cluster, e.g. one that is partitioned from the rest of \ + * the cluster. We're waiting to retry. 
*/ \ + STATE(S_RETRY) + +enum northd_state { +#define STATE(NAME) NAME, + STATES +#undef STATE +}; + +static const char * +northd_state_to_string(enum northd_state state) +{ + switch (state) { +#define STATE(NAME) case NAME: return #NAME; + STATES +#undef STATE + default: return "<unknown>"; + } +} + +enum northd_monitoring { + NORTHD_NOT_MONITORING, /* Database is not being monitored. */ + NORTHD_MONITORING, /* Database has "monitor" outstanding. */ + NORTHD_MONITORING_COND, /* Database has "monitor_cond" outstanding. */ +}; + +struct northd_ctx { + ddlog_prog ddlog; + char *prefix; + const char **input_relations; + const char **output_relations; + const char **output_only_relations; + + bool has_timestamp_columns; + + /* Session state. + * + *'state_seqno' is a snapshot of the session's sequence number as returned + * jsonrpc_session_get_seqno(session), so if it differs from the value that + * function currently returns then the session has reconnected and the + * state machine must restart. */ + struct jsonrpc_session *session; /* Connection to the server. */ + enum northd_state state; /* Current session state. */ + unsigned int state_seqno; /* See above. */ + struct json *request_id; /* JSON ID for request awaiting reply. */ + + /* Database info. */ + char *db_name; + struct json *monitor_id; + struct json *schema; + struct json *output_only_data; + enum northd_monitoring monitoring; + + /* Database locking. */ + const char *lock_name; /* Name of lock we need, NULL if none. */ + bool paused; +}; + +enum lock_status { + NOT_LOCKED, /* We don't have the lock and we didn't ask for it. */ + REQUESTED_LOCK, /* We asked for the lock but we didn't get it yet. */ + HAS_LOCK, /* We have the lock. 
*/ +}; + +static enum lock_status northd_lock_status(const struct northd_ctx *); + +static void northd_send_unlock_request(struct northd_ctx *); + +static bool northd_parse_lock_reply(const struct json *result); + +static void northd_handle_update(struct northd_ctx *, bool clear, + const struct json *table_updates); +static struct json *get_database_ops(struct northd_ctx *); +static int ddlog_clear(struct northd_ctx *); + +static void +northd_ctx_connection_status(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *ctx_) +{ + const struct northd_ctx *ctx = ctx_; + bool connected = jsonrpc_session_is_connected(ctx->session); + unixctl_command_reply(conn, connected ? "connected" : "not connected"); +} + +static void +northd_ctx_cluster_state_reset(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *ctx OVS_UNUSED) +{ + VLOG_INFO("XXX cluster state tracking not yet implemented"); + unixctl_command_reply(conn, NULL); +} + +static struct northd_ctx * +northd_ctx_create(const char *server, const char *database, + const char *unixctl_command_prefix, + const char *lock_name, + ddlog_prog ddlog, + const char **input_relations, + const char **output_relations, + const char **output_only_relations) +{ + struct northd_ctx *ctx; + + ctx = xzalloc(sizeof *ctx); + ctx->prefix = xasprintf("%s::", database); + ctx->session = jsonrpc_session_open(server, true); + ctx->state_seqno = UINT_MAX; + ctx->request_id = NULL; + + ctx->input_relations = input_relations; + ctx->output_relations = output_relations; + ctx->output_only_relations = output_only_relations; + + ctx->db_name = xstrdup(database); + ctx->monitor_id = json_array_create_2(json_string_create("monid"), + json_string_create(database)); + ctx->lock_name = lock_name; + + ctx->ddlog = ddlog; + + char *cmd = xasprintf("%s-connection-status", unixctl_command_prefix); + unixctl_command_register(cmd, "", 0, 0, + northd_ctx_connection_status, ctx); + free(cmd); + 
+ cmd = xasprintf("%s-cluster-state-reset", unixctl_command_prefix); + unixctl_command_register(cmd, "", 0, 0, + northd_ctx_cluster_state_reset, NULL); + free(cmd); + + return ctx; +} + +static void +northd_ctx_destroy(struct northd_ctx *ctx) +{ + if (ctx) { + jsonrpc_session_close(ctx->session); + + json_destroy(ctx->monitor_id); + json_destroy(ctx->schema); + json_destroy(ctx->output_only_data); + + json_destroy(ctx->request_id); + free(ctx); + } +} + +/* Forces 'ctx' to drop its connection to the database and reconnect. */ +static void +northd_force_reconnect(struct northd_ctx *ctx) +{ + if (ctx->session) { + jsonrpc_session_force_reconnect(ctx->session); + } +} + +static void northd_transition_at(struct northd_ctx *, enum northd_state, + const char *where); +#define northd_transition(CTX, STATE) \ + northd_transition_at(CTX, STATE, OVS_SOURCE_LOCATOR) + +static void +northd_transition_at(struct northd_ctx *ctx, enum northd_state new_state, + const char *where) +{ + VLOG_DBG("%s: %s -> %s at %s", + ctx->session ? jsonrpc_session_get_name(ctx->session) : "void", + northd_state_to_string(ctx->state), + northd_state_to_string(new_state), + where); + ctx->state = new_state; +} + +#define northd_retry(CTX) northd_retry_at(CTX, OVS_SOURCE_LOCATOR) +static void +northd_retry_at(struct northd_ctx *ctx, const char *where) +{ + northd_send_unlock_request(ctx); + + if (ctx->session && jsonrpc_session_get_n_remotes(ctx->session) > 1) { + northd_force_reconnect(ctx); + northd_transition_at(ctx, S_RETRY, where); + } else { + northd_transition_at(ctx, S_ERROR, where); + } +} + +/* Returns true if 'ctx' is configured to obtain a lock and owns that lock. + * + * Locking and unlocking happens asynchronously from the database client's + * point of view, so the information is only useful for optimization (e.g. if + * the client doesn't have the lock then there's no point in trying to write to + * the database). 
*/ +static enum lock_status +northd_lock_status(const struct northd_ctx *ctx) +{ + if (!ctx->lock_name) { + return NOT_LOCKED; + } + + switch (ctx->state) { + case S_SCHEMA_REQUESTED: + case S_MONITOR_REQUESTED: + case S_PAUSED: + case S_ERROR: + case S_RETRY: + return NOT_LOCKED; + + case S_LOCK_REQUESTED: + case S_LOCK_CONTENDED: + return REQUESTED_LOCK; + + case S_OUTPUT_ONLY_DATA_REQUESTED: + case S_MONITORING: + return HAS_LOCK; + } + + OVS_NOT_REACHED(); +} + +static void +northd_send_request(struct northd_ctx *ctx, struct jsonrpc_msg *request) +{ + json_destroy(ctx->request_id); + ctx->request_id = json_clone(request->id); + if (ctx->session) { + jsonrpc_session_send(ctx->session, request); + } +} + +static void +northd_send_schema_request(struct northd_ctx *ctx) +{ + northd_send_request(ctx, jsonrpc_create_request( + "get_schema", + json_array_create_1(json_string_create( + ctx->db_name)), + NULL)); +} + +static void +northd_send_transact(struct northd_ctx *ctx, struct json *ddlog_ops) +{ + struct json *comment = json_object_create(); + json_object_put_string(comment, "op", "comment"); + json_object_put_string(comment, "comment", "ovn-northd-ddlog"); + json_array_add(ddlog_ops, comment); + + if (ctx->lock_name) { + struct json *assertion = json_object_create(); + json_object_put_string(assertion, "op", "assert"); + json_object_put_string(assertion, "lock", ctx->lock_name); + json_array_add(ddlog_ops, assertion); + } + + northd_send_request(ctx, jsonrpc_create_request("transact", ddlog_ops, + NULL)); +} + +static bool +northd_send_monitor_request(struct northd_ctx *ctx) +{ + struct ovsdb_schema *schema; + struct ovsdb_error *error = ovsdb_schema_from_json(ctx->schema, &schema); + if (error) { + VLOG_ERR("couldn't parse schema (%s)", ovsdb_error_to_string(error)); + return false; + } + + const struct ovsdb_table_schema *nb_global = shash_find_data( + &schema->tables, "NB_Global"); + ctx->has_timestamp_columns + = (nb_global + && 
shash_find_data(&nb_global->columns, "nb_cfg_timestamp") + && shash_find_data(&nb_global->columns, "sb_cfg_timestamp")); + + struct json *monitor_requests = json_object_create(); + + /* This should be smarter about ignoring not needed ones. There's a lot + * more logic for this in ovsdb_idl_send_monitor_request(). */ + size_t n = shash_count(&schema->tables); + const struct shash_node **nodes = shash_sort(&schema->tables); + for (int i = 0; i < n; i++) { + struct ovsdb_table_schema *table = nodes[i]->data; + + /* Only subscribe to input relations we care about. */ + for (const char **p = ctx->input_relations; *p; p++) { + if (!strcmp(table->name, *p)) { + json_object_put(monitor_requests, table->name, + json_array_create_1(json_object_create())); + break; + } + } + } + free(nodes); + + ovsdb_schema_destroy(schema); + + northd_send_request( + ctx, + jsonrpc_create_request( + "monitor", + json_array_create_3(json_string_create(ctx->db_name), + json_clone(ctx->monitor_id), monitor_requests), + NULL)); + return true; +} + +/* Sends the database server a request for all the row UUIDs in output-only + * tables. 
 */
+static void
+northd_send_output_only_data_request(struct northd_ctx *ctx)
+{
+    json_destroy(ctx->output_only_data);
+    ctx->output_only_data = NULL;
+
+    struct json *ops = json_array_create_1(json_string_create(ctx->db_name));
+    for (size_t i = 0; ctx->output_only_relations[i]; i++) {
+        const char *table = ctx->output_only_relations[i];
+        struct json *op = json_object_create();
+        json_object_put_string(op, "op", "select");
+        json_object_put_string(op, "table", table);
+        json_object_put(op, "columns",
+                        json_array_create_1(json_string_create("_uuid")));
+        json_object_put(op, "where", json_array_create_empty());
+        json_array_add(ops, op);
+    }
+    VLOG_DBG("sending output-only data request");
+
+    northd_send_request(ctx,
+                        jsonrpc_create_request("transact", ops, NULL));
+}
+
+static struct jsonrpc_msg *
+northd_compose_lock_request__(struct northd_ctx *ctx, const char *method)
+{
+    struct json *params = json_array_create_1(json_string_create(
+                                                  ctx->lock_name));
+    return jsonrpc_create_request(method, params, NULL);
+}
+
+static void
+northd_send_lock_request(struct northd_ctx *ctx)
+{
+    northd_send_request(ctx, northd_compose_lock_request__(ctx, "lock"));
+}
+
+/* Sends an unlock request if 'ctx' has a defined lock and is in a state
+ * that holds the lock or has requested it.
+ *
+ * When this sends an unlock request, the caller needs to transition 'ctx'
+ * to some other state (because otherwise the current state is still
+ * defined as holding or requesting a lock). */
+static void
+northd_send_unlock_request(struct northd_ctx *ctx)
+{
+    if (ctx->lock_name && northd_lock_status(ctx) != NOT_LOCKED) {
+        northd_send_request(ctx, northd_compose_lock_request__(ctx, "unlock"));
+
+        /* We don't care to track the unlock reply.
*/ + free(ctx->request_id); + ctx->request_id = NULL; + } +} + +static bool +northd_process_response(struct northd_ctx *ctx, struct jsonrpc_msg *msg) +{ + if (msg->type != JSONRPC_REPLY && msg->type != JSONRPC_ERROR) { + return false; + } + + if (!json_equal(ctx->request_id, msg->id)) { + return false; + } + json_destroy(ctx->request_id); + ctx->request_id = NULL; + + if (msg->type == JSONRPC_ERROR) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); + char *s = jsonrpc_msg_to_string(msg); + VLOG_INFO_RL(&rl, "%s: received unexpected %s response in " + "%s state: %s", jsonrpc_session_get_name(ctx->session), + jsonrpc_msg_type_to_string(msg->type), + northd_state_to_string(ctx->state), + s); + free(s); + northd_retry(ctx); + return true; + } + + switch (ctx->state) { + case S_SCHEMA_REQUESTED: + json_destroy(ctx->schema); + ctx->schema = json_clone(msg->result); + if (northd_send_monitor_request(ctx)) { + northd_transition(ctx, S_MONITOR_REQUESTED); + } else { + northd_retry(ctx); + } + break; + + case S_MONITOR_REQUESTED: + ctx->monitoring = NORTHD_MONITORING; + northd_handle_update(ctx, true, msg->result); + if (ctx->paused) { + northd_transition(ctx, S_PAUSED); + } else if (ctx->lock_name) { + northd_send_lock_request(ctx); + northd_transition(ctx, S_LOCK_REQUESTED); + } else if (ctx->output_only_relations[0]) { + northd_send_output_only_data_request(ctx); + northd_transition(ctx, S_OUTPUT_ONLY_DATA_REQUESTED); + } else { + northd_transition(ctx, S_MONITORING); + } + break; + + case S_PAUSED: + /* (No outstanding requests.) */ + break; + + case S_LOCK_REQUESTED: + if (northd_parse_lock_reply(msg->result)) { + /* We got the lock. */ + if (ctx->output_only_relations[0]) { + northd_send_output_only_data_request(ctx); + northd_transition(ctx, S_OUTPUT_ONLY_DATA_REQUESTED); + } else { + northd_transition(ctx, S_MONITORING); + } + } else { + /* We did not get the lock. 
*/ + northd_transition(ctx, S_LOCK_CONTENDED); + } + break; + + case S_LOCK_CONTENDED: + /* (No outstanding requests.) */ + break; + + case S_OUTPUT_ONLY_DATA_REQUESTED: + ctx->output_only_data = msg->result; + msg->result = NULL; + northd_transition(ctx, S_MONITORING); + break; + + case S_MONITORING: + break; + + case S_ERROR: + case S_RETRY: + /* Nothing to do in this state. */ + break; + + default: + OVS_NOT_REACHED(); + } + + return true; +} + +static bool +northd_handle_update_rpc(struct northd_ctx *ctx, + const struct jsonrpc_msg *msg) +{ + if (msg->type == JSONRPC_NOTIFY) { + if (!strcmp(msg->method, "update") + && msg->params->type == JSON_ARRAY + && msg->params->array.n == 2 + && json_equal(msg->params->array.elems[0], ctx->monitor_id)) { + northd_handle_update(ctx, false, msg->params->array.elems[1]); + return true; + } + } + return false; +} + +static void +northd_pause(struct northd_ctx *ctx) +{ + if (!ctx->paused && ctx->lock_name && ctx->state != S_PAUSED) { + ctx->paused = true; + VLOG_INFO("This ovn-northd instance is now paused."); + if (northd_lock_status(ctx) != NOT_LOCKED) { + northd_send_unlock_request(ctx); + } + if (ctx->state > S_PAUSED) { + northd_transition(ctx, S_PAUSED); + } + } +} + +static void +northd_unpause(struct northd_ctx *ctx) +{ + if (ctx->paused) { + ovs_assert(ctx->lock_name); + + switch (ctx->state) { + case S_SCHEMA_REQUESTED: + case S_MONITOR_REQUESTED: + /* Nothing to do. */ + break; + + case S_PAUSED: + northd_send_lock_request(ctx); + northd_transition(ctx, S_LOCK_REQUESTED); + break; + + case S_LOCK_REQUESTED: + case S_LOCK_CONTENDED: + case S_OUTPUT_ONLY_DATA_REQUESTED: + case S_MONITORING: + case S_ERROR: + case S_RETRY: + OVS_NOT_REACHED(); + } + + ctx->paused = false; + } + +} + +static bool +northd_process_lock_notify(struct northd_ctx *ctx, + const struct jsonrpc_msg *msg) +{ + if (msg->type != JSONRPC_NOTIFY) { + return false; + } + + int got_lock = (!strcmp(msg->method, "locked") ? 
true + : !strcmp(msg->method, "stolen") ? false + : -1); + if (got_lock < 0) { + return false; + } + + if (!ctx->lock_name + || msg->params->type != JSON_ARRAY + || json_array(msg->params)->n != 1 + || json_array(msg->params)->elems[0]->type != JSON_STRING) { + return false; + } + + const char *lock_name = json_string(json_array(msg->params)->elems[0]); + if (strcmp(ctx->lock_name, lock_name)) { + return false; + } + + switch (ctx->state) { + case S_SCHEMA_REQUESTED: + case S_MONITOR_REQUESTED: + case S_PAUSED: + case S_LOCK_REQUESTED: + case S_ERROR: + case S_RETRY: + /* Ignore lock notification. It must be stale, resulting + * from an old "lock" request. */ + VLOG_DBG("received stale lock notification \"%s\" in state %s", + msg->method, northd_state_to_string(ctx->state)); + return true; + + case S_LOCK_CONTENDED: + if (got_lock) { + if (ctx->output_only_relations[0]) { + northd_send_output_only_data_request(ctx); + northd_transition(ctx, S_OUTPUT_ONLY_DATA_REQUESTED); + } else { + northd_transition(ctx, S_MONITORING); + } + } else { + /* Should not be possible: we know that we received a + * reply to our lock request, which means that there + * should be no outstanding stale lock + * notifications. */ + VLOG_WARN("\"stolen\" notification in LOCK_CONTENDED state"); + } + return true; + + case S_OUTPUT_ONLY_DATA_REQUESTED: + case S_MONITORING: + if (!got_lock) { + VLOG_INFO("northd lock stolen by another client"); + northd_transition(ctx, S_LOCK_CONTENDED); + } else { + /* Should not be possible: we already had the * lock. 
*/ + VLOG_WARN("\"locked\" notification in %s state", + northd_state_to_string(ctx->state)); + } + return true; + } + OVS_NOT_REACHED(); +} + +static bool +northd_parse_lock_reply(const struct json *result) +{ + if (result->type == JSON_OBJECT) { + const struct json *locked + = shash_find_data(json_object(result), "locked"); + return locked && locked->type == JSON_TRUE; + } else { + return false; + } +} + +static void +northd_process_msg(struct northd_ctx *ctx, struct jsonrpc_msg *msg) +{ + if (!northd_process_response(ctx, msg) + && !northd_process_lock_notify(ctx, msg) + && !northd_handle_update_rpc(ctx, msg)) { + /* Unknown message. Log at debug level because this can + * happen if northd_txn_destroy() is called to destroy a + * transaction before we receive the reply, or in other + * corner cases. */ + char *s = jsonrpc_msg_to_string(msg); + VLOG_DBG("%s: received unexpected %s message: %s", + jsonrpc_session_get_name(ctx->session), + jsonrpc_msg_type_to_string(msg->type), s); + free(s); + } +} + +/* Processes a batch of messages from the database server on 'ctx'. 
*/ +static void +northd_run(struct northd_ctx *ctx, bool run_deltas) +{ + if (!ctx->session) { + return; + } + + for (int i = 0; jsonrpc_session_is_connected(ctx->session) && i < 50; + i++) { + struct jsonrpc_msg *msg; + unsigned int seqno; + + seqno = jsonrpc_session_get_seqno(ctx->session); + if (ctx->state_seqno != seqno) { + ctx->state_seqno = seqno; + + if (ctx->state != S_PAUSED) { + northd_send_schema_request(ctx); + ctx->state = S_SCHEMA_REQUESTED; + } + } + + msg = jsonrpc_session_recv(ctx->session); + if (!msg) { + break; + } + northd_process_msg(ctx, msg); + jsonrpc_msg_destroy(msg); + } + jsonrpc_session_run(ctx->session); + + if (run_deltas && !ctx->request_id) { + struct json *ops = get_database_ops(ctx); + if (ops) { + northd_send_transact(ctx, ops); + } + } +} + +static void +northd_update_probe_interval_cb( + uintptr_t probe_intervalp_, + table_id table OVS_UNUSED, + const ddlog_record *rec, + ssize_t weight OVS_UNUSED) +{ + int *probe_intervalp = (int *) probe_intervalp_; + + uint64_t x = ddlog_get_u64(rec); + if (x > 1000) { + *probe_intervalp = x; + } +} + +static void +set_probe_interval(struct jsonrpc_session *session, int override_interval) +{ +#define DEFAULT_PROBE_INTERVAL_MSEC 5000 + const char *name = jsonrpc_session_get_name(session); + int default_interval = (!stream_or_pstream_needs_probes(name) + ? 0 : DEFAULT_PROBE_INTERVAL_MSEC); + jsonrpc_session_set_probe_interval(session, + MAX(override_interval, default_interval)); +} + +static void +northd_update_probe_interval(struct northd_ctx *nb, struct northd_ctx *sb) +{ + /* -1 means the default probe interval. 
 */
+    int probe_interval = -1;
+    table_id tid = ddlog_get_table_id("Northd_Probe_Interval");
+    ddlog_delta *probe_delta = ddlog_delta_get_table(delta, tid);
+    ddlog_delta_enumerate(probe_delta, northd_update_probe_interval_cb,
+                          (uintptr_t) &probe_interval);
+
+    set_probe_interval(nb->session, probe_interval);
+    set_probe_interval(sb->session, probe_interval);
+}
+
+/* Arranges for poll_block() to wake up when northd_run() has something to
+ * do or when activity occurs on a transaction on 'ctx'. */
+static void
+northd_wait(struct northd_ctx *ctx)
+{
+    if (!ctx->session) {
+        return;
+    }
+    jsonrpc_session_wait(ctx->session);
+    jsonrpc_session_recv_wait(ctx->session);
+}
+
+/* ddlog-specific actions. */
+
+/* Generate OVSDB update command for delta-plus, delta-minus, and delta-update
+ * tables. */
+static void
+ddlog_table_update_deltas(struct ds *ds, ddlog_prog ddlog,
+                          const char *db, const char *table)
+{
+    int error;
+    char *updates;
+
+    error = ddlog_dump_ovsdb_delta_tables(ddlog, delta, db, table, &updates);
+    if (error) {
+        VLOG_INFO("DDlog error %d dumping delta for table %s", error, table);
+        return;
+    }
+
+    if (!updates[0]) {
+        ddlog_free_json(updates);
+        return;
+    }
+
+    ds_put_cstr(ds, updates);
+    ds_put_char(ds, ',');
+    ddlog_free_json(updates);
+}
+
+/* Generate OVSDB update command for an output-only table.
*/ +static void +ddlog_table_update_output(struct ds *ds, ddlog_prog ddlog, + const char *db, const char *table) +{ + int error; + char *updates; + + error = ddlog_dump_ovsdb_output_table(ddlog, delta, db, table, &updates); + if (error) { + VLOG_WARN("%s: failed to generate update commands for " + "output-only table (error %d)", table, error); + return; + } + char *table_name = xasprintf("%s::Out_%s", db, table); + ddlog_delta_clear_table(delta, ddlog_get_table_id(table_name)); + free(table_name); + + if (!updates[0]) { + ddlog_free_json(updates); + return; + } + + ds_put_cstr(ds, updates); + ds_put_char(ds, ','); + ddlog_free_json(updates); +} + +/* A set of UUIDs. + * + * Not fully abstracted: the client still uses plain struct hmap, for + * example. */ + +/* A node within a set of uuids. */ +struct uuidset_node { + struct hmap_node hmap_node; + struct uuid uuid; +}; + +static void uuidset_delete(struct hmap *uuidset, struct uuidset_node *); + +static void +uuidset_destroy(struct hmap *uuidset) +{ + if (uuidset) { + struct uuidset_node *node, *next; + + HMAP_FOR_EACH_SAFE (node, next, hmap_node, uuidset) { + uuidset_delete(uuidset, node); + } + hmap_destroy(uuidset); + } +} + +static struct uuidset_node * +uuidset_find(struct hmap *uuidset, const struct uuid *uuid) +{ + struct uuidset_node *node; + + HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(uuid), uuidset) { + if (uuid_equals(uuid, &node->uuid)) { + return node; + } + } + + return NULL; +} + +static void +uuidset_insert(struct hmap *uuidset, const struct uuid *uuid) +{ + if (!uuidset_find(uuidset, uuid)) { + struct uuidset_node *node = xmalloc(sizeof *node); + node->uuid = *uuid; + hmap_insert(uuidset, &node->hmap_node, uuid_hash(&node->uuid)); + } +} + +static void +uuidset_delete(struct hmap *uuidset, struct uuidset_node *node) +{ + hmap_remove(uuidset, &node->hmap_node); + free(node); +} + +static struct ovsdb_error * +parse_output_only_data(const struct json *txn_result, size_t index, + struct hmap 
*uuidset)
+{
+    if (txn_result->type != JSON_ARRAY || txn_result->array.n <= index) {
+        return ovsdb_syntax_error(txn_result, NULL,
+                                  "transaction result missing for "
+                                  "output-only relation %"PRIuSIZE, index);
+    }
+
+    struct ovsdb_parser p;
+    ovsdb_parser_init(&p, txn_result->array.elems[index], "select result");
+    const struct json *rows = ovsdb_parser_member(&p, "rows", OP_ARRAY);
+    struct ovsdb_error *error = ovsdb_parser_finish(&p);
+    if (error) {
+        return error;
+    }
+
+    for (size_t i = 0; i < rows->array.n; i++) {
+        const struct json *row = rows->array.elems[i];
+
+        ovsdb_parser_init(&p, row, "row");
+        const struct json *uuid = ovsdb_parser_member(&p, "_uuid", OP_ARRAY);
+        error = ovsdb_parser_finish(&p);
+        if (error) {
+            return error;
+        }
+
+        struct ovsdb_base_type base_type = OVSDB_BASE_UUID_INIT;
+        union ovsdb_atom atom;
+        error = ovsdb_atom_from_json(&atom, &base_type, uuid, NULL);
+        if (error) {
+            return error;
+        }
+        uuidset_insert(uuidset, &atom.uuid);
+    }
+
+    return NULL;
+}
+
+static bool
+get_ddlog_uuid(const ddlog_record *rec, struct uuid *uuid)
+{
+    if (!ddlog_is_int(rec)) {
+        return false;
+    }
+
+    __uint128_t u128 = ddlog_get_u128(rec);
+    uuid->parts[0] = u128 >> 96;
+    uuid->parts[1] = u128 >> 64;
+    uuid->parts[2] = u128 >> 32;
+    uuid->parts[3] = u128;
+    return true;
+}
+
+struct dump_index_data {
+    ddlog_prog prog;
+    struct hmap *rows_present;
+    const char *table;
+    struct ds *ops_s;
+};
+
+static void OVS_UNUSED
+index_cb(uintptr_t data_, const ddlog_record *rec)
+{
+    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
+    struct dump_index_data *data = (struct dump_index_data *) data_;
+
+    /* Extract the rec's row UUID as 'uuid'.
*/ + const ddlog_record *rec_uuid = ddlog_get_named_struct_field(rec, "_uuid"); + if (!rec_uuid) { + VLOG_WARN_RL(&rl, "%s: row has no _uuid column", data->table); + return; + } + struct uuid uuid; + if (!get_ddlog_uuid(rec_uuid, &uuid)) { + VLOG_WARN_RL(&rl, "%s: _uuid column has unexpected type", data->table); + return; + } + + /* If a row with the given UUID was already in the database, then + * send a operation to update it; otherwise, send an operation to + * insert it. */ + struct uuidset_node *node = uuidset_find(data->rows_present, &uuid); + char *s = NULL; + int ret; + if (node) { + uuidset_delete(data->rows_present, node); + ret = ddlog_into_ovsdb_update_str(data->prog, data->table, rec, &s); + } else { + ret = ddlog_into_ovsdb_insert_str(data->prog, data->table, rec, &s); + } + if (ret) { + VLOG_WARN_RL(&rl, "%s: ddlog could not convert row into database op", + data->table); + return; + } + ds_put_format(data->ops_s, "%s,", s); + ddlog_free_json(s); +} + +static struct json * +where_uuid_equals(const struct uuid *uuid) +{ + return + json_array_create_1( + json_array_create_3( + json_string_create("_uuid"), + json_string_create("=="), + json_array_create_2( + json_string_create("uuid"), + json_string_create_nocopy( + xasprintf(UUID_FMT, UUID_ARGS(uuid)))))); +} + +static void +add_delete_row_op(const char *table, const struct uuid *uuid, struct ds *ops_s) +{ + struct json *op = json_object_create(); + json_object_put_string(op, "op", "delete"); + json_object_put_string(op, "table", table); + json_object_put(op, "where", where_uuid_equals(uuid)); + json_to_ds(op, 0, ops_s); + json_destroy(op); + ds_put_char(ops_s, ','); +} + +static void +northd_update_sb_cfg_cb( + uintptr_t new_sb_cfgp_, + table_id table OVS_UNUSED, + const ddlog_record *rec, + ssize_t weight) +{ + int64_t *new_sb_cfgp = (int64_t *) new_sb_cfgp_; + + if (weight < 0) { + return; + } + + if (ddlog_get_int(rec, NULL, 0) <= sizeof *new_sb_cfgp) { + *new_sb_cfgp = ddlog_get_i64(rec); + } +} + 
+static struct json *
+get_database_ops(struct northd_ctx *ctx)
+{
+    struct ds ops_s = DS_EMPTY_INITIALIZER;
+    ds_put_char(&ops_s, '[');
+    json_string_escape(ctx->db_name, &ops_s);
+    ds_put_char(&ops_s, ',');
+    size_t start_len = ops_s.length;
+
+    for (const char **p = ctx->output_relations; *p; p++) {
+        ddlog_table_update_deltas(&ops_s, ctx->ddlog, ctx->db_name, *p);
+    }
+
+    if (ctx->output_only_data) {
+        /*
+         * We just reconnected to the database (or connected for the first time
+         * in this execution).  We assume that the contents of the output-only
+         * tables might have changed (this is especially true the first time we
+         * connect to the database in a given execution, of course; we can't
+         * assume that the tables have any particular contents in this case).
+         *
+         * ctx->output_only_data is a database reply that tells us the
+         * UUIDs of the rows that exist in the database.  Our strategy is to
+         * compare these UUIDs to the UUIDs of the rows that exist in the DDlog
+         * analogues of these tables, and then add, delete, or update rows as
+         * necessary.
+         *
+         * (ctx->output_only_data only gives row UUIDs, not full row
+         * contents.  That means that for rows that exist in OVSDB and in
+         * DDlog, we always send an update to set all the columns.  It wouldn't
+         * save bandwidth to do anything else, since we'd always have to send
+         * the full row contents in one direction and if there were differences
+         * we'd have to send the contents in both directions.  With this
+         * strategy we only send them in one direction even in the worst case.)
+         *
+         * (We can't just send an operation to delete all the rows and then
+         * re-add them all in the same transaction, because ovsdb-server
+         * rejects deleting a row with a given UUID and adding the same
+         * UUID back in a single transaction.)
+ */ + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 2); + + for (size_t i = 0; ctx->output_only_relations[i]; i++) { + const char *table = ctx->output_only_relations[i]; + + /* Parse the list of row UUIDs received from OVSDB. */ + struct hmap rows_present = HMAP_INITIALIZER(&rows_present); + struct ovsdb_error *error = parse_output_only_data( + ctx->output_only_data, i, &rows_present); + if (error) { + char *s = ovsdb_error_to_string_free(error); + VLOG_WARN_RL(&rl, "%s", s); + free(s); + uuidset_destroy(&rows_present); + continue; + } + + /* Get the index_id for the DDlog table. + * + * We require output-only tables to have an accompanying index + * named <table>_Index. */ + char *index = xasprintf("%s_Index", table); + index_id idxid = ddlog_get_index_id(index); + if (idxid == -1) { + VLOG_WARN_RL(&rl, "%s: unknown index", index); + free(index); + uuidset_destroy(&rows_present); + continue; + } + free(index); + + /* For each row in the index, update a corresponding OVSDB row, if + * there is one, otherwise insert a new row. */ + struct dump_index_data cbdata = { + ctx->ddlog, &rows_present, table, &ops_s + }; + ddlog_dump_index(ctx->ddlog, idxid, index_cb, (uintptr_t) &cbdata); + + /* Any uuids remaining in 'rows_present' are rows that are in OVSDB + * but not DDlog. Delete them from OVSDB. */ + struct uuidset_node *node; + HMAP_FOR_EACH (node, hmap_node, &rows_present) { + add_delete_row_op(table, &node->uuid, &ops_s); + } + uuidset_destroy(&rows_present); + + /* Discard any queued output to this table, since we just + * did a full sync to it. 
 */
+            struct ds tmp = DS_EMPTY_INITIALIZER;
+            ddlog_table_update_output(&tmp, ctx->ddlog, ctx->db_name, table);
+            ds_destroy(&tmp);
+        }
+
+        json_destroy(ctx->output_only_data);
+        ctx->output_only_data = NULL;
+    } else {
+        for (const char **p = ctx->output_only_relations; *p; p++) {
+            ddlog_table_update_output(&ops_s, ctx->ddlog, ctx->db_name, *p);
+        }
+    }
+
+    /* If we're updating nb::NB_Global.sb_cfg, then also update
+     * sb_cfg_timestamp.
+     *
+     * XXX If the transaction we're sending to the database fails, then
+     * currently as written we'll never find out about it and sb_cfg_timestamp
+     * will not be updated.
+     */
+    static int64_t old_sb_cfg = INT64_MIN;
+    static int64_t old_sb_cfg_timestamp = INT64_MIN;
+    int64_t new_sb_cfg = old_sb_cfg;
+    if (ctx->has_timestamp_columns) {
+        table_id sb_cfg_tid = ddlog_get_table_id("SbCfg");
+        ddlog_delta *sb_cfg_delta = ddlog_delta_get_table(delta, sb_cfg_tid);
+        ddlog_delta_enumerate(sb_cfg_delta, northd_update_sb_cfg_cb,
+                              (uintptr_t) &new_sb_cfg);
+        ddlog_free_delta(sb_cfg_delta);
+
+        if (new_sb_cfg != old_sb_cfg) {
+            old_sb_cfg = new_sb_cfg;
+            old_sb_cfg_timestamp = time_wall_msec();
+            ds_put_format(&ops_s,
+                          "{\"op\":\"update\",\"table\":\"NB_Global\",\"where\":[],"
+                          "\"row\":{\"sb_cfg_timestamp\":%"PRId64"}},",
+                          old_sb_cfg_timestamp);
+        }
+    }
+
+    struct json *ops;
+    if (ops_s.length > start_len) {
+        ds_chomp(&ops_s, ',');
+        ds_put_char(&ops_s, ']');
+        ops = json_from_string(ds_cstr(&ops_s));
+    } else {
+        ops = NULL;
+    }
+
+    ds_destroy(&ops_s);
+
+    return ops;
+}
+
+static void
+warning_cb(uintptr_t arg OVS_UNUSED,
+           table_id table OVS_UNUSED,
+           const ddlog_record *rec,
+           ssize_t weight)
+{
+    size_t len;
+    const char *s = ddlog_get_str_with_length(rec, &len);
+    if (weight > 0) {
+        VLOG_WARN("New warning: %.*s", (int)len, s);
+    } else {
+        VLOG_WARN("Warning cleared: %.*s", (int)len, s);
+    }
+}
+
+static int
+ddlog_commit(ddlog_prog ddlog)
+{
+    ddlog_delta *new_delta = ddlog_transaction_commit_dump_changes(ddlog);
+    if (!new_delta)
{ + VLOG_WARN("Transaction commit failed"); + return -1; + } + + /* Remove warnings from delta and output them straight away. */ + ddlog_delta *warnings = ddlog_delta_remove_table(new_delta, WARNING_TABLE_ID); + ddlog_delta_enumerate(warnings, warning_cb, 0); + ddlog_free_delta(warnings); + + /* Merge changes into `delta`. */ + ddlog_delta_union(delta, new_delta); + + return 0; +} + +static const struct json * +json_object_get(const struct json *json, const char *member_name) +{ + return (json && json->type == JSON_OBJECT + ? shash_find_data(json_object(json), member_name) + : NULL); +} + +/* Returns the new value of NB_Global::nb_cfg, if any, from the updates in + * <table-updates> provided by the caller, or INT64_MIN if none is present. */ +static int64_t +get_nb_cfg(const struct json *table_updates) +{ + const struct json *nb_global = json_object_get(table_updates, "NB_Global"); + if (nb_global) { + struct shash_node *row; + SHASH_FOR_EACH (row, json_object(nb_global)) { + const struct json *value = row->data; + const struct json *new = json_object_get(value, "new"); + const struct json *nb_cfg = json_object_get(new, "nb_cfg"); + if (nb_cfg && nb_cfg->type == JSON_INTEGER) { + return json_integer(nb_cfg); + } + } + } + return INT64_MIN; +} + +static void +northd_handle_update(struct northd_ctx *ctx, bool clear, + const struct json *table_updates) +{ + if (!table_updates) { + return; + } + + if (ddlog_transaction_start(ctx->ddlog)) { + VLOG_WARN("DDlog failed to start transaction"); + return; + } + + if (clear && ddlog_clear(ctx)) { + goto error; + } + char *updates_s = json_to_string(table_updates, 0); + if (ddlog_apply_ovsdb_updates(ctx->ddlog, ctx->prefix, updates_s)) { + VLOG_WARN("DDlog failed to apply updates"); + free(updates_s); + goto error; + } + free(updates_s); + + /* Whenever a new 'nb_cfg' value comes in, take the current time and push + * it into the NbCfgTimestamp relation for the DDlog program to put into + * nb::NB_Global.nb_cfg_timestamp. 
*/ + static int64_t old_nb_cfg = INT64_MIN; + static int64_t old_nb_cfg_timestamp = INT64_MIN; + int64_t new_nb_cfg = old_nb_cfg; + int64_t new_nb_cfg_timestamp = old_nb_cfg_timestamp; + if (ctx->has_timestamp_columns) { + new_nb_cfg = get_nb_cfg(table_updates); + if (new_nb_cfg == INT64_MIN) { + new_nb_cfg = old_nb_cfg == INT64_MIN ? 0 : old_nb_cfg; + } + if (new_nb_cfg != old_nb_cfg) { + new_nb_cfg_timestamp = time_wall_msec(); + + ddlog_cmd *updates[2]; + int n_updates = 0; + if (old_nb_cfg_timestamp != INT64_MIN) { + updates[n_updates++] = ddlog_delete_val_cmd( + NB_CFG_TIMESTAMP_ID, ddlog_i64(old_nb_cfg_timestamp)); + } + updates[n_updates++] = ddlog_insert_cmd( + NB_CFG_TIMESTAMP_ID, ddlog_i64(new_nb_cfg_timestamp)); + if (ddlog_apply_updates(ctx->ddlog, updates, n_updates) < 0) { + goto error; + } + } + } + + /* Commit changes to DDlog. */ + if (ddlog_commit(ctx->ddlog)) { + goto error; + } + old_nb_cfg = new_nb_cfg; + old_nb_cfg_timestamp = new_nb_cfg_timestamp; + + /* This update may have implications for the other side, so + * immediately wake to check for more changes to be applied. */ + poll_immediate_wake(); + + return; + +error: + ddlog_transaction_rollback(ctx->ddlog); +} + +static int +ddlog_clear(struct northd_ctx *ctx) +{ + int n_failures = 0; + for (int i = 0; ctx->input_relations[i]; i++) { + char *table = xasprintf("%s%s", ctx->prefix, ctx->input_relations[i]); + if (ddlog_clear_relation(ctx->ddlog, ddlog_get_table_id(table))) { + n_failures++; + } + free(table); + } + if (n_failures) { + VLOG_WARN("failed to clear %d tables in %s database", + n_failures, ctx->db_name); + } + return n_failures; +} + +/* Callback used by the ddlog engine to print error messages. Note that + * this is only used by the ddlog runtime, as opposed to the application + * code in ovn_northd.dl, which uses the vlog facility directly. 
*/ +static void +ddlog_print_error(const char *msg) +{ + VLOG_ERR("%s", msg); +} + +static void +usage(void) +{ + printf("\ +%s: OVN northbound management daemon\n\ +usage: %s [OPTIONS]\n\ +\n\ +Options:\n\ + --ovnnb-db=DATABASE connect to ovn-nb database at DATABASE\n\ + (default: %s)\n\ + --ovnsb-db=DATABASE connect to ovn-sb database at DATABASE\n\ + (default: %s)\n\ + --unixctl=SOCKET override default control socket name\n\ + -h, --help display this help message\n\ + -o, --options list available options\n\ + -V, --version display version information\n\ +", program_name, program_name, default_nb_db(), default_sb_db()); + daemon_usage(); + vlog_usage(); + stream_usage("database", true, true, false); +} + +static void +parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED) +{ + enum { + OVN_DAEMON_OPTION_ENUMS, + VLOG_OPTION_ENUMS, + SSL_OPTION_ENUMS, + OPT_DDLOG_RECORD + }; + static const struct option long_options[] = { + {"ddlog-record", required_argument, NULL, OPT_DDLOG_RECORD}, + {"ovnsb-db", required_argument, NULL, 'd'}, + {"ovnnb-db", required_argument, NULL, 'D'}, + {"unixctl", required_argument, NULL, 'u'}, + {"help", no_argument, NULL, 'h'}, + {"options", no_argument, NULL, 'o'}, + {"version", no_argument, NULL, 'V'}, + OVN_DAEMON_LONG_OPTIONS, + VLOG_LONG_OPTIONS, + STREAM_SSL_LONG_OPTIONS, + {NULL, 0, NULL, 0}, + }; + char *short_options = ovs_cmdl_long_options_to_short_options(long_options); + + for (;;) { + int c; + + c = getopt_long(argc, argv, short_options, long_options, NULL); + if (c == -1) { + break; + } + + switch (c) { + OVN_DAEMON_OPTION_HANDLERS; + VLOG_OPTION_HANDLERS; + STREAM_SSL_OPTION_HANDLERS; + + case OPT_DDLOG_RECORD: + record_file = optarg; + break; + + case 'd': + ovnsb_db = optarg; + break; + + case 'D': + ovnnb_db = optarg; + break; + + case 'u': + unixctl_path = optarg; + break; + + case 'h': + usage(); + exit(EXIT_SUCCESS); + + case 'o': + ovs_cmdl_print_options(long_options); + exit(EXIT_SUCCESS); + + case 'V': + 
ovs_print_version(0, 0); + exit(EXIT_SUCCESS); + + default: + break; + } + } + + if (!ovnsb_db || !ovnsb_db[0]) { + ovnsb_db = default_sb_db(); + } + + if (!ovnnb_db || !ovnnb_db[0]) { + ovnnb_db = default_nb_db(); + } + + free(short_options); +} + +int +main(int argc, char *argv[]) +{ + int res = EXIT_SUCCESS; + struct unixctl_server *unixctl; + int retval; + bool exiting; + + init_table_ids(); + + fatal_ignore_sigpipe(); + ovs_cmdl_proctitle_init(argc, argv); + set_program_name(argv[0]); + service_start(&argc, &argv); + parse_options(argc, argv); + + daemonize_start(false); + + char *abs_unixctl_path = get_abs_unix_ctl_path(unixctl_path); + retval = unixctl_server_create(abs_unixctl_path, &unixctl); + free(abs_unixctl_path); + + if (retval) { + exit(EXIT_FAILURE); + } + + struct northd_status status = { + .locked = false, + .pause = false, + }; + unixctl_command_register("exit", "", 0, 0, ovn_northd_exit, &exiting); + unixctl_command_register("status", "", 0, 0, ovn_northd_status, &status); + + + ddlog_prog ddlog; + ddlog = ddlog_run(1, false, NULL, 0, ddlog_print_error, &delta); + if (!ddlog) { + ovs_fatal(0, "DDlog instance could not be created"); + } + + int replay_fd = -1; + if (record_file) { + replay_fd = open(record_file, O_CREAT | O_WRONLY | O_TRUNC, 0666); + if (replay_fd < 0) { + ovs_fatal(errno, "%s: could not create DDlog record file", + record_file); + } + + if (ddlog_record_commands(ddlog, replay_fd)) { + ovs_fatal(0, "could not enable DDlog command recording"); + } + } + + struct northd_ctx *nb_ctx = northd_ctx_create( + ovnnb_db, "OVN_Northbound", "nb", NULL, ddlog, + nb_input_relations, nb_output_relations, nb_output_only_relations); + struct northd_ctx *sb_ctx = northd_ctx_create( + ovnsb_db, "OVN_Southbound", "sb", "ovn_northd", ddlog, + sb_input_relations, sb_output_relations, sb_output_only_relations); + + unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, sb_ctx); + unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, 
sb_ctx); + unixctl_command_register("is-paused", "", 0, 0, ovn_northd_is_paused, + sb_ctx); + + daemonize_complete(); + + /* Main loop. */ + exiting = false; + while (!exiting) { + bool has_lock = northd_lock_status(sb_ctx) == HAS_LOCK; + if (!sb_ctx->paused) { + if (has_lock && !status.locked) { + VLOG_INFO("ovn-northd lock acquired. " + "This ovn-northd instance is now active."); + } else if (!has_lock && status.locked) { + VLOG_INFO("ovn-northd lock lost. " + "This ovn-northd instance is now on standby."); + } + } + status.locked = has_lock; + status.pause = sb_ctx->paused; + + bool run_deltas = (northd_lock_status(sb_ctx) == HAS_LOCK && + nb_ctx->state == S_MONITORING && + sb_ctx->state == S_MONITORING); + + northd_run(nb_ctx, run_deltas); + northd_wait(nb_ctx); + + northd_run(sb_ctx, run_deltas); + northd_wait(sb_ctx); + + northd_update_probe_interval(nb_ctx, sb_ctx); + + unixctl_server_run(unixctl); + unixctl_server_wait(unixctl); + if (exiting) { + poll_immediate_wake(); + } + + poll_block(); + if (should_service_stop()) { + exiting = true; + } + } + + northd_ctx_destroy(nb_ctx); + northd_ctx_destroy(sb_ctx); + + ddlog_stop(ddlog); + + if (replay_fd >= 0) { + fsync(replay_fd); + close(replay_fd); + } + + unixctl_server_destroy(unixctl); + service_stop(); + + exit(res); +} + +static void +ovn_northd_exit(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *exiting_) +{ + bool *exiting = exiting_; + *exiting = true; + + unixctl_command_reply(conn, NULL); +} + +static void +ovn_northd_pause(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *sb_ctx_) +{ + struct northd_ctx *sb_ctx = sb_ctx_; + northd_pause(sb_ctx); + unixctl_command_reply(conn, NULL); +} + +static void +ovn_northd_resume(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *sb_ctx_) +{ + struct northd_ctx *sb_ctx = sb_ctx_; + northd_unpause(sb_ctx); + unixctl_command_reply(conn, NULL); +} + 
+static void +ovn_northd_is_paused(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *sb_ctx_) +{ + struct northd_ctx *sb_ctx = sb_ctx_; + if (sb_ctx->paused) { + unixctl_command_reply(conn, "true"); + } else { + unixctl_command_reply(conn, "false"); + } +} + +static void +ovn_northd_status(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *status_) +{ + struct northd_status *status = status_; + char *status_string; + + if (status->pause) { + status_string = "paused"; + } else { + status_string = status->locked ? "active" : "standby"; + } + + /* + * Use a labelled formatted output so we can add more to the status command + * later without breaking any consuming scripts + */ + struct ds s = DS_EMPTY_INITIALIZER; + ds_put_format(&s, "Status: %s\n", status_string); + unixctl_command_reply(conn, ds_cstr(&s)); + ds_destroy(&s); +} diff --git a/northd/ovn-sb.dlopts b/northd/ovn-sb.dlopts new file mode 100644 index 000000000000..41cf201d6536 --- /dev/null +++ b/northd/ovn-sb.dlopts @@ -0,0 +1,28 @@ +--output-only Logical_Flow +-o SB_Global +-o Multicast_Group +-o Meter +-o Meter_Band +-o Datapath_Binding +-o Port_Binding +-o Gateway_Chassis +-o HA_Chassis +-o HA_Chassis_Group +-o Port_Group +-o MAC_Binding +-o DHCP_Options +-o DHCPv6_Options +-o Address_Set +-o DNS +-o RBAC_Role +-o RBAC_Permission +-o IP_Multicast +-o Service_Monitor +--ro Port_Binding.chassis +--ro Port_Binding.virtual_parent +--ro Port_Binding.encap +--ro IP_Multicast.seq_no +--ro SB_Global.ssl +--ro SB_Global.connections +--ro SB_Global.external_ids +--ro Service_Monitor.status diff --git a/northd/ovn.dl b/northd/ovn.dl new file mode 100644 index 000000000000..e91a4e8a10d0 --- /dev/null +++ b/northd/ovn.dl @@ -0,0 +1,387 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import ovsdb + + +/* Logical port is enabled if it does not have an enabled flag or the flag is true */ +function is_enabled(s: Option<bool>): bool = { + s != Some{false} +} + +/* + * Ethernet addresses + */ +extern type eth_addr + +extern function eth_addr_zero(): eth_addr +extern function eth_addr2string(addr: eth_addr): string +function to_string(addr: eth_addr): string { + eth_addr2string(addr) +} +extern function scan_eth_addr(s: string): Option<eth_addr> +extern function scan_eth_addr_prefix(s: string): Option<bit<64>> +extern function eth_addr_from_string(s: string): Option<eth_addr> +extern function eth_addr_to_uint64(ea: eth_addr): bit<64> +extern function eth_addr_from_uint64(x: bit<64>): eth_addr +extern function eth_addr_mark_random(ea: eth_addr): eth_addr + +function pseudorandom_mac(seed: uuid, variant: bit<16>) : bit<64> = { + eth_addr_to_uint64(eth_addr_mark_random(eth_addr_from_uint64(hash64(seed ++ variant)))) +} + +/* + * IPv4 addresses + */ + +extern type in_addr + +function to_string(ip: in_addr): string = { + var x = iptohl(ip); + "${x >> 24}.${(x >> 16) & 'hff}.${(x >> 8) & 'hff}.${x & 'hff}" +} + +function ip_is_cidr(netmask: in_addr): bool { + var x = ~iptohl(netmask); + (x & (x + 1)) == 0 +} +function ip_is_local_multicast(ip: in_addr): bool { + (iptohl(ip) & 32'hffffff00) == 32'he0000000 +} + +function ip_create_mask(plen: bit<32>): in_addr { + hltoip((64'h00000000ffffffff << (32 - plen))[31:0]) +} + +function ip_bitxor(a: in_addr, b: in_addr): in_addr { + hltoip(iptohl(a) ^ iptohl(b)) +} + +function ip_bitand(a: 
in_addr, b: in_addr): in_addr { + hltoip(iptohl(a) & iptohl(b)) +} + +function ip_network(addr: in_addr, mask: in_addr): in_addr { + hltoip(iptohl(addr) & iptohl(mask)) +} + +function ip_host(addr: in_addr, mask: in_addr): in_addr { + hltoip(iptohl(addr) & ~iptohl(mask)) +} + +function ip_host_is_zero(addr: in_addr, mask: in_addr): bool { + ip_is_zero(ip_host(addr, mask)) +} + +function ip_is_zero(a: in_addr): bool { + iptohl(a) == 0 +} + +function ip_bcast(addr: in_addr, mask: in_addr): in_addr { + hltoip(iptohl(addr) | ~iptohl(mask)) +} + +extern function ip_parse(s: string): Option<in_addr> +extern function ip_parse_masked(s: string): Either<string/*err*/, (in_addr/*host_ip*/, in_addr/*mask*/)> +extern function ip_parse_cidr(s: string): Either<string/*err*/, (in_addr/*ip*/, bit<32>/*plen*/)> +extern function ip_count_cidr_bits(ip: in_addr): Option<bit<8>> + +/* True if both 'ips' are in the same network as defined by netmask 'mask', + * false otherwise. */ +function ip_same_network(ips: (in_addr, in_addr), mask: in_addr): bool { + ((iptohl(ips.0) ^ iptohl(ips.1)) & iptohl(mask)) == 0 +} + +extern function iptohl(addr: in_addr): bit<32> +extern function hltoip(addr: bit<32>): in_addr +extern function scan_static_dynamic_ip(s: string): Option<in_addr> + +/* + * parse IPv4 address list of the form: + * "10.0.0.4 10.0.0.10 10.0.0.20..10.0.0.50 10.0.0.100..10.0.0.110" + */ +extern function parse_ip_list(ips: string): Either<string, Vec<(in_addr, Option<in_addr>)>> + +/* + * IPv6 addresses + */ +extern type in6_addr + +extern function in6_generate_lla(ea: eth_addr): in6_addr +extern function in6_generate_eui64(ea: eth_addr, prefix: in6_addr): in6_addr +extern function in6_is_lla(addr: in6_addr): bool +extern function in6_addr_solicited_node(ip6: in6_addr): in6_addr + +extern function ipv6_string_mapped(addr: in6_addr): string +extern function ipv6_parse_masked(s: string): Either<string/*err*/, (in6_addr/*ip*/, in6_addr/*mask*/)> +extern function ipv6_parse(s: string): 
Option<in6_addr> +extern function ipv6_parse_cidr(s: string): Either<string/*err*/, (in6_addr/*ip*/, bit<32>/*plen*/)> +extern function ipv6_bitxor(a: in6_addr, b: in6_addr): in6_addr +extern function ipv6_bitand(a: in6_addr, b: in6_addr): in6_addr +extern function ipv6_bitnot(a: in6_addr): in6_addr +extern function ipv6_create_mask(mask: bit<32>): in6_addr +extern function ipv6_is_zero(a: in6_addr): bool +extern function ipv6_is_v4mapped(a: in6_addr): bool +extern function ipv6_is_routable_multicast(a: in6_addr): bool +extern function ipv6_is_all_hosts(a: in6_addr): bool + +function ipv6_network(addr: in6_addr, mask: in6_addr): in6_addr { + ipv6_bitand(addr, mask) +} + +function ipv6_host(addr: in6_addr, mask: in6_addr): in6_addr { + ipv6_bitand(addr, ipv6_bitnot(mask)) +} + +/* True if both 'ips' are in the same network as defined by netmask 'mask', + * false otherwise. */ +function ipv6_same_network(ips: (in6_addr, in6_addr), mask: in6_addr): bool { + ipv6_network(ips.0, mask) == ipv6_network(ips.1, mask) +} + +extern function ipv6_host_is_zero(addr: in6_addr, mask: in6_addr): bool +extern function ipv6_multicast_to_ethernet(ip6: in6_addr): eth_addr +extern function ipv6_is_cidr(ip6: in6_addr): bool +extern function ipv6_count_cidr_bits(ip6: in6_addr): Option<bit<8>> + +extern function inet6_ntop(addr: in6_addr): string +function to_string(addr: in6_addr): string = { + inet6_ntop(addr) +} + +/* + * IPv4 | IPv6 addresses + */ + +typedef v46_ip = IPv4 { ipv4: in_addr } | IPv6 { ipv6: in6_addr } + +function ip46_parse_cidr(s: string) : Option<(v46_ip, bit<32>)> = { + match (ip_parse_cidr(s)) { + Right{(ipv4, plen)} -> return Some{(IPv4{ipv4}, plen)}, + _ -> () + }; + match (ipv6_parse_cidr(s)) { + Right{(ipv6, plen)} -> return Some{(IPv6{ipv6}, plen)}, + _ -> () + }; + None +} +function ip46_parse_masked(s: string) : Option<(v46_ip, v46_ip)> = { + match (ip_parse_masked(s)) { + Right{(ipv4, mask)} -> return Some{(IPv4{ipv4}, IPv4{mask})}, + _ -> () + }; + match 
(ipv6_parse_masked(s)) { + Right{(ipv6, mask)} -> return Some{(IPv6{ipv6}, IPv6{mask})}, + _ -> () + }; + None +} +function ip46_parse(s: string) : Option<v46_ip> = { + match (ip_parse(s)) { + Some{ipv4} -> return Some{IPv4{ipv4}}, + _ -> () + }; + match (ipv6_parse(s)) { + Some{ipv6} -> return Some{IPv6{ipv6}}, + _ -> () + }; + None +} +function to_string(ip46: v46_ip) : string = { + match (ip46) { + IPv4{ipv4} -> "${ipv4}", + IPv6{ipv6} -> "${ipv6}" + } +} +function to_bracketed_string(ip46: v46_ip) : string = { + match (ip46) { + IPv4{ipv4} -> "${ipv4}", + IPv6{ipv6} -> "[${ipv6}]" + } +} + +function ip46_get_network(ip46: v46_ip, plen: bit<32>) : v46_ip { + match (ip46) { + IPv4{ipv4} -> IPv4{ip_bitand(ipv4, ip_create_mask(plen))}, + IPv6{ipv6} -> IPv6{ipv6_bitand(ipv6, ipv6_create_mask(plen))} + } +} + +function ip46_is_all_ones(ip46: v46_ip) : bool { + match (ip46) { + IPv4{ipv4} -> ipv4 == ip_create_mask(32), + IPv6{ipv6} -> ipv6 == ipv6_create_mask(128) + } +} + +function ip46_count_cidr_bits(ip46: v46_ip) : Option<bit<8>> { + match (ip46) { + IPv4{ipv4} -> ip_count_cidr_bits(ipv4), + IPv6{ipv6} -> ipv6_count_cidr_bits(ipv6) + } +} + +function ip46_ipX(ip46: v46_ip) : string { + match (ip46) { + IPv4{_} -> "ip4", + IPv6{_} -> "ip6" + } +} + +function ip46_xxreg(ip46: v46_ip) : string { + match (ip46) { + IPv4{_} -> "", + IPv6{_} -> "xx" + } +} + +typedef ipv4_netaddr = IPV4NetAddr { + addr: in_addr, /* 192.168.10.123 */ + plen: bit<32> /* CIDR Prefix: 24. */ +} + +/* Returns the netmask. */ +function ipv4_netaddr_mask(na: ipv4_netaddr): in_addr { + ip_create_mask(na.plen) +} + +/* Returns the broadcast address. */ +function ipv4_netaddr_bcast(na: ipv4_netaddr): in_addr { + ip_bcast(na.addr, ipv4_netaddr_mask(na)) +} + +/* Returns the network (with the host bits zeroed). */ +function ipv4_netaddr_network(na: ipv4_netaddr): in_addr { + ip_network(na.addr, ipv4_netaddr_mask(na)) +} + +/* Returns the host (with the network bits zeroed). 
*/ +function ipv4_netaddr_host(na: ipv4_netaddr): in_addr { + ip_host(na.addr, ipv4_netaddr_mask(na)) +} + +/* Match on the host, if the host part is nonzero, or on the network + * otherwise. */ +function ipv4_netaddr_match_host_or_network(na: ipv4_netaddr): string { + if (na.plen < 32 and ip_is_zero(ipv4_netaddr_host(na))) { + "${na.addr}/${na.plen}" + } else { + "${na.addr}" + } +} + +/* Match on the network. */ +function ipv4_netaddr_match_network(na: ipv4_netaddr): string { + if (na.plen < 32) { + "${ipv4_netaddr_network(na)}/${na.plen}" + } else { + "${na.addr}" + } +} + +typedef ipv6_netaddr = IPV6NetAddr { + addr: in6_addr, /* fc00::1 */ + plen: bit<32> /* CIDR Prefix: 64 */ +} + +/* Returns the netmask. */ +function ipv6_netaddr_mask(na: ipv6_netaddr): in6_addr { + ipv6_create_mask(na.plen) +} + +/* Returns the network (with the host bits zeroed). */ +function ipv6_netaddr_network(na: ipv6_netaddr): in6_addr { + ipv6_network(na.addr, ipv6_netaddr_mask(na)) +} + +/* Returns the host (with the network bits zeroed). */ +function ipv6_netaddr_host(na: ipv6_netaddr): in6_addr { + ipv6_host(na.addr, ipv6_netaddr_mask(na)) +} + +function ipv6_netaddr_solicited_node(na: ipv6_netaddr): in6_addr { + in6_addr_solicited_node(na.addr) +} + +function ipv6_netaddr_is_lla(na: ipv6_netaddr): bool { + return in6_is_lla(ipv6_netaddr_network(na)) +} + +/* Match on the network. 
*/ +function ipv6_netaddr_match_network(na: ipv6_netaddr): string { + if (na.plen < 128) { + "${ipv6_netaddr_network(na)}/${na.plen}" + } else { + "${na.addr}" + } +} + +typedef lport_addresses = LPortAddress { + ea: eth_addr, + ipv4_addrs: Vec<ipv4_netaddr>, + ipv6_addrs: Vec<ipv6_netaddr> +} + +function to_string(addr: lport_addresses): string = { + var addrs = ["${addr.ea}"]; + for (ip4 in addr.ipv4_addrs) { + vec_push(addrs, "${ip4.addr}") + }; + + for (ip6 in addr.ipv6_addrs) { + vec_push(addrs, "${ip6.addr}") + }; + + string_join(addrs, " ") +} + +/* + * Packet header lengths + */ +function eTH_HEADER_LEN(): integer = 14 +function vLAN_HEADER_LEN(): integer = 4 +function vLAN_ETH_HEADER_LEN(): integer = eTH_HEADER_LEN() + vLAN_HEADER_LEN() + +/* + * Logging + */ +extern function warn(msg: string): () +extern function err(msg: string): () +extern function abort(msg: string): () + +/* + * C functions imported from OVN + */ +extern function is_dynamic_lsp_address(addr: string): bool +extern function extract_lsp_addresses(address: string): Option<lport_addresses> +extern function extract_addresses(address: string): Option<lport_addresses> +extern function extract_lrp_networks(mac: string, networks: Set<string>): Option<lport_addresses> + +extern function split_addresses(addr: string): (Set<string>, Set<string>) + +/* + * C functions imported from OVS + */ +extern function json_string_escape(s: string): string + +/* Returns the number of 1-bits in `x`, between 0 and 64 inclusive */ +extern function count_1bits(x: bit<64>): bit<8> + +/* For a 'key' of the form "IP:port" or just "IP", returns + * (v46_ip, port) tuple. 
*/ +extern function ip_address_and_port_from_lb_key(k: string): Option<(v46_ip, bit<16>)> + +extern function str_to_int(s: string, base: bit<16>): Option<integer> +extern function str_to_uint(s: string, base: bit<16>): Option<integer> diff --git a/northd/ovn.rs b/northd/ovn.rs new file mode 100644 index 000000000000..e8d899951da8 --- /dev/null +++ b/northd/ovn.rs @@ -0,0 +1,857 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +use ::nom::*; +use ::differential_datalog::record; +use ::std::ffi; +use ::std::ptr; +use ::std::default; +use ::std::process; +use ::std::os::raw; +use ::libc; + +use crate::ddlog_std; + +pub fn warn(msg: &String) { + warn_(msg.as_str()) +} + +pub fn warn_(msg: &str) { + unsafe { + ddlog_warn(ffi::CString::new(msg).unwrap().as_ptr()); + } +} + +pub fn err_(msg: &str) { + unsafe { + ddlog_err(ffi::CString::new(msg).unwrap().as_ptr()); + } +} + +pub fn abort(msg: &String) { + abort_(msg.as_str()) +} + +fn abort_(msg: &str) { + err_(format!("DDlog error: {}.", msg).as_ref()); + process::abort(); +} + +const ETH_ADDR_SIZE: usize = 6; +const IN6_ADDR_SIZE: usize = 16; +const INET6_ADDRSTRLEN: usize = 46; +const INET_ADDRSTRLEN: usize = 16; +const ETH_ADDR_STRLEN: usize = 17; + +const AF_INET: usize = 2; +const AF_INET6: usize = 10; + +/* Implementation for externs declared in ovn.dl */ + +#[repr(C)] +#[derive(Default, PartialEq, Eq, PartialOrd, Ord, Clone, Hash, Serialize, Deserialize, Debug)] +pub struct eth_addr { + x: [u8; 
ETH_ADDR_SIZE]
+}
+
+pub fn eth_addr_zero() -> eth_addr {
+    eth_addr { x: [0; ETH_ADDR_SIZE] }
+}
+
+pub fn eth_addr2string(addr: &eth_addr) -> String {
+    format!("{:02x}:{:02x}:{:02x}:{:02x}:{:02x}:{:02x}",
+            addr.x[0], addr.x[1], addr.x[2], addr.x[3], addr.x[4], addr.x[5])
+}
+
+pub fn eth_addr_from_string(s: &String) -> ddlog_std::Option<eth_addr> {
+    let mut ea: eth_addr = Default::default();
+    unsafe {
+        if ovs::eth_addr_from_string(string2cstr(s).as_ptr(), &mut ea as *mut eth_addr) {
+            ddlog_std::Option::Some{x: ea}
+        } else {
+            ddlog_std::Option::None
+        }
+    }
+}
+
+pub fn eth_addr_from_uint64(x: &u64) -> eth_addr {
+    let mut ea: eth_addr = Default::default();
+    unsafe {
+        ovs::eth_addr_from_uint64(*x as libc::uint64_t, &mut ea as *mut eth_addr);
+        ea
+    }
+}
+
+pub fn eth_addr_mark_random(ea: &eth_addr) -> eth_addr {
+    unsafe {
+        let mut ea_new = ea.clone();
+        ovs::eth_addr_mark_random(&mut ea_new as *mut eth_addr);
+        ea_new
+    }
+}
+
+pub fn eth_addr_to_uint64(ea: &eth_addr) -> u64 {
+    unsafe {
+        ovs::eth_addr_to_uint64(ea.clone()) as u64
+    }
+}
+
+
+impl FromRecord for eth_addr {
+    fn from_record(val: &record::Record) -> Result<Self, String> {
+        Ok(eth_addr{x: <[u8; ETH_ADDR_SIZE]>::from_record(val)?})
+    }
+}
+
+::differential_datalog::decl_struct_into_record!(eth_addr, <>, x);
+::differential_datalog::decl_record_mutator_struct!(eth_addr, <>, x: [u8; ETH_ADDR_SIZE]);
+
+
+#[repr(C)]
+#[derive(Default, PartialEq, Eq, PartialOrd, Ord, Clone, Hash, Serialize, Deserialize, Debug)]
+pub struct in6_addr {
+    x: [u8; IN6_ADDR_SIZE]
+}
+
+pub const in6addr_any: in6_addr = in6_addr{x: [0; IN6_ADDR_SIZE]};
+pub const in6addr_all_hosts: in6_addr = in6_addr{x: [
+    0xff,0x02,0x00,0x00,0x00,0x00,0x00,0x00,
+    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01 ]};
+
+impl FromRecord for in6_addr {
+    fn from_record(val: &record::Record) -> Result<Self, String> {
+        Ok(in6_addr{x: <[u8; IN6_ADDR_SIZE]>::from_record(val)?})
+    }
+}
+
+::differential_datalog::decl_struct_into_record!(in6_addr, <>,
x); +::differential_datalog::decl_record_mutator_struct!(in6_addr, <>, x: [u8; IN6_ADDR_SIZE]); + +pub fn in6_generate_lla(ea: &eth_addr) -> in6_addr { + let mut addr: in6_addr = Default::default(); + unsafe {ovs::in6_generate_lla(ea.clone(), &mut addr as *mut in6_addr)}; + addr +} + +pub fn in6_generate_eui64(ea: &eth_addr, prefix: &in6_addr) -> in6_addr { + let mut addr: in6_addr = Default::default(); + unsafe {ovs::in6_generate_eui64(ea.clone(), + prefix as *const in6_addr, + &mut addr as *mut in6_addr)}; + addr +} + +pub fn in6_is_lla(addr: &in6_addr) -> bool { + unsafe {ovs::in6_is_lla(addr as *const in6_addr)} +} + +pub fn in6_addr_solicited_node(ip6: &in6_addr) -> in6_addr +{ + let mut res: in6_addr = Default::default(); + unsafe { + ovs::in6_addr_solicited_node(&mut res as *mut in6_addr, ip6 as *const in6_addr); + } + res +} + +pub fn ipv6_bitand(a: &in6_addr, b: &in6_addr) -> in6_addr { + unsafe { + ovs::ipv6_addr_bitand(a as *const in6_addr, b as *const in6_addr) + } +} + +pub fn ipv6_bitxor(a: &in6_addr, b: &in6_addr) -> in6_addr { + unsafe { + ovs::ipv6_addr_bitxor(a as *const in6_addr, b as *const in6_addr) + } +} + +pub fn ipv6_bitnot(a: &in6_addr) -> in6_addr { + let mut result: in6_addr = Default::default(); + for i in 0..16 { + result.x[i] = !a.x[i] + } + result +} + +pub fn ipv6_string_mapped(addr: &in6_addr) -> String { + let mut addr_str = [0 as i8; INET6_ADDRSTRLEN]; + unsafe { + ovs::ipv6_string_mapped(&mut addr_str[0] as *mut raw::c_char, addr as *const in6_addr); + cstr2string(&addr_str as *const raw::c_char) + } +} + +pub fn ipv6_is_zero(addr: &in6_addr) -> bool { + *addr == in6addr_any +} + +pub fn ipv6_count_cidr_bits(ip6: &in6_addr) -> ddlog_std::Option<u8> { + unsafe { + match (ipv6_is_cidr(ip6)) { + true => ddlog_std::Option::Some{x: ovs::ipv6_count_cidr_bits(ip6 as *const in6_addr) as u8}, + false => ddlog_std::Option::None + } + } +} + +pub fn json_string_escape(s: &String) -> String { + let mut ds = ovs_ds::new(); + unsafe { + 
ovs::json_string_escape(ffi::CString::new(s.as_str()).unwrap().as_ptr() as *const raw::c_char, + &mut ds as *mut ovs_ds); + }; + unsafe{ds.into_string()} +} + +pub fn extract_lsp_addresses(address: &String) -> ddlog_std::Option<lport_addresses> { + unsafe { + let mut laddrs: lport_addresses_c = Default::default(); + if ovn_c::extract_lsp_addresses(string2cstr(address).as_ptr(), + &mut laddrs as *mut lport_addresses_c) { + ddlog_std::Option::Some{x: laddrs.into_ddlog()} + } else { + ddlog_std::Option::None + } + } +} + +pub fn extract_addresses(address: &String) -> ddlog_std::Option<lport_addresses> { + unsafe { + let mut laddrs: lport_addresses_c = Default::default(); + let mut ofs: raw::c_int = 0; + if ovn_c::extract_addresses(string2cstr(address).as_ptr(), + &mut laddrs as *mut lport_addresses_c, + &mut ofs as *mut raw::c_int) { + ddlog_std::Option::Some{x: laddrs.into_ddlog()} + } else { + ddlog_std::Option::None + } + } +} + +pub fn extract_lrp_networks(mac: &String, networks: &ddlog_std::Set<String>) -> ddlog_std::Option<lport_addresses> +{ + unsafe { + let mut laddrs: lport_addresses_c = Default::default(); + let mut networks_cstrs = Vec::with_capacity(networks.x.len()); + let mut networks_ptrs = Vec::with_capacity(networks.x.len()); + for net in networks.x.iter() { + networks_cstrs.push(string2cstr(net)); + networks_ptrs.push(networks_cstrs.last().unwrap().as_ptr()); + }; + if ovn_c::extract_lrp_networks__(string2cstr(mac).as_ptr(), networks_ptrs.as_ptr() as *const *const raw::c_char, + networks_ptrs.len(), &mut laddrs as *mut lport_addresses_c) { + ddlog_std::Option::Some{x: laddrs.into_ddlog()} + } else { + ddlog_std::Option::None + } + } +} + +pub fn ipv6_parse_masked(s: &String) -> ddlog_std::Either<String, ddlog_std::tuple2<in6_addr, in6_addr>> +{ + unsafe { + let mut ip: in6_addr = Default::default(); + let mut mask: in6_addr = Default::default(); + let err = ovs::ipv6_parse_masked(string2cstr(s).as_ptr(), &mut ip as *mut in6_addr, &mut mask as *mut 
in6_addr); + if (err != ptr::null_mut()) { + let errstr = cstr2string(err); + free(err as *mut raw::c_void); + ddlog_std::Either::Left{l: errstr} + } else { + ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, mask)} + } + } +} + +pub fn ipv6_parse_cidr(s: &String) -> ddlog_std::Either<String, ddlog_std::tuple2<in6_addr, u32>> +{ + unsafe { + let mut ip: in6_addr = Default::default(); + let mut plen: raw::c_uint = 0; + let err = ovs::ipv6_parse_cidr(string2cstr(s).as_ptr(), &mut ip as *mut in6_addr, &mut plen as *mut raw::c_uint); + if (err != ptr::null_mut()) { + let errstr = cstr2string(err); + free(err as *mut raw::c_void); + ddlog_std::Either::Left{l: errstr} + } else { + ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, plen as u32)} + } + } +} + +pub fn ipv6_parse(s: &String) -> ddlog_std::Option<in6_addr> +{ + unsafe { + let mut ip: in6_addr = Default::default(); + let res = ovs::ipv6_parse(string2cstr(s).as_ptr(), &mut ip as *mut in6_addr); + if (res) { + ddlog_std::Option::Some{x: ip} + } else { + ddlog_std::Option::None + } + } +} + +pub fn ipv6_create_mask(mask: &u32) -> in6_addr +{ + unsafe {ovs::ipv6_create_mask(*mask as raw::c_uint)} +} + + +pub fn ipv6_is_routable_multicast(a: &in6_addr) -> bool +{ + unsafe{ovn_c::ipv6_addr_is_routable_multicast(a as *const in6_addr)} +} + +pub fn ipv6_is_all_hosts(a: &in6_addr) -> bool +{ + return *a == in6addr_all_hosts; +} + +pub fn ipv6_is_cidr(a: &in6_addr) -> bool +{ + unsafe{ovs::ipv6_is_cidr(a as *const in6_addr)} +} + +pub fn ipv6_multicast_to_ethernet(ip6: &in6_addr) -> eth_addr +{ + let mut eth: eth_addr = Default::default(); + unsafe{ + ovs::ipv6_multicast_to_ethernet(&mut eth as *mut eth_addr, ip6 as *const in6_addr); + } + eth +} + +pub type in_addr = u32; +pub type ovs_be32 = u32; + +pub fn iptohl(addr: &in_addr) -> u32 { + ddlog_std::ntohl(addr) +} +pub fn hltoip(addr: &u32) -> in_addr { + ddlog_std::htonl(addr) +} + +pub fn ip_parse_masked(s: &String) -> ddlog_std::Either<String, 
ddlog_std::tuple2<in_addr, in_addr>> +{ + unsafe { + let mut ip: ovs_be32 = 0; + let mut mask: ovs_be32 = 0; + let err = ovs::ip_parse_masked(string2cstr(s).as_ptr(), &mut ip as *mut ovs_be32, &mut mask as *mut ovs_be32); + if (err != ptr::null_mut()) { + let errstr = cstr2string(err); + free(err as *mut raw::c_void); + ddlog_std::Either::Left{l: errstr} + } else { + ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, mask)} + } + } +} + +pub fn ip_parse_cidr(s: &String) -> ddlog_std::Either<String, ddlog_std::tuple2<in_addr, u32>> +{ + unsafe { + let mut ip: ovs_be32 = 0; + let mut plen: raw::c_uint = 0; + let err = ovs::ip_parse_cidr(string2cstr(s).as_ptr(), &mut ip as *mut ovs_be32, &mut plen as *mut raw::c_uint); + if (err != ptr::null_mut()) { + let errstr = cstr2string(err); + free(err as *mut raw::c_void); + ddlog_std::Either::Left{l: errstr} + } else { + ddlog_std::Either::Right{r: ddlog_std::tuple2(ip, plen as u32)} + } + } +} + +pub fn ip_parse(s: &String) -> ddlog_std::Option<in_addr> +{ + unsafe { + let mut ip: ovs_be32 = 0; + if (ovs::ip_parse(string2cstr(s).as_ptr(), &mut ip as *mut ovs_be32)) { + ddlog_std::Option::Some{x:ip} + } else { + ddlog_std::Option::None + } + } +} + +pub fn ip_count_cidr_bits(address: &in_addr) -> ddlog_std::Option<u8> { + unsafe { + match (ip_is_cidr(address)) { + true => ddlog_std::Option::Some{x: ovs::ip_count_cidr_bits(*address) as u8}, + false => ddlog_std::Option::None + } + } +} + +pub fn is_dynamic_lsp_address(address: &String) -> bool { + unsafe { + ovn_c::is_dynamic_lsp_address(string2cstr(address).as_ptr()) + } +} + +pub fn split_addresses(addresses: &String) -> ddlog_std::tuple2<ddlog_std::Set<String>, ddlog_std::Set<String>> { + let mut ip4_addrs = ovs_svec::new(); + let mut ip6_addrs = ovs_svec::new(); + unsafe { + ovn_c::split_addresses(string2cstr(addresses).as_ptr(), &mut ip4_addrs as *mut ovs_svec, &mut ip6_addrs as *mut ovs_svec); + ddlog_std::tuple2(ip4_addrs.into_strings(), ip6_addrs.into_strings()) + } +} 
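[Reviewer note: nearly every wrapper in this file follows the same FFI shape — zero-initialize an out-parameter, pass a raw pointer to the C function, and fold the boolean result into a DDlog `Option`. A self-contained sketch of that calling convention follows; the C callee is simulated in Rust here (a hypothetical `c_style_parse_u8`, not libopenvswitch), since only the wrapper pattern is being illustrated:]

```rust
// Stand-in for a C-style parser: writes its result through an out
// pointer and returns true on success. In ovn.rs this role is played
// by externs such as ovs::ip_parse(); this stub only mimics the shape.
unsafe fn c_style_parse_u8(s: &str, out: *mut u8) -> bool {
    match s.parse::<u8>() {
        Ok(v) => {
            *out = v;
            true
        }
        Err(_) => false,
    }
}

// The wrapper shape used throughout the file: zero-initialize the
// out-parameter, make the unsafe call, and map success/failure to
// Option (ddlog_std::Option in the real code).
fn parse_u8(s: &str) -> Option<u8> {
    let mut v: u8 = 0;
    unsafe {
        if c_style_parse_u8(s, &mut v as *mut u8) {
            Some(v)
        } else {
            None
        }
    }
}

fn main() {
    assert_eq!(parse_u8("42"), Some(42));
    assert_eq!(parse_u8("bogus"), None);
}
```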
+ +pub fn scan_eth_addr(s: &String) -> ddlog_std::Option<eth_addr> { + let mut ea = eth_addr_zero(); + unsafe { + if ovs::ovs_scan(string2cstr(s).as_ptr(), b"%hhx:%hhx:%hhx:%hhx:%hhx:%hhx\0".as_ptr() as *const raw::c_char, + &mut ea.x[0] as *mut u8, &mut ea.x[1] as *mut u8, + &mut ea.x[2] as *mut u8, &mut ea.x[3] as *mut u8, + &mut ea.x[4] as *mut u8, &mut ea.x[5] as *mut u8) + { + ddlog_std::Option::Some{x: ea} + } else { + ddlog_std::Option::None + } + } +} + +pub fn scan_eth_addr_prefix(s: &String) -> ddlog_std::Option<u64> { + let mut b2: u8 = 0; + let mut b1: u8 = 0; + let mut b0: u8 = 0; + unsafe { + if ovs::ovs_scan(string2cstr(s).as_ptr(), b"%hhx:%hhx:%hhx\0".as_ptr() as *const raw::c_char, + &mut b2 as *mut u8, &mut b1 as *mut u8, &mut b0 as *mut u8) + { + ddlog_std::Option::Some{x: ((b2 as u64) << 40) | ((b1 as u64) << 32) | ((b0 as u64) << 24) } + } else { + ddlog_std::Option::None + } + } +} + +pub fn scan_static_dynamic_ip(s: &String) -> ddlog_std::Option<in_addr> { + let mut ip0: u8 = 0; + let mut ip1: u8 = 0; + let mut ip2: u8 = 0; + let mut ip3: u8 = 0; + let mut n: raw::c_uint = 0; + unsafe { + if ovs::ovs_scan(string2cstr(s).as_ptr(), b"dynamic %hhu.%hhu.%hhu.%hhu%n\0".as_ptr() as *const raw::c_char, + &mut ip0 as *mut u8, + &mut ip1 as *mut u8, + &mut ip2 as *mut u8, + &mut ip3 as *mut u8, + &mut n) && s.len() == (n as usize) + { + ddlog_std::Option::Some{x: ddlog_std::htonl(&(((ip0 as u32) << 24) | ((ip1 as u32) << 16) | ((ip2 as u32) << 8) | (ip3 as u32)))} + } else { + ddlog_std::Option::None + } + } +} + +pub fn ip_address_and_port_from_lb_key(k: &String) -> + ddlog_std::Option<ddlog_std::tuple2<v46_ip, u16>> +{ + unsafe { + let mut ip_address: *mut raw::c_char = ptr::null_mut(); + let mut port: libc::uint16_t = 0; + let mut addr_family: raw::c_int = 0; + + ovn_c::ip_address_and_port_from_lb_key(string2cstr(k).as_ptr(), &mut ip_address as *mut *mut raw::c_char, + &mut port as *mut libc::uint16_t, &mut addr_family as *mut raw::c_int); + if 
(ip_address != ptr::null_mut()) { + match (ip46_parse(&cstr2string(ip_address))) { + ddlog_std::Option::Some{x: ip46} => { + let res = ddlog_std::tuple2(ip46, port as u16); + free(ip_address as *mut raw::c_void); + return ddlog_std::Option::Some{x: res} + }, + _ => () + } + } + ddlog_std::Option::None + } +} + +pub fn count_1bits(x: &u64) -> u8 { + x.count_ones() as u8 +} + + +pub fn str_to_int(s: &String, base: &u16) -> ddlog_std::Option<u64> { + let mut i: raw::c_int = 0; + let ok = unsafe { + ovs::str_to_int(string2cstr(s).as_ptr(), *base as raw::c_int, &mut i as *mut raw::c_int) + }; + if ok { + ddlog_std::Option::Some{x: i as u64} + } else { + ddlog_std::Option::None + } +} + +pub fn str_to_uint(s: &String, base: &u16) -> ddlog_std::Option<u64> { + let mut i: raw::c_uint = 0; + let ok = unsafe { + ovs::str_to_uint(string2cstr(s).as_ptr(), *base as raw::c_int, &mut i as *mut raw::c_uint) + }; + if ok { + ddlog_std::Option::Some{x: i as u64} + } else { + ddlog_std::Option::None + } +} + +pub fn inet6_ntop(addr: &in6_addr) -> String { + let mut buf = [0 as i8; INET6_ADDRSTRLEN]; + unsafe { + let res = inet_ntop(AF_INET6 as raw::c_int, addr as *const in6_addr as *const raw::c_void, + &mut buf[0] as *mut raw::c_char, INET6_ADDRSTRLEN as libc::socklen_t); + if res == ptr::null() { + warn(&format!("inet_ntop({:?}) failed", *addr)); + "".to_owned() + } else { + cstr2string(&buf as *const raw::c_char) + } + } +} + +/* Internals */ + +unsafe fn cstr2string(s: *const raw::c_char) -> String { + ffi::CStr::from_ptr(s).to_owned().into_string(). + unwrap_or_else(|e|{ warn(&format!("cstr2string: {}", e)); "".to_owned() }) +} + +fn string2cstr(s: &String) -> ffi::CString { + ffi::CString::new(s.as_str()).unwrap() +} + +/* OVS dynamic string type */ +#[repr(C)] +struct ovs_ds { + s: *mut raw::c_char, /* Null-terminated string. */ + length: libc::size_t, /* Bytes used, not including null terminator. */ + allocated: libc::size_t /* Bytes allocated, not including null terminator. 
*/ +} + +impl ovs_ds { + pub fn new() -> ovs_ds { + ovs_ds{s: ptr::null_mut(), length: 0, allocated: 0} + } + + pub unsafe fn into_string(mut self) -> String { + let res = cstr2string(ovs::ds_cstr(&self as *const ovs_ds)); + ovs::ds_destroy(&mut self as *mut ovs_ds); + res + } +} + +/* OVS string vector type */ +#[repr(C)] +struct ovs_svec { + names: *mut *mut raw::c_char, + n: libc::size_t, + allocated: libc::size_t +} + +impl ovs_svec { + pub fn new() -> ovs_svec { + ovs_svec{names: ptr::null_mut(), n: 0, allocated: 0} + } + + pub unsafe fn into_strings(mut self) -> ddlog_std::Set<String> { + let mut res: ddlog_std::Set<String> = ddlog_std::Set::new(); + unsafe { + for i in 0..self.n { + res.insert(cstr2string(*self.names.offset(i as isize))); + } + ovs::svec_destroy(&mut self as *mut ovs_svec); + } + res + } +} + + +// ovn/lib/ovn-util.h +#[repr(C)] +struct ipv4_netaddr_c { + addr: libc::uint32_t, + mask: libc::uint32_t, + network: libc::uint32_t, + plen: raw::c_uint, + + addr_s: [raw::c_char; INET_ADDRSTRLEN + 1], /* "192.168.10.123" */ + network_s: [raw::c_char; INET_ADDRSTRLEN + 1], /* "192.168.10.0" */ + bcast_s: [raw::c_char; INET_ADDRSTRLEN + 1] /* "192.168.10.255" */ +} + +impl Default for ipv4_netaddr_c { + fn default() -> Self { + ipv4_netaddr_c { + addr: 0, + mask: 0, + network: 0, + plen: 0, + addr_s: [0; INET_ADDRSTRLEN + 1], + network_s: [0; INET_ADDRSTRLEN + 1], + bcast_s: [0; INET_ADDRSTRLEN + 1] + } + } +} + +impl ipv4_netaddr_c { + pub unsafe fn to_ddlog(&self) -> ipv4_netaddr { + ipv4_netaddr{ + addr: self.addr, + plen: self.plen, + } + } +} + +#[repr(C)] +struct ipv6_netaddr_c { + addr: in6_addr, /* fc00::1 */ + mask: in6_addr, /* ffff:ffff:ffff:ffff:: */ + sn_addr: in6_addr, /* ff02:1:ff00::1 */ + network: in6_addr, /* fc00:: */ + plen: raw::c_uint, /* CIDR Prefix: 64 */ + + addr_s: [raw::c_char; INET6_ADDRSTRLEN + 1], /* "fc00::1" */ + sn_addr_s: [raw::c_char; INET6_ADDRSTRLEN + 1], /* "ff02:1:ff00::1" */ + network_s: [raw::c_char; 
INET6_ADDRSTRLEN + 1] /* "fc00::" */ +} + +impl Default for ipv6_netaddr_c { + fn default() -> Self { + ipv6_netaddr_c { + addr: Default::default(), + mask: Default::default(), + sn_addr: Default::default(), + network: Default::default(), + plen: 0, + addr_s: [0; INET6_ADDRSTRLEN + 1], + sn_addr_s: [0; INET6_ADDRSTRLEN + 1], + network_s: [0; INET6_ADDRSTRLEN + 1] + } + } +} + +impl ipv6_netaddr_c { + pub unsafe fn to_ddlog(&self) -> ipv6_netaddr { + ipv6_netaddr{ + addr: self.addr.clone(), + plen: self.plen + } + } +} + + +// ovn-util.h +#[repr(C)] +struct lport_addresses_c { + ea_s: [raw::c_char; ETH_ADDR_STRLEN + 1], + ea: eth_addr, + n_ipv4_addrs: libc::size_t, + ipv4_addrs: *mut ipv4_netaddr_c, + n_ipv6_addrs: libc::size_t, + ipv6_addrs: *mut ipv6_netaddr_c +} + +impl Default for lport_addresses_c { + fn default() -> Self { + lport_addresses_c { + ea_s: [0; ETH_ADDR_STRLEN + 1], + ea: Default::default(), + n_ipv4_addrs: 0, + ipv4_addrs: ptr::null_mut(), + n_ipv6_addrs: 0, + ipv6_addrs: ptr::null_mut() + } + } +} + +impl lport_addresses_c { + pub unsafe fn into_ddlog(mut self) -> lport_addresses { + let mut ipv4_addrs = ddlog_std::Vec::with_capacity(self.n_ipv4_addrs); + for i in 0..self.n_ipv4_addrs { + ipv4_addrs.push((&*self.ipv4_addrs.offset(i as isize)).to_ddlog()) + } + let mut ipv6_addrs = ddlog_std::Vec::with_capacity(self.n_ipv6_addrs); + for i in 0..self.n_ipv6_addrs { + ipv6_addrs.push((&*self.ipv6_addrs.offset(i as isize)).to_ddlog()) + } + let res = lport_addresses { + ea: self.ea.clone(), + ipv4_addrs: ipv4_addrs, + ipv6_addrs: ipv6_addrs + }; + ovn_c::destroy_lport_addresses(&mut self as *mut lport_addresses_c); + res + } +} + +/* functions imported from ovn-northd.c */ +extern "C" { + fn ddlog_warn(msg: *const raw::c_char); + fn ddlog_err(msg: *const raw::c_char); +} + +/* functions imported from libovn */ +mod ovn_c { + use ::std::os::raw; + use ::libc; + use super::lport_addresses_c; + use super::ovs_svec; + use super::in6_addr; + + #[link(name 
= "ovn")] + extern "C" { + // ovn/lib/ovn-util.h + pub fn extract_lsp_addresses(address: *const raw::c_char, laddrs: *mut lport_addresses_c) -> bool; + pub fn extract_addresses(address: *const raw::c_char, laddrs: *mut lport_addresses_c, ofs: *mut raw::c_int) -> bool; + pub fn extract_lrp_networks__(mac: *const raw::c_char, networks: *const *const raw::c_char, + n_networks: libc::size_t, laddrs: *mut lport_addresses_c) -> bool; + pub fn destroy_lport_addresses(addrs: *mut lport_addresses_c); + pub fn is_dynamic_lsp_address(address: *const raw::c_char) -> bool; + pub fn split_addresses(addresses: *const raw::c_char, ip4_addrs: *mut ovs_svec, ipv6_addrs: *mut ovs_svec); + pub fn ip_address_and_port_from_lb_key(key: *const raw::c_char, ip_address: *mut *mut raw::c_char, + port: *mut libc::uint16_t, addr_family: *mut raw::c_int); + pub fn ipv6_addr_is_routable_multicast(ip: *const in6_addr) -> bool; + } +} + +mod ovs { + use ::std::os::raw; + use ::libc; + use super::in6_addr; + use super::ovs_be32; + use super::ovs_ds; + use super::eth_addr; + use super::ovs_svec; + + /* functions imported from libopenvswitch */ + #[link(name = "openvswitch")] + extern "C" { + // lib/packets.h + pub fn ipv6_string_mapped(addr_str: *mut raw::c_char, addr: *const in6_addr) -> *const raw::c_char; + pub fn ipv6_parse_masked(s: *const raw::c_char, ip: *mut in6_addr, mask: *mut in6_addr) -> *mut raw::c_char; + pub fn ipv6_parse_cidr(s: *const raw::c_char, ip: *mut in6_addr, plen: *mut raw::c_uint) -> *mut raw::c_char; + pub fn ipv6_parse(s: *const raw::c_char, ip: *mut in6_addr) -> bool; + pub fn ipv6_mask_is_any(mask: *const in6_addr) -> bool; + pub fn ipv6_count_cidr_bits(mask: *const in6_addr) -> raw::c_int; + pub fn ipv6_is_cidr(mask: *const in6_addr) -> bool; + pub fn ipv6_addr_bitxor(a: *const in6_addr, b: *const in6_addr) -> in6_addr; + pub fn ipv6_addr_bitand(a: *const in6_addr, b: *const in6_addr) -> in6_addr; + pub fn ipv6_create_mask(mask: raw::c_uint) -> in6_addr; + pub fn 
ipv6_is_zero(a: *const in6_addr) -> bool; + pub fn ipv6_multicast_to_ethernet(eth: *mut eth_addr, ip6: *const in6_addr); + pub fn ip_parse_masked(s: *const raw::c_char, ip: *mut ovs_be32, mask: *mut ovs_be32) -> *mut raw::c_char; + pub fn ip_parse_cidr(s: *const raw::c_char, ip: *mut ovs_be32, plen: *mut raw::c_uint) -> *mut raw::c_char; + pub fn ip_parse(s: *const raw::c_char, ip: *mut ovs_be32) -> bool; + pub fn ip_count_cidr_bits(mask: ovs_be32) -> raw::c_int; + pub fn eth_addr_from_string(s: *const raw::c_char, ea: *mut eth_addr) -> bool; + pub fn eth_addr_to_uint64(ea: eth_addr) -> libc::uint64_t; + pub fn eth_addr_from_uint64(x: libc::uint64_t, ea: *mut eth_addr); + pub fn eth_addr_mark_random(ea: *mut eth_addr); + pub fn in6_generate_eui64(ea: eth_addr, prefix: *const in6_addr, lla: *mut in6_addr); + pub fn in6_generate_lla(ea: eth_addr, lla: *mut in6_addr); + pub fn in6_is_lla(addr: *const in6_addr) -> bool; + pub fn in6_addr_solicited_node(addr: *mut in6_addr, ip6: *const in6_addr); + + // include/openvswitch/json.h + pub fn json_string_escape(str: *const raw::c_char, out: *mut ovs_ds); + // openvswitch/dynamic-string.h + pub fn ds_destroy(ds: *mut ovs_ds); + pub fn ds_cstr(ds: *const ovs_ds) -> *const raw::c_char; + pub fn svec_destroy(v: *mut ovs_svec); + pub fn ovs_scan(s: *const raw::c_char, format: *const raw::c_char, ...) -> bool; + pub fn str_to_int(s: *const raw::c_char, base: raw::c_int, i: *mut raw::c_int) -> bool; + pub fn str_to_uint(s: *const raw::c_char, base: raw::c_int, i: *mut raw::c_uint) -> bool; + } +} + +/* functions imported from libc */ +#[link(name = "c")] +extern "C" { + fn free(ptr: *mut raw::c_void); +} + +/* functions imported from arp/inet6 */ +extern "C" { + fn inet_ntop(af: raw::c_int, cp: *const raw::c_void, + buf: *mut raw::c_char, len: libc::socklen_t) -> *const raw::c_char; +} + +/* + * Parse IPv4 address list. 
+ */ + +named!(parse_spaces<nom::types::CompleteStr, ()>, + do_parse!(many1!(one_of!(&" \t\n\r\x0c\x0b")) >> (()) ) +); + +named!(parse_opt_spaces<nom::types::CompleteStr, ()>, + do_parse!(opt!(parse_spaces) >> (())) +); + +named!(parse_ipv4_range<nom::types::CompleteStr, (String, Option<String>)>, + do_parse!(addr1: many_till!(complete!(nom::anychar), alt!(do_parse!(eof!() >> (nom::types::CompleteStr(""))) | peek!(tag!("..")) | tag!(" ") )) >> + parse_opt_spaces >> + addr2: opt!(do_parse!(tag!("..") >> + parse_opt_spaces >> + addr2: many_till!(complete!(nom::anychar), alt!(do_parse!(eof!() >> (' ')) | char!(' ')) ) >> + (addr2) )) >> + parse_opt_spaces >> + (addr1.0.into_iter().collect(), addr2.map(|x|x.0.into_iter().collect())) ) +); + +named!(parse_ipv4_address_list<nom::types::CompleteStr, Vec<(String, Option<String>)>>, + do_parse!(parse_opt_spaces >> + ranges: many0!(parse_ipv4_range) >> + (ranges))); + +pub fn parse_ip_list(ips: &String) -> ddlog_std::Either<String, ddlog_std::Vec<ddlog_std::tuple2<in_addr, ddlog_std::Option<in_addr>>>> +{ + match parse_ipv4_address_list(nom::types::CompleteStr(ips.as_str())) { + Err(e) => { + ddlog_std::Either::Left{l: format!("invalid IP list format: \"{}\"", ips.as_str())} + }, + Ok((nom::types::CompleteStr(""), ranges)) => { + let mut res = vec![]; + for (ip1, ip2) in ranges.iter() { + let start = match ip_parse(&ip1) { + ddlog_std::Option::None => return ddlog_std::Either::Left{l: format!("invalid IP address: \"{}\"", *ip1)}, + ddlog_std::Option::Some{x: ip} => ip + }; + let end = match ip2 { + None => ddlog_std::Option::None, + Some(ip_str) => match ip_parse(&ip_str.clone()) { + ddlog_std::Option::None => return ddlog_std::Either::Left{l: format!("invalid IP address: \"{}\"", *ip_str)}, + x => x + } + }; + res.push(ddlog_std::tuple2(start, end)); + }; + ddlog_std::Either::Right{r: ddlog_std::Vec{x: res}} + }, + Ok((suffix, _)) => { + ddlog_std::Either::Left{l: format!("IP address list contains trailing characters: 
\"{}\"", suffix)} + } + } +} diff --git a/northd/ovn.toml b/northd/ovn.toml new file mode 100644 index 000000000000..64108996edae --- /dev/null +++ b/northd/ovn.toml @@ -0,0 +1,2 @@ +[dependencies.nom] +version = "4.0" diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl new file mode 100644 index 000000000000..3fbe67b31909 --- /dev/null +++ b/northd/ovn_northd.dl @@ -0,0 +1,7500 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import OVN_Northbound as nb +import OVN_Southbound as sb +import ovsdb +import allocate +import ovn +import lswitch +import lrouter +import multicast +import helpers +import ipam + +output relation Warning[string] + +index Logical_Flow_Index() on sb::Out_Logical_Flow() + +/* Meter_Band table */ +for (mb in nb::Meter_Band) { + sb::Out_Meter_Band(._uuid = mb._uuid, + .action = mb.action, + .rate = mb.rate, + .burst_size = mb.burst_size) +} + +/* Meter table */ +for (meter in nb::Meter) { + sb::Out_Meter(._uuid = meter._uuid, + .name = meter.name, + .unit = meter.unit, + .bands = meter.bands) +} + +/* Proxy table for Out_Datapath_Binding: contains all Datapath_Binding fields, + * except tunnel id, which is allocated separately (see TunKeyAllocation). 
*/ +relation OutProxy_Datapath_Binding ( + _uuid: uuid, + external_ids: Map<string,string> +) + +/* Datapath_Binding table */ +OutProxy_Datapath_Binding(uuid, external_ids) :- + nb::Logical_Switch(._uuid = uuid, .name = name, .external_ids = ids, + .other_config = other_config), + var uuid_str = uuid2str(uuid), + var external_ids = { + var eids = ["logical-switch" -> uuid_str, "name" -> name]; + match (map_get(ids, "neutron:network_name")) { + None -> (), + Some{nnn} -> map_insert(eids, "name2", nnn) + }; + match (map_get(other_config, "interconn-ts")) { + None -> (), + Some{value} -> map_insert(eids, "interconn-ts", value) + }; + eids + }. + +OutProxy_Datapath_Binding(uuid, external_ids) :- + lr in nb::Logical_Router(._uuid = uuid, .name = name, .external_ids = ids), + lr.is_enabled(), + var uuid_str = uuid2str(uuid), + var external_ids = { + var eids = ["logical-router" -> uuid_str, "name" -> name]; + match (map_get(ids, "neutron:router_name")) { + None -> (), + Some{nnn} -> map_insert(eids, "name2", nnn) + }; + eids + }. + +sb::Out_Datapath_Binding(uuid, tunkey, external_ids) :- + OutProxy_Datapath_Binding(uuid, external_ids), + TunKeyAllocation(uuid, tunkey). + + +/* Proxy table for Out_Port_Binding: contains all Port_Binding fields, + * except tunnel id, which is allocated separately (see PortTunKeyAllocation). 
*/ +relation OutProxy_Port_Binding ( + _uuid: uuid, + logical_port: string, + __type: string, + gateway_chassis: Set<uuid>, + ha_chassis_group: Option<uuid>, + options: Map<string,string>, + datapath: uuid, + parent_port: Option<string>, + tag: Option<integer>, + mac: Set<string>, + nat_addresses: Set<string>, + external_ids: Map<string,string> +) + +/* Case 1: Create a Port_Binding per logical switch port that is not of type "router" */ +OutProxy_Port_Binding(._uuid = lsp._uuid, + .logical_port = lsp.name, + .__type = lsp.__type, + .gateway_chassis = set_empty(), + .ha_chassis_group = sp.hac_group_uuid, + .options = lsp.options, + .datapath = sw.ls._uuid, + .parent_port = lsp.parent_name, + .tag = tag, + .mac = lsp.addresses, + .nat_addresses = set_empty(), + .external_ids = eids) :- + sp in &SwitchPort(.lsp = lsp, .sw = &sw), + SwitchPortNewDynamicTag(lsp._uuid, opt_tag), + var tag = match (opt_tag) { + None -> lsp.tag, + Some{t} -> Some{t} + }, + lsp.__type != "router", + var eids = { + var eids = lsp.external_ids; + match (map_get(lsp.external_ids, "neutron:port_name")) { + None -> (), + Some{name} -> map_insert(eids, "name", name) + }; + eids + }. 
+ + +/* Case 2: Create a Port_Binding per logical switch port of type "router" */ +OutProxy_Port_Binding(._uuid = lsp._uuid, + .logical_port = lsp.name, + .__type = __type, + .gateway_chassis = set_empty(), + .ha_chassis_group = None, + .options = options, + .datapath = sw.ls._uuid, + .parent_port = lsp.parent_name, + .tag = None, + .mac = lsp.addresses, + .nat_addresses = nat_addresses, + .external_ids = eids) :- + &SwitchPort(.lsp = lsp, .sw = &sw, .peer = peer), + var eids = { + var eids = lsp.external_ids; + match (map_get(lsp.external_ids, "neutron:port_name")) { + None -> (), + Some{name} -> map_insert(eids, "name", name) + }; + eids + }, + Some{var router_port} = map_get(lsp.options, "router-port"), + var opt_chassis = match (peer) { + Some{rport} -> map_get(rport.router.lr.options, "chassis"), + None -> None + }, + var l3dgw_port = match (peer) { + Some{rport} -> rport.router.l3dgw_port, + None -> None + }, + (var __type, var options) = { + var options = ["peer" -> router_port]; + match (opt_chassis) { + None -> { + ("patch", options) + }, + Some{chassis} -> { + map_insert(options, "l3gateway-chassis", chassis); + ("l3gateway", options) + } + } + }, + var base_nat_addresses = { + match (map_get(lsp.options, "nat-addresses")) { + None -> { set_empty() }, + Some{"router"} -> match ((l3dgw_port, opt_chassis, peer)) { + (None, None, _) -> set_empty(), + (_, _, None) -> set_empty(), + (_, _, Some{rport}) -> get_nat_addresses(deref(rport)) + }, + Some{nat_addresses} -> { + /* Only accept manual specification of ethernet address + * followed by IPv4 addresses on type "l3gateway" ports. 
*/ + if (is_some(opt_chassis)) { + match (extract_lsp_addresses(nat_addresses)) { + None -> { + warn("Error extracting nat-addresses."); + set_empty() + }, + Some{_} -> { set_singleton(nat_addresses) } + } + } else { set_empty() } + } + } + }, + /* Add the router mac and IPv4 addresses to + * Port_Binding.nat_addresses so that GARP is sent for these + * IPs by the ovn-controller on which the distributed gateway + * router port resides if: + * + * 1. The peer has 'reside-on-redirect-chassis' set and the + * logical router datapath has a distributed router port. + * + * 2. The peer is a distributed gateway router port. + * + * 3. The peer's router is a gateway router and the port has a localnet + * port. + * + * Note: Port_Binding.nat_addresses column is also used for + * sending the GARPs for the router port IPs. + * */ + var garp_nat_addresses = match (peer) { + Some{rport} -> match ( + (map_get_bool_def(rport.lrp.options, "reside-on-redirect-chassis", + false) + and is_some(l3dgw_port)) or + Some{rport.lrp} == l3dgw_port or + (is_some(map_get(rport.router.lr.options, "chassis")) and + not sw.localnet_port_names.is_empty())) { + false -> set_empty(), + true -> set_singleton(get_garp_nat_addresses(deref(rport))) + }, + None -> set_empty() + }, + var nat_addresses = set_union(base_nat_addresses, garp_nat_addresses). 
+ +/* Case 3: Port_Binding per logical router port */ +OutProxy_Port_Binding(._uuid = lrp._uuid, + .logical_port = lrp.name, + .__type = __type, + .gateway_chassis = set_empty(), + .ha_chassis_group = None, + .options = options, + .datapath = router.lr._uuid, + .parent_port = None, + .tag = None, // always empty for router ports + .mac = set_singleton("${lrp.mac} ${lrp.networks.join(\" \")}"), + .nat_addresses = set_empty(), + .external_ids = lrp.external_ids) :- + rp in &RouterPort(.lrp = lrp, .router = &router, .peer = peer), + RouterPortRAOptionsComplete(lrp._uuid, options0), + (var __type, var options1) = match (map_get(router.lr.options, "chassis")) { + /* TODO: derived ports */ + None -> ("patch", map_empty()), + Some{lrchassis} -> ("l3gateway", ["l3gateway-chassis" -> lrchassis]) + }, + var options2 = match (router_peer_name(peer)) { + None -> map_empty(), + Some{peer_name} -> ["peer" -> peer_name] + }, + var options3 = match ((peer, vec_is_empty(rp.networks.ipv6_addrs))) { + (PeerSwitch{_, _}, false) -> { + var enabled = lrp.is_enabled(); + var pd = map_get_bool_def(lrp.options, "prefix_delegation", false); + var p = map_get_bool_def(lrp.options, "prefix", false); + ["ipv6_prefix_delegation" -> "${pd and enabled}", + "ipv6_prefix" -> "${p and enabled}"] + }, + _ -> map_empty() + }, + PreserveIPv6RAPDList(lrp._uuid, ipv6_ra_pd_list), + var options4 = match (ipv6_ra_pd_list) { + None -> map_empty(), + Some{value} -> ["ipv6_ra_pd_list" -> value] + }, + var options = map_union(options0, + map_union(options1, + map_union(options2, + map_union(options3, options4)))), + var eids = { + var eids = lrp.external_ids; + match (map_get(lrp.external_ids, "neutron:port_name")) { + None -> (), + Some{name} -> map_insert(eids, "name", name) + }; + eids + }. 
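[Reviewer note: the load-balancer VIP keys consumed by `get_router_load_balancer_ips` below are either a bare `IP` or `IP:port` (IPv6 with a port uses the `[IP]:port` bracket form). The actual parsing is delegated to the C helper `ip_address_and_port_from_lb_key()`; the sketch below is a rough pure-Rust equivalent of the key format, for illustration only, with 0 standing in for "no port":]

```rust
use std::net::{IpAddr, Ipv6Addr};

// Illustrative stand-in for ip_address_and_port_from_lb_key():
// accepts "IP", "IP:port", or "[v6]:port"; returns None otherwise.
fn lb_key_to_ip_and_port(key: &str) -> Option<(IpAddr, u16)> {
    // Bare address with no port; 0 marks "no port" in this sketch.
    if let Ok(ip) = key.parse::<IpAddr>() {
        return Some((ip, 0));
    }
    // "[v6]:port" form: brackets disambiguate the address's colons.
    if let Some(rest) = key.strip_prefix('[') {
        let (addr, port) = rest.split_once("]:")?;
        return Some((IpAddr::V6(addr.parse::<Ipv6Addr>().ok()?),
                     port.parse().ok()?));
    }
    // "v4:port" form.
    let (addr, port) = key.rsplit_once(':')?;
    Some((addr.parse().ok()?, port.parse().ok()?))
}

fn main() {
    assert_eq!(lb_key_to_ip_and_port("10.0.0.1:80").unwrap().1, 80);
    assert_eq!(lb_key_to_ip_and_port("[fd00::1]:443").unwrap().1, 443);
    assert!(lb_key_to_ip_and_port("not-an-ip").is_none());
}
```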
+/* +*/ +function get_router_load_balancer_ips(router: Router) : + (Set<string>, Set<string>) = +{ + var all_ips_v4 = set_empty(); + var all_ips_v6 = set_empty(); + for (lb in router.lbs) { + for (kv in deref(lb).vips) { + (var vip, _) = kv; + /* node->key contains IP:port or just IP. */ + match (ip_address_and_port_from_lb_key(vip)) { + None -> (), + Some{(IPv4{ipv4}, _)} -> set_insert(all_ips_v4, "${ipv4}"), + Some{(IPv6{ipv6}, _)} -> set_insert(all_ips_v6, "${ipv6}") + } + } + }; + (all_ips_v4, all_ips_v6) +} + +/* Returns an array of strings, each consisting of a MAC address followed + * by one or more IP addresses, and if the port is a distributed gateway + * port, followed by 'is_chassis_resident("LPORT_NAME")', where the + * LPORT_NAME is the name of the L3 redirect port or the name of the + * logical_port specified in a NAT rule. These strings include the + * external IP addresses of all NAT rules defined on that router, and all + * of the IP addresses used in load balancer VIPs defined on that router. + */ +function get_nat_addresses(rport: RouterPort): Set<string> = +{ + var addresses = set_empty(); + var router = deref(rport.router); + var has_redirect = is_some(router.l3dgw_port); + match (eth_addr_from_string(rport.lrp.mac)) { + None -> addresses, + Some{mac} -> { + var c_addresses = "${mac}"; + var central_ip_address = false; + + /* Get NAT IP addresses. */ + for (nat in router.nats) { + /* Determine whether this NAT rule satisfies the conditions for + * distributed NAT processing. */ + if (has_redirect and nat.nat.__type == "dnat_and_snat" and + is_some(nat.nat.logical_port) and is_some(nat.external_mac)) { + /* Distributed NAT rule. 
*/ + var logical_port = option_unwrap_or_default(nat.nat.logical_port); + var external_mac = option_unwrap_or_default(nat.external_mac); + set_insert(addresses, + "${external_mac} ${nat.external_ip} " + "is_chassis_resident(${json_string_escape(logical_port)})") + } else { + /* Centralized NAT rule, either on gateway router or distributed + * router. + * Check if external_ip is same as router ip. If so, then there + * is no need to add this to the nat_addresses. The router IPs + * will be added separately. */ + var is_router_ip = false; + match (nat.external_ip) { + IPv4{ei} -> { + for (ipv4 in rport.networks.ipv4_addrs) { + if (ei == ipv4.addr) { + is_router_ip = true; + break + } + } + }, + IPv6{ei} -> { + for (ipv6 in rport.networks.ipv6_addrs) { + if (ei == ipv6.addr) { + is_router_ip = true; + break + } + } + } + }; + if (not is_router_ip) { + c_addresses = c_addresses ++ " ${nat.external_ip}"; + central_ip_address = true + } + } + }; + + /* A set to hold all load-balancer vips. */ + (var all_ips_v4, var all_ips_v6) = get_router_load_balancer_ips(router); + + for (ip_address in set_union(all_ips_v4, all_ips_v6)) { + c_addresses = c_addresses ++ " ${ip_address}"; + central_ip_address = true + }; + + if (central_ip_address) { + /* Gratuitous ARP for centralized NAT rules on distributed gateway + * ports should be restricted to the gateway chassis. 
*/ + if (has_redirect) { + c_addresses = c_addresses ++ " is_chassis_resident(${router.redirect_port_name})" + } else (); + + set_insert(addresses, c_addresses) + } else (); + addresses + } + } +} + +function get_garp_nat_addresses(rport: RouterPort): string = { + var garp_info = ["${rport.networks.ea}"]; + for (ipv4_addr in rport.networks.ipv4_addrs) { + vec_push(garp_info, "${ipv4_addr.addr}") + }; + if (rport.router.redirect_port_name != "") { + vec_push(garp_info, + "is_chassis_resident(${rport.router.redirect_port_name})") + }; + string_join(garp_info, " ") +} + +/* Extra options computed for router ports by the logical flow generation code */ +relation RouterPortRAOptions(lrp: uuid, options: Map<string, string>) + +relation RouterPortRAOptionsComplete(lrp: uuid, options: Map<string, string>) + +RouterPortRAOptionsComplete(lrp, options) :- + RouterPortRAOptions(lrp, options). +RouterPortRAOptionsComplete(lrp, map_empty()) :- + nb::Logical_Router_Port(._uuid = lrp), + not RouterPortRAOptions(lrp, _). + + +/* + * Create derived port for Logical_Router_Ports with non-empty 'gateway_chassis' column. + */ + +/* Create derived ports */ +OutProxy_Port_Binding(// lrp._uuid is already in use; generate a new UUID by + // hashing it. + ._uuid = hash128(lrp._uuid), + .logical_port = chassis_redirect_name(lrp.name), + .__type = "chassisredirect", + .gateway_chassis = set_empty(), + .ha_chassis_group = Some{hacg_uuid}, + .options = options, + .datapath = lr_uuid, + .parent_port = None, + .tag = None, //always empty for router ports + .mac = set_singleton("${lrp.mac} ${lrp.networks.join(\" \")}"), + .nat_addresses = set_empty(), + .external_ids = lrp.external_ids) :- + DistributedGatewayPort(lrp, lr_uuid), + LogicalRouterHAChassisGroup(lr_uuid, hacg_uuid), + var redirect_type = match (map_get(lrp.options, "redirect-type")) { + Some{var value} -> ["redirect-type" -> value], + _ -> map_empty() + }, + var options = map_insert_imm(redirect_type, "distributed-port", lrp.name). 
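The comment above get_nat_addresses() pins down the string format stored in Port_Binding's nat_addresses column. A small Python sketch of just that formatting (all names and argument shapes here are hypothetical; only the output format mirrors the DDlog code):

```python
def nat_address_strings(mac, centralized_ips, distributed_nats, redirect_port=None):
    """Build "MAC IP [IP...] [is_chassis_resident(\"port\")]" strings
    in the format described for get_nat_addresses() above."""
    out = []
    # Each distributed dnat_and_snat rule gets its own entry with the
    # NAT's external MAC and a residence check on its logical port.
    for ext_mac, ext_ip, logical_port in distributed_nats:
        out.append('%s %s is_chassis_resident("%s")' % (ext_mac, ext_ip, logical_port))
    # Centralized NAT IPs and load-balancer VIPs share one entry led by
    # the router port's own MAC, restricted to the gateway chassis when
    # the router has a distributed gateway (redirect) port.
    if centralized_ips:
        entry = mac + "".join(" " + ip for ip in centralized_ips)
        if redirect_port:
            entry += ' is_chassis_resident("%s")' % redirect_port
        out.append(entry)
    return out
```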
+ + +/* Add allocated qdisc_queue_id and tunnel key to Port_Binding. + */ +sb::Out_Port_Binding(._uuid = pbinding._uuid, + .logical_port = pbinding.logical_port, + .__type = pbinding.__type, + .gateway_chassis = pbinding.gateway_chassis, + .ha_chassis_group = pbinding.ha_chassis_group, + .options = options0, + .datapath = pbinding.datapath, + .tunnel_key = tunkey, + .parent_port = pbinding.parent_port, + .tag = pbinding.tag, + .mac = pbinding.mac, + .nat_addresses = pbinding.nat_addresses, + .external_ids = pbinding.external_ids) :- + pbinding in OutProxy_Port_Binding(), + PortTunKeyAllocation(pbinding._uuid, tunkey), + QueueIDAllocation(pbinding._uuid, qid), + var options0 = match (qid) { + None -> pbinding.options, + Some{id} -> map_insert_imm(pbinding.options, "qdisc_queue_id", "${id}") + }. + +/* Referenced chassis. + * + * These tables track the sb::Chassis that a packet that traverses logical + * router 'lr_uuid' can end up at (or start from). This is used for + * sb::Out_HA_Chassis_Group's ref_chassis column. + * + * RefChassisSet0 has a row for each logical router that actually references a + * chassis. RefChassisSet has a row for every logical router. */ +relation RefChassis(lr_uuid: uuid, chassis_uuid: uuid) +RefChassis(lr_uuid, chassis_uuid) :- + ReachableLogicalRouter(lr_uuid, lr2_uuid), + FirstHopLogicalRouter(lr2_uuid, ls_uuid), + LogicalSwitchPort(lsp_uuid, ls_uuid), + nb::Logical_Switch_Port(._uuid = lsp_uuid, .name = lsp_name), + sb::Port_Binding(.logical_port = lsp_name, .chassis = chassis_uuids), + Some{var chassis_uuid} = chassis_uuids. +relation RefChassisSet0(lr_uuid: uuid, chassis_uuids: Set<uuid>) +RefChassisSet0(lr_uuid, chassis_uuids) :- + RefChassis(lr_uuid, chassis_uuid), + var chassis_uuids = chassis_uuid.group_by(lr_uuid).to_set(). +relation RefChassisSet(lr_uuid: uuid, chassis_uuids: Set<uuid>) +RefChassisSet(lr_uuid, chassis_uuids) :- + RefChassisSet0(lr_uuid, chassis_uuids). 
+RefChassisSet(lr_uuid, set_empty()) :- + nb::Logical_Router(._uuid = lr_uuid), + not RefChassisSet0(lr_uuid, _). + +/* Referenced chassis for an HA chassis group. + * + * Multiple logical routers can reference an HA chassis group so we merge the + * referenced chassis across all of them. + */ +relation HAChassisGroupRefChassisSet(hacg_uuid: uuid, + chassis_uuids: Set<uuid>) +HAChassisGroupRefChassisSet(hacg_uuid, chassis_uuids) :- + LogicalRouterHAChassisGroup(lr_uuid, hacg_uuid), + RefChassisSet(lr_uuid, chassis_uuids), + var chassis_uuids = chassis_uuids.group_by(hacg_uuid).union(). + +/* HA_Chassis_Group and HA_Chassis. */ +sb::Out_HA_Chassis_Group(hacg_uuid, hacg_name, ha_chassis, ref_chassis, eids) :- + HAChassis(hacg_uuid, hac_uuid, chassis_name, _, _), + var chassis_uuid = ha_chassis_uuid(chassis_name, hac_uuid), + var ha_chassis = chassis_uuid.group_by(hacg_uuid).to_set(), + HAChassisGroup(hacg_uuid, hacg_name, eids), + HAChassisGroupRefChassisSet(hacg_uuid, ref_chassis). + +sb::Out_HA_Chassis(ha_chassis_uuid(chassis_name, hac_uuid), chassis, priority, eids) :- + HAChassis(_, hac_uuid, chassis_name, priority, eids), + chassis_rec in sb::Chassis(.name = chassis_name), + var chassis = Some{chassis_rec._uuid}. +sb::Out_HA_Chassis(ha_chassis_uuid(chassis_name, hac_uuid), None, priority, eids) :- + HAChassis(_, hac_uuid, chassis_name, priority, eids), + not chassis_rec in sb::Chassis(.name = chassis_name). + +relation HAChassisToChassis(name: string, chassis: Option<uuid>) +HAChassisToChassis(name, Some{chassis}) :- + sb::Chassis(._uuid = chassis, .name = name). +HAChassisToChassis(name, None) :- + nb::HA_Chassis(.chassis_name = name), + not sb::Chassis(.name = name). 
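RefChassisSet and HAChassisToChassis above both use the same two-rule idiom: one rule computes a value where it exists, and a companion rule supplies a default for every remaining key, so downstream joins always find exactly one row. A Python sketch of the idiom as it applies to RefChassisSet (data shapes are hypothetical):

```python
from collections import defaultdict

def ref_chassis_sets(router_uuids, ref_pairs):
    """Group (router, chassis) pairs into a chassis set per router,
    then fill in an empty set for routers with no references,
    mirroring the RefChassisSet0/RefChassisSet rule pair above."""
    grouped = defaultdict(set)
    for lr, chassis in ref_pairs:
        grouped[lr].add(chassis)
    # The second DDlog rule's job: every router gets a row, empty if
    # the group-by produced nothing for it.
    return {lr: grouped.get(lr, set()) for lr in router_uuids}
```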
+sb::Out_HA_Chassis(ha_chassis_uuid(ha_chassis.chassis_name, hac_uuid), chassis, priority, eids) :- + sp in &SwitchPort(), + sp.lsp.__type == "external", + Some{var ha_chassis_group_uuid} = sp.lsp.ha_chassis_group, + ha_chassis_group in nb::HA_Chassis_Group(._uuid = ha_chassis_group_uuid), + var hac_uuid = FlatMap(ha_chassis_group.ha_chassis), + ha_chassis in nb::HA_Chassis(._uuid = hac_uuid, .priority = priority, .external_ids = eids), + HAChassisToChassis(ha_chassis.chassis_name, chassis). +sb::Out_HA_Chassis_Group(_uuid, name, ha_chassis, set_empty() /* XXX? */, eids) :- + sp in &SwitchPort(), + sp.lsp.__type == "external", + var ls_uuid = sp.sw.ls._uuid, + Some{var ha_chassis_group_uuid} = sp.lsp.ha_chassis_group, + ha_chassis_group in nb::HA_Chassis_Group(._uuid = ha_chassis_group_uuid, .name = name, + .external_ids = eids), + var hac_uuid = FlatMap(ha_chassis_group.ha_chassis), + ha_chassis in nb::HA_Chassis(._uuid = hac_uuid), + var ha_chassis_uuid_name = ha_chassis_uuid(ha_chassis.chassis_name, hac_uuid), + var ha_chassis = ha_chassis_uuid_name.group_by((ls_uuid, name, eids)).to_set(), + var _uuid = ha_chassis_group_uuid(ls_uuid). + +/* + * SB_Global: copy nb_cfg and options from NB. + * If NB_Global does not exist yet, just keep the current value of SB_Global, + * if any. + */ +for (nb_global in nb::NB_Global) { + sb::Out_SB_Global(._uuid = nb_global._uuid, + .nb_cfg = nb_global.nb_cfg, + .options = nb_global.options, + .ipsec = nb_global.ipsec) +} + +sb::Out_SB_Global(._uuid = sb_global._uuid, + .nb_cfg = sb_global.nb_cfg, + .options = sb_global.options, + .ipsec = sb_global.ipsec) :- + sb_global in sb::SB_Global(), + not nb::NB_Global(). + +/* sb::Chassis_Private joined with is_remote from sb::Chassis, + * including a record even for a null Chassis ref. 
*/ +relation ChassisPrivate( + cp: sb::Chassis_Private, + is_remote: bool) +ChassisPrivate(cp, map_get_bool_def(c.other_config, "is-remote", false)) :- + cp in sb::Chassis_Private(.chassis = Some{uuid}), + c in sb::Chassis(._uuid = uuid). +ChassisPrivate(cp, false), +Warning["Chassis does not exist for Chassis_Private record, name: ${cp.name}"] :- + cp in sb::Chassis_Private(.chassis = Some{uuid}), + not sb::Chassis(._uuid = uuid). +ChassisPrivate(cp, false), +Warning["Chassis does not exist for Chassis_Private record, name: ${cp.name}"] :- + cp in sb::Chassis_Private(.chassis = None). + +/* Track minimum hv_cfg across all the (non-remote) chassis. */ +relation HvCfg0(hv_cfg: integer) +HvCfg0(hv_cfg) :- + ChassisPrivate(.cp = sb::Chassis_Private{.nb_cfg = chassis_cfg}, .is_remote = false), + var hv_cfg = chassis_cfg.group_by(()).min(). +relation HvCfg(hv_cfg: integer) +HvCfg(hv_cfg) :- HvCfg0(hv_cfg). +HvCfg(hv_cfg) :- + nb::NB_Global(.nb_cfg = hv_cfg), + not HvCfg0(). + +/* Track maximum nb_cfg_timestamp among all the (non-remote) chassis + * that have the minimum nb_cfg. */ +relation HvCfgTimestamp0(hv_cfg_timestamp: integer) +HvCfgTimestamp0(hv_cfg_timestamp) :- + HvCfg(hv_cfg), + ChassisPrivate(.cp = sb::Chassis_Private{.nb_cfg = hv_cfg, + .nb_cfg_timestamp = chassis_cfg_timestamp}, + .is_remote = false), + var hv_cfg_timestamp = chassis_cfg_timestamp.group_by(()).max(). +relation HvCfgTimestamp(hv_cfg_timestamp: integer) +HvCfgTimestamp(hv_cfg_timestamp) :- HvCfgTimestamp0(hv_cfg_timestamp). +HvCfgTimestamp(hv_cfg_timestamp) :- + nb::NB_Global(.hv_cfg_timestamp = hv_cfg_timestamp), + not HvCfgTimestamp0(). + +/* + * NB_Global: + * - set `sb_cfg` to the value of `SB_Global.nb_cfg`. + * - set `hv_cfg` to the smallest value of `nb_cfg` across all `Chassis` + * - FIXME: we use ipsec as unique key to make sure that we don't create multiple `NB_Global` + * instances.
There is a potential race condition if this field is modified at the same + * time northd is updating `sb_cfg` or `hv_cfg`. + */ +input relation NbCfgTimestamp[integer] +nb::Out_NB_Global(._uuid = _uuid, + .sb_cfg = sb_cfg, + .hv_cfg = hv_cfg, + .nb_cfg_timestamp = nb_cfg_timestamp, + .hv_cfg_timestamp = hv_cfg_timestamp, + .ipsec = ipsec, + .options = options) :- + NbCfgTimestamp[nb_cfg_timestamp], + HvCfgTimestamp(hv_cfg_timestamp), + nbg in nb::NB_Global(._uuid = _uuid, .ipsec = ipsec), + sb::SB_Global(.nb_cfg = sb_cfg), + HvCfg(hv_cfg), + MacPrefix(mac_prefix), + SvcMonitorMac(svc_monitor_mac), + OvnMaxDpKeyLocal[max_tunid], + var options0 = put_mac_prefix(nbg.options, mac_prefix), + var options1 = put_svc_monitor_mac(options0, svc_monitor_mac), + var options = map_insert_imm(options1, "max_tunid", "${max_tunid}"). + + +/* SB_Global does not exist yet -- just keep the old value of NB_Global */ +nb::Out_NB_Global(._uuid = nbg._uuid, + .sb_cfg = nbg.sb_cfg, + .hv_cfg = nbg.hv_cfg, + .ipsec = nbg.ipsec, + .options = nbg.options, + .nb_cfg_timestamp = nb_cfg_timestamp, + .hv_cfg_timestamp = hv_cfg_timestamp) :- + NbCfgTimestamp[nb_cfg_timestamp], + HvCfgTimestamp(hv_cfg_timestamp), + nbg in nb::NB_Global(), + not sb::SB_Global(). + +output relation SbCfg[integer] +SbCfg[sb_cfg] :- nb::Out_NB_Global(.sb_cfg = sb_cfg). + +output relation Northd_Probe_Interval[integer] +Northd_Probe_Interval[interval] :- + nb in nb::NB_Global(), + var interval = map_get_int_def(nb.options, "northd_probe_interval", 0). + +relation CheckLspIsUp[bool] +CheckLspIsUp[check_lsp_is_up] :- + nb in nb::NB_Global(), + var check_lsp_is_up = not map_get_bool_def(nb.options, "ignore_lsp_down", false). +CheckLspIsUp[true] :- + Unit(), + not nb in nb::NB_Global(). + +/* + * Address_Set: copy from NB + additional records generated from NB Port_Group (two records for each + * Port_Group for IPv4 and IPv6 addresses).
+ * + * There can be name collisions between the two types of Address_Set records. User-defined records + * take precedence. + */ +sb::Out_Address_Set(._uuid = nb_as._uuid, + .name = nb_as.name, + .addresses = nb_as.addresses) :- + AddressSetRef[nb_as]. + +sb::Out_Address_Set(._uuid = hash128("svc_monitor_mac"), + .name = "svc_monitor_mac", + .addresses = set_singleton("${svc_monitor_mac}")) :- + SvcMonitorMac(svc_monitor_mac). + +sb::Out_Address_Set(hash128(as_name), as_name, set_unions(pg_ip4addrs)) :- + nb::Port_Group(.ports = pg_ports, .name = pg_name), + var as_name = pg_name ++ "_ip4", + // avoid name collisions with user-defined Address_Sets + not nb::Address_Set(.name = as_name), + var port_uuid = FlatMap(pg_ports), + PortStaticAddresses(.lsport = port_uuid, .ip4addrs = stat), + SwitchPortNewDynamicAddress(&SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = port_uuid}}, + dyn_addr), + var dynamic = match (dyn_addr) { + None -> set_empty(), + Some{lpaddress} -> match (vec_nth(lpaddress.ipv4_addrs, 0)) { + None -> set_empty(), + Some{addr} -> set_singleton("${addr.addr}") + } + }, + //PortDynamicAddresses(.lsport = port_uuid, .ip4addrs = dynamic), + var port_ip4addrs = set_union(stat, dynamic), + var pg_ip4addrs = port_ip4addrs.group_by(as_name).to_vec(). + +sb::Out_Address_Set(hash128(as_name), as_name, set_empty()) :- + nb::Port_Group(.ports = set_empty(), .name = pg_name), + var as_name = pg_name ++ "_ip4", + // avoid name collisions with user-defined Address_Sets + not nb::Address_Set(.name = as_name). 
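The rules above derive one "_ip4" Address_Set per NB Port_Group (and the same pattern repeats for the "_ip6" sets), skipping any name that a user-defined Address_Set already owns. A Python sketch of just the naming and collision guard (function name is hypothetical):

```python
def derived_address_set_name(pg_name, family, user_set_names):
    """Compute the SB Address_Set name an NB Port_Group produces for
    one address family, or None when a user-defined Address_Set
    already claims that name -- mirroring the 'not nb::Address_Set'
    guard in the rules above, where user records take precedence."""
    name = "%s_%s" % (pg_name, family)
    return None if name in user_set_names else name
```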
+ +sb::Out_Address_Set(hash128(as_name), as_name, set_unions(pg_ip6addrs)) :- + nb::Port_Group(.ports = pg_ports, .name = pg_name), + var as_name = pg_name ++ "_ip6", + // avoid name collisions with user-defined Address_Sets + not nb::Address_Set(.name = as_name), + var port_uuid = FlatMap(pg_ports), + PortStaticAddresses(.lsport = port_uuid, .ip6addrs = stat), + SwitchPortNewDynamicAddress(&SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = port_uuid}}, + dyn_addr), + var dynamic = match (dyn_addr) { + None -> set_empty(), + Some{lpaddress} -> match (vec_nth(lpaddress.ipv6_addrs, 0)) { + None -> set_empty(), + Some{addr} -> set_singleton("${addr.addr}") + } + }, + //PortDynamicAddresses(.lsport = port_uuid, .ip6addrs = dynamic), + var port_ip6addrs = set_union(stat, dynamic), + var pg_ip6addrs = port_ip6addrs.group_by(as_name).to_vec(). + +sb::Out_Address_Set(hash128(as_name), as_name, set_empty()) :- + nb::Port_Group(.ports = set_empty(), .name = pg_name), + var as_name = pg_name ++ "_ip6", + // avoid name collisions with user-defined Address_Sets + not nb::Address_Set(.name = as_name). + +/* + * Port_Group + * + * Create one SB Port_Group record for every datapath that has ports + * referenced by the NB Port_Group.ports field. In order to maintain the + * SB Port_Group.name uniqueness constraint, ovn-northd populates the field + * with the value: <SB.Logical_Datapath.tunnel_key>_<NB.Port_Group.name>. + */ +sb::Out_Port_Group(._uuid = hash128(sb_name), .name = sb_name, .ports = port_names) :- + nb::Port_Group(._uuid = _uuid, .name = nb_name, .ports = pg_ports), + var port_uuid = FlatMap(pg_ports), + &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{._uuid = port_uuid, + .name = port_name}, + .sw = &Switch{.ls = nb::Logical_Switch{._uuid = ls_uuid}}), + TunKeyAllocation(.datapath = ls_uuid, .tunkey = tunkey), + var sb_name = "${tunkey}_${nb_name}", + var port_names = port_name.group_by((_uuid, sb_name)).to_set(). 
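The SB Port_Group rule above splits one NB Port_Group into one SB row per datapath whose ports it references, prefixing the datapath's tunnel key to keep SB Port_Group names unique. A Python sketch of that split (dict shapes are hypothetical):

```python
from collections import defaultdict

def sb_port_groups(nb_name, port_datapaths, tunkey_by_datapath):
    """Group an NB Port_Group's ports by datapath and name each SB
    group "<tunnel_key>_<NB name>", as in the rule above.
    'port_datapaths' maps port name -> datapath id."""
    groups = defaultdict(set)
    for port_name, datapath in port_datapaths.items():
        sb_name = "%d_%s" % (tunkey_by_datapath[datapath], nb_name)
        groups[sb_name].add(port_name)
    return dict(groups)
```

The tunnel-key prefix is what lets two switches contribute same-named groups without violating the SB Port_Group.name uniqueness constraint.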
+ +/* + * Multicast_Group: + * - three static rows per logical switch: one for flooding, one for packets + * with unknown destinations, one for flooding IP multicast known traffic to + * mrouters. + * - dynamically created rows based on IGMP groups learned by controllers. + */ + +function mC_FLOOD(): (string, integer) = + ("_MC_flood", 32768) + +function mC_UNKNOWN(): (string, integer) = + ("_MC_unknown", 32769) + +function mC_MROUTER_FLOOD(): (string, integer) = + ("_MC_mrouter_flood", 32770) + +function mC_MROUTER_STATIC(): (string, integer) = + ("_MC_mrouter_static", 32771) + +function mC_STATIC(): (string, integer) = + ("_MC_static", 32772) + +function mC_FLOOD_L2(): (string, integer) = + ("_MC_flood_l2", 32773) + +function mC_IP_MCAST_MIN(): (string, integer) = + ("_MC_ip_mcast_min", 32774) + +function mC_IP_MCAST_MAX(): (string, integer) = + ("_MC_ip_mcast_max", 65535) + + +// TODO: check that Multicast_Group.ports should not include derived ports + +/* Proxy table for Out_Multicast_Group: contains all Multicast_Group fields, + * except `_uuid`, which will be computed by hashing the remaining fields, + * and tunnel key, which is allocated separately (see + * MulticastGroupTunKeyAllocation). */ +relation OutProxy_Multicast_Group ( + datapath: uuid, + name: string, + ports: Set<uuid> +) + +/* Only create flood group if the switch has enabled ports */ +sb::Out_Multicast_Group (._uuid = hash128((datapath,name)), + .datapath = datapath, + .name = name, + .tunnel_key = tunnel_key, + .ports = port_ids) :- + &SwitchPort(.lsp = lsp, .sw = &Switch{.ls = ls}), + lsp.is_enabled(), + var datapath = ls._uuid, + var port_ids = lsp._uuid.group_by((datapath)).to_set(), + (var name, var tunnel_key) = mC_FLOOD(). + +/* Create a multicast group to flood to all switch ports except router ports.
+ */ +sb::Out_Multicast_Group (._uuid = hash128((datapath,name)), + .datapath = datapath, + .name = name, + .tunnel_key = tunnel_key, + .ports = port_ids) :- + &SwitchPort(.lsp = lsp, .sw = &Switch{.ls = ls}), + lsp.is_enabled(), + lsp.__type != "router", + var datapath = ls._uuid, + var port_ids = lsp._uuid.group_by((datapath)).to_set(), + (var name, var tunnel_key) = mC_FLOOD_L2(). + +/* Only create unknown group if the switch has ports with "unknown" address */ +sb::Out_Multicast_Group (._uuid = hash128((ls,name)), + .datapath = ls, + .name = name, + .tunnel_key = tunnel_key, + .ports = port_ids) :- + LogicalSwitchUnknownPorts(ls, port_ids), + (var name, var tunnel_key) = mC_UNKNOWN(). + +/* Create a multicast group to flood multicast traffic to routers with + * multicast relay enabled. + */ +sb::Out_Multicast_Group (._uuid = hash128((sw.ls._uuid,name)), + .datapath = sw.ls._uuid, + .name = name, + .tunnel_key = tunnel_key, + .ports = port_ids) :- + SwitchMcastFloodRelayPorts(&sw, port_ids), not set_is_empty(port_ids), + (var name, var tunnel_key) = mC_MROUTER_FLOOD(). + +/* Create a multicast group to flood traffic (no reports) to ports with + * multicast flood enabled. + */ +sb::Out_Multicast_Group (._uuid = hash128((sw.ls._uuid,name)), + .datapath = sw.ls._uuid, + .name = name, + .tunnel_key = tunnel_key, + .ports = port_ids) :- + SwitchMcastFloodPorts(&sw, port_ids), not set_is_empty(port_ids), + (var name, var tunnel_key) = mC_STATIC(). + +/* Create a multicast group to flood reports to ports with + * multicast flood_reports enabled. + */ +sb::Out_Multicast_Group (._uuid = hash128((sw.ls._uuid,name)), + .datapath = sw.ls._uuid, + .name = name, + .tunnel_key = tunnel_key, + .ports = port_ids) :- + SwitchMcastFloodReportPorts(&sw, port_ids), not set_is_empty(port_ids), + (var name, var tunnel_key) = mC_MROUTER_STATIC(). + +/* Create a multicast group to flood traffic and reports to router ports with + * multicast flood enabled. 
+ */ +sb::Out_Multicast_Group (._uuid = hash128((rtr.lr._uuid,name)), + .datapath = rtr.lr._uuid, + .name = name, + .tunnel_key = tunnel_key, + .ports = port_ids) :- + RouterMcastFloodPorts(&rtr, port_ids), not set_is_empty(port_ids), + (var name, var tunnel_key) = mC_STATIC(). + +/* Create a multicast group for each IGMP group learned by a Switch. + * 'tunnel_key' == 0 triggers an ID allocation later. + */ +OutProxy_Multicast_Group (.datapath = switch.ls._uuid, + .name = address, + .ports = port_ids) :- + IgmpSwitchMulticastGroup(address, &switch, port_ids). + +/* Create a multicast group for each IGMP group learned by a Router. + * 'tunnel_key' == 0 triggers an ID allocation later. + */ +OutProxy_Multicast_Group (.datapath = router.lr._uuid, + .name = address, + .ports = port_ids) :- + IgmpRouterMulticastGroup(address, &router, port_ids). + +/* Allocate a 'tunnel_key' for dynamic multicast groups. */ +sb::Out_Multicast_Group(._uuid = hash128((mcgroup.datapath,mcgroup.name)), + .datapath = mcgroup.datapath, + .name = mcgroup.name, + .tunnel_key = tunnel_key, + .ports = mcgroup.ports) :- + mcgroup in OutProxy_Multicast_Group(), + MulticastGroupTunKeyAllocation(mcgroup.datapath, mcgroup.name, tunnel_key). + +/* + * MAC binding: records inserted by hypervisors; northd removes records for deleted logical ports and datapaths. + */ +sb::Out_MAC_Binding (._uuid = mb._uuid, + .logical_port = mb.logical_port, + .ip = mb.ip, + .mac = mb.mac, + .datapath = mb.datapath) :- + sb::MAC_Binding[mb], + sb::Out_Port_Binding(.logical_port = mb.logical_port), + sb::Out_Datapath_Binding(._uuid = mb.datapath). + +/* + * DHCP options: fixed table + */ +sb::Out_DHCP_Options ( + ._uuid = 128'h7d9d898a_179b_4898_8382_b73bec391f23, + .name = "offerip", + .code = 0, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hea5e7d14_fd97_491c_8004_a120bdbc4306, + .name = "netmask", + .code = 1, + .__type = "ipv4" +). 
+ +sb::Out_DHCP_Options ( + ._uuid = 128'hdab5e39b_6702_4245_9573_6c142aa3724c, + .name = "router", + .code = 3, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h340b4bc5_c5c3_43d1_ae77_564da69c8fcc, + .name = "dns_server", + .code = 6, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hcd1ab302_cbb2_4eab_9ec5_ec1c8541bd82, + .name = "log_server", + .code = 7, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h1c7ea6a0_fe6b_48c1_a920_302583c1ff08, + .name = "lpr_server", + .code = 9, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hae35e575_226a_4ab5_a1c4_166f426dd999, + .name = "domain_name", + .code = 15, + .__type = "str" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'had0ec3e0_8be9_4c77_bceb_f8954a34c7ba, + .name = "swap_server", + .code = 16, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h884c2e02_6e99_4d12_aef7_8454ebf8a3b7, + .name = "policy_filter", + .code = 21, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h57cc2c61_fd2a_41c6_b6b1_6ce9a8901f86, + .name = "router_solicitation", + .code = 32, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h48249097_03f0_46c1_a32a_2dd57cd4d0f8, + .name = "nis_server", + .code = 41, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h333fe07e_bdd1_4371_aa4f_a412bc60f3a2, + .name = "ntp_server", + .code = 42, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h6207109c_49d0_4348_8238_dd92afb69bf0, + .name = "server_id", + .code = 54, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h2090b783_26d3_4c1d_830c_54c1b6c5d846, + .name = "tftp_server", + .code = 66, + .__type = "host_id" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'ha18ff399_caea_406e_af7e_321c6f74e581, + .name = "classless_static_route", + .code = 121, + .__type = "static_routes" +). 
+ +sb::Out_DHCP_Options ( + ._uuid = 128'hb81ad7b4_62f0_40c7_a9a3_f96677628767, + .name = "ms_classless_static_route", + .code = 249, + .__type = "static_routes" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h0c2e144e_4b5f_4e21_8978_0e20bac9a6ea, + .name = "ip_forward_enable", + .code = 19, + .__type = "bool" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h6feb1926_9469_4b40_bfbf_478b9888cd3a, + .name = "router_discovery", + .code = 31, + .__type = "bool" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hcb776249_e8b1_4502_b33b_fa294d44077d, + .name = "ethernet_encap", + .code = 36, + .__type = "bool" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'ha2df9eaa_aea9_497f_b339_0c8ec3e39a07, + .name = "default_ttl", + .code = 23, + .__type = "uint8" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hb44b45a9_5004_4ef5_8e6a_aa8629e1afb1, + .name = "tcp_ttl", + .code = 37, + .__type = "uint8" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h50f01ca7_c650_46f0_8f50_39a67ec657da, + .name = "mtu", + .code = 26, + .__type = "uint16" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h9d31c057_6085_4810_96af_eeac7d3c5308, + .name = "lease_time", + .code = 51, + .__type = "uint32" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hea1e2e7a_9585_46ee_ad49_adfdefc0c4ef, + .name = "T1", + .code = 58, + .__type = "uint32" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hbc83a233_554b_453a_afca_1eadf76810d2, + .name = "T2", + .code = 59, + .__type = "uint32" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h1ab3eeca_0523_4101_9076_eea77d0232f4, + .name = "bootfile_name", + .code = 67, + .__type = "str" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'ha5c20b69_f7f3_4fa8_b550_8697aec6cbb7, + .name = "wpad", + .code = 252, + .__type = "str" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h1516bcb6_cc93_4233_a63f_bd29c8601831, + .name = "path_prefix", + .code = 210, + .__type = "str" +). 
+ +sb::Out_DHCP_Options ( + ._uuid = 128'hc98e13cd_f653_473c_85c1_850dcad685fc, + .name = "tftp_server_address", + .code = 150, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hfbe06e70_b43d_4dd9_9b21_2f27eb5da5df, + .name = "arp_cache_timeout", + .code = 35, + .__type = "uint32" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h2af54a3c_545c_4104_ae1c_432caa3e085e, + .name = "tcp_keepalive_interval", + .code = 38, + .__type = "uint32" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h4b2144e8_8d3f_4d96_9032_fe23c1866cd4, + .name = "domain_search_list", + .code = 119, + .__type = "domains" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'hb7236164_eea4_4bf2_9306_8619a9e3ad1d, + .name = "broadcast_address", + .code = 28, + .__type = "ipv4" +). + +sb::Out_DHCP_Options ( + ._uuid = 128'h2d738583_96f4_4a78_99a1_f8f7fe328f3f, + .name = "bootfile_name_alt", + .code = 254, + .__type = "str" +). + + +/* + * DHCPv6 options: fixed table + */ +sb::Out_DHCPv6_Options ( + ._uuid = 128'h100b2659_0ec0_4da7_9ec3_25997f92dc00, + .name = "server_id", + .code = 2, + .__type = "mac" +). + +sb::Out_DHCPv6_Options ( + ._uuid = 128'h53f49b50_db75_4b0d_83df_50d31009ca9c, + .name = "ia_addr", + .code = 5, + .__type = "ipv6" +). + +sb::Out_DHCPv6_Options ( + ._uuid = 128'he3619685_d4f7_42ad_936b_4f4440b7eeb4, + .name = "dns_server", + .code = 23, + .__type = "ipv6" +). + +sb::Out_DHCPv6_Options ( + ._uuid = 128'hcb8a4e7f_a312_4cb1_a846_e474d9f0c531, + .name = "domain_search", + .code = 24, + .__type = "str" +). 
+ + +/* + * DNS: copied from NB, plus a 'datapaths' column pointing to the LS datapaths that use the record + */ + +function map_to_lowercase(m_in: Map<string,string>): Map<string,string> { + var m_out = map_empty(); + for (node in m_in) { + (var k, var v) = node; + map_insert(m_out, string_to_lowercase(k), string_to_lowercase(v)) + }; + m_out +} + +sb::Out_DNS(._uuid = nbdns._uuid, + .records = map_to_lowercase(nbdns.records), + .datapaths = datapaths, + .external_ids = map_insert_imm(nbdns.external_ids, "dns_id", uuid2str(nbdns._uuid))) :- + nb::DNS[nbdns], + LogicalSwitchDNS(ls_uuid, nbdns._uuid), + var datapaths = ls_uuid.group_by(nbdns).to_set(). + +/* + * RBAC_Permission: fixed + */ + +sb::Out_RBAC_Permission ( + ._uuid = 128'h7df3749a_1754_4a78_afa4_3abf526fe510, + .table = "Chassis", + .authorization = set_singleton("name"), + .insert_delete = true, + .update = ["nb_cfg", "external_ids", "encaps", + "vtep_logical_switches", "other_config", "name"].to_set() +). + +sb::Out_RBAC_Permission ( + ._uuid = 128'h07e623f7_137c_4a11_9084_3b3f89cb4a54, + .table = "Chassis_Private", + .authorization = set_singleton("name"), + .insert_delete = true, + .update = ["nb_cfg", "nb_cfg_timestamp", "chassis", "name"].to_set() +). + +sb::Out_RBAC_Permission ( + ._uuid = 128'h94bec860_431e_4d95_82e7_3b75d8997241, + .table = "Encap", + .authorization = set_singleton("chassis_name"), + .insert_delete = true, + .update = ["type", "options", "ip", "chassis_name"].to_set() +). + +sb::Out_RBAC_Permission ( + ._uuid = 128'hd8ceff1a_2b11_48bd_802f_4a991aa4e908, + .table = "Port_Binding", + .authorization = set_singleton(""), + .insert_delete = false, + .update = set_singleton("chassis") +). + +sb::Out_RBAC_Permission ( + ._uuid = 128'h6ffdc696_8bfb_4d82_b620_a00d39270b2f, + .table = "MAC_Binding", + .authorization = set_singleton(""), + .insert_delete = true, + .update = ["logical_port", "ip", "mac", "datapath"].to_set() +).
+ +sb::Out_RBAC_Permission ( + ._uuid = 128'h39231c7e_4bf1_41d0_ada4_1d8a319c0da3, + .table = "Service_Monitor", + .authorization = set_singleton(""), + .insert_delete = false, + .update = set_singleton("status") +). + +/* + * RBAC_Role: fixed + */ +sb::Out_RBAC_Role ( + ._uuid = 128'ha406b472_5de8_4456_9f38_bf344c911b22, + .name = "ovn-controller", + .permissions = [ + "Chassis" -> 128'h7df3749a_1754_4a78_afa4_3abf526fe510, + "Chassis_Private" -> 128'h07e623f7_137c_4a11_9084_3b3f89cb4a54, + "Encap" -> 128'h94bec860_431e_4d95_82e7_3b75d8997241, + "Port_Binding" -> 128'hd8ceff1a_2b11_48bd_802f_4a991aa4e908, + "MAC_Binding" -> 128'h6ffdc696_8bfb_4d82_b620_a00d39270b2f, + "Service_Monitor"-> 128'h39231c7e_4bf1_41d0_ada4_1d8a319c0da3] + +). + +/* Output modified Logical_Switch_Port table with dynamic address updated */ +nb::Out_Logical_Switch_Port(._uuid = lsp._uuid, + .tag = tag, + .dynamic_addresses = dynamic_addresses, + .up = Some{up}) :- + SwitchPortNewDynamicAddress(&SwitchPort{.lsp = lsp, .up = up}, opt_dyn_addr), + var dynamic_addresses = match (opt_dyn_addr) { + None -> None, + Some{dyn_addr} -> Some{"${dyn_addr}"} + }, + SwitchPortNewDynamicTag(lsp._uuid, opt_tag), + var tag = match (opt_tag) { + None -> lsp.tag, + Some{t} -> Some{t} + }. + +relation LRPIPv6Prefix0(lrp_uuid: uuid, ipv6_prefix: string) +LRPIPv6Prefix0(lrp._uuid, ipv6_prefix) :- + lrp in nb::Logical_Router_Port(), + map_get_bool_def(lrp.options, "prefix", false), + sb::Port_Binding(.logical_port = lrp.name, .options = options), + Some{var ipv6_ra_pd_list} = map_get(options, "ipv6_ra_pd_list"), + var parts = string_split(ipv6_ra_pd_list, ","), + Some{var ipv6_prefix} = vec_nth(parts, 1). + +relation LRPIPv6Prefix(lrp_uuid: uuid, ipv6_prefix: Option<string>) +LRPIPv6Prefix(lrp_uuid, Some{ipv6_prefix}) :- + LRPIPv6Prefix0(lrp_uuid, ipv6_prefix). +LRPIPv6Prefix(lrp_uuid, None) :- + nb::Logical_Router_Port(._uuid = lrp_uuid), + not LRPIPv6Prefix0(lrp_uuid, _). 
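LRPIPv6Prefix0 above extracts the delegated prefix as the second comma-separated field of the Port_Binding "ipv6_ra_pd_list" option. A Python sketch of that extraction (the option's exact field layout is assumed from the split in the rule, not documented here):

```python
def parse_delegated_ipv6_prefix(ipv6_ra_pd_list):
    """Return the delegated IPv6 prefix from an "ipv6_ra_pd_list"
    option value, mirroring the string_split/vec_nth pair in
    LRPIPv6Prefix0 above: the prefix is the second comma-separated
    field, and None is returned when the option is absent or has no
    second field."""
    if ipv6_ra_pd_list is None:
        return None
    parts = ipv6_ra_pd_list.split(",")
    return parts[1] if len(parts) > 1 else None
```

The two-rule LRPIPv6Prefix relation then wraps this result in Some/None so every Logical_Router_Port has exactly one row.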
+ +nb::Out_Logical_Router_Port(._uuid = _uuid, + .ipv6_prefix = to_set(ipv6_prefix)) :- + nb::Logical_Router_Port(._uuid = _uuid, .name = name), + LRPIPv6Prefix(_uuid, ipv6_prefix). + +typedef Direction = IN | OUT + +typedef PipelineStage = PORT_SEC_L2 + | PORT_SEC_IP + | PORT_SEC_ND + | PRE_ACL + | PRE_LB + | PRE_STATEFUL + | ACL_HINT + | ACL + | QOS_MARK + | QOS_METER + | LB + | STATEFUL + | PRE_HAIRPIN + | HAIRPIN + | ARP_ND_RSP + | DHCP_OPTIONS + | DHCP_RESPONSE + | DNS_LOOKUP + | DNS_RESPONSE + | EXTERNAL_PORT + | L2_LKUP + | ADMISSION + | LOOKUP_NEIGHBOR + | LEARN_NEIGHBOR + | IP_INPUT + | DEFRAG + | UNSNAT + | DNAT + | ECMP_STATEFUL + | ND_RA_OPTIONS + | ND_RA_RESPONSE + | IP_ROUTING + | IP_ROUTING_ECMP + | POLICY + | ARP_RESOLVE + | CHK_PKT_LEN + | LARGER_PKTS + | GW_REDIRECT + | ARP_REQUEST + | UNDNAT + | SNAT + | EGR_LOOP + | DELIVERY + +typedef DatapathType = LSwitch | LRouter + +typedef Stage = Stage{ + datapath : DatapathType, + direction : Direction, + stage : PipelineStage +} + +function switch_stage(direction: Direction, stage: PipelineStage): Stage = { + Stage{LSwitch, direction, stage} +} + +function router_stage(direction: Direction, stage: PipelineStage): Stage = { + Stage{LRouter, direction, stage} +} + +function stage_id(stage: Stage): (integer, string) = +{ + match ((stage.datapath, stage.direction, stage.stage)) { + /* Logical switch ingress stages. 
*/ + (LSwitch, IN, PORT_SEC_L2) -> (0, "ls_in_port_sec_l2"), + (LSwitch, IN, PORT_SEC_IP) -> (1, "ls_in_port_sec_ip"), + (LSwitch, IN, PORT_SEC_ND) -> (2, "ls_in_port_sec_nd"), + (LSwitch, IN, PRE_ACL) -> (3, "ls_in_pre_acl"), + (LSwitch, IN, PRE_LB) -> (4, "ls_in_pre_lb"), + (LSwitch, IN, PRE_STATEFUL) -> (5, "ls_in_pre_stateful"), + (LSwitch, IN, ACL_HINT) -> (6, "ls_in_acl_hint"), + (LSwitch, IN, ACL) -> (7, "ls_in_acl"), + (LSwitch, IN, QOS_MARK) -> (8, "ls_in_qos_mark"), + (LSwitch, IN, QOS_METER) -> (9, "ls_in_qos_meter"), + (LSwitch, IN, LB) -> (10, "ls_in_lb"), + (LSwitch, IN, STATEFUL) -> (11, "ls_in_stateful"), + (LSwitch, IN, PRE_HAIRPIN) -> (12, "ls_in_pre_hairpin"), + (LSwitch, IN, HAIRPIN) -> (13, "ls_in_hairpin"), + (LSwitch, IN, ARP_ND_RSP) -> (14, "ls_in_arp_rsp"), + (LSwitch, IN, DHCP_OPTIONS) -> (15, "ls_in_dhcp_options"), + (LSwitch, IN, DHCP_RESPONSE) -> (16, "ls_in_dhcp_response"), + (LSwitch, IN, DNS_LOOKUP) -> (17, "ls_in_dns_lookup"), + (LSwitch, IN, DNS_RESPONSE) -> (18, "ls_in_dns_response"), + (LSwitch, IN, EXTERNAL_PORT) -> (19, "ls_in_external_port"), + (LSwitch, IN, L2_LKUP) -> (20, "ls_in_l2_lkup"), + + /* Logical switch egress stages. */ + (LSwitch, OUT, PRE_LB) -> (0, "ls_out_pre_lb"), + (LSwitch, OUT, PRE_ACL) -> (1, "ls_out_pre_acl"), + (LSwitch, OUT, PRE_STATEFUL) -> (2, "ls_out_pre_stateful"), + (LSwitch, OUT, LB) -> (3, "ls_out_lb"), + (LSwitch, OUT, ACL_HINT) -> (4, "ls_out_acl_hint"), + (LSwitch, OUT, ACL) -> (5, "ls_out_acl"), + (LSwitch, OUT, QOS_MARK) -> (6, "ls_out_qos_mark"), + (LSwitch, OUT, QOS_METER) -> (7, "ls_out_qos_meter"), + (LSwitch, OUT, STATEFUL) -> (8, "ls_out_stateful"), + (LSwitch, OUT, PORT_SEC_IP) -> (9, "ls_out_port_sec_ip"), + (LSwitch, OUT, PORT_SEC_L2) -> (10, "ls_out_port_sec_l2"), + + /* Logical router ingress stages. 
*/ + (LRouter, IN, ADMISSION) -> (0, "lr_in_admission"), + (LRouter, IN, LOOKUP_NEIGHBOR) -> (1, "lr_in_lookup_neighbor"), + (LRouter, IN, LEARN_NEIGHBOR) -> (2, "lr_in_learn_neighbor"), + (LRouter, IN, IP_INPUT) -> (3, "lr_in_ip_input"), + (LRouter, IN, DEFRAG) -> (4, "lr_in_defrag"), + (LRouter, IN, UNSNAT) -> (5, "lr_in_unsnat"), + (LRouter, IN, DNAT) -> (6, "lr_in_dnat"), + (LRouter, IN, ECMP_STATEFUL) -> (7, "lr_in_ecmp_stateful"), + (LRouter, IN, ND_RA_OPTIONS) -> (8, "lr_in_nd_ra_options"), + (LRouter, IN, ND_RA_RESPONSE)-> (9, "lr_in_nd_ra_response"), + (LRouter, IN, IP_ROUTING) -> (10, "lr_in_ip_routing"), + (LRouter, IN, IP_ROUTING_ECMP) -> (11, "lr_in_ip_routing_ecmp"), + (LRouter, IN, POLICY) -> (12, "lr_in_policy"), + (LRouter, IN, ARP_RESOLVE) -> (13, "lr_in_arp_resolve"), + (LRouter, IN, CHK_PKT_LEN) -> (14, "lr_in_chk_pkt_len"), + (LRouter, IN, LARGER_PKTS) -> (15, "lr_in_larger_pkts"), + (LRouter, IN, GW_REDIRECT) -> (16, "lr_in_gw_redirect"), + (LRouter, IN, ARP_REQUEST) -> (17, "lr_in_arp_request"), + + /* Logical router egress stages. */ + (LRouter, OUT, UNDNAT) -> (0, "lr_out_undnat"), + (LRouter, OUT, SNAT) -> (1, "lr_out_snat"), + (LRouter, OUT, EGR_LOOP) -> (2, "lr_out_egr_loop"), + (LRouter, OUT, DELIVERY) -> (3, "lr_out_delivery"), + + _ -> (64'hffffffffffffffff, "") /* alternatively crash? 
*/ + } +} + +/* + * OVS register usage: + * + * Logical Switch pipeline: + * +---------+----------------------------------------------+ + * | R0 | REGBIT_{CONNTRACK/DHCP/DNS/HAIRPIN} | + * | | REGBIT_ACL_HINT_{ALLOW_NEW/ALLOW/DROP/BLOCK} | + * +---------+----------------------------------------------+ + * | R1 - R9 | UNUSED | + * +---------+----------------------------------------------+ + * + * Logical Router pipeline: + * +-----+--------------------------+---+-----------------+---+---------------+ + * | R0 | REGBIT_ND_RA_OPTS_RESULT | | | | | + * | | (= IN_ND_RA_OPTIONS) | X | | | | + * | | NEXT_HOP_IPV4 | R | | | | + * | | (>= IP_INPUT) | E | INPORT_ETH_ADDR | X | | + * +-----+--------------------------+ G | (< IP_INPUT) | X | | + * | R1 | SRC_IPV4 for ARP-REQ | 0 | | R | | + * | | (>= IP_INPUT) | | | E | NEXT_HOP_IPV6 | + * +-----+--------------------------+---+-----------------+ G | (>= IP_INPUT) | + * | R2 | UNUSED | X | | 0 | | + * | | | R | | | | + * +-----+--------------------------+ E | UNUSED | | | + * | R3 | UNUSED | G | | | | + * | | | 1 | | | | + * +-----+--------------------------+---+-----------------+---+---------------+ + * | R4 | UNUSED | X | | | | + * | | | R | | | | + * +-----+--------------------------+ E | UNUSED | X | | + * | R5 | UNUSED | G | | X | | + * | | | 2 | | R |SRC_IPV6 for NS| + * +-----+--------------------------+---+-----------------+ E | (>= IP_INPUT) | + * | R6 | UNUSED | X | | G | | + * | | | R | | 1 | | + * +-----+--------------------------+ E | UNUSED | | | + * | R7 | UNUSED | G | | | | + * | | | 3 | | | | + * +-----+--------------------------+---+-----------------+---+---------------+ + * | R8 | ECMP_GROUP_ID | | | + * | | ECMP_MEMBER_ID | X | | + * +-----+--------------------------+ R | | + * | | REGBIT_{ | E | | + * | | EGRESS_LOOPBACK/ | G | UNUSED | + * | R9 | PKT_LARGER/ | 4 | | + * | | LOOKUP_NEIGHBOR_RESULT/| | | + * | | SKIP_LOOKUP_NEIGHBOR} | | | + * +-----+--------------------------+---+-----------------+ + * + */ 
+
+/* Register definitions specific to routers. */
+function rEG_NEXT_HOP(): string = "reg0" /* reg0 for IPv4, xxreg0 for IPv6 */
+function rEG_SRC(): string = "reg1" /* reg1 for IPv4, xxreg1 for IPv6 */
+
+/* Register definitions specific to switches. */
+function rEGBIT_CONNTRACK_DEFRAG() : string = "reg0[0]"
+function rEGBIT_CONNTRACK_COMMIT() : string = "reg0[1]"
+function rEGBIT_CONNTRACK_NAT() : string = "reg0[2]"
+function rEGBIT_DHCP_OPTS_RESULT() : string = "reg0[3]"
+function rEGBIT_DNS_LOOKUP_RESULT(): string = "reg0[4]"
+function rEGBIT_ND_RA_OPTS_RESULT(): string = "reg0[5]"
+function rEGBIT_HAIRPIN() : string = "reg0[6]"
+function rEGBIT_ACL_HINT_ALLOW_NEW(): string = "reg0[7]"
+function rEGBIT_ACL_HINT_ALLOW() : string = "reg0[8]"
+function rEGBIT_ACL_HINT_DROP() : string = "reg0[9]"
+function rEGBIT_ACL_HINT_BLOCK() : string = "reg0[10]"
+
+/* Register definitions for switches and routers. */
+
+/* Indicate that this packet has been recirculated using egress
+ * loopback. This allows certain checks to be bypassed, such as a
+ * logical router dropping packets with a source IP address equal to
+ * one of the logical router's own IP addresses. */
+function rEGBIT_EGRESS_LOOPBACK() : string = "reg9[0]"
+/* Register to store the result of the check_pkt_larger action. */
+function rEGBIT_PKT_LARGER() : string = "reg9[1]"
+function rEGBIT_LOOKUP_NEIGHBOR_RESULT() : string = "reg9[2]"
+function rEGBIT_LOOKUP_NEIGHBOR_IP_RESULT() : string = "reg9[3]"
+
+/* Register to store the eth address associated with a router port for packets
+ * received in S_ROUTER_IN_ADMISSION.
+ */
+function rEG_INPORT_ETH_ADDR() : string = "xreg0[0..47]"
+
+/* Register for ECMP bucket selection.
*/ +function rEG_ECMP_GROUP_ID() : string = "reg8[0..15]" +function rEG_ECMP_MEMBER_ID() : string = "reg8[16..31]" + +function fLAGBIT_NOT_VXLAN() : string = "flags[1] == 0" + +function mFF_N_LOG_REGS() : bit<32> = 10 + +/* + * Logical_Flow + relation Out_Logical_Flow ( + logical_datapath: string, + pipeline: string, + table_id: integer, + priority: integer, + __match: string, + actions: string, + external_ids: Map<string,string>) + */ + +relation Flow ( + logical_datapath: uuid, + stage: Stage, + priority: integer, + __match: string, + actions: string, + external_ids: Map<string,string> +) + +sb::Out_Logical_Flow(._uuid = hash128((f.logical_datapath, f.stage, f.priority, f.__match, f.actions, f.external_ids)), + .logical_datapath = f.logical_datapath, + .pipeline = if (f.stage.direction == IN) "ingress" else "egress", + .table_id = table_id, + .priority = f.priority, + .__match = f.__match, + .actions = f.actions, + .external_ids = map_insert_imm(f.external_ids, "stage-name", table_name)) :- + Flow[f], + (var table_id, var table_name) = stage_id(f.stage). + +/* Logical flows for forwarding groups. */ +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 50, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(fg_uuid)) :- + sw in &Switch(), + var fg_uuid = FlatMap(sw.ls.forwarding_groups), + fg in nb::Forwarding_Group(._uuid = fg_uuid), + not set_is_empty(fg.child_port), + var __match = "arp.tpa == ${fg.vip} && arp.op == 1", + var actions = "eth.dst = eth.src; " + "eth.src = ${fg.vmac}; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = ${fg.vmac}; " + "arp.tpa = arp.spa; " + "arp.spa = ${fg.vip}; " + "outport = inport; " + "flags.loopback = 1; " + "output;". 
+ +function escape_child_ports(child_port: Set<string>): string { + var escaped = vec_with_capacity(set_size(child_port)); + for (s in child_port) { + vec_push(escaped, json_string_escape(s)) + }; + string_join(escaped, ",") +} +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, L2_LKUP), + .priority = 50, + .__match = __match, + .actions = actions, + .external_ids = map_empty()) :- + sw in &Switch(), + var fg_uuid = FlatMap(sw.ls.forwarding_groups), + fg in nb::Forwarding_Group(._uuid = fg_uuid), + not set_is_empty(fg.child_port), + var __match = "eth.dst == ${fg.vmac}", + var actions = "fwd_group(" ++ + if (fg.liveness) { "liveness=\"true\"," } else { "" } ++ + "childports=" ++ escape_child_ports(fg.child_port) ++ ");". + +/* Logical switch ingress table PORT_SEC_L2: admission control framework + * (priority 100) */ +for (sw in &Switch()) { + if (not sw.is_vlan_transparent) { + /* Block logical VLANs. */ + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_L2), + .priority = 100, + .__match = "vlan.present", + .actions = "drop;", + .external_ids = map_empty() /*TODO: check*/) + }; + + /* Broadcast/multicast source address is invalid */ + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_L2), + .priority = 100, + .__match = "eth.src[40]", + .actions = "drop;", + .external_ids = map_empty() /*TODO: check*/) + /* Port security flows have priority 50 (see below) and will continue to the next table + if packet source is acceptable. */ +} + +// space-separated set of strings +function join(strings: Set<string>, sep: string): string { + strings.to_vec().join(sep) +} + +function build_port_security_ipv6_flow( + pipeline: Direction, + ea: eth_addr, + ipv6_addrs: Vec<ipv6_netaddr>): string = +{ + var ip6_addrs = vec_empty(); + + /* Allow link-local address. 
*/ + vec_push(ip6_addrs, ipv6_string_mapped(in6_generate_lla(ea))); + + /* Allow ip6.dst=ff00::/8 for multicast packets */ + if (pipeline == OUT) { + vec_push(ip6_addrs, "ff00::/8") + }; + for (addr in ipv6_addrs) { + vec_push(ip6_addrs, ipv6_netaddr_match_network(addr)) + }; + + var dir = if (pipeline == IN) { "src" } else { "dst" }; + " && ip6.${dir} == {" ++ ip6_addrs.join(", ") ++ "}" +} + +function build_port_security_ipv6_nd_flow( + ea: eth_addr, + ipv6_addrs: Vec<ipv6_netaddr>): string = +{ + var __match = " && ip6 && nd && ((nd.sll == ${eth_addr_zero()} || " + "nd.sll == ${ea}) || ((nd.tll == ${eth_addr_zero()} || " + "nd.tll == ${ea})"; + if (vec_is_empty(ipv6_addrs)) { + __match ++ "))" + } else { + var ip6_str = ipv6_string_mapped(in6_generate_lla(ea)); + __match = __match ++ " && (nd.target == ${ip6_str}"; + + for(addr in ipv6_addrs) { + ip6_str = ipv6_netaddr_match_network(addr); + __match = __match ++ " || nd.target == ${ip6_str}" + }; + __match ++ ")))" + } +} + +/* Pre-ACL */ +for (&Switch(.ls =ls)) { + /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are + * allowed by default. 
*/ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_ACL), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_ACL), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_ACL), + .priority = 110, + .__match = "eth.dst == $svc_monitor_mac", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_ACL), + .priority = 110, + .__match = "eth.src == $svc_monitor_mac", + .actions = "next;", + .external_ids = map_empty()) +} + + +/* If there are any stateful ACL rules in this datapath, we must + * send all IP packets through the conntrack action, which handles + * defragmentation, in order to match L4 headers. */ + +for (&SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "router"}, + .json_name = lsp_name, + .sw = &Switch{.ls = ls, .has_stateful_acl = true})) { + /* Can't use ct() for router ports. Consider the + * following configuration: lp1(10.0.0.2) on + * hostA--ls1--lr0--ls2--lp2(10.0.1.2) on hostB, For a + * ping from lp1 to lp2, First, the response will go + * through ct() with a zone for lp2 in the ls2 ingress + * pipeline on hostB. That ct zone knows about this + * connection. Next, it goes through ct() with the zone + * for the router port in the egress pipeline of ls2 on + * hostB. This zone does not know about the connection, + * as the icmp request went through the logical router + * on hostA, not hostB. This would only work with + * distributed conntrack state across all chassis. 
 */
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, PRE_ACL),
+         .priority = 110,
+         .__match = "ip && inport == ${lsp_name}",
+         .actions = "next;",
+         .external_ids = stage_hint(lsp._uuid));
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(OUT, PRE_ACL),
+         .priority = 110,
+         .__match = "ip && outport == ${lsp_name}",
+         .actions = "next;",
+         .external_ids = stage_hint(lsp._uuid))
+}
+
+for (&SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "localnet"},
+                 .json_name = lsp_name,
+                 .sw = &Switch{.ls = ls, .has_stateful_acl = true})) {
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, PRE_ACL),
+         .priority = 110,
+         .__match = "ip && inport == ${lsp_name}",
+         .actions = "next;",
+         .external_ids = stage_hint(lsp._uuid));
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(OUT, PRE_ACL),
+         .priority = 110,
+         .__match = "ip && outport == ${lsp_name}",
+         .actions = "next;",
+         .external_ids = stage_hint(lsp._uuid))
+}
+
+for (&Switch(.ls = ls, .has_stateful_acl = true)) {
+    /* Ingress and Egress Pre-ACL Table (Priority 110).
+     *
+     * Don't send ND and ICMP destination unreachable packets to
+     * conntrack. */
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, PRE_ACL),
+         .priority = 110,
+         .__match = "nd || nd_rs || nd_ra || mldv1 || mldv2 || "
+                    "(udp && udp.src == 546 && udp.dst == 547)",
+         .actions = "next;",
+         .external_ids = map_empty());
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(OUT, PRE_ACL),
+         .priority = 110,
+         .__match = "nd || nd_rs || nd_ra || mldv1 || mldv2 || "
+                    "(udp && udp.src == 546 && udp.dst == 547)",
+         .actions = "next;",
+         .external_ids = map_empty());
+
+    /* Ingress and Egress Pre-ACL Table (Priority 100).
+     *
+     * Regardless of whether the ACL is "from-lport" or "to-lport",
+     * we need rules in both the ingress and egress tables, because
+     * the return traffic needs to be followed.
+ * + * 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send + * it to conntrack for tracking and defragmentation. */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_ACL), + .priority = 100, + .__match = "ip", + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_ACL), + .priority = 100, + .__match = "ip", + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", + .external_ids = map_empty()) +} + +/* Pre-LB */ +for (&Switch(.ls = ls)) { + /* Do not send ND packets to conntrack */ + var __match = "nd || nd_rs || nd_ra || mldv1 || mldv2" in { + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_LB), + .priority = 110, + .__match = __match, + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_LB), + .priority = 110, + .__match = __match, + .actions = "next;", + .external_ids = map_empty()) + }; + + /* Do not send service monitor packets to conntrack. */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_LB), + .priority = 110, + .__match = "eth.dst == $svc_monitor_mac", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_LB), + .priority = 110, + .__match = "eth.src == $svc_monitor_mac", + .actions = "next;", + .external_ids = map_empty()); + + /* Allow all packets to go to next tables by default. 
*/ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_LB), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_LB), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +for (&SwitchPort(.lsp = lsp, .json_name = lsp_name, .sw = &Switch{.ls = ls})) +if (lsp.__type == "router" or lsp.__type == "localnet") { + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_LB), + .priority = 110, + .__match = "ip && inport == ${lsp_name}", + .actions = "next;", + .external_ids = stage_hint(lsp._uuid)); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_LB), + .priority = 110, + .__match = "ip && outport == ${lsp_name}", + .actions = "next;", + .external_ids = stage_hint(lsp._uuid)) +} + +relation HasEventElbMeter(has_meter: bool) + +HasEventElbMeter(true) :- + nb::Meter(.name = "event-elb"). + +HasEventElbMeter(false) :- + Unit(), + not nb::Meter(.name = "event-elb"). 
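The two HasEventElbMeter rules above encode a common DDlog idiom: a singleton relation that is `true` when a matching row exists and defaults to `false` otherwise (the second rule joins against `Unit()` and negates the first, since Datalog relations cannot simply "default"). In ordinary code this collapses to an existence check; a sketch:

```python
def has_event_elb_meter(meter_names):
    """True iff some nb::Meter row is named "event-elb".  The
    Unit()/negation pair in the DDlog rules above exists only to
    supply this 'False' default when no such Meter row is present."""
    return any(name == "event-elb" for name in meter_names)
```

The ControllerEventEn relation defined just below uses the same pattern to default to `false` when the NB_Global table is empty.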
+ +/* Empty LoadBalancer Controller event */ +function build_empty_lb_event_flow(key: string, lb: nb::Load_Balancer, + meter: bool): Option<(string, string)> { + (var ip, var port) = match (ip_address_and_port_from_lb_key(key)) { + Some{(ip, port)} -> (ip, port), + _ -> return None + }; + + var protocol = match (lb.protocol) { + Some{"tcp"} -> "tcp", + _ -> "udp" + }; + var meter = match (meter) { + true -> "event-elb", + _ -> "" + }; + var vip = match (port) { + 0 -> "${ip}", + _ -> "${ip.to_bracketed_string()}:${port}" + }; + + var __match = vec_with_capacity(2); + __match.push("${ip46_ipX(ip)}.dst == ${ip}"); + if (port != 0) { + __match.push("${protocol}.dst == ${port}"); + }; + + var action = "trigger_event(" + "event = \"empty_lb_backends\", " + "meter = \"${meter}\", " + "vip = \"${vip}\", " + "protocol = \"${protocol}\", " + "load_balancer = \"${uuid2str(lb._uuid)}\");"; + + Some{(__match.join(" && "), action)} +} + +/* ControllerEventEn has exactly one row, either 'true' to enable controller + * events or 'false' to disable them. */ +relation ControllerEventEn(enable: bool) +ControllerEventEn(map_get_bool_def(options, "controller_event", false)) :- + nb::NB_Global(.options = options). +ControllerEventEn(false) :- Unit(), not nb::NB_Global(). + +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PRE_LB), + .priority = 130, + .__match = __match, + .actions = __action, + .external_ids = stage_hint(lb._uuid)) :- + ControllerEventEn(true), + SwitchLBVIP(.sw_uuid = sw_uuid, .lb = &lb, .vip = vip, .backends = backends), + sw in &Switch(.ls = nb::Logical_Switch{._uuid = sw_uuid}), + backends == "", + HasEventElbMeter(has_elb_meter), + Some {(var __match, var __action)} = build_empty_lb_event_flow( + vip, lb, has_elb_meter). + +/* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send + * packet to conntrack for defragmentation. 
+ *
+ * Send all packets to conntrack in the ingress pipeline if the
+ * logical switch has a load balancer with a VIP configured. Earlier,
+ * we used to set the REGBIT_CONNTRACK_DEFRAG flag in the ingress
+ * pipeline only if the IP destination matched a VIP, but this causes
+ * a few issues when a logical switch has no ACLs configured with
+ * allow-related. To understand the issue, let's take a TCP load
+ * balancer: 10.0.0.10:80=10.0.0.3:80.
+ * If a logical port p1 with IP 10.0.0.5 opens a TCP connection to
+ * the VIP 10.0.0.10, then the packet in the ingress pipeline of 'p1'
+ * is sent to p1's conntrack zone id and the packet is load balanced
+ * to the backend 10.0.0.3. The reply packet from the backend lport
+ * is not sent to the conntrack of the backend lport's zone id. This
+ * is fine as long as the packet is valid. But suppose the backend
+ * lport sends an invalid TCP packet (e.g. with an incorrect sequence
+ * number): the packet gets delivered to the lport 'p1' without being
+ * unDNATed back to the VIP 10.0.0.10, which causes the connection to
+ * be reset by lport p1's VIF.
+ *
+ * We can't fix this issue by adding a logical flow to drop ct.inv
+ * packets in the egress pipeline, since that would drop all other
+ * connections not destined to the load balancers.
+ *
+ * Instead, we send all packets to conntrack in the ingress pipeline
+ * if a load balancer is configured. We can then safely add an lflow
+ * to drop ct.inv packets.
+ */ +for (sw in &Switch(.has_lb_vip = true)) { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PRE_LB), + .priority = 100, + .__match = "ip", + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", + .external_ids = map_empty()); + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(OUT, PRE_LB), + .priority = 100, + .__match = "ip", + .actions = "${rEGBIT_CONNTRACK_DEFRAG()} = 1; next;", + .external_ids = map_empty()) +} + +/* Pre-stateful */ +for (&Switch(.ls = ls)) { + /* Ingress and Egress pre-stateful Table (Priority 0): Packets are + * allowed by default. */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_STATEFUL), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_STATEFUL), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + + /* If REGBIT_CONNTRACK_DEFRAG is set as 1, then the packets should be + * sent to conntrack for tracking and defragmentation. */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PRE_STATEFUL), + .priority = 100, + .__match = "${rEGBIT_CONNTRACK_DEFRAG()} == 1", + .actions = "ct_next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, PRE_STATEFUL), + .priority = 100, + .__match = "${rEGBIT_CONNTRACK_DEFRAG()} == 1", + .actions = "ct_next;", + .external_ids = map_empty()) +} + +function build_acl_log(acl: nb::ACL): string = +{ + if (not acl.log) { + "" + } else { + var strs = vec_empty(); + match (acl.name) { + None -> (), + Some{name} -> vec_push(strs, "name=${json_string_escape(name)}") + }; + /* If a severity level isn't specified, default to "info". 
 */
+        match (acl.severity) {
+            None -> vec_push(strs, "severity=info"),
+            Some{severity} -> vec_push(strs, "severity=${severity}")
+        };
+        match (acl.action) {
+            "drop" -> {
+                vec_push(strs, "verdict=drop")
+            },
+            "reject" -> {
+                vec_push(strs, "verdict=reject")
+            },
+            "allow" -> {
+                vec_push(strs, "verdict=allow")
+            },
+            "allow-related" -> {
+                vec_push(strs, "verdict=allow")
+            },
+            _ -> ()
+        };
+        match (acl.meter) {
+            None -> (),
+            Some{meter} -> vec_push(strs, "meter=${json_string_escape(meter)}")
+        };
+        "log(${string_join(strs, \", \")}); "
+    }
+}
+
+/* Due to the various hard-coded priorities needed to implement ACLs, the
+ * northbound database supports a smaller range of ACL priorities than
+ * are available to logical flows. This value is added to an ACL
+ * priority to determine the ACL's logical flow priority. */
+function oVN_ACL_PRI_OFFSET(): integer = 1000
+
+/* Intermediate relation that stores reject ACLs.
+ * The following rules generate logical flows for these ACLs.
+ */
+relation Reject(lsuuid: uuid, pipeline: string, stage: Stage, acl: nb::ACL, extra_match: string, extra_actions: string)
+
+/* build_reject_acl_rules() */
+for (Reject(lsuuid, pipeline, stage, acl, extra_match_, extra_actions_)) {
+    var extra_match = match (extra_match_) {
+        "" -> "",
+        s -> "(${s}) && "
+    } in
+    var extra_actions = match (extra_actions_) {
+        "" -> "",
+        s -> "${s} "
+    } in
+    var next = match (pipeline == "ingress") {
+        true -> "next(pipeline=egress,table=${stage_id(switch_stage(OUT, QOS_MARK)).0})",
+        false -> "next(pipeline=ingress,table=${stage_id(switch_stage(IN, L2_LKUP)).0})"
+    } in
+    var acl_log = build_acl_log(acl) in {
+        var __match = extra_match ++ acl.__match in
+        var actions = acl_log ++ extra_actions ++ "reg0 = 0; "
+                      "reject { "
+                      "/* eth.dst <-> eth.src; ip.dst <-> ip.src; is implicit. */ "
+                      "outport <-> inport; ${next}; };" in
+        Flow(.logical_datapath = lsuuid,
+             .stage = stage,
+             .priority = acl.priority + oVN_ACL_PRI_OFFSET(),
+             .__match = __match,
+             .actions = actions,
+             .external_ids = stage_hint(acl._uuid))
+    }
+}
+
+/* build_acls */
+for (sw in &Switch(.ls = ls))
+var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
+{
+    /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
+     * default. A related rule at priority 1 is added below if there
+     * are any stateful ACLs in this datapath. */
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, ACL),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty());
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(OUT, ACL),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty());
+
+    if (has_stateful) {
+        /* Ingress and Egress ACL Table (Priority 1).
+         *
+         * By default, traffic is allowed. This is partially handled by
+         * the priority 0 ACL flows added earlier, but we also need to
+         * commit IP flows. This is because, while the initiator's
+         * direction may not have any stateful rules, the server's may,
+         * and then its return traffic would not have an associated
+         * conntrack entry and would return "+invalid".
+         *
+         * We use "ct_commit" for a connection that is not already known
+         * by the connection tracker. Once a connection is committed,
+         * subsequent packets will hit the flow at priority 0 that just
+         * uses "next;".
+         *
+         * We also check for established connections that have ct_label.blocked
+         * set on them. That's a connection that was disallowed, but is
+         * now allowed by policy again since it hit this default-allow flow.
+         * We need to set ct_label.blocked=0 to let the connection continue,
+         * which will be done by ct_commit() in the "stateful" stage.
+         * Subsequent packets will hit the flow at priority 0 that just
+         * uses "next;".
*/ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, ACL), + .priority = 1, + .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))", + .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, ACL), + .priority = 1, + .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))", + .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;", + .external_ids = map_empty()); + + /* Ingress and Egress ACL Table (Priority 65535). + * + * Always drop traffic that's in an invalid state. Also drop + * reply direction packets for connections that have been marked + * for deletion (bit 0 of ct_label is set). + * + * This is enforced at a higher priority than ACLs can be defined. */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, ACL), + .priority = 65535, + .__match = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)", + .actions = "drop;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, ACL), + .priority = 65535, + .__match = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)", + .actions = "drop;", + .external_ids = map_empty()); + + /* Ingress and Egress ACL Table (Priority 65535). + * + * Allow reply traffic that is part of an established + * conntrack entry that has not been marked for deletion + * (bit 0 of ct_label). We only match traffic in the + * reply direction because we want traffic in the request + * direction to hit the currently defined policy from ACLs. + * + * This is enforced at a higher priority than ACLs can be defined. 
 */
+        Flow(.logical_datapath = ls._uuid,
+             .stage = switch_stage(IN, ACL),
+             .priority = 65535,
+             .__match = "ct.est && !ct.rel && !ct.new && !ct.inv "
+                        "&& ct.rpl && ct_label.blocked == 0",
+             .actions = "next;",
+             .external_ids = map_empty());
+        Flow(.logical_datapath = ls._uuid,
+             .stage = switch_stage(OUT, ACL),
+             .priority = 65535,
+             .__match = "ct.est && !ct.rel && !ct.new && !ct.inv "
+                        "&& ct.rpl && ct_label.blocked == 0",
+             .actions = "next;",
+             .external_ids = map_empty());
+
+        /* Ingress and Egress ACL Table (Priority 65535).
+         *
+         * Allow traffic that is related to an existing conntrack entry that
+         * has not been marked for deletion (bit 0 of ct_label).
+         *
+         * This is enforced at a higher priority than ACLs can be defined.
+         *
+         * NOTE: This does not support related data sessions (e.g., a
+         * dynamically negotiated FTP data channel), but will allow related
+         * traffic, such as an ICMP Port Unreachable that's generated from
+         * a non-listening UDP port, to pass through. */
+        Flow(.logical_datapath = ls._uuid,
+             .stage = switch_stage(IN, ACL),
+             .priority = 65535,
+             .__match = "!ct.est && ct.rel && !ct.new && !ct.inv "
+                        "&& ct_label.blocked == 0",
+             .actions = "next;",
+             .external_ids = map_empty());
+        Flow(.logical_datapath = ls._uuid,
+             .stage = switch_stage(OUT, ACL),
+             .priority = 65535,
+             .__match = "!ct.est && ct.rel && !ct.new && !ct.inv "
+                        "&& ct_label.blocked == 0",
+             .actions = "next;",
+             .external_ids = map_empty());
+
+        /* Ingress and Egress ACL Table (Priority 65535).
+         *
+         * Don't send ND packets to conntrack.
 */
+        Flow(.logical_datapath = ls._uuid,
+             .stage = switch_stage(IN, ACL),
+             .priority = 65535,
+             .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+             .actions = "next;",
+             .external_ids = map_empty());
+        Flow(.logical_datapath = ls._uuid,
+             .stage = switch_stage(OUT, ACL),
+             .priority = 65535,
+             .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+             .actions = "next;",
+             .external_ids = map_empty())
+    };
+
+    /* Add a 34000 priority flow to advance the DNS reply from ovn-controller,
+     * if the CMS has configured DNS records for the datapath.
+     */
+    if (sw.has_dns_records) {
+        Flow(.logical_datapath = ls._uuid,
+             .stage = switch_stage(OUT, ACL),
+             .priority = 34000,
+             .__match = "udp.src == 53",
+             .actions = if has_stateful "ct_commit; next;" else "next;",
+             .external_ids = map_empty())
+    };
+
+    /* Add a 34000 priority flow to advance service monitor reply
+     * packets, skipping the ACL stages. */
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, ACL),
+         .priority = 34000,
+         .__match = "eth.dst == $svc_monitor_mac",
+         .actions = "next;",
+         .external_ids = map_empty());
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(OUT, ACL),
+         .priority = 34000,
+         .__match = "eth.src == $svc_monitor_mac",
+         .actions = "next;",
+         .external_ids = map_empty())
+}
+
+/* This stage builds hints for the IN/OUT_ACL stage. Based on various
+ * combinations of ct flags packets may hit only a subset of the logical
+ * flows in the IN/OUT_ACL stage.
+ *
+ * Populating ACL hints first and storing them in registers simplifies
+ * the logical flow match expressions in the IN/OUT_ACL stage and
+ * generates fewer OpenFlow flows.
+ *
+ * Certain combinations of ct flags might be valid matches for multiple
+ * types of ACL logical flows (e.g., allow/drop). In such cases hints
+ * corresponding to all potential matches are set.
+ */
+input relation AclHintStages[Stage]
+AclHintStages[switch_stage(IN, ACL_HINT)].
+AclHintStages[switch_stage(OUT, ACL_HINT)].
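The hint flows defined below form a first-match priority ladder over conntrack state: only the highest-priority matching logical flow applies, and each rung sets the register bits for the ACL verdicts the packet might still hit. A plain-Python sketch of that ladder (flag names are simplified booleans standing in for ct.new, ct.est, ct.rpl, ct.trk, and ct_label.blocked; the bit names stand in for the REGBIT_ACL_HINT_* registers):

```python
def acl_hints(new, est, rpl, trk, blocked):
    """First matching rung wins, mirroring logical flow priorities
    7 down to 0; returns the set of hint bits that rung would set."""
    if new and not est:                              # prio 7
        return {"allow_new", "drop"}
    if not new and est and not rpl and blocked:      # prio 6
        return {"allow_new", "drop"}
    if not trk:                                      # prio 5: untracked
        return {"allow", "drop"}
    if not new and est and not rpl and not blocked:  # prio 4
        return {"allow", "block"}
    if not est:                                      # prio 3
        return {"drop"}
    if est and blocked:                              # prio 2
        return {"drop"}
    if est and not blocked:                          # prio 1
        return {"block"}
    return set()                                     # prio 0: just "next;"
```

In the real flows every rung also executes `next;`, so this stage never drops anything itself; it only narrows what the ACL stage has to match on.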
+for (&Switch(.ls = ls)) { + for (AclHintStages[stage]) { + /* New, not already established connections, may hit either allow + * or drop ACLs. For allow ACLs, the connection must also be committed + * to conntrack so we set REGBIT_ACL_HINT_ALLOW_NEW. + */ + Flow(ls._uuid, stage, 7, "ct.new && !ct.est", + "${rEGBIT_ACL_HINT_ALLOW_NEW()} = 1; " + "${rEGBIT_ACL_HINT_DROP()} = 1; " + "next;", map_empty()); + + /* Already established connections in the "request" direction that + * are already marked as "blocked" may hit either: + * - allow ACLs for connections that were previously allowed by a + * policy that was deleted and is being readded now. In this case + * the connection should be recommitted so we set + * REGBIT_ACL_HINT_ALLOW_NEW. + * - drop ACLs. + */ + Flow(ls._uuid, stage, 6, "!ct.new && ct.est && !ct.rpl && ct_label.blocked == 1", + "${rEGBIT_ACL_HINT_ALLOW_NEW()} = 1; " + "${rEGBIT_ACL_HINT_DROP()} = 1; " + "next;", map_empty()); + + /* Not tracked traffic can either be allowed or dropped. */ + Flow(ls._uuid, stage, 5, "!ct.trk", + "${rEGBIT_ACL_HINT_ALLOW()} = 1; " + "${rEGBIT_ACL_HINT_DROP()} = 1; " + "next;", map_empty()); + + /* Already established connections in the "request" direction may hit + * either: + * - allow ACLs in which case the traffic should be allowed so we set + * REGBIT_ACL_HINT_ALLOW. + * - drop ACLs in which case the traffic should be blocked and the + * connection must be committed with ct_label.blocked set so we set + * REGBIT_ACL_HINT_BLOCK. + */ + Flow(ls._uuid, stage, 4, "!ct.new && ct.est && !ct.rpl && ct_label.blocked == 0", + "${rEGBIT_ACL_HINT_ALLOW()} = 1; " + "${rEGBIT_ACL_HINT_BLOCK()} = 1; " + "next;", map_empty()); + + /* Not established or established and already blocked connections may + * hit drop ACLs. 
+ */ + Flow(ls._uuid, stage, 3, "!ct.est", + "${rEGBIT_ACL_HINT_DROP()} = 1; " + "next;", map_empty()); + Flow(ls._uuid, stage, 2, "ct.est && ct_label.blocked == 1", + "${rEGBIT_ACL_HINT_DROP()} = 1; " + "next;", map_empty()); + + /* Established connections that were previously allowed might hit + * drop ACLs in which case the connection must be committed with + * ct_label.blocked set. + */ + Flow(ls._uuid, stage, 1, "ct.est && ct_label.blocked == 0", + "${rEGBIT_ACL_HINT_BLOCK()} = 1; " + "next;", map_empty()); + + /* In any case, advance to the next stage. */ + Flow(ls._uuid, stage, 0, "1", "next;", map_empty()) + } +} + +/* Ingress or Egress ACL Table (Various priorities). */ +for (&SwitchACL(.sw = &Switch{.ls = ls, .has_stateful_acl = has_stateful}, .acl = &acl)) { + /* consider_acl */ + var ingress = acl.direction == "from-lport" in + var stage = if (ingress) { switch_stage(IN, ACL) } else { switch_stage(OUT, ACL) } in + var pipeline = if ingress "ingress" else "egress" in + var stage_hint = stage_hint(acl._uuid) in + if (acl.action == "allow" or acl.action == "allow-related") { + /* If there are any stateful flows, we must even commit "allow" + * actions. This is because, while the initiator's + * direction may not have any stateful rules, the server's + * may and then its return traffic would not have an + * associated conntrack entry and would return "+invalid". */ + if (not has_stateful) { + Flow(.logical_datapath = ls._uuid, + .stage = stage, + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = acl.__match, + .actions = "${build_acl_log(acl)}next;", + .external_ids = stage_hint) + } else { + /* Commit the connection tracking entry if it's a new + * connection that matches this ACL. After this commit, + * the reply traffic is allowed by a flow we create at + * priority 65535, defined earlier.
+ * + * It's also possible that a known connection was marked for + * deletion after a policy was deleted, but the policy was + * re-added while that connection is still known. We catch + * that case here and un-set ct_label.blocked (which will be done + * by ct_commit in the "stateful" stage) to indicate that the + * connection should be allowed to resume. + */ + Flow(.logical_datapath = ls._uuid, + .stage = stage, + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = "${rEGBIT_ACL_HINT_ALLOW_NEW()} == 1 && (${acl.__match})", + .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; ${build_acl_log(acl)}next;", + .external_ids = stage_hint); + + /* Match on traffic in the request direction for an established + * connection tracking entry that has not been marked for + * deletion. There is no need to commit here, so we can just + * proceed to the next table. We use this to ensure that this + * connection is still allowed by the currently defined + * policy. Match untracked packets too. */ + Flow(.logical_datapath = ls._uuid, + .stage = stage, + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = "${rEGBIT_ACL_HINT_ALLOW()} == 1 && (${acl.__match})", + .actions = "${build_acl_log(acl)}next;", + .external_ids = stage_hint) + } + } else if (acl.action == "drop" or acl.action == "reject") { + /* The implementation of "drop" differs if stateful ACLs are in + * use for this datapath. In that case, the actions differ + * depending on whether the connection was previously committed + * to the connection tracker with ct_commit. */ + if (has_stateful) { + /* If the packet is not tracked or not part of an established + * connection, then we can simply reject/drop it. 
*/ + var __match = "${rEGBIT_ACL_HINT_DROP()} == 1" in + if (acl.action == "reject") { + Reject(ls._uuid, pipeline, stage, acl, __match, "") + } else { + Flow(.logical_datapath = ls._uuid, + .stage = stage, + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = __match ++ " && (${acl.__match})", + .actions = "${build_acl_log(acl)}/* drop */", + .external_ids = stage_hint) + }; + /* For an existing connection without ct_label set, we've + * encountered a policy change. ACLs previously allowed + * this connection and we committed the connection tracking + * entry. Current policy says that we should drop this + * connection. First, we set bit 0 of ct_label to indicate + * that this connection is set for deletion. By not + * specifying "next;", we implicitly drop the packet after + * updating conntrack state. We would normally defer + * ct_commit() to the "stateful" stage, but since we're + * rejecting/dropping the packet, we go ahead and do it here. + */ + var __match = "${rEGBIT_ACL_HINT_BLOCK()} == 1" in + var actions = "ct_commit { ct_label.blocked = 1; }; " in + if (acl.action == "reject") { + Reject(ls._uuid, pipeline, stage, acl, __match, actions) + } else { + Flow(.logical_datapath = ls._uuid, + .stage = stage, + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = __match ++ " && (${acl.__match})", + .actions = "${actions}${build_acl_log(acl)}/* drop */", + .external_ids = stage_hint) + } + } else { + /* There are no stateful ACLs in use on this datapath, + * so a "reject/drop" ACL is simply the "reject/drop" + * logical flow action in all cases. 
*/ + if (acl.action == "reject") { + Reject(ls._uuid, pipeline, stage, acl, "", "") + } else { + Flow(.logical_datapath = ls._uuid, + .stage = stage, + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = acl.__match, + .actions = "${build_acl_log(acl)}/* drop */", + .external_ids = stage_hint) + } + } + } +} + +/* Add 34000 priority flow to allow DHCP reply from ovn-controller to all + * logical ports of the datapath if the CMS has configured DHCPv4 options. + * */ +for (SwitchPortDHCPv4Options(.port = &SwitchPort{.lsp = lsp, .sw = &sw}, + .dhcpv4_options = dhcpv4_options@&nb::DHCP_Options{.options = options}) + if lsp.__type != "external") { + (Some{var server_id}, Some{var server_mac}, Some{var lease_time}) = + (map_get(options, "server_id"), map_get(options, "server_mac"), map_get(options, "lease_time")) in + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(OUT, ACL), + .priority = 34000, + .__match = "outport == ${json_string_escape(lsp.name)} " + "&& eth.src == ${server_mac} " + "&& ip4.src == ${server_id} && udp && udp.src == 67 " + "&& udp.dst == 68", + .actions = if (sw.has_stateful_acl) "ct_commit; next;" else "next;", + .external_ids = stage_hint(dhcpv4_options._uuid)) +} + +for (SwitchPortDHCPv6Options(.port = &SwitchPort{.lsp = lsp, .sw = &sw}, + .dhcpv6_options = dhcpv6_options@&nb::DHCP_Options{.options=options} ) + if lsp.__type != "external") { + Some{var server_mac} = map_get(options, "server_id") in + Some{var ea} = eth_addr_from_string(server_mac) in + var server_ip = ipv6_string_mapped(in6_generate_lla(ea)) in + /* Get the link local IP of the DHCPv6 server from the + * server MAC. 
*/ + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(OUT, ACL), + .priority = 34000, + .__match = "outport == ${json_string_escape(lsp.name)} " + "&& eth.src == ${server_mac} " + "&& ip6.src == ${server_ip} && udp && udp.src == 547 " + "&& udp.dst == 546", + .actions = if (sw.has_stateful_acl) "ct_commit; next;" else "next;", + .external_ids = stage_hint(dhcpv6_options._uuid)) +} + +relation QoSAction(qos: uuid, key_action: string, value_action: integer) + +QoSAction(qos, k, v) :- + nb::QoS(._uuid = qos, .action = actions), + var action = FlatMap(actions), + (var k, var v) = action. + +/* QoS rules */ +for (&Switch(.ls = ls)) { + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, QOS_MARK), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, QOS_MARK), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, QOS_METER), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, QOS_METER), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +for (SwitchQoS(.sw = &sw, .qos = &qos)) { + var ingress = if (qos.direction == "from-lport") true else false in + var pipeline = if ingress "ingress" else "egress" in { + var stage = if (ingress) { switch_stage(IN, QOS_MARK) } else { switch_stage(OUT, QOS_MARK) } in + /* FIXME: Can value_action be negative? 
*/ + for (QoSAction(qos._uuid, key_action, value_action)) { + if (key_action == "dscp") { + Flow(.logical_datapath = sw.ls._uuid, + .stage = stage, + .priority = qos.priority, + .__match = qos.__match, + .actions = "ip.dscp = ${value_action}; next;", + .external_ids = stage_hint(qos._uuid)) + } + }; + + (var burst, var rate) = { + var rate = 0; + var burst = 0; + for (bw in qos.bandwidth) { + /* FIXME: Can value_bandwidth be negative? */ + (var key_bandwidth, var value_bandwidth) = bw; + if (key_bandwidth == "rate") { + rate = value_bandwidth + } else if (key_bandwidth == "burst") { + burst = value_bandwidth + } else () + }; + (burst, rate) + } in + if (rate != 0) { + var stage = if (ingress) { switch_stage(IN, QOS_METER) } else { switch_stage(OUT, QOS_METER) } in + var meter_action = if (burst != 0) { + "set_meter(${rate}, ${burst}); next;" + } else { + "set_meter(${rate}); next;" + } in + /* Ingress and Egress QoS Meter Table. + * + * We limit the bandwidth of this flow by adding a meter table. + */ + Flow(.logical_datapath = sw.ls._uuid, + .stage = stage, + .priority = qos.priority, + .__match = qos.__match, + .actions = meter_action, + .external_ids = stage_hint(qos._uuid)) + } + } +} + +/* LB rules */ +for (&Switch(.ls = ls, .has_lb_vip = has_lb_vip)) { + /* Ingress and Egress LB Table (Priority 0): Packets are allowed by + * default. 
*/ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, LB), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, LB), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + + if (not ls.load_balancer.is_empty()) { + for (&SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "router"}, + .json_name = lsp_name, + .sw = &Switch{.ls = ls})) { + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, LB), + .priority = 65535, + .__match = "ip && inport == ${lsp_name}", + .actions = "next;", + .external_ids = stage_hint(lsp._uuid)); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, LB), + .priority = 65535, + .__match = "ip && outport == ${lsp_name}", + .actions = "next;", + .external_ids = stage_hint(lsp._uuid)) + } + }; + + if (has_lb_vip) { + /* Ingress and Egress LB Table (Priority 65534). + * + * Send established traffic through conntrack for just NAT. */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, LB), + .priority = 65534, + .__match = "ct.est && !ct.rel && !ct.new && !ct.inv && ct_label.natted == 1", + .actions = "${rEGBIT_CONNTRACK_NAT()} = 1; next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, LB), + .priority = 65534, + .__match = "ct.est && !ct.rel && !ct.new && !ct.inv && ct_label.natted == 1", + .actions = "${rEGBIT_CONNTRACK_NAT()} = 1; next;", + .external_ids = map_empty()) + } +} + +/* stateful rules */ +for (&Switch(.ls = ls)) { + /* Ingress and Egress stateful Table (Priority 0): Packets are + * allowed by default. 
*/ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, STATEFUL), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, STATEFUL), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + + /* If REGBIT_CONNTRACK_COMMIT is set as 1, then the packets should be + * committed to conntrack. We always set ct_label.blocked to 0 here as + * any packet that makes it this far is part of a connection we + * want to allow to continue. */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, STATEFUL), + .priority = 100, + .__match = "${rEGBIT_CONNTRACK_COMMIT()} == 1", + .actions = "ct_commit { ct_label.blocked = 0; }; next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, STATEFUL), + .priority = 100, + .__match = "${rEGBIT_CONNTRACK_COMMIT()} == 1", + .actions = "ct_commit { ct_label.blocked = 0; }; next;", + .external_ids = map_empty()); + + /* If REGBIT_CONNTRACK_NAT is set as 1, then packets should just be sent + * through nat (without committing). + * + * REGBIT_CONNTRACK_COMMIT is set for new connections and + * REGBIT_CONNTRACK_NAT is set for established connections. So they + * don't overlap. + */ + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, STATEFUL), + .priority = 100, + .__match = "${rEGBIT_CONNTRACK_NAT()} == 1", + .actions = "ct_lb;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(OUT, STATEFUL), + .priority = 100, + .__match = "${rEGBIT_CONNTRACK_NAT()} == 1", + .actions = "ct_lb;", + .external_ids = map_empty()) +} + +/* Load balancing rules for new connections get committed to conntrack + * table. 
So even if REGBIT_CONNTRACK_COMMIT is set in a previous table + * a higher priority rule for load balancing below also commits the + * connection, so it is okay if we do not hit the above match on + * REGBIT_CONNTRACK_COMMIT. */ +function get_match_for_lb_key(ip_address: v46_ip, + port: bit<16>, + protocol: Option<string>, + redundancy: bool): string = { + var port_match = if (port != 0) { + var proto = if (protocol == Some{"udp"}) { + "udp" + } else { + "tcp" + }; + if (redundancy) { " && ${proto}" } else { "" } ++ + " && ${proto}.dst == ${port}" + } else { + "" + }; + + var ip_match = match (ip_address) { + IPv4{ipv4} -> "ip4.dst == ${ipv4}", + IPv6{ipv6} -> "ip6.dst == ${ipv6}" + }; + + if (redundancy) { "ip && " } else { "" } ++ ip_match ++ port_match +} +/* New connections in Ingress table. */ + +function ct_lb(backends: string, + selection_fields: Set<string>, protocol: Option<string>): string { + var args = vec_with_capacity(2); + args.push("backends=${backends}"); + + if (not selection_fields.is_empty()) { + var hash_fields = vec_with_capacity(selection_fields.size()); + for (sf in selection_fields) { + var hf = match ((sf, protocol)) { + ("tp_src", Some{p}) -> "${p}_src", + ("tp_dst", Some{p}) -> "${p}_dst", + _ -> sf + }; + hash_fields.push(hf); + }; + args.push("hash_fields=" ++ json_string_escape(hash_fields.join(","))); + }; + + "ct_lb(" ++ args.join("; ") ++ ");" +} +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, STATEFUL), + .priority = priority, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lb._uuid)) :- + sw in &Switch(), + LBVIPBackend[lbvipbackend], + Some{var svc_monitor} = lbvipbackend.svc_monitor, + var lbvip = lbvipbackend.lbvip, + var lb = lbvip.lb, + set_contains(sw.ls.load_balancer, lb._uuid), + bs in &LBVIPBackendStatus(.port = lbvipbackend.port, + .ip = lbvipbackend.ip, + .protocol = default_protocol(lb.protocol), + .logical_port = svc_monitor.port_name), + var bses = bs.group_by((sw, lbvip, 
lb)).to_set(), + var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false), + var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, + var up_backends = { + var up_backends = set_empty(); + for (bs in bses) { + if (bs.up) { + set_insert(up_backends, "${bs.ip}:${bs.port}") + } + }; + up_backends + }, + var actions = if (set_is_empty(up_backends)) { + "drop;" + } else { + ct_lb(string_join(set_to_vec(up_backends), ","), + lb.selection_fields, lb.protocol) + }. +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, STATEFUL), + .priority = priority, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lb._uuid)) :- + sw in &Switch(), + LBVIPBackend[lbvipbackend], + None = lbvipbackend.svc_monitor, + var lbvip = lbvipbackend.lbvip, + var lb = lbvip.lb, + set_contains(sw.ls.load_balancer, lb._uuid), + var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false), + var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, + var actions = ct_lb(lbvip.backend_ips, lb.selection_fields, lb.protocol). + +/* Also install flows that allow hairpinning of traffic (i.e., if + * a load balancer VIP is DNAT-ed to a backend that happens to be + * the source of the traffic). + */ + +function get_hairpin_match(lbvipbackend: Ref<LBVIPBackend>, + l4_dir: string, l3_dst: Option<v46_ip>): string = { + var lbvip = lbvipbackend.lbvip; + var lb = lbvip.lb; + var ipX = ip46_ipX(lbvip.vip_addr); + + var __match = vec_with_capacity(3); + + vec_push(__match, "${ipX}.src == ${lbvipbackend.ip}"); + + match (l3_dst) { + Some{s} -> vec_push(__match, "${ipX}.dst == ${s}"), + _ -> () + }; + + if (lbvip.vip_port != 0) { + var proto = match (lb.protocol) { + Some{value} -> value, + None -> "tcp" + }; + vec_push(__match, "${proto}.${l4_dir} == ${lbvipbackend.port}") + }; + + "(" ++ string_join(__match, " && ") ++ ")" +} + +/* Ingress Pre-Hairpin table. 
+ * - Priority 2: SNAT load balanced traffic that needs to be hairpinned: + * - Both SRC and DST IP match backend->ip and destination port + * matches backend->port. + * - Priority 1: unSNAT replies to hairpinned load balanced traffic. + * - SRC IP matches backend->ip, DST IP matches LB VIP and source port + * matches backend->port. + */ +/* Packets that after load balancing have equal source and + * destination IPs should be hairpinned. + */ +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PRE_HAIRPIN), + .priority = 2, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lb._uuid)) :- + sw in &Switch(), + LBVIPBackend[lbvipbackend], + var lbvip = lbvipbackend.lbvip, + var lb = lbvip.lb, + set_contains(sw.ls.load_balancer, lb._uuid), + var __match = get_hairpin_match(lbvipbackend, "dst", Some{lbvipbackend.ip}), + var matches = __match.group_by((lbvip, lb, sw)).to_vec(), + var __match = string_join(matches, " || "), + var actions = "${rEGBIT_HAIRPIN()} = 1; ct_snat(${lbvip.vip_addr});". +/* If the packets are replies for hairpinned traffic, UNSNAT them. */ +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PRE_HAIRPIN), + .priority = 1, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lb._uuid)) :- + sw in &Switch(), + LBVIPBackend[lbvipbackend], + var lbvip = lbvipbackend.lbvip, + var lb = lbvip.lb, + set_contains(sw.ls.load_balancer, lb._uuid), + var __match = get_hairpin_match(lbvipbackend, "src", None), + var matches = __match.group_by((lbvip, lb, sw)).to_vec(), + var ipX = ip46_ipX(lbvip.vip_addr), + var __match = "(" ++ string_join(matches, " || ") ++ ") && " + "${ipX}.dst == ${lbvip.vip_addr}", + var actions = "${rEGBIT_HAIRPIN()} = 1; ct_snat;". + + +/* Ingress Pre-Hairpin table (Priority 0). Packets that don't need + * hairpinning should continue processing. 
+ */ +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PRE_HAIRPIN), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) :- + sw in &Switch(). + +/* Ingress Hairpin table. + * - Priority 0: Packets that don't need hairpinning should continue + * processing. + * - Priority 1: Packets that were SNAT-ed for hairpinning should be + * looped back (i.e., swap ETH addresses and send back on inport). + */ +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, HAIRPIN), + .priority = 1, + .__match = "${rEGBIT_HAIRPIN()} == 1", + .actions = "eth.dst <-> eth.src;" + "outport = inport;" + "flags.loopback = 1;" + "output;", + .external_ids = map_empty()) :- + sw in &Switch(). +Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, HAIRPIN), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) :- + sw in &Switch(). + + +/* Logical switch ingress table PORT_SEC_L2: ingress port security - L2 (priority 50) + ingress table PORT_SEC_IP: ingress port security - IP (priority 90 and 80) + ingress table PORT_SEC_ND: ingress port security - ND (priority 90 and 80) */ +for (&SwitchPort(.lsp = lsp, .sw = &sw, .json_name = json_name, .ps_eth_addresses = ps_eth_addresses) + if lsp.is_enabled() and lsp.__type != "external") { + for (pbinding in sb::Out_Port_Binding(.logical_port = lsp.name)) { + var __match = if (vec_is_empty(ps_eth_addresses)) { + "inport == ${json_name}" + } else { + "inport == ${json_name} && eth.src == {${ps_eth_addresses.join(\" \")}}" + } in + var actions = match (map_get(pbinding.options, "qdisc_queue_id")) { + None -> "next;", + Some{id} -> "set_queue(${id}); next;" + } in + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_L2), + .priority = 50, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lsp._uuid)) + } +} + +/** +* Build port security constraints on IPv4 and IPv6 src and dst fields +* and add logical 
flows to S_SWITCH_(IN/OUT)_PORT_SEC_IP stage. +* +* For each port security of the logical port, following +* logical flows are added +* - If the port security has IPv4 addresses, +* - Priority 90 flow to allow IPv4 packets for known IPv4 addresses +* +* - If the port security has IPv6 addresses, +* - Priority 90 flow to allow IPv6 packets for known IPv6 addresses +* +* - If the port security has IPv4 addresses or IPv6 addresses or both +* - Priority 80 flow to drop all IPv4 and IPv6 traffic +*/ +for (SwitchPortPSAddresses(.port = &port@SwitchPort{.sw = &sw}, .ps_addrs = ps) + if port.is_enabled() and + (vec_len(ps.ipv4_addrs) > 0 or vec_len(ps.ipv6_addrs) > 0) and + port.lsp.__type != "external") +{ + if (vec_len(ps.ipv4_addrs) > 0) { + var dhcp_match = "inport == ${port.json_name}" + " && eth.src == ${ps.ea}" + " && ip4.src == 0.0.0.0" + " && ip4.dst == 255.255.255.255" + " && udp.src == 68 && udp.dst == 67" in { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_IP), + .priority = 90, + .__match = dhcp_match, + .actions = "next;", + .external_ids = stage_hint(port.lsp._uuid)) + }; + var addrs = { + var addrs = vec_empty(); + for (addr in ps.ipv4_addrs) { + /* When the netmask is applied, if the host portion is + * non-zero, the host can only use the specified + * address. If zero, the host is allowed to use any + * address in the subnet. 
+ */ + vec_push(addrs, ipv4_netaddr_match_host_or_network(addr)) + }; + addrs + } in + var __match = + "inport == ${port.json_name} && eth.src == ${ps.ea} && ip4.src == {" ++ + string_join(addrs, ", ") ++ "}" in + { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_IP), + .priority = 90, + .__match = __match, + .actions = "next;", + .external_ids = stage_hint(port.lsp._uuid)) + } + }; + if (vec_len(ps.ipv6_addrs) > 0) { + var dad_match = "inport == ${port.json_name}" + " && eth.src == ${ps.ea}" + " && ip6.src == ::" + " && ip6.dst == ff02::/16" + " && icmp6.type == {131, 135, 143}" in + { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_IP), + .priority = 90, + .__match = dad_match, + .actions = "next;", + .external_ids = stage_hint(port.lsp._uuid)) + }; + var __match = "inport == ${port.json_name} && eth.src == ${ps.ea}" ++ + build_port_security_ipv6_flow(IN, ps.ea, ps.ipv6_addrs) in + { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_IP), + .priority = 90, + .__match = __match, + .actions = "next;", + .external_ids = stage_hint(port.lsp._uuid)) + } + }; + var __match = "inport == ${port.json_name} && eth.src == ${ps.ea} && ip" in + { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_IP), + .priority = 80, + .__match = __match, + .actions = "drop;", + .external_ids = stage_hint(port.lsp._uuid)) + } +} + +/** + * Build port security constraints on ARP and IPv6 ND fields + * and add logical flows to S_SWITCH_IN_PORT_SEC_ND stage. + * + * For each port security of the logical port, following + * logical flows are added + * - If the port security has no IP (both IPv4 and IPv6) or + * if it has IPv4 address(es) + * - Priority 90 flow to allow ARP packets for known MAC addresses + * in the eth.src and arp.spa fields. If the port security + * has IPv4 addresses, allow known IPv4 addresses in the arp.tpa field. 
+ * + * - If the port security has no IP (both IPv4 and IPv6) or + * if it has IPv6 address(es) + * - Priority 90 flow to allow IPv6 ND packets for known MAC addresses + * in the eth.src and nd.sll/nd.tll fields. If the port security + * has IPv6 addresses, allow known IPv6 addresses in the nd.target field + * for IPv6 Neighbor Advertisement packet. + * + * - Priority 80 flow to drop ARP and IPv6 ND packets. + */ +for (SwitchPortPSAddresses(.port = &port@SwitchPort{.sw = &sw}, .ps_addrs = ps) + if port.is_enabled() and port.lsp.__type != "external") +{ + var no_ip = vec_is_empty(ps.ipv4_addrs) and vec_is_empty(ps.ipv6_addrs) in + { + if (not vec_is_empty(ps.ipv4_addrs) or no_ip) { + var __match = { + var prefix = "inport == ${port.json_name} && eth.src == ${ps.ea} && arp.sha == ${ps.ea}"; + if (not vec_is_empty(ps.ipv4_addrs)) { + var spas = vec_empty(); + for (addr in ps.ipv4_addrs) { + vec_push(spas, ipv4_netaddr_match_host_or_network(addr)) + }; + prefix ++ " && arp.spa == {${string_join(spas, \", \")}}" + } else { + prefix + } + } in { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_ND), + .priority = 90, + .__match = __match, + .actions = "next;", + .external_ids = stage_hint(port.lsp._uuid)) + } + }; + if (not vec_is_empty(ps.ipv6_addrs) or no_ip) { + var __match = "inport == ${port.json_name} && eth.src == ${ps.ea}" ++ + build_port_security_ipv6_nd_flow(ps.ea, ps.ipv6_addrs) in + { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_ND), + .priority = 90, + .__match = __match, + .actions = "next;", + .external_ids = stage_hint(port.lsp._uuid)) + } + }; + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, PORT_SEC_ND), + .priority = 80, + .__match = "inport == ${port.json_name} && (arp || nd)", + .actions = "drop;", + .external_ids = stage_hint(port.lsp._uuid)) + } +} + +/* Ingress table PORT_SEC_ND and PORT_SEC_IP: Port security - IP and ND, by + * default goto next. 
(priority 0)*/ +for (&Switch(.ls = ls)) { + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PORT_SEC_ND), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, PORT_SEC_IP), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +/* Ingress table ARP_ND_RSP: ARP/ND responder, skip requests coming from + * localnet and vtep ports. (priority 100); see ovn-northd.8.xml for the + * rationale. */ +for (&SwitchPort(.lsp = lsp, .sw = &sw, .json_name = json_name) + if lsp.is_enabled() and + (lsp.__type == "localnet" or lsp.__type == "vtep")) +{ + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 100, + .__match = "inport == ${json_name}", + .actions = "next;", + .external_ids = stage_hint(lsp._uuid)) +} + +function lsp_is_up(lsp: nb::Logical_Switch_Port): bool = { + lsp.up == Some{true} +} + +/* Ingress table ARP_ND_RSP: ARP/ND responder, reply for known IPs. + * (priority 50). */ +/* Handle + * - GARPs for virtual ip which belongs to a logical port + * of type 'virtual' and bind that port. + * + * - ARP reply from the virtual ip which belongs to a logical + * port of type 'virtual' and bind that port. 
+ * */ + Flow(.logical_datapath = sp.sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 100, + .__match = "inport == ${vp.json_name} && " + "((arp.op == 1 && arp.spa == ${virtual_ip} && arp.tpa == ${virtual_ip}) || " + "(arp.op == 2 && arp.spa == ${virtual_ip}))", + .actions = "bind_vport(${sp.json_name}, inport); next;", + .external_ids = stage_hint(lsp._uuid)) :- + sp in &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "virtual"}), + Some{var virtual_ip} = map_get(lsp.options, "virtual-ip"), + Some{var virtual_parents} = map_get(lsp.options, "virtual-parents"), + Some{var ip} = ip_parse(virtual_ip), + var vparent = FlatMap(string_split(virtual_parents, ",")), + vp in &SwitchPort(.lsp = nb::Logical_Switch_Port{.name = vparent}), + vp.sw == sp.sw. + +/* + * Add ARP/ND reply flows if either the + * - port is up and it doesn't have 'unknown' address defined or + * - port type is router or + * - port type is localport + */ +for (CheckLspIsUp[check_lsp_is_up]) { + for (SwitchPortIPv4Address(.port = &SwitchPort{.lsp = lsp, .sw = &sw, .json_name = json_name}, + .ea = ea, .addr = addr) + if lsp.is_enabled() and + ((lsp_is_up(lsp) or not check_lsp_is_up) + or lsp.__type == "router" or lsp.__type == "localport") and + lsp.__type != "external" and lsp.__type != "virtual" and + not set_contains(lsp.addresses, "unknown")) + { + var __match = "arp.tpa == ${addr.addr} && arp.op == 1" in + { + var actions = "eth.dst = eth.src; " + "eth.src = ${ea}; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = ${ea}; " + "arp.tpa = arp.spa; " + "arp.spa = ${addr.addr}; " + "outport = inport; " + "flags.loopback = 1; " + "output;" in + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 50, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lsp._uuid)); + + /* Do not reply to an ARP request from the port that owns the + * address (otherwise a DHCP client that ARPs to check for a + * 
duplicate address will fail). Instead, forward it the usual + * way. + * + * (Another alternative would be to simply drop the packet. If + * everything is working as it is configured, then this would + * produce equivalent results, since no one should reply to the + * request. But ARPing for one's own IP address is intended to + * detect situations where the network is not working as + * configured, so dropping the request would frustrate that + * intent.) */ + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 100, + .__match = __match ++ " && inport == ${json_name}", + .actions = "next;", + .external_ids = stage_hint(lsp._uuid)) + } + } +} + +/* For ND solicitations, we need to listen for both the + * unicast IPv6 address and its all-nodes multicast address, + * but always respond with the unicast IPv6 address. */ +for (SwitchPortIPv6Address(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw}, + .ea = ea, .addr = addr) + if lsp.is_enabled() and + (lsp_is_up(lsp) or lsp.__type == "router" or lsp.__type == "localport") and + lsp.__type != "external" and lsp.__type != "virtual") +{ + var __match = "nd_ns && ip6.dst == {${addr.addr}, ${ipv6_netaddr_solicited_node(addr)}} && nd.target == ${addr.addr}" in + var actions = "${if (lsp.__type == \"router\") \"nd_na_router\" else \"nd_na\"} { " + "eth.src = ${ea}; " + "ip6.src = ${addr.addr}; " + "nd.target = ${addr.addr}; " + "nd.tll = ${ea}; " + "outport = inport; " + "flags.loopback = 1; " + "output; " + "};" in + { + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 50, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lsp._uuid)); + + /* Do not reply to a solicitation from the port that owns the + * address (otherwise DAD detection will fail). 
*/ + Flow(.logical_datapath = sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 100, + .__match = __match ++ " && inport == ${json_name}", + .actions = "next;", + .external_ids = stage_hint(lsp._uuid)) + } +} + +/* Ingress table ARP_ND_RSP: ARP/ND responder, by default goto next. + * (priority 0)*/ +for (ls in nb::Logical_Switch) { + Flow(.logical_datapath = ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +/* Ingress table ARP_ND_RSP: ARP/ND responder for service monitor source ip. + * (priority 110)*/ +Flow(.logical_datapath = sp.sw.ls._uuid, + .stage = switch_stage(IN, ARP_ND_RSP), + .priority = 110, + .__match = "arp.tpa == ${svc_mon_src_ip} && arp.op == 1", + .actions = "eth.dst = eth.src; " + "eth.src = ${svc_monitor_mac}; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = ${svc_monitor_mac}; " + "arp.tpa = arp.spa; " + "arp.spa = ${svc_mon_src_ip}; " + "outport = inport; " + "flags.loopback = 1; " + "output;", + .external_ids = stage_hint(lbvipbackend.lbvip.lb._uuid)) :- + LBVIPBackend[lbvipbackend], + Some{var svc_monitor} = lbvipbackend.svc_monitor, + sp in &SwitchPort( + .lsp = nb::Logical_Switch_Port{.name = svc_monitor.port_name}), + var svc_mon_src_ip = svc_monitor.src_ip, + SvcMonitorMac(svc_monitor_mac). + +function build_dhcpv4_action( + lsp_json_key: string, + dhcpv4_options: nb::DHCP_Options, + offer_ip: in_addr) : Option<(string, string, string)> = +{ + match (ip_parse_masked(dhcpv4_options.cidr)) { + Left{err} -> { + /* cidr defined is invalid */ + None + }, + Right{(var host_ip, var mask)} -> { + if (not ip_same_network((offer_ip, host_ip), mask)) { + /* the offer ip of the logical port doesn't belong to the cidr + * defined in the DHCPv4 options. 
+                 */
+                None
+            } else {
+                match ((map_get(dhcpv4_options.options, "server_id"),
+                        map_get(dhcpv4_options.options, "server_mac"),
+                        map_get(dhcpv4_options.options, "lease_time")))
+                {
+                    (Some{var server_ip}, Some{var server_mac}, Some{var lease_time}) -> {
+                        var options_map = dhcpv4_options.options;
+
+                        /* server_mac is not a DHCPv4 option; delete it from the smap. */
+                        map_remove(options_map, "server_mac");
+                        map_insert(options_map, "netmask", "${mask}");
+
+                        /* We're not using SMAP_FOR_EACH because we want a consistent order of the
+                         * options on different architectures (big or little endian, SSE4.2) */
+                        var options = vec_empty();
+                        for (node in options_map) {
+                            (var k, var v) = node;
+                            vec_push(options, "${k} = ${v}")
+                        };
+                        var options_action = "${rEGBIT_DHCP_OPTS_RESULT()} = put_dhcp_opts(offerip = ${offer_ip}, " ++
+                                             string_join(options, ", ") ++ "); next;";
+                        var response_action = "eth.dst = eth.src; eth.src = ${server_mac}; "
+                                              "ip4.src = ${server_ip}; udp.src = 67; "
+                                              "udp.dst = 68; outport = inport; flags.loopback = 1; "
+                                              "output;";
+
+                        var ipv4_addr_match = "ip4.src == ${offer_ip} && ip4.dst == {${server_ip}, 255.255.255.255}";
+                        Some{(options_action, response_action, ipv4_addr_match)}
+                    },
+                    _ -> {
+                        /* "server_id", "server_mac" and "lease_time" should be
+                         * present in the dhcp_options. */
+                        //static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+                        warn("Required DHCPv4 options not defined for lport - ${lsp_json_key}");
+                        None
+                    }
+                }
+            }
+        }
+    }
+}
+
+function build_dhcpv6_action(
+    lsp_json_key: string,
+    dhcpv6_options: nb::DHCP_Options,
+    offer_ip: in6_addr): Option<(string, string)> =
+{
+    match (ipv6_parse_masked(dhcpv6_options.cidr)) {
+        Left{err} -> {
+            /* cidr defined is invalid */
+            //warn("cidr is invalid - ${err}");
+            None
+        },
+        Right{(var host_ip, var mask)} -> {
+            if (not ipv6_same_network((offer_ip, host_ip), mask)) {
+                /* offer_ip doesn't belong to the cidr defined in lport's DHCPv6
+                 * options. */
+                //warn("ip does not belong to cidr");
+                None
+            } else {
+                /* "server_id" should be the MAC address. */
+                match (map_get(dhcpv6_options.options, "server_id")) {
+                    None -> {
+                        warn("server_id not present in the DHCPv6 options for lport ${lsp_json_key}");
+                        None
+                    },
+                    Some{server_mac} -> {
+                        match (eth_addr_from_string(server_mac)) {
+                            None -> {
+                                warn("server_id is not a valid Ethernet address in the DHCPv6 options for lport ${lsp_json_key}");
+                                None
+                            },
+                            Some{ea} -> {
+                                /* Get the link local IP of the DHCPv6 server from the server MAC. */
+                                var server_ip = ipv6_string_mapped(in6_generate_lla(ea));
+                                var ia_addr = ipv6_string_mapped(offer_ip);
+                                var options = vec_empty();
+
+                                /* Check whether the dhcpv6 options should be configured as stateful.
+                                 * Only reply with ia_addr option for dhcpv6 stateful address mode. */
+                                if (map_get_bool_def(dhcpv6_options.options, "dhcpv6_stateless", false) == false) {
+                                    vec_push(options, "ia_addr = ${ia_addr}")
+                                } else ();
+
+                                /* We're not using SMAP_FOR_EACH because we want a consistent order of the
+                                 * options on different architectures (big or little endian, SSE4.2) */
+                                // FIXME: enumerate map in ascending order of keys. Is this good enough?
+                                for (node in dhcpv6_options.options) {
+                                    (var k, var v) = node;
+                                    if (k != "dhcpv6_stateless") {
+                                        vec_push(options, "${k} = ${v}")
+                                    } else ()
+                                };
+
+                                var options_action = "${rEGBIT_DHCP_OPTS_RESULT()} = put_dhcpv6_opts(" ++
+                                                     string_join(options, ", ") ++
+                                                     "); next;";
+                                var response_action = "eth.dst = eth.src; eth.src = ${server_mac}; "
+                                                      "ip6.dst = ip6.src; ip6.src = ${server_ip}; udp.src = 547; "
+                                                      "udp.dst = 546; outport = inport; flags.loopback = 1; "
+                                                      "output;";
+                                Some{(options_action, response_action)}
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+
+/* If 'names' has one element, returns json_string_escape() for it.
+ * Otherwise, returns json_string_escape() of all of its elements inside "{...}".
+ */
+function json_string_escape_vec(names: Vec<string>): string
+{
+    match ((names.len(), names.nth(0))) {
+        (1, Some{name}) -> json_string_escape(name),
+        _ -> {
+            var json_names = vec_with_capacity(names.len());
+            for (name in names) {
+                json_names.push(json_string_escape(name));
+            };
+            "{" ++ json_names.join(", ") ++ "}"
+        }
+    }
+}
+
+/*
+ * Ordinarily, returns a single match against 'lsp'.
+ *
+ * If 'lsp' is an external port, returns a match against the localnet port(s) on
+ * its switch along with a condition that it only operate if 'lsp' is
+ * chassis-resident. This makes sense as a condition for sending DHCP replies
+ * to external ports because only one chassis should send such a reply.
+ *
+ * Returns a prefix and a suffix string. There is no reason for this except
+ * that it makes it possible to exactly mimic the format used by ovn-northd.c
+ * so that text-based comparisons do not show differences. (This fails if
+ * there's more than one localnet port since the C version uses multiple flows
+ * in that case.)
+ */
+function match_dhcp_input(lsp: Ref<SwitchPort>): (string, string) =
+{
+    if (lsp.lsp.__type == "external" and not lsp.sw.localnet_port_names.is_empty()) {
+        ("inport == " ++ json_string_escape_vec(lsp.sw.localnet_port_names) ++ " && ",
+         " && is_chassis_resident(${lsp.json_name})")
+    } else {
+        ("inport == ${lsp.json_name} && ", "")
+    }
+}
+
+/* Logical switch ingress tables DHCP_OPTIONS and DHCP_RESPONSE: DHCP options
+ * and response priority 100 flows. */
+for (lsp in &SwitchPort
+     /* Don't add the DHCP flows if the port is not enabled or if the
+      * port is a router port. */
+     if (lsp.is_enabled() and lsp.lsp.__type != "router")
+     /* If it's an external port and there is no localnet port
+      * and if it doesn't belong to an HA chassis group ignore it. */
+     and (lsp.lsp.__type != "external"
+          or (not lsp.sw.localnet_port_names.is_empty()
+              and is_some(lsp.lsp.ha_chassis_group))))
+{
+    for (lps in LogicalSwitchPort(.lport = lsp.lsp._uuid, .lswitch = lsuuid)) {
+        var json_key = json_string_escape(lsp.lsp.name) in
+        (var pfx, var sfx) = match_dhcp_input(lsp) in
+        {
+            /* DHCPv4 options enabled for this port */
+            Some{var dhcpv4_options_uuid} = lsp.lsp.dhcpv4_options in
+            {
+                for (dhcpv4_options in nb::DHCP_Options(._uuid = dhcpv4_options_uuid)) {
+                    for (SwitchPortIPv4Address(.port = &SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = lsp.lsp._uuid}}, .ea = ea, .addr = addr)) {
+                        Some{(var options_action, var response_action, var ipv4_addr_match)} =
+                            build_dhcpv4_action(json_key, dhcpv4_options, addr.addr) in
+                        {
+                            var __match =
+                                pfx ++ "eth.src == ${ea} && "
+                                "ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && "
+                                "udp.src == 68 && udp.dst == 67" ++ sfx
+                            in
+                            Flow(.logical_datapath = lsuuid,
+                                 .stage = switch_stage(IN, DHCP_OPTIONS),
+                                 .priority = 100,
+                                 .__match = __match,
+                                 .actions = options_action,
+                                 .external_ids = stage_hint(lsp.lsp._uuid));
+
+                            /* Allow ip4.src = OFFER_IP and
+                             * ip4.dst = {SERVER_IP, 255.255.255.255} for the below
+                             * cases
+                             * - When the client wants to renew the IP by sending
+                             *   the DHCPREQUEST to the server ip.
+                             * - When the client wants to renew the IP by
+                             *   broadcasting the DHCPREQUEST.
+                             */
+                            var __match = pfx ++ "eth.src == ${ea} && "
+                                "${ipv4_addr_match} && udp.src == 68 && udp.dst == 67" ++ sfx in
+                            Flow(.logical_datapath = lsuuid,
+                                 .stage = switch_stage(IN, DHCP_OPTIONS),
+                                 .priority = 100,
+                                 .__match = __match,
+                                 .actions = options_action,
+                                 .external_ids = stage_hint(lsp.lsp._uuid));
+
+                            /* If REGBIT_DHCP_OPTS_RESULT is set, it means the
+                             * put_dhcp_opts action is successful. */
+                            var __match = pfx ++ "eth.src == ${ea} && "
+                                "ip4 && udp.src == 68 && udp.dst == 67 && " ++
+                                rEGBIT_DHCP_OPTS_RESULT() ++ sfx in
+                            Flow(.logical_datapath = lsuuid,
+                                 .stage = switch_stage(IN, DHCP_RESPONSE),
+                                 .priority = 100,
+                                 .__match = __match,
+                                 .actions = response_action,
+                                 .external_ids = stage_hint(lsp.lsp._uuid))
+                            // FIXME: is there a constraint somewhere that guarantees that build_dhcpv4_action
+                            // returns Some() for at most 1 address in lsp_addrs? Otherwise, simulate this break
+                            // by computing an aggregate that returns the first element of a group.
+                            //break;
+                        }
+                    }
+                }
+            };
+
+            /* DHCPv6 options enabled for this port */
+            Some{var dhcpv6_options_uuid} = lsp.lsp.dhcpv6_options in
+            {
+                for (dhcpv6_options in nb::DHCP_Options(._uuid = dhcpv6_options_uuid)) {
+                    for (SwitchPortIPv6Address(.port = &SwitchPort{.lsp = nb::Logical_Switch_Port{._uuid = lsp.lsp._uuid}}, .ea = ea, .addr = addr)) {
+                        Some{(var options_action, var response_action)} =
+                            build_dhcpv6_action(json_key, dhcpv6_options, addr.addr) in
+                        {
+                            var __match = pfx ++ "eth.src == ${ea}"
+                                " && ip6.dst == ff02::1:2 && udp.src == 546 &&"
+                                " udp.dst == 547" ++ sfx in
+                            {
+                                Flow(.logical_datapath = lsuuid,
+                                     .stage = switch_stage(IN, DHCP_OPTIONS),
+                                     .priority = 100,
+                                     .__match = __match,
+                                     .actions = options_action,
+                                     .external_ids = stage_hint(lsp.lsp._uuid));
+
+                                /* If REGBIT_DHCP_OPTS_RESULT is set to 1, it means the
+                                 * put_dhcpv6_opts action is successful */
+                                Flow(.logical_datapath = lsuuid,
+                                     .stage = switch_stage(IN, DHCP_RESPONSE),
+                                     .priority = 100,
+                                     .__match = __match ++ " && ${rEGBIT_DHCP_OPTS_RESULT()}",
+                                     .actions = response_action,
+                                     .external_ids = stage_hint(lsp.lsp._uuid))
+                                // FIXME: is there a constraint somewhere that guarantees that build_dhcpv6_action
+                                // returns Some() for at most 1 address in lsp_addrs? Otherwise, simulate this break
+                                // by computing an aggregate that returns the first element of a group.
+                                //break;
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+
+/* Logical switch ingress tables DNS_LOOKUP and DNS_RESPONSE: DNS lookup and
+ * response priority 100 flows.
+ */
+for (LogicalSwitchHasDNSRecords(ls, true))
+{
+    Flow(.logical_datapath = ls,
+         .stage = switch_stage(IN, DNS_LOOKUP),
+         .priority = 100,
+         .__match = "udp.dst == 53",
+         .actions = "${rEGBIT_DNS_LOOKUP_RESULT()} = dns_lookup(); next;",
+         .external_ids = map_empty());
+
+    var action = "eth.dst <-> eth.src; ip4.src <-> ip4.dst; "
+                 "udp.dst = udp.src; udp.src = 53; outport = inport; "
+                 "flags.loopback = 1; output;" in
+    Flow(.logical_datapath = ls,
+         .stage = switch_stage(IN, DNS_RESPONSE),
+         .priority = 100,
+         .__match = "udp.dst == 53 && ${rEGBIT_DNS_LOOKUP_RESULT()}",
+         .actions = action,
+         .external_ids = map_empty());
+
+    var action = "eth.dst <-> eth.src; ip6.src <-> ip6.dst; "
+                 "udp.dst = udp.src; udp.src = 53; outport = inport; "
+                 "flags.loopback = 1; output;" in
+    Flow(.logical_datapath = ls,
+         .stage = switch_stage(IN, DNS_RESPONSE),
+         .priority = 100,
+         .__match = "udp.dst == 53 && ${rEGBIT_DNS_LOOKUP_RESULT()}",
+         .actions = action,
+         .external_ids = map_empty())
+}
+
+/* Ingress table DHCP_OPTIONS and DHCP_RESPONSE: DHCP options and response, by
+ * default goto next. (priority 0).
+ *
+ * Ingress table DNS_LOOKUP and DNS_RESPONSE: DNS lookup and response, by
+ * default goto next. (priority 0).
+ *
+ * Ingress table EXTERNAL_PORT - External port handling, by default goto next.
+ * (priority 0). */
+for (ls in nb::Logical_Switch) {
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, DHCP_OPTIONS),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty());
+
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, DHCP_RESPONSE),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty());
+
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, DNS_LOOKUP),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty());
+
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, DNS_RESPONSE),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty());
+
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, EXTERNAL_PORT),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty())
+}
+
+Flow(.logical_datapath = sw.ls._uuid,
+     .stage = switch_stage(IN, L2_LKUP),
+     .priority = 110,
+     .__match = "eth.dst == $svc_monitor_mac",
+     .actions = "handle_svc_check(inport);",
+     .external_ids = map_empty()) :-
+    sw in &Switch().
+
+for (sw in &Switch(.ls = ls, .mcast_cfg = &mcast_cfg)
+     if (mcast_cfg.enabled)) {
+    for (SwitchMcastFloodRelayPorts(sw, relay_ports)) {
+    for (SwitchMcastFloodReportPorts(sw, flood_report_ports)) {
+    for (SwitchMcastFloodPorts(sw, flood_ports)) {
+        var flood_relay = not set_is_empty(relay_ports) in
+        var flood_reports = not set_is_empty(flood_report_ports) in
+        var flood_static = not set_is_empty(flood_ports) in
+        var igmp_act = {
+            if (flood_reports) {
+                var mrouter_static = json_string_escape(mC_MROUTER_STATIC().0);
+                "clone { "
+                "outport = ${mrouter_static}; "
+                "output; "
+                "};igmp;"
+            } else {
+                "igmp;"
+            }
+        } in {
+            /* Punt IGMP traffic to controller. */
+            Flow(.logical_datapath = ls._uuid,
+                 .stage = switch_stage(IN, L2_LKUP),
+                 .priority = 100,
+                 .__match = "ip4 && ip.proto == 2",
+                 .actions = "${igmp_act}",
+                 .external_ids = map_empty());
+
+            /* Punt MLD traffic to controller. */
+            Flow(.logical_datapath = ls._uuid,
+                 .stage = switch_stage(IN, L2_LKUP),
+                 .priority = 100,
+                 .__match = "mldv1 || mldv2",
+                 .actions = "${igmp_act}",
+                 .external_ids = map_empty());
+
+            /* Flood all IP multicast traffic destined to 224.0.0.X to
+             * all ports - RFC 4541, section 2.1.2, item 2.
+             */
+            var flood = json_string_escape(mC_FLOOD().0) in
+            Flow(.logical_datapath = ls._uuid,
+                 .stage = switch_stage(IN, L2_LKUP),
+                 .priority = 85,
+                 .__match = "ip4.mcast && ip4.dst == 224.0.0.0/24",
+                 .actions = "outport = ${flood}; output;",
+                 .external_ids = map_empty());
+
+            /* Flood all IPv6 multicast traffic destined to reserved
+             * multicast IPs (RFC 4291, 2.7.1).
+             */
+            var flood = json_string_escape(mC_FLOOD().0) in
+            Flow(.logical_datapath = ls._uuid,
+                 .stage = switch_stage(IN, L2_LKUP),
+                 .priority = 85,
+                 .__match = "ip6.mcast_flood",
+                 .actions = "outport = ${flood}; output;",
+                 .external_ids = map_empty());
+
+            /* Forward unregistered IP multicast to routers with relay
+             * enabled and to any ports configured to flood IP
+             * multicast traffic. If configured to flood unregistered
+             * traffic this will be handled by the L2 multicast flow.
+             */
+            if (not mcast_cfg.flood_unreg) {
+                var relay_act = {
+                    if (flood_relay) {
+                        var rtr_flood = json_string_escape(mC_MROUTER_FLOOD().0);
+                        "clone { "
+                        "outport = ${rtr_flood}; "
+                        "output; "
+                        "}; "
+                    } else {
+                        ""
+                    }
+                } in
+                var static_act = {
+                    if (flood_static) {
+                        var mc_static = json_string_escape(mC_STATIC().0);
+                        "outport = ${mc_static}; output;"
+                    } else {
+                        ""
+                    }
+                } in
+                var drop_act = {
+                    if (not flood_relay and not flood_static) {
+                        "drop;"
+                    } else {
+                        ""
+                    }
+                } in
+                Flow(.logical_datapath = ls._uuid,
+                     .stage = switch_stage(IN, L2_LKUP),
+                     .priority = 80,
+                     .__match = "ip4.mcast || ip6.mcast",
+                     .actions =
+                         "${relay_act}${static_act}${drop_act}",
+                     .external_ids = map_empty())
+            }
+        }
+    }
+    }
+    }
+}
+
+/* Ingress table L2_LKUP: Add IP multicast flows learnt from IGMP/MLD (priority
+ * 90). */
+for (IgmpSwitchMulticastGroup(.address = address, .switch = &sw)) {
+    /* RFC 4541, section 2.1.2, item 2: Skip groups in the 224.0.0.X
+     * range.
+     *
+     * RFC 4291, section 2.7.1: Skip groups that correspond to all
+     * hosts.
+     */
+    Some{var ip} = ip46_parse(address) in
+    (var skip_address) = match (ip) {
+        IPv4{ipv4} -> ip_is_local_multicast(ipv4),
+        IPv6{ipv6} -> ipv6_is_all_hosts(ipv6)
+    } in
+    var ipX = ip46_ipX(ip) in
+    for (SwitchMcastFloodRelayPorts(&sw, relay_ports) if not skip_address) {
+    for (SwitchMcastFloodPorts(&sw, flood_ports)) {
+        var flood_relay = not set_is_empty(relay_ports) in
+        var flood_static = not set_is_empty(flood_ports) in
+        var mc_rtr_flood = json_string_escape(mC_MROUTER_FLOOD().0) in
+        var mc_static = json_string_escape(mC_STATIC().0) in
+        var relay_act = {
+            if (flood_relay) {
+                "clone { "
+                "outport = ${mc_rtr_flood}; output; "
+                "};"
+            } else {
+                ""
+            }
+        } in
+        var static_act = {
+            if (flood_static) {
+                "clone { "
+                "outport = ${mc_static}; "
+                "output; "
+                "};"
+            } else {
+                ""
+            }
+        } in
+        Flow(.logical_datapath = sw.ls._uuid,
+             .stage = switch_stage(IN, L2_LKUP),
+             .priority = 90,
+             .__match = "eth.mcast && ${ipX} && ${ipX}.dst == ${address}",
+             .actions =
+                 "${relay_act} ${static_act} outport = \"${address}\"; "
+                 "output;",
+             .external_ids = map_empty())
+    }
+    }
+}
+
+/* Table EXTERNAL_PORT: External port. Drop ARP requests for router IPs from
+ * external ports on chassis not binding those ports. This ensures that the
+ * router pipeline runs only on the chassis binding the external ports.
+ *
+ * For an external port X on logical switch LS, if X is not resident on this
+ * chassis, drop ARP requests arriving on localnet ports from X's Ethernet
+ * address, if the ARP request is asking to translate the IP address of a
+ * router port on LS.
+ */
+Flow(.logical_datapath = sp.sw.ls._uuid,
+     .stage = switch_stage(IN, EXTERNAL_PORT),
+     .priority = 100,
+     .__match = ("inport == ${json_string_escape(localnet_port_name)} && "
+                 "eth.src == ${lp_addr.ea} && "
+                 "!is_chassis_resident(${sp.json_name}) && "
+                 "arp.tpa == ${rp_addr.addr} && arp.op == 1"),
+     .actions = "drop;",
+     .external_ids = stage_hint(sp.lsp._uuid)) :-
+    sp in &SwitchPort(),
+    sp.lsp.__type == "external",
+    var localnet_port_name = FlatMap(sp.sw.localnet_port_names),
+    var lp_addr = FlatMap(sp.static_addresses),
+    rp in &SwitchPort(.sw = sp.sw),
+    rp.lsp.__type == "router",
+    SwitchPortIPv4Address(.port = rp, .addr = rp_addr).
+Flow(.logical_datapath = sp.sw.ls._uuid,
+     .stage = switch_stage(IN, EXTERNAL_PORT),
+     .priority = 100,
+     .__match = ("inport == ${json_string_escape(localnet_port_name)} && "
+                 "eth.src == ${lp_addr.ea} && "
+                 "!is_chassis_resident(${sp.json_name}) && "
+                 "nd_ns && ip6.dst == {${rp_addr.addr}, ${ipv6_netaddr_solicited_node(rp_addr)}} && "
+                 "nd.target == ${rp_addr.addr}"),
+     .actions = "drop;",
+     .external_ids = stage_hint(sp.lsp._uuid)) :-
+    sp in &SwitchPort(),
+    sp.lsp.__type == "external",
+    var localnet_port_name = FlatMap(sp.sw.localnet_port_names),
+    var lp_addr = FlatMap(sp.static_addresses),
+    rp in &SwitchPort(.sw = sp.sw),
+    rp.lsp.__type == "router",
+    SwitchPortIPv6Address(.port = rp, .addr = rp_addr).
+Flow(.logical_datapath = sp.sw.ls._uuid,
+     .stage = switch_stage(IN, EXTERNAL_PORT),
+     .priority = 100,
+     .__match = ("inport == ${json_string_escape(localnet_port_name)} && "
+                 "eth.src == ${lp_addr.ea} && "
+                 "eth.dst == ${ea} && "
+                 "!is_chassis_resident(${sp.json_name})"),
+     .actions = "drop;",
+     .external_ids = stage_hint(sp.lsp._uuid)) :-
+    sp in &SwitchPort(),
+    sp.lsp.__type == "external",
+    var localnet_port_name = FlatMap(sp.sw.localnet_port_names),
+    var lp_addr = FlatMap(sp.static_addresses),
+    rp in &SwitchPort(.sw = sp.sw),
+    rp.lsp.__type == "router",
+    SwitchPortAddresses(.port = rp, .addrs = LPortAddress{.ea = ea}).
+
+/* Ingress table L2_LKUP: Destination lookup, broadcast and multicast handling
+ * (priority 100). */
+for (ls in nb::Logical_Switch) {
+    var mc_flood = json_string_escape(mC_FLOOD().0) in
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(IN, L2_LKUP),
+         .priority = 70,
+         .__match = "eth.mcast",
+         .actions = "outport = ${mc_flood}; output;",
+         .external_ids = map_empty())
+}
+
+/* Ingress table L2_LKUP: Destination lookup, unicast handling (priority 50).
+ */
+for (SwitchPortStaticAddresses(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw},
+                               .addrs = addrs)
+     if lsp.__type != "external") {
+    Flow(.logical_datapath = sw.ls._uuid,
+         .stage = switch_stage(IN, L2_LKUP),
+         .priority = 50,
+         .__match = "eth.dst == ${addrs.ea}",
+         .actions = "outport = ${json_name}; output;",
+         .external_ids = stage_hint(lsp._uuid))
+}
+
+/*
+ * Ingress table L2_LKUP: Flows that flood self originated ARP/ND packets in the
+ * switching domain.
+ */
+/* Self originated ARP requests/ND need to be flooded to the L2 domain
+ * (except on router ports). Determine that packets are self originated
+ * by also matching on source MAC. Matching on ingress port is not
+ * reliable in case this is a VLAN-backed network.
+ * Priority: 75.
+ */
+
+/* Returns 'true' if the IP 'addr' is on the same subnet as one of the
+ * IPs configured on the router port.
+ */
+function lrouter_port_ip_reachable(rp: Ref<RouterPort>, addr: v46_ip): bool {
+    match (addr) {
+        IPv4{ipv4} -> {
+            for (na in rp.networks.ipv4_addrs) {
+                if (ip_same_network((ipv4, na.addr), ipv4_netaddr_mask(na))) {
+                    return true
+                }
+            }
+        },
+        IPv6{ipv6} -> {
+            for (na in rp.networks.ipv6_addrs) {
+                if (ipv6_same_network((ipv6, na.addr), ipv6_netaddr_mask(na))) {
+                    return true
+                }
+            }
+        }
+    };
+    false
+}
+Flow(.logical_datapath = sw.ls._uuid,
+     .stage = switch_stage(IN, L2_LKUP),
+     .priority = 75,
+     .__match = __match,
+     .actions = actions,
+     .external_ids = stage_hint(sp.lsp._uuid)) :-
+    sp in &SwitchPort(.sw = sw, .peer = Some{rp}),
+    rp.is_enabled(),
+    var eth_src_set = {
+        var eth_src_set = set_singleton("${rp.networks.ea}");
+        for (nat in rp.router.nats) {
+            match (nat.nat.external_mac) {
+                Some{mac} ->
+                    if (lrouter_port_ip_reachable(rp, nat.external_ip)) {
+                        set_insert(eth_src_set, mac)
+                    } else (),
+                _ -> ()
+            }
+        };
+        eth_src_set
+    },
+    var eth_src = "{" ++ string_join(eth_src_set.to_vec(), ", ") ++ "}",
+    var __match = "eth.src == ${eth_src} && (arp.op == 1 || nd_ns)",
+    var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0),
+    var actions = "outport = ${mc_flood_l2}; output;".
+
+/* Forward ARP requests for owned IP addresses (L3, VIP, NAT) only to this
+ * router port.
+ * Priority: 80.
+ */
+function get_arp_forward_ips(rp: Ref<RouterPort>): (Set<string>, Set<string>) = {
+    var all_ips_v4 = set_empty();
+    var all_ips_v6 = set_empty();
+
+    (var lb_ips_v4, var lb_ips_v6)
+        = get_router_load_balancer_ips(deref(rp.router));
+    for (a in lb_ips_v4) {
+        /* Check if the ovn port has a network configured on which we could
+         * expect ARP requests for the LB VIP.
+         */
+        match (ip_parse(a)) {
+            Some{ipv4} -> if (lrouter_port_ip_reachable(rp, IPv4{ipv4})) {
+                set_insert(all_ips_v4, a)
+            },
+            _ -> ()
+        }
+    };
+    for (a in lb_ips_v6) {
+        /* Check if the ovn port has a network configured on which we could
+         * expect NS requests for the LB VIP.
+         */
+        match (ipv6_parse(a)) {
+            Some{ipv6} -> if (lrouter_port_ip_reachable(rp, IPv6{ipv6})) {
+                set_insert(all_ips_v6, a)
+            },
+            _ -> ()
+        }
+    };
+
+    for (nat in rp.router.nats) {
+        if (nat.nat.__type != "snat") {
+            /* Check if the ovn port has a network configured on which we could
+             * expect ARP requests/NS for the DNAT external_ip.
+             */
+            if (lrouter_port_ip_reachable(rp, nat.external_ip)) {
+                match (nat.external_ip) {
+                    IPv4{_} -> set_insert(all_ips_v4, nat.nat.external_ip),
+                    IPv6{_} -> set_insert(all_ips_v6, nat.nat.external_ip)
+                }
+            }
+        }
+    };
+
+    for (a in rp.networks.ipv4_addrs) {
+        set_insert(all_ips_v4, "${a.addr}")
+    };
+    for (a in rp.networks.ipv6_addrs) {
+        set_insert(all_ips_v6, "${a.addr}")
+    };
+
+    (all_ips_v4, all_ips_v6)
+}
+/* Packets received from VXLAN tunnels have already been through the
+ * router pipeline so we should skip them. Normally this is done by the
+ * multicast_group implementation (VXLAN packets skip table 32 which
+ * delivers to patch ports) but we're bypassing multicast_groups.
+ * (This is why we match against fLAGBIT_NOT_VXLAN() here.)
+ */
+Flow(.logical_datapath = sw.ls._uuid,
+     .stage = switch_stage(IN, L2_LKUP),
+     .priority = 80,
+     .__match = fLAGBIT_NOT_VXLAN() ++
+                " && arp.op == 1 && arp.tpa == { " ++
+                string_join(set_to_vec(all_ips_v4), ", ") ++ "}",
+     .actions = if (sw.has_non_router_port) {
+                    "clone {outport = ${sp.json_name}; output; }; "
+                    "outport = ${mc_flood_l2}; output;"
+                } else {
+                    "outport = ${sp.json_name}; output;"
+                },
+     .external_ids = stage_hint(sp.lsp._uuid)) :-
+    sp in &SwitchPort(.sw = sw, .peer = Some{rp}),
+    rp.is_enabled(),
+    (var all_ips_v4, _) = get_arp_forward_ips(rp),
+    not set_is_empty(all_ips_v4),
+    var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0).
+Flow(.logical_datapath = sw.ls._uuid,
+     .stage = switch_stage(IN, L2_LKUP),
+     .priority = 80,
+     .__match = fLAGBIT_NOT_VXLAN() ++
+                " && nd_ns && nd.target == { " ++
+                string_join(set_to_vec(all_ips_v6), ", ") ++ "}",
+     .actions = if (sw.has_non_router_port) {
+                    "clone {outport = ${sp.json_name}; output; }; "
+                    "outport = ${mc_flood_l2}; output;"
+                } else {
+                    "outport = ${sp.json_name}; output;"
+                },
+     .external_ids = stage_hint(sp.lsp._uuid)) :-
+    sp in &SwitchPort(.sw = sw, .peer = Some{rp}),
+    rp.is_enabled(),
+    (_, var all_ips_v6) = get_arp_forward_ips(rp),
+    not set_is_empty(all_ips_v6),
+    var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0).
+
+for (SwitchPortNewDynamicAddress(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw},
+                                 .address = Some{addrs})
+     if lsp.__type != "external") {
+    Flow(.logical_datapath = sw.ls._uuid,
+         .stage = switch_stage(IN, L2_LKUP),
+         .priority = 50,
+         .__match = "eth.dst == ${addrs.ea}",
+         .actions = "outport = ${json_name}; output;",
+         .external_ids = stage_hint(lsp._uuid))
+}
+
+for (&SwitchPort(.lsp = lsp,
+                 .json_name = json_name,
+                 .sw = &sw,
+                 .peer = Some{&RouterPort{.lrp = lrp,
+                                          .is_redirect = is_redirect,
+                                          .router = &Router{.lr = lr,
+                                                            .redirect_port_name = redirect_port_name}}})
+     if (set_contains(lsp.addresses, "router") and lsp.__type != "external"))
+{
+    Some{var mac} = scan_eth_addr(lrp.mac) in {
+        var add_chassis_resident_check =
+            not sw.localnet_port_names.is_empty() and
+            (/* The peer of this port represents a distributed
+              * gateway port. The destination lookup flow for the
+              * router's distributed gateway port MAC address should
+              * only be programmed on the "redirect-chassis". */
+             is_redirect or
+             /* Check if the option 'reside-on-redirect-chassis'
+              * is set to true on the peer port. If set to true
+              * and if the logical switch has a localnet port, it
+              * means the router pipeline for the packets from
+              * this logical switch should be run on the chassis
+              * hosting the gateway port.
+              */
+             map_get_bool_def(lrp.options, "reside-on-redirect-chassis", false)) in
+        var __match = if (add_chassis_resident_check) {
+            /* The destination lookup flow for the router's
+             * distributed gateway port MAC address should only be
+             * programmed on the "redirect-chassis". */
+            "eth.dst == ${mac} && is_chassis_resident(${redirect_port_name})"
+        } else {
+            "eth.dst == ${mac}"
+        } in
+        Flow(.logical_datapath = sw.ls._uuid,
+             .stage = switch_stage(IN, L2_LKUP),
+             .priority = 50,
+             .__match = __match,
+             .actions = "outport = ${json_name}; output;",
+             .external_ids = stage_hint(lsp._uuid));
+
+        /* Add ethernet addresses specified in NAT rules on
+         * distributed logical routers. */
+        if (is_redirect) {
+            for (LogicalRouterNAT(.lr = lr._uuid, .nat = nat)) {
+                if (nat.nat.__type == "dnat_and_snat") {
+                    Some{var lport} = nat.nat.logical_port in
+                    Some{var emac} = nat.nat.external_mac in
+                    Some{var nat_mac} = eth_addr_from_string(emac) in
+                    var __match = "eth.dst == ${nat_mac} && is_chassis_resident(${json_string_escape(lport)})" in
+                    Flow(.logical_datapath = sw.ls._uuid,
+                         .stage = switch_stage(IN, L2_LKUP),
+                         .priority = 50,
+                         .__match = __match,
+                         .actions = "outport = ${json_name}; output;",
+                         .external_ids = stage_hint(nat.nat._uuid))
+                }
+            }
+        }
+    }
+}
+// FIXME: do we care about this?
+/*    } else {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+
+        VLOG_INFO_RL(&rl,
+                     "%s: invalid syntax '%s' in addresses column",
+                     op->nbsp->name, op->nbsp->addresses[i]);
+    }*/
+
+/* Ingress table L2_LKUP: Destination lookup for unknown MACs (priority 0). */
+for (LogicalSwitchUnknownPorts(.ls = ls_uuid)) {
+    var mc_unknown = json_string_escape(mC_UNKNOWN().0) in
+    Flow(.logical_datapath = ls_uuid,
+         .stage = switch_stage(IN, L2_LKUP),
+         .priority = 0,
+         .__match = "1",
+         .actions = "outport = ${mc_unknown}; output;",
+         .external_ids = map_empty())
+}
+
+/* Egress table PORT_SEC_IP: Egress port security - IP (priority 0)
+ * Egress table PORT_SEC_L2: Egress port security L2 - multicast/broadcast (priority 100). */
+for (&Switch(.ls = ls)) {
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(OUT, PORT_SEC_IP),
+         .priority = 0,
+         .__match = "1",
+         .actions = "next;",
+         .external_ids = map_empty());
+    Flow(.logical_datapath = ls._uuid,
+         .stage = switch_stage(OUT, PORT_SEC_L2),
+         .priority = 100,
+         .__match = "eth.mcast",
+         .actions = "output;",
+         .external_ids = map_empty())
+}
+
+/* Egress table PORT_SEC_IP: Egress port security - IP (priorities 90 and 80)
+ * if port security enabled.
+ *
+ * Egress table PORT_SEC_L2: Egress port security - L2 (priorities 50 and 150).
+ *
+ * Priority 50 rules implement port security for enabled logical ports.
+ *
+ * Priority 150 rules drop packets to disabled logical ports, so that they
+ * don't even receive multicast or broadcast packets. */
+Flow(.logical_datapath = sw.ls._uuid,
+     .stage = switch_stage(OUT, PORT_SEC_L2),
+     .priority = 50,
+     .__match = __match,
+     .actions = queue_action ++ "output;",
+     .external_ids = stage_hint(lsp._uuid)) :-
+    &SwitchPort(.sw = &sw, .lsp = lsp, .json_name = json_name, .ps_eth_addresses = ps_eth_addresses),
+    lsp.is_enabled(),
+    lsp.__type != "external",
+    var __match = if (vec_is_empty(ps_eth_addresses)) {
+        "outport == ${json_name}"
+    } else {
+        "outport == ${json_name} && eth.dst == {${ps_eth_addresses.join(\" \")}}"
+    },
+    pbinding in sb::Out_Port_Binding(.logical_port = lsp.name),
+    var queue_action = match ((lsp.__type,
+                               map_get(pbinding.options, "qdisc_queue_id"))) {
+        ("localnet", Some{queue_id}) -> "set_queue(${queue_id});",
+        _ -> ""
+    }.
+
+for (&SwitchPort(.lsp = lsp, .json_name = json_name, .sw = &sw)
+     if not lsp.is_enabled() and lsp.__type != "external") {
+    Flow(.logical_datapath = sw.ls._uuid,
+         .stage = switch_stage(OUT, PORT_SEC_L2),
+         .priority = 150,
+         .__match = "outport == ${json_name}",
+         .actions = "drop;",
+         .external_ids = stage_hint(lsp._uuid))
+}
+
+for (SwitchPortPSAddresses(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = &sw},
+                           .ps_addrs = ps)
+     if (vec_len(ps.ipv4_addrs) > 0 or vec_len(ps.ipv6_addrs) > 0)
+        and lsp.__type != "external")
+{
+    if (vec_len(ps.ipv4_addrs) > 0) {
+        var addrs = {
+            var addrs = vec_empty();
+            for (addr in ps.ipv4_addrs) {
+                /* When the netmask is applied, if the host portion is
+                 * non-zero, the host can only use the specified
+                 * address. If zero, the host is allowed to use any
+                 * address in the subnet.
+                 */
+                vec_push(addrs, ipv4_netaddr_match_host_or_network(addr));
+                if (addr.plen < 32 and not ip_is_zero(ipv4_netaddr_host(addr))) {
+                    vec_push(addrs, "${ipv4_netaddr_bcast(addr)}")
+                }
+            };
+            addrs
+        } in
+        var __match =
+            "outport == ${json_name} && eth.dst == ${ps.ea} && ip4.dst == {255.255.255.255, 224.0.0.0/4, " ++
+            string_join(addrs, ", ") ++ "}" in
+        Flow(.logical_datapath = sw.ls._uuid,
+             .stage = switch_stage(OUT, PORT_SEC_IP),
+             .priority = 90,
+             .__match = __match,
+             .actions = "next;",
+             .external_ids = stage_hint(lsp._uuid))
+    };
+    if (vec_len(ps.ipv6_addrs) > 0) {
+        var __match = "outport == ${json_name} && eth.dst == ${ps.ea}" ++
+                      build_port_security_ipv6_flow(OUT, ps.ea, ps.ipv6_addrs) in
+        Flow(.logical_datapath = sw.ls._uuid,
+             .stage = switch_stage(OUT, PORT_SEC_IP),
+             .priority = 90,
+             .__match = __match,
+             .actions = "next;",
+             .external_ids = stage_hint(lsp._uuid))
+    };
+    var __match = "outport == ${json_name} && eth.dst == ${ps.ea} && ip" in
+    Flow(.logical_datapath = sw.ls._uuid,
+         .stage = switch_stage(OUT, PORT_SEC_IP),
+         .priority = 80,
+         .__match = __match,
+         .actions = "drop;",
+         .external_ids = stage_hint(lsp._uuid))
+}
+
+/* Logical router ingress table ADMISSION: Admission control framework. */
+for (&Router(.lr = lr)) {
+    /* Logical VLANs not supported.
+     * Broadcast/multicast source address is invalid. */
+    Flow(.logical_datapath = lr._uuid,
+         .stage = router_stage(IN, ADMISSION),
+         .priority = 100,
+         .__match = "vlan.present || eth.src[40]",
+         .actions = "drop;",
+         .external_ids = map_empty())
+}
+
+/* Logical router ingress table ADMISSION: match (priority 50). */
+for (&RouterPort(.lrp = lrp,
+                 .json_name = json_name,
+                 .networks = lrp_networks,
+                 .router = &router,
+                 .is_redirect = is_redirect)
+     /* Drop packets from disabled logical ports (since logical flow
+      * tables are default-drop). */
+     if lrp.is_enabled())
+{
+    //if (op->derived) {
+    //    /* No ingress packets should be received on a chassisredirect
+    //     * port. */
+    //    continue;
+    //}
+
+    /* Store the ethernet address of the port receiving the packet.
+     * This will save us from having to match on inport further down in
+     * the pipeline.
+     */
+    var actions = "${rEG_INPORT_ETH_ADDR()} = ${lrp_networks.ea}; next;" in {
+        Flow(.logical_datapath = router.lr._uuid,
+             .stage = router_stage(IN, ADMISSION),
+             .priority = 50,
+             .__match = "eth.mcast && inport == ${json_name}",
+             .actions = actions,
+             .external_ids = stage_hint(lrp._uuid));
+
+        var __match =
+            "eth.dst == ${lrp_networks.ea} && inport == ${json_name}" ++
+            if is_redirect {
+                /* Traffic with eth.dst = l3dgw_port->lrp_networks.ea
+                 * should only be received on the "redirect-chassis". */
+                " && is_chassis_resident(${json_string_escape(chassis_redirect_name(lrp.name))})"
+            } else { "" } in
+        Flow(.logical_datapath = router.lr._uuid,
+             .stage = router_stage(IN, ADMISSION),
+             .priority = 50,
+             .__match = __match,
+             .actions = actions,
+             .external_ids = stage_hint(lrp._uuid))
+    }
+}
+
+
+/* Logical router ingress table LOOKUP_NEIGHBOR and
+ * table LEARN_NEIGHBOR. */
+/* Learn MAC bindings from ARP/IPv6 ND.
+ * + * For ARP packets, table LOOKUP_NEIGHBOR does a lookup for the + * (arp.spa, arp.sha) in the mac binding table using the 'lookup_arp' + * action and stores the result in REGBIT_LOOKUP_NEIGHBOR_RESULT bit. + * If "always_learn_from_arp_request" is set to false, it will also + * lookup for the (arp.spa) in the mac binding table using the + * "lookup_arp_ip" action for ARP request packets, and stores the + * result in REGBIT_LOOKUP_NEIGHBOR_IP_RESULT bit; or set that bit + * to "1" directly for ARP response packets. + * + * For IPv6 ND NA packets, table LOOKUP_NEIGHBOR does a lookup + * for the (nd.target, nd.tll) in the mac binding table using the + * 'lookup_nd' action and stores the result in + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. If + * "always_learn_from_arp_request" is set to false, + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT bit is set. + * + * For IPv6 ND NS packets, table LOOKUP_NEIGHBOR does a lookup + * for the (ip6.src, nd.sll) in the mac binding table using the + * 'lookup_nd' action and stores the result in + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. If + * "always_learn_from_arp_request" is set to false, it will also lookup + * for the (ip6.src) in the mac binding table using the "lookup_nd_ip" + * action and stores the result in REGBIT_LOOKUP_NEIGHBOR_IP_RESULT + * bit. + * + * Table LEARN_NEIGHBOR learns the mac-binding using the action + * - 'put_arp/put_nd'. Learning mac-binding is skipped if + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit is set or + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT is not set. + * + * */ + +/* Flows for LOOKUP_NEIGHBOR. 
*/ +for (&Router(.lr = lr, .learn_from_arp_request = learn_from_arp_request)) +var rLNR = rEGBIT_LOOKUP_NEIGHBOR_RESULT() in +var rLNIR = rEGBIT_LOOKUP_NEIGHBOR_IP_RESULT() in +{ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LOOKUP_NEIGHBOR), + .priority = 100, + .__match = "arp.op == 2", + .actions = + "${rLNR} = lookup_arp(inport, arp.spa, arp.sha); " ++ + { if (learn_from_arp_request) "" else "${rLNIR} = 1; " } ++ + "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LOOKUP_NEIGHBOR), + .priority = 100, + .__match = "nd_na", + .actions = + "${rLNR} = lookup_nd(inport, nd.target, nd.tll); " ++ + { if (learn_from_arp_request) "" else "${rLNIR} = 1; " } ++ + "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LOOKUP_NEIGHBOR), + .priority = 100, + .__match = "nd_ns", + .actions = + "${rLNR} = lookup_nd(inport, ip6.src, nd.sll); " ++ + { if (learn_from_arp_request) "" else + "${rLNIR} = lookup_nd_ip(inport, ip6.src); " } ++ + "next;", + .external_ids = map_empty()); + + /* For other packet types, we can skip neighbor learning. + * So set REGBIT_LOOKUP_NEIGHBOR_RESULT to 1. */ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LOOKUP_NEIGHBOR), + .priority = 0, + .__match = "1", + .actions = "${rLNR} = 1; next;", + .external_ids = map_empty()); + + /* Flows for LEARN_NEIGHBOR. */ + /* Skip Neighbor learning if not required. 
*/ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LEARN_NEIGHBOR), + .priority = 100, + .__match = + "${rLNR} == 1" ++ + { if (learn_from_arp_request) "" else " || ${rLNIR} == 0" }, + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LEARN_NEIGHBOR), + .priority = 90, + .__match = "arp", + .actions = "put_arp(inport, arp.spa, arp.sha); next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LEARN_NEIGHBOR), + .priority = 90, + .__match = "arp", + .actions = "put_arp(inport, arp.spa, arp.sha); next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LEARN_NEIGHBOR), + .priority = 90, + .__match = "nd_na", + .actions = "put_nd(inport, nd.target, nd.tll); next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LEARN_NEIGHBOR), + .priority = 90, + .__match = "nd_ns", + .actions = "put_nd(inport, ip6.src, nd.sll); next;", + .external_ids = map_empty()) +} + +/* Check if we need to learn mac-binding from ARP requests. 
*/ +for (RouterPortNetworksIPv4Addr(rp@&RouterPort{.router = router}, addr)) { + var is_l3dgw_port = match (router.l3dgw_port) { + Some{l3dgw_lrp} -> l3dgw_lrp._uuid == rp.lrp._uuid, + None -> false + } in + var has_redirect_port = router.redirect_port_name != "" in + var chassis_residence = match (is_l3dgw_port and has_redirect_port) { + true -> " && is_chassis_resident(${router.redirect_port_name})", + false -> "" + } in + var rLNR = rEGBIT_LOOKUP_NEIGHBOR_RESULT() in + var rLNIR = rEGBIT_LOOKUP_NEIGHBOR_IP_RESULT() in + var match0 = "inport == ${rp.json_name} && " + "arp.spa == ${ipv4_netaddr_match_network(addr)}" in + var match1 = "arp.op == 1" ++ chassis_residence in + var learn_from_arp_request = router.learn_from_arp_request in { + if (not learn_from_arp_request) { + /* ARP request to this address should always get learned, + * so add a priority-110 flow to set + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT to 1. */ + var __match = [match0, "arp.tpa == ${addr.addr}", match1] in + var actions = "${rLNR} = lookup_arp(inport, arp.spa, arp.sha); " + "${rLNIR} = 1; " + "next;" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, LOOKUP_NEIGHBOR), + .priority = 110, + .__match = __match.join(" && "), + .actions = actions, + .external_ids = stage_hint(rp.lrp._uuid)) + }; + + var actions = "${rLNR} = lookup_arp(inport, arp.spa, arp.sha); " ++ + { if (learn_from_arp_request) "" else + "${rLNIR} = lookup_arp_ip(inport, arp.spa); " } ++ + "next;" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, LOOKUP_NEIGHBOR), + .priority = 100, + .__match = "${match0} && ${match1}", + .actions = actions, + .external_ids = stage_hint(rp.lrp._uuid)) + } +} + + +/* Logical router ingress table IP_INPUT: IP Input. */ +for (router in &Router(.lr = lr, .mcast_cfg = &mcast_cfg)) { + /* L3 admission control: drop multicast and broadcast source, localhost + * source or destination, and zero network source or destination + * (priority 100). 
*/ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 100, + .__match = "ip4.src_mcast ||" + "ip4.src == 255.255.255.255 || " + "ip4.src == 127.0.0.0/8 || " + "ip4.dst == 127.0.0.0/8 || " + "ip4.src == 0.0.0.0/8 || " + "ip4.dst == 0.0.0.0/8", + .actions = "drop;", + .external_ids = map_empty()); + + /* Drop ARP packets (priority 85). ARP request packets for router's own + * IPs are handled with priority-90 flows. + * Drop IPv6 ND packets (priority 85). ND NA packets for router's own + * IPs are handled with priority-90 flows. + */ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 85, + .__match = "arp || nd", + .actions = "drop;", + .external_ids = map_empty()); + + /* Allow IPv6 multicast traffic that's supposed to reach the + * router pipeline (e.g., router solicitations). + */ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 84, + .__match = "nd_rs || nd_ra", + .actions = "next;", + .external_ids = map_empty()); + + /* Drop other reserved multicast. */ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 83, + .__match = "ip6.mcast_rsvd", + .actions = "drop;", + .external_ids = map_empty()); + + /* Allow other multicast if relay enabled (priority 82). */ + var mcast_action = { if (mcast_cfg.relay) { "next;" } else { "drop;" } } in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 82, + .__match = "ip4.mcast || ip6.mcast", + .actions = mcast_action, + .external_ids = map_empty()); + + /* Drop Ethernet local broadcast. 
By definition this traffic should + * not be forwarded.*/ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 50, + .__match = "eth.bcast", + .actions = "drop;", + .external_ids = map_empty()); + + /* TTL discard */ + Flow( + .logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 30, + .__match = "ip4 && ip.ttl == {0, 1}", + .actions = "drop;", + .external_ids = map_empty()); + + /* Pass other traffic not already handled to the next table for + * routing. */ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +function format_v4_networks(networks: lport_addresses, add_bcast: bool): string = +{ + var addrs = vec_empty(); + for (addr in networks.ipv4_addrs) { + vec_push(addrs, "${addr.addr}"); + if (add_bcast) { + vec_push(addrs, "${ipv4_netaddr_bcast(addr)}") + } else () + }; + if (vec_len(addrs) == 1) { + string_join(addrs , ", ") + } else { + "{" ++ string_join(addrs , ", ") ++ "}" + } +} + +function format_v6_networks(networks: lport_addresses): string = +{ + var addrs = vec_empty(); + for (addr in networks.ipv6_addrs) { + vec_push(addrs, "${addr.addr}") + }; + if (vec_len(addrs) == 1) { + string_join(addrs, ", ") + } else { + "{" ++ string_join(addrs , ", ") ++ "}" + } +} + +/* The following relation is used in ARP reply flow generation to determine whether + * the is_chassis_resident check must be added to the flow. 
+ */ +relation AddChassisResidentCheck_(lrp: uuid, add_check: bool) + +AddChassisResidentCheck_(lrp._uuid, res) :- + &SwitchPort(.peer = Some{&RouterPort{.lrp = lrp, .router = &router, .is_redirect = is_redirect}}, + .sw = sw), + is_some(router.l3dgw_port), + not sw.localnet_port_names.is_empty(), + var res = if (is_redirect) { + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea + * should only be sent from the "redirect-chassis", so that + * upstream MAC learning points to the "redirect-chassis". + * Also need to avoid generation of multiple ARP responses + * from different chassis. */ + true + } else { + /* Check if the option 'reside-on-redirect-chassis' + * is set to true on the router port. If set to true + * and if peer's logical switch has a localnet port, it + * means the router pipeline for the packets from + * peer's logical switch is be run on the chassis + * hosting the gateway port and it should reply to the + * ARP requests for the router port IPs. + */ + map_get_bool_def(lrp.options, "reside-on-redirect-chassis", false) + }. + + +relation AddChassisResidentCheck(lrp: uuid, add_check: bool) + +AddChassisResidentCheck(lrp, add_check) :- + AddChassisResidentCheck_(lrp, add_check). + +AddChassisResidentCheck(lrp, false) :- + nb::Logical_Router_Port(._uuid = lrp), + not AddChassisResidentCheck_(lrp, _). + + +function get_force_snat_ip(lr: nb::Logical_Router, key_type: string): Set<v46_ip> = +{ + var ips = set_empty(); + match (map_get(lr.options, key_type ++ "_force_snat_ip")) { + None -> (), + Some{s} -> { + for (token in s.split(" ")) { + match (ip46_parse(token)) { + Some{ip} -> set_insert(ips, ip), + _ -> () // XXX warn + } + }; + } + }; + ips +} + +function has_force_snat_ip(lr: nb::Logical_Router, key_type: string): bool { + not get_force_snat_ip(lr, key_type).is_empty() +} + +/* Logical router ingress table IP_INPUT: IP Input for IPv4. 
*/ +for (&RouterPort(.router = &router, .networks = networks, .lrp = lrp) + if (not vec_is_empty(networks.ipv4_addrs))) +{ + /* L3 admission control: drop packets that originate from an + * IPv4 address owned by the router or a broadcast address + * known to the router (priority 100). */ + var __match = "ip4.src == " ++ + format_v4_networks(networks, true) ++ + " && ${rEGBIT_EGRESS_LOOPBACK()} == 0" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 100, + .__match = __match, + .actions = "drop;", + .external_ids = stage_hint(lrp._uuid)); + + /* ICMP echo reply. These flows reply to ICMP echo requests + * received for the router's IP address. Since packets only + * get here as part of the logical router datapath, the inport + * (i.e. the incoming locally attached net) does not matter. + * The ip.ttl also does not matter (RFC1812 section 4.2.2.9) */ + var __match = "ip4.dst == " ++ + format_v4_networks(networks, false) ++ + " && icmp4.type == 8 && icmp4.code == 0" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 90, + .__match = __match, + .actions = "ip4.dst <-> ip4.src; " + "ip.ttl = 255; " + "icmp4.type = 0; " + "flags.loopback = 1; " + "next; ", + .external_ids = stage_hint(lrp._uuid)) +} + +/* Priority-90-92 flows handle ARP requests and ND packets. Most are + * per logical port but DNAT addresses can be handled per datapath + * for non gateway router ports. + * + * Priority 91 and 92 flows are added for each gateway router + * port to handle the special cases. In case we get the packet + * on a regular port, just reply with the port's ETH address. + */ +LogicalRouterNatArpNdFlow(router, nat) :- + router in &Router(.lr = nb::Logical_Router{._uuid = lr}), + LogicalRouterNAT(.lr = lr, .nat = nat@NAT{.nat = &nb::NAT{.__type = __type}}), + /* Skip SNAT entries for now, we handle unique SNAT IPs separately + * below. + */ + __type != "snat". 
+/* Now handle SNAT entries too, one per unique SNAT IP. */ +LogicalRouterNatArpNdFlow(router, nat) :- + router in &Router(.snat_ips = snat_ips), + var snat_ip = FlatMap(snat_ips), + (var ip, var nats) = snat_ip, + Some{var nat} = nats.nth(0). + +relation LogicalRouterNatArpNdFlow(router: Ref<Router>, nat: NAT) +LogicalRouterArpNdFlow(router, nat, None, rEG_INPORT_ETH_ADDR(), None, false, 90) :- + LogicalRouterNatArpNdFlow(router, nat). + +/* ARP / ND handling for external IP addresses. + * + * DNAT and SNAT IP addresses are external IP addresses that need ARP + * handling. + * + * These are already taken care globally, per router. The only + * exception is on the l3dgw_port where we might need to use a + * different ETH address. + */ +LogicalRouterPortNatArpNdFlow(router, nat, l3dgw_port) :- + router in &Router(.lr = lr, .l3dgw_port = Some{l3dgw_port}), + LogicalRouterNAT(lr._uuid, nat), + /* Skip SNAT entries for now, we handle unique SNAT IPs separately + * below. + */ + nat.nat.__type != "snat". +/* Now handle SNAT entries too, one per unique SNAT IP. */ +LogicalRouterPortNatArpNdFlow(router, nat, l3dgw_port) :- + router in &Router(.l3dgw_port = Some{l3dgw_port}, .snat_ips = snat_ips), + var snat_ip = FlatMap(snat_ips), + (var ip, var nats) = snat_ip, + Some{var nat} = nats.nth(0). + +/* Respond to ARP/NS requests on the chassis that binds the gw + * port. Drop the ARP/NS requests on other chassis. 
+ */ +relation LogicalRouterPortNatArpNdFlow(router: Ref<Router>, nat: NAT, lrp: nb::Logical_Router_Port) +LogicalRouterArpNdFlow(router, nat, Some{lrp}, mac, Some{extra_match}, false, 92), +LogicalRouterArpNdFlow(router, nat, Some{lrp}, mac, None, true, 91) :- + LogicalRouterPortNatArpNdFlow(router, nat, lrp), + (var mac, var extra_match) = match ((nat.external_mac, nat.nat.logical_port)) { + (Some{external_mac}, Some{logical_port}) -> ( + /* distributed NAT case, use nat->external_mac */ + external_mac.to_string(), + /* Traffic with eth.src = nat->external_mac should only be + * sent from the chassis where nat->logical_port is + * resident, so that upstream MAC learning points to the + * correct chassis. Also need to avoid generation of + * multiple ARP responses from different chassis. */ + "is_chassis_resident(${json_string_escape(logical_port)})" + ), + _ -> ( + rEG_INPORT_ETH_ADDR(), + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s + * should only be sent from the gateway chassis, so that + * upstream MAC learning points to the gateway chassis. + * Also need to avoid generation of multiple ARP responses + * from different chassis. */ + match (router.redirect_port_name) { + "" -> "", + s -> "is_chassis_resident(${s})" + } + ) + }. + +/* Now divide the ARP/ND flows into ARP and ND. */ +relation LogicalRouterArpNdFlow( + router: Ref<Router>, + nat: NAT, + lrp: Option<nb::Logical_Router_Port>, + mac: string, + extra_match: Option<string>, + drop: bool, + priority: integer) +LogicalRouterArpFlow(router, lrp, ipv4, mac, extra_match, drop, priority, + stage_hint(nat.nat._uuid)) :- + LogicalRouterArpNdFlow(router, nat@NAT{.external_ip = IPv4{ipv4}}, lrp, + mac, extra_match, drop, priority). +LogicalRouterNdFlow(router, lrp, "nd_na", ipv6, true, mac, extra_match, drop, priority, + stage_hint(nat.nat._uuid)) :- + LogicalRouterArpNdFlow(router, nat@NAT{.external_ip = IPv6{ipv6}}, lrp, + mac, extra_match, drop, priority). 
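[Not part of the patch: the two rules above dispatch each `LogicalRouterArpNdFlow` record on the address family of the NAT external IP, so IPv4 entries become ARP responder flows and IPv6 entries become ND responder flows. A minimal Python sketch of that dispatch, with an invented function name, purely for illustration:]

```python
import ipaddress

def classify_nat_responders(external_ips):
    """Split NAT external IPs into ARP (IPv4) and ND (IPv6) responder
    lists, mirroring how LogicalRouterArpNdFlow is divided into
    LogicalRouterArpFlow and LogicalRouterNdFlow.  Illustrative only."""
    arp, nd = [], []
    for ip in external_ips:
        # ip_address() raises ValueError for strings that are not IPs,
        # roughly analogous to the IPv4{}/IPv6{} pattern match failing.
        addr = ipaddress.ip_address(ip)
        if addr.version == 4:
            arp.append(ip)   # answered via ARP reply flows
        else:
            nd.append(ip)    # answered via nd_ns/nd_na flows
    return arp, nd
```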
+
+relation LogicalRouterArpFlow(
+    lr: Ref<Router>,
+    lrp: Option<nb::Logical_Router_Port>,
+    ip: in_addr,
+    mac: string,
+    extra_match: Option<string>,
+    drop: bool,
+    priority: integer,
+    external_ids: Map<string,string>)
+Flow(.logical_datapath = lr.lr._uuid,
+     .stage = router_stage(IN, IP_INPUT),
+     .priority = priority,
+     .__match = __match,
+     .actions = actions,
+     .external_ids = external_ids) :-
+    LogicalRouterArpFlow(.lr = lr, .lrp = lrp, .ip = ip, .mac = mac,
+                         .extra_match = extra_match, .drop = drop,
+                         .priority = priority, .external_ids = external_ids),
+    var __match = {
+        var clauses = vec_with_capacity(3);
+        match (lrp) {
+            Some{p} -> clauses.push("inport == ${json_string_escape(p.name)}"),
+            None -> ()
+        };
+        clauses.push("arp.op == 1 && arp.tpa == ${ip}");
+        clauses.append(extra_match.to_vec());
+        clauses.join(" && ")
+    },
+    var actions = if (drop) {
+        "drop;"
+    } else {
+        "eth.dst = eth.src; "
+        "eth.src = ${mac}; "
+        "arp.op = 2; /* ARP reply */ "
+        "arp.tha = arp.sha; "
+        "arp.sha = ${mac}; "
+        "arp.tpa = arp.spa; "
+        "arp.spa = ${ip}; "
+        "outport = inport; "
+        "flags.loopback = 1; "
+        "output;"
+    }.
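[Not part of the patch: the `LogicalRouterArpFlow` rule above assembles its match string from up to three clauses (an optional inport clause, the mandatory ARP request clause, and an optional extra clause) joined with `" && "`. A small Python sketch of the same assembly, with invented names, for illustration only:]

```python
def build_arp_responder_match(ip, inport=None, extra=None):
    """Assemble an ARP responder match the way the rule above does:
    optional inport clause, then the arp.op/arp.tpa clause, then any
    extra clause, all joined with " && ".  Illustrative sketch only."""
    clauses = []
    if inport is not None:
        clauses.append('inport == "%s"' % inport)
    clauses.append("arp.op == 1 && arp.tpa == %s" % ip)
    if extra is not None:
        clauses.append(extra)
    return " && ".join(clauses)
```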
+ +relation LogicalRouterNdFlow( + lr: Ref<Router>, + lrp: Option<nb::Logical_Router_Port>, + action: string, + ip: in6_addr, + sn_ip: bool, + mac: string, + extra_match: Option<string>, + drop: bool, + priority: integer, + external_ids: Map<string,string>) +Flow(.logical_datapath = lr.lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = priority, + .__match = __match, + .actions = actions, + .external_ids = external_ids) :- + LogicalRouterNdFlow(.lr = lr, .lrp = lrp, .action = action, .ip = ip, + .sn_ip = sn_ip, .mac = mac, .extra_match = extra_match, + .drop = drop, .priority = priority, + .external_ids = external_ids), + var __match = { + var clauses = vec_with_capacity(4); + match (lrp) { + Some{p} -> clauses.push("inport == ${json_string_escape(p.name)}"), + None -> () + }; + if (sn_ip) { + clauses.push("ip6.dst == {${ip}, ${in6_addr_solicited_node(ip)}}") + }; + clauses.push("nd_ns && nd.target == ${ip}"); + clauses.append(extra_match.to_vec()); + clauses.join(" && ") + }, + var actions = if (drop) { + "drop;" + } else { + "${action} { " + "eth.src = ${mac}; " + "ip6.src = ${ip}; " + "nd.target = ${ip}; " + "nd.tll = ${mac}; " + "outport = inport; " + "flags.loopback = 1; " + "output; " + "};" + }. + +/* ICMP time exceeded */ +for (RouterPortNetworksIPv4Addr(.port = &RouterPort{.lrp = lrp, + .json_name = json_name, + .router = router, + .networks = networks, + .is_redirect = is_redirect}, + .addr = addr)) +{ + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 40, + .__match = "inport == ${json_name} && ip4 && " + "ip.ttl == {0, 1} && !ip.later_frag", + .actions = "icmp4 {" + "eth.dst <-> eth.src; " + "icmp4.type = 11; /* Time exceeded */ " + "icmp4.code = 0; /* TTL exceeded in transit */ " + "ip4.dst = ip4.src; " + "ip4.src = ${addr.addr}; " + "ip.ttl = 255; " + "next; };", + .external_ids = stage_hint(lrp._uuid)); + + /* ARP reply. These flows reply to ARP requests for the router's own + * IP address. 
*/ + for (AddChassisResidentCheck(lrp._uuid, add_chassis_resident_check)) { + var __match = + "arp.spa == ${ipv4_netaddr_match_network(addr)}" ++ + if (add_chassis_resident_check) { + " && is_chassis_resident(${router.redirect_port_name})" + } else "" in + LogicalRouterArpFlow(.lr = router, + .lrp = Some{lrp}, + .ip = addr.addr, + .mac = rEG_INPORT_ETH_ADDR(), + .extra_match = Some{__match}, + .drop = false, + .priority = 90, + .external_ids = stage_hint(lrp._uuid)) + } +} + +for (&RouterPort(.lrp = lrp, + .router = router@&Router{.lr = lr}, + .json_name = json_name, + .networks = networks, + .is_redirect = is_redirect)) +var residence_check = match (is_redirect) { + true -> Some{"is_chassis_resident(${router.redirect_port_name})"}, + false -> None +} in { + for (RouterLBVIP(.router = &Router{.lr = nb::Logical_Router{._uuid= lr._uuid}}, .vip = vip)) { + Some{(var ip_address, _)} = ip_address_and_port_from_lb_key(vip) in { + IPv4{var ipv4} = ip_address in + LogicalRouterArpFlow(.lr = router, + .lrp = Some{lrp}, + .ip = ipv4, + .mac = rEG_INPORT_ETH_ADDR(), + .extra_match = residence_check, + .drop = false, + .priority = 90, + .external_ids = map_empty()); + + IPv6{var ipv6} = ip_address in + LogicalRouterNdFlow(.lr = router, + .lrp = Some{lrp}, + .action = "nd_na", + .ip = ipv6, + .sn_ip = false, + .mac = rEG_INPORT_ETH_ADDR(), + .extra_match = residence_check, + .drop = false, + .priority = 90, + .external_ids = map_empty()) + } + } +} + +/* Drop IP traffic destined to router owned IPs except if the IP is + * also a SNAT IP. Those are dropped later, in stage + * "lr_in_arp_resolve", if unSNAT was unsuccessful. + * + * Priority 60. 
+ */ +Flow(.logical_datapath = lr_uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 60, + .__match = "ip4.dst == {" ++ match_ips.join(", ") ++ "}", + .actions = "drop;", + .external_ids = stage_hint(lrp_uuid)) :- + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, + .router = &Router{.snat_ips = snat_ips, + .lr = nb::Logical_Router{._uuid = lr_uuid}}, + .networks = networks), + var addr = FlatMap(networks.ipv4_addrs), + not snat_ips.contains_key(IPv4{addr.addr}), + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). +Flow(.logical_datapath = lr_uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 60, + .__match = "ip6.dst == {" ++ match_ips.join(", ") ++ "}", + .actions = "drop;", + .external_ids = stage_hint(lrp_uuid)) :- + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, + .router = &Router{.snat_ips = snat_ips, + .lr = nb::Logical_Router{._uuid = lr_uuid}}, + .networks = networks), + var addr = FlatMap(networks.ipv6_addrs), + not snat_ips.contains_key(IPv6{addr.addr}), + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). + +for (RouterPortNetworksIPv4Addr( + .port = &RouterPort{ + .router = &Router{.lr = lr, + .l3dgw_port = None, + .is_gateway = false}, + .lrp = lrp}, + .addr = addr)) +{ + /* UDP/TCP port unreachable. 
*/ + var __match = "ip4 && ip4.dst == ${addr.addr} && !ip.later_frag && udp" in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 80, + .__match = __match, + .actions = "icmp4 {" + "eth.dst <-> eth.src; " + "ip4.dst <-> ip4.src; " + "ip.ttl = 255; " + "icmp4.type = 3; " + "icmp4.code = 3; " + "next; };", + .external_ids = stage_hint(lrp._uuid)); + + var __match = "ip4 && ip4.dst == ${addr.addr} && !ip.later_frag && tcp" in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 80, + .__match = __match, + .actions = "tcp_reset {" + "eth.dst <-> eth.src; " + "ip4.dst <-> ip4.src; " + "next; };", + .external_ids = stage_hint(lrp._uuid)); + + var __match = "ip4 && ip4.dst == ${addr.addr} && !ip.later_frag" in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 70, + .__match = __match, + .actions = "icmp4 {" + "eth.dst <-> eth.src; " + "ip4.dst <-> ip4.src; " + "ip.ttl = 255; " + "icmp4.type = 3; " + "icmp4.code = 2; " + "next; };", + .external_ids = stage_hint(lrp._uuid)) +} + +/* DHCPv6 reply handling */ +Flow(.logical_datapath = rp.router.lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 100, + .__match = "ip6.dst == ${ipv6_addr.addr} " + "&& udp.src == 547 && udp.dst == 546", + .actions = "reg0 = 0; handle_dhcpv6_reply;", + .external_ids = stage_hint(rp.lrp._uuid)) :- + rp in &RouterPort(), + var ipv6_addr = FlatMap(rp.networks.ipv6_addrs). + +/* Logical router ingress table IP_INPUT: IP Input for IPv6. */ +for (&RouterPort(.router = &router, .networks = networks, .lrp = lrp) + if (not vec_is_empty(networks.ipv6_addrs))) +{ + //if (op->derived) { + // /* No ingress packets are accepted on a chassisredirect + // * port, so no need to program flows for that port. */ + // continue; + //} + + /* ICMPv6 echo reply. These flows reply to echo requests + * received for the router's IP address. 
*/ + var __match = "ip6.dst == " ++ + format_v6_networks(networks) ++ + " && icmp6.type == 128 && icmp6.code == 0" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 90, + .__match = __match, + .actions = "ip6.dst <-> ip6.src; " + "ip.ttl = 255; " + "icmp6.type = 129; " + "flags.loopback = 1; " + "next; ", + .external_ids = stage_hint(lrp._uuid)) +} + +/* ND reply. These flows reply to ND solicitations for the + * router's own IP address. */ +for (RouterPortNetworksIPv6Addr(.port = &RouterPort{.lrp = lrp, + .is_redirect = is_redirect, + .router = router, + .networks = networks, + .json_name = json_name}, + .addr = addr)) +{ + var extra_match = if (is_redirect) { + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea + * should only be sent from the gateway chassis, so that + * upstream MAC learning points to the gateway chassis. + * Also need to avoid generation of multiple ND replies + * from different chassis. */ + Some{"is_chassis_resident(${json_string_escape(chassis_redirect_name(lrp.name))})"} + } else None in + LogicalRouterNdFlow(.lr = router, + .lrp = Some{lrp}, + .action = "nd_na_router", + .ip = addr.addr, + .sn_ip = true, + .mac = rEG_INPORT_ETH_ADDR(), + .extra_match = extra_match, + .drop = false, + .priority = 90, + .external_ids = stage_hint(lrp._uuid)) +} + +/* UDP/TCP port unreachable */ +for (RouterPortNetworksIPv6Addr( + .port = &RouterPort{.router = &Router{.lr = lr, + .l3dgw_port = None, + .is_gateway = false}, + .lrp = lrp, + .json_name = json_name}, + .addr = addr)) +{ + var __match = "ip6 && ip6.dst == ${addr.addr} && !ip.later_frag && tcp" in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 80, + .__match = __match, + .actions = "tcp_reset {" + "eth.dst <-> eth.src; " + "ip6.dst <-> ip6.src; " + "next; };", + .external_ids = stage_hint(lrp._uuid)); + + var __match = "ip6 && ip6.dst == ${addr.addr} && !ip.later_frag && udp" in + Flow(.logical_datapath 
= lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 80, + .__match = __match, + .actions = "icmp6 {" + "eth.dst <-> eth.src; " + "ip6.dst <-> ip6.src; " + "ip.ttl = 255; " + "icmp6.type = 1; " + "icmp6.code = 4; " + "next; };", + .external_ids = stage_hint(lrp._uuid)); + + var __match = "ip6 && ip6.dst == ${addr.addr} && !ip.later_frag" in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 70, + .__match = __match, + .actions = "icmp6 {" + "eth.dst <-> eth.src; " + "ip6.dst <-> ip6.src; " + "ip.ttl = 255; " + "icmp6.type = 1; " + "icmp6.code = 3; " + "next; };", + .external_ids = stage_hint(lrp._uuid)) +} + +/* ICMPv6 time exceeded */ +for (RouterPortNetworksIPv6Addr(.port = &RouterPort{.router = &router, + .lrp = lrp, + .json_name = json_name}, + .addr = addr) + /* skip link-local address */ + if (not ipv6_netaddr_is_lla(addr))) +{ + var __match = "inport == ${json_name} && ip6 && " + "ip6.src == ${ipv6_netaddr_match_network(addr)} && " + "ip.ttl == {0, 1} && !ip.later_frag" in + var actions = "icmp6 {" + "eth.dst <-> eth.src; " + "ip6.dst = ip6.src; " + "ip6.src = ${addr.addr}; " + "ip.ttl = 255; " + "icmp6.type = 3; /* Time exceeded */ " + "icmp6.code = 0; /* TTL exceeded in transit */ " + "next; };" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 40, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lrp._uuid)) +} + +/* NAT, Defrag and load balancing. */ + +function default_allow_flow(datapath: uuid, stage: Stage): Flow { + Flow{.logical_datapath = datapath, + .stage = stage, + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()} +} +for (&Router(.lr = lr)) { + /* Packets are allowed by default. 
*/ + Flow[default_allow_flow(lr._uuid, router_stage(IN, DEFRAG))]; + Flow[default_allow_flow(lr._uuid, router_stage(IN, UNSNAT))]; + Flow[default_allow_flow(lr._uuid, router_stage(OUT, SNAT))]; + Flow[default_allow_flow(lr._uuid, router_stage(IN, DNAT))]; + Flow[default_allow_flow(lr._uuid, router_stage(OUT, UNDNAT))]; + Flow[default_allow_flow(lr._uuid, router_stage(OUT, EGR_LOOP))]; + Flow[default_allow_flow(lr._uuid, router_stage(IN, ECMP_STATEFUL))]; + + /* Send the IPv6 NS packets to next table. When ovn-controller + * generates IPv6 NS (for the action - nd_ns{}), the injected + * packet would go through conntrack - which is not required. */ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(OUT, SNAT), + .priority = 120, + .__match = "nd_ns", + .actions = "next;", + .external_ids = map_empty()) +} + +function lrouter_nat_is_stateless(nat: NAT): bool = { + Some{"true"} == map_get(nat.nat.options, "stateless") +} + +/* Handles the match criteria and actions in logical flow + * based on external ip based NAT rule filter. + * + * For ALLOWED_EXT_IPs, we will add an additional match criteria + * of comparing ip*.src/dst with the allowed external ip address set. + * + * For EXEMPTED_EXT_IPs, we will have an additional logical flow + * where we compare ip*.src/dst with the exempted external ip address set + * and action says "next" instead of ct*. + */ +function lrouter_nat_add_ext_ip_match( + router: Ref<Router>, + nat: NAT, + __match: string, + ipX: string, + is_src: bool, + mask: v46_ip): (string, Option<Flow>) +{ + var dir = if (is_src) "src" else "dst"; + match (nat.exceptional_ext_ips) { + None -> ("", None), + Some{AllowedExtIps{__as}} -> (" && ${ipX}.${dir} == $${__as.name}", None), + Some{ExemptedExtIps{__as}} -> { + /* Priority of logical flows corresponding to exempted_ext_ips is + * +1 of the corresponding regulr NAT rule. 
+ * For example, if we have following NAT rule and we associate + * exempted external ips to it: + * "ovn-nbctl lr-nat-add router dnat_and_snat 10.15.24.139 50.0.0.11" + * + * And now we associate exempted external ip address set to it. + * Now corresponding to above rule we will have following logical + * flows: + * lr_out_snat...priority=162, match=(..ip4.dst == $exempt_range), + * action=(next;) + * lr_out_snat...priority=161, match=(..), action=(ct_snat(....);) + * + */ + var priority = match (is_src) { + true -> { + /* S_ROUTER_IN_DNAT uses priority 100 */ + 100 + 1 + }, + false -> { + /* S_ROUTER_OUT_SNAT uses priority (mask + 1 + 128 + 1) */ + var is_gw_router = router.l3dgw_port.is_none(); + var mask_1bits = ip46_count_cidr_bits(mask).unwrap_or(8'd0) as integer; + mask_1bits + 2 + { if (not is_gw_router) 128 else 0 } + } + }; + + ("", + Some{Flow{.logical_datapath = router.lr._uuid, + .stage = if (is_src) { router_stage(IN, DNAT) } else { router_stage(OUT, SNAT) }, + .priority = priority, + .__match = "${__match} && ${ipX}.${dir} == $${__as.name}", + .actions = "next;", + .external_ids = stage_hint(nat.nat._uuid)}}) + } + } +} + +relation LogicalRouterForceSnatFlows( + logical_router: uuid, + ips: Set<v46_ip>, + context: string) +Flow(.logical_datapath = logical_router, + .stage = router_stage(IN, UNSNAT), + .priority = 110, + .__match = "${ipX} && ${ipX}.dst == ${ip}", + .actions = "ct_snat;", + .external_ids = map_empty()), +/* Higher priority rules to force SNAT with the IP addresses + * configured in the Gateway router. This only takes effect + * when the packet has already been DNATed or load balanced once. 
*/ +Flow(.logical_datapath = logical_router, + .stage = router_stage(OUT, SNAT), + .priority = 100, + .__match = "flags.force_snat_for_${context} == 1 && ${ipX}", + .actions = "ct_snat(%{ip});", + .external_ids = map_empty()) :- + LogicalRouterForceSnatFlows(.logical_router = logical_router, + .ips = ips, + .context = context), + var ip = FlatMap(ips), + var ipX = ip46_ipX(ip). + +/* NAT rules are only valid on Gateway routers and routers with + * l3dgw_port (router has a port with "redirect-chassis" + * specified). */ +for (r in &Router(.lr = lr, + .l3dgw_port = l3dgw_port, + .redirect_port_name = redirect_port_name, + .is_gateway = is_gateway) + if is_some(l3dgw_port) or is_gateway) +{ + for (LogicalRouterNAT(.lr = lr._uuid, .nat = nat)) { + var ipX = ip46_ipX(nat.external_ip) in + var xx = ip46_xxreg(nat.external_ip) in + /* Check the validity of nat->logical_ip. 'logical_ip' can + * be a subnet when the type is "snat". */ + Some{(_, var mask)} = ip46_parse_masked(nat.nat.logical_ip) in + true == match ((ip46_is_all_ones(mask), nat.nat.__type)) { + (_, "snat") -> true, + (false, _) -> { + warn("bad ip ${nat.nat.logical_ip} for dnat in router ${uuid2str(lr._uuid)}"); + false + }, + _ -> true + } in + /* For distributed router NAT, determine whether this NAT rule + * satisfies the conditions for distributed NAT processing. */ + var mac = match ((is_some(l3dgw_port) and nat.nat.__type == "dnat_and_snat", + nat.nat.logical_port, nat.external_mac)) { + (true, Some{_}, Some{mac}) -> Some{mac}, + _ -> None + } in + var stateless = (lrouter_nat_is_stateless(nat) + and nat.nat.__type == "dnat_and_snat") in + { + /* Ingress UNSNAT table: It is for already established connections' + * reverse traffic. i.e., SNAT has already been done in egress + * pipeline and now the packet has entered the ingress pipeline as + * part of a reply. We undo the SNAT here. + * + * Undoing SNAT has to happen before DNAT processing. 
This is
+ * because when the packet was DNATed in ingress pipeline, it did
+ * not know about the possibility of eventual additional SNAT in
+ * egress pipeline. */
+ if (nat.nat.__type == "snat" or nat.nat.__type == "dnat_and_snat") {
+ if (l3dgw_port == None) {
+ /* Gateway router. */
+ var actions = if (stateless) {
+ "${ipX}.dst=${nat.nat.logical_ip}; next;"
+ } else {
+ "ct_snat;"
+ } in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(IN, UNSNAT),
+ .priority = 90,
+ .__match = "ip && ${ipX}.dst == ${nat.nat.external_ip}",
+ .actions = actions,
+ .external_ids = stage_hint(nat.nat._uuid))
+ };
+ Some{var gwport} = l3dgw_port in {
+ /* Distributed router. */
+
+ /* Traffic received on l3dgw_port is subject to NAT. */
+ var __match =
+ "ip && ${ipX}.dst == ${nat.nat.external_ip}"
+ " && inport == ${json_string_escape(gwport.name)}" ++
+ if (mac == None) {
+ /* Flows for NAT rules that are centralized are only
+ * programmed on the "redirect-chassis". */
+ " && is_chassis_resident(${redirect_port_name})"
+ } else { "" } in
+ var actions = if (stateless) {
+ "${ipX}.dst=${nat.nat.logical_ip}; next;"
+ } else {
+ "ct_snat;"
+ } in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(IN, UNSNAT),
+ .priority = 100,
+ .__match = __match,
+ .actions = actions,
+ .external_ids = stage_hint(nat.nat._uuid))
+ }
+ };
+
+ /* Ingress DNAT table: Packets enter the pipeline with destination
+ * IP address that needs to be DNATted from an external IP address
+ * to a logical IP address. */
+ var ip_and_ports = "${nat.nat.logical_ip}" ++
+ if (nat.nat.external_port_range != "") {
+ " ${nat.nat.external_port_range}"
+ } else {
+ ""
+ } in
+ if (nat.nat.__type == "dnat" or nat.nat.__type == "dnat_and_snat") {
+ None = l3dgw_port in
+ var __match = "ip && ip4.dst == ${nat.nat.external_ip}" in
+ (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match(
+ r, nat, __match, ipX, true, mask) in
+ {
+ /* Gateway router.
*/ + /* Packet when it goes from the initiator to destination. + * We need to set flags.loopback because the router can + * send the packet back through the same interface. */ + Some{var f} = ext_flow in Flow[f]; + + var flag_action = + if (has_force_snat_ip(lr, "dnat")) { + /* Indicate to the future tables that a DNAT has taken + * place and a force SNAT needs to be done in the + * Egress SNAT table. */ + "flags.force_snat_for_dnat = 1; " + } else { "" } in + var nat_actions = if (stateless) { + "${ipX}.dst=${nat.nat.logical_ip}; next;" + } else { + "flags.loopback = 1; " + "ct_dnat(${ip_and_ports});" + } in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, DNAT), + .priority = 100, + .__match = __match ++ ext_ip_match, + .actions = flag_action ++ nat_actions, + .external_ids = stage_hint(nat.nat._uuid)) + }; + + Some{var gwport} = l3dgw_port in + var __match = + "ip && ${ipX}.dst == ${nat.nat.external_ip}" + " && inport == ${json_string_escape(gwport.name)}" ++ + if (mac == None) { + /* Flows for NAT rules that are centralized are only + * programmed on the "redirect-chassis". */ + " && is_chassis_resident(${redirect_port_name})" + } else { "" } in + (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match( + r, nat, __match, ipX, true, mask) in + { + /* Distributed router. */ + /* Traffic received on l3dgw_port is subject to NAT. */ + Some{var f} = ext_flow in Flow[f]; + + var actions = if (stateless) { + "${ipX}.dst=${nat.nat.logical_ip}; next;" + } else { + "ct_dnat(${ip_and_ports});" + } in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, DNAT), + .priority = 100, + .__match = __match ++ ext_ip_match, + .actions = actions, + .external_ids = stage_hint(nat.nat._uuid)) + } + }; + + /* ARP resolve for NAT IPs. 
*/ + Some{var gwport} = l3dgw_port in { + var gwport_name = json_string_escape(gwport.name) in { + if (nat.nat.__type == "snat") { + var __match = "inport == ${gwport_name} && " + "${ipX}.src == ${nat.nat.external_ip}" in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, IP_INPUT), + .priority = 120, + .__match = __match, + .actions = "next;", + .external_ids = stage_hint(nat.nat._uuid)) + }; + + var nexthop_reg = "${xx}${rEG_NEXT_HOP()}" in + var __match = "outport == ${gwport_name} && " + "${nexthop_reg} == ${nat.nat.external_ip}" in + var dst_mac = match (mac) { + Some{value} -> "${value}", + None -> gwport.mac + } in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = __match, + .actions = "eth.dst = ${dst_mac}; next;", + .external_ids = stage_hint(nat.nat._uuid)) + } + }; + + /* Egress UNDNAT table: It is for already established connections' + * reverse traffic. i.e., DNAT has already been done in ingress + * pipeline and now the packet has entered the egress pipeline as + * part of a reply. We undo the DNAT here. + * + * Note that this only applies for NAT on a distributed router. + * Undo DNAT on a gateway router is done in the ingress DNAT + * pipeline stage. */ + if ((nat.nat.__type == "dnat" or nat.nat.__type == "dnat_and_snat")) { + Some{var gwport} = l3dgw_port in + var __match = + "ip && ${ipX}.src == ${nat.nat.logical_ip}" + " && outport == ${json_string_escape(gwport.name)}" ++ + if (mac == None) { + /* Flows for NAT rules that are centralized are only + * programmed on the "redirect-chassis". 
*/
+ " && is_chassis_resident(${redirect_port_name})"
+ } else { "" } in
+ var actions =
+ match (mac) {
+ Some{mac_addr} -> "eth.src = ${mac_addr}; ",
+ None -> ""
+ } ++
+ if (stateless) {
+ "${ipX}.src=${nat.nat.external_ip}; next;"
+ } else {
+ "ct_dnat;"
+ } in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(OUT, UNDNAT),
+ .priority = 100,
+ .__match = __match,
+ .actions = actions,
+ .external_ids = stage_hint(nat.nat._uuid))
+ };
+
+ /* Egress SNAT table: Packets enter the egress pipeline with
+ * source ip address that needs to be SNATted to an external ip
+ * address. */
+ var ip_and_ports = "${nat.nat.external_ip}" ++
+ if (nat.nat.external_port_range != "") {
+ " ${nat.nat.external_port_range}"
+ } else {
+ ""
+ } in
+ if (nat.nat.__type == "snat" or nat.nat.__type == "dnat_and_snat") {
+ None = l3dgw_port in
+ var __match = "ip && ${ipX}.src == ${nat.nat.logical_ip}" in
+ (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match(
+ r, nat, __match, ipX, false, mask) in
+ {
+ /* Gateway router. */
+ Some{var f} = ext_flow in Flow[f];
+
+ /* The priority here is calculated such that the
+ * nat->logical_ip with the longest mask gets a higher
+ * priority. */
+ var actions = if (stateless) {
+ "${ipX}.src=${nat.nat.external_ip}; next;"
+ } else {
+ "ct_snat(${ip_and_ports});"
+ } in
+ Some{var plen} = ip46_count_cidr_bits(mask) in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(OUT, SNAT),
+ .priority = plen as bit<64> + 1,
+ .__match = __match ++ ext_ip_match,
+ .actions = actions,
+ .external_ids = stage_hint(nat.nat._uuid))
+ };
+
+ Some{var gwport} = l3dgw_port in
+ var __match =
+ "ip && ${ipX}.src == ${nat.nat.logical_ip}"
+ " && outport == ${json_string_escape(gwport.name)}" ++
+ if (mac == None) {
+ /* Flows for NAT rules that are centralized are only
+ * programmed on the "redirect-chassis".
*/ + " && is_chassis_resident(${redirect_port_name})" + } else { "" } in + (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match( + r, nat, __match, ipX, false, mask) in + { + /* Distributed router. */ + Some{var f} = ext_flow in Flow[f]; + + var actions = + match (mac) { + Some{mac_addr} -> "eth.src = ${mac_addr}; ", + _ -> "" + } ++ if (stateless) { + "${ipX}.src=${nat.nat.external_ip}; next;" + } else { + "ct_snat(${ip_and_ports});" + } in + /* The priority here is calculated such that the + * nat->logical_ip with the longest mask gets a higher + * priority. */ + Some{var plen} = ip46_count_cidr_bits(mask) in + var priority = (plen as bit<64>) + 1 in + var centralized_boost = if (mac == None) 128 else 0 in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(OUT, SNAT), + .priority = priority + centralized_boost, + .__match = __match ++ ext_ip_match, + .actions = actions, + .external_ids = stage_hint(nat.nat._uuid)) + } + }; + + /* Logical router ingress table ADMISSION: + * For NAT on a distributed router, add rules allowing + * ingress traffic with eth.dst matching nat->external_mac + * on the l3dgw_port instance where nat->logical_port is + * resident. */ + Some{var mac_addr} = mac in + Some{var gwport} = l3dgw_port in + Some{var logical_port} = nat.nat.logical_port in + var __match = + "eth.dst == ${mac_addr} && inport == ${json_string_escape(gwport.name)}" + " && is_chassis_resident(${json_string_escape(logical_port)})" in + /* Store the ethernet address of the port receiving the packet. + * This will save us from having to match on inport further + * down in the pipeline. 
+ */
+ var actions = "${rEG_INPORT_ETH_ADDR()} = ${gwport.mac}; next;" in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(IN, ADMISSION),
+ .priority = 50,
+ .__match = __match,
+ .actions = actions,
+ .external_ids = stage_hint(nat.nat._uuid));
+
+ /* Ingress Gateway Redirect Table: For NAT on a distributed
+ * router, add flows that are specific to a NAT rule. These
+ * flows indicate the presence of an applicable NAT rule that
+ * can be applied in a distributed manner.
+ * In particular, the IP src register and eth.src are set to the
+ * NAT external IP and NAT external mac, so that the ARP request
+ * generated in the following stage is sent out with the proper
+ * IP/MAC src addresses.
+ */
+ Some{var mac_addr} = mac in
+ Some{var gwport} = l3dgw_port in
+ Some{var logical_port} = nat.nat.logical_port in
+ Some{var external_mac} = nat.nat.external_mac in
+ var __match =
+ "${ipX}.src == ${nat.nat.logical_ip} && "
+ "outport == ${json_string_escape(gwport.name)} && "
+ "is_chassis_resident(${json_string_escape(logical_port)})" in
+ var actions =
+ "eth.src = ${external_mac}; "
+ "${xx}${rEG_SRC()} = ${nat.nat.external_ip}; "
+ "next;" in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(IN, GW_REDIRECT),
+ .priority = 100,
+ .__match = __match,
+ .actions = actions,
+ .external_ids = stage_hint(nat.nat._uuid));
+
+ /* Egress Loopback table: For NAT on a distributed router.
+ * If packets in the egress pipeline on the distributed
+ * gateway port have ip.dst matching a NAT external IP, then
+ * loop a clone of the packet back to the beginning of the
+ * ingress pipeline with inport = outport. */
+ Some{var gwport} = l3dgw_port in
+ /* Distributed router.
*/ + Some{var port} = match (mac) { + Some{_} -> match (nat.nat.logical_port) { + Some{name} -> Some{json_string_escape(name)}, + None -> None: Option<string> + }, + None -> Some{redirect_port_name} + } in + var __match = "${ipX}.dst == ${nat.nat.external_ip} && outport == ${json_string_escape(gwport.name)} && is_chassis_resident(${port})" in + var regs = { + var regs = vec_empty(); + for (j in range_vec(0, mFF_N_LOG_REGS(), 01)) { + vec_push(regs, "reg${j} = 0; ") + }; + regs + } in + var actions = + "clone { ct_clear; " + "inport = outport; outport = \"\"; " + "flags = 0; flags.loopback = 1; " ++ + string_join(regs, "") ++ + "${rEGBIT_EGRESS_LOOPBACK()} = 1; " + "next(pipeline=ingress, table=0); };" in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(OUT, EGR_LOOP), + .priority = 100, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(nat.nat._uuid)) + } + }; + + /* Handle force SNAT options set in the gateway router. */ + if (l3dgw_port == None) { + var dnat_force_snat_ips = get_force_snat_ip(lr, "dnat") in + if (not dnat_force_snat_ips.is_empty()) + LogicalRouterForceSnatFlows(.logical_router = lr._uuid, + .ips = dnat_force_snat_ips, + .context = "dnat"); + + var lb_force_snat_ips = get_force_snat_ip(lr, "lb") in + if (not lb_force_snat_ips.is_empty()) + LogicalRouterForceSnatFlows(.logical_router = lr._uuid, + .ips = lb_force_snat_ips, + .context = "lb"); + + /* For gateway router, re-circulate every packet through + * the DNAT zone. This helps with the following. + * + * Any packet that needs to be unDNATed in the reverse + * direction gets unDNATed. Ideally this could be done in + * the egress pipeline. But since the gateway router + * does not have any feature that depends on the source + * ip address being external IP address for IP routing, + * we can do it here, saving a future re-circulation. 
*/ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, DNAT), + .priority = 50, + .__match = "ip", + .actions = "flags.loopback = 1; ct_dnat;", + .external_ids = map_empty()) + } +} + +function nats_contain_vip(nats: Vec<NAT>, vip: v46_ip): bool { + for (nat in nats) { + if (nat.external_ip == vip) { + return true + } + }; + return false +} + +/* Load balancing and packet defrag are only valid on + * Gateway routers or router with gateway port. */ +for (RouterLBVIP( + .router = &Router{.lr = lr, + .l3dgw_port = l3dgw_port, + .redirect_port_name = redirect_port_name, + .is_gateway = is_gateway, + .nats = nats}, + .lb = &lb, + .vip = vip, + .backends = backends) + if is_some(l3dgw_port) or is_gateway) +{ + if (backends == "") { + for (ControllerEventEn(true)) { + for (HasEventElbMeter(has_elb_meter)) { + Some {(var __match, var __action)} = + build_empty_lb_event_flow(vip, lb, has_elb_meter) in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, DNAT), + .priority = 130, + .__match = __match, + .actions = __action, + .external_ids = stage_hint(lb._uuid)) + } + } + }; + + /* A set to hold all ips that need defragmentation and tracking. */ + + /* vip contains IP:port or just IP. */ + Some{(var ip_address, var port)} = ip_address_and_port_from_lb_key(vip) in + var ipX = ip46_ipX(ip_address) in + var proto = match (lb.protocol) { + Some{proto} -> proto, + _ -> "tcp" + } in { + /* If there are any load balancing rules, we should send + * the packet to conntrack for defragmentation and + * tracking. This helps with two things. + * + * 1. With tracking, we can send only new connections to + * pick a DNAT ip address from a group. + * 2. If there are L4 ports in load balancing rules, we + * need the defragmentation to match on L4 ports. */ + var __match = "ip && ${ipX}.dst == ${ip_address}" in + /* One of these flows must be created for each unique LB VIP address. 
+ * We create one for each VIP:port pair; flows with the same IP and
+ * different port numbers will produce identical flows that will
+ * get merged by DDlog. */
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(IN, DEFRAG),
+ .priority = 100,
+ .__match = __match,
+ .actions = "ct_next;",
+ .external_ids = stage_hint(lb._uuid));
+
+ /* Higher priority rules are added for load-balancing in DNAT
+ * table. For every match (on a VIP[:port]), we add two flows
+ * via add_router_lb_flow(). One flow is for specific matching
+ * on ct.new with an action of "ct_lb($targets);". The other
+ * flow is for ct.est with an action of "ct_dnat;". */
+ var match1 = "ip && ${ipX}.dst == ${ip_address}" in
+ (var prio, var match2) =
+ if (port != 0) {
+ (120, " && ${proto} && ${proto}.dst == ${port}")
+ } else {
+ (110, "")
+ } in
+ var __match = match1 ++ match2 ++
+ match (l3dgw_port) {
+ Some{gwport} -> " && is_chassis_resident(${redirect_port_name})",
+ _ -> ""
+ } in
+ var has_force_snat_ip = has_force_snat_ip(lr, "lb") in
+ {
+ /* A match and actions for established connections. */
+ var est_match = "ct.est && " ++ __match in
+ var actions =
+ match (has_force_snat_ip) {
+ true -> "flags.force_snat_for_lb = 1; ct_dnat;",
+ false -> "ct_dnat;"
+ } in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(IN, DNAT),
+ .priority = prio,
+ .__match = est_match,
+ .actions = actions,
+ .external_ids = stage_hint(lb._uuid));
+
+ if (nats_contain_vip(nats, ip_address)) {
+ /* The load balancer vip is also present in the NAT entries.
+ * So add a high priority lflow to advance the packet
+ * destined to the vip (and the vip port if defined)
+ * in the S_ROUTER_IN_UNSNAT stage.
+ * There seems to be an issue with ovs-vswitchd. When the new
+ * connection packet destined for the lb vip is received,
+ * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
+ * conntrack zone.
For the next packet, if it goes through
+ * unsnat stage, the conntrack flags are not set properly, and
+ * it doesn't hit the established state flows in
+ * S_ROUTER_IN_DNAT stage. */
+ var match3 = "${ipX} && ${ipX}.dst == ${ip_address} && ${proto}" ++
+ if (port != 0) { " && ${proto}.dst == ${port}" }
+ else { "" } in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(IN, UNSNAT),
+ .priority = 120,
+ .__match = match3,
+ .actions = "next;",
+ .external_ids = stage_hint(lb._uuid))
+ };
+
+ Some{var gwport} = l3dgw_port in
+ /* Add logical flows to UNDNAT the load balanced reverse traffic in
+ * the router egress pipeline stage - S_ROUTER_OUT_UNDNAT if the logical
+ * router has a gateway router port associated.
+ */
+ var conds = {
+ var conds = vec_empty();
+ for (ip_str in string_split(backends, ",")) {
+ match (ip_address_and_port_from_lb_key(ip_str)) {
+ None -> () /* FIXME: put a break here */,
+ Some{(ip_address_, port_)} -> vec_push(conds,
+ "(${ipX}.src == ${ip_address_}" ++
+ if (port_ != 0) {
+ " && ${proto}.src == ${port_})"
+ } else {
+ ")"
+ })
+ }
+ };
+ conds
+ } in
+ not vec_is_empty(conds) in
+ var undnat_match =
+ "${ip46_ipX(ip_address)} && (" ++ string_join(conds, " || ") ++
+ ") && outport == ${json_string_escape(gwport.name)} && "
+ "is_chassis_resident(${redirect_port_name})" in
+ var action =
+ match (has_force_snat_ip) {
+ true -> "flags.force_snat_for_lb = 1; ct_dnat;",
+ false -> "ct_dnat;"
+ } in
+ Flow(.logical_datapath = lr._uuid,
+ .stage = router_stage(OUT, UNDNAT),
+ .priority = 120,
+ .__match = undnat_match,
+ .actions = action,
+ .external_ids = stage_hint(lb._uuid))
+ }
+ }
+}
+
+/* Higher priority rules are added for load-balancing in DNAT
+ * table. For every match (on a VIP[:port]), we add two flows
+ * via add_router_lb_flow(). One flow is for specific matching
+ * on ct.new with an action of "ct_lb($targets);". The other
+ * flow is for ct.est with an action of "ct_dnat;".
*/ +Flow(.logical_datapath = r.lr._uuid, + .stage = router_stage(IN, DNAT), + .priority = priority, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lb._uuid)) :- + r in &Router(), + is_some(r.l3dgw_port) or r.is_gateway, + LBVIPBackend[lbvipbackend], + Some{var svc_monitor} = lbvipbackend.svc_monitor, + var lbvip = lbvipbackend.lbvip, + var lb = lbvip.lb, + set_contains(r.lr.load_balancer, lb._uuid), + bs in &LBVIPBackendStatus(.port = lbvipbackend.port, + .ip = lbvipbackend.ip, + .protocol = default_protocol(lb.protocol), + .logical_port = svc_monitor.port_name), + var bses = bs.group_by((r, lbvip, lb)).to_set(), + var __match + = "ct.new && " ++ + get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, true) ++ + match (r.l3dgw_port) { + Some{gwport} -> " && is_chassis_resident(${r.redirect_port_name})", + _ -> "" + }, + var priority = if (lbvip.vip_port != 0) 120 else 110, + var up_backends = { + var up_backends = set_empty(); + for (bs in bses) { + if (bs.up) { + set_insert(up_backends, "${bs.ip}:${bs.port}") + } + }; + up_backends + }, + var actions = if (set_is_empty(up_backends)) { + "drop;" + } else { + match (has_force_snat_ip(r.lr, "lb")) { + true -> "flags.force_snat_for_lb = 1; ", + false -> "" + } ++ ct_lb(string_join(set_to_vec(up_backends), ","), lb.selection_fields, + lb.protocol) + }. 
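The rule above either drops new connections (when no health-checked backend is up) or load-balances across the live backends, optionally forcing SNAT. A minimal Python sketch of that action-selection logic, purely illustrative and not part of the patch (the function name and tuple layout are assumptions; the real DDlog `ct_lb()` helper also handles selection fields and protocol):

```python
def build_lb_dnat_action(backends, force_snat):
    """backends: list of (ip, port, up) tuples, mirroring LBVIPBackendStatus.
    Returns the OVN logical-flow action string for the ct.new DNAT flow."""
    up = ["%s:%d" % (ip, port) for ip, port, is_up in backends if is_up]
    if not up:
        # No healthy backend: drop new connections to the VIP.
        return "drop;"
    # Mirror the has_force_snat_ip(lr, "lb") branch in the rule above.
    prefix = "flags.force_snat_for_lb = 1; " if force_snat else ""
    return prefix + "ct_lb(" + ",".join(up) + ");"

# Only the backend reported "up" by the service monitor is load-balanced:
print(build_lb_dnat_action([("10.0.0.2", 80, True),
                            ("10.0.0.3", 80, False)], False))
# -> ct_lb(10.0.0.2:80);
```

Since `up_backends` in the DDlog rule is a set, backend order in the real `ct_lb` argument is not guaranteed; the sketch uses list order only for readability.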
+Flow(.logical_datapath = r.lr._uuid, + .stage = router_stage(IN, DNAT), + .priority = priority, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lb._uuid)) :- + r in &Router(), + is_some(r.l3dgw_port) or r.is_gateway, + LBVIPBackend[lbvipbackend], + None = lbvipbackend.svc_monitor, + var lbvip = lbvipbackend.lbvip, + var lb = lbvip.lb, + set_contains(r.lr.load_balancer, lb._uuid), + var __match + = "ct.new && " ++ + get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, true) ++ + match (r.l3dgw_port) { + Some{gwport} -> " && is_chassis_resident(${r.redirect_port_name})", + _ -> "" + }, + var priority = if (lbvip.vip_port != 0) 120 else 110, + var actions = ct_lb(lbvip.backend_ips, lb.selection_fields, lb.protocol). + + +/* Defaults based on MaxRtrInterval and MinRtrInterval from RFC 4861 section + * 6.2.1 + */ +function nD_RA_MAX_INTERVAL_DEFAULT(): integer = 600 + +function nd_ra_min_interval_default(max: integer): integer = +{ + if (max >= 9) { max / 3 } else { max * 3 / 4 } +} + +function nD_RA_MAX_INTERVAL_MAX(): integer = 1800 +function nD_RA_MAX_INTERVAL_MIN(): integer = 4 + +function nD_RA_MIN_INTERVAL_MAX(max: integer): integer = ((max * 3) / 4) +function nD_RA_MIN_INTERVAL_MIN(): integer = 3 + +function nD_MTU_DEFAULT(): integer = 0 + +function copy_ra_to_sb(port: RouterPort, address_mode: string): Map<string, string> = +{ + var options = port.sb_options; + + map_insert(options, "ipv6_ra_send_periodic", "true"); + map_insert(options, "ipv6_ra_address_mode", address_mode); + + var max_interval = map_get_int_def(port.lrp.ipv6_ra_configs, "max_interval", + nD_RA_MAX_INTERVAL_DEFAULT()); + + if (max_interval > nD_RA_MAX_INTERVAL_MAX()) { + max_interval = nD_RA_MAX_INTERVAL_MAX() + } else (); + + if (max_interval < nD_RA_MAX_INTERVAL_MIN()) { + max_interval = nD_RA_MAX_INTERVAL_MIN() + } else (); + + map_insert(options, "ipv6_ra_max_interval", "${max_interval}"); + + var min_interval = 
map_get_int_def(port.lrp.ipv6_ra_configs, + "min_interval", nd_ra_min_interval_default(max_interval)); + + if (min_interval > nD_RA_MIN_INTERVAL_MAX(max_interval)) { + min_interval = nD_RA_MIN_INTERVAL_MAX(max_interval) + } else (); + + if (min_interval < nD_RA_MIN_INTERVAL_MIN()) { + min_interval = nD_RA_MIN_INTERVAL_MIN() + } else (); + + map_insert(options, "ipv6_ra_min_interval", "${min_interval}"); + + var mtu = map_get_int_def(port.lrp.ipv6_ra_configs, "mtu", nD_MTU_DEFAULT()); + + /* RFC 2460 requires the MTU for IPv6 to be at least 1280 */ + if (mtu != 0 and mtu >= 1280) { + map_insert(options, "ipv6_ra_mtu", "${mtu}") + } else (); + + var prefixes = vec_empty(); + for (addrs in port.networks.ipv6_addrs) { + if (ipv6_netaddr_is_lla(addrs)) { + map_insert(options, "ipv6_ra_src_addr", "${addrs.addr}") + } else { + vec_push(prefixes, ipv6_netaddr_match_network(addrs)) + } + }; + match (map_get(port.sb_options, "ipv6_ra_pd_list")) { + Some{value} -> vec_push(prefixes, value), + _ -> () + }; + map_insert(options, "ipv6_ra_prefixes", string_join(prefixes, " ")); + + match (map_get(port.lrp.ipv6_ra_configs, "rdnss")) { + Some{value} -> map_insert(options, "ipv6_ra_rdnss", value), + _ -> () + }; + + match (map_get(port.lrp.ipv6_ra_configs, "dnssl")) { + Some{value} -> map_insert(options, "ipv6_ra_dnssl", value), + _ -> () + }; + + map_insert(options, "ipv6_ra_src_eth", "${port.networks.ea}"); + + var prf = match (map_get(port.lrp.ipv6_ra_configs, "router_preference")) { + Some{prf} -> if (prf == "HIGH" or prf == "LOW") prf else "MEDIUM", + _ -> "MEDIUM" + }; + map_insert(options, "ipv6_ra_prf", prf); + + match (map_get(port.lrp.ipv6_ra_configs, "route_info")) { + Some{s} -> map_insert(options, "ipv6_ra_route_info", s), + _ -> () + }; + + options +} + +/* Logical router ingress table ND_RA_OPTIONS and ND_RA_RESPONSE: IPv6 Router + * Adv (RA) options and response. */ +// FIXME: do these rules apply to derived ports? 
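The interval handling in `copy_ra_to_sb()` above clamps the configured `max_interval` to [4, 1800] and derives/clamps `min_interval` per RFC 4861 section 6.2.1. An illustrative Python sketch of just that clamping (function names are made up for the example; constants follow the `nD_RA_*` definitions in the patch):

```python
# Constants from the patch: nD_RA_MAX_INTERVAL_{DEFAULT,MAX,MIN} and
# nD_RA_MIN_INTERVAL_MIN.
RA_MAX_INTERVAL_MAX = 1800
RA_MAX_INTERVAL_MIN = 4
RA_MIN_INTERVAL_MIN = 3

def ra_min_interval_default(max_interval):
    # RFC 4861: MaxRtrAdvInterval/3 when max >= 9, else 0.75 * max.
    return max_interval // 3 if max_interval >= 9 else max_interval * 3 // 4

def clamp_ra_intervals(cfg_max, cfg_min=None):
    """Return (max_interval, min_interval) after the clamping done above."""
    max_i = min(max(cfg_max, RA_MAX_INTERVAL_MIN), RA_MAX_INTERVAL_MAX)
    min_i = cfg_min if cfg_min is not None else ra_min_interval_default(max_i)
    # min_interval is bounded above by (max * 3) / 4 and below by 3.
    min_i = min(max(min_i, RA_MIN_INTERVAL_MIN), (max_i * 3) // 4)
    return max_i, min_i

print(clamp_ra_intervals(600))   # defaults -> (600, 200)
print(clamp_ra_intervals(5000))  # over-large max clamped -> (1800, 600)
```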
+for (&RouterPort[port@RouterPort{.lrp = lrp@nb::Logical_Router_Port{.peer = None},
+ .router = &router,
+ .json_name = json_name,
+ .networks = networks,
+ .peer = PeerSwitch{}}]
+ if (not vec_is_empty(networks.ipv6_addrs)))
+{
+ Some{var address_mode} = map_get(lrp.ipv6_ra_configs, "address_mode") in
+ /* FIXME: we need a nicer way to write this */
+ true ==
+ if ((address_mode != "slaac") and
+ (address_mode != "dhcpv6_stateful") and
+ (address_mode != "dhcpv6_stateless")) {
+ warn("Invalid address mode [${address_mode}] defined");
+ false
+ } else { true } in
+ {
+ if (map_get_bool_def(lrp.ipv6_ra_configs, "send_periodic", false)) {
+ RouterPortRAOptions(lrp._uuid, copy_ra_to_sb(port, address_mode))
+ };
+
+ (true, var prefix) =
+ {
+ var add_rs_response_flow = false;
+ var prefix = "";
+ for (addr in networks.ipv6_addrs) {
+ if (not ipv6_netaddr_is_lla(addr)) {
+ prefix = prefix ++ ", prefix = ${ipv6_netaddr_match_network(addr)}";
+ add_rs_response_flow = true
+ } else ()
+ };
+ (add_rs_response_flow, prefix)
+ } in
+ {
+ var __match = "inport == ${json_name} && ip6.dst == ff02::2 && nd_rs" in
+ /* As per RFC 2460, 1280 is minimum IPv6 MTU.
*/ + var mtu = match(map_get(lrp.ipv6_ra_configs, "mtu")) { + Some{mtu_s} -> { + match (str_to_int(mtu_s, 10)) { + None -> 0, + Some{mtu} -> if (mtu >= 1280) mtu else 0 + } + }, + None -> 0 + } in + var actions0 = + "${rEGBIT_ND_RA_OPTS_RESULT()} = put_nd_ra_opts(" + "addr_mode = ${json_string_escape(address_mode)}, " + "slla = ${networks.ea}" ++ + if (mtu > 0) { ", mtu = ${mtu}" } else { "" } in + var router_preference = match (map_get(lrp.ipv6_ra_configs, "router_preference")) { + Some{"MEDIUM"} -> "", + None -> "", + Some{prf} -> ", router_preference = \"${prf}\"" + } in + var actions = actions0 ++ router_preference ++ prefix ++ "); next;" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, ND_RA_OPTIONS), + .priority = 50, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lrp._uuid)); + + var __match = "inport == ${json_name} && ip6.dst == ff02::2 && " + "nd_ra && ${rEGBIT_ND_RA_OPTS_RESULT()}" in + var ip6_str = ipv6_string_mapped(in6_generate_lla(networks.ea)) in + var actions = "eth.dst = eth.src; eth.src = ${networks.ea}; " + "ip6.dst = ip6.src; ip6.src = ${ip6_str}; " + "outport = inport; flags.loopback = 1; " + "output;" in + Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, ND_RA_RESPONSE), + .priority = 50, + .__match = __match, + .actions = actions, + .external_ids = stage_hint(lrp._uuid)) + } + } +} + + +/* Logical router ingress table ND_RA_OPTIONS, ND_RA_RESPONSE: RS responder, by + * default goto next. (priority 0)*/ +for (&Router(.lr = lr)) +{ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ND_RA_OPTIONS), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ND_RA_RESPONSE), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +/* Proxy table that stores per-port routes. 
+ * These routes get converted into logical flows by
+ * the following rule.
+ */
+relation Route(key: route_key, // matching criteria
+ port: Ref<RouterPort>, // output port
+ src_ip: v46_ip, // source IP address for output
+ gateway: Option<v46_ip>) // next hop (unless being delivered)
+
+function build_route_match(key: route_key) : (string, bit<32>) =
+{
+ var ipX = ip46_ipX(key.ip_prefix);
+
+ (var dir, var priority) = match (key.policy) {
+ SrcIp -> ("src", key.plen * 2),
+ DstIp -> ("dst", (key.plen * 2) + 1)
+ };
+
+ var network = ip46_get_network(key.ip_prefix, key.plen);
+ var __match = "${ipX}.${dir} == ${network}/${key.plen}";
+
+ (__match, priority)
+}
+for (Route(.port = port,
+ .key = key,
+ .src_ip = src_ip,
+ .gateway = gateway))
+{
+ var ipX = ip46_ipX(key.ip_prefix) in
+ var xx = ip46_xxreg(key.ip_prefix) in
+ /* IPv6 link-local addresses must be scoped to the local router port. */
+ var inport_match = match (key.ip_prefix) {
+ IPv6{prefix} -> if (in6_is_lla(prefix)) {
+ "inport == ${port.json_name} && "
+ } else "",
+ _ -> ""
+ } in
+ (var ip_match, var priority) = build_route_match(key) in
+ var __match = inport_match ++ ip_match in
+ var nexthop = match (gateway) {
+ Some{gw} -> "${gw}",
+ None -> "${ipX}.dst"
+ } in
+ var actions =
+ "ip.ttl--; "
+ "${rEG_ECMP_GROUP_ID()} = 0; "
+ "${xx}${rEG_NEXT_HOP()} = ${nexthop}; "
+ "${xx}${rEG_SRC()} = ${src_ip}; "
+ "eth.src = ${port.networks.ea}; "
+ "outport = ${port.json_name}; "
+ "flags.loopback = 1; "
+ "next;" in
+ /* The priority here is calculated to implement longest-prefix-match
+ * routing. */
+ Flow(.logical_datapath = port.router.lr._uuid,
+ .stage = router_stage(IN, IP_ROUTING),
+ .priority = 32'd0 ++ priority,
+ .__match = __match,
+ .actions = actions,
+ .external_ids = stage_hint(port.lrp._uuid))
+}
+
+/* Logical router ingress table IP_ROUTING & IP_ROUTING_ECMP: IP Routing.
+ * + * A packet that arrives at this table is an IP packet that should be + * routed to the address in 'ip[46].dst'. + * + * For regular routes without ECMP, table IP_ROUTING sets outport to the + * correct output port, eth.src to the output port's MAC address, and + * '[xx]${rEG_NEXT_HOP()}' to the next-hop IP address (leaving 'ip[46].dst', the + * packet’s final destination, unchanged), and advances to the next table. + * + * For ECMP routes, i.e. multiple routes with same policy and prefix, table + * IP_ROUTING remembers ECMP group id and selects a member id, and advances + * to table IP_ROUTING_ECMP, which sets outport, eth.src, and the appropriate + * next-hop register for the selected ECMP member. + * */ +Route(key, port, src_ip, None) :- + RouterPortNetworksIPv4Addr(.port = port, .addr = addr), + var key = RouteKey{DstIp, IPv4{addr.addr}, addr.plen}, + var src_ip = IPv4{addr.addr}. + +Route(key, port, src_ip, None) :- + RouterPortNetworksIPv6Addr(.port = port, .addr = addr), + var key = RouteKey{DstIp, IPv6{addr.addr}, addr.plen}, + var src_ip = IPv6{addr.addr}. + +Flow(.logical_datapath = r.lr._uuid, + .stage = router_stage(IN, IP_ROUTING_ECMP), + .priority = 150, + .__match = "${rEG_ECMP_GROUP_ID()} == 0", + .actions = "next;", + .external_ids = map_empty()) :- + r in &Router(). + +/* Convert the static routes to flows. */ +Route(key, dst.port, dst.src_ip, Some{dst.nexthop}) :- + RouterStaticRoute(.router = &router, .key = key, .dsts = dsts), + set_size(dsts) == 1, + Some{var dst} = set_nth(dsts, 0). + +/* Return a vector of pairs (1, set[0]), ... (n, set[n - 1]). 
*/ +function numbered_vec(set: Set<'A>) : Vec<(bit<16>, 'A)> = { + var vec = vec_with_capacity(set_size(set)); + var i = 1; + for (x in set) { + vec_push(vec, (i, x)); + i = i + 1 + }; + vec +} + +relation EcmpGroup( + group_id: bit<16>, + router: Ref<Router>, + key: route_key, + dsts: Set<route_dst>, + route_match: string, // This is build_route_match(key).0 + route_priority: integer) // This is build_route_match(key).1 + +EcmpGroup(group_id, router, key, dsts, route_match, route_priority) :- + r in RouterStaticRoute(.router = router, .key = key, .dsts = dsts), + set_size(dsts) > 1, + var groups = (router, key, dsts).group_by(()).to_set(), + var group_id_and_group = FlatMap(numbered_vec(groups)), + (var group_id, (var router, var key, var dsts)) = group_id_and_group, + (var route_match, var route_priority0) = build_route_match(key), + var route_priority = route_priority0 as integer. + +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_ROUTING), + .priority = route_priority, + .__match = route_match, + .actions = actions, + .external_ids = map_empty()) :- + EcmpGroup(group_id, router, key, dsts, route_match, route_priority), + var all_member_ids = { + var member_ids = vec_with_capacity(set_size(dsts)); + for (i in range_vec(1, set_size(dsts)+1, 1)) { + vec_push(member_ids, "${i}") + }; + string_join(member_ids, ", ") + }, + var actions = + "ip.ttl--; " + "flags.loopback = 1; " + "${rEG_ECMP_GROUP_ID()} = ${group_id}; " /* XXX */ + "${rEG_ECMP_MEMBER_ID()} = select(${all_member_ids});". 
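The `numbered_vec()` helper above is what keeps the `select(...)` action in IP_ROUTING and the per-member flows in IP_ROUTING_ECMP in agreement: both number the group's members 1..n in the same order. A small illustrative Python equivalent (the surrounding action string is a simplified assumption, not the full DDlog output):

```python
def numbered_vec(items):
    """Return [(1, items[0]), ..., (n, items[n-1])], like numbered_vec() above."""
    return [(i + 1, x) for i, x in enumerate(items)]

# Number the ECMP next hops, then build the member-id list for select().
dsts = ["192.168.1.1", "192.168.1.2", "192.168.1.3"]
members = numbered_vec(dsts)
select_ids = ", ".join(str(i) for i, _ in members)
print("reg8[16..31] = select(%s);" % select_ids)
# Each (member_id, dst) pair then yields one IP_ROUTING_ECMP flow matching
# on that member_id and setting the next hop to dst.
```

Note that DDlog sets are unordered, so the member-id assignment is stable for a given set but not meaningful beyond tying the two stages together.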
+ +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_ROUTING_ECMP), + .priority = 100, + .__match = __match, + .actions = actions, + .external_ids = map_empty()) :- + EcmpGroup(group_id, router, key, dsts, _, _), + var member_id_and_dst = FlatMap(numbered_vec(dsts)), + (var member_id, var dst) = member_id_and_dst, + var xx = ip46_xxreg(dst.nexthop), + var __match = "${rEG_ECMP_GROUP_ID()} == ${group_id} && " + "${rEG_ECMP_MEMBER_ID()} == ${member_id}", + var actions = "${xx}${rEG_NEXT_HOP()} = ${dst.nexthop}; " + "${xx}${rEG_SRC()} = ${dst.src_ip}; " + "eth.src = ${dst.port.networks.ea}; " + "outport = ${dst.port.json_name}; " + "next;". + +/* If symmetric ECMP replies are enabled, then packets that arrive over + * an ECMP route need to go through conntrack. + */ +relation EcmpSymmetricReply( + router: Ref<Router>, + dst: route_dst, + route_match: string, + tunkey: integer) +EcmpSymmetricReply(router, dst, route_match, tunkey) :- + EcmpGroup(.router = router, .dsts = dsts, .route_match = route_match), + router.is_gateway, + var dst = FlatMap(dsts), + dst.ecmp_symmetric_reply, + PortTunKeyAllocation(.port = dst.port.lrp._uuid, .tunkey = tunkey). + +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, DEFRAG), + .priority = 100, + .__match = __match, + .actions = "ct_next;", + .external_ids = map_empty()) :- + EcmpSymmetricReply(router, dst, route_match, _), + var __match = "inport == ${dst.port.json_name} && ${route_match}". + +/* And packets that go out over an ECMP route need conntrack. + XXX this seems to exactly duplicate the above flow? */ + +/* Save src eth and inport in ct_label for packets that arrive over + * an ECMP route. 
+ */ +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, ECMP_STATEFUL), + .priority = 100, + .__match = __match, + .actions = actions, + .external_ids = map_empty()) :- + EcmpSymmetricReply(router, dst, route_match, tunkey), + var __match = "inport == ${dst.port.json_name} && ${route_match} && " + "(ct.new && !ct.est)", + var actions = "ct_commit { ct_label.ecmp_reply_eth = eth.src;" + " ct_label.ecmp_reply_port = ${tunkey};}; next;". + +/* Bypass ECMP selection if we already have ct_label information + * for where to route the packet. + */ +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_ROUTING), + .priority = 100, + .__match = "${ecmp_reply} && ${route_match}", + .actions = "ip.ttl--; " + "flags.loopback = 1; " + "eth.src = ${dst.port.networks.ea}; " + "${xx}reg1 = ${dst.src_ip}; " + "outport = ${dst.port.json_name}; " + "next;", + .external_ids = map_empty()), +/* Egress reply traffic for symmetric ECMP routes skips router policies. */ +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, POLICY), + .priority = 65535, + .__match = ecmp_reply, + .actions = "next;", + .external_ids = map_empty()), +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 200, + .__match = ecmp_reply, + .actions = "eth.dst = ct_label.ecmp_reply_eth; next;", + .external_ids = map_empty()) :- + EcmpSymmetricReply(router, dst, route_match, tunkey), + var ecmp_reply = "ct.rpl && ct_label.ecmp_reply_port == ${tunkey}", + var xx = ip46_xxreg(dst.nexthop). + + +/* IP Multicast lookup. Here we set the output port, adjust TTL and advance + * to next table (priority 500). + */ +/* Drop IPv6 multicast traffic that shouldn't be forwarded, + * i.e., router solicitation and router advertisement. 
+ */ +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, IP_ROUTING), + .priority = 550, + .__match = "nd_rs || nd_ra", + .actions = "drop;", + .external_ids = map_empty()) :- + router in &Router(). + +for (IgmpRouterMulticastGroup(address, &rtr, ports)) { + for (RouterMcastFloodPorts(&rtr, flood_ports) if rtr.mcast_cfg.relay) { + var flood_static = not set_is_empty(flood_ports) in + var mc_static = json_string_escape(mC_STATIC().0) in + var static_act = { + if (flood_static) { + "clone { " + "outport = ${mc_static}; " + "ip.ttl--; " + "next; " + "};" + } else { + "" + } + } in + Some{var ip} = ip46_parse(address) in + var ipX = ip46_ipX(ip) in + Flow(.logical_datapath = rtr.lr._uuid, + .stage = router_stage(IN, IP_ROUTING), + .priority = 500, + .__match = "${ipX} && ${ipX}.dst == ${address}", + .actions = + "${static_act} outport = ${json_string_escape(address)}; " + "ip.ttl--; next;", + .external_ids = map_empty()) + } +} + +/* If needed, flood unregistered multicast on statically configured ports. + * Priority 450. Otherwise drop any multicast traffic. + */ +for (RouterMcastFloodPorts(&rtr, flood_ports) if rtr.mcast_cfg.relay) { + var mc_static = json_string_escape(mC_STATIC().0) in + var flood_static = not set_is_empty(flood_ports) in + var actions = if (flood_static) { + "clone { " + "outport = ${mc_static}; " + "ip.ttl--; " + "next; " + "};" + } else { + "drop;" + } in + Flow(.logical_datapath = rtr.lr._uuid, + .stage = router_stage(IN, IP_ROUTING), + .priority = 450, + .__match = "ip4.mcast || ip6.mcast", + .actions = actions, + .external_ids = map_empty()) +} + +/* Logical router ingress table POLICY: Policy. + * + * A packet that arrives at this table is an IP packet that should be + * permitted/denied/rerouted to the address in the rule's nexthop. 
+ * This table sets outport to the correct out_port, + * eth.src to the output port's MAC address, + * the appropriate register to the next-hop IP address (leaving + * 'ip[46].dst', the packet's final destination, unchanged), and + * advances to the next table for ARP/ND resolution. */ +for (&Router(.lr = lr)) { + /* This is a catch-all rule. It has the lowest priority (0), + * matches every packet ("1"), and simply passes it through (next). */ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, POLICY), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +function stage_hint(_uuid: uuid): Map<string,string> = { + ["stage-hint" -> "${hex(_uuid[127:96])}"] +} + + +/* Convert routing policies to flows. */ +function pkt_mark_policy(options: Map<string,string>): string { + var pkt_mark = map_get_uint_def(options, "pkt_mark", 0); + if (pkt_mark > 0) { + "pkt.mark = ${pkt_mark}; " + } else { + "" + } +} +Flow(.logical_datapath = r.lr._uuid, + .stage = router_stage(IN, POLICY), + .priority = policy.priority, + .__match = policy.__match, + .actions = actions, + .external_ids = stage_hint(policy._uuid)) :- + r in &Router(), + var policy_uuid = FlatMap(r.lr.policies), + policy in nb::Logical_Router_Policy(._uuid = policy_uuid), + policy.action == "reroute", + out_port in &RouterPort(.router = r), + Some{var nexthop_s} = policy.nexthop, + Some{var nexthop} = ip46_parse(nexthop_s), + Some{var src_ip} = find_lrp_member_ip(out_port.networks, nexthop), + /* + None: + VLOG_WARN_RL(&rl, "lrp_addr not found for routing policy " + " priority %"PRId64" nexthop %s", + rule->priority, rule->nexthop); + */ + var xx = ip46_xxreg(src_ip), + var actions = (pkt_mark_policy(policy.options) ++ + "${xx}${rEG_NEXT_HOP()} = ${nexthop}; " + "${xx}${rEG_SRC()} = ${src_ip}; " + "eth.src = ${out_port.networks.ea}; " + "outport = ${out_port.json_name}; " + "flags.loopback = 1; " + "next;").
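The `pkt_mark_policy()` helper only emits an action fragment when the routing policy carries a positive `pkt_mark` option (`map_get_uint_def` treats a missing value as the default). A Python sketch of that behavior:

```python
def pkt_mark_policy(options):
    """Prefix for reroute/allow actions: set pkt.mark only when the
    routing policy's options carry a positive "pkt_mark" value."""
    try:
        pkt_mark = int(options.get("pkt_mark", "0"))
    except ValueError:
        # Unparsable values fall back to the default of 0.
        pkt_mark = 0
    return f"pkt.mark = {pkt_mark}; " if pkt_mark > 0 else ""
```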
+Flow(.logical_datapath = r.lr._uuid, + .stage = router_stage(IN, POLICY), + .priority = policy.priority, + .__match = policy.__match, + .actions = "drop;", + .external_ids = stage_hint(policy._uuid)) :- + r in &Router(), + var policy_uuid = FlatMap(r.lr.policies), + policy in nb::Logical_Router_Policy(._uuid = policy_uuid), + policy.action == "drop". +Flow(.logical_datapath = r.lr._uuid, + .stage = router_stage(IN, POLICY), + .priority = policy.priority, + .__match = policy.__match, + .actions = pkt_mark_policy(policy.options) ++ "next;", + .external_ids = stage_hint(policy._uuid)) :- + r in &Router(), + var policy_uuid = FlatMap(r.lr.policies), + policy in nb::Logical_Router_Policy(._uuid = policy_uuid), + policy.action == "allow". + +/* XXX destination unreachable */ + +/* Local router ingress table ARP_RESOLVE: ARP Resolution. + * + * Multicast packets already have the outport set so just advance to next + * table (priority 500). + */ +for (&Router(.lr = lr)) { + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 500, + .__match = "ip4.mcast || ip6.mcast", + .actions = "next;", + .external_ids = map_empty()) +} + +/* Local router ingress table ARP_RESOLVE: ARP Resolution. + * + * Any packet that reaches this table is an IP packet whose next-hop IP + * address is in the next-hop register. (ip4.dst is the final destination.) This table + * resolves the IP address in the next-hop register into an output port in outport and an + * Ethernet address in eth.dst. */ +// FIXME: does this apply to redirect ports? +for (rp in &RouterPort(.peer = PeerRouter{peer_port, _}, + .router = &router, + .networks = networks)) +{ + for (&RouterPort(.lrp = nb::Logical_Router_Port{._uuid = peer_port}, + .json_name = peer_json_name, + .router = &peer_router)) + { + /* This is a logical router port. 
If the next-hop IP address in + * the next-hop register matches the IP address of this router port, then + * the packet is intended to eventually be sent to this + * logical port. Set the destination MAC address using this + * port's MAC address. + * + * The packet is still in the peer's logical pipeline, so the match + * should be on the peer's outport. */ + if (not vec_is_empty(networks.ipv4_addrs)) { + var __match = "outport == ${peer_json_name} && " + "${rEG_NEXT_HOP()} == " ++ + format_v4_networks(networks, false) in + Flow(.logical_datapath = peer_router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = __match, + .actions = "eth.dst = ${networks.ea}; next;", + .external_ids = stage_hint(rp.lrp._uuid)) + }; + + if (not vec_is_empty(networks.ipv6_addrs)) { + var __match = "outport == ${peer_json_name} && " + "xx${rEG_NEXT_HOP()} == " ++ + format_v6_networks(networks) in + Flow(.logical_datapath = peer_router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = __match, + .actions = "eth.dst = ${networks.ea}; next;", + .external_ids = stage_hint(rp.lrp._uuid)) + } + } +} + +/* The packet is on a non-gateway chassis and + * has an unresolved ARP for a network behind a + * gateway-chassis-attached router port. Since the redirect type + * is "bridged", instead of calling "get_arp" + * on this node, we redirect the packet to the gateway + * chassis by setting the destination MAC to the router port's MAC. */ +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 50, + .__match = "outport == ${rp.json_name} && " + "!is_chassis_resident(${router.redirect_port_name})", + .actions = "eth.dst = ${rp.networks.ea}; next;", + .external_ids = stage_hint(lrp._uuid)) :- + rp in &RouterPort(.lrp = lrp, .router = router), + router.redirect_port_name != "", + Some{"bridged"} = map_get(lrp.options, "redirect-type"). + + +/* Drop IP traffic destined to router owned IPs.
Part of it is dropped + * in stage "lr_in_ip_input" but traffic that could have been unSNATed + * but didn't match any existing session might still end up here. + * + * Priority 1. + */ +Flow(.logical_datapath = lr_uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 1, + .__match = "ip4.dst == {" ++ match_ips.join(", ") ++ "}", + .actions = "drop;", + .external_ids = stage_hint(lrp_uuid)) :- + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, + .router = &Router{.snat_ips = snat_ips, + .lr = nb::Logical_Router{._uuid = lr_uuid}}, + .networks = networks), + var addr = FlatMap(networks.ipv4_addrs), + snat_ips.contains_key(IPv4{addr.addr}), + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). +Flow(.logical_datapath = lr_uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 1, + .__match = "ip6.dst == {" ++ match_ips.join(", ") ++ "}", + .actions = "drop;", + .external_ids = stage_hint(lrp_uuid)) :- + &RouterPort(.lrp = nb::Logical_Router_Port{._uuid = lrp_uuid}, + .router = &Router{.snat_ips = snat_ips, + .lr = nb::Logical_Router{._uuid = lr_uuid}}, + .networks = networks), + var addr = FlatMap(networks.ipv6_addrs), + snat_ips.contains_key(IPv6{addr.addr}), + var match_ips = "${addr.addr}".group_by((lr_uuid, lrp_uuid)).to_vec(). + +/* This is a logical switch port that backs a VM or a container. + * Extract its addresses. For each address, go through all + * the router ports attached to the switch (to which this port + * connects), and if the address in question is reachable from the + * router port, add an ARP/ND entry in that router's pipeline.
*/ +for (SwitchPortIPv4Address( + .port = &SwitchPort{.lsp = lsp, .sw = &sw}, + .ea = ea, + .addr = addr) + if lsp.__type != "router" and lsp.__type != "virtual" and lsp.is_enabled()) +{ + for (&SwitchPort(.sw = &Switch{.ls = nb::Logical_Switch{._uuid = sw.ls._uuid}}, + .peer = Some{&peer@RouterPort{.router = &peer_router}})) + { + Some{_} = find_lrp_member_ip(peer.networks, IPv4{addr.addr}) in + Flow(.logical_datapath = peer_router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = "outport == ${peer.json_name} && " + "${rEG_NEXT_HOP()} == ${addr.addr}", + .actions = "eth.dst = ${ea}; next;", + .external_ids = stage_hint(lsp._uuid)) + } +} + +for (SwitchPortIPv6Address( + .port = &SwitchPort{.lsp = lsp, .sw = &sw}, + .ea = ea, + .addr = addr) + if lsp.__type != "router" and lsp.__type != "virtual" and lsp.is_enabled()) +{ + for (&SwitchPort(.sw = &Switch{.ls = nb::Logical_Switch{._uuid = sw.ls._uuid}}, + .peer = Some{&peer@RouterPort{.router = &peer_router}})) + { + Some{_} = find_lrp_member_ip(peer.networks, IPv6{addr.addr}) in + Flow(.logical_datapath = peer_router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = "outport == ${peer.json_name} && " + "xx${rEG_NEXT_HOP()} == ${addr.addr}", + .actions = "eth.dst = ${ea}; next;", + .external_ids = stage_hint(lsp._uuid)) + } +} + +/* True if 's' is an empty set or a set that contains just an empty string, + * false otherwise. + * + * This is meant for sets of 0 or 1 elements, like the OVSDB integration + * with DDlog uses. */ +function is_empty_set_or_string(s: Option<string>): bool = { + match (s) { + None -> true, + Some{""} -> true, + _ -> false + } +} + +/* This is a virtual port. Add ARP replies for the virtual ip with + * the mac of the present active virtual parent. 
+ * If the logical port doesn't have virtual parent set in + * Port_Binding table, then add the flow to set eth.dst to + * 00:00:00:00:00:00 and advance to next table so that ARP is + * resolved by router pipeline using the arp{} action. + * The MAC_Binding entry for the virtual ip might be invalid. */ +Flow(.logical_datapath = peer.router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = "outport == ${peer.json_name} && " + "${rEG_NEXT_HOP()} == ${virtual_ip}", + .actions = "eth.dst = 00:00:00:00:00:00; next;", + .external_ids = stage_hint(sp.lsp._uuid)) :- + sp in &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "virtual"}), + Some{var virtual_ip_s} = map_get(lsp.options, "virtual-ip"), + Some{var virtual_parents} = map_get(lsp.options, "virtual-parents"), + Some{var virtual_ip} = ip_parse(virtual_ip_s), + pb in sb::Port_Binding(.logical_port = sp.lsp.name), + is_empty_set_or_string(pb.virtual_parent) or is_none(pb.chassis), + sp2 in &SwitchPort(.sw = sp.sw, .peer = Some{peer}), + Some{_} = find_lrp_member_ip(peer.networks, IPv4{virtual_ip}). 
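`is_empty_set_or_string()` models OVSDB's 0-or-1-element string sets, which the DDlog bindings expose as `Option<string>`; both "column absent" and "column holds the empty string" count as empty. In Python terms:

```python
def is_empty_set_or_string(s):
    """True for an absent value (None) or a set whose single element is
    the empty string; False for any real value."""
    return s is None or s == ""
```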
+Flow(.logical_datapath = peer.router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = "outport == ${peer.json_name} && " + "${rEG_NEXT_HOP()} == ${virtual_ip}", + .actions = "eth.dst = ${address.ea}; next;", + .external_ids = stage_hint(sp.lsp._uuid)) :- + sp in &SwitchPort(.lsp = lsp@nb::Logical_Switch_Port{.__type = "virtual"}), + Some{var virtual_ip_s} = map_get(lsp.options, "virtual-ip"), + Some{var virtual_parents} = map_get(lsp.options, "virtual-parents"), + Some{var virtual_ip} = ip_parse(virtual_ip_s), + pb in sb::Port_Binding(.logical_port = sp.lsp.name), + not (is_empty_set_or_string(pb.virtual_parent) or is_none(pb.chassis)), + Some{var virtual_parent} = pb.virtual_parent, + vp in &SwitchPort(.lsp = nb::Logical_Switch_Port{.name = virtual_parent}), + var address = FlatMap(vp.static_addresses), + sp2 in &SwitchPort(.sw = sp.sw, .peer = Some{peer}), + Some{_} = find_lrp_member_ip(peer.networks, IPv4{virtual_ip}). + +/* This is a logical switch port that connects to a router. */ + +/* The peer of this switch port is the router port for which + * we need to add logical flows such that it can resolve + * ARP entries for all the other router ports connected to + * the switch in question. */ +for (&SwitchPort(.lsp = lsp1, + .peer = Some{&peer1@RouterPort{.router = &peer_router}}, + .sw = &sw) + if lsp1.is_enabled() and + not map_get_bool_def(peer_router.lr.options, "dynamic_neigh_routers", false)) +{ + for (&SwitchPort(.lsp = lsp2, .peer = Some{&peer2}, + .sw = &Switch{.ls = nb::Logical_Switch{._uuid = sw.ls._uuid}}) + /* Skip the router port under consideration. 
*/ + if peer2.lrp._uuid != peer1.lrp._uuid) + { + if (not vec_is_empty(peer2.networks.ipv4_addrs)) { + Flow(.logical_datapath = peer_router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = "outport == ${peer1.json_name} && " + "${rEG_NEXT_HOP()} == ${format_v4_networks(peer2.networks, false)}", + .actions = "eth.dst = ${peer2.networks.ea}; next;", + .external_ids = stage_hint(lsp1._uuid)) + }; + + if (not vec_is_empty(peer2.networks.ipv6_addrs)) { + Flow(.logical_datapath = peer_router.lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 100, + .__match = "outport == ${peer1.json_name} && " + "xx${rEG_NEXT_HOP()} == ${format_v6_networks(peer2.networks)}", + .actions = "eth.dst = ${peer2.networks.ea}; next;", + .external_ids = stage_hint(lsp1._uuid)) + } + } +} + +for (&Router(.lr = lr)) +{ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 0, + .__match = "ip4", + .actions = "get_arp(outport, ${rEG_NEXT_HOP()}); next;", + .external_ids = map_empty()); + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ARP_RESOLVE), + .priority = 0, + .__match = "ip6", + .actions = "get_nd(outport, xx${rEG_NEXT_HOP()}); next;", + .external_ids = map_empty()) +} + +/* Local router ingress table CHK_PKT_LEN: Check packet length. + * + * Any IPv4 packet with outport set to the distributed gateway + * router port, check the packet length and store the result in the + * 'REGBIT_PKT_LARGER' register bit. + * + * Local router ingress table LARGER_PKTS: Handle larger packets. + * + * Any IPv4 packet with outport set to the distributed gateway + * router port and the 'REGBIT_PKT_LARGER' register bit is set, + * generate ICMPv4 packet with type 3 (Destination Unreachable) and + * code 4 (Fragmentation needed). 
+ * */ +Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, CHK_PKT_LEN), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) :- + &Router(.lr = lr). +Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LARGER_PKTS), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) :- + &Router(.lr = lr). +Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, CHK_PKT_LEN), + .priority = 50, + .__match = "outport == ${l3dgw_port_json_name}", + .actions = "${rEGBIT_PKT_LARGER()} = check_pkt_larger(${mtu}); " + "next;", + .external_ids = stage_hint(l3dgw_port._uuid)) :- + r in &Router(.lr = lr), + Some{var l3dgw_port} = r.l3dgw_port, + var l3dgw_port_json_name = json_string_escape(l3dgw_port.name), + r.redirect_port_name != "", + var gw_mtu = map_get_int_def(l3dgw_port.options, "gateway_mtu", 0), + gw_mtu > 0, + var mtu = gw_mtu + vLAN_ETH_HEADER_LEN(). +Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LARGER_PKTS), + .priority = 50, + .__match = "inport == ${rp.json_name} && outport == ${l3dgw_port_json_name} && " + "ip4 && ${rEGBIT_PKT_LARGER()}", + .actions = "icmp4_error {" + "${rEGBIT_EGRESS_LOOPBACK()} = 1; " + "eth.dst = ${rp.networks.ea}; " + "ip4.dst = ip4.src; " + "ip4.src = ${first_ipv4.addr}; " + "ip.ttl = 255; " + "icmp4.type = 3; /* Destination Unreachable. */ " + "icmp4.code = 4; /* Frag Needed and DF was Set. */ " + /* Set icmp4.frag_mtu to gw_mtu */ + "icmp4.frag_mtu = ${gw_mtu}; " + "next(pipeline=ingress, table=0); " + "};", + .external_ids = stage_hint(rp.lrp._uuid)) :- + r in &Router(.lr = lr), + Some{var l3dgw_port} = r.l3dgw_port, + var l3dgw_port_json_name = json_string_escape(l3dgw_port.name), + r.redirect_port_name != "", + var gw_mtu = map_get_int_def(l3dgw_port.options, "gateway_mtu", 0), + gw_mtu > 0, + rp in &RouterPort(.router = r), + rp.lrp != l3dgw_port, + Some{var first_ipv4} = vec_nth(rp.networks.ipv4_addrs, 0). 
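The `check_pkt_larger()` threshold above is the configured `gateway_mtu` plus the VLAN Ethernet header length. A sketch, assuming `VLAN_ETH_HEADER_LEN` is 18 (a 14-byte Ethernet header plus a 4-byte 802.1Q tag, the usual OVN value; treat that constant as an assumption here):

```python
VLAN_ETH_HEADER_LEN = 18  # assumption: 14-byte Ethernet header + 4-byte 802.1Q tag

def check_pkt_len_threshold(options):
    """Return the size passed to check_pkt_larger(), or None when the
    gateway_mtu option is absent, unparsable, or non-positive (in which
    case no CHK_PKT_LEN flow is generated)."""
    try:
        gw_mtu = int(options.get("gateway_mtu", "0"))
    except ValueError:
        gw_mtu = 0
    return gw_mtu + VLAN_ETH_HEADER_LEN if gw_mtu > 0 else None
```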
+Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, LARGER_PKTS), + .priority = 50, + .__match = "inport == ${rp.json_name} && outport == ${l3dgw_port_json_name} && " + "ip6 && ${rEGBIT_PKT_LARGER()}", + .actions = "icmp6_error {" + "${rEGBIT_EGRESS_LOOPBACK()} = 1; " + "eth.dst = ${rp.networks.ea}; " + "ip6.dst = ip6.src; " + "ip6.src = ${first_ipv6.addr}; " + "ip.ttl = 255; " + "icmp6.type = 2; /* Packet Too Big. */ " + "icmp6.code = 0; " + /* Set icmp6.frag_mtu to gw_mtu */ + "icmp6.frag_mtu = ${gw_mtu}; " + "next(pipeline=ingress, table=0); " + "};", + .external_ids = stage_hint(rp.lrp._uuid)) :- + r in &Router(.lr = lr), + Some{var l3dgw_port} = r.l3dgw_port, + var l3dgw_port_json_name = json_string_escape(l3dgw_port.name), + r.redirect_port_name != "", + var gw_mtu = map_get_int_def(l3dgw_port.options, "gateway_mtu", 0), + gw_mtu > 0, + rp in &RouterPort(.router = r), + rp.lrp != l3dgw_port, + Some{var first_ipv6} = vec_nth(rp.networks.ipv6_addrs, 0). + +/* Logical router ingress table GW_REDIRECT: Gateway redirect. + * + * For traffic with outport equal to the l3dgw_port + * on a distributed router, this table redirects a subset + * of the traffic to the l3redirect_port which represents + * the central instance of the l3dgw_port. + */ +for (&Router(.lr = lr, + .l3dgw_port = l3dgw_port, + .redirect_port_name = redirect_port_name)) +{ + /* For traffic with outport == l3dgw_port, if the + * packet did not match any higher priority redirect + * rule, then the traffic is redirected to the central + * instance of the l3dgw_port. */ + Some{var gwport} = l3dgw_port in + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, GW_REDIRECT), + .priority = 50, + .__match = "outport == ${json_string_escape(gwport.name)}", + .actions = "outport = ${redirect_port_name}; next;", + .external_ids = stage_hint(gwport._uuid)); + + /* Packets are allowed by default. 
*/ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, GW_REDIRECT), + .priority = 0, + .__match = "1", + .actions = "next;", + .external_ids = map_empty()) +} + +/* Local router ingress table ARP_REQUEST: ARP request. + * + * In the common case where the Ethernet destination has been resolved, + * this table outputs the packet (priority 0). Otherwise, it composes + * and sends an ARP/IPv6 NA request (priority 100). */ +Flow(.logical_datapath = router.lr._uuid, + .stage = router_stage(IN, ARP_REQUEST), + .priority = 200, + .__match = __match, + .actions = actions, + .external_ids = map_empty()) :- + rsr in RouterStaticRoute(.router = &router), + var dst = FlatMap(rsr.dsts), + IPv6{var gw_ip6} = dst.nexthop, + var __match = "eth.dst == 00:00:00:00:00:00 && " + "ip6 && xx${rEG_NEXT_HOP()} == ${dst.nexthop}", + var sn_addr = in6_addr_solicited_node(gw_ip6), + var eth_dst = ipv6_multicast_to_ethernet(sn_addr), + var sn_addr_s = ipv6_string_mapped(sn_addr), + var actions = "nd_ns { " + "eth.dst = ${eth_dst}; " + "ip6.dst = ${sn_addr_s}; " + "nd.target = ${dst.nexthop}; " + "output; " + "};". 
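The `nd_ns` action above targets the next hop's solicited-node multicast address and the corresponding multicast Ethernet address. Both mappings are standard (RFC 4291 and RFC 2464 respectively); a Python sketch of what `in6_addr_solicited_node()` and `ipv6_multicast_to_ethernet()` compute:

```python
import ipaddress

def solicited_node(addr):
    """RFC 4291: ff02::1:ff00:0/104 plus the low 24 bits of the address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xffffff
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def ipv6_multicast_to_ethernet(maddr):
    """RFC 2464: 33:33 followed by the low 32 bits of the multicast address."""
    low32 = int(ipaddress.IPv6Address(maddr)) & 0xffffffff
    return "33:33:{:02x}:{:02x}:{:02x}:{:02x}".format(
        (low32 >> 24) & 0xff, (low32 >> 16) & 0xff,
        (low32 >> 8) & 0xff, low32 & 0xff)
```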
+ +for (&Router(.lr = lr)) +{ + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ARP_REQUEST), + .priority = 100, + .__match = "eth.dst == 00:00:00:00:00:00 && ip4", + .actions = "arp { " + "eth.dst = ff:ff:ff:ff:ff:ff; " + "arp.spa = ${rEG_SRC()}; " + "arp.tpa = ${rEG_NEXT_HOP()}; " + "arp.op = 1; " /* ARP request */ + "output; " + "};", + .external_ids = map_empty()); + + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ARP_REQUEST), + .priority = 100, + .__match = "eth.dst == 00:00:00:00:00:00 && ip6", + .actions = "nd_ns { " + "nd.target = xx${rEG_NEXT_HOP()}; " + "output; " + "};", + .external_ids = map_empty()); + + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(IN, ARP_REQUEST), + .priority = 0, + .__match = "1", + .actions = "output;", + .external_ids = map_empty()) +} + + +/* Logical router egress table DELIVERY: Delivery (priority 100). + * + * Priority 100 rules deliver packets to enabled logical ports. */ +for (&RouterPort(.lrp = lrp, + .json_name = json_name, + .networks = lrp_networks, + .router = &Router{.lr = lr, .mcast_cfg = &mcast_cfg}) + /* Drop packets to disabled logical ports (since logical flow + * tables are default-drop). */ + if lrp.is_enabled()) +{ + /* If multicast relay is enabled then also adjust source mac for IP + * multicast traffic. + */ + if (mcast_cfg.relay) { + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(OUT, DELIVERY), + .priority = 110, + .__match = "(ip4.mcast || ip6.mcast) && " + "outport == ${json_name}", + .actions = "eth.src = ${lrp_networks.ea}; output;", + .external_ids = stage_hint(lrp._uuid)) + }; + /* No egress packets should be processed in the context of + * a chassisredirect port. The chassisredirect port should + * be replaced by the l3dgw port in the local output + * pipeline stage before egress processing. 
*/ + + Flow(.logical_datapath = lr._uuid, + .stage = router_stage(OUT, DELIVERY), + .priority = 100, + .__match = "outport == ${json_name}", + .actions = "output;", + .external_ids = stage_hint(lrp._uuid)) +} + +/* + * Datapath tunnel key allocation: + * + * Allocates a globally unique tunnel id in the range 1...2**24-1 for + * each Logical_Switch and Logical_Router. + */ + +function oVN_MAX_DP_KEY(): integer { (64'd1 << 24) - 1 } +function oVN_MAX_DP_GLOBAL_NUM(): integer { (64'd1 << 16) - 1 } +function oVN_MIN_DP_KEY_LOCAL(): integer { 1 } +function oVN_MAX_DP_KEY_LOCAL(): integer { oVN_MAX_DP_KEY() - oVN_MAX_DP_GLOBAL_NUM() } +function oVN_MIN_DP_KEY_GLOBAL(): integer { oVN_MAX_DP_KEY_LOCAL() + 1 } +function oVN_MAX_DP_KEY_GLOBAL(): integer { oVN_MAX_DP_KEY() } + +function oVN_MAX_DP_VXLAN_KEY(): integer { (64'd1 << 12) - 1 } +function oVN_MAX_DP_VXLAN_KEY_LOCAL(): integer { oVN_MAX_DP_KEY() - oVN_MAX_DP_GLOBAL_NUM() } + +/* If any chassis uses VXLAN encapsulation, then the entire deployment is in VXLAN mode. */ +relation IsVxlanMode0() +IsVxlanMode0() :- + sb::Chassis(.encaps = encaps), + var encap_uuid = FlatMap(encaps), + sb::Encap(._uuid = encap_uuid, .__type = "vxlan"). + +relation IsVxlanMode[bool] +IsVxlanMode[true] :- + IsVxlanMode0(). +IsVxlanMode[false] :- + Unit(), + not IsVxlanMode0(). + +/* The maximum datapath tunnel key that may be used. */ +relation OvnMaxDpKeyLocal[integer] +/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ +OvnMaxDpKeyLocal[oVN_MAX_DP_VXLAN_KEY()] :- IsVxlanMode[true]. +OvnMaxDpKeyLocal[oVN_MAX_DP_KEY() - oVN_MAX_DP_GLOBAL_NUM()] :- IsVxlanMode[false]. + +function get_dp_tunkey(map: Map<string,string>, key: string): Option<integer> { + match (map_get(map, key)) { + Some{value} -> match (str_to_int(value, 10)) { + Some{x} -> if (x > 0 and x < (2<<24)) { + Some{x} + } else { + None + }, + _ -> None + }, + _ -> None + } +} + +// Tunnel keys requested by datapaths. 
+relation RequestedTunKey(datapath: uuid, tunkey: integer) +RequestedTunKey(uuid, tunkey) :- + ls in nb::Logical_Switch(._uuid = uuid), + Some{var tunkey} = get_dp_tunkey(ls.other_config, "requested-tnl-key"). +RequestedTunKey(uuid, tunkey) :- + lr in nb::Logical_Router(._uuid = uuid), + Some{var tunkey} = get_dp_tunkey(lr.options, "requested-tnl-key"). +Warning[message] :- + RequestedTunKey(datapath, tunkey), + var count = datapath.group_by((tunkey)).size(), + count > 1, + var message = "${count} logical switches or routers request " + "datapath tunnel key ${tunkey}". + +// Assign tunnel keys: +// - First priority to requested tunnel keys. +// - Second priority to already assigned tunnel keys. +// In either case, make an arbitrary choice in case of conflicts within a +// priority level. +relation AssignedTunKey(datapath: uuid, tunkey: integer) +AssignedTunKey(datapath, tunkey) :- + RequestedTunKey(datapath, tunkey), + var datapath = datapath.group_by(tunkey).first(). +AssignedTunKey(datapath, tunkey) :- + sb::Datapath_Binding(._uuid = datapath, .tunnel_key = tunkey), + not RequestedTunKey(_, tunkey), + not RequestedTunKey(datapath, _), + var datapath = datapath.group_by(tunkey).first(). + +// all tunnel keys already in use in the Realized table +relation AllocatedTunKeys(keys: Set<integer>) +AllocatedTunKeys(keys) :- + AssignedTunKey(.tunkey = tunkey), + var keys = tunkey.group_by(()).to_set(). + +// Datapath_Binding's not yet in the Realized table +relation NotYetAllocatedTunKeys(datapaths: Vec<uuid>) + +NotYetAllocatedTunKeys(datapaths) :- + OutProxy_Datapath_Binding(._uuid = datapath), + not AssignedTunKey(datapath, _), + var datapaths = datapath.group_by(()).to_vec(). + +// Perform the allocation +relation TunKeyAllocation(datapath: uuid, tunkey: integer) + +TunKeyAllocation(datapath, tunkey) :- AssignedTunKey(datapath, tunkey). 
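`get_dp_tunkey()` (and its port-level twin later on) is just parse-and-range-check over an `options`/`other_config` map. A Python sketch, parameterized on the upper bound rather than hard-coding one (for datapaths that bound would be `OVN_MAX_DP_KEY`, i.e. 2**24 - 1; the exact bound used here is an assumption of the sketch):

```python
def get_tunkey(config, key, max_key):
    """Parse a requested tunnel key out of a string-to-string config map;
    accept it only when it is a decimal integer in [1, max_key]."""
    value = config.get(key)
    if value is None:
        return None
    try:
        x = int(value, 10)
    except ValueError:
        return None
    return x if 0 < x <= max_key else None
```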
+ +// Case 1: AllocatedTunKeys relation is not empty (i.e., contains +// a single record that stores a set of allocated keys) +TunKeyAllocation(datapath, tunkey) :- + NotYetAllocatedTunKeys(unallocated), + AllocatedTunKeys(allocated), + OvnMaxDpKeyLocal[max_dp_key_local], + var allocation = FlatMap(allocate(allocated, unallocated, 1, max_dp_key_local)), + (var datapath, var tunkey) = allocation. + +// Case 2: AllocatedTunKeys relation is empty +TunKeyAllocation(datapath, tunkey) :- + NotYetAllocatedTunKeys(unallocated), + not AllocatedTunKeys(_), + OvnMaxDpKeyLocal[max_dp_key_local], + var allocation = FlatMap(allocate(set_empty(), unallocated, 1, max_dp_key_local)), + (var datapath, var tunkey) = allocation. + +/* + * Port id allocation: + * + * Port IDs in a per-datapath space in the range 1...2**15-1 + */ + +function get_port_tunkey(map: Map<string,string>, key: string): Option<integer> { + match (map_get(map, key)) { + Some{value} -> match (str_to_int(value, 10)) { + Some{x} -> if (x > 0 and x < (2<<15)) { + Some{x} + } else { + None + }, + _ -> None + }, + _ -> None + } +} + +// Tunnel keys requested by port bindings. +relation RequestedPortTunKey(datapath: uuid, port: uuid, tunkey: integer) +RequestedPortTunKey(datapath, port, tunkey) :- + sp in &SwitchPort(), + var datapath = sp.sw.ls._uuid, + var port = sp.lsp._uuid, + Some{var tunkey} = get_port_tunkey(sp.lsp.options, "requested-tnl-key"). +RequestedPortTunKey(datapath, port, tunkey) :- + rp in &RouterPort(), + var datapath = rp.router.lr._uuid, + var port = rp.lrp._uuid, + Some{var tunkey} = get_port_tunkey(rp.lrp.options, "requested-tnl-key"). +Warning[message] :- + RequestedPortTunKey(datapath, port, tunkey), + var count = port.group_by((datapath, tunkey)).size(), + count > 1, + var message = "${count} logical ports in the same datapath " + "request port tunnel key ${tunkey}". + +// Assign tunnel keys: +// - First priority to requested tunnel keys. +// - Second priority to already assigned tunnel keys. 
+// In either case, make an arbitrary choice in case of conflicts within a +// priority level. +relation AssignedPortTunKey(datapath: uuid, port: uuid, tunkey: integer) +AssignedPortTunKey(datapath, port, tunkey) :- + RequestedPortTunKey(datapath, port, tunkey), + var port = port.group_by((datapath, tunkey)).first(). +AssignedPortTunKey(datapath, port, tunkey) :- + sb::Port_Binding(._uuid = port_uuid, + .datapath = datapath, + .tunnel_key = tunkey), + not RequestedPortTunKey(datapath, _, tunkey), + not RequestedPortTunKey(datapath, port_uuid, _), + var port = port_uuid.group_by((datapath, tunkey)).first(). + +// all tunnel keys already in use in the Realized table +relation AllocatedPortTunKeys(datapath: uuid, keys: Set<integer>) + +AllocatedPortTunKeys(datapath, keys) :- + AssignedPortTunKey(datapath, port, tunkey), + var keys = tunkey.group_by(datapath).to_set(). + +// Port_Binding's not yet in the Realized table +relation NotYetAllocatedPortTunKeys(datapath: uuid, all_logical_ids: Vec<uuid>) + +NotYetAllocatedPortTunKeys(datapath, all_names) :- + OutProxy_Port_Binding(._uuid = port_uuid, .datapath = datapath), + not AssignedPortTunKey(datapath, port_uuid, _), + var all_names = port_uuid.group_by(datapath).to_vec(). + +// Perform the allocation. +relation PortTunKeyAllocation(port: uuid, tunkey: integer) + +// Transfer existing allocations from the realized table. +PortTunKeyAllocation(port, tunkey) :- AssignedPortTunKey(_, port, tunkey). + +// Case 1: AllocatedPortTunKeys(datapath) is not empty (i.e., contains +// a single record that stores a set of allocated keys). +PortTunKeyAllocation(port, tunkey) :- + AllocatedPortTunKeys(datapath, allocated), + NotYetAllocatedPortTunKeys(datapath, unallocated), + var allocation = FlatMap(allocate(allocated, unallocated, 1, 64'hffff)), + (var port, var tunkey) = allocation. 
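Both allocation cases lean on an `allocate()` helper that hands each not-yet-allocated ID a free key from a bounded range, skipping keys that are already taken. The real helper lives in Rust (`northd/ovn.rs`); this Python sketch only illustrates the contract as used above, and its ordering and exhaustion behavior are assumptions:

```python
def allocate(allocated, unallocated, min_key, max_key):
    """Assign each requester the lowest free key in [min_key, max_key];
    requesters that cannot be satisfied get no assignment."""
    result = []
    taken = set(allocated)
    next_key = min_key
    for requester in unallocated:
        # Skip over keys that are already in use.
        while next_key <= max_key and next_key in taken:
            next_key += 1
        if next_key > max_key:
            break  # key space exhausted
        result.append((requester, next_key))
        taken.add(next_key)
        next_key += 1
    return result
```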
+ +// Case 2: PortAllocatedTunKeys(datapath) relation is empty +PortTunKeyAllocation(port, tunkey) :- + NotYetAllocatedPortTunKeys(datapath, unallocated), + not AllocatedPortTunKeys(datapath, _), + var allocation = FlatMap(allocate(set_empty(), unallocated, 1, 64'hffff)), + (var port, var tunkey) = allocation. + +/* + * Multicast group tunnel_key allocation: + * + * Tunnel-keys in a per-datapath space in the range 32770...65535 + */ + +// All tunnel keys already in use in the Realized table. +relation AllocatedMulticastGroupTunKeys(datapath_uuid: uuid, keys: Set<integer>) + +AllocatedMulticastGroupTunKeys(datapath_uuid, keys) :- + sb::Multicast_Group(.datapath = datapath_uuid, .tunnel_key = tunkey), + //sb::UUIDMap_Datapath_Binding(datapath, Left{datapath_uuid}), + var keys = tunkey.group_by(datapath_uuid).to_set(). + +// Multicast_Group's not yet in the Realized table. +relation NotYetAllocatedMulticastGroupTunKeys(datapath_uuid: uuid, + all_logical_ids: Vec<string>) + +NotYetAllocatedMulticastGroupTunKeys(datapath_uuid, all_names) :- + OutProxy_Multicast_Group(.name = name, .datapath = datapath_uuid), + not sb::Multicast_Group(.name = name, .datapath = datapath_uuid), + var all_names = name.group_by(datapath_uuid).to_vec(). + +// Perform the allocation +relation MulticastGroupTunKeyAllocation(datapath_uuid: uuid, group: string, tunkey: integer) + +// transfer existing allocations from the realized table +MulticastGroupTunKeyAllocation(datapath_uuid, group, tunkey) :- + //sb::UUIDMap_Datapath_Binding(_, datapath_uuid), + sb::Multicast_Group(.name = group, + .datapath = datapath_uuid, + .tunnel_key = tunkey). 
+
+// Case 1: AllocatedMulticastGroupTunKeys(datapath) is not empty (i.e.,
+// contains a single record that stores a set of allocated keys)
+MulticastGroupTunKeyAllocation(datapath_uuid, group, tunkey) :-
+    AllocatedMulticastGroupTunKeys(datapath_uuid, allocated),
+    NotYetAllocatedMulticastGroupTunKeys(datapath_uuid, unallocated),
+    (_, var min_key) = mC_IP_MCAST_MIN(),
+    (_, var max_key) = mC_IP_MCAST_MAX(),
+    var allocation = FlatMap(allocate(allocated, unallocated,
+                                      min_key, max_key)),
+    (var group, var tunkey) = allocation.
+
+// Case 2: AllocatedMulticastGroupTunKeys(datapath) relation is empty
+MulticastGroupTunKeyAllocation(datapath_uuid, group, tunkey) :-
+    NotYetAllocatedMulticastGroupTunKeys(datapath_uuid, unallocated),
+    not AllocatedMulticastGroupTunKeys(datapath_uuid, _),
+    (_, var min_key) = mC_IP_MCAST_MIN(),
+    (_, var max_key) = mC_IP_MCAST_MAX(),
+    var allocation = FlatMap(allocate(set_empty(), unallocated,
+                                      min_key, max_key)),
+    (var group, var tunkey) = allocation.
+
+/*
+ * Queue ID allocation
+ *
+ * Queue IDs on a chassis, for routers that have QoS enabled, in a per-chassis
+ * space in the range 1...0xf000.  There would probably be only a small number
+ * of these per chassis, and a small number overall, in case it matters.
+ *
+ * A queue ID may also need to be deallocated if a port loses its QoS
+ * attributes.
+ *
+ * This logic applies mainly to sb::Port_Binding records bound to a chassis
+ * (i.e. with the chassis column nonempty), but "localnet" ports can also
+ * have a queue ID.  For those we use the port's own UUID as the chassis UUID.
+ */ + +function port_has_qos_params(opts: Map<string, string>): bool = { + map_contains_key(opts, "qos_max_rate") or + map_contains_key(opts, "qos_burst") +} + + +// ports in Out_Port_Binding that require queue ID on chassis +relation PortRequiresQID(port: uuid, chassis: uuid) + +PortRequiresQID(pb._uuid, chassis) :- + pb in OutProxy_Port_Binding(), + pb.__type != "localnet", + port_has_qos_params(pb.options), + sb::Port_Binding(._uuid = pb._uuid, .chassis = chassis_set), + Some{var chassis} = chassis_set. +PortRequiresQID(pb._uuid, pb._uuid) :- + pb in OutProxy_Port_Binding(), + pb.__type == "localnet", + port_has_qos_params(pb.options), + sb::Port_Binding(._uuid = pb._uuid). + +relation AggPortRequiresQID(chassis: uuid, ports: Vec<uuid>) + +AggPortRequiresQID(chassis, ports) :- + PortRequiresQID(port, chassis), + var ports = port.group_by(chassis).to_vec(). + +relation AllocatedQIDs(chassis: uuid, allocated_ids: Map<uuid, integer>) + +AllocatedQIDs(chassis, allocated_ids) :- + pb in sb::Port_Binding(), + pb.__type != "localnet", + Some{var chassis} = pb.chassis, + Some{var qid_str} = map_get(pb.options, "qdisc_queue_id"), + Some{var qid} = parse_dec_u64(qid_str), + var allocated_ids = (pb._uuid, qid).group_by(chassis).to_map(). +AllocatedQIDs(chassis, allocated_ids) :- + pb in sb::Port_Binding(), + pb.__type == "localnet", + var chassis = pb._uuid, + Some{var qid_str} = map_get(pb.options, "qdisc_queue_id"), + Some{var qid} = parse_dec_u64(qid_str), + var allocated_ids = (pb._uuid, qid).group_by(chassis).to_map(). + +// allocate queue IDs to ports +relation QueueIDAllocation(port: uuid, qids: Option<integer>) + +// None for ports that do not require a queue +QueueIDAllocation(port, None) :- + OutProxy_Port_Binding(._uuid = port), + not PortRequiresQID(port, _). 
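The rules that follow hand `AllocatedQIDs` and the per-chassis port list to an `adjust_allocation` helper that is defined elsewhere in the series, not in this hunk. A minimal Python sketch of its presumed behavior — keep the existing queue ID for each port that already has one, assign the lowest free ID in range to the rest — which also captures the implicit deallocation noted in the comment above (ports that drop their QoS options simply stop appearing in the input):

```python
def adjust_allocation(allocated_ids, ports, min_id, max_id):
    """Sketch (assumed contract): `allocated_ids` maps port -> existing
    queue ID.  Ports keep their current ID; new ports get the lowest
    free ID in [min_id, max_id]."""
    in_use = set(allocated_ids.values())
    result = []
    next_id = min_id
    for port in ports:
        if port in allocated_ids:
            # Preserve the port's existing allocation.
            result.append((port, allocated_ids[port]))
            continue
        while next_id in in_use and next_id <= max_id:
            next_id += 1
        if next_id > max_id:
            break  # ID space exhausted.
        result.append((port, next_id))
        in_use.add(next_id)
        next_id += 1
    return result
```

This is a hypothetical reconstruction for review purposes only; the real helper may order ports or pick IDs differently.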
+ +QueueIDAllocation(port, Some{qid}) :- + AggPortRequiresQID(chassis, ports), + AllocatedQIDs(chassis, allocated_ids), + var allocations = FlatMap(adjust_allocation(allocated_ids, ports, 1, 64'hf000)), + (var port, var qid) = allocations. + +QueueIDAllocation(port, Some{qid}) :- + AggPortRequiresQID(chassis, ports), + not AllocatedQIDs(chassis, _), + var allocations = FlatMap(adjust_allocation(map_empty(), ports, 1, 64'hf000)), + (var port, var qid) = allocations. + +/* + * This allows ovn-northd to preserve options:ipv6_ra_pd_list, which is set by + * ovn-controller. + */ +relation PreserveIPv6RAPDList(lrp_uuid: uuid, ipv6_ra_pd_list: Option<string>) +PreserveIPv6RAPDList(lrp_uuid, ipv6_ra_pd_list) :- + sb::Port_Binding(._uuid = lrp_uuid, .options = options), + var ipv6_ra_pd_list = map_get(options, "ipv6_ra_pd_list"). +PreserveIPv6RAPDList(lrp_uuid, None) :- + nb::Logical_Router_Port(._uuid = lrp_uuid), + not sb::Port_Binding(._uuid = lrp_uuid). + +/* + * Tag allocation for nested containers. + */ + +/* Reserved tags for each parent port, including: + * 1. For ports that need a dynamically allocated tag, existing tag, if any, + * 2. For ports that have a statically assigned tag (via `tag_request`), the + * `tag_request` value. + * 3. For ports that do not have a tag_request, but have a tag statically assigned + * by directly setting the `tag` field, use this value. + */ +relation SwitchPortReservedTag(parent_name: string, tags: integer) + +SwitchPortReservedTag(parent_name, tag) :- + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = needs_dynamic_tag, .parent_name = Some{parent_name}), + Some{var tag} = if (needs_dynamic_tag) { + lsp.tag + } else { + match (lsp.tag_request) { + Some{req} -> Some{req}, + None -> lsp.tag + } + }. + +relation SwitchPortReservedTags(parent_name: string, tags: Set<integer>) + +SwitchPortReservedTags(parent_name, tags) :- + SwitchPortReservedTag(parent_name, tag), + var tags = tag.group_by(parent_name).to_set(). 
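`allocate_opt`, used by the tag-assignment rule below, is likewise defined elsewhere in the series. It presumably behaves like `allocate` but pads the result, so every requested port gets an entry, with `None` once the tag space is exhausted (letting the rule record "no tag" rather than silently skipping the port). A Python sketch under that assumption:

```python
def allocate_opt(reserved, ids, min_tag, max_tag):
    """Sketch (assumed contract): like allocate(), but every id in
    `ids` gets a result -- None once [min_tag, max_tag] is exhausted."""
    result = []
    tag = min_tag
    for ident in ids:
        # Skip tags reserved statically or by existing allocations.
        while tag in reserved and tag <= max_tag:
            tag += 1
        if tag > max_tag:
            result.append((ident, None))  # Tag space exhausted.
        else:
            result.append((ident, tag))
            reserved = reserved | {tag}
            tag += 1
    return result
```

Again, a hypothetical reconstruction of the helper's contract, not the actual implementation.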
+ +SwitchPortReservedTags(parent_name, set_empty()) :- + nb::Logical_Switch_Port(.name = parent_name), + not SwitchPortReservedTag(.parent_name = parent_name). + +/* Allocate tags for ports that require dynamically allocated tags and do not + * have any yet. + */ +relation SwitchPortAllocatedTags(lsp_uuid: uuid, tag: Option<integer>) + +SwitchPortAllocatedTags(lsp_uuid, tag) :- + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = true, .parent_name = Some{parent_name}), + is_none(lsp.tag), + var lsps_need_tag = lsp._uuid.group_by(parent_name).to_vec(), + SwitchPortReservedTags(parent_name, reserved), + var dyn_tags = allocate_opt(reserved, + lsps_need_tag, + 1, /* Tag 0 is invalid for nested containers. */ + 4095), + var lsp_tag = FlatMap(dyn_tags), + (var lsp_uuid, var tag) = lsp_tag. + +/* New tag-to-port assignment: + * Case 1. Statically reserved tag (via `tag_request`), if any. + * Case 2. Existing tag for ports that require a dynamically allocated tag and already have one. + * Case 3. Use newly allocated tags (from `SwitchPortAllocatedTags`) for all other ports. + */ +relation SwitchPortNewDynamicTag(port: uuid, tag: Option<integer>) + +/* Case 1 */ +SwitchPortNewDynamicTag(lsp._uuid, tag) :- + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = false), + var tag = match (lsp.tag_request) { + Some{0} -> None, + treq -> treq + }. + +/* Case 2 */ +SwitchPortNewDynamicTag(lsp._uuid, Some{tag}) :- + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = true), + Some{var tag} = lsp.tag. + +/* Case 3 */ +SwitchPortNewDynamicTag(lsp._uuid, tag) :- + &SwitchPort(.lsp = lsp, .needs_dynamic_tag = true), + is_none(lsp.tag), + SwitchPortAllocatedTags(lsp._uuid, tag). + +/* IP_Multicast table (only applicable for Switches). 
*/ +sb::Out_IP_Multicast(._uuid = cfg.datapath, + .datapath = cfg.datapath, + .enabled = Some{cfg.enabled}, + .querier = Some{cfg.querier}, + .eth_src = cfg.eth_src, + .ip4_src = cfg.ip4_src, + .ip6_src = cfg.ip6_src, + .table_size = Some{cfg.table_size}, + .idle_timeout = Some{cfg.idle_timeout}, + .query_interval = Some{cfg.query_interval}, + .query_max_resp = Some{cfg.query_max_resp}) :- + &McastSwitchCfg[cfg]. + + +relation PortExists(name: string) +PortExists(name) :- nb::Logical_Switch_Port(.name = name). +PortExists(name) :- nb::Logical_Router_Port(.name = name). + +sb::Out_Service_Monitor(._uuid = hash128((svc_monitor.port_name, lbvipbackend.ip, lbvipbackend.port, protocol)), + .ip = "${lbvipbackend.ip}", + .protocol = Some{protocol}, + .port = lbvipbackend.port as integer, + .logical_port = svc_monitor.port_name, + .src_mac = to_string(svc_monitor_mac), + .src_ip = svc_monitor.src_ip, + .options = lbhc.options, + .external_ids = map_empty()) :- + SvcMonitorMac(svc_monitor_mac), + LBVIPBackend[lbvipbackend], + Some{var svc_monitor} = lbvipbackend.svc_monitor, + LoadBalancerHealthCheckRef[lbhc], + PortExists(svc_monitor.port_name), + set_contains(lbvipbackend.lbvip.lb.health_check, lbhc._uuid), + lbhc.vip == lbvipbackend.lbvip.vip_key, + var protocol = default_protocol(lbvipbackend.lbvip.lb.protocol), + protocol != "sctp". + +Warning["SCTP load balancers do not currently support " + "health checks. Not creating health checks for " + "load balancer ${uuid2str(lbvipbackend.lbvip.lb._uuid)}"] :- + LBVIPBackend[lbvipbackend], + default_protocol(lbvipbackend.lbvip.lb.protocol) == "sctp", + Some{var svc_monitor} = lbvipbackend.svc_monitor, + LoadBalancerHealthCheckRef[lbhc], + set_contains(lbvipbackend.lbvip.lb.health_check, lbhc._uuid), + lbhc.vip == lbvipbackend.lbvip.vip_key. 
diff --git a/northd/ovsdb2ddlog2c b/northd/ovsdb2ddlog2c new file mode 100755 index 000000000000..c66ad81073e1 --- /dev/null +++ b/northd/ovsdb2ddlog2c @@ -0,0 +1,127 @@ +#!/usr/bin/env python3 +# Copyright (c) 2020 Nicira, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import getopt +import sys + +import ovs.json +import ovs.db.error +import ovs.db.schema + +argv0 = sys.argv[0] + +def usage(): + print("""\ +%(argv0)s: ovsdb schema compiler for northd +usage: %(argv0)s [OPTIONS] + +The following option must be specified: + -p, --prefix=PREFIX Prefix for declarations in output. + +The following ovsdb2ddlog options are supported: + -f, --schema-file=FILE OVSDB schema file. + -o, --output-table=TABLE Mark TABLE as output. + --output-only-table=TABLE Mark TABLE as output-only. DDlog will send updates to this table directly to OVSDB without comparing it with current OVSDB state. + --ro=TABLE.COLUMN Ignored. + --rw=TABLE.COLUMN Ignored. + --output-file=FILE.inc Write output to FILE.inc. If this option is not specified, output will be written to stdout. 
+
+The following options are also available:
+  -h, --help                  display this help message
+  -V, --version               display version information\
+""" % {'argv0': argv0})
+    sys.exit(0)
+
+if __name__ == "__main__":
+    try:
+        try:
+            options, args = getopt.gnu_getopt(sys.argv[1:], 'p:f:o:hV',
+                                              ['prefix=',
+                                               'schema-file=',
+                                               'output-table=',
+                                               'output-only-table=',
+                                               'ro=',
+                                               'rw=',
+                                               'output-file='])
+        except getopt.GetoptError as geo:
+            sys.stderr.write("%s: %s\n" % (argv0, geo.msg))
+            sys.exit(1)
+
+        prefix = None
+        schema_file = None
+        output_tables = set()
+        output_only_tables = set()
+        output_file = None
+        for key, value in options:
+            if key in ['-h', '--help']:
+                usage()
+            elif key in ['-V', '--version']:
+                print("ovsdb2ddlog2c (OVN) @VERSION@")
+            elif key in ['-p', '--prefix']:
+                prefix = value
+            elif key in ['-f', '--schema-file']:
+                schema_file = value
+            elif key in ['-o', '--output-table']:
+                output_tables.add(value)
+            elif key == '--output-only-table':
+                output_only_tables.add(value)
+            elif key in ['--ro', '--rw']:
+                pass
+            elif key == '--output-file':
+                output_file = value
+            else:
+                sys.exit(0)
+
+        if schema_file is None:
+            sys.stderr.write("%s: missing -f or --schema-file option\n" % argv0)
+            sys.exit(1)
+        if prefix is None:
+            sys.stderr.write("%s: missing -p or --prefix option\n" % argv0)
+            sys.exit(1)
+        if not output_tables.isdisjoint(output_only_tables):
+            example = next(iter(output_tables.intersection(output_only_tables)))
+            sys.stderr.write("%s: %s may not be both an output table and "
+                             "an output-only table\n" % (argv0, example))
+            sys.exit(1)
+
+        schema = ovs.db.schema.DbSchema.from_json(ovs.json.from_file(
+            schema_file))
+
+        all_tables = set(schema.tables.keys())
+        missing_tables = (output_tables | output_only_tables) - all_tables
+        if missing_tables:
+            sys.stderr.write("%s: %s is not the name of a table\n"
+                             % (argv0, next(iter(missing_tables))))
+            sys.exit(1)
+
+        f = sys.stdout if output_file is None else open(output_file, "w")
+        for name, tables in (
("input_relations", all_tables - output_only_tables), + ("output_relations", output_tables), + ("output_only_relations", output_only_tables)): + f.write("static const char *%s%s[] = {\n" % (prefix, name)) + for table in sorted(tables): + f.write(" \"%s\",\n" % table) + f.write(" NULL,\n") + f.write("};\n\n") + if schema_file is not None: + f.close() + except ovs.db.error.Error as e: + sys.stderr.write("%s: %s\n" % (argv0, e)) + sys.exit(1) + +# Local variables: +# mode: python +# End: diff --git a/tests/atlocal.in b/tests/atlocal.in index 4517ebf72fab..8a3907d65a20 100644 --- a/tests/atlocal.in +++ b/tests/atlocal.in @@ -210,3 +210,10 @@ export OVS_CTL_TIMEOUT # matter break everything. ASAN_OPTIONS=detect_leaks=0:abort_on_error=true:log_path=asan:$ASAN_OPTIONS export ASAN_OPTIONS + +# Check whether we should run ddlog tests. +if test '@DDLOGLIBDIR@' != no; then + TEST_DDLOG="yes" +else + TEST_DDLOG="no" +fi diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at index b4dc387e54a4..7e7015380758 100644 --- a/tests/ovn-macros.at +++ b/tests/ovn-macros.at @@ -460,4 +460,7 @@ m4_define([OVN_FOR_EACH_NORTHD], [dnl m4_pushdef([NORTHD_TYPE], [ovn-northd])dnl $1 m4_popdef([NORTHD_TYPE])dnl +m4_pushdef([NORTHD_TYPE], [ovn-northd-ddlog])dnl +$1 +m4_popdef([NORTHD_TYPE])dnl ]) diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 972ff5c626a3..7d73b0b835a1 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -704,6 +704,103 @@ check_row_count Datapath_Binding 1 AT_CLEANUP ]) +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn -- ovn-northd restart]) +ovn_start --no-backup-northd + +# Check that ovn-northd is active, by verifying that it creates and +# destroys southbound datapaths as one would expect. +check_row_count Datapath_Binding 0 +check ovn-nbctl --wait=sb ls-add sw0 +check_row_count Datapath_Binding 1 + +# Kill northd. +as northd +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) + +# With ovn-northd gone, changes to nbdb won't be reflected into sbdb. +# Make sure. 
+check ovn-nbctl ls-add sw1 +sleep 5 +check_row_count Datapath_Binding 1 + +# Now resume ovn-northd. Changes should catch up. +ovn_start_northd primary +wait_row_count Datapath_Binding 2 + +AT_CLEANUP +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn -- northbound database reconnection]) +ovn_start --no-backup-northd + +# Check that ovn-northd is active, by verifying that it creates and +# destroys southbound datapaths as one would expect. +check_row_count Datapath_Binding 0 +check ovn-nbctl --wait=sb ls-add sw0 +check_row_count Datapath_Binding 1 +lf=$(count_rows Logical_Flow) + +# Make nbdb ovsdb-server drop connection from ovn-northd. +conn=$(as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/list-remotes) +check as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/remove-remote "$conn" +conn2=punix:`pwd`/special.sock +check as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn2" + +# ovn-northd won't respond to changes (because the nbdb connection dropped). +check ovn-nbctl --db="${conn2#p}" ls-add sw1 +sleep 5 +check_row_count Datapath_Binding 1 +check_row_count Logical_Flow $lf + +# Now re-enable the nbdb connection and observe ovn-northd catch up. +# +# It's important to check both Datapath_Binding and Logical_Flow because +# ovn-northd-ddlog implements them in different ways that might go wrong +# differently on reconnection. +check as ovn-nb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn" +wait_row_count Datapath_Binding 2 +wait_row_count Logical_Flow $(expr 2 \* $lf) + +AT_CLEANUP +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn -- southbound database reconnection]) +ovn_start --no-backup-northd + +# Check that ovn-northd is active, by verifying that it creates and +# destroys southbound datapaths as one would expect. +check_row_count Datapath_Binding 0 +check ovn-nbctl --wait=sb ls-add sw0 +check_row_count Datapath_Binding 1 +lf=$(count_rows Logical_Flow) + +# Make sbdb ovsdb-server drop connection from ovn-northd. 
+conn=$(as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/list-remotes) +check as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/remove-remote "$conn" +conn2=punix:`pwd`/special.sock +check as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn2" + +# ovn-northd can't respond to changes (because the sbdb connection dropped). +check ovn-nbctl ls-add sw1 +sleep 5 +OVN_SB_DB=${conn2#p} check_row_count Datapath_Binding 1 +OVN_SB_DB=${conn2#p} check_row_count Logical_Flow $lf + +# Now re-enable the sbdb connection and observe ovn-northd catch up. +# +# It's important to check both Datapath_Binding and Logical_Flow because +# ovn-northd-ddlog implements them in different ways that might go wrong +# differently on reconnection. +check as ovn-sb ovs-appctl -t ovsdb-server ovsdb-server/add-remote "$conn" +wait_row_count Datapath_Binding 2 +wait_row_count Logical_Flow $(expr 2 \* $lf) + +AT_CLEANUP +]) + OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn -- check Redirect Chassis propagation from NB to SB]) ovn_start diff --git a/tests/ovn.at b/tests/ovn.at index 3d2b7a7989a7..8274d2185b10 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -16820,6 +16820,10 @@ AT_CLEANUP OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn -- IGMP snoop/querier/relay]) + +dnl This test has problems with ovn-northd-ddlog. +AT_SKIP_IF([test NORTHD_TYPE = ovn-northd-ddlog && test "$RUN_ANYWAY" != yes]) + ovn_start # Logical network: @@ -17486,6 +17490,10 @@ AT_CLEANUP OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn -- MLD snoop/querier/relay]) + +dnl This test has problems with ovn-northd-ddlog. +AT_SKIP_IF([test NORTHD_TYPE = ovn-northd-ddlog && test "$RUN_ANYWAY" != yes]) + ovn_start # Logical network: @@ -20187,6 +20195,10 @@ AT_CLEANUP OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn -- interconnection]) + +dnl This test has problems with ovn-northd-ddlog. 
+AT_SKIP_IF([test NORTHD_TYPE = ovn-northd-ddlog && test "$RUN_ANYWAY" != yes]) + ovn_init_ic_db n_az=5 n_ts=5 diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at index 8cdc0d640cc2..a1727f9d3fd8 100644 --- a/tests/ovs-macros.at +++ b/tests/ovs-macros.at @@ -7,11 +7,14 @@ dnl Make AT_SETUP automatically do some things for us: dnl - Run the ovs_init() shell function as the first step in every test. dnl - If NORTHD_TYPE is defined, then append it to the test name and dnl set it as a shell variable as well. +dnl - Skip the test if it's for ovn-northd-ddlog but it didn't get built. m4_rename([AT_SETUP], [OVS_AT_SETUP]) m4_define([AT_SETUP], [OVS_AT_SETUP($@[]m4_ifdef([NORTHD_TYPE], [ -- NORTHD_TYPE])) m4_ifdef([NORTHD_TYPE], [[NORTHD_TYPE]=NORTHD_TYPE -AT_SKIP_IF([test $NORTHD_TYPE = ovn-northd-ddlog && test $TEST_DDLOG = no]) +])dnl +m4_if(NORTHD_TYPE, [ovn-northd-ddlog], [dnl +AT_SKIP_IF([test $TEST_DDLOG = no]) ])dnl ovs_init ]) diff --git a/tutorial/ovs-sandbox b/tutorial/ovs-sandbox index 1841776a476d..676314b21151 100755 --- a/tutorial/ovs-sandbox +++ b/tutorial/ovs-sandbox @@ -72,6 +72,7 @@ schema= installed=false built=false ovn=true +ddlog=false ovnsb_schema= ovnnb_schema= ic_sb_schema= @@ -143,6 +144,7 @@ General options: -S, --schema=FILE use FILE as vswitch.ovsschema OVN options: + --ddlog use ovn-northd-ddlog --no-ovn-rbac disable role-based access control for OVN --n-northds=NUMBER run NUMBER copies of northd (default: 1) --n-ics=NUMBER run NUMBER copies of ic (default: 1) @@ -234,6 +236,9 @@ EOF --gdb-ovn-controller-vtep) gdb_ovn_controller_vtep=true ;; + --ddlog) + ddlog=true + ;; --no-ovn-rbac) ovn_rbac=false ;; @@ -609,12 +614,23 @@ for i in $(seq $n_ics); do --ovnsb-db="$OVN_SB_DB" --ovnnb-db="$OVN_NB_DB" \ --ic-sb-db="$OVN_IC_SB_DB" --ic-nb-db="$OVN_IC_NB_DB" done + +northd_args= +if $ddlog; then + OVN_NORTHD=ovn-northd-ddlog +else + OVN_NORTHD=ovn-northd +fi + for i in $(seq $n_northds); do if [ $i -eq 1 ]; then inst=""; else inst=$i; fi - 
rungdb $gdb_ovn_northd $gdb_ovn_northd_ex ovn-northd --detach \
-        --no-chdir --pidfile=ovn-northd${inst}.pid -vconsole:off \
-        --log-file=ovn-northd${inst}.log -vsyslog:off \
-        --ovnsb-db="$OVN_SB_DB" --ovnnb-db="$OVN_NB_DB"
+    if $ddlog; then
+        northd_args=--ddlog-record=replay$inst.txt
+    fi
+    rungdb $gdb_ovn_northd $gdb_ovn_northd_ex $OVN_NORTHD --detach \
+        --no-chdir --pidfile=$OVN_NORTHD$inst.pid -vconsole:off \
+        --log-file=$OVN_NORTHD$inst.log -vsyslog:off \
+        --ovnsb-db="$OVN_SB_DB" --ovnnb-db="$OVN_NB_DB" $northd_args
 done
 for i in $(seq $n_controllers); do
     if [ $i -eq 1 ]; then inst=""; else inst=$i; fi
diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 981a433be9cc..fa2a382f1d14 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -184,7 +184,7 @@ skip_signoff_check = False
 #
 # Python isn't checked as flake8 performs these checks during build.
 line_length_blacklist = re.compile(
-    r'\.(am|at|etc|in|m4|mk|patch|py)$|debian/rules')
+    r'\.(am|at|etc|in|m4|mk|patch|py|dl)$|debian/rules')
 
 # Don't enforce a requirement that leading whitespace be all spaces on
 # files that include these characters in their name, since these kinds
diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
index c44201ccfb3e..92f03815fa57 100755
--- a/utilities/ovn-ctl
+++ b/utilities/ovn-ctl
@@ -458,10 +458,10 @@ start_northd () {
         ovn_northd_params="`cat $ovn_northd_db_conf_file`"
     fi
 
-    if daemon_is_running ovn-northd; then
-        log_success_msg "ovn-northd is already running"
+    if daemon_is_running $OVN_NORTHD_BIN; then
+        log_success_msg "$OVN_NORTHD_BIN is already running"
     else
-        set ovn-northd
+        set $OVN_NORTHD_BIN
         if test X"$OVN_NORTHD_LOGFILE" != X; then
             set "$@" --log-file=$OVN_NORTHD_LOGFILE
         fi
@@ -571,7 +571,7 @@ start_controller_vtep () {
 ## ---- ##
 
 stop_northd () {
-    OVS_RUNDIR=${OVS_RUNDIR} stop_ovn_daemon ovn-northd
+    OVS_RUNDIR=${OVS_RUNDIR} stop_ovn_daemon $OVN_NORTHD_BIN
 
     if [ !
-e $ovn_northd_db_conf_file ]; then if test X"$OVN_MANAGE_OVSDB" = Xyes; then @@ -714,6 +714,7 @@ set_defaults () { OVN_CONTROLLER_WRAPPER= OVSDB_NB_WRAPPER= OVSDB_SB_WRAPPER= + OVN_NORTHD_DDLOG=no OVN_USER= @@ -932,6 +933,8 @@ Options: --ovs-user="user[:group]" pass the --user flag to ovs daemons --ovsdb-nb-wrapper=WRAPPER run with a wrapper like valgrind for debugging --ovsdb-sb-wrapper=WRAPPER run with a wrapper like valgrind for debugging + --ovn-northd-ddlog=yes|no whether we should run the DDlog version + of ovn-northd. The default is "no". -h, --help display this help message File location options: @@ -1087,6 +1090,13 @@ do ;; esac done + +if test X"$OVN_NORTHD_DDLOG" = Xyes; then + OVN_NORTHD_BIN=ovn-northd-ddlog +else + OVN_NORTHD_BIN=ovn-northd +fi + case $command in start_northd) start_northd @@ -1179,7 +1189,7 @@ case $command in restart_ic_sb_ovsdb ;; status_northd) - daemon_status ovn-northd || exit 1 + daemon_status $OVN_NORTHD_BIN || exit 1 ;; status_ovsdb) status_ovsdb