[ovs-dev] ovs-actions: Document normal pipeline.

Message ID 20210415033446.2823981-1-blp@ovn.org
State Changes Requested

Commit Message

Ben Pfaff April 15, 2021, 3:34 a.m. UTC
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 lib/ovs-actions.xml | 288 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 286 insertions(+), 2 deletions(-)

Comments

Ben Pfaff May 6, 2021, 5:15 p.m. UTC | #1
This documentation-only patch could use a review.

On Wed, Apr 14, 2021 at 08:34:46PM -0700, Ben Pfaff wrote:
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
>  lib/ovs-actions.xml | 288 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 286 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ovs-actions.xml b/lib/ovs-actions.xml
> index a2778de4bcd6..de934a244de9 100644
> --- a/lib/ovs-actions.xml
> +++ b/lib/ovs-actions.xml
> @@ -509,7 +509,8 @@ $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
>          <dd>
>            Subjects the packet to the device's normal L2/L3 processing.  This
>            action is not implemented by all OpenFlow switches, and each switch
> -          implements it differently.
> +          implements it differently.  The section ``The OVS Normal Pipeline''
> +          below documents the OVS implementation.
>          </dd>
>  
>          <dt><code>flood</code></dt>
> @@ -582,7 +583,6 @@ $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
>          OpenFlow allows switches to reject such actions.
>        </p>
>  
> -      <!-- XXX output to normal details -->
>        <!-- XXX output to patch ports details -->
>  
>        <h3>Output to the Input Port</h3>
> @@ -664,6 +664,290 @@ $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
>        </conformance>
>      </action>
>  
> +    <h2>The OVS Normal Pipeline</h2>
> +
> +    <p>
> +      This section documents how Open vSwitch implements output to the
> +      <code>normal</code> port.  The OpenFlow specification places no
> +      requirements on how this port works, so all of this documentation is
> +      specific to Open vSwitch.
> +    </p>
> +
> +    <p>
> +      Open vSwitch uses the <code>Open_vSwitch</code> database, detailed in
> +      <code>ovs-vswitchd.conf.db</code>(5), to determine the details of the
> +      normal pipeline.
> +    </p>
> +
> +    <p>
> +      The normal pipeline executes the following ingress stages for each
> +      packet.  The result of the ingress stages is a set of output ports, which
> +      is the empty set if some ingress stage drops the packet:
> +    </p>
> +
> +    <ol>
> +      <li>
> +        <p>
> +          <b>Input port lookup</b>: Looks up the OpenFlow
> +          <code>in_port</code> field's value to the corresponding
> +          <code>Port</code> and <code>Interface</code> record in the database.
> +        </p>
> +
> +        <p>
> +          The <code>in_port</code> is normally the OpenFlow port that the
> +          packet was received on.  If <code>set_field</code> or another actions
> +          changes the <code>in_port</code>, the updated value is honored.  This
> +          lookup will ordinarily succeed; if it fails, for example because
> +          <code>in_port</code> was changed to an unknown value, then the normal
> +          pipeline exits.
> +        </p>
> +      </li>
> +
> +      <li>
> +        <b>Drop malformed packet</b>: If the packet is malformed enough that it
> +        contains only part of an 802.1Q header, then the normal pipeline exits
> +        error.
> +      </li>
> +
> +      <li>
> +        <b>Drop packets sent to a port reserved for mirroring:</b> If the
> +        packet was received on a port that is configured as the output port for
> +        a mirror (that is, it is the <code>output_port</code> in some
> +        <code>Mirror</code> record), then the normal pipeline exits.  Ports
> +        used as mirror outputs don't accept any packets.
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>VLAN input processing:</b> This stage determines what VLAN the
> +          packet is in.  It also verifies that this VLAN is valid for the port;
> +          if not, the normal pipeline exits.  How the VLAN is determined and
> +          which ones are valid vary based on the <code>vlan-mode</code> in the
> +          input port's <code>Port</code> record:
> +        </p>
> +
> +        <dl>
> +          <dt><code>trunk</code></dt>
> +          <dd>
> +            The packet is in the VLAN specified in its 802.1Q header, or in
> +            VLAN 0 if there is no 802.1Q header.  The <code>trunks</code>
> +            column in the <code>Port</code> record lists the valid VLANs; if it
> +            is empty, all VLANs are valid.
> +          </dd>
> +
> +          <dt><code>access</code></dt>
> +          <dd>
> +            The packet is in the VLAN specified in the <code>tag</code> column
> +            of its <code>Port</code> record.  The packet must not have an
> +            802.1Q header with a nonzero VLAN ID; if it does, the pipeline
> +            exits.
> +          </dd>
> +
> +          <dt><code>native-tagged</code></dt>
> +          <dt><code>native-untagged</code></dt>
> +          <dd>
> +            Same as <code>trunk</code> except that the VLAN of a packet without
> +            an 802.1Q header is not necessarily zero; instead, it is taken from
> +            the <code>tag</code> column.
> +          </dd>
> +
> +          <dt><code>dot1q-tunnel</code></dt>
> +          <dd>
> +            The packet is in the VLAN specified in the <code>tag</code> column
> +            of its <code>Port</code> record, which is a QinQ service VLAN with
> +            the Ethertype specified by the <code>Port</code>'s
> +            <code>other_config</code> : <code>qinq-ethtype</code>.  If the
> +            packet has an 802.1Q header, then it specifies the customer VLAN.
> +            The <code>cvlans</code> column specifies the valid customer VLANs;
> +            if it is empty, all customer VLANs are valid.
> +          </dd>
> +        </dl>
> +      </li>
> +
> +      <li>
> +        <b>Drop reserved multicast addresses:</b> If the packet is addressed to
> +        a reserved Ethernet multicast address and the <code>Bridge</code>
> +        record does not have <code>other_config</code> :
> +        <code>forward-bpdu</code> set to <code>true</code>, the pipeline exits.
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Check bond admissibility:</b> If the input port is a member of a
> +          bond, that is, a <code>Port</code> with more than one
> +          <code>Interface</code>, then the bonding code performs an additional
> +          admissibility check to accept or drop the packet.
> +        </p>
> +
> +        <p>
> +          There is a first step if the bond is configured to use LACP.  If so,
> +          then either LACP has been negotiated with the peer or negotiation is
> +          incomplete.  If it has been negotiated, accept the packet if and only
> +          if the bond member is enabled (i.e. carrier is up and it hasn't been
> +          administratively disabled).  If negotiation is incomplete, then
> +          normally the normal pipeline drops the packet, except that if
> +          fallback to active-backup mode is enabled, it continues considering
> +          bond admissibility while acting as though the active-backup balancing
> +          mode were in use.
> +        </p>
> +
> +        <p>
> +          If the packet is an Ethernet multicast, and not received on the
> +          bond's active member, drop it.
> +        </p>
> +
> +        <p>
> +          The remaining behavior depends on the bond's balancing mode:
> +        </p>
> +
> +        <dl>
> +          <dt>L4 (aka TCP balancing)</dt>
> +          <dd>
> +            Drop the packet (this balancing mode is only supported with LACP).
> +          </dd>
> +
> +          <dt>Active-backup</dt>
> +          <dd>
> +            Accept the packet only if and only it was received on the active
> +            member.
> +          </dd>
> +
> +          <dt>SLB (Source Load Balancing)</dt>
> +          <dd>
> +            Drop the packet if the bridge has not learned the packet's source
> +            address (in its VLAN) on the port that received it.  Otherwise,
> +            accept the packet unless it is a gratuituous ARP.  Otherwise,
> +            accept the packet if the MAC entry we found is ARP-locked.
> +            Otherwise, drop the packet.  (See the ``SLB Bonding'' section in
> +            the OVS bonding document for more information and a rationale.)
> +          </dd>
> +        </dl>
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Learn source MAC:</b> If the source Ethernet address is not a
> +          multicast address, then insert a mapping from packet's source
> +          Ethernet address and VLAN to the input port in the bridge's MAC
> +          learning table.  (This is skipped if the packet's VLAN is listed in
> +          the switch's <code>Bridge</code> record in the
> +          <code>flood_vlans</code> column, since there is no use for MAC
> +          learning when all packets are flooded.)
> +        </p>
> +
> +        <p>
> +          When learning happens on a non-bond port, if the packet is a
> +          gratuitous ARP, the entry is marked as ARP-locked.  The lock expires
> +          after 5 seconds.  (See the ``SLB Bonding'' section in the OVS bonding
> +          document for more information and a rationale.)
> +        </p>
> +      </li>
> +
> +      <li>
> +        <b>IP multicast path:</b> If multicast snooping is enabled on the
> +        bridge, and the packet is an Ethernet multicast but not an Ethernet
> +        broadcast, and the packet is an IP packet, then the packet takes a
> +        special processing path.  This path is not yet documented here.  <!--
> +        XXX document multicast processing -->
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Output port set:</b> Search the MAC learning table for the port
> +          corresponding to the packet's Ethernet destination and VLAN.  If the
> +          search finds an entry, the output port set is the just the learned
> +          port.  Otherwise (including the case where the packet is an Ethernet
> +          multicast or in <code>flood_vlans</code>), the output port set is all
> +          of the ports in the bridge that belong to the packet's VLAN, except
> +          for any ports that were disabled for flooding via OpenFlow or that
> +          are configured in a <code>Mirror</code> record as a mirror
> +          destination port.
> +        </p>
> +      </li>
> +    </ol>
> +
> +    <p>
> +      The following egress stages execute once for each element in the set of
> +      output ports.  They execute (conceptually) in parallel, so that a
> +      decision or action taken for a given output port has no effect on those
> +      for another one:
> +    </p>
> +
> +    <ol>
> +      <li>
> +        <b>Drop loopback:</b> If the output port is the same as the input port,
> +        drop the packet.
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>VLAN output processing:</b> This stage adjusts the packet to
> +          represent the VLAN in the correct way for the output port.  Its
> +          behavior varies based on the <code>vlan-mode</code> in the output
> +          port's <code>Port</code> record:
> +        </p>
> +
> +        <dl>
> +          <dt><code>trunk</code></dt>
> +          <dt><code>native-tagged</code></dt>
> +          <dt><code>native-untagged</code></dt>
> +          <dd>
> +            If the packet is in VLAN 0 (for <code>native-untagged</code>, if
> +            the packet is in the native VLAN) drops any 802.1Q header.
> +            Otherwise, ensures that there is an 802.1Q header designating the
> +            VLAN.
> +          </dd>
> +
> +          <dt><code>access</code></dt>
> +          <dd>
> +            Remove any 802.1Q header that was present.
> +          </dd>
> +
> +          <dt><code>dot1q-tunnel</code></dt>
> +          <dd>
> +            Ensures that the packet has an outer 802.1Q header with the QinQ
> +            Ethertype and the specified configured tag, and an inner 802.1Q
> +            header with the packet's VLAN.
> +          </dd>
> +        </dl>
> +      </li>
> +
> +      <li>
> +        <b>VLAN priority tag processing:</b> If VLAN output processing
> +        discarded the 802.1Q headers, but priority tags are enabled with
> +        <code>other_config</code> : <code>priority-tags</code> in the output
> +        port's <code>Port</code> record, then a priority-only tag is added
> +        (perhaps only if the priority woule be nonzero, depending on the
> +        configuration).
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Bond member choice:</b> If the output port is a bond, the code
> +          chooses a particular member.  This step is skipped for non-bonded
> +          ports.
> +        </p>
> +
> +        <p>
> +          If the bond is configured to use LACP, but LACP negotiation is
> +          incomplete, then normally the packet is dropped.  The exception is
> +          that if fallback to active-backup mode is enabled, the egress
> +          pipeline continues choosing a bond member as if active-backup mode
> +          was in use.
> +        </p>
> +
> +        <p>
> +          For active-backup mode, the output member is the active member.
> +          Other modes hash appropriate header fields and use the hash value to
> +          choose one of the enabled members.
> +        </p>
> +      </li>
> +
> +      <li>
> +        <b>Output:</b> The pipeline sends the packet to the output port.
> +      </li>
> +    </ol>
> +
>      <action name="CONTROLLER">
>        <h2>The <code>controller</code> action</h2>
>        <syntax><code>controller</code></syntax>
> -- 
> 2.29.2
>
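
Editorial sketch: the "VLAN input processing" stage quoted above reads naturally as a small decision function. The Python below is a minimal illustration that follows the wording of this v1 patch text only; it is not taken from the OVS sources. The function name, the parameter names, and the use of None for "no 802.1Q header" and for "drop" are all assumptions made for the example.

```python
from typing import FrozenSet, Optional


def input_vlan(vlan_mode: str,
               tag: Optional[int],
               trunks: FrozenSet[int],
               cvlans: FrozenSet[int],
               pkt_vid: Optional[int]) -> Optional[int]:
    """Return the VLAN the packet is in, or None if the pipeline exits.

    pkt_vid is the VLAN ID from the packet's 802.1Q header, or None if the
    packet has no 802.1Q header (hypothetical representation).
    """
    if vlan_mode == "access":
        # An 802.1Q header with a nonzero VLAN ID is not allowed.
        if pkt_vid is not None and pkt_vid != 0:
            return None
        return tag

    if vlan_mode == "dot1q-tunnel":
        # The header, if any, names the customer VLAN; an empty cvlans
        # set means that all customer VLANs are valid.
        if pkt_vid is not None and cvlans and pkt_vid not in cvlans:
            return None
        return tag  # the QinQ service VLAN

    if vlan_mode == "trunk":
        vlan = 0 if pkt_vid is None else pkt_vid
    elif vlan_mode in ("native-tagged", "native-untagged"):
        # Same as trunk, except that a packet without a header is in the
        # VLAN given by the tag column.
        vlan = tag if pkt_vid is None else pkt_vid
    else:
        raise ValueError(f"unknown vlan_mode: {vlan_mode}")

    # An empty trunks set means that all VLANs are valid.
    if trunks and vlan not in trunks:
        return None
    return vlan
```

For example, under these assumptions an access port with tag 5 places an untagged packet in VLAN 5 and drops a packet tagged with VLAN 7.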
Ilya Maximets May 12, 2021, 5:09 p.m. UTC | #2
On 4/15/21 5:34 AM, Ben Pfaff wrote:
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
>  lib/ovs-actions.xml | 288 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 286 insertions(+), 2 deletions(-)
> 

Hi, Ben.  Thanks for writing this down!
It looks good to me in general.  Few comments inline.

Best regards, Ilya Maximets.

> diff --git a/lib/ovs-actions.xml b/lib/ovs-actions.xml
> index a2778de4bcd6..de934a244de9 100644
> --- a/lib/ovs-actions.xml
> +++ b/lib/ovs-actions.xml
> @@ -509,7 +509,8 @@ $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
>          <dd>
>            Subjects the packet to the device's normal L2/L3 processing.  This
>            action is not implemented by all OpenFlow switches, and each switch
> -          implements it differently.
> +          implements it differently.  The section ``The OVS Normal Pipeline''
> +          below documents the OVS implementation.
>          </dd>
>  
>          <dt><code>flood</code></dt>
> @@ -582,7 +583,6 @@ $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
>          OpenFlow allows switches to reject such actions.
>        </p>
>  
> -      <!-- XXX output to normal details -->
>        <!-- XXX output to patch ports details -->
>  
>        <h3>Output to the Input Port</h3>
> @@ -664,6 +664,290 @@ $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
>        </conformance>
>      </action>
>  
> +    <h2>The OVS Normal Pipeline</h2>
> +
> +    <p>
> +      This section documents how Open vSwitch implements output to the
> +      <code>normal</code> port.  The OpenFlow specification places no
> +      requirements on how this port works, so all of this documentation is
> +      specific to Open vSwitch.
> +    </p>
> +
> +    <p>
> +      Open vSwitch uses the <code>Open_vSwitch</code> database, detailed in
> +      <code>ovs-vswitchd.conf.db</code>(5), to determine the details of the
> +      normal pipeline.
> +    </p>
> +
> +    <p>
> +      The normal pipeline executes the following ingress stages for each
> +      packet.  The result of the ingress stages is a set of output ports, which
> +      is the empty set if some ingress stage drops the packet:
> +    </p>
> +
> +    <ol>
> +      <li>
> +        <p>
> +          <b>Input port lookup</b>: Looks up the OpenFlow
> +          <code>in_port</code> field's value to the corresponding
> +          <code>Port</code> and <code>Interface</code> record in the database.
> +        </p>
> +
> +        <p>
> +          The <code>in_port</code> is normally the OpenFlow port that the
> +          packet was received on.  If <code>set_field</code> or another actions
> +          changes the <code>in_port</code>, the updated value is honored.  This
> +          lookup will ordinarily succeed; if it fails, for example because
> +          <code>in_port</code> was changed to an unknown value, then the normal
> +          pipeline exits.
> +        </p>
> +      </li>
> +
> +      <li>
> +        <b>Drop malformed packet</b>: If the packet is malformed enough that it
> +        contains only part of an 802.1Q header, then the normal pipeline exits
> +        error.

Should it be "exits with error"?

> +      </li>
> +
> +      <li>
> +        <b>Drop packets sent to a port reserved for mirroring:</b> If the
> +        packet was received on a port that is configured as the output port for
> +        a mirror (that is, it is the <code>output_port</code> in some
> +        <code>Mirror</code> record), then the normal pipeline exits.  Ports
> +        used as mirror outputs don't accept any packets.
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>VLAN input processing:</b> This stage determines what VLAN the
> +          packet is in.  It also verifies that this VLAN is valid for the port;
> +          if not, the normal pipeline exits.  How the VLAN is determined and
> +          which ones are valid vary based on the <code>vlan-mode</code> in the
> +          input port's <code>Port</code> record:
> +        </p>
> +
> +        <dl>
> +          <dt><code>trunk</code></dt>
> +          <dd>
> +            The packet is in the VLAN specified in its 802.1Q header, or in
> +            VLAN 0 if there is no 802.1Q header.  The <code>trunks</code>
> +            column in the <code>Port</code> record lists the valid VLANs; if it
> +            is empty, all VLANs are valid.
> +          </dd>
> +
> +          <dt><code>access</code></dt>
> +          <dd>
> +            The packet is in the VLAN specified in the <code>tag</code> column
> +            of its <code>Port</code> record.  The packet must not have an
> +            802.1Q header with a nonzero VLAN ID; if it does, the pipeline
> +            exits.
> +          </dd>
> +
> +          <dt><code>native-tagged</code></dt>
> +          <dt><code>native-untagged</code></dt>
> +          <dd>
> +            Same as <code>trunk</code> except that the VLAN of a packet without
> +            an 802.1Q header is not necessarily zero; instead, it is taken from
> +            the <code>tag</code> column.
> +          </dd>
> +
> +          <dt><code>dot1q-tunnel</code></dt>
> +          <dd>
> +            The packet is in the VLAN specified in the <code>tag</code> column
> +            of its <code>Port</code> record, which is a QinQ service VLAN with
> +            the Ethertype specified by the <code>Port</code>'s
> +            <code>other_config</code> : <code>qinq-ethtype</code>.  If the
> +            packet has an 802.1Q header, then it specifies the customer VLAN.
> +            The <code>cvlans</code> column specifies the valid customer VLANs;
> +            if it is empty, all customer VLANs are valid.
> +          </dd>
> +        </dl>
> +      </li>
> +
> +      <li>
> +        <b>Drop reserved multicast addresses:</b> If the packet is addressed to
> +        a reserved Ethernet multicast address and the <code>Bridge</code>
> +        record does not have <code>other_config</code> :
> +        <code>forward-bpdu</code> set to <code>true</code>, the pipeline exits.
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Check bond admissibility:</b> If the input port is a member of a
> +          bond, that is, a <code>Port</code> with more than one
> +          <code>Interface</code>, then the bonding code performs an additional
> +          admissibility check to accept or drop the packet.
> +        </p>
> +
> +        <p>
> +          There is a first step if the bond is configured to use LACP.  If so,
> +          then either LACP has been negotiated with the peer or negotiation is
> +          incomplete.  If it has been negotiated, accept the packet if and only
> +          if the bond member is enabled (i.e. carrier is up and it hasn't been
> +          administratively disabled).  If negotiation is incomplete, then
> +          normally the normal pipeline drops the packet, except that if
> +          fallback to active-backup mode is enabled, it continues considering
> +          bond admissibility while acting as though the active-backup balancing
> +          mode were in use.
> +        </p>

This part is a little bit cryptic.  All of the text below is written for the
case where LACP is disabled or has fallen back to active-backup, but that is
not obvious to me from the previous paragraph.  I was surprised by the part
that says that L4 mode always drops all packets, so I had to go back and
re-read from the start very carefully.

> +
> +        <p>
> +          If the packet is an Ethernet multicast, and not received on the
> +          bond's active member, drop it.
> +        </p>
> +
> +        <p>
> +          The remaining behavior depends on the bond's balancing mode:
> +        </p>
> +
> +        <dl>
> +          <dt>L4 (aka TCP balancing)</dt>
> +          <dd>
> +            Drop the packet (this balancing mode is only supported with LACP).
> +          </dd>
> +
> +          <dt>Active-backup</dt>
> +          <dd>
> +            Accept the packet only if and only it was received on the active
> +            member.
> +          </dd>
> +
> +          <dt>SLB (Source Load Balancing)</dt>
> +          <dd>
> +            Drop the packet if the bridge has not learned the packet's source
> +            address (in its VLAN) on the port that received it.  Otherwise,
> +            accept the packet unless it is a gratuituous ARP.  Otherwise,

s/gratuituous/gratuitous/

> +            accept the packet if the MAC entry we found is ARP-locked.
> +            Otherwise, drop the packet.  (See the ``SLB Bonding'' section in
> +            the OVS bonding document for more information and a rationale.)
> +          </dd>
> +        </dl>
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Learn source MAC:</b> If the source Ethernet address is not a
> +          multicast address, then insert a mapping from packet's source
> +          Ethernet address and VLAN to the input port in the bridge's MAC
> +          learning table.  (This is skipped if the packet's VLAN is listed in
> +          the switch's <code>Bridge</code> record in the
> +          <code>flood_vlans</code> column, since there is no use for MAC
> +          learning when all packets are flooded.)
> +        </p>
> +
> +        <p>
> +          When learning happens on a non-bond port, if the packet is a
> +          gratuitous ARP, the entry is marked as ARP-locked.  The lock expires
> +          after 5 seconds.  (See the ``SLB Bonding'' section in the OVS bonding
> +          document for more information and a rationale.)
> +        </p>
> +      </li>
> +
> +      <li>
> +        <b>IP multicast path:</b> If multicast snooping is enabled on the
> +        bridge, and the packet is an Ethernet multicast but not an Ethernet
> +        broadcast, and the packet is an IP packet, then the packet takes a
> +        special processing path.  This path is not yet documented here.  <!--
> +        XXX document multicast processing -->

Nit: it might be better to move the '<!--' to the next line for readability.

> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Output port set:</b> Search the MAC learning table for the port
> +          corresponding to the packet's Ethernet destination and VLAN.  If the
> +          search finds an entry, the output port set is the just the learned
> +          port.  Otherwise (including the case where the packet is an Ethernet
> +          multicast or in <code>flood_vlans</code>), the output port set is all
> +          of the ports in the bridge that belong to the packet's VLAN, except
> +          for any ports that were disabled for flooding via OpenFlow or that
> +          are configured in a <code>Mirror</code> record as a mirror
> +          destination port.
> +        </p>
> +      </li>
> +    </ol>
> +
> +    <p>
> +      The following egress stages execute once for each element in the set of
> +      output ports.  They execute (conceptually) in parallel, so that a
> +      decision or action taken for a given output port has no effect on those
> +      for another one:
> +    </p>
> +
> +    <ol>
> +      <li>
> +        <b>Drop loopback:</b> If the output port is the same as the input port,
> +        drop the packet.
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>VLAN output processing:</b> This stage adjusts the packet to
> +          represent the VLAN in the correct way for the output port.  Its
> +          behavior varies based on the <code>vlan-mode</code> in the output
> +          port's <code>Port</code> record:
> +        </p>
> +
> +        <dl>
> +          <dt><code>trunk</code></dt>
> +          <dt><code>native-tagged</code></dt>
> +          <dt><code>native-untagged</code></dt>
> +          <dd>
> +            If the packet is in VLAN 0 (for <code>native-untagged</code>, if
> +            the packet is in the native VLAN) drops any 802.1Q header.
> +            Otherwise, ensures that there is an 802.1Q header designating the
> +            VLAN.
> +          </dd>
> +
> +          <dt><code>access</code></dt>
> +          <dd>
> +            Remove any 802.1Q header that was present.
> +          </dd>
> +
> +          <dt><code>dot1q-tunnel</code></dt>
> +          <dd>
> +            Ensures that the packet has an outer 802.1Q header with the QinQ
> +            Ethertype and the specified configured tag, and an inner 802.1Q
> +            header with the packet's VLAN.
> +          </dd>
> +        </dl>
> +      </li>
> +
> +      <li>
> +        <b>VLAN priority tag processing:</b> If VLAN output processing
> +        discarded the 802.1Q headers, but priority tags are enabled with
> +        <code>other_config</code> : <code>priority-tags</code> in the output
> +        port's <code>Port</code> record, then a priority-only tag is added
> +        (perhaps only if the priority woule be nonzero, depending on the

s/woule/would/ ?

> +        configuration).
> +      </li>
> +
> +      <li>
> +        <p>
> +          <b>Bond member choice:</b> If the output port is a bond, the code
> +          chooses a particular member.  This step is skipped for non-bonded
> +          ports.
> +        </p>
> +
> +        <p>
> +          If the bond is configured to use LACP, but LACP negotiation is
> +          incomplete, then normally the packet is dropped.  The exception is
> +          that if fallback to active-backup mode is enabled, the egress
> +          pipeline continues choosing a bond member as if active-backup mode
> +          was in use.
> +        </p>
> +
> +        <p>
> +          For active-backup mode, the output member is the active member.
> +          Other modes hash appropriate header fields and use the hash value to
> +          choose one of the enabled members.
> +        </p>
> +      </li>
> +
> +      <li>
> +        <b>Output:</b> The pipeline sends the packet to the output port.
> +      </li>
> +    </ol>
> +
>      <action name="CONTROLLER">
>        <h2>The <code>controller</code> action</h2>
>        <syntax><code>controller</code></syntax>
>
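
Editorial sketch: the control flow of the "Check bond admissibility" stage, which the review above found cryptic, can be written out as straight-line code. This is a minimal illustration of the v1 patch wording only, not the OVS implementation. The function and parameter names are hypothetical, the mode strings are assumed to be the bond_mode values balance-tcp, active-backup, and balance-slb, and the SLB rule is left to a caller-supplied predicate rather than restated here.

```python
from typing import Callable


def bond_admits(*,
                mode: str,
                lacp_configured: bool = False,
                lacp_negotiated: bool = False,
                fallback_ab: bool = False,
                member_enabled: bool = True,
                on_active_member: bool = True,
                eth_multicast: bool = False,
                slb_check: Callable[[], bool] = lambda: False) -> bool:
    """Return True to accept the packet, False to drop it."""
    if lacp_configured:
        if lacp_negotiated:
            # With LACP negotiated, only the member state matters; the
            # balancing-mode checks below are never reached.
            return member_enabled
        if not fallback_ab:
            return False
        # Negotiation incomplete, fallback enabled: continue as though
        # the balancing mode were active-backup.
        mode = "active-backup"

    # Reached only without LACP, or after falling back to active-backup.
    if eth_multicast and not on_active_member:
        return False

    if mode == "balance-tcp":
        # L4 balancing is only supported with LACP, so drop here.
        return False
    if mode == "active-backup":
        return on_active_member
    if mode == "balance-slb":
        return slb_check()
    raise ValueError(f"unknown bond mode: {mode}")
```

Written this way, it is visible that the L4 case drops packets only when LACP is not configured, because a negotiated LACP bond returns before the mode is consulted.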
Ben Pfaff May 12, 2021, 8:16 p.m. UTC | #3
On Wed, May 12, 2021 at 07:09:50PM +0200, Ilya Maximets wrote:
> On 4/15/21 5:34 AM, Ben Pfaff wrote:
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> >  lib/ovs-actions.xml | 288 +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 286 insertions(+), 2 deletions(-)
> > 
> 
> Hi, Ben.  Thanks for writing this down!
> It looks good to me in general.  Few comments inline.

Thanks!  Your comments make sense.  I fixed them, and did a re-read of
my own to find other ways the writing could be improved, and sent v2 for
a second round of review:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-May/382963.html
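
Editorial sketch: the "Learn source MAC" and "Output port set" stages from the v1 patch, together with the egress "Drop loopback" stage, amount to a dictionary insert, a lookup with a flood fallback, and a set difference. The Python below is a minimal illustration under stated assumptions and is not the OVS implementation: all names are hypothetical, ports are plain strings, the caller is assumed to supply the set of ports that belong to the packet's VLAN, and ARP locking and entry expiry are omitted.

```python
from typing import Dict, Set, Tuple

MacTable = Dict[Tuple[str, int], str]  # (MAC address, VLAN) -> port name


def is_eth_multicast(mac: str) -> bool:
    # The low-order bit of the first octet marks a group address;
    # this also covers broadcast (ff:ff:ff:ff:ff:ff).
    return int(mac.split(":")[0], 16) & 1 == 1


def learn(table: MacTable, flood_vlans: Set[int],
          src: str, vlan: int, in_port: str) -> None:
    # Learning is skipped for multicast sources and for flood VLANs.
    if is_eth_multicast(src) or vlan in flood_vlans:
        return
    table[(src, vlan)] = in_port


def output_port_set(table: MacTable, flood_vlans: Set[int],
                    vlan_ports: Set[str], no_flood: Set[str],
                    mirror_outputs: Set[str],
                    dst: str, vlan: int) -> Set[str]:
    if not is_eth_multicast(dst) and vlan not in flood_vlans:
        port = table.get((dst, vlan))
        if port is not None:
            return {port}
    # No usable entry: flood to the VLAN's ports, minus ports with
    # flooding disabled and mirror destination ports.
    return vlan_ports - no_flood - mirror_outputs


def drop_loopback(ports: Set[str], in_port: str) -> Set[str]:
    # Egress stage: never send the packet back out of its input port.
    return ports - {in_port}
```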

Patch

diff --git a/lib/ovs-actions.xml b/lib/ovs-actions.xml
index a2778de4bcd6..de934a244de9 100644
--- a/lib/ovs-actions.xml
+++ b/lib/ovs-actions.xml
@@ -509,7 +509,8 @@  $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
         <dd>
           Subjects the packet to the device's normal L2/L3 processing.  This
           action is not implemented by all OpenFlow switches, and each switch
-          implements it differently.
+          implements it differently.  The section ``The OVS Normal Pipeline''
+          below documents the OVS implementation.
         </dd>
 
         <dt><code>flood</code></dt>
@@ -582,7 +583,6 @@  $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
         OpenFlow allows switches to reject such actions.
       </p>
 
-      <!-- XXX output to normal details -->
       <!-- XXX output to patch ports details -->
 
       <h3>Output to the Input Port</h3>
@@ -664,6 +664,290 @@  $ ovs-ofctl -O OpenFlow10 add-flow br0 actions=mod_nw_src:1.2.3.4
       </conformance>
     </action>
 
+    <h2>The OVS Normal Pipeline</h2>
+
+    <p>
+      This section documents how Open vSwitch implements output to the
+      <code>normal</code> port.  The OpenFlow specification places no
+      requirements on how this port works, so all of this documentation is
+      specific to Open vSwitch.
+    </p>
+
+    <p>
+      Open vSwitch uses the <code>Open_vSwitch</code> database, detailed in
+      <code>ovs-vswitchd.conf.db</code>(5), to determine the details of the
+      normal pipeline.
+    </p>
+
+    <p>
+      The normal pipeline executes the following ingress stages for each
+      packet.  The result of the ingress stages is a set of output ports, which
+      is the empty set if some ingress stage drops the packet:
+    </p>
+
+    <ol>
+      <li>
+        <p>
+          <b>Input port lookup</b>: Looks up the OpenFlow
+          <code>in_port</code> field's value to the corresponding
+          <code>Port</code> and <code>Interface</code> record in the database.
+        </p>
+
+        <p>
+          The <code>in_port</code> is normally the OpenFlow port that the
+          packet was received on.  If <code>set_field</code> or another actions
+          changes the <code>in_port</code>, the updated value is honored.  This
+          lookup will ordinarily succeed; if it fails, for example because
+          <code>in_port</code> was changed to an unknown value, then the normal
+          pipeline exits.
+        </p>
+      </li>
+
+      <li>
+        <b>Drop malformed packet</b>: If the packet is malformed enough that it
+        contains only part of an 802.1Q header, then the normal pipeline exits
+        error.
+      </li>
+
+      <li>
+        <b>Drop packets sent to a port reserved for mirroring:</b> If the
+        packet was received on a port that is configured as the output port for
+        a mirror (that is, it is the <code>output_port</code> in some
+        <code>Mirror</code> record), then the normal pipeline exits.  Ports
+        used as mirror outputs don't accept any packets.
+      </li>
+
+      <li>
+        <p>
+          <b>VLAN input processing:</b> This stage determines what VLAN the
+          packet is in.  It also verifies that this VLAN is valid for the port;
+          if not, the normal pipeline exits.  How the VLAN is determined and
+          which ones are valid vary based on the <code>vlan-mode</code> in the
+          input port's <code>Port</code> record:
+        </p>
+
+        <dl>
+          <dt><code>trunk</code></dt>
+          <dd>
+            The packet is in the VLAN specified in its 802.1Q header, or in
+            VLAN 0 if there is no 802.1Q header.  The <code>trunks</code>
+            column in the <code>Port</code> record lists the valid VLANs; if it
+            is empty, all VLANs are valid.
+          </dd>
+
+          <dt><code>access</code></dt>
+          <dd>
+            The packet is in the VLAN specified in the <code>tag</code> column
+            of its <code>Port</code> record.  The packet must not have an
+            802.1Q header with a nonzero VLAN ID; if it does, the pipeline
+            exits.
+          </dd>
+
+          <dt><code>native-tagged</code></dt>
+          <dt><code>native-untagged</code></dt>
+          <dd>
+            Same as <code>trunk</code> except that the VLAN of a packet without
+            an 802.1Q header is not necessarily zero; instead, it is taken from
+            the <code>tag</code> column.
+          </dd>
+
+          <dt><code>dot1q-tunnel</code></dt>
+          <dd>
+            The packet is in the VLAN specified in the <code>tag</code> column
+            of its <code>Port</code> record, which is a QinQ service VLAN with
+            the Ethertype specified by the <code>Port</code>'s
+            <code>other_config</code> : <code>qinq-ethtype</code>.  If the
+            packet has an 802.1Q header, then it specifies the customer VLAN.
+            The <code>cvlans</code> column specifies the valid customer VLANs;
+            if it is empty, all customer VLANs are valid.
+          </dd>
+        </dl>
+      </li>
+
+      <li>
+        <b>Drop reserved multicast addresses:</b> If the packet is addressed to
+        a reserved Ethernet multicast address and the <code>Bridge</code>
+        record does not have <code>other_config</code> :
+        <code>forward-bpdu</code> set to <code>true</code>, the pipeline exits.
+      </li>
+
+      <li>
+        <p>
+          <b>Check bond admissibility:</b> If the input port is a member of a
+          bond, that is, a <code>Port</code> with more than one
+          <code>Interface</code>, then the bonding code performs an additional
+          admissibility check to accept or drop the packet.
+        </p>
+
+        <p>
+          The first check applies only if the bond is configured to use LACP.
+          In that case, either LACP has been negotiated with the peer or
+          negotiation is incomplete.  If it has been negotiated, accept the
+          packet if and only if the bond member is enabled (i.e. carrier is up
+          and it hasn't been administratively disabled).  If negotiation is
+          incomplete, then the normal pipeline ordinarily drops the packet,
+          except that if fallback to active-backup mode is enabled, it
+          continues considering bond admissibility while acting as though the
+          active-backup balancing mode were in use.
+        </p>
+
+        <p>
+          If the packet is an Ethernet multicast, and not received on the
+          bond's active member, drop it.
+        </p>
+
+        <p>
+          The remaining behavior depends on the bond's balancing mode:
+        </p>
+
+        <dl>
+          <dt>L4 (aka TCP balancing)</dt>
+          <dd>
+            Drop the packet (this balancing mode is only supported with LACP).
+          </dd>
+
+          <dt>Active-backup</dt>
+          <dd>
+            Accept the packet if and only if it was received on the active
+            member.
+          </dd>
+
+          <dt>SLB (Source Load Balancing)</dt>
+          <dd>
+            Drop the packet if the bridge has not learned the packet's source
+            address (in its VLAN) on the port that received it.  Otherwise,
+            accept the packet unless it is a gratuitous ARP.  Otherwise,
+            accept the packet if the MAC entry we found is ARP-locked.
+            Otherwise, drop the packet.  (See the ``SLB Bonding'' section in
+            the OVS bonding document for more information and a rationale.)
+          </dd>
+        </dl>
+      </li>
+
+      <li>
+        <p>
+          <b>Learn source MAC:</b> If the source Ethernet address is not a
+          multicast address, then insert a mapping from the packet's source
+          Ethernet address and VLAN to the input port in the bridge's MAC
+          learning table.  (This is skipped if the packet's VLAN is listed in
+          the switch's <code>Bridge</code> record in the
+          <code>flood_vlans</code> column, since there is no use for MAC
+          learning when all packets are flooded.)
+        </p>
+
+        <p>
+          When learning happens on a non-bond port, if the packet is a
+          gratuitous ARP, the entry is marked as ARP-locked.  The lock expires
+          after 5 seconds.  (See the ``SLB Bonding'' section in the OVS bonding
+          document for more information and a rationale.)
+        </p>
+      </li>
+
+      <li>
+        <b>IP multicast path:</b> If multicast snooping is enabled on the
+        bridge, and the packet is an Ethernet multicast but not an Ethernet
+        broadcast, and the packet is an IP packet, then the packet takes a
+        special processing path.  This path is not yet documented here.  <!--
+        XXX document multicast processing -->
+      </li>
+
+      <li>
+        <p>
+          <b>Output port set:</b> Search the MAC learning table for the port
+          corresponding to the packet's Ethernet destination and VLAN.  If the
+          search finds an entry, the output port set is just the learned
+          port.  Otherwise (including the case where the packet is an Ethernet
+          multicast or in <code>flood_vlans</code>), the output port set is all
+          of the ports in the bridge that belong to the packet's VLAN, except
+          for any ports that were disabled for flooding via OpenFlow or that
+          are configured in a <code>Mirror</code> record as a mirror
+          destination port.
+        </p>
+      </li>
+    </ol>
+
+    <p>
+      The following egress stages execute once for each element in the set of
+      output ports.  They execute (conceptually) in parallel, so that a
+      decision or action taken for a given output port has no effect on those
+      for another one:
+    </p>
+
+    <ol>
+      <li>
+        <b>Drop loopback:</b> If the output port is the same as the input port,
+        drop the packet.
+      </li>
+
+      <li>
+        <p>
+          <b>VLAN output processing:</b> This stage adjusts the packet to
+          represent the VLAN in the correct way for the output port.  Its
+          behavior varies based on the <code>vlan-mode</code> in the output
+          port's <code>Port</code> record:
+        </p>
+
+        <dl>
+          <dt><code>trunk</code></dt>
+          <dt><code>native-tagged</code></dt>
+          <dt><code>native-untagged</code></dt>
+          <dd>
+            If the packet is in VLAN 0 (for <code>native-untagged</code>, if
+            the packet is in the native VLAN), drops any 802.1Q header.
+            Otherwise, ensures that there is an 802.1Q header designating the
+            VLAN.
+          </dd>
+
+          <dt><code>access</code></dt>
+          <dd>
+            Removes any 802.1Q header that was present.
+          </dd>
+
+          <dt><code>dot1q-tunnel</code></dt>
+          <dd>
+            Ensures that the packet has an outer 802.1Q header with the QinQ
+            Ethertype and the configured tag, and an inner 802.1Q
+            header with the packet's VLAN.
+          </dd>
+        </dl>
+      </li>
+
+      <li>
+        <b>VLAN priority tag processing:</b> If VLAN output processing
+        discarded the 802.1Q headers, but priority tags are enabled with
+        <code>other_config</code> : <code>priority-tags</code> in the output
+        port's <code>Port</code> record, then a priority-only tag is added
+        (perhaps only if the priority would be nonzero, depending on the
+        configuration).
+      </li>
+
+      <li>
+        <p>
+          <b>Bond member choice:</b> If the output port is a bond, the code
+          chooses a particular member.  This step is skipped for non-bonded
+          ports.
+        </p>
+
+        <p>
+          If the bond is configured to use LACP, but LACP negotiation is
+          incomplete, then normally the packet is dropped.  The exception is
+          that if fallback to active-backup mode is enabled, the egress
+          pipeline continues choosing a bond member as if active-backup mode
+          were in use.
+        </p>
+
+        <p>
+          For active-backup mode, the output member is the active member.
+          Other modes hash appropriate header fields and use the hash value to
+          choose one of the enabled members.
+        </p>
+      </li>
+
+      <li>
+        <b>Output:</b> The pipeline sends the packet to the output port.
+      </li>
+    </ol>
+
     <action name="CONTROLLER">
       <h2>The <code>controller</code> action</h2>
       <syntax><code>controller</code></syntax>