[ovs-dev,patch_v7] vtep: add source node replication support.

Message ID 1461901590-32072-2-git-send-email-dlu998@gmail.com
State Superseded
Commit Message

Darrell Ball April 29, 2016, 3:46 a.m. UTC
This patch series updates the vtep schema, vtep-ctl commands and vtep
simulator to support source node replication in addition to service node
replication per logical switch.  The default replication mode is service node
as that was the only mode previously supported.  Source node replication
mode is optionally configurable and resetting the replication mode implicitly
sets the replication mode back to a default of service node.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
---
 tests/vtep-ctl.at       | 17 ++++++++++++++
 vtep/README.ovs-vtep.md | 19 ++++++++++++++--
 vtep/ovs-vtep           | 37 ++++++++++++++++++++++++++----
 vtep/vtep-ctl.8.in      |  9 ++++++++
 vtep/vtep-ctl.c         | 42 ++++++++++++++++++++++++++++++++++
 vtep/vtep.ovsschema     |  9 ++++++--
 vtep/vtep.xml           | 60 ++++++++++++++++++++++++++++++++++++++++++++-----
 7 files changed, 180 insertions(+), 13 deletions(-)
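
The behaviour described in the commit message can be sketched in Python as follows. This is a standalone illustration, not the emulator's code: `select_flood_tunnels` and its parameters are hypothetical names, and "the first" unknown destination simply means the first one in whatever order the caller supplies.

```python
def select_flood_tunnels(unknown_dsts, replication_mode="service_node"):
    """Return the tunnel destinations that unknown-destination traffic goes to.

    service_node: send to a single service node, which replicates onward.
    source_node:  replicate locally to every unknown destination.
    """
    if replication_mode not in ("service_node", "source_node"):
        raise ValueError("unknown replication mode: %r" % (replication_mode,))
    dsts = list(unknown_dsts)
    if replication_mode == "service_node":
        # Only one destination is used; the service node does the fan-out.
        return dsts[:1]
    # Source node replication: the sender fans out to all destinations.
    return dsts
```

With two unknown destinations configured, the default mode yields one target and `source_node` yields both.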

Comments

Justin Pettit April 29, 2016, 9:16 p.m. UTC | #1
> On Apr 28, 2016, at 8:46 PM, Darrell Ball <dlu998@gmail.com> wrote:
> 
> This patch series updates the vtep schema, vtep-ctl commands and vtep
> simulator to support source node replication in addition to service node
> replication per logical switch.  The default replication mode is service node
> as that was the only mode previously supported.  Source node replication
> mode is optionally configurable and resetting the replication mode implicitly
> sets the replication mode back to a default of service node.
> 
> Signed-off-by: Darrell Ball <dlu998@gmail.com>

I believe Bruce acked v6, so if you didn't change the patch in ways that would likely cause him to revoke that, you can put his Acked-by with the new patch.  It makes it easier to track what's already been reviewed--especially in large patch series that may have patches acked at different times.

(That said, if you and Bruce agree with my suggested changes, we will need a new ack.)

> diff --git a/tests/vtep-ctl.at b/tests/vtep-ctl.at
> index 99e97e8..d2323a0 100644
> --- a/tests/vtep-ctl.at
> +++ b/tests/vtep-ctl.at
> @@ -318,6 +318,23 @@ CHECK_LSWITCHES([a])
> VTEP_CTL_CLEANUP
> AT_CLEANUP
> 
> +AT_SETUP([add-ls a, set-ls-replication-mode a source_node])
> +AT_KEYWORDS([vtep-ctl])
> +VTEP_CTL_SETUP
> +AT_CHECK([RUN_VTEP_CTL(
> +  [add-ls a],[set-ls-replication-mode a source_node])],

Thanks for writing tests!  It would probably be good to make sure that this change actually modified the configuration, though.
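
One way such a check could look, sketched in Python rather than in the autotest macros: read the column back and compare it with the value that was set. The emulator in this patch reads the column with `get logical_switch <name> alt_replication_mode` and treats the output `[]` as "unset"; the helper names below are hypothetical, and the handling of optional surrounding quotes is an assumption about how a set value is printed.

```python
def parse_optional_column(output):
    """Interpret the text printed for an optional (zero-or-one value) column."""
    text = output.strip()
    if text == "[]":
        return None  # the column is empty
    # Assumption: a set value is printed bare or wrapped in double quotes.
    return text.strip('"')


def mode_was_set(get_output, expected):
    """True if the column read back from the database holds `expected`."""
    return parse_optional_column(get_output) == expected
```

A test would run the `set` command, capture the output of the `get`, and fail unless `mode_was_set(output, "source_node")` holds.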

> diff --git a/vtep/README.ovs-vtep.md b/vtep/README.ovs-vtep.md
> index 6734dab..74900f1 100644
> --- a/vtep/README.ovs-vtep.md
> +++ b/vtep/README.ovs-vtep.md
> ...
> +4. The alternate replication mode can also be reset back to the default of
> +   service node replication, at the logical switch level:
> +
> +      ```
> +vtep-ctl reset-ls-replication-mode ls0
> +      ```

I wonder about the utility of having a reset at all.  Regardless, I think it's likely confusing to have it listed as a step in the instructions about setting the emulator up to work with an NVC.  I'd just drop this step.

> diff --git a/vtep/vtep-ctl.8.in b/vtep/vtep-ctl.8.in
> index 129c7ed..25deebd 100644
> --- a/vtep/vtep-ctl.8.in
> +++ b/vtep/vtep-ctl.8.in
> @@ -195,6 +195,15 @@ combination on the physical switch \fIpswitch\fR.
> List the logical switch bindings for \fIport\fR on the physical switch
> \fIpswitch\fR.
> .
> +.IP "\fBset\-ls\-replication\-mode \fIlswitch replication\-mode\fR"
> +Set logical switch \fIlswitch\fR alternate replication mode to
> +\fIreplication\-mode\fR; the only valid value presently for alternate
> +replication mode is "source_node".
> +.
> +.IP "\fBreset\-ls\-replication\-mode \fIlswitch\fR"
> +Reset a logical switch \fIlswitch\fR alternate replication mode to the
> +default of "service_node".

In vtep-ctl, I would drop support for reset-ls-replication entirely.  I'd suggest that you add a get/set/del group of actions.  If someone chooses del-ls-replication, then it would go back to the default.  This is how fail-mode is handled in ovs-vsctl, which has similar requirements.
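
The get/set/del semantics suggested here can be modelled with a small Python sketch. It is only a toy model of one optional column, not vtep-ctl code: the class and method names are hypothetical, and it assumes both modes are accepted values with `service_node` as the default when nothing is stored.

```python
DEFAULT_MODE = "service_node"
VALID_MODES = ("service_node", "source_node")


class LogicalSwitchConfig:
    """Toy stand-in for one Logical_Switch row with an optional mode column."""

    def __init__(self):
        self.replication_mode = None  # None models an empty (unset) column

    def set_replication_mode(self, mode):
        if mode not in VALID_MODES:
            raise ValueError("invalid replication mode: %r" % (mode,))
        self.replication_mode = mode

    def del_replication_mode(self):
        # Clearing the column removes the explicit choice, not the
        # behaviour: the default applies again.
        self.replication_mode = None

    def get_replication_mode(self):
        # What is stored, which may be nothing at all.
        return self.replication_mode

    def effective_replication_mode(self):
        # What the switch should actually do.
        return self.replication_mode or DEFAULT_MODE
```

Keeping the stored value and the effective value apart makes the "deleted but still in effect" case explicit: after a delete, the getter reports nothing stored while the effective mode is the default.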

> diff --git a/vtep/vtep.ovsschema b/vtep/vtep.ovsschema
> index 533fd2e..a0e27fd 100644
> --- a/vtep/vtep.ovsschema
> +++ b/vtep/vtep.ovsschema
> @@ -96,6 +96,11 @@
>         "name": {"type": "string"},
>         "description": {"type": "string"},
>         "tunnel_key": {"type": {"key": "integer", "min": 0, "max": 1}},
> +        "alt_replication_mode": {

What about just calling this "replication_mode"?  It feels like we're casting judgment on whether source or service node replication is better.

> +          "type": {
> +            "key": {
> +              "enum": ["set", ["source_node"]],
> +              "type": "string"},"min": 0, "max": 1}},

I would add "service_node" to the enums.  This way people can choose one of the two modes, and if they don't specify one, we can use a default.  This is similar to how we handle "fail_mode" for ovs-vswitchd.

> @@ -296,4 +301,4 @@
>           "ephemeral": true}},
>       "indexes": [["target"]],
>       "isRoot": false}},
> -  "version": "1.5.1"}
> +  "version": "1.6.1"}

I would have expected this to be "1.6.0".
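
The expectation can be illustrated with a short Python sketch. It assumes the usual x.y.z convention in which a backward-compatible addition, such as a new column, increments the middle number and resets the last one; `bump_schema_version` and its `change` labels are hypothetical names used only for this illustration.

```python
def bump_schema_version(version, change):
    """Return the next x.y.z schema version for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "incompatible":
        # Breaking change: bump x, reset y and z.
        return "%d.0.0" % (major + 1)
    if change == "compatible":
        # Backward-compatible addition (e.g. a new column): bump y, reset z.
        return "%d.%d.0" % (major, minor + 1)
    if change == "cosmetic":
        # No effect on compatibility: bump z only.
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown change type: %r" % (change,))
```

Under that assumption, adding a column to a schema at 1.5.1 gives `bump_schema_version("1.5.1", "compatible")`, which is "1.6.0".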

> diff --git a/vtep/vtep.xml b/vtep/vtep.xml
> index a3a6988..a2fab02 100644
> --- a/vtep/vtep.xml
> +++ b/vtep/vtep.xml
> @@ -357,6 +357,24 @@
>         Indicates that an error has occurred in the switch but that no
>         more specific information is available.
>       </column>
> +
> +      <column name="switch_fault_status"
> +        key="unsupported_source_node_replication">
> +        Indicates that the requested source node replication mode cannot be
> +        supported by the physical switch;  this specifically means in this
> +        context that the physical switch lacks the capability to support
> +        source node replication mode.  This error occurs when a controller

Not supporting source node replication is certainly the only possibility with current vtep devices, but will that always be the case?  It seems like the user should be able to figure out that the configured value doesn't work, so the opposite one should.  I would have thought this language would be a bit more general.

> +    <group title="Alternate Replication Mode">
> +      <p>
> +        For handling broadcast, multicast (in default manner) and unknown
> +        unicast traffic, packets can be sent to all members of a logical
> +        switch referenced by a physical switch.  There are different modes
> +        to replicate the packets.  The default mode of replication is to
> +        send the traffic to a service node, which can be a hypervisor,
> +        server or appliance, and let the service node handle replication to
> +        other transport nodes (hypervisors or other VTEP physical
> +        switches).  This mode is called service node replication.  An
> +        alternate mode of replication, called source node replication
> +        involves the source node sending to all other transport nodes.
> +        Hypervisors are always responsible for doing their own
> +        replication for locally attached VMs in both modes.  Service node
> +        mode is the default and was the only option for prior versions of
> +        the schema.  Source node mode is an alternate replication mode
> +        that may be configured using this column.
> +      </p>
> +
> +      <column name="alt_replication_mode">
> +        <p>
> +          This optional column defines the alternate replication mode per
> +          <ref table="Logical_Switch"/>.  There is one valid value presently,
> +          <code>source_node</code>.
> +        </p>

Once again, calling this mode "alternate" feels negative to me.

> @@ -911,9 +958,12 @@
> 
>     <column name="locator_set">
>       The physical locator set to be used to reach this MAC address. In
> -      this table, the physical locator set will be either a service node IP
> -      address or a set of tunnel IP addresses of hypervisors (and
> -      potentially other VTEPs).
> +      this table, the physical locator set will be either a set of service
> +      node when service node replication is used or the set of transport

s/node/nodes/

Thanks,

--Justin
Darrell Ball April 30, 2016, 12:34 a.m. UTC | #2
On Fri, Apr 29, 2016 at 2:16 PM, Justin Pettit <jpettit@ovn.org> wrote:

>
> > On Apr 28, 2016, at 8:46 PM, Darrell Ball <dlu998@gmail.com> wrote:
> >
> > This patch series updates the vtep schema, vtep-ctl commands and vtep
> > simulator to support source node replication in addition to service node
> > replication per logical switch.  The default replication mode is service
> node
> > as that was the only mode previously supported.  Source node replication
> > mode is optionally configurable and resetting the replication mode
> implicitly
> > sets the replication mode back to a default of service node.
> >
> > Signed-off-by: Darrell Ball <dlu998@gmail.com>
>
> I believe Bruce acked v6, so if you didn't change the patch in ways that
> would likely cause him to revoke that, you can put his Acked-by with the
> new patch.  It makes it easier to track what's already been
> reviewed--especially in large patch series that may have patches acked at
> different times.
>
> (That said, if you and Bruce agree with my suggested changes, we will need
> a new ack.)
>
> > diff --git a/tests/vtep-ctl.at b/tests/vtep-ctl.at
> > index 99e97e8..d2323a0 100644
> > --- a/tests/vtep-ctl.at
> > +++ b/tests/vtep-ctl.at
> > @@ -318,6 +318,23 @@ CHECK_LSWITCHES([a])
> > VTEP_CTL_CLEANUP
> > AT_CLEANUP
> >
> > +AT_SETUP([add-ls a, set-ls-replication-mode a source_node])
> > +AT_KEYWORDS([vtep-ctl])
> > +VTEP_CTL_SETUP
> > +AT_CHECK([RUN_VTEP_CTL(
> > +  [add-ls a],[set-ls-replication-mode a source_node])],
>
> Thanks for writing tests!  It would probably be good to make sure that
> this change actually modified the configuration, though.
>

The modification check is in a subsequent patch, but I can add a trivial
one here. It's good to set a precedent in this regard :-)



>
> > diff --git a/vtep/README.ovs-vtep.md b/vtep/README.ovs-vtep.md
> > index 6734dab..74900f1 100644
> > --- a/vtep/README.ovs-vtep.md
> > +++ b/vtep/README.ovs-vtep.md
> > ...
> > +4. The alternate replication mode can also be reset back to the default
> of
> > +   service node replication, at the logical switch level:
> > +
> > +      ```
> > +vtep-ctl reset-ls-replication-mode ls0
> > +      ```
>
> I wonder about the utility of having a reset at all.  Regardless, I think
> it's likely confusing to have it listed as a step in the instructions about
> setting the emulator up to work with an NVC.  I'd just drop this step
>

This is documentation to aid in vtep configuration. If the suggested
command remains the only way to set the replication mode back to the
default, then it should be more accessible rather than less. I am not
sure that having it only in a man page makes it less confusing, but it
does make it less accessible.



> > diff --git a/vtep/vtep-ctl.8.in b/vtep/vtep-ctl.8.in
> > index 129c7ed..25deebd 100644
> > --- a/vtep/vtep-ctl.8.in
> > +++ b/vtep/vtep-ctl.8.in
> > @@ -195,6 +195,15 @@ combination on the physical switch \fIpswitch\fR.
> > List the logical switch bindings for \fIport\fR on the physical switch
> > \fIpswitch\fR.
> > .
> > +.IP "\fBset\-ls\-replication\-mode \fIlswitch replication\-mode\fR"
> > +Set logical switch \fIlswitch\fR alternate replication mode to
> > +\fIreplication\-mode\fR; the only valid value presently for alternate
> > +replication mode is "source_node".
> > +.
> > +.IP "\fBreset\-ls\-replication\-mode \fIlswitch\fR"
> > +Reset a logical switch \fIlswitch\fR alternate replication mode to the
> > +default of "service_node".
>
> In vtep-ctl, I would drop support for reset-ls-replication entirely.  I'd
> suggest that you add a get/set/del group of actions.  If someone chooses
> del-ls-replication, then it would go back to the default.  This is how
> fail-mode is handled in ovs-vsctl, which has similar requirements.
>

I considered del-ls-replication, but that implies the attribute is gone,
which it is not; it remains with the default value.
Delete makes sense for removing a port or bridge.

I see what fail-mode has done and I don't think it's ideal; fail-mode uses a
default of standalone that is sometimes displayed and sometimes not.
You can configure fail_mode to standalone and then delete it, and it remains
standalone?
That does not seem very intuitive.

If service-node mode remains the default and is not configurable (per
suggestions), then reset is the correct semantics.

get is something I would like, but I did not see much prior art, so I did
not want to break anything :-). I'm fine to add it.



>
> > diff --git a/vtep/vtep.ovsschema b/vtep/vtep.ovsschema
> > index 533fd2e..a0e27fd 100644
> > --- a/vtep/vtep.ovsschema
> > +++ b/vtep/vtep.ovsschema
> > @@ -96,6 +96,11 @@
> >         "name": {"type": "string"},
> >         "description": {"type": "string"},
> >         "tunnel_key": {"type": {"key": "integer", "min": 0, "max": 1}},
> > +        "alt_replication_mode": {
>
> What about just calling this "replication_mode"?  It feels like we're
> casting judgment on whether source or service node replication is better.
>

In the original patch, both service and source modes were equals and the
term "replication_mode" was used in that case.

There are legacy and existing support considerations to take into account,
so the question is whether service_node is configurable or just a hard
default, in which case source_node is relegated to "alternate" mode.



>
> > +          "type": {
> > +            "key": {
> > +              "enum": ["set", ["source_node"]],
> > +              "type": "string"},"min": 0, "max": 1}},
>
> I would add "service_node" to the enums.  This way people can choose one
> of the two modes, and if they don't specify one, we can use a default.
> This is similar to how we handle "fail_mode" for ovs-vswitchd.
>

Well, both service_node and source_node were originally part of the enum in
the first patch :-)
We can discuss it.



>
> > @@ -296,4 +301,4 @@
> >           "ephemeral": true}},
> >       "indexes": [["target"]],
> >       "isRoot": false}},
> > -  "version": "1.5.1"}
> > +  "version": "1.6.1"}
>
> I would have expected this to be "1.6.0".
>

That's fine.



>
> > diff --git a/vtep/vtep.xml b/vtep/vtep.xml
> > index a3a6988..a2fab02 100644
> > --- a/vtep/vtep.xml
> > +++ b/vtep/vtep.xml
> > @@ -357,6 +357,24 @@
> >         Indicates that an error has occurred in the switch but that no
> >         more specific information is available.
> >       </column>
> > +
> > +      <column name="switch_fault_status"
> > +        key="unsupported_source_node_replication">
> > +        Indicates that the requested source node replication mode
> cannot be
> > +        supported by the physical switch;  this specifically means in
> this
> > +        context that the physical switch lacks the capability to support
> > +        source node replication mode.  This error occurs when a
> controller
>
> Not supporting source node replication is certainly the only possibility
> with current vtep devices, but will that always be the case?  It seems like
> the user should be able to figure out that the configured value doesn't
> work, so the opposite one should.  I would have thought this language would
> be a bit more general.
>

I had originally considered both service_node and source_node as equals in
the first patch, in which case the language here would have certainly
reflected that.

The question really comes down to this: should service_node and source_node
be equals, or are there other overriding considerations?




>
> > +    <group title="Alternate Replication Mode">
> > +      <p>
> > +        For handling broadcast, multicast (in default manner) and
> unknown
> > +        unicast traffic, packets can be sent to all members of a logical
> > +        switch referenced by a physical switch.  There are different
> modes
> > +        to replicate the packets.  The default mode of replication is to
> > +        send the traffic to a service node, which can be a hypervisor,
> > +        server or appliance, and let the service node handle
> replication to
> > +        other transport nodes (hypervisors or other VTEP physical
> > +        switches).  This mode is called service node replication.  An
> > +        alternate mode of replication, called source node replication
> > +        involves the source node sending to all other transport nodes.
> > +        Hypervisors are always responsible for doing their own
> > +        replication for locally attached VMs in both modes.  Service
> node
> > +        mode is the default and was the only option for prior versions
> of
> > +        the schema.  Source node mode is an alternate replication mode
> > +        that may be configured using this column.
> > +      </p>
> > +
> > +      <column name="alt_replication_mode">
> > +        <p>
> > +          This optional column defines the alternate replication mode
> per
> > +          <ref table="Logical_Switch"/>.  There is one valid value
> presently,
> > +          <code>source_node</code>.
> > +        </p>
>
> Once again, calling this mode "alternate" feels negative to me.
>

See my last comment; it applies here as well



>
> > @@ -911,9 +958,12 @@
> >
> >     <column name="locator_set">
> >       The physical locator set to be used to reach this MAC address. In
> > -      this table, the physical locator set will be either a service
> node IP
> > -      address or a set of tunnel IP addresses of hypervisors (and
> > -      potentially other VTEPs).
> > +      this table, the physical locator set will be either a set of
> service
> > +      node when service node replication is used or the set of transport
>
> s/node/nodes/
>

thanks



>
> Thanks,
>
> --Justin
>
>
>
Patch

diff --git a/tests/vtep-ctl.at b/tests/vtep-ctl.at
index 99e97e8..d2323a0 100644
--- a/tests/vtep-ctl.at
+++ b/tests/vtep-ctl.at
@@ -318,6 +318,23 @@  CHECK_LSWITCHES([a])
 VTEP_CTL_CLEANUP
 AT_CLEANUP
 
+AT_SETUP([add-ls a, set-ls-replication-mode a source_node])
+AT_KEYWORDS([vtep-ctl])
+VTEP_CTL_SETUP
+AT_CHECK([RUN_VTEP_CTL(
+  [add-ls a],[set-ls-replication-mode a source_node])],
+  [0], [], [], [VTEP_CTL_CLEANUP])
+VTEP_CTL_CLEANUP
+AT_CLEANUP
+
+AT_SETUP([add-ls a, reset-ls-replication-mode a])
+AT_KEYWORDS([vtep-ctl])
+VTEP_CTL_SETUP
+AT_CHECK([RUN_VTEP_CTL(
+  [add-ls a],[reset-ls-replication-mode a])],
+  [0], [], [], [VTEP_CTL_CLEANUP])
+VTEP_CTL_CLEANUP
+AT_CLEANUP
 
 dnl ----------------------------------------------------------------------
 AT_BANNER([vtep-ctl unit tests -- logical binding tests])
diff --git a/vtep/README.ovs-vtep.md b/vtep/README.ovs-vtep.md
index 6734dab..74900f1 100644
--- a/vtep/README.ovs-vtep.md
+++ b/vtep/README.ovs-vtep.md
@@ -166,13 +166,28 @@  vtep-ctl bind-ls br0 p0 0 ls0
 vtep-ctl set Logical_Switch ls0 tunnel_key=33
       ```
 
-3. Direct unknown destinations out a tunnel:
+3. Optionally, change the replication mode from a default of service_node to
+   an alternate mode of source_node, which can be done at the logical switch
+   level:
+
+      ```
+vtep-ctl set-ls-replication-mode ls0 source_node
+      ```
+
+4. The alternate replication mode can also be reset back to the default of
+   service node replication, at the logical switch level:
+
+      ```
+vtep-ctl reset-ls-replication-mode ls0
+      ```
+
+5. Direct unknown destinations out a tunnel:
 
       ```
 vtep-ctl add-mcast-remote ls0 unknown-dst 10.2.2.2
       ```
 
-4. Direct unicast destinations out a different tunnel:
+6. Direct unicast destinations out a different tunnel:
 
       ```
 vtep-ctl add-ucast-remote ls0 00:11:22:33:44:55 10.2.2.3
diff --git a/vtep/ovs-vtep b/vtep/ovs-vtep
index 31ff159..6798f9e 100755
--- a/vtep/ovs-vtep
+++ b/vtep/ovs-vtep
@@ -94,6 +94,7 @@  class Logical_Switch(object):
         self.unknown_dsts = set()
         self.tunnel_key = 0
         self.setup_ls()
+        self.replication_mode = "service_node"
 
     def __del__(self):
         vlog.info("destroying lswitch %s" % self.name)
@@ -141,13 +142,17 @@  class Logical_Switch(object):
             ovs_ofctl("add-flow %s table=1,priority=1,in_port=%s,action=%s"
                         % (self.short_name, port_no, ",".join(flood_ports)))
 
-        # Traffic coming from a VTEP physical port should only be flooded to
-        # one 'unknown-dst' and to all other physical ports that belong to that
-        # VTEP device and this logical switch.
+        # Traffic coming from a VTEP physical port should always be flooded to
+        # all the other physical ports that belong to that VTEP device and
+        # this logical switch.  If the replication mode is service node then
+        # send to one unknown_dst node (the first one here); else we assume the
+        # replication mode is source node and we send the packet to all
+        # unknown_dst nodes.
         for tunnel in self.unknown_dsts:
             port_no = self.tunnels[tunnel][0]
             flood_ports.append(port_no)
-            break
+            if self.replication_mode == "service_node":
+                break
 
         ovs_ofctl("add-flow %s table=1,priority=0,action=%s"
                   % (self.short_name, ",".join(flood_ports)))
@@ -293,8 +298,32 @@  class Logical_Switch(object):
 
         self.remote_macs = remote_macs
 
+        replication_mode = vtep_ctl("get logical_switch %s "
+                                    "alt_replication_mode"
+                                    % self.name)
+
+        # Replication mode is an optional column and if it is not set,
+        # replication mode defaults to service_node.
+        if replication_mode == "[]":
+            replication_mode = "service_node"
+
+        # If the logical switch level replication mode has changed then
+        # update to that value.
+        replic_mode_change = False
+        if replication_mode != self.replication_mode:
+            self.replication_mode = replication_mode
+            vlog.info("%s replication mode changed to %s" %
+                       (self.name, self.replication_mode))
+            replic_mode_change = True
+
+        unk_dsts_change = False
         if (self.unknown_dsts != unknown_dsts):
             self.unknown_dsts = unknown_dsts
+            unk_dsts_change = True
+
+        # If either the replication mode has changed or the unknown
+        # destinations set has changed, update the flooding decision.
+        if replic_mode_change is True or unk_dsts_change is True:
             self.update_flood()
 
     def update_stats(self):
diff --git a/vtep/vtep-ctl.8.in b/vtep/vtep-ctl.8.in
index 129c7ed..25deebd 100644
--- a/vtep/vtep-ctl.8.in
+++ b/vtep/vtep-ctl.8.in
@@ -195,6 +195,15 @@  combination on the physical switch \fIpswitch\fR.
 List the logical switch bindings for \fIport\fR on the physical switch
 \fIpswitch\fR.
 .
+.IP "\fBset\-ls\-replication\-mode \fIlswitch replication\-mode\fR"
+Set logical switch \fIlswitch\fR alternate replication mode to
+\fIreplication\-mode\fR; the only valid value presently for alternate
+replication mode is "source_node".
+.
+.IP "\fBreset\-ls\-replication\-mode \fIlswitch\fR"
+Reset a logical switch \fIlswitch\fR alternate replication mode to the
+default of "service_node".
+.
 .SS "Logical Router Commands"
 These commands examine and manipulate logical routers.
 .
diff --git a/vtep/vtep-ctl.c b/vtep/vtep-ctl.c
index 29d9a17..69cdb57 100644
--- a/vtep/vtep-ctl.c
+++ b/vtep/vtep-ctl.c
@@ -335,6 +335,8 @@  Logical Switch commands:\n\
   bind-ls PS PORT VLAN LS     bind LS to VLAN on PORT\n\
   unbind-ls PS PORT VLAN      unbind logical switch on VLAN from PORT\n\
   list-bindings PS PORT       list bindings for PORT on PS\n\
+  set-ls-replication-mode LS MODE  set replication mode on LS\n\
+  reset-ls-replication-mode LS     reset replication mode on LS\n\
 \n\
 Logical Router commands:\n\
   add-lr LR                   create a new logical router named LR\n\
@@ -851,6 +853,8 @@  pre_get_info(struct ctl_context *ctx)
     ovsdb_idl_add_column(ctx->idl, &vteprec_physical_port_col_vlan_bindings);
 
     ovsdb_idl_add_column(ctx->idl, &vteprec_logical_switch_col_name);
+    ovsdb_idl_add_column(ctx->idl,
+                         &vteprec_logical_switch_col_alt_replication_mode);
 
     ovsdb_idl_add_column(ctx->idl, &vteprec_logical_router_col_name);
 
@@ -1523,6 +1527,40 @@  cmd_unbind_ls(struct ctl_context *ctx)
     vtep_ctl_context_invalidate_cache(ctx);
 }
 
+static void
+cmd_set_ls_replication_mode(struct ctl_context *ctx)
+{
+    struct vtep_ctl_context *vtepctl_ctx = vtep_ctl_context_cast(ctx);
+    struct vtep_ctl_lswitch *ls;
+    const char *ls_name = ctx->argv[1];
+
+    vtep_ctl_context_populate_cache(ctx);
+
+    if (strcmp(ctx->argv[2], "source_node")) {
+        ctl_fatal("Alternate replication mode must be 'source_node'");
+    }
+
+    ls = find_lswitch(vtepctl_ctx, ls_name, true);
+    vteprec_logical_switch_set_alt_replication_mode(ls->ls_cfg, ctx->argv[2]);
+
+    vtep_ctl_context_invalidate_cache(ctx);
+}
+
+static void
+cmd_reset_ls_replication_mode(struct ctl_context *ctx)
+{
+    struct vtep_ctl_context *vtepctl_ctx = vtep_ctl_context_cast(ctx);
+    struct vtep_ctl_lswitch *ls;
+    const char *ls_name = ctx->argv[1];
+
+    vtep_ctl_context_populate_cache(ctx);
+
+    ls = find_lswitch(vtepctl_ctx, ls_name, true);
+    vteprec_logical_switch_set_alt_replication_mode(ls->ls_cfg, NULL);
+
+    vtep_ctl_context_invalidate_cache(ctx);
+}
+
 static struct vtep_ctl_lrouter *
 find_lrouter(struct vtep_ctl_context *vtepctl_ctx,
              const char *name, bool must_exist)
@@ -2459,6 +2497,10 @@  static const struct ctl_command_syntax vtep_commands[] = {
     {"list-bindings", 2, 2, NULL, pre_get_info, cmd_list_bindings, NULL, "", RO},
     {"bind-ls", 4, 4, NULL, pre_get_info, cmd_bind_ls, NULL, "", RO},
     {"unbind-ls", 3, 3, NULL, pre_get_info, cmd_unbind_ls, NULL, "", RO},
+    {"set-ls-replication-mode", 2, 2, "LS MODE", pre_get_info,
+        cmd_set_ls_replication_mode, NULL, "", RW},
+    {"reset-ls-replication-mode", 1, 1, "LS", pre_get_info,
+        cmd_reset_ls_replication_mode, NULL, "", RW},
 
     /* Logical Router commands. */
     {"add-lr", 1, 1, NULL, pre_get_info, cmd_add_lr, NULL, "--may-exist", RW},
diff --git a/vtep/vtep.ovsschema b/vtep/vtep.ovsschema
index 533fd2e..a0e27fd 100644
--- a/vtep/vtep.ovsschema
+++ b/vtep/vtep.ovsschema
@@ -1,6 +1,6 @@ 
 {
   "name": "hardware_vtep",
-  "cksum": "770244945 11113",
+  "cksum": "1527138929 11290",
   "tables": {
     "Global": {
       "columns": {
@@ -96,6 +96,11 @@ 
         "name": {"type": "string"},
         "description": {"type": "string"},
         "tunnel_key": {"type": {"key": "integer", "min": 0, "max": 1}},
+        "alt_replication_mode": {
+          "type": {
+            "key": {
+              "enum": ["set", ["source_node"]],
+              "type": "string"},"min": 0, "max": 1}},
         "other_config": {
           "type": {"key": "string", "value": "string",
                    "min": 0, "max": "unlimited"}}},
@@ -296,4 +301,4 @@ 
           "ephemeral": true}},
       "indexes": [["target"]],
       "isRoot": false}},
-  "version": "1.5.1"}
+  "version": "1.6.1"}
diff --git a/vtep/vtep.xml b/vtep/vtep.xml
index a3a6988..a2fab02 100644
--- a/vtep/vtep.xml
+++ b/vtep/vtep.xml
@@ -357,6 +357,24 @@ 
         Indicates that an error has occurred in the switch but that no
         more specific information is available.
       </column>
+
+      <column name="switch_fault_status"
+        key="unsupported_source_node_replication">
+        Indicates that the requested source node replication mode cannot be
+        supported by the physical switch;  this specifically means in this
+        context that the physical switch lacks the capability to support
+        source node replication mode.  This error occurs when a controller
+        attempts to set source node replication mode for one of the logical
+        switches that the physical switch is keeping context for.  An NVC
+        that observes this error should take appropriate action (for example
+        reverting the logical switch to service node replication mode).
+        It is recommended that an NVC be proactive and test for support of
+        source node replication by using a test logical switch on vtep 
+        physical switch nodes and then trying to change the replication mode
+        to source node on this logical switch, checking for error.  The NVC
+        could remember this capability per vtep physical switch.  Using
+        mixed replication modes on a given logical switch is not recommended.
+      </column>
     </group>
 
     <group title="Common Column">
@@ -754,6 +772,35 @@ 
       </column>
     </group>
 
+    <group title="Alternate Replication Mode">
+      <p>
+        For handling broadcast, multicast (in default manner) and unknown
+        unicast traffic, packets can be sent to all members of a logical
+        switch referenced by a physical switch.  There are different modes
+        to replicate the packets.  The default mode of replication is to
+        send the traffic to a service node, which can be a hypervisor,
+        server or appliance, and let the service node handle replication to
+        other transport nodes (hypervisors or other VTEP physical
+        switches).  This mode is called service node replication.  An
+        alternate mode of replication, called source node replication
+        involves the source node sending to all other transport nodes.
+        Hypervisors are always responsible for doing their own
+        replication for locally attached VMs in both modes.  Service node
+        mode is the default and was the only option for prior versions of
+        the schema.  Source node mode is an alternate replication mode
+        that may be configured using this column.
+      </p>
+
+      <column name="alt_replication_mode">
+        <p>
+          This optional column defines the alternate replication mode per
+          <ref table="Logical_Switch"/>.  There is one valid value presently,
+          <code>source_node</code>.
+        </p>
+
+      </column>
+    </group>
+
     <group title="Identification">
       <column name="name">
         Symbolic name for the logical switch.
@@ -887,8 +934,8 @@ 
       Multicast packet replication may be handled by a service node,
       in which case the physical locators will be IP addresses of
       service nodes. If the VTEP supports replication onto multiple
-      tunnels, then this may be used to replicate directly onto
-      VTEP-hypervisor tunnels.
+      tunnels, using source node replication, then this may be used to
+      replicate directly onto VTEP-hypervisor or VTEP-VTEP tunnels.
     </p>
 
     <column name="MAC">
@@ -911,9 +958,12 @@ 
 
     <column name="locator_set">
       The physical locator set to be used to reach this MAC address. In
-      this table, the physical locator set will be either a service node IP
-      address or a set of tunnel IP addresses of hypervisors (and
-      potentially other VTEPs).
+      this table, the physical locator set will be either a set of service
+      node when service node replication is used or the set of transport
+      nodes (defined as hypervisors or VTEPs) participating in the associated
+      logical switch. When service node replication is used, the VTEP should
+      send packets to one member of the locator set that is known to be
+      healthy and reachable, which could be determined by BFD.
     </column>
 
     <column name="ipaddr">