[ovs-dev] ovn: Document limitation in the L3HA plan

Message ID 20170424082118.37546-1-majopela@redhat.com
State Accepted
Commit Message

Miguel Angel Ajo April 24, 2017, 8:21 a.m. UTC
From: Miguel Angel Ajo <majopela@redhat.com>

The intergateway monitoring covers host failure well, but
it doesn't cover path failure which is a more complicated
problem.

By this change I don't mean we should implement something
to cover path failure right away, but that we should
keep the limitation in mind.

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
---
 Documentation/topics/high-availability.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Miguel Angel Ajo April 24, 2017, 8:24 a.m. UTC | #1
Anil Venkata and I were talking about this last week and we
realised we had this limitation. Other mechanisms like VRRP
or CARP share it, but we thought it was good to make sure
everyone was on the same page, and to point out that having
gateways across multiple L2 domains with routing in the
middle could be problematic. At a minimum it would require
less strict tuning of the BFD pinging intervals.
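
To make the failure mode concrete, here is a minimal Python sketch. It
is not OVN code. It assumes each gateway treats the highest-priority
gateway it believes is alive as the leader, and it models BFD
reachability as a set of working links. The names
self_declared_leaders, priorities, up and links are hypothetical and
only serve the illustration.

```python
def self_declared_leaders(priorities, up, links):
    """Return the sorted list of gateways that consider themselves leader.

    priorities: dict of gateway name -> priority (assumed distinct)
    up:         set of gateways whose host is running
    links:      set of frozenset({a, b}) pairs with a working path
    """
    leaders = []
    for gw in sorted(up):
        # A gateway always sees itself, plus every running peer it can reach.
        alive = {gw} | {
            peer for peer in up
            if peer != gw and frozenset((gw, peer)) in links
        }
        # Assumed election rule: highest priority among the gateways seen alive.
        if max(alive, key=lambda g: priorities[g]) == gw:
            leaders.append(gw)
    return leaders


priorities = {"gw1": 20, "gw2": 10}
full_mesh = {frozenset(("gw1", "gw2"))}

# Healthy: both hosts up and able to see each other.
print(self_declared_leaders(priorities, {"gw1", "gw2"}, full_mesh))

# Host failure: gw1 is down, so gw2 takes over.
print(self_declared_leaders(priorities, {"gw2"}, full_mesh))

# Path failure: both hosts up, but the inter-gateway path is broken.
print(self_declared_leaders(priorities, {"gw1", "gw2"}, set()))
```

In the first two cases exactly one gateway declares itself leader. In
the path-failure case each gateway only sees itself, so both do, which
is the limitation the patch documents.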



On Mon, Apr 24, 2017 at 10:21 AM, <majopela@redhat.com> wrote:

> From: Miguel Angel Ajo <majopela@redhat.com>
>
> The intergateway monitoring covers host failure well, but
> it doesn't cover path failure which is a more complicated
> problem.
>
> By this change I don't mean we should implement something
> to cover path failure right away, but that we should
> keep the limitation in mind.
>
> Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
> ---
>  Documentation/topics/high-availability.rst | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/Documentation/topics/high-availability.rst b/Documentation/topics/high-availability.rst
> index 5b21b6469..7ee9357c0 100644
> --- a/Documentation/topics/high-availability.rst
> +++ b/Documentation/topics/high-availability.rst
> @@ -288,6 +288,14 @@ which are alive, and therefore whether or not that gateway happens to be the
>  leader.  If leading, the gateway forwards traffic normally, otherwise it drops
>  all traffic.
>
> +We should note that this method works well under the assumption that there
> +are no inter-gateway connectivity failures. If such a failure occurs, this method
> +fails to elect a single master. The simplest example is two gateways which stop
> +seeing each other but can still reach the hypervisors. Protocols like VRRP or
> +CARP have the same issue. A mitigation for this type of failure mode could be
> +achieved by having all network elements (hypervisors and gateways) periodically
> +share their link status with other endpoints.
> +
>  Gateway Leadership Resignation
>  ++++++++++++++++++++++++++++++
>
> --
> 2.11.0 (Apple Git-81)
>
>
Ben Pfaff May 1, 2017, 9:49 p.m. UTC | #2
On Mon, Apr 24, 2017 at 10:21:18AM +0200, majopela@redhat.com wrote:
> From: Miguel Angel Ajo <majopela@redhat.com>
> 
> The intergateway monitoring covers host failure well, but
> it doesn't cover path failure which is a more complicated
> problem.
> 
> By this change I don't mean we should implement something
> to cover path failure right away, but that we should
> keep the limitation in mind.
> 
> Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>

Thank you for thinking about this!  I applied this to master.

Patch

diff --git a/Documentation/topics/high-availability.rst b/Documentation/topics/high-availability.rst
index 5b21b6469..7ee9357c0 100644
--- a/Documentation/topics/high-availability.rst
+++ b/Documentation/topics/high-availability.rst
@@ -288,6 +288,14 @@  which are alive, and therefore whether or not that gateway happens to be the
 leader.  If leading, the gateway forwards traffic normally, otherwise it drops
 all traffic.
 
+We should note that this method works well under the assumption that there
+are no inter-gateway connectivity failures. If such a failure occurs, this method
+fails to elect a single master. The simplest example is two gateways which stop
+seeing each other but can still reach the hypervisors. Protocols like VRRP or
+CARP have the same issue. A mitigation for this type of failure mode could be
+achieved by having all network elements (hypervisors and gateways) periodically
+share their link status with other endpoints.
+
 Gateway Leadership Resignation
 ++++++++++++++++++++++++++++++