From patchwork Wed Nov 25 11:02:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406017 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Cgyf74bTMz9sRK for ; Wed, 25 Nov 2020 22:02:39 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 4160B86C2B; Wed, 25 Nov 2020 11:02:38 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id bbnZQbwg6-xf; Wed, 25 Nov 2020 11:02:34 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id C6CF186C52; Wed, 25 Nov 2020 11:02:32 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id AD22AC1D9F; Wed, 25 Nov 2020 11:02:32 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 83C1CC0891 for ; Wed, 25 Nov 2020 11:02:31 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 6ABCE86E36 for ; Wed, 25 Nov 2020 11:02:31 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NvNV9rpppfoW for ; Wed, 25 Nov 2020 11:02:27 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by whitealder.osuosl.org (Postfix) with ESMTPS id CA7DA86E15 for ; Wed, 25 Nov 2020 11:02:26 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZ6-0004U3-AR; Wed, 25 Nov 2020 11:02:24 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZ2-00038T-AW; Wed, 25 Nov 2020 11:02:22 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:03 +0000 Message-Id: <20201125110216.7944-2-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 01/14] ovn-libs: Add support for parallel processing X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov This adds a set of functions and macros intended to process hashes in parallel. The principles of operation are documented in the fasthmap.h If these one day go into the OVS tree, the OVS tree versions would be used in preference. Signed-off-by: Anton Ivanov --- lib/automake.mk | 2 + lib/fasthmap.c | 281 ++++++++++++++++++++++++++++++++++++++++++++++++ lib/fasthmap.h | 206 +++++++++++++++++++++++++++++++++++ 3 files changed, 489 insertions(+) create mode 100644 lib/fasthmap.c create mode 100644 lib/fasthmap.h diff --git a/lib/automake.mk b/lib/automake.mk index 250c7aefa..d7e4b20cf 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -13,6 +13,8 @@ lib_libovn_la_SOURCES = \ lib/expr.c \ lib/extend-table.h \ lib/extend-table.c \ + lib/fasthmap.h \ + lib/fasthmap.c \ lib/ip-mcast-index.c \ lib/ip-mcast-index.h \ lib/mcast-group-index.c \ diff --git a/lib/fasthmap.c b/lib/fasthmap.c new file mode 100644 index 000000000..3096c90d3 --- /dev/null +++ b/lib/fasthmap.c @@ -0,0 +1,281 @@ +/* + * Copyright (c) 2020 Red Hat, Inc. + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2015, 2019 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include +#include +#include +#include +#include "fatal-signal.h" +#include "util.h" +#include "openvswitch/vlog.h" +#include "openvswitch/hmap.h" +#include "openvswitch/thread.h" +#include "fasthmap.h" +#include "ovs-atomic.h" +#include "ovs-thread.h" +#include "ovs-numa.h" + +VLOG_DEFINE_THIS_MODULE(fasthmap); + + +static bool worker_pool_setup = false; +static bool workers_must_exit = false; +static bool can_parallelize = false; + +static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools); + +static struct ovs_mutex init_mutex = OVS_MUTEX_INITIALIZER; + +static int pool_size; + +static void worker_pool_hook(void *aux OVS_UNUSED) { + int i; + static struct worker_pool *pool; + workers_must_exit = true; /* all workers must honour this flag */ + atomic_thread_fence(memory_order_release); + LIST_FOR_EACH (pool, list_node, &worker_pools) { + for (i = 0; i < pool->size ; i++) { + sem_post(&pool->controls[i].fire); + } + } +} + +static void setup_worker_pools(void) { + int cores, nodes; + + nodes = ovs_numa_get_n_numas(); + if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) { + nodes = 1; + } + cores = ovs_numa_get_n_cores(); + + /* If there is no NUMA config, use 4 cores. + * If there is NUMA config use half the cores on + * one node so that the OS does not start pushing + * threads to other nodes. + */ + if (cores == OVS_CORE_UNSPEC || cores <= 0) { + /* If there is no NUMA we can try the ovs-threads routine. + * It falls back to sysconf and/or affinity mask. + */ + cores = count_cpu_cores(); + pool_size = cores; + } else { + pool_size = cores / nodes; + } + if (pool_size > 16) { + pool_size = 16; + } + can_parallelize = (pool_size >= 3); + fatal_signal_add_hook(worker_pool_hook, NULL, NULL, true); + worker_pool_setup = true; +} + +bool ovn_cease_fire(void) +{ + return workers_must_exit; +} + +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)){ + + struct worker_pool *new_pool = NULL; + struct worker_control *new_control; + int i; + + ovs_mutex_lock(&init_mutex); + + if (!worker_pool_setup) { + setup_worker_pools(); + } + + if (can_parallelize) { + new_pool = xmalloc(sizeof(struct worker_pool)); + new_pool->size = pool_size; + sem_init(&new_pool->done, 0, 0); + + ovs_list_push_back(&worker_pools, &new_pool->list_node); + + new_pool->controls = + xmalloc(sizeof(struct worker_control) * new_pool->size); + + for (i = 0; i < new_pool->size; i++) { + new_control = &new_pool->controls[i]; + sem_init(&new_control->fire, 0, 0); + new_control->id = i; + new_control->done = &new_pool->done; + new_control->data = NULL; + ovs_mutex_init(&new_control->mutex); + new_control->finished = ATOMIC_VAR_INIT(false); + } + + for (i = 0; i < pool_size; i++) { + ovs_thread_create("worker pool helper", start, &new_pool->controls[i]); + } + } + ovs_mutex_unlock(&init_mutex); + return new_pool; +} + + +/* Initializes 'hmap' as an empty hash table with mask N. */ +void +ovn_fast_hmap_init(struct hmap *hmap, ssize_t mask) +{ + size_t i; + + hmap->buckets = xmalloc(sizeof (struct hmap_node *) * (mask + 1)); + hmap->one = NULL; + hmap->mask = mask; + hmap->n = 0; + for (i = 0; i <= hmap->mask; i++) { + hmap->buckets[i] = NULL; + } +} + +/* Initializes 'hmap' as an empty hash table of size X. + * Intended for use in parallel processing so that all + * fragments used to store results in a parallel job + * are the same size. + */ +void +ovn_fast_hmap_size_for(struct hmap *hmap, int size) +{ + size_t mask; + mask = size / 2; + mask |= mask >> 1; + mask |= mask >> 2; + mask |= mask >> 4; + mask |= mask >> 8; + mask |= mask >> 16; +#if SIZE_MAX > UINT32_MAX + mask |= mask >> 32; +#endif + + /* If we need to dynamically allocate buckets we might as well allocate at + * least 4 of them. */ + mask |= (mask & 1) << 1; + + fast_hmap_init(hmap, mask); +} + +/* Run a thread pool which uses a callback function to process results + */ + +void ovn_run_pool_callback( + struct worker_pool *pool, + void *fin_result, + void *result_frags, + void (*helper_func)(struct worker_pool *pool, + void *fin_result, void *result_frags, int index)) +{ + int index, completed; + + atomic_thread_fence(memory_order_release); + + for (index = 0; index < pool->size; index++) { + sem_post(&pool->controls[index].fire); + } + + completed = 0; + + do { + bool test; + sem_wait(&pool->done); + for (index = 0; index < pool->size; index++) { + test = true; + if (atomic_compare_exchange_weak( + &pool->controls[index].finished, + &test, + false)) { + if (helper_func) { + (helper_func)(pool, fin_result, result_frags, index); + } + completed++; + pool->controls[index].data = NULL; + } + } + } while (completed < pool->size); +} + +/* Run a thread pool - basic, does not do results processing. + */ + +void ovn_run_pool(struct worker_pool *pool) +{ + ovn_run_pool_callback(pool, NULL, NULL, NULL); +} + +/* Brute force merge of a hashmap into another hashmap. + * Intended for use in parallel processing. The destination + * hashmap MUST be the same size as the one being merged. + * + * This can be achieved by pre-allocating them to correct size + * and using hmap_insert_fast() instead of hmap_insert() + */ + +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc) +{ + size_t i; + + ovs_assert(inc->mask == dest->mask); + + if (!inc->n) { + /* Request to merge an empty frag, nothing to do */ + return; + } + + for (i = 0; i <= dest->mask; i++) { + struct hmap_node **dest_bucket = &dest->buckets[i]; + struct hmap_node **inc_bucket = &inc->buckets[i]; + if (*inc_bucket != NULL) { + struct hmap_node *last_node = *inc_bucket; + while (last_node->next != NULL) { + last_node = last_node->next; + } + last_node->next = *dest_bucket; + *dest_bucket = *inc_bucket; + *inc_bucket = NULL; + } + } + dest->n += inc->n; + inc->n = 0; +} + +/* Run a thread pool which gathers results in an array + * of hashes. Merge results. + */ + +static void merge_hash_results(struct worker_pool *pool OVS_UNUSED, + void *fin_result, void *result_frags, int index) +{ + struct hmap *result = (struct hmap *)fin_result; + struct hmap *res_frags = (struct hmap *)result_frags; + + fast_hmap_merge(result, &res_frags[index]); + hmap_destroy(&res_frags[index]); +} + + +void ovn_run_pool_hash( + struct worker_pool *pool, + struct hmap *result, + struct hmap *result_frags) +{ + ovn_run_pool_callback(pool, result, result_frags, merge_hash_results); +} diff --git a/lib/fasthmap.h b/lib/fasthmap.h new file mode 100644 index 000000000..a362b0f5c --- /dev/null +++ b/lib/fasthmap.h @@ -0,0 +1,206 @@ +/* + * Copyright (c) 2020 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef HMAP_HAS_PARALLEL_MACROS +#define HMAP_HAS_PARALLEL_MACROS 1 + +/* if the parallel macros are defined by hmap.h or any other ovs define + * we skip over the ovn specific definitions. + */ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include +#include +#include "openvswitch/util.h" +#include "openvswitch/hmap.h" +#include "openvswitch/thread.h" +#include "ovs-atomic.h" + +/* A version of the HMAP_FOR_EACH macro intended for iterating as part + * of parallel processing. + * Each worker thread has a different ThreadID in the range of 0..POOL_SIZE + * and will iterate hash buckets ThreadID, ThreadID + step, + * ThreadID + step * 2, etc. The actual macro accepts + * ThreadID + step * i as the JOBID parameter. + */ + +#define HMAP_FOR_EACH_IN_PARALLEL(NODE, MEMBER, JOBID, HMAP) \ + for (INIT_CONTAINER(NODE, hmap_first_in_bucket_num(HMAP, JOBID), MEMBER); \ + (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) \ + || ((NODE = NULL), false); \ + ASSIGN_CONTAINER(NODE, hmap_next_in_bucket(&(NODE)->MEMBER), MEMBER)) + +/* We do not have a SAFE version of the macro, because the hash size is not + * atomic and hash removal operations would need to be wrapped with + * locks. This will defeat most of the benefits from doing anything in + * parallel. + * If the code block inside FOR_EACH_IN_PARALLEL needs to remove elements, + * each thread should store them in a temporary list result instead, merging + * the lists into a combined result at the end */ + +/* Work "Handle" */ + +struct worker_control { + int id; /* Used as a modulo when iterating over a hash. */ + atomic_bool finished; /* Set to true after achunk of work is complete. */ + sem_t fire; /* Work start semaphore - sem_post starts the worker. */ + sem_t *done; /* Work completion semaphore - sem_post on completion. */ + struct ovs_mutex mutex; /* Guards the data. */ + void *data; /* Pointer to data to be processed. */ + void *workload; /* back-pointer to the worker pool structure. */ +}; + +struct worker_pool { + int size; /* Number of threads in the pool. */ + struct ovs_list list_node; /* List of pools - used in cleanup/exit. */ + struct worker_control *controls; /* "Handles" in this pool. */ + sem_t done; /* Work completion semaphorew. */ +}; + +/* Add a worker pool for thread function start() which expects a pointer to + * a worker_control structure as an argument. */ + +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *)); + +/* Setting this to true will make all processing threads exit */ + +bool ovn_cease_fire(void); + +/* Build a hmap pre-sized for size elements */ + +void ovn_fast_hmap_size_for(struct hmap *hmap, int size); + +/* Build a hmap with a mask equals to size */ + +void ovn_fast_hmap_init(struct hmap *hmap, ssize_t size); + +/* Brute-force merge a hmap into hmap. + * Dest and inc have to have the same mask. The merge is performed + * by extending the element list for bucket N in the dest hmap with the list + * from bucket N in inc. + */ + +void ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc); + +/* Merge two lists. + * It is possible to achieve the same functionality using ovs_list_splice(). + * This ensures the splicing is exactly for tail of dest to head of inc. + */ + +void ovn_merge_lists(struct ovs_list **dest, struct ovs_list *inc); + +/* Run a pool, without any default processing of results. + */ + +void ovn_run_pool(struct worker_pool *pool); + +/* Run a pool, merge results from hash frags into a final hash result. + * The hash frags must be pre-sized to the same size. + */ + +void ovn_run_pool_hash(struct worker_pool *pool, + struct hmap *result, struct hmap *result_frags); + +/* Run a pool, call a callback function to perform processing of results. + */ + +void ovn_run_pool_callback(struct worker_pool *pool, void *fin_result, + void *result_frags, + void (*helper_func)(struct worker_pool *pool, + void *fin_result, void *result_frags, int index)); + + +/* Returns the first node in 'hmap' in the bucket in which the given 'hash' + * would land, or a null pointer if that bucket is empty. */ + +static inline struct hmap_node * +hmap_first_in_bucket_num(const struct hmap *hmap, size_t num) +{ + return hmap->buckets[num]; +} + +static inline struct hmap_node * +parallel_hmap_next__(const struct hmap *hmap, size_t start, size_t pool_size) +{ + size_t i; + for (i = start; i <= hmap->mask; i+= pool_size) { + struct hmap_node *node = hmap->buckets[i]; + if (node) { + return node; + } + } + return NULL; +} + +/* Returns the first node in 'hmap', as expected by thread with job_id + * for parallel processing in arbitrary order, or a null pointer if + * the slice of 'hmap' for that job_id is empty. */ +static inline struct hmap_node * +parallel_hmap_first(const struct hmap *hmap, size_t job_id, size_t pool_size) +{ + return parallel_hmap_next__(hmap, job_id, pool_size); +} + +/* Returns the next node in the slice of 'hmap' following 'node', + * in arbitrary order, or a * null pointer if 'node' is the last node in + * the 'hmap' slice. + * + */ +static inline struct hmap_node * +parallel_hmap_next(const struct hmap *hmap, + const struct hmap_node *node, ssize_t pool_size) +{ + return (node->next + ? node->next + : parallel_hmap_next__(hmap, + (node->hash & hmap->mask) + pool_size, pool_size)); +} + +/* Use the OVN library functions for stuff which OVS has not defined + * If OVS has defined these, they will still compile using the OVN + * local names, but will be dropped by the linker in favour of the OVS + * supplied functions. + */ + +#define cease_fire() ovn_cease_fire() + +#define add_worker_pool(start) ovn_add_worker_pool(start) + +#define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size) + +#define fast_hmap_init(hmap, size) ovn_fast_hmap_init(hmap, size) + +#define fast_hmap_merge(dest, inc) ovn_fast_hmap_merge(dest, inc) + +#define hmap_merge(dest, inc) ovn_hmap_merge(dest, inc) + +#define ovn_run_pool(pool) ovn_run_pool(pool) + +#define run_pool_hash(pool, result, result_frags) \ + ovn_run_pool_hash(pool, result, result_frags) + +#define run_pool_callback(pool, fin_result, result_frags, helper_func) \ + ovn_run_pool_callback(pool, fin_result, result_frags, helper_func) + +#ifdef __cplusplus +} +#endif + +#endif /* lib/fast-hmap.h */ From patchwork Wed Nov 25 11:02:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406018 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgyfC60Qvz9sRK for ; Wed, 25 Nov 2020 22:02:43 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 3293A86D0F; Wed, 25 Nov 2020 11:02:42 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id J-tADffDdk3K; Wed, 25 Nov 2020 11:02:39 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id AE8D986C7A; Wed, 25 Nov 2020 11:02:33 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 829B7C1D9F; Wed, 25 Nov 2020 11:02:33 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id CBA33C0891 for ; Wed, 25 Nov 2020 11:02:31 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id B192386E1B for ; Wed, 25 Nov 2020 11:02:31 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DQJ98fUNiLUN for ; Wed, 25 Nov 2020 11:02:27 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by whitealder.osuosl.org (Postfix) with ESMTPS id 2601C86E1A for ; Wed, 25 Nov 2020 11:02:27 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZ7-0004U9-Ig; Wed, 25 Nov 2020 11:02:25 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZ4-00038T-J4; Wed, 25 Nov 2020 11:02:24 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:04 +0000 Message-Id: <20201125110216.7944-3-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 02/14] ovn-northd: introduce parallel lflow build X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov 1. Add support for parallel lflow build. 2. Move combined lflow generation to be build in parallel. Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 201 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 172 insertions(+), 29 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 47a177c99..076a6d27f 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -49,6 +49,7 @@ #include "unixctl.h" #include "util.h" #include "uuid.h" +#include "fasthmap.h" #include "openvswitch/vlog.h" VLOG_DEFINE_THIS_MODULE(ovn_northd); @@ -4152,7 +4153,7 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od, ovn_lflow_init(lflow, od, stage, priority, xstrdup(match), xstrdup(actions), ovn_lflow_hint(stage_hint), where); - hmap_insert(lflow_map, &lflow->hmap_node, ovn_lflow_hash(lflow)); + hmap_insert_fast(lflow_map, &lflow->hmap_node, ovn_lflow_hash(lflow)); } /* Adds a row with the specified contents to the Logical_Flow table. */ @@ -11100,6 +11101,8 @@ struct lswitch_flow_build_info { /* Helper function to combine all lflow generation which is iterated by * datapath. + * Invoked by parallel build over a "chunk" of work or by single threaded + * build over a chunk which is initialized to contain "all" work. */ static void @@ -11134,6 +11137,8 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, } /* Helper function to combine all lflow generation which is iterated by port. + * Invoked by parallel build over a "chunk" of work or by single threaded + * build over a chunk which is initialized to contain "all" work. */ static void @@ -11161,6 +11166,88 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op, &lsi->match, &lsi->actions); } +struct lflows_thread_pool { + struct worker_pool *pool; +}; + +static void *build_lflows_thread(void *arg) { + struct worker_control *control = (struct worker_control *) arg; + struct lflows_thread_pool *workload; + struct lswitch_flow_build_info *lsi; + + struct ovn_datapath *od; + struct ovn_port *op; + int bnum; + + while (!cease_fire()) { + sem_wait(&control->fire); + workload = (struct lflows_thread_pool *) control->workload; + lsi = (struct lswitch_flow_build_info *) control->data; + if (lsi && workload) { + /* Iterate over bucket ThreadID, ThreadID+size, ... */ + for (bnum = control->id; + bnum <= lsi->datapaths->mask; + bnum += workload->pool->size) + { + HMAP_FOR_EACH_IN_PARALLEL ( + od, key_node, bnum, lsi->datapaths) { + if (cease_fire()) { + return NULL; + } + build_lswitch_and_lrouter_iterate_by_od(od, lsi); + } + } + for (bnum = control->id; + bnum <= lsi->ports->mask; + bnum += workload->pool->size) + { + HMAP_FOR_EACH_IN_PARALLEL ( + op, key_node, bnum, lsi->ports) { + if (cease_fire()) { + return NULL; + } + build_lswitch_and_lrouter_iterate_by_op(op, lsi); + } + } + atomic_store_relaxed(&control->finished, true); + atomic_thread_fence(memory_order_release); + } + sem_post(control->done); + } + return NULL; +} + +static bool pool_init_done = false; +static struct lflows_thread_pool *build_lflows_pool = NULL; + +static void init_lflows_thread_pool(void) +{ + int index; + + if (!pool_init_done) { + struct worker_pool *pool = add_worker_pool(build_lflows_thread); + pool_init_done = true; + if (pool) { + build_lflows_pool = + xmalloc(sizeof(struct lflows_thread_pool)); + build_lflows_pool->pool = pool; + for (index = 0; index < build_lflows_pool->pool->size; index++) { + build_lflows_pool->pool->controls[index].workload = + build_lflows_pool; + } + } + } +} + +/* TODO: replace hard cutoffs by configurable via commands. These are + * temporary defines to determine single-thread to multi-thread processing + * cutoff. + * Setting to 1 forces "all parallel" lflow build. + */ + +#define OD_CUTOFF 1 +#define OP_CUTOFF 1 + static void build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, struct hmap *port_groups, struct hmap *lflows, @@ -11168,38 +11255,87 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, struct hmap *igmp_groups, struct shash *meter_groups, struct hmap *lbs) { - struct ovn_datapath *od; - struct ovn_port *op; - char *svc_check_match = xasprintf("eth.dst == %s", svc_monitor_mac); - struct lswitch_flow_build_info lsi = { - .datapaths = datapaths, - .ports = ports, - .port_groups = port_groups, - .lflows = lflows, - .mcgroups = mcgroups, - .igmp_groups = igmp_groups, - .meter_groups = meter_groups, - .lbs = lbs, - .svc_check_match = svc_check_match, - .match = DS_EMPTY_INITIALIZER, - .actions = DS_EMPTY_INITIALIZER, - }; + init_lflows_thread_pool(); - /* Combined build - all lflow generation from lswitch and lrouter - * will move here and will be reogranized by iterator type. - */ - HMAP_FOR_EACH (od, key_node, datapaths) { - build_lswitch_and_lrouter_iterate_by_od(od, &lsi); - } - HMAP_FOR_EACH (op, key_node, ports) { - build_lswitch_and_lrouter_iterate_by_op(op, &lsi); + if (build_lflows_pool && + (hmap_count(datapaths) > OD_CUTOFF || hmap_count(ports) > OP_CUTOFF)) { + + struct hmap *lflow_segs; + struct lswitch_flow_build_info *lsiv; + int index; + + lsiv = xmalloc( + sizeof(struct lswitch_flow_build_info) * + build_lflows_pool->pool->size); + lflow_segs = xmalloc( + sizeof(struct hmap) * build_lflows_pool->pool->size); + + /* Set up "work chunks" for each thread to work on. */ + + for (index = 0; index < build_lflows_pool->pool->size; index++) { + fast_hmap_init(&lflow_segs[index], lflows->mask); + + lsiv[index].datapaths = datapaths; + lsiv[index].ports = ports; + lsiv[index].port_groups = port_groups; + lsiv[index].lflows = &lflow_segs[index]; + lsiv[index].mcgroups = mcgroups; + lsiv[index].igmp_groups = igmp_groups; + lsiv[index].meter_groups = meter_groups; + lsiv[index].lbs = lbs; + lsiv[index].svc_check_match = svc_check_match; + ds_init(&lsiv[index].match); + ds_init(&lsiv[index].actions); + + build_lflows_pool->pool->controls[index].data = &lsiv[index]; + } + + /* Run thread pool. */ + + run_pool_hash(build_lflows_pool->pool, lflows, lflow_segs); + + for (index = 0; index < build_lflows_pool->pool->size; index++) { + ds_destroy(&lsiv[index].match); + ds_destroy(&lsiv[index].actions); + } + + free(lflow_segs); + free(lsiv); + } else { + struct ovn_datapath *od; + struct ovn_port *op; + + struct lswitch_flow_build_info lsi = { + .datapaths = datapaths, + .ports = ports, + .port_groups = port_groups, + .lflows = lflows, + .mcgroups = mcgroups, + .igmp_groups = igmp_groups, + .meter_groups = meter_groups, + .lbs = lbs, + .svc_check_match = svc_check_match, + .match = DS_EMPTY_INITIALIZER, + .actions = DS_EMPTY_INITIALIZER, + }; + + + /* Converged build - all lflow generation from lswitch and lrouter + * will move here and will be reogranized by iterator type. + */ + HMAP_FOR_EACH (od, key_node, datapaths) { + build_lswitch_and_lrouter_iterate_by_od(od, &lsi); + } + HMAP_FOR_EACH (op, key_node, ports) { + build_lswitch_and_lrouter_iterate_by_op(op, &lsi); + } + ds_destroy(&lsi.match); + ds_destroy(&lsi.actions); } - free(svc_check_match); - ds_destroy(&lsi.match); - ds_destroy(&lsi.actions); + free(svc_check_match); /* Legacy lswitch build - to be migrated. */ build_lswitch_flows(datapaths, ports, lflows, mcgroups, @@ -11209,6 +11345,7 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_lrouter_flows(datapaths, ports, lflows, meter_groups, lbs); } +static ssize_t max_seen_lflow_size = 128; /* Updates the Logical_Flow and Multicast_Group tables in the OVN_SB database, * constructing their contents based on the OVN_NB database. */ @@ -11219,12 +11356,18 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths, struct shash *meter_groups, struct hmap *lbs) { - struct hmap lflows = HMAP_INITIALIZER(&lflows); + struct hmap lflows; + + fast_hmap_size_for(&lflows, max_seen_lflow_size); build_lswitch_and_lrouter_flows(datapaths, ports, port_groups, &lflows, mcgroups, igmp_groups, meter_groups, lbs); + if (hmap_count(&lflows) > max_seen_lflow_size) { + max_seen_lflow_size = hmap_count(&lflows); + } + /* Push changes to the Logical_Flow table to database. */ const struct sbrec_logical_flow *sbflow, *next_sbflow; SBREC_LOGICAL_FLOW_FOR_EACH_SAFE (sbflow, next_sbflow, ctx->ovnsb_idl) { From patchwork Wed Nov 25 11:02:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406019 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgyfD6088z9sSf for ; Wed, 25 Nov 2020 22:02:44 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 30D7B86E92; Wed, 25 Nov 2020 11:02:42 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3TYXsq6rV4dJ; Wed, 25 Nov 2020 11:02:36 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id A089086E15; Wed, 25 Nov 2020 11:02:36 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 85E60C0891; Wed, 25 Nov 2020 11:02:36 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 083EFC0052 for ; Wed, 25 Nov 2020 11:02:35 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id E65F887476 for ; Wed, 25 Nov 2020 11:02:34 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6Ha421WchN7X for ; Wed, 25 Nov 2020 11:02:30 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by hemlock.osuosl.org (Postfix) with ESMTPS id 4E92787475 for ; Wed, 25 Nov 2020 11:02:30 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZ9-0004UG-Q5; Wed, 25 Nov 2020 11:02:28 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZ6-00038T-Ex; Wed, 25 Nov 2020 11:02:26 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:05 +0000 Message-Id: <20201125110216.7944-4-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 03/14] ovn-northd: move arp responder to a set of functions X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Move arp/ND responder processing to per od, op and lb iterators. Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 497 +++++++++++++++++++++++--------------------- 1 file changed, 261 insertions(+), 236 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 076a6d27f..a1358cc67 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6685,7 +6685,7 @@ is_vlan_transparent(const struct ovn_datapath *od) static void build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct hmap *lflows, struct hmap *mcgroups, - struct hmap *igmp_groups, struct hmap *lbs) + struct hmap *igmp_groups) { /* This flow table structure is documented in ovn-northd(8), so please * update ovn-northd.8.xml if you change anything. */ @@ -6693,241 +6693,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ds match = DS_EMPTY_INITIALIZER; struct ds actions = DS_EMPTY_INITIALIZER; struct ovn_datapath *od; - - /* Ingress table 13: ARP/ND responder, skip requests coming from localnet - * and vtep ports. (priority 100); see ovn-northd.8.xml for the - * rationale. */ struct ovn_port *op; - HMAP_FOR_EACH (op, key_node, ports) { - if (!op->nbsp) { - continue; - } - - if ((!strcmp(op->nbsp->type, "localnet")) || - (!strcmp(op->nbsp->type, "vtep"))) { - ds_clear(&match); - ds_put_format(&match, "inport == %s", op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, - 100, ds_cstr(&match), "next;", - &op->nbsp->header_); - } - } - - /* Ingress table 13: ARP/ND responder, reply for known IPs. - * (priority 50). */ - HMAP_FOR_EACH (op, key_node, ports) { - if (!op->nbsp) { - continue; - } - - if (!strcmp(op->nbsp->type, "virtual")) { - /* Handle - * - GARPs for virtual ip which belongs to a logical port - * of type 'virtual' and bind that port. - * - * - ARP reply from the virtual ip which belongs to a logical - * port of type 'virtual' and bind that port. - * */ - ovs_be32 ip; - const char *virtual_ip = smap_get(&op->nbsp->options, - "virtual-ip"); - const char *virtual_parents = smap_get(&op->nbsp->options, - "virtual-parents"); - if (!virtual_ip || !virtual_parents || - !ip_parse(virtual_ip, &ip)) { - continue; - } - - char *tokstr = xstrdup(virtual_parents); - char *save_ptr = NULL; - char *vparent; - for (vparent = strtok_r(tokstr, ",", &save_ptr); vparent != NULL; - vparent = strtok_r(NULL, ",", &save_ptr)) { - struct ovn_port *vp = ovn_port_find(ports, vparent); - if (!vp || vp->od != op->od) { - /* vparent name should be valid and it should belong - * to the same logical switch. */ - continue; - } - - ds_clear(&match); - ds_put_format(&match, "inport == \"%s\" && " - "((arp.op == 1 && arp.spa == %s && " - "arp.tpa == %s) || (arp.op == 2 && " - "arp.spa == %s))", - vparent, virtual_ip, virtual_ip, - virtual_ip); - ds_clear(&actions); - ds_put_format(&actions, - "bind_vport(%s, inport); " - "next;", - op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, - S_SWITCH_IN_ARP_ND_RSP, 100, - ds_cstr(&match), ds_cstr(&actions), - &vp->nbsp->header_); - } - - free(tokstr); - } else { - /* - * Add ARP/ND reply flows if either the - * - port is up and it doesn't have 'unknown' address defined or - * - port type is router or - * - port type is localport - */ - if (check_lsp_is_up && - !lsp_is_up(op->nbsp) && !lsp_is_router(op->nbsp) && - strcmp(op->nbsp->type, "localport")) { - continue; - } - - if (lsp_is_external(op->nbsp) || op->has_unknown) { - continue; - } - - for (size_t i = 0; i < op->n_lsp_addrs; i++) { - for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) { - ds_clear(&match); - ds_put_format(&match, "arp.tpa == %s && arp.op == 1", - op->lsp_addrs[i].ipv4_addrs[j].addr_s); - ds_clear(&actions); - ds_put_format(&actions, - "eth.dst = eth.src; " - "eth.src = %s; " - "arp.op = 2; /* ARP reply */ " - "arp.tha = arp.sha; " - "arp.sha = %s; " - "arp.tpa = arp.spa; " - "arp.spa = %s; " - "outport = inport; " - "flags.loopback = 1; " - "output;", - op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s, - op->lsp_addrs[i].ipv4_addrs[j].addr_s); - ovn_lflow_add_with_hint(lflows, op->od, - S_SWITCH_IN_ARP_ND_RSP, 50, - ds_cstr(&match), - ds_cstr(&actions), - &op->nbsp->header_); - - /* Do not reply to an ARP request from the port that owns - * the address (otherwise a DHCP client that ARPs to check - * for a duplicate address will fail). Instead, forward - * it the usual way. - * - * (Another alternative would be to simply drop the packet. - * If everything is working as it is configured, then this - * would produce equivalent results, since no one should - * reply to the request. But ARPing for one's own IP - * address is intended to detect situations where the - * network is not working as configured, so dropping the - * request would frustrate that intent.) */ - ds_put_format(&match, " && inport == %s", op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, - S_SWITCH_IN_ARP_ND_RSP, 100, - ds_cstr(&match), "next;", - &op->nbsp->header_); - } - - /* For ND solicitations, we need to listen for both the - * unicast IPv6 address and its all-nodes multicast address, - * but always respond with the unicast IPv6 address. */ - for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) { - ds_clear(&match); - ds_put_format(&match, - "nd_ns && ip6.dst == {%s, %s} && nd.target == %s", - op->lsp_addrs[i].ipv6_addrs[j].addr_s, - op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s, - op->lsp_addrs[i].ipv6_addrs[j].addr_s); - - ds_clear(&actions); - ds_put_format(&actions, - "%s { " - "eth.src = %s; " - "ip6.src = %s; " - "nd.target = %s; " - "nd.tll = %s; " - "outport = inport; " - "flags.loopback = 1; " - "output; " - "};", - lsp_is_router(op->nbsp) ? "nd_na_router" : "nd_na", - op->lsp_addrs[i].ea_s, - op->lsp_addrs[i].ipv6_addrs[j].addr_s, - op->lsp_addrs[i].ipv6_addrs[j].addr_s, - op->lsp_addrs[i].ea_s); - ovn_lflow_add_with_hint(lflows, op->od, - S_SWITCH_IN_ARP_ND_RSP, 50, - ds_cstr(&match), - ds_cstr(&actions), - &op->nbsp->header_); - - /* Do not reply to a solicitation from the port that owns - * the address (otherwise DAD detection will fail). */ - ds_put_format(&match, " && inport == %s", op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, - S_SWITCH_IN_ARP_ND_RSP, 100, - ds_cstr(&match), "next;", - &op->nbsp->header_); - } - } - } - } - - /* Ingress table 13: ARP/ND responder, by default goto next. - * (priority 0)*/ - HMAP_FOR_EACH (od, key_node, datapaths) { - if (!od->nbs) { - continue; - } - - ovn_lflow_add(lflows, od, S_SWITCH_IN_ARP_ND_RSP, 0, "1", "next;"); - } - - /* Ingress table 13: ARP/ND responder for service monitor source ip. - * (priority 110)*/ - struct ovn_northd_lb *lb; - HMAP_FOR_EACH (lb, hmap_node, lbs) { - for (size_t i = 0; i < lb->n_vips; i++) { - struct ovn_northd_lb_vip *lb_vip_nb = &lb->vips_nb[i]; - if (!lb_vip_nb->lb_health_check) { - continue; - } - - for (size_t j = 0; j < lb_vip_nb->n_backends; j++) { - struct ovn_northd_lb_backend *backend_nb = - &lb_vip_nb->backends_nb[i]; - if (!backend_nb->op || !backend_nb->svc_mon_src_ip) { - continue; - } - - ds_clear(&match); - ds_put_format(&match, "arp.tpa == %s && arp.op == 1", - backend_nb->svc_mon_src_ip); - ds_clear(&actions); - ds_put_format(&actions, - "eth.dst = eth.src; " - "eth.src = %s; " - "arp.op = 2; /* ARP reply */ " - "arp.tha = arp.sha; " - "arp.sha = %s; " - "arp.tpa = arp.spa; " - "arp.spa = %s; " - "outport = inport; " - "flags.loopback = 1; " - "output;", - svc_monitor_mac, svc_monitor_mac, - backend_nb->svc_mon_src_ip); - ovn_lflow_add_with_hint(lflows, - backend_nb->op->od, - S_SWITCH_IN_ARP_ND_RSP, 110, - ds_cstr(&match), ds_cstr(&actions), - &lb->nlb->header_); - } - } - } - /* Logical switch ingress table 14 and 15: DHCP options and response * priority 100 flows. */ @@ -7385,6 +7151,247 @@ build_lswitch_lflows_admission_control(struct ovn_datapath *od, } } +/* Ingress table 13: ARP/ND responder */ + +static void +build_lswitch_arp_nd_responder_op(struct ovn_port *op, + struct hmap *lflows, + struct hmap *ports, + struct ds *match, + struct ds *actions) +{ + if (!op->nbsp) { + return; + } + + /* Ingress table 13: ARP/ND responder skip requests coming from localnet + * and vtep ports. (priority 100); see ovn-northd.8.xml for the + * rationale. */ + if ((!strcmp(op->nbsp->type, "localnet")) || + (!strcmp(op->nbsp->type, "vtep"))) { + ds_clear(match); + ds_put_format(match, "inport == %s", op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, + 100, ds_cstr(match), "next;", + &op->nbsp->header_); + } + + /* Ingress table 13: ARP/ND responder, reply for known IPs. + * (priority 50). */ + if (!strcmp(op->nbsp->type, "virtual")) { + /* Handle + * - GARPs for virtual ip which belongs to a logical port + * of type 'virtual' and bind that port. + * + * - ARP reply from the virtual ip which belongs to a logical + * port of type 'virtual' and bind that port. + * */ + ovs_be32 ip; + const char *virtual_ip = smap_get(&op->nbsp->options, + "virtual-ip"); + const char *virtual_parents = smap_get(&op->nbsp->options, + "virtual-parents"); + if (!virtual_ip || !virtual_parents || + !ip_parse(virtual_ip, &ip)) { + return; + } + + char *tokstr = xstrdup(virtual_parents); + char *save_ptr = NULL; + char *vparent; + for (vparent = strtok_r(tokstr, ",", &save_ptr); vparent != NULL; + vparent = strtok_r(NULL, ",", &save_ptr)) { + struct ovn_port *vp = ovn_port_find(ports, vparent); + if (!vp || vp->od != op->od) { + /* vparent name should be valid and it should belong + * to the same logical switch. */ + continue; + } + + ds_clear(match); + ds_put_format(match, "inport == \"%s\" && " + "((arp.op == 1 && arp.spa == %s && " + "arp.tpa == %s) || (arp.op == 2 && " + "arp.spa == %s))", + vparent, virtual_ip, virtual_ip, + virtual_ip); + ds_clear(actions); + ds_put_format(actions, + "bind_vport(%s, inport); " + "next;", + op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, + S_SWITCH_IN_ARP_ND_RSP, 100, + ds_cstr(match), ds_cstr(actions), + &vp->nbsp->header_); + } + + free(tokstr); + } else { + /* + * Add ARP/ND reply flows if either the + * - port is up and it doesn't have 'unknown' address defined or + * - port type is router or + * - port type is localport + */ + if (check_lsp_is_up && + !lsp_is_up(op->nbsp) && !lsp_is_router(op->nbsp) && + strcmp(op->nbsp->type, "localport")) { + return; + } + + if (lsp_is_external(op->nbsp) || op->has_unknown) { + return; + } + + for (size_t i = 0; i < op->n_lsp_addrs; i++) { + for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) { + ds_clear(match); + ds_put_format(match, "arp.tpa == %s && arp.op == 1", + op->lsp_addrs[i].ipv4_addrs[j].addr_s); + ds_clear(actions); + ds_put_format(actions, + "eth.dst = eth.src; " + "eth.src = %s; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = %s; " + "arp.tpa = arp.spa; " + "arp.spa = %s; " + "outport = inport; " + "flags.loopback = 1; " + "output;", + op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s, + op->lsp_addrs[i].ipv4_addrs[j].addr_s); + ovn_lflow_add_with_hint(lflows, op->od, + S_SWITCH_IN_ARP_ND_RSP, 50, + ds_cstr(match), + ds_cstr(actions), + &op->nbsp->header_); + + /* Do not reply to an ARP request from the port that owns + * the address (otherwise a DHCP client that ARPs to check + * for a duplicate address will fail). Instead, forward + * it the usual way. + * + * (Another alternative would be to simply drop the packet. + * If everything is working as it is configured, then this + * would produce equivalent results, since no one should + * reply to the request. But ARPing for one's own IP + * address is intended to detect situations where the + * network is not working as configured, so dropping the + * request would frustrate that intent.) */ + ds_put_format(match, " && inport == %s", op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, + S_SWITCH_IN_ARP_ND_RSP, 100, + ds_cstr(match), "next;", + &op->nbsp->header_); + } + + /* For ND solicitations, we need to listen for both the + * unicast IPv6 address and its all-nodes multicast address, + * but always respond with the unicast IPv6 address. */ + for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) { + ds_clear(match); + ds_put_format(match, + "nd_ns && ip6.dst == {%s, %s} && nd.target == %s", + op->lsp_addrs[i].ipv6_addrs[j].addr_s, + op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s, + op->lsp_addrs[i].ipv6_addrs[j].addr_s); + + ds_clear(actions); + ds_put_format(actions, + "%s { " + "eth.src = %s; " + "ip6.src = %s; " + "nd.target = %s; " + "nd.tll = %s; " + "outport = inport; " + "flags.loopback = 1; " + "output; " + "};", + lsp_is_router(op->nbsp) ? "nd_na_router" : "nd_na", + op->lsp_addrs[i].ea_s, + op->lsp_addrs[i].ipv6_addrs[j].addr_s, + op->lsp_addrs[i].ipv6_addrs[j].addr_s, + op->lsp_addrs[i].ea_s); + ovn_lflow_add_with_hint(lflows, op->od, + S_SWITCH_IN_ARP_ND_RSP, 50, + ds_cstr(match), + ds_cstr(actions), + &op->nbsp->header_); + + /* Do not reply to a solicitation from the port that owns + * the address (otherwise DAD detection will fail). */ + ds_put_format(match, " && inport == %s", op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, + S_SWITCH_IN_ARP_ND_RSP, 100, + ds_cstr(match), "next;", + &op->nbsp->header_); + } + } + } +} + +/* Ingress table 13: ARP/ND responder, by default goto next. + * (priority 0)*/ +static void +build_lswitch_arp_nd_responder_od(struct ovn_datapath *od, + struct hmap *lflows) +{ + if (od->nbs) { + ovn_lflow_add(lflows, od, S_SWITCH_IN_ARP_ND_RSP, 0, "1", "next;"); + } +} + +/* Ingress table 13: ARP/ND responder for service monitor source ip. + * (priority 110)*/ +static void +build_lswitch_arp_nd_responder_lb(struct ovn_northd_lb *lb, + struct hmap *lflows, + struct ds *match, + struct ds *actions) +{ + for (size_t i = 0; i < lb->n_vips; i++) { + struct ovn_northd_lb_vip *lb_vip_nb = &lb->vips_nb[i]; + if (!lb_vip_nb->lb_health_check) { + continue; + } + + for (size_t j = 0; j < lb_vip_nb->n_backends; j++) { + struct ovn_northd_lb_backend *backend_nb = + &lb_vip_nb->backends_nb[i]; + if (!backend_nb->op || !backend_nb->svc_mon_src_ip) { + continue; + } + + ds_clear(match); + ds_put_format(match, "arp.tpa == %s && arp.op == 1", + backend_nb->svc_mon_src_ip); + ds_clear(actions); + ds_put_format(actions, + "eth.dst = eth.src; " + "eth.src = %s; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = %s; " + "arp.tpa = arp.spa; " + "arp.spa = %s; " + "outport = inport; " + "flags.loopback = 1; " + "output;", + svc_monitor_mac, svc_monitor_mac, + backend_nb->svc_mon_src_ip); + ovn_lflow_add_with_hint(lflows, + backend_nb->op->od, + S_SWITCH_IN_ARP_ND_RSP, 110, + ds_cstr(match), ds_cstr(actions), + &lb->nlb->header_); + } + } +} + + /* Returns a string of the IP address of the router port 'op' that * overlaps with 'ip_s". If one is not found, returns NULL. @@ -11116,6 +11123,7 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, build_fwd_group_lflows(od, lsi->lflows); build_lswitch_lflows_admission_control(od, lsi->lflows); build_lswitch_input_port_sec_od(od, lsi->lflows); + build_lswitch_arp_nd_responder_od(od, lsi->lflows); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter(od, lsi->lflows); @@ -11148,6 +11156,8 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op, /* Build Logical Switch Flows. */ build_lswitch_input_port_sec_op(op, lsi->lflows, &lsi->actions, &lsi->match); + build_lswitch_arp_nd_responder_op(op, lsi->lflows, lsi->ports, + &lsi->match, &lsi->actions); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter_port(op, lsi->lflows, &lsi->match, @@ -11177,6 +11187,7 @@ static void *build_lflows_thread(void *arg) { struct ovn_datapath *od; struct ovn_port *op; + struct ovn_northd_lb *lb; int bnum; while (!cease_fire()) { @@ -11209,6 +11220,20 @@ static void *build_lflows_thread(void *arg) { build_lswitch_and_lrouter_iterate_by_op(op, lsi); } } + for (bnum = control->id; + bnum <= lsi->lbs->mask; + bnum += workload->pool->size) + { + HMAP_FOR_EACH_IN_PARALLEL ( + lb, hmap_node, bnum, lsi->lbs) { + if (cease_fire()) { + return NULL; + } + build_lswitch_arp_nd_responder_lb(lb, lsi->lflows, + &lsi->match, + &lsi->actions); + } + } atomic_store_relaxed(&control->finished, true); atomic_thread_fence(memory_order_release); } @@ -11339,7 +11364,7 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, /* Legacy lswitch build - to be migrated. */ build_lswitch_flows(datapaths, ports, lflows, mcgroups, - igmp_groups, lbs); + igmp_groups); /* Legacy lrouter build - to be migrated. */ build_lrouter_flows(datapaths, ports, lflows, meter_groups, lbs); From patchwork Wed Nov 25 11:02:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406020 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgyfM3K1yz9sRK for ; Wed, 25 Nov 2020 22:02:51 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id ED94F86F90; Wed, 25 Nov 2020 11:02:49 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kjfNksdYyB7j; Wed, 25 Nov 2020 11:02:44 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id DEF1786E6E; Wed, 25 Nov 2020 11:02:38 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id CDD7AC1833; Wed, 25 Nov 2020 11:02:38 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 58DDBC1DA0 for ; Wed, 25 Nov 2020 11:02:38 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 2E6652DDC9 for ; Wed, 25 Nov 2020 11:02:38 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0PJRKQ+7jFS2 for ; Wed, 25 Nov 2020 11:02:33 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by silver.osuosl.org (Postfix) with ESMTPS id B20942DE24 for ; Wed, 25 Nov 2020 11:02:31 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZB-0004UL-98; Wed, 25 Nov 2020 11:02:29 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZ8-00038T-Et; Wed, 25 Nov 2020 11:02:28 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:06 +0000 Message-Id: <20201125110216.7944-5-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 04/14] ovn-northd: isolate DHCP response in lswitch to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 100 +++++++++++++++++++++++--------------------- 1 file changed, 52 insertions(+), 48 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index a1358cc67..6bfdf9c9a 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6695,54 +6695,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ovn_datapath *od; struct ovn_port *op; - /* Logical switch ingress table 14 and 15: DHCP options and response - * priority 100 flows. */ - HMAP_FOR_EACH (op, key_node, ports) { - if (!op->nbsp) { - continue; - } - - if (!lsp_is_enabled(op->nbsp) || lsp_is_router(op->nbsp)) { - /* Don't add the DHCP flows if the port is not enabled or if the - * port is a router port. */ - continue; - } - - if (!op->nbsp->dhcpv4_options && !op->nbsp->dhcpv6_options) { - /* CMS has disabled both native DHCPv4 and DHCPv6 for this lport. - */ - continue; - } - - bool is_external = lsp_is_external(op->nbsp); - if (is_external && (!op->od->n_localnet_ports || - !op->nbsp->ha_chassis_group)) { - /* If it's an external port and there are no localnet ports - * and if it doesn't belong to an HA chassis group ignore it. */ - continue; - } - - for (size_t i = 0; i < op->n_lsp_addrs; i++) { - if (is_external) { - for (size_t j = 0; j < op->od->n_localnet_ports; j++) { - build_dhcpv4_options_flows( - op, &op->lsp_addrs[i], - op->od->localnet_ports[j]->json_key, is_external, - lflows); - build_dhcpv6_options_flows( - op, &op->lsp_addrs[i], - op->od->localnet_ports[j]->json_key, is_external, - lflows); - } - } else { - build_dhcpv4_options_flows(op, &op->lsp_addrs[i], op->json_key, - is_external, lflows); - build_dhcpv6_options_flows(op, &op->lsp_addrs[i], op->json_key, - is_external, lflows); - } - } - } - /* Logical switch ingress table 17 and 18: DNS lookup and response * priority 100 flows. */ @@ -7391,6 +7343,57 @@ build_lswitch_arp_nd_responder_lb(struct ovn_northd_lb *lb, } } +/* Logical switch ingress table 14 and 15: DHCP options and response + * priority 100 flows. */ + +static void +build_lswitch_dhcp_options_and_response(struct ovn_port *op, + struct hmap *lflows) +{ + if (!op->nbsp) { + return; + } + if (!lsp_is_enabled(op->nbsp) || lsp_is_router(op->nbsp)) { + /* Don't add the DHCP flows if the port is not enabled or if the + * port is a router port. */ + return; + } + + if (!op->nbsp->dhcpv4_options && !op->nbsp->dhcpv6_options) { + /* CMS has disabled both native DHCPv4 and DHCPv6 for this lport. + */ + return; + } + + bool is_external = lsp_is_external(op->nbsp); + if (is_external && (!op->od->n_localnet_ports || + !op->nbsp->ha_chassis_group)) { + /* If it's an external port and there are no localnet ports + * and if it doesn't belong to an HA chassis group ignore it. */ + return; + } + + for (size_t i = 0; i < op->n_lsp_addrs; i++) { + if (is_external) { + for (size_t j = 0; j < op->od->n_localnet_ports; j++) { + build_dhcpv4_options_flows( + op, &op->lsp_addrs[i], + op->od->localnet_ports[j]->json_key, is_external, + lflows); + build_dhcpv6_options_flows( + op, &op->lsp_addrs[i], + op->od->localnet_ports[j]->json_key, is_external, + lflows); + } + } else { + build_dhcpv4_options_flows(op, &op->lsp_addrs[i], op->json_key, + is_external, lflows); + build_dhcpv6_options_flows(op, &op->lsp_addrs[i], op->json_key, + is_external, lflows); + } + } +} + /* Returns a string of the IP address of the router port 'op' that @@ -11158,6 +11161,7 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op, &lsi->match); build_lswitch_arp_nd_responder_op(op, lsi->lflows, lsi->ports, &lsi->match, &lsi->actions); + build_lswitch_dhcp_options_and_response(op, lsi->lflows); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter_port(op, lsi->lflows, &lsi->match, From patchwork Wed Nov 25 11:02:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406023 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=silver.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Cgyfy355yz9sRK for ; Wed, 25 Nov 2020 22:03:22 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 451FE2E1D0; Wed, 25 Nov 2020 11:03:20 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Gzy1XQJgKP18; Wed, 25 Nov 2020 11:03:06 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by silver.osuosl.org (Postfix) with ESMTP id DF4002E149; Wed, 25 Nov 2020 11:02:41 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id AF97BC0891; Wed, 25 Nov 2020 11:02:41 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1E5DCC0891 for ; Wed, 25 Nov 2020 11:02:39 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id A081B86CE7 for ; Wed, 25 Nov 2020 11:02:38 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id P88C0KomgOna for ; Wed, 25 Nov 2020 11:02:34 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by fraxinus.osuosl.org (Postfix) with ESMTPS id 64A9686C4D for ; Wed, 25 Nov 2020 11:02:32 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZD-0004UR-08; Wed, 25 Nov 2020 11:02:31 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZA-00038T-86; Wed, 25 Nov 2020 11:02:29 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:07 +0000 Message-Id: <20201125110216.7944-6-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 05/14] ovn-northd: move DNS lookup-response to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 53 +++++++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 24 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 6bfdf9c9a..aee94569e 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6695,30 +6695,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ovn_datapath *od; struct ovn_port *op; - /* Logical switch ingress table 17 and 18: DNS lookup and response - * priority 100 flows. - */ - HMAP_FOR_EACH (od, key_node, datapaths) { - if (!od->nbs || !ls_has_dns_records(od->nbs)) { - continue; - } - - ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_LOOKUP, 100, - "udp.dst == 53", - REGBIT_DNS_LOOKUP_RESULT" = dns_lookup(); next;"); - const char *dns_action = "eth.dst <-> eth.src; ip4.src <-> ip4.dst; " - "udp.dst = udp.src; udp.src = 53; outport = inport; " - "flags.loopback = 1; output;"; - const char *dns_match = "udp.dst == 53 && "REGBIT_DNS_LOOKUP_RESULT; - ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_RESPONSE, 100, - dns_match, dns_action); - dns_action = "eth.dst <-> eth.src; ip6.src <-> ip6.dst; " - "udp.dst = udp.src; udp.src = 53; outport = inport; " - "flags.loopback = 1; output;"; - ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_RESPONSE, 100, - dns_match, dns_action); - } - /* Ingress table 14 and 15: DHCP options and response, by default goto * next. (priority 0). * Ingress table 16 and 17: DNS lookup and response, by default goto next. @@ -7394,6 +7370,34 @@ build_lswitch_dhcp_options_and_response(struct ovn_port *op, } } +/* Logical switch ingress table 17 and 18: DNS lookup and response + * priority 100 flows. + */ +static void +build_lswitch_dns_lookup_response(struct ovn_datapath *od, + struct hmap *lflows) +{ + if (!od->nbs || !ls_has_dns_records(od->nbs)) { + return; + } + + ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_LOOKUP, 100, + "udp.dst == 53", + REGBIT_DNS_LOOKUP_RESULT" = dns_lookup(); next;"); + const char *dns_action = "eth.dst <-> eth.src; ip4.src <-> ip4.dst; " + "udp.dst = udp.src; udp.src = 53; outport = inport; " + "flags.loopback = 1; output;"; + const char *dns_match = "udp.dst == 53 && "REGBIT_DNS_LOOKUP_RESULT; + ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_RESPONSE, 100, + dns_match, dns_action); + dns_action = "eth.dst <-> eth.src; ip6.src <-> ip6.dst; " + "udp.dst = udp.src; udp.src = 53; outport = inport; " + "flags.loopback = 1; output;"; + ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_RESPONSE, 100, + dns_match, dns_action); +} + + /* Returns a string of the IP address of the router port 'op' that @@ -11127,6 +11131,7 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, build_lswitch_lflows_admission_control(od, lsi->lflows); build_lswitch_input_port_sec_od(od, lsi->lflows); build_lswitch_arp_nd_responder_od(od, lsi->lflows); + build_lswitch_dns_lookup_response(od, lsi->lflows); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter(od, lsi->lflows); From patchwork Wed Nov 25 11:02:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406022 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Cgyfm2KhBz9sRK for ; Wed, 25 Nov 2020 22:03:12 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id A0D678704C; Wed, 25 Nov 2020 11:03:10 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BNEc4M00WUB2; Wed, 25 Nov 2020 11:03:07 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 7692D86EC3; Wed, 25 Nov 2020 11:02:46 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 4874EC1DA0; Wed, 25 Nov 2020 11:02:46 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 77AAAC1833 for ; Wed, 25 Nov 2020 11:02:44 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 654F42E164 for ; Wed, 25 Nov 2020 11:02:44 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ZEQRotClusAo for ; Wed, 25 Nov 2020 11:02:39 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by silver.osuosl.org (Postfix) with ESMTPS id 39EED2E06F for ; Wed, 25 Nov 2020 11:02:34 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZE-0004UW-PK; Wed, 25 Nov 2020 11:02:32 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZC-00038T-09; Wed, 25 Nov 2020 11:02:31 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:08 +0000 Message-Id: <20201125110216.7944-7-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 06/14] ovn-northd: move dns defaults to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 39 ++++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index aee94569e..9e6bc32b1 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6695,25 +6695,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ovn_datapath *od; struct ovn_port *op; - /* Ingress table 14 and 15: DHCP options and response, by default goto - * next. (priority 0). - * Ingress table 16 and 17: DNS lookup and response, by default goto next. - * (priority 0). - * Ingress table 18 - External port handling, by default goto next. - * (priority 0). */ - - HMAP_FOR_EACH (od, key_node, datapaths) { - if (!od->nbs) { - continue; - } - - ovn_lflow_add(lflows, od, S_SWITCH_IN_DHCP_OPTIONS, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_SWITCH_IN_DHCP_RESPONSE, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_LOOKUP, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_RESPONSE, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_SWITCH_IN_EXTERNAL_PORT, 0, "1", "next;"); - } - HMAP_FOR_EACH (op, key_node, ports) { if (!op->nbsp || !lsp_is_external(op->nbsp)) { continue; @@ -7397,6 +7378,25 @@ build_lswitch_dns_lookup_response(struct ovn_datapath *od, dns_match, dns_action); } +/* Ingress table 14 and 15: DHCP options and response, by default goto + * next. (priority 0). + * Ingress table 16 and 17: DNS lookup and response, by default goto next. + * (priority 0). + * Ingress table 18 - External port handling, by default goto next. + * (priority 0). */ + +static void +build_lswitch_dns_and_dhcp_defaults(struct ovn_datapath *od, + struct hmap *lflows) +{ + if (od->nbs) { + ovn_lflow_add(lflows, od, S_SWITCH_IN_DHCP_OPTIONS, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_SWITCH_IN_DHCP_RESPONSE, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_LOOKUP, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_SWITCH_IN_DNS_RESPONSE, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_SWITCH_IN_EXTERNAL_PORT, 0, "1", "next;"); + } +} @@ -11132,6 +11132,7 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, build_lswitch_input_port_sec_od(od, lsi->lflows); build_lswitch_arp_nd_responder_od(od, lsi->lflows); build_lswitch_dns_lookup_response(od, lsi->lflows); + build_lswitch_dns_and_dhcp_defaults(od, lsi->lflows); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter(od, lsi->lflows); From patchwork Wed Nov 25 11:02:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406024 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=silver.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Cgyg54FhFz9sRK for ; Wed, 25 Nov 2020 22:03:29 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 9EA3C2E1D8; Wed, 25 Nov 2020 11:03:27 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2A7d-DUYATc8; Wed, 25 Nov 2020 11:03:12 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by silver.osuosl.org (Postfix) with ESMTP id 532802E159; Wed, 25 Nov 2020 11:02:43 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 07EC0C1833; Wed, 25 Nov 2020 11:02:43 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 4F05AC0052 for ; Wed, 25 Nov 2020 11:02:40 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 47C2087569 for ; Wed, 25 Nov 2020 11:02:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id I+V+svdPuBjE for ; Wed, 25 Nov 2020 11:02:36 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by hemlock.osuosl.org (Postfix) with ESMTPS id D479B87475 for ; Wed, 25 Nov 2020 11:02:35 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZG-0004Uc-F8; Wed, 25 Nov 2020 11:02:34 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZD-00038T-Oe; Wed, 25 Nov 2020 11:02:33 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:09 +0000 Message-Id: <20201125110216.7944-8-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 07/14] ovn-northd: move arp drops on external to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 9e6bc32b1..546f7bcde 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6695,21 +6695,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ovn_datapath *od; struct ovn_port *op; - HMAP_FOR_EACH (op, key_node, ports) { - if (!op->nbsp || !lsp_is_external(op->nbsp)) { - continue; - } - - /* Table 18: External port. Drop ARP request for router ips from - * external ports on chassis not binding those ports. - * This makes the router pipeline to be run only on the chassis - * binding the external ports. */ - for (size_t i = 0; i < op->od->n_localnet_ports; i++) { - build_drop_arp_nd_flows_for_unbound_router_ports( - op, op->od->localnet_ports[i], lflows); - } - } - /* Ingress table 19: Destination lookup, broadcast and multicast handling * (priority 70 - 100). */ HMAP_FOR_EACH (od, key_node, datapaths) { @@ -7398,7 +7383,24 @@ build_lswitch_dns_and_dhcp_defaults(struct ovn_datapath *od, } } +/* Table 18: External port. Drop ARP request for router ips from + * external ports on chassis not binding those ports. + * This makes the router pipeline to be run only on the chassis + * binding the external ports. */ + +static void +build_lswitch_drop_arp_on_external(struct ovn_port *op, + struct hmap *lflows) +{ + if (!op->nbsp || !lsp_is_external(op->nbsp)) { + return; + } + for (size_t i = 0; i < op->od->n_localnet_ports; i++) { + build_drop_arp_nd_flows_for_unbound_router_ports( + op, op->od->localnet_ports[i], lflows); + } +} /* Returns a string of the IP address of the router port 'op' that * overlaps with 'ip_s". If one is not found, returns NULL. @@ -11168,6 +11170,7 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op, build_lswitch_arp_nd_responder_op(op, lsi->lflows, lsi->ports, &lsi->match, &lsi->actions); build_lswitch_dhcp_options_and_response(op, lsi->lflows); + build_lswitch_drop_arp_on_external(op, lsi->lflows); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter_port(op, lsi->lflows, &lsi->match, From patchwork Wed Nov 25 11:02:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406021 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Cgyfl1Jfkz9sRK for ; Wed, 25 Nov 2020 22:03:11 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id B25DD87031; Wed, 25 Nov 2020 11:03:09 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id s6A3sLroaJI3; Wed, 25 Nov 2020 11:03:03 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 3745386E44; Wed, 25 Nov 2020 11:02:45 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 2020EC0891; Wed, 25 Nov 2020 11:02:45 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 8257AC0052 for ; Wed, 25 Nov 2020 11:02:42 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 6F17587572 for ; Wed, 25 Nov 2020 11:02:42 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id SWK+IAaM6VXk for ; Wed, 25 Nov 2020 11:02:38 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by hemlock.osuosl.org (Postfix) with ESMTPS id D9DAD87538 for ; Wed, 25 Nov 2020 11:02:37 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZI-0004Uh-Bo; Wed, 25 Nov 2020 11:02:36 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZF-00038T-FP; Wed, 25 Nov 2020 11:02:35 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:10 +0000 Message-Id: <20201125110216.7944-9-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 08/14] ovn-northd: move destination bmcast handling to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 171 +++++++++++++++++++++++--------------------- 1 file changed, 89 insertions(+), 82 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 546f7bcde..0247b69d9 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6695,88 +6695,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ovn_datapath *od; struct ovn_port *op; - /* Ingress table 19: Destination lookup, broadcast and multicast handling - * (priority 70 - 100). */ - HMAP_FOR_EACH (od, key_node, datapaths) { - if (!od->nbs) { - continue; - } - - ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 110, - "eth.dst == $svc_monitor_mac", - "handle_svc_check(inport);"); - - struct mcast_switch_info *mcast_sw_info = &od->mcast_info.sw; - - if (mcast_sw_info->enabled) { - ds_clear(&actions); - if (mcast_sw_info->flood_reports) { - ds_put_cstr(&actions, - "clone { " - "outport = \""MC_MROUTER_STATIC"\"; " - "output; " - "};"); - } - ds_put_cstr(&actions, "igmp;"); - /* Punt IGMP traffic to controller. */ - ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 100, - "ip4 && ip.proto == 2", ds_cstr(&actions)); - - /* Punt MLD traffic to controller. */ - ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 100, - "mldv1 || mldv2", ds_cstr(&actions)); - - /* Flood all IP multicast traffic destined to 224.0.0.X to all - * ports - RFC 4541, section 2.1.2, item 2. - */ - ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 85, - "ip4.mcast && ip4.dst == 224.0.0.0/24", - "outport = \""MC_FLOOD"\"; output;"); - - /* Flood all IPv6 multicast traffic destined to reserved - * multicast IPs (RFC 4291, 2.7.1). - */ - ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 85, - "ip6.mcast_flood", - "outport = \""MC_FLOOD"\"; output;"); - - /* Forward uregistered IP multicast to routers with relay enabled - * and to any ports configured to flood IP multicast traffic. - * If configured to flood unregistered traffic this will be - * handled by the L2 multicast flow. - */ - if (!mcast_sw_info->flood_unregistered) { - ds_clear(&actions); - - if (mcast_sw_info->flood_relay) { - ds_put_cstr(&actions, - "clone { " - "outport = \""MC_MROUTER_FLOOD"\"; " - "output; " - "}; "); - } - - if (mcast_sw_info->flood_static) { - ds_put_cstr(&actions, "outport =\""MC_STATIC"\"; output;"); - } - - /* Explicitly drop the traffic if relay or static flooding - * is not configured. - */ - if (!mcast_sw_info->flood_relay && - !mcast_sw_info->flood_static) { - ds_put_cstr(&actions, "drop;"); - } - - ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 80, - "ip4.mcast || ip6.mcast", ds_cstr(&actions)); - } - } - - ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 70, "eth.mcast", - "outport = \""MC_FLOOD"\"; output;"); - } - /* Ingress table 19: Add IP multicast flows learnt from IGMP/MLD * (priority 90). */ struct ovn_igmp_group *igmp_group; @@ -7401,6 +7319,93 @@ build_lswitch_drop_arp_on_external(struct ovn_port *op, op, op->od->localnet_ports[i], lflows); } } +/* Ingress table 19: Destination lookup, broadcast and multicast handling + * (priority 70 - 100). */ + +static void +build_lswitch_destination_bmcast_handling(struct ovn_datapath *od, + struct hmap *lflows, + struct ds *actions) +{ + if (!od->nbs) { + return; + } + + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 110, + "eth.dst == $svc_monitor_mac", + "handle_svc_check(inport);"); + + struct mcast_switch_info *mcast_sw_info = &od->mcast_info.sw; + + if (mcast_sw_info->enabled) { + ds_clear(actions); + if (mcast_sw_info->flood_reports) { + ds_put_cstr(actions, + "clone { " + "outport = \""MC_MROUTER_STATIC"\"; " + "output; " + "};"); + } + ds_put_cstr(actions, "igmp;"); + /* Punt IGMP traffic to controller. */ + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 100, + "ip4 && ip.proto == 2", ds_cstr(actions)); + + /* Punt MLD traffic to controller. */ + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 100, + "mldv1 || mldv2", ds_cstr(actions)); + + /* Flood all IP multicast traffic destined to 224.0.0.X to all + * ports - RFC 4541, section 2.1.2, item 2. + */ + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 85, + "ip4.mcast && ip4.dst == 224.0.0.0/24", + "outport = \""MC_FLOOD"\"; output;"); + + /* Flood all IPv6 multicast traffic destined to reserved + * multicast IPs (RFC 4291, 2.7.1). + */ + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 85, + "ip6.mcast_flood", + "outport = \""MC_FLOOD"\"; output;"); + + /* Forward uregistered IP multicast to routers with relay enabled + * and to any ports configured to flood IP multicast traffic. + * If configured to flood unregistered traffic this will be + * handled by the L2 multicast flow. + */ + if (!mcast_sw_info->flood_unregistered) { + ds_clear(actions); + + if (mcast_sw_info->flood_relay) { + ds_put_cstr(actions, + "clone { " + "outport = \""MC_MROUTER_FLOOD"\"; " + "output; " + "}; "); + } + + if (mcast_sw_info->flood_static) { + ds_put_cstr(actions, "outport =\""MC_STATIC"\"; output;"); + } + + /* Explicitly drop the traffic if relay or static flooding + * is not configured. + */ + if (!mcast_sw_info->flood_relay && + !mcast_sw_info->flood_static) { + ds_put_cstr(actions, "drop;"); + } + + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 80, + "ip4.mcast || ip6.mcast", ds_cstr(actions)); + } + } + + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 70, "eth.mcast", + "outport = \""MC_FLOOD"\"; output;"); +} + /* Returns a string of the IP address of the router port 'op' that * overlaps with 'ip_s". If one is not found, returns NULL. @@ -11135,6 +11140,8 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, build_lswitch_arp_nd_responder_od(od, lsi->lflows); build_lswitch_dns_lookup_response(od, lsi->lflows); build_lswitch_dns_and_dhcp_defaults(od, lsi->lflows); + build_lswitch_destination_bmcast_handling(od, lsi->lflows, + &lsi->actions); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter(od, lsi->lflows); From patchwork Wed Nov 25 11:02:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406025 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgygB38kTz9sRK for ; Wed, 25 Nov 2020 22:03:34 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id ABEC786E53; Wed, 25 Nov 2020 11:03:32 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UBQ8O9gZVLN1; Wed, 25 Nov 2020 11:03:27 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 6B39A86EAC; Wed, 25 Nov 2020 11:02:57 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 5B053C1D9F; Wed, 25 Nov 2020 11:02:57 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id D88F9C0891 for ; Wed, 25 Nov 2020 11:02:56 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id CF74E86FCA for ; Wed, 25 Nov 2020 11:02:56 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dYuK2PIQ1HAS for ; Wed, 25 Nov 2020 11:02:52 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by whitealder.osuosl.org (Postfix) with ESMTPS id B0CF386E82 for ; Wed, 25 Nov 2020 11:02:39 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1khsZK-0004Um-5b; Wed, 25 Nov 2020 11:02:38 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZH-00038T-9I; Wed, 25 Nov 2020 11:02:37 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:11 +0000 Message-Id: <20201125110216.7944-10-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 09/14] ovn-northd: move flows from igmp to a separate function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 164 +++++++++++++++++++++++++------------------- 1 file changed, 92 insertions(+), 72 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 0247b69d9..4bfca7210 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6684,8 +6684,7 @@ is_vlan_transparent(const struct ovn_datapath *od) static void build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, - struct hmap *lflows, struct hmap *mcgroups, - struct hmap *igmp_groups) + struct hmap *lflows, struct hmap *mcgroups) { /* This flow table structure is documented in ovn-northd(8), so please * update ovn-northd.8.xml if you change anything. */ @@ -6695,74 +6694,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ovn_datapath *od; struct ovn_port *op; - /* Ingress table 19: Add IP multicast flows learnt from IGMP/MLD - * (priority 90). */ - struct ovn_igmp_group *igmp_group; - - HMAP_FOR_EACH (igmp_group, hmap_node, igmp_groups) { - if (!igmp_group->datapath) { - continue; - } - - ds_clear(&match); - ds_clear(&actions); - - struct mcast_switch_info *mcast_sw_info = - &igmp_group->datapath->mcast_info.sw; - - if (IN6_IS_ADDR_V4MAPPED(&igmp_group->address)) { - /* RFC 4541, section 2.1.2, item 2: Skip groups in the 224.0.0.X - * range. - */ - ovs_be32 group_address = - in6_addr_get_mapped_ipv4(&igmp_group->address); - if (ip_is_local_multicast(group_address)) { - continue; - } - - if (mcast_sw_info->active_v4_flows >= mcast_sw_info->table_size) { - continue; - } - mcast_sw_info->active_v4_flows++; - ds_put_format(&match, "eth.mcast && ip4 && ip4.dst == %s ", - igmp_group->mcgroup.name); - } else { - /* RFC 4291, section 2.7.1: Skip groups that correspond to all - * hosts. - */ - if (ipv6_is_all_hosts(&igmp_group->address)) { - continue; - } - if (mcast_sw_info->active_v6_flows >= mcast_sw_info->table_size) { - continue; - } - mcast_sw_info->active_v6_flows++; - ds_put_format(&match, "eth.mcast && ip6 && ip6.dst == %s ", - igmp_group->mcgroup.name); - } - - /* Also flood traffic to all multicast routers with relay enabled. */ - if (mcast_sw_info->flood_relay) { - ds_put_cstr(&actions, - "clone { " - "outport = \""MC_MROUTER_FLOOD "\"; " - "output; " - "};"); - } - if (mcast_sw_info->flood_static) { - ds_put_cstr(&actions, - "clone { " - "outport =\""MC_STATIC"\"; " - "output; " - "};"); - } - ds_put_format(&actions, "outport = \"%s\"; output; ", - igmp_group->mcgroup.name); - - ovn_lflow_add(lflows, igmp_group->datapath, S_SWITCH_IN_L2_LKUP, 90, - ds_cstr(&match), ds_cstr(&actions)); - } - /* Ingress table 19: Destination lookup, unicast handling (priority 50), */ HMAP_FOR_EACH (op, key_node, ports) { if (!op->nbsp || lsp_is_external(op->nbsp)) { @@ -7406,6 +7337,81 @@ build_lswitch_destination_bmcast_handling(struct ovn_datapath *od, "outport = \""MC_FLOOD"\"; output;"); } +/* Ingress table 19: Add IP multicast flows learnt from IGMP/MLD + * (priority 90). */ +static void +build_lswitch_igmp_mld_flows(struct ovn_igmp_group *igmp_group, + struct hmap *lflows, + struct ds *match, + struct ds *actions) +{ + if (!igmp_group->datapath) { + return; + } + + ds_clear(match); + ds_clear(actions); + + /* This should be OK to access without locking as this is the only + * place where active_v4_flows is changed */ + + struct mcast_switch_info *mcast_sw_info = + &igmp_group->datapath->mcast_info.sw; + + if (IN6_IS_ADDR_V4MAPPED(&igmp_group->address)) { + /* RFC 4541, section 2.1.2, item 2: Skip groups in the 224.0.0.X + * range. + */ + ovs_be32 group_address = + in6_addr_get_mapped_ipv4(&igmp_group->address); + if (ip_is_local_multicast(group_address)) { + return; + } + + if (mcast_sw_info->active_v4_flows >= mcast_sw_info->table_size) { + return; + } + mcast_sw_info->active_v4_flows++; + ds_put_format(match, "eth.mcast && ip4 && ip4.dst == %s ", + igmp_group->mcgroup.name); + } else { + /* RFC 4291, section 2.7.1: Skip groups that correspond to all + * hosts. + */ + if (ipv6_is_all_hosts(&igmp_group->address)) { + return; + } + if (mcast_sw_info->active_v6_flows >= mcast_sw_info->table_size) { + return; + } + mcast_sw_info->active_v6_flows++; + ds_put_format(match, "eth.mcast && ip6 && ip6.dst == %s ", + igmp_group->mcgroup.name); + } + + /* Also flood traffic to all multicast routers with relay enabled. */ + if (mcast_sw_info->flood_relay) { + ds_put_cstr(actions, + "clone { " + "outport = \""MC_MROUTER_FLOOD "\"; " + "output; " + "};"); + } + if (mcast_sw_info->flood_static) { + ds_put_cstr(actions, + "clone { " + "outport =\""MC_STATIC"\"; " + "output; " + "};"); + } + ds_put_format(actions, "outport = \"%s\"; output; ", + igmp_group->mcgroup.name); + + ovn_lflow_add(lflows, igmp_group->datapath, S_SWITCH_IN_L2_LKUP, 90, + ds_cstr(match), ds_cstr(actions)); +} + + /* Returns a string of the IP address of the router port 'op' that * overlaps with 'ip_s". If one is not found, returns NULL. @@ -11208,6 +11214,7 @@ static void *build_lflows_thread(void *arg) { struct ovn_datapath *od; struct ovn_port *op; struct ovn_northd_lb *lb; + struct ovn_igmp_group *igmp_group; int bnum; while (!cease_fire()) { @@ -11254,6 +11261,20 @@ static void *build_lflows_thread(void *arg) { &lsi->actions); } } + for (bnum = control->id; + bnum <= lsi->igmp_groups->mask; + bnum += workload->pool->size) + { + HMAP_FOR_EACH_IN_PARALLEL ( + igmp_group, hmap_node, bnum, lsi->igmp_groups) { + if (cease_fire()) { + return NULL; + } + build_lswitch_igmp_mld_flows(igmp_group, lsi->lflows, + &lsi->match, + &lsi->actions); + } + } atomic_store_relaxed(&control->finished, true); atomic_thread_fence(memory_order_release); } @@ -11383,8 +11404,7 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, free(svc_check_match); /* Legacy lswitch build - to be migrated. */ - build_lswitch_flows(datapaths, ports, lflows, mcgroups, - igmp_groups); + build_lswitch_flows(datapaths, ports, lflows, mcgroups); /* Legacy lrouter build - to be migrated. */ build_lrouter_flows(datapaths, ports, lflows, meter_groups, lbs); From patchwork Wed Nov 25 11:02:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406035 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgzGf5nVvz9sVY for ; Wed, 25 Nov 2020 22:30:50 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 4156B8756E; Wed, 25 Nov 2020 11:30:48 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id VMqujjxvb0gt; Wed, 25 Nov 2020 11:30:40 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id 54C2087584; Wed, 25 Nov 2020 11:30:39 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0C36BC1DA1; Wed, 25 Nov 2020 11:30:39 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id E4B06C163C for ; Wed, 25 Nov 2020 11:30:36 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id B8B4787568 for ; Wed, 25 Nov 2020 11:30:36 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XjyBHdWc2MTA for ; Wed, 25 Nov 2020 11:30:35 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by hemlock.osuosl.org (Postfix) with ESMTPS id 1C9EE87558 for ; Wed, 25 Nov 2020 11:30:34 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kht0L-0004aa-5p; Wed, 25 Nov 2020 11:30:33 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZJ-00038T-2D; Wed, 25 Nov 2020 11:02:38 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:12 +0000 Message-Id: <20201125110216.7944-11-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 10/14] ovn-northd: move lswitch unicast handling to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 291 +++++++++++++++++++++++--------------------- 1 file changed, 152 insertions(+), 139 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 4bfca7210..915e37099 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -6684,7 +6684,7 @@ is_vlan_transparent(const struct ovn_datapath *od) static void build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, - struct hmap *lflows, struct hmap *mcgroups) + struct hmap *lflows) { /* This flow table structure is documented in ovn-northd(8), so please * update ovn-northd.8.xml if you change anything. */ @@ -6692,143 +6692,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct ds match = DS_EMPTY_INITIALIZER; struct ds actions = DS_EMPTY_INITIALIZER; struct ovn_datapath *od; - struct ovn_port *op; - - /* Ingress table 19: Destination lookup, unicast handling (priority 50), */ - HMAP_FOR_EACH (op, key_node, ports) { - if (!op->nbsp || lsp_is_external(op->nbsp)) { - continue; - } - - /* For ports connected to logical routers add flows to bypass the - * broadcast flooding of ARP/ND requests in table 19. We direct the - * requests only to the router port that owns the IP address. - */ - if (lsp_is_router(op->nbsp)) { - build_lswitch_rport_arp_req_flows(op->peer, op->od, op, lflows, - &op->nbsp->header_); - } - - for (size_t i = 0; i < op->nbsp->n_addresses; i++) { - /* Addresses are owned by the logical port. - * Ethernet address followed by zero or more IPv4 - * or IPv6 addresses (or both). */ - struct eth_addr mac; - if (ovs_scan(op->nbsp->addresses[i], - ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))) { - ds_clear(&match); - ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT, - ETH_ADDR_ARGS(mac)); - - ds_clear(&actions); - ds_put_format(&actions, "outport = %s; output;", op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_L2_LKUP, - 50, ds_cstr(&match), - ds_cstr(&actions), - &op->nbsp->header_); - } else if (!strcmp(op->nbsp->addresses[i], "unknown")) { - if (lsp_is_enabled(op->nbsp)) { - ovn_multicast_add(mcgroups, &mc_unknown, op); - op->od->has_unknown = true; - } - } else if (is_dynamic_lsp_address(op->nbsp->addresses[i])) { - if (!op->nbsp->dynamic_addresses - || !ovs_scan(op->nbsp->dynamic_addresses, - ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))) { - continue; - } - ds_clear(&match); - ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT, - ETH_ADDR_ARGS(mac)); - - ds_clear(&actions); - ds_put_format(&actions, "outport = %s; output;", op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_L2_LKUP, - 50, ds_cstr(&match), - ds_cstr(&actions), - &op->nbsp->header_); - } else if (!strcmp(op->nbsp->addresses[i], "router")) { - if (!op->peer || !op->peer->nbrp - || !ovs_scan(op->peer->nbrp->mac, - ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))) { - continue; - } - ds_clear(&match); - ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT, - ETH_ADDR_ARGS(mac)); - if (op->peer->od->l3dgw_port - && op->peer->od->l3redirect_port - && op->od->n_localnet_ports) { - bool add_chassis_resident_check = false; - if (op->peer == op->peer->od->l3dgw_port) { - /* The peer of this port represents a distributed - * gateway port. The destination lookup flow for the - * router's distributed gateway port MAC address should - * only be programmed on the gateway chassis. */ - add_chassis_resident_check = true; - } else { - /* Check if the option 'reside-on-redirect-chassis' - * is set to true on the peer port. If set to true - * and if the logical switch has a localnet port, it - * means the router pipeline for the packets from - * this logical switch should be run on the chassis - * hosting the gateway port. - */ - add_chassis_resident_check = smap_get_bool( - &op->peer->nbrp->options, - "reside-on-redirect-chassis", false); - } - - if (add_chassis_resident_check) { - ds_put_format(&match, " && is_chassis_resident(%s)", - op->peer->od->l3redirect_port->json_key); - } - } - - ds_clear(&actions); - ds_put_format(&actions, "outport = %s; output;", op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, - S_SWITCH_IN_L2_LKUP, 50, - ds_cstr(&match), ds_cstr(&actions), - &op->nbsp->header_); - - /* Add ethernet addresses specified in NAT rules on - * distributed logical routers. */ - if (op->peer->od->l3dgw_port - && op->peer == op->peer->od->l3dgw_port) { - for (int j = 0; j < op->peer->od->nbr->n_nat; j++) { - const struct nbrec_nat *nat - = op->peer->od->nbr->nat[j]; - if (!strcmp(nat->type, "dnat_and_snat") - && nat->logical_port && nat->external_mac - && eth_addr_from_string(nat->external_mac, &mac)) { - - ds_clear(&match); - ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT - " && is_chassis_resident(\"%s\")", - ETH_ADDR_ARGS(mac), - nat->logical_port); - - ds_clear(&actions); - ds_put_format(&actions, "outport = %s; output;", - op->json_key); - ovn_lflow_add_with_hint(lflows, op->od, - S_SWITCH_IN_L2_LKUP, 50, - ds_cstr(&match), - ds_cstr(&actions), - &op->nbsp->header_); - } - } - } - } else { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); - - VLOG_INFO_RL(&rl, - "%s: invalid syntax '%s' in addresses column", - op->nbsp->name, op->nbsp->addresses[i]); - } - } - } /* Ingress table 19: Destination lookup for unknown MACs (priority 0). */ HMAP_FOR_EACH (od, key_node, datapaths) { @@ -7411,6 +7274,152 @@ build_lswitch_igmp_mld_flows(struct ovn_igmp_group *igmp_group, ds_cstr(match), ds_cstr(actions)); } +static struct ovs_mutex mcgroups_lock = OVS_MUTEX_INITIALIZER; + +/* Ingress table 19: Destination lookup, unicast handling (priority 50), */ +static void +build_lswitch_destination_lookup_unicast(struct ovn_port *op, + struct hmap *lflows, + struct hmap *mcgroups, + struct ds *match, + struct ds *actions) +{ + if (!op->nbsp || lsp_is_external(op->nbsp)) { + return; + } + + /* For ports connected to logical routers add flows to bypass the + * broadcast flooding of ARP/ND requests in table 19. We direct the + * requests only to the router port that owns the IP address. + */ + if (lsp_is_router(op->nbsp)) { + build_lswitch_rport_arp_req_flows(op->peer, op->od, op, lflows, + &op->nbsp->header_); + } + + for (size_t i = 0; i < op->nbsp->n_addresses; i++) { + /* Addresses are owned by the logical port. + * Ethernet address followed by zero or more IPv4 + * or IPv6 addresses (or both). */ + struct eth_addr mac; + if (ovs_scan(op->nbsp->addresses[i], + ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))) { + ds_clear(match); + ds_put_format(match, "eth.dst == "ETH_ADDR_FMT, + ETH_ADDR_ARGS(mac)); + + ds_clear(actions); + ds_put_format(actions, "outport = %s; output;", op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_L2_LKUP, + 50, ds_cstr(match), + ds_cstr(actions), + &op->nbsp->header_); + } else if (!strcmp(op->nbsp->addresses[i], "unknown")) { + if (lsp_is_enabled(op->nbsp)) { + ovs_mutex_lock(&mcgroups_lock); + ovn_multicast_add(mcgroups, &mc_unknown, op); + ovs_mutex_unlock(&mcgroups_lock); + op->od->has_unknown = true; + } + } else if (is_dynamic_lsp_address(op->nbsp->addresses[i])) { + if (!op->nbsp->dynamic_addresses + || !ovs_scan(op->nbsp->dynamic_addresses, + ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))) { + continue; + } + ds_clear(match); + ds_put_format(match, "eth.dst == "ETH_ADDR_FMT, + ETH_ADDR_ARGS(mac)); + + ds_clear(actions); + ds_put_format(actions, "outport = %s; output;", op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_L2_LKUP, + 50, ds_cstr(match), + ds_cstr(actions), + &op->nbsp->header_); + } else if (!strcmp(op->nbsp->addresses[i], "router")) { + if (!op->peer || !op->peer->nbrp + || !ovs_scan(op->peer->nbrp->mac, + ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))) { + continue; + } + ds_clear(match); + ds_put_format(match, "eth.dst == "ETH_ADDR_FMT, + ETH_ADDR_ARGS(mac)); + if (op->peer->od->l3dgw_port + && op->peer->od->l3redirect_port + && op->od->n_localnet_ports) { + bool add_chassis_resident_check = false; + if (op->peer == op->peer->od->l3dgw_port) { + /* The peer of this port represents a distributed + * gateway port. The destination lookup flow for the + * router's distributed gateway port MAC address should + * only be programmed on the gateway chassis. */ + add_chassis_resident_check = true; + } else { + /* Check if the option 'reside-on-redirect-chassis' + * is set to true on the peer port. If set to true + * and if the logical switch has a localnet port, it + * means the router pipeline for the packets from + * this logical switch should be run on the chassis + * hosting the gateway port. + */ + add_chassis_resident_check = smap_get_bool( + &op->peer->nbrp->options, + "reside-on-redirect-chassis", false); + } + + if (add_chassis_resident_check) { + ds_put_format(match, " && is_chassis_resident(%s)", + op->peer->od->l3redirect_port->json_key); + } + } + + ds_clear(actions); + ds_put_format(actions, "outport = %s; output;", op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, + S_SWITCH_IN_L2_LKUP, 50, + ds_cstr(match), ds_cstr(actions), + &op->nbsp->header_); + + /* Add ethernet addresses specified in NAT rules on + * distributed logical routers. */ + if (op->peer->od->l3dgw_port + && op->peer == op->peer->od->l3dgw_port) { + for (int j = 0; j < op->peer->od->nbr->n_nat; j++) { + const struct nbrec_nat *nat + = op->peer->od->nbr->nat[j]; + if (!strcmp(nat->type, "dnat_and_snat") + && nat->logical_port && nat->external_mac + && eth_addr_from_string(nat->external_mac, &mac)) { + + ds_clear(match); + ds_put_format(match, "eth.dst == "ETH_ADDR_FMT + " && is_chassis_resident(\"%s\")", + ETH_ADDR_ARGS(mac), + nat->logical_port); + + ds_clear(actions); + ds_put_format(actions, "outport = %s; output;", + op->json_key); + ovn_lflow_add_with_hint(lflows, op->od, + S_SWITCH_IN_L2_LKUP, 50, + ds_cstr(match), + ds_cstr(actions), + &op->nbsp->header_); + } + } + } + } else { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); + + VLOG_INFO_RL(&rl, + "%s: invalid syntax '%s' in addresses column", + op->nbsp->name, op->nbsp->addresses[i]); + } + } +} + /* Returns a string of the IP address of the router port 'op' that @@ -11184,6 +11193,10 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op, &lsi->match, &lsi->actions); build_lswitch_dhcp_options_and_response(op, lsi->lflows); build_lswitch_drop_arp_on_external(op, lsi->lflows); + build_lswitch_destination_lookup_unicast(op, lsi->lflows, + lsi->mcgroups, + &lsi->match, + &lsi->actions); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter_port(op, lsi->lflows, &lsi->match, @@ -11404,7 +11417,7 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, free(svc_check_match); /* Legacy lswitch build - to be migrated. */ - build_lswitch_flows(datapaths, ports, lflows, mcgroups); + build_lswitch_flows(datapaths, ports, lflows); /* Legacy lrouter build - to be migrated. */ build_lrouter_flows(datapaths, ports, lflows, meter_groups, lbs); From patchwork Wed Nov 25 11:02:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406033 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgzGY0KRZz9sVX for ; Wed, 25 Nov 2020 22:30:40 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 6743087565; Wed, 25 Nov 2020 11:30:36 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id RpO9BonwiDPa; Wed, 25 Nov 2020 11:30:34 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id 765D587554; Wed, 25 Nov 2020 11:30:34 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 43957C1833; Wed, 25 Nov 2020 11:30:34 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 73135C0052 for ; Wed, 25 Nov 2020 11:30:33 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 546C987554 for ; Wed, 25 Nov 2020 11:30:33 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hVLFPlU6Xe8x for ; Wed, 25 Nov 2020 11:30:32 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by hemlock.osuosl.org (Postfix) with ESMTPS id 95FEE8753E for ; Wed, 25 Nov 2020 11:30:32 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kht0J-0004aV-1w; Wed, 25 Nov 2020 11:30:31 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZL-00038T-0D; Wed, 25 Nov 2020 11:02:40 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:13 +0000 Message-Id: <20201125110216.7944-12-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 11/14] ovn-northd: split lswitch port security into iterators X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Split port security into iterators and move it to common processing Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 119 +++++++++++++++++++++----------------------- 1 file changed, 56 insertions(+), 63 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 915e37099..ffb0668c1 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -4837,75 +4837,71 @@ build_lswitch_input_port_sec_od( } } +/* Egress table 8: Egress port security - IP (priorities 90 and 80) + * if port security enabled. + * + * Egress table 9: Egress port security - L2 (priorities 50 and 150). + * + * Priority 50 rules implement port security for enabled logical port. + * + * Priority 150 rules drop packets to disabled logical ports, so that + * they don't even receive multicast or broadcast packets. + */ + static void -build_lswitch_output_port_sec(struct hmap *ports, struct hmap *datapaths, - struct hmap *lflows) +build_lswitch_output_port_sec_op(struct ovn_port *op, + struct hmap *lflows, + struct ds *match, + struct ds *actions) { - struct ds actions = DS_EMPTY_INITIALIZER; - struct ds match = DS_EMPTY_INITIALIZER; - struct ovn_port *op; - - /* Egress table 8: Egress port security - IP (priorities 90 and 80) - * if port security enabled. - * - * Egress table 9: Egress port security - L2 (priorities 50 and 150). - * - * Priority 50 rules implement port security for enabled logical port. - * - * Priority 150 rules drop packets to disabled logical ports, so that - * they don't even receive multicast or broadcast packets. - */ - HMAP_FOR_EACH (op, key_node, ports) { - if (!op->nbsp || lsp_is_external(op->nbsp)) { - continue; - } + if (!op->nbsp || lsp_is_external(op->nbsp)) { + return; + } - ds_clear(&actions); - ds_clear(&match); + ds_clear(actions); + ds_clear(match); - ds_put_format(&match, "outport == %s", op->json_key); - if (lsp_is_enabled(op->nbsp)) { - build_port_security_l2("eth.dst", op->ps_addrs, op->n_ps_addrs, - &match); + ds_put_format(match, "outport == %s", op->json_key); + if (lsp_is_enabled(op->nbsp)) { + build_port_security_l2("eth.dst", op->ps_addrs, op->n_ps_addrs, + match); - if (!strcmp(op->nbsp->type, "localnet")) { - const char *queue_id = smap_get(&op->sb->options, - "qdisc_queue_id"); - if (queue_id) { - ds_put_format(&actions, "set_queue(%s); ", queue_id); - } + if (!strcmp(op->nbsp->type, "localnet")) { + const char *queue_id = smap_get(&op->sb->options, + "qdisc_queue_id"); + if (queue_id) { + ds_put_format(actions, "set_queue(%s); ", queue_id); } - ds_put_cstr(&actions, "output;"); - ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_OUT_PORT_SEC_L2, - 50, ds_cstr(&match), ds_cstr(&actions), - &op->nbsp->header_); - } else { - ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_OUT_PORT_SEC_L2, - 150, ds_cstr(&match), "drop;", - &op->nbsp->header_); } + ds_put_cstr(actions, "output;"); + ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_OUT_PORT_SEC_L2, + 50, ds_cstr(match), ds_cstr(actions), + &op->nbsp->header_); + } else { + ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_OUT_PORT_SEC_L2, + 150, ds_cstr(match), "drop;", + &op->nbsp->header_); + } - if (op->nbsp->n_port_security) { - build_port_security_ip(P_OUT, op, lflows, &op->nbsp->header_); - } + if (op->nbsp->n_port_security) { + build_port_security_ip(P_OUT, op, lflows, &op->nbsp->header_); } +} /* Egress tables 8: Egress port security - IP (priority 0) * Egress table 9: Egress port security L2 - multicast/broadcast * (priority 100). */ - struct ovn_datapath *od; - HMAP_FOR_EACH (od, key_node, datapaths) { - if (!od->nbs) { - continue; - } - - ovn_lflow_add(lflows, od, S_SWITCH_OUT_PORT_SEC_IP, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_SWITCH_OUT_PORT_SEC_L2, 100, "eth.mcast", - "output;"); +static void +build_lswitch_output_port_sec_od(struct ovn_datapath *od, + struct hmap *lflows) +{ + if (!od->nbs) { + return; } - ds_destroy(&match); - ds_destroy(&actions); + ovn_lflow_add(lflows, od, S_SWITCH_OUT_PORT_SEC_IP, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_SWITCH_OUT_PORT_SEC_L2, 100, "eth.mcast", + "output;"); } static void @@ -6683,14 +6679,11 @@ is_vlan_transparent(const struct ovn_datapath *od) } static void -build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, - struct hmap *lflows) +build_lswitch_flows(struct hmap *datapaths, struct hmap *lflows) { /* This flow table structure is documented in ovn-northd(8), so please * update ovn-northd.8.xml if you change anything. */ - struct ds match = DS_EMPTY_INITIALIZER; - struct ds actions = DS_EMPTY_INITIALIZER; struct ovn_datapath *od; /* Ingress table 19: Destination lookup for unknown MACs (priority 0). */ @@ -6705,10 +6698,6 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, } } - build_lswitch_output_port_sec(ports, datapaths, lflows); - - ds_destroy(&match); - ds_destroy(&actions); } /* Build pre-ACL and ACL tables for both ingress and egress. @@ -11157,6 +11146,7 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, build_lswitch_dns_and_dhcp_defaults(od, lsi->lflows); build_lswitch_destination_bmcast_handling(od, lsi->lflows, &lsi->actions); + build_lswitch_output_port_sec_od(od, lsi->lflows); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter(od, lsi->lflows); @@ -11197,6 +11187,9 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op, lsi->mcgroups, &lsi->match, &lsi->actions); + build_lswitch_output_port_sec_op(op, lsi->lflows, + &lsi->match, + &lsi->actions); /* Build Logical Router Flows. */ build_adm_ctrl_flows_for_lrouter_port(op, lsi->lflows, &lsi->match, @@ -11417,7 +11410,7 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, free(svc_check_match); /* Legacy lswitch build - to be migrated. */ - build_lswitch_flows(datapaths, ports, lflows); + build_lswitch_flows(datapaths, lflows); /* Legacy lrouter build - to be migrated. */ build_lrouter_flows(datapaths, ports, lflows, meter_groups, lbs); From patchwork Wed Nov 25 11:02:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406036 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgzGn1DV6z9sVX for ; Wed, 25 Nov 2020 22:30:57 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 2B4868756D; Wed, 25 Nov 2020 11:30:55 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2LrhHjEVJ3HG; Wed, 25 Nov 2020 11:30:49 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id 2BB30875A6; Wed, 25 Nov 2020 11:30:49 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id F0805C163C; Wed, 25 Nov 2020 11:30:48 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 6D240C0052 for ; Wed, 25 Nov 2020 11:30:47 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 402D02E145 for ; Wed, 25 Nov 2020 11:30:47 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id RG-JK-Zh+j5v for ; Wed, 25 Nov 2020 11:30:39 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by silver.osuosl.org (Postfix) with ESMTPS id B4F332DE25 for ; Wed, 25 Nov 2020 11:30:38 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kht0P-0004ak-9z; Wed, 25 Nov 2020 11:30:37 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZM-00038T-PT; Wed, 25 Nov 2020 11:02:42 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:14 +0000 Message-Id: <20201125110216.7944-13-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 12/14] ovn-northd: move lrouter arp/nd flows to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 96 +++++++++++++++++++++++---------------------- 1 file changed, 50 insertions(+), 46 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index ffb0668c1..55b7ea898 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -8754,52 +8754,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, struct ovn_datapath *od; struct ovn_port *op; - HMAP_FOR_EACH (od, key_node, datapaths) { - if (!od->nbr) { - continue; - } - - /* Priority-90-92 flows handle ARP requests and ND packets. Most are - * per logical port but DNAT addresses can be handled per datapath - * for non gateway router ports. - * - * Priority 91 and 92 flows are added for each gateway router - * port to handle the special cases. In case we get the packet - * on a regular port, just reply with the port's ETH address. - */ - for (int i = 0; i < od->nbr->n_nat; i++) { - struct ovn_nat *nat_entry = &od->nat_entries[i]; - - /* Skip entries we failed to parse. */ - if (!nat_entry_is_valid(nat_entry)) { - continue; - } - - /* Skip SNAT entries for now, we handle unique SNAT IPs separately - * below. - */ - if (!strcmp(nat_entry->nb->type, "snat")) { - continue; - } - build_lrouter_nat_arp_nd_flow(od, nat_entry, lflows); - } - - /* Now handle SNAT entries too, one per unique SNAT IP. */ - struct shash_node *snat_snode; - SHASH_FOR_EACH (snat_snode, &od->snat_ips) { - struct ovn_snat_ip *snat_ip = snat_snode->data; - - if (ovs_list_is_empty(&snat_ip->snat_entries)) { - continue; - } - - struct ovn_nat *nat_entry = - CONTAINER_OF(ovs_list_front(&snat_ip->snat_entries), - struct ovn_nat, ext_addr_list_node); - build_lrouter_nat_arp_nd_flow(od, nat_entry, lflows); - } - } - /* Logical router ingress table 3: IP Input for IPv4. */ HMAP_FOR_EACH (op, key_node, ports) { if (!op->nbrp) { @@ -11110,6 +11064,55 @@ build_ipv6_input_flows_for_lrouter_port( } +/* Priority-90-92 flows handle ARP requests and ND packets. Most are +* per logical port but DNAT addresses can be handled per datapath +* for non gateway router ports. +* +* Priority 91 and 92 flows are added for each gateway router +* port to handle the special cases. In case we get the packet +* on a regular port, just reply with the port's ETH address. +*/ +static void +build_lrouter_arp_nd_flows(struct ovn_datapath *od, struct hmap *lflows) +{ + if (!od->nbr) { + return; + } + + for (int i = 0; i < od->nbr->n_nat; i++) { + struct ovn_nat *nat_entry = &od->nat_entries[i]; + + /* Skip entries we failed to parse. */ + if (!nat_entry_is_valid(nat_entry)) { + continue; + } + + /* Skip SNAT entries for now, we handle unique SNAT IPs separately + * below. + */ + if (!strcmp(nat_entry->nb->type, "snat")) { + continue; + } + build_lrouter_nat_arp_nd_flow(od, nat_entry, lflows); + } + + /* Now handle SNAT entries too, one per unique SNAT IP. */ + struct shash_node *snat_snode; + SHASH_FOR_EACH (snat_snode, &od->snat_ips) { + struct ovn_snat_ip *snat_ip = snat_snode->data; + + if (ovs_list_is_empty(&snat_ip->snat_entries)) { + continue; + } + + struct ovn_nat *nat_entry = + CONTAINER_OF(ovs_list_front(&snat_ip->snat_entries), + struct ovn_nat, ext_addr_list_node); + build_lrouter_nat_arp_nd_flow(od, nat_entry, lflows); + } +} + + struct lswitch_flow_build_info { struct hmap *datapaths; struct hmap *ports; @@ -11165,6 +11168,7 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, build_arp_request_flows_for_lrouter(od, lsi->lflows, &lsi->match, &lsi->actions); build_misc_local_traffic_drop_flows_for_lrouter(od, lsi->lflows); + build_lrouter_arp_nd_flows(od, lsi->lflows); } /* Helper function to combine all lflow generation which is iterated by port. From patchwork Wed Nov 25 11:02:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406037 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=whitealder.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgzHD6lL3z9sVj for ; Wed, 25 Nov 2020 22:31:20 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 4DF7786E6F; Wed, 25 Nov 2020 11:31:19 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id m11r2cVRvE1q; Wed, 25 Nov 2020 11:31:06 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by whitealder.osuosl.org (Postfix) with ESMTP id 0F23A86ED7; Wed, 25 Nov 2020 11:31:01 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id DEFFBC163C; Wed, 25 Nov 2020 11:31:00 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 9D41FC0052 for ; Wed, 25 Nov 2020 11:30:59 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id 7EE872E177 for ; Wed, 25 Nov 2020 11:30:59 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Ah3lKi2+eMDj for ; Wed, 25 Nov 2020 11:30:37 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by silver.osuosl.org (Postfix) with ESMTPS id BED972E10B for ; Wed, 25 Nov 2020 11:30:36 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kht0M-0004af-V0; Wed, 25 Nov 2020 11:30:35 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZO-00038T-ID; Wed, 25 Nov 2020 11:02:44 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:15 +0000 Message-Id: <20201125110216.7944-14-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 13/14] ovn-northd: move lrouter ip input to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 503 ++++++++++++++++++++++---------------------- 1 file changed, 253 insertions(+), 250 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 55b7ea898..0bcab7bd7 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -8741,7 +8741,7 @@ build_lrouter_force_snat_flows(struct hmap *lflows, struct ovn_datapath *od, } static void -build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, +build_lrouter_flows(struct hmap *datapaths, struct hmap *lflows, struct shash *meter_groups, struct hmap *lbs) { @@ -8752,254 +8752,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, struct ds actions = DS_EMPTY_INITIALIZER; struct ovn_datapath *od; - struct ovn_port *op; - - /* Logical router ingress table 3: IP Input for IPv4. */ - HMAP_FOR_EACH (op, key_node, ports) { - if (!op->nbrp) { - continue; - } - - if (op->derived) { - /* No ingress packets are accepted on a chassisredirect - * port, so no need to program flows for that port. */ - continue; - } - - if (op->lrp_networks.n_ipv4_addrs) { - /* L3 admission control: drop packets that originate from an - * IPv4 address owned by the router or a broadcast address - * known to the router (priority 100). */ - ds_clear(&match); - ds_put_cstr(&match, "ip4.src == "); - op_put_v4_networks(&match, op, true); - ds_put_cstr(&match, " && "REGBIT_EGRESS_LOOPBACK" == 0"); - ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, 100, - ds_cstr(&match), "drop;", - &op->nbrp->header_); - - /* ICMP echo reply. These flows reply to ICMP echo requests - * received for the router's IP address. Since packets only - * get here as part of the logical router datapath, the inport - * (i.e. the incoming locally attached net) does not matter. - * The ip.ttl also does not matter (RFC1812 section 4.2.2.9) */ - ds_clear(&match); - ds_put_cstr(&match, "ip4.dst == "); - op_put_v4_networks(&match, op, false); - ds_put_cstr(&match, " && icmp4.type == 8 && icmp4.code == 0"); - - const char * icmp_actions = "ip4.dst <-> ip4.src; " - "ip.ttl = 255; " - "icmp4.type = 0; " - "flags.loopback = 1; " - "next; "; - ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, - ds_cstr(&match), icmp_actions, - &op->nbrp->header_); - } - - /* ICMP time exceeded */ - for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { - ds_clear(&match); - ds_clear(&actions); - - ds_put_format(&match, - "inport == %s && ip4 && " - "ip.ttl == {0, 1} && !ip.later_frag", op->json_key); - ds_put_format(&actions, - "icmp4 {" - "eth.dst <-> eth.src; " - "icmp4.type = 11; /* Time exceeded */ " - "icmp4.code = 0; /* TTL exceeded in transit */ " - "ip4.dst = ip4.src; " - "ip4.src = %s; " - "ip.ttl = 255; " - "next; };", - op->lrp_networks.ipv4_addrs[i].addr_s); - ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, 40, - ds_cstr(&match), ds_cstr(&actions), - &op->nbrp->header_); - } - - /* ARP reply. These flows reply to ARP requests for the router's own - * IP address. */ - for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { - ds_clear(&match); - ds_put_format(&match, "arp.spa == %s/%u", - op->lrp_networks.ipv4_addrs[i].network_s, - op->lrp_networks.ipv4_addrs[i].plen); - - if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer - && op->peer->od->n_localnet_ports) { - bool add_chassis_resident_check = false; - if (op == op->od->l3dgw_port) { - /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s - * should only be sent from the gateway chassis, so that - * upstream MAC learning points to the gateway chassis. - * Also need to avoid generation of multiple ARP responses - * from different chassis. */ - add_chassis_resident_check = true; - } else { - /* Check if the option 'reside-on-redirect-chassis' - * is set to true on the router port. If set to true - * and if peer's logical switch has a localnet port, it - * means the router pipeline for the packets from - * peer's logical switch is be run on the chassis - * hosting the gateway port and it should reply to the - * ARP requests for the router port IPs. - */ - add_chassis_resident_check = smap_get_bool( - &op->nbrp->options, - "reside-on-redirect-chassis", false); - } - - if (add_chassis_resident_check) { - ds_put_format(&match, " && is_chassis_resident(%s)", - op->od->l3redirect_port->json_key); - } - } - - build_lrouter_arp_flow(op->od, op, - op->lrp_networks.ipv4_addrs[i].addr_s, - REG_INPORT_ETH_ADDR, &match, false, 90, - &op->nbrp->header_, lflows); - } - - /* A set to hold all load-balancer vips that need ARP responses. */ - struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4); - struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6); - get_router_load_balancer_ips(op->od, &all_ips_v4, &all_ips_v6); - - const char *ip_address; - SSET_FOR_EACH (ip_address, &all_ips_v4) { - ds_clear(&match); - if (op == op->od->l3dgw_port) { - ds_put_format(&match, "is_chassis_resident(%s)", - op->od->l3redirect_port->json_key); - } - - build_lrouter_arp_flow(op->od, op, - ip_address, REG_INPORT_ETH_ADDR, - &match, false, 90, NULL, lflows); - } - - SSET_FOR_EACH (ip_address, &all_ips_v6) { - ds_clear(&match); - if (op == op->od->l3dgw_port) { - ds_put_format(&match, "is_chassis_resident(%s)", - op->od->l3redirect_port->json_key); - } - - build_lrouter_nd_flow(op->od, op, "nd_na", - ip_address, NULL, REG_INPORT_ETH_ADDR, - &match, false, 90, NULL, lflows); - } - - sset_destroy(&all_ips_v4); - sset_destroy(&all_ips_v6); - - if (!smap_get(&op->od->nbr->options, "chassis") - && !op->od->l3dgw_port) { - /* UDP/TCP port unreachable. */ - for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { - ds_clear(&match); - ds_put_format(&match, - "ip4 && ip4.dst == %s && !ip.later_frag && udp", - op->lrp_networks.ipv4_addrs[i].addr_s); - const char *action = "icmp4 {" - "eth.dst <-> eth.src; " - "ip4.dst <-> ip4.src; " - "ip.ttl = 255; " - "icmp4.type = 3; " - "icmp4.code = 3; " - "next; };"; - ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, - 80, ds_cstr(&match), action, - &op->nbrp->header_); - - ds_clear(&match); - ds_put_format(&match, - "ip4 && ip4.dst == %s && !ip.later_frag && tcp", - op->lrp_networks.ipv4_addrs[i].addr_s); - action = "tcp_reset {" - "eth.dst <-> eth.src; " - "ip4.dst <-> ip4.src; " - "next; };"; - ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, - 80, ds_cstr(&match), action, - &op->nbrp->header_); - - ds_clear(&match); - ds_put_format(&match, - "ip4 && ip4.dst == %s && !ip.later_frag", - op->lrp_networks.ipv4_addrs[i].addr_s); - action = "icmp4 {" - "eth.dst <-> eth.src; " - "ip4.dst <-> ip4.src; " - "ip.ttl = 255; " - "icmp4.type = 3; " - "icmp4.code = 2; " - "next; };"; - ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, - 70, ds_cstr(&match), action, - &op->nbrp->header_); - } - } - - /* Drop IP traffic destined to router owned IPs except if the IP is - * also a SNAT IP. Those are dropped later, in stage - * "lr_in_arp_resolve", if unSNAT was unsuccessful. - * - * Priority 60. - */ - build_lrouter_drop_own_dest(op, S_ROUTER_IN_IP_INPUT, 60, false, - lflows); - - /* ARP / ND handling for external IP addresses. - * - * DNAT and SNAT IP addresses are external IP addresses that need ARP - * handling. - * - * These are already taken care globally, per router. The only - * exception is on the l3dgw_port where we might need to use a - * different ETH address. - */ - if (op != op->od->l3dgw_port) { - continue; - } - - for (size_t i = 0; i < op->od->nbr->n_nat; i++) { - struct ovn_nat *nat_entry = &op->od->nat_entries[i]; - - /* Skip entries we failed to parse. */ - if (!nat_entry_is_valid(nat_entry)) { - continue; - } - - /* Skip SNAT entries for now, we handle unique SNAT IPs separately - * below. - */ - if (!strcmp(nat_entry->nb->type, "snat")) { - continue; - } - build_lrouter_port_nat_arp_nd_flow(op, nat_entry, lflows); - } - - /* Now handle SNAT entries too, one per unique SNAT IP. */ - struct shash_node *snat_snode; - SHASH_FOR_EACH (snat_snode, &op->od->snat_ips) { - struct ovn_snat_ip *snat_ip = snat_snode->data; - - if (ovs_list_is_empty(&snat_ip->snat_entries)) { - continue; - } - - struct ovn_nat *nat_entry = - CONTAINER_OF(ovs_list_front(&snat_ip->snat_entries), - struct ovn_nat, ext_addr_list_node); - build_lrouter_port_nat_arp_nd_flow(op, nat_entry, lflows); - } - } /* NAT, Defrag and load balancing. */ HMAP_FOR_EACH (od, key_node, datapaths) { @@ -11112,6 +10864,256 @@ build_lrouter_arp_nd_flows(struct ovn_datapath *od, struct hmap *lflows) } } +/* Logical router ingress table 3: IP Input for IPv4. */ +static void +build_lrouter_ip_input_v4(struct ovn_port *op, struct hmap *lflows, + struct ds *match, struct ds *actions) +{ + if (!op->nbrp) { + return; + } + + if (op->derived) { + /* No ingress packets are accepted on a chassisredirect + * port, so no need to program flows for that port. */ + return; + } + + if (op->lrp_networks.n_ipv4_addrs) { + /* L3 admission control: drop packets that originate from an + * IPv4 address owned by the router or a broadcast address + * known to the router (priority 100). */ + ds_clear(match); + ds_put_cstr(match, "ip4.src == "); + op_put_v4_networks(match, op, true); + ds_put_cstr(match, " && "REGBIT_EGRESS_LOOPBACK" == 0"); + ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, 100, + ds_cstr(match), "drop;", + &op->nbrp->header_); + + /* ICMP echo reply. These flows reply to ICMP echo requests + * received for the router's IP address. Since packets only + * get here as part of the logical router datapath, the inport + * (i.e. the incoming locally attached net) does not matter. + * The ip.ttl also does not matter (RFC1812 section 4.2.2.9) */ + ds_clear(match); + ds_put_cstr(match, "ip4.dst == "); + op_put_v4_networks(match, op, false); + ds_put_cstr(match, " && icmp4.type == 8 && icmp4.code == 0"); + + const char * icmp_actions = "ip4.dst <-> ip4.src; " + "ip.ttl = 255; " + "icmp4.type = 0; " + "flags.loopback = 1; " + "next; "; + ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, + ds_cstr(match), icmp_actions, + &op->nbrp->header_); + } + + /* ICMP time exceeded */ + for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { + ds_clear(match); + ds_clear(actions); + + ds_put_format(match, + "inport == %s && ip4 && " + "ip.ttl == {0, 1} && !ip.later_frag", op->json_key); + ds_put_format(actions, + "icmp4 {" + "eth.dst <-> eth.src; " + "icmp4.type = 11; /* Time exceeded */ " + "icmp4.code = 0; /* TTL exceeded in transit */ " + "ip4.dst = ip4.src; " + "ip4.src = %s; " + "ip.ttl = 255; " + "next; };", + op->lrp_networks.ipv4_addrs[i].addr_s); + ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, 40, + ds_cstr(match), ds_cstr(actions), + &op->nbrp->header_); + } + + /* ARP reply. These flows reply to ARP requests for the router's own + * IP address. */ + for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { + ds_clear(match); + ds_put_format(match, "arp.spa == %s/%u", + op->lrp_networks.ipv4_addrs[i].network_s, + op->lrp_networks.ipv4_addrs[i].plen); + + if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer + && op->peer->od->n_localnet_ports) { + bool add_chassis_resident_check = false; + if (op == op->od->l3dgw_port) { + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s + * should only be sent from the gateway chassis, so that + * upstream MAC learning points to the gateway chassis. + * Also need to avoid generation of multiple ARP responses + * from different chassis. */ + add_chassis_resident_check = true; + } else { + /* Check if the option 'reside-on-redirect-chassis' + * is set to true on the router port. If set to true + * and if peer's logical switch has a localnet port, it + * means the router pipeline for the packets from + * peer's logical switch is be run on the chassis + * hosting the gateway port and it should reply to the + * ARP requests for the router port IPs. + */ + add_chassis_resident_check = smap_get_bool( + &op->nbrp->options, + "reside-on-redirect-chassis", false); + } + + if (add_chassis_resident_check) { + ds_put_format(match, " && is_chassis_resident(%s)", + op->od->l3redirect_port->json_key); + } + } + + build_lrouter_arp_flow(op->od, op, + op->lrp_networks.ipv4_addrs[i].addr_s, + REG_INPORT_ETH_ADDR, match, false, 90, + &op->nbrp->header_, lflows); + } + + /* A set to hold all load-balancer vips that need ARP responses. */ + struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4); + struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6); + get_router_load_balancer_ips(op->od, &all_ips_v4, &all_ips_v6); + + const char *ip_address; + SSET_FOR_EACH (ip_address, &all_ips_v4) { + ds_clear(match); + if (op == op->od->l3dgw_port) { + ds_put_format(match, "is_chassis_resident(%s)", + op->od->l3redirect_port->json_key); + } + + build_lrouter_arp_flow(op->od, op, + ip_address, REG_INPORT_ETH_ADDR, + match, false, 90, NULL, lflows); + } + + SSET_FOR_EACH (ip_address, &all_ips_v6) { + ds_clear(match); + if (op == op->od->l3dgw_port) { + ds_put_format(match, "is_chassis_resident(%s)", + op->od->l3redirect_port->json_key); + } + + build_lrouter_nd_flow(op->od, op, "nd_na", + ip_address, NULL, REG_INPORT_ETH_ADDR, + match, false, 90, NULL, lflows); + } + + sset_destroy(&all_ips_v4); + sset_destroy(&all_ips_v6); + + if (!smap_get(&op->od->nbr->options, "chassis") + && !op->od->l3dgw_port) { + /* UDP/TCP port unreachable. */ + for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { + ds_clear(match); + ds_put_format(match, + "ip4 && ip4.dst == %s && !ip.later_frag && udp", + op->lrp_networks.ipv4_addrs[i].addr_s); + const char *action = "icmp4 {" + "eth.dst <-> eth.src; " + "ip4.dst <-> ip4.src; " + "ip.ttl = 255; " + "icmp4.type = 3; " + "icmp4.code = 3; " + "next; };"; + ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, + 80, ds_cstr(match), action, + &op->nbrp->header_); + + ds_clear(match); + ds_put_format(match, + "ip4 && ip4.dst == %s && !ip.later_frag && tcp", + op->lrp_networks.ipv4_addrs[i].addr_s); + action = "tcp_reset {" + "eth.dst <-> eth.src; " + "ip4.dst <-> ip4.src; " + "next; };"; + ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, + 80, ds_cstr(match), action, + &op->nbrp->header_); + + ds_clear(match); + ds_put_format(match, + "ip4 && ip4.dst == %s && !ip.later_frag", + op->lrp_networks.ipv4_addrs[i].addr_s); + action = "icmp4 {" + "eth.dst <-> eth.src; " + "ip4.dst <-> ip4.src; " + "ip.ttl = 255; " + "icmp4.type = 3; " + "icmp4.code = 2; " + "next; };"; + ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_IP_INPUT, + 70, ds_cstr(match), action, + &op->nbrp->header_); + } + } + + /* Drop IP traffic destined to router owned IPs except if the IP is + * also a SNAT IP. Those are dropped later, in stage + * "lr_in_arp_resolve", if unSNAT was unsuccessful. + * + * Priority 60. + */ + build_lrouter_drop_own_dest(op, S_ROUTER_IN_IP_INPUT, 60, false, + lflows); + + /* ARP / ND handling for external IP addresses. + * + * DNAT and SNAT IP addresses are external IP addresses that need ARP + * handling. + * + * These are already taken care globally, per router. The only + * exception is on the l3dgw_port where we might need to use a + * different ETH address. + */ + if (op != op->od->l3dgw_port) { + return; + } + + for (size_t i = 0; i < op->od->nbr->n_nat; i++) { + struct ovn_nat *nat_entry = &op->od->nat_entries[i]; + + /* Skip entries we failed to parse. */ + if (!nat_entry_is_valid(nat_entry)) { + continue; + } + + /* Skip SNAT entries for now, we handle unique SNAT IPs separately + * below. + */ + if (!strcmp(nat_entry->nb->type, "snat")) { + continue; + } + build_lrouter_port_nat_arp_nd_flow(op, nat_entry, lflows); + } + + /* Now handle SNAT entries too, one per unique SNAT IP. */ + struct shash_node *snat_snode; + SHASH_FOR_EACH (snat_snode, &op->od->snat_ips) { + struct ovn_snat_ip *snat_ip = snat_snode->data; + + if (ovs_list_is_empty(&snat_ip->snat_entries)) { + continue; + } + + struct ovn_nat *nat_entry = + CONTAINER_OF(ovs_list_front(&snat_ip->snat_entries), + struct ovn_nat, ext_addr_list_node); + build_lrouter_port_nat_arp_nd_flow(op, nat_entry, lflows); + } +} + struct lswitch_flow_build_info { struct hmap *datapaths; @@ -11210,6 +11212,7 @@ build_lswitch_and_lrouter_iterate_by_op(struct ovn_port *op, build_dhcpv6_reply_flows_for_lrouter_port(op, lsi->lflows, &lsi->match); build_ipv6_input_flows_for_lrouter_port(op, lsi->lflows, &lsi->match, &lsi->actions); + build_lrouter_ip_input_v4(op, lsi->lflows, &lsi->match, &lsi->actions); } struct lflows_thread_pool { @@ -11417,7 +11420,7 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_lswitch_flows(datapaths, lflows); /* Legacy lrouter build - to be migrated. */ - build_lrouter_flows(datapaths, ports, lflows, meter_groups, lbs); + build_lrouter_flows(datapaths, lflows, meter_groups, lbs); } static ssize_t max_seen_lflow_size = 128; From patchwork Wed Nov 25 11:02:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Ivanov X-Patchwork-Id: 1406034 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=cambridgegreys.com Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CgzGc69vyz9sVf for ; Wed, 25 Nov 2020 22:30:48 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 0595987597; Wed, 25 Nov 2020 11:30:47 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id st9HXSNWWw+P; Wed, 25 Nov 2020 11:30:36 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id 9FC5187554; Wed, 25 Nov 2020 11:30:36 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 89587C163C; Wed, 25 Nov 2020 11:30:36 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 4115AC163C for ; Wed, 25 Nov 2020 11:30:34 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 2231986C92 for ; Wed, 25 Nov 2020 11:30:34 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Q7q-akbfRAth for ; Wed, 25 Nov 2020 11:30:31 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from www.kot-begemot.co.uk (ivanoab7.miniserver.com [37.128.132.42]) by fraxinus.osuosl.org (Postfix) with ESMTPS id 3B66B86C8E for ; Wed, 25 Nov 2020 11:30:31 +0000 (UTC) Received: from tun252.jain.kot-begemot.co.uk ([192.168.18.6] helo=jain.kot-begemot.co.uk) by www.kot-begemot.co.uk with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kht0G-0004aP-Kj; Wed, 25 Nov 2020 11:30:29 +0000 Received: from jain.kot-begemot.co.uk ([192.168.3.3]) by jain.kot-begemot.co.uk with esmtp (Exim 4.92) (envelope-from ) id 1khsZQ-00038T-Te; Wed, 25 Nov 2020 11:02:47 +0000 From: anton.ivanov@cambridgegreys.com To: ovs-dev@openvswitch.org Date: Wed, 25 Nov 2020 11:02:16 +0000 Message-Id: <20201125110216.7944-15-anton.ivanov@cambridgegreys.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> References: <20201125110216.7944-1-anton.ivanov@cambridgegreys.com> MIME-Version: 1.0 X-Clacks-Overhead: GNU Terry Pratchett Cc: Anton Ivanov Subject: [ovs-dev] [PATCH ovn v7 14/14] ovn-northd: move NAT, defrag, lb to a function X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Anton Ivanov Move NAT, defrag, lb to a function and eliminate the old build_lrouter_flows in favour of build by iterators. Signed-off-by: Anton Ivanov --- northd/ovn-northd.c | 1342 +++++++++++++++++++++---------------------- 1 file changed, 666 insertions(+), 676 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 0bcab7bd7..a0ac1374e 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -8740,679 +8740,6 @@ build_lrouter_force_snat_flows(struct hmap *lflows, struct ovn_datapath *od, ds_destroy(&actions); } -static void -build_lrouter_flows(struct hmap *datapaths, - struct hmap *lflows, struct shash *meter_groups, - struct hmap *lbs) -{ - /* This flow table structure is documented in ovn-northd(8), so please - * update ovn-northd.8.xml if you change anything. */ - - struct ds match = DS_EMPTY_INITIALIZER; - struct ds actions = DS_EMPTY_INITIALIZER; - - struct ovn_datapath *od; - - /* NAT, Defrag and load balancing. */ - HMAP_FOR_EACH (od, key_node, datapaths) { - if (!od->nbr) { - continue; - } - - /* Packets are allowed by default. */ - ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 0, "1", "next;"); - ovn_lflow_add(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 0, "1", "next;"); - - /* Send the IPv6 NS packets to next table. When ovn-controller - * generates IPv6 NS (for the action - nd_ns{}), the injected - * packet would go through conntrack - which is not required. */ - ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 120, "nd_ns", "next;"); - - /* NAT rules are only valid on Gateway routers and routers with - * l3dgw_port (router has a port with gateway chassis - * specified). */ - if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) { - continue; - } - - struct sset nat_entries = SSET_INITIALIZER(&nat_entries); - - bool dnat_force_snat_ip = - !lport_addresses_is_empty(&od->dnat_force_snat_addrs); - bool lb_force_snat_ip = - !lport_addresses_is_empty(&od->lb_force_snat_addrs); - - for (int i = 0; i < od->nbr->n_nat; i++) { - const struct nbrec_nat *nat; - - nat = od->nbr->nat[i]; - - ovs_be32 ip, mask; - struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT; - bool is_v6 = false; - bool stateless = lrouter_nat_is_stateless(nat); - struct nbrec_address_set *allowed_ext_ips = - nat->allowed_ext_ips; - struct nbrec_address_set *exempted_ext_ips = - nat->exempted_ext_ips; - - if (allowed_ext_ips && exempted_ext_ips) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); - VLOG_WARN_RL(&rl, "NAT rule: "UUID_FMT" not applied, since " - "both allowed and exempt external ips set", - UUID_ARGS(&(nat->header_.uuid))); - continue; - } - - char *error = ip_parse_masked(nat->external_ip, &ip, &mask); - if (error || mask != OVS_BE32_MAX) { - free(error); - error = ipv6_parse_masked(nat->external_ip, &ipv6, &mask_v6); - if (error || memcmp(&mask_v6, &v6_exact, sizeof(mask_v6))) { - /* Invalid for both IPv4 and IPv6 */ - static struct vlog_rate_limit rl = - VLOG_RATE_LIMIT_INIT(5, 1); - VLOG_WARN_RL(&rl, "bad external ip %s for nat", - nat->external_ip); - free(error); - continue; - } - /* It was an invalid IPv4 address, but valid IPv6. - * Treat the rest of the handling of this NAT rule - * as IPv6. */ - is_v6 = true; - } - - /* Check the validity of nat->logical_ip. 'logical_ip' can - * be a subnet when the type is "snat". */ - int cidr_bits; - if (is_v6) { - error = ipv6_parse_masked(nat->logical_ip, &ipv6, &mask_v6); - cidr_bits = ipv6_count_cidr_bits(&mask_v6); - } else { - error = ip_parse_masked(nat->logical_ip, &ip, &mask); - cidr_bits = ip_count_cidr_bits(mask); - } - if (!strcmp(nat->type, "snat")) { - if (error) { - /* Invalid for both IPv4 and IPv6 */ - static struct vlog_rate_limit rl = - VLOG_RATE_LIMIT_INIT(5, 1); - VLOG_WARN_RL(&rl, "bad ip network or ip %s for snat " - "in router "UUID_FMT"", - nat->logical_ip, UUID_ARGS(&od->key)); - free(error); - continue; - } - } else { - if (error || (!is_v6 && mask != OVS_BE32_MAX) - || (is_v6 && memcmp(&mask_v6, &v6_exact, - sizeof mask_v6))) { - /* Invalid for both IPv4 and IPv6 */ - static struct vlog_rate_limit rl = - VLOG_RATE_LIMIT_INIT(5, 1); - VLOG_WARN_RL(&rl, "bad ip %s for dnat in router " - ""UUID_FMT"", nat->logical_ip, UUID_ARGS(&od->key)); - free(error); - continue; - } - } - - /* For distributed router NAT, determine whether this NAT rule - * satisfies the conditions for distributed NAT processing. */ - bool distributed = false; - struct eth_addr mac; - if (od->l3dgw_port && !strcmp(nat->type, "dnat_and_snat") && - nat->logical_port && nat->external_mac) { - if (eth_addr_from_string(nat->external_mac, &mac)) { - distributed = true; - } else { - static struct vlog_rate_limit rl = - VLOG_RATE_LIMIT_INIT(5, 1); - VLOG_WARN_RL(&rl, "bad mac %s for dnat in router " - ""UUID_FMT"", nat->external_mac, UUID_ARGS(&od->key)); - continue; - } - } - - /* Ingress UNSNAT table: It is for already established connections' - * reverse traffic. i.e., SNAT has already been done in egress - * pipeline and now the packet has entered the ingress pipeline as - * part of a reply. We undo the SNAT here. - * - * Undoing SNAT has to happen before DNAT processing. This is - * because when the packet was DNATed in ingress pipeline, it did - * not know about the possibility of eventual additional SNAT in - * egress pipeline. */ - if (!strcmp(nat->type, "snat") - || !strcmp(nat->type, "dnat_and_snat")) { - if (!od->l3dgw_port) { - /* Gateway router. */ - ds_clear(&match); - ds_clear(&actions); - ds_put_format(&match, "ip && ip%s.dst == %s", - is_v6 ? "6" : "4", - nat->external_ip); - if (!strcmp(nat->type, "dnat_and_snat") && stateless) { - ds_put_format(&actions, "ip%s.dst=%s; next;", - is_v6 ? "6" : "4", nat->logical_ip); - } else { - ds_put_cstr(&actions, "ct_snat;"); - } - - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, - 90, ds_cstr(&match), - ds_cstr(&actions), - &nat->header_); - } else { - /* Distributed router. */ - - /* Traffic received on l3dgw_port is subject to NAT. */ - ds_clear(&match); - ds_clear(&actions); - ds_put_format(&match, "ip && ip%s.dst == %s" - " && inport == %s", - is_v6 ? "6" : "4", - nat->external_ip, - od->l3dgw_port->json_key); - if (!distributed && od->l3redirect_port) { - /* Flows for NAT rules that are centralized are only - * programmed on the gateway chassis. */ - ds_put_format(&match, " && is_chassis_resident(%s)", - od->l3redirect_port->json_key); - } - - if (!strcmp(nat->type, "dnat_and_snat") && stateless) { - ds_put_format(&actions, "ip%s.dst=%s; next;", - is_v6 ? "6" : "4", nat->logical_ip); - } else { - ds_put_cstr(&actions, "ct_snat;"); - } - - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, - 100, - ds_cstr(&match), ds_cstr(&actions), - &nat->header_); - } - } - - /* Ingress DNAT table: Packets enter the pipeline with destination - * IP address that needs to be DNATted from a external IP address - * to a logical IP address. */ - if (!strcmp(nat->type, "dnat") - || !strcmp(nat->type, "dnat_and_snat")) { - if (!od->l3dgw_port) { - /* Gateway router. */ - /* Packet when it goes from the initiator to destination. - * We need to set flags.loopback because the router can - * send the packet back through the same interface. */ - ds_clear(&match); - ds_put_format(&match, "ip && ip%s.dst == %s", - is_v6 ? "6" : "4", - nat->external_ip); - ds_clear(&actions); - if (allowed_ext_ips || exempted_ext_ips) { - lrouter_nat_add_ext_ip_match(od, lflows, &match, nat, - is_v6, true, mask); - } - - if (dnat_force_snat_ip) { - /* Indicate to the future tables that a DNAT has taken - * place and a force SNAT needs to be done in the - * Egress SNAT table. */ - ds_put_format(&actions, - "flags.force_snat_for_dnat = 1; "); - } - - if (!strcmp(nat->type, "dnat_and_snat") && stateless) { - ds_put_format(&actions, "flags.loopback = 1; " - "ip%s.dst=%s; next;", - is_v6 ? "6" : "4", nat->logical_ip); - } else { - ds_put_format(&actions, "flags.loopback = 1; " - "ct_dnat(%s", nat->logical_ip); - - if (nat->external_port_range[0]) { - ds_put_format(&actions, ",%s", - nat->external_port_range); - } - ds_put_format(&actions, ");"); - } - - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DNAT, 100, - ds_cstr(&match), ds_cstr(&actions), - &nat->header_); - } else { - /* Distributed router. */ - - /* Traffic received on l3dgw_port is subject to NAT. */ - ds_clear(&match); - ds_put_format(&match, "ip && ip%s.dst == %s" - " && inport == %s", - is_v6 ? "6" : "4", - nat->external_ip, - od->l3dgw_port->json_key); - if (!distributed && od->l3redirect_port) { - /* Flows for NAT rules that are centralized are only - * programmed on the gateway chassis. */ - ds_put_format(&match, " && is_chassis_resident(%s)", - od->l3redirect_port->json_key); - } - ds_clear(&actions); - if (allowed_ext_ips || exempted_ext_ips) { - lrouter_nat_add_ext_ip_match(od, lflows, &match, nat, - is_v6, true, mask); - } - - if (!strcmp(nat->type, "dnat_and_snat") && stateless) { - ds_put_format(&actions, "ip%s.dst=%s; next;", - is_v6 ? "6" : "4", nat->logical_ip); - } else { - ds_put_format(&actions, "ct_dnat(%s", nat->logical_ip); - if (nat->external_port_range[0]) { - ds_put_format(&actions, ",%s", - nat->external_port_range); - } - ds_put_format(&actions, ");"); - } - - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DNAT, 100, - ds_cstr(&match), ds_cstr(&actions), - &nat->header_); - } - } - - /* ARP resolve for NAT IPs. */ - if (od->l3dgw_port) { - if (!strcmp(nat->type, "snat")) { - ds_clear(&match); - ds_put_format( - &match, "inport == %s && %s == %s", - od->l3dgw_port->json_key, - is_v6 ? "ip6.src" : "ip4.src", nat->external_ip); - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_INPUT, - 120, ds_cstr(&match), "next;", - &nat->header_); - } - - if (!sset_contains(&nat_entries, nat->external_ip)) { - ds_clear(&match); - ds_put_format( - &match, "outport == %s && %s == %s", - od->l3dgw_port->json_key, - is_v6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4, - nat->external_ip); - ds_clear(&actions); - ds_put_format( - &actions, "eth.dst = %s; next;", - distributed ? nat->external_mac : - od->l3dgw_port->lrp_networks.ea_s); - ovn_lflow_add_with_hint(lflows, od, - S_ROUTER_IN_ARP_RESOLVE, - 100, ds_cstr(&match), - ds_cstr(&actions), - &nat->header_); - sset_add(&nat_entries, nat->external_ip); - } - } else { - /* Add the NAT external_ip to the nat_entries even for - * gateway routers. This is required for adding load balancer - * flows.*/ - sset_add(&nat_entries, nat->external_ip); - } - - /* Egress UNDNAT table: It is for already established connections' - * reverse traffic. i.e., DNAT has already been done in ingress - * pipeline and now the packet has entered the egress pipeline as - * part of a reply. We undo the DNAT here. - * - * Note that this only applies for NAT on a distributed router. - * Undo DNAT on a gateway router is done in the ingress DNAT - * pipeline stage. */ - if (od->l3dgw_port && (!strcmp(nat->type, "dnat") - || !strcmp(nat->type, "dnat_and_snat"))) { - ds_clear(&match); - ds_put_format(&match, "ip && ip%s.src == %s" - " && outport == %s", - is_v6 ? "6" : "4", - nat->logical_ip, - od->l3dgw_port->json_key); - if (!distributed && od->l3redirect_port) { - /* Flows for NAT rules that are centralized are only - * programmed on the gateway chassis. */ - ds_put_format(&match, " && is_chassis_resident(%s)", - od->l3redirect_port->json_key); - } - ds_clear(&actions); - if (distributed) { - ds_put_format(&actions, "eth.src = "ETH_ADDR_FMT"; ", - ETH_ADDR_ARGS(mac)); - } - - if (!strcmp(nat->type, "dnat_and_snat") && stateless) { - ds_put_format(&actions, "ip%s.src=%s; next;", - is_v6 ? "6" : "4", nat->external_ip); - } else { - ds_put_format(&actions, "ct_dnat;"); - } - - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_UNDNAT, 100, - ds_cstr(&match), ds_cstr(&actions), - &nat->header_); - } - - /* Egress SNAT table: Packets enter the egress pipeline with - * source ip address that needs to be SNATted to a external ip - * address. */ - if (!strcmp(nat->type, "snat") - || !strcmp(nat->type, "dnat_and_snat")) { - if (!od->l3dgw_port) { - /* Gateway router. */ - ds_clear(&match); - ds_put_format(&match, "ip && ip%s.src == %s", - is_v6 ? "6" : "4", - nat->logical_ip); - ds_clear(&actions); - - if (allowed_ext_ips || exempted_ext_ips) { - lrouter_nat_add_ext_ip_match(od, lflows, &match, nat, - is_v6, false, mask); - } - - if (!strcmp(nat->type, "dnat_and_snat") && stateless) { - ds_put_format(&actions, "ip%s.src=%s; next;", - is_v6 ? "6" : "4", nat->external_ip); - } else { - ds_put_format(&actions, "ct_snat(%s", - nat->external_ip); - - if (nat->external_port_range[0]) { - ds_put_format(&actions, ",%s", - nat->external_port_range); - } - ds_put_format(&actions, ");"); - } - - /* The priority here is calculated such that the - * nat->logical_ip with the longest mask gets a higher - * priority. */ - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT, - cidr_bits + 1, - ds_cstr(&match), ds_cstr(&actions), - &nat->header_); - } else { - uint16_t priority = cidr_bits + 1; - - /* Distributed router. */ - ds_clear(&match); - ds_put_format(&match, "ip && ip%s.src == %s" - " && outport == %s", - is_v6 ? "6" : "4", - nat->logical_ip, - od->l3dgw_port->json_key); - if (!distributed && od->l3redirect_port) { - /* Flows for NAT rules that are centralized are only - * programmed on the gateway chassis. */ - priority += 128; - ds_put_format(&match, " && is_chassis_resident(%s)", - od->l3redirect_port->json_key); - } - ds_clear(&actions); - - if (allowed_ext_ips || exempted_ext_ips) { - lrouter_nat_add_ext_ip_match(od, lflows, &match, nat, - is_v6, false, mask); - } - - if (distributed) { - ds_put_format(&actions, "eth.src = "ETH_ADDR_FMT"; ", - ETH_ADDR_ARGS(mac)); - } - - if (!strcmp(nat->type, "dnat_and_snat") && stateless) { - ds_put_format(&actions, "ip%s.src=%s; next;", - is_v6 ? "6" : "4", nat->external_ip); - } else { - ds_put_format(&actions, "ct_snat(%s", - nat->external_ip); - if (nat->external_port_range[0]) { - ds_put_format(&actions, ",%s", - nat->external_port_range); - } - ds_put_format(&actions, ");"); - } - - /* The priority here is calculated such that the - * nat->logical_ip with the longest mask gets a higher - * priority. */ - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT, - priority, ds_cstr(&match), - ds_cstr(&actions), - &nat->header_); - } - } - - /* Logical router ingress table 0: - * For NAT on a distributed router, add rules allowing - * ingress traffic with eth.dst matching nat->external_mac - * on the l3dgw_port instance where nat->logical_port is - * resident. */ - if (distributed) { - /* Store the ethernet address of the port receiving the packet. - * This will save us from having to match on inport further - * down in the pipeline. - */ - ds_clear(&actions); - ds_put_format(&actions, REG_INPORT_ETH_ADDR " = %s; next;", - od->l3dgw_port->lrp_networks.ea_s); - - ds_clear(&match); - ds_put_format(&match, - "eth.dst == "ETH_ADDR_FMT" && inport == %s" - " && is_chassis_resident(\"%s\")", - ETH_ADDR_ARGS(mac), - od->l3dgw_port->json_key, - nat->logical_port); - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ADMISSION, 50, - ds_cstr(&match), ds_cstr(&actions), - &nat->header_); - } - - /* Ingress Gateway Redirect Table: For NAT on a distributed - * router, add flows that are specific to a NAT rule. These - * flows indicate the presence of an applicable NAT rule that - * can be applied in a distributed manner. - * In particulr REG_SRC_IPV4/REG_SRC_IPV6 and eth.src are set to - * NAT external IP and NAT external mac so the ARP request - * generated in the following stage is sent out with proper IP/MAC - * src addresses. - */ - if (distributed) { - ds_clear(&match); - ds_clear(&actions); - ds_put_format(&match, - "ip%s.src == %s && outport == %s && " - "is_chassis_resident(\"%s\")", - is_v6 ? "6" : "4", nat->logical_ip, - od->l3dgw_port->json_key, nat->logical_port); - ds_put_format(&actions, "eth.src = %s; %s = %s; next;", - nat->external_mac, - is_v6 ? REG_SRC_IPV6 : REG_SRC_IPV4, - nat->external_ip); - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_GW_REDIRECT, - 100, ds_cstr(&match), - ds_cstr(&actions), &nat->header_); - } - - /* Egress Loopback table: For NAT on a distributed router. - * If packets in the egress pipeline on the distributed - * gateway port have ip.dst matching a NAT external IP, then - * loop a clone of the packet back to the beginning of the - * ingress pipeline with inport = outport. */ - if (od->l3dgw_port) { - /* Distributed router. */ - ds_clear(&match); - ds_put_format(&match, "ip%s.dst == %s && outport == %s", - is_v6 ? "6" : "4", - nat->external_ip, - od->l3dgw_port->json_key); - if (!distributed) { - ds_put_format(&match, " && is_chassis_resident(%s)", - od->l3redirect_port->json_key); - } else { - ds_put_format(&match, " && is_chassis_resident(\"%s\")", - nat->logical_port); - } - - ds_clear(&actions); - ds_put_format(&actions, - "clone { ct_clear; " - "inport = outport; outport = \"\"; " - "flags = 0; flags.loopback = 1; "); - for (int j = 0; j < MFF_N_LOG_REGS; j++) { - ds_put_format(&actions, "reg%d = 0; ", j); - } - ds_put_format(&actions, REGBIT_EGRESS_LOOPBACK" = 1; " - "next(pipeline=ingress, table=%d); };", - ovn_stage_get_table(S_ROUTER_IN_ADMISSION)); - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_EGR_LOOP, 100, - ds_cstr(&match), ds_cstr(&actions), - &nat->header_); - } - } - - /* Handle force SNAT options set in the gateway router. */ - if (!od->l3dgw_port) { - if (dnat_force_snat_ip) { - if (od->dnat_force_snat_addrs.n_ipv4_addrs) { - build_lrouter_force_snat_flows(lflows, od, "4", - od->dnat_force_snat_addrs.ipv4_addrs[0].addr_s, - "dnat"); - } - if (od->dnat_force_snat_addrs.n_ipv6_addrs) { - build_lrouter_force_snat_flows(lflows, od, "6", - od->dnat_force_snat_addrs.ipv6_addrs[0].addr_s, - "dnat"); - } - } - if (lb_force_snat_ip) { - if (od->lb_force_snat_addrs.n_ipv4_addrs) { - build_lrouter_force_snat_flows(lflows, od, "4", - od->lb_force_snat_addrs.ipv4_addrs[0].addr_s, "lb"); - } - if (od->lb_force_snat_addrs.n_ipv6_addrs) { - build_lrouter_force_snat_flows(lflows, od, "6", - od->lb_force_snat_addrs.ipv6_addrs[0].addr_s, "lb"); - } - } - - /* For gateway router, re-circulate every packet through - * the DNAT zone. This helps with the following. - * - * Any packet that needs to be unDNATed in the reverse - * direction gets unDNATed. Ideally this could be done in - * the egress pipeline. But since the gateway router - * does not have any feature that depends on the source - * ip address being external IP address for IP routing, - * we can do it here, saving a future re-circulation. */ - ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50, - "ip", "flags.loopback = 1; ct_dnat;"); - } - - /* Load balancing and packet defrag are only valid on - * Gateway routers or router with gateway port. */ - if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) { - sset_destroy(&nat_entries); - continue; - } - - /* A set to hold all ips that need defragmentation and tracking. */ - struct sset all_ips = SSET_INITIALIZER(&all_ips); - - for (int i = 0; i < od->nbr->n_load_balancer; i++) { - struct nbrec_load_balancer *nb_lb = od->nbr->load_balancer[i]; - struct ovn_northd_lb *lb = - ovn_northd_lb_find(lbs, &nb_lb->header_.uuid); - ovs_assert(lb); - - for (size_t j = 0; j < lb->n_vips; j++) { - struct ovn_lb_vip *lb_vip = &lb->vips[j]; - struct ovn_northd_lb_vip *lb_vip_nb = &lb->vips_nb[j]; - ds_clear(&actions); - build_lb_vip_ct_lb_actions(lb_vip, lb_vip_nb, &actions, - lb->selection_fields); - - if (!sset_contains(&all_ips, lb_vip->vip_str)) { - sset_add(&all_ips, lb_vip->vip_str); - /* If there are any load balancing rules, we should send - * the packet to conntrack for defragmentation and - * tracking. This helps with two things. - * - * 1. With tracking, we can send only new connections to - * pick a DNAT ip address from a group. - * 2. If there are L4 ports in load balancing rules, we - * need the defragmentation to match on L4 ports. */ - ds_clear(&match); - if (IN6_IS_ADDR_V4MAPPED(&lb_vip->vip)) { - ds_put_format(&match, "ip && ip4.dst == %s", - lb_vip->vip_str); - } else { - ds_put_format(&match, "ip && ip6.dst == %s", - lb_vip->vip_str); - } - ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, - 100, ds_cstr(&match), "ct_next;", - &nb_lb->header_); - } - - /* Higher priority rules are added for load-balancing in DNAT - * table. For every match (on a VIP[:port]), we add two flows - * via add_router_lb_flow(). One flow is for specific matching - * on ct.new with an action of "ct_lb($targets);". The other - * flow is for ct.est with an action of "ct_dnat;". */ - ds_clear(&match); - if (IN6_IS_ADDR_V4MAPPED(&lb_vip->vip)) { - ds_put_format(&match, "ip && ip4.dst == %s", - lb_vip->vip_str); - } else { - ds_put_format(&match, "ip && ip6.dst == %s", - lb_vip->vip_str); - } - - int prio = 110; - bool is_udp = nullable_string_is_equal(nb_lb->protocol, "udp"); - bool is_sctp = nullable_string_is_equal(nb_lb->protocol, - "sctp"); - const char *proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp"; - - if (lb_vip->vip_port) { - ds_put_format(&match, " && %s && %s.dst == %d", proto, - proto, lb_vip->vip_port); - prio = 120; - } - - if (od->l3redirect_port) { - ds_put_format(&match, " && is_chassis_resident(%s)", - od->l3redirect_port->json_key); - } - add_router_lb_flow(lflows, od, &match, &actions, prio, - lb_force_snat_ip, lb_vip, proto, - nb_lb, meter_groups, &nat_entries); - } - } - sset_destroy(&all_ips); - sset_destroy(&nat_entries); - } - - ds_destroy(&match); - ds_destroy(&actions); -} - /* Logical router ingress Table 0: L2 Admission Control * Generic admission control flows (without inport check). */ @@ -11114,6 +10441,669 @@ build_lrouter_ip_input_v4(struct ovn_port *op, struct hmap *lflows, } } +/* NAT, Defrag and load balancing. */ +static void +build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, + struct hmap *lflows, + struct shash *meter_groups, + struct hmap *lbs, + struct ds *match, + struct ds *actions) +{ + if (!od->nbr) { + return; + } + + /* Packets are allowed by default. */ + ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 0, "1", "next;"); + ovn_lflow_add(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 0, "1", "next;"); + + /* Send the IPv6 NS packets to next table. When ovn-controller + * generates IPv6 NS (for the action - nd_ns{}), the injected + * packet would go through conntrack - which is not required. */ + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 120, "nd_ns", "next;"); + + /* NAT rules are only valid on Gateway routers and routers with + * l3dgw_port (router has a port with gateway chassis + * specified). */ + if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) { + return; + } + + struct sset nat_entries = SSET_INITIALIZER(&nat_entries); + + bool dnat_force_snat_ip = + !lport_addresses_is_empty(&od->dnat_force_snat_addrs); + bool lb_force_snat_ip = + !lport_addresses_is_empty(&od->lb_force_snat_addrs); + + for (int i = 0; i < od->nbr->n_nat; i++) { + const struct nbrec_nat *nat; + + nat = od->nbr->nat[i]; + + ovs_be32 ip, mask; + struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT; + bool is_v6 = false; + bool stateless = lrouter_nat_is_stateless(nat); + struct nbrec_address_set *allowed_ext_ips = + nat->allowed_ext_ips; + struct nbrec_address_set *exempted_ext_ips = + nat->exempted_ext_ips; + + if (allowed_ext_ips && exempted_ext_ips) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); + VLOG_WARN_RL(&rl, "NAT rule: "UUID_FMT" not applied, since " + "both allowed and exempt external ips set", + UUID_ARGS(&(nat->header_.uuid))); + continue; + } + + char *error = ip_parse_masked(nat->external_ip, &ip, &mask); + if (error || mask != OVS_BE32_MAX) { + free(error); + error = ipv6_parse_masked(nat->external_ip, &ipv6, &mask_v6); + if (error || memcmp(&mask_v6, &v6_exact, sizeof(mask_v6))) { + /* Invalid for both IPv4 and IPv6 */ + static struct vlog_rate_limit rl = + VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad external ip %s for nat", + nat->external_ip); + free(error); + continue; + } + /* It was an invalid IPv4 address, but valid IPv6. + * Treat the rest of the handling of this NAT rule + * as IPv6. */ + is_v6 = true; + } + + /* Check the validity of nat->logical_ip. 'logical_ip' can + * be a subnet when the type is "snat". */ + int cidr_bits; + if (is_v6) { + error = ipv6_parse_masked(nat->logical_ip, &ipv6, &mask_v6); + cidr_bits = ipv6_count_cidr_bits(&mask_v6); + } else { + error = ip_parse_masked(nat->logical_ip, &ip, &mask); + cidr_bits = ip_count_cidr_bits(mask); + } + if (!strcmp(nat->type, "snat")) { + if (error) { + /* Invalid for both IPv4 and IPv6 */ + static struct vlog_rate_limit rl = + VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad ip network or ip %s for snat " + "in router "UUID_FMT"", + nat->logical_ip, UUID_ARGS(&od->key)); + free(error); + continue; + } + } else { + if (error || (!is_v6 && mask != OVS_BE32_MAX) + || (is_v6 && memcmp(&mask_v6, &v6_exact, + sizeof mask_v6))) { + /* Invalid for both IPv4 and IPv6 */ + static struct vlog_rate_limit rl = + VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad ip %s for dnat in router " + ""UUID_FMT"", nat->logical_ip, UUID_ARGS(&od->key)); + free(error); + continue; + } + } + + /* For distributed router NAT, determine whether this NAT rule + * satisfies the conditions for distributed NAT processing. */ + bool distributed = false; + struct eth_addr mac; + if (od->l3dgw_port && !strcmp(nat->type, "dnat_and_snat") && + nat->logical_port && nat->external_mac) { + if (eth_addr_from_string(nat->external_mac, &mac)) { + distributed = true; + } else { + static struct vlog_rate_limit rl = + VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad mac %s for dnat in router " + ""UUID_FMT"", nat->external_mac, UUID_ARGS(&od->key)); + continue; + } + } + + /* Ingress UNSNAT table: It is for already established connections' + * reverse traffic. i.e., SNAT has already been done in egress + * pipeline and now the packet has entered the ingress pipeline as + * part of a reply. We undo the SNAT here. + * + * Undoing SNAT has to happen before DNAT processing. This is + * because when the packet was DNATed in ingress pipeline, it did + * not know about the possibility of eventual additional SNAT in + * egress pipeline. */ + if (!strcmp(nat->type, "snat") + || !strcmp(nat->type, "dnat_and_snat")) { + if (!od->l3dgw_port) { + /* Gateway router. */ + ds_clear(match); + ds_clear(actions); + ds_put_format(match, "ip && ip%s.dst == %s", + is_v6 ? "6" : "4", + nat->external_ip); + if (!strcmp(nat->type, "dnat_and_snat") && stateless) { + ds_put_format(actions, "ip%s.dst=%s; next;", + is_v6 ? "6" : "4", nat->logical_ip); + } else { + ds_put_cstr(actions, "ct_snat;"); + } + + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, + 90, ds_cstr(match), + ds_cstr(actions), + &nat->header_); + } else { + /* Distributed router. */ + + /* Traffic received on l3dgw_port is subject to NAT. */ + ds_clear(match); + ds_clear(actions); + ds_put_format(match, "ip && ip%s.dst == %s" + " && inport == %s", + is_v6 ? "6" : "4", + nat->external_ip, + od->l3dgw_port->json_key); + if (!distributed && od->l3redirect_port) { + /* Flows for NAT rules that are centralized are only + * programmed on the gateway chassis. */ + ds_put_format(match, " && is_chassis_resident(%s)", + od->l3redirect_port->json_key); + } + + if (!strcmp(nat->type, "dnat_and_snat") && stateless) { + ds_put_format(actions, "ip%s.dst=%s; next;", + is_v6 ? "6" : "4", nat->logical_ip); + } else { + ds_put_cstr(actions, "ct_snat;"); + } + + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, + 100, + ds_cstr(match), ds_cstr(actions), + &nat->header_); + } + } + + /* Ingress DNAT table: Packets enter the pipeline with destination + * IP address that needs to be DNATted from a external IP address + * to a logical IP address. */ + if (!strcmp(nat->type, "dnat") + || !strcmp(nat->type, "dnat_and_snat")) { + if (!od->l3dgw_port) { + /* Gateway router. */ + /* Packet when it goes from the initiator to destination. + * We need to set flags.loopback because the router can + * send the packet back through the same interface. */ + ds_clear(match); + ds_put_format(match, "ip && ip%s.dst == %s", + is_v6 ? "6" : "4", + nat->external_ip); + ds_clear(actions); + if (allowed_ext_ips || exempted_ext_ips) { + lrouter_nat_add_ext_ip_match(od, lflows, match, nat, + is_v6, true, mask); + } + + if (dnat_force_snat_ip) { + /* Indicate to the future tables that a DNAT has taken + * place and a force SNAT needs to be done in the + * Egress SNAT table. */ + ds_put_format(actions, + "flags.force_snat_for_dnat = 1; "); + } + + if (!strcmp(nat->type, "dnat_and_snat") && stateless) { + ds_put_format(actions, "flags.loopback = 1; " + "ip%s.dst=%s; next;", + is_v6 ? "6" : "4", nat->logical_ip); + } else { + ds_put_format(actions, "flags.loopback = 1; " + "ct_dnat(%s", nat->logical_ip); + + if (nat->external_port_range[0]) { + ds_put_format(actions, ",%s", + nat->external_port_range); + } + ds_put_format(actions, ");"); + } + + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DNAT, 100, + ds_cstr(match), ds_cstr(actions), + &nat->header_); + } else { + /* Distributed router. */ + + /* Traffic received on l3dgw_port is subject to NAT. */ + ds_clear(match); + ds_put_format(match, "ip && ip%s.dst == %s" + " && inport == %s", + is_v6 ? "6" : "4", + nat->external_ip, + od->l3dgw_port->json_key); + if (!distributed && od->l3redirect_port) { + /* Flows for NAT rules that are centralized are only + * programmed on the gateway chassis. */ + ds_put_format(match, " && is_chassis_resident(%s)", + od->l3redirect_port->json_key); + } + ds_clear(actions); + if (allowed_ext_ips || exempted_ext_ips) { + lrouter_nat_add_ext_ip_match(od, lflows, match, nat, + is_v6, true, mask); + } + + if (!strcmp(nat->type, "dnat_and_snat") && stateless) { + ds_put_format(actions, "ip%s.dst=%s; next;", + is_v6 ? "6" : "4", nat->logical_ip); + } else { + ds_put_format(actions, "ct_dnat(%s", nat->logical_ip); + if (nat->external_port_range[0]) { + ds_put_format(actions, ",%s", + nat->external_port_range); + } + ds_put_format(actions, ");"); + } + + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DNAT, 100, + ds_cstr(match), ds_cstr(actions), + &nat->header_); + } + } + + /* ARP resolve for NAT IPs. */ + if (od->l3dgw_port) { + if (!strcmp(nat->type, "snat")) { + ds_clear(match); + ds_put_format( + match, "inport == %s && %s == %s", + od->l3dgw_port->json_key, + is_v6 ? "ip6.src" : "ip4.src", nat->external_ip); + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_INPUT, + 120, ds_cstr(match), "next;", + &nat->header_); + } + + if (!sset_contains(&nat_entries, nat->external_ip)) { + ds_clear(match); + ds_put_format( + match, "outport == %s && %s == %s", + od->l3dgw_port->json_key, + is_v6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4, + nat->external_ip); + ds_clear(actions); + ds_put_format( + actions, "eth.dst = %s; next;", + distributed ? nat->external_mac : + od->l3dgw_port->lrp_networks.ea_s); + ovn_lflow_add_with_hint(lflows, od, + S_ROUTER_IN_ARP_RESOLVE, + 100, ds_cstr(match), + ds_cstr(actions), + &nat->header_); + sset_add(&nat_entries, nat->external_ip); + } + } else { + /* Add the NAT external_ip to the nat_entries even for + * gateway routers. This is required for adding load balancer + * flows.*/ + sset_add(&nat_entries, nat->external_ip); + } + + /* Egress UNDNAT table: It is for already established connections' + * reverse traffic. i.e., DNAT has already been done in ingress + * pipeline and now the packet has entered the egress pipeline as + * part of a reply. We undo the DNAT here. + * + * Note that this only applies for NAT on a distributed router. + * Undo DNAT on a gateway router is done in the ingress DNAT + * pipeline stage. */ + if (od->l3dgw_port && (!strcmp(nat->type, "dnat") + || !strcmp(nat->type, "dnat_and_snat"))) { + ds_clear(match); + ds_put_format(match, "ip && ip%s.src == %s" + " && outport == %s", + is_v6 ? "6" : "4", + nat->logical_ip, + od->l3dgw_port->json_key); + if (!distributed && od->l3redirect_port) { + /* Flows for NAT rules that are centralized are only + * programmed on the gateway chassis. */ + ds_put_format(match, " && is_chassis_resident(%s)", + od->l3redirect_port->json_key); + } + ds_clear(actions); + if (distributed) { + ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ", + ETH_ADDR_ARGS(mac)); + } + + if (!strcmp(nat->type, "dnat_and_snat") && stateless) { + ds_put_format(actions, "ip%s.src=%s; next;", + is_v6 ? "6" : "4", nat->external_ip); + } else { + ds_put_format(actions, "ct_dnat;"); + } + + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_UNDNAT, 100, + ds_cstr(match), ds_cstr(actions), + &nat->header_); + } + + /* Egress SNAT table: Packets enter the egress pipeline with + * source ip address that needs to be SNATted to a external ip + * address. */ + if (!strcmp(nat->type, "snat") + || !strcmp(nat->type, "dnat_and_snat")) { + if (!od->l3dgw_port) { + /* Gateway router. */ + ds_clear(match); + ds_put_format(match, "ip && ip%s.src == %s", + is_v6 ? "6" : "4", + nat->logical_ip); + ds_clear(actions); + + if (allowed_ext_ips || exempted_ext_ips) { + lrouter_nat_add_ext_ip_match(od, lflows, match, nat, + is_v6, false, mask); + } + + if (!strcmp(nat->type, "dnat_and_snat") && stateless) { + ds_put_format(actions, "ip%s.src=%s; next;", + is_v6 ? "6" : "4", nat->external_ip); + } else { + ds_put_format(actions, "ct_snat(%s", + nat->external_ip); + + if (nat->external_port_range[0]) { + ds_put_format(actions, ",%s", + nat->external_port_range); + } + ds_put_format(actions, ");"); + } + + /* The priority here is calculated such that the + * nat->logical_ip with the longest mask gets a higher + * priority. */ + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT, + cidr_bits + 1, + ds_cstr(match), ds_cstr(actions), + &nat->header_); + } else { + uint16_t priority = cidr_bits + 1; + + /* Distributed router. */ + ds_clear(match); + ds_put_format(match, "ip && ip%s.src == %s" + " && outport == %s", + is_v6 ? "6" : "4", + nat->logical_ip, + od->l3dgw_port->json_key); + if (!distributed && od->l3redirect_port) { + /* Flows for NAT rules that are centralized are only + * programmed on the gateway chassis. */ + priority += 128; + ds_put_format(match, " && is_chassis_resident(%s)", + od->l3redirect_port->json_key); + } + ds_clear(actions); + + if (allowed_ext_ips || exempted_ext_ips) { + lrouter_nat_add_ext_ip_match(od, lflows, match, nat, + is_v6, false, mask); + } + + if (distributed) { + ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ", + ETH_ADDR_ARGS(mac)); + } + + if (!strcmp(nat->type, "dnat_and_snat") && stateless) { + ds_put_format(actions, "ip%s.src=%s; next;", + is_v6 ? "6" : "4", nat->external_ip); + } else { + ds_put_format(actions, "ct_snat(%s", + nat->external_ip); + if (nat->external_port_range[0]) { + ds_put_format(actions, ",%s", + nat->external_port_range); + } + ds_put_format(actions, ");"); + } + + /* The priority here is calculated such that the + * nat->logical_ip with the longest mask gets a higher + * priority. */ + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT, + priority, ds_cstr(match), + ds_cstr(actions), + &nat->header_); + } + } + + /* Logical router ingress table 0: + * For NAT on a distributed router, add rules allowing + * ingress traffic with eth.dst matching nat->external_mac + * on the l3dgw_port instance where nat->logical_port is + * resident. */ + if (distributed) { + /* Store the ethernet address of the port receiving the packet. + * This will save us from having to match on inport further + * down in the pipeline. + */ + ds_clear(actions); + ds_put_format(actions, REG_INPORT_ETH_ADDR " = %s; next;", + od->l3dgw_port->lrp_networks.ea_s); + + ds_clear(match); + ds_put_format(match, + "eth.dst == "ETH_ADDR_FMT" && inport == %s" + " && is_chassis_resident(\"%s\")", + ETH_ADDR_ARGS(mac), + od->l3dgw_port->json_key, + nat->logical_port); + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ADMISSION, 50, + ds_cstr(match), ds_cstr(actions), + &nat->header_); + } + + /* Ingress Gateway Redirect Table: For NAT on a distributed + * router, add flows that are specific to a NAT rule. These + * flows indicate the presence of an applicable NAT rule that + * can be applied in a distributed manner. + * In particulr REG_SRC_IPV4/REG_SRC_IPV6 and eth.src are set to + * NAT external IP and NAT external mac so the ARP request + * generated in the following stage is sent out with proper IP/MAC + * src addresses. + */ + if (distributed) { + ds_clear(match); + ds_clear(actions); + ds_put_format(match, + "ip%s.src == %s && outport == %s && " + "is_chassis_resident(\"%s\")", + is_v6 ? "6" : "4", nat->logical_ip, + od->l3dgw_port->json_key, nat->logical_port); + ds_put_format(actions, "eth.src = %s; %s = %s; next;", + nat->external_mac, + is_v6 ? REG_SRC_IPV6 : REG_SRC_IPV4, + nat->external_ip); + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_GW_REDIRECT, + 100, ds_cstr(match), + ds_cstr(actions), &nat->header_); + } + + /* Egress Loopback table: For NAT on a distributed router. + * If packets in the egress pipeline on the distributed + * gateway port have ip.dst matching a NAT external IP, then + * loop a clone of the packet back to the beginning of the + * ingress pipeline with inport = outport. */ + if (od->l3dgw_port) { + /* Distributed router. */ + ds_clear(match); + ds_put_format(match, "ip%s.dst == %s && outport == %s", + is_v6 ? "6" : "4", + nat->external_ip, + od->l3dgw_port->json_key); + if (!distributed) { + ds_put_format(match, " && is_chassis_resident(%s)", + od->l3redirect_port->json_key); + } else { + ds_put_format(match, " && is_chassis_resident(\"%s\")", + nat->logical_port); + } + + ds_clear(actions); + ds_put_format(actions, + "clone { ct_clear; " + "inport = outport; outport = \"\"; " + "flags = 0; flags.loopback = 1; "); + for (int j = 0; j < MFF_N_LOG_REGS; j++) { + ds_put_format(actions, "reg%d = 0; ", j); + } + ds_put_format(actions, REGBIT_EGRESS_LOOPBACK" = 1; " + "next(pipeline=ingress, table=%d); };", + ovn_stage_get_table(S_ROUTER_IN_ADMISSION)); + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_EGR_LOOP, 100, + ds_cstr(match), ds_cstr(actions), + &nat->header_); + } + } + + /* Handle force SNAT options set in the gateway router. */ + if (!od->l3dgw_port) { + if (dnat_force_snat_ip) { + if (od->dnat_force_snat_addrs.n_ipv4_addrs) { + build_lrouter_force_snat_flows(lflows, od, "4", + od->dnat_force_snat_addrs.ipv4_addrs[0].addr_s, + "dnat"); + } + if (od->dnat_force_snat_addrs.n_ipv6_addrs) { + build_lrouter_force_snat_flows(lflows, od, "6", + od->dnat_force_snat_addrs.ipv6_addrs[0].addr_s, + "dnat"); + } + } + if (lb_force_snat_ip) { + if (od->lb_force_snat_addrs.n_ipv4_addrs) { + build_lrouter_force_snat_flows(lflows, od, "4", + od->lb_force_snat_addrs.ipv4_addrs[0].addr_s, "lb"); + } + if (od->lb_force_snat_addrs.n_ipv6_addrs) { + build_lrouter_force_snat_flows(lflows, od, "6", + od->lb_force_snat_addrs.ipv6_addrs[0].addr_s, "lb"); + } + } + + /* For gateway router, re-circulate every packet through + * the DNAT zone. This helps with the following. + * + * Any packet that needs to be unDNATed in the reverse + * direction gets unDNATed. Ideally this could be done in + * the egress pipeline. But since the gateway router + * does not have any feature that depends on the source + * ip address being external IP address for IP routing, + * we can do it here, saving a future re-circulation. */ + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50, + "ip", "flags.loopback = 1; ct_dnat;"); + } + + /* Load balancing and packet defrag are only valid on + * Gateway routers or router with gateway port. */ + if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) { + sset_destroy(&nat_entries); + return; + } + + /* A set to hold all ips that need defragmentation and tracking. */ + struct sset all_ips = SSET_INITIALIZER(&all_ips); + + for (int i = 0; i < od->nbr->n_load_balancer; i++) { + struct nbrec_load_balancer *nb_lb = od->nbr->load_balancer[i]; + struct ovn_northd_lb *lb = + ovn_northd_lb_find(lbs, &nb_lb->header_.uuid); + ovs_assert(lb); + + for (size_t j = 0; j < lb->n_vips; j++) { + struct ovn_lb_vip *lb_vip = &lb->vips[j]; + struct ovn_northd_lb_vip *lb_vip_nb = &lb->vips_nb[j]; + ds_clear(actions); + build_lb_vip_ct_lb_actions(lb_vip, lb_vip_nb, actions, + lb->selection_fields); + + if (!sset_contains(&all_ips, lb_vip->vip_str)) { + sset_add(&all_ips, lb_vip->vip_str); + /* If there are any load balancing rules, we should send + * the packet to conntrack for defragmentation and + * tracking. This helps with two things. + * + * 1. With tracking, we can send only new connections to + * pick a DNAT ip address from a group. + * 2. If there are L4 ports in load balancing rules, we + * need the defragmentation to match on L4 ports. */ + ds_clear(match); + if (IN6_IS_ADDR_V4MAPPED(&lb_vip->vip)) { + ds_put_format(match, "ip && ip4.dst == %s", + lb_vip->vip_str); + } else { + ds_put_format(match, "ip && ip6.dst == %s", + lb_vip->vip_str); + } + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, + 100, ds_cstr(match), "ct_next;", + &nb_lb->header_); + } + + /* Higher priority rules are added for load-balancing in DNAT + * table. For every match (on a VIP[:port]), we add two flows + * via add_router_lb_flow(). One flow is for specific matching + * on ct.new with an action of "ct_lb($targets);". The other + * flow is for ct.est with an action of "ct_dnat;". */ + ds_clear(match); + if (IN6_IS_ADDR_V4MAPPED(&lb_vip->vip)) { + ds_put_format(match, "ip && ip4.dst == %s", + lb_vip->vip_str); + } else { + ds_put_format(match, "ip && ip6.dst == %s", + lb_vip->vip_str); + } + + int prio = 110; + bool is_udp = nullable_string_is_equal(nb_lb->protocol, "udp"); + bool is_sctp = nullable_string_is_equal(nb_lb->protocol, + "sctp"); + const char *proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp"; + + if (lb_vip->vip_port) { + ds_put_format(match, " && %s && %s.dst == %d", proto, + proto, lb_vip->vip_port); + prio = 120; + } + + if (od->l3redirect_port) { + ds_put_format(match, " && is_chassis_resident(%s)", + od->l3redirect_port->json_key); + } + add_router_lb_flow(lflows, od, match, actions, prio, + lb_force_snat_ip, lb_vip, proto, + nb_lb, meter_groups, &nat_entries); + } + } + sset_destroy(&all_ips); + sset_destroy(&nat_entries); +} + struct lswitch_flow_build_info { struct hmap *datapaths; @@ -11171,6 +11161,9 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od, &lsi->actions); build_misc_local_traffic_drop_flows_for_lrouter(od, lsi->lflows); build_lrouter_arp_nd_flows(od, lsi->lflows); + build_lrouter_nat_defrag_and_lb(od, lsi->lflows, lsi->meter_groups, + lsi->lbs, &lsi->match, + &lsi->actions); } /* Helper function to combine all lflow generation which is iterated by port. @@ -11418,9 +11411,6 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, /* Legacy lswitch build - to be migrated. */ build_lswitch_flows(datapaths, lflows); - - /* Legacy lrouter build - to be migrated. */ - build_lrouter_flows(datapaths, lflows, meter_groups, lbs); } static ssize_t max_seen_lflow_size = 128;