From patchwork Tue Nov 5 09:22:54 2019
X-Patchwork-Submitter: Numan Siddique
X-Patchwork-Id: 1189447
From: numans@ovn.org
To: dev@openvswitch.org
Date: Tue, 5 Nov 2019 14:52:54 +0530
Message-Id: <20191105092254.3657851-1-numans@ovn.org>
In-Reply-To: <20191105092215.3656232-1-numans@ovn.org>
Subject: [ovs-dev] [PATCH ovn v2 1/3] ovn-northd: Add support for Load Balancer health check

From: Numan Siddique

The present load balancer feature in OVN distributes traffic across the
backend IPs without checking whether the chosen backend IP is reachable.
If a backend service is down and its IP is chosen, the packet is lost.
This patch series adds a health check feature for these backend IPs so
that only active backend IPs are considered for load balancing. The CMS
needs to enable this functionality.

A new table, Service_Monitor, is added to the SB DB. For every backend IP
in a load balancer for which a health check is configured, a new row is
created in the Service_Monitor table. In an upcoming patch in this
series, ovn-controller will monitor the services set in this table by
generating health check packets.

Health checks are supported only for IPv4 load balancers in this patch.
Existing load balancers will be unaffected after this patch.
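As a rough sketch of the workflow this enables (these commands mirror the
tests added later in this patch; the load balancer, switch and port names
are illustrative):

    # Create a load balancer with one VIP and two backends.
    ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80,20.0.0.3:80

    # Map each backend IP to its logical port and the source IP that
    # ovn-controller should use when probing that backend.
    ovn-nbctl set load_balancer lb1 ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.2
    ovn-nbctl set load_balancer lb1 ip_port_mappings:20.0.0.3=sw1-p1:20.0.0.2

    # Enable health checks for the VIP.
    ovn-nbctl -- --id=@hc create Load_Balancer_Health_Check \
        vip="10.0.0.10\:80" -- add Load_Balancer lb1 health_check @hc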
Acked-by: Mark Michelson Signed-off-by: Numan Siddique --- northd/ovn-northd.8.xml | 75 +++++- northd/ovn-northd.c | 492 ++++++++++++++++++++++++++++++++++++---- ovn-nb.ovsschema | 25 +- ovn-nb.xml | 68 ++++++ ovn-sb.ovsschema | 33 ++- ovn-sb.xml | 85 +++++++ tests/ovn-northd.at | 215 ++++++++++++++++++ 7 files changed, 947 insertions(+), 46 deletions(-) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 0a33dcd9e..2e38e2d90 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -308,6 +308,16 @@ previously created, it will be associated to the empty_lb logical flow

+

+ This table also has a priority-110 flow with the match + eth.src == E for all logical switch + datapaths to move traffic to the next table, where E + is the service monitor mac defined in the + column of table. +

+

Ingress Table 5: Pre-stateful

@@ -476,7 +486,10 @@ , where args contains comma separated IP addresses (and optional port numbers) to load balance to. The address family of the IP addresses of args is the same as the address family - of VIP + of VIP. If health check is enabled, then args + will only contain those endpoints whose service monitor status entry + in OVN_Southbound db is either online or + empty.

  • For all the configured load balancing rules for a switch in @@ -698,6 +711,51 @@ nd_na_router {

  • +
  • +

    + For each SVC_MON_SRC_IP defined in the value of + the column of + table, a priority-110 + logical flow is added with the match + arp.tpa == SVC_MON_SRC_IP + && arp.op == 1 and applies the action

    + +
    +eth.dst = eth.src;
    +eth.src = E;
    +arp.op = 2; /* ARP reply. */
    +arp.tha = arp.sha;
    +arp.sha = E;
    +arp.tpa = arp.spa;
    +arp.spa = A;
    +outport = inport;
    +flags.loopback = 1;
    +output;
    +        
    + +

    + where E is the service monitor source mac defined in + the column in the table. This mac is used as the source mac + in the service monitor packets for the load balancer endpoint IP + health checks. +

    + +

    + SVC_MON_SRC_IP is used as the source ip in the + service monitor IPv4 packets for the load balancer endpoint IP + health checks. +

    + +

    + These flows ensure that an ARP request sent for the IP + SVC_MON_SRC_IP receives a reply. +

    +
  • +
  • One priority-0 fallback flow that matches all packets and advances to the next table. @@ -1086,6 +1144,16 @@ output; tracker for packet de-fragmentation.

    +

    + This table also has a priority-110 flow with the match + eth.src == E for all logical switch + datapaths to move traffic to the next table, where E + is the service monitor mac defined in the + column of table. +

    +

    Egress Table 1: to-lport Pre-ACLs

    @@ -1926,7 +1994,10 @@ icmp6 { (and optional port numbers) to load balance to. If the router is configured to force SNAT any load-balanced packets, the above action will be replaced by flags.force_snat_for_lb = 1; - ct_lb(args);. + ct_lb(args);. If health check is enabled, then + args will only contain those endpoints whose service + monitor status entry in OVN_Southbound db is + either online or empty.

  • diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index c23c270dc..b0f513de6 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -88,6 +88,13 @@ static struct eth_addr mac_prefix; static bool controller_event_en; +/* MAC allocated for service monitor usage. Just one mac is allocated + * for this purpose and ovn-controller's on each chassis will make use + * of this mac when sending out the packets to monitor the services + * defined in Service_Monitor Southbound table. Since these packets + * all locally handled, having just one mac is good enough. */ +static char svc_monitor_mac[ETH_ADDR_STRLEN + 1]; + #define MAX_OVN_TAGS 4096 /* Pipeline stages. */ @@ -2935,6 +2942,291 @@ cleanup_sb_ha_chassis_groups(struct northd_context *ctx, } } +struct ovn_lb { + struct hmap_node hmap_node; + + const struct nbrec_load_balancer *nlb; /* May be NULL. */ + + struct lb_vip *vips; + size_t n_vips; +}; + +struct lb_vip { + char *vip; + uint16_t vip_port; + int addr_family; + char *backend_ips; + + bool health_check; + struct lb_vip_backend *backends; + size_t n_backends; +}; + +struct lb_vip_backend { + char *ip; + uint16_t port; + + struct ovn_port *op; /* Logical port to which the ip belong to. */ + bool health_check; + char *svc_mon_src_ip; /* Source IP to use for monitoring. */ + const struct sbrec_service_monitor *sbrec_monitor; +}; + + +static inline struct ovn_lb * +ovn_lb_find(struct hmap *lbs, struct uuid *uuid) +{ + struct ovn_lb *lb; + size_t hash = uuid_hash(uuid); + HMAP_FOR_EACH_WITH_HASH (lb, hmap_node, hash, lbs) { + if (uuid_equals(&lb->nlb->header_.uuid, uuid)) { + return lb; + } + } + + return NULL; +} + + +struct service_monitor_info { + struct hmap_node hmap_node; + const struct sbrec_service_monitor *sbrec_mon; + bool required; +}; + + +static struct service_monitor_info * +create_or_get_service_mon(struct northd_context *ctx, + struct hmap *monitor_map, + const char *ip, const char *logical_port, + uint16_t service_port, const char *protocol) +{ + uint32_t hash = service_port; + hash = hash_string(ip, hash); + hash = hash_string(logical_port, hash); + struct service_monitor_info *mon_info; + + HMAP_FOR_EACH_WITH_HASH (mon_info, hmap_node, hash, monitor_map) { + if (mon_info->sbrec_mon->port == service_port && + !strcmp(mon_info->sbrec_mon->ip, ip) && + !strcmp(mon_info->sbrec_mon->protocol, protocol) && + !strcmp(mon_info->sbrec_mon->logical_port, logical_port)) { + return mon_info; + } + } + + struct sbrec_service_monitor *sbrec_mon = + sbrec_service_monitor_insert(ctx->ovnsb_txn); + sbrec_service_monitor_set_ip(sbrec_mon, ip); + sbrec_service_monitor_set_port(sbrec_mon, service_port); + sbrec_service_monitor_set_logical_port(sbrec_mon, logical_port); + sbrec_service_monitor_set_protocol(sbrec_mon, protocol); + mon_info = xzalloc(sizeof *mon_info); + mon_info->sbrec_mon = sbrec_mon; + hmap_insert(monitor_map, &mon_info->hmap_node, hash); + return mon_info; +} + +static struct ovn_lb * +ovn_lb_create(struct northd_context *ctx, struct hmap *lbs, + const struct nbrec_load_balancer *nbrec_lb, + struct hmap *ports, struct hmap *monitor_map) +{ + struct ovn_lb *lb = xzalloc(sizeof *lb); + + size_t hash = uuid_hash(&nbrec_lb->header_.uuid); + lb->nlb = nbrec_lb; + hmap_insert(lbs, &lb->hmap_node, hash); + + lb->n_vips = smap_count(&nbrec_lb->vips); + lb->vips = xcalloc(lb->n_vips, sizeof (struct lb_vip)); + struct smap_node *node; + size_t n_vips = 0; + + SMAP_FOR_EACH (node, &nbrec_lb->vips) { + char *vip = NULL; + uint16_t port; + int addr_family; + + 
ip_address_and_port_from_lb_key(node->key, &vip, &port, + &addr_family); + if (!vip) { + continue; + } + + lb->vips[n_vips].vip = vip; + lb->vips[n_vips].vip_port = port; + lb->vips[n_vips].addr_family = addr_family; + lb->vips[n_vips].backend_ips = xstrdup(node->value); + + struct nbrec_load_balancer_health_check *lb_health_check = NULL; + for (size_t i = 0; i < nbrec_lb->n_health_check; i++) { + if (!strcmp(nbrec_lb->health_check[i]->vip, node->key)) { + lb_health_check = nbrec_lb->health_check[i]; + break; + } + } + + char *tokstr = xstrdup(node->value); + char *save_ptr = NULL; + char *token; + size_t n_backends = 0; + /* Format for a backend ips : IP1:port1,IP2:port2,...". */ + for (token = strtok_r(tokstr, ",", &save_ptr); + token != NULL; + token = strtok_r(NULL, ",", &save_ptr)) { + n_backends++; + } + + free(tokstr); + tokstr = xstrdup(node->value); + save_ptr = NULL; + + lb->vips[n_vips].n_backends = n_backends; + lb->vips[n_vips].backends = xcalloc(n_backends, + sizeof (struct lb_vip_backend)); + lb->vips[n_vips].health_check = lb_health_check ? true: false; + + size_t i = 0; + for (token = strtok_r(tokstr, ",", &save_ptr); + token != NULL; + token = strtok_r(NULL, ",", &save_ptr)) { + char *backend_ip; + uint16_t backend_port; + + ip_address_and_port_from_lb_key(token, &backend_ip, &backend_port, + &addr_family); + + if (!backend_ip) { + continue; + } + + /* Get the logical port to which this ip belongs to. */ + struct ovn_port *op = NULL; + char *svc_mon_src_ip = NULL; + const char *s = smap_get(&nbrec_lb->ip_port_mappings, + backend_ip); + if (s) { + char *port_name = xstrdup(s); + char *p = strstr(port_name, ":"); + if (p) { + *p = 0; + p++; + op = ovn_port_find(ports, port_name); + svc_mon_src_ip = xstrdup(p); + } + free(port_name); + } + + lb->vips[n_vips].backends[i].ip = backend_ip; + lb->vips[n_vips].backends[i].port = backend_port; + lb->vips[n_vips].backends[i].op = op; + lb->vips[n_vips].backends[i].svc_mon_src_ip = svc_mon_src_ip; + + if (lb_health_check && op && svc_mon_src_ip) { + const char *protocol = nbrec_lb->protocol; + if (!protocol || !protocol[0]) { + protocol = "tcp"; + } + lb->vips[n_vips].backends[i].health_check = true; + struct service_monitor_info *mon_info = + create_or_get_service_mon(ctx, monitor_map, backend_ip, + op->nbsp->name, backend_port, + protocol); + + ovs_assert(mon_info); + sbrec_service_monitor_set_options( + mon_info->sbrec_mon, &lb_health_check->options); + if (!mon_info->sbrec_mon->src_mac || + strcmp(mon_info->sbrec_mon->src_mac, svc_monitor_mac)) { + sbrec_service_monitor_set_src_mac(mon_info->sbrec_mon, + svc_monitor_mac); + } + + if (!mon_info->sbrec_mon->src_ip || + strcmp(mon_info->sbrec_mon->src_ip, svc_mon_src_ip)) { + sbrec_service_monitor_set_src_ip(mon_info->sbrec_mon, + svc_mon_src_ip); + } + + lb->vips[n_vips].backends[i].sbrec_monitor = + mon_info->sbrec_mon; + mon_info->required = true; + } else { + lb->vips[n_vips].backends[i].health_check = false; + } + + i++; + } + + free(tokstr); + n_vips++; + } + + return lb; +} + +static void +ovn_lb_destroy(struct ovn_lb *lb) +{ + for (size_t i = 0; i < lb->n_vips; i++) { + free(lb->vips[i].vip); + free(lb->vips[i].backend_ips); + + for (size_t j = 0; j < lb->vips[i].n_backends; j++) { + free(lb->vips[i].backends[j].ip); + free(lb->vips[i].backends[j].svc_mon_src_ip); + } + + free(lb->vips[i].backends); + } + free(lb->vips); +} + +static void +build_ovn_lbs(struct northd_context *ctx, struct hmap *ports, + struct hmap *lbs) +{ + hmap_init(lbs); + struct hmap monitor_map = 
HMAP_INITIALIZER(&monitor_map); + + const struct sbrec_service_monitor *sbrec_mon; + SBREC_SERVICE_MONITOR_FOR_EACH (sbrec_mon, ctx->ovnsb_idl) { + uint32_t hash = sbrec_mon->port; + hash = hash_string(sbrec_mon->ip, hash); + hash = hash_string(sbrec_mon->logical_port, hash); + struct service_monitor_info *mon_info = xzalloc(sizeof *mon_info); + mon_info->sbrec_mon = sbrec_mon; + mon_info->required = false; + hmap_insert(&monitor_map, &mon_info->hmap_node, hash); + } + + const struct nbrec_load_balancer *nbrec_lb; + NBREC_LOAD_BALANCER_FOR_EACH (nbrec_lb, ctx->ovnnb_idl) { + ovn_lb_create(ctx, lbs, nbrec_lb, ports, &monitor_map); + } + + struct service_monitor_info *mon_info; + HMAP_FOR_EACH_POP (mon_info, hmap_node, &monitor_map) { + if (!mon_info->required) { + sbrec_service_monitor_delete(mon_info->sbrec_mon); + } + + free(mon_info); + } + hmap_destroy(&monitor_map); +} + +static void +destroy_ovn_lbs(struct hmap *lbs) +{ + struct ovn_lb *lb; + HMAP_FOR_EACH_POP (lb, hmap_node, lbs) { + ovn_lb_destroy(lb); + free(lb); + } +} + /* Updates the southbound Port_Binding table so that it contains the logical * switch ports specified by the northbound database. * @@ -4314,6 +4606,14 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows, ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110, "nd || nd_rs || nd_ra", "next;"); + /* Do not send service monitor packets to conntrack. */ + char *svc_check_match = xasprintf("eth.src == %s", svc_monitor_mac); + ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110, + svc_check_match, "next;"); + ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110, + svc_check_match, "next;"); + free(svc_check_match); + /* Allow all packets to go to next tables by default. */ ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 0, "1", "next;"); ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 0, "1", "next;"); @@ -4981,7 +5281,7 @@ build_lb(struct ovn_datapath *od, struct hmap *lflows) } static void -build_stateful(struct ovn_datapath *od, struct hmap *lflows) +build_stateful(struct ovn_datapath *od, struct hmap *lflows, struct hmap *lbs) { /* Ingress and Egress stateful Table (Priority 0): Packets are * allowed by default. */ @@ -5015,47 +5315,69 @@ build_stateful(struct ovn_datapath *od, struct hmap *lflows) * connection, so it is okay if we do not hit the above match on * REGBIT_CONNTRACK_COMMIT. */ for (int i = 0; i < od->nbs->n_load_balancer; i++) { - struct nbrec_load_balancer *lb = od->nbs->load_balancer[i]; - struct smap *vips = &lb->vips; - struct smap_node *node; + struct ovn_lb *lb = + ovn_lb_find(lbs, &od->nbs->load_balancer[i]->header_.uuid); + ovs_assert(lb); - SMAP_FOR_EACH (node, vips) { - uint16_t port = 0; - int addr_family; + for (size_t j = 0; j < lb->n_vips; j++) { + struct lb_vip *lb_vip = &lb->vips[j]; + /* New connections in Ingress table. */ + struct ds action = DS_EMPTY_INITIALIZER; + if (lb_vip->health_check) { + ds_put_cstr(&action, "ct_lb("); + + size_t n_active_backends = 0; + for (size_t k = 0; k < lb_vip->n_backends; k++) { + struct lb_vip_backend *backend = &lb_vip->backends[k]; + bool is_up = true; + if (backend->health_check && backend->sbrec_monitor && + backend->sbrec_monitor->status && + strcmp(backend->sbrec_monitor->status, "online")) { + is_up = false; + } - /* node->key contains IP:port or just IP. 
*/ - char *ip_address = NULL; - ip_address_and_port_from_lb_key(node->key, &ip_address, &port, - &addr_family); - if (!ip_address) { - continue; + if (is_up) { + n_active_backends++; + ds_put_format(&action, "%s:%"PRIu16",", + backend->ip, backend->port); + } + } + + if (!n_active_backends) { + ds_clear(&action); + ds_put_cstr(&action, "drop;"); + } else { + ds_chomp(&action, ','); + ds_put_cstr(&action, ");"); + } + } else { + ds_put_format(&action, "ct_lb(%s);", lb_vip->backend_ips); } - /* New connections in Ingress table. */ - char *action = xasprintf("ct_lb(%s);", node->value); struct ds match = DS_EMPTY_INITIALIZER; - if (addr_family == AF_INET) { - ds_put_format(&match, "ct.new && ip4.dst == %s", ip_address); + if (lb_vip->addr_family == AF_INET) { + ds_put_format(&match, "ct.new && ip4.dst == %s", lb_vip->vip); } else { - ds_put_format(&match, "ct.new && ip6.dst == %s", ip_address); + ds_put_format(&match, "ct.new && ip6.dst == %s", lb_vip->vip); } - if (port) { - if (lb->protocol && !strcmp(lb->protocol, "udp")) { - ds_put_format(&match, " && udp.dst == %d", port); + if (lb_vip->vip_port) { + if (lb->nlb->protocol && !strcmp(lb->nlb->protocol, "udp")) { + ds_put_format(&match, " && udp.dst == %d", + lb_vip->vip_port); } else { - ds_put_format(&match, " && tcp.dst == %d", port); + ds_put_format(&match, " && tcp.dst == %d", + lb_vip->vip_port); } ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL, - 120, ds_cstr(&match), action); + 120, ds_cstr(&match), ds_cstr(&action)); } else { ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL, - 110, ds_cstr(&match), action); + 110, ds_cstr(&match), ds_cstr(&action)); } - free(ip_address); ds_destroy(&match); - free(action); - } + ds_destroy(&action); + } } } @@ -5165,7 +5487,8 @@ static void build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, struct hmap *port_groups, struct hmap *lflows, struct hmap *mcgroups, struct hmap *igmp_groups, - struct shash *meter_groups) + struct shash *meter_groups, + struct hmap *lbs) { /* This flow table structure is documented in ovn-northd(8), so please * update ovn-northd.8.xml if you change anything. */ @@ -5187,7 +5510,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, build_acls(od, lflows, port_groups); build_qos(od, lflows); build_lb(od, lflows); - build_stateful(od, lflows); + build_stateful(od, lflows, lbs); } /* Logical switch ingress table 0: Admission control framework (priority @@ -5389,6 +5712,46 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, ovn_lflow_add(lflows, od, S_SWITCH_IN_ARP_ND_RSP, 0, "1", "next;"); } + /* Ingress table 11: ARP/ND responder for service monitor source ip. 
+ * (priority 110)*/ + struct ovn_lb *lb; + HMAP_FOR_EACH (lb, hmap_node, lbs) { + for (size_t i = 0; i < lb->n_vips; i++) { + if (!lb->vips[i].health_check) { + continue; + } + + for (size_t j = 0; j < lb->vips[i].n_backends; j++) { + if (!lb->vips[i].backends[j].op || + !lb->vips[i].backends[j].svc_mon_src_ip) { + continue; + } + + ds_clear(&match); + ds_put_format(&match, "arp.tpa == %s && arp.op == 1", + lb->vips[i].backends[j].svc_mon_src_ip); + ds_clear(&actions); + ds_put_format(&actions, + "eth.dst = eth.src; " + "eth.src = %s; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = %s; " + "arp.tpa = arp.spa; " + "arp.spa = %s; " + "outport = inport; " + "flags.loopback = 1; " + "output;", + svc_monitor_mac, svc_monitor_mac, + lb->vips[i].backends[j].svc_mon_src_ip); + ovn_lflow_add(lflows, lb->vips[i].backends[j].op->od, + S_SWITCH_IN_ARP_ND_RSP, 110, + ds_cstr(&match), ds_cstr(&actions)); + } + } + } + + /* Logical switch ingress table 12 and 13: DHCP options and response * priority 100 flows. */ HMAP_FOR_EACH (op, key_node, ports) { @@ -8820,12 +9183,13 @@ static void build_lflows(struct northd_context *ctx, struct hmap *datapaths, struct hmap *ports, struct hmap *port_groups, struct hmap *mcgroups, struct hmap *igmp_groups, - struct shash *meter_groups) + struct shash *meter_groups, + struct hmap *lbs) { struct hmap lflows = HMAP_INITIALIZER(&lflows); build_lswitch_flows(datapaths, ports, port_groups, &lflows, mcgroups, - igmp_groups, meter_groups); + igmp_groups, meter_groups, lbs); build_lrouter_flows(datapaths, ports, &lflows, meter_groups); /* Push changes to the Logical_Flow table to database. */ @@ -9542,9 +9906,11 @@ ovnnb_db_run(struct northd_context *ctx, struct hmap mcast_groups; struct hmap igmp_groups; struct shash meter_groups = SHASH_INITIALIZER(&meter_groups); + struct hmap lbs; build_datapaths(ctx, datapaths, lr_list); build_ports(ctx, sbrec_chassis_by_name, datapaths, ports); + build_ovn_lbs(ctx, ports, &lbs); build_ipam(datapaths, ports); build_port_group_lswitches(ctx, &port_groups, ports); build_lrouter_groups(ports, lr_list); @@ -9552,12 +9918,14 @@ ovnnb_db_run(struct northd_context *ctx, build_mcast_groups(ctx, datapaths, ports, &mcast_groups, &igmp_groups); build_meter_groups(ctx, &meter_groups); build_lflows(ctx, datapaths, ports, &port_groups, &mcast_groups, - &igmp_groups, &meter_groups); + &igmp_groups, &meter_groups, &lbs); sync_address_sets(ctx); sync_port_groups(ctx); sync_meters(ctx); sync_dns_entries(ctx, datapaths); + destroy_ovn_lbs(&lbs); + hmap_destroy(&lbs); struct ovn_igmp_group *igmp_group, *next_igmp_group; @@ -9606,16 +9974,43 @@ ovnnb_db_run(struct northd_context *ctx, &addr.ea[0], &addr.ea[1], &addr.ea[2])) { mac_prefix = addr; } - } else { - struct smap options; + } + const char *monitor_mac = smap_get(&nb->options, "svc_monitor_mac"); + if (monitor_mac) { + struct eth_addr addr; + + memset(&addr, 0, sizeof addr); + if (eth_addr_from_string(monitor_mac, &addr)) { + snprintf(svc_monitor_mac, sizeof svc_monitor_mac, + ETH_ADDR_FMT, ETH_ADDR_ARGS(addr)); + } else { + monitor_mac = NULL; + } + } + + if (!mac_addr_prefix || !monitor_mac) { + struct smap options; smap_clone(&options, &nb->options); - eth_addr_random(&mac_prefix); - memset(&mac_prefix.ea[3], 0, 3); - smap_add_format(&options, "mac_prefix", - "%02"PRIx8":%02"PRIx8":%02"PRIx8, - mac_prefix.ea[0], mac_prefix.ea[1], mac_prefix.ea[2]); + if (!mac_addr_prefix) { + eth_addr_random(&mac_prefix); + memset(&mac_prefix.ea[3], 0, 3); + + smap_add_format(&options, 
"mac_prefix", + "%02"PRIx8":%02"PRIx8":%02"PRIx8, + mac_prefix.ea[0], mac_prefix.ea[1], + mac_prefix.ea[2]); + } + + if (!monitor_mac) { + struct eth_addr addr; + eth_addr_random(&addr); + snprintf(svc_monitor_mac, sizeof svc_monitor_mac, + ETH_ADDR_FMT, ETH_ADDR_ARGS(addr)); + smap_replace(&options, "svc_monitor_mac", svc_monitor_mac); + } + nbrec_nb_global_verify_options(nb); nbrec_nb_global_set_options(nb, &options); @@ -10415,6 +10810,25 @@ main(int argc, char *argv[]) &sbrec_ip_multicast_col_query_interval); add_column_noalert(ovnsb_idl_loop.idl, &sbrec_ip_multicast_col_query_max_resp); + ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_service_monitor); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_ip); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_logical_port); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_port); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_options); + ovsdb_idl_add_column(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_status); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_protocol); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_src_mac); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_src_ip); + add_column_noalert(ovnsb_idl_loop.idl, + &sbrec_service_monitor_col_external_ids); struct ovsdb_idl_index *sbrec_chassis_by_name = chassis_index_create(ovnsb_idl_loop.idl); diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema index 084305b24..12999a466 100644 --- a/ovn-nb.ovsschema +++ b/ovn-nb.ovsschema @@ -1,7 +1,7 @@ { "name": "OVN_Northbound", - "version": "5.17.0", - "cksum": "1128988054 23237", + "version": "5.18.0", + "cksum": "2806349485 24196", "tables": { "NB_Global": { "columns": { @@ -152,10 +152,31 @@ "type": {"key": {"type": "string", "enum": ["set", ["tcp", "udp"]]}, "min": 0, "max": 1}}, + "health_check": {"type": { + "key": {"type": "uuid", + "refTable": "Load_Balancer_Health_Check", + "refType": "strong"}, + "min": 0, + "max": "unlimited"}}, + "ip_port_mappings": { + "type": {"key": "string", "value": "string", + "min": 0, "max": "unlimited"}}, "external_ids": { "type": {"key": "string", "value": "string", "min": 0, "max": "unlimited"}}}, "isRoot": true}, + "Load_Balancer_Health_Check": { + "columns": { + "vip": {"type": "string"}, + "options": { + "type": {"key": "string", + "value": "string", + "min": 0, + "max": "unlimited"}}, + "external_ids": { + "type": {"key": "string", "value": "string", + "min": 0, "max": "unlimited"}}}, + "isRoot": false}, "ACL": { "columns": { "name": {"type": {"key": {"type": "string", diff --git a/ovn-nb.xml b/ovn-nb.xml index d8f3237fc..4a93d2f4a 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -1297,6 +1297,74 @@

    + + Load balancer health checks associated with this load balancer. + If a health check is desired for a vip's endpoints defined in + the + column, then a row in the table + should + be created and referenced here, and an L4 port should be defined + for the vip and its endpoints. Health checks are supported only + for IPv4 load balancers. + + + +

    + This column is used if load balancer health checks are enabled. + It keeps a mapping from each endpoint IP to its logical port name. + The source ip to be used for the health checks is also expected to + be defined. The key of the mapping is the endpoint IP and the value + is in the format: port_name:SRC_IP

    + +

    + E.g., if there is a VIP entry: + "10.0.0.10:80=10.0.0.4:8080,20.0.0.4:8080", + then the IP to port mappings should be defined as: + "10.0.0.4"="sw0-p1:10.0.0.2" and + "20.0.0.4"="sw1-p1:20.0.0.2". 10.0.0.2 + and 20.0.0.2 will be used by ovn-controller + as the source ip when it sends out health check packets. +

    +
    + + + + See External IDs at the beginning of this document. + + + + + +

    + Each row represents one load balancer health check. Health checks + are supported for IPv4 load balancers only. +

    + + + vip whose endpoints should be monitored for health check. + + + + + The interval, in seconds, between health checks. + + + + The time, in seconds, after which a health check times out. + + + + The number of successful checks after which the endpoint is + considered online. + + + + The number of failure checks after which the endpoint is considered + offline. + + + See External IDs at the beginning of this document. diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema index 5c013b17e..56af0ed3e 100644 --- a/ovn-sb.ovsschema +++ b/ovn-sb.ovsschema @@ -1,7 +1,7 @@ { "name": "OVN_Southbound", - "version": "2.5.0", - "cksum": "1257419092 20387", + "version": "2.6.0", + "cksum": "4271405686 21646", "tables": { "SB_Global": { "columns": { @@ -403,4 +403,31 @@ "refType": "weak"}, "min": 0, "max": "unlimited"}}}, "indexes": [["address", "datapath", "chassis"]], - "isRoot": true}}} + "isRoot": true}, + "Service_Monitor": { + "columns": { + "ip": {"type": "string"}, + "protocol": { + "type": {"key": {"type": "string", + "enum": ["set", ["tcp", "udp"]]}, + "min": 0, "max": 1}}, + "port": {"type": {"key": {"type": "integer", + "minInteger": 0, + "maxInteger": 32767}}}, + "logical_port": {"type": "string"}, + "src_mac": {"type": "string"}, + "src_ip": {"type": "string"}, + "status": { + "type": {"key": {"type": "string", + "enum": ["set", ["online", "offline", "error"]]}, + "min": 0, "max": 1}}, + "options": { + "type": {"key": "string", "value": "string", + "min": 0, "max": "unlimited"}}, + "external_ids": { + "type": {"key": "string", "value": "string", + "min": 0, "max": "unlimited"}}}, + "indexes": [["logical_port", "ip", "port", "protocol"]], + "isRoot": true} + } +} diff --git a/ovn-sb.xml b/ovn-sb.xml index e5fb51a9d..335f9031b 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -3743,4 +3743,89 @@ tcp.flags = RST; The destination port bindings for this IGMP group.
    + + +

    + This table monitors a service for its liveness. The service + can be an IPv4 tcp or udp service. ovn-controller + periodically sends out service monitor packets and updates the + status of the service. Service monitoring for IPv6 services is + not supported. +

    + + + IP of the service to be monitored. Only IPv4 is supported. + + + + The protocol of the service. It can be either tcp or + udp. + + + + The tcp or udp port of the service. + + + + The VIF of the logical port on which the service is running. The + ovn-controller which binds this logical_port + monitors the service by sending periodic monitor packets. + + + +

    + The ovn-controller which binds the + logical_port updates the status to online, + offline or error. +

    + +

    + For tcp service, ovn-controller sends a + TCP SYN packet to the service and expects a + TCP ACK response to consider the service to be + online. +

    + +

    + For udp service, ovn-controller sends a udp + packet to the service and doesn't expect any reply. If it receives + an ICMP reply, then it considers the service to be offline. +

    +
    + + + Source Ethernet address to use in the service monitor packet. + + + + Source IPv4 address to use in the service monitor packet. + + + + + The interval, in seconds, between service monitor checks. + + + + The time, in seconds, after which the service monitor check times + out. + + + + The number of successful checks after which the service is + considered online. + + + + The number of failure checks after which the service is considered + offline. + + + + + + See External IDs at the beginning of this document. + + +
    diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 989ed4f47..da566f900 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -1061,3 +1061,218 @@ AT_CHECK([ovn-sbctl dump-flows R1 | grep ip6.src=| wc -l], [0], [2 ]) AT_CLEANUP + +AT_SETUP([ovn -- check Load balancer health check and Service Monitor sync]) +AT_SKIP_IF([test $HAVE_PYTHON = no]) +ovn_start + +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80,20.0.0.3:80 + +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:10.0.0.3=sw0-p1 +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:20.0.0.3=sw1-p1 + +OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor | wc -l`]) + +ovn-nbctl --wait=sb -- --id=@hc create \ +Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ +health_check @hc + +OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor | wc -l`]) + +# create logical switches and ports +ovn-nbctl ls-add sw0 +ovn-nbctl --wait=sb lsp-add sw0 sw0-p1 -- lsp-set-addresses sw0-p1 \ +"00:00:00:00:00:03 10.0.0.3" + +OVS_WAIT_UNTIL([test 0 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | wc -l`]) + +ovn-nbctl ls-add sw1 +ovn-nbctl --wait=sb lsp-add sw1 sw1-p1 -- lsp-set-addresses sw1-p1 \ +"02:00:00:00:00:03 20.0.0.3" + +OVS_WAIT_UNTIL([test 0 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | sed '/^$/d' | wc -l`]) + +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.2 +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | wc -l`]) + +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:20.0.0.3=sw1-p1:20.0.0.2 + +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | sed '/^$/d' | wc -l`]) + +ovn-nbctl --wait=sb ls-lb-add sw0 lb1 + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80,20.0.0.3:80);) +]) + +# Delete the Load_Balancer_Health_Check +ovn-nbctl --wait=sb clear load_balancer . health_check +OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor | wc -l`]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80,20.0.0.3:80);) +]) + +# Create the Load_Balancer_Health_Check again. +ovn-nbctl --wait=sb -- --id=@hc create \ +Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . 
\ +health_check @hc + +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | sed '/^$/d' | wc -l`]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80,20.0.0.3:80);) +]) + +# Get the uuid of both the service_monitor +sm_sw0_p1=`ovn-sbctl --bare --columns _uuid find service_monitor logical_port=sw0-p1` +sm_sw1_p1=`ovn-sbctl --bare --columns _uuid find service_monitor logical_port=sw1-p1` + +# Set the service monitor for sw1-p1 to offline +ovn-sbctl set service_monitor $sm_sw1_p1 status=offline + +OVS_WAIT_UNTIL([ + status=`ovn-sbctl --bare --columns status find service_monitor logical_port=sw1-p1` + test "$status" = "offline"]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80);) +]) + +# Set the service monitor for sw0-p1 to offline +ovn-sbctl set service_monitor $sm_sw0_p1 status=offline + +OVS_WAIT_UNTIL([ + status=`ovn-sbctl --bare --columns status find service_monitor logical_port=sw0-p1` + test "$status" = "offline"]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl +]) + +ovn-sbctl dump-flows sw0 | grep "ip4.dst == 10.0.0.10 && tcp.dst == 80" \ +| grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(drop;) +]) + +# Set the service monitor for sw0-p1 and sw1-p1 to online +ovn-sbctl set service_monitor $sm_sw0_p1 status=online +ovn-sbctl set service_monitor $sm_sw1_p1 status=online + +OVS_WAIT_UNTIL([ + status=`ovn-sbctl --bare --columns status find service_monitor logical_port=sw1-p1` + test "$status" = "online"]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80,20.0.0.3:80);) +]) + +# Set the service monitor for sw1-p1 to error +ovn-sbctl set service_monitor $sm_sw1_p1 status=error +OVS_WAIT_UNTIL([ + status=`ovn-sbctl --bare --columns status find service_monitor logical_port=sw1-p1` + test "$status" = "error"]) + +ovn-sbctl dump-flows sw0 | grep "ip4.dst == 10.0.0.10 && tcp.dst == 80" \ +| grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80);) +]) + +# Add one more vip to lb1 + +ovn-nbctl set load_balancer . vip:"10.0.0.40\:1000"="10.0.0.3:1000,20.0.0.3:80" + +# create health_check for new vip - 10.0.0.40 +ovn-nbctl --wait=sb -- --id=@hc create \ +Load_Balancer_Health_Check vip="10.0.0.40\:1000" -- add Load_Balancer . 
\ +health_check @hc + +# There should be totally 3 rows in service_monitor for - +# * 10.0.0.3:80 +# * 10.0.0.3:1000 +# * 20.0.0.3:80 + +OVS_WAIT_UNTIL([test 3 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | sed '/^$/d' | wc -l`]) + +# There should be 2 rows with logical_port=sw0-p1 +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor logical_port=sw0-p1 | sed '/^$/d' | wc -l`]) + +# There should be 1 row1 with port=1000 +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor port=1000 | sed '/^$/d' | wc -l`]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80);) + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.40 && tcp.dst == 1000), action=(ct_lb(10.0.0.3:1000);) +]) + +# Set the service monitor for sw1-p1 to online +ovn-sbctl set service_monitor $sm_sw1_p1 status=online + +OVS_WAIT_UNTIL([ + status=`ovn-sbctl --bare --columns status find service_monitor logical_port=sw1-p1` + test "$status" = "online"]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80,20.0.0.3:80);) + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.40 && tcp.dst == 1000), action=(ct_lb(10.0.0.3:1000,20.0.0.3:80);) +]) + +# Associate lb1 to sw1 +ovn-nbctl --wait=sb ls-lb-add sw1 lb1 +ovn-sbctl dump-flows sw1 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80,20.0.0.3:80);) + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.40 && tcp.dst == 1000), action=(ct_lb(10.0.0.3:1000,20.0.0.3:80);) +]) + +# Now create lb2 same as lb1 but udp protocol. +ovn-nbctl lb-add lb2 10.0.0.10:80 10.0.0.3:80,20.0.0.3:80 udp +lb2_uuid=`ovn-nbctl lb-list | grep udp | awk '{print $1}'` +ovn-nbctl --wait=sb set load_balancer $lb2_uuid ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.2 +ovn-nbctl --wait=sb set load_balancer $lb2_uuid ip_port_mappings:20.0.0.3=sw1-p1:20.0.0.2 + +ovn-nbctl -- --id=@hc create Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer $lb2_uuid health_check @hc + +ovn-nbctl ls-lb-add sw0 lb2 +ovn-nbctl ls-lb-add sw1 lb2 +ovn-nbctl lr-lb-add lr0 lb2 + +OVS_WAIT_UNTIL([test 5 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | sed '/^$/d' | wc -l`]) + +# Change the svc_monitor_mac. This should get reflected in service_monitor table rows. +ovn-nbctl set NB_Global . 
options:svc_monitor_mac="fe:a0:65:a2:01:03"
+
+OVS_WAIT_UNTIL([test 5 = `ovn-sbctl --bare --columns src_mac find \
+service_monitor | grep "fe:a0:65:a2:01:03" | wc -l`])
+
+# Change the source ip for 10.0.0.3 backend ip in lb2
+ovn-nbctl --wait=sb set load_balancer $lb2_uuid ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.100
+
+OVS_WAIT_UNTIL([test 1 = `ovn-sbctl --bare --columns src_ip find \
+service_monitor logical_port=sw0-p1 | grep "10.0.0.100" | wc -l`])
+
+ovn-nbctl --wait=sb lb-del lb1
+OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns _uuid find service_monitor | sed '/^$/d' | wc -l`])
+
+ovn-nbctl --wait=sb lb-del lb2
+OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor | wc -l`])
+
+AT_CLEANUP

From patchwork Tue Nov 5 09:23:06 2019
X-Patchwork-Submitter: Numan Siddique
X-Patchwork-Id: 1189449
From: numans@ovn.org
To: dev@openvswitch.org
Date: Tue, 5 Nov 2019 14:53:06 +0530
Message-Id: <20191105092306.3658405-1-numans@ovn.org>
In-Reply-To: <20191105092215.3656232-1-numans@ovn.org>
Subject: [ovs-dev] [PATCH ovn v2 2/3] Add a new action - handle_svc_check

From: Numan Siddique

This action will be used in an upcoming patch to handle, in
ovn-controller, the service monitor replies to the service monitor
requests it sends out. This action gets translated to an OpenFlow
controller action.
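As a quick illustration (the encoding is taken from the ovn.at tests added
in this patch), the action takes a string-typed logical port field and is
translated into an OpenFlow controller() action:

    handle_svc_check(inport);
        encodes as controller(userdata=00.00.00.12.00.00.00.00)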
Acked-by: Mark Michelson Signed-off-by: Numan Siddique --- include/ovn/actions.h | 17 ++++++++++++++++- lib/actions.c | 42 ++++++++++++++++++++++++++++++++++++++++++ ovn-sb.xml | 17 +++++++++++++++++ tests/ovn.at | 13 +++++++++++++ utilities/ovn-trace.c | 3 +++ 5 files changed, 91 insertions(+), 1 deletion(-) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index f4997e9c9..047a8d737 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -88,7 +88,8 @@ struct ovn_extend_table; OVNACT(OVNFIELD_LOAD, ovnact_load) \ OVNACT(CHECK_PKT_LARGER, ovnact_check_pkt_larger) \ OVNACT(TRIGGER_EVENT, ovnact_controller_event) \ - OVNACT(BIND_VPORT, ovnact_bind_vport) + OVNACT(BIND_VPORT, ovnact_bind_vport) \ + OVNACT(HANDLE_SVC_CHECK, ovnact_handle_svc_check) /* enum ovnact_type, with a member OVNACT_ for each action. */ enum OVS_PACKED_ENUM ovnact_type { @@ -352,6 +353,12 @@ struct ovnact_bind_vport { struct expr_field vport_parent; /* Logical virtual port's port name. */ }; +/* OVNACT_HANDLE_SVC_CHECK. */ +struct ovnact_handle_svc_check { + struct ovnact ovnact; + struct expr_field port; /* Logical port name. */ +}; + /* Internal use by the helpers below. */ void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); @@ -537,6 +544,14 @@ enum action_opcode { * MFF_LOG_INPORT. */ ACTION_OPCODE_BIND_VPORT, + + /* "handle_svc_check(port)"." + * + * Arguments are passed through the packet metadata and data, as follows: + * + * MFF_LOG_INPORT = port + */ + ACTION_OPCODE_HANDLE_SVC_CHECK, }; /* Header. */ diff --git a/lib/actions.c b/lib/actions.c index a999a4fda..586d7b75d 100644 --- a/lib/actions.c +++ b/lib/actions.c @@ -2814,6 +2814,46 @@ ovnact_bind_vport_free(struct ovnact_bind_vport *bp) free(bp->vport); } +static void +parse_handle_svc_check(struct action_context *ctx OVS_UNUSED) +{ + if (!lexer_force_match(ctx->lexer, LEX_T_LPAREN)) { + return; + } + + struct ovnact_handle_svc_check *svc_chk = + ovnact_put_HANDLE_SVC_CHECK(ctx->ovnacts); + action_parse_field(ctx, 0, false, &svc_chk->port); + lexer_force_match(ctx->lexer, LEX_T_RPAREN); +} + +static void +format_HANDLE_SVC_CHECK(const struct ovnact_handle_svc_check *svc_chk, + struct ds *s) +{ + ds_put_cstr(s, "handle_svc_check("); + expr_field_format(&svc_chk->port, s); + ds_put_cstr(s, ");"); +} + +static void +encode_HANDLE_SVC_CHECK(const struct ovnact_handle_svc_check *svc_chk, + const struct ovnact_encode_params *ep OVS_UNUSED, + struct ofpbuf *ofpacts) +{ + const struct arg args[] = { + { expr_resolve_field(&svc_chk->port), MFF_LOG_INPORT }, + }; + encode_setup_args(args, ARRAY_SIZE(args), ofpacts); + encode_controller_op(ACTION_OPCODE_HANDLE_SVC_CHECK, ofpacts); + encode_restore_args(args, ARRAY_SIZE(args), ofpacts); +} + +static void +ovnact_handle_svc_check_free(struct ovnact_handle_svc_check *sc OVS_UNUSED) +{ +} + /* Parses an assignment or exchange or put_dhcp_opts action. */ static void parse_set_action(struct action_context *ctx) @@ -2931,6 +2971,8 @@ parse_action(struct action_context *ctx) parse_trigger_event(ctx, ovnact_put_TRIGGER_EVENT(ctx->ovnacts)); } else if (lexer_match_id(ctx->lexer, "bind_vport")) { parse_bind_vport(ctx); + } else if (lexer_match_id(ctx->lexer, "handle_svc_check")) { + parse_handle_svc_check(ctx); } else { lexer_syntax_error(ctx->lexer, "expecting action"); } diff --git a/ovn-sb.xml b/ovn-sb.xml index 335f9031b..82167c488 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -2097,6 +2097,23 @@ tcp.flags = RST; set to P.

    + +
    handle_svc_check(P);
    +
    +

    + Parameters: logical port string field P. +

    + +

    + Handles the service monitor reply received from the VIF of + the logical port P. ovn-controller + periodically sends out the service monitor packets for the + services configured in the + table, and this action updates the status of those services. +

    + +

    Example: handle_svc_check(inport);

    +
    diff --git a/tests/ovn.at b/tests/ovn.at index 410f4b514..b30f12c9a 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -1468,6 +1468,19 @@ bind_vport("xyzzy",; bind_vport("xyzzy", inport; Syntax error at `;' expecting `)'. +# handle_svc_check +handle_svc_check(inport); + encodes as controller(userdata=00.00.00.12.00.00.00.00) + +handle_svc_check(outport); + encodes as push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],controller(userdata=00.00.00.12.00.00.00.00),pop:NXM_NX_REG14[] + +handle_svc_check(); + Syntax error at `)' expecting field name. + +handle_svc_check(reg0); + Cannot use numeric field reg0 where string field is required. + # Miscellaneous negative tests. ; Syntax error at `;'. diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c index ea64dc673..19b82e6a4 100644 --- a/utilities/ovn-trace.c +++ b/utilities/ovn-trace.c @@ -2221,6 +2221,9 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len, case OVNACT_BIND_VPORT: break; + + case OVNACT_HANDLE_SVC_CHECK: + break; } } ds_destroy(&s); From patchwork Tue Nov 5 09:23:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Numan Siddique X-Patchwork-Id: 1189450 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ovn.org Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 476klV5tczz9s4Y for ; Tue, 5 Nov 2019 20:24:54 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 02393198C; Tue, 5 Nov 2019 09:23:32 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 581CB198B for ; Tue, 5 Nov 2019 09:23:30 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from relay11.mail.gandi.net (relay11.mail.gandi.net [217.70.178.231]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id D977389E for ; Tue, 5 Nov 2019 09:23:26 +0000 (UTC) Received: from nummac.local (unknown [27.7.151.135]) (Authenticated sender: numans@ovn.org) by relay11.mail.gandi.net (Postfix) with ESMTPSA id CF931100017; Tue, 5 Nov 2019 09:23:23 +0000 (UTC) From: numans@ovn.org To: dev@openvswitch.org Date: Tue, 5 Nov 2019 14:53:16 +0530 Message-Id: <20191105092316.3658815-1-numans@ovn.org> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191105092215.3656232-1-numans@ovn.org> References: <20191105092215.3656232-1-numans@ovn.org> MIME-Version: 1.0 X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCH ovn v2 3/3] Send service monitor health checks X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: 
ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Numan Siddique ovn-controller will periodically sends out the service monitor packets for the services configured in the SB DB Service_Monitor table. This patch makes use of the action - handle_svc_check to handle the service monitor reply packets from the service. This patch supports IPv4 TCP and UDP service monitoring. For TCP services, it sends out a TCP SYN packet and expects TCP ACK packet in response. If the response is received on time, the status of the service is set to "online", otherwise it is set to "offline". For UDP services, it sends out a empty UDP packet and doesn't expect any reply. In case the service is down, the host running the service, sends out ICMP type 3 code 4 (destination unreachable) packet. If ovn-controller receives this ICMP packet, it sets the status of the service to "offline". Right now only IPv4 service monitoring is supported. An upcoming patch will add the support for IPv6. Acked-by: Mark Michelson Signed-off-by: Numan Siddique --- controller/ovn-controller.c | 2 + controller/pinctrl.c | 775 ++++++++++++++++++++++++++++++++-- controller/pinctrl.h | 2 + northd/ovn-northd.8.xml | 10 + northd/ovn-northd.c | 18 + tests/ovn.at | 119 ++++++ tests/system-common-macros.at | 1 + tests/system-ovn.at | 180 ++++++++ 8 files changed, 1081 insertions(+), 26 deletions(-) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 9ab98be5c..27cb4885b 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -2083,6 +2083,8 @@ main(int argc, char *argv[]) sbrec_dns_table_get(ovnsb_idl_loop.idl), sbrec_controller_event_table_get( ovnsb_idl_loop.idl), + sbrec_service_monitor_table_get( + ovnsb_idl_loop.idl), br_int, chassis, &ed_runtime_data.local_datapaths, &ed_runtime_data.active_tunnels); diff --git a/controller/pinctrl.c b/controller/pinctrl.c index a90ee73d6..8fc31d38a 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -38,6 +38,7 @@ #include "openvswitch/ofp-switch.h" #include "openvswitch/ofp-util.h" #include "openvswitch/vlog.h" +#include "lib/random.h" #include "lib/dhcp.h" #include "ovn-controller.h" @@ -282,6 +283,22 @@ static void run_put_vport_bindings( static void wait_put_vport_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn); static void pinctrl_handle_bind_vport(const struct flow *md, struct ofpbuf *userdata); +static void pinctrl_handle_svc_check(struct rconn *swconn, + const struct flow *ip_flow, + struct dp_packet *pkt_in, + const struct match *md); +static void init_svc_monitors(void); +static void destroy_svc_monitors(void); +static void sync_svc_monitors( + struct ovsdb_idl_txn *ovnsb_idl_txn, + const struct sbrec_service_monitor_table *svc_mon_table, + struct ovsdb_idl_index *sbrec_port_binding_by_name, + const struct sbrec_chassis *our_chassis) + OVS_REQUIRES(pinctrl_mutex); +static void svc_monitors_run(struct rconn *swconn, + long long int *svc_monitors_next_run_time) + OVS_REQUIRES(pinctrl_mutex); +static void svc_monitors_wait(long long int svc_monitors_next_run_time); COVERAGE_DEFINE(pinctrl_drop_put_mac_binding); COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map); @@ -444,6 +461,7 @@ pinctrl_init(void) init_event_table(); ip_mcast_snoop_init(); init_put_vport_bindings(); + init_svc_monitors(); pinctrl.br_int_name = NULL; pinctrl_handler_seq = seq_create(); pinctrl_main_seq = seq_create(); @@ -1981,6 +1999,13 @@ process_packet_in(struct rconn *swconn, const struct ofp_header *msg) ovs_mutex_unlock(&pinctrl_mutex); 
break; + case ACTION_OPCODE_HANDLE_SVC_CHECK: + ovs_mutex_lock(&pinctrl_mutex); + pinctrl_handle_svc_check(swconn, &headers, &packet, + &pin.flow_metadata); + ovs_mutex_unlock(&pinctrl_mutex); + break; + default: VLOG_WARN_RL(&rl, "unrecognized packet-in opcode %"PRIu32, ntohl(ah->opcode)); @@ -2050,6 +2075,7 @@ pinctrl_handler(void *arg_) static long long int send_garp_rarp_time = LLONG_MAX; /* Next multicast query (IGMP) in ms. */ static long long int send_mcast_query_time = LLONG_MAX; + static long long int svc_monitors_next_run_time = LLONG_MAX; swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP13_VERSION); @@ -2110,11 +2136,16 @@ pinctrl_handler(void *arg_) } } + ovs_mutex_lock(&pinctrl_mutex); + svc_monitors_run(swconn, &svc_monitors_next_run_time); + ovs_mutex_unlock(&pinctrl_mutex); + rconn_run_wait(swconn); rconn_recv_wait(swconn); send_garp_rarp_wait(send_garp_rarp_time); ipv6_ra_wait(send_ipv6_ra_time); ip_mcast_querier_wait(send_mcast_query_time); + svc_monitors_wait(svc_monitors_next_run_time); new_seq = seq_read(pinctrl_handler_seq); seq_wait(pinctrl_handler_seq, new_seq); @@ -2140,6 +2171,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_ip_multicast_opts, const struct sbrec_dns_table *dns_table, const struct sbrec_controller_event_table *ce_table, + const struct sbrec_service_monitor_table *svc_mon_table, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis, const struct hmap *local_datapaths, @@ -2174,6 +2206,8 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, sbrec_ip_multicast_opts); run_buffered_binding(sbrec_mac_binding_by_lport_ip, local_datapaths); + sync_svc_monitors(ovnsb_idl_txn, svc_mon_table, sbrec_port_binding_by_name, + chassis); ovs_mutex_unlock(&pinctrl_mutex); } @@ -2612,6 +2646,7 @@ pinctrl_destroy(void) destroy_put_vport_bindings(); destroy_dns_cache(); ip_mcast_snoop_destroy(); + destroy_svc_monitors(); seq_destroy(pinctrl_main_seq); seq_destroy(pinctrl_handler_seq); } @@ -3050,6 +3085,36 @@ send_garp_rarp(struct rconn *swconn, struct garp_rarp_data *garp_rarp, return garp_rarp->announce_time; } +static void +pinctrl_compose_ipv4(struct dp_packet *packet, struct eth_addr eth_src, + struct eth_addr eth_dst, ovs_be32 ipv4_src, + ovs_be32 ipv4_dst, uint8_t ip_proto, uint8_t ttl, + uint16_t ip_payload_len) +{ + dp_packet_clear(packet); + packet->packet_type = htonl(PT_ETH); + + struct eth_header *eh = dp_packet_put_zeros(packet, sizeof *eh); + eh->eth_dst = eth_dst; + eh->eth_src = eth_src; + + struct ip_header *nh = dp_packet_put_zeros(packet, sizeof *nh); + + eh->eth_type = htons(ETH_TYPE_IP); + dp_packet_set_l3(packet, nh); + nh->ip_ihl_ver = IP_IHL_VER(5, 4); + nh->ip_tot_len = htons(sizeof(struct ip_header) + ip_payload_len); + nh->ip_tos = IP_DSCP_CS6; + nh->ip_proto = ip_proto; + nh->ip_frag_off = htons(IP_DF); + + /* Setting tos and ttl to 0 and 1 respectively. */ + packet_set_ipv4(packet, ipv4_src, ipv4_dst, 0, ttl); + + nh->ip_csum = 0; + nh->ip_csum = csum(nh, sizeof *nh); +} + /* * Multicast snooping configuration. 
*/ @@ -3667,32 +3732,11 @@ ip_mcast_querier_send(struct rconn *swconn, struct ip_mcast_snoop *ip_ms, struct dp_packet packet; dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub); - - uint8_t ip_tos = 0; - uint8_t igmp_ttl = 1; - - dp_packet_clear(&packet); - packet.packet_type = htonl(PT_ETH); - - struct eth_header *eh = dp_packet_put_zeros(&packet, sizeof *eh); - eh->eth_dst = ip_ms->cfg.query_eth_dst; - eh->eth_src = ip_ms->cfg.query_eth_src; - - struct ip_header *nh = dp_packet_put_zeros(&packet, sizeof *nh); - - eh->eth_type = htons(ETH_TYPE_IP); - dp_packet_set_l3(&packet, nh); - nh->ip_ihl_ver = IP_IHL_VER(5, 4); - nh->ip_tot_len = htons(sizeof(struct ip_header) + - sizeof(struct igmpv3_query_header)); - nh->ip_tos = IP_DSCP_CS6; - nh->ip_proto = IPPROTO_IGMP; - nh->ip_frag_off = htons(IP_DF); - packet_set_ipv4(&packet, ip_ms->cfg.query_ipv4_src, - ip_ms->cfg.query_ipv4_dst, ip_tos, igmp_ttl); - - nh->ip_csum = 0; - nh->ip_csum = csum(nh, sizeof *nh); + pinctrl_compose_ipv4(&packet, ip_ms->cfg.query_eth_src, + ip_ms->cfg.query_eth_dst, + ip_ms->cfg.query_ipv4_src, + ip_ms->cfg.query_ipv4_dst, + IPPROTO_IGMP, 1, sizeof(struct igmpv3_query_header)); struct igmpv3_query_header *igh = dp_packet_put_zeros(&packet, sizeof *igh); @@ -4586,3 +4630,682 @@ pinctrl_handle_bind_vport( notify_pinctrl_main(); } + +enum svc_monitor_state { + SVC_MON_S_INIT, + SVC_MON_S_WAITING, + SVC_MON_S_ONLINE, + SVC_MON_S_OFFLINE, +}; + +enum svc_monitor_status { + SVC_MON_ST_UNKNOWN, + SVC_MON_ST_OFFLINE, + SVC_MON_ST_ONLINE, +}; + +enum svc_monitor_protocol { + SVC_MON_PROTO_TCP, + SVC_MON_PROTO_UDP, +}; + +/* Service monitor health checks. */ +struct svc_monitor { + struct hmap_node hmap_node; + struct ovs_list list_node; + + /* Should be accessed only within the main ovn-controller + * thread. */ + const struct sbrec_service_monitor *sb_svc_mon; + + /* key */ + struct in6_addr ip; + uint32_t dp_key; + uint32_t port_key; + uint32_t proto_port; /* tcp/udp port */ + + struct eth_addr ea; + long long int timestamp; + bool is_ip6; + + long long int wait_time; + long long int next_send_time; + + struct smap options; + /* The interval, in milliseconds, between service monitor checks. */ + int interval; + + /* The time, in milliseconds, after which the service monitor check + * times out. */ + int svc_timeout; + + /* The number of successful checks after which the service is + considered online. */ + int success_count; + int n_success; + + /* The number of failed checks after which the service is + considered offline.
*/ + int failure_count; + int n_failures; + + enum svc_monitor_protocol protocol; + enum svc_monitor_state state; + enum svc_monitor_status status; + struct dp_packet pkt; + + uint32_t seq_no; + ovs_be16 tp_src; + + bool delete; +}; + +static struct hmap svc_monitors_map; +static struct ovs_list svc_monitors; + +static void +init_svc_monitors(void) +{ + hmap_init(&svc_monitors_map); + ovs_list_init(&svc_monitors); +} + +static void +destroy_svc_monitors(void) +{ + struct svc_monitor *svc; + HMAP_FOR_EACH_POP (svc, hmap_node, &svc_monitors_map) { + + } + + hmap_destroy(&svc_monitors_map); + + LIST_FOR_EACH_POP (svc, list_node, &svc_monitors) { + smap_destroy(&svc->options); + free(svc); + } +} + + +static struct svc_monitor * +pinctrl_find_svc_monitor(uint32_t dp_key, uint32_t port_key, + const struct in6_addr *ip_key, uint32_t port, + enum svc_monitor_protocol protocol, + uint32_t hash) +{ + struct svc_monitor *svc; + HMAP_FOR_EACH_WITH_HASH (svc, hmap_node, hash, &svc_monitors_map) { + if (svc->dp_key == dp_key + && svc->port_key == port_key + && svc->proto_port == port + && IN6_ARE_ADDR_EQUAL(&svc->ip, ip_key) + && svc->protocol == protocol) { + return svc; + } + } + return NULL; +} + +static void +sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn, + const struct sbrec_service_monitor_table *svc_mon_table, + struct ovsdb_idl_index *sbrec_port_binding_by_name, + const struct sbrec_chassis *our_chassis) + OVS_REQUIRES(pinctrl_mutex) +{ + bool changed = false; + struct svc_monitor *svc_mon; + + LIST_FOR_EACH (svc_mon, list_node, &svc_monitors) { + svc_mon->delete = true; + } + + const struct sbrec_service_monitor *sb_svc_mon; + SBREC_SERVICE_MONITOR_TABLE_FOR_EACH (sb_svc_mon, svc_mon_table) { + const struct sbrec_port_binding *pb + = lport_lookup_by_name(sbrec_port_binding_by_name, + sb_svc_mon->logical_port); + if (!pb) { + continue; + } + + if (pb->chassis != our_chassis) { + continue; + } + + struct in6_addr ip_addr; + ovs_be32 ip4; + if (ip_parse(sb_svc_mon->ip, &ip4)) { + ip_addr = in6_addr_mapped_ipv4(ip4); + } else { + continue; + } + + struct eth_addr ea; + bool mac_found = false; + for (size_t i = 0; i < pb->n_mac; i++) { + struct lport_addresses laddrs; + if (!extract_lsp_addresses(pb->mac[i], &laddrs)) { + continue; + } + + for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) { + if (ip4 == laddrs.ipv4_addrs[j].addr) { + ea = laddrs.ea; + mac_found = true; + break; + } + } + + if (mac_found) { + break; + } + } + + if (!mac_found) { + continue; + } + + uint32_t dp_key = pb->datapath->tunnel_key; + uint32_t port_key = pb->tunnel_key; + uint32_t hash = + hash_bytes(&ip_addr, sizeof ip_addr, + hash_3words(dp_key, port_key, sb_svc_mon->port)); + + enum svc_monitor_protocol protocol; + if (!sb_svc_mon->protocol || strcmp(sb_svc_mon->protocol, "udp")) { + protocol = SVC_MON_PROTO_TCP; + } else { + protocol = SVC_MON_PROTO_UDP; + } + + svc_mon = pinctrl_find_svc_monitor(dp_key, port_key, &ip_addr, + sb_svc_mon->port, protocol, hash); + + if (!svc_mon) { + svc_mon = xmalloc(sizeof *svc_mon); + svc_mon->dp_key = dp_key; + svc_mon->port_key = port_key; + svc_mon->proto_port = sb_svc_mon->port; + svc_mon->ip = ip_addr; + svc_mon->is_ip6 = false; + svc_mon->state = SVC_MON_S_INIT; + svc_mon->status = SVC_MON_ST_UNKNOWN; + svc_mon->protocol = protocol; + + smap_init(&svc_mon->options); + svc_mon->interval = + smap_get_int(&svc_mon->options, "interval", 5) * 1000; + svc_mon->svc_timeout = + smap_get_int(&svc_mon->options, "timeout", 3) * 1000; + svc_mon->success_count = + 
smap_get_int(&svc_mon->options, "success_count", 1); + svc_mon->failure_count = + smap_get_int(&svc_mon->options, "failure_count", 1); + svc_mon->n_success = 0; + svc_mon->n_failures = 0; + + hmap_insert(&svc_monitors_map, &svc_mon->hmap_node, hash); + ovs_list_push_back(&svc_monitors, &svc_mon->list_node); + changed = true; + } + + svc_mon->sb_svc_mon = sb_svc_mon; + svc_mon->ea = ea; + if (!smap_equal(&svc_mon->options, &sb_svc_mon->options)) { + smap_destroy(&svc_mon->options); + smap_clone(&svc_mon->options, &sb_svc_mon->options); + svc_mon->interval = + smap_get_int(&svc_mon->options, "interval", 5) * 1000; + svc_mon->svc_timeout = + smap_get_int(&svc_mon->options, "timeout", 3) * 1000; + svc_mon->success_count = + smap_get_int(&svc_mon->options, "success_count", 1); + svc_mon->failure_count = + smap_get_int(&svc_mon->options, "failure_count", 1); + changed = true; + } + + svc_mon->delete = false; + } + + struct svc_monitor *next; + LIST_FOR_EACH_SAFE (svc_mon, next, list_node, &svc_monitors) { + if (svc_mon->delete) { + hmap_remove(&svc_monitors_map, &svc_mon->hmap_node); + ovs_list_remove(&svc_mon->list_node); + smap_destroy(&svc_mon->options); + free(svc_mon); + changed = true; + } else if (ovnsb_idl_txn) { + /* Update the status of the service monitor. */ + if (svc_mon->status != SVC_MON_ST_UNKNOWN) { + if (svc_mon->status == SVC_MON_ST_ONLINE) { + sbrec_service_monitor_set_status(svc_mon->sb_svc_mon, + "online"); + } else { + sbrec_service_monitor_set_status(svc_mon->sb_svc_mon, + "offline"); + } + } + } + } + + if (changed) { + notify_pinctrl_handler(); + } + +} + +static uint16_t +get_random_src_port(void) +{ + uint16_t random_src_port = random_uint16(); + while (random_src_port < 1024) { + random_src_port = random_uint16(); + } + + return random_src_port; +} + +static void +svc_monitor_send_tcp_health_check__(struct rconn *swconn, + struct svc_monitor *svc_mon, + uint16_t ctl_flags, + ovs_be32 tcp_seq, + ovs_be32 tcp_ack, + ovs_be16 tcp_src) +{ + if (svc_mon->is_ip6) { + return; + } + + /* Compose a TCP-SYN packet. 
*/ + uint64_t packet_stub[128 / 8]; + struct dp_packet packet; + + struct eth_addr eth_src; + eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src); + ovs_be32 ip4_src; + ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src); + + dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub); + pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea, + ip4_src, in6_addr_get_mapped_ipv4(&svc_mon->ip), + IPPROTO_TCP, 63, TCP_HEADER_LEN); + + struct tcp_header *th = dp_packet_put_zeros(&packet, sizeof *th); + dp_packet_set_l4(&packet, th); + th->tcp_dst = htons(svc_mon->proto_port); + th->tcp_src = tcp_src; + + th->tcp_ctl = htons((5 << 12) | ctl_flags); + put_16aligned_be32(&th->tcp_seq, tcp_seq); + put_16aligned_be32(&th->tcp_ack, tcp_ack); + + th->tcp_winsz = htons(65160); + + uint32_t csum; + csum = packet_csum_pseudoheader(dp_packet_l3(&packet)); + csum = csum_continue(csum, th, dp_packet_size(&packet) - + ((const unsigned char *)th - + (const unsigned char *)dp_packet_eth(&packet))); + th->tcp_csum = csum_finish(csum); + + uint64_t ofpacts_stub[4096 / 8]; + struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub); + enum ofp_version version = rconn_get_version(swconn); + put_load(svc_mon->dp_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts); + put_load(svc_mon->port_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts); + put_load(1, MFF_LOG_FLAGS, MLF_LOCAL_ONLY, 1, &ofpacts); + struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts); + resubmit->in_port = OFPP_CONTROLLER; + resubmit->table_id = OFTABLE_LOCAL_OUTPUT; + + struct ofputil_packet_out po = { + .packet = dp_packet_data(&packet), + .packet_len = dp_packet_size(&packet), + .buffer_id = UINT32_MAX, + .ofpacts = ofpacts.data, + .ofpacts_len = ofpacts.size, + }; + match_set_in_port(&po.flow_metadata, OFPP_CONTROLLER); + enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version); + queue_msg(swconn, ofputil_encode_packet_out(&po, proto)); + dp_packet_uninit(&packet); + ofpbuf_uninit(&ofpacts); +} + +static void +svc_monitor_send_udp_health_check(struct rconn *swconn, + struct svc_monitor *svc_mon, + ovs_be16 udp_src) +{ + if (svc_mon->is_ip6) { + return; + } + + struct eth_addr eth_src; + eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src); + ovs_be32 ip4_src; + ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src); + + uint64_t packet_stub[128 / 8]; + struct dp_packet packet; + dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub); + pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea, + ip4_src, in6_addr_get_mapped_ipv4(&svc_mon->ip), + IPPROTO_UDP, 63, UDP_HEADER_LEN + 8); + + struct udp_header *uh = dp_packet_put_zeros(&packet, sizeof *uh); + dp_packet_set_l4(&packet, uh); + uh->udp_dst = htons(svc_mon->proto_port); + uh->udp_src = udp_src; + uh->udp_len = htons(UDP_HEADER_LEN + 8); + uh->udp_csum = 0; + dp_packet_put_zeros(&packet, 8); + + uint64_t ofpacts_stub[4096 / 8]; + struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub); + enum ofp_version version = rconn_get_version(swconn); + put_load(svc_mon->dp_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts); + put_load(svc_mon->port_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts); + put_load(1, MFF_LOG_FLAGS, MLF_LOCAL_ONLY, 1, &ofpacts); + struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts); + resubmit->in_port = OFPP_CONTROLLER; + resubmit->table_id = OFTABLE_LOCAL_OUTPUT; + + struct ofputil_packet_out po = { + .packet = dp_packet_data(&packet), + .packet_len = dp_packet_size(&packet), + .buffer_id = UINT32_MAX, + .ofpacts = ofpacts.data, + .ofpacts_len = ofpacts.size, + };
+ match_set_in_port(&po.flow_metadata, OFPP_CONTROLLER); + enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version); + queue_msg(swconn, ofputil_encode_packet_out(&po, proto)); + dp_packet_uninit(&packet); + ofpbuf_uninit(&ofpacts); +} + +static void +svc_monitor_send_health_check(struct rconn *swconn, + struct svc_monitor *svc_mon) +{ + if (svc_mon->protocol == SVC_MON_PROTO_TCP) { + svc_mon->seq_no = random_uint32(); + svc_mon->tp_src = htons(get_random_src_port()); + svc_monitor_send_tcp_health_check__(swconn, svc_mon, + TCP_SYN, + htonl(svc_mon->seq_no), htonl(0), + svc_mon->tp_src); + } else { + if (!svc_mon->tp_src) { + svc_mon->tp_src = htons(get_random_src_port()); + } + svc_monitor_send_udp_health_check(swconn, svc_mon, svc_mon->tp_src); + } + + svc_mon->wait_time = time_msec() + svc_mon->svc_timeout; + svc_mon->state = SVC_MON_S_WAITING; +} + +static void +svc_monitors_run(struct rconn *swconn, + long long int *svc_monitors_next_run_time) + OVS_REQUIRES(pinctrl_mutex) +{ + *svc_monitors_next_run_time = LLONG_MAX; + struct svc_monitor *svc_mon; + LIST_FOR_EACH (svc_mon, list_node, &svc_monitors) { + char ip_[INET6_ADDRSTRLEN + 1]; + memset(ip_, 0, INET6_ADDRSTRLEN + 1); + ipv6_string_mapped(ip_, &svc_mon->ip); + + long long int current_time = time_msec(); + long long int next_run_time = LLONG_MAX; + enum svc_monitor_status old_status = svc_mon->status; + switch (svc_mon->state) { + case SVC_MON_S_INIT: + svc_monitor_send_health_check(swconn, svc_mon); + next_run_time = svc_mon->wait_time; + break; + + case SVC_MON_S_WAITING: + if (current_time > svc_mon->wait_time) { + if (svc_mon->protocol == SVC_MON_PROTO_TCP) { + svc_mon->n_failures++; + svc_mon->state = SVC_MON_S_OFFLINE; + } else { + svc_mon->n_success++; + svc_mon->state = SVC_MON_S_ONLINE; + } + svc_mon->next_send_time = current_time + svc_mon->interval; + next_run_time = svc_mon->next_send_time; + } else { + next_run_time = svc_mon->wait_time; + } + break; + + case SVC_MON_S_ONLINE: + if (svc_mon->n_success >= svc_mon->success_count) { + svc_mon->status = SVC_MON_ST_ONLINE; + svc_mon->n_success = 0; + } + if (current_time >= svc_mon->next_send_time) { + svc_monitor_send_health_check(swconn, svc_mon); + next_run_time = svc_mon->wait_time; + } else { + next_run_time = svc_mon->next_send_time; + } + break; + + case SVC_MON_S_OFFLINE: + if (svc_mon->n_failures >= svc_mon->failure_count) { + svc_mon->status = SVC_MON_ST_OFFLINE; + svc_mon->n_failures = 0; + } + + if (current_time >= svc_mon->next_send_time) { + svc_monitor_send_health_check(swconn, svc_mon); + next_run_time = svc_mon->wait_time; + } else { + next_run_time = svc_mon->next_send_time; + } + break; + + default: + OVS_NOT_REACHED(); + } + + if (*svc_monitors_next_run_time > next_run_time) { + *svc_monitors_next_run_time = next_run_time; + } + + if (old_status != svc_mon->status) { + /* Notify the main thread to update the status in the SB DB.
*/ + notify_pinctrl_main(); + } + } +} + +static void +svc_monitors_wait(long long int svc_monitors_next_run_time) +{ + if (!ovs_list_is_empty(&svc_monitors)) { + poll_timer_wait_until(svc_monitors_next_run_time); + } +} + +static bool +pinctrl_handle_tcp_svc_check(struct rconn *swconn, + struct dp_packet *pkt_in, + struct svc_monitor *svc_mon) +{ + struct tcp_header *th = dp_packet_l4(pkt_in); + + if (!th) { + return false; + } + + uint32_t tcp_seq = ntohl(get_16aligned_be32(&th->tcp_seq)); + uint32_t tcp_ack = ntohl(get_16aligned_be32(&th->tcp_ack)); + + if (th->tcp_dst != svc_mon->tp_src) { + return false; + } + + if (tcp_ack != (svc_mon->seq_no + 1)) { + return false; + } + + /* Check for SYN flag and Ack flag. */ + if ((TCP_FLAGS(th->tcp_ctl) & (TCP_SYN | TCP_ACK)) + == (TCP_SYN | TCP_ACK)) { + svc_mon->n_success++; + svc_mon->state = SVC_MON_S_ONLINE; + + /* Send RST-ACK packet. */ + svc_monitor_send_tcp_health_check__(swconn, svc_mon, TCP_RST | TCP_ACK, + htonl(tcp_ack + 1), + htonl(tcp_seq + 1), th->tcp_dst); + /* Calculate next_send_time. */ + svc_mon->next_send_time = time_msec() + svc_mon->interval; + return true; + } + + /* Check if RST flag is set. */ + if (TCP_FLAGS(th->tcp_ctl) & TCP_RST) { + svc_mon->n_failures++; + svc_mon->state = SVC_MON_S_OFFLINE; + + /* Calculate next_send_time. */ + svc_mon->next_send_time = time_msec() + svc_mon->interval; + return false; + } + + return false; +} + +static void +pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow, + struct dp_packet *pkt_in, const struct match *md) +{ + uint32_t dp_key = ntohll(md->flow.metadata); + uint32_t port_key = md->flow.regs[MFF_LOG_INPORT - MFF_REG0]; + struct in6_addr ip_addr; + struct eth_header *in_eth = dp_packet_data(pkt_in); + struct ip_header *in_ip = dp_packet_l3(pkt_in); + + if (in_ip->ip_proto != IPPROTO_TCP && in_ip->ip_proto != IPPROTO_ICMP) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, + "handle service check: Unsupported protocol - [%x]", + in_ip->ip_proto); + return; + } + + uint16_t in_ip_len = ntohs(in_ip->ip_tot_len); + if (in_ip_len < IP_HEADER_LEN) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, + "IP packet with invalid length (%u)", + in_ip_len); + return; + } + + if (in_eth->eth_type == htons(ETH_TYPE_IP)) { + ip_addr = in6_addr_mapped_ipv4(ip_flow->nw_src); + } else { + ip_addr = ip_flow->ipv6_dst; + } + + if (in_ip->ip_proto == IPPROTO_TCP) { + uint32_t hash = + hash_bytes(&ip_addr, sizeof ip_addr, + hash_3words(dp_key, port_key, ntohs(ip_flow->tp_src))); + + struct svc_monitor *svc_mon = + pinctrl_find_svc_monitor(dp_key, port_key, &ip_addr, + ntohs(ip_flow->tp_src), + SVC_MON_PROTO_TCP, hash); + if (!svc_mon) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "handle service check: Service monitor " + "not found"); + return; + } + pinctrl_handle_tcp_svc_check(swconn, pkt_in, svc_mon); + } else { + /* It's ICMP packet. 
*/ + struct icmp_header *ih = dp_packet_l4(pkt_in); + if (!ih) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "ICMPv4 packet with invalid header"); + return; + } + + if (ih->icmp_type != ICMP4_DST_UNREACH || ih->icmp_code != 3) { + return; + } + + const char *end = + (char *)dp_packet_l4(pkt_in) + dp_packet_l4_size(pkt_in); + + const struct ip_header *orig_ip_hr = + dp_packet_get_icmp_payload(pkt_in); + if (!orig_ip_hr) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "Original IP datagram not present in " + "ICMP packet"); + return; + } + + if (ntohs(orig_ip_hr->ip_tot_len) != + (IP_HEADER_LEN + UDP_HEADER_LEN + 8)) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "Invalid original IP datagram length present " + "in ICMP packet"); + return; + } + + struct udp_header *orig_uh = (struct udp_header *) (orig_ip_hr + 1); + if ((char *)orig_uh >= end) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "Invalid UDP header in the original " + "IP datagram"); + return; + } + + uint32_t hash = + hash_bytes(&ip_addr, sizeof ip_addr, + hash_3words(dp_key, port_key, ntohs(orig_uh->udp_dst))); + + struct svc_monitor *svc_mon = + pinctrl_find_svc_monitor(dp_key, port_key, &ip_addr, + ntohs(orig_uh->udp_dst), + SVC_MON_PROTO_UDP, hash); + if (!svc_mon) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "handle service check: Service monitor not " + "found for ICMP packet"); + return; + } + + if (orig_uh->udp_src != svc_mon->tp_src) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "handle service check: UDP src port doesn't " + "match in the Original IP datagram of ICMP packet"); + return; + } + + /* The UDP service monitor is down. */ + svc_mon->n_failures++; + svc_mon->state = SVC_MON_S_OFFLINE; + + /* Calculate next_send_time. */ + svc_mon->next_send_time = time_msec() + svc_mon->interval; + } +} diff --git a/controller/pinctrl.h b/controller/pinctrl.h index 80da28d34..8fa4baae9 100644 --- a/controller/pinctrl.h +++ b/controller/pinctrl.h @@ -30,6 +30,7 @@ struct ovsrec_bridge; struct sbrec_chassis; struct sbrec_dns_table; struct sbrec_controller_event_table; +struct sbrec_service_monitor_table; void pinctrl_init(void); void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, @@ -42,6 +43,7 @@ void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_ip_multicast_opts, const struct sbrec_dns_table *, const struct sbrec_controller_event_table *, + const struct sbrec_service_monitor_table *, const struct ovsrec_bridge *, const struct sbrec_chassis *, const struct hmap *local_datapaths, const struct sset *active_tunnels); diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 2e38e2d90..78fcf49c2 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -1009,6 +1009,16 @@ output;

      +
    • + A priority-110 flow that matches + eth.dst == E on all logical switch + datapaths and applies the action handle_svc_check(inport), + where E is the service monitor MAC defined in the + options:svc_monitor_mac column of the NB_Global table.
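For example, with options:svc_monitor_mac set to 12:34:56:78:9a:bc (an arbitrary illustrative value), the resulting logical flow would look roughly like:
  table=17(ls_in_l2_lkup  ), priority=110  , match=(eth.dst == 12:34:56:78:9a:bc), action=(handle_svc_check(inport);)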
    • +
    • A priority-100 flow that punts all IGMP packets to ovn-controller if IGMP snooping is enabled on the diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index b0f513de6..fd5f60306 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -5993,6 +5993,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, } } + char *svc_check_match = xasprintf("eth.dst == %s", svc_monitor_mac); /* Ingress table 17: Destination lookup, broadcast and multicast handling * (priority 70 - 100). */ HMAP_FOR_EACH (od, key_node, datapaths) { @@ -6000,6 +6001,9 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, continue; } + ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 110, svc_check_match, + "handle_svc_check(inport);"); + struct mcast_switch_info *mcast_sw_info = &od->mcast_info.sw; if (mcast_sw_info->enabled) { @@ -6059,6 +6063,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 70, "eth.mcast", "outport = \""MC_FLOOD"\"; output;"); } + free(svc_check_match); /* Ingress table 17: Add IP multicast flows learnt from IGMP * (priority 90). */ @@ -10298,6 +10303,11 @@ static const char *rbac_mac_binding_auth[] = static const char *rbac_mac_binding_update[] = {"logical_port", "ip", "mac", "datapath"}; +static const char *rbac_svc_monitor_auth[] = + {""}; +static const char *rbac_svc_monitor_auth_update[] = + {"status"}; + static struct rbac_perm_cfg { const char *table; const char **auth; @@ -10339,6 +10349,14 @@ static struct rbac_perm_cfg { .update = rbac_mac_binding_update, .n_update = ARRAY_SIZE(rbac_mac_binding_update), .row = NULL + },{ + .table = "Service_Monitor", + .auth = rbac_svc_monitor_auth, + .n_auth = ARRAY_SIZE(rbac_svc_monitor_auth), + .insdel = false, + .update = rbac_svc_monitor_auth_update, + .n_update = ARRAY_SIZE(rbac_svc_monitor_auth_update), + .row = NULL },{ .table = NULL, .auth = NULL, diff --git a/tests/ovn.at b/tests/ovn.at index b30f12c9a..da96d1508 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -16687,5 +16687,124 @@ as hv4 ovs-ofctl show br-phys as hv4 ovs-appctl fdb/show br-phys OVN_CLEANUP([hv1],[hv2],[hv3],[hv4]) +AT_CLEANUP + +AT_SETUP([ovn -- Load balancer health checks]) +AT_KEYWORDS([lb]) +ovn_start + +net_add n1 + +sim_add hv1 +as hv1 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.1 +ovs-vsctl -- add-port br-int hv1-vif1 -- \ + set interface hv1-vif1 external-ids:iface-id=sw0-p1 \ + options:tx_pcap=hv1/vif1-tx.pcap \ + options:rxq_pcap=hv1/vif1-rx.pcap \ + ofport-request=1 +ovs-vsctl -- add-port br-int hv1-vif2 -- \ + set interface hv1-vif2 external-ids:iface-id=sw0-p2 \ + options:tx_pcap=hv1/vif2-tx.pcap \ + options:rxq_pcap=hv1/vif2-rx.pcap \ + ofport-request=2 + +sim_add hv2 +as hv2 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.2 +ovs-vsctl -- add-port br-int hv2-vif1 -- \ + set interface hv2-vif1 external-ids:iface-id=sw1-p1 \ + options:tx_pcap=hv2/vif1-tx.pcap \ + options:rxq_pcap=hv2/vif1-rx.pcap \ + ofport-request=1 + +ovn-nbctl ls-add sw0 + +ovn-nbctl lsp-add sw0 sw0-p1 +ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3" +ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3" + +ovn-nbctl lsp-add sw0 sw0-p2 +ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4" +ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4" + +# Create the second logical switch with one port +ovn-nbctl ls-add sw1 +ovn-nbctl lsp-add sw1 sw1-p1 +ovn-nbctl lsp-set-addresses sw1-p1 "40:54:00:00:00:03 20.0.0.3" +ovn-nbctl lsp-set-port-security
sw1-p1 "40:54:00:00:00:03 20.0.0.3" +# Create a logical router and attach both logical switches +ovn-nbctl lr-add lr0 +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24 +ovn-nbctl lsp-add sw0 sw0-lr0 +ovn-nbctl lsp-set-type sw0-lr0 router +ovn-nbctl lsp-set-addresses sw0-lr0 router +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0 + +ovn-nbctl lrp-add lr0 lr0-sw1 00:00:00:00:ff:02 20.0.0.1/24 +ovn-nbctl lsp-add sw1 sw1-lr0 +ovn-nbctl lsp-set-type sw1-lr0 router +ovn-nbctl lsp-set-addresses sw1-lr0 router +ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1 + +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80,20.0.0.3:80 + +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.2 +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:20.0.0.3=sw1-p1:20.0.0.2 + +ovn-nbctl --wait=sb -- --id=@hc create \ +Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ +health_check @hc + +ovn-nbctl --wait=sb ls-lb-add sw0 lb1 +ovn-nbctl --wait=sb ls-lb-add sw1 lb1 +ovn-nbctl --wait=sb lr-lb-add lr0 lb1 + +OVN_POPULATE_ARP +ovn-nbctl --wait=hv sync + +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns _uuid find \ +service_monitor | sed '/^$/d' | wc -l`]) + +ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(ct_lb(10.0.0.3:80,20.0.0.3:80);) +]) + +# get the svc monitor mac. +svc_mon_src_mac=`ovn-nbctl get NB_Global . options:svc_monitor_mac | \ +sed s/":"//g | sed s/\"//g` + +OVS_WAIT_UNTIL( + [test 1 = `$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | \ +grep "505400000003${svc_mon_src_mac}" | wc -l`] +) + +OVS_WAIT_UNTIL( + [test 1 = `$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap | \ +grep "405400000003${svc_mon_src_mac}" | wc -l`] +) + +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns status find \ +service_monitor | grep offline | wc -l`]) + +OVS_WAIT_UNTIL( + [test 2 = `$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | \ +grep "505400000003${svc_mon_src_mac}" | wc -l`] +) + +OVS_WAIT_UNTIL( + [test 2 = `$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap | \ +grep "405400000003${svc_mon_src_mac}" | wc -l`] +) + +ovn-sbctl dump-flows sw0 | grep "ip4.dst == 10.0.0.10 && tcp.dst == 80" \ +| grep priority=120 > lflows.txt +AT_CHECK([cat lflows.txt], [0], [dnl + table=10(ls_in_stateful ), priority=120 , match=(ct.new && ip4.dst == 10.0.0.10 && tcp.dst == 80), action=(drop;) +]) + +OVN_CLEANUP([hv1], [hv2]) AT_CLEANUP diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at index 64bf5ec63..c8fa6f03f 100644 --- a/tests/system-common-macros.at +++ b/tests/system-common-macros.at @@ -267,6 +267,7 @@ m4_define([OVS_CHECK_FIREWALL], # m4_define([OVS_START_L7], [PIDFILE=$(mktemp $2XXX.pid) + echo $PIDFILE > l7_pid_file NETNS_DAEMONIZE([$1], [[$PYTHON $srcdir/test-l7.py $2]], [$PIDFILE]) dnl netstat doesn't print http over IPv6 as "http6"; drop the number. 
diff --git a/tests/system-ovn.at b/tests/system-ovn.at index b3f90aae2..7d1c65d85 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -2523,3 +2523,183 @@ as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d /connection dropped.*/d"]) AT_CLEANUP + +AT_SETUP([ovn -- Load balancer health checks]) +AT_KEYWORDS([lb]) +ovn_start + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-int]) + +# Set external-ids in br-int needed for ovn-controller +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller + +ovn-nbctl ls-add sw0 + +ovn-nbctl lsp-add sw0 sw0-p1 +ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3" +ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3" + +ovn-nbctl lsp-add sw0 sw0-p2 +ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4" +ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4" + +# Create the second logical switch with one port +ovn-nbctl ls-add sw1 +ovn-nbctl lsp-add sw1 sw1-p1 +ovn-nbctl lsp-set-addresses sw1-p1 "40:54:00:00:00:03 20.0.0.3" +ovn-nbctl lsp-set-port-security sw1-p1 "40:54:00:00:00:03 20.0.0.3" + +# Create a logical router and attach both logical switches +ovn-nbctl lr-add lr0 +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24 +ovn-nbctl lsp-add sw0 sw0-lr0 +ovn-nbctl lsp-set-type sw0-lr0 router +ovn-nbctl lsp-set-addresses sw0-lr0 router +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0 + +ovn-nbctl lrp-add lr0 lr0-sw1 00:00:00:00:ff:02 20.0.0.1/24 +ovn-nbctl lsp-add sw1 sw1-lr0 +ovn-nbctl lsp-set-type sw1-lr0 router +ovn-nbctl lsp-set-addresses sw1-lr0 router +ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1 + +ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80,20.0.0.3:80 + +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.2 +ovn-nbctl --wait=sb set load_balancer . ip_port_mappings:20.0.0.3=sw1-p1:20.0.0.2 + +ovn-nbctl --wait=sb -- --id=@hc create \ +Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ +health_check @hc + +ovn-nbctl --wait=sb ls-lb-add sw0 lb1 +ovn-nbctl --wait=sb ls-lb-add sw1 lb1 +ovn-nbctl --wait=sb lr-lb-add lr0 lb1 + +OVN_POPULATE_ARP +ovn-nbctl --wait=hv sync + +ADD_NAMESPACES(sw0-p1) +ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \ + "10.0.0.1") + +ADD_NAMESPACES(sw1-p1) +ADD_VETH(sw1-p1, sw1-p1, br-int, "20.0.0.3/24", "40:54:00:00:00:03", \ + "20.0.0.1") + +ADD_NAMESPACES(sw0-p2) +ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \ + "10.0.0.1") + +# Wait until all the services are set to offline. +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns status find \ +service_monitor | sed '/^$/d' | grep offline | wc -l`]) + +# Start webservers in 'sw0-p1' and 'sw1-p1'. +OVS_START_L7([sw0-p1], [http]) +sw0_p1_pid_file=`cat l7_pid_file` +OVS_START_L7([sw1-p1], [http]) + +# Wait until the services are set to online. 
+OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns status find \ +service_monitor | sed '/^$/d' | grep online | wc -l`]) + +OVS_WAIT_UNTIL( + [ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 | grep "ip4.dst == 10.0.0.10" > lflows.txt + test 1 = `cat lflows.txt | grep "ct_lb(10.0.0.3:80,20.0.0.3:80)" | wc -l`] +) + +# From sw0-p2 send traffic to vip - 10.0.0.10 +for i in `seq 1 20`; do + echo Request $i + ovn-sbctl list service_monitor + NS_CHECK_EXEC([sw0-p2], [wget 10.0.0.10 -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) +done + +dnl Each server should have at least one connection. +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.0.0.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl +tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=,dport=),reply=(src=10.0.0.3,dst=10.0.0.4,sport=,dport=),zone=,protoinfo=(state=) +tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=,dport=),reply=(src=20.0.0.3,dst=10.0.0.4,sport=,dport=),zone=,protoinfo=(state=) +]) + +# Stop the webserver in sw0-p1 +kill `cat $sw0_p1_pid_file` + +# Wait until service_monitor for sw0-p1 is set to offline +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl --bare --columns status find \ +service_monitor logical_port=sw0-p1 | sed '/^$/d' | grep offline | wc -l`]) + +OVS_WAIT_UNTIL( + [ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 | grep "ip4.dst == 10.0.0.10" > lflows.txt + test 1 = `cat lflows.txt | grep "ct_lb(20.0.0.3:80)" | wc -l`] +) + +ovs-appctl dpctl/flush-conntrack +# From sw0-p2 send traffic to vip - 10.0.0.10 +for i in `seq 1 20`; do + echo Request $i + NS_CHECK_EXEC([sw0-p2], [wget 10.0.0.10 -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) +done + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.0.0.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl +tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=,dport=),reply=(src=20.0.0.3,dst=10.0.0.4,sport=,dport=),zone=,protoinfo=(state=) +]) + +# Create udp load balancer. +ovn-nbctl lb-add lb2 10.0.0.10:80 10.0.0.3:80,20.0.0.3:80 udp +lb_udp=`ovn-nbctl lb-list | grep udp | awk '{print $1}'` + +echo "lb udp uuid = $lb_udp" + +ovn-nbctl list load_balancer + +ovn-nbctl --wait=sb set load_balancer $lb_udp ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.2 +ovn-nbctl --wait=sb set load_balancer $lb_udp ip_port_mappings:20.0.0.3=sw1-p1:20.0.0.2 + +ovn-nbctl --wait=sb -- --id=@hc create \ +Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer $lb_udp \ +health_check @hc + +ovn-nbctl --wait=sb ls-lb-add sw0 lb2 +ovn-nbctl --wait=sb ls-lb-add sw1 lb2 +ovn-nbctl --wait=sb lr-lb-add lr0 lb2 + +sleep 10 + +ovn-nbctl list load_balancer +echo "*******Next is health check*******" +ovn-nbctl list Load_Balancer_Health_Check +echo "********************" +ovn-sbctl list service_monitor + +# Wait until the udp service_monitors are set to offline +OVS_WAIT_UNTIL([test 2 = `ovn-sbctl --bare --columns status find \ +service_monitor protocol=udp | sed '/^$/d' | grep offline | wc -l`]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d +/connection dropped.*/d"]) + +AT_CLEANUP
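For reviewers who want to exercise the series by hand, the sketch below condenses the CMS-side workflow from the tests above. It is only a sketch: the names (lb1, sw0-p1) and addresses are taken from the tests, and the option keys (interval, timeout, success_count, failure_count) are the ones read by sync_svc_monitors(), shown here with the defaults the code assumes.

  # Attach a health check to lb1's VIP.  interval/timeout are in seconds;
  # success_count/failure_count are numbers of consecutive checks.
  ovn-nbctl --wait=sb -- --id=@hc create Load_Balancer_Health_Check \
      vip="10.0.0.10\:80" options:interval=5 options:timeout=3 \
      options:success_count=1 options:failure_count=1 \
      -- add Load_Balancer lb1 health_check @hc

  # Tell OVN which logical port hosts each backend and the source IP to
  # probe it from.
  ovn-nbctl set Load_Balancer lb1 ip_port_mappings:10.0.0.3=sw0-p1:10.0.0.2

  # ovn-controller probes each backend and reports the result here; only
  # "online" backends are kept in the ct_lb() action.
  ovn-sbctl --columns logical_port,ip,port,protocol,status list service_monitor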