From patchwork Fri Jul 29 06:26:17 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Justin Pettit
X-Patchwork-Id: 653997
From: Justin Pettit
To: dev@openvswitch.org
Date: Thu, 28 Jul 2016 23:26:17 -0700
Message-Id: <1469773580-33112-7-git-send-email-jpettit@ovn.org>
In-Reply-To: <1469773580-33112-1-git-send-email-jpettit@ovn.org>
References: <1469773580-33112-1-git-send-email-jpettit@ovn.org>
Subject: [ovs-dev] [IPv6 v2 07/10] ovn-northd: Implement basic IPv6 routing.

This commit only supports static MAC bindings. A future commit will add support for dynamic IPv6/MAC bindings. It has a few other limitations described in "ovn/TODO".
Signed-off-by: Justin Pettit Acked-by: Ben Pfaff --- ovn/TODO | 19 +- ovn/northd/ovn-northd.8.xml | 196 +++++++++++++++--- ovn/northd/ovn-northd.c | 476 +++++++++++++++++++++++++++++++++----------- 3 files changed, 536 insertions(+), 155 deletions(-) diff --git a/ovn/TODO b/ovn/TODO index fd15f5d..dab8648 100644 --- a/ovn/TODO +++ b/ovn/TODO @@ -40,11 +40,24 @@ the "arp" action, and an action for generating ** IPv6 -*** ND versus ARP +*** Drop invalid source IPv6 addresses -*** IPv6 routing +*** Don't forward non-routable addresses -*** ICMPv6 +*** ICMPv6 action + +Similar to the ICMPv4 action, ICMPv6 messages should be generated. + +*** Neighbor Discovery + +**** ND Router Advertisements + +The router ports should periodically send out ND Router Advertisements +and respond to Router Solicitations. + +**** Learn MAC bindings on ND Solicitations + +**** Properly set RSO flags ** Dynamic IP to MAC bindings diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 85e0e66..6d9b6ca 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -412,10 +412,10 @@ -

Ingress Table 9: ARP responder

+

Ingress Table 9: ARP/ND responder

- This table implements ARP responder for known IPs. It contains these + This table implements ARP/ND responder for known IPs. It contains these logical flows:

@@ -427,8 +427,8 @@
  • - Priority-50 flows that matches ARP requests to each known IP address - A of logical port P, and respond with ARP + Priority-50 flows that match ARP requests to each known IP address + A of every logical router port, and respond with ARP replies directly with corresponding Ethernet address E:

    @@ -440,7 +440,7 @@ arp.tha = arp.sha; arp.sha = E; arp.tpa = arp.spa; arp.spa = A; -outport = P; +outport = inport; inport = ""; /* Allow sending out inport. */ output; @@ -452,6 +452,33 @@ output;
  • +

    + Priority-50 flows that match IPv6 ND neighbor solicitations to + each known IP address A (and A's + solicited node address) of every logical router port, and + respond with neighbor advertisements directly with + corresponding Ethernet address E: +

    + +
    +nd_na {
    +    eth.src = E;
    +    ip6.src = A;
    +    nd.target = A;
    +    nd.tll = E;
    +    outport = inport;
    +    inport = ""; /* Allow sending out inport. */
    +    output;
    +};
    +        
    + +

    + These flows are omitted for logical ports (other than router ports) + that are down. +

    +
  • + +
  • One priority-0 fallback flow that matches all packets and advances to the next table.
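The ND responder flows above match on both the unicast address A and its solicited-node multicast address. As a minimal sketch of how that second address is derived (RFC 4291: the prefix ff02::1:ff00:0/104 followed by the low-order 24 bits of the unicast address), the following standalone C uses two hypothetical helpers, `solicited_node_addr()` and `solicited_node_str()`. They are illustrations only and are not the OVS functions that fill in `sn_addr_s`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper (not the OVS API): derives the solicited-node
 * multicast address ff02::1:ffXX:XXXX for the unicast address 'addr'. */
static struct in6_addr
solicited_node_addr(const struct in6_addr *addr)
{
    struct in6_addr sn;

    memset(&sn, 0, sizeof sn);
    sn.s6_addr[0] = 0xff;
    sn.s6_addr[1] = 0x02;
    sn.s6_addr[11] = 0x01;
    sn.s6_addr[12] = 0xff;
    /* Copy the low-order 24 bits of the unicast address. */
    memcpy(&sn.s6_addr[13], &addr->s6_addr[13], 3);
    return sn;
}

/* Hypothetical helper: formats the solicited-node address of the textual
 * IPv6 address 'ip_s' into 'buf'.  Returns 0 on success, -1 if 'ip_s'
 * does not parse or cannot be formatted. */
static int
solicited_node_str(const char *ip_s, char *buf, size_t size)
{
    struct in6_addr addr, sn;

    assert(size >= INET6_ADDRSTRLEN);
    if (inet_pton(AF_INET6, ip_s, &addr) != 1) {
        return -1;
    }
    sn = solicited_node_addr(&addr);
    return inet_ntop(AF_INET6, &sn, buf, (socklen_t) size) ? 0 : -1;
}
```

For a port address of fd00::1234:5678, this yields ff02::1:ff34:5678, which is the second member of the `ip6.dst == {A, S}` set in the match.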
  • @@ -712,7 +739,8 @@ output; network source or destination)
  • - ip4.src is any IP address owned by the router. + ip4.src or ip6.src is any IP + address owned by the router.
  • ip4.src is the broadcast address of any IP network @@ -725,13 +753,19 @@ output;

    ICMP echo reply. These flows reply to ICMP echo requests received for the router's IP address. Let A be an IP address - owned by a router port. Then, for each A, a priority-90 - flow matches on ip4.dst == A and - icmp4.type == 8 && icmp4.code == 0 (ICMP echo - request). The port of the router that receives the echo request - does not matter. Also, the ip.ttl of the echo request packet is not - checked, so it complies with RFC 1812, section 4.2.2.9. These flows - use the following actions: + owned by a router port. Then, for each A that is + an IPv4 address, a priority-90 flow matches on + ip4.dst == A and + icmp4.type == 8 && icmp4.code == 0 + (ICMP echo request). For each A that is an IPv6 + address, a priority-90 flow matches on + ip6.dst == A and + icmp6.type == 128 && icmp6.code == 0 + (ICMPv6 echo request). The port of the router that receives the + echo request does not matter. Also, the ip.ttl of the echo + request packet is not checked, so it complies with RFC 1812, + section 4.2.2.9. Flows for ICMPv4 echo requests use the + following actions:

    @@ -741,6 +775,18 @@ icmp4.type = 0;
     inport = ""; /* Allow sending out inport. */
     next;
             
    + +

    + Flows for ICMPv6 echo requests use the following actions: +

    + +
    +ip6.dst <-> ip6.src;
    +ip.ttl = 255;
    +icmp6.type = 129;
    +inport = ""; /* Allow sending out inport. */
    +next;
    +        
  • @@ -805,6 +851,35 @@ output;
  • + Reply to IPv6 Neighbor Solicitations. +

    + +

    + These flows reply to Neighbor Solicitation requests for the + router's own IPv6 address. For each router port P + that owns IPv6 address A, solicited node address + S, and Ethernet address + E, a priority-90 flow matches inport == + P && nd_ns && ip6.dst == {A, + S} && nd.target == A + with the following actions: +

    + +
    +nd_na {
    +    eth.src = E;
    +    ip6.src = A;
    +    nd.target = A;
    +    nd.tll = E;
    +    outport = inport;
    +    inport = ""; /* Allow sending out inport. */
    +    output;
    +};
    +        
    +
  • + +
  • +

    UDP port unreachable. Priority-80 flows generate ICMP port unreachable messages in reply to UDP datagrams directed to the router's IP address. The logical router doesn't accept any UDP @@ -976,14 +1051,16 @@ icmp4 {

    Ingress Table 4: IP Routing

    - A packet that arrives at this table is an IP packet that should be routed - to the address in ip4.dst. This table implements IP - routing, setting reg0 to the next-hop IP address (leaving - ip4.dst, the packet's final destination, unchanged) and - advances to the next table for ARP resolution. It also sets - reg1 to the IP address owned by the selected router port - (which is used later in table 6 as the IP source address for an ARP - request, if needed). + A packet that arrives at this table is an IP packet that should be + routed to the address in ip4.dst or + ip6.dst. This table implements IP routing, setting + reg0 (or xxreg0 for IPv6) to the next-hop IP + address (leaving ip4.dst or ip6.dst, the + packet's final destination, unchanged) and advances to the next + table for ARP/ND resolution. It also sets reg1 (or + xxreg1 for IPv6) to the IP address owned by the selected router + port (which is used later in table 6 as the IP source address for + an ARP request, if needed).

    @@ -993,7 +1070,7 @@ icmp4 {

    • - Routing table. For each route to IPv4 network N with + IPv4 routing table. For each route to IPv4 network N with netmask M, on router port P with IP address A and Ethernet address E, a logical flow with match ip4.dst == @@ -1023,6 +1100,39 @@ next;

    • +
    • +

      + IPv6 routing table. For each route to IPv6 network + N with netmask M, on router port + P with IP address A and Ethernet address + E, a logical flow with match in CIDR notation + ip6.dst == N/M, + whose priority is the integer value of M, has the + following actions: +

      + +
      +ip.ttl--;
      +xxreg0 = G;
      +xxreg1 = A;
      +eth.src = E;
      +outport = P;
      +inport = ""; /* Allow sending out inport. */
      +next;
      +        
      + +

      + (Ingress table 1 already verified that ip.ttl--; will + not yield a TTL exceeded error.) +

      + +

      + If the route has a gateway, G is the gateway IP address. + Otherwise, if the route comes from a configured static route, G + is the next-hop IP address. If neither applies, G is ip6.dst. +

      +
    • +
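The IPv6 routing entry above gives each route a priority equal to its prefix length, so the most specific route wins. A minimal standalone sketch of that mapping follows, assuming two hypothetical helpers, `ipv6_network()` and `ipv6_route_match()`, that are not part of OVS. It masks off the host bits in the same spirit as `build_static_route_flow()` and formats the match string in the same shape as `add_route()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 'addr' with every bit past the first
 * 'plen' bits cleared, i.e. the network part of addr/plen. */
static struct in6_addr
ipv6_network(const struct in6_addr *addr, unsigned int plen)
{
    struct in6_addr net = *addr;

    assert(plen <= 128);
    for (unsigned int i = 0; i < 16; i++) {
        /* Number of prefix bits that fall inside byte 'i'. */
        unsigned int bits = plen > i * 8 ? plen - i * 8 : 0;

        if (bits < 8) {
            net.s6_addr[i] &= (unsigned char) (0xff << (8 - bits));
        }
    }
    return net;
}

/* Hypothetical helper: writes the logical flow match for a route to
 * 'prefix_s'/'plen' into 'match'.  Returns the flow priority (the prefix
 * length), or -1 on a parse error or if 'match' is too small. */
static int
ipv6_route_match(const char *prefix_s, unsigned int plen,
                 char *match, size_t size)
{
    struct in6_addr prefix, net;
    char net_s[INET6_ADDRSTRLEN];

    if (plen > 128 || inet_pton(AF_INET6, prefix_s, &prefix) != 1) {
        return -1;
    }
    net = ipv6_network(&prefix, plen);
    if (!inet_ntop(AF_INET6, &net, net_s, sizeof net_s)) {
        return -1;
    }

    int n = snprintf(match, size, "ip6.dst == %s/%u", net_s, plen);
    if (n < 0 || (size_t) n >= size) {
        return -1;
    }
    return (int) plen;
}
```

With this sketch, a route for fd00:1::5/64 produces the match `ip6.dst == fd00:1::/64` at priority 64, which outranks a /48 route covering the same destination.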
    -

    Ingress Table 5: ARP Resolution

    +

    Ingress Table 5: ARP/ND Resolution

    - Any packet that reaches this table is an IP packet whose next-hop IP - address is in reg0. (ip4.dst is the final - destination.) This table resolves the IP address in reg0 - into an output port in outport and an Ethernet address in - eth.dst, using the following flows: + Any packet that reaches this table is an IP packet whose next-hop + IPv4 address is in reg0 or IPv6 address is in + xxreg0. (ip4.dst or + ip6.dst contains the final destination.) This table + resolves the IP address in reg0 (or + xxreg0) into an output port in outport + and an Ethernet address in eth.dst, using the + following flows:

      @@ -1080,18 +1193,35 @@ icmp4 {

      - For each IP address A whose host is known to have Ethernet - address E on router port P, a priority-100 flow - with match outport === P && reg0 == + For each IPv4 address A whose host is known to have + Ethernet address E on router port P, a + priority-100 flow with match outport == P + && reg0 == A has actions + eth.dst = E; next;. +

      + +

      + For each IPv6 address A whose host is known to have + Ethernet address E on router port P, a + priority-100 flow with match outport == P + && xxreg0 == A has actions + eth.dst = E; next;. +

      + +

      + For each logical router port with an IPv4 address A and + a mac address of E that is reachable via a different + logical router port P, a priority-100 flow with + match outport == P && reg0 == A has actions eth.dst = E; next;.

      - For each logical router port with an IP address A and + For each logical router port with an IPv6 address A and a mac address of E that is reachable via a different logical router port P, a priority-100 flow with - match outport === P && reg0 == + match outport == P && xxreg0 == A has actions eth.dst = E; next;.

      diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index af13ec2..1fe73bf 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2478,35 +2478,34 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, ds_cstr(&match), ds_cstr(&actions)); } - if (op->lsp_addrs[i].n_ipv6_addrs > 0) { + /* For ND solicitations, we need to listen for both the + * unicast IPv6 address and its solicited-node multicast address, + * but always respond with the unicast IPv6 address. */ + for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) { ds_clear(&match); - ds_put_cstr(&match, "icmp6 && icmp6.type == 135 && "); - if (op->lsp_addrs[i].n_ipv6_addrs == 1) { - ds_put_format(&match, "nd.target == %s", - op->lsp_addrs[i].ipv6_addrs[0].addr_s); - } else { - ds_put_format(&match, "nd.target == {"); - for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) { - ds_put_format(&match, "%s, ", - op->lsp_addrs[i].ipv6_addrs[j].addr_s); - } - ds_chomp(&match, ' '); - ds_chomp(&match, ','); - ds_put_cstr(&match, "}"); - } + ds_put_format(&match, + "nd_ns && ip6.dst == {%s, %s} && nd.target == %s", + op->lsp_addrs[i].ipv6_addrs[j].addr_s, + op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s, + op->lsp_addrs[i].ipv6_addrs[j].addr_s); + ds_clear(&actions); ds_put_format(&actions, - "nd_na { eth.src = %s; " - "nd.tll = %s; " - "outport = inport; " - "inport = \"\"; /* Allow sending out inport. */ " - "output; };", - op->lsp_addrs[i].ea_s, - op->lsp_addrs[i].ea_s); - + "nd_na { " + "eth.src = %s; " + "ip6.src = %s; " + "nd.target = %s; " + "nd.tll = %s; " + "outport = inport; " + "inport = \"\"; /* Allow sending out inport. 
*/ " + "output; " + "};", + op->lsp_addrs[i].ea_s, + op->lsp_addrs[i].ipv6_addrs[j].addr_s, + op->lsp_addrs[i].ipv6_addrs[j].addr_s, + op->lsp_addrs[i].ea_s); ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50, ds_cstr(&match), ds_cstr(&actions)); - } } } @@ -2723,23 +2722,49 @@ lrport_is_enabled(const struct nbrec_logical_router_port *lrport) static const char * find_lrp_member_ip(const struct ovn_port *op, const char *ip_s) { - ovs_be32 ip; + bool is_ipv4 = strchr(ip_s, '.') ? true : false; - if (!ip_parse(ip_s, &ip)) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); - VLOG_WARN_RL(&rl, "bad ip address %s", ip_s); - return NULL; - } + if (is_ipv4) { + ovs_be32 ip; - for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { - const struct ipv4_netaddr *na = &op->lrp_networks.ipv4_addrs[i]; + if (!ip_parse(ip_s, &ip)) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad ip address %s", ip_s); + return NULL; + } + + for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { + const struct ipv4_netaddr *na = &op->lrp_networks.ipv4_addrs[i]; + + if (!((na->network ^ ip) & na->mask)) { + /* There should be only 1 interface that matches the + * supplied IP. Otherwise, it's a configuration error, + * because subnets of a router's interfaces should NOT + * overlap. */ + return na->addr_s; + } + } + } else { + struct in6_addr ip6; + + if (!ipv6_parse(ip_s, &ip6)) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad ipv6 address %s", ip_s); + return NULL; + } - if (!((na->network ^ ip) & na->mask)) { - /* There should be only 1 interface that matches the - * next hop. Otherwise, it's a configuration error, - * because subnets of router's interfaces should NOT - * overlap. 
*/ - return na->addr_s; + for (int i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { + const struct ipv6_netaddr *na = &op->lrp_networks.ipv6_addrs[i]; + struct in6_addr xor_addr = ipv6_addr_bitxor(&na->network, &ip6); + struct in6_addr and_addr = ipv6_addr_bitand(&xor_addr, &na->mask); + + if (ipv6_is_zero(&and_addr)) { + /* There should be only 1 interface that matches the + * supplied IP. Otherwise, it's a configuration error, + * because subnets of a router's interfaces should NOT + * overlap. */ + return na->addr_s; + } } } @@ -2751,21 +2776,26 @@ add_route(struct hmap *lflows, const struct ovn_port *op, const char *lrp_addr_s, const char *network_s, int plen, const char *gateway) { - char *match = xasprintf("ip4.dst == %s/%d", network_s, plen); + bool is_ipv4 = strchr(network_s, '.') ? true : false; + + char *match = xasprintf("ip%s.dst == %s/%d", is_ipv4 ? "4" : "6", + network_s, plen); struct ds actions = DS_EMPTY_INITIALIZER; - ds_put_cstr(&actions, "ip.ttl--; reg0 = "); + ds_put_format(&actions, "ip.ttl--; %sreg0 = ", is_ipv4 ? "" : "xx"); + if (gateway) { ds_put_cstr(&actions, gateway); } else { - ds_put_cstr(&actions, "ip4.dst"); + ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6"); } ds_put_format(&actions, "; " - "reg1 = %s; " + "%sreg1 = %s; " "eth.src = %s; " "outport = %s; " "inport = \"\"; /* Allow sending out inport. */ " "next;", + is_ipv4 ? "" : "xx", lrp_addr_s, op->lrp_networks.ea_s, op->json_key); @@ -2783,26 +2813,68 @@ build_static_route_flow(struct hmap *lflows, struct ovn_datapath *od, struct hmap *ports, const struct nbrec_logical_router_static_route *route) { - ovs_be32 prefix, nexthop, mask; + ovs_be32 nexthop; const char *lrp_addr_s; + unsigned int plen; + bool is_ipv4; - /* Verify that next hop is an IP address with 32 bits mask. 
*/ - char *error = ip_parse_masked(route->nexthop, &nexthop, &mask); - if (error || mask != OVS_BE32_MAX) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); - VLOG_WARN_RL(&rl, "bad next hop ip address %s", route->nexthop); + /* Verify that the next hop is an IP address with an all-ones mask. */ + char *error = ip_parse_cidr(route->nexthop, &nexthop, &plen); + if (!error) { + if (plen != 32) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad next hop mask %s", route->nexthop); + return; + } + is_ipv4 = true; + } else { free(error); - return; + + struct in6_addr ip6; + char *error = ipv6_parse_cidr(route->nexthop, &ip6, &plen); + if (!error) { + if (plen != 128) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad next hop mask %s", route->nexthop); + return; + } + is_ipv4 = false; + } else { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad next hop ip address %s", route->nexthop); + free(error); + return; + } } - /* Verify that ip prefix is a valid CIDR address. */ - error = ip_parse_masked(route->ip_prefix, &prefix, &mask); - if (error || !ip_is_cidr(mask)) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); - VLOG_WARN_RL(&rl, "bad 'ip_prefix' in static routes %s", - route->ip_prefix); - free(error); - return; + char *prefix_s; + if (is_ipv4) { + ovs_be32 prefix; + /* Verify that ip prefix is a valid IPv4 address. */ + error = ip_parse_cidr(route->ip_prefix, &prefix, &plen); + if (error) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad 'ip_prefix' in static routes %s", + route->ip_prefix); + free(error); + return; + } + prefix_s = xasprintf(IP_FMT, IP_ARGS(prefix & be32_prefix_mask(plen))); + } else { + /* Verify that ip prefix is a valid IPv6 address. 
*/ + struct in6_addr prefix; + error = ipv6_parse_cidr(route->ip_prefix, &prefix, &plen); + if (error) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_WARN_RL(&rl, "bad 'ip_prefix' in static routes %s", + route->ip_prefix); + free(error); + return; + } + struct in6_addr mask = ipv6_create_mask(plen); + struct in6_addr network = ipv6_addr_bitand(&prefix, &mask); + prefix_s = xmalloc(INET6_ADDRSTRLEN); + inet_ntop(AF_INET6, &network, prefix_s, INET6_ADDRSTRLEN); } /* Find the outgoing port. */ @@ -2813,7 +2885,7 @@ build_static_route_flow(struct hmap *lflows, struct ovn_datapath *od, static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); VLOG_WARN_RL(&rl, "Bad out port %s for static route %s", route->output_port, route->ip_prefix); - return; + goto free_prefix_s; } lrp_addr_s = find_lrp_member_ip(out_port, route->nexthop); } else { @@ -2840,17 +2912,17 @@ build_static_route_flow(struct hmap *lflows, struct ovn_datapath *od, static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); VLOG_WARN_RL(&rl, "No path for static route %s; next hop %s", route->ip_prefix, route->nexthop); - return; + goto free_prefix_s; } - char *prefix_s = xasprintf(IP_FMT, IP_ARGS(prefix & mask)); - add_route(lflows, out_port, lrp_addr_s, prefix_s, - ip_count_cidr_bits(mask), route->nexthop); + add_route(lflows, out_port, lrp_addr_s, prefix_s, plen, route->nexthop); + +free_prefix_s: free(prefix_s); } static void -op_put_networks(struct ds *ds, const struct ovn_port *op, bool add_bcast) +op_put_v4_networks(struct ds *ds, const struct ovn_port *op, bool add_bcast) { if (!add_bcast && op->lrp_networks.n_ipv4_addrs == 1) { ds_put_format(ds, "%s", op->lrp_networks.ipv4_addrs[0].addr_s); @@ -2870,6 +2942,23 @@ op_put_networks(struct ds *ds, const struct ovn_port *op, bool add_bcast) } static void +op_put_v6_networks(struct ds *ds, const struct ovn_port *op) +{ + if (op->lrp_networks.n_ipv6_addrs == 1) { + ds_put_format(ds, "%s", 
op->lrp_networks.ipv6_addrs[0].addr_s); + return; + } + + ds_put_cstr(ds, "{"); + for (int i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { + ds_put_format(ds, "%s, ", op->lrp_networks.ipv6_addrs[i].addr_s); + } + ds_chomp(ds, ' '); + ds_chomp(ds, ','); + ds_put_cstr(ds, "}"); +} + +static void build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, struct hmap *lflows) { @@ -2953,39 +3042,43 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_INPUT, 0, "1", "next;"); } + /* Logical router ingress table 1: IP Input for IPv4. */ HMAP_FOR_EACH (op, key_node, ports) { if (!op->nbrp) { continue; } - /* L3 admission control: drop packets that originate from an IP address - * owned by the router or a broadcast address known to the router - * (priority 100). */ - ds_clear(&match); - ds_put_cstr(&match, "ip4.src == "); - op_put_networks(&match, op, true); - ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 100, - ds_cstr(&match), "drop;"); - /* ICMP echo reply. These flows reply to ICMP echo requests - * received for the router's IP address. Since packets only - * get here as part of the logical router datapath, the inport - * (i.e. the incoming locally attached net) does not matter. - * The ip.ttl also does not matter (RFC1812 section 4.2.2.9) */ - ds_clear(&match); - ds_put_cstr(&match, "ip4.dst == "); - op_put_networks(&match, op, false); - ds_put_cstr(&match, " && icmp4.type == 8 && icmp4.code == 0"); - - ds_clear(&actions); - ds_put_format(&actions, - "ip4.dst <-> ip4.src; " - "ip.ttl = 255; " - "icmp4.type = 0; " - "inport = \"\"; /* Allow sending out inport. */ " - "next; "); - ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, - ds_cstr(&match), ds_cstr(&actions)); + if (op->lrp_networks.n_ipv4_addrs) { + /* L3 admission control: drop packets that originate from an + * IPv4 address owned by the router or a broadcast address + * known to the router (priority 100). 
*/ + ds_clear(&match); + ds_put_cstr(&match, "ip4.src == "); + op_put_v4_networks(&match, op, true); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 100, + ds_cstr(&match), "drop;"); + + /* ICMP echo reply. These flows reply to ICMP echo requests + * received for the router's IP address. Since packets only + * get here as part of the logical router datapath, the inport + * (i.e. the incoming locally attached net) does not matter. + * The ip.ttl also does not matter (RFC1812 section 4.2.2.9) */ + ds_clear(&match); + ds_put_cstr(&match, "ip4.dst == "); + op_put_v4_networks(&match, op, false); + ds_put_cstr(&match, " && icmp4.type == 8 && icmp4.code == 0"); + + ds_clear(&actions); + ds_put_format(&actions, + "ip4.dst <-> ip4.src; " + "ip.ttl = 255; " + "icmp4.type = 0; " + "inport = \"\"; /* Allow sending out inport. */ " + "next; "); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, + ds_cstr(&match), ds_cstr(&actions)); + } /* ARP reply. These flows reply to ARP requests for the router's own * IP address. */ @@ -3096,6 +3189,78 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, free(snat_ips); } + /* Logical router ingress table 1: IP Input for IPv6. */ + HMAP_FOR_EACH (op, key_node, ports) { + if (!op->nbrp) { + continue; + } + + if (op->lrp_networks.n_ipv6_addrs) { + /* L3 admission control: drop packets that originate from an + * IPv6 address owned by the router (priority 100). */ + ds_clear(&match); + ds_put_cstr(&match, "ip6.src == "); + op_put_v6_networks(&match, op); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 100, + ds_cstr(&match), "drop;"); + + /* ICMPv6 echo reply. These flows reply to echo requests + * received for the router's IP address. 
*/ + ds_clear(&match); + ds_put_cstr(&match, "ip6.dst == "); + op_put_v6_networks(&match, op); + ds_put_cstr(&match, " && icmp6.type == 128 && icmp6.code == 0"); + + ds_clear(&actions); + ds_put_cstr(&actions, + "ip6.dst <-> ip6.src; " + "ip.ttl = 255; " + "icmp6.type = 129; " + "inport = \"\"; /* Allow sending out inport. */ " + "next; "); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, + ds_cstr(&match), ds_cstr(&actions)); + + /* Drop IPv6 traffic to this router. */ + ds_clear(&match); + ds_put_cstr(&match, "ip6.dst == "); + op_put_v6_networks(&match, op); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 60, + ds_cstr(&match), "drop;"); + } + + /* ND reply. These flows reply to ND solicitations for the + * router's own IP address. */ + for (int i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { + ds_clear(&match); + ds_put_format(&match, + "inport == %s && nd_ns && ip6.dst == {%s, %s} " + "&& nd.target == %s", + op->json_key, + op->lrp_networks.ipv6_addrs[i].addr_s, + op->lrp_networks.ipv6_addrs[i].sn_addr_s, + op->lrp_networks.ipv6_addrs[i].addr_s); + + ds_clear(&actions); + ds_put_format(&actions, + "nd_na { " + "eth.src = %s; " + "ip6.src = %s; " + "nd.target = %s; " + "nd.tll = %s; " + "outport = inport; " + "inport = \"\"; /* Allow sending out inport. */ " + "output; " + "};", + op->lrp_networks.ea_s, + op->lrp_networks.ipv6_addrs[i].addr_s, + op->lrp_networks.ipv6_addrs[i].addr_s, + op->lrp_networks.ea_s); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, + ds_cstr(&match), ds_cstr(&actions)); + } + } + /* NAT in Gateway routers. */ HMAP_FOR_EACH (od, key_node, datapaths) { if (!od->nbr) { @@ -3226,10 +3391,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, /* Logical router ingress table 4: IP Routing. * * A packet that arrives at this table is an IP packet that should be - * routed to the address in ip4.dst. 
This table sets outport to the correct - * output port, eth.src to the output port's MAC address, and reg0 to the - * next-hop IP address (leaving ip4.dst, the packet’s final destination, - * unchanged), and advances to the next table for ARP resolution. */ + * routed to the address in 'ip[46].dst'. This table sets outport to + * the correct output port, eth.src to the output port's MAC + * address, and '[xx]reg0' to the next-hop IP address (leaving + * 'ip[46].dst', the packet’s final destination, unchanged), and + * advances to the next table for ARP/ND resolution. */ HMAP_FOR_EACH (op, key_node, ports) { if (!op->nbrp) { continue; @@ -3240,14 +3406,20 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, op->lrp_networks.ipv4_addrs[i].network_s, op->lrp_networks.ipv4_addrs[i].plen, NULL); } + + for (int i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { + add_route(lflows, op, op->lrp_networks.ipv6_addrs[i].addr_s, + op->lrp_networks.ipv6_addrs[i].network_s, + op->lrp_networks.ipv6_addrs[i].plen, NULL); + } } + /* Convert the static routes to flows. */ HMAP_FOR_EACH (od, key_node, datapaths) { if (!od->nbr) { continue; } - /* Convert the static routes to flows. */ for (int i = 0; i < od->nbr->n_static_routes; i++) { const struct nbrec_logical_router_static_route *route; @@ -3255,6 +3427,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_static_route_flow(lflows, od, ports, route); } } + /* XXX destination unreachable */ /* Local router ingress table 5: ARP Resolution. @@ -3265,31 +3438,47 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, * Ethernet address in eth.dst. */ HMAP_FOR_EACH (op, key_node, ports) { if (op->nbrp) { - /* This is a logical router port. If next-hop IP address in 'reg0' - * matches ip address of this router port, then the packet is - * intended to eventually be sent to this logical port. Set the - * destination mac address using this port's mac address. + /* This is a logical router port. 
If next-hop IP address in + * '[xx]reg0' matches IP address of this router port, then + * the packet is intended to eventually be sent to this + * logical port. Set the destination mac address using this + * port's mac address. * * The packet is still in peer's logical pipeline. So the match * should be on peer's outport. */ - if (op->peer && op->peer->nbrp) { - ds_clear(&match); - ds_put_format(&match, "outport == %s && reg0 == ", - op->peer->json_key); - op_put_networks(&match, op, false); + if (op->peer && op->nbrp->peer) { + if (op->lrp_networks.n_ipv4_addrs) { + ds_clear(&match); + ds_put_format(&match, "outport == %s && reg0 == ", + op->peer->json_key); + op_put_v4_networks(&match, op, false); + + ds_clear(&actions); + ds_put_format(&actions, "eth.dst = %s; next;", + op->lrp_networks.ea_s); + ovn_lflow_add(lflows, op->peer->od, S_ROUTER_IN_ARP_RESOLVE, + 100, ds_cstr(&match), ds_cstr(&actions)); + } - ds_clear(&actions); - ds_put_format(&actions, "eth.dst = %s; next;", - op->lrp_networks.ea_s); - ovn_lflow_add(lflows, op->peer->od, S_ROUTER_IN_ARP_RESOLVE, - 100, ds_cstr(&match), ds_cstr(&actions)); + if (op->lrp_networks.n_ipv6_addrs) { + ds_clear(&match); + ds_put_format(&match, "outport == %s && xxreg0 == ", + op->peer->json_key); + op_put_v6_networks(&match, op); + + ds_clear(&actions); + ds_put_format(&actions, "eth.dst = %s; next;", + op->lrp_networks.ea_s); + ovn_lflow_add(lflows, op->peer->od, S_ROUTER_IN_ARP_RESOLVE, + 100, ds_cstr(&match), ds_cstr(&actions)); + } } } else if (op->od->n_router_ports && strcmp(op->nbsp->type, "router")) { /* This is a logical switch port that backs a VM or a container. * Extract its addresses. For each of the address, go through all * the router ports attached to the switch (to which this port * connects) and if the address in question is reachable from the - * router port, add an ARP entry in that router's pipeline. */ + * router port, add an ARP/ND entry in that router's pipeline. 
*/ for (size_t i = 0; i < op->n_lsp_addrs; i++) { const char *ea_s = op->lsp_addrs[i].ea_s; @@ -3326,6 +3515,40 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, ds_cstr(&match), ds_cstr(&actions)); } } + + for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) { + const char *ip_s = op->lsp_addrs[i].ipv6_addrs[j].addr_s; + for (size_t k = 0; k < op->od->n_router_ports; k++) { + /* Get the Logical_Router_Port that the + * Logical_Switch_Port is connected to, as + * 'peer'. */ + const char *peer_name = smap_get( + &op->od->router_ports[k]->nbsp->options, + "router-port"); + if (!peer_name) { + continue; + } + + struct ovn_port *peer = ovn_port_find(ports, peer_name); + if (!peer || !peer->nbrp) { + continue; + } + + if (!find_lrp_member_ip(peer, ip_s)) { + continue; + } + + ds_clear(&match); + ds_put_format(&match, "outport == %s && xxreg0 == %s", + peer->json_key, ip_s); + + ds_clear(&actions); + ds_put_format(&actions, "eth.dst = %s; next;", ea_s); + ovn_lflow_add(lflows, peer->od, + S_ROUTER_IN_ARP_RESOLVE, 100, + ds_cstr(&match), ds_cstr(&actions)); + } + } } } else if (!strcmp(op->nbsp->type, "router")) { /* This is a logical switch port that connects to a router. 
*/ @@ -3361,16 +3584,31 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, continue; } - ds_clear(&match); - ds_put_format(&match, "outport == %s && reg0 == ", - peer->json_key); - op_put_networks(&match, router_port, false); + if (router_port->lrp_networks.n_ipv4_addrs) { + ds_clear(&match); + ds_put_format(&match, "outport == %s && reg0 == ", + peer->json_key); + op_put_v4_networks(&match, router_port, false); + + ds_clear(&actions); + ds_put_format(&actions, "eth.dst = %s; next;", + router_port->lrp_networks.ea_s); + ovn_lflow_add(lflows, peer->od, S_ROUTER_IN_ARP_RESOLVE, + 100, ds_cstr(&match), ds_cstr(&actions)); + } - ds_clear(&actions); - ds_put_format(&actions, "eth.dst = %s; next;", - router_port->lrp_networks.ea_s); - ovn_lflow_add(lflows, peer->od, S_ROUTER_IN_ARP_RESOLVE, - 100, ds_cstr(&match), ds_cstr(&actions)); + if (router_port->lrp_networks.n_ipv6_addrs) { + ds_clear(&match); + ds_put_format(&match, "outport == %s && xxreg0 == ", + peer->json_key); + op_put_v6_networks(&match, router_port); + + ds_clear(&actions); + ds_put_format(&actions, "eth.dst = %s; next;", + router_port->lrp_networks.ea_s); + ovn_lflow_add(lflows, peer->od, S_ROUTER_IN_ARP_RESOLVE, + 100, ds_cstr(&match), ds_cstr(&actions)); + } } } }
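The IPv6 branch of find_lrp_member_ip() in this patch decides whether an address belongs to a router port's subnet by XORing it with the network and ANDing the result with the mask. The following is a minimal standalone sketch of the same test without the OVS helpers; `ipv6_in_subnet()` and `subnet_contains()` are hypothetical names, and the sketch takes a prefix length rather than a precomputed mask, which is an assumption made to keep it self-contained.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XOR/AND check: true if the first 'plen'
 * bits of 'addr' and 'network' are identical. */
static bool
ipv6_in_subnet(const struct in6_addr *addr, const struct in6_addr *network,
               unsigned int plen)
{
    assert(plen <= 128);
    for (unsigned int i = 0; i < 16; i++) {
        /* Number of prefix bits that fall inside byte 'i'. */
        unsigned int bits = plen > i * 8 ? plen - i * 8 : 0;
        unsigned char mask = bits >= 8
                             ? 0xff
                             : (unsigned char) (0xff << (8 - bits));

        if ((addr->s6_addr[i] ^ network->s6_addr[i]) & mask) {
            return false;
        }
    }
    return true;
}

/* Hypothetical wrapper over textual addresses.  Returns 1 if 'ip_s' is
 * inside 'network_s'/'plen', 0 if it is not, and -1 on invalid input. */
static int
subnet_contains(const char *network_s, unsigned int plen, const char *ip_s)
{
    struct in6_addr network, addr;

    if (plen > 128
        || inet_pton(AF_INET6, network_s, &network) != 1
        || inet_pton(AF_INET6, ip_s, &addr) != 1) {
        return -1;
    }
    return ipv6_in_subnet(&addr, &network, plen) ? 1 : 0;
}
```

A next hop of fd00:1::abcd falls inside fd00:1::/64 while fd00:2::1 does not. In the patch, a non-matching address makes find_lrp_member_ip() return NULL, so no ARP/ND resolution flow is added for that router port.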