diff mbox

[ovs-dev] ovn: Support ARP proxy in logical switches.

Message ID 1482471796-1020-1-git-send-email-zhouhan@gmail.com
State Rejected
Headers show

Commit Message

Han Zhou Dec. 23, 2016, 5:43 a.m. UTC
This patch support "arp_proxy" option for logical switch ports. If
a lsp with arp_proxy=true, all the arp request to known ipv4
addresses on the ls will be responded with the arp proxy lsp's MAC
address, except when the arp request come from the arp proxy lsp
itself.

Signed-off-by: Han Zhou <zhouhan@gmail.com>
---
 ovn/northd/ovn-northd.8.xml |  23 ++++++++
 ovn/northd/ovn-northd.c     | 119 ++++++++++++++++++++++++++++++++------
 ovn/ovn-nb.xml              |  11 ++++
 tests/ovn.at                | 136 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 271 insertions(+), 18 deletions(-)

Comments

Ben Pfaff Jan. 4, 2017, 11:29 p.m. UTC | #1
[adding Bruce to leverage his knowledge of networking, if he's willing]

It seems like the mechanics of this patch are mostly OK.  There is at
least one inconsistency in ovn-nb.xml, where it says in one place that
the new option is for all port types and in another that it's only for
ports with an empty 'type'.  (For the record, if it's for every port
type, then it should be in other_config.  'options' is intended for
type-specific configuration, not for generic configuration.)

However, I'm concerned about the general utility here.  I usually think
of proxy ARP as being used for the kinds of applications you see in the
wikipedia on proxy ARP: https://en.wikipedia.org/wiki/Proxy_ARP.  This
seems to aimed at a different application that is more like a weak form
of service function chaining than for the traditional applications of
proxy ARP.

Is this an appropriate application for proxy ARP?  Is it a common enough
use case to support in OVN?  Should it instead be handled through a more
general service function chaining interface?

Bruce, the first file in the patch explains the application here pretty
well, that's probably the bit that you should read.

Thanks,

Ben.

On Thu, Dec 22, 2016 at 09:43:16PM -0800, Han Zhou wrote:
> This patch support "arp_proxy" option for logical switch ports. If
> a lsp with arp_proxy=true, all the arp request to known ipv4
> addresses on the ls will be responded with the arp proxy lsp's MAC
> address, except when the arp request come from the arp proxy lsp
> itself.
> 
> Signed-off-by: Han Zhou <zhouhan@gmail.com>
> ---
>  ovn/northd/ovn-northd.8.xml |  23 ++++++++
>  ovn/northd/ovn-northd.c     | 119 ++++++++++++++++++++++++++++++++------
>  ovn/ovn-nb.xml              |  11 ++++
>  tests/ovn.at                | 136 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 271 insertions(+), 18 deletions(-)
> 
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index f3c1682..aefa086 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -446,6 +446,19 @@
>      </p>
>  
>      <p>
> +      In particular, if a logical switch port has option proxy_arp set to
> +      true, we call it an arp-proxy port. In such case, ARP requests to
> +      IPv4 addresses of any logical port on the same logical switch will
> +      be responded with the MAC address of that arp-proxy port, except when
> +      the request comes from that arp-proxy port, which can still learn the
> +      real MAC of destination VM/Container. This feature is useful if we
> +      want all the traffic to/from any logical ports on the logical switch
> +      being forwarded to a particular logical port, which may act as a proxy
> +      and will forward the traffic to the real destination after some pre-
> +      processing.
> +    </p>
> +
> +    <p>
>        ARP requests arrive from VMs from a logical switch inport of type
>        default.  For this case, the logical switch proxy ARP rules can be
>        for other VMs or logical router ports.  Logical switch proxy ARP
> @@ -521,6 +534,16 @@ output;
>          </pre>
>  
>          <p>
> +          If arp-proxy port is present in the logical switch, the above
> +          mentioned Ethernet address <var>E</var> will be the one belonging
> +          to the arp-proxy port, and there will be Priority-60 flows that
> +          match ARP requests from the arp-proxy port to each known IP address
> +          of every logical switch port, and respond with ARP replies in the
> +          same format as above but with Ethernet address <var>E</var> being
> +          the real one belonging to each target logical switch port.
> +        </p>
> +
> +        <p>
>            These flows are omitted for logical ports (other than router ports)
>            that are down.
>          </p>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index c56ac79..f897658 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -380,6 +380,10 @@ struct ovn_datapath {
>  
>      bool has_unknown;
>  
> +    /* ARP proxy port: MAC of the lport will be used by arp
> +     * responder for other lports in the same LS. */
> +    struct ovn_port *arp_proxy_port;
> +
>      /* IPAM data. */
>      struct hmap ipam;
>  };
> @@ -2012,6 +2016,15 @@ build_port_security_ip(enum ovn_pipeline pipeline, struct ovn_port *op,
>  }
>  
>  static bool
> +lsp_is_arp_proxy(const struct nbrec_logical_switch_port *lsp)
> +{
> +
> +    bool is_arp_proxy = smap_get_bool(&lsp->options,
> +                                      "arp_proxy", false);
> +    return is_arp_proxy;
> +}
> +
> +static bool
>  lsp_is_enabled(const struct nbrec_logical_switch_port *lsp)
>  {
>      return !lsp->enabled || *lsp->enabled;
> @@ -2852,6 +2865,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_IP, 0, "1", "next;");
>      }
>  
> +    /* Check arp_proxy option for LS ports */
> +    HMAP_FOR_EACH (op, key_node, ports) {
> +        if (!op->nbsp) {
> +            continue;
> +        }
> +
> +        if (lsp_is_arp_proxy(op->nbsp)) {
> +            if (op->n_lsp_addrs != 1) {
> +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +                VLOG_WARN_RL(&rl, "one and only one address expected for arp "
> +                             "proxy port %s", op->json_key);
> +                continue;
> +            }
> +            if (op->od->arp_proxy_port) {
> +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +                VLOG_WARN_RL(&rl,
> +                             "ARP proxy lsp %s already existed on ls "UUID_FMT
> +                             " but another lsp %s is seen",
> +                             op->od->arp_proxy_port->json_key,
> +                             UUID_ARGS(&op->od->key),
> +                             op->json_key);
> +                continue;
> +            }
> +            op->od->arp_proxy_port = op;
> +        }
> +    }
>      /* Ingress table 10: ARP/ND responder, skip requests coming from localnet
>       * and vtep ports. (priority 100); see ovn-northd.8.xml for the
>       * rationale. */
> @@ -2888,24 +2927,68 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          for (size_t i = 0; i < op->n_lsp_addrs; i++) {
>              for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) {
>                  ds_clear(&match);
> -                ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
> -                              op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> -                ds_clear(&actions);
> -                ds_put_format(&actions,
> -                    "eth.dst = eth.src; "
> -                    "eth.src = %s; "
> -                    "arp.op = 2; /* ARP reply */ "
> -                    "arp.tha = arp.sha; "
> -                    "arp.sha = %s; "
> -                    "arp.tpa = arp.spa; "
> -                    "arp.spa = %s; "
> -                    "outport = inport; "
> -                    "flags.loopback = 1; "
> -                    "output;",
> -                    op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
> -                    op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> -                ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
> -                              ds_cstr(&match), ds_cstr(&actions));
> +                if (op->od->arp_proxy_port) {
> +                    ds_put_format(&match, "arp.tpa == %s && arp.op == 1 && inport == %s",
> +                                  op->lsp_addrs[i].ipv4_addrs[j].addr_s,
> +                                  op->od->arp_proxy_port->json_key);
> +                    ds_clear(&actions);
> +                    ds_put_format(&actions,
> +                        "eth.dst = eth.src; "
> +                        "eth.src = %s; "
> +                        "arp.op = 2; /* ARP reply */ "
> +                        "arp.tha = arp.sha; "
> +                        "arp.sha = %s; "
> +                        "arp.tpa = arp.spa; "
> +                        "arp.spa = %s; "
> +                        "outport = inport; "
> +                        "flags.loopback = 1; "
> +                        "output;",
> +                        op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
> +                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> +                    ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 60,
> +                                  ds_cstr(&match), ds_cstr(&actions));
> +
> +                    ds_clear(&match);
> +                    ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
> +                                  op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> +                    ds_clear(&actions);
> +                    char *proxy_mac = op->od->arp_proxy_port->lsp_addrs[0].ea_s;
> +                    ds_put_format(&actions,
> +                        "eth.dst = eth.src; "
> +                        "eth.src = %s; "
> +                        "arp.op = 2; /* ARP reply */ "
> +                        "arp.tha = arp.sha; "
> +                        "arp.sha = %s; "
> +                        "arp.tpa = arp.spa; "
> +                        "arp.spa = %s; "
> +                        "outport = inport; "
> +                        "flags.loopback = 1; "
> +                        "output;",
> +                        proxy_mac, proxy_mac,
> +                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> +                    ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
> +                                  ds_cstr(&match), ds_cstr(&actions));
> +
> +                } else {
> +                    ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
> +                                  op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> +                    ds_clear(&actions);
> +                    ds_put_format(&actions,
> +                        "eth.dst = eth.src; "
> +                        "eth.src = %s; "
> +                        "arp.op = 2; /* ARP reply */ "
> +                        "arp.tha = arp.sha; "
> +                        "arp.sha = %s; "
> +                        "arp.tpa = arp.spa; "
> +                        "arp.spa = %s; "
> +                        "outport = inport; "
> +                        "flags.loopback = 1; "
> +                        "output;",
> +                        op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
> +                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> +                    ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
> +                                  ds_cstr(&match), ds_cstr(&actions));
> +                }
>  
>                  /* Do not reply to an ARP request from the port that owns the
>                   * address (otherwise a DHCP client that ARPs to check for a
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index 3e40881..a58d800 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -219,6 +219,17 @@
>          individually below.
>        </column>
>  
> +      <group title="Options for all port types">
> +        <p>
> +          These options apply when <ref column="type"/> is not set.
> +        </p>
> +
> +        <column name="options" key="arp_proxy">
> +          If this option is set to true, the lport acts as ARP proxy for all the
> +          other lports on the same logical switch. Default value is false.
> +        </column>
> +      </group>
> +
>        <group title="Options for router ports">
>          <p>
>            These options apply when <ref column="type"/> is <code>router</code>.
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 3779741..1830920 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -6023,3 +6023,139 @@ OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>  
>  AT_CLEANUP
> +
> +# 3 hypervisors, one logical switch, 3 logical ports per hypervisor, ARP proxy
> +AT_SETUP([ovn -- 3 HVs, 1 LS, 3 lports/HV, ARP proxy])
> +AT_KEYWORDS([ovnarpproxy])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Create hypervisors hv[123].
> +# Add vif1[123] to hv1, vif2[123] to hv2, vif3[123] to hv3.
> +# Add all of the vifs to a single logical switch lsw0.
> +# Set logical switch port lp11 as arp proxy.
> +ovn-nbctl ls-add lsw0
> +net_add n1
> +for i in 1 2 3; do
> +    sim_add hv$i
> +    as hv$i
> +    ovs-vsctl add-br br-phys
> +    ovn_attach n1 br-phys 192.168.0.$i
> +
> +    for j in 1 2 3; do
> +        ovs-vsctl add-port br-int vif$i$j -- set Interface vif$i$j external-ids:iface-id=lp$i$j options:tx_pcap=hv$i/vif$i$j-tx.pcap options:rxq_pcap=hv$i/vif$i$j-rx.pcap ofport-request=$i$j
> +        ovn-nbctl lsp-add lsw0 lp$i$j
> +        ovn-nbctl lsp-set-addresses lp$i$j "f0:00:00:00:00:$i$j 192.168.0.$i$j"
> +    done
> +done
> +ovn-nbctl set Logical_Switch_Port lp11 options:arp_proxy=true
> +
> +# Pre-populate the hypervisors' ARP tables so that we don't lose any
> +# packets for ARP resolution (native tunneling doesn't queue packets
> +# for ARP resolution).
> +ovn_populate_arp
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +sleep 1
> +
> +# Given the name of a logical port, prints the name of the hypervisor
> +# on which it is located.
> +vif_to_hv() {
> +    echo hv${1%?}
> +}
> +
> +# test_arp INPORT SHA SPA TPA [REPLY_HA]
> +#
> +# Causes a packet to be received on INPORT.  The packet is an ARP
> +# request with SHA, SPA, and TPA as specified.  If REPLY_HA is provided, then
> +# it should be the hardware address of the target to expect to receive in an
> +# ARP reply; otherwise no reply is expected.
> +#
> +# INPORT is an logical switch port number, e.g. 11 for vif11.
> +# SHA and REPLY_HA are each 12 hex digits.
> +# SPA and TPA are each 8 hex digits.
> +test_arp() {
> +    local inport=$1 sha=$2 spa=$3 tpa=$4 reply_ha=$5
> +    local request=ffffffffffff${sha}08060001080006040001${sha}${spa}ffffffffffff${tpa}
> +    hv=`vif_to_hv $inport`
> +    as $hv ovs-appctl netdev-dummy/receive vif$inport $request
> +
> +    if test X$reply_ha = X; then
> +        # Expect to receive the broadcast ARP on the other logical switch ports
> +        # if no reply is expected.
> +        local i j
> +        for i in 1 2 3; do
> +            for j in 1 2 3; do
> +                if test $i$j != $inport; then
> +                    echo $request >> $i$j.expected
> +                fi
> +            done
> +        done
> +    else
> +        # Expect to receive the reply, if any.
> +        local reply=${sha}${reply_ha}08060001080006040002${reply_ha}${tpa}${sha}${spa}
> +        echo $reply >> $inport.expected
> +    fi
> +}
> +
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +
> +# Send ARP requests between all pairs of source and destination ports.
> +#
> +# No reply when sending ARP request to self.
> +#
> +# If ARP request is sent from ARP proxy port, OVN will generate response with
> +# MAC of destination port. Otherwise, OVN will generate response with MAC of
> +# the ARP proxy port.
> +
> +for is in 1 2 3; do
> +    for js in 1 2 3; do
> +        s=$is$js
> +        for id in 1 2 3; do
> +            for jd in 1 2 3; do
> +                d=$id$jd
> +                sip=`ip_to_hex 192 168 0 $i$j`
> +                tip=`ip_to_hex 192 168 0 $id$jd`
> +                if test $d != $s; then
> +                    if test $s = "11"; then
> +                        reply_ha=f000000000$d
> +                    else
> +                        reply_ha=f00000000011
> +                    fi
> +                else
> +                    reply_ha=
> +                fi
> +                test_arp $s f000000000$s $sip $tip $reply_ha
> +            done
> +        done
> +    done
> +done
> +
> +# dump information and flows with counters
> +ovn-sbctl dump-flows -- list multicast_group
> +
> +echo "------ hv1 dump ------"
> +as hv1 ovs-vsctl show
> +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +
> +echo "------ hv2 dump ------"
> +as hv2 ovs-vsctl show
> +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +
> +echo "------ hv3 dump ------"
> +as hv3 ovs-vsctl show
> +as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +
> +# Now check the packets actually received against the ones expected.
> +for i in 1 2 3; do
> +    for j in 1 2 3; do
> +        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])
> +    done
> +done
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3])
> +
> +AT_CLEANUP
> -- 
> 2.1.0
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Han Zhou Jan. 5, 2017, 8:33 a.m. UTC | #2
On Wed, Jan 4, 2017 at 3:29 PM, Ben Pfaff <blp@ovn.org> wrote:
>
> [adding Bruce to leverage his knowledge of networking, if he's willing]

Thanks! It would be great to get some views from networking experts.

>
> It seems like the mechanics of this patch are mostly OK.  There is at
> least one inconsistency in ovn-nb.xml, where it says in one place that
> the new option is for all port types and in another that it's only for
> ports with an empty 'type'.  (For the record, if it's for every port
> type, then it should be in other_config.  'options' is intended for
> type-specific configuration, not for generic configuration.)

Ah, my bad. In the beginning I thought it may be applicable for any type,
but then I realized it is not. My current use case is for empty type only.
I'll move the text to the section "VMI (or VIF) Options". It may also be
useful for router type, but not likely for l2gateway, and definitely not
applicable for localnet.

>
> However, I'm concerned about the general utility here.  I usually think
> of proxy ARP as being used for the kinds of applications you see in the
> wikipedia on proxy ARP: https://en.wikipedia.org/wiki/Proxy_ARP.  This
> seems to aimed at a different application that is more like a weak form
> of service function chaining than for the traditional applications of
> proxy ARP.
>
> Is this an appropriate application for proxy ARP?  Is it a common enough
> use case to support in OVN?  Should it instead be handled through a more
> general service function chaining interface?
>

I have similar concerns about how useful it is. I agree it is kind of
lightweight SFC, but in my opinion it is not contradict with the fact that
it is also Proxy ARP. At least the firewall example (the 3rd use case)
described in https://en.wikipedia.org/wiki/Proxy_ARP seems to be very
close. SFC may be a more general approach to redirect packets to a
firewall, but it is heavier, and may be difficult in a non-overlay
environment, while Proxy ARP is sufficient and simple for such specific use
cases when the nodes are L2 adjacent. So I think it may be good to have
such feature in OVN, if the change is not intrusive.

> Bruce, the first file in the patch explains the application here pretty
> well, that's probably the bit that you should read.
>
> Thanks,
>
> Ben.
>
> On Thu, Dec 22, 2016 at 09:43:16PM -0800, Han Zhou wrote:
> > This patch support "arp_proxy" option for logical switch ports. If
> > a lsp with arp_proxy=true, all the arp request to known ipv4
> > addresses on the ls will be responded with the arp proxy lsp's MAC
> > address, except when the arp request come from the arp proxy lsp
> > itself.
> >
> > Signed-off-by: Han Zhou <zhouhan@gmail.com>
> > ---
> >  ovn/northd/ovn-northd.8.xml |  23 ++++++++
> >  ovn/northd/ovn-northd.c     | 119
++++++++++++++++++++++++++++++++------
> >  ovn/ovn-nb.xml              |  11 ++++
> >  tests/ovn.at                | 136
++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 271 insertions(+), 18 deletions(-)
> >
> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> > index f3c1682..aefa086 100644
> > --- a/ovn/northd/ovn-northd.8.xml
> > +++ b/ovn/northd/ovn-northd.8.xml
> > @@ -446,6 +446,19 @@
> >      </p>
> >
> >      <p>
> > +      In particular, if a logical switch port has option proxy_arp set
to
> > +      true, we call it an arp-proxy port. In such case, ARP requests to
> > +      IPv4 addresses of any logical port on the same logical switch
will
> > +      be responded with the MAC address of that arp-proxy port, except
when
> > +      the request comes from that arp-proxy port, which can still
learn the
> > +      real MAC of destination VM/Container. This feature is useful if
we
> > +      want all the traffic to/from any logical ports on the logical
switch
> > +      being forwarded to a particular logical port, which may act as a
proxy
> > +      and will forward the traffic to the real destination after some
pre-
> > +      processing.
> > +    </p>
> > +
> > +    <p>
> >        ARP requests arrive from VMs from a logical switch inport of type
> >        default.  For this case, the logical switch proxy ARP rules can
be
> >        for other VMs or logical router ports.  Logical switch proxy ARP
> > @@ -521,6 +534,16 @@ output;
> >          </pre>
> >
> >          <p>
> > +          If arp-proxy port is present in the logical switch, the above
> > +          mentioned Ethernet address <var>E</var> will be the one
belonging
> > +          to the arp-proxy port, and there will be Priority-60 flows
that
> > +          match ARP requests from the arp-proxy port to each known IP
address
> > +          of every logical switch port, and respond with ARP replies
in the
> > +          same format as above but with Ethernet address <var>E</var>
being
> > +          the real one belonging to each target logical switch port.
> > +        </p>
> > +
> > +        <p>
> >            These flows are omitted for logical ports (other than router
ports)
> >            that are down.
> >          </p>
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index c56ac79..f897658 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -380,6 +380,10 @@ struct ovn_datapath {
> >
> >      bool has_unknown;
> >
> > +    /* ARP proxy port: MAC of the lport will be used by arp
> > +     * responder for other lports in the same LS. */
> > +    struct ovn_port *arp_proxy_port;
> > +
> >      /* IPAM data. */
> >      struct hmap ipam;
> >  };
> > @@ -2012,6 +2016,15 @@ build_port_security_ip(enum ovn_pipeline
pipeline, struct ovn_port *op,
> >  }
> >
> >  static bool
> > +lsp_is_arp_proxy(const struct nbrec_logical_switch_port *lsp)
> > +{
> > +
> > +    bool is_arp_proxy = smap_get_bool(&lsp->options,
> > +                                      "arp_proxy", false);
> > +    return is_arp_proxy;
> > +}
> > +
> > +static bool
> >  lsp_is_enabled(const struct nbrec_logical_switch_port *lsp)
> >  {
> >      return !lsp->enabled || *lsp->enabled;
> > @@ -2852,6 +2865,32 @@ build_lswitch_flows(struct hmap *datapaths,
struct hmap *ports,
> >          ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_IP, 0, "1",
"next;");
> >      }
> >
> > +    /* Check arp_proxy option for LS ports */
> > +    HMAP_FOR_EACH (op, key_node, ports) {
> > +        if (!op->nbsp) {
> > +            continue;
> > +        }
> > +
> > +        if (lsp_is_arp_proxy(op->nbsp)) {
> > +            if (op->n_lsp_addrs != 1) {
> > +                static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5, 1);
> > +                VLOG_WARN_RL(&rl, "one and only one address expected
for arp "
> > +                             "proxy port %s", op->json_key);
> > +                continue;
> > +            }
> > +            if (op->od->arp_proxy_port) {
> > +                static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5, 1);
> > +                VLOG_WARN_RL(&rl,
> > +                             "ARP proxy lsp %s already existed on ls
"UUID_FMT
> > +                             " but another lsp %s is seen",
> > +                             op->od->arp_proxy_port->json_key,
> > +                             UUID_ARGS(&op->od->key),
> > +                             op->json_key);
> > +                continue;
> > +            }
> > +            op->od->arp_proxy_port = op;
> > +        }
> > +    }
> >      /* Ingress table 10: ARP/ND responder, skip requests coming from
localnet
> >       * and vtep ports. (priority 100); see ovn-northd.8.xml for the
> >       * rationale. */
> > @@ -2888,24 +2927,68 @@ build_lswitch_flows(struct hmap *datapaths,
struct hmap *ports,
> >          for (size_t i = 0; i < op->n_lsp_addrs; i++) {
> >              for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++)
{
> >                  ds_clear(&match);
> > -                ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
> > -                              op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > -                ds_clear(&actions);
> > -                ds_put_format(&actions,
> > -                    "eth.dst = eth.src; "
> > -                    "eth.src = %s; "
> > -                    "arp.op = 2; /* ARP reply */ "
> > -                    "arp.tha = arp.sha; "
> > -                    "arp.sha = %s; "
> > -                    "arp.tpa = arp.spa; "
> > -                    "arp.spa = %s; "
> > -                    "outport = inport; "
> > -                    "flags.loopback = 1; "
> > -                    "output;",
> > -                    op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
> > -                    op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > -                ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
50,
> > -                              ds_cstr(&match), ds_cstr(&actions));
> > +                if (op->od->arp_proxy_port) {
> > +                    ds_put_format(&match, "arp.tpa == %s && arp.op ==
1 && inport == %s",
> > +
 op->lsp_addrs[i].ipv4_addrs[j].addr_s,
> > +                                  op->od->arp_proxy_port->json_key);
> > +                    ds_clear(&actions);
> > +                    ds_put_format(&actions,
> > +                        "eth.dst = eth.src; "
> > +                        "eth.src = %s; "
> > +                        "arp.op = 2; /* ARP reply */ "
> > +                        "arp.tha = arp.sha; "
> > +                        "arp.sha = %s; "
> > +                        "arp.tpa = arp.spa; "
> > +                        "arp.spa = %s; "
> > +                        "outport = inport; "
> > +                        "flags.loopback = 1; "
> > +                        "output;",
> > +                        op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
> > +                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > +                    ovn_lflow_add(lflows, op->od,
S_SWITCH_IN_ARP_ND_RSP, 60,
> > +                                  ds_cstr(&match), ds_cstr(&actions));
> > +
> > +                    ds_clear(&match);
> > +                    ds_put_format(&match, "arp.tpa == %s && arp.op ==
1",
> > +
 op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > +                    ds_clear(&actions);
> > +                    char *proxy_mac =
op->od->arp_proxy_port->lsp_addrs[0].ea_s;
> > +                    ds_put_format(&actions,
> > +                        "eth.dst = eth.src; "
> > +                        "eth.src = %s; "
> > +                        "arp.op = 2; /* ARP reply */ "
> > +                        "arp.tha = arp.sha; "
> > +                        "arp.sha = %s; "
> > +                        "arp.tpa = arp.spa; "
> > +                        "arp.spa = %s; "
> > +                        "outport = inport; "
> > +                        "flags.loopback = 1; "
> > +                        "output;",
> > +                        proxy_mac, proxy_mac,
> > +                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > +                    ovn_lflow_add(lflows, op->od,
S_SWITCH_IN_ARP_ND_RSP, 50,
> > +                                  ds_cstr(&match), ds_cstr(&actions));
> > +
> > +                } else {
> > +                    ds_put_format(&match, "arp.tpa == %s && arp.op ==
1",
> > +
 op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > +                    ds_clear(&actions);
> > +                    ds_put_format(&actions,
> > +                        "eth.dst = eth.src; "
> > +                        "eth.src = %s; "
> > +                        "arp.op = 2; /* ARP reply */ "
> > +                        "arp.tha = arp.sha; "
> > +                        "arp.sha = %s; "
> > +                        "arp.tpa = arp.spa; "
> > +                        "arp.spa = %s; "
> > +                        "outport = inport; "
> > +                        "flags.loopback = 1; "
> > +                        "output;",
> > +                        op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
> > +                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
> > +                    ovn_lflow_add(lflows, op->od,
S_SWITCH_IN_ARP_ND_RSP, 50,
> > +                                  ds_cstr(&match), ds_cstr(&actions));
> > +                }
> >
> >                  /* Do not reply to an ARP request from the port that
owns the
> >                   * address (otherwise a DHCP client that ARPs to check
for a
> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> > index 3e40881..a58d800 100644
> > --- a/ovn/ovn-nb.xml
> > +++ b/ovn/ovn-nb.xml
> > @@ -219,6 +219,17 @@
> >          individually below.
> >        </column>
> >
> > +      <group title="Options for all port types">
> > +        <p>
> > +          These options apply when <ref column="type"/> is not set.
> > +        </p>
> > +
> > +        <column name="options" key="arp_proxy">
> > +          If this option is set to true, the lport acts as ARP proxy
for all the
> > +          other lports on the same logical switch. Default value is
false.
> > +        </column>
> > +      </group>
> > +
> >        <group title="Options for router ports">
> >          <p>
> >            These options apply when <ref column="type"/> is
<code>router</code>.
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 3779741..1830920 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -6023,3 +6023,139 @@ OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> >  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >
> >  AT_CLEANUP
> > +
> > +# 3 hypervisors, one logical switch, 3 logical ports per hypervisor,
ARP proxy
> > +AT_SETUP([ovn -- 3 HVs, 1 LS, 3 lports/HV, ARP proxy])
> > +AT_KEYWORDS([ovnarpproxy])
> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > +ovn_start
> > +
> > +# Create hypervisors hv[123].
> > +# Add vif1[123] to hv1, vif2[123] to hv2, vif3[123] to hv3.
> > +# Add all of the vifs to a single logical switch lsw0.
> > +# Set logical switch port lp11 as arp proxy.
> > +ovn-nbctl ls-add lsw0
> > +net_add n1
> > +for i in 1 2 3; do
> > +    sim_add hv$i
> > +    as hv$i
> > +    ovs-vsctl add-br br-phys
> > +    ovn_attach n1 br-phys 192.168.0.$i
> > +
> > +    for j in 1 2 3; do
> > +        ovs-vsctl add-port br-int vif$i$j -- set Interface vif$i$j
external-ids:iface-id=lp$i$j options:tx_pcap=hv$i/vif$i$j-tx.pcap
options:rxq_pcap=hv$i/vif$i$j-rx.pcap ofport-request=$i$j
> > +        ovn-nbctl lsp-add lsw0 lp$i$j
> > +        ovn-nbctl lsp-set-addresses lp$i$j "f0:00:00:00:00:$i$j
192.168.0.$i$j"
> > +    done
> > +done
> > +ovn-nbctl set Logical_Switch_Port lp11 options:arp_proxy=true
> > +
> > +# Pre-populate the hypervisors' ARP tables so that we don't lose any
> > +# packets for ARP resolution (native tunneling doesn't queue packets
> > +# for ARP resolution).
> > +ovn_populate_arp
> > +
> > +# Allow some time for ovn-northd and ovn-controller to catch up.
> > +# XXX This should be more systematic.
> > +sleep 1
> > +
> > +# Given the name of a logical port, prints the name of the hypervisor
> > +# on which it is located.
> > +vif_to_hv() {
> > +    echo hv${1%?}
> > +}
> > +
> > +# test_arp INPORT SHA SPA TPA [REPLY_HA]
> > +#
> > +# Causes a packet to be received on INPORT.  The packet is an ARP
> > +# request with SHA, SPA, and TPA as specified.  If REPLY_HA is
provided, then
> > +# it should be the hardware address of the target to expect to receive
in an
> > +# ARP reply; otherwise no reply is expected.
> > +#
> > +# INPORT is an logical switch port number, e.g. 11 for vif11.
> > +# SHA and REPLY_HA are each 12 hex digits.
> > +# SPA and TPA are each 8 hex digits.
> > +test_arp() {
> > +    local inport=$1 sha=$2 spa=$3 tpa=$4 reply_ha=$5
> > +    local
request=ffffffffffff${sha}08060001080006040001${sha}${spa}ffffffffffff${tpa}
> > +    hv=`vif_to_hv $inport`
> > +    as $hv ovs-appctl netdev-dummy/receive vif$inport $request
> > +
> > +    if test X$reply_ha = X; then
> > +        # Expect to receive the broadcast ARP on the other logical
switch ports
> > +        # if no reply is expected.
> > +        local i j
> > +        for i in 1 2 3; do
> > +            for j in 1 2 3; do
> > +                if test $i$j != $inport; then
> > +                    echo $request >> $i$j.expected
> > +                fi
> > +            done
> > +        done
> > +    else
> > +        # Expect to receive the reply, if any.
> > +        local
reply=${sha}${reply_ha}08060001080006040002${reply_ha}${tpa}${sha}${spa}
> > +        echo $reply >> $inport.expected
> > +    fi
> > +}
> > +
> > +ip_to_hex() {
> > +    printf "%02x%02x%02x%02x" "$@"
> > +}
> > +
> > +# Send ARP requests between all pairs of source and destination ports.
> > +#
> > +# No reply when sending ARP request to self.
> > +#
> > +# If ARP request is sent from ARP proxy port, OVN will generate
response with
> > +# MAC of destination port. Otherwise, OVN will generate response with
MAC of
> > +# the ARP proxy port.
> > +
> > +for is in 1 2 3; do
> > +    for js in 1 2 3; do
> > +        s=$is$js
> > +        for id in 1 2 3; do
> > +            for jd in 1 2 3; do
> > +                d=$id$jd
> > +                sip=`ip_to_hex 192 168 0 $i$j`
> > +                tip=`ip_to_hex 192 168 0 $id$jd`
> > +                if test $d != $s; then
> > +                    if test $s = "11"; then
> > +                        reply_ha=f000000000$d
> > +                    else
> > +                        reply_ha=f00000000011
> > +                    fi
> > +                else
> > +                    reply_ha=
> > +                fi
> > +                test_arp $s f000000000$s $sip $tip $reply_ha
> > +            done
> > +        done
> > +    done
> > +done
> > +
> > +# dump information and flows with counters
> > +ovn-sbctl dump-flows -- list multicast_group
> > +
> > +echo "------ hv1 dump ------"
> > +as hv1 ovs-vsctl show
> > +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
> > +
> > +echo "------ hv2 dump ------"
> > +as hv2 ovs-vsctl show
> > +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
> > +
> > +echo "------ hv3 dump ------"
> > +as hv3 ovs-vsctl show
> > +as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int
> > +
> > +# Now check the packets actually received against the ones expected.
> > +for i in 1 2 3; do
> > +    for j in 1 2 3; do
> > +        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])
> > +    done
> > +done
> > +
> > +OVN_CLEANUP([hv1],[hv2],[hv3])
> > +
> > +AT_CLEANUP
> > --
> > 2.1.0
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Bruce Davie Jan. 9, 2017, 10:57 p.m. UTC | #3
Han,
Your comment gave me pause:
> I have similar concerns about how useful it is.


Whereas the current proxy ARP function in OVN has a pretty clear motivation & tightly defined use case (to avoid needless broadcast of ARP requests across the overlay when the logical router’s IP and MAC are known) it seems like even you don’t really have a clear use case in mind for this additional functionality. Can you try to lay our more clearly why you think this is a useful enough addition to OVN to warrant the extra complexity?

Thanks
Bruce
Han Zhou Jan. 10, 2017, 12:08 a.m. UTC | #4
Hi Bruce,

This feature is useful for me. I had the concern because the use case for
me is intermediate. It is for k8s integration. In k8s there is a kubeproxy
running on each host to do service-ip NATting, and I am using OVS
(programmed by OVN) to connect host network namespace to containers (and
also connection between containers on the same host). When an NATted
end-point of a service-ip happen to be on the same host (and of course,
same L2, since we use bridged mode), then the return traffic would be
forwarded directly to the source without going through the kubeproxy. With
this ARP-proxy patch, the interface to the host network namespace is
specified as ARP proxy port, and traffic between containers will go through
the host for NATting.

This is an intermediate solution because a better way to do it is to
utilize the load-balancing feature of OVN to replace kubeproxy completely,
and the problem won't exist at all. It just takes more effort to integrate
and we are not there yet.

Hope this clarifies my use case. But I'd like to hear if this feature would
useful in any other circumstances.

Thanks,
Han

On Mon, Jan 9, 2017 at 2:57 PM, Bruce Davie <bdavie@vmware.com> wrote:
>
> Han,
> Your comment gave me pause:
> > I have similar concerns about how useful it is.
>
> Whereas the current proxy ARP function in OVN has a pretty clear
motivation & tightly defined use case (to avoid needless broadcast of ARP
requests across the overlay when the logical router’s IP and MAC are known)
it seems like even you don’t really have a clear use case in mind for this
additional functionality. Can you try to lay our more clearly why you think
this is a useful enough addition to OVN to warrant the extra complexity?
>
> Thanks
> Bruce
>
Bruce Davie Jan. 10, 2017, 10:51 p.m. UTC | #5
Han,
 Thanks - that makes sense to me. I can’t claim that this is a hugely general feature, but your use case seems legit to me (with the caveat that I’m no k8s expert).

Bruce

On Jan 9, 2017, at 4:08 PM, Han Zhou <zhouhan@gmail.com<mailto:zhouhan@gmail.com>> wrote:

Hi Bruce,

This feature is useful for me. I had the concern because the use case for me is intermediate. It is for k8s integration. In k8s there is a kubeproxy running on each host to do service-ip NATting, and I am using OVS (programmed by OVN) to connect host network namespace to containers (and also connection between containers on the same host). When an NATted end-point of a service-ip happen to be on the same host (and of course, same L2, since we use bridged mode), then the return traffic would be forwarded directly to the source without going through the kubeproxy. With this ARP-proxy patch, the interface to the host network namespace is specified as ARP proxy port, and traffic between containers will go through the host for NATting.

This is an intermediate solution because a better way to do it is to utilize the load-balancing feature of OVN to replace kubeproxy completely, and the problem won't exist at all. It just takes more effort to integrate and we are not there yet.

Hope this clarifies my use case. But I'd like to hear if this feature would useful in any other circumstances.

Thanks,
Han

On Mon, Jan 9, 2017 at 2:57 PM, Bruce Davie <bdavie@vmware.com<mailto:bdavie@vmware.com>> wrote:
>

> Han,

> Your comment gave me pause:

> > I have similar concerns about how useful it is.

>

> Whereas the current proxy ARP function in OVN has a pretty clear motivation & tightly defined use case (to avoid needless broadcast of ARP requests across the overlay when the logical router’s IP and MAC are known) it seems like even you don’t really have a clear use case in mind for this additional functionality. Can you try to lay our more clearly why you think this is a useful enough addition to OVN to warrant the extra complexity?

>

> Thanks

> Bruce

>
Ben Pfaff Jan. 13, 2017, 9:40 p.m. UTC | #6
On Mon, Jan 09, 2017 at 04:08:08PM -0800, Han Zhou wrote:
> This is an intermediate solution because a better way to do it is to
> utilize the load-balancing feature of OVN to replace kubeproxy completely,
> and the problem won't exist at all. It just takes more effort to integrate
> and we are not there yet.

Guru has already built OVN-k8 integration, which includes a load
balancer.  Does it work for your case?
Han Zhou Jan. 13, 2017, 10:50 p.m. UTC | #7
Hi Ben,

I am aware of the ovn-k8s integration by Guru, but our approach is slightly
different. It is somehow customized to avoid the current scale limit of ovn
cluster, so Guru's work can't be directly used. But I believe it will be a
very good reference for us when replacing kubeproxy by DLB of ovn. Another
not so critical concern may be that people prefer Go instead of python for
k8s related stuff (although I am not sure if it really matters).

Thanks,
Han

On Fri, Jan 13, 2017 at 1:40 PM, Ben Pfaff <blp@ovn.org> wrote:
>
> On Mon, Jan 09, 2017 at 04:08:08PM -0800, Han Zhou wrote:
> > This is an intermediate solution because a better way to do it is to
> > utilize the load-balancing feature of OVN to replace kubeproxy
completely,
> > and the problem won't exist at all. It just takes more effort to
integrate
> > and we are not there yet.
>
> Guru has already built OVN-k8 integration, which includes a load
> balancer.  Does it work for your case?
Ben Pfaff Jan. 31, 2017, 10:49 p.m. UTC | #8
On Thu, Jan 05, 2017 at 12:33:51AM -0800, Han Zhou wrote:
> On Wed, Jan 4, 2017 at 3:29 PM, Ben Pfaff <blp@ovn.org> wrote:
> > However, I'm concerned about the general utility here.  I usually think
> > of proxy ARP as being used for the kinds of applications you see in the
> > wikipedia on proxy ARP: https://en.wikipedia.org/wiki/Proxy_ARP.  This
> > seems to aimed at a different application that is more like a weak form
> > of service function chaining than for the traditional applications of
> > proxy ARP.
> >
> > Is this an appropriate application for proxy ARP?  Is it a common enough
> > use case to support in OVN?  Should it instead be handled through a more
> > general service function chaining interface?
> >
> 
> I have similar concerns about how useful it is. I agree it is kind of
> lightweight SFC, but in my opinion it is not contradict with the fact that
> it is also Proxy ARP. At least the firewall example (the 3rd use case)
> described in https://en.wikipedia.org/wiki/Proxy_ARP seems to be very
> close. SFC may be a more general approach to redirect packets to a
> firewall, but it is heavier, and may be difficult in a non-overlay
> environment, while Proxy ARP is sufficient and simple for such specific use
> cases when the nodes are L2 adjacent. So I think it may be good to have
> such feature in OVN, if the change is not intrusive.

Do the general SFC patches that John McDowall is currently working on
serve this use case too?
Han Zhou Feb. 1, 2017, 1:03 a.m. UTC | #9
On Tue, Jan 31, 2017 at 2:49 PM, Ben Pfaff <blp@ovn.org> wrote:
>
> On Thu, Jan 05, 2017 at 12:33:51AM -0800, Han Zhou wrote:
> > On Wed, Jan 4, 2017 at 3:29 PM, Ben Pfaff <blp@ovn.org> wrote:
> > > However, I'm concerned about the general utility here.  I usually
think
> > > of proxy ARP as being used for the kinds of applications you see in
the
> > > wikipedia on proxy ARP: https://en.wikipedia.org/wiki/Proxy_ARP.  This
> > > seems to aimed at a different application that is more like a weak
form
> > > of service function chaining than for the traditional applications of
> > > proxy ARP.
> > >
> > > Is this an appropriate application for proxy ARP?  Is it a common
enough
> > > use case to support in OVN?  Should it instead be handled through a
more
> > > general service function chaining interface?
> > >
> >
> > I have similar concerns about how useful it is. I agree it is kind of
> > lightweight SFC, but in my opinion it is not contradict with the fact
that
> > it is also Proxy ARP. At least the firewall example (the 3rd use case)
> > described in https://en.wikipedia.org/wiki/Proxy_ARP seems to be very
> > close. SFC may be a more general approach to redirect packets to a
> > firewall, but it is heavier, and may be difficult in a non-overlay
> > environment, while Proxy ARP is sufficient and simple for such specific
use
> > cases when the nodes are L2 adjacent. So I think it may be good to have
> > such feature in OVN, if the change is not intrusive.
>
> Do the general SFC patches that John McDowall is currently working on
> serve this use case too?

The scenario looks match, but with the SFC approach, if I understanding
correctly, it requires the VNF to be able to handle packets that is not
with dst mac equal to the mac of the VNF port. While with ARP proxy, MAC is
replaced with the MAC of the VNF (in my case it is just the MAC of the veth
interface in the host namespace where kubeproxy is running, which is not a
real so-called VNF), and no change is required in the VNF host/vm/container.

Thanks,
Han
diff mbox

Patch

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index f3c1682..aefa086 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -446,6 +446,19 @@ 
     </p>
 
     <p>
+      In particular, if a logical switch port has option proxy_arp set to
+      true, we call it an arp-proxy port. In such case, ARP requests to
+      IPv4 addresses of any logical port on the same logical switch will
+      be responded with the MAC address of that arp-proxy port, except when
+      the request comes from that arp-proxy port, which can still learn the
+      real MAC of destination VM/Container. This feature is useful if we
+      want all the traffic to/from any logical ports on the logical switch
+      being forwarded to a particular logical port, which may act as a proxy
+      and will forward the traffic to the real destination after some pre-
+      processing.
+    </p>
+
+    <p>
       ARP requests arrive from VMs from a logical switch inport of type
       default.  For this case, the logical switch proxy ARP rules can be
       for other VMs or logical router ports.  Logical switch proxy ARP
@@ -521,6 +534,16 @@  output;
         </pre>
 
         <p>
+          If arp-proxy port is present in the logical switch, the above
+          mentioned Ethernet address <var>E</var> will be the one belonging
+          to the arp-proxy port, and there will be Priority-60 flows that
+          match ARP requests from the arp-proxy port to each known IP address
+          of every logical switch port, and respond with ARP replies in the
+          same format as above but with Ethernet address <var>E</var> being
+          the real one belonging to each target logical switch port.
+        </p>
+
+        <p>
           These flows are omitted for logical ports (other than router ports)
           that are down.
         </p>
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index c56ac79..f897658 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -380,6 +380,10 @@  struct ovn_datapath {
 
     bool has_unknown;
 
+    /* ARP proxy port: MAC of the lport will be used by arp
+     * responder for other lports in the same LS. */
+    struct ovn_port *arp_proxy_port;
+
     /* IPAM data. */
     struct hmap ipam;
 };
@@ -2012,6 +2016,15 @@  build_port_security_ip(enum ovn_pipeline pipeline, struct ovn_port *op,
 }
 
 static bool
+lsp_is_arp_proxy(const struct nbrec_logical_switch_port *lsp)
+{
+
+    bool is_arp_proxy = smap_get_bool(&lsp->options,
+                                      "arp_proxy", false);
+    return is_arp_proxy;
+}
+
+static bool
 lsp_is_enabled(const struct nbrec_logical_switch_port *lsp)
 {
     return !lsp->enabled || *lsp->enabled;
@@ -2852,6 +2865,32 @@  build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_IP, 0, "1", "next;");
     }
 
+    /* Check arp_proxy option for LS ports */
+    HMAP_FOR_EACH (op, key_node, ports) {
+        if (!op->nbsp) {
+            continue;
+        }
+
+        if (lsp_is_arp_proxy(op->nbsp)) {
+            if (op->n_lsp_addrs != 1) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+                VLOG_WARN_RL(&rl, "one and only one address expected for arp "
+                             "proxy port %s", op->json_key);
+                continue;
+            }
+            if (op->od->arp_proxy_port) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+                VLOG_WARN_RL(&rl,
+                             "ARP proxy lsp %s already existed on ls "UUID_FMT
+                             " but another lsp %s is seen",
+                             op->od->arp_proxy_port->json_key,
+                             UUID_ARGS(&op->od->key),
+                             op->json_key);
+                continue;
+            }
+            op->od->arp_proxy_port = op;
+        }
+    }
     /* Ingress table 10: ARP/ND responder, skip requests coming from localnet
      * and vtep ports. (priority 100); see ovn-northd.8.xml for the
      * rationale. */
@@ -2888,24 +2927,68 @@  build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         for (size_t i = 0; i < op->n_lsp_addrs; i++) {
             for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) {
                 ds_clear(&match);
-                ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
-                              op->lsp_addrs[i].ipv4_addrs[j].addr_s);
-                ds_clear(&actions);
-                ds_put_format(&actions,
-                    "eth.dst = eth.src; "
-                    "eth.src = %s; "
-                    "arp.op = 2; /* ARP reply */ "
-                    "arp.tha = arp.sha; "
-                    "arp.sha = %s; "
-                    "arp.tpa = arp.spa; "
-                    "arp.spa = %s; "
-                    "outport = inport; "
-                    "flags.loopback = 1; "
-                    "output;",
-                    op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
-                    op->lsp_addrs[i].ipv4_addrs[j].addr_s);
-                ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
-                              ds_cstr(&match), ds_cstr(&actions));
+                if (op->od->arp_proxy_port) {
+                    ds_put_format(&match, "arp.tpa == %s && arp.op == 1 && inport == %s",
+                                  op->lsp_addrs[i].ipv4_addrs[j].addr_s,
+                                  op->od->arp_proxy_port->json_key);
+                    ds_clear(&actions);
+                    ds_put_format(&actions,
+                        "eth.dst = eth.src; "
+                        "eth.src = %s; "
+                        "arp.op = 2; /* ARP reply */ "
+                        "arp.tha = arp.sha; "
+                        "arp.sha = %s; "
+                        "arp.tpa = arp.spa; "
+                        "arp.spa = %s; "
+                        "outport = inport; "
+                        "flags.loopback = 1; "
+                        "output;",
+                        op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
+                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
+                    ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 60,
+                                  ds_cstr(&match), ds_cstr(&actions));
+
+                    ds_clear(&match);
+                    ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
+                                  op->lsp_addrs[i].ipv4_addrs[j].addr_s);
+                    ds_clear(&actions);
+                    char *proxy_mac = op->od->arp_proxy_port->lsp_addrs[0].ea_s;
+                    ds_put_format(&actions,
+                        "eth.dst = eth.src; "
+                        "eth.src = %s; "
+                        "arp.op = 2; /* ARP reply */ "
+                        "arp.tha = arp.sha; "
+                        "arp.sha = %s; "
+                        "arp.tpa = arp.spa; "
+                        "arp.spa = %s; "
+                        "outport = inport; "
+                        "flags.loopback = 1; "
+                        "output;",
+                        proxy_mac, proxy_mac,
+                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
+                    ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
+                                  ds_cstr(&match), ds_cstr(&actions));
+
+                } else {
+                    ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
+                                  op->lsp_addrs[i].ipv4_addrs[j].addr_s);
+                    ds_clear(&actions);
+                    ds_put_format(&actions,
+                        "eth.dst = eth.src; "
+                        "eth.src = %s; "
+                        "arp.op = 2; /* ARP reply */ "
+                        "arp.tha = arp.sha; "
+                        "arp.sha = %s; "
+                        "arp.tpa = arp.spa; "
+                        "arp.spa = %s; "
+                        "outport = inport; "
+                        "flags.loopback = 1; "
+                        "output;",
+                        op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
+                        op->lsp_addrs[i].ipv4_addrs[j].addr_s);
+                    ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
+                                  ds_cstr(&match), ds_cstr(&actions));
+                }
 
                 /* Do not reply to an ARP request from the port that owns the
                  * address (otherwise a DHCP client that ARPs to check for a
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 3e40881..a58d800 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -219,6 +219,17 @@ 
         individually below.
       </column>
 
+      <group title="Options for all port types">
+        <p>
+          These options apply when <ref column="type"/> is not set.
+        </p>
+
+        <column name="options" key="arp_proxy">
+          If this option is set to true, the lport acts as ARP proxy for all the
+          other lports on the same logical switch. Default value is false.
+        </column>
+      </group>
+
       <group title="Options for router ports">
         <p>
           These options apply when <ref column="type"/> is <code>router</code>.
diff --git a/tests/ovn.at b/tests/ovn.at
index 3779741..1830920 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -6023,3 +6023,139 @@  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 
 AT_CLEANUP
+
+# 3 hypervisors, one logical switch, 3 logical ports per hypervisor, ARP proxy
+AT_SETUP([ovn -- 3 HVs, 1 LS, 3 lports/HV, ARP proxy])
+AT_KEYWORDS([ovnarpproxy])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Create hypervisors hv[123].
+# Add vif1[123] to hv1, vif2[123] to hv2, vif3[123] to hv3.
+# Add all of the vifs to a single logical switch lsw0.
+# Set logical switch port lp11 as arp proxy.
+ovn-nbctl ls-add lsw0
+net_add n1
+for i in 1 2 3; do
+    sim_add hv$i
+    as hv$i
+    ovs-vsctl add-br br-phys
+    ovn_attach n1 br-phys 192.168.0.$i
+
+    for j in 1 2 3; do
+        ovs-vsctl add-port br-int vif$i$j -- set Interface vif$i$j external-ids:iface-id=lp$i$j options:tx_pcap=hv$i/vif$i$j-tx.pcap options:rxq_pcap=hv$i/vif$i$j-rx.pcap ofport-request=$i$j
+        ovn-nbctl lsp-add lsw0 lp$i$j
+        ovn-nbctl lsp-set-addresses lp$i$j "f0:00:00:00:00:$i$j 192.168.0.$i$j"
+    done
+done
+ovn-nbctl set Logical_Switch_Port lp11 options:arp_proxy=true
+
+# Pre-populate the hypervisors' ARP tables so that we don't lose any
+# packets for ARP resolution (native tunneling doesn't queue packets
+# for ARP resolution).
+ovn_populate_arp
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+# Given the name of a logical port, prints the name of the hypervisor
+# on which it is located.
+vif_to_hv() {
+    echo hv${1%?}
+}
+
+# test_arp INPORT SHA SPA TPA [REPLY_HA]
+#
+# Causes a packet to be received on INPORT.  The packet is an ARP
+# request with SHA, SPA, and TPA as specified.  If REPLY_HA is provided, then
+# it should be the hardware address of the target to expect to receive in an
+# ARP reply; otherwise no reply is expected.
+#
+# INPORT is an logical switch port number, e.g. 11 for vif11.
+# SHA and REPLY_HA are each 12 hex digits.
+# SPA and TPA are each 8 hex digits.
+test_arp() {
+    local inport=$1 sha=$2 spa=$3 tpa=$4 reply_ha=$5
+    local request=ffffffffffff${sha}08060001080006040001${sha}${spa}ffffffffffff${tpa}
+    hv=`vif_to_hv $inport`
+    as $hv ovs-appctl netdev-dummy/receive vif$inport $request
+
+    if test X$reply_ha = X; then
+        # Expect to receive the broadcast ARP on the other logical switch ports
+        # if no reply is expected.
+        local i j
+        for i in 1 2 3; do
+            for j in 1 2 3; do
+                if test $i$j != $inport; then
+                    echo $request >> $i$j.expected
+                fi
+            done
+        done
+    else
+        # Expect to receive the reply, if any.
+        local reply=${sha}${reply_ha}08060001080006040002${reply_ha}${tpa}${sha}${spa}
+        echo $reply >> $inport.expected
+    fi
+}
+
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+
+# Send ARP requests between all pairs of source and destination ports.
+#
+# No reply when sending ARP request to self.
+#
+# If ARP request is sent from ARP proxy port, OVN will generate response with
+# MAC of destination port. Otherwise, OVN will generate response with MAC of
+# the ARP proxy port.
+
+for is in 1 2 3; do
+    for js in 1 2 3; do
+        s=$is$js
+        for id in 1 2 3; do
+            for jd in 1 2 3; do
+                d=$id$jd
+                sip=`ip_to_hex 192 168 0 $i$j`
+                tip=`ip_to_hex 192 168 0 $id$jd`
+                if test $d != $s; then
+                    if test $s = "11"; then
+                        reply_ha=f000000000$d
+                    else
+                        reply_ha=f00000000011
+                    fi
+                else
+                    reply_ha=
+                fi
+                test_arp $s f000000000$s $sip $tip $reply_ha
+            done
+        done
+    done
+done
+
+# dump information and flows with counters
+ovn-sbctl dump-flows -- list multicast_group
+
+echo "------ hv1 dump ------"
+as hv1 ovs-vsctl show
+as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
+
+echo "------ hv2 dump ------"
+as hv2 ovs-vsctl show
+as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
+
+echo "------ hv3 dump ------"
+as hv3 ovs-vsctl show
+as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int
+
+# Now check the packets actually received against the ones expected.
+for i in 1 2 3; do
+    for j in 1 2 3; do
+        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])
+    done
+done
+
+OVN_CLEANUP([hv1],[hv2],[hv3])
+
+AT_CLEANUP