diff mbox

[ovs-dev,rebase,3/3] ovn-controller: Use UDP checksums when creating Geneve tunnels.

Message ID 1470961234-90115-4-git-send-email-jesse@kernel.org
State Accepted
Headers show

Commit Message

Jesse Gross Aug. 12, 2016, 12:20 a.m. UTC
Currently metadata transmitted by OVN over Geneve tunnels is
unprotected by any checksum other than the one provided by the link
layer - this includes both the VNI and data stored in options. Turning
on UDP checksums which cover this data has obvious benefits in terms of
integrity protection.

In terms of performance, this actually significantly increases throughput
in most common cases when running on Linux based hosts without NICs
supporting Geneve offload (around 60% for bulk traffic). The reason is
that generally all NICs are capable of offloading transmitted and received
UDP checksums (viewed as ordinary UDP packets and not as tunnels). The
benefit comes on the receive side where the validated outer UDP checksum
can be used to additionally validate an inner checksum (such as TCP), which
in turn allows aggregation of packets to be more efficiently handled by
the rest of the stack.

Not all devices see such a benefit. The most notable exception is hardware
VTEPs (currently using VXLAN but potentially Geneve in the future). These
devices are designed to not buffer entire packets in their switching engines
and are therefore unable to efficiently compute or validate UDP checksums.
In addition certain versions of the Linux kernel are not able to fully
take advantage of Geneve capable NIC offloads in the presence of checksums.
(This is actually a pretty narrow corner case though - earlier versions of
Linux don't support Geneve offloads at all and later versions support both
offloads and checksums well.)

In order avoid possible problems with these cases, efficient checksum
receive performance is exposed as an encap option in the southbound
database as a hint to remote senders. This currently defaults to off
for hardware VTEPs and on for all other cases.

Signed-off-by: Jesse Gross <jesse@kernel.org>
---
 ovn/controller-vtep/gateway.c |  9 +++++++++
 ovn/controller/chassis.c      |  8 ++++++++
 ovn/controller/encaps.c       | 12 +++++++++++-
 ovn/ovn-sb.xml                | 10 ++++++++--
 ovn/utilities/ovn-sbctl.c     |  2 ++
 tests/ovn-controller-vtep.at  |  1 +
 tests/ovn-sbctl.at            |  3 +++
 7 files changed, 42 insertions(+), 3 deletions(-)

Comments

Ryan Moats Aug. 12, 2016, 2:10 a.m. UTC | #1
"dev" <dev-bounces@openvswitch.org> wrote on 08/11/2016 07:20:34 PM:

> From: Jesse Gross <jesse@kernel.org>
> To: dev@openvswitch.org
> Date: 08/11/2016 07:22 PM
> Subject: [ovs-dev] [PATCH rebase 3/3] ovn-controller: Use UDP
> checksums when creating Geneve tunnels.
> Sent by: "dev" <dev-bounces@openvswitch.org>
>
> Currently metadata transmitted by OVN over Geneve tunnels is
> unprotected by any checksum other than the one provided by the link
> layer - this includes both the VNI and data stored in options. Turning
> on UDP checksums which cover this data has obvious benefits in terms of
> integrity protection.
>
> In terms of performance, this actually significantly increases throughput
> in most common cases when running on Linux based hosts without NICs
> supporting Geneve offload (around 60% for bulk traffic). The reason is
> that generally all NICs are capable of offloading transmitted and
received
> UDP checksums (viewed as ordinary UDP packets and not as tunnels). The
> benefit comes on the receive side where the validated outer UDP checksum
> can be used to additionally validate an inner checksum (such as TCP),
which
> in turn allows aggregation of packets to be more efficiently handled by
> the rest of the stack.
>
> Not all devices see such a benefit. The most notable exception is
hardware
> VTEPs (currently using VXLAN but potentially Geneve in the future). These
> devices are designed to not buffer entire packets in their switching
engines
> and are therefore unable to efficiently compute or validate UDP
checksums.
> In addition certain versions of the Linux kernel are not able to fully
> take advantage of Geneve capable NIC offloads in the presence of
checksums.
> (This is actually a pretty narrow corner case though - earlier versions
of
> Linux don't support Geneve offloads at all and later versions support
both
> offloads and checksums well.)
>
> In order avoid possible problems with these cases, efficient checksum
> receive performance is exposed as an encap option in the southbound
> database as a hint to remote senders. This currently defaults to off
> for hardware VTEPs and on for all other cases.
>
> Signed-off-by: Jesse Gross <jesse@kernel.org>
> ---

Patch 2/3 never made my mailbox, so I'm using this to comment on both
patches

While I understand patch 2/3 does and I like, I'd like it even better if
it had test cases that would help me understand the changes that weren't
picked up by the original code, because that's a potential future
regression.

Assuming that is handled in a follow-on patch, consider this an ack on
both parts 2 and 3...

Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff Aug. 12, 2016, 8:06 p.m. UTC | #2
On Thu, Aug 11, 2016 at 05:20:34PM -0700, Jesse Gross wrote:
> Currently metadata transmitted by OVN over Geneve tunnels is
> unprotected by any checksum other than the one provided by the link
> layer - this includes both the VNI and data stored in options. Turning
> on UDP checksums which cover this data has obvious benefits in terms of
> integrity protection.
> 
> In terms of performance, this actually significantly increases throughput
> in most common cases when running on Linux based hosts without NICs
> supporting Geneve offload (around 60% for bulk traffic). The reason is
> that generally all NICs are capable of offloading transmitted and received
> UDP checksums (viewed as ordinary UDP packets and not as tunnels). The
> benefit comes on the receive side where the validated outer UDP checksum
> can be used to additionally validate an inner checksum (such as TCP), which
> in turn allows aggregation of packets to be more efficiently handled by
> the rest of the stack.
> 
> Not all devices see such a benefit. The most notable exception is hardware
> VTEPs (currently using VXLAN but potentially Geneve in the future). These
> devices are designed to not buffer entire packets in their switching engines
> and are therefore unable to efficiently compute or validate UDP checksums.
> In addition certain versions of the Linux kernel are not able to fully
> take advantage of Geneve capable NIC offloads in the presence of checksums.
> (This is actually a pretty narrow corner case though - earlier versions of
> Linux don't support Geneve offloads at all and later versions support both
> offloads and checksums well.)
> 
> In order avoid possible problems with these cases, efficient checksum
> receive performance is exposed as an encap option in the southbound
> database as a hint to remote senders. This currently defaults to off
> for hardware VTEPs and on for all other cases.
> 
> Signed-off-by: Jesse Gross <jesse@kernel.org>

smap_get_bool() might slightly simplify some bits of this.

The rationale about software versus hardware defaults might go into the
documentation or code somewhere, otherwise it'll probably be forgotten
over time.

Acked-by: Ben Pfaff <blp@ovn.org>
Jesse Gross Aug. 14, 2016, 6:17 a.m. UTC | #3
On Thu, Aug 11, 2016 at 7:10 PM, Ryan Moats <rmoats@us.ibm.com> wrote:
> "dev" <dev-bounces@openvswitch.org> wrote on 08/11/2016 07:20:34 PM:
>
>> From: Jesse Gross <jesse@kernel.org>
>> To: dev@openvswitch.org
>> Date: 08/11/2016 07:22 PM
>> Subject: [ovs-dev] [PATCH rebase 3/3] ovn-controller: Use UDP
>> checksums when creating Geneve tunnels.
>> Sent by: "dev" <dev-bounces@openvswitch.org>
>
>
>>
>> Currently metadata transmitted by OVN over Geneve tunnels is
>> unprotected by any checksum other than the one provided by the link
>> layer - this includes both the VNI and data stored in options. Turning
>> on UDP checksums which cover this data has obvious benefits in terms of
>> integrity protection.
>>
>> In terms of performance, this actually significantly increases throughput
>> in most common cases when running on Linux based hosts without NICs
>> supporting Geneve offload (around 60% for bulk traffic). The reason is
>> that generally all NICs are capable of offloading transmitted and received
>> UDP checksums (viewed as ordinary UDP packets and not as tunnels). The
>> benefit comes on the receive side where the validated outer UDP checksum
>> can be used to additionally validate an inner checksum (such as TCP),
>> which
>> in turn allows aggregation of packets to be more efficiently handled by
>> the rest of the stack.
>>
>> Not all devices see such a benefit. The most notable exception is hardware
>> VTEPs (currently using VXLAN but potentially Geneve in the future). These
>> devices are designed to not buffer entire packets in their switching
>> engines
>> and are therefore unable to efficiently compute or validate UDP checksums.
>> In addition certain versions of the Linux kernel are not able to fully
>> take advantage of Geneve capable NIC offloads in the presence of
>> checksums.
>> (This is actually a pretty narrow corner case though - earlier versions of
>> Linux don't support Geneve offloads at all and later versions support both
>> offloads and checksums well.)
>>
>> In order avoid possible problems with these cases, efficient checksum
>> receive performance is exposed as an encap option in the southbound
>> database as a hint to remote senders. This currently defaults to off
>> for hardware VTEPs and on for all other cases.
>>
>> Signed-off-by: Jesse Gross <jesse@kernel.org>
>> ---
>
> Patch 2/3 never made my mailbox, so I'm using this to comment on both
> patches
>
> While I understand patch 2/3 does and I like, I'd like it even better if
> it had test cases that would help me understand the changes that weren't
> picked up by the original code, because that's a potential future
> regression.
>
> Assuming that is handled in a follow-on patch, consider this an ack on
> both parts 2 and 3...

Thanks for the review - I rolled some test cases directly into this
patch. While I think this is a good idea and should catch the most
common types of problems, as I mentioned in the previous message to
Ben I'm not sure that this is something that we can test our way out
of - there are simply too many possible corner cases. A better
solution would seem to be a common rule engine that can maintain
mappings and invariants based some business logic.
Jesse Gross Aug. 14, 2016, 6:18 a.m. UTC | #4
On Fri, Aug 12, 2016 at 1:06 PM, Ben Pfaff <blp@ovn.org> wrote:
> On Thu, Aug 11, 2016 at 05:20:34PM -0700, Jesse Gross wrote:
>> Currently metadata transmitted by OVN over Geneve tunnels is
>> unprotected by any checksum other than the one provided by the link
>> layer - this includes both the VNI and data stored in options. Turning
>> on UDP checksums which cover this data has obvious benefits in terms of
>> integrity protection.
>>
>> In terms of performance, this actually significantly increases throughput
>> in most common cases when running on Linux based hosts without NICs
>> supporting Geneve offload (around 60% for bulk traffic). The reason is
>> that generally all NICs are capable of offloading transmitted and received
>> UDP checksums (viewed as ordinary UDP packets and not as tunnels). The
>> benefit comes on the receive side where the validated outer UDP checksum
>> can be used to additionally validate an inner checksum (such as TCP), which
>> in turn allows aggregation of packets to be more efficiently handled by
>> the rest of the stack.
>>
>> Not all devices see such a benefit. The most notable exception is hardware
>> VTEPs (currently using VXLAN but potentially Geneve in the future). These
>> devices are designed to not buffer entire packets in their switching engines
>> and are therefore unable to efficiently compute or validate UDP checksums.
>> In addition certain versions of the Linux kernel are not able to fully
>> take advantage of Geneve capable NIC offloads in the presence of checksums.
>> (This is actually a pretty narrow corner case though - earlier versions of
>> Linux don't support Geneve offloads at all and later versions support both
>> offloads and checksums well.)
>>
>> In order avoid possible problems with these cases, efficient checksum
>> receive performance is exposed as an encap option in the southbound
>> database as a hint to remote senders. This currently defaults to off
>> for hardware VTEPs and on for all other cases.
>>
>> Signed-off-by: Jesse Gross <jesse@kernel.org>
>
> smap_get_bool() might slightly simplify some bits of this.
>
> The rationale about software versus hardware defaults might go into the
> documentation or code somewhere, otherwise it'll probably be forgotten
> over time.
>
> Acked-by: Ben Pfaff <blp@ovn.org>

Thanks, I improved those two areas that you mentioned and then pushed
this series to master.
diff mbox

Patch

diff --git a/ovn/controller-vtep/gateway.c b/ovn/controller-vtep/gateway.c
index ebe0087..e44b319 100644
--- a/ovn/controller-vtep/gateway.c
+++ b/ovn/controller-vtep/gateway.c
@@ -57,6 +57,8 @@  create_chassis_rec(struct ovsdb_idl_txn *txn, const char *name,
     encap_rec = sbrec_encap_insert(txn);
     sbrec_encap_set_type(encap_rec, OVN_SB_ENCAP_TYPE);
     sbrec_encap_set_ip(encap_rec, encap_ip);
+    const struct smap options = SMAP_CONST1(&options, "csum", "false");
+    sbrec_encap_set_options(encap_rec, &options);
     sbrec_chassis_set_encaps(chassis_rec, &encap_rec, 1);
 
     return chassis_rec;
@@ -107,6 +109,13 @@  revalidate_gateway(struct controller_vtep_ctx *ctx)
             if (strcmp(chassis_rec->encaps[0]->ip, encap_ip)) {
                 sbrec_encap_set_ip(chassis_rec->encaps[0], encap_ip);
             }
+            const char *csum = smap_get(&chassis_rec->encaps[0]->options,
+                                        "csum");
+            if (!csum || strcmp(csum, "false")) {
+                const struct smap options = SMAP_CONST1(&options, "csum",
+                                                                  "false");
+                sbrec_encap_set_options(chassis_rec->encaps[0], &options);
+            }
         } else {
             if (gw_node) {
                 VLOG_WARN("Chassis for VTEP physical switch (%s) disappears, "
diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index b5f022e..beb393b 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -137,8 +137,13 @@  chassis_run(struct controller_ctx *ctx, const char *chassis_id)
         uint32_t cur_tunnels = 0;
         bool same = true;
         for (int i = 0; i < chassis_rec->n_encaps; i++) {
+            const char *csum;
+
             cur_tunnels |= get_tunnel_type(chassis_rec->encaps[i]->type);
             same = same && !strcmp(chassis_rec->encaps[i]->ip, encap_ip);
+
+            csum = smap_get(&chassis_rec->encaps[i]->options, "csum");
+            same = same && csum && !strcmp(csum, "true");
         }
         same = same && req_tunnels == cur_tunnels;
 
@@ -178,6 +183,8 @@  chassis_run(struct controller_ctx *ctx, const char *chassis_id)
 
     int n_encaps = count_1bits(req_tunnels);
     struct sbrec_encap **encaps = xmalloc(n_encaps * sizeof *encaps);
+    const struct smap options = SMAP_CONST1(&options, "csum", "true");
+
     for (int i = 0; i < n_encaps; i++) {
         const char *type = pop_tunnel_name(&req_tunnels);
 
@@ -185,6 +192,7 @@  chassis_run(struct controller_ctx *ctx, const char *chassis_id)
 
         sbrec_encap_set_type(encaps[i], type);
         sbrec_encap_set_ip(encaps[i], encap_ip);
+        sbrec_encap_set_options(encaps[i], &options);
     }
 
     sbrec_chassis_set_encaps(chassis_rec, encaps, n_encaps);
diff --git a/ovn/controller/encaps.c b/ovn/controller/encaps.c
index 52348c3..7ae6af8 100644
--- a/ovn/controller/encaps.c
+++ b/ovn/controller/encaps.c
@@ -191,6 +191,7 @@  tunnel_add(const struct sbrec_chassis *chassis_rec,
     struct smap options = SMAP_INITIALIZER(&options);
     struct ovsrec_port *port, **ports;
     struct ovsrec_interface *iface;
+    const char *csum = smap_get(&encap->options, "csum");
     char *port_name;
     size_t i;
 
@@ -206,6 +207,9 @@  tunnel_add(const struct sbrec_chassis *chassis_rec,
     ovsrec_interface_set_type(iface, encap->type);
     smap_add(&options, "remote_ip", encap->ip);
     smap_add(&options, "key", "flow");
+    if (csum && (!strcmp(csum, "true") || !strcmp(csum, "false"))) {
+        smap_add(&options, "csum", csum);
+    }
     ovsrec_interface_set_options(iface, &options);
     smap_destroy(&options);
 
@@ -303,15 +307,21 @@  check_and_update_tunnel(const struct ovsrec_port *port,
 {
     const struct sbrec_encap *encap = preferred_encap(chassis_rec);
     const struct ovsrec_interface *iface = port->interfaces[0];
+    const char *csum = smap_get(&encap->options, "csum");
+    const char *existing_csum = smap_get(&iface->options, "csum");
 
     if (strcmp(encap->type, iface->type)) {
         ovsrec_interface_set_type(iface, encap->type);
     }
     const char *ip = smap_get(&iface->options, "remote_ip");
-    if (!ip || strcmp(encap->ip, ip)) {
+    if (!ip || strcmp(encap->ip, ip) ||
+        (!!csum != !!existing_csum || (csum && strcmp(csum, existing_csum)))) {
         struct smap options = SMAP_INITIALIZER(&options);
         smap_add(&options, "remote_ip", encap->ip);
         smap_add(&options, "key", "flow");
+        if (csum && (!strcmp(csum, "true") || !strcmp(csum, "false"))) {
+            smap_add(&options, "csum", csum);
+        }
         ovsrec_interface_set_options(iface, &options);
         smap_destroy(&options);
     }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 13c9526..1cc650f 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -274,8 +274,14 @@ 
     </column>
 
     <column name="options">
-      Options for configuring the encapsulation, e.g. IPsec parameters when
-      IPsec support is introduced.  No options are currently defined.
+      Options for configuring the encapsulation. Currently, the only
+      option that has been defined is <code>csum</code>. This indicates
+      that encapsulation checksums can be transmitted and received with
+      reasonable performance. It is a hint to senders transmitting data
+      to this chassis that they should used checksums to protect OVN
+      metadata. In addition, on Linux based hypervisors this will also
+      improve the throughput of encapsulated traffic in most cases. Set to
+      <code>true</code> to enable or <code>false</code> to disable.
     </column>
 
     <column name="ip">
diff --git a/ovn/utilities/ovn-sbctl.c b/ovn/utilities/ovn-sbctl.c
index 50c9f17..cb2d477 100644
--- a/ovn/utilities/ovn-sbctl.c
+++ b/ovn/utilities/ovn-sbctl.c
@@ -557,6 +557,7 @@  cmd_chassis_add(struct ctl_context *ctx)
 
     size_t n_encaps = sset_count(&encap_set);
     struct sbrec_encap **encaps = xmalloc(n_encaps * sizeof *encaps);
+    const struct smap options = SMAP_CONST1(&options, "csum", "true");
     const char *encap_type;
     int i = 0;
     SSET_FOR_EACH (encap_type, &encap_set){
@@ -564,6 +565,7 @@  cmd_chassis_add(struct ctl_context *ctx)
 
         sbrec_encap_set_type(encaps[i], encap_type);
         sbrec_encap_set_ip(encaps[i], encap_ip);
+        sbrec_encap_set_options(encaps[i], &options);
         i++;
     }
     sset_destroy(&encap_set);
diff --git a/tests/ovn-controller-vtep.at b/tests/ovn-controller-vtep.at
index 8eca16c..654c212 100644
--- a/tests/ovn-controller-vtep.at
+++ b/tests/ovn-controller-vtep.at
@@ -124,6 +124,7 @@  AT_CHECK([ovn-sbctl show], [0], [dnl
 Chassis br-vtep
     Encap vxlan
         ip: "1.2.3.4"
+        options: {csum="false"}
 ])
 
 # deletes the chassis via ovn-sbctl and check that it is readded back
diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
index 72dc441..9393eef 100644
--- a/tests/ovn-sbctl.at
+++ b/tests/ovn-sbctl.at
@@ -93,6 +93,7 @@  AT_CHECK([ovn-sbctl show], [0], [dnl
 Chassis "ch0"
     Encap stt
         ip: "1.2.3.5"
+        options: {csum="true"}
     Port_Binding "vif0"
 ])
 
@@ -105,6 +106,7 @@  AT_CHECK([ovn-sbctl show | sed 's/vif[[0-9]]/vif/'], [0], [dnl
 Chassis "ch0"
     Encap stt
         ip: "1.2.3.5"
+        options: {csum="true"}
     Port_Binding "vif"
     Port_Binding "vif"
 ])
@@ -116,6 +118,7 @@  AT_CHECK([ovn-sbctl show], [0], [dnl
 Chassis "ch0"
     Encap stt
         ip: "1.2.3.5"
+        options: {csum="true"}
     Port_Binding "vif0"
 ])