diff mbox series

[ovs-dev,v2] netdev-offload-tc: Fix offload of tunnel key tp_src.

Message ID 20231201230836.3093792-1-i.maximets@ovn.org
State Accepted
Commit 472dd66423aae651f2c453e6f6e785b954386cea
Headers show
Series [ovs-dev,v2] netdev-offload-tc: Fix offload of tunnel key tp_src. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

Ilya Maximets Dec. 1, 2023, 11:06 p.m. UTC
There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
by OVS.  Current code just ignores the attribute in the tunnel(set())
action leading to a flow mismatch and potential incorrect datapath
behavior:

  |tc(handler21)|DBG|tc flower compare failed action compare
  ...
  Action 0 mismatch:
  - Expected Action:
  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
  00000020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
  ...
 - Received Action:
  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
  00000020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
  ...

In the tc_action dump above we can see the difference on the second
line.  The action dumped from a kernel is missing 'c0 5b' - source
port for a tunnel(set()) action on the second line.

Removing the field from the tc_action_encap structure entirely to
avoid any potential confusion.

Note: In general, the source port number in the tunnel(set()) action
is not particularly useful for most tunnels, because they may just
ignore the value.  Specs for Geneve and VXLAN suggest using a value
based on the headers of the inner packet as a source port.
In vast majority of scenarios the source port doesn't actually end
up in the action itself.
Having a mismatch between the userspace and TC leads to constant
revalidation of the flow and warnings in the log.

Adding a test case that demonstrates a scenario where the issue
occurs - bridging of two tunnels.

Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
Reported-by: Vladislav Odintsov <odivlad@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

Version 2:
  * Slightly adjusted a commit message now that we understand the
    scenario better.
  * Added a test case that reproduces the issue.


 lib/netdev-offload-tc.c |  4 ++-
 lib/tc.h                |  3 +-
 tests/system-traffic.at | 77 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 2 deletions(-)

Comments

Vladislav Odintsov Dec. 2, 2023, 3:16 p.m. UTC | #1
Hi Ilya, thanks for the added test!

I’ve got one question about it, psb.

> On 2 Dec 2023, at 02:06, Ilya Maximets <i.maximets@ovn.org> wrote:
> 
> There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
> should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
> by OVS.  Current code just ignores the attribute in the tunnel(set())
> action leading to a flow mismatch and potential incorrect datapath
> behavior:
> 
>  |tc(handler21)|DBG|tc flower compare failed action compare
>  ...
>  Action 0 mismatch:
>  - Expected Action:
>  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>  00000020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>  ...
> - Received Action:
>  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>  00000020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>  ...
> 
> In the tc_action dump above we can see the difference on the second
> line.  The action dumped from a kernel is missing 'c0 5b' - source
> port for a tunnel(set()) action on the second line.
> 
> Removing the field from the tc_action_encap structure entirely to
> avoid any potential confusion.
> 
> Note: In general, the source port number in the tunnel(set()) action
> is not particularly useful for most tunnels, because they may just
> ignore the value.  Specs for Geneve and VXLAN suggest using a value
> based on the headers of the inner packet as a source port.
> In vast majority of scenarios the source port doesn't actually end
> up in the action itself.
> Having a mismatch between the userspace and TC leads to constant
> revalidation of the flow and warnings in the log.
> 
> Adding a test case that demonstrates a scenario where the issue
> occurs - bridging of two tunnels.
> 
> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
> Reported-by: Vladislav Odintsov <odivlad@gmail.com>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
> 
> Version 2:
>  * Slightly adjusted a commit message now that we understand the
>    scenario better.
>  * Added a test case that reproduces the issue.
> 
> 
> lib/netdev-offload-tc.c |  4 ++-
> lib/tc.h                |  3 +-
> tests/system-traffic.at | 77 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 82 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index b846a63c2..164c7eef6 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1627,7 +1627,9 @@ parse_put_flow_set_action(struct tc_flower *flower, struct tc_action *action,
>         }
>         break;
>         case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
> -            action->encap.tp_src = nl_attr_get_be16(tun_attr);
> +            /* There is no corresponding attribute in TC. */
> +            VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC");
> +            return EOPNOTSUPP;
>         }
>         break;
>         case OVS_TUNNEL_KEY_ATTR_TP_DST: {
> diff --git a/lib/tc.h b/lib/tc.h
> index 06707ffa4..fdbcf4b7c 100644
> --- a/lib/tc.h
> +++ b/lib/tc.h
> @@ -213,7 +213,8 @@ enum nat_type {
> struct tc_action_encap {
>     bool id_present;
>     ovs_be64 id;
> -    ovs_be16 tp_src;
> +    /* ovs_be16 tp_src;  Could have been here, but there is no
> +     * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */
>     ovs_be16 tp_dst;
>     uint8_t tos;
>     uint8_t ttl;
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index a7d4ed83b..fd55fdee1 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -903,6 +903,83 @@ ovs-pcap p0.pcap
> AT_CHECK([ovs-pcap p0.pcap | grep -Eq "^[[[:xdigit:]]]{24}86dd60000000003a1140fc000000000000000000000000000100fc000000000000000000000000000001[[[:xdigit:]]]{4}17c1003a[[[:xdigit:]]]{4}0000655800000000fffffffffffffa163e949d8008060001080006040001[[[:xdigit:]]]{12}0a0000f40000000000000a0000fe$"])
> AT_CLEANUP
> 
> +AT_SETUP([datapath - bridging two geneve tunnels])
> +OVS_CHECK_TUNNEL_TSO()
> +OVS_CHECK_GENEVE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay-0])
> +ADD_BR([br-underlay-1])
> +
> +ADD_NAMESPACES(at_ns0)
> +ADD_NAMESPACES(at_ns1)
> +
> +dnl Set up underlay link from host into the namespaces using veth pairs.
> +ADD_VETH(p0, at_ns0, br-underlay-0, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay-0 "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay-0 up])
> +
> +ADD_VETH(p1, at_ns1, br-underlay-1, "172.31.2.1/24")
> +AT_CHECK([ip addr add dev br-underlay-1 "172.31.2.100/24"])
> +AT_CHECK([ip link set dev br-underlay-1 up])
> +
> +dnl Set up two OVS tunnel endpoints in a root namespace and two native
> +dnl linux devices inside the test namespaces.
> +dnl
> +dnl  ns_gnv0                          |               ns_gnv1
> +dnl  ip:        10.1.1.1/24           |               ip:        10.1.1.2/24
> +dnl  remote_ip: 172.31.1.100          |               remote_ip: 172.31.2.100
> +dnl          |                        |                  |
> +dnl          |                        |                  |
> +dnl  p0                               |               p1
> +dnl  ip: 172.31.1.1/24                |               ip: 172.31.2.1/24
> +dnl          |                  NS0   |   NS1            |
> +dnl ---------|------------------------+------------------|--------------------
> +dnl          |                                           |
> +dnl       br-underlay-0:                           br-underlay-1:
> +dnl       ip: 172.31.1.100/24                      ip: 172.31.2.100/24
> +dnl         ovs-p0                                   ovs-p1
> +dnl          |                                           |
> +dnl          |            br0                            |
> +dnl       encap/decap --- ip: 10.1.1.100/24 --------- encap/decap
> +dnl                         at_gnv0
> +dnl                           remote_ip: 172.31.1.1
> +dnl                         at_gnv1
> +dnl                           remote_ip: 172.31.2.1
> +dnl
> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
> +                  [vni 0])
> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv1], [172.31.2.1], [10.1.1.101/24])
> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv1], [at_ns1], [172.31.2.100], [10.1.1.2/24],
> +                  [vni 0])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay-0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay-1 "actions=normal"])
> +
> +dnl First, check both underlays.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -W 2 172.31.2.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Now, check the overlay with different packet sizes.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])

In the original problem there was used a TC offload, which throwed warn
messages in vswitchd logfile but offload functioned well.  IIUC, with this
patch and without [0] offload should not happen due to its changes, but in both
cases there should be connectivity: "incorrect" old behaviour using TC and
"correct" new one falling back to kernel (or other non-tc-offloaded) datapath.

So, I’m wondering whether this test should check somehow that tp_src is ignored
before offload?  Or it’s just a test to quickly reproduce mentioned topology
and than to do things manually?

0: https://patchwork.ozlabs.org/project/openvswitch/patch/20231201210523.3085560-1-i.maximets@ovn.org/

> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> AT_SETUP([datapath - ping over gre tunnel by simulated packets])
> OVS_CHECK_TUNNEL_TSO()
> OVS_CHECK_MIN_KERNEL(3, 10)
> -- 
> 2.43.0
> 


Regards,
Vladislav Odintsov
Eelco Chaudron Dec. 4, 2023, 9:10 a.m. UTC | #2
On 2 Dec 2023, at 0:06, Ilya Maximets wrote:

> There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
> should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
> by OVS.  Current code just ignores the attribute in the tunnel(set())
> action leading to a flow mismatch and potential incorrect datapath
> behavior:
>
>   |tc(handler21)|DBG|tc flower compare failed action compare
>   ...
>   Action 0 mismatch:
>   - Expected Action:
>   00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>   00000020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>   00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>   ...
>  - Received Action:
>   00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>   00000020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>   00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>   ...
>
> In the tc_action dump above we can see the difference on the second
> line.  The action dumped from a kernel is missing 'c0 5b' - source
> port for a tunnel(set()) action on the second line.
>
> Removing the field from the tc_action_encap structure entirely to
> avoid any potential confusion.
>
> Note: In general, the source port number in the tunnel(set()) action
> is not particularly useful for most tunnels, because they may just
> ignore the value.  Specs for Geneve and VXLAN suggest using a value
> based on the headers of the inner packet as a source port.
> In vast majority of scenarios the source port doesn't actually end
> up in the action itself.
> Having a mismatch between the userspace and TC leads to constant
> revalidation of the flow and warnings in the log.
>
> Adding a test case that demonstrates a scenario where the issue
> occurs - bridging of two tunnels.
>
> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
> Reported-by: Vladislav Odintsov <odivlad@gmail.com>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---

Thank for fixing this and adding the unit test.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Marcelo Leitner Dec. 4, 2023, 12:40 p.m. UTC | #3
On Sat, Dec 02, 2023 at 12:06:59AM +0100, Ilya Maximets wrote:
...
> Adding a test case that demonstrates a scenario where the issue
> occurs - bridging of two tunnels.

I get the fix and it LGTM. What I don't get is the test case.
Considering that the attr was getting simply ignored, wouldn't the
test case always succeed? That is, I don't see how ignoring tp_src
would cause the test to fail. Can you please elaborate?

  Marcelo

>
> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
> Reported-by: Vladislav Odintsov <odivlad@gmail.com>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>
> Version 2:
>   * Slightly adjusted a commit message now that we understand the
>     scenario better.
>   * Added a test case that reproduces the issue.
>
>
>  lib/netdev-offload-tc.c |  4 ++-
>  lib/tc.h                |  3 +-
>  tests/system-traffic.at | 77 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 82 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index b846a63c2..164c7eef6 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1627,7 +1627,9 @@ parse_put_flow_set_action(struct tc_flower *flower, struct tc_action *action,
>          }
>          break;
>          case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
> -            action->encap.tp_src = nl_attr_get_be16(tun_attr);
> +            /* There is no corresponding attribute in TC. */
> +            VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC");
> +            return EOPNOTSUPP;
>          }
>          break;
>          case OVS_TUNNEL_KEY_ATTR_TP_DST: {
> diff --git a/lib/tc.h b/lib/tc.h
> index 06707ffa4..fdbcf4b7c 100644
> --- a/lib/tc.h
> +++ b/lib/tc.h
> @@ -213,7 +213,8 @@ enum nat_type {
>  struct tc_action_encap {
>      bool id_present;
>      ovs_be64 id;
> -    ovs_be16 tp_src;
> +    /* ovs_be16 tp_src;  Could have been here, but there is no
> +     * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */
>      ovs_be16 tp_dst;
>      uint8_t tos;
>      uint8_t ttl;
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index a7d4ed83b..fd55fdee1 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -903,6 +903,83 @@ ovs-pcap p0.pcap
>  AT_CHECK([ovs-pcap p0.pcap | grep -Eq "^[[[:xdigit:]]]{24}86dd60000000003a1140fc000000000000000000000000000100fc000000000000000000000000000001[[[:xdigit:]]]{4}17c1003a[[[:xdigit:]]]{4}0000655800000000fffffffffffffa163e949d8008060001080006040001[[[:xdigit:]]]{12}0a0000f40000000000000a0000fe$"])
>  AT_CLEANUP
>
> +AT_SETUP([datapath - bridging two geneve tunnels])
> +OVS_CHECK_TUNNEL_TSO()
> +OVS_CHECK_GENEVE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay-0])
> +ADD_BR([br-underlay-1])
> +
> +ADD_NAMESPACES(at_ns0)
> +ADD_NAMESPACES(at_ns1)
> +
> +dnl Set up underlay link from host into the namespaces using veth pairs.
> +ADD_VETH(p0, at_ns0, br-underlay-0, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay-0 "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay-0 up])
> +
> +ADD_VETH(p1, at_ns1, br-underlay-1, "172.31.2.1/24")
> +AT_CHECK([ip addr add dev br-underlay-1 "172.31.2.100/24"])
> +AT_CHECK([ip link set dev br-underlay-1 up])
> +
> +dnl Set up two OVS tunnel endpoints in a root namespace and two native
> +dnl linux devices inside the test namespaces.
> +dnl
> +dnl  ns_gnv0                          |               ns_gnv1
> +dnl  ip:        10.1.1.1/24           |               ip:        10.1.1.2/24
> +dnl  remote_ip: 172.31.1.100          |               remote_ip: 172.31.2.100
> +dnl          |                        |                  |
> +dnl          |                        |                  |
> +dnl  p0                               |               p1
> +dnl  ip: 172.31.1.1/24                |               ip: 172.31.2.1/24
> +dnl          |                  NS0   |   NS1            |
> +dnl ---------|------------------------+------------------|--------------------
> +dnl          |                                           |
> +dnl       br-underlay-0:                           br-underlay-1:
> +dnl       ip: 172.31.1.100/24                      ip: 172.31.2.100/24
> +dnl         ovs-p0                                   ovs-p1
> +dnl          |                                           |
> +dnl          |            br0                            |
> +dnl       encap/decap --- ip: 10.1.1.100/24 --------- encap/decap
> +dnl                         at_gnv0
> +dnl                           remote_ip: 172.31.1.1
> +dnl                         at_gnv1
> +dnl                           remote_ip: 172.31.2.1
> +dnl
> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
> +                  [vni 0])
> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv1], [172.31.2.1], [10.1.1.101/24])
> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv1], [at_ns1], [172.31.2.100], [10.1.1.2/24],
> +                  [vni 0])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay-0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay-1 "actions=normal"])
> +
> +dnl First, check both underlays.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -W 2 172.31.2.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Now, check the overlay with different packet sizes.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([datapath - ping over gre tunnel by simulated packets])
>  OVS_CHECK_TUNNEL_TSO()
>  OVS_CHECK_MIN_KERNEL(3, 10)
> --
> 2.43.0
>
Ilya Maximets Dec. 4, 2023, 12:57 p.m. UTC | #4
On 12/4/23 13:40, Marcelo Leitner wrote:
> On Sat, Dec 02, 2023 at 12:06:59AM +0100, Ilya Maximets wrote:
> ...
>> Adding a test case that demonstrates a scenario where the issue
>> occurs - bridging of two tunnels.
> 
> I get the fix and it LGTM. What I don't get is the test case.
> Considering that the attr was getting simply ignored, wouldn't the
> test case always succeed? That is, I don't see how ignoring tp_src
> would cause the test to fail. Can you please elaborate?

Ignoring the tp_src is causing tc code to warn about kernel
acknowledgement mismatch.  Testsuite doesn't tolerate warnings,
so the test fails on checking the logs:


./system-traffic.at:980: check_logs 
--- /dev/null   2023-11-16 11:36:50.991000000 -0500
+++ /root/ovs/tests/system-offloads-testsuite.dir/at-groups/40/stdout   2023-12-04 07:55:35.055945362 -0500
@@ -0,0 +1,6 @@
+2023-12-04T12:55:32.199Z|00001|tc(handler22)|WARN|Kernel flower acknowledgment does not match request!  Set dpif_netlink to dbg to see which rule caused this error.
+2023-12-04T12:55:32.199Z|00002|tc(handler22)|WARN|Kernel flower acknowledgment does not match request!  Set dpif_netlink to dbg to see which rule caused this error.
+2023-12-04T12:55:32.574Z|00001|tc(handler8)|WARN|Kernel flower acknowledgment does not match request!  Set dpif_netlink to dbg to see which rule caused this error.
+2023-12-04T12:55:32.765Z|00001|tc(handler20)|WARN|Kernel flower acknowledgment does not match request!  Set dpif_netlink to dbg to see which rule caused this error.
+2023-12-04T12:55:33.189Z|00001|tc(handler23)|WARN|Kernel flower acknowledgment does not match request!  Set dpif_netlink to dbg to see which rule caused this error.
+2023-12-04T12:55:33.597Z|00003|tc(handler8)|WARN|Kernel flower acknowledgment does not match request!  Set dpif_netlink to dbg to see which rule caused this error.
<...>
40. system-traffic.at:906:  FAILED (system-traffic.at:980)

We do not check if the offloading is actually happening, but we
check if there is any miscommunication between ovs-vswitchd and
the kernel (TC).

Does that make sense?

> 
>   Marcelo
> 
>>
>> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
>> Reported-by: Vladislav Odintsov <odivlad@gmail.com>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>
>> Version 2:
>>   * Slightly adjusted a commit message now that we understand the
>>     scenario better.
>>   * Added a test case that reproduces the issue.
>>
>>
>>  lib/netdev-offload-tc.c |  4 ++-
>>  lib/tc.h                |  3 +-
>>  tests/system-traffic.at | 77 +++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 82 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index b846a63c2..164c7eef6 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1627,7 +1627,9 @@ parse_put_flow_set_action(struct tc_flower *flower, struct tc_action *action,
>>          }
>>          break;
>>          case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
>> -            action->encap.tp_src = nl_attr_get_be16(tun_attr);
>> +            /* There is no corresponding attribute in TC. */
>> +            VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC");
>> +            return EOPNOTSUPP;
>>          }
>>          break;
>>          case OVS_TUNNEL_KEY_ATTR_TP_DST: {
>> diff --git a/lib/tc.h b/lib/tc.h
>> index 06707ffa4..fdbcf4b7c 100644
>> --- a/lib/tc.h
>> +++ b/lib/tc.h
>> @@ -213,7 +213,8 @@ enum nat_type {
>>  struct tc_action_encap {
>>      bool id_present;
>>      ovs_be64 id;
>> -    ovs_be16 tp_src;
>> +    /* ovs_be16 tp_src;  Could have been here, but there is no
>> +     * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */
>>      ovs_be16 tp_dst;
>>      uint8_t tos;
>>      uint8_t ttl;
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index a7d4ed83b..fd55fdee1 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -903,6 +903,83 @@ ovs-pcap p0.pcap
>>  AT_CHECK([ovs-pcap p0.pcap | grep -Eq "^[[[:xdigit:]]]{24}86dd60000000003a1140fc000000000000000000000000000100fc000000000000000000000000000001[[[:xdigit:]]]{4}17c1003a[[[:xdigit:]]]{4}0000655800000000fffffffffffffa163e949d8008060001080006040001[[[:xdigit:]]]{12}0a0000f40000000000000a0000fe$"])
>>  AT_CLEANUP
>>
>> +AT_SETUP([datapath - bridging two geneve tunnels])
>> +OVS_CHECK_TUNNEL_TSO()
>> +OVS_CHECK_GENEVE()
>> +
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +ADD_BR([br-underlay-0])
>> +ADD_BR([br-underlay-1])
>> +
>> +ADD_NAMESPACES(at_ns0)
>> +ADD_NAMESPACES(at_ns1)
>> +
>> +dnl Set up underlay link from host into the namespaces using veth pairs.
>> +ADD_VETH(p0, at_ns0, br-underlay-0, "172.31.1.1/24")
>> +AT_CHECK([ip addr add dev br-underlay-0 "172.31.1.100/24"])
>> +AT_CHECK([ip link set dev br-underlay-0 up])
>> +
>> +ADD_VETH(p1, at_ns1, br-underlay-1, "172.31.2.1/24")
>> +AT_CHECK([ip addr add dev br-underlay-1 "172.31.2.100/24"])
>> +AT_CHECK([ip link set dev br-underlay-1 up])
>> +
>> +dnl Set up two OVS tunnel endpoints in a root namespace and two native
>> +dnl linux devices inside the test namespaces.
>> +dnl
>> +dnl  ns_gnv0                          |               ns_gnv1
>> +dnl  ip:        10.1.1.1/24           |               ip:        10.1.1.2/24
>> +dnl  remote_ip: 172.31.1.100          |               remote_ip: 172.31.2.100
>> +dnl          |                        |                  |
>> +dnl          |                        |                  |
>> +dnl  p0                               |               p1
>> +dnl  ip: 172.31.1.1/24                |               ip: 172.31.2.1/24
>> +dnl          |                  NS0   |   NS1            |
>> +dnl ---------|------------------------+------------------|--------------------
>> +dnl          |                                           |
>> +dnl       br-underlay-0:                           br-underlay-1:
>> +dnl       ip: 172.31.1.100/24                      ip: 172.31.2.100/24
>> +dnl         ovs-p0                                   ovs-p1
>> +dnl          |                                           |
>> +dnl          |            br0                            |
>> +dnl       encap/decap --- ip: 10.1.1.100/24 --------- encap/decap
>> +dnl                         at_gnv0
>> +dnl                           remote_ip: 172.31.1.1
>> +dnl                         at_gnv1
>> +dnl                           remote_ip: 172.31.2.1
>> +dnl
>> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
>> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
>> +                  [vni 0])
>> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv1], [172.31.2.1], [10.1.1.101/24])
>> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv1], [at_ns1], [172.31.2.100], [10.1.1.2/24],
>> +                  [vni 0])
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>> +AT_CHECK([ovs-ofctl add-flow br-underlay-0 "actions=normal"])
>> +AT_CHECK([ovs-ofctl add-flow br-underlay-1 "actions=normal"])
>> +
>> +dnl First, check both underlays.
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -W 2 172.31.2.100 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +
>> +dnl Now, check the overlay with different packet sizes.
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_SETUP([datapath - ping over gre tunnel by simulated packets])
>>  OVS_CHECK_TUNNEL_TSO()
>>  OVS_CHECK_MIN_KERNEL(3, 10)
>> --
>> 2.43.0
>>
>
Ilya Maximets Dec. 4, 2023, 12:59 p.m. UTC | #5
On 12/2/23 16:16, Vladislav Odintsov wrote:
> Hi Ilya, thanks for the added test!
> 
> I’ve got one question about it, psb.
> 
>> On 2 Dec 2023, at 02:06, Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
>> should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
>> by OVS.  Current code just ignores the attribute in the tunnel(set())
>> action leading to a flow mismatch and potential incorrect datapath
>> behavior:
>>
>>  |tc(handler21)|DBG|tc flower compare failed action compare
>>  ...
>>  Action 0 mismatch:
>>  - Expected Action:
>>  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>>  00000020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>>  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>>  ...
>> - Received Action:
>>  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>>  00000020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>>  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>>  ...
>>
>> In the tc_action dump above we can see the difference on the second
>> line.  The action dumped from a kernel is missing 'c0 5b' - source
>> port for a tunnel(set()) action on the second line.
>>
>> Removing the field from the tc_action_encap structure entirely to
>> avoid any potential confusion.
>>
>> Note: In general, the source port number in the tunnel(set()) action
>> is not particularly useful for most tunnels, because they may just
>> ignore the value.  Specs for Geneve and VXLAN suggest using a value
>> based on the headers of the inner packet as a source port.
>> In vast majority of scenarios the source port doesn't actually end
>> up in the action itself.
>> Having a mismatch between the userspace and TC leads to constant
>> revalidation of the flow and warnings in the log.
>>
>> Adding a test case that demonstrates a scenario where the issue
>> occurs - bridging of two tunnels.
>>
>> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
>> Reported-by: Vladislav Odintsov <odivlad@gmail.com>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>
>> Version 2:
>>  * Slightly adjusted a commit message now that we understand the
>>    scenario better.
>>  * Added a test case that reproduces the issue.
>>
>>
>> lib/netdev-offload-tc.c |  4 ++-
>> lib/tc.h                |  3 +-
>> tests/system-traffic.at | 77 +++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 82 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index b846a63c2..164c7eef6 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1627,7 +1627,9 @@ parse_put_flow_set_action(struct tc_flower *flower, struct tc_action *action,
>>         }
>>         break;
>>         case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
>> -            action->encap.tp_src = nl_attr_get_be16(tun_attr);
>> +            /* There is no corresponding attribute in TC. */
>> +            VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC");
>> +            return EOPNOTSUPP;
>>         }
>>         break;
>>         case OVS_TUNNEL_KEY_ATTR_TP_DST: {
>> diff --git a/lib/tc.h b/lib/tc.h
>> index 06707ffa4..fdbcf4b7c 100644
>> --- a/lib/tc.h
>> +++ b/lib/tc.h
>> @@ -213,7 +213,8 @@ enum nat_type {
>> struct tc_action_encap {
>>     bool id_present;
>>     ovs_be64 id;
>> -    ovs_be16 tp_src;
>> +    /* ovs_be16 tp_src;  Could have been here, but there is no
>> +     * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */
>>     ovs_be16 tp_dst;
>>     uint8_t tos;
>>     uint8_t ttl;
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index a7d4ed83b..fd55fdee1 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -903,6 +903,83 @@ ovs-pcap p0.pcap
>> AT_CHECK([ovs-pcap p0.pcap | grep -Eq "^[[[:xdigit:]]]{24}86dd60000000003a1140fc000000000000000000000000000100fc000000000000000000000000000001[[[:xdigit:]]]{4}17c1003a[[[:xdigit:]]]{4}0000655800000000fffffffffffffa163e949d8008060001080006040001[[[:xdigit:]]]{12}0a0000f40000000000000a0000fe$"])
>> AT_CLEANUP
>>
>> +AT_SETUP([datapath - bridging two geneve tunnels])
>> +OVS_CHECK_TUNNEL_TSO()
>> +OVS_CHECK_GENEVE()
>> +
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +ADD_BR([br-underlay-0])
>> +ADD_BR([br-underlay-1])
>> +
>> +ADD_NAMESPACES(at_ns0)
>> +ADD_NAMESPACES(at_ns1)
>> +
>> +dnl Set up underlay link from host into the namespaces using veth pairs.
>> +ADD_VETH(p0, at_ns0, br-underlay-0, "172.31.1.1/24")
>> +AT_CHECK([ip addr add dev br-underlay-0 "172.31.1.100/24"])
>> +AT_CHECK([ip link set dev br-underlay-0 up])
>> +
>> +ADD_VETH(p1, at_ns1, br-underlay-1, "172.31.2.1/24")
>> +AT_CHECK([ip addr add dev br-underlay-1 "172.31.2.100/24"])
>> +AT_CHECK([ip link set dev br-underlay-1 up])
>> +
>> +dnl Set up two OVS tunnel endpoints in a root namespace and two native
>> +dnl linux devices inside the test namespaces.
>> +dnl
>> +dnl  ns_gnv0                          |               ns_gnv1
>> +dnl  ip:        10.1.1.1/24           |               ip:        10.1.1.2/24
>> +dnl  remote_ip: 172.31.1.100          |               remote_ip: 172.31.2.100
>> +dnl          |                        |                  |
>> +dnl          |                        |                  |
>> +dnl  p0                               |               p1
>> +dnl  ip: 172.31.1.1/24                |               ip: 172.31.2.1/24
>> +dnl          |                  NS0   |   NS1            |
>> +dnl ---------|------------------------+------------------|--------------------
>> +dnl          |                                           |
>> +dnl       br-underlay-0:                           br-underlay-1:
>> +dnl       ip: 172.31.1.100/24                      ip: 172.31.2.100/24
>> +dnl         ovs-p0                                   ovs-p1
>> +dnl          |                                           |
>> +dnl          |            br0                            |
>> +dnl       encap/decap --- ip: 10.1.1.100/24 --------- encap/decap
>> +dnl                         at_gnv0
>> +dnl                           remote_ip: 172.31.1.1
>> +dnl                         at_gnv1
>> +dnl                           remote_ip: 172.31.2.1
>> +dnl
>> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
>> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
>> +                  [vni 0])
>> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv1], [172.31.2.1], [10.1.1.101/24])
>> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv1], [at_ns1], [172.31.2.100], [10.1.1.2/24],
>> +                  [vni 0])
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>> +AT_CHECK([ovs-ofctl add-flow br-underlay-0 "actions=normal"])
>> +AT_CHECK([ovs-ofctl add-flow br-underlay-1 "actions=normal"])
>> +
>> +dnl First, check both underlays.
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -W 2 172.31.2.100 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +
>> +dnl Now, check the overlay with different packet sizes.
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
> 
> In the original problem there was used a TC offload, which throwed warn
> messages in vswitchd logfile but offload functioned well.  IIUC, with this
> patch and without [0] offload should not happen due to its changes, but in both
> cases there should be connectivity: "incorrect" old behaviour using TC and
> "correct" new one falling back to kernel (or other non-tc-offloaded) datapath.
> 
> So, I’m wondering whether this test should check somehow that tp_src is ignored
> before offload?  Or it’s just a test to quickly reproduce mentioned topology
> and than to do things manually?

Our tests do not tolerate failures, so if the tp_src is ignored,
we'll get the original acknowledgement mismatch warning in the log
and the test will fail because of this.  I posted an example in my
reply to Marcelo.

> 
> 0: https://patchwork.ozlabs.org/project/openvswitch/patch/20231201210523.3085560-1-i.maximets@ovn.org/
> 
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>> AT_SETUP([datapath - ping over gre tunnel by simulated packets])
>> OVS_CHECK_TUNNEL_TSO()
>> OVS_CHECK_MIN_KERNEL(3, 10)
>> -- 
>> 2.43.0
>>
> 
> 
> Regards,
> Vladislav Odintsov
>
Marcelo Leitner Dec. 4, 2023, 1:18 p.m. UTC | #6
On Mon, Dec 04, 2023 at 01:57:50PM +0100, Ilya Maximets wrote:
> On 12/4/23 13:40, Marcelo Leitner wrote:
> > On Sat, Dec 02, 2023 at 12:06:59AM +0100, Ilya Maximets wrote:
> > ...
> >> Adding a test case that demonstrates a scenario where the issue
> >> occurs - bridging of two tunnels.
> >
> > I get the fix and it LGTM. What I don't get is the test case.
> > Considering that the attr was getting simply ignored, wouldn't the
> > test case always succeed? That is, I don't see how ignoring tp_src
> > would cause the test to fail. Can you please elaborate?
>
> Ignoring the tp_src is causing tc code to warn about kernel
> acknowledgement mismatch.  Testsuite doesn't tolerate warnings,
> so the test fails on checking the logs:

Nice.

...
> We do not check if the offloading is actually happening, but we
> check if there is any miscommunication between ovs-vswitchd and
> the kernel (TC).
>
> Does that make sense?

Yes, thanks.

Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Vladislav Odintsov Dec. 4, 2023, 3:28 p.m. UTC | #7
Thanks Ilya for the clarification. I didn’t know about check_logs logic.

Tested-by: Vladislav Odintsov <odivlad@gmail.com>

> On 4 Dec 2023, at 15:59, Ilya Maximets <i.maximets@ovn.org> wrote:
> 
> On 12/2/23 16:16, Vladislav Odintsov wrote:
>> Hi Ilya, thanks for the added test!
>> 
>> I’ve got one question about it, psb.
>> 
>>> On 2 Dec 2023, at 02:06, Ilya Maximets <i.maximets@ovn.org> wrote:
>>> 
>>> There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
>>> should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
>>> by OVS.  Current code just ignores the attribute in the tunnel(set())
>>> action leading to a flow mismatch and potential incorrect datapath
>>> behavior:
>>> 
>>>  |tc(handler21)|DBG|tc flower compare failed action compare
>>>  ...
>>>  Action 0 mismatch:
>>>  - Expected Action:
>>>  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>>>  00000020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>>>  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>>>  ...
>>> - Received Action:
>>>  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
>>>  00000020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
>>>  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
>>>  ...
>>> 
>>> In the tc_action dump above we can see the difference on the second
>>> line.  The action dumped from a kernel is missing 'c0 5b' - source
>>> port for a tunnel(set()) action on the second line.
>>> 
>>> Removing the field from the tc_action_encap structure entirely to
>>> avoid any potential confusion.
>>> 
>>> Note: In general, the source port number in the tunnel(set()) action
>>> is not particularly useful for most tunnels, because they may just
>>> ignore the value.  Specs for Geneve and VXLAN suggest using a value
>>> based on the headers of the inner packet as a source port.
>>> In vast majority of scenarios the source port doesn't actually end
>>> up in the action itself.
>>> Having a mismatch between the userspace and TC leads to constant
>>> revalidation of the flow and warnings in the log.
>>> 
>>> Adding a test case that demonstrates a scenario where the issue
>>> occurs - bridging of two tunnels.
>>> 
>>> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
>>> Reported-by: Vladislav Odintsov <odivlad@gmail.com>
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>> 
>>> Version 2:
>>>  * Slightly adjusted a commit message now that we understand the
>>>    scenario better.
>>>  * Added a test case that reproduces the issue.
>>> 
>>> 
>>> lib/netdev-offload-tc.c |  4 ++-
>>> lib/tc.h                |  3 +-
>>> tests/system-traffic.at | 77 +++++++++++++++++++++++++++++++++++++++++
>>> 3 files changed, 82 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index b846a63c2..164c7eef6 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -1627,7 +1627,9 @@ parse_put_flow_set_action(struct tc_flower *flower, struct tc_action *action,
>>>         }
>>>         break;
>>>         case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
>>> -            action->encap.tp_src = nl_attr_get_be16(tun_attr);
>>> +            /* There is no corresponding attribute in TC. */
>>> +            VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC");
>>> +            return EOPNOTSUPP;
>>>         }
>>>         break;
>>>         case OVS_TUNNEL_KEY_ATTR_TP_DST: {
>>> diff --git a/lib/tc.h b/lib/tc.h
>>> index 06707ffa4..fdbcf4b7c 100644
>>> --- a/lib/tc.h
>>> +++ b/lib/tc.h
>>> @@ -213,7 +213,8 @@ enum nat_type {
>>> struct tc_action_encap {
>>>     bool id_present;
>>>     ovs_be64 id;
>>> -    ovs_be16 tp_src;
>>> +    /* ovs_be16 tp_src;  Could have been here, but there is no
>>> +     * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */
>>>     ovs_be16 tp_dst;
>>>     uint8_t tos;
>>>     uint8_t ttl;
>>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>>> index a7d4ed83b..fd55fdee1 100644
>>> --- a/tests/system-traffic.at
>>> +++ b/tests/system-traffic.at
>>> @@ -903,6 +903,83 @@ ovs-pcap p0.pcap
>>> AT_CHECK([ovs-pcap p0.pcap | grep -Eq "^[[[:xdigit:]]]{24}86dd60000000003a1140fc000000000000000000000000000100fc000000000000000000000000000001[[[:xdigit:]]]{4}17c1003a[[[:xdigit:]]]{4}0000655800000000fffffffffffffa163e949d8008060001080006040001[[[:xdigit:]]]{12}0a0000f40000000000000a0000fe$"])
>>> AT_CLEANUP
>>> 
>>> +AT_SETUP([datapath - bridging two geneve tunnels])
>>> +OVS_CHECK_TUNNEL_TSO()
>>> +OVS_CHECK_GENEVE()
>>> +
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +ADD_BR([br-underlay-0])
>>> +ADD_BR([br-underlay-1])
>>> +
>>> +ADD_NAMESPACES(at_ns0)
>>> +ADD_NAMESPACES(at_ns1)
>>> +
>>> +dnl Set up underlay link from host into the namespaces using veth pairs.
>>> +ADD_VETH(p0, at_ns0, br-underlay-0, "172.31.1.1/24")
>>> +AT_CHECK([ip addr add dev br-underlay-0 "172.31.1.100/24"])
>>> +AT_CHECK([ip link set dev br-underlay-0 up])
>>> +
>>> +ADD_VETH(p1, at_ns1, br-underlay-1, "172.31.2.1/24")
>>> +AT_CHECK([ip addr add dev br-underlay-1 "172.31.2.100/24"])
>>> +AT_CHECK([ip link set dev br-underlay-1 up])
>>> +
>>> +dnl Set up two OVS tunnel endpoints in a root namespace and two native
>>> +dnl linux devices inside the test namespaces.
>>> +dnl
>>> +dnl  ns_gnv0                          |               ns_gnv1
>>> +dnl  ip:        10.1.1.1/24           |               ip:        10.1.1.2/24
>>> +dnl  remote_ip: 172.31.1.100          |               remote_ip: 172.31.2.100
>>> +dnl          |                        |                  |
>>> +dnl          |                        |                  |
>>> +dnl  p0                               |               p1
>>> +dnl  ip: 172.31.1.1/24                |               ip: 172.31.2.1/24
>>> +dnl          |                  NS0   |   NS1            |
>>> +dnl ---------|------------------------+------------------|--------------------
>>> +dnl          |                                           |
>>> +dnl       br-underlay-0:                           br-underlay-1:
>>> +dnl       ip: 172.31.1.100/24                      ip: 172.31.2.100/24
>>> +dnl         ovs-p0                                   ovs-p1
>>> +dnl          |                                           |
>>> +dnl          |            br0                            |
>>> +dnl       encap/decap --- ip: 10.1.1.100/24 --------- encap/decap
>>> +dnl                         at_gnv0
>>> +dnl                           remote_ip: 172.31.1.1
>>> +dnl                         at_gnv1
>>> +dnl                           remote_ip: 172.31.2.1
>>> +dnl
>>> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
>>> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
>>> +                  [vni 0])
>>> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv1], [172.31.2.1], [10.1.1.101/24])
>>> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv1], [at_ns1], [172.31.2.100], [10.1.1.2/24],
>>> +                  [vni 0])
>>> +
>>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>>> +AT_CHECK([ovs-ofctl add-flow br-underlay-0 "actions=normal"])
>>> +AT_CHECK([ovs-ofctl add-flow br-underlay-1 "actions=normal"])
>>> +
>>> +dnl First, check both underlays.
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -W 2 172.31.2.100 | FORMAT_PING], [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +
>>> +dnl Now, check the overlay with different packet sizes.
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>> 
>> In the original problem there was used a TC offload, which throwed warn
>> messages in vswitchd logfile but offload functioned well.  IIUC, with this
>> patch and without [0] offload should not happen due to its changes, but in both
>> cases there should be connectivity: "incorrect" old behaviour using TC and
>> "correct" new one falling back to kernel (or other non-tc-offloaded) datapath.
>> 
>> So, I’m wondering whether this test should check somehow that tp_src is ignored
>> before offload?  Or it’s just a test to quickly reproduce mentioned topology
>> and than to do things manually?
> 
> Our tests do not tolerate failures, so if the tp_src is ignored,
> we'll get the original acknowledgement mismatch warning in the log
> and the test will fail because of this.  I posted an example in my
> reply to Marcelo.
> 
>> 
>> 0: https://patchwork.ozlabs.org/project/openvswitch/patch/20231201210523.3085560-1-i.maximets@ovn.org/
>> 
>>> +
>>> +OVS_TRAFFIC_VSWITCHD_STOP
>>> +AT_CLEANUP
>>> +
>>> AT_SETUP([datapath - ping over gre tunnel by simulated packets])
>>> OVS_CHECK_TUNNEL_TSO()
>>> OVS_CHECK_MIN_KERNEL(3, 10)
>>> -- 
>>> 2.43.0
>>> 
>> 
>> 
>> Regards,
>> Vladislav Odintsov


Regards,
Vladislav Odintsov
Ilya Maximets Dec. 5, 2023, 2:35 p.m. UTC | #8
On 12/4/23 14:18, Marcelo Leitner wrote:
> On Mon, Dec 04, 2023 at 01:57:50PM +0100, Ilya Maximets wrote:
>> On 12/4/23 13:40, Marcelo Leitner wrote:
>>> On Sat, Dec 02, 2023 at 12:06:59AM +0100, Ilya Maximets wrote:
>>> ...
>>>> Adding a test case that demonstrates a scenario where the issue
>>>> occurs - bridging of two tunnels.
>>>
>>> I get the fix and it LGTM. What I don't get is the test case.
>>> Considering that the attr was getting simply ignored, wouldn't the
>>> test case always succeed? That is, I don't see how ignoring tp_src
>>> would cause the test to fail. Can you please elaborate?
>>
>> Ignoring the tp_src is causing tc code to warn about kernel
>> acknowledgement mismatch.  Testsuite doesn't tolerate warnings,
>> so the test fails on checking the logs:
> 
> Nice.
> 
> ...
>> We do not check if the offloading is actually happening, but we
>> check if there is any miscommunication between ovs-vswitchd and
>> the kernel (TC).
>>
>> Does that make sense?
> 
> Yes, thanks.
> 
> Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> 


Thanks, everyone!
Applied and backported down to 2.17.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index b846a63c2..164c7eef6 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1627,7 +1627,9 @@  parse_put_flow_set_action(struct tc_flower *flower, struct tc_action *action,
         }
         break;
         case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
-            action->encap.tp_src = nl_attr_get_be16(tun_attr);
+            /* There is no corresponding attribute in TC. */
+            VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC");
+            return EOPNOTSUPP;
         }
         break;
         case OVS_TUNNEL_KEY_ATTR_TP_DST: {
diff --git a/lib/tc.h b/lib/tc.h
index 06707ffa4..fdbcf4b7c 100644
--- a/lib/tc.h
+++ b/lib/tc.h
@@ -213,7 +213,8 @@  enum nat_type {
 struct tc_action_encap {
     bool id_present;
     ovs_be64 id;
-    ovs_be16 tp_src;
+    /* ovs_be16 tp_src;  Could have been here, but there is no
+     * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */
     ovs_be16 tp_dst;
     uint8_t tos;
     uint8_t ttl;
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index a7d4ed83b..fd55fdee1 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -903,6 +903,83 @@  ovs-pcap p0.pcap
 AT_CHECK([ovs-pcap p0.pcap | grep -Eq "^[[[:xdigit:]]]{24}86dd60000000003a1140fc000000000000000000000000000100fc000000000000000000000000000001[[[:xdigit:]]]{4}17c1003a[[[:xdigit:]]]{4}0000655800000000fffffffffffffa163e949d8008060001080006040001[[[:xdigit:]]]{12}0a0000f40000000000000a0000fe$"])
 AT_CLEANUP
 
+AT_SETUP([datapath - bridging two geneve tunnels])
+OVS_CHECK_TUNNEL_TSO()
+OVS_CHECK_GENEVE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay-0])
+ADD_BR([br-underlay-1])
+
+ADD_NAMESPACES(at_ns0)
+ADD_NAMESPACES(at_ns1)
+
+dnl Set up underlay link from host into the namespaces using veth pairs.
+ADD_VETH(p0, at_ns0, br-underlay-0, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay-0 "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay-0 up])
+
+ADD_VETH(p1, at_ns1, br-underlay-1, "172.31.2.1/24")
+AT_CHECK([ip addr add dev br-underlay-1 "172.31.2.100/24"])
+AT_CHECK([ip link set dev br-underlay-1 up])
+
+dnl Set up two OVS tunnel endpoints in a root namespace and two native
+dnl linux devices inside the test namespaces.
+dnl
+dnl  ns_gnv0                          |               ns_gnv1
+dnl  ip:        10.1.1.1/24           |               ip:        10.1.1.2/24
+dnl  remote_ip: 172.31.1.100          |               remote_ip: 172.31.2.100
+dnl          |                        |                  |
+dnl          |                        |                  |
+dnl  p0                               |               p1
+dnl  ip: 172.31.1.1/24                |               ip: 172.31.2.1/24
+dnl          |                  NS0   |   NS1            |
+dnl ---------|------------------------+------------------|--------------------
+dnl          |                                           |
+dnl       br-underlay-0:                           br-underlay-1:
+dnl       ip: 172.31.1.100/24                      ip: 172.31.2.100/24
+dnl         ovs-p0                                   ovs-p1
+dnl          |                                           |
+dnl          |            br0                            |
+dnl       encap/decap --- ip: 10.1.1.100/24 --------- encap/decap
+dnl                         at_gnv0
+dnl                           remote_ip: 172.31.1.1
+dnl                         at_gnv1
+dnl                           remote_ip: 172.31.2.1
+dnl
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
+                  [vni 0])
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv1], [172.31.2.1], [10.1.1.101/24])
+ADD_NATIVE_TUNNEL([geneve], [ns_gnv1], [at_ns1], [172.31.2.100], [10.1.1.2/24],
+                  [vni 0])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay-0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay-1 "actions=normal"])
+
+dnl First, check both underlays.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -W 2 172.31.2.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Now, check the overlay with different packet sizes.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping over gre tunnel by simulated packets])
 OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_MIN_KERNEL(3, 10)