diff mbox series

[ovs-dev,1/1] meta-flow: Introduce tunnel eth-type field.

Message ID 50ea95b5-5820-41d7-9da8-f44a38b1b89b@nvidia.com
State Changes Requested
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev,1/1] meta-flow: Introduce tunnel eth-type field. | expand

Checks

Context Check Description
ovsrobot/apply-robot fail apply and check: fail

Commit Message

Reuven Plevinsky Sept. 15, 2025, 1:08 p.m. UTC
From: Eli Britstein <elibr@nvidia.com>


Introduce eth-type field to tunnel metadata that is internally matched
for tunnels, for robust detection of the IP type.

Signed-off-by: Eli Britstein <elibr@nvidia.com>
---
1. Please note this is a draft-only proposal. We didn't take care on
all tunnel types, testsuite etc.
2. About backward compatibility: You mentioned that today OpenFlow
allows setting both tunnel IPv4 and IPv6 fields simultaneously.
While that is true technically, there is no valid use case or
semantic meaning to matching both address types at once. Matching
on IPv4 and IPv6 together simply does not make sense from a protocol
perspective. So, in practice, I would argue that this is not a real
compatibility issue, because the "dual set" case is already invalid.
3. About dependencies and pipeline design: I understand the point about
dependencies and the mismatch with OpenFlow's "all fields can be set"
model. But the key here is that the proposed tunnel_eth_type is an
internal field. It is not exposed directly in OpenFlow as a
user-visible match key. Instead, it provides internal consistency
for the datapath and avoids ambiguity in cases where the tunnel IP
version matters. Since it is internal, existing OpenFlow semantics
remain untouched, and no user-facing behavior is broken.
4. If we go forward with this approach, more commits will follow to
handle offloads and cleanups of "dual set", that as mentioned it is
already invalid.

include/openvswitch/match.h | 1 +
include/openvswitch/meta-flow.h | 14 ++++++++++++++
include/openvswitch/packets.h | 3 ++-
lib/match.c | 8 ++++++++
lib/meta-flow.c | 18 ++++++++++++++++++
lib/meta-flow.xml | 1 +
lib/netdev-native-tnl.c | 15 +++++++++++++++
lib/netdev-offload-dpdk.c | 3 +++
lib/netdev-offload-tc.c | 2 ++
ofproto/tunnel.c | 1 +
10 files changed, 65 insertions(+), 1 deletion(-)

Comments

Ilya Maximets Oct. 15, 2025, 1:13 p.m. UTC | #1
On 9/15/25 3:08 PM, Reuven Plevinsky via dev wrote:
> From: Eli Britstein <elibr@nvidia.com>
> 
> 
> Introduce eth-type field to tunnel metadata that is internally matched
> for tunnels, for robust detection of the IP type.
> 
> Signed-off-by: Eli Britstein <elibr@nvidia.com>
> ---
> 1. Please note this is a draft-only proposal. We didn't take care on
> all tunnel types, testsuite etc.

Hi, Eli and Reuven.  Thanks for the patch.  For the future, please,
add RFC into the subject prefix for incomplete/RFC work.

Also, the patch seem to be mangled by your mail client or the server,
you may need to find a different way of sending patches.

> 2. About backward compatibility: You mentioned that today OpenFlow
> allows setting both tunnel IPv4 and IPv6 fields simultaneously.
> While that is true technically, there is no valid use case or
> semantic meaning to matching both address types at once. Matching
> on IPv4 and IPv6 together simply does not make sense from a protocol
> perspective. So, in practice, I would argue that this is not a real
> compatibility issue, because the "dual set" case is already invalid.

As I said before, it's not invalid on the OpenFlow level.  In fact, it
may be the only way to tell if the packet is from a tunnel or not for
non-keyed tunnels.

> 3. About dependencies and pipeline design: I understand the point about
> dependencies and the mismatch with OpenFlow's "all fields can be set"
> model. But the key here is that the proposed tunnel_eth_type is an
> internal field. It is not exposed directly in OpenFlow as a
> user-visible match key. Instead, it provides internal consistency
> for the datapath and avoids ambiguity in cases where the tunnel IP
> version matters. Since it is internal, existing OpenFlow semantics
> remain untouched, and no user-facing behavior is broken.

It may be fine to add an internal field.  As long as other fields do not
depend on it, there should indeed not be any compatibility breakages.
But we should still not deny addition of OpenFlow rules that match on
both types of addresses as that will be incompatible.  We can clear the
'always true' matches before sending them to the datapath, based on the
eth_type, but only when the eth_type is also matched in the datapath,
i.e. we'll need to also add support for matching the tunnel eth_type in
the datapath.  And we'll need to detect if our datapath supports these
matches and only clear the address fields in this case.  The datapath
support is necessary as everything that is used to make a decision in
ofproto layer must be matched in the datapath.  This is all revolving
around the problem of being able to match on "this packet is not from a
tunnel" condition, which will be covered by a 'tun.eth_type == 0' match
in the datapath.

> 4. If we go forward with this approach, more commits will follow to
> handle offloads and cleanups of "dual set", that as mentioned it is
> already invalid.

Feel free to send RFC v2 with a proper implementation, including support
on the datapath level and corresponding conditional use of this new field.
If that will look good, then we can proceed with making the kernel changes
first and then accepting the userspace parts.

The implementation may be a bit challenging, but I don't see fundamental
issues of why this can't be done.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/include/openvswitch/match.h b/include/openvswitch/match.h
index 2e8812048e86..b962badfcfd3 100644
--- a/include/openvswitch/match.h
+++ b/include/openvswitch/match.h
@@ -127,6 +127,7 @@  void match_set_tun_gtpu_flags_masked(struct match 
*match, uint8_t flags,
void match_set_tun_gtpu_msgtype(struct match *match, uint8_t msgtype);
void match_set_tun_gtpu_msgtype_masked(struct match *match, uint8_t msgtype,
uint8_t mask);
+void match_set_tun_eth_type(struct match *, ovs_be16);
void match_set_in_port(struct match *, ofp_port_t ofp_port);
void match_set_pkt_mark(struct match *, uint32_t pkt_mark);
void match_set_pkt_mark_masked(struct match *, uint32_t pkt_mark, 
uint32_t mask);
diff --git a/include/openvswitch/meta-flow.h 
b/include/openvswitch/meta-flow.h
index fb7d17ebe736..cfdd286d8bf1 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -304,6 +304,20 @@  enum OVS_PACKED_ENUM mf_field_id {
*/
MFF_TUN_ID,
+ /* "tun_eth_type".
+ *
+ * Packet's Tunnel Ethernet type.
+ *
+ * Type: be16.
+ * Maskable: no.
+ * Formatting: decimal.
+ * Prerequisites: none.
+ * Access: read-only.
+ * NXM: none.
+ * OXM: none.
+ */
+ MFF_TUN_ETH_TYPE,
+
/* "tun_src".
*
* The IPv4 source address in the outer IP header of a tunneled packet.
diff --git a/include/openvswitch/packets.h b/include/openvswitch/packets.h
index a65cb0d04e77..c9af1b6ceca6 100644
--- a/include/openvswitch/packets.h
+++ b/include/openvswitch/packets.h
@@ -45,7 +45,8 @@  struct flow_tnl {
uint8_t erspan_hwid;
uint8_t gtpu_flags;
uint8_t gtpu_msgtype;
- uint8_t pad1[4]; /* Pad to 64 bits. */
+ ovs_be16 eth_type;
+ uint8_t pad1[2]; /* Pad to 64 bits. */
struct tun_metadata metadata;
};
diff --git a/lib/match.c b/lib/match.c
index 9b7e06e0c7f8..e6681db82a05 100644
--- a/lib/match.c
+++ b/lib/match.c
@@ -402,6 +402,13 @@  match_set_tun_gtpu_msgtype(struct match *match, 
uint8_t msgtype)
match_set_tun_gtpu_msgtype_masked(match, msgtype, UINT8_MAX);
}
+void
+match_set_tun_eth_type(struct match *match, ovs_be16 eth_type)
+{
+ match->wc.masks.tunnel.eth_type = OVS_BE16_MAX;
+ match->flow.tunnel.eth_type = eth_type;
+}
+
void
match_set_in_port(struct match *match, ofp_port_t ofp_port)
{
@@ -1398,6 +1405,7 @@  format_flow_tunnel(struct ds *s, const struct 
match *match)
FLOW_TNL_F_MASK);
ds_put_char(s, ',');
}
+ ds_put_format(s, "tun_eth_type=0x%04"PRIx16",", ntohs(tnl->eth_type));
tun_metadata_match_format(s, match);
}
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index b03fe7abf1d8..e2c870a126d4 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -211,6 +211,8 @@  mf_is_all_wild(const struct mf_field *mf, const 
struct flow_wildcards *wc)
return !wc->masks.packet_type;
case MFF_CONJ_ID:
return !wc->masks.conj_id;
+ case MFF_TUN_ETH_TYPE:
+ return !wc->masks.tunnel.eth_type;
case MFF_TUN_SRC:
return !wc->masks.tunnel.ip_src;
case MFF_TUN_DST:
@@ -522,6 +524,7 @@  mf_is_value_valid(const struct mf_field *mf, const 
union mf_value *value)
case MFF_PACKET_TYPE:
case MFF_CONJ_ID:
case MFF_TUN_ID:
+ case MFF_TUN_ETH_TYPE:
case MFF_TUN_SRC:
case MFF_TUN_DST:
case MFF_TUN_IPV6_SRC:
@@ -678,6 +681,9 @@  mf_get_value(const struct mf_field *mf, const struct 
flow *flow,
case MFF_TUN_ID:
value->be64 = flow->tunnel.tun_id;
break;
+ case MFF_TUN_ETH_TYPE:
+ value->be16 = flow->tunnel.eth_type;
+ break;
case MFF_TUN_SRC:
value->be32 = flow->tunnel.ip_src;
break;
@@ -1015,6 +1021,9 @@  mf_set_value(const struct mf_field *mf,
case MFF_TUN_ID:
match_set_tun_id(match, value->be64);
break;
+ case MFF_TUN_ETH_TYPE:
+ match_set_tun_eth_type(match, value->be16);
+ break;
case MFF_TUN_SRC:
match_set_tun_src(match, value->be32);
break;
@@ -1437,6 +1446,9 @@  mf_set_flow_value(const struct mf_field *mf,
case MFF_TUN_ID:
flow->tunnel.tun_id = value->be64;
break;
+ case MFF_TUN_ETH_TYPE:
+ flow->tunnel.eth_type = value->be16;
+ break;
case MFF_TUN_SRC:
flow->tunnel.ip_src = value->be32;
break;
@@ -1821,6 +1833,7 @@  mf_is_any_metadata(const struct mf_field *mf)
return true;
case MFF_TUN_ID:
+ case MFF_TUN_ETH_TYPE:
case MFF_TUN_SRC:
case MFF_TUN_DST:
case MFF_TUN_IPV6_SRC:
@@ -1915,6 +1928,7 @@  mf_is_pipeline_field(const struct mf_field *mf)
{
switch (mf->id) {
case MFF_TUN_ID:
+ case MFF_TUN_ETH_TYPE:
case MFF_TUN_SRC:
case MFF_TUN_DST:
case MFF_TUN_IPV6_SRC:
@@ -2073,6 +2087,9 @@  mf_set_wild(const struct mf_field *mf, struct 
match *match, char **err_str)
case MFF_TUN_ID:
match_set_tun_id_masked(match, htonll(0), htonll(0));
break;
+ case MFF_TUN_ETH_TYPE:
+ match->flow.tunnel.eth_type = 0;
+ break;
case MFF_TUN_SRC:
match_set_tun_src_masked(match, htonl(0), htonl(0));
break;
@@ -2477,6 +2494,7 @@  mf_set(const struct mf_field *mf,
case MFF_ICMPV6_CODE:
case MFF_ND_RESERVED:
case MFF_ND_OPTIONS_TYPE:
+ case MFF_TUN_ETH_TYPE:
return OFPUTIL_P_NONE;
case MFF_DP_HASH:
diff --git a/lib/meta-flow.xml b/lib/meta-flow.xml
index ac72a44bce40..45225f3ad351 100644
--- a/lib/meta-flow.xml
+++ b/lib/meta-flow.xml
@@ -1510,6 +1510,7 @@  ovs-ofctl add-flow br-int 
'in_port=3,tun_src=192.168.1.1,tun_id=5001 actions=1'
</diagram>
</field>
+ <field id="MFF_TUN_ETH_TYPE" title="Tunnel Ethernet type" internal="yes"/>
<field id="MFF_TUN_SRC" title="Tunnel IPv4 Source">
<p>
When a packet is received from a tunnel, this field is the
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index e172ed72e447..5bf56f480aa0 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -524,6 +524,7 @@  netdev_gre_pop_header(struct dp_packet *packet)
struct pkt_metadata *md = &packet->md;
struct flow_tnl *tnl = &md->tunnel;
int hlen = sizeof(struct eth_header) + 4;
+ const struct eth_header *eth;
ovs_assert(data_dp);
@@ -540,6 +541,10 @@  netdev_gre_pop_header(struct dp_packet *packet)
goto err;
}
+ eth = dp_packet_eth(packet);
+ if (eth) {
+ tnl->eth_type = eth->eth_type;
+ }
dp_packet_reset_packet(packet, hlen);
return packet;
@@ -1097,6 +1102,7 @@  netdev_vxlan_pop_header(struct dp_packet *packet)
unsigned int hlen;
ovs_be32 vx_flags;
enum packet_type next_pt = PT_ETH;
+ const struct eth_header *eth;
ovs_assert(packet->l3_ofs > 0);
ovs_assert(packet->l4_ofs > 0);
@@ -1147,6 +1153,10 @@  netdev_vxlan_pop_header(struct dp_packet *packet)
tnl->flags |= FLOW_TNL_F_KEY;
packet->packet_type = htonl(next_pt);
+ eth = dp_packet_eth(packet);
+ if (eth) {
+ tnl->eth_type = eth->eth_type;
+ }
dp_packet_reset_packet(packet, hlen + VXLAN_HLEN);
if (next_pt != PT_ETH) {
packet->l3_ofs = 0;
@@ -1214,6 +1224,7 @@  netdev_geneve_pop_header(struct dp_packet *packet)
struct flow_tnl *tnl = &md->tunnel;
struct genevehdr *gnh;
unsigned int hlen, opts_len, ulen;
+ const struct eth_header *eth;
pkt_metadata_init_tnl(md);
if (GENEVE_BASE_HLEN > dp_packet_l4_size(packet)) {
@@ -1255,6 +1266,10 @@  netdev_geneve_pop_header(struct dp_packet *packet)
tnl->flags |= FLOW_TNL_F_UDPIF;
packet->packet_type = htonl(PT_ETH);
+ eth = dp_packet_eth(packet);
+ if (eth) {
+ tnl->eth_type = eth->eth_type;
+ }
dp_packet_reset_packet(packet, hlen);
return packet;
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index f963bef2280b..7ce9da21b0f3 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -1367,6 +1367,9 @@  parse_flow_tnl_match(struct netdev *tnldev,
return ret;
}
+ /* Temporary ignore. */
+ match->wc.masks.tunnel.eth_type = 0;
+
if (!strcmp(netdev_get_type(tnldev), "vxlan")) {
ret = parse_vxlan_match(patterns, match);
}
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 66f2558ddb71..0b56b650b69f 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -2324,6 +2324,8 @@  netdev_tc_flow_put(struct netdev *netdev, struct 
match *match,
memset(&tnl_mask->gbp_id, 0, sizeof tnl_mask->gbp_id);
memset(&tnl_mask->gbp_flags, 0, sizeof tnl_mask->gbp_flags);
tnl_mask->flags &= ~FLOW_TNL_F_KEY;
+ /* Tunnel's eth_type is for userspace only. Ignore it here. */
+ tnl_mask->eth_type = 0;
/* XXX: This is wrong! We're ignoring DF and CSUM flags configuration
* requested by the user. However, TC for now has no way to pass
diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 80ddee78acf6..c63332051078 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -381,6 +381,7 @@  tnl_wc_init(struct flow *flow, struct flow_wildcards 
*wc)
* wildcarded, not to unwildcard them here. */
wc->masks.tunnel.tp_src = 0;
wc->masks.tunnel.tp_dst = 0;
+ wc->masks.tunnel.eth_type = OVS_BE16_MAX;
if (is_ip_any(flow)
&& IP_ECN_is_ce(flow->tunnel.ip_tos)) {