diff mbox series

[ovs-dev] flow: Consistent VXLAN UDP src ports for fragmented packets

Message ID 1610445027-23835-1-git-send-email-parvathy.tarur.ramachandran@ericsson.com
State Superseded
Headers show
Series [ovs-dev] flow: Consistent VXLAN UDP src ports for fragmented packets | expand

Commit Message

Parvathy Tarur Ramachandran Jan. 12, 2021, 9:50 a.m. UTC
Issue:
The src-port for UDP is based on RSS hash in the packet metadata.
In case of packets coming from VM it will be 5-tuple, if available,
otherwise just IP addresses. If the VM fragments a large IP packet
and sends the fragments to OVS, only the first fragment will contain
the L4 header.  Therefore, the first fragment and subsequent fragments
get different UDP src ports in the outgoing VXLAN header. This can
lead to fragment re-ordering in the fabric as packet will take
different paths.

Fix:
With this patch, we ignore the L4 header during hash calculation in
the case of fragmented packets.

Signed-off-by: Parvathy Tarur Ramachandran <parvathy.tarur.ramachandran@ericsson.com>
---
 lib/flow.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Ilya Maximets May 13, 2021, 5:29 p.m. UTC | #1
On 1/12/21 10:50 AM, Parvathy Tarur Ramachandran via dev wrote:
> Issue:
> The src-port for UDP is based on RSS hash in the packet metadata.
> In case of packets coming from VM it will be 5-tuple, if available,
> otherwise just IP addresses. If the VM fragments a large IP packet
> and sends the fragments to OVS, only the first fragment will contain
> the L4 header.  Therefore, the first fragment and subsequent fragments
> get different UDP src ports in the outgoing VXLAN header. This can
> lead to fragment re-ordering in the fabric as packet will take
> different paths.
> 
> Fix:
> With this patch, we ignore the L4 header during hash calculation in
> the case of fragmented packets.

Hi.  Sorry for late reply.

I see the problem, but I'm not sure that this change will actually
fix it, because if the first packet is fragmented and the second
one is not fragmented (originally had smaller size for any reason),
they will still be hashed differently even if they are from the same
flow.

The solution for this would be to completely disable L4 hashing,
but that doesn't sound right.

What do you think?

Best regards, Ilya Maximets.

> 
> Signed-off-by: Parvathy Tarur Ramachandran <parvathy.tarur.ramachandran@ericsson.com>
> ---
>  lib/flow.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/flow.c b/lib/flow.c
> index cc1b3f2..38bf377 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -2178,7 +2178,7 @@ miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis)
>  
>      if (flow) {
>          ovs_be16 dl_type = MINIFLOW_GET_BE16(flow, dl_type);
> -        uint8_t nw_proto;
> +        uint8_t nw_proto, nw_frag;
>  
>          if (dl_type == htons(ETH_TYPE_IPV6)) {
>              struct flowmap map = FLOWMAP_EMPTY_INITIALIZER;
> @@ -2200,6 +2200,14 @@ miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis)
>  
>          nw_proto = MINIFLOW_GET_U8(flow, nw_proto);
>          hash = hash_add(hash, nw_proto);
> +        /* Skip l4 header fields if IP packet is fragmented since
> +         * only first fragment will carry l4 header.
> +         */
> +        nw_frag = MINIFLOW_GET_U8(flow, nw_frag);
> +        if (nw_frag) {
> +            goto out;
> +        }
> +
>          if (nw_proto != IPPROTO_TCP && nw_proto != IPPROTO_UDP
>              && nw_proto != IPPROTO_SCTP && nw_proto != IPPROTO_ICMP
>              && nw_proto != IPPROTO_ICMPV6) {
>
diff mbox series

Patch

diff --git a/lib/flow.c b/lib/flow.c
index cc1b3f2..38bf377 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -2178,7 +2178,7 @@  miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis)
 
     if (flow) {
         ovs_be16 dl_type = MINIFLOW_GET_BE16(flow, dl_type);
-        uint8_t nw_proto;
+        uint8_t nw_proto, nw_frag;
 
         if (dl_type == htons(ETH_TYPE_IPV6)) {
             struct flowmap map = FLOWMAP_EMPTY_INITIALIZER;
@@ -2200,6 +2200,14 @@  miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis)
 
         nw_proto = MINIFLOW_GET_U8(flow, nw_proto);
         hash = hash_add(hash, nw_proto);
+        /* Skip l4 header fields if IP packet is fragmented since
+         * only first fragment will carry l4 header.
+         */
+        nw_frag = MINIFLOW_GET_U8(flow, nw_frag);
+        if (nw_frag) {
+            goto out;
+        }
+
         if (nw_proto != IPPROTO_TCP && nw_proto != IPPROTO_UDP
             && nw_proto != IPPROTO_SCTP && nw_proto != IPPROTO_ICMP
             && nw_proto != IPPROTO_ICMPV6) {