Message ID | 1610445027-23835-1-git-send-email-parvathy.tarur.ramachandran@ericsson.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev] flow: Consistent VXLAN UDP src ports for fragmented packets | expand |
On 1/12/21 10:50 AM, Parvathy Tarur Ramachandran via dev wrote: > Issue: > The src-port for UDP is based on RSS hash in the packet metadata. > In case of packets coming from VM it will be 5-tuple, if available, > otherwise just IP addresses. If the VM fragments a large IP packet > and sends the fragments to OVS, only the first fragment will contain > the L4 header. Therefore, the first fragment and subsequent fragments > get different UDP src ports in the outgoing VXLAN header. This can > lead to fragment re-ordering in the fabric as packet will take > different paths. > > Fix: > With this patch, we ignore the L4 header during hash calculation in > the case of fragmented packets. Hi. Sorry for late reply. I see the problem, but I'm not sure that this change will actually fix it, because if the first packet is fragmented and the second one is not fragmented (originally had smaller size for any reason), they will still be hashed differently even if they are from the same flow. The solution for this would be to completely disable L4 hashing, but that doesn't sound right. What do you think? Best regards, Ilya Maximets. > > Signed-off-by: Parvathy Tarur Ramachandran <parvathy.tarur.ramachandran@ericsson.com> > --- > lib/flow.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/lib/flow.c b/lib/flow.c > index cc1b3f2..38bf377 100644 > --- a/lib/flow.c > +++ b/lib/flow.c > @@ -2178,7 +2178,7 @@ miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis) > > if (flow) { > ovs_be16 dl_type = MINIFLOW_GET_BE16(flow, dl_type); > - uint8_t nw_proto; > + uint8_t nw_proto, nw_frag; > > if (dl_type == htons(ETH_TYPE_IPV6)) { > struct flowmap map = FLOWMAP_EMPTY_INITIALIZER; > @@ -2200,6 +2200,14 @@ miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis) > > nw_proto = MINIFLOW_GET_U8(flow, nw_proto); > hash = hash_add(hash, nw_proto); > + /* Skip l4 header fields if IP packet is fragmented since > + * only first fragment will carry l4 header. > + */ > + nw_frag = MINIFLOW_GET_U8(flow, nw_frag); > + if (nw_frag) { > + goto out; > + } > + > if (nw_proto != IPPROTO_TCP && nw_proto != IPPROTO_UDP > && nw_proto != IPPROTO_SCTP && nw_proto != IPPROTO_ICMP > && nw_proto != IPPROTO_ICMPV6) { >
diff --git a/lib/flow.c b/lib/flow.c index cc1b3f2..38bf377 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -2178,7 +2178,7 @@ miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis) if (flow) { ovs_be16 dl_type = MINIFLOW_GET_BE16(flow, dl_type); - uint8_t nw_proto; + uint8_t nw_proto, nw_frag; if (dl_type == htons(ETH_TYPE_IPV6)) { struct flowmap map = FLOWMAP_EMPTY_INITIALIZER; @@ -2200,6 +2200,14 @@ miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis) nw_proto = MINIFLOW_GET_U8(flow, nw_proto); hash = hash_add(hash, nw_proto); + /* Skip l4 header fields if IP packet is fragmented since + * only first fragment will carry l4 header. + */ + nw_frag = MINIFLOW_GET_U8(flow, nw_frag); + if (nw_frag) { + goto out; + } + if (nw_proto != IPPROTO_TCP && nw_proto != IPPROTO_UDP && nw_proto != IPPROTO_SCTP && nw_proto != IPPROTO_ICMP && nw_proto != IPPROTO_ICMPV6) {
Issue: The src-port for UDP is based on RSS hash in the packet metadata. In case of packets coming from VM it will be 5-tuple, if available, otherwise just IP addresses. If the VM fragments a large IP packet and sends the fragments to OVS, only the first fragment will contain the L4 header. Therefore, the first fragment and subsequent fragments get different UDP src ports in the outgoing VXLAN header. This can lead to fragment re-ordering in the fabric as packet will take different paths. Fix: With this patch, we ignore the L4 header during hash calculation in the case of fragmented packets. Signed-off-by: Parvathy Tarur Ramachandran <parvathy.tarur.ramachandran@ericsson.com> --- lib/flow.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)