[ovs-dev] dpif-netdev: Avoid reordering of packets in a batch with same megaflow
diff mbox series

Message ID 1529202111-20855-1-git-send-email-vishal.deep.ajmera@ericsson.com
State Superseded
Headers show
Series
  • [ovs-dev] dpif-netdev: Avoid reordering of packets in a batch with same megaflow
Related show

Commit Message

Vishal Deep Ajmera June 17, 2018, 2:21 a.m. UTC
OVS reads packets in batches from a given port, and the packets in the
batch are subjected to potentially three levels of lookup to identify
the datapath megaflow entry (or flow) associated with each packet.
Each megaflow entry has a dedicated buffer in which packets that match
the flow classification criteria are collected. This buffer helps OVS
perform batch processing for all packets associated with a given flow.

Each packet in the received batch is first subjected to lookup in the
Exact Match Cache (EMC). Each EMC entry will point to a flow. If the
EMC lookup is successful, the packet is moved from the rx batch to the
per-flow buffer.

Packets that did not match any EMC entry are moved to the beginning of
the rx batch and are then subjected to a lookup in the megaflow cache.
Packets that match a megaflow cache entry are *appended* to the per-flow
buffer.

Packets that do not match any megaflow entry are subjected to slow-path
processing through the upcall mechanism. This cannot change the order of
packets since, by definition, upcall processing is only done for packets
without a matching megaflow entry.

The EMC entry match fields encompass all potentially significant header
fields, typically more than those specified in the associated flow's
match criteria. Hence, multiple EMC entries can point to the same flow. Given
that per-flow batching happens at each lookup stage, packets belonging
to the same megaflow can get re-ordered because some packets match EMC
entries while others do not.

The following example illustrates the issue. Consider a batch of
packets (labelled P1 to P8) belonging to a single TCP connection and
hence to a single flow. Assume that packets with just the ACK bit set
in their TCP flags were also received in a prior batch, so a
corresponding EMC entry exists.

1. P1 (TCP Flag: ACK)
2. P2 (TCP Flag: ACK)
3. P3 (TCP Flag: ACK)
4. P4 (TCP Flag: ACK, PSH)
5. P5 (TCP Flag: ACK)
6. P6 (TCP Flag: ACK)
7. P7 (TCP Flag: ACK)
8. P8 (TCP Flag: ACK)

The megaflow classification criteria do not include TCP flags, while
the EMC match criteria do. Thus, all packets other than P4 match
the existing EMC entry and are moved to the per-flow packet batch.
Subsequently, packet P4 is moved to the same per-flow packet batch as
a result of the megaflow lookup. Though the packets have all been
correctly classified as being associated with the same flow, the
packet order has not been preserved because of the per-flow batching
performed during the EMC lookup stage. This packet re-ordering has
performance implications for TCP applications.

This patch preserves the packet ordering by performing the per-flow
batching after both the EMC and megaflow lookups are complete. As an
optimization, packets are still flow-batched during EMC processing
until the first packet in the batch misses the EMC.

A new flow map is maintained to record the original order of the
packets along with their flow information. After fast-path processing,
packets from the flow map are *appended* to the per-flow buffers.

Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Co-authored-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
---
 lib/dpif-netdev.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 66 insertions(+), 13 deletions(-)

Comments

Vishal Deep Ajmera June 18, 2018, 3:49 p.m. UTC | #1
Hi,

Does anyone see any issue with the patch? The intent of the patch is to fix *reordering* of packets belonging to the same megaflow. The issue can be reproduced frequently by running an iperf test (as an example) between two VMs connected by an OVS netdev bridge with the NORMAL flow. Analyzing packets captured at the server-side VM using Wireshark shows that packets with the PSH flag are reordered by OVS.

Warm Regards,
Vishal Ajmera
Wang, Yipeng1 June 18, 2018, 8:43 p.m. UTC | #2
Hi,

I looked into the code and the logic seems good to me.

But reordering in the dataplane also has performance implications, especially for non-TCP traffic. Also, the packets arriving at OVS may already be out of order. Could you provide some performance data points for the all-EMC-hit case and for a mix of EMC-hit and EMC-miss cases, and show how throughput is affected? Also, maybe consider doing this only for TCP traffic?

Thanks
Yipeng
 
Vishal Deep Ajmera June 19, 2018, 8:45 a.m. UTC | #3
> I looked into the code and the logic seems good to me.

> But reordering in the dataplane also has performance implications, especially for non-TCP traffic. Also, the packets arriving at OVS may already be out of order. Could you provide some performance data points for the all-EMC-hit case and for a mix of EMC-hit and EMC-miss cases, and show how throughput is affected? Also, maybe consider doing this only for TCP traffic?

Thanks Yipeng for reviewing the patch.

The patch is for *preventing* OVS from reordering packets received in a batch that belong to the same megaflow. OVS should send all packets for a given megaflow in the same order as it receives them in the batch.

However if a packet is already received out-of-order from the network itself, OVS cannot put it back in order.

Regarding performance, it also depends on how the application treats any out-of-order packet. For example, if a TCP application does not use SACK, then the endpoint may request retransmission of packets received out of order, thereby reducing effective throughput. Similarly, any delayed (out-of-order) ACK can also cause fast retransmits from the sender side.

In my understanding, in principle, if OVS is acting as an L2 switch (NORMAL flow) then it should not change the order of received packets for a given flow, since we don't know how an application (virtual machine) will react to out-of-order packets.

Warm Regards,
Vishal Ajmera
0-day Robot June 19, 2018, 9:20 p.m. UTC | #4
Bleep bloop.  Greetings Vishal Deep Ajmera, I am a robot and I have tried out your patch
with message ID  <1529202111-20855-1-git-send-email-vishal.deep.ajmera@ericsson.com>
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
== Checking "0000.patch" ==
WARNING: Line is 84 characters long (recommended limit is 79)
#159 FILE: lib/dpif-netdev.c:5222:
                /* Flow batching should be performed only after fast-path processing

WARNING: Line is 93 characters long (recommended limit is 79)
#253 FILE: lib/dpif-netdev.c:5485:
        dp_netdev_queue_batches(map->packet, map->flow, map->tcp_flags, batches, &n_batches);

Lines checked: 265, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email me back.

Thanks,
0-day Robot
Aaron Conole June 19, 2018, 9:27 p.m. UTC | #5
0-day Robot <robot@bytheb.org> writes:

> Bleep bloop.  Greetings Vishal Deep Ajmera, I am a robot and I have tried out your patch
> [...]
> Please check this out.  If you feel there has been an error, please email me back.

Oops.  I need to update the template.  Please don't reply to the bot - all
messages for it go to the bit bucket.
Ben Pfaff June 19, 2018, 9:42 p.m. UTC | #6
On Tue, Jun 19, 2018 at 05:20:08PM -0400, 0-day Robot wrote:
> Bleep bloop.  Greetings Vishal Deep Ajmera, I am a robot and I have tried out your patch
> with message ID  <1529202111-20855-1-git-send-email-vishal.deep.ajmera@ericsson.com>
> Thanks for your contribution.

Watch out.  It's a short step from "I have tried out your patch" to
"Kill all humans!"
Han Zhou June 20, 2018, 9:20 p.m. UTC | #7
On Sat, Jun 16, 2018 at 7:21 PM, Vishal Deep Ajmera <
vishal.deep.ajmera@ericsson.com> wrote:
>
> [...]

Thanks for the patch! I didn't review the patch, but just have a question:
does this issue exist only for dpif-netdev, or does the kernel datapath
have the same issue?
Vishal Deep Ajmera June 21, 2018, 6:12 a.m. UTC | #8
> Thanks for the patch! I didn't review the patch, but just have a question: does this issue exist only for dpif-netdev, or does the kernel datapath have the same issue?

I am not familiar with the kernel datapath and don't know if it also uses flow-based batching as dpif-netdev does.
Vishal Deep Ajmera June 21, 2018, 3:56 p.m. UTC | #9
Hi,

Requesting that this patch be merged to master and backported to branches down to 2.6 if there are no issues with the fix.

Warm Regards,
Vishal Ajmera

Stokes, Ian July 3, 2018, 4:32 p.m. UTC | #10
On 6/17/2018 3:21 AM, Vishal Deep Ajmera wrote:
> [...]
>   lib/dpif-netdev.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 66 insertions(+), 13 deletions(-)
> 

Hi Vishal, thanks for the patch, I've some minor comments below. I'm 
still testing but so far it seems good.

Unless there are objections (or any new issues come to light) I will put 
this as part of the pull request this week and back port to the 
appropriate branches.

Thanks
Ian

> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9390fff..2c423ed 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -207,6 +207,13 @@ struct dpcls_rule {
>       /* 'flow' must be the last field, additional space is allocated here. */
>   };
>   
> +/* data structure to keep packet order till fastpath processing */
> +struct dp_packet_flow_map {
> +    struct dp_packet *packet;
> +    struct dp_netdev_flow *flow;
> +    uint16_t tcp_flags;
> +};
> +
>   static void dpcls_init(struct dpcls *);
>   static void dpcls_destroy(struct dpcls *);
>   static void dpcls_sort_subtable_vector(struct dpcls *);
> @@ -5081,10 +5088,10 @@ struct packet_batch_per_flow {
>   static inline void
>   packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
>                                struct dp_packet *packet,
> -                             const struct miniflow *mf)
> +                             uint16_t tcp_flags)
>   {
>       batch->byte_count += dp_packet_size(packet);
> -    batch->tcp_flags |= miniflow_get_tcp_flags(mf);
> +    batch->tcp_flags |= tcp_flags;
>       batch->array.packets[batch->array.count++] = packet;
>   }
>   
> @@ -5118,7 +5125,7 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch,
>   
>   static inline void
>   dp_netdev_queue_batches(struct dp_packet *pkt,
> -                        struct dp_netdev_flow *flow, const struct miniflow *mf,
> +                        struct dp_netdev_flow *flow, uint16_t tcp_flags,
>                           struct packet_batch_per_flow *batches,
>                           size_t *n_batches)
>   {
> @@ -5129,7 +5136,7 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>           packet_batch_per_flow_init(batch, flow);
>       }
>   
> -    packet_batch_per_flow_update(batch, pkt, mf);
> +    packet_batch_per_flow_update(batch, pkt, tcp_flags);
>   }
>   
>   /* Try to process all ('cnt') the 'packets' using only the exact match cache
> @@ -5151,6 +5158,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>                  struct dp_packet_batch *packets_,
>                  struct netdev_flow_key *keys,
>                  struct packet_batch_per_flow batches[], size_t *n_batches,
> +               struct dp_packet_flow_map *flow_map,
> +               size_t *n_flows,
> +               uint8_t *index_map,
>                  bool md_is_valid, odp_port_t port_no)
>   {
>       struct emc_cache *flow_cache = &pmd->flow_cache;
> @@ -5160,6 +5170,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>       const size_t cnt = dp_packet_batch_size(packets_);
>       uint32_t cur_min;
>       int i;
> +    size_t map_cnt = 0;
> +    uint16_t tcp_flags;
> +    bool batch_enable = true;
>   
>       atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>       pmd_perf_update_counter(&pmd->perf_stats,
> @@ -5168,6 +5181,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>   
>       DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>           struct dp_netdev_flow *flow;
> +        struct dp_packet_flow_map *map;
>   
>           if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
>               dp_packet_delete(packet);
> @@ -5200,8 +5214,20 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>               flow = NULL;
>           }
>           if (OVS_LIKELY(flow)) {
> -            dp_netdev_queue_batches(packet, flow, &key->mf, batches,
> -                                    n_batches);
> +            if (OVS_LIKELY(batch_enable)) {
> +                tcp_flags = miniflow_get_tcp_flags(&key->mf);
> +                dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
> +                                        n_batches);
> +            } else {
> +                /* Flow batching should be performed only after fast-path processing
> +                 * is also completed for packets with emc miss or else it will
> +                 * result in reordering of packets with same datapath flows.
> +                 */

You need to reformat the comment above as per ovs style guide as 
currently it exceeds 79 characters in width.

> +                map = &flow_map[map_cnt++];
> +                map->flow = flow;
> +                map->packet = packet;
> +                map->tcp_flags = miniflow_get_tcp_flags(&key->mf);
> +            }
>           } else {
>               /* Exact match cache missed. Group missed packets together at
>                * the beginning of the 'packets' array. */
> @@ -5210,9 +5236,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>                * must be returned to the caller. The next key should be extracted
>                * to 'keys[n_missed + 1]'. */
>               key = &keys[++n_missed];
> +
> +            /* preserve the order of packet for flow batching */
> +            index_map[packets_->count - 1] = map_cnt;
> +            flow_map[map_cnt++].flow = NULL;
> +
> +            /* skip batching for subsequent packets to avoid reordering */
> +            batch_enable = false;
>           }
>       }
>   
> +    *n_flows = map_cnt;
>       pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
>                               cnt - n_dropped - n_missed);
>   
> @@ -5299,8 +5333,8 @@ static inline void
>   fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>                        struct dp_packet_batch *packets_,
>                        struct netdev_flow_key *keys,
> -                     struct packet_batch_per_flow batches[],
> -                     size_t *n_batches,
> +                     struct dp_packet_flow_map *flow_map,
> +                     uint8_t *index_map,
>                        odp_port_t in_port)
>   {
>       const size_t cnt = dp_packet_batch_size(packets_);
> @@ -5379,6 +5413,8 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>   
>       DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
>           struct dp_netdev_flow *flow;
> +        /* get the original order of this packet in received batch */
> +        int recv_idx = index_map[i];
>   
>           if (OVS_UNLIKELY(!rules[i])) {
>               continue;
> @@ -5387,7 +5423,12 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>           flow = dp_netdev_flow_cast(rules[i]);
>   
>           emc_probabilistic_insert(pmd, &keys[i], flow);
> -        dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
> +        /* add these packets into the flow map in the same order
> +         * as received.
> +         */
> +        flow_map[recv_idx].packet = packet;
> +        flow_map[recv_idx].flow = flow;
> +        flow_map[recv_idx].tcp_flags = miniflow_get_tcp_flags(&keys[i].mf);
>       }
>   
>       pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> @@ -5418,17 +5459,30 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
>       OVS_ALIGNED_VAR(CACHE_LINE_SIZE)
>           struct netdev_flow_key keys[PKT_ARRAY_SIZE];
>       struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
> -    size_t n_batches;
> +    struct dp_packet_flow_map flow_map[PKT_ARRAY_SIZE];
> +    uint8_t index_map[PKT_ARRAY_SIZE];
> +    size_t n_batches, n_flows = 0;
>       odp_port_t in_port;
>   
>       n_batches = 0;
>       emc_processing(pmd, packets, keys, batches, &n_batches,
> -                            md_is_valid, port_no);
> +                   flow_map, &n_flows, index_map, md_is_valid, port_no);
>       if (!dp_packet_batch_is_empty(packets)) {
>           /* Get ingress port from first packet's metadata. */
>           in_port = packets->packets[0]->md.in_port.odp_port;
>           fast_path_processing(pmd, packets, keys,
> -                             batches, &n_batches, in_port);
> +                             flow_map, index_map, in_port);
> +    }
> +
> +    size_t i;

I would prefer to see 'size_t i' declared at the beginning of the 
function to keep with the existing style.

> +    /* batch rest of packets which are in flow map */
> +    for (i = 0;i < n_flows; i++) {

OVS style guide, missing space between ; and i above.

> +        struct dp_packet_flow_map *map = &flow_map[i];
> +
> +        if (OVS_UNLIKELY(!map->flow)) {
> +            continue;
> +        }
> +        dp_netdev_queue_batches(map->packet, map->flow, map->tcp_flags, batches, &n_batches);

Above exceeds 79 character width, need to split arguments to new line 
from batches.

>       }
>   
>       /* All the flow batches need to be reset before any call to
> @@ -5440,7 +5494,6 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
>        * already its own batches[k] still waiting to be served.  So if its
>        * ‘batch’ member is not reset, the recirculated packet would be wrongly
>        * appended to batches[k] of the 1st call to dp_netdev_input__(). */
> -    size_t i;
>       for (i = 0; i < n_batches; i++) {
>           batches[i].flow->batch = NULL;
>       }
> 
> 
> 
Vishal Deep Ajmera July 4, 2018, 2:39 p.m. UTC | #11
> 
> Hi Vishal, thanks for the patch, I've some minor comments below. I'm
> still testing but so far it seems good.
> 
> Unless there are objections (or any new issues come to light) I will put
> this as part of the pull request this week and back port to the
> appropriate branches.
> 
> Thanks
> Ian

Thanks Ian for reviewing and testing this patch. 
I will address your comments in V2 patch.

Warm Regards,
Vishal Ajmera

Patch
diff mbox series

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9390fff..2c423ed 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -207,6 +207,13 @@  struct dpcls_rule {
     /* 'flow' must be the last field, additional space is allocated here. */
 };
 
+/* data structure to keep packet order till fastpath processing */
+struct dp_packet_flow_map {
+    struct dp_packet *packet;
+    struct dp_netdev_flow *flow;
+    uint16_t tcp_flags;
+};
+
 static void dpcls_init(struct dpcls *);
 static void dpcls_destroy(struct dpcls *);
 static void dpcls_sort_subtable_vector(struct dpcls *);
@@ -5081,10 +5088,10 @@  struct packet_batch_per_flow {
 static inline void
 packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
                              struct dp_packet *packet,
-                             const struct miniflow *mf)
+                             uint16_t tcp_flags)
 {
     batch->byte_count += dp_packet_size(packet);
-    batch->tcp_flags |= miniflow_get_tcp_flags(mf);
+    batch->tcp_flags |= tcp_flags;
     batch->array.packets[batch->array.count++] = packet;
 }
 
@@ -5118,7 +5125,7 @@  packet_batch_per_flow_execute(struct packet_batch_per_flow *batch,
 
 static inline void
 dp_netdev_queue_batches(struct dp_packet *pkt,
-                        struct dp_netdev_flow *flow, const struct miniflow *mf,
+                        struct dp_netdev_flow *flow, uint16_t tcp_flags,
                         struct packet_batch_per_flow *batches,
                         size_t *n_batches)
 {
@@ -5129,7 +5136,7 @@  dp_netdev_queue_batches(struct dp_packet *pkt,
         packet_batch_per_flow_init(batch, flow);
     }
 
-    packet_batch_per_flow_update(batch, pkt, mf);
+    packet_batch_per_flow_update(batch, pkt, tcp_flags);
 }
 
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
@@ -5151,6 +5158,9 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
                struct dp_packet_batch *packets_,
                struct netdev_flow_key *keys,
                struct packet_batch_per_flow batches[], size_t *n_batches,
+               struct dp_packet_flow_map *flow_map,
+               size_t *n_flows,
+               uint8_t *index_map,
                bool md_is_valid, odp_port_t port_no)
 {
     struct emc_cache *flow_cache = &pmd->flow_cache;
@@ -5160,6 +5170,9 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
     const size_t cnt = dp_packet_batch_size(packets_);
     uint32_t cur_min;
     int i;
+    size_t map_cnt = 0;
+    uint16_t tcp_flags;
+    bool batch_enable = true;
 
     atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
     pmd_perf_update_counter(&pmd->perf_stats,
@@ -5168,6 +5181,7 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
 
     DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
         struct dp_netdev_flow *flow;
+        struct dp_packet_flow_map *map;
 
         if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
             dp_packet_delete(packet);
@@ -5200,8 +5214,20 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
             flow = NULL;
         }
         if (OVS_LIKELY(flow)) {
-            dp_netdev_queue_batches(packet, flow, &key->mf, batches,
-                                    n_batches);
+            if (OVS_LIKELY(batch_enable)) {
+                tcp_flags = miniflow_get_tcp_flags(&key->mf);
+                dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
+                                        n_batches);
+            } else {
+                /* Flow batching should be performed only after fast-path processing
+                 * is also completed for packets with emc miss or else it will
+                 * result in reordering of packets with same datapath flows.
+                 */
+                map = &flow_map[map_cnt++];
+                map->flow = flow;
+                map->packet = packet;
+                map->tcp_flags = miniflow_get_tcp_flags(&key->mf);
+            }
         } else {
             /* Exact match cache missed. Group missed packets together at
              * the beginning of the 'packets' array. */
@@ -5210,9 +5236,17 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
              * must be returned to the caller. The next key should be extracted
              * to 'keys[n_missed + 1]'. */
             key = &keys[++n_missed];
+
+            /* Preserve the order of packets for flow batching. */
+            index_map[packets_->count - 1] = map_cnt;
+            flow_map[map_cnt++].flow = NULL;
+
+            /* Skip batching for subsequent packets to avoid reordering. */
+            batch_enable = false;
         }
     }
 
+    *n_flows = map_cnt;
     pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
                             cnt - n_dropped - n_missed);
 
@@ -5299,8 +5333,8 @@  static inline void
 fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                      struct dp_packet_batch *packets_,
                      struct netdev_flow_key *keys,
-                     struct packet_batch_per_flow batches[],
-                     size_t *n_batches,
+                     struct dp_packet_flow_map *flow_map,
+                     uint8_t *index_map,
                      odp_port_t in_port)
 {
     const size_t cnt = dp_packet_batch_size(packets_);
@@ -5379,6 +5413,8 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 
     DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
         struct dp_netdev_flow *flow;
+        /* Get the original order of this packet in the received batch. */
+        int recv_idx = index_map[i];
 
         if (OVS_UNLIKELY(!rules[i])) {
             continue;
@@ -5387,7 +5423,12 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
         flow = dp_netdev_flow_cast(rules[i]);
 
         emc_probabilistic_insert(pmd, &keys[i], flow);
-        dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
+        /* Add the packet to the flow map at the same position as
+         * in the received batch, so that flow batching keeps the
+         * original order. */
+        flow_map[recv_idx].packet = packet;
+        flow_map[recv_idx].flow = flow;
+        flow_map[recv_idx].tcp_flags = miniflow_get_tcp_flags(&keys[i].mf);
     }
 
     pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
@@ -5418,17 +5459,30 @@  dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
     OVS_ALIGNED_VAR(CACHE_LINE_SIZE)
         struct netdev_flow_key keys[PKT_ARRAY_SIZE];
     struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
-    size_t n_batches;
+    struct dp_packet_flow_map flow_map[PKT_ARRAY_SIZE];
+    uint8_t index_map[PKT_ARRAY_SIZE];
+    size_t n_batches, n_flows = 0;
     odp_port_t in_port;
 
     n_batches = 0;
     emc_processing(pmd, packets, keys, batches, &n_batches,
-                            md_is_valid, port_no);
+                   flow_map, &n_flows, index_map, md_is_valid, port_no);
     if (!dp_packet_batch_is_empty(packets)) {
         /* Get ingress port from first packet's metadata. */
         in_port = packets->packets[0]->md.in_port.odp_port;
         fast_path_processing(pmd, packets, keys,
-                             batches, &n_batches, in_port);
+                             flow_map, index_map, in_port);
+    }
+
+    size_t i;
+    /* Batch the remaining packets recorded in the flow map. */
+    for (i = 0; i < n_flows; i++) {
+        struct dp_packet_flow_map *map = &flow_map[i];
+
+        if (OVS_UNLIKELY(!map->flow)) {
+            continue;
+        }
+        dp_netdev_queue_batches(map->packet, map->flow, map->tcp_flags,
+                                batches, &n_batches);
     }
 
     /* All the flow batches need to be reset before any call to
@@ -5440,7 +5494,6 @@  dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
      * already its own batches[k] still waiting to be served.  So if its
      * ‘batch’ member is not reset, the recirculated packet would be wrongly
      * appended to batches[k] of the 1st call to dp_netdev_input__(). */
-    size_t i;
     for (i = 0; i < n_batches; i++) {
         batches[i].flow->batch = NULL;
     }