
[ovs-dev] dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.

Message ID 20200709151910.1773729-1-i.maximets@ovn.org
State Changes Requested
Series [ovs-dev] dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.

Commit Message

Ilya Maximets July 9, 2020, 3:19 p.m. UTC
The main thread will try to pause/stop all revalidators during datapath
reconfiguration via the datapath purge callback (dp_purge_cb) while
holding 'dp->port_mutex'.  A deadlock happens if any of the
revalidator threads is already waiting on 'dp->port_mutex' while
dumping offloaded flows:

           main thread                           revalidator
 ---------------------------------  ----------------------------------

 ovs_mutex_lock(&dp->port_mutex)

                                    dpif_netdev_flow_dump_next()
                                    -> dp_netdev_flow_to_dpif_flow
                                    -> get_dpif_flow_status
                                    -> dpif_netdev_get_flow_offload_status()
                                    -> ovs_mutex_lock(&dp->port_mutex)
                                       <waiting for mutex here>

 reconfigure_datapath()
 -> reconfigure_pmd_threads()
 -> dp_netdev_del_pmd()
 -> dp_purge_cb()
 -> udpif_pause_revalidators()
 -> ovs_barrier_block(&udpif->pause_barrier)
    <waiting for revalidators to reach barrier>

                          <DEADLOCK>
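
A tiny standalone pthreads program showing the same shape (illustrative
only, not OVS code; the two globals just mirror the roles of
'dp->port_mutex' and 'udpif->pause_barrier'):

#include <pthread.h>

static pthread_mutex_t port_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t pause_barrier;

/* Stands in for a revalidator dumping offloaded flows. */
static void *
revalidator(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&port_mutex);      /* Blocks: main already holds it. */
    pthread_mutex_unlock(&port_mutex);
    pthread_barrier_wait(&pause_barrier); /* Never reached. */
    return NULL;
}

/* Stands in for the main thread reconfiguring the datapath. */
int
main(void)
{
    pthread_t tid;

    pthread_barrier_init(&pause_barrier, NULL, 2);
    pthread_mutex_lock(&port_mutex);      /* Like taking 'dp->port_mutex'. */
    pthread_create(&tid, NULL, revalidator, NULL);
    pthread_barrier_wait(&pause_barrier); /* Waits forever: deadlock. */

    /* Not reached. */
    pthread_mutex_unlock(&port_mutex);
    pthread_join(tid, NULL);
    return 0;
}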

We're not allowed to call the offloading API from the userspace
datapath without holding the global port mutex, due to thread safety
restrictions on the netdev-ofload-dpdk module.  And it's also not easy
to rework the datapath reconfiguration process in order to move the
actual PMD removal and datapath purge out of the port mutex.

So, for now, not sleeping on a mutex if it's not immediately available
seems like the easiest workaround.  This will have an impact on the flow
statistics update rate and on the ability to get the latest statistics
before removing the flow (the latest stats will be lost in case we were
not able to take the mutex).  However, this will allow us to operate
normally while avoiding the deadlock.

The last bit is that, to avoid flapping of flow attributes and
statistics, we're not failing the operation but returning the last
statistics and attributes returned by the offload provider.  Since those
might be updated in different threads, stores and reads are atomic.

Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

Tested with unit tests only.  Needs checking on a real setup,
especially the stats/attrs update parts.
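
For illustration, a rough standalone sketch of the trylock-plus-cached-copy
pattern this patch relies on, with pthread_mutex_trylock() and C11 atomics
standing in for the OVS wrappers (all identifiers in the sketch are made up,
it is not part of the patch):

#include <inttypes.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Last values returned by the (stand-in) offload provider. */
struct cached_stats {
    _Atomic uint64_t n_packets;
    _Atomic uint64_t n_bytes;
    atomic_bool offloaded;
};

static pthread_mutex_t port_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct cached_stats cache;

/* Stand-in for querying the offload provider; must run under the mutex. */
static void
provider_query(uint64_t *pkts, uint64_t *bytes, bool *offloaded)
{
    *pkts = 42;
    *bytes = 4200;
    *offloaded = true;
}

static void
get_offload_status(uint64_t *pkts, uint64_t *bytes, bool *offloaded)
{
    if (!pthread_mutex_trylock(&port_mutex)) {
        /* Lock taken (trylock returns 0 on success): query the provider
         * and refresh the cached copy for later use on contention. */
        provider_query(pkts, bytes, offloaded);
        atomic_store_explicit(&cache.n_packets, *pkts, memory_order_relaxed);
        atomic_store_explicit(&cache.n_bytes, *bytes, memory_order_relaxed);
        atomic_store_explicit(&cache.offloaded, *offloaded,
                              memory_order_relaxed);
        pthread_mutex_unlock(&port_mutex);
    } else {
        /* Contended (e.g. main thread is reconfiguring): don't sleep on
         * the mutex, return the last known values instead. */
        *pkts = atomic_load_explicit(&cache.n_packets, memory_order_relaxed);
        *bytes = atomic_load_explicit(&cache.n_bytes, memory_order_relaxed);
        *offloaded = atomic_load_explicit(&cache.offloaded,
                                          memory_order_relaxed);
    }
}

int
main(void)
{
    uint64_t pkts, bytes;
    bool offloaded;

    get_offload_status(&pkts, &bytes, &offloaded);
    printf("pkts=%" PRIu64 " bytes=%" PRIu64 " offloaded=%d\n",
           pkts, bytes, offloaded);
    return 0;
}

On contention the reader simply gets the slightly stale cached values,
which is the trade-off described in the commit message.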

 AUTHORS.rst       |  1 +
 lib/dpif-netdev.c | 78 +++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 73 insertions(+), 6 deletions(-)

Comments

0-day Robot July 9, 2020, 3:59 p.m. UTC | #1
Bleep bloop.  Greetings Ilya Maximets, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#157 FILE: lib/dpif-netdev.c:3217:
     * XXX: Main thread will try to pause/stop all revalidators during datapath

Lines checked: 194, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Kevin Traynor July 14, 2020, 3:33 p.m. UTC | #2
On 09/07/2020 16:19, Ilya Maximets wrote:
> Main thread will try to pause/stop all revalidators during datapath
> reconfiguration via datapath purge callback (dp_purge_cb) while
> holding 'dp->port_mutex'.  And deadlock happens in case any of
> revalidator threads is already waiting on 'dp->port_mutex' while
> dumping offloaded flows:
> 
>            main thread                           revalidator
>  ---------------------------------  ----------------------------------
> 
>  ovs_mutex_lock(&dp->port_mutex)
> 
>                                     dpif_netdev_flow_dump_next()
>                                     -> dp_netdev_flow_to_dpif_flow
>                                     -> get_dpif_flow_status
>                                     -> dpif_netdev_get_flow_offload_status()
>                                     -> ovs_mutex_lock(&dp->port_mutex)
>                                        <waiting for mutex here>
> 
>  reconfigure_datapath()
>  -> reconfigure_pmd_threads()
>  -> dp_netdev_del_pmd()
>  -> dp_purge_cb()
>  -> udpif_pause_revalidators()
>  -> ovs_barrier_block(&udpif->pause_barrier)
>     <waiting for revalidators to reach barrier>
> 
>                           <DEADLOCK>
> 
> We're not allowed to call offloading API without holding global
> port mutex from the userspace datapath due to thread safety
> restrictions on netdev-ofload-dpdk module.  And it's also not easy

typo s/ofload/offload/

> to rework datapath reconfiguration process in order to move actual
> PMD removal and datapath purge out of the port mutex.
> 
> So, for now, not sleeping on a mutex if it's not immediately available
> seem like an easiest workaround.  This will have impact on flow
> statistics update rate and on ability to get the latest statistics
> before removing the flow (latest stats will be lost in case we were
> not able to take the mutex).  However, this will allow us to operate
> normally avoiding the deadlock.
> 
> The last bit is that to avoid flapping of flow attributes and
> statistics we're not failing the operation, but returning last
> statistics and attributes returned by offload provider.  Since those
> might be updated in different threads, stores and reads are atomic.
> 

Nice explanation, thanks

> Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
> 
> Tested with unit tests only.  Needs checking with the real setup.
> Especially stats/attrs update parts.
> 
>  AUTHORS.rst       |  1 +
>  lib/dpif-netdev.c | 78 +++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 73 insertions(+), 6 deletions(-)
> 
> diff --git a/AUTHORS.rst b/AUTHORS.rst
> index 8e6a0769f..eb36a01d0 100644
> --- a/AUTHORS.rst
> +++ b/AUTHORS.rst
> @@ -504,6 +504,7 @@ Edwin Chiu                      echiu@vmware.com
>  Eivind Bulie Haanaes
>  Enas Ahmad                      enas.ahmad@kaust.edu.sa
>  Eric Lopez
> +Frank Wang (王培辉)             wangpeihui@inspur.com
>  Frido Roose                     fr.roose@gmail.com
>  Gaetano Catalli                 gaetano.catalli@gmail.com
>  Gavin Remaley                   gavin_remaley@selinc.com
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 629a0cb53..2eb7f32ff 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -490,6 +490,12 @@ struct dp_netdev_flow_stats {
>      atomic_uint16_t tcp_flags;     /* Bitwise-OR of seen tcp_flags values. */
>  };
>  
> +/* Contained by struct dp_netdev_flow's 'last_attrs' member.  */
> +struct dp_netdev_flow_attrs {
> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
> +};
> +
>  /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>   *
>   *
> @@ -550,6 +556,10 @@ struct dp_netdev_flow {
>      /* Statistics. */
>      struct dp_netdev_flow_stats stats;
>  
> +    /* Statistics and attributes received from the netdev offload provider. */
> +    struct dp_netdev_flow_stats last_stats;
> +    struct dp_netdev_flow_attrs last_attrs;
> +
>      /* Actions. */
>      OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>  
> @@ -3150,9 +3160,43 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>      return NULL;
>  }
>  
> +static void
> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
> +                                    const struct dpif_flow_stats *stats,
> +                                    const struct dpif_flow_attrs *attrs)
> +{
> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
> +
> +    atomic_store_relaxed(&last_stats->used,         stats->used);
> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
> +    atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
> +    atomic_store_relaxed(&last_stats->tcp_flags,    stats->tcp_flags);
> +
> +    atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
> +    atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
> +}
> +
> +static void
> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
> +                                    struct dpif_flow_stats *stats,
> +                                    struct dpif_flow_attrs *attrs)
> +{
> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
> +
> +    atomic_read_relaxed(&last_stats->used,         &stats->used);
> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
> +    atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
> +    atomic_read_relaxed(&last_stats->tcp_flags,    &stats->tcp_flags);
> +
> +    atomic_read_relaxed(&last_attrs->offloaded,    &attrs->offloaded);
> +    atomic_read_relaxed(&last_attrs->dp_layer,     &attrs->dp_layer);
> +}
> +
>  static bool
>  dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
> -                                    const struct dp_netdev_flow *netdev_flow,
> +                                    struct dp_netdev_flow *netdev_flow,
>                                      struct dpif_flow_stats *stats,
>                                      struct dpif_flow_attrs *attrs)
>  {
> @@ -3175,11 +3219,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>      }
>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>      /* Taking a global 'port_mutex' to fulfill thread safety
> -     * restrictions for the netdev-offload-dpdk module. */
> -    ovs_mutex_lock(&dp->port_mutex);
> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
> -                          stats, attrs, &buf);
> -    ovs_mutex_unlock(&dp->port_mutex);
> +     * restrictions for the netdev-offload-dpdk module.
> +     *
> +     * XXX: Main thread will try to pause/stop all revalidators during datapath
> +     *      reconfiguration via datapath purge callback (dp_purge_cb) while
> +     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
> +     *      Otherwise, deadlock is possible, bcause revalidators might sleep
> +     *      waiting for the main thread to release the lock and main thread
> +     *      will wait for them to stop processing.
> +     *      This workaround might make statistics less accurate. Especially
> +     *      for flow deletion case, since there will be no other attempt.  */
> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
> +        ret = netdev_flow_get(netdev, &match, &actions,
> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
> +        /* Storing attributes from the last request for later use on mutex
> +         * contention. */

Shouldn't you check 'ret' here before the set

> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs);
> +        ovs_mutex_unlock(&dp->port_mutex);
> +    } else {
> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs);
> +        if (!attrs->dp_layer) {
> +            /* Flow was never reported as 'offloaded' so it's harmless
> +             * to continue to think so. */
> +            ret = EAGAIN;
> +        }
> +    }
>      netdev_close(netdev);
>      if (ret) {
>          return false;
> @@ -3448,6 +3512,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>      /* Do not allocate extra space. */
>      flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>      memset(&flow->stats, 0, sizeof flow->stats);
> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>      flow->dead = false;
>      flow->batch = NULL;
>      flow->mark = INVALID_FLOW_MARK;
>
Ilya Maximets July 14, 2020, 5:52 p.m. UTC | #3
On 7/14/20 5:33 PM, Kevin Traynor wrote:
> On 09/07/2020 16:19, Ilya Maximets wrote:
>> Main thread will try to pause/stop all revalidators during datapath
>> reconfiguration via datapath purge callback (dp_purge_cb) while
>> holding 'dp->port_mutex'.  And deadlock happens in case any of
>> revalidator threads is already waiting on 'dp->port_mutex' while
>> dumping offloaded flows:
>>
>>            main thread                           revalidator
>>  ---------------------------------  ----------------------------------
>>
>>  ovs_mutex_lock(&dp->port_mutex)
>>
>>                                     dpif_netdev_flow_dump_next()
>>                                     -> dp_netdev_flow_to_dpif_flow
>>                                     -> get_dpif_flow_status
>>                                     -> dpif_netdev_get_flow_offload_status()
>>                                     -> ovs_mutex_lock(&dp->port_mutex)
>>                                        <waiting for mutex here>
>>
>>  reconfigure_datapath()
>>  -> reconfigure_pmd_threads()
>>  -> dp_netdev_del_pmd()
>>  -> dp_purge_cb()
>>  -> udpif_pause_revalidators()
>>  -> ovs_barrier_block(&udpif->pause_barrier)
>>     <waiting for revalidators to reach barrier>
>>
>>                           <DEADLOCK>
>>
>> We're not allowed to call offloading API without holding global
>> port mutex from the userspace datapath due to thread safety
>> restrictions on netdev-ofload-dpdk module.  And it's also not easy
> 
> typo s/ofload/offload/

Thanks.

> 
>> to rework datapath reconfiguration process in order to move actual
>> PMD removal and datapath purge out of the port mutex.
>>
>> So, for now, not sleeping on a mutex if it's not immediately available
>> seem like an easiest workaround.  This will have impact on flow
>> statistics update rate and on ability to get the latest statistics
>> before removing the flow (latest stats will be lost in case we were
>> not able to take the mutex).  However, this will allow us to operate
>> normally avoiding the deadlock.
>>
>> The last bit is that to avoid flapping of flow attributes and
>> statistics we're not failing the operation, but returning last
>> statistics and attributes returned by offload provider.  Since those
>> might be updated in different threads, stores and reads are atomic.
>>
> 
> Nice explanation, thanks
> 
>> Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
>> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>
>> Tested with unit tests only.  Needs checking with the real setup.
>> Especially stats/attrs update parts.
>>
>>  AUTHORS.rst       |  1 +
>>  lib/dpif-netdev.c | 78 +++++++++++++++++++++++++++++++++++++++++++----
>>  2 files changed, 73 insertions(+), 6 deletions(-)
>>
>> diff --git a/AUTHORS.rst b/AUTHORS.rst
>> index 8e6a0769f..eb36a01d0 100644
>> --- a/AUTHORS.rst
>> +++ b/AUTHORS.rst
>> @@ -504,6 +504,7 @@ Edwin Chiu                      echiu@vmware.com
>>  Eivind Bulie Haanaes
>>  Enas Ahmad                      enas.ahmad@kaust.edu.sa
>>  Eric Lopez
>> +Frank Wang (王培辉)             wangpeihui@inspur.com
>>  Frido Roose                     fr.roose@gmail.com
>>  Gaetano Catalli                 gaetano.catalli@gmail.com
>>  Gavin Remaley                   gavin_remaley@selinc.com
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 629a0cb53..2eb7f32ff 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -490,6 +490,12 @@ struct dp_netdev_flow_stats {
>>      atomic_uint16_t tcp_flags;     /* Bitwise-OR of seen tcp_flags values. */
>>  };
>>  
>> +/* Contained by struct dp_netdev_flow's 'last_attrs' member.  */
>> +struct dp_netdev_flow_attrs {
>> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
>> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
>> +};
>> +
>>  /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>>   *
>>   *
>> @@ -550,6 +556,10 @@ struct dp_netdev_flow {
>>      /* Statistics. */
>>      struct dp_netdev_flow_stats stats;
>>  
>> +    /* Statistics and attributes received from the netdev offload provider. */
>> +    struct dp_netdev_flow_stats last_stats;
>> +    struct dp_netdev_flow_attrs last_attrs;
>> +
>>      /* Actions. */
>>      OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>>  
>> @@ -3150,9 +3160,43 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>>      return NULL;
>>  }
>>  
>> +static void
>> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>> +                                    const struct dpif_flow_stats *stats,
>> +                                    const struct dpif_flow_attrs *attrs)
>> +{
>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>> +
>> +    atomic_store_relaxed(&last_stats->used,         stats->used);
>> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>> +    atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
>> +    atomic_store_relaxed(&last_stats->tcp_flags,    stats->tcp_flags);
>> +
>> +    atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
>> +    atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
>> +}
>> +
>> +static void
>> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>> +                                    struct dpif_flow_stats *stats,
>> +                                    struct dpif_flow_attrs *attrs)
>> +{
>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>> +
>> +    atomic_read_relaxed(&last_stats->used,         &stats->used);
>> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>> +    atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
>> +    atomic_read_relaxed(&last_stats->tcp_flags,    &stats->tcp_flags);
>> +
>> +    atomic_read_relaxed(&last_attrs->offloaded,    &attrs->offloaded);
>> +    atomic_read_relaxed(&last_attrs->dp_layer,     &attrs->dp_layer);
>> +}
>> +
>>  static bool
>>  dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>> -                                    const struct dp_netdev_flow *netdev_flow,
>> +                                    struct dp_netdev_flow *netdev_flow,
>>                                      struct dpif_flow_stats *stats,
>>                                      struct dpif_flow_attrs *attrs)
>>  {
>> @@ -3175,11 +3219,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>      }
>>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>      /* Taking a global 'port_mutex' to fulfill thread safety
>> -     * restrictions for the netdev-offload-dpdk module. */
>> -    ovs_mutex_lock(&dp->port_mutex);
>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>> -                          stats, attrs, &buf);
>> -    ovs_mutex_unlock(&dp->port_mutex);
>> +     * restrictions for the netdev-offload-dpdk module.
>> +     *
>> +     * XXX: Main thread will try to pause/stop all revalidators during datapath
>> +     *      reconfiguration via datapath purge callback (dp_purge_cb) while
>> +     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
>> +     *      Otherwise, deadlock is possible, bcause revalidators might sleep
>> +     *      waiting for the main thread to release the lock and main thread
>> +     *      will wait for them to stop processing.
>> +     *      This workaround might make statistics less accurate. Especially
>> +     *      for flow deletion case, since there will be no other attempt.  */
>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>> +        ret = netdev_flow_get(netdev, &match, &actions,
>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>> +        /* Storing attributes from the last request for later use on mutex
>> +         * contention. */
> 
> Shouldn't you check 'ret' here before the set

Yes.  That makes sense.  I'll move below call under 'if (!ret)'.

> 
>> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs);
>> +        ovs_mutex_unlock(&dp->port_mutex);
>> +    } else {
>> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs);
>> +        if (!attrs->dp_layer) {
>> +            /* Flow was never reported as 'offloaded' so it's harmless
>> +             * to continue to think so. */
>> +            ret = EAGAIN;
>> +        }
>> +    }
>>      netdev_close(netdev);
>>      if (ret) {
>>          return false;
>> @@ -3448,6 +3512,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>      /* Do not allocate extra space. */
>>      flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>>      memset(&flow->stats, 0, sizeof flow->stats);
>> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
>> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>>      flow->dead = false;
>>      flow->batch = NULL;
>>      flow->mark = INVALID_FLOW_MARK;
>>
>
Ilya Maximets July 14, 2020, 6:24 p.m. UTC | #4
On 7/14/20 7:52 PM, Ilya Maximets wrote:
> On 7/14/20 5:33 PM, Kevin Traynor wrote:
>> On 09/07/2020 16:19, Ilya Maximets wrote:
>>> Main thread will try to pause/stop all revalidators during datapath
>>> reconfiguration via datapath purge callback (dp_purge_cb) while
>>> holding 'dp->port_mutex'.  And deadlock happens in case any of
>>> revalidator threads is already waiting on 'dp->port_mutex' while
>>> dumping offloaded flows:
>>>
>>>            main thread                           revalidator
>>>  ---------------------------------  ----------------------------------
>>>
>>>  ovs_mutex_lock(&dp->port_mutex)
>>>
>>>                                     dpif_netdev_flow_dump_next()
>>>                                     -> dp_netdev_flow_to_dpif_flow
>>>                                     -> get_dpif_flow_status
>>>                                     -> dpif_netdev_get_flow_offload_status()
>>>                                     -> ovs_mutex_lock(&dp->port_mutex)
>>>                                        <waiting for mutex here>
>>>
>>>  reconfigure_datapath()
>>>  -> reconfigure_pmd_threads()
>>>  -> dp_netdev_del_pmd()
>>>  -> dp_purge_cb()
>>>  -> udpif_pause_revalidators()
>>>  -> ovs_barrier_block(&udpif->pause_barrier)
>>>     <waiting for revalidators to reach barrier>
>>>
>>>                           <DEADLOCK>
>>>
>>> We're not allowed to call offloading API without holding global
>>> port mutex from the userspace datapath due to thread safety
>>> restrictions on netdev-ofload-dpdk module.  And it's also not easy
>>
>> typo s/ofload/offload/
> 
> Thanks.
> 
>>
>>> to rework datapath reconfiguration process in order to move actual
>>> PMD removal and datapath purge out of the port mutex.
>>>
>>> So, for now, not sleeping on a mutex if it's not immediately available
>>> seem like an easiest workaround.  This will have impact on flow
>>> statistics update rate and on ability to get the latest statistics
>>> before removing the flow (latest stats will be lost in case we were
>>> not able to take the mutex).  However, this will allow us to operate
>>> normally avoiding the deadlock.
>>>
>>> The last bit is that to avoid flapping of flow attributes and
>>> statistics we're not failing the operation, but returning last
>>> statistics and attributes returned by offload provider.  Since those
>>> might be updated in different threads, stores and reads are atomic.
>>>
>>
>> Nice explanation, thanks
>>
>>> Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
>>> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>>
>>> Tested with unit tests only.  Needs checking with the real setup.
>>> Especially stats/attrs update parts.
>>>
>>>  AUTHORS.rst       |  1 +
>>>  lib/dpif-netdev.c | 78 +++++++++++++++++++++++++++++++++++++++++++----
>>>  2 files changed, 73 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/AUTHORS.rst b/AUTHORS.rst
>>> index 8e6a0769f..eb36a01d0 100644
>>> --- a/AUTHORS.rst
>>> +++ b/AUTHORS.rst
>>> @@ -504,6 +504,7 @@ Edwin Chiu                      echiu@vmware.com
>>>  Eivind Bulie Haanaes
>>>  Enas Ahmad                      enas.ahmad@kaust.edu.sa
>>>  Eric Lopez
>>> +Frank Wang (王培辉)             wangpeihui@inspur.com
>>>  Frido Roose                     fr.roose@gmail.com
>>>  Gaetano Catalli                 gaetano.catalli@gmail.com
>>>  Gavin Remaley                   gavin_remaley@selinc.com
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 629a0cb53..2eb7f32ff 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -490,6 +490,12 @@ struct dp_netdev_flow_stats {
>>>      atomic_uint16_t tcp_flags;     /* Bitwise-OR of seen tcp_flags values. */
>>>  };
>>>  
>>> +/* Contained by struct dp_netdev_flow's 'last_attrs' member.  */
>>> +struct dp_netdev_flow_attrs {
>>> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
>>> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
>>> +};
>>> +
>>>  /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>>>   *
>>>   *
>>> @@ -550,6 +556,10 @@ struct dp_netdev_flow {
>>>      /* Statistics. */
>>>      struct dp_netdev_flow_stats stats;
>>>  
>>> +    /* Statistics and attributes received from the netdev offload provider. */
>>> +    struct dp_netdev_flow_stats last_stats;
>>> +    struct dp_netdev_flow_attrs last_attrs;
>>> +
>>>      /* Actions. */
>>>      OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>>>  
>>> @@ -3150,9 +3160,43 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>>>      return NULL;
>>>  }
>>>  
>>> +static void
>>> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>> +                                    const struct dpif_flow_stats *stats,
>>> +                                    const struct dpif_flow_attrs *attrs)
>>> +{
>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>> +
>>> +    atomic_store_relaxed(&last_stats->used,         stats->used);
>>> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>>> +    atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
>>> +    atomic_store_relaxed(&last_stats->tcp_flags,    stats->tcp_flags);
>>> +
>>> +    atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
>>> +    atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
>>> +}
>>> +
>>> +static void
>>> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>> +                                    struct dpif_flow_stats *stats,
>>> +                                    struct dpif_flow_attrs *attrs)
>>> +{
>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>> +
>>> +    atomic_read_relaxed(&last_stats->used,         &stats->used);
>>> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>>> +    atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
>>> +    atomic_read_relaxed(&last_stats->tcp_flags,    &stats->tcp_flags);
>>> +
>>> +    atomic_read_relaxed(&last_attrs->offloaded,    &attrs->offloaded);
>>> +    atomic_read_relaxed(&last_attrs->dp_layer,     &attrs->dp_layer);
>>> +}
>>> +
>>>  static bool
>>>  dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>> -                                    const struct dp_netdev_flow *netdev_flow,
>>> +                                    struct dp_netdev_flow *netdev_flow,
>>>                                      struct dpif_flow_stats *stats,
>>>                                      struct dpif_flow_attrs *attrs)
>>>  {
>>> @@ -3175,11 +3219,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>      }
>>>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>      /* Taking a global 'port_mutex' to fulfill thread safety
>>> -     * restrictions for the netdev-offload-dpdk module. */
>>> -    ovs_mutex_lock(&dp->port_mutex);
>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>> -                          stats, attrs, &buf);
>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>> +     * restrictions for the netdev-offload-dpdk module.
>>> +     *
>>> +     * XXX: Main thread will try to pause/stop all revalidators during datapath
>>> +     *      reconfiguration via datapath purge callback (dp_purge_cb) while
>>> +     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
>>> +     *      Otherwise, deadlock is possible, bcause revalidators might sleep
>>> +     *      waiting for the main thread to release the lock and main thread
>>> +     *      will wait for them to stop processing.
>>> +     *      This workaround might make statistics less accurate. Especially
>>> +     *      for flow deletion case, since there will be no other attempt.  */
>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>> +        /* Storing attributes from the last request for later use on mutex
>>> +         * contention. */
>>
>> Shouldn't you check 'ret' here before the set
> 
> Yes.  That makes sense.  I'll move below call under 'if (!ret)'.

Another option is to actually store 'ret' along with last_{stats,attrs}
and return the last return value that we actually had.  That will protect
us from flapping in case the flow disappeared from the HW (for example,
with Linux TC this might happen if someone removed flows from the qdisc
externally).  If we're storing only on success, this will result in
returning an error if we took the mutex, and success if we didn't but
still have good attributes and statistics from one of the previous calls.

What do you think?

> 
>>
>>> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs);
>>> +        ovs_mutex_unlock(&dp->port_mutex);
>>> +    } else {
>>> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs);
>>> +        if (!attrs->dp_layer) {
>>> +            /* Flow was never reported as 'offloaded' so it's harmless
>>> +             * to continue to think so. */
>>> +            ret = EAGAIN;
>>> +        }
>>> +    }
>>>      netdev_close(netdev);
>>>      if (ret) {
>>>          return false;
>>> @@ -3448,6 +3512,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>      /* Do not allocate extra space. */
>>>      flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>>>      memset(&flow->stats, 0, sizeof flow->stats);
>>> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
>>> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>>>      flow->dead = false;
>>>      flow->batch = NULL;
>>>      flow->mark = INVALID_FLOW_MARK;
>>>
>>
>
Kevin Traynor July 15, 2020, 9:44 a.m. UTC | #5
On 14/07/2020 19:24, Ilya Maximets wrote:
> On 7/14/20 7:52 PM, Ilya Maximets wrote:
>> On 7/14/20 5:33 PM, Kevin Traynor wrote:
>>> On 09/07/2020 16:19, Ilya Maximets wrote:
>>>> Main thread will try to pause/stop all revalidators during datapath
>>>> reconfiguration via datapath purge callback (dp_purge_cb) while
>>>> holding 'dp->port_mutex'.  And deadlock happens in case any of
>>>> revalidator threads is already waiting on 'dp->port_mutex' while
>>>> dumping offloaded flows:
>>>>
>>>>            main thread                           revalidator
>>>>  ---------------------------------  ----------------------------------
>>>>
>>>>  ovs_mutex_lock(&dp->port_mutex)
>>>>
>>>>                                     dpif_netdev_flow_dump_next()
>>>>                                     -> dp_netdev_flow_to_dpif_flow
>>>>                                     -> get_dpif_flow_status
>>>>                                     -> dpif_netdev_get_flow_offload_status()
>>>>                                     -> ovs_mutex_lock(&dp->port_mutex)
>>>>                                        <waiting for mutex here>
>>>>
>>>>  reconfigure_datapath()
>>>>  -> reconfigure_pmd_threads()
>>>>  -> dp_netdev_del_pmd()
>>>>  -> dp_purge_cb()
>>>>  -> udpif_pause_revalidators()
>>>>  -> ovs_barrier_block(&udpif->pause_barrier)
>>>>     <waiting for revalidators to reach barrier>
>>>>
>>>>                           <DEADLOCK>
>>>>
>>>> We're not allowed to call offloading API without holding global
>>>> port mutex from the userspace datapath due to thread safety
>>>> restrictions on netdev-ofload-dpdk module.  And it's also not easy
>>>
>>> typo s/ofload/offload/
>>
>> Thanks.
>>
>>>
>>>> to rework datapath reconfiguration process in order to move actual
>>>> PMD removal and datapath purge out of the port mutex.
>>>>
>>>> So, for now, not sleeping on a mutex if it's not immediately available
>>>> seem like an easiest workaround.  This will have impact on flow
>>>> statistics update rate and on ability to get the latest statistics
>>>> before removing the flow (latest stats will be lost in case we were
>>>> not able to take the mutex).  However, this will allow us to operate
>>>> normally avoiding the deadlock.
>>>>
>>>> The last bit is that to avoid flapping of flow attributes and
>>>> statistics we're not failing the operation, but returning last
>>>> statistics and attributes returned by offload provider.  Since those
>>>> might be updated in different threads, stores and reads are atomic.
>>>>
>>>
>>> Nice explanation, thanks
>>>
>>>> Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
>>>> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>> ---
>>>>
>>>> Tested with unit tests only.  Needs checking with the real setup.
>>>> Especially stats/attrs update parts.
>>>>
>>>>  AUTHORS.rst       |  1 +
>>>>  lib/dpif-netdev.c | 78 +++++++++++++++++++++++++++++++++++++++++++----
>>>>  2 files changed, 73 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/AUTHORS.rst b/AUTHORS.rst
>>>> index 8e6a0769f..eb36a01d0 100644
>>>> --- a/AUTHORS.rst
>>>> +++ b/AUTHORS.rst
>>>> @@ -504,6 +504,7 @@ Edwin Chiu                      echiu@vmware.com
>>>>  Eivind Bulie Haanaes
>>>>  Enas Ahmad                      enas.ahmad@kaust.edu.sa
>>>>  Eric Lopez
>>>> +Frank Wang (王培辉)             wangpeihui@inspur.com
>>>>  Frido Roose                     fr.roose@gmail.com
>>>>  Gaetano Catalli                 gaetano.catalli@gmail.com
>>>>  Gavin Remaley                   gavin_remaley@selinc.com
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index 629a0cb53..2eb7f32ff 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -490,6 +490,12 @@ struct dp_netdev_flow_stats {
>>>>      atomic_uint16_t tcp_flags;     /* Bitwise-OR of seen tcp_flags values. */
>>>>  };
>>>>  
>>>> +/* Contained by struct dp_netdev_flow's 'last_attrs' member.  */
>>>> +struct dp_netdev_flow_attrs {
>>>> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
>>>> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
>>>> +};
>>>> +
>>>>  /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>>>>   *
>>>>   *
>>>> @@ -550,6 +556,10 @@ struct dp_netdev_flow {
>>>>      /* Statistics. */
>>>>      struct dp_netdev_flow_stats stats;
>>>>  
>>>> +    /* Statistics and attributes received from the netdev offload provider. */
>>>> +    struct dp_netdev_flow_stats last_stats;
>>>> +    struct dp_netdev_flow_attrs last_attrs;
>>>> +
>>>>      /* Actions. */
>>>>      OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>>>>  
>>>> @@ -3150,9 +3160,43 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>>>>      return NULL;
>>>>  }
>>>>  
>>>> +static void
>>>> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>> +                                    const struct dpif_flow_stats *stats,
>>>> +                                    const struct dpif_flow_attrs *attrs)
>>>> +{
>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>> +
>>>> +    atomic_store_relaxed(&last_stats->used,         stats->used);
>>>> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>>>> +    atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
>>>> +    atomic_store_relaxed(&last_stats->tcp_flags,    stats->tcp_flags);
>>>> +
>>>> +    atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
>>>> +    atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
>>>> +}
>>>> +
>>>> +static void
>>>> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>> +                                    struct dpif_flow_stats *stats,
>>>> +                                    struct dpif_flow_attrs *attrs)
>>>> +{
>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>> +
>>>> +    atomic_read_relaxed(&last_stats->used,         &stats->used);
>>>> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>>>> +    atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
>>>> +    atomic_read_relaxed(&last_stats->tcp_flags,    &stats->tcp_flags);
>>>> +
>>>> +    atomic_read_relaxed(&last_attrs->offloaded,    &attrs->offloaded);
>>>> +    atomic_read_relaxed(&last_attrs->dp_layer,     &attrs->dp_layer);
>>>> +}
>>>> +
>>>>  static bool
>>>>  dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>> -                                    const struct dp_netdev_flow *netdev_flow,
>>>> +                                    struct dp_netdev_flow *netdev_flow,
>>>>                                      struct dpif_flow_stats *stats,
>>>>                                      struct dpif_flow_attrs *attrs)
>>>>  {
>>>> @@ -3175,11 +3219,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>      }
>>>>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>>      /* Taking a global 'port_mutex' to fulfill thread safety
>>>> -     * restrictions for the netdev-offload-dpdk module. */
>>>> -    ovs_mutex_lock(&dp->port_mutex);
>>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>>> -                          stats, attrs, &buf);
>>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>>> +     * restrictions for the netdev-offload-dpdk module.
>>>> +     *
>>>> +     * XXX: Main thread will try to pause/stop all revalidators during datapath
>>>> +     *      reconfiguration via datapath purge callback (dp_purge_cb) while
>>>> +     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
>>>> +     *      Otherwise, deadlock is possible, bcause revalidators might sleep
>>>> +     *      waiting for the main thread to release the lock and main thread
>>>> +     *      will wait for them to stop processing.
>>>> +     *      This workaround might make statistics less accurate. Especially
>>>> +     *      for flow deletion case, since there will be no other attempt.  */
>>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>> +        /* Storing attributes from the last request for later use on mutex
>>>> +         * contention. */
>>>
>>> Shouldn't you check 'ret' here before the set
>>
>> Yes.  That makes sense.  I'll move below call under 'if (!ret)'.
> 
> Another option is to actually store 'ret' along with last_{stats,attrs}.
> And return last return value that we actually had.  That will protect
> us from flapping in case the flow dissapeared from the HW (for example
> with linux tc this might happen if someone removed flows from qdisk
> externally).  In case we're storing only on success, this will result
> in returniing error if we took the mutex and sucess if we didn't but have
> good attributes and statistics from one of the previous calls.
> 
> What do you think?
> 

Not sure if you're saying you don't want to update the stats/attrs
anymore past last good value in case you got a ret error from flow_get()
*anytime* in the past?

>>
>>>
>>>> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs);
>>>> +        ovs_mutex_unlock(&dp->port_mutex);
>>>> +    } else {
>>>> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs);
>>>> +        if (!attrs->dp_layer) {
>>>> +            /* Flow was never reported as 'offloaded' so it's harmless
>>>> +             * to continue to think so. */
>>>> +            ret = EAGAIN;
>>>> +        }
>>>> +    }
>>>>      netdev_close(netdev);
>>>>      if (ret) {
>>>>          return false;
>>>> @@ -3448,6 +3512,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>>      /* Do not allocate extra space. */
>>>>      flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>>>>      memset(&flow->stats, 0, sizeof flow->stats);
>>>> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
>>>> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>>>>      flow->dead = false;
>>>>      flow->batch = NULL;
>>>>      flow->mark = INVALID_FLOW_MARK;
>>>>
>>>
>>
>
Ilya Maximets July 15, 2020, 10:02 a.m. UTC | #6
On 7/15/20 11:44 AM, Kevin Traynor wrote:
> On 14/07/2020 19:24, Ilya Maximets wrote:
>> On 7/14/20 7:52 PM, Ilya Maximets wrote:
>>> On 7/14/20 5:33 PM, Kevin Traynor wrote:
>>>> On 09/07/2020 16:19, Ilya Maximets wrote:
>>>>> @@ -3175,11 +3219,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>      }
>>>>>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>>>      /* Taking a global 'port_mutex' to fulfill thread safety
>>>>> -     * restrictions for the netdev-offload-dpdk module. */
>>>>> -    ovs_mutex_lock(&dp->port_mutex);
>>>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>>>> -                          stats, attrs, &buf);
>>>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>>>> +     * restrictions for the netdev-offload-dpdk module.
>>>>> +     *
>>>>> +     * XXX: Main thread will try to pause/stop all revalidators during datapath
>>>>> +     *      reconfiguration via datapath purge callback (dp_purge_cb) while
>>>>> +     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
>>>>> +     *      Otherwise, deadlock is possible, bcause revalidators might sleep
>>>>> +     *      waiting for the main thread to release the lock and main thread
>>>>> +     *      will wait for them to stop processing.
>>>>> +     *      This workaround might make statistics less accurate. Especially
>>>>> +     *      for flow deletion case, since there will be no other attempt.  */
>>>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>>> +        /* Storing attributes from the last request for later use on mutex
>>>>> +         * contention. */
>>>>
>>>> Shouldn't you check 'ret' here before the set
>>>
>>> Yes.  That makes sense.  I'll move below call under 'if (!ret)'.
>>
>> Another option is to actually store 'ret' along with last_{stats,attrs}.
>> And return last return value that we actually had.  That will protect
>> us from flapping in case the flow dissapeared from the HW (for example
>> with linux tc this might happen if someone removed flows from qdisk
>> externally).  In case we're storing only on success, this will result
>> in returniing error if we took the mutex and sucess if we didn't but have
>> good attributes and statistics from one of the previous calls.
>>
>> What do you think?
>>
> 
> Not sure if you're saying you don't want to update the stats/attrs
> anymore past last good value in case you got a ret error from flow_get()
> *anytime* in the past?

Sorry for not being clear.  I meant something like this:

--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -557,6 +559,7 @@ struct dp_netdev_flow {
     struct dp_netdev_flow_stats stats;
 
     /* Statistics and attributes received from the netdev offload provider. */
+    atomic_int netdev_flow_get_result;
     struct dp_netdev_flow_stats last_stats;
     struct dp_netdev_flow_attrs last_attrs;
 
@@ -3163,11 +3291,17 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
 static void
 dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
                                     const struct dpif_flow_stats *stats,
-                                    const struct dpif_flow_attrs *attrs)
+                                    const struct dpif_flow_attrs *attrs,
+                                    int result)
 {
     struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
     struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
 
+    atomic_store_relaxed(&netdev_flow->netdev_flow_get_result, result);
+    if (result) {
+        return;
+    }
+
     atomic_store_relaxed(&last_stats->used,         stats->used);
     atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
     atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
@@ -3175,16 +3309,23 @@ dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
 
     atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
     atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
+
 }
 
 static void
 dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
                                     struct dpif_flow_stats *stats,
-                                    struct dpif_flow_attrs *attrs)
+                                    struct dpif_flow_attrs *attrs,
+                                    int *result)
 {
     struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
     struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
 
+    atomic_read_relaxed(&netdev_flow->netdev_flow_get_result, result);
+    if (*result) {
+        return;
+    }
+
     atomic_read_relaxed(&last_stats->used,         &stats->used);
     atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
     atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
@@ -3232,13 +3373,13 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
     if (!ovs_mutex_trylock(&dp->port_mutex)) {
         ret = netdev_flow_get(netdev, &match, &actions,
                               &netdev_flow->mega_ufid, stats, attrs, &buf);
-        /* Storing attributes from the last request for later use on mutex
-         * contention. */
-        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs);
+        /* Storing statistics and attributes from the last request for
+         * later use on mutex contention. */
+        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs, ret);
         ovs_mutex_unlock(&dp->port_mutex);
     } else {
-        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs);
-        if (!attrs->dp_layer) {
+        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs, &ret);
+        if (!ret && !attrs->dp_layer) {
             /* Flow was never reported as 'offloaded' so it's harmless
              * to continue to think so. */
             ret = EAGAIN;
@@ -3512,6 +3653,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
     /* Do not allocate extra space. */
     flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
     memset(&flow->stats, 0, sizeof flow->stats);
+    atomic_init(&flow->netdev_flow_get_result, 0);
     memset(&flow->last_stats, 0, sizeof flow->last_stats);
     memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
     flow->dead = false;
---

This should mimic the last call regardless of its success.

Best regards, Ilya Maximets.
Kevin Traynor July 15, 2020, 10:31 a.m. UTC | #7
On 15/07/2020 11:02, Ilya Maximets wrote:
> On 7/15/20 11:44 AM, Kevin Traynor wrote:
>> On 14/07/2020 19:24, Ilya Maximets wrote:
>>> On 7/14/20 7:52 PM, Ilya Maximets wrote:
>>>> On 7/14/20 5:33 PM, Kevin Traynor wrote:
>>>>> On 09/07/2020 16:19, Ilya Maximets wrote:
>>>>>> @@ -3175,11 +3219,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>>      }
>>>>>>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>>>>      /* Taking a global 'port_mutex' to fulfill thread safety
>>>>>> -     * restrictions for the netdev-offload-dpdk module. */
>>>>>> -    ovs_mutex_lock(&dp->port_mutex);
>>>>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>>>>> -                          stats, attrs, &buf);
>>>>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>>>>> +     * restrictions for the netdev-offload-dpdk module.
>>>>>> +     *
>>>>>> +     * XXX: Main thread will try to pause/stop all revalidators during datapath
>>>>>> +     *      reconfiguration via datapath purge callback (dp_purge_cb) while
>>>>>> +     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
>>>>>> +     *      Otherwise, deadlock is possible, bcause revalidators might sleep
>>>>>> +     *      waiting for the main thread to release the lock and main thread
>>>>>> +     *      will wait for them to stop processing.
>>>>>> +     *      This workaround might make statistics less accurate. Especially
>>>>>> +     *      for flow deletion case, since there will be no other attempt.  */
>>>>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>>>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>>>> +        /* Storing attributes from the last request for later use on mutex
>>>>>> +         * contention. */
>>>>>
>>>>> Shouldn't you check 'ret' here before the set
>>>>
>>>> Yes.  That makes sense.  I'll move below call under 'if (!ret)'.
>>>
>>> Another option is to actually store 'ret' along with last_{stats,attrs}.
>>> And return last return value that we actually had.  That will protect
>>> us from flapping in case the flow dissapeared from the HW (for example
>>> with linux tc this might happen if someone removed flows from qdisk
>>> externally).  In case we're storing only on success, this will result
>>> in returniing error if we took the mutex and sucess if we didn't but have
>>> good attributes and statistics from one of the previous calls.
>>>
>>> What do you think?
>>>
>>
>> Not sure if you're saying you don't want to update the stats/attrs
>> anymore past last good value in case you got a ret error from flow_get()
>> *anytime* in the past?
> 
> Sorry for not being clear.  I meant something like this:
> 
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -557,6 +559,7 @@ struct dp_netdev_flow {
>      struct dp_netdev_flow_stats stats;
>  
>      /* Statistics and attributes received from the netdev offload provider. */
> +    atomic_int netdev_flow_get_result;
>      struct dp_netdev_flow_stats last_stats;
>      struct dp_netdev_flow_attrs last_attrs;
>  
> @@ -3163,11 +3291,17 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>  static void
>  dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>                                      const struct dpif_flow_stats *stats,
> -                                    const struct dpif_flow_attrs *attrs)
> +                                    const struct dpif_flow_attrs *attrs,
> +                                    int result)
>  {
>      struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>      struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>  
> +    atomic_store_relaxed(&netdev_flow->netdev_flow_get_result, result);
> +    if (result) {
> +        return;
> +    }
> +
>      atomic_store_relaxed(&last_stats->used,         stats->used);
>      atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>      atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
> @@ -3175,16 +3309,23 @@ dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>  
>      atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
>      atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
> +
>  }
>  
>  static void
>  dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>                                      struct dpif_flow_stats *stats,
> -                                    struct dpif_flow_attrs *attrs)
> +                                    struct dpif_flow_attrs *attrs,
> +                                    int *result)
>  {
>      struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>      struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>  
> +    atomic_read_relaxed(&netdev_flow->netdev_flow_get_result, result);
> +    if (*result) {
> +        return;
> +    }
> +
>      atomic_read_relaxed(&last_stats->used,         &stats->used);
>      atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>      atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
> @@ -3232,13 +3373,13 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>      if (!ovs_mutex_trylock(&dp->port_mutex)) {
>          ret = netdev_flow_get(netdev, &match, &actions,
>                                &netdev_flow->mega_ufid, stats, attrs, &buf);
> -        /* Storing attributes from the last request for later use on mutex
> -         * contention. */
> -        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs);
> +        /* Storing statistics and attributes from the last request for
> +         * later use on mutex contention. */
> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs, ret);
>          ovs_mutex_unlock(&dp->port_mutex);
>      } else {
> -        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs);
> -        if (!attrs->dp_layer) {
> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs, &ret);
> +        if (!ret && !attrs->dp_layer) {
>              /* Flow was never reported as 'offloaded' so it's harmless
>               * to continue to think so. */
>              ret = EAGAIN;
> @@ -3512,6 +3653,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>      /* Do not allocate extra space. */
>      flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>      memset(&flow->stats, 0, sizeof flow->stats);
> +    atomic_init(&flow->netdev_flow_get_result, 0);
>      memset(&flow->last_stats, 0, sizeof flow->last_stats);
>      memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>      flow->dead = false;
> ---
> 
> This should mimic the last call regardless of its success.
> 

got it, that looks good, thanks.

> Best regards, Ilya Maximets.
>

Patch

diff --git a/AUTHORS.rst b/AUTHORS.rst
index 8e6a0769f..eb36a01d0 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -504,6 +504,7 @@  Edwin Chiu                      echiu@vmware.com
 Eivind Bulie Haanaes
 Enas Ahmad                      enas.ahmad@kaust.edu.sa
 Eric Lopez
+Frank Wang (王培辉)             wangpeihui@inspur.com
 Frido Roose                     fr.roose@gmail.com
 Gaetano Catalli                 gaetano.catalli@gmail.com
 Gavin Remaley                   gavin_remaley@selinc.com
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 629a0cb53..2eb7f32ff 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -490,6 +490,12 @@  struct dp_netdev_flow_stats {
     atomic_uint16_t tcp_flags;     /* Bitwise-OR of seen tcp_flags values. */
 };
 
+/* Contained by struct dp_netdev_flow's 'last_attrs' member.  */
+struct dp_netdev_flow_attrs {
+    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
+    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
+};
+
 /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
  *
  *
@@ -550,6 +556,10 @@  struct dp_netdev_flow {
     /* Statistics. */
     struct dp_netdev_flow_stats stats;
 
+    /* Statistics and attributes received from the netdev offload provider. */
+    struct dp_netdev_flow_stats last_stats;
+    struct dp_netdev_flow_attrs last_attrs;
+
     /* Actions. */
     OVSRCU_TYPE(struct dp_netdev_actions *) actions;
 
@@ -3150,9 +3160,43 @@  dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
     return NULL;
 }
 
+static void
+dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
+                                    const struct dpif_flow_stats *stats,
+                                    const struct dpif_flow_attrs *attrs)
+{
+    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
+    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
+
+    atomic_store_relaxed(&last_stats->used,         stats->used);
+    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
+    atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
+    atomic_store_relaxed(&last_stats->tcp_flags,    stats->tcp_flags);
+
+    atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
+    atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
+}
+
+static void
+dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
+                                    struct dpif_flow_stats *stats,
+                                    struct dpif_flow_attrs *attrs)
+{
+    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
+    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
+
+    atomic_read_relaxed(&last_stats->used,         &stats->used);
+    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
+    atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
+    atomic_read_relaxed(&last_stats->tcp_flags,    &stats->tcp_flags);
+
+    atomic_read_relaxed(&last_attrs->offloaded,    &attrs->offloaded);
+    atomic_read_relaxed(&last_attrs->dp_layer,     &attrs->dp_layer);
+}
+
 static bool
 dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
-                                    const struct dp_netdev_flow *netdev_flow,
+                                    struct dp_netdev_flow *netdev_flow,
                                     struct dpif_flow_stats *stats,
                                     struct dpif_flow_attrs *attrs)
 {
@@ -3175,11 +3219,31 @@  dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
     }
     ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
     /* Taking a global 'port_mutex' to fulfill thread safety
-     * restrictions for the netdev-offload-dpdk module. */
-    ovs_mutex_lock(&dp->port_mutex);
-    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
-                          stats, attrs, &buf);
-    ovs_mutex_unlock(&dp->port_mutex);
+     * restrictions for the netdev-offload-dpdk module.
+     *
+     * XXX: Main thread will try to pause/stop all revalidators during datapath
+     *      reconfiguration via datapath purge callback (dp_purge_cb) while
+     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
+     *      Otherwise, deadlock is possible, because revalidators might sleep
+     *      waiting for the main thread to release the lock and main thread
+     *      will wait for them to stop processing.
+     *      This workaround might make statistics less accurate, especially
+     *      for the flow deletion case, since there will be no other attempt.  */
+    if (!ovs_mutex_trylock(&dp->port_mutex)) {
+        ret = netdev_flow_get(netdev, &match, &actions,
+                              &netdev_flow->mega_ufid, stats, attrs, &buf);
+        /* Storing attributes from the last request for later use on mutex
+         * contention. */
+        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs);
+        ovs_mutex_unlock(&dp->port_mutex);
+    } else {
+        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs);
+        if (!attrs->dp_layer) {
+            /* Flow was never reported as 'offloaded' so it's harmless
+             * to continue to think so. */
+            ret = EAGAIN;
+        }
+    }
     netdev_close(netdev);
     if (ret) {
         return false;
@@ -3448,6 +3512,8 @@  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
     /* Do not allocate extra space. */
     flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
     memset(&flow->stats, 0, sizeof flow->stats);
+    memset(&flow->last_stats, 0, sizeof flow->last_stats);
+    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
     flow->dead = false;
     flow->batch = NULL;
     flow->mark = INVALID_FLOW_MARK;