[ovs-dev,v6,0/8] Add OVS DPDK keep-alive functionality.

Message ID 1512734667-23202-1-git-send-email-bhanuprakash.bodireddy@intel.com

Message

Bodireddy, Bhanuprakash Dec. 8, 2017, 12:04 p.m. UTC
The keepalive feature is aimed at achieving Fastpath Service Assurance
in OVS-DPDK deployments. It adds support for monitoring the packet
processing threads by dispatching heartbeats at regular intervals.
 
The keepalive feature can be enabled through the OVSDB settings below (a usage example follows the list).

    enable-keepalive=true
      - The keepalive feature is disabled by default and should be enabled
        at startup, before the ovs-vswitchd daemon is started.

    keepalive-interval="5000"
      - Timer interval in milliseconds for monitoring the packet
        processing cores.
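
    For reference, one way to apply these options (a sketch only; it assumes
    the keys live under the Open_vSwitch table's other_config column, as with
    other OVS-DPDK options, and uses --no-wait since ovs-vswitchd is not yet
    running):

      # Assumed usage: set both keys before starting ovs-vswitchd.
      ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true
      ovs-vsctl --no-wait set Open_vSwitch . other_config:keepalive-interval="5000"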

TESTING:
    Keepalive was tested using the stress command to simulate PMD stalls
    (a consolidated sketch of these steps follows the list).
      - pmd-cpu-mask=0xf [MQ enabled on DPDK ports]
      - stress -c 1 &          [the worker tid is usually the parent tid + 1 in the output]
      - chrt -r -p 99 <tid>    [set realtime priority for the stress thread]
      - taskset -p 0x8 <tid>   [pin the stress thread to the core the PMD is running on]
      - The PMD thread will be descheduled due to its normal priority and yields
        the core to the stress thread.

      - ovs-appctl keepalive/pmd-health-show   [displays that the thread is GONE]
      - ./ovsdb/ovsdb-client monitor Open_vSwitch  [should update the status]

      - taskset -p 0x10 <tid>  [brings the PMD thread back to life, as the stress
                                thread is moved to an idle core]

      (Watch out for the stress threads and pin them to cores carefully so as
       not to hang your DUTs during testing.)
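
      A consolidated sketch of the above steps (the mask values, core numbers
      and the tid lookup via ps are assumptions that depend on the local setup):

        # Assumes pmd-cpu-mask=0xf and that core 3 (mask 0x8) hosts a PMD.
        stress -c 1 &
        tid=$(ps -eLo comm,tid | awk '$1 == "stress" { tid = $2 } END { print tid }')

        chrt -r -p 99 "$tid"     # realtime priority for the stress worker
        taskset -p 0x8 "$tid"    # pin it onto the PMD's core; the PMD gets descheduled

        ovs-appctl keepalive/pmd-health-show         # PMD should show up as GONE
        ./ovsdb/ovsdb-client monitor Open_vSwitch    # OVSDB status should update

        taskset -p 0x10 "$tid"   # move stress to an idle core; the PMD recovers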

v5 -> v6
  * Remove 2 patches from series
     - xnanosleep was applied to master as part of high resolution timeout support.
     - Extend get_process_info() API was also applied to master earlier.
  * Remove KA_STATE_DOZING, as it was initially meant to handle core C-states and
    is not needed for now.
  * Fixed ka_destroy() to fix unit test cases 536 and 537.
  * A minor performance degradation (0.5%) is observed with keepalive enabled.
    [Tested with a loopback case using 1000 IXIA streams of 64-byte UDP packets and
    1 PMD thread (9.239 vs 9.177 Mpps) at a 10 ms ka-interval timeout.]
  * Verified with sparse and MSVC compilers (AppVeyor).

v4 -> v5
  * Add 3 more patches to the series
     - xnanosleep()
     - Documentation
     - Update to NEWS
  * Removed all references to core_id and instead implemented thread-based tracking.
  * Addressed most of the comments in v4.

v3 -> v4
  * Split the functionality into 2 parts. This patch series only updates
    PMD status to OVSDB. An incremental patch series will handle false positives,
    false negatives, and more checking and stats.
  * Remove code from netdev layer and dependency on rte_keepalive lib.
  * Merged a few patches and simplified the patch series.
  * Timestamps in human-readable form.

v2 -> v3
  * Rebase.
  * Verified with dpdk-stable-17.05.1 release.
  * Fixed build issues with MSVC and cross checked with appveyor.

v1 -> v2
  * Rebase
  * Drop patch 01/20 "Consolidate process related APIs" of v1, as it
    is already applied as a separate patch.

RFCv3 -> v1
  * Made changes to fix failures in some unit test cases.
  * Some more code cleanup w.r.t. process-related APIs.

RFCv2 -> RFCv3
  * Remove POSIX shared memory block implementation (suggested by Aaron).
  * Rework the logic to register and track threads instead of cores. This way,
    in the future, any thread can be registered to the KA framework. For now only PMD
    threads are tracked (suggested by Aaron).
  * Refactor a few APIs and further clean up the code.
   
RFCv1 -> RFCv2
  * Merged the XML and schema commits into the later commit where the actual
    implementation is done (suggested by Ben).
  * Fix ovs-appctl keepalive/* hang issue when KA disabled.
  * Fixed memory leaks with appctl commands for keepalive/pmd-health-show,
    pmd-xstats-show.
  * Refactor code and fixed APIs dealing with PMD health monitoring.


Bhanuprakash Bodireddy (8):
  Keepalive: Add initial keepalive configuration.
  dpif-netdev: Register packet processing cores to KA framework.
  dpif-netdev: Enable heartbeats for DPDK datapath.
  keepalive: Retrieve PMD status periodically.
  bridge: Update keepalive status in OVSDB.
  keepalive: Add support to query keepalive status and statistics.
  Documentation: Update DPDK doc with Keepalive feature.
  NEWS: Add keepalive support information in NEWS.

 Documentation/howto/dpdk.rst | 112 +++++++++
 NEWS                         |   2 +
 lib/automake.mk              |   2 +
 lib/dpif-netdev.c            |  92 ++++++++
 lib/keepalive.c              | 552 +++++++++++++++++++++++++++++++++++++++++++
 lib/keepalive.h              | 109 +++++++++
 lib/ovs-thread.c             |   6 +
 lib/ovs-thread.h             |   1 +
 lib/util.c                   |  22 ++
 lib/util.h                   |   1 +
 vswitchd/bridge.c            |  29 +++
 vswitchd/vswitch.ovsschema   |   8 +-
 vswitchd/vswitch.xml         |  49 ++++
 13 files changed, 983 insertions(+), 2 deletions(-)
 create mode 100644 lib/keepalive.c
 create mode 100644 lib/keepalive.h

Comments

Fischetti, Antonio Dec. 15, 2017, 2:23 p.m. UTC | #1
Hi Bhanu,
I've tested v6 and LGTM, works as expected.

To check the behavior I blocked one or more of the PMDs with the trick of
attaching cgdb. So with

   utilities/ovs-appctl keepalive/pmd-health-show

I could see, for example, 2 of the 3 PMDs - those I intentionally blocked -
correctly reported as DEAD first, then GONE, like:

                Keepalive status

keepalive status   : Enabled
keepalive interval : 1000 ms
keepalive init time: 15 Dec 2017 13:20:54
PMD threads        : 3

 PMD    CORE    STATE   LAST SEEN TIMESTAMP(UTC)
pmd62    5      GONE    15 Dec 2017 13:22:39
pmd65    7      ALIVE   15 Dec 2017 13:22:50
pmd63    4      GONE    15 Dec 2017 13:22:14


Thanks,
Antonio

Flavio Leitner Jan. 15, 2018, 8:58 p.m. UTC | #2
Hi,

Sorry to jump on this at v6 only, but I skimmed over the code and I am
struggling to understand what problem you're trying to solve. Yes, I
realize you want some sort of feedback about the PMD processing,
but it's not clear to me what exactly you want from it.

This last patchset uses a separate thread just to monitor the PMD
threads which can update their status in the core busy loop.  I
guess it tells you if the PMD thread is stuck or not, but not
really if it's processing packets.  That's again, my question above.

If you need to know if the thread is running, I think any OVS can
provide you the process stats which should be more reliable and
doesn't depend on OVS at all.

I'd appreciate it if you could elaborate more on the use-case.
Thanks
fbl


Bodireddy, Bhanuprakash Jan. 16, 2018, 10:43 a.m. UTC | #3
>Hi,
>
>Sorry to jump on this at v6 only, but I skimmed over the code and I am
>struggling to understand what problem you're trying to solve. Yes, I realize
>you want some sort of feedback about the PMD processing, but it's not clear
>to me what exactly you want from it.
>
>This last patchset uses a separate thread just to monitor the PMD threads
>which can update their status in the core busy loop.  I guess it tells you if the
>PMD thread is stuck or not, but not really if it's processing packets.  That's
>again, my question above.
>
>If you need to know if the thread is running, I think any OVS can provide you
>the process stats which should be more reliable and doesn't depend on OVS
>at all.
>
>I appreciate if you could elaborate more on the use-case.

The Intel SA team has been working on an SA framework for NFV environments and has defined interfaces
for the base platform (aligned with ETSI GS NFV 002), which includes compute, storage, network, virtual switch, OS and hypervisor.
The core idea is to monitor and detect service-impacting faults on the base platform.
Both reactive and proactive fault-detection techniques are employed, and faults are reported to
higher-level layers for corrective actions. Corrective actions can be, for example, migrating
the workloads or marking the compute node offline, based on the policies enforced at higher layers.

One aspect of the larger SA framework is monitoring virtual switch health. Some of the events of interest here
are link status, OVSDB connection status, packet statistics (drops/errors) and PMD health.

This patch series only implements the *PMD health* monitoring and reporting mechanism, and the details are
already in the patch. The other interesting virtual switch events are already implemented as part of the collectd plugin.

On your questions:

> I guess it tells you if the PMD thread is stuck or not, but not really if it's processing packets.  That's
>again, my question above.

The functionality to check if the PMD is processing the packets was implemented way back in v3.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336789.html

For easier review, the patch series was split up in v4 to get the basic functionality in. This is mentioned in version change log below.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337702.html

>If you need to know if the thread is running, I think any OVS can provide you
>the process stats which should be more reliable and doesn't depend on OVS
>at all.

There is a problem here, and I simulated the case in the thread below to show that the stats reported by the OS aren't accurate.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338388.html

Check the details in /proc/[pid]/stat. Even though the PMD thread is stalled, the OS reports the thread in the *Running (R)* state.
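
For example, the per-thread state can be read directly (a sketch; <pid> and <tid> are
placeholders for the ovs-vswitchd PID and the PMD thread ID):

    # Field 3 of /proc/<pid>/task/<tid>/stat is the scheduling state; it stays
    # 'R' (runnable) even while the PMD is starved by the higher-priority thread.
    awk '{ print $3 }' /proc/<pid>/task/<tid>/stat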

- Bhanuprakash.

Kevin Traynor Jan. 22, 2018, 4:08 p.m. UTC | #4
On 12/08/2017 12:04 PM, Bhanuprakash Bodireddy wrote:
> Keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing threads by dispatching heartbeats at regular intervals.
>  

Hi Bhanu,

I guess there are comments elsewhere about the need for this feature, but
looking at this solution I had a few comments on it. Not a full review.

This creates a dependency on the internal OVS threading model from
collectd/ceilometer. That seems like something that could cause problems
in the future. What if someone wanted to put multiple PMDs on a core, or
put a PMD into interrupt mode? Would it be blocked in OVS because it
would break collectd/ceilometer, or how would the projects coordinate?

I see there is a sleep state, so maybe that could be used for things
like an interrupt mode etc., but please consider that OVS may not
always spin the PMDs in the manner it does now, and at least allow for a
"don't monitor this pmd" type option, so OVS is not restricted by trying
to please collectd/ceilometer.

It also seems strange that collectd/ceilometer would be monitoring PMD
health but be OK with PMDs appearing and disappearing due to internal
OVS rxq assignment.

Aaron Conole Jan. 24, 2018, 5:02 p.m. UTC | #5
"Bodireddy, Bhanuprakash" <bhanuprakash.bodireddy@intel.com> writes:

>>Hi,
>>
>>Sorry to jump on this at v6 only, but I skimmed over the code and I am
>>struggling to understand what problem you're trying to solve. Yes, I realize
>>you want some sort of feedback about the PMD processing, but it's not clear
>>to me what exactly you want from it.
>>
>>This last patchset uses a separate thread just to monitor the PMD threads
>>which can update their status in the core busy loop.  I guess it tells you if the
>>PMD thread is stuck or not, but not really if it's processing packets.  That's
>>again, my question above.
>>
>>If you need to know if the thread is running, I think any OVS can provide you
>>the process stats which should be more reliable and doesn't depend on OVS
>>at all.
>>
>>I appreciate if you could elaborate more on the use-case.
>
> Intel SA team has been working on SA Framework for NFV environment and
> has defined interfaces
> for the base platform(aligned with ETSI GS NFV 002) which includes
> compute, storage, NW, virtual switch, OS and hypervisor.
> The core idea here is to monitor and detect the service impacting
> faults on the Base platform.
> Both reactive and pro-active fault detection techniques are employed
> and faults are reported to
> higher level layers for corrective actions. The corrective actions for
> example here can be migrating
> the workloads, marking the compute offline and is based on the
> policies enforced at higher layers.
>
> One aspect of larger SA framework is monitoring virtual switch
> health. Some of the events of interest here
> are link status, OvS DB connection status, packet
> statistics(drops/errors), PMD health.
>
> This patch series has only implemented *PMD health* monitoring and
> reporting mechanism and the details are
> already in the patch. The other interesting events of virtual switch
> are already implemented as part of collectd plugin.
>
> On your questions:
>
>> I guess it tells you if the PMD thread is stuck or not, but not
>> really if it's processing packets.  That's
>>again, my question above.
>
> The functionality to check if the PMD is processing the packets was
> implemented way back in v3.
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336789.html
>
> For easier review, the patch series was split up in v4 to get the
> basic functionality in. This is mentioned in version change log below.
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337702.html
>
>>If you need to know if the thread is running, I think any OVS can provide you
>>the process stats which should be more reliable and doesn't depend on OVS
>>at all.
>
> There is a problem here and I did simulate the case to show that the
> stats reported by OS aren't accurate in the below thread.
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338388.html
>
> Check the details on /proc/[pid]/stats. Though the PMD thread is
> stalled, OS reports the thread as *Running (R)* state.

Just responding to this point, and I assumed you would follow up to my
statement when I wrote:

  That said, I suspect that when those times were increasing, they really
  were getting some system / user time.  Was your worker thread
  occasionally making system calls (or even doing sched_yield()?)  In
  those cases, it likely was letting the pmd thread move on.

You are observing that the thread is still running, and still getting time.
Your experiment merely shows that the scheduler works as intended.  If
you clocked the percentage (as I just did), you'll see that it
balances.  E.g., when I run two RT threads on the same core as a PMD
thread (and use a simple 'top' to show the data):

 5905 root      10 -10 1996516 141900  11128 R 60.1  0.9   3:09.08 pmd20        
 6361 aconole   20   0    7896    104      0 R 19.9  0.0   1:03.46 stress       
 6362 aconole   20   0    7896    104      0 R 19.9  0.0   1:06.95 stress       

Note the CPU utilization percentage.  When I stop these threads:

 5905 root      10 -10 1996516 141900  11128 R 99.7  0.9   4:22.77 pmd20        

That means, if you simply calculate the time taken by the thread as a
percentage of the time available to take, you'll detect stalls by seeing
if the percentage drops below, say, 90%.

This can still be done using procfs.  I reject the assertion that the
kernel doesn't account for process time properly.  That would be a
scheduler bug or an accounting bug.
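
A rough sketch of that check against Linux procfs (<pid> and <tid> are
placeholders; utime and stime are fields 14 and 15 of the per-thread stat
file, in clock ticks):

    t1=$(awk '{ print $14 + $15 }' /proc/<pid>/task/<tid>/stat)
    sleep 1
    t2=$(awk '{ print $14 + $15 }' /proc/<pid>/task/<tid>/stat)
    # Fraction of the elapsed second the thread actually ran; well below
    # 100% suggests the thread is being starved.
    echo "$t1 $t2 $(getconf CLK_TCK)" | awk '{ printf "%.1f%%\n", 100 * ($2 - $1) / $3 }'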

I disagree with your statement:

  "The other fields related to time in stats file can't be completely
   relied due to below test case."

Did I misunderstand your assertion and test?
