[ovs-dev,RFC,02/21] ovsschema: Introduce 'keepalive' column in Open_vSwitch.

Message ID 1496852117-71097-3-git-send-email-bhanuprakash.bodireddy@intel.com
State Superseded

Commit Message

Bodireddy, Bhanuprakash June 7, 2017, 4:14 p.m. UTC
This commit adds new ovsdb column "keepalive". It shows the overall datapath
status and the health of the cores running datapath threads.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 vswitchd/vswitch.ovsschema |  7 +++++--
 vswitchd/vswitch.xml       | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

Comments

Ben Pfaff June 7, 2017, 9:25 p.m. UTC | #1
On Wed, Jun 07, 2017 at 05:14:58PM +0100, Bhanuprakash Bodireddy wrote:
> This commit adds new ovsdb column "keepalive". It shows the overall datapath
> status and the health of the cores running datapath threads.
> 
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>

I'm a little uncomfortable with having OVS report that it's
nonfunctional.  If it's dead, then from my point of view the most
natural response would be to call abort(), to let the monitoring process
restart it and presumably fix the problem.  What's the guiding
philosophy here?
Bodireddy, Bhanuprakash June 8, 2017, 1:59 p.m. UTC | #2
>On Wed, Jun 07, 2017 at 05:14:58PM +0100, Bhanuprakash Bodireddy wrote:
>> This commit adds new ovsdb column "keepalive". It shows the overall
>> datapath status and the health of the cores running datapath threads.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> <bhanuprakash.bodireddy@intel.com>
>
>I'm a little uncomfortable with having OVS report that it's nonfunctional.  If it's
>dead, then from my point of view the most natural response would be to call
>abort(), to let the monitoring process restart it and presumably fix the
>problem.  What's the guiding philosophy here?

Hello Ben,

In some scenarios it's correct to let the monitoring process restart OvS immediately when it fails.

However, as part of the OPNFV Barometer project, key KPI statistics are exposed to monitor the health of compute nodes. These include CPU, memory, and cache utilization, link status, packet statistics, networking MIBs, etc. vSwitch health is most important, and the KA patches expose it to monitoring apps like collectd, which relays the information to the OpenStack Ceilometer service. As you are aware, Ceilometer only collects events and metering data and is not meant to make any decisions.

In case of a vSwitch issue, fault management services like 'Doctor' can act based on the criticality of the failure and on other KPIs from the compute node: they can migrate the VNFs to another compute node and mark the problematic node as offline so that Nova won't schedule VMs on it.

- Bhanuprakash.

Patch

diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 19b49da..769434e 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@ 
 {"name": "Open_vSwitch",
- "version": "7.15.0",
- "cksum": "544856471 23228",
+ "version": "7.16.0",
+ "cksum": "2916438977 23364",
  "tables": {
    "Open_vSwitch": {
      "columns": {
@@ -28,6 +28,9 @@ 
        "statistics": {
          "type": {"key": "string", "value": "string", "min": 0, "max": "unlimited"},
          "ephemeral": true},
+       "keepalive": {
+         "type": {"key": "string", "value": "string", "min": 0, "max": "unlimited"},
+         "ephemeral": true},
        "ovs_version": {
          "type": {"key": {"type": "string"},
                   "min": 0, "max": 1}},
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 59c96df..fd4ba04 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -569,6 +569,26 @@ 
             the daemon.
           </p>
         </column>
+
+        <column name="keepalive" key="CORE_ID">
+          <p>
+            One such key-value pair, with <code>ID</code> replaced by the
+            core ID, exists for each active PMD thread.  The value is a
+            comma-separated list giving the status of the PMD core and the
+            last-seen timestamp of the PMD thread.  In order, these values are:
+          </p>
+
+          <ol>
+            <li>Status of PMD core.  Valid values include ALIVE, MISSING, DEAD,
+            GONE, DOZING, SLEEPING.</li>
+            <li>Last seen timestamp of the PMD core.</li>
+          </ol>
+
+          <p>
+            This is valid only for the OvS-DPDK datapath, and only PMD thread
+            status is implemented.
+          </p>
+        </column>
       </group>
     </group>
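
For illustration only, here is a minimal Python sketch of how a monitoring client might decode the "keepalive" map described in the vswitch.xml text above. It assumes keys of the form CORE_<id> and values of the form "<status>,<timestamp>", as documented in the patch. The names parse_keepalive, cores_with_status and CoreStatus are hypothetical, the sample values are made up, and the timestamp is kept as an opaque string because the patch does not define its format. How the map is fetched from OVSDB is left out.

```python
from typing import Dict, Iterable, List, NamedTuple


class CoreStatus(NamedTuple):
    """One decoded entry of the keepalive map (hypothetical helper type)."""
    core_id: int
    status: str      # e.g. ALIVE, MISSING, DEAD, GONE, DOZING, SLEEPING
    last_seen: str   # opaque: the patch does not specify the timestamp format


def parse_keepalive(column: Dict[str, str]) -> Dict[int, CoreStatus]:
    """Decode a keepalive map of CORE_<id> -> "<status>,<timestamp>"."""
    result: Dict[int, CoreStatus] = {}
    for key, value in column.items():
        prefix, sep, core = key.partition("_")
        if prefix != "CORE" or not sep or not core.isdecimal():
            continue  # skip keys that do not match the documented shape
        status, sep, last_seen = value.partition(",")
        if not sep:
            raise ValueError(f"malformed keepalive value for {key}: {value!r}")
        core_id = int(core)
        result[core_id] = CoreStatus(core_id, status.strip(), last_seen.strip())
    return result


def cores_with_status(parsed: Dict[int, CoreStatus],
                      statuses: Iterable[str]) -> List[int]:
    """Return the sorted core ids whose status is in the given set."""
    wanted = set(statuses)
    return sorted(cid for cid, entry in parsed.items() if entry.status in wanted)
```

A caller could then pick its own policy, for example cores_with_status(parsed, {"DEAD", "GONE"}), leaving the decision about which states count as failures to the fault management layer, in line with the discussion above.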