[ovs-dev,RFC,v2,03/19] Keepalive: Add initial keepalive support.

Message ID 1497286187-69287-4-git-send-email-bhanuprakash.bodireddy@intel.com
State Superseded

Commit Message

Bodireddy, Bhanuprakash June 12, 2017, 4:49 p.m. UTC
This commit introduces initial keepalive support by adding the
'keepalive' module along with helper and initialization functions
that will be invoked by later commits.

It also adds a new ovsdb column, "keepalive", which shows the overall
datapath status and the health of the cores running datapath threads.

For example:
  To enable the keepalive feature:
  'ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true'

  To set a timer interval of 5000ms for monitoring packet processing cores:
  'ovs-vsctl --no-wait set Open_vSwitch . \
     other_config:keepalive-interval="5000"'

  To set the name of the shared memory block where the events shall be updated:
  'ovs-vsctl --no-wait set Open_vSwitch . \
     other_config:keepalive-shm-name="/ovs_keepalive_shm_name"'

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 lib/automake.mk            |   2 +
 lib/dpdk.c                 |  23 ++++++
 lib/dpdk.h                 |   1 +
 lib/keepalive.c            | 173 +++++++++++++++++++++++++++++++++++++++++++++
 lib/keepalive.h            |  60 ++++++++++++++++
 lib/netdev-dpdk.c          |  78 +++++++++++++++++++-
 lib/netdev-dpdk.h          |   5 ++
 vswitchd/vswitch.ovsschema |   7 +-
 vswitchd/vswitch.xml       |  59 ++++++++++++++++
 9 files changed, 405 insertions(+), 3 deletions(-)
 create mode 100644 lib/keepalive.c
 create mode 100644 lib/keepalive.h

Comments

Aaron Conole June 12, 2017, 6:07 p.m. UTC | #1
Hi Bhanu,

Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> writes:

> This commit introduces the initial keepalive support by adding
> 'keepalive' module and also helper and initialization functions
> that will be invoked by later commits.
>
> This commit adds new ovsdb column "keepalive". It shows the overall
> datapath status and the health of the cores running datapath threads.
>
> For eg:
>   To enable keepalive feature.
>   'ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true'
>
>   To set timer interval of 5000ms for monitoring packet processing cores;
>   'ovs-vsctl --no-wait set Open_vSwitch . \
>      other_config:keepalive-interval="5000"
>
>   To set shared memory block name where the events shall be updated
>   'ovs-vsctl --no-wait set Open_vSwitch .
>      other_config:keepalive-shm-name="/ovs_keepalive_shm_name"'
>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> ---

Please drop the shm from this in a future spin.  I could break the
internal state quite easily (like the very first torturous thing I
did) by dumping /dev/urandom to the shared memory file.  I could very
easily craft something that will keep this shared memory in a bad state.

In fact, I haven't even tried to do things like overwrite existing
shared memory objects (should be possible, and will break other
projects, ecryptfs is the first that comes to mind).  It would be better
to just drop it.

ovs-appctl keepalive/pmd-health-show
		Keepalive status
keepalive status  : Enabled
keepalive interval: 1000 ms

CORE	STATE	LAST SEEN TIMESTAMP
 7	ALIVE	43344787807209

Datapath Status   : HEALTHY

cat /dev/urandom > /dev/shm/dpdk_keepalive_shm_name
ovs-appctl keepalive/pmd-health-show
		Keepalive status
keepalive status  : Enabled
keepalive interval: 1000 ms

CORE	STATE	LAST SEEN TIMESTAMP
 0	(null)	10660124645881463916
 1	(null)	5419620669063293457
 2	(null)	3997290327519151462
 3	(null)	741012298071192240
 4	(null)	17121191240143348475
 5	(null)	196826143357288180
 6	(null)	16991287540667354606
 7	ALIVE	43433690238297
 8	(null)	12605537197433233625
 9	(null)	16165932034124372170
10	(null)	7043153906566983888
11	(null)	3313365658311486089
12	(null)	9472051631753636385
13	(null)	2729591716011495190
14	(null)	8116719657874052902
15	(null)	6170218932000590981
16	(null)	11940062704289009242
17	(null)	16054426603346992693
18	(null)	11733783812145211587
19	(null)	17422311991122429793
20	(null)	12157602060703720484
21	(null)	4702539880649374601
22	(null)	3308829437461656258
23	(null)	10686327068067950801
24	(null)	1062353524616742042
25	(null)	3411196120056755362
26	(null)	14380258066374538906
27	(null)	1994426035553696437
28	(null)	14629305673046216947
29	(null)	6815178381412532977
30	(null)	3837867130711225730
31	(null)	14372514319083311235
32	(null)	16904818025985329362
33	(null)	7874035779643796944
34	(null)	14137956378807162773
35	(null)	2135100817994545762
36	(null)	938767493018956293
37	(null)	549204087805245919
38	(null)	10719355933507854137
39	(null)	263240089144185313
40	(null)	6073326489453212261
41	(null)	8116030765185514690
42	(null)	1248609037227101558
43	(null)	6678443548270239496
44	(null)	5847797356619529282
45	(null)	14391879962356278324
46	(null)	1716234857307350191
47	(null)	7574042054802707876
48	(null)	16860876577697127818
49	(null)	7895935822970790600
50	(null)	5153886903119256630
51	(null)	6363876866636362709
52	(null)	17623253782684394967
53	(null)	13094434757667773855
54	(null)	1342198222097391716
55	(null)	5961615020761719880
56	(null)	175752874781730961
57	(null)	16164123257846350774
58	(null)	3640530682414592532
59	(null)	14841131474158653980
60	(null)	17010280974634724206
61	(null)	5804918975148153365
62	(null)	1467237527024283686
63	(null)	796105455950228646
64	(null)	3888015814214546233
65	(null)	13081487125645211001
66	(null)	3487953759259578515
67	(null)	1219525686458220333
68	(null)	13050098724568496629
69	(null)	5616895831600742933
70	(null)	4515459817496319851
71	(null)	4300342810383125296
72	(null)	2089763498196056699
73	(null)	4558921459658476837
74	(null)	9589791739912051982
75	(null)	7272211504898147193
76	(null)	153518444253159437
77	(null)	11821286970977580690
78	(null)	6027047558190586472
79	(null)	4444531704085926681
80	(null)	2759824267120885517
81	(null)	16600346914700514203
82	(null)	14476746472648794914
83	(null)	6836491613483326907
84	(null)	3891839138365995246
85	(null)	16117423162606961380
86	(null)	18263784802693730952
87	(null)	9904243893554832371
88	(null)	4143808308929447327
89	(null)	16809817010561028285
90	(null)	11469071700772754979
91	(null)	3962294215679521615
92	(null)	204372726457692892
93	(null)	15106779410389116135
94	(null)	11449324126036287827
95	(null)	8941138084047006708
96	(null)	4723903076832588171
97	(null)	4401931717370190036
98	(null)	4908359066639256207
99	(null)	8440636070681774753
100	(null)	13761214205257801060
101	(null)	17872778920723388899
102	(null)	12113388264038981360
103	(null)	17875097124559172525
104	(null)	834690981501251676
105	(null)	13643461999341349109
106	(null)	2362283555242969385
107	(null)	445267096604159221
108	(null)	16382407908611601109
109	(null)	5439783483802505466
110	(null)	14232074262186003839
111	(null)	13625303887042450726
112	(null)	3937990477933492467
113	(null)	10730991513718300068
114	(null)	8438291360705909413
115	(null)	10487084900559550739
116	(null)	11133741592439444843
117	(null)	7016453285148043103
118	(null)	3551521114318926201
119	(null)	5173646250762172263
120	(null)	3620228314653289305
121	(null)	6407373942480887984
122	(null)	15939516032859279633
123	(null)	14285513721828072858
124	(null)	13743510024269650309
125	(null)	4770228809351045051
126	(null)	4282030962150088172
127	(null)	1674150412751264599

Datapath Status   : HEALTHY
Bodireddy, Bhanuprakash June 13, 2017, 10:59 a.m. UTC | #2
>Hi Bhanu,
>
>Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> writes:
>
>> This commit introduces the initial keepalive support by adding
>> 'keepalive' module and also helper and initialization functions that
>> will be invoked by later commits.
>>
>> This commit adds new ovsdb column "keepalive". It shows the overall
>> datapath status and the health of the cores running datapath threads.
>>
>> For eg:
>>   To enable keepalive feature.
>>   'ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true'
>>
>>   To set timer interval of 5000ms for monitoring packet processing cores;
>>   'ovs-vsctl --no-wait set Open_vSwitch . \
>>      other_config:keepalive-interval="5000"
>>
>>   To set shared memory block name where the events shall be updated
>>   'ovs-vsctl --no-wait set Open_vSwitch .
>>      other_config:keepalive-shm-name="/ovs_keepalive_shm_name"'
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> <bhanuprakash.bodireddy@intel.com>
>> ---
>
>Please drop the shm from this in a future spin.  I could break the internal state
>quite easily (like the very first torturous thing I
>did) by dumping /dev/urandom to the shared memory file.  I could very easily
>craft something that will keep this shared memory in a bad state.
>
>In fact, I haven't even tried to do things like overwrite existing shared memory
>objects (should be possible, and will break other projects, ecryptfs is the first
>that comes to mind).  It would be better to just drop it.

Hi Aaron,

I agree with you.  I have already started working on the code to get rid of SHM.
I will wait a few more days to see if there is any more feedback on the remaining patches, and will then send out a new series (v3, without the SHM implementation).

Bhanuprakash.
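
The "(null)" entries in the corrupted keepalive/pmd-health-show output above are what glibc prints when a NULL pointer is passed to a %s conversion, which suggests the raw state values read from the shared memory block fell outside the known range. The command's implementation is not part of this patch, so the following is only a minimal sketch of a bounds-checked lookup, assuming the state values from the keepalive_state enum in lib/keepalive.h; ka_state_name() is a hypothetical helper, not a function from the series.

```c
#include <stdint.h>

/* State values as declared in lib/keepalive.h in this patch. */
enum keepalive_state {
    KA_STATE_UNUSED = 0,
    KA_STATE_ALIVE = 1,
    KA_STATE_DEAD = 2,
    KA_STATE_GONE = 3,
    KA_STATE_MISSING = 4,
    KA_STATE_DOZING = 5,
    KA_STATE_SLEEP = 6,
    KA_STATE_CHECK = 7
};

/* Hypothetical helper (not part of the patch): map a raw value read from
 * an untrusted source, such as a world-writable shared memory block, to a
 * printable name.  Out-of-range values yield "INVALID" rather than a NULL
 * pointer or an out-of-bounds array read. */
static const char *
ka_state_name(uint32_t raw)
{
    static const char *const names[] = {
        [KA_STATE_UNUSED]  = "UNUSED",
        [KA_STATE_ALIVE]   = "ALIVE",
        [KA_STATE_DEAD]    = "DEAD",
        [KA_STATE_GONE]    = "GONE",
        [KA_STATE_MISSING] = "MISSING",
        [KA_STATE_DOZING]  = "DOZING",
        [KA_STATE_SLEEP]   = "SLEEP",
        [KA_STATE_CHECK]   = "CHECK"
    };

    if (raw > KA_STATE_CHECK) {
        return "INVALID";
    }
    return names[raw];
}
```

A check like this only makes the report robust against garbage; it does not make the data trustworthy, which is why dropping the shared memory block altogether, as agreed above, is the simpler fix.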

Patch

diff --git a/lib/automake.mk b/lib/automake.mk
index f5baba2..1b05221 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -110,6 +110,8 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/json.c \
 	lib/jsonrpc.c \
 	lib/jsonrpc.h \
+	lib/keepalive.c \
+	lib/keepalive.h \
 	lib/lacp.c \
 	lib/lacp.h \
 	lib/latch.h \
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 9c764b9..3f5669b 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -32,6 +32,7 @@ 
 
 #include "dirs.h"
 #include "fatal-signal.h"
+#include "keepalive.h"
 #include "netdev-dpdk.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/vlog.h"
@@ -477,6 +478,28 @@  dpdk_init(const struct smap *ovs_other_config)
     }
 }
 
+int
+dpdk_ka_init(void)
+{
+    struct keepalive_shm *ka_shm = get_ka_shm();
+    if (!ka_shm) {
+        VLOG_ERR("SHM uninitialized? keepalive initialization aborted.");
+        return -1;
+    }
+
+    /* Initialize keepalive subsystem */
+    if ((rte_global_keepalive_info =
+            rte_keepalive_create(&dpdk_failcore_cb, ka_shm)) == NULL) {
+        VLOG_ERR("Keepalive initialization failed.");
+        return -1;
+    } else {
+        rte_keepalive_register_relay_callback(rte_global_keepalive_info,
+            dpdk_ka_update_core_state, ka_shm);
+    }
+
+    return 0;
+}
+
 const char *
 dpdk_get_vhost_sock_dir(void)
 {
diff --git a/lib/dpdk.h b/lib/dpdk.h
index bdbb51b..dc830c4 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -37,6 +37,7 @@  struct smap;
 
 struct rte_keepalive *rte_global_keepalive_info;
 void dpdk_init(const struct smap *ovs_other_config);
+int dpdk_ka_init(void);
 void dpdk_set_lcore_id(unsigned cpu);
 const char *dpdk_get_vhost_sock_dir(void);
 
diff --git a/lib/keepalive.c b/lib/keepalive.c
new file mode 100644
index 0000000..8d2b0a0
--- /dev/null
+++ b/lib/keepalive.c
@@ -0,0 +1,173 @@ 
+/*
+ * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include "dpdk.h"
+#include "keepalive.h"
+#include "lib/vswitch-idl.h"
+#include "openvswitch/vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(keepalive);
+
+static bool keepalive_enable = false;   /* KeepAlive disabled by default */
+static bool ka_init_status = ka_init_failure; /* KeepAlive initialization */
+static uint32_t keepalive_timer_interval;     /* keepalive timer interval */
+
+static const char *keepalive_shm_blk = NULL;
+struct keepalive_shm *ka_shm = NULL;
+
+/* Return the Keepalive shared memory block name. */
+static inline const char *
+get_ka_shm_blk(void)
+{
+    return keepalive_shm_blk;
+}
+
+inline struct keepalive_shm *
+get_ka_shm(void)
+{
+    return ka_shm;
+}
+
+void
+ka_set_pmd_state_ts(unsigned core_id, enum keepalive_state state,
+                    uint64_t last_alive)
+{
+    ka_shm->core_state[core_id] = state;
+    ka_shm->core_last_seen_times[core_id] = last_alive;
+}
+
+/* Retrieve and return the keepalive timer interval from OVSDB. */
+static uint32_t
+get_ka_timer_interval(const struct smap *ovs_other_config)
+{
+#define OVS_KEEPALIVE_TIMEOUT 1000    /* Default timeout set to 1000ms */
+    uint32_t ka_interval;
+
+    /* Timer granularity in milliseconds
+     * Defaults to OVS_KEEPALIVE_TIMEOUT(ms) if not set */
+    ka_interval = smap_get_int(ovs_other_config, "keepalive-interval",
+                  OVS_KEEPALIVE_TIMEOUT);
+
+    VLOG_INFO("Keepalive timer interval set to %"PRIu32" (ms)\n", ka_interval);
+    return ka_interval;
+}
+
+static const char *
+get_ka_shm_block(const struct smap *ovs_other_config)
+{
+/* Shared mem block. */
+#define OVS_KEEPALIVE_SHM_NAME /dpdk_keepalive_shm_name
+    keepalive_shm_blk = smap_get(ovs_other_config, "keepalive-shm-name");
+    if (!keepalive_shm_blk) {
+        keepalive_shm_blk = OVS_STRINGIZE(OVS_KEEPALIVE_SHM_NAME);
+    }
+
+    VLOG_INFO("KeepAlive shared memory block: %s\n", keepalive_shm_blk);
+    return keepalive_shm_blk;
+}
+
+/* Create POSIX Shared memory object and initialize the core states. */
+static
+struct keepalive_shm *keepalive_shm_create(void)
+{
+    int fd;
+    int coreid;
+    struct keepalive_shm *ka_shm;
+    char ka_shmblk[40];
+
+    snprintf(ka_shmblk, sizeof ka_shmblk, "%s", get_ka_shm_blk());
+    if (shm_unlink(ka_shmblk) == -1 && errno != ENOENT) {
+        VLOG_ERR("Error unlinking stale %s \n", ka_shmblk);
+    }
+
+    if ((fd = shm_open(ka_shmblk,
+           O_CREAT | O_TRUNC | O_RDWR, 0666)) < 0) {
+        VLOG_WARN("Failed to open %s as SHM \n", ka_shmblk);
+    } else if (ftruncate(fd, sizeof(struct keepalive_shm)) != 0) {
+        VLOG_WARN("Failed to resize SHM \n");
+    } else {
+        ka_shm = (struct keepalive_shm *) mmap(
+           0, sizeof(struct keepalive_shm),
+            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+        close(fd);
+        if (ka_shm == MAP_FAILED) {
+            VLOG_WARN("Failed to mmap SHM \n");
+        } else {
+            memset(ka_shm, 0, sizeof(struct keepalive_shm));
+
+            /* Mark all cores to 'not present' */
+            for (coreid = 0; coreid < KEEPALIVE_MAXCORES; coreid++) {
+                ka_shm->core_state[coreid] = KA_STATE_UNUSED;
+                ka_shm->core_last_seen_times[coreid] = 0;
+            }
+
+            return ka_shm;
+        }
+    }
+    return NULL;
+}
+
+static int
+ka_init__(void)
+{
+#ifdef DPDK_NETDEV
+    return dpdk_ka_init();
+#else
+    return -1;
+#endif
+}
+
+void
+ka_init(const struct smap *ovs_other_config)
+{
+    if (ka_init_status || !ovs_other_config) {
+        return;
+    }
+
+    static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
+    if (ovsthread_once_start(&once_enable)) {
+        if (smap_get_bool(ovs_other_config, "enable-keepalive", false)) {
+            keepalive_enable = true;
+            VLOG_INFO("OvS Keepalive enabled.");
+
+            keepalive_timer_interval =
+                get_ka_timer_interval(ovs_other_config);
+            keepalive_shm_blk = get_ka_shm_block(ovs_other_config);
+
+            /* Create shared memory block */
+            if ((ka_shm = keepalive_shm_create()) != NULL) {
+                int err = ka_init__();
+                if (!err) {
+                    VLOG_INFO("OvS Keepalive - initialized.");
+                    ka_init_status = ka_init_success;
+                }
+            } else {
+                VLOG_ERR("keepalive_shm_create() failed.");
+            }
+        } else {
+            VLOG_INFO("OvS Keepalive disabled.");
+        }
+
+        ovsthread_once_done(&once_enable);
+    }
+}
diff --git a/lib/keepalive.h b/lib/keepalive.h
new file mode 100644
index 0000000..2e844fb
--- /dev/null
+++ b/lib/keepalive.h
@@ -0,0 +1,60 @@ 
+/*
+ * Copyright (c) 2016 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef KEEPALIVE_H
+#define KEEPALIVE_H
+
+#include <stdint.h>
+#ifdef DPDK_NETDEV
+#include <rte_keepalive.h>
+#define KEEPALIVE_MAXCORES RTE_KEEPALIVE_MAXCORES
+#else
+#define KEEPALIVE_MAXCORES 128
+#endif /* DPDK_NETDEV */
+
+struct smap;
+
+enum keepalive_state {
+    KA_STATE_UNUSED = 0,
+    KA_STATE_ALIVE = 1,
+    KA_STATE_MISSING = 4,
+    KA_STATE_DEAD = 2,
+    KA_STATE_GONE = 3,
+    KA_STATE_DOZING = 5,
+    KA_STATE_SLEEP = 6,
+    KA_STATE_CHECK = 7
+};
+
+struct keepalive_shm {
+    enum keepalive_state core_state[KEEPALIVE_MAXCORES];
+
+    /* Last seen timestamp of the core */
+    uint64_t core_last_seen_times[KEEPALIVE_MAXCORES];
+
+    /* Store pmd thread tid */
+    pid_t thread_id[KEEPALIVE_MAXCORES];
+};
+
+enum keepalive_status {
+   ka_init_failure = 0,
+   ka_init_success
+};
+
+void ka_init(const struct smap *);
+struct keepalive_shm *get_ka_shm(void);
+void ka_set_pmd_state_ts(unsigned, enum keepalive_state, uint64_t);
+
+#endif /* keepalive.h */
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 810800e..24a87bb 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -32,12 +32,14 @@ 
 #include <rte_mbuf.h>
 #include <rte_meter.h>
 #include <rte_virtio_net.h>
+#include <rte_keepalive.h>
 
 #include "dirs.h"
 #include "dp-packet.h"
 #include "dpdk.h"
 #include "dpif-netdev.h"
 #include "fatal-signal.h"
+#include "keepalive.h"
 #include "netdev-provider.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
@@ -48,8 +50,9 @@ 
 #include "ovs-numa.h"
 #include "ovs-thread.h"
 #include "ovs-rcu.h"
-#include "packets.h"
 #include "openvswitch/shash.h"
+#include "packets.h"
+#include "process.h"
 #include "smap.h"
 #include "sset.h"
 #include "unaligned.h"
@@ -567,6 +570,79 @@  dpdk_mp_put(struct dpdk_mp *dmp)
     ovs_mutex_unlock(&dpdk_mp_mutex);
 }
 
+/* Callback function invoked on heartbeat miss.  Verify whether it is a genuine
+ * heartbeat miss or a false positive and log the message accordingly.
+ */
+void
+dpdk_failcore_cb(void *ptr_data, const int core_id)
+{
+    struct keepalive_shm *ka_shm = (struct keepalive_shm *)ptr_data;
+
+    if (ka_shm) {
+        int pstate;
+        uint32_t tid = ka_shm->thread_id[core_id];
+        int err = get_process_status(tid, &pstate);
+
+        if (!err) {
+            switch (pstate) {
+
+            case ACTIVE_STATE:
+                VLOG_INFO_RL(&rl,"False positive, pmd tid[%"PRIu32"] alive\n",
+                                  tid);
+                break;
+            case STOPPED_STATE:
+            case TRACED_STATE:
+            case DEFUNC_STATE:
+            case UNINTERRUPTIBLE_SLEEP_STATE:
+                VLOG_WARN_RL(&rl,
+                    "PMD tid[%"PRIu32"] on core[%d] is unresponsive\n",
+                    tid, core_id);
+                break;
+            default:
+                VLOG_DBG("%s: The process state: %d\n", __FUNCTION__, pstate);
+                OVS_NOT_REACHED();
+            }
+        }
+    }
+}
+
+/* Update the core state in shared memory.
+ *
+ * This function shall be invoked periodically to write the core status and
+ * last seen timestamp of the cores into the shared memory block.
+ */
+void
+dpdk_ka_update_core_state(void *ptr_data, const int core_id,
+       const enum rte_keepalive_state core_state, uint64_t last_alive)
+{
+    struct keepalive_shm *ka_shm = (struct keepalive_shm *)ptr_data;
+    if (!ka_shm) {
+        VLOG_ERR_RL(&rl, "KeepAlive: Invalid shared memory block\n");
+        return;
+    }
+
+    VLOG_DBG_RL(&rl,
+               "TS(%lu):CORE%d, old state:%d, current_state:%d\n",
+               (unsigned long)time(NULL),core_id,ka_shm->core_state[core_id],
+               core_state);
+
+    switch (core_state) {
+    case RTE_KA_STATE_ALIVE:
+    case RTE_KA_STATE_MISSING:
+        ka_set_pmd_state_ts(core_id, KA_STATE_ALIVE, last_alive);
+        break;
+    case RTE_KA_STATE_DOZING:
+    case RTE_KA_STATE_SLEEP:
+    case RTE_KA_STATE_DEAD:
+    case RTE_KA_STATE_GONE:
+        ka_set_pmd_state_ts(core_id, core_state, last_alive);
+        break;
+    case RTE_KA_STATE_UNUSED:
+        ka_set_pmd_state_ts(core_id, KA_STATE_UNUSED, 0);
+        break;
+    }
+}
+
 /* Tries to allocate new mempool on requested_socket_id with
  * mbuf size corresponding to requested_mtu.
  * On success new configuration will be applied.
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..229e0d0 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -18,15 +18,20 @@ 
 #define NETDEV_DPDK_H
 
 #include <config.h>
+#include <stdint.h>
 
 #include "openvswitch/compiler.h"
 
 struct dp_packet;
+enum rte_keepalive_state;
 
 #ifdef DPDK_NETDEV
 
 void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
+void dpdk_failcore_cb(void *, const int);
+void dpdk_ka_update_core_state(void *ptr, const int,
+                               const enum rte_keepalive_state, uint64_t);
 
 #else
 
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 19b49da..769434e 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@ 
 {"name": "Open_vSwitch",
- "version": "7.15.0",
- "cksum": "544856471 23228",
+ "version": "7.16.0",
+ "cksum": "2916438977 23364",
  "tables": {
    "Open_vSwitch": {
      "columns": {
@@ -28,6 +28,9 @@ 
        "statistics": {
          "type": {"key": "string", "value": "string", "min": 0, "max": "unlimited"},
          "ephemeral": true},
+       "keepalive": {
+         "type": {"key": "string", "value": "string", "min": 0, "max": "unlimited"},
+         "ephemeral": true},
        "ovs_version": {
          "type": {"key": {"type": "string"},
                   "min": 0, "max": 1}},
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index d219bfd..984ce7d 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -531,6 +531,65 @@ 
           </p>
         </column>
       </group>
+
+      <group title="Keepalive">
+        <p>
+          The <code>keepalive</code> column contains key-value pairs that
+          report health of datapath cores in Open vSwitch.  These are updated
+          periodically (based on the keepalive-interval).
+        </p>
+
+        <column name="other_config" key="enable-keepalive"
+                type='{"type": "boolean"}'>
+          Keepalive is disabled by default to avoid overhead in the common
+          case when heartbeat monitoring is not useful.  Set this value to
+          <code>true</code> to enable keepalive <ref column="keepalive"/>
+          column or to <code>false</code> to explicitly disable it.
+        </column>
+
+        <column name="other_config" key="keepalive-interval"
+                type='{"type": "integer", "minInteger": 1}'>
+          <p>
+            Specifies the keepalive interval value in milliseconds.
+          </p>
+          <p>
+            If not specified, this will be set to 1000 milliseconds (default
+            value). Changing this value requires restarting the daemon.
+          </p>
+        </column>
+
+        <column name="other_config" key="keepalive-shm-name"
+              type='{"type": "string"}'>
+          <p>
+            Specifies the keepalive shared memory block name.
+          </p>
+          <p>
+            If not specified, shared memory block named
+            "/dpdk_keepalive_shm_name" (default name) is created. Changing this
+            value requires restarting the daemon.
+          </p>
+        </column>
+
+        <column name="keepalive" key="CORE_ID">
+          <p>
+            One such key-value pair, with <code>ID</code> replaced by the
+            core id, will exist for each active PMD thread.  The value is a
+            comma-separated list of status of PMD core and last seen timestamp
+            of PMD thread. In respective order, these values are:
+          </p>
+
+          <ol>
+            <li>Status of PMD core.  Valid values include ALIVE, MISSING, DEAD,
+            GONE, DOZING, SLEEPING.</li>
+            <li>Last seen timestamp of the PMD core.</li>
+          </ol>
+
+          <p>
+            This is only valid for the OvS-DPDK datapath, and only PMD thread
+            status is implemented.
+          </p>
+        </column>
+      </group>
     </group>
 
     <group title="Version Reporting">