diff mbox series

[ovs-dev] netdev-afxdp: Add interrupt mode using poll syscall.

Message ID 1582316690-8678-1-git-send-email-u9012063@gmail.com
State Changes Requested
Headers show
Series [ovs-dev] netdev-afxdp: Add interrupt mode using poll syscall. | expand

Commit Message

William Tu Feb. 21, 2020, 8:24 p.m. UTC
The patch adds a new option 'use-intr' to enable afxdp interrupt
mode.  At receive path, add a poll() syscall so that when there
is no packet arrived, the pmd thread will be blocked and this
saves some CPU time for other processes. This avoids burning the
CPU to always 100% when there is no traffic. Disabled by default.

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/653613549
Signed-off-by: William Tu <u9012063@gmail.com>
---
 NEWS                       |  3 +++
 lib/netdev-afxdp.c         | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 lib/netdev-linux-private.h |  2 ++
 vswitchd/vswitch.xml       | 12 ++++++++++++
 4 files changed, 61 insertions(+), 1 deletion(-)

Comments

Ilya Maximets Feb. 24, 2020, 10:47 a.m. UTC | #1
On 2/21/20 9:24 PM, William Tu wrote:
> The patch adds a new option 'use-intr' to enable afxdp interrupt
> mode.  At receive path, add a poll() syscall so that when there
> is no packet arrived, the pmd thread will be blocked and this
> saves some CPU time for other processes. This avoids burning the
> CPU to always 100% when there is no traffic. Disabled by default.

Sleeping inside the PMD thread is not a good idea in general.
If one port doesn't have packets this doesn't mean that other
ports are idle too.  With this patch, PMD thread will probably
sleep for 1 second for each rxq without packets?  Am I right?

Also, sleeping while not in a quiescent state will produce
additional issues will too late rcu calls and stalls of other
threads waiting on rcu synchronization.

I also spotted that you're entering quiescent state at some
point, but who will end this state?  PMD thread will continue
working in a quiescent state and will probably crash while trying
to use rcu-protected data structures like flow tables.

IMHO, for this case you just need to create a non-pmd version
of netdev-afxdp with rxq_wait() implemented.  These ports
will be handled by the main thread without consuming extra CPU
resources.

Best regards, Ilya Maximets.
William Tu Feb. 25, 2020, 1:31 a.m. UTC | #2
On Mon, Feb 24, 2020 at 2:47 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 2/21/20 9:24 PM, William Tu wrote:
> > The patch adds a new option 'use-intr' to enable afxdp interrupt
> > mode.  At receive path, add a poll() syscall so that when there
> > is no packet arrived, the pmd thread will be blocked and this
> > saves some CPU time for other processes. This avoids burning the
> > CPU to always 100% when there is no traffic. Disabled by default.
>
> Sleeping inside the PMD thread is not a good idea in general.
> If one port doesn't have packets this doesn't mean that other
> ports are idle too.  With this patch, PMD thread will probably
> sleep for 1 second for each rxq without packets?  Am I right?

Timeout will be 1 millisecond.
>
> Also, sleeping while not in a quiescent state will produce
> additional issues will too late rcu calls and stalls of other
> threads waiting on rcu synchronization.
>
> I also spotted that you're entering quiescent state at some
> point, but who will end this state?  PMD thread will continue
> working in a quiescent state and will probably crash while trying
> to use rcu-protected data structures like flow tables.

Right, I should also end the quiescent state somewhere.

>
> IMHO, for this case you just need to create a non-pmd version
> of netdev-afxdp with rxq_wait() implemented.  These ports
> will be handled by the main thread without consuming extra CPU
> resources.
>
Hi Ilya,

Thanks for your feedback.
I will work on the idea of setting to non-pmd version.

William
William Tu Feb. 26, 2020, 9:18 p.m. UTC | #3
On Mon, Feb 24, 2020 at 5:31 PM William Tu <u9012063@gmail.com> wrote:
>
> On Mon, Feb 24, 2020 at 2:47 AM Ilya Maximets <i.maximets@ovn.org> wrote:
> >
> > On 2/21/20 9:24 PM, William Tu wrote:
> > > The patch adds a new option 'use-intr' to enable afxdp interrupt
> > > mode.  At receive path, add a poll() syscall so that when there
> > > is no packet arrived, the pmd thread will be blocked and this
> > > saves some CPU time for other processes. This avoids burning the
> > > CPU to always 100% when there is no traffic. Disabled by default.
> >
> > Sleeping inside the PMD thread is not a good idea in general.
> > If one port doesn't have packets this doesn't mean that other
> > ports are idle too.  With this patch, PMD thread will probably
> > sleep for 1 second for each rxq without packets?  Am I right?
>
> Timeout will be 1 millisecond.
> >
> > Also, sleeping while not in a quiescent state will produce
> > additional issues will too late rcu calls and stalls of other
> > threads waiting on rcu synchronization.
> >
> > I also spotted that you're entering quiescent state at some
> > point, but who will end this state?  PMD thread will continue
> > working in a quiescent state and will probably crash while trying
> > to use rcu-protected data structures like flow tables.
>
> Right, I should also end the quiescent state somewhere.
>
> >
> > IMHO, for this case you just need to create a non-pmd version
> > of netdev-afxdp with rxq_wait() implemented.  These ports
> > will be handled by the main thread without consuming extra CPU
> > resources.
> >
> Hi Ilya,
>
> Thanks for your feedback.
> I will work on the idea of setting to non-pmd version.
>

Hi Ilya,

I have two approaches to this non-pmd versions. I'm thinking about
which directions to go.

1) Dynamically setting .is_pmd to true/false.
So users can use
ovs-vsctl -- set int eth0 options:use-intr=true/false
The datapath will have to reconfigure the pmd to queue assignment.
Need to maintain per afxdp netdev's is_pmd value (some netdev might
set to true, some might be false), now the is_pmd is per-netdev_class.
I think this is pretty complicated changes.

2) Another solution is to statically create another netdev_class, with
.name = "afxdp-nonpmd",
.is_pmd = false,
I think this is much easier.

What do you think?
William
Ilya Maximets Feb. 27, 2020, 3:33 p.m. UTC | #4
On 2/26/20 10:18 PM, William Tu wrote:
> On Mon, Feb 24, 2020 at 5:31 PM William Tu <u9012063@gmail.com> wrote:
>>
>> On Mon, Feb 24, 2020 at 2:47 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>>>
>>> On 2/21/20 9:24 PM, William Tu wrote:
>>>> The patch adds a new option 'use-intr' to enable afxdp interrupt
>>>> mode.  At receive path, add a poll() syscall so that when there
>>>> is no packet arrived, the pmd thread will be blocked and this
>>>> saves some CPU time for other processes. This avoids burning the
>>>> CPU to always 100% when there is no traffic. Disabled by default.
>>>
>>> Sleeping inside the PMD thread is not a good idea in general.
>>> If one port doesn't have packets this doesn't mean that other
>>> ports are idle too.  With this patch, PMD thread will probably
>>> sleep for 1 second for each rxq without packets?  Am I right?
>>
>> Timeout will be 1 millisecond.
>>>
>>> Also, sleeping while not in a quiescent state will produce
>>> additional issues will too late rcu calls and stalls of other
>>> threads waiting on rcu synchronization.
>>>
>>> I also spotted that you're entering quiescent state at some
>>> point, but who will end this state?  PMD thread will continue
>>> working in a quiescent state and will probably crash while trying
>>> to use rcu-protected data structures like flow tables.
>>
>> Right, I should also end the quiescent state somewhere.
>>
>>>
>>> IMHO, for this case you just need to create a non-pmd version
>>> of netdev-afxdp with rxq_wait() implemented.  These ports
>>> will be handled by the main thread without consuming extra CPU
>>> resources.
>>>
>> Hi Ilya,
>>
>> Thanks for your feedback.
>> I will work on the idea of setting to non-pmd version.
>>
> 
> Hi Ilya,
> 
> I have two approaches to this non-pmd versions. I'm thinking about
> which directions to go.
> 
> 1) Dynamically setting .is_pmd to true/false.
> So users can use
> ovs-vsctl -- set int eth0 options:use-intr=true/false
> The datapath will have to reconfigure the pmd to queue assignment.
> Need to maintain per afxdp netdev's is_pmd value (some netdev might
> set to true, some might be false), now the is_pmd is per-netdev_class.
> I think this is pretty complicated changes.
> 
> 2) Another solution is to statically create another netdev_class, with
> .name = "afxdp-nonpmd",
> .is_pmd = false,
> I think this is much easier.
> 
> What do you think?

I'd prefer the second option if it doesn't require much code duplication.
First option looks a little bit like an overkill.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index f62ef1f47ea8..d91002a76fff 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@  Post-v2.13.0
    - OpenFlow:
      * The OpenFlow ofp_desc/serial_num may now be configured by setting the
        value of other-config:dp-sn in the Bridge table.
+   - AF_XDP:
+     * New option 'use-intr' for netdev-afxdp to save CPU cycles by enabling
+       interrupt mode. Disabled by default.
 
 
 v2.13.0 - 14 Feb 2020
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 482400d8d135..7b60acde1b02 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -604,6 +604,7 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
     enum afxdp_mode xdp_mode;
     bool need_wakeup;
     int new_n_rxq;
+    bool use_intr;
 
     ovs_mutex_lock(&dev->mutex);
     new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
@@ -637,12 +638,16 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
     }
 #endif
 
+    use_intr = smap_get_bool(args, "use-intr", false);
+
     if (dev->requested_n_rxq != new_n_rxq
         || dev->requested_xdp_mode != xdp_mode
-        || dev->requested_need_wakeup != need_wakeup) {
+        || dev->requested_need_wakeup != need_wakeup
+        || dev->requested_use_intr != use_intr) {
         dev->requested_n_rxq = new_n_rxq;
         dev->requested_xdp_mode = xdp_mode;
         dev->requested_need_wakeup = need_wakeup;
+        dev->requested_use_intr = use_intr;
         netdev_request_reconfigure(netdev);
     }
     ovs_mutex_unlock(&dev->mutex);
@@ -661,6 +666,8 @@  netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
                     xdp_modes[dev->xdp_mode_in_use].name);
     smap_add_format(args, "use-need-wakeup", "%s",
                     dev->use_need_wakeup ? "true" : "false");
+    smap_add_format(args, "use-intr", "%s",
+                    dev->use_intr ? "true" : "false");
     ovs_mutex_unlock(&dev->mutex);
     return 0;
 }
@@ -696,6 +703,7 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     if (netdev->n_rxq == dev->requested_n_rxq
         && dev->xdp_mode == dev->requested_xdp_mode
         && dev->use_need_wakeup == dev->requested_need_wakeup
+        && dev->use_intr == dev->requested_use_intr
         && dev->xsks) {
         goto out;
     }
@@ -713,6 +721,7 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
         VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", ovs_strerror(errno));
     }
     dev->use_need_wakeup = dev->requested_need_wakeup;
+    dev->use_intr = dev->requested_use_intr;
 
     err = xsk_configure_all(netdev);
     if (err) {
@@ -815,6 +824,32 @@  prepare_fill_queue(struct xsk_socket_info *xsk_info)
     xsk_info->available_rx += BATCH_SIZE;
 }
 
+static int
+enable_intr_mode(int fd, struct netdev *netdev)
+{
+    struct pollfd fds[1];
+    int ret;
+
+    memset(fds, 0, sizeof fds);
+    fds[0].fd = fd;
+    fds[0].events = POLLIN;
+
+    ret = poll(fds, 1, 1);
+    if (OVS_UNLIKELY(ret == 0)) {
+        /* Timeout. */
+        ovsrcu_quiesce_start();
+        return EAGAIN;
+    }
+    if (OVS_UNLIKELY(ret < 0)) {
+        VLOG_WARN_RL(&rl, "%s: error polling rx fd: %s.",
+                     netdev_get_name(netdev),
+                     ovs_strerror(errno));
+        return EAGAIN;
+    }
+
+    return 0;
+}
+
 int
 netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
                       int *qfill)
@@ -827,6 +862,7 @@  netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
     uint32_t idx_rx = 0;
     int qid = rxq_->queue_id;
     unsigned int rcvd, i;
+    int ret;
 
     xsk_info = dev->xsks[qid];
     if (!xsk_info || !xsk_info->xsk) {
@@ -838,6 +874,13 @@  netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
     umem = xsk_info->umem;
     rx->fd = xsk_socket__fd(xsk_info->xsk);
 
+    if (OVS_UNLIKELY(dev->use_intr)) {
+        ret = enable_intr_mode(rx->fd, netdev);
+        if (ret > 0) {
+            return ret;
+        }
+    }
+
     rcvd = xsk_ring_cons__peek(&xsk_info->rx, BATCH_SIZE, &idx_rx);
     if (!rcvd) {
         xsk_rx_wakeup_if_needed(umem, netdev, rx->fd);
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index c7c515f70700..a89eb36b5e3c 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -116,6 +116,8 @@  struct netdev_linux {
 
     bool use_need_wakeup;
     bool requested_need_wakeup;
+    bool use_intr;
+    bool requested_use_intr;
 
     struct netdev_afxdp_tx_lock *tx_locks;  /* Array of locks for TX queues. */
 #endif
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 4a74ed3ef24e..046018179578 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3186,6 +3186,18 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
+      <column name="options" key="use-intr"
+              type='{"type": "boolean"}'>
+        <p>
+          Specifies whether to use interrupt mode feature in afxdp netdev.
+          If enabled, the pmd thread calls poll() syscall on the RX path.
+          If there is no packet at rx queue, the pmd is blocked at poll()
+          until it is timeout or new packets arrive. This saves CPU cycles
+          when packet arrival rate is low.
+          Defaults to false.
+        </p>
+      </column>
+
       <column name="options" key="vhost-server-path"
               type='{"type": "string"}'>
         <p>