From patchwork Fri Oct 23 03:29:42 2020
X-Patchwork-Submitter: Josh Don
X-Patchwork-Id: 1386491
X-Patchwork-Delegate: davem@davemloft.net
Date: Thu, 22 Oct 2020 20:29:42 -0700
Message-Id: <20201023032944.399861-1-joshdon@google.com>
Subject: [PATCH 1/3] sched: better handling for busy polling loops
From: Josh Don
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    "David S. Miller", Jakub Kicinski
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Paolo Bonzini, Eric Dumazet, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org, kvm@vger.kernel.org, Josh Don, Xi Wang

Busy polling loops in the kernel, such as network socket poll and kvm halt
polling, have performance problems related to process scheduler load
accounting.

Both of these busy polling examples are opportunistic - they relinquish the
cpu if another thread is ready to run. This design, however, doesn't extend
well to multiprocessor load balancing. The scheduler still sees the busy
polling cpu as 100% busy and will be less likely to put another thread on
that cpu. In other words, if all cores are 100% utilized, with some of them
running real workloads and others running busy polling loops, newly woken
threads will not prefer the busy polling cpus. System-wide throughput and
latency may suffer.

This change allows the scheduler to detect busy polling cpus so that they
can be considered more frequently for wake up balancing.

This change also disables preemption for the duration of the busy polling
loop. This is important, as it ensures that if a polling thread decides to
end its poll to relinquish the cpu to another thread, it will actually exit
the busy loop and potentially block. When it later becomes runnable, it
will have the opportunity to find an idle cpu via wakeup cpu selection.
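
For reference, a minimal sketch of the intended calling pattern follows;
poll_for_work() and block_until_work() are illustrative placeholders only,
not functions added by this patch:

        prepare_to_busy_poll();  /* cpu now reported by polling_cpu(); preemption off */
        while (continue_busy_poll()) {  /* false once another task is runnable or resched is needed */
                if (poll_for_work()) {
                        end_busy_poll(true);  /* re-enables preemption, may reschedule */
                        return;
                }
                cpu_relax();
        }
        end_busy_poll(false);  /* no immediate resched; lets the poller block instead */
        block_until_work();
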
Suggested-by: Xi Wang
Signed-off-by: Josh Don
Signed-off-by: Xi Wang
---
 include/linux/sched.h |  5 +++
 kernel/sched/core.c   | 94 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c   | 25 ++++++++----
 kernel/sched/sched.h  |  2 +
 4 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935..80ef477e5a87 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1651,6 +1651,7 @@ extern int can_nice(const struct task_struct *p, const int nice);
 extern int task_curr(const struct task_struct *p);
 extern int idle_cpu(int cpu);
 extern int available_idle_cpu(int cpu);
+extern int polling_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
 extern void sched_set_fifo(struct task_struct *p);
@@ -2048,4 +2049,8 @@ int sched_trace_rq_nr_running(struct rq *rq);
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
 
+extern void prepare_to_busy_poll(void);
+extern int continue_busy_poll(void);
+extern void end_busy_poll(bool allow_resched);
+
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d95dc3f4644..2783191d0bd4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5107,6 +5107,24 @@ int available_idle_cpu(int cpu)
 	return 1;
 }
 
+/**
+ * polling_cpu - is a given CPU currently running a thread in a busy polling
+ * loop that could be preempted if a new thread were to be scheduled?
+ * @cpu: the CPU in question.
+ *
+ * Return: 1 if the CPU is currently polling. 0 otherwise.
+ */
+int polling_cpu(int cpu)
+{
+#ifdef CONFIG_SMP
+	struct rq *rq = cpu_rq(cpu);
+
+	return unlikely(rq->busy_polling);
+#else
+	return 0;
+#endif
+}
+
 /**
  * idle_task - return the idle task for a given CPU.
  * @cpu: the processor in question.
@@ -7191,6 +7209,7 @@ void __init sched_init(void)
 
 		rq_csd_init(rq, &rq->nohz_csd, nohz_csd_func);
 #endif
+		rq->busy_polling = 0;
 #endif /* CONFIG_SMP */
 		hrtick_rq_init(rq);
 		atomic_set(&rq->nr_iowait, 0);
@@ -7417,6 +7436,81 @@ void ia64_set_curr_task(int cpu, struct task_struct *p)
 
 #endif
 
+/*
+ * Calling this function before entering a preemptible busy polling loop will
+ * help the scheduler make better load balancing decisions. Wake up balance
+ * will treat the polling cpu as idle.
+ *
+ * Preemption is disabled inside this function and re-enabled in
+ * end_busy_poll(), thus the polling loop must periodically check
+ * continue_busy_poll().
+ *
+ * REQUIRES: prepare_to_busy_poll(), continue_busy_poll(), and end_busy_poll()
+ * must be used together.
+ */
+void prepare_to_busy_poll(void)
+{
+	struct rq __maybe_unused *rq = this_rq();
+	unsigned long __maybe_unused flags;
+
+	/* Preemption will be reenabled by end_busy_poll() */
+	preempt_disable();
+
+#ifdef CONFIG_SMP
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	/* preemption disabled; only one thread can poll at a time */
+	WARN_ON_ONCE(rq->busy_polling);
+	rq->busy_polling++;
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+#endif
+}
+EXPORT_SYMBOL(prepare_to_busy_poll);
+
+int continue_busy_poll(void)
+{
+	if (!single_task_running())
+		return 0;
+
+	/* Important that we check this, since preemption is disabled */
+	if (need_resched())
+		return 0;
+
+	return 1;
+}
+EXPORT_SYMBOL(continue_busy_poll);
+
+/*
+ * Restore any state modified by prepare_to_busy_poll(), including re-enabling
+ * preemption.
+ *
+ * @allow_resched: If true, this potentially calls schedule() as part of
+ * enabling preemption. A busy poll loop can use false in order to have an
+ * opportunity to block before rescheduling.
+ */
+void end_busy_poll(bool allow_resched)
+{
+#ifdef CONFIG_SMP
+	struct rq *rq = this_rq();
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	BUG_ON(!rq->busy_polling); /* not paired with prepare() */
+	rq->busy_polling--;
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+#endif
+
+	/*
+	 * preemption needs to be kept disabled between prepare_to_busy_poll()
+	 * and end_busy_poll().
+	 */
+	BUG_ON(preemptible());
+	if (allow_resched)
+		preempt_enable();
+	else
+		preempt_enable_no_resched();
+}
+EXPORT_SYMBOL(end_busy_poll);
+
 #ifdef CONFIG_CGROUP_SCHED
 /* task_group_lock serializes the addition/removal of task groups */
 static DEFINE_SPINLOCK(task_group_lock);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a68a0536add..58e525c74cc6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5460,6 +5460,11 @@ static int sched_idle_cpu(int cpu)
 {
 	return sched_idle_rq(cpu_rq(cpu));
 }
+
+static int sched_idle_or_polling_cpu(int cpu)
+{
+	return sched_idle_cpu(cpu) || polling_cpu(cpu);
+}
 #endif
 
 /*
@@ -5880,6 +5885,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 	u64 latest_idle_timestamp = 0;
 	int least_loaded_cpu = this_cpu;
 	int shallowest_idle_cpu = -1;
+	int found_polling = 0;
 	int i;
 
 	/* Check if we have any choice: */
@@ -5914,10 +5920,14 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				shallowest_idle_cpu = i;
 			}
 		} else if (shallowest_idle_cpu == -1) {
+			int polling = polling_cpu(i);
+
 			load = cpu_load(cpu_rq(i));
-			if (load < min_load) {
+			if ((polling == found_polling && load < min_load) ||
+			    (polling && !found_polling)) {
 				min_load = load;
 				least_loaded_cpu = i;
+				found_polling = polling;
 			}
 		}
 	}
@@ -6085,7 +6095,7 @@ static int select_idle_smt(struct task_struct *p, int target)
 	for_each_cpu(cpu, cpu_smt_mask(target)) {
 		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 			continue;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+		if (available_idle_cpu(cpu) || sched_idle_or_polling_cpu(cpu))
 			return cpu;
 	}
 
@@ -6149,7 +6159,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (!--nr)
 			return -1;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+		if (available_idle_cpu(cpu) || sched_idle_or_polling_cpu(cpu))
 			break;
 	}
 
@@ -6179,7 +6189,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 	for_each_cpu_wrap(cpu, cpus, target) {
 		unsigned long cpu_cap = capacity_of(cpu);
 
-		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
+		if (!available_idle_cpu(cpu) && !sched_idle_or_polling_cpu(cpu))
 			continue;
 		if (task_fits_capacity(p, cpu_cap))
 			return cpu;
@@ -6223,14 +6233,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	}
 
 symmetric:
-	if (available_idle_cpu(target) || sched_idle_cpu(target))
+	if (available_idle_cpu(target) || sched_idle_or_polling_cpu(target))
 		return target;
 
 	/*
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
 	if (prev != target && cpus_share_cache(prev, target) &&
-	    (available_idle_cpu(prev) || sched_idle_cpu(prev)))
+	    (available_idle_cpu(prev) || sched_idle_or_polling_cpu(prev)))
 		return prev;
 
 	/*
@@ -6252,7 +6262,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
-	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+	    (available_idle_cpu(recent_used_cpu) ||
+	     sched_idle_or_polling_cpu(recent_used_cpu)) &&
 	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
 		/*
 		 * Replace recent_used_cpu with prev as it is a potential
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 28709f6b0975..45de468d0ffb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1003,6 +1003,8 @@ struct rq {
 
 	/* This is used to determine avg_idle's max value */
 	u64			max_idle_balance_cost;
+
+	unsigned int		busy_polling;
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING

From patchwork Fri Oct 23 03:29:43 2020
X-Patchwork-Submitter: Josh Don
X-Patchwork-Id: 1386492
X-Patchwork-Delegate: davem@davemloft.net
Date: Thu, 22 Oct 2020 20:29:43 -0700
In-Reply-To: <20201023032944.399861-1-joshdon@google.com>
Message-Id: <20201023032944.399861-2-joshdon@google.com>
References: <20201023032944.399861-1-joshdon@google.com>
Subject: [PATCH 2/3] kvm: better handling for kvm halt polling
From: Josh Don
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    "David S. Miller", Jakub Kicinski
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Paolo Bonzini, Eric Dumazet, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org, kvm@vger.kernel.org, Josh Don, Xi Wang

Add the new functions prepare_to_busy_poll() and friends to
kvm_vcpu_block(). The busy polling cpu will be considered an idle target
during wake up balancing.

cpu_relax() is also added to the polling loop to improve the performance
of other hw threads sharing the busy polling core.

Suggested-by: Xi Wang
Signed-off-by: Josh Don
Signed-off-by: Xi Wang
---
 virt/kvm/kvm_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cf88233b819a..8f818f0fc979 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2772,7 +2772,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
 		++vcpu->stat.halt_attempted_poll;
+		prepare_to_busy_poll(); /* also disables preemption */
 		do {
+			cpu_relax();
 			/*
 			 * This sets KVM_REQ_UNHALT if an interrupt
 			 * arrives.
@@ -2781,10 +2783,12 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 				++vcpu->stat.halt_successful_poll;
 				if (!vcpu_valid_wakeup(vcpu))
 					++vcpu->stat.halt_poll_invalid;
+				end_busy_poll(false);
 				goto out;
 			}
 			poll_end = cur = ktime_get();
-		} while (single_task_running() && ktime_before(cur, stop));
+		} while (continue_busy_poll() && ktime_before(cur, stop));
+		end_busy_poll(false);
 	}
 
 	prepare_to_rcuwait(&vcpu->wait);

From patchwork Fri Oct 23 03:29:44 2020
X-Patchwork-Submitter: Josh Don
X-Patchwork-Id: 1386493
X-Patchwork-Delegate: davem@davemloft.net
Date: Thu, 22 Oct 2020 20:29:44 -0700
In-Reply-To: <20201023032944.399861-1-joshdon@google.com>
Message-Id: <20201023032944.399861-3-joshdon@google.com>
References: <20201023032944.399861-1-joshdon@google.com>
Subject: [PATCH 3/3] net: better handling for network busy poll
From: Josh Don
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    "David S. Miller", Jakub Kicinski
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Paolo Bonzini, Eric Dumazet, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org, kvm@vger.kernel.org, Josh Don, Xi Wang

Add the new functions prepare_to_busy_poll() and friends to
napi_busy_loop(). The busy polling cpu will be considered an idle target
during wake up balancing.
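
For context (not part of this patch): napi_busy_loop() only runs for
sockets that have opted in to busy polling, for example via the
SO_BUSY_POLL socket option or the net.core.busy_read sysctl. A rough
userspace sketch, subject to the usual sysctl and privilege requirements:

        #include <sys/socket.h>

        /* Opt a socket in to busy polling; 'usecs' is the per-read poll budget. */
        static int enable_busy_poll(int fd, int usecs)
        {
                return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                                  &usecs, sizeof(usecs));
        }

        /* e.g. enable_busy_poll(fd, 50) before recv()/epoll on a UDP or TCP socket */
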
Suggested-by: Xi Wang
Signed-off-by: Josh Don
Signed-off-by: Xi Wang
---
 net/core/dev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 266073e300b5..4fb4ae4b27fc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6476,7 +6476,7 @@ void napi_busy_loop(unsigned int napi_id,
 	if (!napi)
 		goto out;
 
-	preempt_disable();
+	prepare_to_busy_poll(); /* disables preemption */
 	for (;;) {
 		int work = 0;
 
@@ -6509,10 +6509,10 @@ void napi_busy_loop(unsigned int napi_id,
 		if (!loop_end || loop_end(loop_end_arg, start_time))
 			break;
 
-		if (unlikely(need_resched())) {
+		if (unlikely(!continue_busy_poll())) {
 			if (napi_poll)
 				busy_poll_stop(napi, have_poll_lock);
-			preempt_enable();
+			end_busy_poll(true);
 			rcu_read_unlock();
 			cond_resched();
 			if (loop_end(loop_end_arg, start_time))
@@ -6523,7 +6523,7 @@ void napi_busy_loop(unsigned int napi_id,
 	}
 	if (napi_poll)
 		busy_poll_stop(napi, have_poll_lock);
-	preempt_enable();
+	end_busy_poll(true);
 out:
 	rcu_read_unlock();
 }