From patchwork Fri Aug 20 19:00:32 2021
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 1519154
From: Tim Gardner
To: kernel-team@lists.ubuntu.com
Subject: [PATCH][hirsute/linux-azure] Drivers: hv: vmbus: Fix duplicate CPU assignments within a device
Date: Fri, 20 Aug 2021 13:00:32 -0600
Message-Id: <20210820190032.8956-3-tim.gardner@canonical.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210820190032.8956-1-tim.gardner@canonical.com>
References: <20210820190032.8956-1-tim.gardner@canonical.com>

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/1937078

The vmbus module uses a rotational algorithm to assign target CPUs to
a device's channels. Depending on the timing of different devices'
channel offers, different channels of a device may be assigned to the
same CPU.

For example, on a VM with 2 CPUs, if the channels of NICs A and B are
offered in the following order, NIC A will have both channels on CPU0,
and NIC B will have both channels on CPU1 -- see below. This kind of
assignment causes RSS load that is spread across different channels to
end up on the same CPU.

Timing of channel offers:
NIC A channel 0
NIC B channel 0
NIC A channel 1
NIC B channel 1

VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
        Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
        Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
        Rel_ID=14, target_cpu=0
        Rel_ID=17, target_cpu=0

VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
        Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
        Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
        Rel_ID=16, target_cpu=1
        Rel_ID=18, target_cpu=1

Update the vmbus CPU assignment algorithm to avoid duplicate CPU
assignments within a device.

The new algorithm iterates num_online_cpus + 1 times. The existing
rotational algorithm to find the "next NUMA node & CPU" is still used,
but if the resulting CPU is already taken by another channel of the
same device, it tries the next CPU. In the last iteration, it assigns
the channel to the next available CPU as the existing algorithm does.
Reaching the last iteration is not normally expected, because during
device probe we limit the number of channels of a device to be <= the
number of online CPUs.
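To make the failure mode concrete, the old rotational assignment can be
modeled with a few lines of user-space C. This is a simplified sketch,
not driver code; the device names and the 2-CPU setup mirror the
example above:

/* Simplified user-space model of the old, purely rotational
 * assignment: the next CPU depends only on global offer order,
 * not on which device a channel belongs to.
 */
#include <stdio.h>

#define NCPU 2

int main(void)
{
	const char *offers[] = {
		"NIC A ch0", "NIC B ch0", "NIC A ch1", "NIC B ch1",
	};
	int next_cpu = 0;
	unsigned int i;

	for (i = 0; i < sizeof(offers) / sizeof(offers[0]); i++) {
		printf("%s -> CPU%d\n", offers[i], next_cpu);
		next_cpu = (next_cpu + 1) % NCPU;
	}
	/* Prints: NIC A ch0 -> CPU0, NIC B ch0 -> CPU1,
	 *         NIC A ch1 -> CPU0, NIC B ch1 -> CPU1
	 * i.e. both of NIC A's channels land on CPU0.
	 */
	return 0;
}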
Signed-off-by: Haiyang Zhang
Reviewed-by: Michael Kelley
Tested-by: Michael Kelley
Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Wei Liu
(cherry picked from commit 7c9ff3deeee61b253715dcf968a6307af148c9b2)
Signed-off-by: Tim Gardner
---
 drivers/hv/channel_mgmt.c | 96 ++++++++++++++++++++++++++-------------
 1 file changed, 64 insertions(+), 32 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index fb4ded2885f3..d75a43f8eb87 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -575,6 +575,17 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 	 */
 	mutex_lock(&vmbus_connection.channel_mutex);
 
+	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		if (guid_equal(&channel->offermsg.offer.if_type,
+			       &newchannel->offermsg.offer.if_type) &&
+		    guid_equal(&channel->offermsg.offer.if_instance,
+			       &newchannel->offermsg.offer.if_instance)) {
+			fnew = false;
+			newchannel->primary_channel = channel;
+			break;
+		}
+	}
+
 	init_vp_index(newchannel);
 
 	/* Remember the channels that should be cleaned up upon suspend. */
@@ -587,16 +598,6 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 	 */
 	atomic_dec(&vmbus_connection.offer_in_progress);
 
-	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
-		if (guid_equal(&channel->offermsg.offer.if_type,
-			       &newchannel->offermsg.offer.if_type) &&
-		    guid_equal(&channel->offermsg.offer.if_instance,
-			       &newchannel->offermsg.offer.if_instance)) {
-			fnew = false;
-			break;
-		}
-	}
-
 	if (fnew) {
 		list_add_tail(&newchannel->listentry,
 			      &vmbus_connection.chn_list);
@@ -617,7 +618,6 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 		/*
 		 * Process the sub-channel.
 		 */
-		newchannel->primary_channel = channel;
 		list_add_tail(&newchannel->sc_list, &channel->sc_list);
 	}
 
@@ -653,6 +653,30 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 	queue_work(wq, &newchannel->add_channel_work);
 }
 
+/*
+ * Check if CPUs used by other channels of the same device.
+ * It should only be called by init_vp_index().
+ */
+static bool hv_cpuself_used(u32 cpu, struct vmbus_channel *chn)
+{
+	struct vmbus_channel *primary = chn->primary_channel;
+	struct vmbus_channel *sc;
+
+	lockdep_assert_held(&vmbus_connection.channel_mutex);
+
+	if (!primary)
+		return false;
+
+	if (primary->target_cpu == cpu)
+		return true;
+
+	list_for_each_entry(sc, &primary->sc_list, sc_list)
+		if (sc != chn && sc->target_cpu == cpu)
+			return true;
+
+	return false;
+}
+
 /*
  * We use this state to statically distribute the channel interrupt load.
  */
@@ -672,6 +696,7 @@ static int next_numa_node_id;
 static void init_vp_index(struct vmbus_channel *channel)
 {
 	bool perf_chn = hv_is_perf_channel(channel);
+	u32 i, ncpu = num_online_cpus();
 	cpumask_var_t available_mask;
 	struct cpumask *alloced_mask;
 	u32 target_cpu;
@@ -694,31 +719,38 @@ static void init_vp_index(struct vmbus_channel *channel)
 		return;
 	}
 
-	while (true) {
-		numa_node = next_numa_node_id++;
-		if (numa_node == nr_node_ids) {
-			next_numa_node_id = 0;
-			continue;
+	for (i = 1; i <= ncpu + 1; i++) {
+		while (true) {
+			numa_node = next_numa_node_id++;
+			if (numa_node == nr_node_ids) {
+				next_numa_node_id = 0;
+				continue;
+			}
+			if (cpumask_empty(cpumask_of_node(numa_node)))
+				continue;
+			break;
 		}
-		if (cpumask_empty(cpumask_of_node(numa_node)))
-			continue;
-		break;
-	}
-	alloced_mask = &hv_context.hv_numa_map[numa_node];
+		alloced_mask = &hv_context.hv_numa_map[numa_node];
 
-	if (cpumask_weight(alloced_mask) ==
-	    cpumask_weight(cpumask_of_node(numa_node))) {
-		/*
-		 * We have cycled through all the CPUs in the node;
-		 * reset the alloced map.
-		 */
-		cpumask_clear(alloced_mask);
-	}
+		if (cpumask_weight(alloced_mask) ==
+		    cpumask_weight(cpumask_of_node(numa_node))) {
+			/*
+			 * We have cycled through all the CPUs in the node;
+			 * reset the alloced map.
+			 */
+			cpumask_clear(alloced_mask);
+		}
+
+		cpumask_xor(available_mask, alloced_mask,
+			    cpumask_of_node(numa_node));
 
-	cpumask_xor(available_mask, alloced_mask, cpumask_of_node(numa_node));
+		target_cpu = cpumask_first(available_mask);
+		cpumask_set_cpu(target_cpu, alloced_mask);
 
-	target_cpu = cpumask_first(available_mask);
-	cpumask_set_cpu(target_cpu, alloced_mask);
+		if (channel->offermsg.offer.sub_channel_index >= ncpu ||
+		    i > ncpu || !hv_cpuself_used(target_cpu, channel))
+			break;
+	}
 
 	channel->target_cpu = target_cpu;
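For comparison, the bounded retry introduced above can be modeled in
user space as well. This is a simplified sketch under the same 2-CPU
assumption, not the kernel code: device_uses_cpu() is a hypothetical
stand-in for hv_cpuself_used(), and the rotation is flattened to a
single CPU list with no NUMA nodes:

#include <stdbool.h>
#include <stdio.h>

#define NCPU 2

/* Hypothetical stand-in for hv_cpuself_used(): true if this device
 * already has a channel bound to @cpu.
 */
static bool device_uses_cpu(const int *assigned, int nassigned, int cpu)
{
	int i;

	for (i = 0; i < nassigned; i++)
		if (assigned[i] == cpu)
			return true;
	return false;
}

int main(void)
{
	int nic_a[2], nic_b[2];				/* per-device target CPUs */
	int *dev[] = { nic_a, nic_b, nic_a, nic_b };	/* offer order */
	int devidx[] = { 0, 1, 0, 1 };
	int count[] = { 0, 0 };
	int next_cpu = 0;
	unsigned int n;

	for (n = 0; n < 4; n++) {
		int d = devidx[n];
		int target = 0, i;

		/* ncpu + 1 attempts: skip a CPU this device already
		 * uses; the last attempt accepts whatever comes next.
		 */
		for (i = 1; i <= NCPU + 1; i++) {
			target = next_cpu;
			next_cpu = (next_cpu + 1) % NCPU;
			if (i > NCPU ||
			    !device_uses_cpu(dev[n], count[d], target))
				break;
		}
		dev[n][count[d]++] = target;
		printf("NIC %c ch%d -> CPU%d\n", 'A' + d, count[d] - 1, target);
	}
	/* Prints: NIC A ch0 -> CPU0, NIC B ch0 -> CPU1,
	 *         NIC A ch1 -> CPU1, NIC B ch1 -> CPU0
	 * i.e. no device ends up with two channels on the same CPU.
	 */
	return 0;
}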