From patchwork Mon Nov 1 14:20:08 2021
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 1549180
From: Tim Gardner
To: kernel-team@lists.ubuntu.com
Subject: [PATCH][focal:linux-azure] Drivers: hv: vmbus: Fix duplicate CPU assignments within a device
Date: Mon, 1 Nov 2021 08:20:08 -0600
Message-Id: <20211101142008.18228-1-tim.gardner@canonical.com>
X-Mailer: git-send-email 2.33.1

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/1937078

commit 7c9ff3deeee61b253715dcf968a6307af148c9b2 upstream

The vmbus module uses a rotational algorithm to assign target CPUs to a
device's channels. Depending on the timing of different devices' channel
offers, different channels of a device may be assigned to the same CPU.

For example, on a VM with 2 CPUs, if the channels of NIC A and NIC B are
offered in the following order, NIC A will have both channels on CPU0 and
NIC B will have both channels on CPU1 -- see below. This kind of assignment
causes RSS load that should be spread across different channels to end up
on the same CPU.

Timing of channel offers:
NIC A channel 0
NIC B channel 0
NIC A channel 1
NIC B channel 1

VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
        Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
        Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
        Rel_ID=14, target_cpu=0
        Rel_ID=17, target_cpu=0

VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
        Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
        Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
        Rel_ID=16, target_cpu=1
        Rel_ID=18, target_cpu=1

Update the vmbus CPU assignment algorithm to avoid duplicate CPU
assignments within a device.

The new algorithm iterates num_online_cpus + 1 times. The existing
rotational algorithm for finding the "next NUMA node & CPU" is kept, but if
the resulting CPU is already used by the same device, the next CPU is
tried. In the last iteration, the channel is assigned to the next available
CPU just like the existing algorithm. This last-resort path is not normally
expected, because during device probe the number of channels of a device is
limited to <= the number of online CPUs.

Signed-off-by: Haiyang Zhang
Reviewed-by: Michael Kelley
Tested-by: Michael Kelley
Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Wei Liu
(backported from commit 7c9ff3deeee61b253715dcf968a6307af148c9b2)
Signed-off-by: Tim Gardner
[ rtg - backported by Haiyang Zhang and has been verified to fix the bug
  on A2 and D32 instance types. ]
Acked-by: Stefan Bader
Acked-by: Kelsey Skunberg
---

v2 - v1 of this patch caused boot hangs on certain instance types and was
dropped from the SRU cycle
(https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1937078/comments/6).
This backport was developed by the original author, who likely has a better
handle on things than I do.
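For readers who want to see the retry idea in isolation, here is a small
stand-alone user-space sketch. It is not part of the patch: the device and
channel tables, the 2-CPU count, and the offer order are all made up to
mirror the NIC A/B example in the commit message above. It only illustrates
the logic of the change below: a global rotational cursor hands out CPUs,
a CPU already used by another channel of the same device is skipped, and
after num_online_cpus + 1 attempts the next CPU is accepted regardless.

#include <stdio.h>

#define NCPU  2   /* pretend num_online_cpus() == 2 */
#define NDEV  2   /* NIC A and NIC B */
#define NCHAN 2   /* two channels per NIC */

/* simulated channel-to-CPU table; -1 = not yet assigned */
static int assigned[NDEV][NCHAN];

/* Is @cpu already used by another channel of device @dev? */
static int cpu_self_used(int dev, int chan, int cpu)
{
	for (int c = 0; c < NCHAN; c++)
		if (c != chan && assigned[dev][c] == cpu)
			return 1;
	return 0;
}

int main(void)
{
	int next_cpu = 0;   /* global rotational cursor, like the driver's */

	for (int d = 0; d < NDEV; d++)
		for (int c = 0; c < NCHAN; c++)
			assigned[d][c] = -1;

	/* Offers arrive interleaved: A0, B0, A1, B1, as in the example. */
	const int offer_dev[]  = { 0, 1, 0, 1 };
	const int offer_chan[] = { 0, 0, 1, 1 };

	for (int i = 0; i < NDEV * NCHAN; i++) {
		int dev = offer_dev[i], chan = offer_chan[i];
		int cpu = next_cpu;

		/* Try at most NCPU + 1 times; the last try accepts any CPU. */
		for (int cnt = 1; cnt <= NCPU + 1; cnt++) {
			cpu = next_cpu;
			next_cpu = (next_cpu + 1) % NCPU;
			if (cnt == NCPU + 1 || !cpu_self_used(dev, chan, cpu))
				break;
		}

		assigned[dev][chan] = cpu;
		printf("NIC %c channel %d -> CPU %d\n", 'A' + dev, chan, cpu);
	}
	return 0;
}

Built as plain C (e.g. with gcc -std=c99), the sketch prints one CPU per
channel and, unlike the pre-patch rotation, never places both channels of
the same simulated device on the same CPU for this offer order.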
From SF, "Haiyang backported the patch to azure-5.4-next (attached). He also reproduced the bug on 5.4 kernel with VM size A2 and D32, and verified the patch fixes the bug." --- drivers/hv/channel_mgmt.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c index 2b5d20965b1bf..ceac42e48f3a2 100644 --- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -628,6 +628,32 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel) queue_work(wq, &newchannel->add_channel_work); } +/* + * Check if CPU is used by other channels of the same device. + * It should only be called by init_vp_index(). + */ +static bool hv_cpuself_used(u32 cpu, struct vmbus_channel *chn) +{ + struct vmbus_channel *primary = chn->primary_channel; + struct vmbus_channel *sc; + bool self_used = false; + unsigned long flags; + + if (!primary) + return false; + + if (primary->target_cpu == cpu) + return true; + + spin_lock_irqsave(&primary->lock, flags); + list_for_each_entry(sc, &primary->sc_list, sc_list) + if (sc != chn && sc->target_cpu == cpu) + self_used = true; + spin_unlock_irqrestore(&primary->lock, flags); + + return self_used; +} + /* * We use this state to statically distribute the channel interrupt load. */ @@ -654,6 +680,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type) u32 cur_cpu; bool perf_chn = vmbus_devs[dev_type].perf_device; struct vmbus_channel *primary = channel->primary_channel; + u32 cnt = 1, ncpu = num_online_cpus(); int next_node; cpumask_var_t available_mask; struct cpumask *alloced_mask; @@ -676,6 +703,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type) spin_lock(&bind_channel_to_cpu_lock); +retry: /* * Based on the channel affinity policy, we will assign the NUMA * nodes. @@ -755,6 +783,11 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type) } } + if (channel->affinity_policy == HV_BALANCED && + channel->offermsg.offer.sub_channel_index < ncpu && + cnt++ < ncpu + 1 && hv_cpuself_used(cur_cpu, channel)) + goto retry; + channel->target_cpu = cur_cpu; channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);