diff mbox series

[focal:linux-azure] Drivers: hv: vmbus: Fix duplicate CPU assignments within a device

Message ID 20211101142008.18228-1-tim.gardner@canonical.com
State New
Headers show
Series [focal:linux-azure] Drivers: hv: vmbus: Fix duplicate CPU assignments within a device | expand

Commit Message

Tim Gardner Nov. 1, 2021, 2:20 p.m. UTC
From: Haiyang Zhang <haiyangz@microsoft.com>

BugLink: https://bugs.launchpad.net/bugs/1937078

commit 7c9ff3deeee61b253715dcf968a6307af148c9b2 upstream

The vmbus module uses a rotational algorithm to assign target CPUs to
a device's channels. Depending on the timing of different device's channel
offers, different channels of a device may be assigned to the same CPU.

For example on a VM with 2 CPUs, if NIC A and B's channels are offered
in the following order, NIC A will have both channels on CPU0, and
NIC B will have both channels on CPU1 -- see below. This kind of
assignment causes RSS load that is spreading across different channels
to end up on the same CPU.

Timing of channel offers:
NIC A channel 0
NIC B channel 0
NIC A channel 1
NIC B channel 1

VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
        Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
        Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
        Rel_ID=14, target_cpu=0
        Rel_ID=17, target_cpu=0

VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
        Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
        Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
        Rel_ID=16, target_cpu=1
        Rel_ID=18, target_cpu=1

Update the vmbus CPU assignment algorithm to avoid duplicate CPU
assignments within a device.

The new algorithm iterates num_online_cpus + 1 times.
The existing rotational algorithm to find "next NUMA & CPU" is still here.
But if the resulting CPU is already used by the same device, it will try
the next CPU.
In the last iteration, it assigns the channel to the next available CPU
like the existing algorithm. This is not normally expected, because
during device probe, we limit the number of channels of a device to
be <= number of online CPUs.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(backported from commit 7c9ff3deeee61b253715dcf968a6307af148c9b2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[ rtg - backported by Haiyang Zhang and has been
 verified to fix the bug on A2 and D32 instance types. ]
---

v2 - v1 of this patch caused boot hangs on certain instance types and was dropped from
the SRU cycle (https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1937078/comments/6).
This backport was developed by the original author who likely has a better handle on
things then I do. From SF, "Haiyang backported the patch to azure-5.4-next (attached). He
also reproduced the bug on 5.4 kernel with VM size A2 and D32, and verified the patch fixes the bug."

---
 drivers/hv/channel_mgmt.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

Comments

Stefan Bader Nov. 3, 2021, 8:18 a.m. UTC | #1
On 01.11.21 15:20, Tim Gardner wrote:
> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1937078
> 
> commit 7c9ff3deeee61b253715dcf968a6307af148c9b2 upstream
> 
> The vmbus module uses a rotational algorithm to assign target CPUs to
> a device's channels. Depending on the timing of different device's channel
> offers, different channels of a device may be assigned to the same CPU.
> 
> For example on a VM with 2 CPUs, if NIC A and B's channels are offered
> in the following order, NIC A will have both channels on CPU0, and
> NIC B will have both channels on CPU1 -- see below. This kind of
> assignment causes RSS load that is spreading across different channels
> to end up on the same CPU.
> 
> Timing of channel offers:
> NIC A channel 0
> NIC B channel 0
> NIC A channel 1
> NIC B channel 1
> 
> VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>          Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
>          Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
>          Rel_ID=14, target_cpu=0
>          Rel_ID=17, target_cpu=0
> 
> VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>          Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
>          Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
>          Rel_ID=16, target_cpu=1
>          Rel_ID=18, target_cpu=1
> 
> Update the vmbus CPU assignment algorithm to avoid duplicate CPU
> assignments within a device.
> 
> The new algorithm iterates num_online_cpus + 1 times.
> The existing rotational algorithm to find "next NUMA & CPU" is still here.
> But if the resulting CPU is already used by the same device, it will try
> the next CPU.
> In the last iteration, it assigns the channel to the next available CPU
> like the existing algorithm. This is not normally expected, because
> during device probe, we limit the number of channels of a device to
> be <= number of online CPUs.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Tested-by: Michael Kelley <mikelley@microsoft.com>
> Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> (backported from commit 7c9ff3deeee61b253715dcf968a6307af148c9b2)
> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
> [ rtg - backported by Haiyang Zhang and has been
>   verified to fix the bug on A2 and D32 instance types. ]
Acked-by: Stefan Bader <stefan.bader@canonical.com>
> ---
> 
> v2 - v1 of this patch caused boot hangs on certain instance types and was dropped from
> the SRU cycle (https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1937078/comments/6).
> This backport was developed by the original author who likely has a better handle on
> things then I do. From SF, "Haiyang backported the patch to azure-5.4-next (attached). He
> also reproduced the bug on 5.4 kernel with VM size A2 and D32, and verified the patch fixes the bug."
> 
> ---
>   drivers/hv/channel_mgmt.c | 33 +++++++++++++++++++++++++++++++++
>   1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 2b5d20965b1bf..ceac42e48f3a2 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -628,6 +628,32 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
>   	queue_work(wq, &newchannel->add_channel_work);
>   }
>   
> +/*
> + * Check if CPU is used by other channels of the same device.
> + * It should only be called by init_vp_index().
> + */
> +static bool hv_cpuself_used(u32 cpu, struct vmbus_channel *chn)
> +{
> +	struct vmbus_channel *primary = chn->primary_channel;
> +	struct vmbus_channel *sc;
> +	bool self_used = false;
> +	unsigned long flags;
> +
> +	if (!primary)
> +		return false;
> +
> +	if (primary->target_cpu == cpu)
> +		return true;
> +
> +	spin_lock_irqsave(&primary->lock, flags);
> +	list_for_each_entry(sc, &primary->sc_list, sc_list)
> +		if (sc != chn && sc->target_cpu == cpu)
> +			self_used = true;
> +	spin_unlock_irqrestore(&primary->lock, flags);
> +
> +	return self_used;
> +}
> +
>   /*
>    * We use this state to statically distribute the channel interrupt load.
>    */
> @@ -654,6 +680,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>   	u32 cur_cpu;
>   	bool perf_chn = vmbus_devs[dev_type].perf_device;
>   	struct vmbus_channel *primary = channel->primary_channel;
> +	u32 cnt = 1, ncpu = num_online_cpus();
>   	int next_node;
>   	cpumask_var_t available_mask;
>   	struct cpumask *alloced_mask;
> @@ -676,6 +703,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>   
>   	spin_lock(&bind_channel_to_cpu_lock);
>   
> +retry:
>   	/*
>   	 * Based on the channel affinity policy, we will assign the NUMA
>   	 * nodes.
> @@ -755,6 +783,11 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>   		}
>   	}
>   
> +	if (channel->affinity_policy == HV_BALANCED &&
> +	    channel->offermsg.offer.sub_channel_index < ncpu &&
> +	    cnt++ < ncpu + 1 && hv_cpuself_used(cur_cpu, channel))
> +		goto retry;
> +
>   	channel->target_cpu = cur_cpu;
>   	channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);
>   
>
Kelsey Skunberg Nov. 4, 2021, 5:20 p.m. UTC | #2
Acked-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>

On 2021-11-01 08:20:08 , Tim Gardner wrote:
> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1937078
> 
> commit 7c9ff3deeee61b253715dcf968a6307af148c9b2 upstream
> 
> The vmbus module uses a rotational algorithm to assign target CPUs to
> a device's channels. Depending on the timing of different device's channel
> offers, different channels of a device may be assigned to the same CPU.
> 
> For example on a VM with 2 CPUs, if NIC A and B's channels are offered
> in the following order, NIC A will have both channels on CPU0, and
> NIC B will have both channels on CPU1 -- see below. This kind of
> assignment causes RSS load that is spreading across different channels
> to end up on the same CPU.
> 
> Timing of channel offers:
> NIC A channel 0
> NIC B channel 0
> NIC A channel 1
> NIC B channel 1
> 
> VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>         Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
>         Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
>         Rel_ID=14, target_cpu=0
>         Rel_ID=17, target_cpu=0
> 
> VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>         Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
>         Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
>         Rel_ID=16, target_cpu=1
>         Rel_ID=18, target_cpu=1
> 
> Update the vmbus CPU assignment algorithm to avoid duplicate CPU
> assignments within a device.
> 
> The new algorithm iterates num_online_cpus + 1 times.
> The existing rotational algorithm to find "next NUMA & CPU" is still here.
> But if the resulting CPU is already used by the same device, it will try
> the next CPU.
> In the last iteration, it assigns the channel to the next available CPU
> like the existing algorithm. This is not normally expected, because
> during device probe, we limit the number of channels of a device to
> be <= number of online CPUs.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Tested-by: Michael Kelley <mikelley@microsoft.com>
> Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> (backported from commit 7c9ff3deeee61b253715dcf968a6307af148c9b2)
> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
> [ rtg - backported by Haiyang Zhang and has been
>  verified to fix the bug on A2 and D32 instance types. ]
> ---
> 
> v2 - v1 of this patch caused boot hangs on certain instance types and was dropped from
> the SRU cycle (https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1937078/comments/6).
> This backport was developed by the original author who likely has a better handle on
> things then I do. From SF, "Haiyang backported the patch to azure-5.4-next (attached). He
> also reproduced the bug on 5.4 kernel with VM size A2 and D32, and verified the patch fixes the bug."
> 
> ---
>  drivers/hv/channel_mgmt.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 2b5d20965b1bf..ceac42e48f3a2 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -628,6 +628,32 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
>  	queue_work(wq, &newchannel->add_channel_work);
>  }
>  
> +/*
> + * Check if CPU is used by other channels of the same device.
> + * It should only be called by init_vp_index().
> + */
> +static bool hv_cpuself_used(u32 cpu, struct vmbus_channel *chn)
> +{
> +	struct vmbus_channel *primary = chn->primary_channel;
> +	struct vmbus_channel *sc;
> +	bool self_used = false;
> +	unsigned long flags;
> +
> +	if (!primary)
> +		return false;
> +
> +	if (primary->target_cpu == cpu)
> +		return true;
> +
> +	spin_lock_irqsave(&primary->lock, flags);
> +	list_for_each_entry(sc, &primary->sc_list, sc_list)
> +		if (sc != chn && sc->target_cpu == cpu)
> +			self_used = true;
> +	spin_unlock_irqrestore(&primary->lock, flags);
> +
> +	return self_used;
> +}
> +
>  /*
>   * We use this state to statically distribute the channel interrupt load.
>   */
> @@ -654,6 +680,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>  	u32 cur_cpu;
>  	bool perf_chn = vmbus_devs[dev_type].perf_device;
>  	struct vmbus_channel *primary = channel->primary_channel;
> +	u32 cnt = 1, ncpu = num_online_cpus();
>  	int next_node;
>  	cpumask_var_t available_mask;
>  	struct cpumask *alloced_mask;
> @@ -676,6 +703,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>  
>  	spin_lock(&bind_channel_to_cpu_lock);
>  
> +retry:
>  	/*
>  	 * Based on the channel affinity policy, we will assign the NUMA
>  	 * nodes.
> @@ -755,6 +783,11 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>  		}
>  	}
>  
> +	if (channel->affinity_policy == HV_BALANCED &&
> +	    channel->offermsg.offer.sub_channel_index < ncpu &&
> +	    cnt++ < ncpu + 1 && hv_cpuself_used(cur_cpu, channel))
> +		goto retry;
> +
>  	channel->target_cpu = cur_cpu;
>  	channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);
>  
> -- 
> 2.33.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Tim Gardner Nov. 4, 2021, 5:53 p.m. UTC | #3
Applied to focal/linux-azure. Thanks.

-rtg

On 11/1/21 8:20 AM, Tim Gardner wrote:
> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1937078
> 
> commit 7c9ff3deeee61b253715dcf968a6307af148c9b2 upstream
> 
> The vmbus module uses a rotational algorithm to assign target CPUs to
> a device's channels. Depending on the timing of different device's channel
> offers, different channels of a device may be assigned to the same CPU.
> 
> For example on a VM with 2 CPUs, if NIC A and B's channels are offered
> in the following order, NIC A will have both channels on CPU0, and
> NIC B will have both channels on CPU1 -- see below. This kind of
> assignment causes RSS load that is spreading across different channels
> to end up on the same CPU.
> 
> Timing of channel offers:
> NIC A channel 0
> NIC B channel 0
> NIC A channel 1
> NIC B channel 1
> 
> VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>          Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
>          Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
>          Rel_ID=14, target_cpu=0
>          Rel_ID=17, target_cpu=0
> 
> VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>          Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
>          Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
>          Rel_ID=16, target_cpu=1
>          Rel_ID=18, target_cpu=1
> 
> Update the vmbus CPU assignment algorithm to avoid duplicate CPU
> assignments within a device.
> 
> The new algorithm iterates num_online_cpus + 1 times.
> The existing rotational algorithm to find "next NUMA & CPU" is still here.
> But if the resulting CPU is already used by the same device, it will try
> the next CPU.
> In the last iteration, it assigns the channel to the next available CPU
> like the existing algorithm. This is not normally expected, because
> during device probe, we limit the number of channels of a device to
> be <= number of online CPUs.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Tested-by: Michael Kelley <mikelley@microsoft.com>
> Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> (backported from commit 7c9ff3deeee61b253715dcf968a6307af148c9b2)
> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
> [ rtg - backported by Haiyang Zhang and has been
>   verified to fix the bug on A2 and D32 instance types. ]
> ---
> 
> v2 - v1 of this patch caused boot hangs on certain instance types and was dropped from
> the SRU cycle (https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1937078/comments/6).
> This backport was developed by the original author who likely has a better handle on
> things then I do. From SF, "Haiyang backported the patch to azure-5.4-next (attached). He
> also reproduced the bug on 5.4 kernel with VM size A2 and D32, and verified the patch fixes the bug."
> 
> ---
>   drivers/hv/channel_mgmt.c | 33 +++++++++++++++++++++++++++++++++
>   1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 2b5d20965b1bf..ceac42e48f3a2 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -628,6 +628,32 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
>   	queue_work(wq, &newchannel->add_channel_work);
>   }
>   
> +/*
> + * Check if CPU is used by other channels of the same device.
> + * It should only be called by init_vp_index().
> + */
> +static bool hv_cpuself_used(u32 cpu, struct vmbus_channel *chn)
> +{
> +	struct vmbus_channel *primary = chn->primary_channel;
> +	struct vmbus_channel *sc;
> +	bool self_used = false;
> +	unsigned long flags;
> +
> +	if (!primary)
> +		return false;
> +
> +	if (primary->target_cpu == cpu)
> +		return true;
> +
> +	spin_lock_irqsave(&primary->lock, flags);
> +	list_for_each_entry(sc, &primary->sc_list, sc_list)
> +		if (sc != chn && sc->target_cpu == cpu)
> +			self_used = true;
> +	spin_unlock_irqrestore(&primary->lock, flags);
> +
> +	return self_used;
> +}
> +
>   /*
>    * We use this state to statically distribute the channel interrupt load.
>    */
> @@ -654,6 +680,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>   	u32 cur_cpu;
>   	bool perf_chn = vmbus_devs[dev_type].perf_device;
>   	struct vmbus_channel *primary = channel->primary_channel;
> +	u32 cnt = 1, ncpu = num_online_cpus();
>   	int next_node;
>   	cpumask_var_t available_mask;
>   	struct cpumask *alloced_mask;
> @@ -676,6 +703,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>   
>   	spin_lock(&bind_channel_to_cpu_lock);
>   
> +retry:
>   	/*
>   	 * Based on the channel affinity policy, we will assign the NUMA
>   	 * nodes.
> @@ -755,6 +783,11 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>   		}
>   	}
>   
> +	if (channel->affinity_policy == HV_BALANCED &&
> +	    channel->offermsg.offer.sub_channel_index < ncpu &&
> +	    cnt++ < ncpu + 1 && hv_cpuself_used(cur_cpu, channel))
> +		goto retry;
> +
>   	channel->target_cpu = cur_cpu;
>   	channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);
>   
>
diff mbox series

Patch

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 2b5d20965b1bf..ceac42e48f3a2 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -628,6 +628,32 @@  static void vmbus_process_offer(struct vmbus_channel *newchannel)
 	queue_work(wq, &newchannel->add_channel_work);
 }
 
+/*
+ * Check if CPU is used by other channels of the same device.
+ * It should only be called by init_vp_index().
+ */
+static bool hv_cpuself_used(u32 cpu, struct vmbus_channel *chn)
+{
+	struct vmbus_channel *primary = chn->primary_channel;
+	struct vmbus_channel *sc;
+	bool self_used = false;
+	unsigned long flags;
+
+	if (!primary)
+		return false;
+
+	if (primary->target_cpu == cpu)
+		return true;
+
+	spin_lock_irqsave(&primary->lock, flags);
+	list_for_each_entry(sc, &primary->sc_list, sc_list)
+		if (sc != chn && sc->target_cpu == cpu)
+			self_used = true;
+	spin_unlock_irqrestore(&primary->lock, flags);
+
+	return self_used;
+}
+
 /*
  * We use this state to statically distribute the channel interrupt load.
  */
@@ -654,6 +680,7 @@  static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 	u32 cur_cpu;
 	bool perf_chn = vmbus_devs[dev_type].perf_device;
 	struct vmbus_channel *primary = channel->primary_channel;
+	u32 cnt = 1, ncpu = num_online_cpus();
 	int next_node;
 	cpumask_var_t available_mask;
 	struct cpumask *alloced_mask;
@@ -676,6 +703,7 @@  static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 
 	spin_lock(&bind_channel_to_cpu_lock);
 
+retry:
 	/*
 	 * Based on the channel affinity policy, we will assign the NUMA
 	 * nodes.
@@ -755,6 +783,11 @@  static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 		}
 	}
 
+	if (channel->affinity_policy == HV_BALANCED &&
+	    channel->offermsg.offer.sub_channel_index < ncpu &&
+	    cnt++ < ncpu + 1 && hv_cpuself_used(cur_cpu, channel))
+		goto retry;
+
 	channel->target_cpu = cur_cpu;
 	channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);