[v3,6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()

Message ID 20180306182128.23281-7-decui@microsoft.com
State Superseded
Delegated to: Lorenzo Pieralisi
Headers show
Series
  • some fixes to the pci-hyperv driver.
Related show

Commit Message

Dexuan Cui March 6, 2018, 6:21 p.m.
1. With the patch "x86/vector/msi: Switch to global reservation mode"
(4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg()
by request_irq()  -> request_threaded_irq() -> __setup_irq()->irq_startup()
 -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
disabled in __setup_irq().

Fix this by polling the channel.

2. If the host is ejecting the VF device before we reach
hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
forever, because at this time the host doesn't respond to the
CREATE_INTERRUPT request. This issue also happens to old kernels like
v4.14, v4.13, etc.

Fix this by polling the channel for the PCI_EJECT message and
hpdev->state, and by checking the PCI vendor ID.

Note: actually the above issues also happen to a SMP VM, if
"hbus->hdev->channel->target_cpu == smp_processor_id()" is true.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Tested-by: Chris Valean <v-chvale@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
---
 drivers/pci/host/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

Comments

Michael Kelley (EOSG) March 6, 2018, 6:32 p.m. | #1
> -----Original Message-----
> From: Dexuan Cui
> Sent: Tuesday, March 6, 2018 10:22 AM
> To: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; apw@canonical.com;
> jasowang@redhat.com
> Cc: linux-kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Haiyang Zhang
> <haiyangz@microsoft.com>; vkuznets@redhat.com; marcelo.cerri@canonical.com; Michael
> Kelley (EOSG) <Michael.H.Kelley@microsoft.com>; Dexuan Cui <decui@microsoft.com>;
> stable@vger.kernel.org; Jack Morgenstein <jackm@mellanox.com>
> Subject: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> 
> 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg()
> by request_irq()  -> request_threaded_irq() -> __setup_irq()->irq_startup()
>  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> disabled in __setup_irq().
> 
> Fix this by polling the channel.
> 
> 2. If the host is ejecting the VF device before we reach
> hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
> forever, because at this time the host doesn't respond to the
> CREATE_INTERRUPT request. This issue also happens to old kernels like
> v4.14, v4.13, etc.
> 
> Fix this by polling the channel for the PCI_EJECT message and
> hpdev->state, and by checking the PCI vendor ID.
> 
> Note: actually the above issues also happen to a SMP VM, if
> "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
> Tested-by: Chris Valean <v-chvale@microsoft.com>
> Cc: stable@vger.kernel.org
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: K. Y. Srinivasan <kys@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Jack Morgenstein <jackm@mellanox.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Lorenzo Pieralisi March 7, 2018, 12:34 p.m. | #2
On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote:
> 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg()
> by request_irq()  -> request_threaded_irq() -> __setup_irq()->irq_startup()
>  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> disabled in __setup_irq().
> 
> Fix this by polling the channel.
> 
> 2. If the host is ejecting the VF device before we reach
> hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
> forever, because at this time the host doesn't respond to the
> CREATE_INTERRUPT request. This issue also happens to old kernels like
> v4.14, v4.13, etc.

If you are fixing a problem you should report what commit you are fixing
with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log
to send it to stable kernels to which it should be applied; mentioning
kernel versions in the commit log is useless and should be omitted.

Side note: you should not have stable@vger.kernel.org in the email
addresses CC list you are sending the patches to (you mark patches for
stable by adding an appropriate CC tag in the commit log).

Here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst?h=v4.16-rc4

Last but not least, most of the patches in this series do not justify
sending them to stable kernels at all so you should remove the
corresponding tag from the patches.

Thanks,
Lorenzo

> Fix this by polling the channel for the PCI_EJECT message and
> hpdev->state, and by checking the PCI vendor ID.
> 
> Note: actually the above issues also happen to a SMP VM, if
> "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
> Tested-by: Chris Valean <v-chvale@microsoft.com>
> Cc: stable@vger.kernel.org
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: K. Y. Srinivasan <kys@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Jack Morgenstein <jackm@mellanox.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 265ba11e53e2..50cdefe3f6d3 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -521,6 +521,8 @@ struct hv_pci_compl {
>  	s32 completion_status;
>  };
>  
> +static void hv_pci_onchannelcallback(void *context);
> +
>  /**
>   * hv_pci_generic_compl() - Invoked for a completion packet
>   * @context:		Set up by the sender of the packet.
> @@ -665,6 +667,31 @@ static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where,
>  	}
>  }
>  
> +static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev)
> +{
> +	u16 ret;
> +	unsigned long flags;
> +	void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET +
> +			     PCI_VENDOR_ID;
> +
> +	spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
> +
> +	/* Choose the function to be read. (See comment above) */
> +	writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
> +	/* Make sure the function was chosen before we start reading. */
> +	mb();
> +	/* Read from that function's config space. */
> +	ret = readw(addr);
> +	/*
> +	 * mb() is not required here, because the spin_unlock_irqrestore()
> +	 * is a barrier.
> +	 */
> +
> +	spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
> +
> +	return ret;
> +}
> +
>  /**
>   * _hv_pcifront_write_config() - Internal PCI config write
>   * @hpdev:	The PCI driver's representation of the device
> @@ -1107,8 +1134,37 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  	 * Since this function is called with IRQ locks held, can't
>  	 * do normal wait for completion; instead poll.
>  	 */
> -	while (!try_wait_for_completion(&comp.comp_pkt.host_event))
> +	while (!try_wait_for_completion(&comp.comp_pkt.host_event)) {
> +		/* 0xFFFF means an invalid PCI VENDOR ID. */
> +		if (hv_pcifront_get_vendor_id(hpdev) == 0xFFFF) {
> +			dev_err_once(&hbus->hdev->device,
> +				     "the device has gone\n");
> +			goto free_int_desc;
> +		}
> +
> +		/*
> +		 * When the higher level interrupt code calls us with
> +		 * interrupt disabled, we must poll the channel by calling
> +		 * the channel callback directly when channel->target_cpu is
> +		 * the current CPU. When the higher level interrupt code
> +		 * calls us with interrupt enabled, let's add the
> +		 * local_bh_disable()/enable() to avoid race.
> +		 */
> +		local_bh_disable();
> +
> +		if (hbus->hdev->channel->target_cpu == smp_processor_id())
> +			hv_pci_onchannelcallback(hbus);
> +
> +		local_bh_enable();
> +
> +		if (hpdev->state == hv_pcichild_ejecting) {
> +			dev_err_once(&hbus->hdev->device,
> +				     "the device is being ejected\n");
> +			goto free_int_desc;
> +		}
> +
>  		udelay(100);
> +	}
>  
>  	if (comp.comp_pkt.completion_status < 0) {
>  		dev_err(&hbus->hdev->device,
> -- 
> 2.7.4
Dexuan Cui March 7, 2018, 9:40 p.m. | #3
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Sent: Wednesday, March 7, 2018 04:35
> On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote:
> > 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> > Hyper-V VM with SR-IOV. This is because when we reach
> hv_compose_msi_msg()
> > by request_irq()  -> request_threaded_irq() -> __setup_irq()->irq_startup()
> >  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> > disabled in __setup_irq().
> >
> > Fix this by polling the channel.
> >
> > 2. If the host is ejecting the VF device before we reach
> > hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
> > forever, because at this time the host doesn't respond to the
> > CREATE_INTERRUPT request. This issue also happens to old kernels like
> > v4.14, v4.13, etc.
>
> If you are fixing a problem you should report what commit you are fixing
> with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log
> to send it to stable kernels to which it should be applied; mentioning
> kernel versions in the commit log is useless and should be omitted.

Hi Lorenzo,
Thanks for your comments!
This patch does have a "Cc: stable@vger.kernel.org" in the sign-off area. :-)

Here the patch is made to resolve 2 issues:
#1 is triggered by the x86 global reservation mode (4900be8360) patch.
4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c
should be fixed.

#2 is a longstanding issue since the first day the pci-hyperv driver was
accepted into the kernel.

So IMO actually we don't really need to add a Fixes: tag, which is usually
used to specify a specific commit that introduces a bug that is being fixed.

> Side note: you should not have stable@vger.kernel.org in the email
> addresses CC list you are sending the patches to (you mark patches for
> stable by adding an appropriate CC tag in the commit log).

Sorry, I didn't know this, but actually I didn't add stable@vger.kernel.org
manually. Instead I used "git send-email" to send this patchset, and it told
me "The Cc list above has been expanded by additional addresses found
in the patch commit message."

I didn't find a way to disable this behavior of "git send-email" by checking
its manual and googling it. This is strange.

> Here:
>
> git.kernel.org/.../Documentation/process/stable-kernel-rules.rst
>
> Last but not least, most of the patches in this series do not justify
> sending them to stable kernels at all so you should remove the
> corresponding tag from the patches.

I hope at least these 2 patches can go into the stable kernels:
[PATCH v3 3/6] PCI: hv: serialize the present/eject work items
[PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
Especially the second one, which fixes a real hang issue for UP virtual
machines running v4.15 and newer.
And,  IMO the patches are small enough (<100 lines) , but definitely
the maintainers make the final call.

>
> Thanks,
> Lorenzo
>
> > Fix this by polling the channel for the PCI_EJECT message and
> > hpdev->state, and by checking the PCI vendor ID.
> >
> > Note: actually the above issues also happen to a SMP VM, if
> > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> >
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
> > Tested-by: Chris Valean <v-chvale@microsoft.com>
> > Cc: stable@vger.kernel.org
> > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > Cc: K. Y. Srinivasan <kys@microsoft.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Cc: Jack Morgenstein <jackm@mellanox.com>
> > ---
> >  drivers/pci/host/pci-hyperv.c | 58


Thanks,
-- Dexuan
Haiyang Zhang March 9, 2018, 7:38 p.m. | #4
> -----Original Message-----
> From: Dexuan Cui
> Sent: Tuesday, March 6, 2018 1:22 PM
> To: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com
> Cc: linux-kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org;
> Haiyang Zhang <haiyangz@microsoft.com>; vkuznets@redhat.com;
> marcelo.cerri@canonical.com; Michael Kelley (EOSG)
> <Michael.H.Kelley@microsoft.com>; Dexuan Cui <decui@microsoft.com>;
> stable@vger.kernel.org; Jack Morgenstein <jackm@mellanox.com>
> Subject: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> 
> 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> Hyper-V VM with SR-IOV. This is because when we reach
> hv_compose_msi_msg() by request_irq()  -> request_threaded_irq() ->
> __setup_irq()->irq_startup()  -> __irq_startup() -> irq_domain_activate_irq() -
> > ... ->
> msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is disabled in
> __setup_irq().
> 
> Fix this by polling the channel.
> 
> 2. If the host is ejecting the VF device before we reach hv_compose_msi_msg(),
> in a UP VM, we can hang in hv_compose_msi_msg() forever, because at this
> time the host doesn't respond to the CREATE_INTERRUPT request. This issue
> also happens to old kernels like v4.14, v4.13, etc.
> 
> Fix this by polling the channel for the PCI_EJECT message and
> hpdev->state, and by checking the PCI vendor ID.
> 
> Note: actually the above issues also happen to a SMP VM, if "hbus->hdev-
> >channel->target_cpu == smp_processor_id()" is true.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
> Tested-by: Chris Valean <v-chvale@microsoft.com>
> Cc: stable@vger.kernel.org
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: K. Y. Srinivasan <kys@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Jack Morgenstein <jackm@mellanox.com>
> ---

Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Dexuan Cui March 13, 2018, 6:23 p.m. | #5
> From: Dexuan Cui
> Sent: Wednesday, March 7, 2018 13:40
> To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com; linux-
> kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Haiyang
> Zhang <haiyangz@microsoft.com>; vkuznets@redhat.com;
> marcelo.cerri@canonical.com; Michael Kelley (EOSG)
> <Michael.H.Kelley@microsoft.com>; stable@vger.kernel.org; Jack
> Morgenstein <jackm@mellanox.com>
> Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> 
> > From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Sent: Wednesday, March 7, 2018 04:35
> > On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote:
> > > 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> > > Hyper-V VM with SR-IOV. This is because when we reach
> > hv_compose_msi_msg()
> > > by request_irq()  -> request_threaded_irq() -> __setup_irq()->irq_startup()
> > >  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> > > disabled in __setup_irq().
> > >
> > > Fix this by polling the channel.
> > >
> > > 2. If the host is ejecting the VF device before we reach
> > > hv_compose_msi_msg(), in a UP VM, we can hang in
> hv_compose_msi_msg()
> > > forever, because at this time the host doesn't respond to the
> > > CREATE_INTERRUPT request. This issue also happens to old kernels like
> > > v4.14, v4.13, etc.
> >
> > If you are fixing a problem you should report what commit you are fixing
> > with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log
> > to send it to stable kernels to which it should be applied; mentioning
> > kernel versions in the commit log is useless and should be omitted.
> 
> Hi Lorenzo,
> Thanks for your comments!
> This patch does have a "Cc: stable@vger.kernel.org" in the sign-off area. :-)
> 
> Here the patch is made to resolve 2 issues:
> #1 is triggered by the x86 global reservation mode (4900be8360) patch.
> 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c
> should be fixed.
> 
> #2 is a longstanding issue since the first day the pci-hyperv driver was
> accepted into the kernel.
> 
> So IMO actually we don't really need to add a Fixes: tag, which is usually
> used to specify a specific commit that introduces a bug that is being fixed.
> 
> > Side note: you should not have stable@vger.kernel.org in the email
> > addresses CC list you are sending the patches to (you mark patches for
> > stable by adding an appropriate CC tag in the commit log).
> 
> Sorry, I didn't know this, but actually I didn't add stable@vger.kernel.org
> manually. Instead I used "git send-email" to send this patchset, and it told
> me "The Cc list above has been expanded by additional addresses found
> in the patch commit message."
> 
> I didn't find a way to disable this behavior of "git send-email" by checking
> its manual and googling it. This is strange.
> 
> > Here:
> >
> > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst
> >
> > Last but not least, most of the patches in this series do not justify
> > sending them to stable kernels at all so you should remove the
> > corresponding tag from the patches.
> 
> I hope at least these 2 patches can go into the stable kernels:
> [PATCH v3 3/6] PCI: hv: serialize the present/eject work items
> [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> Especially the second one, which fixes a real hang issue for UP virtual
> machines running v4.15 and newer.
> And,  IMO the patches are small enough (<100 lines) , but definitely
> the maintainers make the final call.
> 
> >
> > Thanks,
> > Lorenzo
> >
> > > Fix this by polling the channel for the PCI_EJECT message and
> > > hpdev->state, and by checking the PCI vendor ID.
> > >
> > > Note: actually the above issues also happen to a SMP VM, if
> > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> > >
> > > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > > Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
> > > Tested-by: Chris Valean <v-chvale@microsoft.com>
> > > Cc: stable@vger.kernel.org
> > > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > > Cc: K. Y. Srinivasan <kys@microsoft.com>
> > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > Cc: Jack Morgenstein <jackm@mellanox.com>
> > > ---
> > >  drivers/pci/host/pci-hyperv.c | 58
> 
> 
> Thanks,
> -- Dexuan

Hi Lorenzo, Bjorn, and all,
Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd 
the patchset.

Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag
for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be 
easier if you just remove the tags if you belive it's necessary (IMHO all the
6 paches are not big and it would be great if we can have all of them in 
the old stable kernels, but I respect your decision).

Please let me know if I missed something when addressing the comments,
 and if I should send a v4.

Thanks!
-- Dexuan
Lorenzo Pieralisi March 13, 2018, 6:31 p.m. | #6
On Tue, Mar 13, 2018 at 06:23:39PM +0000, Dexuan Cui wrote:
> > From: Dexuan Cui
> > Sent: Wednesday, March 7, 2018 13:40
> > To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Cc: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan
> > <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> > olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com; linux-
> > kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Haiyang
> > Zhang <haiyangz@microsoft.com>; vkuznets@redhat.com;
> > marcelo.cerri@canonical.com; Michael Kelley (EOSG)
> > <Michael.H.Kelley@microsoft.com>; stable@vger.kernel.org; Jack
> > Morgenstein <jackm@mellanox.com>
> > Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> > 
> > > From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > > Sent: Wednesday, March 7, 2018 04:35
> > > On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote:
> > > > 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> > > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> > > > Hyper-V VM with SR-IOV. This is because when we reach
> > > hv_compose_msi_msg()
> > > > by request_irq()  -> request_threaded_irq() -> __setup_irq()->irq_startup()
> > > >  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> > > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> > > > disabled in __setup_irq().
> > > >
> > > > Fix this by polling the channel.
> > > >
> > > > 2. If the host is ejecting the VF device before we reach
> > > > hv_compose_msi_msg(), in a UP VM, we can hang in
> > hv_compose_msi_msg()
> > > > forever, because at this time the host doesn't respond to the
> > > > CREATE_INTERRUPT request. This issue also happens to old kernels like
> > > > v4.14, v4.13, etc.
> > >
> > > If you are fixing a problem you should report what commit you are fixing
> > > with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log
> > > to send it to stable kernels to which it should be applied; mentioning
> > > kernel versions in the commit log is useless and should be omitted.
> > 
> > Hi Lorenzo,
> > Thanks for your comments!
> > This patch does have a "Cc: stable@vger.kernel.org" in the sign-off area. :-)
> > 
> > Here the patch is made to resolve 2 issues:
> > #1 is triggered by the x86 global reservation mode (4900be8360) patch.
> > 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c
> > should be fixed.
> > 
> > #2 is a longstanding issue since the first day the pci-hyperv driver was
> > accepted into the kernel.
> > 
> > So IMO actually we don't really need to add a Fixes: tag, which is usually
> > used to specify a specific commit that introduces a bug that is being fixed.
> > 
> > > Side note: you should not have stable@vger.kernel.org in the email
> > > addresses CC list you are sending the patches to (you mark patches for
> > > stable by adding an appropriate CC tag in the commit log).
> > 
> > Sorry, I didn't know this, but actually I didn't add stable@vger.kernel.org
> > manually. Instead I used "git send-email" to send this patchset, and it told
> > me "The Cc list above has been expanded by additional addresses found
> > in the patch commit message."
> > 
> > I didn't find a way to disable this behavior of "git send-email" by checking
> > its manual and googling it. This is strange.
> > 
> > > Here:
> > >
> > > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst
> > >
> > > Last but not least, most of the patches in this series do not justify
> > > sending them to stable kernels at all so you should remove the
> > > corresponding tag from the patches.
> > 
> > I hope at least these 2 patches can go into the stable kernels:
> > [PATCH v3 3/6] PCI: hv: serialize the present/eject work items
> > [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> > Especially the second one, which fixes a real hang issue for UP virtual
> > machines running v4.15 and newer.
> > And,  IMO the patches are small enough (<100 lines) , but definitely
> > the maintainers make the final call.
> > 
> > >
> > > Thanks,
> > > Lorenzo
> > >
> > > > Fix this by polling the channel for the PCI_EJECT message and
> > > > hpdev->state, and by checking the PCI vendor ID.
> > > >
> > > > Note: actually the above issues also happen to a SMP VM, if
> > > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> > > >
> > > > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > > > Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
> > > > Tested-by: Chris Valean <v-chvale@microsoft.com>
> > > > Cc: stable@vger.kernel.org
> > > > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > > > Cc: K. Y. Srinivasan <kys@microsoft.com>
> > > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > > Cc: Jack Morgenstein <jackm@mellanox.com>
> > > > ---
> > > >  drivers/pci/host/pci-hyperv.c | 58
> > 
> > 
> > Thanks,
> > -- Dexuan
> 
> Hi Lorenzo, Bjorn, and all,
> Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd 
> the patchset.
> 
> Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag
> for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be 
> easier if you just remove the tags if you belive it's necessary (IMHO all the
> 6 paches are not big and it would be great if we can have all of them in 
> the old stable kernels, but I respect your decision).
> 
> Please let me know if I missed something when addressing the comments,
>  and if I should send a v4.

I will have a look tomorrow, thank you.

Lorenzo
Lorenzo Pieralisi March 14, 2018, 11:50 a.m. | #7
On Tue, Mar 13, 2018 at 06:23:39PM +0000, Dexuan Cui wrote:

[...]

> Hi Lorenzo, Bjorn, and all,
> Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd 
> the patchset.
> 
> Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag
> for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be 
> easier if you just remove the tags if you belive it's necessary (IMHO all the
> 6 paches are not big and it would be great if we can have all of them in 
> the old stable kernels, but I respect your decision).

I think you need a v4 since for patches that are actually fixing a bug I
want a Fixes: tag added and I want them to be applicable independently
of other patches in the series. Send them in a separate series if you
prefer - I just want to make sure they apply independently.

As for marking patches for stable kernels, I do not think it is right
to send cosmetic churn to stable kernels anyway, at least that's my
reading of the policy.

You can easily post a v4 with patches reshuffled - I will apply them
accordingly.

Before re-posting please read this to improve formatting (I can do it
for you but while at it you can do it yourself):

https://marc.info/?l=linux-pci&m=150905742808166&w=2

Thanks,
Lorenzo
Dexuan Cui March 14, 2018, 5:17 p.m. | #8
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Sent: Wednesday, March 14, 2018 04:50
> On Tue, Mar 13, 2018 at 06:23:39PM +0000, Dexuan Cui wrote:
> 
> [...]
> 
> > Hi Lorenzo, Bjorn, and all,
> > Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd
> > the patchset.
> >
> > Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag
> > for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be
> > easier if you just remove the tags if you belive it's necessary (IMHO all the
> > 6 paches are not big and it would be great if we can have all of them in
> > the old stable kernels, but I respect your decision).
> 
> I think you need a v4 since for patches that are actually fixing a bug I
> want a Fixes: tag added and I want them to be applicable independently
> of other patches in the series. Send them in a separate series if you
> prefer - I just want to make sure they apply independently.

Ok, I'll send 2 series: one for
[6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
[3/6] PCI: hv: serialize the present/eject work items
These fix real issues reported by Mellanox when they tested SR-IOV with VMs
runnning on Hyper-V. I'll add stable tags for them.

The other series will cover these 4 patces which don't need to go in stable:
[5/6] PCI: hv: hv_pci_devices_present(): only queue a new work when necessary
[4/6] PCI: hv: remove hbus->enum_sem
[2/6] PCI: hv: hv_eject_device_work(): remove the bogus test
[1/6] PCI: hv: fix a comment typo in _hv_pcifront_read_config()
 
> As for marking patches for stable kernels, I do not think it is right
> to send cosmetic churn to stable kernels anyway, at least that's my
> reading of the policy.
> 
> You can easily post a v4 with patches reshuffled - I will apply them
> accordingly.
OK.
 
> Before re-posting please read this to improve formatting (I can do it
> for you but while at it you can do it yourself):
> https://marc.info/?l=linux-pci&m=150905742808166&w=2
Will read. Thanks!

-- Dexuan

Patch

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 265ba11e53e2..50cdefe3f6d3 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -521,6 +521,8 @@  struct hv_pci_compl {
 	s32 completion_status;
 };
 
+static void hv_pci_onchannelcallback(void *context);
+
 /**
  * hv_pci_generic_compl() - Invoked for a completion packet
  * @context:		Set up by the sender of the packet.
@@ -665,6 +667,31 @@  static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where,
 	}
 }
 
+static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev)
+{
+	u16 ret;
+	unsigned long flags;
+	void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET +
+			     PCI_VENDOR_ID;
+
+	spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
+
+	/* Choose the function to be read. (See comment above) */
+	writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
+	/* Make sure the function was chosen before we start reading. */
+	mb();
+	/* Read from that function's config space. */
+	ret = readw(addr);
+	/*
+	 * mb() is not required here, because the spin_unlock_irqrestore()
+	 * is a barrier.
+	 */
+
+	spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+
+	return ret;
+}
+
 /**
  * _hv_pcifront_write_config() - Internal PCI config write
  * @hpdev:	The PCI driver's representation of the device
@@ -1107,8 +1134,37 @@  static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	 * Since this function is called with IRQ locks held, can't
 	 * do normal wait for completion; instead poll.
 	 */
-	while (!try_wait_for_completion(&comp.comp_pkt.host_event))
+	while (!try_wait_for_completion(&comp.comp_pkt.host_event)) {
+		/* 0xFFFF means an invalid PCI VENDOR ID. */
+		if (hv_pcifront_get_vendor_id(hpdev) == 0xFFFF) {
+			dev_err_once(&hbus->hdev->device,
+				     "the device has gone\n");
+			goto free_int_desc;
+		}
+
+		/*
+		 * When the higher level interrupt code calls us with
+		 * interrupt disabled, we must poll the channel by calling
+		 * the channel callback directly when channel->target_cpu is
+		 * the current CPU. When the higher level interrupt code
+		 * calls us with interrupt enabled, let's add the
+		 * local_bh_disable()/enable() to avoid race.
+		 */
+		local_bh_disable();
+
+		if (hbus->hdev->channel->target_cpu == smp_processor_id())
+			hv_pci_onchannelcallback(hbus);
+
+		local_bh_enable();
+
+		if (hpdev->state == hv_pcichild_ejecting) {
+			dev_err_once(&hbus->hdev->device,
+				     "the device is being ejected\n");
+			goto free_int_desc;
+		}
+
 		udelay(100);
+	}
 
 	if (comp.comp_pkt.completion_status < 0) {
 		dev_err(&hbus->hdev->device,