From patchwork Tue May 3 12:22:00 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vitaly Kuznetsov X-Patchwork-Id: 617934 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3qzgKb2X8lz9t4h for ; Tue, 3 May 2016 22:22:27 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933114AbcECMWH (ORCPT ); Tue, 3 May 2016 08:22:07 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35646 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932762AbcECMWE (ORCPT ); Tue, 3 May 2016 08:22:04 -0400 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 180C415563; Tue, 3 May 2016 12:22:04 +0000 (UTC) Received: from vitty.brq.redhat.com (vitty.brq.redhat.com [10.34.26.3]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u43CM1UZ019587; Tue, 3 May 2016 08:22:02 -0400 From: Vitaly Kuznetsov To: linux-pci@vger.kernel.org Cc: linux-kernel@vger.kernel.org, devel@linuxdriverproject.org, Bjorn Helgaas , Haiyang Zhang , "K. Y. Srinivasan" , Jake Oshins Subject: [PATCH [RFC]] PCI: hv: add explicit fencing to config space access Date: Tue, 3 May 2016 14:22:00 +0200 Message-Id: <1462278120-7859-1-git-send-email-vkuznets@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org I'm trying to pass-through Broadcom BCM5720 NIC (Dell Device 1f5b) on Dell R720 server. Everything works fine when target VM has only one CPU, but SMP guests reboot when NIC driver is trying to access PCI config space (with hv_pcifront_read_config/hv_pcifront_write_config). The reboot appears to be induced by the hypervisor and no crash is observed. Windows event logs are not helpful at all ('Virtual machine ... has quit unexpectedly'). The particular access point is always different and putting debug between them (printk/mdelay/...) moves the issue further away. The server model affects the issue as well: on Dell R420 I'm able to pass-through BCM5720 NIC to SMP guests without issues. While I'm obviously failing to reveal the essence of the issue I was able to come up with a (possible) solution: if explicit fencing is put to hv_pcifront_read_config/hv_pcifront_write_config the issue goes away. The essential minimum is rmb() at the end on _hv_pcifront_read_config() and wmb() at the end of _hv_pcifront_write_config() but I'm not confident it will be sufficient for all hardware. I suggest the following fencing: 1) wmb()/mb() between choosing the function and writing to its space. 2) mb() before releasing the spinlock in both _hv_pcifront_read_config()/ _hv_pcifront_write_config to ensure that consecutive reads/writes to the space won't get re-ordered as drivers may count on that. Config space access is not supposed to be performance-critical so this explicit fencing is not supposed to bring any slowdown. Signed-off-by: Vitaly Kuznetsov Acked-by: Jake Oshins --- drivers/pci/host/pci-hyperv.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c index c17e792..7e9b2de 100644 --- a/drivers/pci/host/pci-hyperv.c +++ b/drivers/pci/host/pci-hyperv.c @@ -553,6 +553,8 @@ static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where, spin_lock_irqsave(&hpdev->hbus->config_lock, flags); /* Choose the function to be read. (See comment above) */ writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr); + /* Make sure the function was chosen before we start reading. */ + mb(); /* Read from that function's config space. */ switch (size) { case 1: @@ -565,6 +567,11 @@ static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where, *val = readl(addr); break; } + /* + * Make sure the write was done before we release the spinlock + * allowing consecutive reads/writes. + */ + mb(); spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags); } else { dev_err(&hpdev->hbus->hdev->device, @@ -592,6 +599,8 @@ static void _hv_pcifront_write_config(struct hv_pci_dev *hpdev, int where, spin_lock_irqsave(&hpdev->hbus->config_lock, flags); /* Choose the function to be written. (See comment above) */ writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr); + /* Make sure the function was chosen before we start writing. */ + wmb(); /* Write to that function's config space. */ switch (size) { case 1: @@ -604,6 +613,11 @@ static void _hv_pcifront_write_config(struct hv_pci_dev *hpdev, int where, writel(val, addr); break; } + /* + * Make sure the write was done before we release the spinlock + * allowing consecutive reads/writes. + */ + mb(); spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags); } else { dev_err(&hpdev->hbus->hdev->device,