diff mbox

[net-next] net/mlx4_core: match pci_device_id including dynids

Message ID 1396750050-7183-1-git-send-email-weiyang@linux.vnet.ibm.com
State Deferred, archived
Delegated to: David Miller
Headers show

Commit Message

Wei Yang April 6, 2014, 2:07 a.m. UTC
Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass
pci_device_id.driver_data to __mlx4_init_one during reset".

pci_match_id() just match the static pci_device_id, which may return NULL if
someone binds the driver to a device manually using
/sys/bus/pci/drivers/.../new_id.

This patch wrap up a helper function __mlx4_remove_one() which does the tear
down function but preserve the drv_data. Functions like
mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
releasing drvdata.

Tested on ConnectX-3.

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |  165 +++++++++++++++--------------
 1 file changed, 88 insertions(+), 77 deletions(-)

Comments

David Miller April 7, 2014, 6:43 p.m. UTC | #1
From: Wei Yang <weiyang@linux.vnet.ibm.com>
Date: Sun,  6 Apr 2014 10:07:30 +0800

> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass
> pci_device_id.driver_data to __mlx4_init_one during reset".
> 
> pci_match_id() just match the static pci_device_id, which may return NULL if
> someone binds the driver to a device manually using
> /sys/bus/pci/drivers/.../new_id.
> 
> This patch wrap up a helper function __mlx4_remove_one() which does the tear
> down function but preserve the drv_data. Functions like
> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
> releasing drvdata.
> 
> Tested on ConnectX-3.
> 
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: Amir Vadai <amirv@mellanox.com>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>

Please resubmit this when the net-next tree opens back up, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz April 8, 2014, 6:48 a.m. UTC | #2
On 07/04/2014 21:43, David Miller wrote:
> From: Wei Yang <weiyang@linux.vnet.ibm.com>
> Date: Sun,  6 Apr 2014 10:07:30 +0800
>
>> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass
>> pci_device_id.driver_data to __mlx4_init_one during reset".
>>
>> pci_match_id() just match the static pci_device_id, which may return NULL if
>> someone binds the driver to a device manually using
>> /sys/bus/pci/drivers/.../new_id.
>>
>> This patch wrap up a helper function __mlx4_remove_one() which does the tear
>> down function but preserve the drv_data. Functions like
>> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
>> releasing drvdata.
>>
>> Tested on ConnectX-3.
>>
>> CC: Bjorn Helgaas <bhelgaas@google.com>
>> CC: Amir Vadai <amirv@mellanox.com>
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Please resubmit this when the net-next tree opens back up, thanks.

Dave, this patch is for net, since it fixes an issue in the current code 
(actually it relates to a patch
that was merged after 3.14-rc7, so it would eventually might go to 
-stable too). The author wasn't very
familiar with the exact differences/nature of net vs. net-next, so it 
might created some confusion here, I'v
sent him few private notes to explain how things go...

Or.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz April 8, 2014, 6:52 a.m. UTC | #3
On 06/04/2014 05:07, Wei Yang wrote:
> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass
> pci_device_id.driver_data to __mlx4_init_one during reset".
>
> pci_match_id() just match the static pci_device_id, which may return NULL if
> someone binds the driver to a device manually using
> /sys/bus/pci/drivers/.../new_id.
>
> This patch wrap up a helper function __mlx4_remove_one() which does the tear
> down function but preserve the drv_data. Functions like
> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
> releasing drvdata.
>
> Tested on ConnectX-3.
>
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: Amir Vadai <amirv@mellanox.com>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/main.c |  165 +++++++++++++++--------------
>   1 file changed, 88 insertions(+), 77 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
> index aa54ef7..9dd23e0 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -2268,13 +2268,8 @@ static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data)
>   	/* Allow large DMA segments, up to the firmware limit of 1 GB */
>   	dma_set_max_seg_size(&pdev->dev, 1024 * 1024 * 1024);
>   
> -	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> -	if (!priv) {
> -		err = -ENOMEM;
> -		goto err_release_regions;
> -	}
> -
> -	dev       = &priv->dev;
> +	dev       = pci_get_drvdata(pdev);
> +	priv      = mlx4_priv(dev);
>   	dev->pdev = pdev;
>   	INIT_LIST_HEAD(&priv->ctx_list);
>   	spin_lock_init(&priv->ctx_lock);

Here are some comments from Jack, please make sure to CC Jack 
Morgenstein <jackm@dev.mellanox.co.il>
on V2 of the patch -->

There is a change in behavior here, which I am concerned about.
Before this patch, __mlx4_init_one() allocated a zeroed-out structure
for priv, in all circumstances.

Now, this structure is passed "dirty" to __mlx4_init_one from some
flows (e.g., mlx4_restart_one).
There may, in fact, be fields in the structure which should be "zeroed
out" and are not, because previously there was no reason to be strict
about this.

Add a line with

memset(priv, 0, sizeof(*priv));


after setting the priv variable above

and also do NOT remove the following line:

- priv->pci_dev_data = pci_dev_data;


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Yang April 8, 2014, 7:45 a.m. UTC | #4
On Tue, Apr 08, 2014 at 09:48:30AM +0300, Or Gerlitz wrote:
>On 07/04/2014 21:43, David Miller wrote:
>>From: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Date: Sun,  6 Apr 2014 10:07:30 +0800
>>
>>>Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass
>>>pci_device_id.driver_data to __mlx4_init_one during reset".
>>>
>>>pci_match_id() just match the static pci_device_id, which may return NULL if
>>>someone binds the driver to a device manually using
>>>/sys/bus/pci/drivers/.../new_id.
>>>
>>>This patch wrap up a helper function __mlx4_remove_one() which does the tear
>>>down function but preserve the drv_data. Functions like
>>>mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
>>>releasing drvdata.
>>>
>>>Tested on ConnectX-3.
>>>
>>>CC: Bjorn Helgaas <bhelgaas@google.com>
>>>CC: Amir Vadai <amirv@mellanox.com>
>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Please resubmit this when the net-next tree opens back up, thanks.
>
>Dave, this patch is for net, since it fixes an issue in the current
>code (actually it relates to a patch
>that was merged after 3.14-rc7, so it would eventually might go to
>-stable too). The author wasn't very
>familiar with the exact differences/nature of net vs. net-next, so it
>might created some confusion here, I'v
>sent him few private notes to explain how things go...

Yes, sorry for the confusion. Thanks Or	for your patient explanation, while I
still make mistakes.

>
>Or.
David Miller April 8, 2014, 4:25 p.m. UTC | #5
From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Tue, 8 Apr 2014 09:48:30 +0300

> On 07/04/2014 21:43, David Miller wrote:
>> From: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Date: Sun,  6 Apr 2014 10:07:30 +0800
>>
>>> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass
>>> pci_device_id.driver_data to __mlx4_init_one during reset".
>>>
>>> pci_match_id() just match the static pci_device_id, which may return
>>> NULL if
>>> someone binds the driver to a device manually using
>>> /sys/bus/pci/drivers/.../new_id.
>>>
>>> This patch wrap up a helper function __mlx4_remove_one() which does
>>> the tear
>>> down function but preserve the drv_data. Functions like
>>> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with
>>> out
>>> releasing drvdata.
>>>
>>> Tested on ConnectX-3.
>>>
>>> CC: Bjorn Helgaas <bhelgaas@google.com>
>>> CC: Amir Vadai <amirv@mellanox.com>
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Please resubmit this when the net-next tree opens back up, thanks.
> 
> Dave, this patch is for net, since it fixes an issue in the current
> code (actually it relates to a patch
> that was merged after 3.14-rc7, so it would eventually might go to
> -stable too). The author wasn't very
> familiar with the exact differences/nature of net vs. net-next, so it
> might created some confusion here, I'v
> sent him few private notes to explain how things go...

Then please resubmit this patch with the proper subject line, th ank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz April 8, 2014, 4:51 p.m. UTC | #6
On Tue, Apr 8, 2014 at 7:25 PM, David Miller <davem@davemloft.net> wrote:
>
> From: Or Gerlitz <ogerlitz@mellanox.com>
> Date: Tue, 8 Apr 2014 09:48:30 +0300
>
> > On 07/04/2014 21:43, David Miller wrote:
> >> From: Wei Yang <weiyang@linux.vnet.ibm.com>
> >> Date: Sun,  6 Apr 2014 10:07:30 +0800
> >>
> >>> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass
> >>> pci_device_id.driver_data to __mlx4_init_one during reset".
> >>>
> >>> pci_match_id() just match the static pci_device_id, which may return
> >>> NULL if
> >>> someone binds the driver to a device manually using
> >>> /sys/bus/pci/drivers/.../new_id.
> >>>
> >>> This patch wrap up a helper function __mlx4_remove_one() which does
> >>> the tear
> >>> down function but preserve the drv_data. Functions like
> >>> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with
> >>> out
> >>> releasing drvdata.
> >>>
> >>> Tested on ConnectX-3.
> >>>
> >>> CC: Bjorn Helgaas <bhelgaas@google.com>
> >>> CC: Amir Vadai <amirv@mellanox.com>
> >>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> >> Please resubmit this when the net-next tree opens back up, thanks.
> >
> > Dave, this patch is for net, since it fixes an issue in the current
> > code (actually it relates to a patch
> > that was merged after 3.14-rc7, so it would eventually might go to
> > -stable too). The author wasn't very
> > familiar with the exact differences/nature of net vs. net-next, so it
> > might created some confusion here, I'v
> > sent him few private notes to explain how things go...
>
> Then please resubmit this patch with the proper subject line, th ank you.


We've sent some feedback on the patch to the author along with crash
net vs. net-next course, I really hope V1 he will post with the proper
tag...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index aa54ef7..9dd23e0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2268,13 +2268,8 @@  static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data)
 	/* Allow large DMA segments, up to the firmware limit of 1 GB */
 	dma_set_max_seg_size(&pdev->dev, 1024 * 1024 * 1024);
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv) {
-		err = -ENOMEM;
-		goto err_release_regions;
-	}
-
-	dev       = &priv->dev;
+	dev       = pci_get_drvdata(pdev);
+	priv      = mlx4_priv(dev);
 	dev->pdev = pdev;
 	INIT_LIST_HEAD(&priv->ctx_list);
 	spin_lock_init(&priv->ctx_lock);
@@ -2453,9 +2448,6 @@  slave_start:
 	mlx4_sense_init(dev);
 	mlx4_start_sense(dev);
 
-	priv->pci_dev_data = pci_dev_data;
-	pci_set_drvdata(pdev, dev);
-
 	return 0;
 
 err_port:
@@ -2520,84 +2512,101 @@  err_disable_pdev:
 
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
+	struct mlx4_priv *priv;
+	struct mlx4_dev *dev;
+
 	printk_once(KERN_INFO "%s", mlx4_version);
 
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv) {
+		return -ENOMEM;
+	}
+
+	dev       = &priv->dev;
+	pci_set_drvdata(pdev, dev);
+	priv->pci_dev_data = id->driver_data;
+
 	return __mlx4_init_one(pdev, id->driver_data);
 }
 
-static void mlx4_remove_one(struct pci_dev *pdev)
+static void __mlx4_remove_one(struct pci_dev *pdev)
 {
 	struct mlx4_dev  *dev  = pci_get_drvdata(pdev);
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int p;
 
-	if (dev) {
-		/* in SRIOV it is not allowed to unload the pf's
-		 * driver while there are alive vf's */
-		if (mlx4_is_master(dev)) {
-			if (mlx4_how_many_lives_vf(dev))
-				printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");
-		}
-		mlx4_stop_sense(dev);
-		mlx4_unregister_device(dev);
+	/* in SRIOV it is not allowed to unload the pf's
+	 * driver while there are alive vf's */
+	if (mlx4_is_master(dev)) {
+		if (mlx4_how_many_lives_vf(dev))
+			printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");
+	}
+	mlx4_stop_sense(dev);
+	mlx4_unregister_device(dev);
 
-		for (p = 1; p <= dev->caps.num_ports; p++) {
-			mlx4_cleanup_port_info(&priv->port[p]);
-			mlx4_CLOSE_PORT(dev, p);
-		}
+	for (p = 1; p <= dev->caps.num_ports; p++) {
+		mlx4_cleanup_port_info(&priv->port[p]);
+		mlx4_CLOSE_PORT(dev, p);
+	}
 
-		if (mlx4_is_master(dev))
-			mlx4_free_resource_tracker(dev,
-						   RES_TR_FREE_SLAVES_ONLY);
-
-		mlx4_cleanup_counters_table(dev);
-		mlx4_cleanup_qp_table(dev);
-		mlx4_cleanup_srq_table(dev);
-		mlx4_cleanup_cq_table(dev);
-		mlx4_cmd_use_polling(dev);
-		mlx4_cleanup_eq_table(dev);
-		mlx4_cleanup_mcg_table(dev);
-		mlx4_cleanup_mr_table(dev);
-		mlx4_cleanup_xrcd_table(dev);
-		mlx4_cleanup_pd_table(dev);
+	if (mlx4_is_master(dev))
+		mlx4_free_resource_tracker(dev,
+					   RES_TR_FREE_SLAVES_ONLY);
 
-		if (mlx4_is_master(dev))
-			mlx4_free_resource_tracker(dev,
-						   RES_TR_FREE_STRUCTS_ONLY);
-
-		iounmap(priv->kar);
-		mlx4_uar_free(dev, &priv->driver_uar);
-		mlx4_cleanup_uar_table(dev);
-		if (!mlx4_is_slave(dev))
-			mlx4_clear_steering(dev);
-		mlx4_free_eq_table(dev);
-		if (mlx4_is_master(dev))
-			mlx4_multi_func_cleanup(dev);
-		mlx4_close_hca(dev);
-		if (mlx4_is_slave(dev))
-			mlx4_multi_func_cleanup(dev);
-		mlx4_cmd_cleanup(dev);
-
-		if (dev->flags & MLX4_FLAG_MSI_X)
-			pci_disable_msix(pdev);
-		if (dev->flags & MLX4_FLAG_SRIOV) {
-			mlx4_warn(dev, "Disabling SR-IOV\n");
-			pci_disable_sriov(pdev);
-		}
+	mlx4_cleanup_counters_table(dev);
+	mlx4_cleanup_qp_table(dev);
+	mlx4_cleanup_srq_table(dev);
+	mlx4_cleanup_cq_table(dev);
+	mlx4_cmd_use_polling(dev);
+	mlx4_cleanup_eq_table(dev);
+	mlx4_cleanup_mcg_table(dev);
+	mlx4_cleanup_mr_table(dev);
+	mlx4_cleanup_xrcd_table(dev);
+	mlx4_cleanup_pd_table(dev);
 
-		if (!mlx4_is_slave(dev))
-			mlx4_free_ownership(dev);
+	if (mlx4_is_master(dev))
+		mlx4_free_resource_tracker(dev,
+					   RES_TR_FREE_STRUCTS_ONLY);
 
-		kfree(dev->caps.qp0_tunnel);
-		kfree(dev->caps.qp0_proxy);
-		kfree(dev->caps.qp1_tunnel);
-		kfree(dev->caps.qp1_proxy);
+	iounmap(priv->kar);
+	mlx4_uar_free(dev, &priv->driver_uar);
+	mlx4_cleanup_uar_table(dev);
+	if (!mlx4_is_slave(dev))
+		mlx4_clear_steering(dev);
+	mlx4_free_eq_table(dev);
+	if (mlx4_is_master(dev))
+		mlx4_multi_func_cleanup(dev);
+	mlx4_close_hca(dev);
+	if (mlx4_is_slave(dev))
+		mlx4_multi_func_cleanup(dev);
+	mlx4_cmd_cleanup(dev);
 
-		kfree(priv);
-		pci_release_regions(pdev);
-		pci_disable_device(pdev);
-		pci_set_drvdata(pdev, NULL);
+	if (dev->flags & MLX4_FLAG_MSI_X)
+		pci_disable_msix(pdev);
+	if (dev->flags & MLX4_FLAG_SRIOV) {
+		mlx4_warn(dev, "Disabling SR-IOV\n");
+		pci_disable_sriov(pdev);
 	}
+
+	if (!mlx4_is_slave(dev))
+		mlx4_free_ownership(dev);
+
+	kfree(dev->caps.qp0_tunnel);
+	kfree(dev->caps.qp0_proxy);
+	kfree(dev->caps.qp1_tunnel);
+	kfree(dev->caps.qp1_proxy);
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+}
+
+static void mlx4_remove_one(struct pci_dev *pdev)
+{
+	struct mlx4_dev  *dev  = pci_get_drvdata(pdev);
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	__mlx4_remove_one(pdev);
+	kfree(priv);
+	pci_set_drvdata(pdev, NULL);
 }
 
 int mlx4_restart_one(struct pci_dev *pdev)
@@ -2607,7 +2616,7 @@  int mlx4_restart_one(struct pci_dev *pdev)
 	int		  pci_dev_data;
 
 	pci_dev_data = priv->pci_dev_data;
-	mlx4_remove_one(pdev);
+	__mlx4_remove_one(pdev);
 	return __mlx4_init_one(pdev, pci_dev_data);
 }
 
@@ -2662,7 +2671,7 @@  MODULE_DEVICE_TABLE(pci, mlx4_pci_table);
 static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev,
 					      pci_channel_state_t state)
 {
-	mlx4_remove_one(pdev);
+	__mlx4_remove_one(pdev);
 
 	return state == pci_channel_io_perm_failure ?
 		PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
@@ -2670,11 +2679,13 @@  static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev,
 
 static pci_ers_result_t mlx4_pci_slot_reset(struct pci_dev *pdev)
 {
-	const struct pci_device_id *id;
-	int ret;
+	struct mlx4_dev	 *dev  = pci_get_drvdata(pdev);
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	int		  pci_dev_data;
+	int               ret;
 
-	id = pci_match_id(mlx4_pci_table, pdev);
-	ret = __mlx4_init_one(pdev, id->driver_data);
+	pci_dev_data = priv->pci_dev_data;
+	ret = __mlx4_init_one(pdev, pci_dev_data);
 
 	return ret ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
 }