[net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host

Message ID 1401619783-23659-1-git-send-email-ogerlitz@mellanox.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Or Gerlitz June 1, 2014, 10:49 a.m. UTC
From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Commit befdf89 did not take into account the case where the Host
driver is being unloaded. In this case, pci_get_drvdata for the VF
remove_one call may return NULL, so that dereferencing the priv
struct results in a kernel oops.

The fix is to also test that the dev pointer returned by
pci_get_drvdata is non-NULL.

Fixes: befdf89 ("preserve pci_dev_data after __mlx4_remove_one()")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
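
The guard described in the commit message can be illustrated outside the
kernel. What follows is a minimal user-space sketch, not the driver code:
model_priv, model_pci_dev and model_remove_one() are hypothetical stand-ins
for struct mlx4_priv, struct pci_dev and __mlx4_remove_one(), and the
drvdata field stands in for what pci_get_drvdata() would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx4_priv; only the field the check reads. */
struct model_priv {
	int removed;
};

/* Hypothetical stand-in for struct pci_dev; drvdata models pci_get_drvdata(). */
struct model_pci_dev {
	void *drvdata;
};

/* Returns 1 when teardown work would run, 0 when the call is a no-op. */
static int model_remove_one(struct model_pci_dev *pdev)
{
	struct model_priv *priv = pdev->drvdata;

	/* Short-circuit: priv->removed is never read when priv is NULL. */
	if (!priv || priv->removed)
		return 0;

	priv->removed = 1;
	return 1;
}

/* Walks the three cases: first call, repeated call, drvdata already cleared. */
int model_remove_demo(void)
{
	struct model_priv priv = { .removed = 0 };
	struct model_pci_dev pdev = { .drvdata = &priv };

	assert(model_remove_one(&pdev) == 1);
	assert(model_remove_one(&pdev) == 0);
	pdev.drvdata = NULL;
	assert(model_remove_one(&pdev) == 0);
	return 0;
}
```

The order of the two tests matters: with the operands swapped, the NULL
case would read the removed flag first and fault.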

Comments

Sergei Shtylyov June 1, 2014, 4:41 p.m. UTC | #1
Hello.

On 06/01/2014 02:49 PM, Or Gerlitz wrote:

> From: Jack Morgenstein <jackm@dev.mellanox.co.il>

> Commit befdf89 did not take into account the case where the Host

    Please also specify that commit's summary line in parens.

> driver is being unloaded. In this case, pci_get_drvdata for the VF
> remove_one call may return NULL, so that dereferencing the priv
> struct results in a kernel oops.

> The fix is to also test that the dev pointer returned by
> pci_get_drvdata is non-NULL.

> Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz June 1, 2014, 7:59 p.m. UTC | #2
On Sun, Jun 1, 2014 at 7:41 PM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> On 06/01/2014 02:49 PM, Or Gerlitz wrote:

>> Commit befdf89 did not take into account the case where the Host
>    Please also specify that commit's summary line in parens.

Did that below, see where we say

Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")

>> driver is being unloaded. In this case, pci_get_drvdata for the VF
>> remove_one call may return NULL, so that dereferencing the priv
>> struct results in a kernel oops.

>> The fix is to also test that the dev pointer returned by
>> pci_get_drvdata is non-NULL.
>> Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Wei Yang June 2, 2014, 2:29 p.m. UTC | #3
On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote:
>From: Jack Morgenstein <jackm@dev.mellanox.co.il>
>
>Commit befdf89 did not take into account the case where the Host
>driver is being unloaded. In this case, pci_get_drvdata for the VF

As I understand it, unloading the PF's driver while there are alive VFs is not
allowed. Quoting the driver code:

	/* in SRIOV it is not allowed to unload the pf's
	 * driver while there are alive vf's */
	if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev))
		printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");

Actually, I don't understand this restriction clearly. Maybe my understanding
of "alive VF" is not correct.

Also, in your code, unloading the PF's driver calls pci_disable_sriov(), which
destroys the VFs. But in your test, is the VF's driver still there?

>remove_one call may return NULL, so that dereferencing the priv
>struct results in a kernel oops.

Sorry, I still can't understand this situation.
Could you describe it in more detail? Are you unloading the PF's driver in the
Host first, and then trying to release the VF's driver?

>
>The fix is to also test that the dev pointer returned by
>pci_get_drvdata is non-NULL.
>
>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>---
> drivers/net/ethernet/mellanox/mlx4/main.c |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>index c187d74..a6ae089 100644
>--- a/drivers/net/ethernet/mellanox/mlx4/main.c
>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
> 	int               pci_dev_data;
> 	int p;
>
>-	if (priv->removed)
>+	if (!dev || priv->removed)
> 		return;

This fix looks good to me.

As I remember, I had this check in my first version, but I removed the check
on dev based on Bjorn's suggestion, since I agreed that there was no
chance for dev to be NULL. Bjorn, it seems we were not correct :(

>
> 	pci_dev_data = priv->pci_dev_data;
>-- 
>1.7.1
Bjorn Helgaas June 2, 2014, 4:10 p.m. UTC | #4
On Mon, Jun 2, 2014 at 8:29 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote:
>>From: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>
>>Commit befdf89 did not take into account the case where the Host
>>driver is being unloaded. In this case, pci_get_drvdata for the VF
>
> In my mind, unloading PF's driver when there is alive VFs is not allowed.
> Quoted in driver code:
>
>         /* in SRIOV it is not allowed to unload the pf's
>          * driver while there are alive vf's */
>         if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev))
>                 printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");
>
> Actually, I don't understand this restriction clearly. Maybe my understanding
> of alive VF is not correct.
>
> And in your code, unload PF's driver would call pci_disable_sriov() which will
> destroy the VFs. While in your test, the VF's driver is still there?
>
>>remove_one call may return NULL, so that dereferencing the priv
>>struct results in a kernel oops.
>
> Sorry for my poor mind, I still can't understand this situation.
> Would you describe the situation more? You are unloading PF's driver in Host
> at first, and then try to release the VF's driver?
>
>>
>>The fix is to also test that the dev pointer returned by
>>pci_get_drvdata is non-NULL.
>>
>>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>>---
>> drivers/net/ethernet/mellanox/mlx4/main.c |    2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>>index c187d74..a6ae089 100644
>>--- a/drivers/net/ethernet/mellanox/mlx4/main.c
>>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
>>       int               pci_dev_data;
>>       int p;
>>
>>-      if (priv->removed)
>>+      if (!dev || priv->removed)
>>               return;
>
> This fix looks good to me.
>
> As I remembered, I had this check in my first version, but I removed the check
> on dev based on the suggestion from Bjorn. Since I agreed that there is no
> chance for dev to be NULL. Bjorn, seems we are not correct :(

Writing a driver is not an empirical process of trying things to see
what works.  You need to actively design a consistent structure so you
know why and when things are safe.  I object to gratuitous "dev ==
NULL" checks because often they are just a way of patching up a driver
design that isn't well thought-out.

As I wrote before:

  From the PCI core's perspective, after .probe() returns successfully,
  we can call any driver entry point and pass the pci_dev to it, and
  expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()
  sort of breaks that assumption because you clear out pci_drvdata().
  Right now, the only other entry point mlx4 really implements is
  mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
  is NULL.  But that's ... a hack, and you'll have to do the same
  if/when you implement suspend/resume/sriov_configure/etc.
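
One way to read the point above is that driver data stays valid for the
whole probe-to-remove lifetime, and intermediate events change a state field
instead of clearing the pointer. A minimal user-space sketch under that
assumption follows; every name in it (life_state, life_priv, life_pci_dev,
and the life_* functions) is hypothetical and does not correspond to an
mlx4 or PCI core symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device states; the real driver tracks this differently. */
enum life_state { LIFE_UP, LIFE_QUIESCED };

struct life_priv {
	enum life_state state;
};

/* Hypothetical stand-in for struct pci_dev. */
struct life_pci_dev {
	struct life_priv *drvdata;
};

static void life_probe(struct life_pci_dev *pdev, struct life_priv *priv)
{
	priv->state = LIFE_UP;
	pdev->drvdata = priv;		/* valid from here until life_remove() */
}

/* Error-handler style entry point: quiesce, but leave drvdata in place. */
static void life_err_detected(struct life_pci_dev *pdev)
{
	pdev->drvdata->state = LIFE_QUIESCED;
}

/* Another entry point: relies on drvdata and only inspects the state. */
static int life_suspend(struct life_pci_dev *pdev)
{
	return pdev->drvdata->state == LIFE_UP;	/* 1 if there was work to do */
}

/* Final entry point: the only place that clears drvdata. */
static void life_remove(struct life_pci_dev *pdev)
{
	pdev->drvdata->state = LIFE_QUIESCED;
	pdev->drvdata = NULL;
}

int life_demo(void)
{
	struct life_priv priv;
	struct life_pci_dev pdev = { .drvdata = NULL };

	life_probe(&pdev, &priv);
	assert(life_suspend(&pdev) == 1);
	life_err_detected(&pdev);
	assert(pdev.drvdata != NULL);	/* still valid after the error path */
	assert(life_suspend(&pdev) == 0);
	life_remove(&pdev);
	assert(pdev.drvdata == NULL);
	return 0;
}
```

With this shape no entry point needs a NULL test, because the invariant
"drvdata is valid between probe and remove" holds by construction.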

Bjorn
David Miller June 3, 2014, 12:58 a.m. UTC | #5
From: Bjorn Helgaas <bhelgaas@google.com>
Date: Mon, 2 Jun 2014 10:10:01 -0600

> Writing a driver is not an empirical process of trying things to see
> what works.  You need to actively design a consistent structure so you
> know why and when things are safe.  I object to gratuitous "dev ==
> NULL" checks because often they are just a way of patching up a driver
> design that isn't well thought-out.
> 
> As I wrote before:
> 
>   From the PCI core's perspective, after .probe() returns successfully,
>   we can call any driver entry point and pass the pci_dev to it, and
>   expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()
>   sort of breaks that assumption because you clear out pci_drvdata().
>   Right now, the only other entry point mlx4 really implements is
>   mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>   is NULL.  But that's ... a hack, and you'll have to do the same
>   if/when you implement suspend/resume/sriov_configure/etc.

Agreed.
Wei Yang June 3, 2014, 2 a.m. UTC | #6
On Mon, Jun 02, 2014 at 10:10:01AM -0600, Bjorn Helgaas wrote:
>On Mon, Jun 2, 2014 at 8:29 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote:
>>>From: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>>
>>>Commit befdf89 did not take into account the case where the Host
>>>driver is being unloaded. In this case, pci_get_drvdata for the VF
>>
>> In my mind, unloading PF's driver when there is alive VFs is not allowed.
>> Quoted in driver code:
>>
>>         /* in SRIOV it is not allowed to unload the pf's
>>          * driver while there are alive vf's */
>>         if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev))
>>                 printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");
>>
>> Actually, I don't understand this restriction clearly. Maybe my understanding
>> of alive VF is not correct.
>>
>> And in your code, unload PF's driver would call pci_disable_sriov() which will
>> destroy the VFs. While in your test, the VF's driver is still there?
>>
>>>remove_one call may return NULL, so that dereferencing the priv
>>>struct results in a kernel oops.
>>
>> Sorry for my poor mind, I still can't understand this situation.
>> Would you describe the situation more? You are unloading PF's driver in Host
>> at first, and then try to release the VF's driver?
>>
>>>
>>>The fix is to also test that the dev pointer returned by
>>>pci_get_drvdata is non-NULL.
>>>
>>>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>>>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>>>---
>>> drivers/net/ethernet/mellanox/mlx4/main.c |    2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>index c187d74..a6ae089 100644
>>>--- a/drivers/net/ethernet/mellanox/mlx4/main.c
>>>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
>>>       int               pci_dev_data;
>>>       int p;
>>>
>>>-      if (priv->removed)
>>>+      if (!dev || priv->removed)
>>>               return;
>>
>> This fix looks good to me.
>>
>> As I remembered, I had this check in my first version, but I removed the check
>> on dev based on the suggestion from Bjorn. Since I agreed that there is no
>> chance for dev to be NULL. Bjorn, seems we are not correct :(
>
>Writing a driver is not an empirical process of trying things to see
>what works.  You need to actively design a consistent structure so you
>know why and when things are safe.  I object to gratuitous "dev ==
>NULL" checks because often they are just a way of patching up a driver
>design that isn't well thought-out.
>
>As I wrote before:
>
>  From the PCI core's perspective, after .probe() returns successfully,
>  we can call any driver entry point and pass the pci_dev to it, and
>  expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()
>  sort of breaks that assumption because you clear out pci_drvdata().
>  Right now, the only other entry point mlx4 really implements is
>  mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>  is NULL.  But that's ... a hack, and you'll have to do the same
>  if/when you implement suspend/resume/sriov_configure/etc.

Thanks for your kindness. After re-reading it, I understand it better: it is
not only about the Mellanox driver, but about the whole picture of how to
write a driver.

1. We should make every driver entry point safe once .probe() returns
   successfully.
2. If there is an exception, and a hack that tests pci_drvdata(), we need
   the same hack in suspend/resume/etc.

Now back to the current mlx4 driver: mlx4_remove_one() is called by .shutdown
and .remove. As I understand it, these two hooks are invoked by rmmod or
reboot. In that way the driver tries to comply with rule 1 and make sure
pci_drvdata() is valid after .probe() succeeds.

So I am curious in which case the driver breaks this rule.

Here are my suggestions:
1. To comply with rule 1, it would be better to fix that case instead of
   adding a hack.
2. Or, to comply with rule 2, the driver needs to check pci_drvdata() in every
   driver entry point instead of in just one. For example,
   mlx4_pci_slot_reset() needs this check too.
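
Suggestion 2 could take the form of one shared helper that every entry
point calls, so the check is not copied by hand. This is only a user-space
sketch of that idea; guard_priv, guard_pci_dev, guard_live_priv() and the
two entry points are hypothetical names, not functions in the mlx4 driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's private data and struct pci_dev. */
struct guard_priv {
	int removed;
};

struct guard_pci_dev {
	struct guard_priv *drvdata;
};

/* Returns the private data only if it exists and is not torn down. */
static struct guard_priv *guard_live_priv(struct guard_pci_dev *pdev)
{
	struct guard_priv *priv = pdev->drvdata;

	return (priv && !priv->removed) ? priv : NULL;
}

/* Entry point 1: returns 1 if teardown ran, 0 if there was nothing to do. */
static int guard_remove(struct guard_pci_dev *pdev)
{
	struct guard_priv *priv = guard_live_priv(pdev);

	if (!priv)
		return 0;
	priv->removed = 1;
	return 1;
}

/* Entry point 2: returns 1 if the device was live, 0 otherwise. */
static int guard_suspend(struct guard_pci_dev *pdev)
{
	return guard_live_priv(pdev) != NULL;
}

int guard_demo(void)
{
	struct guard_priv priv = { .removed = 0 };
	struct guard_pci_dev pdev = { .drvdata = &priv };

	assert(guard_suspend(&pdev) == 1);
	assert(guard_remove(&pdev) == 1);
	assert(guard_suspend(&pdev) == 0);
	assert(guard_remove(&pdev) == 0);
	pdev.drvdata = NULL;
	assert(guard_suspend(&pdev) == 0);
	return 0;
}
```

This keeps the check consistent across entry points, but it is still the
kind of defensive test Bjorn objects to; suggestion 1 avoids needing it.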

Bjorn, thanks again, hope my understanding this time is correct :-)

>
>Bjorn
Or Gerlitz June 3, 2014, 8:15 a.m. UTC | #7
On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Writing a driver is not an empirical process of trying things to see
> what works.  You need to actively design a consistent structure so you
> know why and when things are safe.  I object to gratuitous "dev ==
> NULL" checks because often they are just a way of patching up a driver
> design that isn't well thought-out.

Bjorn, first and foremost -- agreed.

Next, to be precise, the use case of rebooting the host while the
driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
commit befdf89 and is now broken.

Reading your response further, I understand that the code was probably
using a sort of hackish branching to make that happen, and you
suggest we rewrite that section properly so it can serve us well when
we (hopefully soon) implement sriov_configure and possibly also
suspend/resume. Point taken.

Dave, as for this patch, again, the regression of being unable to reboot
the host node while the driver is loaded exists in the latest upstream
code as of befdf89 / 3.15-rc1.

Now, taking into account that 3.15 is after rc8 and the IL devel team
has a holiday this week, I don't see us coming up in time with a
deeper fix for 3.15, so maybe you can go ahead and merge this
one-liner for 3.15?

Or.


> As I wrote before:
>   From the PCI core's perspective, after .probe() returns successfully,
>   we can call any driver entry point and pass the pci_dev to it, and
>   expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()
>   sort of breaks that assumption because you clear out pci_drvdata().
>   Right now, the only other entry point mlx4 really implements is
>   mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>   is NULL.  But that's ... a hack, and you'll have to do the same
>   if/when you implement suspend/resume/sriov_configure/etc.
Wei Yang June 3, 2014, 8:40 a.m. UTC | #8
On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote:
>On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Writing a driver is not an empirical process of trying things to see
>> what works.  You need to actively design a consistent structure so you
>> know why and when things are safe.  I object to gratuitous "dev ==
>> NULL" checks because often they are just a way of patching up a driver
>> design that isn't well thought-out.
>
>Bjorn, 1st and most -- Agreed.
>
>Next, to be precise, the use case of rebooting the host while the
>driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
>commit befdf89 and is now broken.
>
>Reading further your response, I understand that the code was probably
>using a sort of hackish branching to make that to happen, and you
>suggest we re-write that section properly so it can serve well when
>(hopefully soon) implement
>sriov_configure and possibly also suspend/resume, point taken.
>
>Dave, as for this patch, again, the regression of inability to reboot
>the host node
>while the driver is loaded exists in the latest upstream code as of
>befdf89 / 3.15-rc1
>
>Now, taking into account that 3.15 is after rc8 and the IL devel team
>has a holiday this week, I don't see us coming in time with a
>deeper fix for 3.15, so maybe you can eventually go and merge this one
>liner for 3.15?

I am glad to verify your patch, if you wish.

>
>Or.
>
>
>> As I wrote before:
>>   From the PCI core's perspective, after .probe() returns successfully,
>>   we can call any driver entry point and pass the pci_dev to it, and
>>   expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()
>>   sort of breaks that assumption because you clear out pci_drvdata().
>>   Right now, the only other entry point mlx4 really implements is
>>   mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>>   is NULL.  But that's ... a hack, and you'll have to do the same
>>   if/when you implement suspend/resume/sriov_configure/etc.
Wei Yang June 4, 2014, 9:50 a.m. UTC | #9
On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote:
>On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Writing a driver is not an empirical process of trying things to see
>> what works.  You need to actively design a consistent structure so you
>> know why and when things are safe.  I object to gratuitous "dev ==
>> NULL" checks because often they are just a way of patching up a driver
>> design that isn't well thought-out.
>
>Bjorn, 1st and most -- Agreed.
>
>Next, to be precise, the use case of rebooting the host while the
>driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
>commit befdf89 and is now broken.
>
>Reading further your response, I understand that the code was probably
>using a sort of hackish branching to make that to happen, and you
>suggest we re-write that section properly so it can serve well when
>(hopefully soon) implement
>sriov_configure and possibly also suspend/resume, point taken.
>
>Dave, as for this patch, again, the regression of inability to reboot
>the host node
>while the driver is loaded exists in the latest upstream code as of
>befdf89 / 3.15-rc1
>
>Now, taking into account that 3.15 is after rc8 and the IL devel team
>has a holiday this week, I don't see us coming in time with a
>deeper fix for 3.15, so maybe you can eventually go and merge this one
>liner for 3.15?
>
>Or.

Hi, Or,

I ran some tests following your steps to reproduce the case. Below is my
analysis:

I ran "rmmod mlx4_core" and "kexec" after probing the Mellanox driver. Below
are the logs from the two steps, respectively.

[root@tian-lp1 ywywyang]# rmmod mlx4_core 
[  534.159740] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[  534.161272] mlx4_core 0003:05:00.0: Received reset from slave:1
[  534.161509] mlx4_core 0003:05:00.0: mlx4_remove_one: called
[  534.170823] mlx4_core 0003:05:00.0: Disabling SR-IOV

[root@tian-lp1 ywywyang]# kexec -e 
[  669.089322] kvm: exiting hardware virtualization
[  669.091746] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[  669.326754] mlx4_core 0003:05:00.0: Received reset from slave:1
[  674.488417] lpfc 0006:01:00.4: 2:2885 Port Status Event: port status reg 0x81000000, port smphr reg 0xc000, error 1=0x9f000001, error 2=0xa9fa47fd
[  675.618578] mlx4_core 0003:05:00.0: mlx4_remove_one: called
[  675.691278] mlx4_en 0003:05:00.0: removed PHC
[  675.700414] mlx4_core 0003:05:00.0: Disabling SR-IOV
[  675.700630] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[  675.700701] Unable to handle kernel paging request for data at address 0x00000370
[  675.700769] Faulting instruction address: 0xd00000001a13fb88
[  675.700826] Oops: Kernel access of bad area, sig: 11 [#1]
[---]

During rmmod the driver works fine, but during kexec there is an oops. The
kexec path is almost the same as reboot. We see that the driver for PCI device
0003:05:00.1 is "removed" twice, and the second call triggers the error.

rmmod and kexec call different driver entry points: rmmod -> .remove and
kexec -> .shutdown. I think this is why there is an oops during reboot.
After .shutdown the driver is not detached, so when there are VFs, both
.shutdown and .remove are invoked on the VF.

From a quick glance at the e1000e driver, its .shutdown and .remove behave
differently. So maybe .shutdown needs different handling from .remove.
That said, adding a check in .remove is a quick fix for this case.
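
The two call sequences in the logs can be replayed with a small user-space
model. It is a sketch of the ordering described in this analysis, under the
assumption that a device stays bound after .shutdown and that disabling
SR-IOV on the PF unbinds the VF; seq_dev and the seq_* functions are
hypothetical names, and null_calls counts the calls that would have hit the
NULL dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one PCI function (PF or VF). */
struct seq_dev {
	int has_drvdata;	/* 1 while driver data is set */
	int bound;		/* 1 while the driver is attached */
	int null_calls;		/* remove calls that found no driver data */
	struct seq_dev *vf;	/* non-NULL only for the PF */
};

static void seq_unbind(struct seq_dev *d);

/* Models the shared remove routine used by both .remove and .shutdown. */
static void seq_remove_one(struct seq_dev *d)
{
	if (!d->has_drvdata) {
		d->null_calls++;	/* the unguarded code would oops here */
		return;
	}
	if (d->vf)
		seq_unbind(d->vf);	/* "Disabling SR-IOV" removes the VF */
	d->has_drvdata = 0;
}

/* .shutdown: runs the remove routine but leaves the driver attached. */
static void seq_shutdown(struct seq_dev *d)
{
	seq_remove_one(d);
}

/* .remove: only invoked while the driver is still attached. */
static void seq_unbind(struct seq_dev *d)
{
	if (!d->bound)
		return;
	seq_remove_one(d);
	d->bound = 0;
}

/* Returns how many VF remove calls saw missing driver data. */
int seq_run(int use_shutdown)
{
	struct seq_dev vf = { 1, 1, 0, NULL };
	struct seq_dev pf = { 1, 1, 0, &vf };

	if (use_shutdown) {		/* kexec / reboot path */
		seq_shutdown(&vf);
		seq_shutdown(&pf);
	} else {			/* rmmod path */
		seq_unbind(&vf);
		seq_unbind(&pf);
	}
	return vf.null_calls;
}
```

In this model seq_run(0), the rmmod path, yields no call on missing data,
while seq_run(1), the kexec path, yields exactly one, which matches the
second "mlx4_remove_one: called" line for 0003:05:00.1 in the kexec log.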

This is my draft analysis for your reference; I hope it is correct and helps
you to some extent.

Have a good day :-)
Or Gerlitz June 8, 2014, 9:16 a.m. UTC | #10
On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
>   From the PCI core's perspective, after .probe() returns successfully,
>   we can call any driver entry point and pass the pci_dev to it, and
>   expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()

Note that __mlx4_remove_one() is what is called from mlx4_pci_err_detected(),
and it is built in a way that allows it to be called twice.

In that respect, I agree with the fix Wei Yang provided in this thread, which
essentially makes .shutdown behave in a similar way and call
__mlx4_remove_one(), and I will submit it for inclusion.
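
A rough user-space sketch of that direction, assuming .shutdown calls only
the repeat-safe inner teardown and .remove additionally clears the driver
data. The names fix_priv, fix_pci_dev and the fix_* functions are
hypothetical; the actual patch is not shown in this thread and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's private data and struct pci_dev. */
struct fix_priv {
	int removed;
	int teardowns;	/* how many times real teardown work ran */
};

struct fix_pci_dev {
	struct fix_priv *drvdata;
};

/* Models the inner teardown: safe to call twice, keeps drvdata. */
static void fix_inner_remove(struct fix_pci_dev *pdev)
{
	struct fix_priv *priv = pdev->drvdata;

	assert(priv != NULL);	/* invariant: valid until fix_remove() ends */
	if (priv->removed)
		return;
	priv->removed = 1;
	priv->teardowns++;
}

/* .shutdown: inner teardown only, so a later .remove still finds drvdata. */
static void fix_shutdown(struct fix_pci_dev *pdev)
{
	fix_inner_remove(pdev);
}

/* .remove: inner teardown, then the only place that clears drvdata. */
static void fix_remove(struct fix_pci_dev *pdev)
{
	fix_inner_remove(pdev);
	pdev->drvdata = NULL;
}

/* Replays .shutdown followed by .remove on one device. */
int fix_demo(void)
{
	struct fix_priv priv = { .removed = 0, .teardowns = 0 };
	struct fix_pci_dev pdev = { .drvdata = &priv };

	fix_shutdown(&pdev);
	fix_remove(&pdev);
	return priv.teardowns;
}
```

Here fix_demo() returns 1: the teardown work runs once, and the second call
finds valid driver data with the removed flag set, so no NULL test is needed.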

>   sort of breaks that assumption because you clear out pci_drvdata().
>   Right now, the only other entry point mlx4 really implements is
>   mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>   is NULL.  But that's ... a hack, and you'll have to do the same
>   if/when you implement suspend/resume/sriov_configure/etc.

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c187d74..a6ae089 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2629,7 +2629,7 @@  static void __mlx4_remove_one(struct pci_dev *pdev)
 	int               pci_dev_data;
 	int p;
 
-	if (priv->removed)
+	if (!dev || priv->removed)
 		return;
 
 	pci_dev_data = priv->pci_dev_data;