[net,2/2] net/mlx4_core: mlx4_init_slave() shouldn't access comm channel before PF is ready

Message ID 1394123297-7878-3-git-send-email-amirv@mellanox.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Amir Vadai March 6, 2014, 4:28 p.m. UTC
Currently, the PF's call to pci_enable_sriov() from the PF probe function
stalls for 10 seconds times the number of VFs probed on the host. This
happens because the way such VFs determine whether PF initialization
has finished is by attempting to issue a reset on the comm channel and
waiting for it to time out (after 10 s).
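To make the scaling concrete, here is a minimal sketch of the stall arithmetic, assuming the VF probes run one after another and each waits out the full 10-second reset timeout. The macro and function names are hypothetical, not taken from mlx4.

```c
/* Hypothetical constant: the comm-channel reset timeout quoted in the
 * commit message. */
#define COMM_CHAN_RESET_TIMEOUT_SEC 10u

/* Pre-patch worst case, assuming each VF probed on the host during
 * pci_enable_sriov() serially waits out one full reset timeout. */
static unsigned int pre_patch_stall_sec(unsigned int num_vfs)
{
	return COMM_CHAN_RESET_TIMEOUT_SEC * num_vfs;
}
```

Under those assumptions, 8 VFs would hold the PF probe for 80 seconds.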

The PF probe function is called from a kernel workqueue, and therefore
during that time the RCU lock is held and the kernel's workqueue is
stalled. This blocks other processes that try to use the workqueue
or the RCU lock. For example, interface renaming, which calls
synchronize_rcu(), is blocked until systemd times it out.

Changed mlx4_init_slave() so that a VF probed on the host detects
immediately that the PF is not ready and returns -EPROBE_DEFER.

Such VFs are allowed to access the comm channel only once the PF
has finished initialization.
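A minimal userspace model of the change, for illustration only: C11 atomics stand in for the kernel's atomic_t, and model_init_slave() and model_enable_sriov() are hypothetical stand-ins for mlx4_init_slave() and the pci_enable_sriov() call site. VFs probed while the counter is raised return -EPROBE_DEFER at once instead of waiting for a timeout.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Kernel value of EPROBE_DEFER, redefined for this userspace model. */
#define EPROBE_DEFER 517

/* Global counter, as in the patch: non-zero while any PF is inside
 * pci_enable_sriov(). Static storage, so it starts at zero. */
static atomic_int pf_loading;

/* Stand-in for the new check at the top of mlx4_init_slave(). */
static int model_init_slave(void)
{
	if (atomic_load(&pf_loading)) {
		fprintf(stderr, "PF is not ready. Deferring probe\n");
		return -EPROBE_DEFER;
	}
	/* The real function would now reset and use the comm channel. */
	return 0;
}

/* Stand-in for the PF side: the VF probes that pci_enable_sriov()
 * triggers on the host run while pf_loading is raised. Returns how many
 * of them deferred instead of waiting for a comm-channel timeout. */
static int model_enable_sriov(int num_vfs)
{
	int deferred = 0;

	atomic_fetch_add(&pf_loading, 1);
	for (int i = 0; i < num_vfs; i++)
		if (model_init_slave() == -EPROBE_DEFER)
			deferred++;
	atomic_fetch_sub(&pf_loading, 1);
	return deferred;
}
```

Once the counter drops back to zero, the same check lets a retried VF proceed to the comm channel.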

This issue and fix are relevant only for VFs probed on the hypervisor:
there is no way to pass this information to a VM until the comm channel
is ready, so in a VM, if the PF is not ready, the first command will
time out after 10 seconds and return -EPROBE_DEFER.
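The host/VM asymmetry can be sketched as follows, assuming (as the paragraph above states) that a VF inside a VM cannot see the host's flag and only learns the PF is not ready when its first command times out. The struct and function names are hypothetical.

```c
#include <stdbool.h>

#define EPROBE_DEFER 517     /* kernel value, redefined for this model */
#define CMD_TIMEOUT_SEC 10   /* first-command timeout from the commit message */

/* Hypothetical result of one VF probe attempt. */
struct vf_probe_result {
	int err;             /* 0 or -EPROBE_DEFER */
	int waited_sec;      /* time spent before returning */
};

/* on_host: the VF can read the PF-loading flag directly.
 * pf_ready: the PF has finished initialization. */
static struct vf_probe_result vf_probe(bool on_host, bool pf_ready)
{
	struct vf_probe_result r = { 0, 0 };

	if (pf_ready)
		return r;                        /* comm channel usable */
	if (!on_host)
		r.waited_sec = CMD_TIMEOUT_SEC;  /* VM: first command times out */
	r.err = -EPROBE_DEFER;                   /* both cases defer the probe */
	return r;
}
```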

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

David Miller March 6, 2014, 8:12 p.m. UTC | #1
From: Amir Vadai <amirv@mellanox.com>
Date: Thu,  6 Mar 2014 18:28:17 +0200

> @@ -150,6 +150,8 @@ struct mlx4_port_config {
>  	struct pci_dev *pdev;
>  };
>  
> +static atomic_t pf_loading = ATOMIC_INIT(0);
> +
>  int mlx4_check_port_params(struct mlx4_dev *dev,
>  			   enum mlx4_port_type *port_type)
>  {
> @@ -1407,6 +1409,11 @@ static int mlx4_init_slave(struct mlx4_dev *dev)
>  	u32 slave_read;
>  	u32 cmd_channel_ver;
>  
> +	if (atomic_read(&pf_loading)) {
> +		mlx4_warn(dev, "PF is not ready. Deferring probe\n");
> +		return -EPROBE_DEFER;
> +	}
> +
 ...
> @@ -2319,7 +2326,11 @@ static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data)
>  
>  		if (num_vfs) {
>  			mlx4_warn(dev, "Enabling SR-IOV with %d VFs\n", num_vfs);
> +
> +			atomic_inc(&pf_loading);
>  			err = pci_enable_sriov(pdev, num_vfs);
> +			atomic_dec(&pf_loading);
> +

This synchronization scheme doesn't look right to me at all.

It's global, so VFs for _any_ PF will defer their probe while one PF is enabling
SRIOV.

It doesn't seem correct to cause unrelated VFs to defer the probe.

You absolutely have to maintain this state at least on a per-PF
level.
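One way to read the per-PF requirement, as a hedged userspace sketch rather than the driver's actual design: the loading flag moves into a hypothetical per-PF structure, and each VF consults only its own parent. In the kernel, a VF's parent PF device can be obtained with pci_physfn(); here that link is modelled as a plain pointer, and C11 atomics stand in for atomic_t.

```c
#include <stdatomic.h>

#define EPROBE_DEFER 517   /* kernel value, redefined for this model */

/* Hypothetical per-PF state; not the mlx4 driver's real structures. */
struct pf_state {
	atomic_int loading;        /* >0 while this PF is in pci_enable_sriov() */
};

/* Hypothetical per-VF state: each VF knows its own parent PF. */
struct vf_state {
	struct pf_state *parent;
};

static void pf_begin_sriov(struct pf_state *pf)
{
	atomic_fetch_add(&pf->loading, 1);
}

static void pf_end_sriov(struct pf_state *pf)
{
	atomic_fetch_sub(&pf->loading, 1);
}

/* Only the VF's own PF can make it defer; other PFs are ignored. */
static int vf_init(const struct vf_state *vf)
{
	if (atomic_load(&vf->parent->loading))
		return -EPROBE_DEFER;
	return 0;
}
```

With this layout, PF2 enabling SR-IOV has no effect on PF1's VFs, which addresses the objection above; note that the patch as applied later in this thread keeps the global counter.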
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz March 6, 2014, 8:19 p.m. UTC | #2
On Thu, Mar 6, 2014 at 10:12 PM, David Miller <davem@davemloft.net> wrote:
> From: Amir Vadai <amirv@mellanox.com>
> Date: Thu,  6 Mar 2014 18:28:17 +0200
>
> This synchronization scheme doesn't look right to me at all.
> It's global, so VFs for _any_ PF will defer their probe while one PF is enabling SRIOV.
> It doesn't seem correct to cause unrelated VFs to defer the probe.

Hi Dave,

Can you please elaborate a bit on why you find this approach
incorrect? Basically, these nested VF probes are a bit of a headache
anyway, so we didn't find such global deferring to be problematic.

Or.

> You absolutely have to maintain this state at least on a per-PF level.
David Miller March 6, 2014, 8:48 p.m. UTC | #3
From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Thu, 6 Mar 2014 22:19:48 +0200

> On Thu, Mar 6, 2014 at 10:12 PM, David Miller <davem@davemloft.net> wrote:
>> From: Amir Vadai <amirv@mellanox.com>
>> Date: Thu,  6 Mar 2014 18:28:17 +0200
>>
>> This synchronization scheme doesn't look right to me at all.
>> It's global, so VFs for _any_ PF will defer their probe while one PF is enabling SRIOV.
>> It doesn't seem correct to cause unrelated VFs to defer the probe.
> 
> Hi Dave,
> 
> Can you please elaborate a bit on why you find this approach
> incorrect? Basically, these nested VF probes are a bit of a headache
> anyway, so we didn't find such global deferring to be problematic.

What if a second PF starts to init and calls pci_enable_sriov() while the VFs
probed by a previous PF are calling mlx4_init_slave()?

It will increment pf_loading and force those unrelated VFs to defer.

You must have a per-PF value to block the underlying VFs, rather than a global
one.
Or Gerlitz March 6, 2014, 9:08 p.m. UTC | #4
On Thu, Mar 6, 2014 at 10:48 PM, David Miller <davem@davemloft.net> wrote:
> From: Or Gerlitz <or.gerlitz@gmail.com>
> Date: Thu, 6 Mar 2014 22:19:48 +0200
>> On Thu, Mar 6, 2014 at 10:12 PM, David Miller <davem@davemloft.net> wrote:

>>> This synchronization scheme doesn't look right to me at all.
>>> It's global, so VFs for _any_ PF will defer their probe while one PF is enabling SRIOV.
>>> It doesn't seem correct to cause unrelated VFs to defer the probe.

>> Can you please elaborate a bit on why you find this approach
>> incorrect? Basically, these nested VF probes are a bit of a headache
>> anyway, so we didn't find such global deferring to be problematic.

> What if a second PF starts to init and calls pci_enable_sriov() while the VFs
> probed by a previous PF are calling mlx4_init_slave()?
> It will increment pf_loading and force those unrelated VFs to defer.

By "unrelated VFs" I assume you mean the VFs that belong to the
1st PF, which are OK to probe, right? So yes, this is a somewhat
conservative approach that waits until all PFs are fully ready. I
understand you don't like it, but I would still be happy to know
what's wrong with doing so.

> You must have a per-PF value to block the underlying VFs, rather than a global
> one.
David Miller March 6, 2014, 9:12 p.m. UTC | #5
From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Thu, 6 Mar 2014 23:08:36 +0200

> On Thu, Mar 6, 2014 at 10:48 PM, David Miller <davem@davemloft.net> wrote:
>> From: Or Gerlitz <or.gerlitz@gmail.com>
>> Date: Thu, 6 Mar 2014 22:19:48 +0200
>>> On Thu, Mar 6, 2014 at 10:12 PM, David Miller <davem@davemloft.net> wrote:
> 
>>>> This synchronization scheme doesn't look right to me at all.
>>>> It's global, so VFs for _any_ PF will defer their probe while one PF is enabling SRIOV.
>>>> It doesn't seem correct to cause unrelated VFs to defer the probe.
> 
>>> Can you please elaborate a bit on why you find this approach
>>> incorrect? Basically, these nested VF probes are a bit of a headache
>>> anyway, so we didn't find such global deferring to be problematic.
> 
>> What if a second PF starts to init and calls pci_enable_sriov() while the VFs
>> probed by a previous PF are calling mlx4_init_slave()?
>> It will increment pf_loading and force those unrelated VFs to defer.
> 
> By "unrelated VFs" I assume you mean the VFs that belong to the
> 1st PF, which are OK to probe, right? So yes, this is a somewhat
> conservative approach that waits until all PFs are fully ready. I
> understand you don't like it, but I would still be happy to know
> what's wrong with doing so.

My understanding is that the relationship between these devices is:

	PF --> VF1, VF2, VF3, ...

and these VF children are (essentially) instantiated by
pci_enable_sriov() calls.

Therefore if we:

	probe PF1

we go:

	pf_loading++
	pci_enable_sriov();
	PF1_VF1 defers
	PF1_VF2 defers
	PF1_VF3 defers
	...
	pf_loading--

next:

	probe PF2

	pf_loading++
..

At this point any attempt by PF1's VFs to init will defer; what will
cause them to properly retry that init?
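The timeline above can be replayed with a small userspace model of the global counter (hypothetical names; C11 atomics stand in for atomic_t). It shows the case being raised: PF1's VFs that retry while PF2 holds pf_loading are deferred again, even though PF1 itself has finished.

```c
#include <stdatomic.h>

#define EPROBE_DEFER 517   /* kernel value, redefined for this model */

static atomic_int pf_loading;   /* global, as in the patch; starts at 0 */

/* Stand-in for the check added to mlx4_init_slave(). */
static int vf_init(void)
{
	return atomic_load(&pf_loading) ? -EPROBE_DEFER : 0;
}

/* Replays the timeline: PF1 probes with its VFs nested inside, then PF2
 * starts probing while PF1's VFs retry. Reports how many of PF1's VFs
 * deferred in each phase. */
static void replay_timeline(int pf1_vfs, int *during_pf1, int *during_pf2)
{
	*during_pf1 = 0;
	*during_pf2 = 0;

	atomic_fetch_add(&pf_loading, 1);          /* probe PF1 */
	for (int i = 0; i < pf1_vfs; i++)
		if (vf_init() == -EPROBE_DEFER)    /* PF1_VFn defers */
			(*during_pf1)++;
	atomic_fetch_sub(&pf_loading, 1);

	atomic_fetch_add(&pf_loading, 1);          /* probe PF2 */
	for (int i = 0; i < pf1_vfs; i++)
		if (vf_init() == -EPROBE_DEFER)    /* PF1's VFs retry, defer again */
			(*during_pf2)++;
	atomic_fetch_sub(&pf_loading, 1);
}
```

The model only shows that such a retry defers again; whether something schedules yet another retry is the question the following messages settle.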
Or Gerlitz March 6, 2014, 9:58 p.m. UTC | #6
On Thu, Mar 6, 2014 at 11:12 PM, David Miller <davem@davemloft.net> wrote:
> From: Or Gerlitz <or.gerlitz@gmail.com>
> Date: Thu, 6 Mar 2014 23:08:36 +0200
>> On Thu, Mar 6, 2014 at 10:48 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Or Gerlitz <or.gerlitz@gmail.com>
>>> Date: Thu, 6 Mar 2014 22:19:48 +0200

>>>>> This synchronization scheme doesn't look right to me at all.
>>>>> It's global, so VFs for _any_ PF will defer their probe while one PF is enabling SRIOV.
>>>>> It doesn't seem correct to cause unrelated VFs to defer the probe.
>>>> Can you please elaborate a bit on why you find this approach
>>>> incorrect? Basically, these nested VF probes are a bit of a headache
>>>> anyway, so we didn't find such global deferring to be problematic.

>>> What if a second PF starts to init and calls pci_enable_sriov() while the VFs
>>> probed by a previous PF are calling mlx4_init_slave()?
>>> It will increment pf_loading and force those unrelated VFs to defer.

>> By "unrelated VFs" I assume you mean the VFs that belong to the
>> 1st PF, which are OK to probe, right? So yes, this is a somewhat
>> conservative approach that waits until all PFs are fully ready. I
>> understand you don't like it, but I would still be happy to know
>> what's wrong with doing so.

> My understanding is that the relationship between these devices is:
>         PF --> VF1, VF2, VF3, ...
>
> and these VF children are (essentially) instantiated by
> pci_enable_sriov() calls.
>
> Therefore if we:
>
>         probe PF1
>
> we go:
>
>         pf_loading++
>         pci_enable_sriov();
>         PF1_VF1 defers
>         PF1_VF2 defers
>         PF1_VF3 defers
>         ...
>         pf_loading--
>
> next:
>
>         probe PF2
>
>         pf_loading++
> ..

Correct, that would be the situation.

> At this point any attempt by PF1's VFs to init will defer; what will
> cause them to properly retry that init?

So... we were thinking that there is a mechanism that causes them to
retry that init for as long as they return -EPROBE_DEFER, until they
succeed. Isn't that the case?
David Miller March 6, 2014, 10:09 p.m. UTC | #7
From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Thu, 6 Mar 2014 23:58:19 +0200

> So... we were thinking that there is a mechanism that causes them to
> retry that init for as long as they return -EPROBE_DEFER, until they
> succeed. Isn't that the case?

Indeed.  When the PF returns from its probe, all VFs that deferred
will retry.

Both patches applied, thanks.
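The retry behaviour confirmed above can be modelled with a simplified, synchronous sketch of deferred probing. It is not the driver core's code: in the kernel the retries are driven from drivers/base/dd.c (for example via driver_deferred_probe_trigger() after a driver binds) and run asynchronously. All names below are hypothetical. A PF probe that returns 0 triggers a pass over the pending list, so the VFs that deferred while it was loading bind on retry.

```c
#include <stdbool.h>

#define EPROBE_DEFER 517   /* kernel value, redefined for this model */
#define MAX_PENDING 16

/* Hypothetical device: a probe callback plus a "bound" flag. */
struct model_dev {
	int (*probe)(struct model_dev *dev);
	bool bound;
};

static struct model_dev *pending[MAX_PENDING]; /* deferred devices */
static int n_pending;
static bool pf_loading;                         /* PF is inside its probe */

static void model_probe(struct model_dev *dev);

/* VF probe: defers while its PF is still loading (cf. mlx4_init_slave()). */
static int vf_probe(struct model_dev *dev)
{
	(void)dev;
	return pf_loading ? -EPROBE_DEFER : 0;
}

static struct model_dev vfs[2] = { { vf_probe, false }, { vf_probe, false } };

/* PF probe: its VFs are probed from inside it, as with pci_enable_sriov(). */
static int pf_probe(struct model_dev *dev)
{
	(void)dev;
	pf_loading = true;
	for (int i = 0; i < 2; i++)
		model_probe(&vfs[i]);   /* nested VF probes defer */
	pf_loading = false;
	return 0;
}

/* Re-probe everything currently on the pending list once. */
static void retry_pending(void)
{
	struct model_dev *batch[MAX_PENDING];
	int n = n_pending;

	for (int i = 0; i < n; i++)
		batch[i] = pending[i];
	n_pending = 0;
	for (int i = 0; i < n; i++)
		model_probe(batch[i]);  /* may defer again */
}

static void model_probe(struct model_dev *dev)
{
	int err = dev->probe(dev);

	if (err == -EPROBE_DEFER) {
		if (n_pending < MAX_PENDING)
			pending[n_pending++] = dev;
	} else if (err == 0) {
		dev->bound = true;
		retry_pending();        /* a successful bind triggers a retry pass */
	}
}
```

Probing the PF once is enough in this model: both VFs defer during the nested probe and bind on the retry pass triggered when the PF's probe succeeds.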

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 5a6105f..30a08a6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -150,6 +150,8 @@ struct mlx4_port_config {
 	struct pci_dev *pdev;
 };
 
+static atomic_t pf_loading = ATOMIC_INIT(0);
+
 int mlx4_check_port_params(struct mlx4_dev *dev,
 			   enum mlx4_port_type *port_type)
 {
@@ -1407,6 +1409,11 @@ static int mlx4_init_slave(struct mlx4_dev *dev)
 	u32 slave_read;
 	u32 cmd_channel_ver;
 
+	if (atomic_read(&pf_loading)) {
+		mlx4_warn(dev, "PF is not ready. Deferring probe\n");
+		return -EPROBE_DEFER;
+	}
+
 	mutex_lock(&priv->cmd.slave_cmd_mutex);
 	priv->cmd.max_cmds = 1;
 	mlx4_warn(dev, "Sending reset\n");
@@ -2319,7 +2326,11 @@ static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data)
 
 		if (num_vfs) {
 			mlx4_warn(dev, "Enabling SR-IOV with %d VFs\n", num_vfs);
+
+			atomic_inc(&pf_loading);
 			err = pci_enable_sriov(pdev, num_vfs);
+			atomic_dec(&pf_loading);
+
 			if (err) {
 				mlx4_err(dev, "Failed to enable SR-IOV, continuing without SR-IOV (err = %d).\n",
 					 err);