diff mbox series

[net-next,RFC,09/13] devlink: Add enable_remote_dev_reset generic parameter

Message ID 1595847753-2234-10-git-send-email-moshe@mellanox.com
State RFC
Delegated to: David Miller
Headers show
Series Add devlink reload level option | expand

Commit Message

Moshe Shemesh July 27, 2020, 11:02 a.m. UTC
The enable_remote_dev_reset devlink param flags that the host admin
allows device resets that can be initiated by other hosts. This
parameter is useful for setups where a device is shared by different
hosts, such as multi-host setup. Once the user set this parameter to
false, the driver should NACK any attempt to reset the device while the
driver is loaded.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 Documentation/networking/devlink/devlink-params.rst | 6 ++++++
 include/net/devlink.h                               | 4 ++++
 net/core/devlink.c                                  | 5 +++++
 3 files changed, 15 insertions(+)

Comments

Jakub Kicinski July 28, 2020, 12:59 a.m. UTC | #1
On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:
> The enable_remote_dev_reset devlink param flags that the host admin
> allows device resets that can be initiated by other hosts. This
> parameter is useful for setups where a device is shared by different
> hosts, such as multi-host setup. Once the user set this parameter to
> false, the driver should NACK any attempt to reset the device while the
> driver is loaded.
> 
> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>

There needs to be a devlink event generated when reset is triggered
(remotely or not).

You're also missing failure reasons. Users need to know if the reset
request was clearly nacked by some host, not supported, etc. vs
unexpected failure.
Moshe Shemesh July 29, 2020, 2:42 p.m. UTC | #2
On 7/28/2020 3:59 AM, Jakub Kicinski wrote:
> On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:
>> The enable_remote_dev_reset devlink param flags that the host admin
>> allows device resets that can be initiated by other hosts. This
>> parameter is useful for setups where a device is shared by different
>> hosts, such as multi-host setup. Once the user set this parameter to
>> false, the driver should NACK any attempt to reset the device while the
>> driver is loaded.
>>
>> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
> There needs to be a devlink event generated when reset is triggered
> (remotely or not).
>
> You're also missing failure reasons. Users need to know if the reset
> request was clearly nacked by some host, not supported, etc. vs
> unexpected failure.


I will fix and send extack message to the user accordingly.
Jakub Kicinski July 29, 2020, 8:57 p.m. UTC | #3
On Wed, 29 Jul 2020 17:42:12 +0300 Moshe Shemesh wrote:
> On 7/28/2020 3:59 AM, Jakub Kicinski wrote:
> > On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:  
> >> The enable_remote_dev_reset devlink param flags that the host admin
> >> allows device resets that can be initiated by other hosts. This
> >> parameter is useful for setups where a device is shared by different
> >> hosts, such as multi-host setup. Once the user set this parameter to
> >> false, the driver should NACK any attempt to reset the device while the
> >> driver is loaded.
> >>
> >> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>  
> > There needs to be a devlink event generated when reset is triggered
> > (remotely or not).
> >
> > You're also missing failure reasons. Users need to know if the reset
> > request was clearly nacked by some host, not supported, etc. vs
> > unexpected failure.  
> 
> I will fix and send extack message to the user accordingly.

I'd suggest the reason codes to be somewhat standard.

The groups I can think of:
 - timeout - device did not respond to the reset request
 - device reject - FW or else has nacked the activation req
 - host incapable - one of the participating hosts (in MH) is not
   capable of handling live activation
 - host denied - one of the participating hosts has NACKed
 - host timeout - one of the p. hosts did not ack or done the procedure
   in time (e.g. has not toggled the link)
 - failed reset - the activation itself had failed
 - failed reinit - one of p. hosts was not able to cleanly come back up
Moshe Shemesh July 30, 2020, 12:08 p.m. UTC | #4
On 7/29/2020 11:57 PM, Jakub Kicinski wrote:
> On Wed, 29 Jul 2020 17:42:12 +0300 Moshe Shemesh wrote:
>> On 7/28/2020 3:59 AM, Jakub Kicinski wrote:
>>> On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:
>>>> The enable_remote_dev_reset devlink param flags that the host admin
>>>> allows device resets that can be initiated by other hosts. This
>>>> parameter is useful for setups where a device is shared by different
>>>> hosts, such as multi-host setup. Once the user set this parameter to
>>>> false, the driver should NACK any attempt to reset the device while the
>>>> driver is loaded.
>>>>
>>>> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
>>> There needs to be a devlink event generated when reset is triggered
>>> (remotely or not).
>>>
>>> You're also missing failure reasons. Users need to know if the reset
>>> request was clearly nacked by some host, not supported, etc. vs
>>> unexpected failure.
>> I will fix and send extack message to the user accordingly.
> I'd suggest the reason codes to be somewhat standard.
>
> The groups I can think of:
>   - timeout - device did not respond to the reset request
>   - device reject - FW or else has nacked the activation req
>   - host incapable - one of the participating hosts (in MH) is not
>     capable of handling live activation
>   - host denied - one of the participating hosts has NACKed
>   - host timeout - one of the p. hosts did not ack or done the procedure
>     in time (e.g. has not toggled the link)
>   - failed reset - the activation itself had failed
>   - failed reinit - one of p. hosts was not able to cleanly come back up


Sounds good, that seems to cover all options of fw_reset process to fail.
diff mbox series

Patch

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index d075fd090b3d..54c9f107c4b0 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -108,3 +108,9 @@  own name.
    * - ``region_snapshot_enable``
      - Boolean
      - Enable capture of ``devlink-region`` snapshots.
+   * - ``enable_remote_dev_reset``
+     - Boolean
+     - Enable device reset by remote host. When cleared, the device driver
+       will NACK any attempt of other host to reset the device. This parameter
+       is useful for setups where a device is shared by different hosts, such
+       as multi-host setup.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index b291cd8d6be6..125e7bb9bb82 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -420,6 +420,7 @@  enum devlink_param_generic_id {
 	DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY,
 	DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE,
 	DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
+	DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
 
 	/* add new param generic ids above here*/
 	__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -457,6 +458,9 @@  enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME "enable_roce"
 #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME "enable_remote_dev_reset"
+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)	\
 {									\
 	.id = DEVLINK_PARAM_GENERIC_ID_##_id,				\
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f1812fc620d4..4d7b0f2a6f7b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3230,6 +3230,11 @@  static const struct devlink_param devlink_param_generic[] = {
 		.name = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME,
 		.type = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE,
 	},
+	{
+		.id = DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
+		.name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME,
+		.type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE,
+	},
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)