diff mbox series

[net-next,RFC,01/13] devlink: Add reload level option to devlink reload command

Message ID 1595847753-2234-2-git-send-email-moshe@mellanox.com
State RFC
Delegated to: David Miller
Headers show
Series Add devlink reload level option | expand

Commit Message

Moshe Shemesh July 27, 2020, 11:02 a.m. UTC
Add devlink reload level to allow the user to request a specific reload
level. The level parameter is optional, if not specified then driver's
default reload level is used (backward compatible).
Reload levels supported are:
driver: driver entities re-instantiation only.
fw_reset: firmware reset and driver entities re-instantiation.
fw_live_patch: firmware live patching only.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c     |  6 ++-
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  6 ++-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  6 ++-
 drivers/net/netdevsim/dev.c                   |  6 ++-
 include/net/devlink.h                         |  6 ++-
 include/uapi/linux/devlink.h                  | 19 +++++++
 net/core/devlink.c                            | 52 +++++++++++++++++--
 7 files changed, 86 insertions(+), 15 deletions(-)

Comments

Jakub Kicinski July 28, 2020, 12:58 a.m. UTC | #1
On Mon, 27 Jul 2020 14:02:21 +0300 Moshe Shemesh wrote:
> Add devlink reload level to allow the user to request a specific reload
> level. The level parameter is optional, if not specified then driver's
> default reload level is used (backward compatible).

Please don't leave space for driver-specific behavior. The OS is
supposed to abstract device differences away.

Previously the purpose of reload was to activate new devlink params
(with driverinit cmode), now you want the ability to activate new
firmware. Let users specify their intent and their constraints.

> Reload levels supported are:
> driver: driver entities re-instantiation only.
> fw_reset: firmware reset and driver entities re-instantiation.
> fw_live_patch: firmware live patching only.

I'm concerned live_patch is not first - it's the lowest impact (since
it's live). Please make sure you clearly specify the expected behavior
for the new API.

The notion of multi-host is key for live patching, so it has to be
mentioned.

> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Jiri Pirko July 28, 2020, 1:58 p.m. UTC | #2
Tue, Jul 28, 2020 at 02:58:02AM CEST, kuba@kernel.org wrote:
>On Mon, 27 Jul 2020 14:02:21 +0300 Moshe Shemesh wrote:
>> Add devlink reload level to allow the user to request a specific reload
>> level. The level parameter is optional, if not specified then driver's
>> default reload level is used (backward compatible).
>
>Please don't leave space for driver-specific behavior. The OS is
>supposed to abstract device differences away.

But this is needed to maintain the existing behaviour which is different
for different drivers.


>
>Previously the purpose of reload was to activate new devlink params
>(with driverinit cmode), now you want the ability to activate new
>firmware. Let users specify their intent and their constraints.
>
>> Reload levels supported are:
>> driver: driver entities re-instantiation only.
>> fw_reset: firmware reset and driver entities re-instantiation.
>> fw_live_patch: firmware live patching only.
>
>I'm concerned live_patch is not first - it's the lowest impact (since
>it's live). Please make sure you clearly specify the expected behavior
>for the new API.
>
>The notion of multi-host is key for live patching, so it has to be
>mentioned.
>
>> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Keller, Jacob E July 28, 2020, 4:47 p.m. UTC | #3
On 7/28/2020 6:58 AM, Jiri Pirko wrote:
> Tue, Jul 28, 2020 at 02:58:02AM CEST, kuba@kernel.org wrote:
>> On Mon, 27 Jul 2020 14:02:21 +0300 Moshe Shemesh wrote:
>>> Add devlink reload level to allow the user to request a specific reload
>>> level. The level parameter is optional, if not specified then driver's
>>> default reload level is used (backward compatible).
>>
>> Please don't leave space for driver-specific behavior. The OS is
>> supposed to abstract device differences away.
> 
> But this is needed to maintain the existing behaviour which is different
> for different drivers.
> 

Which drivers behave differently here?
Jakub Kicinski July 28, 2020, 6:44 p.m. UTC | #4
On Tue, 28 Jul 2020 09:47:00 -0700 Jacob Keller wrote:
> On 7/28/2020 6:58 AM, Jiri Pirko wrote:
> > But this is needed to maintain the existing behaviour which is different
> > for different drivers.
> 
> Which drivers behave differently here?

I think Jiri refers to mlxsw vs mlx5.

mlxsw loads firmware on probe, by default at least. So reloading the
driver implies a FW reset. NIC drivers OTOH don't generally load FW
so they didn't reset FW.

Now since we're redefining the API from "do a reload so that driverinit
params are applied" (or "so that all netdevs get spawned in a new
netns") to "do a reset of depth X" we have to change the paradigm.

What I was trying to suggest is that we should not have to re-define
the API like this.

From user perspective what's important is what the reset achieves (and
perhaps how destructive it is). We can define the reset levels as:

$ devlink dev reload pci/0000:82:00.0 net-ns-respawn
$ devlink dev reload pci/0000:82:00.0 driver-param-init
$ devlink dev reload pci/0000:82:00.0 fw-activate

combining should be possible when user wants multiple things to happen:

$ devlink dev reload pci/0000:82:00.0 fw-activate driver-param-init


Then we have the use case of a "live reset" which is slightly
under-defined right now IMHO, but we can extend it as:

$ devlink dev reload pci/0000:82:00.0 fw-activate --live


We can also add the "reset level" specifier - for the cases where
device is misbehaving:

$ devlink dev reload pci/0000:82:00.0 level [driver|fw|hardware]


But I don't think that we can go from the current reload command
cleanly to just a level reset. The driver-specific default is a bad
smell which indicates we're changing semantics from what user wants 
to what the reset depth is. Our semantics with the patch as it stands
are in fact:
 - if you want to load new params or change netns, don't pass the level
   - the "driver default" workaround dictates the right reset level for
   param init;
 - if you want to activate new firmware - select the reset level you'd
   like from the reset level options.
Keller, Jacob E July 28, 2020, 7:18 p.m. UTC | #5
On 7/28/2020 11:44 AM, Jakub Kicinski wrote:
> On Tue, 28 Jul 2020 09:47:00 -0700 Jacob Keller wrote:
>> On 7/28/2020 6:58 AM, Jiri Pirko wrote:
>>> But this is needed to maintain the existing behaviour which is different
>>> for different drivers.
>>
>> Which drivers behave differently here?
> 
> I think Jiri refers to mlxsw vs mlx5.
> 
> mlxsw loads firmware on probe, by default at least. So reloading the
> driver implies a FW reset. NIC drivers OTOH don't generally load FW
> so they didn't reset FW.
> 

Ok.

> Now since we're redefining the API from "do a reload so that driverinit
> params are applied" (or "so that all netdevs get spawned in a new
> netns") to "do a reset of depth X" we have to change the paradigm.
> 
> What I was trying to suggest is that we should not have to re-define
> the API like this.

Ok.

> 
> From user perspective what's important is what the reset achieves (and
> perhaps how destructive it is). We can define the reset levels as:
> 
> $ devlink dev reload pci/0000:82:00.0 net-ns-respawn
> $ devlink dev reload pci/0000:82:00.0 driver-param-init
> $ devlink dev reload pci/0000:82:00.0 fw-activate
> 
> combining should be possible when user wants multiple things to happen:
> 
> $ devlink dev reload pci/0000:82:00.0 fw-activate driver-param-init
> 

Where today "driver-param-init" is the default behavior. But didn't we
just say that mlxsw also does the equivalent of fw-activate?

> 
> Then we have the use case of a "live reset" which is slightly
> under-defined right now IMHO, but we can extend it as:
> 
> $ devlink dev reload pci/0000:82:00.0 fw-activate --live
> 

Yea, I think live fw patching things aren't quite as defined yet.

> 
> We can also add the "reset level" specifier - for the cases where
> device is misbehaving:
> 
> $ devlink dev reload pci/0000:82:00.0 level [driver|fw|hardware]
> 

I guess I don't quite see how level fits in? This is orthogonal to the
other settings?

> 
> But I don't think that we can go from the current reload command
> cleanly to just a level reset. The driver-specific default is a bad
> smell which indicates we're changing semantics from what user wants 
> to what the reset depth is. Our semantics with the patch as it stands
> are in fact:
>  - if you want to load new params or change netns, don't pass the level
>    - the "driver default" workaround dictates the right reset level for
>    param init;
>  - if you want to activate new firmware - select the reset level you'd
>    like from the reset level options.
> 

I think I agree, having the "what gets reloaded" as a separate vector
makes sense and avoids confusion. For example for ice hardware,
"fw-activate" really does mean "Do an EMP reset" rather than just a
"device reset" which could be interpreted differently. ice can do
function level reset, or core device reset also. Neither of those two
resets activates firmware.

Additionally the current function load process in ice doesn't support
driver-init at all. That's something I'd like to see happen but is quite
a significant refactor for how the driver loads. We need to refactor
everything out so that devlink is created early on and factor out
load/unload into handlers that can be called by the devlink reload. As I
have primarily been focused on flash update I sort of left that for the
future because it was a huge task to solve.
Jakub Kicinski July 28, 2020, 8:06 p.m. UTC | #6
On Tue, 28 Jul 2020 12:18:30 -0700 Jacob Keller wrote:
> On 7/28/2020 11:44 AM, Jakub Kicinski wrote:
> > From user perspective what's important is what the reset achieves (and
> > perhaps how destructive it is). We can define the reset levels as:
> > 
> > $ devlink dev reload pci/0000:82:00.0 net-ns-respawn
> > $ devlink dev reload pci/0000:82:00.0 driver-param-init
> > $ devlink dev reload pci/0000:82:00.0 fw-activate
> > 
> > combining should be possible when user wants multiple things to happen:
> > 
> > $ devlink dev reload pci/0000:82:00.0 fw-activate driver-param-init
> 
> Where today "driver-param-init" is the default behavior. But didn't we
> just say that mlxsw also does the equivalent of fw-activate?

Actually the default should probably be the combination of
driver-param-init and net-ns-respawn.

My expectations would be that the driver must perform the lowest reset 
level possible that satisfies the requested functional change. 
IOW driver may do more, in fact it should be acceptable for the driver
to always for a full HW reset (unless --live or other constraint is
specified).

> > We can also add the "reset level" specifier - for the cases where
> > device is misbehaving:
> > 
> > $ devlink dev reload pci/0000:82:00.0 level [driver|fw|hardware]
> 
> I guess I don't quite see how level fits in? This is orthogonal to the
> other settings?

Yup, it is, it's already orthogonal to what reload does today, hence the
need for the "driver default" hack.

> > But I don't think that we can go from the current reload command
> > cleanly to just a level reset. The driver-specific default is a bad
> > smell which indicates we're changing semantics from what user wants 
> > to what the reset depth is. Our semantics with the patch as it stands
> > are in fact:
> >  - if you want to load new params or change netns, don't pass the level
> >    - the "driver default" workaround dictates the right reset level for
> >    param init;
> >  - if you want to activate new firmware - select the reset level you'd
> >    like from the reset level options.
> >   
> 
> I think I agree, having the "what gets reloaded" as a separate vector
> makes sense and avoids confusion. For example for ice hardware,
> "fw-activate" really does mean "Do an EMP reset" rather than just a
> "device reset" which could be interpreted differently. ice can do
> function level reset, or core device reset also. Neither of those two
> resets activates firmware.
> 
> Additionally the current function load process in ice doesn't support
> driver-init at all. That's something I'd like to see happen but is quite
> a significant refactor for how the driver loads. We need to refactor
> everything out so that devlink is created early on and factor out
> load/unload into handlers that can be called by the devlink reload. As I
> have primarily been focused on flash update I sort of left that for the
> future because it was a huge task to solve.

Cool! That was what I was concerned about, but I didn't know any
existing driver already has the problem. "FW reset" is not nearly
a clear enough operation. We'd end up with drivers differing and 
users having to refer to vendor documentation to find out which 
"reset level" maps to what.

I think the components in ethtool-reset try to address the same
problem, and they have the notion of per-port, and per-device.
In the modern world we lack the per-host notion, but that's still 
strictly clearer than the limited API proposed here.
Moshe Shemesh July 29, 2020, 2:54 p.m. UTC | #7
On 7/28/2020 11:06 PM, Jakub Kicinski wrote:
> On Tue, 28 Jul 2020 12:18:30 -0700 Jacob Keller wrote:
>> On 7/28/2020 11:44 AM, Jakub Kicinski wrote:
>>>  From user perspective what's important is what the reset achieves (and
>>> perhaps how destructive it is). We can define the reset levels as:
>>>
>>> $ devlink dev reload pci/0000:82:00.0 net-ns-respawn
>>> $ devlink dev reload pci/0000:82:00.0 driver-param-init
>>> $ devlink dev reload pci/0000:82:00.0 fw-activate
>>>
>>> combining should be possible when user wants multiple things to happen:
>>>
>>> $ devlink dev reload pci/0000:82:00.0 fw-activate driver-param-init
>> Where today "driver-param-init" is the default behavior. But didn't we
>> just say that mlxsw also does the equivalent of fw-activate?
> Actually the default should probably be the combination of
> driver-param-init and net-ns-respawn.


What about the support of these combinations, one device needs to reset 
fw to apply the param init,

while another device can apply param-init without fw reset, but has to 
reload the driver for fw-reset.

So the support per driver will be a matrix of combinations ?

> My expectations would be that the driver must perform the lowest reset
> level possible that satisfies the requested functional change.
> IOW driver may do more, in fact it should be acceptable for the driver
> to always for a full HW reset (unless --live or other constraint is
> specified).


OK, but some combinations may still not be valid for specific driver 
even if it tries lowest level possible.

>>> We can also add the "reset level" specifier - for the cases where
>>> device is misbehaving:
>>>
>>> $ devlink dev reload pci/0000:82:00.0 level [driver|fw|hardware]
>> I guess I don't quite see how level fits in? This is orthogonal to the
>> other settings?
> Yup, it is, it's already orthogonal to what reload does today, hence the
> need for the "driver default" hack.
>
>>> But I don't think that we can go from the current reload command
>>> cleanly to just a level reset. The driver-specific default is a bad
>>> smell which indicates we're changing semantics from what user wants
>>> to what the reset depth is. Our semantics with the patch as it stands
>>> are in fact:
>>>   - if you want to load new params or change netns, don't pass the level
>>>     - the "driver default" workaround dictates the right reset level for
>>>     param init;
>>>   - if you want to activate new firmware - select the reset level you'd
>>>     like from the reset level options.
>>>    
>> I think I agree, having the "what gets reloaded" as a separate vector
>> makes sense and avoids confusion. For example for ice hardware,
>> "fw-activate" really does mean "Do an EMP reset" rather than just a
>> "device reset" which could be interpreted differently. ice can do
>> function level reset, or core device reset also. Neither of those two
>> resets activates firmware.
>>
>> Additionally the current function load process in ice doesn't support
>> driver-init at all. That's something I'd like to see happen but is quite
>> a significant refactor for how the driver loads. We need to refactor
>> everything out so that devlink is created early on and factor out
>> load/unload into handlers that can be called by the devlink reload. As I
>> have primarily been focused on flash update I sort of left that for the
>> future because it was a huge task to solve.
> Cool! That was what I was concerned about, but I didn't know any
> existing driver already has the problem. "FW reset" is not nearly
> a clear enough operation. We'd end up with drivers differing and
> users having to refer to vendor documentation to find out which
> "reset level" maps to what.
>
> I think the components in ethtool-reset try to address the same
> problem, and they have the notion of per-port, and per-device.
> In the modern world we lack the per-host notion, but that's still
> strictly clearer than the limited API proposed here.
Jakub Kicinski July 29, 2020, 9:07 p.m. UTC | #8
On Wed, 29 Jul 2020 17:54:08 +0300 Moshe Shemesh wrote:
> On 7/28/2020 11:06 PM, Jakub Kicinski wrote:
> > On Tue, 28 Jul 2020 12:18:30 -0700 Jacob Keller wrote:  
> >> On 7/28/2020 11:44 AM, Jakub Kicinski wrote:  
> >>>  From user perspective what's important is what the reset achieves (and
> >>> perhaps how destructive it is). We can define the reset levels as:
> >>>
> >>> $ devlink dev reload pci/0000:82:00.0 net-ns-respawn
> >>> $ devlink dev reload pci/0000:82:00.0 driver-param-init
> >>> $ devlink dev reload pci/0000:82:00.0 fw-activate
> >>>
> >>> combining should be possible when user wants multiple things to happen:
> >>>
> >>> $ devlink dev reload pci/0000:82:00.0 fw-activate driver-param-init  
> >> Where today "driver-param-init" is the default behavior. But didn't we
> >> just say that mlxsw also does the equivalent of fw-activate?  
> > Actually the default should probably be the combination of
> > driver-param-init and net-ns-respawn.  
> 
> What about the support of these combinations, one device needs to reset 
> fw to apply the param init, while another device can apply param-init
> without fw reset, but has to reload the driver for fw-reset.
> 
> So the support per driver will be a matrix of combinations ?

Note that there is no driver reload in my examples, driver reload is
likely not user's goal. Whatever the driver needs to reset to satisfy
the goal is fair game IMO.

It's already the case that some drivers reset FW for param init and some
don't and nobody is complaining.

We should treat constraints separate (in this set we have the live
activation which is a constraint on the reload operation).

> > My expectations would be that the driver must perform the lowest
> > reset level possible that satisfies the requested functional change.
> > IOW driver may do more, in fact it should be acceptable for the
> > driver to always for a full HW reset (unless --live or other
> > constraint is specified).  
> 
> OK, but some combinations may still not be valid for specific driver 
> even if it tries lowest level possible.

Can you give an example?
Moshe Shemesh July 30, 2020, 12:30 p.m. UTC | #9
On 7/30/2020 12:07 AM, Jakub Kicinski wrote:
> On Wed, 29 Jul 2020 17:54:08 +0300 Moshe Shemesh wrote:
>> On 7/28/2020 11:06 PM, Jakub Kicinski wrote:
>>> On Tue, 28 Jul 2020 12:18:30 -0700 Jacob Keller wrote:
>>>> On 7/28/2020 11:44 AM, Jakub Kicinski wrote:
>>>>>   From user perspective what's important is what the reset achieves (and
>>>>> perhaps how destructive it is). We can define the reset levels as:
>>>>>
>>>>> $ devlink dev reload pci/0000:82:00.0 net-ns-respawn
>>>>> $ devlink dev reload pci/0000:82:00.0 driver-param-init
>>>>> $ devlink dev reload pci/0000:82:00.0 fw-activate
>>>>>
>>>>> combining should be possible when user wants multiple things to happen:
>>>>>
>>>>> $ devlink dev reload pci/0000:82:00.0 fw-activate driver-param-init
>>>> Where today "driver-param-init" is the default behavior. But didn't we
>>>> just say that mlxsw also does the equivalent of fw-activate?
>>> Actually the default should probably be the combination of
>>> driver-param-init and net-ns-respawn.
>> What about the support of these combinations, one device needs to reset
>> fw to apply the param init, while another device can apply param-init
>> without fw reset, but has to reload the driver for fw-reset.
>>
>> So the support per driver will be a matrix of combinations ?
> Note that there is no driver reload in my examples, driver reload is
> likely not user's goal. Whatever the driver needs to reset to satisfy
> the goal is fair game IMO.


Actually, driver-param-init (cmode driverinit) implicit driver 
re-initialization.

> It's already the case that some drivers reset FW for param init and some
> don't and nobody is complaining.


Right, driver may need more than driver re-initialization for 
driver-param-init, but I think that driver re-initialization is the 
minimum for driver-param-init.

> We should treat constraints separate (in this set we have the live
> activation which is a constraint on the reload operation).
>
>>> My expectations would be that the driver must perform the lowest
>>> reset level possible that satisfies the requested functional change.
>>> IOW driver may do more, in fact it should be acceptable for the
>>> driver to always for a full HW reset (unless --live or other
>>> constraint is specified).
>> OK, but some combinations may still not be valid for specific driver
>> even if it tries lowest level possible.
> Can you give an example?


For example take the combination of fw-live-patch and param-init.

The fw-live-patch needs no re-initialization, while the param-init 
requires driver re-initialization.

So the only way to do that is to the one command after the other, not 
really combining.

Other combination, as fw-atcivate and param-init may not be valid for a 
specific driver as it doesn't support one of them and so can't even run 
one after the other.
Jakub Kicinski July 30, 2020, 11:11 p.m. UTC | #10
On Thu, 30 Jul 2020 15:30:45 +0300 Moshe Shemesh wrote:
> >>> My expectations would be that the driver must perform the lowest
> >>> reset level possible that satisfies the requested functional change.
> >>> IOW driver may do more, in fact it should be acceptable for the
> >>> driver to always for a full HW reset (unless --live or other
> >>> constraint is specified).  
> >> OK, but some combinations may still not be valid for specific driver
> >> even if it tries lowest level possible.  
> > Can you give an example?  
> 
> For example take the combination of fw-live-patch and param-init.
> 
> The fw-live-patch needs no re-initialization, while the param-init 
> requires driver re-initialization.
> 
> So the only way to do that is to the one command after the other, not 
> really combining.

You need to read my responses more carefully. I don't have
fw-live-patch in my proposal. The operation is fw-activate,
--live is independent and an constraint, not an operation.
Moshe Shemesh Aug. 1, 2020, 9:32 p.m. UTC | #11
On 7/31/2020 2:11 AM, Jakub Kicinski wrote:
> On Thu, 30 Jul 2020 15:30:45 +0300 Moshe Shemesh wrote:
>>>>> My expectations would be that the driver must perform the lowest
>>>>> reset level possible that satisfies the requested functional change.
>>>>> IOW driver may do more, in fact it should be acceptable for the
>>>>> driver to always for a full HW reset (unless --live or other
>>>>> constraint is specified).
>>>> OK, but some combinations may still not be valid for specific driver
>>>> even if it tries lowest level possible.
>>> Can you give an example?
>> For example take the combination of fw-live-patch and param-init.
>>
>> The fw-live-patch needs no re-initialization, while the param-init
>> requires driver re-initialization.
>>
>> So the only way to do that is to the one command after the other, not
>> really combining.
> You need to read my responses more carefully. I don't have
> fw-live-patch in my proposal. The operation is fw-activate,
> --live is independent and an constraint, not an operation.


OK, I probably didn't get the whole picture right.

I am not sure I got it yet, please review if that's the uAPI that you 
mean to:

devlink dev reload [ net-ns-respawn { PID | NAME | ID } ] [ 
driver-param-init ] [ fw-activate [ --live] ]


Also, I recall that before devlink param was added the devlink reload 
was used for devlink resources.

I am not sure it is still used for devlink resources as I don't see it 
in the code of devlink reload.

But if it is we probably should add it as another operation.

Jiri, please comment on that.
Jiri Pirko Aug. 3, 2020, 2:14 p.m. UTC | #12
Sat, Aug 01, 2020 at 11:32:25PM CEST, moshe@mellanox.com wrote:
>
>On 7/31/2020 2:11 AM, Jakub Kicinski wrote:
>> On Thu, 30 Jul 2020 15:30:45 +0300 Moshe Shemesh wrote:
>> > > > > My expectations would be that the driver must perform the lowest
>> > > > > reset level possible that satisfies the requested functional change.
>> > > > > IOW driver may do more, in fact it should be acceptable for the
>> > > > > driver to always for a full HW reset (unless --live or other
>> > > > > constraint is specified).
>> > > > OK, but some combinations may still not be valid for specific driver
>> > > > even if it tries lowest level possible.
>> > > Can you give an example?
>> > For example take the combination of fw-live-patch and param-init.
>> > 
>> > The fw-live-patch needs no re-initialization, while the param-init
>> > requires driver re-initialization.
>> > 
>> > So the only way to do that is to the one command after the other, not
>> > really combining.
>> You need to read my responses more carefully. I don't have
>> fw-live-patch in my proposal. The operation is fw-activate,
>> --live is independent and an constraint, not an operation.
>
>
>OK, I probably didn't get the whole picture right.
>
>I am not sure I got it yet, please review if that's the uAPI that you mean
>to:
>
>devlink dev reload [ net-ns-respawn { PID | NAME | ID } ] [ driver-param-init
>] [ fw-activate [ --live] ]

Jakub, why do you prefer to have another extra level-specific option
"live"? I think it is clear to have it as a separate level. The behaviour
of the operation is quite different.


>
>
>Also, I recall that before devlink param was added the devlink reload was
>used for devlink resources.

Yes. That was the primary usecase. That is also why mlxsw does fw reset,
because the fw reset is needed in order to pass resources configuration.

So I don't think that the name should be "driver-param-init" as it is
not specific to params.


>
>I am not sure it is still used for devlink resources as I don't see it in the
>code of devlink reload.
>
>But if it is we probably should add it as another operation.
>
>Jiri, please comment on that.
>
Jakub Kicinski Aug. 3, 2020, 8:57 p.m. UTC | #13
On Mon, 3 Aug 2020 16:14:42 +0200 Jiri Pirko wrote:
> >devlink dev reload [ net-ns-respawn { PID | NAME | ID } ] [ driver-param-init
> >] [ fw-activate [ --live] ]  
> 
> Jakub, why do you prefer to have another extra level-specific option
> "live"? I think it is clear to have it as a separate level. The behaviour
> of the operation is quite different.

I was trying to avoid having to provide a Cartesian product of
operation and system disruption level, if any other action can
be done "live" at some point.

But no strong feelings about that one.

Really, as long as there is no driver-specific defaults (or as 
little driver-specific anything as possible) and user actions 
are clearly expressed (fw-reset does not necessarily imply
fw-activation) - the API will be fine IMO.
Jiri Pirko Aug. 4, 2020, 10:04 a.m. UTC | #14
Mon, Aug 03, 2020 at 10:57:03PM CEST, kuba@kernel.org wrote:
>On Mon, 3 Aug 2020 16:14:42 +0200 Jiri Pirko wrote:
>> >devlink dev reload [ net-ns-respawn { PID | NAME | ID } ] [ driver-param-init
>> >] [ fw-activate [ --live] ]  
>> 
>> Jakub, why do you prefer to have another extra level-specific option
>> "live"? I think it is clear to have it as a separate level. The behaviour
>> of the operation is quite different.
>
>I was trying to avoid having to provide a Cartesian product of
>operation and system disruption level, if any other action can
>be done "live" at some point.
>
>But no strong feelings about that one.
>
>Really, as long as there is no driver-specific defaults (or as 
>little driver-specific anything as possible) and user actions 
>are clearly expressed (fw-reset does not necessarily imply
>fw-activation) - the API will be fine IMO.

Clear actions, that is what I'm fine with.

But not sure how you think we can achieve no driver-specific defaults.
We have them already :/ I don't think we can easily remove them and not
break user expectations.
Jakub Kicinski Aug. 4, 2020, 8:39 p.m. UTC | #15
On Tue, 4 Aug 2020 12:04:18 +0200 Jiri Pirko wrote:
> Mon, Aug 03, 2020 at 10:57:03PM CEST, kuba@kernel.org wrote:
> >I was trying to avoid having to provide a Cartesian product of
> >operation and system disruption level, if any other action can
> >be done "live" at some point.
> >
> >But no strong feelings about that one.
> >
> >Really, as long as there is no driver-specific defaults (or as 
> >little driver-specific anything as possible) and user actions 
> >are clearly expressed (fw-reset does not necessarily imply
> >fw-activation) - the API will be fine IMO.  
> 
> Clear actions, that is what I'm fine with.
> 
> But not sure how you think we can achieve no driver-specific defaults.
> We have them already :/ I don't think we can easily remove them and not
> break user expectations.

AFAIU the per-driver default is needed because we went too low 
level with what the action constitutes. We need maintain the higher
level actions.

The user clearly did not care if FW was reset during devlink reload
before this set, so what has changed? The objective user has is to
activate their config / FW / move to different net ns. 

Reloading the driver or resetting FW is a low level detail which
achieves different things for different implementations. So it's 
not a suitable abstraction -> IOW we need the driver default.


The work flow for the user is:

0. download fw to /lib/firmware
1. devlink flash $dev $fw
2. if live activation is enabled
   yes - devlink reload $dev $live-activate
   no - report machine has to be drained for reboot

fw-reset can't be $live-activate, because as Jake said fw-reset does
not activate the new image for Intel. So will we end up per-driver
defaults in the kernel space, and user space maintaining a mapping from
a driver to what a "level" of reset implies.

I hope this makes things crystal clear. Please explain what problems
you're seeing and extensions you're expecting. A list of user scenarios
you foresee would be v. useful.
Jiri Pirko Aug. 5, 2020, 11:02 a.m. UTC | #16
Tue, Aug 04, 2020 at 10:39:46PM CEST, kuba@kernel.org wrote:
>On Tue, 4 Aug 2020 12:04:18 +0200 Jiri Pirko wrote:
>> Mon, Aug 03, 2020 at 10:57:03PM CEST, kuba@kernel.org wrote:
>> >I was trying to avoid having to provide a Cartesian product of
>> >operation and system disruption level, if any other action can
>> >be done "live" at some point.
>> >
>> >But no strong feelings about that one.
>> >
>> >Really, as long as there is no driver-specific defaults (or as 
>> >little driver-specific anything as possible) and user actions 
>> >are clearly expressed (fw-reset does not necessarily imply
>> >fw-activation) - the API will be fine IMO.  
>> 
>> Clear actions, that is what I'm fine with.
>> 
>> But not sure how you think we can achieve no driver-specific defaults.
>> We have them already :/ I don't think we can easily remove them and not
>> break user expectations.
>
>AFAIU the per-driver default is needed because we went too low 
>level with what the action constitutes. We need maintain the higher
>level actions.
>
>The user clearly did not care if FW was reset during devlink reload
>before this set, so what has changed? The objective user has is to

Well for mlxsw, the user is used to this flow:
devlink dev flash - flash new fw
devlink dev reload - new fw is activated and reset and driver instances
are re-created.


>activate their config / FW / move to different net ns. 
>
>Reloading the driver or resetting FW is a low level detail which
>achieves different things for different implementations. So it's 
>not a suitable abstraction -> IOW we need the driver default.

I'm confused. So you think we need the driver default?


>
>
>The work flow for the user is:
>
>0. download fw to /lib/firmware
>1. devlink flash $dev $fw
>2. if live activation is enabled
>   yes - devlink reload $dev $live-activate
>   no - report machine has to be drained for reboot
>
>fw-reset can't be $live-activate, because as Jake said fw-reset does
>not activate the new image for Intel. So will we end up per-driver
>defaults in the kernel space, and user space maintaining a mapping from

Well, that is what what is Moshe's proposal. Per-driver kernel default..
I'm not sure what we are arguing about then :/


>a driver to what a "level" of reset implies.
>
>I hope this makes things crystal clear. Please explain what problems
>you're seeing and extensions you're expecting. A list of user scenarios
>you foresee would be v. useful.
Jakub Kicinski Aug. 6, 2020, 6:25 p.m. UTC | #17
On Wed, 5 Aug 2020 13:02:58 +0200 Jiri Pirko wrote:
> Tue, Aug 04, 2020 at 10:39:46PM CEST, kuba@kernel.org wrote:
> >AFAIU the per-driver default is needed because we went too low 
> >level with what the action constitutes. We need maintain the higher
> >level actions.
> >
> >The user clearly did not care if FW was reset during devlink reload
> >before this set, so what has changed? The objective user has is to  
> 
> Well for mlxsw, the user is used to this flow:
> devlink dev flash - flash new fw
> devlink dev reload - new fw is activated and reset and driver instances
> are re-created.

Ugh, if the current behavior already implies fw-activation for some
drivers then the default has to probably be "do all the things" :S

> >activate their config / FW / move to different net ns. 
> >
> >Reloading the driver or resetting FW is a low level detail which
> >achieves different things for different implementations. So it's 
> >not a suitable abstraction -> IOW we need the driver default.  
> 
> I'm confused. So you think we need the driver default?

No, I'm talking about the state of this patch set. _In this patchset_ 
we need a driver default because of the unsuitable abstraction.

Better design would not require it.

> >The work flow for the user is:
> >
> >0. download fw to /lib/firmware
> >1. devlink flash $dev $fw
> >2. if live activation is enabled
> >   yes - devlink reload $dev $live-activate
> >   no - report machine has to be drained for reboot
> >
> >fw-reset can't be $live-activate, because as Jake said fw-reset does
> >not activate the new image for Intel. So will we end up per-driver
> >defaults in the kernel space, and user space maintaining a mapping from  
> 
> Well, that is what what is Moshe's proposal. Per-driver kernel default..
> I'm not sure what we are arguing about then :/

The fact that if I do a pure "driver reload" it will active new
firmware for mlxsw but not for mlx5. In this patchset for mlx5 I need
driver reload fw-reset. And for Intel there is no suitable option.

> >a driver to what a "level" of reset implies.
> >
> >I hope this makes things crystal clear. Please explain what problems
> >you're seeing and extensions you're expecting. A list of user scenarios
> >you foresee would be v. useful.
Keller, Jacob E Aug. 6, 2020, 10:56 p.m. UTC | #18
On 8/6/2020 11:25 AM, Jakub Kicinski wrote:
> On Wed, 5 Aug 2020 13:02:58 +0200 Jiri Pirko wrote:
>> Tue, Aug 04, 2020 at 10:39:46PM CEST, kuba@kernel.org wrote:
>>> AFAIU the per-driver default is needed because we went too low 
>>> level with what the action constitutes. We need maintain the higher
>>> level actions.
>>>
>>> The user clearly did not care if FW was reset during devlink reload
>>> before this set, so what has changed? The objective user has is to  
>>
>> Well for mlxsw, the user is used to this flow:
>> devlink dev flash - flash new fw
>> devlink dev reload - new fw is activated and reset and driver instances
>> are re-created.
> 
> Ugh, if the current behavior already implies fw-activation for some
> drivers then the default has to probably be "do all the things" :S
> 
>>> activate their config / FW / move to different net ns. 
>>>
>>> Reloading the driver or resetting FW is a low level detail which
>>> achieves different things for different implementations. So it's 
>>> not a suitable abstraction -> IOW we need the driver default.  
>>
>> I'm confused. So you think we need the driver default?
> 
> No, I'm talking about the state of this patch set. _In this patchset_ 
> we need a driver default because of the unsuitable abstraction.
> 
> Better design would not require it.
> 
>>> The work flow for the user is:
>>>
>>> 0. download fw to /lib/firmware
>>> 1. devlink flash $dev $fw
>>> 2. if live activation is enabled
>>>   yes - devlink reload $dev $live-activate
>>>   no - report machine has to be drained for reboot
>>>
>>> fw-reset can't be $live-activate, because as Jake said fw-reset does
>>> not activate the new image for Intel. So will we end up per-driver
>>> defaults in the kernel space, and user space maintaining a mapping from  
>>
>> Well, that is what what is Moshe's proposal. Per-driver kernel default..
>> I'm not sure what we are arguing about then :/
> 
> The fact that if I do a pure "driver reload" it will active new
> firmware for mlxsw but not for mlx5. In this patchset for mlx5 I need
> driver reload fw-reset. And for Intel there is no suitable option.
> 

I want to clarify here, at least for ice: we *do* have a reset that can
activate firmware, but we have various level of reset which not all of
them do a fw activation. We have several levels of firmware reset,
including a "PF" reset that only resets data associated with that
function, a CORE reset which resets all functions, and then an EMP reset
which will activate the new firmware. For all of these resets, affected
PFs are notified over their firmware admin message queue. However, there
isn't a notion of negotiating beyond a message indicating what type of
reset is occurring.

I mostly wanted to clarify that "fw-reset" as a name doesn't necessarily
imply firmware activation. (hence separating fw-activate vs fw-reset).

Thanks,
Jake

>>> a driver to what a "level" of reset implies.
>>>
>>> I hope this makes things crystal clear. Please explain what problems
>>> you're seeing and extensions you're expecting. A list of user scenarios
>>> you foresee would be v. useful.
Moshe Shemesh Aug. 9, 2020, 1:21 p.m. UTC | #19
On 8/6/2020 9:25 PM, Jakub Kicinski wrote:
> On Wed, 5 Aug 2020 13:02:58 +0200 Jiri Pirko wrote:
>> Tue, Aug 04, 2020 at 10:39:46PM CEST, kuba@kernel.org wrote:
>>> AFAIU the per-driver default is needed because we went too low
>>> level with what the action constitutes. We need maintain the higher
>>> level actions.
>>>
>>> The user clearly did not care if FW was reset during devlink reload
>>> before this set, so what has changed? The objective user has is to
>> Well for mlxsw, the user is used to this flow:
>> devlink dev flash - flash new fw
>> devlink dev reload - new fw is activated and reset and driver instances
>> are re-created.
> Ugh, if the current behavior already implies fw-activation for some
> drivers then the default has to probably be "do all the things" :S


Okay, so devlink reload default for mlx5 will include also fw-activate 
to align with mlxsw default.

Meaning drivers that supports fw-activate will add it to the default.

The flow of devlink reload default on mlx5 will be:

If there is FW image pending and live patch is suitable to apply, do 
live patch and driver re-initialization.

If there is FW image pending but live patch doesn't fit do fw-reset and 
driver-initialization.

If no FW image pending just do driver-initialization.


I still think I should on top of that add the level option to be 
selected by the user if he prefers a specific action, so the uAPI would be:

devlink dev reload [ netns { PID | NAME | ID } ] [ level { fw-live-patch 
| driver-reinit |fw-activate } ]

But I am still missing something: fw-activate implies that it will 
activate a new FW image stored on flash, pending activation. What if the 
user wants to reset and reload the FW if no new FW pending ? Should we 
add --force option to fw-activate level ?


>>> activate their config / FW / move to different net ns.
>>>
>>> Reloading the driver or resetting FW is a low level detail which
>>> achieves different things for different implementations. So it's
>>> not a suitable abstraction -> IOW we need the driver default.
>> I'm confused. So you think we need the driver default?
> No, I'm talking about the state of this patch set. _In this patchset_
> we need a driver default because of the unsuitable abstraction.
>
> Better design would not require it.
>
>>> The work flow for the user is:
>>>
>>> 0. download fw to /lib/firmware
>>> 1. devlink flash $dev $fw
>>> 2. if live activation is enabled
>>>    yes - devlink reload $dev $live-activate
>>>    no - report machine has to be drained for reboot
>>>
>>> fw-reset can't be $live-activate, because as Jake said fw-reset does
>>> not activate the new image for Intel. So will we end up per-driver
>>> defaults in the kernel space, and user space maintaining a mapping from
>> Well, that is what what is Moshe's proposal. Per-driver kernel default..
>> I'm not sure what we are arguing about then :/
> The fact that if I do a pure "driver reload" it will active new
> firmware for mlxsw but not for mlx5. In this patchset for mlx5 I need
> driver reload fw-reset. And for Intel there is no suitable option.
>
>>> a driver to what a "level" of reset implies.
>>>
>>> I hope this makes things crystal clear. Please explain what problems
>>> you're seeing and extensions you're expecting. A list of user scenarios
>>> you foresee would be v. useful.
Jakub Kicinski Aug. 10, 2020, 4:53 p.m. UTC | #20
On Sun, 9 Aug 2020 16:21:29 +0300 Moshe Shemesh wrote:
> Okay, so devlink reload default for mlx5 will include also fw-activate 
> to align with mlxsw default.
> 
> Meaning drivers that supports fw-activate will add it to the default.

No per-driver default.

Maybe the difference between mlxsw and mlx5 can be simply explained by
the fact that mlxsw loads firmware from /lib/firmware on every probe
(more or less).

It's only natural for a driver which loads FW from disk to load it on
driver reload.

> The flow of devlink reload default on mlx5 will be:
> 
> If there is FW image pending and live patch is suitable to apply, do 
> live patch and driver re-initialization.
> 
> If there is FW image pending but live patch doesn't fit do fw-reset and 
> driver-initialization.
> 
> If no FW image pending just do driver-initialization.

This sounds too complicated. Don't try to guess what the user wants.

> I still think I should on top of that add the level option to be 
> selected by the user if he prefers a specific action, so the uAPI would be:
> 
> devlink dev reload [ netns { PID | NAME | ID } ] [ level { fw-live-patch 
> | driver-reinit |fw-activate } ]

I'm all for the level/action.

> But I am still missing something: fw-activate implies that it will 
> activate a new FW image stored on flash, pending activation. What if the 
> user wants to reset and reload the FW if no new FW pending ? Should we 
> add --force option to fw-activate level ?

Since reload does not check today if anything changed - i.e. if reload
is actually needed, neither should fw-activate, IMO. I'd expect the
"--force behavior" to be the default.
Keller, Jacob E Aug. 10, 2020, 5:09 p.m. UTC | #21
On 8/10/2020 9:53 AM, Jakub Kicinski wrote:
> On Sun, 9 Aug 2020 16:21:29 +0300 Moshe Shemesh wrote:
>> Okay, so devlink reload default for mlx5 will include also fw-activate 
>> to align with mlxsw default.
>>
>> Meaning drivers that supports fw-activate will add it to the default.
> 
> No per-driver default.
> 
> Maybe the difference between mlxsw and mlx5 can be simply explained by
> the fact that mlxsw loads firmware from /lib/firmware on every probe
> (more or less).
> 
> It's only natural for a driver which loads FW from disk to load it on
> driver reload.
> 

This seems reasonable to me as long as the drivers document this
behavior in their devlink/<driver>.rst. We shouldn't change existing
behavior. One could argue that this difference in behavior amounts to a
"driver default"... but I agree that we shouldn't enshrine that in the
interface.


>> The flow of devlink reload default on mlx5 will be:
>>
>> If there is FW image pending and live patch is suitable to apply, do 
>> live patch and driver re-initialization.
>>
>> If there is FW image pending but live patch doesn't fit do fw-reset and 
>> driver-initialization.
>>
>> If no FW image pending just do driver-initialization.
> 
> This sounds too complicated. Don't try to guess what the user wants.
> >> I still think I should on top of that add the level option to be
>> selected by the user if he prefers a specific action, so the uAPI would be:
>>
>> devlink dev reload [ netns { PID | NAME | ID } ] [ level { fw-live-patch 
>> | driver-reinit |fw-activate } ]
> 
> I'm all for the level/action.
> 

Yep, same here.

>> But I am still missing something: fw-activate implies that it will 
>> activate a new FW image stored on flash, pending activation. What if the 
>> user wants to reset and reload the FW if no new FW pending ? Should we 
>> add --force option to fw-activate level ?
> 
> Since reload does not check today if anything changed - i.e. if reload
> is actually needed, neither should fw-activate, IMO. I'd expect the
> "--force behavior" to be the default.
> 

Yep. What about if there is HW/FW that can't initiate the fw-activate
reset unless there is a pending update? I think ice firmware might
respond to the "please reset/activate" command with a specific status
code indicating that no update was pending.

I think the simplest solution is to just interpret this as a success.
Alternatively we could report a specific error to inform user that no
activation took place?
Jakub Kicinski Aug. 10, 2020, 6:17 p.m. UTC | #22
On Mon, 10 Aug 2020 10:09:20 -0700 Jacob Keller wrote:
> >> But I am still missing something: fw-activate implies that it will 
> >> activate a new FW image stored on flash, pending activation. What if the 
> >> user wants to reset and reload the FW if no new FW pending ? Should we 
> >> add --force option to fw-activate level ?  
> > 
> > Since reload does not check today if anything changed - i.e. if reload
> > is actually needed, neither should fw-activate, IMO. I'd expect the
> > "--force behavior" to be the default.
> >   
> 
> Yep. What about if there is HW/FW that can't initiate the fw-activate
> reset unless there is a pending update? I think ice firmware might
> respond to the "please reset/activate" command with a specific status
> code indicating that no update was pending.
> 
> I think the simplest solution is to just interpret this as a success.
> Alternatively we could report a specific error to inform user that no
> activation took place?

I'd do EOPNOTSUPP + extack.
Jiri Pirko Aug. 11, 2020, 5:46 a.m. UTC | #23
Mon, Aug 10, 2020 at 06:53:05PM CEST, kuba@kernel.org wrote:
>On Sun, 9 Aug 2020 16:21:29 +0300 Moshe Shemesh wrote:
>> Okay, so devlink reload default for mlx5 will include also fw-activate 
>> to align with mlxsw default.
>> 
>> Meaning drivers that supports fw-activate will add it to the default.
>
>No per-driver default.
>
>Maybe the difference between mlxsw and mlx5 can be simply explained by
>the fact that mlxsw loads firmware from /lib/firmware on every probe
>(more or less).
>
>It's only natural for a driver which loads FW from disk to load it on
>driver reload.

We don't load it on reaload... We just do reset witn activation.

>
>> The flow of devlink reload default on mlx5 will be:
>> 
>> If there is FW image pending and live patch is suitable to apply, do 
>> live patch and driver re-initialization.
>> 
>> If there is FW image pending but live patch doesn't fit do fw-reset and 
>> driver-initialization.
>> 
>> If no FW image pending just do driver-initialization.
>
>This sounds too complicated. Don't try to guess what the user wants.
>
>> I still think I should on top of that add the level option to be 
>> selected by the user if he prefers a specific action, so the uAPI would be:
>> 
>> devlink dev reload [ netns { PID | NAME | ID } ] [ level { fw-live-patch 
>> | driver-reinit |fw-activate } ]
>
>I'm all for the level/action.
>
>> But I am still missing something: fw-activate implies that it will 
>> activate a new FW image stored on flash, pending activation. What if the 
>> user wants to reset and reload the FW if no new FW pending ? Should we 
>> add --force option to fw-activate level ?
>
>Since reload does not check today if anything changed - i.e. if reload
>is actually needed, neither should fw-activate, IMO. I'd expect the
>"--force behavior" to be the default.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 954c22c79f6b..57d9d4381cb0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3935,7 +3935,7 @@  static int mlx4_restart_one_up(struct pci_dev *pdev, bool reload,
 			       struct devlink *devlink);
 
 static int mlx4_devlink_reload_down(struct devlink *devlink, bool netns_change,
-				    struct netlink_ext_ack *extack)
+				    enum devlink_reload_level level, struct netlink_ext_ack *extack)
 {
 	struct mlx4_priv *priv = devlink_priv(devlink);
 	struct mlx4_dev *dev = &priv->dev;
@@ -3951,7 +3951,7 @@  static int mlx4_devlink_reload_down(struct devlink *devlink, bool netns_change,
 	return 0;
 }
 
-static int mlx4_devlink_reload_up(struct devlink *devlink,
+static int mlx4_devlink_reload_up(struct devlink *devlink, enum devlink_reload_level level,
 				  struct netlink_ext_ack *extack)
 {
 	struct mlx4_priv *priv = devlink_priv(devlink);
@@ -3969,6 +3969,8 @@  static int mlx4_devlink_reload_up(struct devlink *devlink,
 
 static const struct devlink_ops mlx4_devlink_ops = {
 	.port_type_set	= mlx4_devlink_port_type_set,
+	.supported_reload_levels = BIT(DEVLINK_RELOAD_LEVEL_DRIVER),
+	.default_reload_level = DEVLINK_RELOAD_LEVEL_DRIVER,
 	.reload_down	= mlx4_devlink_reload_down,
 	.reload_up	= mlx4_devlink_reload_up,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index c709e9a385f6..5424e31a0f45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -89,7 +89,7 @@  mlx5_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 }
 
 static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
-				    struct netlink_ext_ack *extack)
+				    enum devlink_reload_level level, struct netlink_ext_ack *extack)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 
@@ -97,7 +97,7 @@  static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 	return 0;
 }
 
-static int mlx5_devlink_reload_up(struct devlink *devlink,
+static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_level level,
 				  struct netlink_ext_ack *extack)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
@@ -118,6 +118,8 @@  static const struct devlink_ops mlx5_devlink_ops = {
 #endif
 	.flash_update = mlx5_devlink_flash_update,
 	.info_get = mlx5_devlink_info_get,
+	.supported_reload_levels = BIT(DEVLINK_RELOAD_LEVEL_DRIVER),
+	.default_reload_level = DEVLINK_RELOAD_LEVEL_DRIVER,
 	.reload_down = mlx5_devlink_reload_down,
 	.reload_up = mlx5_devlink_reload_up,
 };
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index b01f8f2fab63..360d749eb042 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1113,7 +1113,7 @@  mlxsw_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 
 static int
 mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
-					  bool netns_change,
+					  bool netns_change, enum devlink_reload_level level,
 					  struct netlink_ext_ack *extack)
 {
 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
@@ -1126,7 +1126,7 @@  mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
 }
 
 static int
-mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink,
+mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, enum devlink_reload_level level,
 					struct netlink_ext_ack *extack)
 {
 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
@@ -1266,6 +1266,8 @@  mlxsw_devlink_trap_policer_counter_get(struct devlink *devlink,
 }
 
 static const struct devlink_ops mlxsw_devlink_ops = {
+	.supported_reload_levels	= BIT(DEVLINK_RELOAD_LEVEL_FW_RESET),
+	.default_reload_level		= DEVLINK_RELOAD_LEVEL_FW_RESET,
 	.reload_down		= mlxsw_devlink_core_bus_device_reload_down,
 	.reload_up		= mlxsw_devlink_core_bus_device_reload_up,
 	.port_type_set			= mlxsw_devlink_port_type_set,
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index ce719c830a77..680482f687f4 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -697,7 +697,7 @@  static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev);
 
 static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change,
-				struct netlink_ext_ack *extack)
+				enum devlink_reload_level level, struct netlink_ext_ack *extack)
 {
 	struct nsim_dev *nsim_dev = devlink_priv(devlink);
 
@@ -713,7 +713,7 @@  static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change,
 	return 0;
 }
 
-static int nsim_dev_reload_up(struct devlink *devlink,
+static int nsim_dev_reload_up(struct devlink *devlink, enum devlink_reload_level level,
 			      struct netlink_ext_ack *extack)
 {
 	struct nsim_dev *nsim_dev = devlink_priv(devlink);
@@ -873,6 +873,8 @@  nsim_dev_devlink_trap_policer_counter_get(struct devlink *devlink,
 }
 
 static const struct devlink_ops nsim_dev_devlink_ops = {
+	.supported_reload_levels = BIT(DEVLINK_RELOAD_LEVEL_DRIVER),
+	.default_reload_level = DEVLINK_RELOAD_LEVEL_DRIVER,
 	.reload_down = nsim_dev_reload_down,
 	.reload_up = nsim_dev_reload_up,
 	.info_get = nsim_dev_info_get,
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 19d990c8edcc..b291cd8d6be6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -985,9 +985,11 @@  enum devlink_trap_group_generic_id {
 	}
 
 struct devlink_ops {
+	unsigned long supported_reload_levels;
+	enum devlink_reload_level default_reload_level;
 	int (*reload_down)(struct devlink *devlink, bool netns_change,
-			   struct netlink_ext_ack *extack);
-	int (*reload_up)(struct devlink *devlink,
+			   enum devlink_reload_level level, struct netlink_ext_ack *extack);
+	int (*reload_up)(struct devlink *devlink, enum devlink_reload_level level,
 			 struct netlink_ext_ack *extack);
 	int (*port_type_set)(struct devlink_port *devlink_port,
 			     enum devlink_port_type port_type);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cfef4245ea5a..fa5f66db5012 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -272,6 +272,23 @@  enum {
 	DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE,
 };
 
+/**
+ * enum devlink_reload_level - Reload level.
+ * @DEVLINK_RELOAD_LEVEL_DRIVER: Driver entities re-instantiation only.
+ * @DEVLINK_RELOAD_LEVEL_FW_RESET: FW reset and driver entities re-instantiation.
+ * @DEVLINK_RELOAD_LEVEL_FW_LIVE_PATCH: FW live patching only.
+ */
+enum devlink_reload_level {
+	DEVLINK_RELOAD_LEVEL_UNSPEC,
+	DEVLINK_RELOAD_LEVEL_DRIVER,
+	DEVLINK_RELOAD_LEVEL_FW_RESET,
+	DEVLINK_RELOAD_LEVEL_FW_LIVE_PATCH,
+
+	/* Add new reload levels above */
+	__DEVLINK_RELOAD_LEVEL_MAX,
+	DEVLINK_RELOAD_LEVEL_MAX = __DEVLINK_RELOAD_LEVEL_MAX - 1
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -458,6 +475,8 @@  enum devlink_attr {
 	DEVLINK_ATTR_PORT_LANES,			/* u32 */
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
+	DEVLINK_ATTR_RELOAD_LEVEL,		/* u8 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 0ca89196a367..31b367a1612d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -462,6 +462,12 @@  static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
 	return 0;
 }
 
+static bool
+devlink_reload_level_is_supported(struct devlink *devlink, enum devlink_reload_level level)
+{
+	return test_bit(level, &devlink->ops->supported_reload_levels);
+}
+
 static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
 			   enum devlink_command cmd, u32 portid,
 			   u32 seq, int flags)
@@ -2958,21 +2964,21 @@  bool devlink_is_reload_failed(const struct devlink *devlink)
 EXPORT_SYMBOL_GPL(devlink_is_reload_failed);
 
 static int devlink_reload(struct devlink *devlink, struct net *dest_net,
-			  struct netlink_ext_ack *extack)
+			  enum devlink_reload_level level, struct netlink_ext_ack *extack)
 {
 	int err;
 
 	if (!devlink->reload_enabled)
 		return -EOPNOTSUPP;
 
-	err = devlink->ops->reload_down(devlink, !!dest_net, extack);
+	err = devlink->ops->reload_down(devlink, !!dest_net, level, extack);
 	if (err)
 		return err;
 
 	if (dest_net && !net_eq(dest_net, devlink_net(devlink)))
 		devlink_reload_netns_change(devlink, dest_net);
 
-	err = devlink->ops->reload_up(devlink, extack);
+	err = devlink->ops->reload_up(devlink, level, extack);
 	devlink_reload_failed_set(devlink, !!err);
 	return err;
 }
@@ -2980,6 +2986,7 @@  static int devlink_reload(struct devlink *devlink, struct net *dest_net,
 static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 {
 	struct devlink *devlink = info->user_ptr[0];
+	enum devlink_reload_level level;
 	struct net *dest_net = NULL;
 	int err;
 
@@ -3000,7 +3007,20 @@  static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 			return PTR_ERR(dest_net);
 	}
 
-	err = devlink_reload(devlink, dest_net, info->extack);
+	if (info->attrs[DEVLINK_ATTR_RELOAD_LEVEL])
+		level = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_LEVEL]);
+	else
+		level = devlink->ops->default_reload_level;
+
+	if (level == DEVLINK_RELOAD_LEVEL_UNSPEC || level > DEVLINK_RELOAD_LEVEL_MAX) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Invalid reload level");
+		return -EINVAL;
+	} else if (!devlink_reload_level_is_supported(devlink, level)) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Requested reload level is not supported");
+		return -EOPNOTSUPP;
+	}
+
+	err = devlink_reload(devlink, dest_net, level, info->extack);
 
 	if (dest_net)
 		put_net(dest_net);
@@ -7026,6 +7046,7 @@  static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED },
+	[DEVLINK_ATTR_RELOAD_LEVEL] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -7351,6 +7372,21 @@  static struct genl_family devlink_nl_family __ro_after_init = {
 	.n_mcgrps	= ARRAY_SIZE(devlink_nl_mcgrps),
 };
 
+static int devlink_reload_levels_verify(struct devlink *devlink)
+{
+	const struct devlink_ops *ops;
+
+	if (!devlink_reload_supported(devlink))
+		return 0;
+
+	ops = devlink->ops;
+	if (WARN_ON(ops->supported_reload_levels >= BIT(__DEVLINK_RELOAD_LEVEL_MAX) ||
+		    ops->supported_reload_levels & BIT(DEVLINK_RELOAD_LEVEL_UNSPEC) ||
+		    !(ops->supported_reload_levels & BIT(ops->default_reload_level))))
+		return -EINVAL;
+	return 0;
+}
+
 /**
  *	devlink_alloc - Allocate new devlink instance resources
  *
@@ -7371,6 +7407,11 @@  struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	if (!devlink)
 		return NULL;
 	devlink->ops = ops;
+	if (devlink_reload_levels_verify(devlink)) {
+		kfree(devlink);
+		return NULL;
+	}
+
 	xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC);
 	__devlink_net_set(devlink, &init_net);
 	INIT_LIST_HEAD(&devlink->port_list);
@@ -9599,7 +9640,8 @@  static void __net_exit devlink_pernet_pre_exit(struct net *net)
 		if (net_eq(devlink_net(devlink), net)) {
 			if (WARN_ON(!devlink_reload_supported(devlink)))
 				continue;
-			err = devlink_reload(devlink, &init_net, NULL);
+			err = devlink_reload(devlink, &init_net,
+					     devlink->ops->default_reload_level, NULL);
 			if (err && err != -EOPNOTSUPP)
 				pr_warn("Failed to reload devlink instance into init_net\n");
 		}