mbox series

[net-next,v2,00/11] devlink: Add support for region access

Message ID 1531305788-29420-1-git-send-email-valex@mellanox.com
Headers show
Series devlink: Add support for region access | expand

Message

Alex Vesker July 11, 2018, 10:42 a.m. UTC
This is a proposal which will allow access to driver defined address
regions using devlink. Each device can create its supported address
regions and register them. A device which exposes a region will allow
access to it using devlink.

The suggested implementation will allow exposing regions to the user,
reading and dumping snapshots taken from different regions. 
A snapshot represents a memory image of a region taken by the driver.

If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analyses on the snapshots to be
done.

The major benefit of this support is not only to provide access to
internal address regions which were inaccessible to the user but also
to provide an additional way to debug complex error states using the
region snapshots.

Implemented commands:
$ devlink region help
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
$ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
	address ADDRESS length length

Show all of the exposed regions with region sizes:
$ devlink region show
pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]

Delete a snapshot using:
$ devlink region del pci/0000:00:05.0/cr-space snapshot 1

Dump a snapshot:
$ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5

Read a specific part of a snapshot:
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 
	length 16
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30

For more information you can check devlink-region.8 man page

Future:
There is a plan to extend the support to include a write command
as well as performing read and dump live region

v1->v2:
-Add a parameter to enable devlink region snapshot
-Allocate snapshot memory using kvmalloc
-Introduce destructor function devlink_snapshot_data_dest_t to avoid
 double allocation

Alex Vesker (11):
  devlink: Add support for creating and destroying regions
  devlink: Add callback to query for snapshot id before snapshot create
  devlink: Add support for creating region snapshots
  devlink: Add support for region get command
  devlink: Extend the support querying for region snapshot IDs
  devlink: Add support for region snapshot delete command
  devlink: Add support for region snapshot read command
  net/mlx4_core: Add health buffer address capability
  net/mlx4_core: Add Crdump FW snapshot support
  devlink: Add generic parameters region_snapshot
  net/mlx4_core: Use devlink region_snapshot parameter

 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 239 ++++++++++
 drivers/net/ethernet/mellanox/mlx4/fw.c     |   5 +-
 drivers/net/ethernet/mellanox/mlx4/fw.h     |   1 +
 drivers/net/ethernet/mellanox/mlx4/main.c   |  52 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h                 |   8 +
 include/net/devlink.h                       |  47 ++
 include/uapi/linux/devlink.h                |  18 +
 net/core/devlink.c                          | 647 ++++++++++++++++++++++++++++
 11 files changed, 1024 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

Comments

Jakub Kicinski July 11, 2018, 6:48 p.m. UTC | #1
CC: linux-wireless, wifi chips used to have similar problem

On Wed, 11 Jul 2018 13:42:57 +0300, Alex Vesker wrote:
> This is a proposal which will allow access to driver defined address
> regions using devlink. Each device can create its supported address
> regions and register them. A device which exposes a region will allow
> access to it using devlink.
> 
> The suggested implementation will allow exposing regions to the user,
> reading and dumping snapshots taken from different regions. 
> A snapshot represents a memory image of a region taken by the driver.
> 
> If a device collects a snapshot of an address region it can be later
> exposed using devlink region read or dump commands.
> This functionality allows for future analyses on the snapshots to be
> done.
> 
> The major benefit of this support is not only to provide access to
> internal address regions which were inaccessible to the user but also
> to provide an additional way to debug complex error states using the
> region snapshots.
> 
> Implemented commands:
> $ devlink region help
> $ devlink region show [ DEV/REGION ]
> $ devlink region del DEV/REGION snapshot SNAPSHOT_ID
> $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
> $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
> 	address ADDRESS length length

You got me excited with the read without snapshot but then you say below
that it's future work :)

> Show all of the exposed regions with region sizes:
> $ devlink region show
> pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
> pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]
> 
> Delete a snapshot using:
> $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
> 
> Dump a snapshot:
> $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
> 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
> 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
> 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
> 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
> 
> Read a specific part of a snapshot:
> $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 
> 	length 16
> 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
> 
> For more information you can check devlink-region.8 man page
> 
> Future:
> There is a plan to extend the support to include a write command
> as well as performing read and dump live region

Reading live region would be very interesting and alleviate the need
for complicated ethtool dump marshalling a number of drivers started
doing (incl. nfp).  Any plans on that?

Write support I'm less excited about :)
Alex Vesker July 12, 2018, 6:21 a.m. UTC | #2
On 7/11/2018 9:48 PM, Jakub Kicinski wrote:
> CC: linux-wireless, wifi chips used to have similar problem
>
> On Wed, 11 Jul 2018 13:42:57 +0300, Alex Vesker wrote:
>> This is a proposal which will allow access to driver defined address
>> regions using devlink. Each device can create its supported address
>> regions and register them. A device which exposes a region will allow
>> access to it using devlink.
>>
>> The suggested implementation will allow exposing regions to the user,
>> reading and dumping snapshots taken from different regions.
>> A snapshot represents a memory image of a region taken by the driver.
>>
>> If a device collects a snapshot of an address region it can be later
>> exposed using devlink region read or dump commands.
>> This functionality allows for future analyses on the snapshots to be
>> done.
>>
>> The major benefit of this support is not only to provide access to
>> internal address regions which were inaccessible to the user but also
>> to provide an additional way to debug complex error states using the
>> region snapshots.
>>
>> Implemented commands:
>> $ devlink region help
>> $ devlink region show [ DEV/REGION ]
>> $ devlink region del DEV/REGION snapshot SNAPSHOT_ID
>> $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
>> $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
>> 	address ADDRESS length length
> You got me excited with the read without snapshot but then you say below
> that it's future work :)
>
>> Show all of the exposed regions with region sizes:
>> $ devlink region show
>> pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
>> pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]
>>
>> Delete a snapshot using:
>> $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>
>> Dump a snapshot:
>> $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
>> 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
>> 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
>> 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
>> 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
>>
>> Read a specific part of a snapshot:
>> $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0
>> 	length 16
>> 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
>>
>> For more information you can check devlink-region.8 man page
>>
>> Future:
>> There is a plan to extend the support to include a write command
>> as well as performing read and dump live region
> Reading live region would be very interesting and alleviate the need
> for complicated ethtool dump marshalling a number of drivers started
> doing (incl. nfp).  Any plans on that?
Yes I plan to also support read of live regions.
Unlike ethtool which works per netdevice this will allow access per PCI 
device,
this allows reading region/dump on IPoIB ULP for example, which
cannot implement a vendor specific dump using ethtool.
> Write support I'm less excited about :)
Johannes Berg Aug. 15, 2018, 8:25 p.m. UTC | #3
On Wed, 2018-07-11 at 11:48 -0700, Jakub Kicinski wrote:
> CC: linux-wireless, wifi chips used to have similar problem
> 
> On Wed, 11 Jul 2018 13:42:57 +0300, Alex Vesker wrote:
> > This is a proposal which will allow access to driver defined address
> > regions using devlink. Each device can create its supported address
> > regions and register them. A device which exposes a region will allow
> > access to it using devlink.


We usually just implement it for debug dumping, and use devcoredump for
that. How does this overlap? Some of the devcoredump data may be
"synthetic" in the sense that it doesn't really have to be strictly a
device memory region, so it's not quite the same, but what's the
intended use case for this?

johannes