[net-next,v2,0/7] mlxsw: Add extended ACK for EMADs

Message ID 20191112064830.27002-1-idosch@idosch.org

Message

Ido Schimmel Nov. 12, 2019, 6:48 a.m. UTC
From: Ido Schimmel <idosch@mellanox.com>

Shalom says:

Ethernet Management Datagrams (EMADs) are Ethernet packets sent between
the driver and the device's firmware. They are used to pass various
configurations to the device, but also to get events (e.g., port up)
from it. After the Ethernet header, these packets are built in a TLV
format.

Until now, whenever the driver issued an erroneous register access, it
only got an error code indicating that a bad parameter was used. This
patch set adds a new TLV (string TLV) that the firmware can use to
encode a 128-character string describing the error. The driver
allocates the new TLV and sets it to zeros. In case of an error, the
driver checks the length of the string in the response and reports it
using the devlink hwerr tracepoint.
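
To illustrate the zero-fill and length check described above, here is a
minimal userspace sketch. It is not the driver code: STRING_TLV_LEN,
string_tlv_init() and string_tlv_read() are hypothetical names, and the
only detail taken from the text is the 128-character string area.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical constant, based on the 128-character figure in the text. */
#define STRING_TLV_LEN 128

/* Request side: the string area is handed to the firmware zero-filled. */
static void string_tlv_init(char buf[STRING_TLV_LEN])
{
	memset(buf, 0, STRING_TLV_LEN);
}

/*
 * Response side: copy the firmware's message into out (of size out_len)
 * and return the number of characters copied. A return value of 0 means
 * the firmware left the area untouched, so there is nothing to report.
 * The scan is bounded by STRING_TLV_LEN, so a message that fills the
 * whole area without a terminating NUL is still handled safely.
 */
static size_t string_tlv_read(const char buf[STRING_TLV_LEN],
			      char *out, size_t out_len)
{
	const char *nul = memchr(buf, '\0', STRING_TLV_LEN);
	size_t len = nul ? (size_t)(nul - buf) : STRING_TLV_LEN;

	if (out_len == 0)
		return 0;
	if (len >= out_len)
		len = out_len - 1;	/* truncate, keep room for NUL */
	memcpy(out, buf, len);
	out[len] = '\0';
	return len;
}
```

Because the area starts out as all zeros, a zero length doubles as the
"no string provided" indication, and the caller can fall back to
reporting the bare error code.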

Example:

$ perf record -a -q -e devlink:devlink_hwerr &

$ pkill -2 perf

$ perf script -F trace:event,trace | grep hwerr
devlink:devlink_hwerr: bus_name=pci dev_name=0000:03:00.0 driver_name=mlxsw_spectrum err=7 (tid=9913892d00001593,reg_id=8018(rauhtd)) bad parameter (inside er_rauhtd_write_query(), num_rec=32 is over the maximum  number of records supported)

Patch #1 parses the offsets of the different TLVs in incoming EMADs and
stores them in the skb's control block. This makes it easier to later
add new TLVs.
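
The idea behind patch #1 can be sketched in userspace as a single pass
that records where each TLV starts. This is only an illustration under
simplifying assumptions: the TLV header here is one type byte plus one
length byte (total length in bytes), which is not the real EMAD wire
format, and the type numbering, struct tlv_offsets (standing in for the
skb's control block) and tlv_parse_offsets() are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV types and numbering, for illustration only. */
enum tlv_type {
	TLV_END = 0,
	TLV_OP = 1,
	TLV_STRING = 2,
	TLV_REG = 3,
	TLV_TYPE_MAX
};

#define TLV_HDR_LEN 2			/* type byte + length byte */
#define TLV_ABSENT  ((size_t)-1)	/* marks a TLV that was not seen */

/* Stand-in for the per-packet control block: one offset per TLV type. */
struct tlv_offsets {
	size_t off[TLV_TYPE_MAX];
};

/*
 * Walk the buffer once and record the offset of the first TLV of each
 * known type. Unknown types are skipped. Returns 0 once the end TLV is
 * reached, or -1 if the buffer is malformed or truncated.
 */
static int tlv_parse_offsets(const uint8_t *buf, size_t len,
			     struct tlv_offsets *cb)
{
	size_t pos = 0;
	int i;

	for (i = 0; i < TLV_TYPE_MAX; i++)
		cb->off[i] = TLV_ABSENT;

	while (pos + TLV_HDR_LEN <= len) {
		uint8_t type = buf[pos];
		uint8_t tlv_len = buf[pos + 1];

		if (tlv_len < TLV_HDR_LEN || tlv_len > len - pos)
			return -1;
		if (type < TLV_TYPE_MAX && cb->off[type] == TLV_ABSENT)
			cb->off[type] = pos;
		if (type == TLV_END)
			return 0;
		pos += tlv_len;
	}
	return -1;	/* ran out of data before the end TLV */
}
```

With the offsets cached, later code can ask "is there a string TLV, and
where?" without assuming a fixed packet layout, so an optional TLV only
needs a new entry in the enum.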

Patches #2-#3 remove deprecated TLVs and add the string TLV definition.

Patches #4-#7 gradually add support for the new string TLV.

v2:
* Use the existing devlink hwerr tracepoint to report the error string,
  instead of printing it to the kernel log

Shalom Toledo (7):
  mlxsw: core: Parse TLVs' offsets of incoming EMADs
  mlxsw: emad: Remove deprecated EMAD TLVs
  mlxsw: core: Add EMAD string TLV
  mlxsw: core: Add support for EMAD string TLV parsing
  mlxsw: core: Extend EMAD information reported to devlink hwerr
  mlxsw: core: Add support for using EMAD string TLV
  mlxsw: spectrum: Enable EMAD string TLV

 drivers/net/ethernet/mellanox/mlxsw/core.c    | 171 ++++++++++++++++--
 drivers/net/ethernet/mellanox/mlxsw/core.h    |   2 +
 drivers/net/ethernet/mellanox/mlxsw/emad.h    |   7 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |   2 +
 4 files changed, 162 insertions(+), 20 deletions(-)

Comments

David Miller Nov. 12, 2019, 6:54 p.m. UTC | #1
From: Ido Schimmel <idosch@idosch.org>
Date: Tue, 12 Nov 2019 08:48:23 +0200

> From: Ido Schimmel <idosch@mellanox.com>
> 
> Shalom says:
> 
> Ethernet Management Datagrams (EMADs) are Ethernet packets sent between
> the driver and device's firmware. They are used to pass various
> configurations to the device, but also to get events (e.g., port up)
> from it. After the Ethernet header, these packets are built in a TLV
> format.
> 
> Up until now, whenever the driver issued an erroneous register access it
> only got an error code indicating a bad parameter was used. This patch
> set adds a new TLV (string TLV) that can be used by the firmware to
> encode a 128 character string describing the error. The new TLV is
> allocated by the driver and set to zeros. In case of error, the driver
> will check the length of the string in the response and report it using
> devlink hwerr tracepoint.
 ...

Series applied, thank you.

Jakub Kicinski Nov. 12, 2019, 10:22 p.m. UTC | #2
On Tue, 12 Nov 2019 08:48:23 +0200, Ido Schimmel wrote:
> v2:
> * Use existing devlink hwerr tracepoint to report the error string,
>   instead of printing it to kernel log

Thanks, this is much better! 👍