diff mbox

[v3,1/3] aerdrv: Trace Event for AER

Message ID 20121130213330.7649.51101.stgit@grignak.americas.hpqcorp.net
State Superseded
Headers show

Commit Message

Lance Ortiz Nov. 30, 2012, 9:33 p.m. UTC
This header file will define a new trace event that will be triggered when
a AER event occurs.  The following data will be provided to the trace
event.

char * name -	String containing the device path

u32 status - 	Either the correctable or uncorrectable register
		indicating what error or errors have been see.

u8 severity - 	error severity 0:NONFATAL 1:FATAL 2:CORRECTED

The trace event will also provide a trace string that may look like:

"0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
TLP"

v1-v2 Move header from include/ras/aer_event.h to
include/trace/events/ras.h

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---

 include/trace/events/ras.h |   77 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 77 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/ras.h


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Borislav Petkov Dec. 1, 2012, 10:55 a.m. UTC | #1
On Fri, Nov 30, 2012 at 02:33:30PM -0700, Lance Ortiz wrote:
> This header file will define a new trace event that will be triggered when
> a AER event occurs.  The following data will be provided to the trace
> event.
> 
> char * name -	String containing the device path

Can we explain that device path more, even with an example? Something like
"position of the device in the PCI hierarchy, i.e. 0000:05:00.0"

> u32 status - 	Either the correctable or uncorrectable register
> 		indicating what error or errors have been see.

							  seen

> u8 severity - 	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> 
> The trace event will also provide a trace string that may look like:
> 
> "0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
> TLP"
> 
> v1-v2 Move header from include/ras/aer_event.h to
> include/trace/events/ras.h
> 
> Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
> ---
> 
>  include/trace/events/ras.h |   77 ++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 77 insertions(+), 0 deletions(-)
>  create mode 100644 include/trace/events/ras.h
> 
> diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
> new file mode 100644
> index 0000000..f77d009
> --- /dev/null
> +++ b/include/trace/events/ras.h
> @@ -0,0 +1,77 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM aer_event
> +#define TRACE_INCLUDE_FILE ras 

Applying: aerdrv: Trace Event for AER
/home/boris/kernel/linux-2.6/.git/rebase-apply/patch:15: trailing whitespace.
#define TRACE_INCLUDE_FILE ras 
warning: 1 line adds whitespace errors.

> +
> +#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_AER_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +
> +
> +/*
> + * Anhance Error Reporting (AER) PCIE Report Error

      PCIE Advanced Error Reporting (AER) tracepoint


> + *
> + * These events are generated when hardware detects a corrected or
> + * uncorrected event on a pci express device and reports

			     PCIE device. (no need for "and reports errors")

> + * errors.  The event reports the following data.

	       The event record has the following structure:

> + *
> + * char * dev_name -	String containing the device identification
> + * u32 status -		Either the correctable or uncorrectable register
> + *			indicating what error or errors have been seen
> + * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> + */
> +
> +#define correctable_error_string			\
> +	{BIT(0),	"Receiver Error"},		\
> +	{BIT(6),	"Bad TLP"},			\
> +	{BIT(7),	"Bad DLLP"},			\
> +	{BIT(8),	"RELAY_NUM Rollover"},		\
> +	{BIT(12),	"Replay Timer Timeout"},	\
> +	{BIT(13),	"Advisory Non-Fatal"}
> +
> +#define uncorrectable_error_string			\
> +	{BIT(4),	"Data Link Protocol"},		\
> +	{BIT(12),	"Poisoned TLP"},		\
> +	{BIT(13),	"Flow Control Protocol"},	\
> +	{BIT(14),	"Completion Timeout"},		\
> +	{BIT(15),	"Completer Abort"},		\
> +	{BIT(16),	"Unexpected Completion"},	\
> +	{BIT(17),	"Receiver Overflow"},		\
> +	{BIT(18),	"Malformed TLP"},		\
> +	{BIT(19),	"ECRC"},			\
> +	{BIT(20),	"Unsupported Request"}

I'm wondering whether something else PCIe could use those error types
but I guess they'll pick them up from here if needed.

> +
> +TRACE_EVENT(aer_event,
> +	TP_PROTO(const char *dev_name,
> +		 const u32 status,
> +		 const u8 severity),
> +
> +	TP_ARGS(dev_name, status, severity),
> +
> +	TP_STRUCT__entry(
> +		__string(	dev_name,	dev_name	)
> +		__field(	u32,		status		)
> +		__field(	u8,		severity	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->status		= status;
> +		__entry->severity	= severity;
> +	),
> +
> +	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
> +		__get_str(dev_name),
> +		(__entry->severity == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +			((__entry->severity == HW_EVENT_ERR_FATAL) ?
> +			"Fatal" : "Uncorrected"),
> +		__entry->severity == HW_EVENT_ERR_CORRECTED ?
> +		__print_flags(__entry->status, "|", correctable_error_string) :
> +		__print_flags(__entry->status, "|", uncorrectable_error_string))
> +);
> +
> +#endif /* _TRACE_AER_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

Apart from the minor issues above:

Acked-by: Borislav Petkov <bp@alien8.de>

Thanks.
Mauro Carvalho Chehab Dec. 1, 2012, 11:36 a.m. UTC | #2
Em Fri, 30 Nov 2012 14:33:30 -0700
Lance Ortiz <lance.ortiz@hp.com> escreveu:

> This header file will define a new trace event that will be triggered when
> a AER event occurs.  The following data will be provided to the trace
> event.
> 
> char * name -	String containing the device path

You renamed it to dev_name. Please fix it at the commit comments.

> 
> u32 status - 	Either the correctable or uncorrectable register
> 		indicating what error or errors have been see.
> 
> u8 severity - 	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> 
> The trace event will also provide a trace string that may look like:
> 
> "0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
> TLP"
> 
> v1-v2 Move header from include/ras/aer_event.h to
> include/trace/events/ras.h

Please don't call it as "ras.h". Call it, instead, ras_aer.h (or something
similar) as, if we're moving the tracing back to include/trace/events/,
the same should happen for the memory error events.

> 
> Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
> ---
> 
>  include/trace/events/ras.h |   77 ++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 77 insertions(+), 0 deletions(-)
>  create mode 100644 include/trace/events/ras.h
> 
> diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
> new file mode 100644
> index 0000000..f77d009
> --- /dev/null
> +++ b/include/trace/events/ras.h
> @@ -0,0 +1,77 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM aer_event
> +#define TRACE_INCLUDE_FILE ras 
> +
> +#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_AER_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +
> +
> +/*
> + * Anhance Error Reporting (AER) PCIE Report Error
> + *
> + * These events are generated when hardware detects a corrected or
> + * uncorrected event on a pci express device and reports
> + * errors.  The event reports the following data.
> + *
> + * char * dev_name -	String containing the device identification
> + * u32 status -		Either the correctable or uncorrectable register
> + *			indicating what error or errors have been seen
> + * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> + */
> +
> +#define correctable_error_string			\
> +	{BIT(0),	"Receiver Error"},		\
> +	{BIT(6),	"Bad TLP"},			\
> +	{BIT(7),	"Bad DLLP"},			\
> +	{BIT(8),	"RELAY_NUM Rollover"},		\
> +	{BIT(12),	"Replay Timer Timeout"},	\
> +	{BIT(13),	"Advisory Non-Fatal"}

Hmm... isn't something missing here? I'm seeing more bits defined at the
PCIe V3.0 spec for Offset 10h:

bit 14 - Corrected Internal Error 
bit 15 - Header Log Overflow 

> +#define uncorrectable_error_string			\
> +	{BIT(4),	"Data Link Protocol"},		\
> +	{BIT(12),	"Poisoned TLP"},		\
> +	{BIT(13),	"Flow Control Protocol"},	\
> +	{BIT(14),	"Completion Timeout"},		\
> +	{BIT(15),	"Completer Abort"},		\
> +	{BIT(16),	"Unexpected Completion"},	\
> +	{BIT(17),	"Receiver Overflow"},		\
> +	{BIT(18),	"Malformed TLP"},		\
> +	{BIT(19),	"ECRC"},			\
> +	{BIT(20),	"Unsupported Request"}

Hmm... isn't something missing here? I'm seeing more bits defined at the
PCIe V3.0 spec for Offset 04h:

bit 5 - Surprise Down Error 
bit 21 - ACS Violation 
bit 22 - Uncorrectable Internal Error 
bit 23 - MC Blocked TLP 
bit 24 - AtomicOp Egress Blocked 
bit 25 - TLP Prefix Blocked Error 

> +
> +TRACE_EVENT(aer_event,
> +	TP_PROTO(const char *dev_name,
> +		 const u32 status,
> +		 const u8 severity),
> +
> +	TP_ARGS(dev_name, status, severity),
> +
> +	TP_STRUCT__entry(
> +		__string(	dev_name,	dev_name	)
> +		__field(	u32,		status		)
> +		__field(	u8,		severity	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->status		= status;
> +		__entry->severity	= severity;
> +	),
> +
> +	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
> +		__get_str(dev_name),
> +		(__entry->severity == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +			((__entry->severity == HW_EVENT_ERR_FATAL) ?
> +			"Fatal" : "Uncorrected"),
> +		__entry->severity == HW_EVENT_ERR_CORRECTED ?
> +		__print_flags(__entry->status, "|", correctable_error_string) :
> +		__print_flags(__entry->status, "|", uncorrectable_error_string))
> +);
> +
> +#endif /* _TRACE_AER_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
>
Borislav Petkov Dec. 1, 2012, 11:43 a.m. UTC | #3
On Sat, Dec 01, 2012 at 09:36:14AM -0200, Mauro Carvalho Chehab wrote:
> Please don't call it as "ras.h". Call it, instead, ras_aer.h
> (or something similar) as, if we're moving the tracing back to
> include/trace/events/, the same should happen for the memory error
> events.

As you can see yourself, include/trace/events/ contains one header per
topic so having ras_event.h and ras_aer.h or whatever, doesn't fit the
scheme. IOW, we want all RAS-specific tracepoints to be collected in a
header called ras.h like the rest of the subsystems do it. See rcu.h
there for a good example.
Mauro Carvalho Chehab Dec. 1, 2012, 1:28 p.m. UTC | #4
Cheers,
Mauro

On Sat, 1 Dec 2012, Borislav Petkov wrote:

> On Sat, Dec 01, 2012 at 09:36:14AM -0200, Mauro Carvalho Chehab wrote:
>> Please don't call it as "ras.h". Call it, instead, ras_aer.h
>> (or something similar) as, if we're moving the tracing back to
>> include/trace/events/, the same should happen for the memory error
>> events.
>
> As you can see yourself, include/trace/events/ contains one header per
> topic so having ras_event.h and ras_aer.h or whatever, doesn't fit the
> scheme. IOW, we want all RAS-specific tracepoints to be collected in a
> header called ras.h like the rest of the subsystems do it. See rcu.h
> there for a good example.

Works for me.

Regards,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ortiz, Lance E Dec. 3, 2012, 8:01 p.m. UTC | #5
> > +#define correctable_error_string			\
> > +	{BIT(0),	"Receiver Error"},		\
> > +	{BIT(6),	"Bad TLP"},			\
> > +	{BIT(7),	"Bad DLLP"},			\
> > +	{BIT(8),	"RELAY_NUM Rollover"},		\
> > +	{BIT(12),	"Replay Timer Timeout"},	\
> > +	{BIT(13),	"Advisory Non-Fatal"}
> 
> Hmm... isn't something missing here? I'm seeing more bits defined at
> the
> PCIe V3.0 spec for Offset 10h:
> 
> bit 14 - Corrected Internal Error
> bit 15 - Header Log Overflow
> 
> > +#define uncorrectable_error_string			\
> > +	{BIT(4),	"Data Link Protocol"},		\
> > +	{BIT(12),	"Poisoned TLP"},		\
> > +	{BIT(13),	"Flow Control Protocol"},	\
> > +	{BIT(14),	"Completion Timeout"},		\
> > +	{BIT(15),	"Completer Abort"},		\
> > +	{BIT(16),	"Unexpected Completion"},	\
> > +	{BIT(17),	"Receiver Overflow"},		\
> > +	{BIT(18),	"Malformed TLP"},		\
> > +	{BIT(19),	"ECRC"},			\
> > +	{BIT(20),	"Unsupported Request"}
> 
> Hmm... isn't something missing here? I'm seeing more bits defined at
> the
> PCIe V3.0 spec for Offset 04h:
> 
> bit 5 - Surprise Down Error
> bit 21 - ACS Violation
> bit 22 - Uncorrectable Internal Error
> bit 23 - MC Blocked TLP
> bit 24 - AtomicOp Egress Blocked
> bit 25 - TLP Prefix Blocked Error
> 

I used the same errors defined in the string arrays at the top of aerdrv_errprint.c.  I am not sure why they were left out in that file.  I will investigate and probably add them as a later patch and then include them in aerdrv_errprint.c also.

Sound good?

Lance

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Dec. 3, 2012, 8:27 p.m. UTC | #6
On Mon, Dec 03, 2012 at 08:01:46PM +0000, Ortiz, Lance E wrote:
> > > +#define correctable_error_string			\
> > > +	{BIT(0),	"Receiver Error"},		\
> > > +	{BIT(6),	"Bad TLP"},			\
> > > +	{BIT(7),	"Bad DLLP"},			\
> > > +	{BIT(8),	"RELAY_NUM Rollover"},		\
> > > +	{BIT(12),	"Replay Timer Timeout"},	\
> > > +	{BIT(13),	"Advisory Non-Fatal"}
> > 
> > Hmm... isn't something missing here? I'm seeing more bits defined at
> > the
> > PCIe V3.0 spec for Offset 10h:
> > 
> > bit 14 - Corrected Internal Error
> > bit 15 - Header Log Overflow
> > 
> > > +#define uncorrectable_error_string			\
> > > +	{BIT(4),	"Data Link Protocol"},		\
> > > +	{BIT(12),	"Poisoned TLP"},		\
> > > +	{BIT(13),	"Flow Control Protocol"},	\
> > > +	{BIT(14),	"Completion Timeout"},		\
> > > +	{BIT(15),	"Completer Abort"},		\
> > > +	{BIT(16),	"Unexpected Completion"},	\
> > > +	{BIT(17),	"Receiver Overflow"},		\
> > > +	{BIT(18),	"Malformed TLP"},		\
> > > +	{BIT(19),	"ECRC"},			\
> > > +	{BIT(20),	"Unsupported Request"}
> > 
> > Hmm... isn't something missing here? I'm seeing more bits defined at
> > the
> > PCIe V3.0 spec for Offset 04h:
> > 
> > bit 5 - Surprise Down Error
> > bit 21 - ACS Violation
> > bit 22 - Uncorrectable Internal Error
> > bit 23 - MC Blocked TLP
> > bit 24 - AtomicOp Egress Blocked
> > bit 25 - TLP Prefix Blocked Error
> > 
> I used the same errors defined in the string arrays at the top of
> aerdrv_errprint.c. I am not sure why they were left out in that file.
> I will investigate and probably add them as a later patch and then
> include them in aerdrv_errprint.c also.

Maybe we should ask Yanmin who added those strings in
6c2b374d74857e892080ee726184ec1d15e7d4e4; CCed.

Also, it would make a lot of sense to have those string definitions
at one place and then reuse them instead of define them again in the
tracepoint header and they get out of sync and have needless duplication
in the kernel, etc, etc.

So, instead, you could define some macros which can generate your
strings above like this:

#define aer_uncor_err_str(bitnum)	\
	{ BIT(bitnum), aer_uncorrectable_error_string(bitnum) }

and use that macro for the __print_flags flag_array argument.

Or something to that effect.

Even better would it be if the error strings in
<drivers/pci/pcie/aer/aerdrv_errprint.c> could be shared with the
tracepoint ones. That would require a bit more changes though but
something like using an array of trace_print_flags instead an array of
strings could be doable.

Then, when you need the string, you do uncor_err_array[i].name and so
on.

Thanks.
Ortiz, Lance E Dec. 3, 2012, 9:13 p.m. UTC | #7
PiANCj4gTWF5YmUgd2Ugc2hvdWxkIGFzayBZYW5taW4gd2hvIGFkZGVkIHRob3NlIHN0cmluZ3Mg
aW4NCj4gNmMyYjM3NGQ3NDg1N2U4OTIwODBlZTcyNjE4NGVjMWQxNWU3ZDRlNDsgQ0NlZC4NCj4g
DQo+IEFsc28sIGl0IHdvdWxkIG1ha2UgYSBsb3Qgb2Ygc2Vuc2UgdG8gaGF2ZSB0aG9zZSBzdHJp
bmcgZGVmaW5pdGlvbnMNCj4gYXQgb25lIHBsYWNlIGFuZCB0aGVuIHJldXNlIHRoZW0gaW5zdGVh
ZCBvZiBkZWZpbmUgdGhlbSBhZ2FpbiBpbiB0aGUNCj4gdHJhY2Vwb2ludCBoZWFkZXIgYW5kIHRo
ZXkgZ2V0IG91dCBvZiBzeW5jIGFuZCBoYXZlIG5lZWRsZXNzDQo+IGR1cGxpY2F0aW9uDQo+IGlu
IHRoZSBrZXJuZWwsIGV0YywgZXRjLg0KPiANCj4gU28sIGluc3RlYWQsIHlvdSBjb3VsZCBkZWZp
bmUgc29tZSBtYWNyb3Mgd2hpY2ggY2FuIGdlbmVyYXRlIHlvdXINCj4gc3RyaW5ncyBhYm92ZSBs
aWtlIHRoaXM6DQo+IA0KPiAjZGVmaW5lIGFlcl91bmNvcl9lcnJfc3RyKGJpdG51bSkJXA0KPiAJ
eyBCSVQoYml0bnVtKSwgYWVyX3VuY29ycmVjdGFibGVfZXJyb3Jfc3RyaW5nKGJpdG51bSkgfQ0K
PiANCj4gYW5kIHVzZSB0aGF0IG1hY3JvIGZvciB0aGUgX19wcmludF9mbGFncyBmbGFnX2FycmF5
IGFyZ3VtZW50Lg0KPiANCj4gT3Igc29tZXRoaW5nIHRvIHRoYXQgZWZmZWN0Lg0KPiANCj4gRXZl
biBiZXR0ZXIgd291bGQgaXQgYmUgaWYgdGhlIGVycm9yIHN0cmluZ3MgaW4NCj4gPGRyaXZlcnMv
cGNpL3BjaWUvYWVyL2FlcmRydl9lcnJwcmludC5jPiBjb3VsZCBiZSBzaGFyZWQgd2l0aCB0aGUN
Cj4gdHJhY2Vwb2ludCBvbmVzLiBUaGF0IHdvdWxkIHJlcXVpcmUgYSBiaXQgbW9yZSBjaGFuZ2Vz
IHRob3VnaCBidXQNCj4gc29tZXRoaW5nIGxpa2UgdXNpbmcgYW4gYXJyYXkgb2YgdHJhY2VfcHJp
bnRfZmxhZ3MgaW5zdGVhZCBhbiBhcnJheSBvZg0KPiBzdHJpbmdzIGNvdWxkIGJlIGRvYWJsZS4N
Cj4gDQo+IFRoZW4sIHdoZW4geW91IG5lZWQgdGhlIHN0cmluZywgeW91IGRvIHVuY29yX2Vycl9h
cnJheVtpXS5uYW1lIGFuZCBzbw0KPiBvbi4NCg0KVGhpcyBhIGZpbmUgaWRlYS4gIEknbSB0aGlu
a2luZyB3ZSBtaWdodCB3YW50IHRvIGltcGxlbWVudCB0aGF0IGluIGEgbGF0ZXIgcGF0Y2ggc2lu
Y2UgaXQgaXMgYSBiaWdnZXIgY2hhbmdlIGFuZCB3ZSBzdGlsbCBuZWVkIHRvIHNlZSBhYm91dCBh
ZGRpbmcgYWRkaXRpb25hbCBzdHJpbmdzLiAgSSBjYW4gc2VuZCBvdXQgb25lIG1vcmUgdmVyc2lv
biBvZiB0aGUgcGF0Y2ggd2l0aCB0aGUgY29tbWVudCBoZWFkZXIgY2hhbmdlcyB5b3Ugc3VnZ2Vz
dGVkLCBhbmQgdGhlIGNvbW1lbnRzIEpvZSBoYWQgYW5kIG1heWJlIHdlIGNhbiBtb3ZlIGZvcndh
cmQgd2l0aCB0aGF0Lg0KDQpMYW5jZSANCg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Dec. 3, 2012, 9:17 p.m. UTC | #8
On Mon, Dec 03, 2012 at 09:13:49PM +0000, Ortiz, Lance E wrote:
> This a fine idea. I'm thinking we might want to implement that in a
> later patch since it is a bigger change and we still need to see about
> adding additional strings.

Yes, definitely in a later patch or more depending on how much changes
are needed (haven't looked closely at the whole deal).

> I can send out one more version of the patch with the comment header
> changes you suggested, and the comments Joe had and maybe we can move
> forward with that.

Sure, thanks.
diff mbox

Patch

diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
new file mode 100644
index 0000000..f77d009
--- /dev/null
+++ b/include/trace/events/ras.h
@@ -0,0 +1,77 @@ 
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM aer_event
+#define TRACE_INCLUDE_FILE ras 
+
+#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_AER_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+
+
+/*
+ * Anhance Error Reporting (AER) PCIE Report Error
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event on a pci express device and reports
+ * errors.  The event reports the following data.
+ *
+ * char * dev_name -	String containing the device identification
+ * u32 status -		Either the correctable or uncorrectable register
+ *			indicating what error or errors have been seen
+ * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
+ */
+
+#define correctable_error_string			\
+	{BIT(0),	"Receiver Error"},		\
+	{BIT(6),	"Bad TLP"},			\
+	{BIT(7),	"Bad DLLP"},			\
+	{BIT(8),	"RELAY_NUM Rollover"},		\
+	{BIT(12),	"Replay Timer Timeout"},	\
+	{BIT(13),	"Advisory Non-Fatal"}
+
+#define uncorrectable_error_string			\
+	{BIT(4),	"Data Link Protocol"},		\
+	{BIT(12),	"Poisoned TLP"},		\
+	{BIT(13),	"Flow Control Protocol"},	\
+	{BIT(14),	"Completion Timeout"},		\
+	{BIT(15),	"Completer Abort"},		\
+	{BIT(16),	"Unexpected Completion"},	\
+	{BIT(17),	"Receiver Overflow"},		\
+	{BIT(18),	"Malformed TLP"},		\
+	{BIT(19),	"ECRC"},			\
+	{BIT(20),	"Unsupported Request"}
+
+TRACE_EVENT(aer_event,
+	TP_PROTO(const char *dev_name,
+		 const u32 status,
+		 const u8 severity),
+
+	TP_ARGS(dev_name, status, severity),
+
+	TP_STRUCT__entry(
+		__string(	dev_name,	dev_name	)
+		__field(	u32,		status		)
+		__field(	u8,		severity	)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		__entry->status		= status;
+		__entry->severity	= severity;
+	),
+
+	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
+		__get_str(dev_name),
+		(__entry->severity == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->severity == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		__entry->severity == HW_EVENT_ERR_CORRECTED ?
+		__print_flags(__entry->status, "|", correctable_error_string) :
+		__print_flags(__entry->status, "|", uncorrectable_error_string))
+);
+
+#endif /* _TRACE_AER_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>