[v4,1/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Message ID 20200207103143.20104-2-shiju.jose@huawei.com
State New
Series ACPI: APEI: Add support to notify the vendor specific HW errors

Commit Message

Shiju Jose Feb. 7, 2020, 10:31 a.m. UTC
Presently APEI does not support reporting the vendor specific
HW errors, received in the vendor defined table entries, to the
vendor drivers for any recovery.

This patch adds the support to register and unregister the
error handling function for the vendor specific HW errors and
notify the registered kernel driver.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/apei/ghes.c | 116 +++++++++++++++++++++++++++++++++++++++++++++--
 include/acpi/ghes.h      |  56 +++++++++++++++++++++++
 2 files changed, 167 insertions(+), 5 deletions(-)

Comments

James Morse March 11, 2020, 5:29 p.m. UTC | #1
Hi Shiju,

On 07/02/2020 10:31, Shiju Jose wrote:
> Presently APEI does not support reporting the vendor specific
> HW errors, received in the vendor defined table entries, to the
> vendor drivers for any recovery.
> 
> This patch adds the support to register and unregister the
> error handling function for the vendor specific HW errors and
> notify the registered kernel driver.

Is it possible to use the kernel's existing atomic_notifier_chain_register() API for this?

The one thing that can't be done in the same way is the GUID filtering in ghes.c. Each
driver would need to check if the call matched a GUID they knew about, and return
NOTIFY_DONE if they "don't care".

I think this patch would be a lot smaller if it was tweaked to be able to use the existing
API. If there is a reason not to use it, it would be good to know what it is.
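
To make the suggestion concrete, here is a minimal userspace sketch of the per-driver GUID check. It is not code from the patch: `struct notifier_block`, `NOTIFY_DONE`, `NOTIFY_OK`, `guid_t` and `guid_equal()` are simplified stand-ins that mirror the kernel names, while `struct ghes_vendor_event`, `my_guid`, `my_vendor_handled` and `my_vendor_notify()` are hypothetical names invented for illustration.

```c
#include <string.h>

/* Userspace stand-ins mirroring the kernel's notifier return codes. */
#define NOTIFY_DONE 0x0000 /* "don't care" */
#define NOTIFY_OK   0x0001 /* event was handled */

typedef struct { unsigned char b[16]; } guid_t;

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
	int priority;
};

/* Hypothetical payload handed to every callback on the chain. */
struct ghes_vendor_event {
	guid_t sec_type;     /* CPER section type of the error */
	const void *payload; /* raw vendor-defined error data */
};

/* Hypothetical GUID this driver knows about (arbitrary bytes). */
static const guid_t my_guid = { { 0xde, 0xad, 0xbe, 0xef } };

static int guid_equal(const guid_t *a, const guid_t *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

static int my_vendor_handled; /* counts events this driver claimed */

static int my_vendor_notify(struct notifier_block *nb,
			    unsigned long action, void *data)
{
	struct ghes_vendor_event *ev = data;

	(void)nb;
	(void)action;

	/* Not our section type: let the other handlers look at it. */
	if (!guid_equal(&ev->sec_type, &my_guid))
		return NOTIFY_DONE;

	my_vendor_handled++;
	return NOTIFY_OK;
}
```

With this shape the GUID filtering moves out of ghes.c and into each handler, which is the trade-off described above.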


> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 103acbb..69e18d7 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)

> +/**
> + * ghes_unregister_event_handler - unregister the previously
> + * registered event handling function.
> + * @sec_type: sec_type of the corresponding CPER.
> + * @data: driver specific data to distinguish devices.
> + */
> +void ghes_unregister_event_handler(guid_t sec_type, void *data)
> +{
> +	struct ghes_event_notify *event_notify;
> +	bool found = false;
> +
> +	mutex_lock(&ghes_event_notify_mutex);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(event_notify,
> +				&ghes_event_handler_list, list) {
> +		if (guid_equal(&event_notify->sec_type, &sec_type)) {

> +			if (data != event_notify->data)

It looks like you need multiple drivers to handle the same GUID because of multiple root
ports. Can't the handler lookup the right device?


> +				continue;
> +			list_del_rcu(&event_notify->list);
> +			found = true;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	mutex_unlock(&ghes_event_notify_mutex);
> +
> +	if (!found) {
> +		pr_err("Tried to unregister a GHES event handler that has not been registered\n");
> +		return;
> +	}
> +
> +	synchronize_rcu();
> +	kfree(event_notify);
> +}
> +EXPORT_SYMBOL_GPL(ghes_unregister_event_handler);

> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>  			log_arm_hw_error(err);
>  		} else {
> -			void *err = acpi_hest_get_payload(gdata);
> -
> -			log_non_standard_event(sec_type, fru_id, fru_text,
> -					       sec_sev, err,
> -					       gdata->error_data_length);
> +			if (!ghes_handle_non_standard_event(sec_type, gdata,
> +							    sev)) {
> +				void *err = acpi_hest_get_payload(gdata);
> +
> +				log_non_standard_event(sec_type, fru_id,
> +						       fru_text, sec_sev, err,
> +						       gdata->error_data_length);
> +			}

So, a side effect of the kernel handling these is they no longer get logged out of trace
points?

I guess the driver that claims this logs some more accurate information. Are there expected
to be any user-space programs doing something useful with B2889FC9... today?


Thanks,

James
Shiju Jose March 12, 2020, 12:10 p.m. UTC | #2
Hi James,

Thanks for reviewing the code.

>-----Original Message-----
>From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-
>owner@vger.kernel.org] On Behalf Of James Morse
>Sent: 11 March 2020 17:30
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-acpi@vger.kernel.org; linux-pci@vger.kernel.org; linux-
>kernel@vger.kernel.org; rjw@rjwysocki.net; helgaas@kernel.org;
>lenb@kernel.org; bp@alien8.de; tony.luck@intel.com;
>gregkh@linuxfoundation.org; zhangliguang@linux.alibaba.com;
>tglx@linutronix.de; Linuxarm <linuxarm@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>yangyicong <yangyicong@huawei.com>
>Subject: Re: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor
>specific HW errors
>
>Hi Shiju,
>
>On 07/02/2020 10:31, Shiju Jose wrote:
>> Presently APEI does not support reporting the vendor specific HW
>> errors, received in the vendor defined table entries, to the vendor
>> drivers for any recovery.
>>
>> This patch adds the support to register and unregister the error
>> handling function for the vendor specific HW errors and notify the
>> registered kernel driver.
>
>Is it possible to use the kernel's existing atomic_notifier_chain_register() API for
>this?
>
>The one thing that can't be done in the same way is the GUID filtering in ghes.c.
>Each driver would need to check if the call matched a GUID they knew about,
>and return NOTIFY_DONE if they "don't care".
>
>I think this patch would be a lot smaller if it was tweaked to be able to use the
>existing API. If there is a reason not to use it, it would be good to know what it
>is.
I think using atomic_notifier_chain_register() has the following limitations:
1. All the registered error handlers would get called, even when an error is not related to them.
    This may also lead to mishandling of the error information if a handler does not
    implement GUID checking etc.
2. atomic_notifier_chain_register() (notifier_chain_register()) does not appear to support
    passing the handler's private data at registration time, which is supposed to be
    passed later to the handler in the callback notifier_fn_t(..., void *data).
3. It is also difficult to pass the GHES error data (acpi_hest_generic_data) and the GUID
    of the received error to the handler through the notifier chain callback interface.

Sorry if I have not understood your suggestion correctly.
 
>
>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
>> 103acbb..69e18d7 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct
>> acpi_hest_generic_data *gdata)
>
>> +/**
>> + * ghes_unregister_event_handler - unregister the previously
>> + * registered event handling function.
>> + * @sec_type: sec_type of the corresponding CPER.
>> + * @data: driver specific data to distinguish devices.
>> + */
>> +void ghes_unregister_event_handler(guid_t sec_type, void *data) {
>> +	struct ghes_event_notify *event_notify;
>> +	bool found = false;
>> +
>> +	mutex_lock(&ghes_event_notify_mutex);
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(event_notify,
>> +				&ghes_event_handler_list, list) {
>> +		if (guid_equal(&event_notify->sec_type, &sec_type)) {
>
>> +			if (data != event_notify->data)
>
>It looks like you need multiple drivers to handle the same GUID because of
>multiple root ports. Can't the handler lookup the right device?
This check is there because the GUID is shared among multiple devices handled by one driver,
as seen in the B2889FC9 driver (pcie-hisi-error.c).
  
>
>
>> +				continue;
>> +			list_del_rcu(&event_notify->list);
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +	rcu_read_unlock();
>> +	mutex_unlock(&ghes_event_notify_mutex);
>> +
>> +	if (!found) {
>> +		pr_err("Tried to unregister a GHES event handler that has not
>been registered\n");
>> +		return;
>> +	}
>> +
>> +	synchronize_rcu();
>> +	kfree(event_notify);
>> +}
>> +EXPORT_SYMBOL_GPL(ghes_unregister_event_handler);
>
>> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>>
>>  			log_arm_hw_error(err);
>>  		} else {
>> -			void *err = acpi_hest_get_payload(gdata);
>> -
>> -			log_non_standard_event(sec_type, fru_id, fru_text,
>> -					       sec_sev, err,
>> -					       gdata->error_data_length);
>> +			if (!ghes_handle_non_standard_event(sec_type, gdata,
>> +							    sev)) {
>> +				void *err = acpi_hest_get_payload(gdata);
>> +
>> +				log_non_standard_event(sec_type, fru_id,
>> +						       fru_text, sec_sev, err,
>> +						       gdata->error_data_length);
>> +			}
>
>So, a side effect of the kernel handling these is they no longer get logged out of
>trace points?
>
>I guess the driver the claims this logs some more accurate information. Are
>there expected to be any user-space programs doing something useful with
>B2889FC9... today?
The B2889FC9 driver does not expect any corresponding user-space programs.
The driver is mainly for error recovery and basic error decoding and logging.
Previously we added error logging for B2889FC9 to rasdaemon.
>
>
>Thanks,
>
>James

Thanks,
Shiju
James Morse March 13, 2020, 3:17 p.m. UTC | #3
Hi Shiju,

On 3/12/20 12:10 PM, Shiju Jose wrote:
>> On 07/02/2020 10:31, Shiju Jose wrote:
>>> Presently APEI does not support reporting the vendor specific HW
>>> errors, received in the vendor defined table entries, to the vendor
>>> drivers for any recovery.
>>>
>>> This patch adds the support to register and unregister the error
>>> handling function for the vendor specific HW errors and notify the
>>> registered kernel driver.
>>
>> Is it possible to use the kernel's existing atomic_notifier_chain_register() API for
>> this?
>>
>> The one thing that can't be done in the same way is the GUID filtering in ghes.c.
>> Each driver would need to check if the call matched a GUID they knew about,
>> and return NOTIFY_DONE if they "don't care".
>>
>> I think this patch would be a lot smaller if it was tweaked to be able to use the
>> existing API. If there is a reason not to use it, it would be good to know what it
>> is.

> I think when using atomic_notifier_chain_register we have following limitations,
> 1. All the registered error handlers would get called, though an error is not related to those handlers.    

The notifier chain provides NOTIFY_STOP_MASK, so that one of the callers
can say the work is done. We only expect a handful of these, so I don't
think there is going to be a scalability problem.
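
As a rough illustration of the early-stop behaviour, the following userspace sketch models a chain walk that stops once a callback sets `NOTIFY_STOP_MASK`. The constants mirror the kernel's notifier return codes, but `call_chain()`, `not_mine()`, `claim()` and the `calls` counter are hypothetical stand-ins, not the kernel implementation.

```c
#include <stddef.h>

/* Stand-ins mirroring the kernel's notifier return codes. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

/*
 * Simplified model of a notifier chain walk: call each block in
 * order and stop as soon as a callback sets NOTIFY_STOP_MASK.
 */
static int call_chain(struct notifier_block *head,
		      unsigned long action, void *data)
{
	struct notifier_block *nb;
	int ret = NOTIFY_DONE;

	for (nb = head; nb; nb = nb->next) {
		ret = nb->notifier_call(nb, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

static int calls; /* number of callbacks invoked so far */

/* A handler that does not recognise the event. */
static int not_mine(struct notifier_block *nb, unsigned long action,
		    void *data)
{
	(void)nb;
	(void)action;
	(void)data;
	calls++;
	return NOTIFY_DONE;
}

/* A handler that claims the event and ends the walk. */
static int claim(struct notifier_block *nb, unsigned long action,
		 void *data)
{
	(void)nb;
	(void)action;
	(void)data;
	calls++;
	return NOTIFY_STOP;
}
```

In a chain ordered `not_mine`, `claim`, `not_mine`, only the first two callbacks run, so unrelated handlers registered after the claiming one are never invoked.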


>     Also this may lead to mishandling of the error information if a handler does not
>     implement GUID checking etc.

Which would be a bug we can fix.
There is no point worrying about bugs in out of tree code.


> 2. atomic_notifier_chain_register (notifier_chain_register) looks like does not support 
>     pass the handler's private data during the registration which supposed to 
>     passed later in the call back function *notifier_fn_t(... ,void *data) to the handler.

The callback is provided with the struct notifier_block. A bit of
container_of() magic will give you whatever structure you embedded it in!
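
A minimal userspace sketch of that container_of() pattern follows. The macro is a simplified version of the kernel's (without its type checking), and `struct my_err_dev`, its fields and `my_err_notify()` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define NOTIFY_OK 0x0001

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

/* Hypothetical driver-private state with the notifier_block embedded. */
struct my_err_dev {
	int id;
	int events_seen;
	struct notifier_block nb;
};

static int my_err_notify(struct notifier_block *nb,
			 unsigned long action, void *data)
{
	/* Recover the private structure from the embedded member. */
	struct my_err_dev *dev = container_of(nb, struct my_err_dev, nb);

	(void)action;
	(void)data;
	dev->events_seen++;
	return NOTIFY_OK;
}
```

Because the private data travels with the `notifier_block` itself, no separate `void *data` argument is needed at registration time.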


> 3. Also got difficulty in passing the ghes error data(acpi_hest_generic_data), GUID
>     for the error received to the handler through the notifier_chain  callback interface. 

Here you've lost me. Because you need to pass more than one thing? Can't
we have a struct for that?

But, isn't it all in struct acpi_hest_generic_data already? That is
where the guid and severity come from.


>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
>>> 103acbb..69e18d7 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct
>>> acpi_hest_generic_data *gdata)
>>
>>> +/**
>>> + * ghes_unregister_event_handler - unregister the previously
>>> + * registered event handling function.
>>> + * @sec_type: sec_type of the corresponding CPER.
>>> + * @data: driver specific data to distinguish devices.
>>> + */
>>> +void ghes_unregister_event_handler(guid_t sec_type, void *data) {
>>> +	struct ghes_event_notify *event_notify;
>>> +	bool found = false;
>>> +
>>> +	mutex_lock(&ghes_event_notify_mutex);
>>> +	rcu_read_lock();
>>> +	list_for_each_entry_rcu(event_notify,
>>> +				&ghes_event_handler_list, list) {
>>> +		if (guid_equal(&event_notify->sec_type, &sec_type)) {
>>
>>> +			if (data != event_notify->data)
>>
>> It looks like you need multiple drivers to handle the same GUID because of
>> multiple root ports. Can't the handler lookup the right device?

> This check was because GUID is shared among multiple devices with one driver as seen
> in the B2889FC9 driver (pcie-hisi-error.c). 

(we should stop calling it by its guid ... does it have a name?!)


This must be some kind of error collector for a bus right?

I agree we may need to have multiple drivers register to handle vendor
events, but it looks like you are registering the same handler multiple
times, with different private structures.

Can't it find the affected device from the error description?


>>> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>>>
>>>  			log_arm_hw_error(err);
>>>  		} else {
>>> -			void *err = acpi_hest_get_payload(gdata);
>>> -
>>> -			log_non_standard_event(sec_type, fru_id, fru_text,
>>> -					       sec_sev, err,
>>> -					       gdata->error_data_length);
>>> +			if (!ghes_handle_non_standard_event(sec_type, gdata,
>>> +							    sev)) {
>>> +				void *err = acpi_hest_get_payload(gdata);
>>> +
>>> +				log_non_standard_event(sec_type, fru_id,
>>> +						       fru_text, sec_sev, err,
>>> +						       gdata->error_data_length);
>>> +			}
>>
>> So, a side effect of the kernel handling these is they no longer get logged out of
>> trace points?
>>
>> I guess the driver the claims this logs some more accurate information. Are
>> there expected to be any user-space programs doing something useful with
>> B2889FC9... today?

> The B2889FC9 driver does not expect any corresponding user space programs. 
> The driver mainly for the error recovery and basic error decoding and logging.

> Previously we added the error logging for the B2889FC9 in the rasdaemon.

So this series would break the error logging in rasdaemon.

User-space would need to be upgraded to receive the trace information
from the specific driver instead. (how does it know?!)

Could we log_non_standard_event() unconditionally, maybe adding a field
to indicate that a driver claimed it, so there may be more data
somewhere else...
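
One possible shape for this, as a userspace sketch only: the record is always logged, and a flag says whether a driver claimed it. `struct non_std_record`, `log_record()` and `process_vendor_section()` are hypothetical stand-ins; the real log_non_standard_event() trace event has no such field in this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what the trace record would carry. */
struct non_std_record {
	int sev;      /* section severity */
	size_t len;   /* length of the vendor payload */
	bool claimed; /* hypothetical: a kernel driver handled this event */
};

static struct non_std_record last_record; /* most recent record logged */
static int records_logged;                /* total records logged */

/* Stand-in for the trace point: remembers the last record. */
static void log_record(int sev, size_t len, bool claimed)
{
	last_record.sev = sev;
	last_record.len = len;
	last_record.claimed = claimed;
	records_logged++;
}

/*
 * Log unconditionally; 'driver_claimed' models the result of
 * dispatching the event to the registered vendor handlers.
 */
static void process_vendor_section(int sev, size_t len, bool driver_claimed)
{
	log_record(sev, len, driver_claimed);
}
```

User space keeps receiving every record as before and can use the flag as a hint that more detailed data may be available from the claiming driver.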


Thanks,

James
Shiju Jose March 13, 2020, 5:08 p.m. UTC | #4
Hi James,

>-----Original Message-----
>From: James Morse [mailto:james.morse@arm.com]
>Sent: 13 March 2020 15:17
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-acpi@vger.kernel.org; linux-pci@vger.kernel.org; linux-
>kernel@vger.kernel.org; rjw@rjwysocki.net; helgaas@kernel.org;
>lenb@kernel.org; bp@alien8.de; tony.luck@intel.com;
>gregkh@linuxfoundation.org; zhangliguang@linux.alibaba.com;
>tglx@linutronix.de; Linuxarm <linuxarm@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>yangyicong <yangyicong@huawei.com>
>Subject: Re: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor
>specific HW errors
>
>Hi Shiju,
>
>On 3/12/20 12:10 PM, Shiju Jose wrote:
>>> On 07/02/2020 10:31, Shiju Jose wrote:
>>>> Presently APEI does not support reporting the vendor specific HW
>>>> errors, received in the vendor defined table entries, to the vendor
>>>> drivers for any recovery.
>>>>
>>>> This patch adds the support to register and unregister the error
>>>> handling function for the vendor specific HW errors and notify the
>>>> registered kernel driver.
>>>
>>> Is it possible to use the kernel's existing
>>> atomic_notifier_chain_register() API for this?
>>>
>>> The one thing that can't be done in the same way is the GUID filtering in
>ghes.c.
>>> Each driver would need to check if the call matched a GUID they knew
>>> about, and return NOTIFY_DONE if they "don't care".
>>>
>>> I think this patch would be a lot smaller if it was tweaked to be
>>> able to use the existing API. If there is a reason not to use it, it
>>> would be good to know what it is.
>
>> I think when using atomic_notifier_chain_register we have following
>limitations,
>> 1. All the registered error handlers would get called, though an error is not
>related to those handlers.
>
>The notifier chain provides NOTIFY_STOP_MASK, so that one of the callers can
>say the work is done. We only expect a handful of these, so I don't think there is
>going to be a scalability problem.
Ok. I will check the error reporting by using atomic_notifier_chain and test.

>
>
>>     Also this may lead to mishandling of the error information if a handler does
>not
>>     implement GUID checking etc.
>
>Which would be a bug we can fix.
>There is no point worrying about bugs in out of tree code.
Ok.

>
>
>> 2. atomic_notifier_chain_register (notifier_chain_register) looks like does not
>support
>>     pass the handler's private data during the registration which supposed to
>>     passed later in the call back function *notifier_fn_t(... ,void *data) to the
>handler.
>
>The callback is provided with the struct notifier_block. A bit of
>container_of() magic will give you whatever structure you embedded it in!
Ok. I will check this.
 
>
>
>> 3. Also got difficulty in passing the ghes error data(acpi_hest_generic_data),
>GUID
>>     for the error received to the handler through the notifier_chain  callback
>interface.
>
>Here you've lost me. Because you need to pass more than one thing? Can't we
>have a struct for that?
>
>But, isn't it all in struct acpi_hest_generic_data already? That is where the guid
>and severity come from.
Ok.  right. 
 
>
>
>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>> index
>>>> 103acbb..69e18d7 100644
>>>> --- a/drivers/acpi/apei/ghes.c
>>>> +++ b/drivers/acpi/apei/ghes.c
>>>> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct
>>>> acpi_hest_generic_data *gdata)
>>>
>>>> +/**
>>>> + * ghes_unregister_event_handler - unregister the previously
>>>> + * registered event handling function.
>>>> + * @sec_type: sec_type of the corresponding CPER.
>>>> + * @data: driver specific data to distinguish devices.
>>>> + */
>>>> +void ghes_unregister_event_handler(guid_t sec_type, void *data) {
>>>> +	struct ghes_event_notify *event_notify;
>>>> +	bool found = false;
>>>> +
>>>> +	mutex_lock(&ghes_event_notify_mutex);
>>>> +	rcu_read_lock();
>>>> +	list_for_each_entry_rcu(event_notify,
>>>> +				&ghes_event_handler_list, list) {
>>>> +		if (guid_equal(&event_notify->sec_type, &sec_type)) {
>>>
>>>> +			if (data != event_notify->data)
>>>
>>> It looks like you need multiple drivers to handle the same GUID
>>> because of multiple root ports. Can't the handler lookup the right device?
>
>> This check was because GUID is shared among multiple devices with one
>> driver as seen in the B2889FC9 driver (pcie-hisi-error.c).
>
>(we should stop calling it by its guid ... does it have a name?!)
>
>
>This must be some kind of error collector for a bus right?
>
>I agree we may need to have multiple drivers register to handle vendor events,
>but it looks like you are registering the same handler multiple times, with
>different private structures.
>
>Can't it find the affected device from the error description?
Yes. We already have the code in the PCIe error handling driver to identify the right device
from the error information.

>
>
>>>> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>>>>
>>>>  			log_arm_hw_error(err);
>>>>  		} else {
>>>> -			void *err = acpi_hest_get_payload(gdata);
>>>> -
>>>> -			log_non_standard_event(sec_type, fru_id, fru_text,
>>>> -					       sec_sev, err,
>>>> -					       gdata->error_data_length);
>>>> +			if (!ghes_handle_non_standard_event(sec_type, gdata,
>>>> +							    sev)) {
>>>> +				void *err = acpi_hest_get_payload(gdata);
>>>> +
>>>> +				log_non_standard_event(sec_type, fru_id,
>>>> +						       fru_text, sec_sev, err,
>>>> +						       gdata->error_data_length);
>>>> +			}
>>>
>>> So, a side effect of the kernel handling these is they no longer get
>>> logged out of trace points?
>>>
>>> I guess the driver the claims this logs some more accurate
>>> information. Are there expected to be any user-space programs doing
>>> something useful with B2889FC9... today?
>
>> The B2889FC9 driver does not expect any corresponding user space
>programs.
>> The driver mainly for the error recovery and basic error decoding and logging.
>
>> Previously we added the error logging for the B2889FC9 in the rasdaemon.
>
>So this series would break the error logging in rasdaemon.
It does not affect the information logged to the user for HiSilicon PCIe controller errors,
because the level of logging is the same in rasdaemon and in the newly added
HiSilicon PCIe controller error handling driver.
>
>User-space would need to be upgraded to receive the trace information from
>the specific driver instead. (how does it know?!)
>
>Could we log_non_standard_event() unconditionally, maybe adding a field to
>indicate that a driver claimed it, so there may be more data somewhere else...
Sure, I will check the possibility of adding a field to indicate that a driver claimed the
event and of always calling log_non_standard_event().
>
>
>Thanks,
>
>James

Thanks,
Shiju
Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 103acbb..69e18d7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -490,6 +490,109 @@  static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 #endif
 }
 
+struct ghes_event_notify {
+	struct list_head list;
+	struct rcu_head	rcu_head;
+	guid_t sec_type; /* guid of the error record */
+	ghes_event_handler_t event_handler; /* event handler function */
+	void *data; /* handler driver's private data if any */
+};
+
+/* List to store the registered event handling functions */
+static DEFINE_MUTEX(ghes_event_notify_mutex);
+static LIST_HEAD(ghes_event_handler_list);
+
+/**
+ * ghes_register_event_handler - register an event handling
+ * function for the non-fatal HW errors.
+ * @sec_type: sec_type of the corresponding CPER to be notified.
+ * @event_handler: pointer to the error handling function.
+ * @data: handler driver's private data.
+ *
+ * return 0 : SUCCESS, non-zero : FAIL
+ */
+int ghes_register_event_handler(guid_t sec_type,
+				ghes_event_handler_t event_handler,
+				void *data)
+{
+	struct ghes_event_notify *event_notify;
+
+	event_notify = kzalloc(sizeof(*event_notify), GFP_KERNEL);
+	if (!event_notify)
+		return -ENOMEM;
+
+	event_notify->event_handler = event_handler;
+	guid_copy(&event_notify->sec_type, &sec_type);
+	event_notify->data = data;
+
+	mutex_lock(&ghes_event_notify_mutex);
+	list_add_rcu(&event_notify->list, &ghes_event_handler_list);
+	mutex_unlock(&ghes_event_notify_mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ghes_register_event_handler);
+
+/**
+ * ghes_unregister_event_handler - unregister the previously
+ * registered event handling function.
+ * @sec_type: sec_type of the corresponding CPER.
+ * @data: driver specific data to distinguish devices.
+ */
+void ghes_unregister_event_handler(guid_t sec_type, void *data)
+{
+	struct ghes_event_notify *event_notify;
+	bool found = false;
+
+	mutex_lock(&ghes_event_notify_mutex);
+	rcu_read_lock();
+	list_for_each_entry_rcu(event_notify,
+				&ghes_event_handler_list, list) {
+		if (guid_equal(&event_notify->sec_type, &sec_type)) {
+			if (data != event_notify->data)
+				continue;
+			list_del_rcu(&event_notify->list);
+			found = true;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	mutex_unlock(&ghes_event_notify_mutex);
+
+	if (!found) {
+		pr_err("Tried to unregister a GHES event handler that has not been registered\n");
+		return;
+	}
+
+	synchronize_rcu();
+	kfree(event_notify);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_event_handler);
+
+static int ghes_handle_non_standard_event(guid_t *sec_type,
+	struct acpi_hest_generic_data *gdata, int sev)
+{
+	struct ghes_event_notify *event_notify;
+	bool found = false;
+	int ret;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(event_notify,
+				&ghes_event_handler_list, list) {
+		if (guid_equal(&event_notify->sec_type, sec_type)) {
+			ret = event_notify->event_handler(gdata, sev,
+						    event_notify->data);
+			if (!ret)
+				continue;
+			found = true;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return found;
+}
+
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -525,11 +628,14 @@  static void ghes_do_proc(struct ghes *ghes,
 
 			log_arm_hw_error(err);
 		} else {
-			void *err = acpi_hest_get_payload(gdata);
-
-			log_non_standard_event(sec_type, fru_id, fru_text,
-					       sec_sev, err,
-					       gdata->error_data_length);
+			if (!ghes_handle_non_standard_event(sec_type, gdata,
+							    sev)) {
+				void *err = acpi_hest_get_payload(gdata);
+
+				log_non_standard_event(sec_type, fru_id,
+						       fru_text, sec_sev, err,
+						       gdata->error_data_length);
+			}
 		}
 	}
 }
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e3f1cdd..e3387cf 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -50,6 +50,62 @@  enum {
 	GHES_SEV_PANIC = 0x3,
 };
 
+enum {
+	GHES_EVENT_NONE	= 0x0,
+	GHES_EVENT_HANDLED	= 0x1,
+};
+
+/**
+ * typedef ghes_event_handler_t - event handling function
+ * for the non-fatal HW errors.
+ *
+ * @gdata: acpi_hest_generic_data.
+ * @sev: error severity of the entire error event defined in the
+ *       ACPI spec table generic error status block.
+ * @data: handler driver's private data.
+ *
+ * Return : GHES_EVENT_NONE - event not handled, GHES_EVENT_HANDLED - handled.
+ *
+ * The error handling function is responsible for logging error and
+ * this function would be called in the interrupt context.
+ */
+typedef int (*ghes_event_handler_t)(struct acpi_hest_generic_data *gdata,
+				    int sev, void *data);
+
+#ifdef CONFIG_ACPI_APEI_GHES
+/**
+ * ghes_register_event_handler - register an event handling
+ * function for the non-fatal HW errors.
+ * @sec_type: sec_type of the corresponding CPER to be notified.
+ * @event_handler: pointer to the event handling function.
+ * @data: handler driver's private data.
+ *
+ * Return : 0 - SUCCESS, non-zero - FAIL.
+ */
+int ghes_register_event_handler(guid_t sec_type,
+				ghes_event_handler_t event_handler,
+				void *data);
+
+/**
+ * ghes_unregister_event_handler - unregister the previously
+ * registered event handling function.
+ * @sec_type: sec_type of the corresponding CPER.
+ * @data: driver specific data to distinguish devices.
+ */
+void ghes_unregister_event_handler(guid_t sec_type, void *data);
+#else
+static inline int ghes_register_event_handler(guid_t sec_type,
+					ghes_event_handler_t event_handler,
+					void *data)
+{
+	return -ENODEV;
+}
+
+static inline void ghes_unregister_event_handler(guid_t sec_type, void *data)
+{
+}
+#endif
+
 int ghes_estatus_pool_init(int num_ghes);
 
 /* From drivers/edac/ghes_edac.c */