[linux,dev-4.13,1/2] Revert "drivers/hwmon/occ: Add temperature fault attribute and VRM temp alarm"

Message ID 1519944259-27898-3-git-send-email-eajames@linux.vnet.ibm.com
State New
Headers show
Series
  • [linux,dev-4.13,1/2] Revert "drivers/hwmon/occ: Add temperature fault attribute and VRM temp alarm"
Related show

Commit Message

Eddie James March 1, 2018, 10:44 p.m.
From: "Edward A. James" <eajames@us.ibm.com>

This reverts commit e55423ee10a5057338d24383c00e813436a126ea.

Apologies for pushing this up so early... Userspace applications aren't
ready for this change. The hwmon polling application cannot accept EGAIN
yet, and we can't be returning apparent errors if the sensor is
temporarily unavailable.

Brad Bishop <bradleyb@fuzziesquirrel.com> wrote:
> userspace retries for a configurable period on eio, etimedout,
> ebadmsg, eagain, and exnio.  It also exits cleanly on enoent.
>
> If any of these errnos don’t go away after the configured number of retry
> attempts, an error is logged.  Any other errnos and an error is logged
> immediately. If this list needs improvement I’d love to hear about it.
>
> ebadmsg and enxio are observed when i2c devices are unplugged with transfers
> in various stages of flight.  They occur just before the driver is unbound
> after the presence gpio toggle on these devices is noticed and processed
> (killing the hwmon userspace daemon cleanly).
>
> eio, etimedout seem to be bus type errors that appear somewhat infrequently,
> but occur nevertheless.
>
> we all know what eagain means...
>
> It isn’t so much that the hwmon userspace doesn’t support this change,
> its just the resulting behavior difference at the other end of the
> hwmon abi <-> dbus api translation is not optimal, right now.
>
> 1 - Without this change, the hwmon userspace reads a value of 0 out of
> the sysfs attribute and happily reports that as the temp at the DBus level.
>
> With this change, the hwmon userspace would read the attribute, get an
> error and retry for a bit.  After a number of retries, the error is
> logged and the sensor is marked faulted at a dbus level.
>
> The issue is the consumer of the sensor at the dbus level.  Today, the
> dbus consumer happens to translate the sensor value of zero into the
> desired behavior.  If we submit this change the sensor will be put
> into faulted state and the consuming application doesn’t have support
> for that yet.
>
> We have open issues to enhance the application to support faulted sensors,
> its just not implemented yet.

OpenBMC-Staging-Count: 1
Signed-off-by: Eddie James <eajames@us.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
---
 drivers/hwmon/occ/common.c | 33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

Comments

Eddie James March 1, 2018, 10:47 p.m. | #1
Sorry, ignore this one. I renamed this patch to hwmon (occ): Remove 
temperature fault attribute and VRM temp alarm and forgot to delete this 
one.


On 03/01/2018 04:44 PM, Eddie James wrote:
> From: "Edward A. James" <eajames@us.ibm.com>
>
> This reverts commit e55423ee10a5057338d24383c00e813436a126ea.
>
> Apologies for pushing this up so early... Userspace applications aren't
> ready for this change. The hwmon polling application cannot accept EGAIN
> yet, and we can't be returning apparent errors if the sensor is
> temporarily unavailable.
>
> Brad Bishop <bradleyb@fuzziesquirrel.com> wrote:
>> userspace retries for a configurable period on eio, etimedout,
>> ebadmsg, eagain, and exnio.  It also exits cleanly on enoent.
>>
>> If any of these errnos don’t go away after the configured number of retry
>> attempts, an error is logged.  Any other errnos and an error is logged
>> immediately. If this list needs improvement I’d love to hear about it.
>>
>> ebadmsg and enxio are observed when i2c devices are unplugged with transfers
>> in various stages of flight.  They occur just before the driver is unbound
>> after the presence gpio toggle on these devices is noticed and processed
>> (killing the hwmon userspace daemon cleanly).
>>
>> eio, etimedout seem to be bus type errors that appear somewhat infrequently,
>> but occur nevertheless.
>>
>> we all know what eagain means...
>>
>> It isn’t so much that the hwmon userspace doesn’t support this change,
>> its just the resulting behavior difference at the other end of the
>> hwmon abi <-> dbus api translation is not optimal, right now.
>>
>> 1 - Without this change, the hwmon userspace reads a value of 0 out of
>> the sysfs attribute and happily reports that as the temp at the DBus level.
>>
>> With this change, the hwmon userspace would read the attribute, get an
>> error and retry for a bit.  After a number of retries, the error is
>> logged and the sensor is marked faulted at a dbus level.
>>
>> The issue is the consumer of the sensor at the dbus level.  Today, the
>> dbus consumer happens to translate the sensor value of zero into the
>> desired behavior.  If we submit this change the sensor will be put
>> into faulted state and the consuming application doesn’t have support
>> for that yet.
>>
>> We have open issues to enhance the application to support faulted sensors,
>> its just not implemented yet.
> OpenBMC-Staging-Count: 1
> Signed-off-by: Eddie James <eajames@us.ibm.com>
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
>   drivers/hwmon/occ/common.c | 33 +++------------------------------
>   1 file changed, 3 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 7c97a4c..a5c4df0 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -304,24 +304,11 @@ static ssize_t occ_show_temp_2(struct device *dev,
>   		val = get_unaligned_be32(&temp->sensor_id);
>   		break;
>   	case 1:
> -		val = temp->value;
> -		if (val == OCC_TEMP_SENSOR_FAULT)
> -			return -EREMOTEIO;
> -
> -		if (temp->fru_type != OCC_FRU_TYPE_VRM) {
> -			/* sensor not ready */
> -			if (val == 0)
> -				return -EAGAIN;
> -
> -			val *= 1000;	/* millidegrees */
> -		}
> +		val = temp->value * 1000;
>   		break;
>   	case 2:
>   		val = temp->fru_type;
>   		break;
> -	case 3:
> -		val = temp->value == OCC_TEMP_SENSOR_FAULT;
> -		break;
>   	default:
>   		return -EINVAL;
>   	}
> @@ -790,7 +777,7 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>   		num_attrs += (sensors->temp.num_sensors * 2);
>   		break;
>   	case 2:
> -		num_attrs += (sensors->temp.num_sensors * 4);
> +		num_attrs += (sensors->temp.num_sensors * 3);
>   		show_temp = occ_show_temp_2;
>   		break;
>   	default:
> @@ -863,22 +850,13 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>
>   	for (i = 0; i < sensors->temp.num_sensors; ++i) {
>   		s = i + 1;
> -		temp = ((struct temp_sensor_2 *)sensors->temp.data) + i;
>
>   		snprintf(attr->name, sizeof(attr->name), "temp%d_label", s);
>   		attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>   					     0, i);
>   		attr++;
>
> -		if (sensors->temp.version > 1 &&
> -		    temp->fru_type == OCC_FRU_TYPE_VRM) {
> -			snprintf(attr->name, sizeof(attr->name),
> -				 "temp%d_alarm", s);
> -		} else {
> -			snprintf(attr->name, sizeof(attr->name),
> -				 "temp%d_input", s);
> -		}
> -
> +		snprintf(attr->name, sizeof(attr->name), "temp%d_input", s);
>   		attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>   					     1, i);
>   		attr++;
> @@ -890,11 +868,6 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>   						     show_temp, NULL, 2, i);
>   			attr++;
>
> -			snprintf(attr->name, sizeof(attr->name),
> -				 "temp%d_fault", s);
> -			attr->sensor = OCC_INIT_ATTR(attr->name, 0444,
> -						     show_temp, NULL, 3, i);
> -			attr++;
>   		}
>   	}
>

Patch

diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index 7c97a4c..a5c4df0 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -304,24 +304,11 @@  static ssize_t occ_show_temp_2(struct device *dev,
 		val = get_unaligned_be32(&temp->sensor_id);
 		break;
 	case 1:
-		val = temp->value;
-		if (val == OCC_TEMP_SENSOR_FAULT)
-			return -EREMOTEIO;
-
-		if (temp->fru_type != OCC_FRU_TYPE_VRM) {
-			/* sensor not ready */
-			if (val == 0)
-				return -EAGAIN;
-
-			val *= 1000;	/* millidegrees */
-		}
+		val = temp->value * 1000;
 		break;
 	case 2:
 		val = temp->fru_type;
 		break;
-	case 3:
-		val = temp->value == OCC_TEMP_SENSOR_FAULT;
-		break;
 	default:
 		return -EINVAL;
 	}
@@ -790,7 +777,7 @@  static int occ_setup_sensor_attrs(struct occ *occ)
 		num_attrs += (sensors->temp.num_sensors * 2);
 		break;
 	case 2:
-		num_attrs += (sensors->temp.num_sensors * 4);
+		num_attrs += (sensors->temp.num_sensors * 3);
 		show_temp = occ_show_temp_2;
 		break;
 	default:
@@ -863,22 +850,13 @@  static int occ_setup_sensor_attrs(struct occ *occ)
 
 	for (i = 0; i < sensors->temp.num_sensors; ++i) {
 		s = i + 1;
-		temp = ((struct temp_sensor_2 *)sensors->temp.data) + i;
 
 		snprintf(attr->name, sizeof(attr->name), "temp%d_label", s);
 		attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
 					     0, i);
 		attr++;
 
-		if (sensors->temp.version > 1 &&
-		    temp->fru_type == OCC_FRU_TYPE_VRM) {
-			snprintf(attr->name, sizeof(attr->name),
-				 "temp%d_alarm", s);
-		} else {
-			snprintf(attr->name, sizeof(attr->name),
-				 "temp%d_input", s);
-		}
-
+		snprintf(attr->name, sizeof(attr->name), "temp%d_input", s);
 		attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
 					     1, i);
 		attr++;
@@ -890,11 +868,6 @@  static int occ_setup_sensor_attrs(struct occ *occ)
 						     show_temp, NULL, 2, i);
 			attr++;
 
-			snprintf(attr->name, sizeof(attr->name),
-				 "temp%d_fault", s);
-			attr->sensor = OCC_INIT_ATTR(attr->name, 0444,
-						     show_temp, NULL, 3, i);
-			attr++;
 		}
 	}