[net-next,1/2] be2net: set temperature value for all adapter's functions

Message ID 1469237395-11501-1-git-send-email-gpiccoli@linux.vnet.ibm.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Guilherme G. Piccoli July 23, 2016, 1:29 a.m. UTC
Temperature values in the be2net driver are made available to userspace via
the hwmon abstraction, so tools like lm-sensors can present them to the user.
The driver provides hwmon structures for each of the adapter's functions.
However, the temperature information comes from fw queries performed
periodically by be_worker(), and this procedure is called with a single
function as argument; this means that the temperature value is updated only
in the specific function that was passed to be_worker().

This can lead to inconsistent temperatures being reported across functions
or, in a worse scenario, some functions might be unable to provide
temperature info to userspace if they weren't fed with this information from
fw in a be_worker() run.

This patch changes the way temperature is set in the be2net driver. Any time
the fw query is performed, it will set the temperature value for all
functions of the adapter, instead of only setting the temperature of the
function passed to be_worker().

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 drivers/net/ethernet/emulex/benet/be.h      |  1 +
 drivers/net/ethernet/emulex/benet/be_cmds.c | 13 +++---
 drivers/net/ethernet/emulex/benet/be_main.c | 63 +++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 7 deletions(-)

Comments

Sathya Perla July 25, 2016, 10:48 a.m. UTC | #1
> -----Original Message-----
> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
>
> Temperature values on be2net driver are made available to userspace via
> hwmon abstraction, so tools like lm-sensors can present them to the user.
> The driver provides hwmon structures for each adapter's function.
> Nevertheless, the temperature information come from fw queries performed by
> be_worker() with some frequency, and this procedure is called with a single
> function as argument; this means that the temperature value is updated only
> in the specific function that was passed to be_worker().
>
> This can lead to incongruency in reported temperature by a function, or in
> a worse scenario, some functions might be unable to provide temperature
> info to userspace, if they weren't fed with this information from fw in
> be_worker() run.

Hi, I'm wondering: if you are OK with the temperature value being 128s old
(2/2 patch), then why is it a problem if two different functions report
temperature values that were queried a few seconds apart?
Also, you won't have a scenario where the FW cmd succeeds for one function
and fails for other functions. It's a common FW for the entire adapter.

>
> This patch changes the way temperature is set in be2net driver. At anytime
> the fw query is performed, it will set the temperature value for all
> functions of the adapter, instead of only setting the temperature of the
> function passed to be_worker().

If the possible inconsistency across functions is indeed a problem, then a
simpler solution would be to issue the FW cmd synchronously when the sysfs
attr is read, i.e., in the be_hwmon_show_temp() routine itself.

thanks!
-Sathya
Guilherme G. Piccoli July 25, 2016, 12:53 p.m. UTC | #2
On 07/25/2016 07:48 AM, Sathya Perla wrote:
>> -----Original Message-----
>> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
>>
>> Temperature values on be2net driver are made available to userspace via
>> hwmon abstraction, so tools like lm-sensors can present them to the user.
>> The driver provides hwmon structures for each adapter's function.
>> Nevertheless, the temperature information come from fw queries performed by
>> be_worker() with some frequency, and this procedure is called with a single
>> function as argument; this means that the temperature value is updated only
>> in the specific function that was passed to be_worker().
>>
>> This can lead to incongruency in reported temperature by a function, or in
>> a worse scenario, some functions might be unable to provide temperature
>> info to userspace, if they weren't fed with this information from fw in
>> be_worker() run.
>
> Hi, I'm wondering if you are OK with the temperature value being 128s old
> (2/2 patch), then why is it a problem if two different functions report a
> temperature value that is queried a few seconds apart?
> Also, you'll not have a scenario where the FW cmd succeeds for one
> function and fails for other functions.
> It's a common FW for the entire adapter.
>
>> This patch changes the way temperature is set in be2net driver. At anytime
>> the fw query is performed, it will set the temperature value for all
>> functions of the adapter, instead of only setting the temperature of the
>> function passed to be_worker().
>
> If the possible inconsistency across functions is indeed a problem, then a
> simpler solution would be to issue the FW cmd synchronously when the sysfs
> attr is read, i.e., in be_hwmon_show_temp() routine itself.
>

Hi Sathya, thanks very much for your quick reply. I agree with you that
a 1 or 2 sec inconsistency wouldn't do any harm, but the main problem we're
seeing is that be_worker() is being called with a single function as a
parameter. In our case, the last function is being passed as argument
to be_worker() multiple times in a row, so its temperature is updated
while the other functions' temperatures are set as invalid.

Regarding running the temperature update in be_hwmon_show_temp(), that was
an idea too, but I was afraid of delaying this output too much. Imagine some
userspace tool reads the hwmon attributes of all functions at almost the
"same time": supposing the fw command can't run in parallel, the "last" read
would need to wait for 4 fw commands to complete before showing its output.
Besides, in a worse scenario, some "unfriendly" tool might issue lots
of hwmon reads per second, and thus lots of fw commands, which
does not seem like a good idea. Of course, we can avoid this last case by
implementing a counter or timer in be_hwmon_show_temp() to cap the
number of fw cmds in a time frame.

I'd appreciate your advice on how you'd prefer to address this issue.
Thanks,


Guilherme


> thanks!
> -Sathya
>
Sathya Perla July 26, 2016, 8:26 a.m. UTC | #3
> -----Original Message-----
> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
>
> On 07/25/2016 07:48 AM, Sathya Perla wrote:
> >> -----Original Message-----
> >> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
> >>
> >> Temperature values on be2net driver are made available to userspace via
> >> hwmon abstraction, so tools like lm-sensors can present them to the user.
> >> The driver provides hwmon structures for each adapter's function.
> >> Nevertheless, the temperature information come from fw queries performed
> >> by be_worker() with some frequency, and this procedure is called with a
> >> single function as argument; this means that the temperature value is
> >> updated only in the specific function that was passed to be_worker().
> >>
> >> This can lead to incongruency in reported temperature by a function, or
> >> in a worse scenario, some functions might be unable to provide
> >> temperature info to userspace, if they weren't fed with this information
> >> from fw in be_worker() run.
> >
> > Hi, I'm wondering if you are OK with the temperature value being 128s old
> > (2/2 patch), then why is it a problem if two different functions report a
> > temperature value that is queried a few seconds apart?
> > Also, you'll not have a scenario where the FW cmd succeeds for one
> > function and fails for other functions.
> > It's a common FW for the entire adapter.
> >
> >> This patch changes the way temperature is set in be2net driver. At
> >> anytime the fw query is performed, it will set the temperature value for
> >> all functions of the adapter, instead of only setting the temperature of
> >> the function passed to be_worker().
> >
> > If the possible inconsistency across functions is indeed a problem,
> > then a simpler solution would be to issue the FW cmd synchronously
> > when the sysfs attr is read, i.e., in
> > be_hwmon_show_temp() routine itself.
> >
>
> Hi Sathya, thanks very much for your quick reply. I agree with you that an
> 1 or 2 sec inconsistency wouldn't
> harm, but the main problem we're seeing is that be_worker() is being
> called with a single function as a parameter
> - in our case, the last function is being passed as argument to
> be_worker() multiple times in a row, and then we
> have its temperature updated but the other functions' temperature set as
> invalid.

Hi Guilherme, this doesn't sound right to me and is not expected. The
be_worker() routine must execute for *each* function every second.
Could you please share the driver/fw version, any debug logs (with prints)
you may have, and the lspci output?

>
> Regarding the temperature update run on be_hwmon_show_temp(), it was an
> idea too, but I was afraid in delay
> this output too much - imagine some userspace tool reads hwmon attributes
> for all functions almost at "same
> time", supposing the fw command can't run in parallel, the "last" read
> would need to wait 4 fw commands to
> complete before showing it's output.

I don't see any issue even if the sensors program queries each function one
after another. These calls would only be a few milliseconds apart.

> Besides, in a worse scenario, some "not-friendly" tool might issue lots of
> reads to hwmon per second then
> issuing lots of fw commands, which does not seem a good idea. Of course
> this last case we can avoid by
> implementing a counter or timer on be_hwmon_show_temp() to allow maximum
> number of fw cmds in a time
> frame.
Yes, this is not an issue. If the hwmon read is issued within a few seconds
of the previous read, then you can just return the old temperature value.
We are anyway querying this value only once every 64s now.
But I'd like to root-cause the issue you are seeing above before we "fix"
anything.

thanks,
-Sathya
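
A minimal sketch of the idea discussed in comment #3: query the firmware
on a hwmon read, but return the previous value when the last query is only
a few seconds old. This is standalone userspace C, not driver code. The
struct temp_cache type, the temp_cache_read() helper, the fake_fw_query()
stub and the 2-second MIN_QUERY_INTERVAL_S are hypothetical names and values
chosen for illustration; the real be_hwmon_show_temp() and the firmware
command path are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_TEMP 0xFF        /* stand-in for BE_INVALID_DIE_TEMP */
#define MIN_QUERY_INTERVAL_S 2   /* hypothetical: reuse readings newer than this */

/* Hypothetical per-adapter cache; not the driver's real hwmon_info layout. */
struct temp_cache {
	uint8_t  last_temp;      /* last value returned by the firmware */
	uint64_t last_query_s;   /* time of the last successful query, in seconds */
	bool     valid;          /* false until a query has succeeded */
	unsigned fw_queries;     /* number of firmware commands issued so far */
};

/* Firmware query callback: returns 0 and fills *temp on success. */
typedef int (*fw_query_fn)(uint8_t *temp);

/* Stub firmware query used only for demonstration: always reports 55. */
int fake_fw_query(uint8_t *temp)
{
	*temp = 55;
	return 0;
}

/*
 * Serve a temperature read at time now_s. A cached value younger than
 * MIN_QUERY_INTERVAL_S is returned without touching the firmware, so a
 * burst of reads costs at most one firmware command per interval.
 */
uint8_t temp_cache_read(struct temp_cache *c, uint64_t now_s, fw_query_fn query)
{
	uint8_t temp;

	assert(c && query);

	if (c->valid && now_s - c->last_query_s < MIN_QUERY_INTERVAL_S)
		return c->last_temp;

	c->fw_queries++;
	if (query(&temp) != 0) {
		c->valid = false;        /* force a fresh query on the next read */
		return INVALID_TEMP;
	}

	c->last_temp = temp;
	c->last_query_s = now_s;
	c->valid = true;
	return temp;
}
```

With a zero-initialized cache, reads at t=100 and t=101 would issue a single
firmware query between them, and a read at t=102 would trigger a second one.
Whether a per-adapter or a per-function cache is appropriate is left open in
the thread.
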
Guilherme G. Piccoli July 26, 2016, 8:31 p.m. UTC | #4
On 07/26/2016 05:26 AM, Sathya Perla wrote:
>> -----Original Message-----
>> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
>>
>> On 07/25/2016 07:48 AM, Sathya Perla wrote:
>>>> -----Original Message-----
>>>> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
>>>>
>>>> Temperature values on be2net driver are made available to userspace via
>>>> hwmon abstraction, so tools like lm-sensors can present them to the user.
>>>> The driver provides hwmon structures for each adapter's function.
>>>> Nevertheless, the temperature information come from fw queries performed
>>>> by be_worker() with some frequency, and this procedure is called with a
>>>> single function as argument; this means that the temperature value is
>>>> updated only in the specific function that was passed to be_worker().
>>>>
>>>> This can lead to incongruency in reported temperature by a function, or
>>>> in a worse scenario, some functions might be unable to provide
>>>> temperature info to userspace, if they weren't fed with this information
>>>> from fw in be_worker() run.
>>>
>>> Hi, I'm wondering if you are OK with the temperature value being 128s old
>>> (2/2 patch), then why is it a problem if two different functions report a
>>> temperature value that is queried a few seconds apart?
>>> Also, you'll not have a scenario where the FW cmd succeeds for one
>>> function and fails for other functions.
>>> It's a common FW for the entire adapter.
>>>
>>>> This patch changes the way temperature is set in be2net driver. At
>>>> anytime the fw query is performed, it will set the temperature value for
>>>> all functions of the adapter, instead of only setting the temperature of
>>>> the function passed to be_worker().
>>>
>>> If the possible inconsistency across functions is indeed a problem,
>>> then a simpler solution would be to issue the FW cmd synchronously
>>> when the sysfs attr is read, i.e., in
>>> be_hwmon_show_temp() routine itself.
>>>
>>
>> Hi Sathya, thanks very much for your quick reply. I agree with you that an
>> 1 or 2 sec inconsistency wouldn't
>> harm, but the main problem we're seeing is that be_worker() is being
>> called with a single function as a parameter
>> - in our case, the last function is being passed as argument to
>> be_worker() multiple times in a row, and then we
>> have its temperature updated but the other functions' temperature set as
>> invalid.
>
> Hi Guilherme, this doesn't sound right to me and is not expected. The
> be_worker() routine must execute for *each* function every second.
> Can you pls share the driver/fw version and any debug logs (with prints) you
> may have and also lspci output.

Hi Sathya, indeed... this is _not right_... on my side, heheh.
Unfortunately I made a mistake in my analysis and ended up
over-engineering a "solution" to an issue whose root cause wasn't clear
to me! I want to thank you for your relevant questions and the
information you provided, which helped a lot in figuring out exactly what's
going on.

We see the issue because some of the adapter's functions (3 out of 4) have
their interface down, and the fw temperature queries are performed only
for functions whose interface is up. The following conditional prevents the
fw query from occurring whenever the adapter's interface is down:

   if (!netif_running(adapter->netdev))
[be_main.c:5002, kernel v4.7]

It seems harmless to move the fw query so that the temperature read is
performed for all functions regardless of the state of their interface; this
will solve our issue. I wrote a simple patch (for "net", no longer
"net-next") to improve the driver's behavior.
I'll send it right after this message; please let me know what you think.

Again, thanks very much for your attention and sorry for my confusion.
Cheers,


Guilherme


>>
>> Regarding the temperature update run on be_hwmon_show_temp(), it was an
>> idea too, but I was afraid in delay
>> this output too much - imagine some userspace tool reads hwmon attributes
>> for all functions almost at "same
>> time", supposing the fw command can't run in parallel, the "last" read
>> would need to wait 4 fw commands to
>> complete before showing it's output.
>
> I don't see any issue even if the sensors program queries each function one
> after another. These calls would only be
> a few milli-seconds apart.
>
>> Besides, in a worse scenario, some "not-friendly" tool might issue lots of
>> reads to hwmon per second then
>> issuing lots of fw commands, which does not seem a good idea. Of course
>> this last case we can avoid by
>> implementing a counter or timer on be_hwmon_show_temp() to allow maximum
>> number of fw cmds in a time
>> frame.
> Yes, this is not an issue. If the hwmon read is issued with-in a few seconds
> of the previous read then you can just return the old temperature value.
> We are anyway querying this value only once in 64s now.
> But, I'd like to root-cause the issue you are seeing above before we "fix"
> anything.
>
> thanks,
> -Sathya
>
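
The root cause described in comment #4 comes down to ordering: when the
interface-state check comes before the temperature query, a function whose
interface is down never gets its temperature refreshed. The sketch below
illustrates that effect in standalone userspace C. It is not the follow-up
patch (which is not shown on this page) and it does not reproduce the real
be_worker(); struct fake_function, fake_fw_read_temp() and both worker
functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_TEMP 0xFF   /* stand-in for BE_INVALID_DIE_TEMP */

/* Hypothetical model of one adapter function. */
struct fake_function {
	bool    if_running;  /* stands in for netif_running(adapter->netdev) */
	uint8_t die_temp;    /* stands in for hwmon_info.be_on_die_temp */
};

/* Stub for the firmware temperature query: always reports 60. */
static uint8_t fake_fw_read_temp(void)
{
	return 60;
}

/* Ordering that causes the problem: interface check first. */
void worker_check_first(struct fake_function *f)
{
	assert(f);

	if (!f->if_running)
		return;                       /* temperature is never refreshed */

	f->die_temp = fake_fw_read_temp();
	/* ... remaining periodic work for a running interface ... */
}

/* Ordering proposed in the thread: query the temperature regardless. */
void worker_temp_first(struct fake_function *f)
{
	assert(f);

	f->die_temp = fake_fw_read_temp();    /* runs even if the interface is down */

	if (!f->if_running)
		return;
	/* ... remaining periodic work for a running interface ... */
}
```

For a function whose interface is down, worker_check_first() leaves the
temperature at its invalid value, while worker_temp_first() updates it.
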
Patch

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index fe3763d..76f8fdb 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -851,6 +851,7 @@  u32 be_get_fw_log_level(struct be_adapter *adapter);
 int be_update_queues(struct be_adapter *adapter);
 int be_poll(struct napi_struct *napi, int budget);
 void be_eqd_update(struct be_adapter *adapter, bool force_update);
+void be_set_adapters_temperature_value(struct be_adapter *adapter, u8 temp);
 
 /*
  * internal function to initialize-cleanup roce device.
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 22402db..6c59351 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -221,13 +221,12 @@  static void be_async_cmd_process(struct be_adapter *adapter,
 		if (base_status == MCC_STATUS_SUCCESS) {
 			struct be_cmd_resp_get_cntl_addnl_attribs *resp =
 							(void *)resp_hdr;
-			adapter->hwmon_info.be_on_die_temp =
-						resp->on_die_temperature;
-		} else {
-			adapter->be_get_temp_freq = 0;
-			adapter->hwmon_info.be_on_die_temp =
-						BE_INVALID_DIE_TEMP;
-		}
+			be_set_adapters_temperature_value(adapter,
+							  resp->on_die_temperature);
+		} else
+			be_set_adapters_temperature_value(adapter,
+							  BE_INVALID_DIE_TEMP);
+
 		return;
 	}
 }
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index ed98ef1..9f44a00 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -53,6 +53,15 @@  static const struct pci_device_id be_dev_ids[] = {
 	{ 0 }
 };
 MODULE_DEVICE_TABLE(pci, be_dev_ids);
+
+struct adapter_list_node {
+	struct be_adapter *adapter;
+	struct list_head node;
+};
+
+static LIST_HEAD(adapters_list);
+static DEFINE_MUTEX(adapters_list_lock);
+
 /* UE Status Low CSR */
 static const char * const ue_status_low_desc[] = {
 	"CEV",
@@ -130,6 +139,40 @@  static const char * const ue_status_hi_desc[] = {
 				 BE_IF_FLAGS_MULTICAST | \
 				 BE_IF_FLAGS_PASS_L3L4_ERRORS)
 
+/* This procedure runs through adapters_list and sets the temperature for
+ * all functions of the same adapter. Since the temperature update is done
+ * by a single function in be_worker(), the other hwmon entries might remain
+ * with an invalid temperature.
+ */
+
+void be_set_adapters_temperature_value(struct be_adapter *adapter, u8 temp)
+{
+	struct adapter_list_node *adapter_lnode;
+	struct pci_bus *bus;
+	u8 bus_number, slot, dev;
+	u16 domain;
+
+	bus = adapter->pdev->bus;
+	domain = pci_domain_nr(bus);
+	bus_number = bus->number;
+	dev = PCI_SLOT(adapter->pdev->devfn);
+
+	mutex_lock(&adapters_list_lock);
+	list_for_each_entry(adapter_lnode, &adapters_list, node) {
+		bus = adapter_lnode->adapter->pdev->bus;
+		slot = PCI_SLOT(adapter_lnode->adapter->pdev->devfn);
+
+		if (pci_domain_nr(bus) == domain && bus->number == bus_number &&
+		    slot == dev) {
+			adapter_lnode->adapter->hwmon_info.be_on_die_temp
+								= temp;
+			if (unlikely(temp == BE_INVALID_DIE_TEMP))
+				adapter_lnode->adapter->be_get_temp_freq = 0;
+		}
+	}
+	mutex_unlock(&adapters_list_lock);
+}
+
 static void be_queue_free(struct be_adapter *adapter, struct be_queue_info *q)
 {
 	struct be_dma_mem *mem = &q->dma_mem;
@@ -5204,6 +5247,7 @@  free_mbox:
 static void be_remove(struct pci_dev *pdev)
 {
 	struct be_adapter *adapter = pci_get_drvdata(pdev);
+	struct adapter_list_node *lnode, *tmp;
 
 	if (!adapter)
 		return;
@@ -5215,6 +5259,15 @@  static void be_remove(struct pci_dev *pdev)
 
 	unregister_netdev(adapter->netdev);
 
+	mutex_lock(&adapters_list_lock);
+	list_for_each_entry_safe(lnode, tmp, &adapters_list, node)
+		if (lnode->adapter == adapter) {
+			list_del(&lnode->node);
+			kfree(lnode);
+			break;
+		}
+	mutex_unlock(&adapters_list_lock);
+
 	be_clear(adapter);
 
 	/* tell fw we're done with firing cmds */
@@ -5314,6 +5367,7 @@  static int be_probe(struct pci_dev *pdev, const struct pci_device_id *pdev_id)
 {
 	struct be_adapter *adapter;
 	struct net_device *netdev;
+	struct adapter_list_node *adapter_lnode;
 	int status = 0;
 
 	dev_info(&pdev->dev, "%s version is %s\n", DRV_NAME, DRV_VER);
@@ -5374,6 +5428,15 @@  static int be_probe(struct pci_dev *pdev, const struct pci_device_id *pdev_id)
 
 	be_schedule_err_detection(adapter, ERR_DETECTION_DELAY);
 
+	adapter_lnode = kmalloc(sizeof(*adapter_lnode), GFP_KERNEL);
+	if (adapter_lnode) {
+		adapter_lnode->adapter = adapter;
+		mutex_lock(&adapters_list_lock);
+		list_add_tail(&adapter_lnode->node, &adapters_list);
+		mutex_unlock(&adapters_list_lock);
+	} else
+		dev_warn(&pdev->dev, "Couldn't be added to adapters_list\n");
+
 	/* On Die temperature not supported for VF. */
 	if (be_physfn(adapter) && IS_ENABLED(CONFIG_BE2NET_HWMON)) {
 		adapter->hwmon_info.hwmon_dev =