[0/1] i2c: imc: Add support for Intel iMC SMBus host controller.

Message ID 1582498270-50674-1-git-send-email-schaecsn@gmx.net

Message

Stefan Schaeckeler Feb. 23, 2020, 10:51 p.m. UTC
This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
this rewrite will.


Overview

Modern Intel memory controllers host an SMBus controller and connection to
DIMMs and their thermal sensors. The memory controller firmware has three modes
of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
Throttling (OLTT) and none.

- CLTT: The memory controller firmware is periodically accessing the DIMM
  temperature sensor over the SMBus.

- OLTT: The memory controller firmware is not accessing the DIMM temperature
  sensor over the SMBus but approximates/guesses its temperature.

Depending on the temperature, the memory controller firmware may throttle the
memory bandwidth and the like.

Only one mode of operation can be used at a time. Intel recommends CLTT. This
is also the default on our BIOS.


Original Driver and its Rewrite

The original driver i2c-imc.c was an iMC SMBus controller driver that provided
access to the DIMM thermal sensors. A second driver dimm-bus.c, also part of
Andy's patch-set, instantiated the thermal sensors.

The original driver was written for the memory controller found in Sandy Bridge
CPUs. Either the Sandy Bridge documentation is incomplete or the functionality
is limited. It was not possible to use this driver while the memory controller
was in CLTT mode as the driver and firmware were both accessing the memory
controller without arbitration. We ran this driver on our Broadwell CPU and the
driver's internal consistency check failed every 30 min or so.

We rewrote this driver to support Broadwell's memory controller 8086.6fa8. Over
time, support for more memory controllers should be added.

Our documentation (Intel Xeon Processor D-1500 Product Family External Design
Specification (EDS), Volume Two: Core and Uncore Registers Volume 2 of 5 Rev.
2.3) hints at how to make OS drivers and firmware co-exist in CLTT mode. In
short:

- don't (necessarily) disable CLTT mode, but set tsod_polling_interval to 0
- wait 10 ms to drain a potential in-flight firmware CLTT transaction
- the OS now has exclusive access to the SMBus
- when done, set tsod_polling_interval back to its previous value

Our patch provides proper arbitration between OS and firmware on Broadwell.
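
As a rough illustration of the ordering of these steps (this is not code from
the patch): the sketch below models the sequence in shell, with a plain
variable standing in for the tsod_polling_interval register field. The
function names imc_claim_bus and imc_release_bus and the initial value 8 are
made up for the example; the driver itself does this in C against PCI
configuration space. It assumes a sleep that accepts fractional seconds.

```shell
#!/bin/sh
# Model of the OS/firmware arbitration. A shell variable stands in for the
# tsod_polling_interval register field (assumption: the real driver reads and
# writes it through PCI configuration space).
tsod_polling_interval=8    # hypothetical value programmed by the BIOS
saved_interval=

imc_claim_bus() {
    saved_interval=$tsod_polling_interval   # remember the firmware setting
    tsod_polling_interval=0                 # stop firmware CLTT polling
    sleep 0.01                              # 10 ms: drain an in-flight transaction
}

imc_release_bus() {
    tsod_polling_interval=$saved_interval   # let the firmware poll again
}

imc_claim_bus
# ... the OS-side SMBus transfer would happen here ...
imc_release_bus
echo "tsod_polling_interval restored to $tsod_polling_interval"
```

The point of the sketch is only that the previous interval is saved before it
is zeroed and restored afterwards, so CLTT itself is never disabled.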


The original patch-set also provided an additional driver, dimm-bus.c, to
instantiate the temperature sensors. It had some drawbacks:

- the probe function i2c_scan_dimm_bus() blindly enumerates potential DIMM
  sensor i2c addresses causing the SBE bit to be set 6 times on our system.
  That is dangerous (see comment in i2c-imc.c: if (stat & SMBSTAT_SBE)). The
  i2c addresses of the actual temperature sensors are known to the memory
  controller (when in CLTT mode) and don't need to be blindly enumerated.

- the probe function i2c_scan_dimm_bus() blindly instantiates 10 temperature
  sensors, although our system had only 2 DIMMs (with 1 temperature sensor
  each). The remaining 8 temperature sensors returned 0.

- as already pointed out, the instantiations happen in a separate driver,
  dimm-bus.c. The iMC SMBus driver i2c-imc.c is calling dimm-bus.c to do its
  job. That does not feel right. I don't know how to do it better, so for now
  I have moved the instantiations into the iMC SMBus driver itself
  (imc_instantiate_sensors()). Please advise.


The mapping of DIMMs to i2c adapters and addresses is confusing at best. From
smb_stat_0 and from Andy's dimm-bus.c driver, I got the impression that the
mapping may be

channel 00 slot 00   i2c-1 0x18 (if there is a dimm)
channel 00 slot 01   i2c-1 0x19 (if there is a dimm)
channel 00 slot 02   i2c-1 0x1a (if there is a dimm)
channel 00 slot 03   i2c-1 0x1b (if there is a dimm)
channel 01 slot 00   i2c-1 0x1c (if there is a dimm)
channel 01 slot 01   i2c-1 0x1d (if there is a dimm)
channel 01 slot 02   i2c-1 0x1e (if there is a dimm)
channel 01 slot 03   i2c-1 0x1f (if there is a dimm)

channel 02 slot 00   i2c-2 0x18 (if there is a dimm)
channel 02 slot 01   i2c-2 0x19 (if there is a dimm)
channel 02 slot 02   i2c-2 0x1a (if there is a dimm)
channel 02 slot 03   i2c-2 0x1b (if there is a dimm)
channel 03 slot 00   i2c-2 0x1c (if there is a dimm)
channel 03 slot 01   i2c-2 0x1d (if there is a dimm)
channel 03 slot 02   i2c-2 0x1e (if there is a dimm)
channel 03 slot 03   i2c-2 0x1f (if there is a dimm)


Experimentally, it rather appears to be

channel 00 slot 00   i2c-1 0x18 (if there is a dimm)
channel 00 slot 01   i2c-1 0x19 (if there is a dimm)
channel 01 slot 00   i2c-1 0x1a (if there is a dimm)
channel 01 slot 01   i2c-1 0x1b (if there is a dimm)

channel 02 slot 00   i2c-2 0x18 (if there is a dimm)
channel 02 slot 01   i2c-2 0x19 (if there is a dimm)
channel 03 slot 00   i2c-2 0x1a (if there is a dimm)
channel 03 slot 01   i2c-2 0x1b (if there is a dimm)

Why? Because on our system we see temperature sensors at i2c-1 0x18 and
i2c-1 0x1a, and BIOS and EDAC tell us we have DIMMs on channel 0:slot 0 and
channel 1:slot 0.

[    9.522781] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 16, rank: 2, row: 0x10000, col: 0x400
[    9.522786] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 16, rank: 2, row: 0x10000, col: 0x400
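
Both tables follow the same arithmetic and differ only in how many slots are
assumed per channel (4 in the first table, 2 in the second). A small sketch of
that calculation, with a helper name (dimm_sensor_addr) that is made up for
this example and adapter numbers i2c-1/i2c-2 as seen on our system (they may
differ elsewhere):

```shell
#!/bin/sh
# dimm_sensor_addr CHANNEL SLOT [SLOTS_PER_CHANNEL]
# Prints the adapter and i2c address a DIMM sensor would have under the
# mappings guessed above. SLOTS_PER_CHANNEL defaults to 2 (the experimentally
# observed table); pass 4 for the first table. Pass plain decimal numbers
# without leading zeros.
dimm_sensor_addr() {
    channel=$1 slot=$2 slots=${3:-2}
    adapter=$(( 1 + channel / 2 ))                    # two channels per adapter
    addr=$(( 0x18 + (channel % 2) * slots + slot ))   # sensor addresses start at 0x18
    printf 'i2c-%d 0x%02x\n' "$adapter" "$addr"
}

dimm_sensor_addr 1 0      # observed mapping: channel 1, slot 0
dimm_sensor_addr 1 0 4    # first table:      channel 1, slot 0
```

With the default of 2 slots per channel, channel 1:slot 0 lands on i2c-1 0x1a,
which is what we observe; with 4 slots per channel it would be i2c-1 0x1c.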


When in OLTT mode, the sensors need to be manually instantiated, e.g.

# echo jc42 0x18  > /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/new_device
# echo jc42 0x1a  > /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/new_device


In CLTT mode - we expect almost everyone to configure CLTT mode in their BIOS -
the new driver knows where DIMMs are populated (see arguments to
imc_instantiate_sensor()) and instantiates the sensors. For this magic to
happen, we don't need to understand the mapping.


Unit Test

I had access to two systems with these memory configurations:

System 1: DIMM at channel 1, slot 0.
System 2: DIMM at channel 0, slot 0. DIMM at channel 1, slot 0.

I had no access to a system with DIMMs on channel 2 or 3.

We read the temperature sensors for 8 hours while having CLTT enabled. Next we
read the temperature sensors for 8 hours while having OLTT enabled. We always
get sane data. The internal sanity check always passes and dmesg is clean. The
grep at the end filters out sane temperature values in the 20C to 39C range so
we can focus on abnormal temperature values and error messages.
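
A note on the filter: hwmon's temp1_input reports millidegrees Celsius, so 20C
to 39C corresponds to values 20000 to 39999. The pattern ^[23] used in the
commands below hides every line starting with 2 or 3, which would also hide a
reading such as 2000 or 300000. A stricter variant, shown only as a sketch
(the function name filter_abnormal is made up here and was not used in the
test runs), could look like this:

```shell
#!/bin/sh
# Drop readings between 20000 and 39999 millidegrees Celsius (20C to 39C) and
# pass everything else through, including error text and out-of-range values.
filter_abnormal() {
    grep -Ev '^[23][0-9]{4}$'
}

# Sample readings: two sane values followed by three that should be reported.
printf '%s\n' 25000 39875 2000 41250 300000 | filter_abnormal
```

On the sample input this prints 2000, 41250 and 300000, one per line. Quoting
the pattern also keeps the shell from treating it as a file name glob.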

First we stress-tested the driver (for 8 hours).

System 1:

while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &

System 2:

while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-0018/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-0018/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &


Next, we gave firmware polling a better chance to start and added a sleep of 2
seconds (for 8 hours).

System 1 and System 2:

while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; sleep 2; done | grep -v ^[23] &

~ Stefan


Stefan Schaeckeler (1):
  i2c: imc: Add support for Intel iMC SMBus host controller.

 MAINTAINERS                  |   5 +
 drivers/i2c/busses/Kconfig   |  15 ++
 drivers/i2c/busses/Makefile  |   1 +
 drivers/i2c/busses/i2c-imc.c | 515 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 536 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-imc.c

--
2.11.0

Comments

Andy Shevchenko Feb. 25, 2020, 8:22 a.m. UTC | #1
On Mon, Feb 24, 2020 at 12:54 AM Stefan Schaeckeler <schaecsn@gmx.net> wrote:
>
> This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> this rewrite will:

Thanks for the patch!
I'll review the code later.

I think it would be better to have a documentation file (under the
Documentation folder) where you describe things like enumeration for this
driver.

> Stefan Schaeckeler (1):
>   i2c: imc: Add support for Intel iMC SMBus host controller.
>
>  MAINTAINERS                  |   5 +
>  drivers/i2c/busses/Kconfig   |  15 ++
>  drivers/i2c/busses/Makefile  |   1 +
>  drivers/i2c/busses/i2c-imc.c | 515 +++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 536 insertions(+)
>  create mode 100644 drivers/i2c/busses/i2c-imc.c
>
> --
> 2.11.0
>

Andy Lutomirski Feb. 25, 2020, 9:49 p.m. UTC | #2
On Sun, Feb 23, 2020 at 2:52 PM Stefan Schaeckeler <schaecsn@gmx.net> wrote:
>
> This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> this rewrite will:
>
>
> Overview
>
> Modern Intel memory controllers host an SMBus controller and connection to
> DIMMs and their thermal sensors. The memory controller firmware has three modes
> of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> Throttling (OLTT) and none.
>
> - CLTT: The memory controller firmware is periodically accessing the DIMM
>   temperature sensor over the SMBus.
>


I think this is great!  One question, though: what happens if the
system is in CLTT mode but you disable CLTT and claim the bus for too
long?  For example, if there's an infinite loop or other lockup while
you have the tsod polling interval set to 0?  Does the system catch
fire or does the system do something intelligent like temporarily
switching to open loop?

Stefan Schaeckeler March 1, 2020, 7:02 p.m. UTC | #3
Hello Andy,

> > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > this rewrite will:
> >
> >
> > Overview
> >
> > Modern Intel memory controllers host an SMBus controller and connection to
> > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > Throttling (OLTT) and none.
> >
> > - CLTT: The memory controller firmware is periodically accessing the DIMM
> >   temperature sensor over the SMBus.
> >
>
>
> I think this is great!  One question, though: what happens if the
> system is in CLTT mode but you disable CLTT and claim the bus for too
> long?  For example, if there's an infinite loop or other lockup which
> you have the tsod polling interval set to 0?  Does the system catch
> fire or does the system do something intelligent like temporarily
> switching to open loop?

I don't know. Most likely, the current memory throttling rate will be kept.
That might not be enough for the forthcoming workload and, ehm, the system may
catch fire.

I assume our use-case is the most common use-case for this driver: our embedded
system comes with its own environmental management software. It monitors, among
other sensor values, the DIMM temperatures and takes action on abnormal values.
If one is concerned about your scenario, then the environmental management
software needs to consider blocked reads on the sysfs node as a worst case
scenario and reboot the system.
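
To illustrate what I mean by treating a blocked read as the worst case, here
is a minimal sketch of a userspace check. It is not part of the patch or of
our management software; the function name read_with_deadline and the
one-second polling granularity are made up for the example. The reader runs
in the background so that the caller can give up even if the reader itself
cannot be killed.

```shell
#!/bin/sh
# read_with_deadline FILE SECONDS
# Prints FILE's content and returns 0 if the read finishes within SECONDS.
# Returns 1, without waiting any longer, if the reader is still blocked.
read_with_deadline() {
    file=$1 deadline=$2
    out=$(mktemp) || return 2
    cat "$file" > "$out" 2>/dev/null &     # reader runs in the background
    reader=$!
    waited=0
    while kill -0 "$reader" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            # Best effort only: a reader stuck inside the kernel may not die.
            kill "$reader" 2>/dev/null
            rm -f "$out"
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    cat "$out"
    rm -f "$out"
}
```

A monitor would call this on a sensor's temp1_input node with a deadline of a
few seconds and escalate (for example, reboot) when it returns 1.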

Nothing can really go wrong while the polling interval is set to 0, though.
During that window the driver only does the following:

- reading and setting PCI configuration space registers.
- calling dev_err, dev_warn and the like.
- usleep_range(131,140) and up to 20 udelay(9).

What is not clear to me is what happens if imc_smbus_xfer() is executing while
the driver is being removed with rmmod. As a defensive measure, the driver's
remove function sets tsod_polling_interval back to its original value.

~ Stefan

Wolfram Sang April 5, 2020, 6:05 p.m. UTC | #4
On Tue, Feb 25, 2020 at 01:49:34PM -0800, Andy Lutomirski wrote:
> On Sun, Feb 23, 2020 at 2:52 PM Stefan Schaeckeler <schaecsn@gmx.net> wrote:
> >
> > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > this rewrite will:
> >
> >
> > Overview
> >
> > Modern Intel memory controllers host an SMBus controller and connection to
> > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > Throttling (OLTT) and none.
> >
> > - CLTT: The memory controller firmware is periodically accessing the DIMM
> >   temperature sensor over the SMBus.
> >
> 
> 
> I think this is great!  One question, though: what happens if the
> system is in CLTT mode but you disable CLTT and claim the bus for too
> long?  For example, if there's an infinite loop or other lockup which
> you have the tsod polling interval set to 0?  Does the system catch
> fire or does the system do something intelligent like temporarily
> switching to open loop?

Any news on this question?

Stefan Schaeckeler April 5, 2020, 9:40 p.m. UTC | #5
Hello Wolfram,

> > > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > > this rewrite will:
> > >
> > >
> > > Overview
> > >
> > > Modern Intel memory controllers host an SMBus controller and connection to
> > > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > > Throttling (OLTT) and none.
> > >
> > > - CLTT: The memory controller firmware is periodically accessing the DIMM
> > >   temperature sensor over the SMBus.
> > >
> >
> >
> > I think this is great!  One question, though: what happens if the
> > system is in CLTT mode but you disable CLTT and claim the bus for too
> > long?  For example, if there's an infinite loop or other lockup which
> > you have the tsod polling interval set to 0?  Does the system catch
> > fire or does the system do something intelligent like temporarily
> > switching to open loop?
>
> Any news on this question?

Thank you for your interest in this patch. You can read my reply here
https://lkml.org/lkml/2020/3/1/216

 Stefan

Andy Lutomirski April 5, 2020, 10:51 p.m. UTC | #6
On Sun, Apr 5, 2020 at 2:41 PM Stefan Schaeckeler <schaecsn@gmx.net> wrote:
>
> Hello Wolfram,
>
> > > > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > > > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > > > this rewrite will:
> > > >
> > > >
> > > > Overview
> > > >
> > > > Modern Intel memory controllers host an SMBus controller and connection to
> > > > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > > > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > > > Throttling (OLTT) and none.
> > > >
> > > > - CLTT: The memory controller firmware is periodically accessing the DIMM
> > > >   temperature sensor over the SMBus.
> > > >
> > >
> > >
> > > I think this is great!  One question, though: what happens if the
> > > system is in CLTT mode but you disable CLTT and claim the bus for too
> > > long?  For example, if there's an infinite loop or other lockup which
> > > you have the tsod polling interval set to 0?  Does the system catch
> > > fire or does the system do something intelligent like temporarily
> > > switching to open loop?
> >
> > Any news on this question?
>
> Thank you for your interest in this patch. You can read my reply here
> https://lkml.org/lkml/2020/3/1/216

I think it could make sense to upstream this driver but to require a
scary boot-time option to enable it.  Maybe i2c_imc.dangerous=1?

>
>  Stefan