Message ID | 1514652658-6228-2-git-send-email-linux@roeck-us.net |
---|---|
State | Changes Requested |
Headers | show |
Series | [1/2] i2c: piix4: Use request_muxed_region | expand |
On Sat, Dec 30, 2017 at 08:50:58AM -0800, Guenter Roeck wrote: > The piix4 i2c driver is extremely slow. Replacing msleep() > with usleep_range() increases its speed substantially. > > Signed-off-by: Guenter Roeck <linux@roeck-us.net> Any comments or concerns ? Thanks, Guenter > --- > drivers/i2c/busses/i2c-piix4.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c > index 78dd5951d6e7..52a8b1c5c110 100644 > --- a/drivers/i2c/busses/i2c-piix4.c > +++ b/drivers/i2c/busses/i2c-piix4.c > @@ -467,13 +467,13 @@ static int piix4_transaction(struct i2c_adapter *piix4_adapter) > > /* We will always wait for a fraction of a second! (See PIIX4 docs errata) */ > if (srvrworks_csb5_delay) /* Extra delay for SERVERWORKS_CSB5 */ > - msleep(2); > + usleep_range(2000, 2000); > else > - msleep(1); > + usleep_range(500, 1000); > > while ((++timeout < MAX_TIMEOUT) && > ((temp = inb_p(SMBHSTSTS)) & 0x01)) > - msleep(1); > + usleep_range(200, 500); > > /* If the SMBus is still busy, we give up */ > if (timeout == MAX_TIMEOUT) {
Hi Guenter, On Sat, 30 Dec 2017 08:50:58 -0800, Guenter Roeck wrote: > The piix4 i2c driver is extremely slow. Replacing msleep() > with usleep_range() increases its speed substantially. > > Signed-off-by: Guenter Roeck <linux@roeck-us.net> > --- > drivers/i2c/busses/i2c-piix4.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c > index 78dd5951d6e7..52a8b1c5c110 100644 > --- a/drivers/i2c/busses/i2c-piix4.c > +++ b/drivers/i2c/busses/i2c-piix4.c > @@ -467,13 +467,13 @@ static int piix4_transaction(struct i2c_adapter *piix4_adapter) > > /* We will always wait for a fraction of a second! (See PIIX4 docs errata) */ > if (srvrworks_csb5_delay) /* Extra delay for SERVERWORKS_CSB5 */ > - msleep(2); > + usleep_range(2000, 2000); Isn't this exactly the same? I'm fine using the same function for consistency, just curious. > else > - msleep(1); > + usleep_range(500, 1000); Were you able to test this on older hardware? Unfortunately, the specification errata of the original Intel PIIX4 is quite vague on the amount of time you must wait before checking the Host Busy bit: "Note that there may be moderate latency before the transaction begins and the Host Busy bit gets set." I guess we made it 1 ms at the time because it was the minimum we could sleep anyway. One option if you really care about the performance of the i2c-piix4 driver on recent hardware would be to lower the initial delay even more for ATI and AMD chipsets. The errata was for Intel chipsets originally, and while we know that at least some of the ServerWorks implementations suffered from the same problem (worse actually) I don't think that anybody ever bothered checking if it applied to more recent implementations by other vendors. For reference, at 93.75 kHz (the default SMBus frequency or the SB800), an SMBus Quick transaction would be completed in 117 us, so I guess an initial delay of 150 or 200 us would be optimum. And an SMBus Read Byte transaction completes in 416 ms. I think this is the most popular SMBus transaction, so ensuring that it is as fast as possible would make sense. And it might even work on older Intel chipsets, who knows. Plus I doubt anyone is still using them anyway, so you have my approval to lower the delays to whatever works for you. As a comparison point, in the i2c-i801 driver we use: usleep_range(250, 500); for both the initial sleep and the waiting loop. > > while ((++timeout < MAX_TIMEOUT) && > ((temp = inb_p(SMBHSTSTS)) & 0x01)) > - msleep(1); > + usleep_range(200, 500); Note that you are also drastically reducing the effective timeout value here, because it is a counter and not an overall time. Beforehand, we would wait for at least 501 ms for the transaction to complete. Now this could be down to only 100 ms. That being said, if my math is right, the longest supported transaction (Block Read) at the slowest allowed SMBus clock speed (10 kHz) would be done in 33 ms, so we are still good. > > /* If the SMBus is still busy, we give up */ > if (timeout == MAX_TIMEOUT) { Reviewed-by: Jean Delvare <jdelvare@suse.de>
Hi Jean, On Mon, Feb 12, 2018 at 11:53:36AM +0100, Jean Delvare wrote: > Hi Guenter, > > On Sat, 30 Dec 2017 08:50:58 -0800, Guenter Roeck wrote: > > The piix4 i2c driver is extremely slow. Replacing msleep() > > with usleep_range() increases its speed substantially. > > > > Signed-off-by: Guenter Roeck <linux@roeck-us.net> > > --- > > drivers/i2c/busses/i2c-piix4.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c > > index 78dd5951d6e7..52a8b1c5c110 100644 > > --- a/drivers/i2c/busses/i2c-piix4.c > > +++ b/drivers/i2c/busses/i2c-piix4.c > > @@ -467,13 +467,13 @@ static int piix4_transaction(struct i2c_adapter *piix4_adapter) > > > > /* We will always wait for a fraction of a second! (See PIIX4 docs errata) */ > > if (srvrworks_csb5_delay) /* Extra delay for SERVERWORKS_CSB5 */ > > - msleep(2); > > + usleep_range(2000, 2000); > > Isn't this exactly the same? I'm fine using the same function for > consistency, just curious. > No, it isn't. msleep() tends to sleep much longer then requested. uspeep_range() tends to be much more accurate. > > else > > - msleep(1); > > + usleep_range(500, 1000); > > Were you able to test this on older hardware? Unfortunately, the No, the only AMD CPUs I have are Ryzen. > specification errata of the original Intel PIIX4 is quite vague on the > amount of time you must wait before checking the Host Busy bit: > > "Note that there may be moderate latency before the transaction begins > and the Host Busy bit gets set." > > I guess we made it 1 ms at the time because it was the minimum we could > sleep anyway. > > One option if you really care about the performance of the i2c-piix4 > driver on recent hardware would be to lower the initial delay even more > for ATI and AMD chipsets. The errata was for Intel chipsets originally, > and while we know that at least some of the ServerWorks implementations > suffered from the same problem (worse actually) I don't think that > anybody ever bothered checking if it applied to more recent > implementations by other vendors. > > For reference, at 93.75 kHz (the default SMBus frequency or the SB800), > an SMBus Quick transaction would be completed in 117 us, so I guess an > initial delay of 150 or 200 us would be optimum. And an SMBus Read Byte > transaction completes in 416 ms. I think this is the most popular SMBus > transaction, so ensuring that it is as fast as possible would make > sense. > > And it might even work on older Intel chipsets, who knows. Plus I doubt > anyone is still using them anyway, so you have my approval to lower the > delays to whatever works for you. > > As a comparison point, in the i2c-i801 driver we use: > > usleep_range(250, 500); > > for both the initial sleep and the waiting loop. > > > > > while ((++timeout < MAX_TIMEOUT) && > > ((temp = inb_p(SMBHSTSTS)) & 0x01)) > > - msleep(1); > > + usleep_range(200, 500); > > Note that you are also drastically reducing the effective timeout value > here, because it is a counter and not an overall time. Beforehand, we > would wait for at least 501 ms for the transaction to complete. Now > this could be down to only 100 ms. That being said, if my math is > right, the longest supported transaction (Block Read) at the slowest > allowed SMBus clock speed (10 kHz) would be done in 33 ms, so we are > still good. > > > > > /* If the SMBus is still busy, we give up */ > > if (timeout == MAX_TIMEOUT) { > > Reviewed-by: Jean Delvare <jdelvare@suse.de> > Not sure I follow, overall. What changes, if any, do you expect me to make ? Thanks, Guenter > -- > Jean Delvare > SUSE L3 Support
On Mon, 2018-02-12 at 12:59 -0800, Guenter Roeck wrote: > On Mon, Feb 12, 2018 at 11:53:36AM +0100, Jean Delvare wrote: > > On Sat, 30 Dec 2017 08:50:58 -0800, Guenter Roeck wrote: > > > The piix4 i2c driver is extremely slow. Replacing msleep() > > > with usleep_range() increases its speed substantially. [] > > > diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c [] > > > @@ -467,13 +467,13 @@ static int piix4_transaction(struct i2c_adapter *piix4_adapter) > > > > > > /* We will always wait for a fraction of a second! (See PIIX4 docs errata) */ > > > if (srvrworks_csb5_delay) /* Extra delay for SERVERWORKS_CSB5 */ > > > - msleep(2); > > > + usleep_range(2000, 2000); usleep_range without a range isn't particularly useful. Perhaps a 100 uSec range would allow better scheduling. usleep_range(2000, 2100);
Hi Jean, On Mon, Feb 12, 2018 at 11:53:36AM +0100, Jean Delvare wrote: > Hi Guenter, > > On Sat, 30 Dec 2017 08:50:58 -0800, Guenter Roeck wrote: > > The piix4 i2c driver is extremely slow. Replacing msleep() > > with usleep_range() increases its speed substantially. > > > > Signed-off-by: Guenter Roeck <linux@roeck-us.net> > > --- > > drivers/i2c/busses/i2c-piix4.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c > > index 78dd5951d6e7..52a8b1c5c110 100644 > > --- a/drivers/i2c/busses/i2c-piix4.c > > +++ b/drivers/i2c/busses/i2c-piix4.c > > @@ -467,13 +467,13 @@ static int piix4_transaction(struct i2c_adapter *piix4_adapter) > > > > /* We will always wait for a fraction of a second! (See PIIX4 docs errata) */ > > if (srvrworks_csb5_delay) /* Extra delay for SERVERWORKS_CSB5 */ > > - msleep(2); > > + usleep_range(2000, 2000); > > Isn't this exactly the same? I'm fine using the same function for > consistency, just curious. > > > else > > - msleep(1); > > + usleep_range(500, 1000); > > Were you able to test this on older hardware? Unfortunately, the > specification errata of the original Intel PIIX4 is quite vague on the > amount of time you must wait before checking the Host Busy bit: > > "Note that there may be moderate latency before the transaction begins > and the Host Busy bit gets set." > > I guess we made it 1 ms at the time because it was the minimum we could > sleep anyway. > > One option if you really care about the performance of the i2c-piix4 > driver on recent hardware would be to lower the initial delay even more > for ATI and AMD chipsets. The errata was for Intel chipsets originally, > and while we know that at least some of the ServerWorks implementations > suffered from the same problem (worse actually) I don't think that > anybody ever bothered checking if it applied to more recent > implementations by other vendors. > > For reference, at 93.75 kHz (the default SMBus frequency or the SB800), > an SMBus Quick transaction would be completed in 117 us, so I guess an > initial delay of 150 or 200 us would be optimum. And an SMBus Read Byte > transaction completes in 416 ms. I think this is the most popular SMBus > transaction, so ensuring that it is as fast as possible would make > sense. > > And it might even work on older Intel chipsets, who knows. Plus I doubt > anyone is still using them anyway, so you have my approval to lower the > delays to whatever works for you. > > As a comparison point, in the i2c-i801 driver we use: > > usleep_range(250, 500); > > for both the initial sleep and the waiting loop. > A further test on Ryzen shows that bit 0 of SMBHSTSTS is set immediately, ie with outb_p(inb(SMBHSTCNT) | 0x040, SMBHSTCNT); busy = inb_p(SMBHSTSTS) & 0x01; busy is always true. None of the datasheets I was able to find (sb700, sb800, bolton) suggests that an initial delay is needed. Another quick test with my Ryzen system, using usleep_range(100, 100), shows the result of quick commands in the third loop iteration (ie after 200 uS), and the result of a "read byte" operation in the 6th loop iteration (ie after 500 uS). This is measured without initial delay. Not sure what that means, if anything, for the driver. The biggest concern is the "moderate latency" required by the Intel chips. Otherwise we could just use the values from i2c-i801 for both initial and loop delay. Guenter
On Mon, 12 Feb 2018 14:22:41 -0800, Guenter Roeck wrote: > On Mon, Feb 12, 2018 at 11:53:36AM +0100, Jean Delvare wrote: > > Were you able to test this on older hardware? Unfortunately, the > > specification errata of the original Intel PIIX4 is quite vague on the > > amount of time you must wait before checking the Host Busy bit: > > > > "Note that there may be moderate latency before the transaction begins > > and the Host Busy bit gets set." > > > > I guess we made it 1 ms at the time because it was the minimum we could > > sleep anyway. > > > > One option if you really care about the performance of the i2c-piix4 > > driver on recent hardware would be to lower the initial delay even more > > for ATI and AMD chipsets. The errata was for Intel chipsets originally, > > and while we know that at least some of the ServerWorks implementations > > suffered from the same problem (worse actually) I don't think that > > anybody ever bothered checking if it applied to more recent > > implementations by other vendors. > > > > For reference, at 93.75 kHz (the default SMBus frequency or the SB800), > > an SMBus Quick transaction would be completed in 117 us, so I guess an > > initial delay of 150 or 200 us would be optimum. And an SMBus Read Byte > > transaction completes in 416 ms. I think this is the most popular SMBus > > transaction, so ensuring that it is as fast as possible would make > > sense. > > > > And it might even work on older Intel chipsets, who knows. Plus I doubt > > anyone is still using them anyway, so you have my approval to lower the > > delays to whatever works for you. > > > > As a comparison point, in the i2c-i801 driver we use: > > > > usleep_range(250, 500); > > > > for both the initial sleep and the waiting loop. > > A further test on Ryzen shows that bit 0 of SMBHSTSTS is set immediately, > ie with > outb_p(inb(SMBHSTCNT) | 0x040, SMBHSTCNT); > busy = inb_p(SMBHSTSTS) & 0x01; > busy is always true. None of the datasheets I was able to find (sb700, > sb800, bolton) suggests that an initial delay is needed. It it good that today's hardware is better than Intel's original hardware, but on the other hand, it makes little sense to start polling before the command has a chance of being completed. So I don't think that skipping the initial delay would help with performance. > Another quick test with my Ryzen system, using usleep_range(100, 100), > shows the result of quick commands in the third loop iteration (ie > after 200 uS), and the result of a "read byte" operation in the 6th > loop iteration (ie after 500 uS). This is measured without initial delay. Your numbers are in line with my theoretical estimates above. > Not sure what that means, if anything, for the driver. The biggest concern > is the "moderate latency" required by the Intel chips. Otherwise we could > just use the values from i2c-i801 for both initial and loop delay. I think I would not bother too much about that older hardware which probably nobody uses anymore (we are talking about late 90s motherboards here, I'm rather conservative and sentimental with my hardware and even me no longer have these.) It makes more sense to make the driver as efficient as possible for current hardware. From your numbers above, an initial sleep of 200 us followed by 200 us in-loop delays would seem reasonable to me. With some upper margin to let the scheduler reduce the CPU awakening, as suggested by Joe somewhere else in this thread. If you want to push it one step further, you could estimate how long the transaction is expected to take (based on the SMBus clock frequency and the number of bytes in the transaction) and set the initial delay to match that. That would allow to shorten the cycle time in the polling loop (as you know you won't cycle at all in most cases) and limit the useless calls to usleep_range() and the polls which can't succeed. Whether you want to spend time on this is up to you, of course.
On 31/12/17 02:50, Guenter Roeck wrote: > The piix4 i2c driver is extremely slow. Replacing msleep() > with usleep_range() increases its speed substantially. > > Signed-off-by: Guenter Roeck <linux@roeck-us.net> > --- > drivers/i2c/busses/i2c-piix4.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c > index 78dd5951d6e7..52a8b1c5c110 100644 > --- a/drivers/i2c/busses/i2c-piix4.c > +++ b/drivers/i2c/busses/i2c-piix4.c > @@ -467,13 +467,13 @@ static int piix4_transaction(struct i2c_adapter *piix4_adapter) > > /* We will always wait for a fraction of a second! (See PIIX4 docs errata) */ > if (srvrworks_csb5_delay) /* Extra delay for SERVERWORKS_CSB5 */ > - msleep(2); > + usleep_range(2000, 2000); > else > - msleep(1); > + usleep_range(500, 1000); > > while ((++timeout < MAX_TIMEOUT) && > ((temp = inb_p(SMBHSTSTS)) & 0x01)) > - msleep(1); > + usleep_range(200, 500); > > /* If the SMBus is still busy, we give up */ > if (timeout == MAX_TIMEOUT) { > Thanks for this patch. FWIW, this also makes a noticeable difference on AMD Family 16h Model 30h, used in embedded designs and also commonly available as AMD GX-412TC SoC in the PC Engines APU2. Among the tests I did were reading from the SoC temperature sensor in a loop: while [ true ] ; do i2cget -y 0 0x4C 0x01 ; done and scanning for peripherals in a loop: while [ true ] ; do i2cdetect -y 0 ; done These tests may be artificial and trivial, but the speedup matters to us because we have more than one bus master and the embedded controller needs to poll multiple sensors. Tested-by: Andrew Cooks <andrew.cooks@opengear.com>
> > /* If the SMBus is still busy, we give up */ > > if (timeout == MAX_TIMEOUT) { > > Reviewed-by: Jean Delvare <jdelvare@suse.de> Does this still hold given despite the discussion in this thread?
diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c index 78dd5951d6e7..52a8b1c5c110 100644 --- a/drivers/i2c/busses/i2c-piix4.c +++ b/drivers/i2c/busses/i2c-piix4.c @@ -467,13 +467,13 @@ static int piix4_transaction(struct i2c_adapter *piix4_adapter) /* We will always wait for a fraction of a second! (See PIIX4 docs errata) */ if (srvrworks_csb5_delay) /* Extra delay for SERVERWORKS_CSB5 */ - msleep(2); + usleep_range(2000, 2000); else - msleep(1); + usleep_range(500, 1000); while ((++timeout < MAX_TIMEOUT) && ((temp = inb_p(SMBHSTSTS)) & 0x01)) - msleep(1); + usleep_range(200, 500); /* If the SMBus is still busy, we give up */ if (timeout == MAX_TIMEOUT) {
The piix4 i2c driver is extremely slow. Replacing msleep() with usleep_range() increases its speed substantially. Signed-off-by: Guenter Roeck <linux@roeck-us.net> --- drivers/i2c/busses/i2c-piix4.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)