diff mbox

[linux,dev-4.7,v2] drivers/fsi: Add delay before sampling input on SDA

Message ID 20170317211836.63576-1-cbostic@linux.vnet.ibm.com
State Changes Requested, archived
Headers show

Commit Message

Christopher Bostic March 17, 2017, 9:18 p.m. UTC
During high cpu loads the SDA in line can shift relative to the
clock signal which can corrupt the received input data.  Slow
down the time to sample input to account for this.

Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com>
---
v2 - Increase delay for SDA in sampling only.
   - Decrease the original delay for SDA sampling from 1us to 200ns
---
 drivers/fsi/fsi-master-gpio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Andrew Geissler March 20, 2017, 1:40 a.m. UTC | #1
On Fri, Mar 17, 2017 at 4:18 PM, Christopher Bostic
<cbostic@linux.vnet.ibm.com> wrote:
> During high cpu loads the SDA in line can shift relative to the
> clock signal which can corrupt the received input data.  Slow
> down the time to sample input to account for this.
>
> Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com>
> ---
> v2 - Increase delay for SDA in sampling only.
>    - Decrease the original delay for SDA sampling from 1us to 200ns
> ---
>  drivers/fsi/fsi-master-gpio.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c
> index a8976d5..8e832e0 100644
> --- a/drivers/fsi/fsi-master-gpio.c
> +++ b/drivers/fsi/fsi-master-gpio.c
> @@ -14,6 +14,7 @@
>  #include "fsi-master.h"
>
>  #define        FSI_GPIO_STD_DLY        1       /* Standard pin delay in nS */
> +#define        FSI_GPIO_SDA_IN_DLY     200     /* Wait to sample SDA line, in nS */
>  #define        FSI_ECHO_DELAY_CLOCKS   16      /* Number clocks for echo delay */
>  #define        FSI_PRE_BREAK_CLOCKS    50      /* Number clocks to prep for break */
>  #define        FSI_BREAK_CLOCKS        256     /* Number of clocks to issue break */
> @@ -97,7 +98,7 @@ static int sda_in(struct fsi_master_gpio *master)
>  {
>         int in;
>
> -       ndelay(FSI_GPIO_STD_DLY);
> +       ndelay(FSI_GPIO_SDA_IN_DLY);
>         in = gpiod_get_value(master->gpio_data);
>         return in ? 1 : 0;
>  }
> --
> 1.8.2.2
>

Acked-by: Andrew Geissler <geissonator@gmail.com>
Jeremy Kerr March 20, 2017, 5:33 a.m. UTC | #2
Hi Chris,

> During high cpu loads the SDA in line can shift relative to the
> clock signal which can corrupt the received input data.  Slow
> down the time to sample input to account for this.

This looks much better than the previous patch, but do you know why high
CPU loads affect this? All of the GPIO interactions are performed with
the single spinlock held, so CPU load should have no bearing on the
timing of the bit-banging - the driver has exclusive use of the CPU
during a FSI transaction.

Can you elaborate on the symptoms of this?

One thing that looks a little suspicious to me is that you're running
the clock during the echo delay time, so you're potentially feeding
random data to the slave...

Cheers,


Jeremy
Christopher Bostic March 20, 2017, 1:49 p.m. UTC | #3
On 3/20/17 12:33 AM, Jeremy Kerr wrote:
> Hi Chris,
>
>> During high cpu loads the SDA in line can shift relative to the
>> clock signal which can corrupt the received input data.  Slow
>> down the time to sample input to account for this.
> This looks much better than the previous patch, but do you know why high
> CPU loads affect this? All of the GPIO interactions are performed with
> the single spinlock held, so CPU load should have no bearing on the
> timing of the bit-banging - the driver has exclusive use of the CPU
> during a FSI transaction.
> Can you elaborate on the symptoms of this?
Hi Jeremy,

Yes that's true, cpu load shouldn't be a factor in this case then. The 
failure mode is during the sampling of input data during the slave 
response phase of the command.  CRC received does not match the CRC 
calculated due to bit flips
related to sampling/clocking timing issues.  This problem only appears 
when hostboot is accessing the BMC.  Increasing delay does correct the 
issue.  I don't have a good explanation for why this timing behavior 
occurs given its not cpu load related.
> One thing that looks a little suspicious to me is that you're running
> the clock during the echo delay time, so you're potentially feeding
> random data to the slave...
>
> Cheers,
The echo delay in the driver clocks '1's on the line so as long as the 
echo_delay() utility is used there is no risk of random data in this 
phase.   Clocking during this phase also gives the slave more clocks to 
process the newly sent command so the entire time for processing a 
single command and generating a response takes less follow on clocks.

Thanks,
Chris
>
> Jeremy
>
Jeremy Kerr March 21, 2017, 12:57 a.m. UTC | #4
Hi Chris,

> The echo delay in the driver clocks '1's on the line so as long as the
> echo_delay() utility is used there is no risk of random data in this
> phase. 

Ah, I'd missed the set_sda_output(1), there. That should be fine then.

> Yes that's true, cpu load shouldn't be a factor in this case then. The
> failure mode is during the sampling of input data during the slave
> response phase of the command.  CRC received does not match the CRC
> calculated due to bit flips
> related to sampling/clocking timing issues.  This problem only appears
> when hostboot is accessing the BMC.  Increasing delay does correct the
> issue.  I don't have a good explanation for why this timing behavior
> occurs given its not cpu load related.

My concern here is that with such a dramatic change in timing there
(200x), and not knowing what the underlying problem is, we're just
making the issue less likely to occur, rather than fixing it.

Do you know if we have access to HW folks to scope the data & clock? We
could set a GPIO when we see a CRC error, to use as an external
trigger...

Cheers,


Jeremy
Christopher Bostic March 21, 2017, 5:14 p.m. UTC | #5
On 3/20/17 7:57 PM, Jeremy Kerr wrote:
> Hi Chris,
>
>> The echo delay in the driver clocks '1's on the line so as long as the
>> echo_delay() utility is used there is no risk of random data in this
>> phase.
> Ah, I'd missed the set_sda_output(1), there. That should be fine then.
>
>> Yes that's true, cpu load shouldn't be a factor in this case then. The
>> failure mode is during the sampling of input data during the slave
>> response phase of the command.  CRC received does not match the CRC
>> calculated due to bit flips
>> related to sampling/clocking timing issues.  This problem only appears
>> when hostboot is accessing the BMC.  Increasing delay does correct the
>> issue.  I don't have a good explanation for why this timing behavior
>> occurs given its not cpu load related.
> My concern here is that with such a dramatic change in timing there
> (200x), and not knowing what the underlying problem is, we're just
> making the issue less likely to occur, rather than fixing it.
Hi Jeremy,

I understand your concern here.  I agree we need to understand the
root cause.  I'll open up an issue to track this.
>
> Do you know if we have access to HW folks to scope the data & clock? We
> could set a GPIO when we see a CRC error, to use as an external
> trigger...
I'll ask around and see what can be done.  Taking a snap shot at time
of recreate should be enough to see what's going on in clock and
data timing when we fail, seems to be fairly consistent under some
hostboot access of PNOR conditions.

Thanks,
Chris
>
> Cheers,
>
>
> Jeremy
>
diff mbox

Patch

diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c
index a8976d5..8e832e0 100644
--- a/drivers/fsi/fsi-master-gpio.c
+++ b/drivers/fsi/fsi-master-gpio.c
@@ -14,6 +14,7 @@ 
 #include "fsi-master.h"
 
 #define	FSI_GPIO_STD_DLY	1	/* Standard pin delay in nS */
+#define	FSI_GPIO_SDA_IN_DLY	200	/* Wait to sample SDA line, in nS */
 #define	FSI_ECHO_DELAY_CLOCKS	16	/* Number clocks for echo delay */
 #define	FSI_PRE_BREAK_CLOCKS	50	/* Number clocks to prep for break */
 #define	FSI_BREAK_CLOCKS	256	/* Number of clocks to issue break */
@@ -97,7 +98,7 @@  static int sda_in(struct fsi_master_gpio *master)
 {
 	int in;
 
-	ndelay(FSI_GPIO_STD_DLY);
+	ndelay(FSI_GPIO_SDA_IN_DLY);
 	in = gpiod_get_value(master->gpio_data);
 	return in ? 1 : 0;
 }