Message ID | 20170317211836.63576-1-cbostic@linux.vnet.ibm.com |
---|---|
State | Changes Requested, archived |
Headers | show |
On Fri, Mar 17, 2017 at 4:18 PM, Christopher Bostic <cbostic@linux.vnet.ibm.com> wrote: > During high cpu loads the SDA in line can shift relative to the > clock signal which can corrupt the received input data. Slow > down the time to sample input to account for this. > > Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com> > --- > v2 - Increase delay for SDA in sampling only. > - Decrease the original delay for SDA sampling from 1us to 200ns > --- > drivers/fsi/fsi-master-gpio.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c > index a8976d5..8e832e0 100644 > --- a/drivers/fsi/fsi-master-gpio.c > +++ b/drivers/fsi/fsi-master-gpio.c > @@ -14,6 +14,7 @@ > #include "fsi-master.h" > > #define FSI_GPIO_STD_DLY 1 /* Standard pin delay in nS */ > +#define FSI_GPIO_SDA_IN_DLY 200 /* Wait to sample SDA line, in nS */ > #define FSI_ECHO_DELAY_CLOCKS 16 /* Number clocks for echo delay */ > #define FSI_PRE_BREAK_CLOCKS 50 /* Number clocks to prep for break */ > #define FSI_BREAK_CLOCKS 256 /* Number of clocks to issue break */ > @@ -97,7 +98,7 @@ static int sda_in(struct fsi_master_gpio *master) > { > int in; > > - ndelay(FSI_GPIO_STD_DLY); > + ndelay(FSI_GPIO_SDA_IN_DLY); > in = gpiod_get_value(master->gpio_data); > return in ? 1 : 0; > } > -- > 1.8.2.2 > Acked-by: Andrew Geissler <geissonator@gmail.com>
Hi Chris, > During high cpu loads the SDA in line can shift relative to the > clock signal which can corrupt the received input data. Slow > down the time to sample input to account for this. This looks much better than the previous patch, but do you know why high CPU loads affect this? All of the GPIO interactions are performed with the single spinlock held, so CPU load should have no bearing on the timing of the bit-banging - the driver has exclusive use of the CPU during a FSI transaction. Can you elaborate on the symptoms of this? One thing that looks a little suspicious to me is that you're running the clock during the echo delay time, so you're potentially feeding random data to the slave... Cheers, Jeremy
On 3/20/17 12:33 AM, Jeremy Kerr wrote: > Hi Chris, > >> During high cpu loads the SDA in line can shift relative to the >> clock signal which can corrupt the received input data. Slow >> down the time to sample input to account for this. > This looks much better than the previous patch, but do you know why high > CPU loads affect this? All of the GPIO interactions are performed with > the single spinlock held, so CPU load should have no bearing on the > timing of the bit-banging - the driver has exclusive use of the CPU > during a FSI transaction. > Can you elaborate on the symptoms of this? Hi Jeremy, Yes that's true, cpu load shouldn't be a factor in this case then. The failure mode is during the sampling of input data during the slave response phase of the command. CRC received does not match the CRC calculated due to bit flips related to sampling/clocking timing issues. This problem only appears when hostboot is accessing the BMC. Increasing delay does correct the issue. I don't have a good explanation for why this timing behavior occurs given its not cpu load related. > One thing that looks a little suspicious to me is that you're running > the clock during the echo delay time, so you're potentially feeding > random data to the slave... > > Cheers, The echo delay in the driver clocks '1's on the line so as long as the echo_delay() utility is used there is no risk of random data in this phase. Clocking during this phase also gives the slave more clocks to process the newly sent command so the entire time for processing a single command and generating a response takes less follow on clocks. Thanks, Chris > > Jeremy >
Hi Chris, > The echo delay in the driver clocks '1's on the line so as long as the > echo_delay() utility is used there is no risk of random data in this > phase. Ah, I'd missed the set_sda_output(1), there. That should be fine then. > Yes that's true, cpu load shouldn't be a factor in this case then. The > failure mode is during the sampling of input data during the slave > response phase of the command. CRC received does not match the CRC > calculated due to bit flips > related to sampling/clocking timing issues. This problem only appears > when hostboot is accessing the BMC. Increasing delay does correct the > issue. I don't have a good explanation for why this timing behavior > occurs given its not cpu load related. My concern here is that with such a dramatic change in timing there (200x), and not knowing what the underlying problem is, we're just making the issue less likely to occur, rather than fixing it. Do you know if we have access to HW folks to scope the data & clock? We could set a GPIO when we see a CRC error, to use as an external trigger... Cheers, Jeremy
On 3/20/17 7:57 PM, Jeremy Kerr wrote: > Hi Chris, > >> The echo delay in the driver clocks '1's on the line so as long as the >> echo_delay() utility is used there is no risk of random data in this >> phase. > Ah, I'd missed the set_sda_output(1), there. That should be fine then. > >> Yes that's true, cpu load shouldn't be a factor in this case then. The >> failure mode is during the sampling of input data during the slave >> response phase of the command. CRC received does not match the CRC >> calculated due to bit flips >> related to sampling/clocking timing issues. This problem only appears >> when hostboot is accessing the BMC. Increasing delay does correct the >> issue. I don't have a good explanation for why this timing behavior >> occurs given its not cpu load related. > My concern here is that with such a dramatic change in timing there > (200x), and not knowing what the underlying problem is, we're just > making the issue less likely to occur, rather than fixing it. Hi Jeremy, I understand your concern here. I agree we need to understand the root cause. I'll open up an issue to track this. > > Do you know if we have access to HW folks to scope the data & clock? We > could set a GPIO when we see a CRC error, to use as an external > trigger... I'll ask around and see what can be done. Taking a snap shot at time of recreate should be enough to see what's going on in clock and data timing when we fail, seems to be fairly consistent under some hostboot access of PNOR conditions. Thanks, Chris > > Cheers, > > > Jeremy >
diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c index a8976d5..8e832e0 100644 --- a/drivers/fsi/fsi-master-gpio.c +++ b/drivers/fsi/fsi-master-gpio.c @@ -14,6 +14,7 @@ #include "fsi-master.h" #define FSI_GPIO_STD_DLY 1 /* Standard pin delay in nS */ +#define FSI_GPIO_SDA_IN_DLY 200 /* Wait to sample SDA line, in nS */ #define FSI_ECHO_DELAY_CLOCKS 16 /* Number clocks for echo delay */ #define FSI_PRE_BREAK_CLOCKS 50 /* Number clocks to prep for break */ #define FSI_BREAK_CLOCKS 256 /* Number of clocks to issue break */ @@ -97,7 +98,7 @@ static int sda_in(struct fsi_master_gpio *master) { int in; - ndelay(FSI_GPIO_STD_DLY); + ndelay(FSI_GPIO_SDA_IN_DLY); in = gpiod_get_value(master->gpio_data); return in ? 1 : 0; }
During high cpu loads the SDA in line can shift relative to the clock signal which can corrupt the received input data. Slow down the time to sample input to account for this. Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com> --- v2 - Increase delay for SDA in sampling only. - Decrease the original delay for SDA sampling from 1us to 200ns --- drivers/fsi/fsi-master-gpio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)