diff mbox

[linux,dev-4.10] drivers/fsi: Remove scan from master registration

Message ID 1501180508-4009-1-git-send-email-eajames@linux.vnet.ibm.com
State Needs Review / ACK, archived
Headers show

Commit Message

Eddie James July 27, 2017, 6:35 p.m. UTC
From: "Edward A. James" <eajames@us.ibm.com>

Scanning during master registration is problematic because the gpio fsi
master will be registered during boot if it is present in device-tree
(it's a platform device). That means that the whole fsi scan occurs
during boot, which can cause problems.

Signed-off-by: Edward A. James <eajames@us.ibm.com>
---
 drivers/fsi/fsi-core.c       | 2 --
 drivers/fsi/fsi-master-hub.c | 4 +++-
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Jeremy Kerr July 28, 2017, 2:07 a.m. UTC | #1
Hi Chris,

> Scanning during master registration is problematic because the gpio fsi
> master will be registered during boot if it is present in device-tree
> (it's a platform device). That means that the whole fsi scan occurs
> during boot, which can cause problems.

What kind of problems?

This sounds a bit awkward to me - it's fairly reasonable to expect the
master to probe immediately. Requiring manual scan adds some unusual
dependencies to the bus initialisation process.

Cheers,


Jeremy
Jeremy Kerr July 28, 2017, 2:09 a.m. UTC | #2
Hi Eddie,

On 28/07/17 10:07, Jeremy Kerr wrote:
> Hi Chris,

Sorry, s/Chris/Eddie/ - I was reading the  CC line!

Cheers,


Jeremy
Eddie James July 28, 2017, 2:57 p.m. UTC | #3
On 07/27/2017 09:07 PM, Jeremy Kerr wrote:
> Hi Chris,
>
>> Scanning during master registration is problematic because the gpio fsi
>> master will be registered during boot if it is present in device-tree
>> (it's a platform device). That means that the whole fsi scan occurs
>> during boot, which can cause problems.
> What kind of problems?

Two main problems.

1) The SBEFIFO and it's client driver probes can hang if the SBE is not 
initialized. Basically, we wait forever for a response from the SBE and 
never get it. Not a problem with manual scan because a) it can be 
interrupted and b) in scanning manually, the SBEFIFO client driver 
probes are designed to fail during the scan. For some reason, the device 
registration doesn't immediately probe the driver if the scan happens at 
boot, so this workaround doesn't work. So the boot can hang.

2) FSI operations while the host is booting can cause FSI failures. 
Since we don't know the state of the host while booting the BMC, it's 
generally not safe to do FSI ops as we may disrupt the host.

Thanks,
Eddie

>
> This sounds a bit awkward to me - it's fairly reasonable to expect the
> master to probe immediately. Requiring manual scan adds some unusual
> dependencies to the bus initialisation process.
>
> Cheers,
>
>
> Jeremy
>
Patrick Williams July 28, 2017, 3:14 p.m. UTC | #4
On Fri, Jul 28, 2017 at 09:57:23AM -0500, Eddie James wrote:
> 2) FSI operations while the host is booting can cause FSI failures. 
> Since we don't know the state of the host while booting the BMC, it's 
> generally not safe to do FSI ops as we may disrupt the host.

From my perspective this is especially bad because it would only surface
in specific timing conditions.  The FSI bus is, by the system
architecture, under the control of the Host for a period during the
boot.  Both FSP in the past and BMC now are required to stay clear of
the FSI bus when the Host owns it.  If we do not, it can, in the best
case, cause unexpected FSI errors for the Host resulting in extra parts
being "guarded" and, in the worst case, cause silent corruption of
chip initialization registers leading to an unknown data integrity
problem at some future point.

When the BMC reboots, the kernel cannot know the ownership state of the
FSI bus, so we should assume it is un-owned and not do any activity
until userspace has had time to "assess the situation".
Jeremy Kerr July 31, 2017, 1:40 a.m. UTC | #5
Hi Eddie & Patrick,

> On Fri, Jul 28, 2017 at 09:57:23AM -0500, Eddie James wrote:
>> 2) FSI operations while the host is booting can cause FSI failures. 
>> Since we don't know the state of the host while booting the BMC, it's 
>> generally not safe to do FSI ops as we may disrupt the host.
> 
> From my perspective this is especially bad because it would only surface
> in specific timing conditions.  The FSI bus is, by the system
> architecture, under the control of the Host for a period during the
> boot.

OK, so this sounds like a good reason to add some form of arbitration,
but this patch isn't sufficient to handle that - we still have GPIO
initialisation on (FSI master) driver probe, and there's other
facilities that would cause the BMC to access the bus, potentially in
conflict with the host master.

So, I'd suggest this instead:

 - add a 'blocked' state to the fsi master driver, which prevents
   *all* accesses to the GPIOs, including during init; all master
   reads/writes would return -EBUSY.

 - add a boolean device tree property to indicate that a master should
   be initialised in blocked state (preventing GPIO changes during
   probe)

   It would make sense for this property to be associated with the
   'fsi-master' compatible value, as it'd potentially apply to all
   master implementations, not just GPIO-based ones.

 - add a sysfs file to control blocked state, which would be updated
   when the host acquires/releases control of the bus.

We may want to rethink the 'external mode' implementation to work with
this too.

Regards,


Jeremy
Christopher Bostic July 31, 2017, 2:27 a.m. UTC | #6
On 7/30/17 8:40 PM, Jeremy Kerr wrote:
> Hi Eddie & Patrick,
>
>> On Fri, Jul 28, 2017 at 09:57:23AM -0500, Eddie James wrote:
>>> 2) FSI operations while the host is booting can cause FSI failures.
>>> Since we don't know the state of the host while booting the BMC, it's
>>> generally not safe to do FSI ops as we may disrupt the host.
>>  From my perspective this is especially bad because it would only surface
>> in specific timing conditions.  The FSI bus is, by the system
>> architecture, under the control of the Host for a period during the
>> boot.
> OK, so this sounds like a good reason to add some form of arbitration,
> but this patch isn't sufficient to handle that - we still have GPIO
> initialisation on (FSI master) driver probe, and there's other
> facilities that would cause the BMC to access the bus, potentially in
> conflict with the host master.
>
> So, I'd suggest this instead:
>
>   - add a 'blocked' state to the fsi master driver, which prevents
>     *all* accesses to the GPIOs, including during init; all master
>     reads/writes would return -EBUSY.
>
>   - add a boolean device tree property to indicate that a master should
>     be initialised in blocked state (preventing GPIO changes during
>     probe)
Hi Jeremy,

Not sure I follow what the FSI GPIO master probe would do if it can't 
access the GPIO itself, assuming we're in blocked mode. Would this 
require a deferred probe of some kind once access is unblocked?

There is also a control register setting


Also, in the existing FSP-2 implementation there is a bit in the FSI 
slave SISC which indicates if OPB if fence by host.  Which I believe is 
set whenever host boot is using the FSI bus.   Possibly we could use 
that info to determine if any FSI accesses past the CFAM slave engines 
are allowed.   This might be racy though since BMC can't control when 
that bit is set and cleared.



>
>     It would make sense for this property to be associated with the
>     'fsi-master' compatible value, as it'd potentially apply to all
>     master implementations, not just GPIO-based ones.
>
>   - add a sysfs file to control blocked state, which would be updated
>     when the host acquires/releases control of the bus.
>
> We may want to rethink the 'external mode' implementation to work with
> this too.
>
> Regards,
>
>
> Jeremy
>
Jeremy Kerr July 31, 2017, 2:36 a.m. UTC | #7
Hi Chris,

> Not sure I follow what the FSI GPIO master probe would do if it can't
> access the GPIO itself, assuming we're in blocked mode. Would this
> require a deferred probe of some kind once access is unblocked?

Yes, any state change from blocked to unblocked would need to reinit the
GPIOs, and potentially initiate a scan.

So, if we're initialised in blocked mode, the fsi master gpio driver
would still claim the GPIOs (but not change any of their states),
register with the FSI core, etc, but couldn't start any operations that
require GPIO activity.

> Also, in the existing FSP-2 implementation there is a bit in the FSI
> slave SISC which indicates if OPB if fence by host.  Which I believe is
> set whenever host boot is using the FSI bus.   Possibly we could use
> that info to determine if any FSI accesses past the CFAM slave engines
> are allowed.   This might be racy though since BMC can't control when
> that bit is set and cleared.

OK, I'm confused - are we able to access *some* parts of the FSI bus
(ie, the SISC register in the CFAM) while the host is booting?

But as you say, this might race. It would seem simpler to use the BMC's
knowledge of when the host is booting to decide when the BMC is able to
access the bus.

Cheers,


Jeremy
Andrew Jeffery July 31, 2017, 3:20 a.m. UTC | #8
On Mon, 2017-07-31 at 10:36 +0800, Jeremy Kerr wrote:
> Hi Chris,
> 
> > Not sure I follow what the FSI GPIO master probe would do if it can't
> > access the GPIO itself, assuming we're in blocked mode. Would this
> > require a deferred probe of some kind once access is unblocked?
> 
> Yes, any state change from blocked to unblocked would need to reinit the
> GPIOs, and potentially initiate a scan.
> 
> So, if we're initialised in blocked mode, the fsi master gpio driver
> would still claim the GPIOs (but not change any of their states),
> register with the FSI core, etc, but couldn't start any operations that
> require GPIO activity.

On this note, we should probably configure reset tolerance for the FSI
GPIOs on the Aspeed SoC to avoid glitching them in any way during a SoC
reset.

> 
> > Also, in the existing FSP-2 implementation there is a bit in the FSI
> > slave SISC which indicates if OPB if fence by host.  Which I believe is
> > set whenever host boot is using the FSI bus.   Possibly we could use
> > that info to determine if any FSI accesses past the CFAM slave engines
> > are allowed.   This might be racy though since BMC can't control when
> > that bit is set and cleared.
> 
> OK, I'm confused - are we able to access *some* parts of the FSI bus
> (ie, the SISC register in the CFAM) while the host is booting?
> 
> But as you say, this might race. It would seem simpler to use the BMC's
> knowledge of when the host is booting to decide when the BMC is able to
> access the bus.
> 
> Cheers,
> 
> 
> Jeremy
diff mbox

Patch

diff --git a/drivers/fsi/fsi-core.c b/drivers/fsi/fsi-core.c
index 9b83f1a..348ed45 100644
--- a/drivers/fsi/fsi-core.c
+++ b/drivers/fsi/fsi-core.c
@@ -927,8 +927,6 @@  int fsi_master_register(struct fsi_master *master)
 		return rc;
 	}
 
-	fsi_master_scan(master);
-
 	return 0;
 }
 EXPORT_SYMBOL_GPL(fsi_master_register);
diff --git a/drivers/fsi/fsi-master-hub.c b/drivers/fsi/fsi-master-hub.c
index 3223a67..b5bb064 100644
--- a/drivers/fsi/fsi-master-hub.c
+++ b/drivers/fsi/fsi-master-hub.c
@@ -288,8 +288,10 @@  static int hub_master_probe(struct device *dev)
 	hub_master_init(hub);
 
 	rc = fsi_master_register(&hub->master);
-	if (!rc)
+	if (!rc) {
+		fsi_master_rescan(&hub->master);
 		return 0;
+	}
 
 	kfree(hub);
 err_release: