diff mbox

[next] cxl: Allow PSL timebase to not sync

Message ID 1457983772-4206-1-git-send-email-fbarrat@linux.vnet.ibm.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Frederic Barrat March 14, 2016, 7:29 p.m. UTC
CXL driver synchronizes the PSL timebase with the CAPP during
initialization. If it can't synchronize, then the driver currently
fails and the cxl adapter is not usable. That behavior is a bit
extreme for the time being, as some adapters are known to have
troubles syncing their PSL timebase and there are no known use of it.

Introduce a psl_timebase module parameter to control whether PSL
timebase is required or not. Default is to allow initializaton even if
syncing failed.
Default behavior will be changed when current issues with some cxl
adapters are resolved.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---

Same as before, but this patch applies on 'next'

 drivers/misc/cxl/cxl.h  | 1 +
 drivers/misc/cxl/main.c | 4 ++++
 drivers/misc/cxl/pci.c  | 7 +++++--
 3 files changed, 10 insertions(+), 2 deletions(-)

Comments

Michael Neuling March 15, 2016, 12:27 a.m. UTC | #1
On Mon, 2016-03-14 at 20:29 +0100, Frederic Barrat wrote:
> CXL driver synchronizes the PSL timebase with the CAPP during
> initialization. If it can't synchronize, then the driver currently
> fails and the cxl adapter is not usable. That behavior is a bit
> extreme for the time being, as some adapters are known to have
> troubles syncing their PSL timebase and there are no known use of it.
> 
> Introduce a psl_timebase module parameter to control whether PSL
> timebase is required or not. Default is to allow initializaton even
> if
> syncing failed.
> Default behavior will be changed when current issues with some cxl
> adapters are resolved.

I'm not happy with doing this unless we add something which advertises
that it's synced or not to userspace.

If we do that, I'm happy to just fail without the need of the parameter
but advertise it to userspace.

The parameter is a bit of a PITA too, as it's a driver level config not
card level.  You really want to turn it on/off based on the card, not
the whole system.

Mikey

> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> ---
> 
> Same as before, but this patch applies on 'next'
> 
>  drivers/misc/cxl/cxl.h  | 1 +
>  drivers/misc/cxl/main.c | 4 ++++
>  drivers/misc/cxl/pci.c  | 7 +++++--
>  3 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index 38e21cf..84b563c 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -27,6 +27,7 @@
>  #include <uapi/misc/cxl.h>
>  
>  extern uint cxl_verbose;
> +extern uint cxl_psl_timebase;
>  
>  #define CXL_TIMEOUT 5
>  
> diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
> index ae68c32..0344010 100644
> --- a/drivers/misc/cxl/main.c
> +++ b/drivers/misc/cxl/main.c
> @@ -32,6 +32,10 @@ uint cxl_verbose;
>  module_param_named(verbose, cxl_verbose, uint, 0600);
>  MODULE_PARM_DESC(verbose, "Enable verbose dmesg output");
>  
> +uint cxl_psl_timebase;
> +module_param_named(psl_timebase, cxl_psl_timebase, uint, 0600);

Can we make this a bool?

> +MODULE_PARM_DESC(psl_timebase, "Require PSL timebase
> synchronization");
> +
>  const struct cxl_backend_ops *cxl_ops;
>  
>  int cxl_afu_slbia(struct cxl_afu *afu)
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 2844e97..02fd1f8 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1144,8 +1144,11 @@ static int cxl_configure_adapter(struct cxl
> *adapter, struct pci_dev *dev)
>  	if ((rc = pnv_phb_to_cxl_mode(dev,
> OPAL_PHB_CAPI_MODE_SNOOP_ON)))
>  		goto err;
>  
> -	if ((rc = cxl_setup_psl_timebase(adapter, dev)))
> -		goto err;
> +	if ((rc = cxl_setup_psl_timebase(adapter, dev))) {
> +		if (cxl_psl_timebase)
> +			goto err;
> +		pr_err("PSL: Timebase sync: ignoring error\n");
> +	}
>  
>  	if ((rc = cxl_native_register_psl_err_irq(adapter)))
>  		goto err;
Michael Ellerman March 15, 2016, 3:56 a.m. UTC | #2
On Tue, 2016-03-15 at 11:27 +1100, Michael Neuling wrote:

> On Mon, 2016-03-14 at 20:29 +0100, Frederic Barrat wrote:

> > CXL driver synchronizes the PSL timebase with the CAPP during
> > initialization. If it can't synchronize, then the driver currently
> > fails and the cxl adapter is not usable. That behavior is a bit
> > extreme for the time being, as some adapters are known to have
> > troubles syncing their PSL timebase and there are no known use of it.
> > 
> > Introduce a psl_timebase module parameter to control whether PSL
> > timebase is required or not. Default is to allow initializaton even
> > if
> > syncing failed.
> > Default behavior will be changed when current issues with some cxl
> > adapters are resolved.
> 
> I'm not happy with doing this unless we add something which advertises
> that it's synced or not to userspace.
> 
> If we do that, I'm happy to just fail without the need of the parameter
> but advertise it to userspace.
> 
> The parameter is a bit of a PITA too, as it's a driver level config not
> card level.  You really want to turn it on/off based on the card, not
> the whole system.

And it just sounds like a big hack around broken hardware, which we'll have to
carry for the foreseeable future.

cheers
Vaibhav Jain March 15, 2016, 4:09 a.m. UTC | #3
Hi Mikey,

Michael Neuling <mikey@neuling.org> writes:

> I'm not happy with doing this unless we add something which advertises
> that it's synced or not to userspace.
>
> If we do that, I'm happy to just fail without the need of the parameter
> but advertise it to userspace.
>
> The parameter is a bit of a PITA too, as it's a driver level config not
> card level.  You really want to turn it on/off based on the card, not
> the whole system.

Would a bool attribute in sysfs for the card (e.g timebase_sync) make
sense? We can by default ignore (and log) timebase sync errors during card
initialization and provide a simple wrapper in libcxl to query and set
this attribute afterwards.

Also it might be better if the AFU advertises the need for timebase
synchronization via a configuration record so that we enable it for the
psl iif one of the slices asks for it. I am not sure if this possible at
the moment but I believe it something we can put forth the psl/hdk team
before timebase usage becomes more widespread.

~ Vaibhav
Frederic Barrat March 15, 2016, 6:36 p.m. UTC | #4
Hi Mikey,

Le 15/03/2016 01:27, Michael Neuling a écrit :
> I'm not happy with doing this unless we add something which advertises
> that it's synced or not to userspace.
>
> If we do that, I'm happy to just fail without the need of the parameter
> but advertise it to userspace.

OK, so I'm guessing that by advertising, you mean more than logging 
something in dmesg, since that's already the case.
Are you suggesting to make the sync status available programmatically 
(like through a status on /sys)? Seems like it would be pushing some 
work on applications which want to use the psl timebase in the future, 
just because there's currently a problem with some card models.
So I doubt I'm understanding you correctly here.

My vote was to keep it simple: it's all or nothing. If the driver claims 
to support psl timebase sync, it should work on all the cards. This is 
not the case today, so let's not make it a requirement until we are 
confident it's working as expected. cxlflash is not using it, so there 
should be no harm done. We'd keep the psl sync code in mostly to get 
exposure on multiple setups.


> The parameter is a bit of a PITA too, as it's a driver level config not
> card level.  You really want to turn it on/off based on the card, not
> the whole system.

I don't really care about the module parameter. It was mostly a feeble 
attempt to be developer-friendly and activate the feature easily on some 
setup.

   Fred
Frederic Barrat March 16, 2016, 11:32 a.m. UTC | #5
Le 15/03/2016 04:56, Michael Ellerman a écrit :
> And it just sounds like a big hack around broken hardware, which we'll have to
> carry for the foreseeable future.

I discussed with Mikey about what would be acceptable here. There's 
basically no good solution. No matter how we workaround the issue, we'll 
have to pay a price in the future.
Removing the psl timebase sync code is not desirable either as we cannot 
be 100% sure it's not being used by someone (on the cards where it works).
So our only option at this point is to just keep debugging the 
problematic setups to come up with the real fix. In the meantime, this 
patch can be dropped.


   Fred
Ian Munsie March 17, 2016, 1:41 a.m. UTC | #6
Excerpts from Michael Neuling's message of 2016-03-15 11:27:29 +1100:
> I'm not happy with doing this unless we add something which advertises
> that it's synced or not to userspace.

In my opinion this is probably unnecessary (but it's not a bad idea
either and I'm happy for that to be added). As far as I know, no one is
using timebase in an AFU today, or if they are it is on working cards
(if it wasn't someone would have complained by now).

This change is just to make it so that the affected cards will work at
all, since at the moment we treat the failure of this feature no one
uses as fatal.

If / when someone actually needs timebase, they will have to debug the
root cause of the problem anyway (if we haven't already), at which point
they will have a working card regardless of whether this patch is merged
or not.

If it turns out to be a software bug it is not unreasonable for them to
say that their afu is only supported by kernel or firmware "whatever
version we fix it in".

That said, a read-only attribute in sysfs to indicate whether the
timebase is synced or not would be fine and give them something they can
query if they care about it.

> The parameter is a bit of a PITA too, as it's a driver level config not
> card level.  You really want to turn it on/off based on the card, not
> the whole system.

I don't agree - userspace already needs to know what AFU they are
dealing with and therefore knows if the timebase is important or not. A
read-only attribute in sysfs would be enough to tell them this. We can
be fairly confident that there is no software that would need to be
changed to check this today since we had been failing the whole init so
it wouldn't have worked at all regardless.

We did have a software bug where the sync was sensitive to the kernel
config, but that has been fixed. If anyone was using timebase (and I
don't think anyone was) already and lucked out by testing on a kernel
config that was working this won't impact them either since those cards
will work either way now that issue has been fixed.

IMO, we should ditch the module parameter altogether and never treat
timebase sync failure as fatal, and leave that up to any applications
actually need it to check.

Cheers,
-Ian
Michael Neuling March 17, 2016, 2:45 a.m. UTC | #7
> IMO, we should ditch the module parameter altogether and never treat
> timebase sync failure as fatal, and leave that up to any applications
> actually need it to check.

I agree with this this.

Mikey
Frederic Barrat March 17, 2016, 8:49 p.m. UTC | #8
Ian, Miley,

OK, I'm convinced.
New version of the patch out shortly.

   Fred
diff mbox

Patch

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 38e21cf..84b563c 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -27,6 +27,7 @@ 
 #include <uapi/misc/cxl.h>
 
 extern uint cxl_verbose;
+extern uint cxl_psl_timebase;
 
 #define CXL_TIMEOUT 5
 
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index ae68c32..0344010 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -32,6 +32,10 @@  uint cxl_verbose;
 module_param_named(verbose, cxl_verbose, uint, 0600);
 MODULE_PARM_DESC(verbose, "Enable verbose dmesg output");
 
+uint cxl_psl_timebase;
+module_param_named(psl_timebase, cxl_psl_timebase, uint, 0600);
+MODULE_PARM_DESC(psl_timebase, "Require PSL timebase synchronization");
+
 const struct cxl_backend_ops *cxl_ops;
 
 int cxl_afu_slbia(struct cxl_afu *afu)
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 2844e97..02fd1f8 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1144,8 +1144,11 @@  static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
 	if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON)))
 		goto err;
 
-	if ((rc = cxl_setup_psl_timebase(adapter, dev)))
-		goto err;
+	if ((rc = cxl_setup_psl_timebase(adapter, dev))) {
+		if (cxl_psl_timebase)
+			goto err;
+		pr_err("PSL: Timebase sync: ignoring error\n");
+	}
 
 	if ((rc = cxl_native_register_psl_err_irq(adapter)))
 		goto err;