Message ID | 1457983772-4206-1-git-send-email-fbarrat@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Mon, 2016-03-14 at 20:29 +0100, Frederic Barrat wrote: > CXL driver synchronizes the PSL timebase with the CAPP during > initialization. If it can't synchronize, then the driver currently > fails and the cxl adapter is not usable. That behavior is a bit > extreme for the time being, as some adapters are known to have > troubles syncing their PSL timebase and there are no known use of it. > > Introduce a psl_timebase module parameter to control whether PSL > timebase is required or not. Default is to allow initializaton even > if > syncing failed. > Default behavior will be changed when current issues with some cxl > adapters are resolved. I'm not happy with doing this unless we add something which advertises that it's synced or not to userspace. If we do that, I'm happy to just fail without the need of the parameter but advertise it to userspace. The parameter is a bit of a PITA too, as it's a driver level config not card level. You really want to turn it on/off based on the card, not the whole system. Mikey > > Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> > --- > > Same as before, but this patch applies on 'next' > > drivers/misc/cxl/cxl.h | 1 + > drivers/misc/cxl/main.c | 4 ++++ > drivers/misc/cxl/pci.c | 7 +++++-- > 3 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h > index 38e21cf..84b563c 100644 > --- a/drivers/misc/cxl/cxl.h > +++ b/drivers/misc/cxl/cxl.h > @@ -27,6 +27,7 @@ > #include <uapi/misc/cxl.h> > > extern uint cxl_verbose; > +extern uint cxl_psl_timebase; > > #define CXL_TIMEOUT 5 > > diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c > index ae68c32..0344010 100644 > --- a/drivers/misc/cxl/main.c > +++ b/drivers/misc/cxl/main.c > @@ -32,6 +32,10 @@ uint cxl_verbose; > module_param_named(verbose, cxl_verbose, uint, 0600); > MODULE_PARM_DESC(verbose, "Enable verbose dmesg output"); > > +uint cxl_psl_timebase; > +module_param_named(psl_timebase, cxl_psl_timebase, uint, 0600); Can we make this a bool? > +MODULE_PARM_DESC(psl_timebase, "Require PSL timebase > synchronization"); > + > const struct cxl_backend_ops *cxl_ops; > > int cxl_afu_slbia(struct cxl_afu *afu) > diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c > index 2844e97..02fd1f8 100644 > --- a/drivers/misc/cxl/pci.c > +++ b/drivers/misc/cxl/pci.c > @@ -1144,8 +1144,11 @@ static int cxl_configure_adapter(struct cxl > *adapter, struct pci_dev *dev) > if ((rc = pnv_phb_to_cxl_mode(dev, > OPAL_PHB_CAPI_MODE_SNOOP_ON))) > goto err; > > - if ((rc = cxl_setup_psl_timebase(adapter, dev))) > - goto err; > + if ((rc = cxl_setup_psl_timebase(adapter, dev))) { > + if (cxl_psl_timebase) > + goto err; > + pr_err("PSL: Timebase sync: ignoring error\n"); > + } > > if ((rc = cxl_native_register_psl_err_irq(adapter))) > goto err;
On Tue, 2016-03-15 at 11:27 +1100, Michael Neuling wrote: > On Mon, 2016-03-14 at 20:29 +0100, Frederic Barrat wrote: > > CXL driver synchronizes the PSL timebase with the CAPP during > > initialization. If it can't synchronize, then the driver currently > > fails and the cxl adapter is not usable. That behavior is a bit > > extreme for the time being, as some adapters are known to have > > troubles syncing their PSL timebase and there are no known use of it. > > > > Introduce a psl_timebase module parameter to control whether PSL > > timebase is required or not. Default is to allow initializaton even > > if > > syncing failed. > > Default behavior will be changed when current issues with some cxl > > adapters are resolved. > > I'm not happy with doing this unless we add something which advertises > that it's synced or not to userspace. > > If we do that, I'm happy to just fail without the need of the parameter > but advertise it to userspace. > > The parameter is a bit of a PITA too, as it's a driver level config not > card level. You really want to turn it on/off based on the card, not > the whole system. And it just sounds like a big hack around broken hardware, which we'll have to carry for the foreseeable future. cheers
Hi Mikey, Michael Neuling <mikey@neuling.org> writes: > I'm not happy with doing this unless we add something which advertises > that it's synced or not to userspace. > > If we do that, I'm happy to just fail without the need of the parameter > but advertise it to userspace. > > The parameter is a bit of a PITA too, as it's a driver level config not > card level. You really want to turn it on/off based on the card, not > the whole system. Would a bool attribute in sysfs for the card (e.g timebase_sync) make sense? We can by default ignore (and log) timebase sync errors during card initialization and provide a simple wrapper in libcxl to query and set this attribute afterwards. Also it might be better if the AFU advertises the need for timebase synchronization via a configuration record so that we enable it for the psl iif one of the slices asks for it. I am not sure if this possible at the moment but I believe it something we can put forth the psl/hdk team before timebase usage becomes more widespread. ~ Vaibhav
Hi Mikey, Le 15/03/2016 01:27, Michael Neuling a écrit : > I'm not happy with doing this unless we add something which advertises > that it's synced or not to userspace. > > If we do that, I'm happy to just fail without the need of the parameter > but advertise it to userspace. OK, so I'm guessing that by advertising, you mean more than logging something in dmesg, since that's already the case. Are you suggesting to make the sync status available programmatically (like through a status on /sys)? Seems like it would be pushing some work on applications which want to use the psl timebase in the future, just because there's currently a problem with some card models. So I doubt I'm understanding you correctly here. My vote was to keep it simple: it's all or nothing. If the driver claims to support psl timebase sync, it should work on all the cards. This is not the case today, so let's not make it a requirement until we are confident it's working as expected. cxlflash is not using it, so there should be no harm done. We'd keep the psl sync code in mostly to get exposure on multiple setups. > The parameter is a bit of a PITA too, as it's a driver level config not > card level. You really want to turn it on/off based on the card, not > the whole system. I don't really care about the module parameter. It was mostly a feeble attempt to be developer-friendly and activate the feature easily on some setup. Fred
Le 15/03/2016 04:56, Michael Ellerman a écrit : > And it just sounds like a big hack around broken hardware, which we'll have to > carry for the foreseeable future. I discussed with Mikey about what would be acceptable here. There's basically no good solution. No matter how we workaround the issue, we'll have to pay a price in the future. Removing the psl timebase sync code is not desirable either as we cannot be 100% sure it's not being used by someone (on the cards where it works). So our only option at this point is to just keep debugging the problematic setups to come up with the real fix. In the meantime, this patch can be dropped. Fred
Excerpts from Michael Neuling's message of 2016-03-15 11:27:29 +1100: > I'm not happy with doing this unless we add something which advertises > that it's synced or not to userspace. In my opinion this is probably unnecessary (but it's not a bad idea either and I'm happy for that to be added). As far as I know, no one is using timebase in an AFU today, or if they are it is on working cards (if it wasn't someone would have complained by now). This change is just to make it so that the affected cards will work at all, since at the moment we treat the failure of this feature no one uses as fatal. If / when someone actually needs timebase, they will have to debug the root cause of the problem anyway (if we haven't already), at which point they will have a working card regardless of whether this patch is merged or not. If it turns out to be a software bug it is not unreasonable for them to say that their afu is only supported by kernel or firmware "whatever version we fix it in". That said, a read-only attribute in sysfs to indicate whether the timebase is synced or not would be fine and give them something they can query if they care about it. > The parameter is a bit of a PITA too, as it's a driver level config not > card level. You really want to turn it on/off based on the card, not > the whole system. I don't agree - userspace already needs to know what AFU they are dealing with and therefore knows if the timebase is important or not. A read-only attribute in sysfs would be enough to tell them this. We can be fairly confident that there is no software that would need to be changed to check this today since we had been failing the whole init so it wouldn't have worked at all regardless. We did have a software bug where the sync was sensitive to the kernel config, but that has been fixed. If anyone was using timebase (and I don't think anyone was) already and lucked out by testing on a kernel config that was working this won't impact them either since those cards will work either way now that issue has been fixed. IMO, we should ditch the module parameter altogether and never treat timebase sync failure as fatal, and leave that up to any applications actually need it to check. Cheers, -Ian
> IMO, we should ditch the module parameter altogether and never treat > timebase sync failure as fatal, and leave that up to any applications > actually need it to check. I agree with this this. Mikey
Ian, Miley, OK, I'm convinced. New version of the patch out shortly. Fred
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 38e21cf..84b563c 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -27,6 +27,7 @@ #include <uapi/misc/cxl.h> extern uint cxl_verbose; +extern uint cxl_psl_timebase; #define CXL_TIMEOUT 5 diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c index ae68c32..0344010 100644 --- a/drivers/misc/cxl/main.c +++ b/drivers/misc/cxl/main.c @@ -32,6 +32,10 @@ uint cxl_verbose; module_param_named(verbose, cxl_verbose, uint, 0600); MODULE_PARM_DESC(verbose, "Enable verbose dmesg output"); +uint cxl_psl_timebase; +module_param_named(psl_timebase, cxl_psl_timebase, uint, 0600); +MODULE_PARM_DESC(psl_timebase, "Require PSL timebase synchronization"); + const struct cxl_backend_ops *cxl_ops; int cxl_afu_slbia(struct cxl_afu *afu) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 2844e97..02fd1f8 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -1144,8 +1144,11 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev) if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON))) goto err; - if ((rc = cxl_setup_psl_timebase(adapter, dev))) - goto err; + if ((rc = cxl_setup_psl_timebase(adapter, dev))) { + if (cxl_psl_timebase) + goto err; + pr_err("PSL: Timebase sync: ignoring error\n"); + } if ((rc = cxl_native_register_psl_err_irq(adapter))) goto err;
CXL driver synchronizes the PSL timebase with the CAPP during initialization. If it can't synchronize, then the driver currently fails and the cxl adapter is not usable. That behavior is a bit extreme for the time being, as some adapters are known to have troubles syncing their PSL timebase and there are no known use of it. Introduce a psl_timebase module parameter to control whether PSL timebase is required or not. Default is to allow initializaton even if syncing failed. Default behavior will be changed when current issues with some cxl adapters are resolved. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> --- Same as before, but this patch applies on 'next' drivers/misc/cxl/cxl.h | 1 + drivers/misc/cxl/main.c | 4 ++++ drivers/misc/cxl/pci.c | 7 +++++-- 3 files changed, 10 insertions(+), 2 deletions(-)