Message ID | 1467226286-28547-1-git-send-email-imunsie@au.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Hi Ian, > -static int afu_control(struct cxl_afu *afu, u64 command, > +static int afu_control(struct cxl_afu *afu, u64 command, u64 clear, > u64 result, u64 mask, bool enabled) I'm not a big fan of the new "clear" argument, which forces us to pass an extra 0 most of the time. Why not always clearing the "action" bits of the register before applying the command? They are mutually exclusive, so it should work. I.e. : + AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An); + AFU_Cntl &= ~(CXL_AFU_Cntl_An_E | CXL_AFU_Cntl_An_RA); + AFU_Cntl |= command; > @@ -595,7 +597,16 @@ static int deactivate_afu_directed(struct cxl_afu *afu) > cxl_sysfs_afu_m_remove(afu); > cxl_chardev_afu_remove(afu); > > - cxl_ops->afu_reset(afu); > + if (afu->adapter->native->sl_ops->needs_reset_before_disable) { > + /* > + * XXX: We may be able to do away with this entirely - it is > + * possible that this was only ever needed due to a bug where > + * the disable operation did not clear the enable bit, however > + * I will only consider dropping this after more regression > + * testing on earlier PSL images. > + */ > + cxl_ops->afu_reset(afu); > + } > cxl_afu_disable(afu); > cxl_psl_purge(afu); > > @@ -735,7 +746,16 @@ static int native_attach_process(struct cxl_context *ctx, bool kernel, > > static inline int detach_process_native_dedicated(struct cxl_context *ctx) > { > - cxl_ops->afu_reset(ctx->afu); > + if (ctx->afu->adapter->native->sl_ops->needs_reset_before_disable) { > + /* > + * XXX: We may be able to do away with this entirely - it is > + * possible that this was only ever needed due to a bug where > + * the disable operation did not clear the enable bit, however > + * I will only consider dropping this after more regression > + * testing on earlier PSL images. > + */ > + cxl_ops->afu_reset(ctx->afu); > + } For dedicated mode, the CAIA recommends an explicit reset of the AFU (section 2.1.1). For directed mode, CAIA says it's AFU-specific. So for xsl, we only disable the afu and purge the xsl. Are we getting rid of the reset because it's useless in that environment, or because it times out? If just because of timeout, would it make sense to switch the order and disable first, then reset? I don't see any afu-reset on the next activation. Fred
Excerpts from Frederic Barrat's message of 2016-06-30 16:19:54 +0200: > I'm not a big fan of the new "clear" argument, which forces us to pass > an extra 0 most of the time. Why not always clearing the "action" bits > of the register before applying the command? They are mutually > exclusive, so it should work. I.e. : > > + AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An); > + AFU_Cntl &= ~(CXL_AFU_Cntl_An_E | CXL_AFU_Cntl_An_RA); > + AFU_Cntl |= command; In theory that should be OK, but I'd want to test it on some PSL images first just in case they don't behave quite how we expect since we've had problems in this area in the past (although after discovering the bug in the disable path that may provide the explanation for those problems). The risk I see with that approach is setting the reset bit and clearing the enable bit at the same time is commanding the hardware to do two similar but subtly different operations simultaneously, which is not something we have done before or tested - we can always clean this up in a later patch after we've tested it on a couple of PSLs, cxlflash, etc and are happy that it works with no ill effects. I don't see any reason that needs to be done in this patch though since it's just churn inside the driver and doesn't impact anyone else. > > static inline int detach_process_native_dedicated(struct cxl_context *ctx) > > { > > - cxl_ops->afu_reset(ctx->afu); > > + if (ctx->afu->adapter->native->sl_ops->needs_reset_before_disable) { > > + /* > > + * XXX: We may be able to do away with this entirely - it is > > + * possible that this was only ever needed due to a bug where > > + * the disable operation did not clear the enable bit, however > > + * I will only consider dropping this after more regression > > + * testing on earlier PSL images. > > + */ > > + cxl_ops->afu_reset(ctx->afu); > > + } > > For dedicated mode, the CAIA recommends an explicit reset of the AFU > (section 2.1.1). True, I had forgotten that procedure was added to the document before it was made public - I'll update the comment and resend. > For directed mode, CAIA says it's AFU-specific. Damnit, that should never have been architected as AFU specific - PSL implementation specific would have been ok, but AFU specific is just asking for trouble since this is all generic code with no way to identify the AFU. Oh well, in practice the AFU developers will be coding against our implementation now, so I guess we effectively became the authority on how this works by default... and therefore should probably continue to do the reset here. Just hope there aren't too many more ASIC implementations that implement some contradictory behaviour and are set in stone before anyone realises. I'll resend this with these two comments updated. > So for xsl, we only disable the afu and purge the xsl. Are we getting > rid of the reset because it's useless in that environment, or because > it times out? > > If just because of timeout, would it make sense to switch the order > and disable first, then reset? I don't see any afu-reset on the next > activation. It times out if the AFU is enabled when we attempt the reset - that's OK, but is a bit of a waste of time and generates unnecessary noise in the kernel log. We could switch the order, but I don't think that the AFU reset is necessary - the documentation certainly suggests that only a disable is required (but then again, the XSL has been full of surprises already so I reserve the right to send a later patch adding a reset if it has one more in store for me :-p). A reset is also pretty meaningless on the XSL as far as I can tell - it looks like it will assert a bit to the CX4 hardware, so I assume the rest of the card could choose to do something with it, but unlike the FPGA cards I don't think it actually resets anything (and I doubt we even need the one we do while initialising the card since the AFU descriptor is in the CX4 hardware and should be readable without that). Cheers, -Ian
Le 30/06/2016 17:32, Ian Munsie a écrit : >> For dedicated mode, the CAIA recommends an explicit reset of the AFU >> >(section 2.1.1). > True, I had forgotten that procedure was added to the document before it > was made public - I'll update the comment and resend. > Actually, my point was that for dedicated mode, we shouldn't have the "if" and always reset. It's only for dedicated mode, so it wouldn't impact cx4 and we would stay CAIA-compliant. If one day, there's a xsl with a dedicated mode AFU, they are expected to follow the spec. Fred
Excerpts from Frederic Barrat's message of 2016-06-30 17:50:00 +0200: > > Le 30/06/2016 17:32, Ian Munsie a écrit : > >> For dedicated mode, the CAIA recommends an explicit reset of the AFU > >> >(section 2.1.1). > > True, I had forgotten that procedure was added to the document before it > > was made public - I'll update the comment and resend. > > > > Actually, my point was that for dedicated mode, we shouldn't have the > "if" and always reset. It's only for dedicated mode, so it wouldn't > impact cx4 and we would stay CAIA-compliant. If one day, there's a xsl > with a dedicated mode AFU, they are expected to follow the spec. Yeah, I thought of that as well while I was updating the patch and removed that from the dedicated path. I still added a comment to that path to note that though. Cheers, -Ian
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index ce2b9d5..bab8dfd 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -544,6 +544,7 @@ struct cxl_service_layer_ops { void (*write_timebase_ctrl)(struct cxl *adapter); u64 (*timebase_read)(struct cxl *adapter); int capi_mode; + bool needs_reset_before_disable; }; struct cxl_native { diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c index 120c468..9479bfc 100644 --- a/drivers/misc/cxl/native.c +++ b/drivers/misc/cxl/native.c @@ -21,10 +21,10 @@ #include "cxl.h" #include "trace.h" -static int afu_control(struct cxl_afu *afu, u64 command, +static int afu_control(struct cxl_afu *afu, u64 command, u64 clear, u64 result, u64 mask, bool enabled) { - u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An); + u64 AFU_Cntl; unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT); int rc = 0; @@ -33,7 +33,8 @@ static int afu_control(struct cxl_afu *afu, u64 command, trace_cxl_afu_ctrl(afu, command); - cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl | command); + AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An); + cxl_p2n_write(afu, CXL_AFU_Cntl_An, (AFU_Cntl & ~clear) | command); AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An); while ((AFU_Cntl & mask) != result) { @@ -67,7 +68,7 @@ static int afu_enable(struct cxl_afu *afu) { pr_devel("AFU enable request\n"); - return afu_control(afu, CXL_AFU_Cntl_An_E, + return afu_control(afu, CXL_AFU_Cntl_An_E, 0, CXL_AFU_Cntl_An_ES_Enabled, CXL_AFU_Cntl_An_ES_MASK, true); } @@ -76,7 +77,8 @@ int cxl_afu_disable(struct cxl_afu *afu) { pr_devel("AFU disable request\n"); - return afu_control(afu, 0, CXL_AFU_Cntl_An_ES_Disabled, + return afu_control(afu, 0, CXL_AFU_Cntl_An_E, + CXL_AFU_Cntl_An_ES_Disabled, CXL_AFU_Cntl_An_ES_MASK, false); } @@ -85,7 +87,7 @@ static int native_afu_reset(struct cxl_afu *afu) { pr_devel("AFU reset request\n"); - return afu_control(afu, CXL_AFU_Cntl_An_RA, + return afu_control(afu, CXL_AFU_Cntl_An_RA, 0, CXL_AFU_Cntl_An_RS_Complete | CXL_AFU_Cntl_An_ES_Disabled, CXL_AFU_Cntl_An_RS_MASK | CXL_AFU_Cntl_An_ES_MASK, false); @@ -595,7 +597,16 @@ static int deactivate_afu_directed(struct cxl_afu *afu) cxl_sysfs_afu_m_remove(afu); cxl_chardev_afu_remove(afu); - cxl_ops->afu_reset(afu); + if (afu->adapter->native->sl_ops->needs_reset_before_disable) { + /* + * XXX: We may be able to do away with this entirely - it is + * possible that this was only ever needed due to a bug where + * the disable operation did not clear the enable bit, however + * I will only consider dropping this after more regression + * testing on earlier PSL images. + */ + cxl_ops->afu_reset(afu); + } cxl_afu_disable(afu); cxl_psl_purge(afu); @@ -735,7 +746,16 @@ static int native_attach_process(struct cxl_context *ctx, bool kernel, static inline int detach_process_native_dedicated(struct cxl_context *ctx) { - cxl_ops->afu_reset(ctx->afu); + if (ctx->afu->adapter->native->sl_ops->needs_reset_before_disable) { + /* + * XXX: We may be able to do away with this entirely - it is + * possible that this was only ever needed due to a bug where + * the disable operation did not clear the enable bit, however + * I will only consider dropping this after more regression + * testing on earlier PSL images. + */ + cxl_ops->afu_reset(ctx->afu); + } cxl_afu_disable(ctx->afu); cxl_psl_purge(ctx->afu); return 0; diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 58d7d821..b7f2e96 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -1309,6 +1309,7 @@ static const struct cxl_service_layer_ops psl_ops = { .write_timebase_ctrl = write_timebase_ctrl_psl, .timebase_read = timebase_read_psl, .capi_mode = OPAL_PHB_CAPI_MODE_CAPI, + .needs_reset_before_disable = true, }; static const struct cxl_service_layer_ops xsl_ops = {