Message ID | 1370417668-16832-16-git-send-email-shangw@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Benjamin Herrenschmidt |
Headers | show |
On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote: > The patch adds I/O chip backend to retrieve the state for the > indicated PE. While the PE state is temperarily unavailable, > we return the default wait time (1000ms). > > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> > --- > arch/powerpc/platforms/powernv/eeh-ioda.c | 102 ++++++++++++++++++++++++++++- > 1 files changed, 101 insertions(+), 1 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c > index e24622e..3c72321 100644 > --- a/arch/powerpc/platforms/powernv/eeh-ioda.c > +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c > @@ -125,10 +125,110 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option) > return ret; > } > > +/** > + * ioda_eeh_get_state - Retrieve the state of PE > + * @pe: EEH PE > + * @state: return value > + * > + * The PE's state should be retrieved from the PEEV, PEST > + * IODA tables. Since the OPAL has exported the function > + * to do it, it'd better to use that. > + */ > +static int ioda_eeh_get_state(struct eeh_pe *pe, int *state) > +{ So everywhere you have this "state" argument which isn't a state but a delay ... Moreover you only initialize it in one specific case and leave it otherwise uninitialized.... At the very least, init it to 0 by default as to not leave a dangling "return argument" like that. However, I still have a problem with it: > + case OPAL_EEH_STOPPED_TEMP_UNAVAIL: > + result |= EEH_STATE_UNAVAILABLE; > + if (state) > + *state = 1000; > + break; This is the *only* case where we return anything here. Why do we bother then and not have the upper layer simply wait one second whenever it gets a temp unavailable result (btw, you didn't differenciate temp unavailable from permanently unavailable in your API). This has impacts on patch 18/27 which I'll cover here: > +/** > + * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable > + * @pe: EEH PE > + * @option: operation to be issued > + * > + * The function is used to control the EEH functionality globally. > + * Currently, following options are support according to PAPR: > + * Enable EEH, Disable EEH, Enable MMIO and Enable DMA > + */ > +static int powernv_eeh_set_option(struct eeh_pe *pe, int option) > +{ > + struct pci_controller *hose = pe->phb; > + struct pnv_phb *phb = hose->private_data; > + int ret = -EEXIST; > + > + /* > + * What we need do is pass it down for hardware > + * implementation to handle it. > + */ > + if (phb->eeh_ops && phb->eeh_ops->set_option) > + ret = phb->eeh_ops->set_option(pe, option); > + > + return ret; > +} Should we implement something here ? IE. Should we look into disabling freezing in the PHB via the firmware ? Or we just don't care ? > +/** > + * powernv_eeh_get_pe_addr - Retrieve PE address > + * @pe: EEH PE > + * > + * Retrieve the PE address according to the given tranditional > + * PCI BDF (Bus/Device/Function) address. > + */ > +static int powernv_eeh_get_pe_addr(struct eeh_pe *pe) > +{ > + return pe->addr; > +} > > +/** > + * powernv_eeh_get_state - Retrieve PE state > + * @pe: EEH PE > + * @state: return value > + * > + * Retrieve the state of the specified PE. For IODA-compitable > + * platform, it should be retrieved from IODA table. Therefore, > + * we prefer passing down to hardware implementation to handle > + * it. > + */ > +static int powernv_eeh_get_state(struct eeh_pe *pe, int *state) > +{ > + struct pci_controller *hose = pe->phb; > + struct pnv_phb *phb = hose->private_data; > + int ret = EEH_STATE_NOT_SUPPORT; > + > + if (phb->eeh_ops && phb->eeh_ops->get_state) > + ret = phb->eeh_ops->get_state(pe, state); > + > + return ret; > +} Same comments about "state" which is really "delay" and is probably not necessary at all ... > +/** > + * powernv_eeh_reset - Reset the specified PE > + * @pe: EEH PE > + * @option: reset option > + * > + * Reset the specified PE > + */ > +static int powernv_eeh_reset(struct eeh_pe *pe, int option) > +{ > + struct pci_controller *hose = pe->phb; > + struct pnv_phb *phb = hose->private_data; > + int ret = -EEXIST; > + > + if (phb->eeh_ops && phb->eeh_ops->reset) > + ret = phb->eeh_ops->reset(pe, option); > + > + return ret; > +} > + > +/** > + * powernv_eeh_wait_state - Wait for PE state > + * @pe: EEH PE > + * @max_wait: maximal period in microsecond > + * > + * Wait for the state of associated PE. It might take some time > + * to retrieve the PE's state. > + */ > +static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait) > +{ > + int ret; > + int mwait; > + > + while (1) { > + ret = powernv_eeh_get_state(pe, &mwait); > + > + /* > + * If the PE's state is temporarily unavailable, > + * we have to wait for the specified time. Otherwise, > + * the PE's state will be returned immediately. > + */ > + if (ret != EEH_STATE_UNAVAILABLE) > + return ret; So here we do a compare, while ret is actually a bit mask ... In fact, ret should be named state_mask or something like that for clarity and you should do a bit test here. Also do you want to diffenciate permanent unavailability from temp. unavailability ? > + max_wait -= mwait; You decrement max_wait but never test it or use it. You probably mean to - Limit mwait to max_wait - If mwait is 0, return > + msleep(mwait); > + } > + > + return EEH_STATE_NOT_SUPPORT; > +} > + > +/** > + * powernv_eeh_get_log - Retrieve error log > + * @pe: EEH PE > + * @severity: temporary or permanent error log > + * @drv_log: driver log to be combined with retrieved error log > + * @len: length of driver log > + * > + * Retrieve the temporary or permanent error from the PE. > + */ > +static int powernv_eeh_get_log(struct eeh_pe *pe, int severity, > + char *drv_log, unsigned long len) > +{ > + struct pci_controller *hose = pe->phb; > + struct pnv_phb *phb = hose->private_data; > + int ret = -EEXIST; > + > + if (phb->eeh_ops && phb->eeh_ops->get_log) > + ret = phb->eeh_ops->get_log(pe, severity, drv_log, len); > + > + return ret; > +} > + > +/** > + * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE > + * @pe: EEH PE > + * > + * The function will be called to reconfigure the bridges included > + * in the specified PE so that the mulfunctional PE would be recovered > + * again. > + */ > +static int powernv_eeh_configure_bridge(struct eeh_pe *pe) > +{ > + struct pci_controller *hose = pe->phb; > + struct pnv_phb *phb = hose->private_data; > + int ret = 0; > + > + if (phb->eeh_ops && phb->eeh_ops->configure_bridge) > + ret = phb->eeh_ops->configure_bridge(pe); > + > + return ret; > +} Ben.
On Tue, Jun 11, 2013 at 05:37:04PM +1000, Benjamin Herrenschmidt wrote: >On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote: >> The patch adds I/O chip backend to retrieve the state for the >> indicated PE. While the PE state is temperarily unavailable, >> we return the default wait time (1000ms). >> >> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> >> --- >> arch/powerpc/platforms/powernv/eeh-ioda.c | 102 ++++++++++++++++++++++++++++- >> 1 files changed, 101 insertions(+), 1 deletions(-) >> >> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c >> index e24622e..3c72321 100644 >> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c >> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c >> @@ -125,10 +125,110 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option) >> return ret; >> } >> >> +/** >> + * ioda_eeh_get_state - Retrieve the state of PE >> + * @pe: EEH PE >> + * @state: return value >> + * >> + * The PE's state should be retrieved from the PEEV, PEST >> + * IODA tables. Since the OPAL has exported the function >> + * to do it, it'd better to use that. >> + */ >> +static int ioda_eeh_get_state(struct eeh_pe *pe, int *state) >> +{ > >So everywhere you have this "state" argument which isn't a state but a delay ... > >Moreover you only initialize it in one specific case and leave it otherwise >uninitialized.... > >At the very least, init it to 0 by default as to not leave a dangling >"return argument" like that. However, I still have a problem with it: > Ok. I will update accordingly in upper layer (eeh-powernv.c) - Initialize it to value "0". - If necessary, return 1 second. >> + case OPAL_EEH_STOPPED_TEMP_UNAVAIL: >> + result |= EEH_STATE_UNAVAILABLE; >> + if (state) >> + *state = 1000; >> + break; > >This is the *only* case where we return anything here. Why do we bother >then and not have the upper layer simply wait one second whenever it gets >a temp unavailable result (btw, you didn't differenciate temp unavailable >from permanently unavailable in your API). > We already defferentiated the permanent/temp availibility through the return value from the function: - EEH_STATE_UNAVAILABLE: temporary unavailibility - EEH_STATE_NOT_SUPPORT: permanent unavailibility The EEH core will handle the return value (from the function) accordingly. >This has impacts on patch 18/27 which I'll cover here: > >> +/** >> + * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable >> + * @pe: EEH PE >> + * @option: operation to be issued >> + * >> + * The function is used to control the EEH functionality globally. >> + * Currently, following options are support according to PAPR: >> + * Enable EEH, Disable EEH, Enable MMIO and Enable DMA >> + */ >> +static int powernv_eeh_set_option(struct eeh_pe *pe, int option) >> +{ >> + struct pci_controller *hose = pe->phb; >> + struct pnv_phb *phb = hose->private_data; >> + int ret = -EEXIST; >> + >> + /* >> + * What we need do is pass it down for hardware >> + * implementation to handle it. >> + */ >> + if (phb->eeh_ops && phb->eeh_ops->set_option) >> + ret = phb->eeh_ops->set_option(pe, option); >> + >> + return ret; >> +} > >Should we implement something here ? IE. Should we look into >disabling freezing in the PHB via the firmware ? Or we just don't care ? > We just don't care. If EEH functionality has been disabled, we shouldn't run into the code. >> +/** >> + * powernv_eeh_get_pe_addr - Retrieve PE address >> + * @pe: EEH PE >> + * >> + * Retrieve the PE address according to the given tranditional >> + * PCI BDF (Bus/Device/Function) address. >> + */ >> +static int powernv_eeh_get_pe_addr(struct eeh_pe *pe) >> +{ >> + return pe->addr; >> +} >> >> +/** >> + * powernv_eeh_get_state - Retrieve PE state >> + * @pe: EEH PE >> + * @state: return value >> + * >> + * Retrieve the state of the specified PE. For IODA-compitable >> + * platform, it should be retrieved from IODA table. Therefore, >> + * we prefer passing down to hardware implementation to handle >> + * it. >> + */ >> +static int powernv_eeh_get_state(struct eeh_pe *pe, int *state) >> +{ >> + struct pci_controller *hose = pe->phb; >> + struct pnv_phb *phb = hose->private_data; >> + int ret = EEH_STATE_NOT_SUPPORT; >> + >> + if (phb->eeh_ops && phb->eeh_ops->get_state) >> + ret = phb->eeh_ops->get_state(pe, state); >> + >> + return ret; >> +} > >Same comments about "state" which is really "delay" and is probably >not necessary at all ... > We need the "delay" in future to support PowerKVM guest. If the specified PE is being reset, we rely on the delay to hold the powerkvm guest for a while until the PE reset is done. >> +/** >> + * powernv_eeh_reset - Reset the specified PE >> + * @pe: EEH PE >> + * @option: reset option >> + * >> + * Reset the specified PE >> + */ >> +static int powernv_eeh_reset(struct eeh_pe *pe, int option) >> +{ >> + struct pci_controller *hose = pe->phb; >> + struct pnv_phb *phb = hose->private_data; >> + int ret = -EEXIST; >> + >> + if (phb->eeh_ops && phb->eeh_ops->reset) >> + ret = phb->eeh_ops->reset(pe, option); >> + >> + return ret; >> +} >> + >> +/** >> + * powernv_eeh_wait_state - Wait for PE state >> + * @pe: EEH PE >> + * @max_wait: maximal period in microsecond >> + * >> + * Wait for the state of associated PE. It might take some time >> + * to retrieve the PE's state. >> + */ >> +static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait) >> +{ >> + int ret; >> + int mwait; >> + >> + while (1) { >> + ret = powernv_eeh_get_state(pe, &mwait); >> + >> + /* >> + * If the PE's state is temporarily unavailable, >> + * we have to wait for the specified time. Otherwise, >> + * the PE's state will be returned immediately. >> + */ >> + if (ret != EEH_STATE_UNAVAILABLE) >> + return ret; > >So here we do a compare, while ret is actually a bit mask ... > >In fact, ret should be named state_mask or something like that for clarity >and you should do a bit test here. Also do you want to diffenciate >permanent unavailability from temp. unavailability ? > >> + max_wait -= mwait; > >You decrement max_wait but never test it or use it. You probably mean to > > - Limit mwait to max_wait > - If mwait is 0, return > Yeah, I will change the code accordingly in next revision. >> + msleep(mwait); >> + } >> + >> + return EEH_STATE_NOT_SUPPORT; >> +} >> + >> +/** >> + * powernv_eeh_get_log - Retrieve error log >> + * @pe: EEH PE >> + * @severity: temporary or permanent error log >> + * @drv_log: driver log to be combined with retrieved error log >> + * @len: length of driver log >> + * >> + * Retrieve the temporary or permanent error from the PE. >> + */ >> +static int powernv_eeh_get_log(struct eeh_pe *pe, int severity, >> + char *drv_log, unsigned long len) >> +{ >> + struct pci_controller *hose = pe->phb; >> + struct pnv_phb *phb = hose->private_data; >> + int ret = -EEXIST; >> + >> + if (phb->eeh_ops && phb->eeh_ops->get_log) >> + ret = phb->eeh_ops->get_log(pe, severity, drv_log, len); >> + >> + return ret; >> +} >> + >> +/** >> + * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE >> + * @pe: EEH PE >> + * >> + * The function will be called to reconfigure the bridges included >> + * in the specified PE so that the mulfunctional PE would be recovered >> + * again. >> + */ >> +static int powernv_eeh_configure_bridge(struct eeh_pe *pe) >> +{ >> + struct pci_controller *hose = pe->phb; >> + struct pnv_phb *phb = hose->private_data; >> + int ret = 0; >> + >> + if (phb->eeh_ops && phb->eeh_ops->configure_bridge) >> + ret = phb->eeh_ops->configure_bridge(pe); >> + >> + return ret; >> +} Thanks, Gavin
On Wed, 2013-06-12 at 11:32 +0800, Gavin Shan wrote: > >Same comments about "state" which is really "delay" and is probably > >not necessary at all ... > > > > We need the "delay" in future to support PowerKVM guest. If the > specified PE is being reset, we rely on the delay to hold the > powerkvm guest for a while until the PE reset is done. Do we ? Can't we just rely on "temp unavailble" result and wait 1s when that happens (then try again) ? IE, A delay associated with a state doesn't make that much sense semantically speaking. With a state *transition* maybe but this isn't what this function is about... Cheers, Ben.
On Wed, Jun 12, 2013 at 02:19:25PM +1000, Benjamin Herrenschmidt wrote: >On Wed, 2013-06-12 at 11:32 +0800, Gavin Shan wrote: > >> >Same comments about "state" which is really "delay" and is probably >> >not necessary at all ... >> > >> >> We need the "delay" in future to support PowerKVM guest. If the >> specified PE is being reset, we rely on the delay to hold the >> powerkvm guest for a while until the PE reset is done. > >Do we ? Can't we just rely on "temp unavailble" result and wait 1s when >that happens (then try again) ? > >IE, A delay associated with a state doesn't make that much sense >semantically speaking. With a state *transition* maybe but this isn't >what this function is about... > Sorry, Ben. I should have clarified more clearly: Basically, the EEH core is going to be shared by: powernv, pseries on top of powernv or phyp. While running pseries on top of phyp, we're getting PE state through RTAS call "ibm,read-slot-reset-state2" and desired delay returned from f/w for temporary unavailable PE. In the future, the function ioda_eeh_get_state() will be called directly to emulate the RTAS call for guest running on top of PowerNV. So the answer is we can do it by makeing the assumption that f/w won't return valid delay and we're going to use default value (1 second) for guest on powernv or phyp, or we keep the delay here. Thanks, Gavin
On Thu, 2013-06-13 at 12:26 +0800, Gavin Shan wrote: > So the answer is we can do it by makeing the assumption that f/w won't > return valid delay and we're going to use default value (1 second) for > guest on powernv or phyp, or we keep the delay here. Ok, at the very least then change the name to "unavailable_delay" or something explicit like that then :-) BTW. I've already applied patches 1 and 2 to my tree, you don't have to resend those. They'll show up today or tomorrow when I push my next branch out. Cheers, Ben.
On Thu, Jun 13, 2013 at 02:42:17PM +1000, Benjamin Herrenschmidt wrote: >On Thu, 2013-06-13 at 12:26 +0800, Gavin Shan wrote: >> So the answer is we can do it by makeing the assumption that f/w won't >> return valid delay and we're going to use default value (1 second) for >> guest on powernv or phyp, or we keep the delay here. > >Ok, at the very least then change the name to "unavailable_delay" or >something explicit like that then :-) > Ok. >BTW. I've already applied patches 1 and 2 to my tree, you don't have to >resend those. They'll show up today or tomorrow when I push my next >branch out. > Ok. Thanks, Ben. Thanks, Gavin
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index e24622e..3c72321 100644 --- a/arch/powerpc/platforms/powernv/eeh-ioda.c +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c @@ -125,10 +125,110 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option) return ret; } +/** + * ioda_eeh_get_state - Retrieve the state of PE + * @pe: EEH PE + * @state: return value + * + * The PE's state should be retrieved from the PEEV, PEST + * IODA tables. Since the OPAL has exported the function + * to do it, it'd better to use that. + */ +static int ioda_eeh_get_state(struct eeh_pe *pe, int *state) +{ + s64 ret = 0; + u8 fstate; + u16 pcierr; + u32 pe_no; + int result; + struct pci_controller *hose = pe->phb; + struct pnv_phb *phb = hose->private_data; + + /* + * Sanity check on PE address. The PHB PE address should + * be zero. + */ + if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) { + pr_err("%s: PE address %x out of range [0, %x] " + "on PHB#%x\n", + __func__, pe->addr, phb->ioda.total_pe, + hose->global_number); + return EEH_STATE_NOT_SUPPORT; + } + + /* Retrieve PE status through OPAL */ + pe_no = pe->addr; + ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, + &fstate, &pcierr, NULL); + if (ret) { + pr_err("%s: Failed to get EEH status on " + "PHB#%x-PE#%x\n, err=%lld\n", + __func__, hose->global_number, pe_no, ret); + return EEH_STATE_NOT_SUPPORT; + } + + /* Check PHB status */ + if (pe->type & EEH_PE_PHB) { + result = 0; + result &= ~EEH_STATE_RESET_ACTIVE; + + if (pcierr != OPAL_EEH_PHB_ERROR) { + result |= EEH_STATE_MMIO_ACTIVE; + result |= EEH_STATE_DMA_ACTIVE; + result |= EEH_STATE_MMIO_ENABLED; + result |= EEH_STATE_DMA_ENABLED; + } + + return result; + } + + /* Parse result out */ + result = 0; + switch (fstate) { + case OPAL_EEH_STOPPED_NOT_FROZEN: + result &= ~EEH_STATE_RESET_ACTIVE; + result |= EEH_STATE_MMIO_ACTIVE; + result |= EEH_STATE_DMA_ACTIVE; + result |= EEH_STATE_MMIO_ENABLED; + result |= EEH_STATE_DMA_ENABLED; + break; + case OPAL_EEH_STOPPED_MMIO_FREEZE: + result &= ~EEH_STATE_RESET_ACTIVE; + result |= EEH_STATE_DMA_ACTIVE; + result |= EEH_STATE_DMA_ENABLED; + break; + case OPAL_EEH_STOPPED_DMA_FREEZE: + result &= ~EEH_STATE_RESET_ACTIVE; + result |= EEH_STATE_MMIO_ACTIVE; + result |= EEH_STATE_MMIO_ENABLED; + break; + case OPAL_EEH_STOPPED_MMIO_DMA_FREEZE: + result &= ~EEH_STATE_RESET_ACTIVE; + break; + case OPAL_EEH_STOPPED_RESET: + result |= EEH_STATE_RESET_ACTIVE; + break; + case OPAL_EEH_STOPPED_TEMP_UNAVAIL: + result |= EEH_STATE_UNAVAILABLE; + if (state) + *state = 1000; + break; + case OPAL_EEH_STOPPED_PERM_UNAVAIL: + result |= EEH_STATE_NOT_SUPPORT; + break; + default: + pr_warning("%s: Unexpected EEH status 0x%x " + "on PHB#%x-PE#%x\n", + __func__, fstate, hose->global_number, pe_no); + } + + return result; +} + struct pnv_eeh_ops ioda_eeh_ops = { .post_init = ioda_eeh_post_init, .set_option = ioda_eeh_set_option, - .get_state = NULL, + .get_state = ioda_eeh_get_state, .reset = NULL, .get_log = NULL, .configure_bridge = NULL
The patch adds I/O chip backend to retrieve the state for the indicated PE. While the PE state is temperarily unavailable, we return the default wait time (1000ms). Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> --- arch/powerpc/platforms/powernv/eeh-ioda.c | 102 ++++++++++++++++++++++++++++- 1 files changed, 101 insertions(+), 1 deletions(-)