diff mbox series

nvram: Fix 'missing' nvram on FSP systems.

Message ID 20171107060759.32202-1-cyril.bur@au1.ibm.com
State Changes Requested
Headers show
Series nvram: Fix 'missing' nvram on FSP systems. | expand

Commit Message

Cyril Bur Nov. 7, 2017, 6:07 a.m. UTC
commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
from NVRAM rather early. This works fine on BMC based systems as
nvram_init() is actually synchronous. This is not true for FSP systems
and it turns out that the query for the console log level simply
queries blank nvram.

The simple fix is to wait for the NVRAM read to complete before
performing any query. Unfortunately it turns out that the fsp-nvram
code does not inform the generic NVRAM layer when the read is complete,
rather, it must be prompted to do so.

This patch addresses both these problems. This patch adds a check before
the first read of the NVRAM (for the console log level) that the read
has completed. The fsp-nvram code has been updated to inform the generic
layer as soon as the read completes.

The old prompt to the fsp-nvram code has been removed but a check to
ensure that the NVRAM has been loaded remains. It is conservative but
if the NVRAM is not done loading before the host is booted it will not
have an nvram device-tree node which means it won't be able to access
the NVRAM at all, ever, even after the NVRAM has loaded.

Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
---
 core/init.c                  | 11 +++++++++--
 core/nvram-format.c          |  5 +++++
 core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
 core/test/run-nvram-format.c |  5 +++++
 hw/fsp/fsp-nvram.c           | 28 ++++------------------------
 include/fsp.h                |  1 -
 include/nvram.h              |  2 ++
 7 files changed, 60 insertions(+), 27 deletions(-)

Comments

Cyril Bur Nov. 7, 2017, 11:32 p.m. UTC | #1
On Tue, 2017-11-07 at 17:07 +1100, Cyril Bur wrote:
> commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
> from NVRAM rather early. This works fine on BMC based systems as
> nvram_init() is actually synchronous. This is not true for FSP systems
> and it turns out that the query for the console log level simply
> queries blank nvram.
> 
> The simple fix is to wait for the NVRAM read to complete before
> performing any query. Unfortunately it turns out that the fsp-nvram
> code does not inform the generic NVRAM layer when the read is complete,
> rather, it must be prompted to do so.
> 
> This patch addresses both these problems. This patch adds a check before
> the first read of the NVRAM (for the console log level) that the read
> has completed. The fsp-nvram code has been updated to inform the generic
> layer as soon as the read completes.
> 
> The old prompt to the fsp-nvram code has been removed but a check to
> ensure that the NVRAM has been loaded remains. It is conservative but
> if the NVRAM is not done loading before the host is booted it will not
> have an nvram device-tree node which means it won't be able to access
> the NVRAM at all, ever, even after the NVRAM has loaded.
> 
> Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
> ---
>  core/init.c                  | 11 +++++++++--
>  core/nvram-format.c          |  5 +++++
>  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
>  core/test/run-nvram-format.c |  5 +++++
>  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
>  include/fsp.h                |  1 -
>  include/nvram.h              |  2 ++
>  7 files changed, 60 insertions(+), 27 deletions(-)
> 
> diff --git a/core/init.c b/core/init.c
> index 8951e17b..17de9851 100644
> --- a/core/init.c
> +++ b/core/init.c
> @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
>  		/* We wait for the nvram read to complete here so we can
>  		 * grab stuff from there such as the kernel arguments
>  		 */
> -		fsp_nvram_wait_open();
> +		nvram_wait_for_load();
>  
>  		/* Wait for FW VPD data read to complete */
>  		fsp_code_update_wait_vpd(true);
> @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>  	/* preload the IMC catalog dtb */
>  	imc_catalog_preload();
>  
> -	/* Set the console level */
> +	/*
> +	 * Set the console level
> +	 * We need NVRAM to have been setup by now otherwise we'll
> +	 * just read a zeroed buffer.
> +	 * Some FSP systems can actually be quite slow to return the
> +	 * NVRAM.
> +	 */
> +	nvram_wait_for_load();

There is a bit of debate as to where this should go.

The way I see it there are three options.

1) Inside nvram_query()
2) Here
3) At the end of nvram_init()

To me here are some pros and cons:

Option 1:
Advantage: Abstracted away.
   This is nice, there can be a gigantic warning/error that appears if
nvram_query() is called before nvram_init() but other than that, you
don't need to know how it all works, just call nvram_query().
"Hopefully" the first call to nvram_query() happens after the FSP
responds, no wait, nothing, great!

Disadvantage: Abstracted away.
   If this was userspace I'd say that there's no disadvantage, but this
is firmware and abstractions that hide a busy loop may not be the best
idea. A naive caller of nvram_query() might not realise it is going to
spin and this may be a bad idea. Of course this is FSP only since
nvram_init() is actually synchronous on BMC... and FSP boots are
already slow... moving on.

Option 2:
Advantage: Doesn't hide details. "Look theres a busy loop here".
Because this is firmware and not userspace I'll class it as a good
thing.

Disadvantage: Everyone who wants to call nvram_query() needs to be sure
that they've waited for the load. Although, this problem kind of
already exists since you need to be sure nvram_init() has been called.

Option 3:
Advantage: Abstracted away. If we're going hide it, here is nicer. It's
in a init so there's a bit more of an expectation this function might
take time.

Disadvantage: nvram_init() is now effectively synchronous on both FSP
and BMC when (correct me if I'm wrong) it actually doesn't need to be
(BMC would take much more work). It means we can't get anything else
done while we wait, and in the event that an nvram_query() wasn't going
to happen before the response, we wasted that time.


Let the discussion begin...

>  	console_log_level();
>  
>  	/* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
> diff --git a/core/nvram-format.c b/core/nvram-format.c
> index 923098af..655cb8ee 100644
> --- a/core/nvram-format.c
> +++ b/core/nvram-format.c
> @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
>  	const char *part_end, *start;
>  	int key_len = strlen(key);
>  
> +	if (!nvram_has_loaded()) {
> +		prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
> +		return NULL;
> +	}
> +
>  	/*
>  	 * The running OS can modify the NVRAM as it pleases so we need to be
>  	 * a little paranoid and check that it's ok before we try parse it.
> diff --git a/core/nvram.c b/core/nvram.c
> index 2140706a..de6cbddf 100644
> --- a/core/nvram.c
> +++ b/core/nvram.c
> @@ -122,6 +122,41 @@ void nvram_read_complete(bool success)
>  	nvram_ready = true;
>  }
>  
> +bool nvram_wait_for_load(void)
> +{
> +	/* Short cut */
> +	if (nvram_ready)
> +		return true;
> +
> +	/* Tell the caller it will never happen */
> +	if (!platform.nvram_info)
> +		return false;
> +
> +	/*
> +	 * One of two things has happened here.
> +	 * 1. nvram_wait_for_load() was called before nvram_init()
> +	 * 2. The read of NVRAM failed.
> +	 * Either way, this is quite a bad event.
> +	 */
> +	if (!nvram_image && !nvram_size) {
> +		prlog(PR_CRIT, "NVRAM: Possible wait before nvram_init()!\n");
> +		return false;
> +	}
> +
> +	while (!nvram_ready) {
> +		opal_run_pollers();
> +		/* If the read fails, tell the caller */
> +		if (!nvram_image && !nvram_size)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +bool nvram_has_loaded(void)
> +{
> +	return nvram_ready;
> +}
> +
>  void nvram_init(void)
>  {
>  	int rc;
> diff --git a/core/test/run-nvram-format.c b/core/test/run-nvram-format.c
> index 5bd8ea2a..e59fdfb6 100644
> --- a/core/test/run-nvram-format.c
> +++ b/core/test/run-nvram-format.c
> @@ -23,6 +23,11 @@ bool nvram_validate(void)
>  	return true;
>  }
>  
> +bool nvram_has_loaded(void)
> +{
> +	return true;
> +}
> +
>  static char *nvram_reset(void *nvram_image, int size)
>  {
>  	struct chrp_nvram_hdr *h = nvram_image;
> diff --git a/hw/fsp/fsp-nvram.c b/hw/fsp/fsp-nvram.c
> index 85b7e815..52342fcf 100644
> --- a/hw/fsp/fsp-nvram.c
> +++ b/hw/fsp/fsp-nvram.c
> @@ -203,6 +203,10 @@ static void fsp_nvram_rd_complete(struct fsp_msg *msg)
>  		 */
>  	}
>  	unlock(&fsp_nvram_lock);
> +	nvram_read_complete(fsp_nvram_state == NVRAM_STATE_OPEN);
> +	if (fsp_nvram_state != NVRAM_STATE_OPEN)
> +		log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
> +		"FSP: NVRAM not read, skipping init\n");
>  }
>  
>  static void fsp_nvram_send_read(void)
> @@ -387,27 +391,3 @@ int fsp_nvram_write(uint32_t offset, void *src, uint32_t size)
>  
>  	return 0;
>  }
> -
> -/* This is called right before starting the payload (Linux) to
> - * ensure the initial open & read of nvram has happened before
> - * we transfer control as the guest OS. This is necessary as
> - * Linux will not handle a OPAL_BUSY return properly and treat
> - * it as an error
> - */
> -void fsp_nvram_wait_open(void)
> -{
> -	if (!fsp_present())
> -		return;
> -
> -	while(fsp_nvram_state == NVRAM_STATE_OPENING)
> -		opal_run_pollers();
> -
> -	if (!fsp_nvram_was_read) {
> -		log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
> -			"FSP: NVRAM not read, skipping init\n");
> -		nvram_read_complete(false);
> -		return;
> -	}
> -
> -	nvram_read_complete(true);
> -}
> diff --git a/include/fsp.h b/include/fsp.h
> index 1d877d81..e4bcae02 100644
> --- a/include/fsp.h
> +++ b/include/fsp.h
> @@ -776,7 +776,6 @@ extern void fsp_used_by_console(void);
>  extern int fsp_nvram_info(uint32_t *total_size);
>  extern int fsp_nvram_start_read(void *dst, uint32_t src, uint32_t len);
>  extern int fsp_nvram_write(uint32_t offset, void *src, uint32_t size);
> -extern void fsp_nvram_wait_open(void);
>  
>  /* RTC */
>  extern void fsp_rtc_init(void);
> diff --git a/include/nvram.h b/include/nvram.h
> index 288b5368..012c107f 100644
> --- a/include/nvram.h
> +++ b/include/nvram.h
> @@ -21,6 +21,8 @@ int nvram_format(void *nvram_image, uint32_t nvram_size);
>  int nvram_check(void *nvram_image, uint32_t nvram_size);
>  void nvram_reinit(void);
>  bool nvram_validate(void);
> +bool nvram_has_loaded(void);
> +bool nvram_wait_for_load(void);
>  
>  const char *nvram_query(const char *name);
>  bool nvram_query_eq(const char *key, const char *value);
Stewart Smith Nov. 13, 2017, 6:33 a.m. UTC | #2
Cyril Bur <cyril.bur@au1.ibm.com> writes:
> commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
> from NVRAM rather early. This works fine on BMC based systems as
> nvram_init() is actually synchronous. This is not true for FSP systems
> and it turns out that the query for the console log level simply
> queries blank nvram.
>
> The simple fix is to wait for the NVRAM read to complete before
> performing any query. Unfortunately it turns out that the fsp-nvram
> code does not inform the generic NVRAM layer when the read is complete,
> rather, it must be prompted to do so.
>
> This patch addresses both these problems. This patch adds a check before
> the first read of the NVRAM (for the console log level) that the read
> has completed. The fsp-nvram code has been updated to inform the generic
> layer as soon as the read completes.
>
> The old prompt to the fsp-nvram code has been removed but a check to
> ensure that the NVRAM has been loaded remains. It is conservative but
> if the NVRAM is not done loading before the host is booted it will not
> have an nvram device-tree node which means it won't be able to access
> the NVRAM at all, ever, even after the NVRAM has loaded.
>
> Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
> ---
>  core/init.c                  | 11 +++++++++--
>  core/nvram-format.c          |  5 +++++
>  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
>  core/test/run-nvram-format.c |  5 +++++
>  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
>  include/fsp.h                |  1 -
>  include/nvram.h              |  2 ++
>  7 files changed, 60 insertions(+), 27 deletions(-)
>
> diff --git a/core/init.c b/core/init.c
> index 8951e17b..17de9851 100644
> --- a/core/init.c
> +++ b/core/init.c
> @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
>  		/* We wait for the nvram read to complete here so we can
>  		 * grab stuff from there such as the kernel arguments
>  		 */
> -		fsp_nvram_wait_open();
> +		nvram_wait_for_load();
>  
>  		/* Wait for FW VPD data read to complete */
>  		fsp_code_update_wait_vpd(true);
> @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>  	/* preload the IMC catalog dtb */
>  	imc_catalog_preload();
>  
> -	/* Set the console level */
> +	/*
> +	 * Set the console level
> +	 * We need NVRAM to have been setup by now otherwise we'll
> +	 * just read a zeroed buffer.
> +	 * Some FSP systems can actually be quite slow to return the
> +	 * NVRAM.
> +	 */
> +	nvram_wait_for_load();
>  	console_log_level();
>  
>  	/* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
> diff --git a/core/nvram-format.c b/core/nvram-format.c
> index 923098af..655cb8ee 100644
> --- a/core/nvram-format.c
> +++ b/core/nvram-format.c
> @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
>  	const char *part_end, *start;
>  	int key_len = strlen(key);
>  
> +	if (!nvram_has_loaded()) {
> +		prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
> +		return NULL;
> +	}
> +
>  	/*
>  	 * The running OS can modify the NVRAM as it pleases so we need to be
>  	 * a little paranoid and check that it's ok before we try parse it.
> diff --git a/core/nvram.c b/core/nvram.c
> index 2140706a..de6cbddf 100644
> --- a/core/nvram.c
> +++ b/core/nvram.c
> @@ -122,6 +122,41 @@ void nvram_read_complete(bool success)
>  	nvram_ready = true;
>  }
>  
> +bool nvram_wait_for_load(void)
> +{
> +	/* Short cut */
> +	if (nvram_ready)
> +		return true;
> +
> +	/* Tell the caller it will never happen */
> +	if (!platform.nvram_info)
> +		return false;

here I think we should return true instead, as it's as 'ready' as it's
ever going to be, it's just not ever going to be usable.

So for the purpose of "do we need to log some error about not being able
to wait for nvram to be ready", the answer is no, which is pretty much
what the return value of this function is saying.

make sense?

Of course, what doesn't make sense is that I now have a meeting and thus
I haven't looked at the rest of this patch yet
Cyril Bur Nov. 13, 2017, 11:39 p.m. UTC | #3
On Mon, 2017-11-13 at 17:33 +1100, Stewart Smith wrote:
> Cyril Bur <cyril.bur@au1.ibm.com> writes:
> > commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
> > from NVRAM rather early. This works fine on BMC based systems as
> > nvram_init() is actually synchronous. This is not true for FSP systems
> > and it turns out that the query for the console log level simply
> > queries blank nvram.
> > 
> > The simple fix is to wait for the NVRAM read to complete before
> > performing any query. Unfortunately it turns out that the fsp-nvram
> > code does not inform the generic NVRAM layer when the read is complete,
> > rather, it must be prompted to do so.
> > 
> > This patch addresses both these problems. This patch adds a check before
> > the first read of the NVRAM (for the console log level) that the read
> > has completed. The fsp-nvram code has been updated to inform the generic
> > layer as soon as the read completes.
> > 
> > The old prompt to the fsp-nvram code has been removed but a check to
> > ensure that the NVRAM has been loaded remains. It is conservative but
> > if the NVRAM is not done loading before the host is booted it will not
> > have an nvram device-tree node which means it won't be able to access
> > the NVRAM at all, ever, even after the NVRAM has loaded.
> > 
> > Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
> > ---
> >  core/init.c                  | 11 +++++++++--
> >  core/nvram-format.c          |  5 +++++
> >  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
> >  core/test/run-nvram-format.c |  5 +++++
> >  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
> >  include/fsp.h                |  1 -
> >  include/nvram.h              |  2 ++
> >  7 files changed, 60 insertions(+), 27 deletions(-)
> > 
> > diff --git a/core/init.c b/core/init.c
> > index 8951e17b..17de9851 100644
> > --- a/core/init.c
> > +++ b/core/init.c
> > @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
> >  		/* We wait for the nvram read to complete here so we can
> >  		 * grab stuff from there such as the kernel arguments
> >  		 */
> > -		fsp_nvram_wait_open();
> > +		nvram_wait_for_load();
> >  
> >  		/* Wait for FW VPD data read to complete */
> >  		fsp_code_update_wait_vpd(true);
> > @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
> >  	/* preload the IMC catalog dtb */
> >  	imc_catalog_preload();
> >  
> > -	/* Set the console level */
> > +	/*
> > +	 * Set the console level
> > +	 * We need NVRAM to have been setup by now otherwise we'll
> > +	 * just read a zeroed buffer.
> > +	 * Some FSP systems can actually be quite slow to return the
> > +	 * NVRAM.
> > +	 */
> > +	nvram_wait_for_load();
> >  	console_log_level();
> >  
> >  	/* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
> > diff --git a/core/nvram-format.c b/core/nvram-format.c
> > index 923098af..655cb8ee 100644
> > --- a/core/nvram-format.c
> > +++ b/core/nvram-format.c
> > @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
> >  	const char *part_end, *start;
> >  	int key_len = strlen(key);
> >  
> > +	if (!nvram_has_loaded()) {
> > +		prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
> > +		return NULL;
> > +	}
> > +
> >  	/*
> >  	 * The running OS can modify the NVRAM as it pleases so we need to be
> >  	 * a little paranoid and check that it's ok before we try parse it.
> > diff --git a/core/nvram.c b/core/nvram.c
> > index 2140706a..de6cbddf 100644
> > --- a/core/nvram.c
> > +++ b/core/nvram.c
> > @@ -122,6 +122,41 @@ void nvram_read_complete(bool success)
> >  	nvram_ready = true;
> >  }
> >  
> > +bool nvram_wait_for_load(void)
> > +{
> > +	/* Short cut */
> > +	if (nvram_ready)
> > +		return true;
> > +
> > +	/* Tell the caller it will never happen */
> > +	if (!platform.nvram_info)
> > +		return false;
> 
> here I think we should return true instead, as it's as 'ready' as it's
> ever going to be, it's just not ever going to be usable.
> 
> So for the purpose of "do we need to log some error about not being able
> to wait for nvram to be ready", the answer is no, which is pretty much
> what the return value of this function is saying.
> 
> make sense?
> 

Sure, I'll add that to a v2 respin. If we're going to do that I'll also
rename the function to nvram_wait_for_ready().

> Of course, what doesn't make sense is that I now have a meeting and thus
> I haven't looked at the rest of this patch yet
>
Oliver O'Halloran Nov. 14, 2017, 12:39 a.m. UTC | #4
On Tue, Nov 7, 2017 at 5:07 PM, Cyril Bur <cyril.bur@au1.ibm.com> wrote:
> commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
> from NVRAM rather early. This works fine on BMC based systems as
> nvram_init() is actually synchronous. This is not true for FSP systems
> and it turns out that the query for the console log level simply
> queries blank nvram.
>
> The simple fix is to wait for the NVRAM read to complete before
> performing any query. Unfortunately it turns out that the fsp-nvram
> code does not inform the generic NVRAM layer when the read is complete,
> rather, it must be prompted to do so.
>
> This patch addresses both these problems. This patch adds a check before
> the first read of the NVRAM (for the console log level) that the read
> has completed. The fsp-nvram code has been updated to inform the generic
> layer as soon as the read completes.
>
> The old prompt to the fsp-nvram code has been removed but a check to
> ensure that the NVRAM has been loaded remains. It is conservative but
> if the NVRAM is not done loading before the host is booted it will not
> have an nvram device-tree node which means it won't be able to access
> the NVRAM at all, ever, even after the NVRAM has loaded.
>
> Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
> ---
>  core/init.c                  | 11 +++++++++--
>  core/nvram-format.c          |  5 +++++
>  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
>  core/test/run-nvram-format.c |  5 +++++
>  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
>  include/fsp.h                |  1 -
>  include/nvram.h              |  2 ++
>  7 files changed, 60 insertions(+), 27 deletions(-)
>
> diff --git a/core/init.c b/core/init.c
> index 8951e17b..17de9851 100644
> --- a/core/init.c
> +++ b/core/init.c
> @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
>                 /* We wait for the nvram read to complete here so we can
>                  * grab stuff from there such as the kernel arguments
>                  */
> -               fsp_nvram_wait_open();
> +               nvram_wait_for_load();
>
>                 /* Wait for FW VPD data read to complete */
>                 fsp_code_update_wait_vpd(true);
> @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>         /* preload the IMC catalog dtb */
>         imc_catalog_preload();
>
> -       /* Set the console level */
> +       /*
> +        * Set the console level
> +        * We need NVRAM to have been setup by now otherwise we'll
> +        * just read a zeroed buffer.
> +        * Some FSP systems can actually be quite slow to return the
> +        * NVRAM.
> +        */
> +       nvram_wait_for_load();
>         console_log_level();
>
>         /* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
> diff --git a/core/nvram-format.c b/core/nvram-format.c
> index 923098af..655cb8ee 100644
> --- a/core/nvram-format.c
> +++ b/core/nvram-format.c
> @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
>         const char *part_end, *start;
>         int key_len = strlen(key);
>

> +       if (!nvram_has_loaded()) {
> +               prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
> +               return NULL;
> +       }

I think this should just call nvram_wait_for_load() rather than
failing here. The problem
with this behaviour is that can end up with nvram_query() failing
intermittently on FSP
systems since loading the nvram is asynchronous there.

Given that we use nvram_query() for switching debug stuff on and off
I'd rather it was
as reliable as possible even if it slows booting slightly.

> +
>         /*
>          * The running OS can modify the NVRAM as it pleases so we need to be
>          * a little paranoid and check that it's ok before we try parse it.
> diff --git a/core/nvram.c b/core/nvram.c
> index 2140706a..de6cbddf 100644
> --- a/core/nvram.c
> +++ b/core/nvram.c
> @@ -122,6 +122,41 @@ void nvram_read_complete(bool success)
>         nvram_ready = true;
>  }
>
> +bool nvram_wait_for_load(void)
> +{
> +       /* Short cut */
> +       if (nvram_ready)
> +               return true;
> +
> +       /* Tell the caller it will never happen */
> +       if (!platform.nvram_info)
> +               return false;
> +
> +       /*
> +        * One of two things has happened here.
> +        * 1. nvram_wait_for_load() was called before nvram_init()
> +        * 2. The read of NVRAM failed.
> +        * Either way, this is quite a bad event.
> +        */
> +       if (!nvram_image && !nvram_size) {
> +               prlog(PR_CRIT, "NVRAM: Possible wait before nvram_init()!\n");
> +               return false;
> +       }
> +
> +       while (!nvram_ready) {
> +               opal_run_pollers();
> +               /* If the read fails, tell the caller */
> +               if (!nvram_image && !nvram_size)
> +                       return false;
> +       }
> +       return true;
> +}
> +
> +bool nvram_has_loaded(void)
> +{
> +       return nvram_ready;
> +}
> +
>  void nvram_init(void)
>  {
>         int rc;
> diff --git a/core/test/run-nvram-format.c b/core/test/run-nvram-format.c
> index 5bd8ea2a..e59fdfb6 100644
> --- a/core/test/run-nvram-format.c
> +++ b/core/test/run-nvram-format.c
> @@ -23,6 +23,11 @@ bool nvram_validate(void)
>         return true;
>  }
>
> +bool nvram_has_loaded(void)
> +{
> +       return true;
> +}
> +
>  static char *nvram_reset(void *nvram_image, int size)
>  {
>         struct chrp_nvram_hdr *h = nvram_image;
> diff --git a/hw/fsp/fsp-nvram.c b/hw/fsp/fsp-nvram.c
> index 85b7e815..52342fcf 100644
> --- a/hw/fsp/fsp-nvram.c
> +++ b/hw/fsp/fsp-nvram.c
> @@ -203,6 +203,10 @@ static void fsp_nvram_rd_complete(struct fsp_msg *msg)
>                  */
>         }
>         unlock(&fsp_nvram_lock);
> +       nvram_read_complete(fsp_nvram_state == NVRAM_STATE_OPEN);
> +       if (fsp_nvram_state != NVRAM_STATE_OPEN)
> +               log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
> +               "FSP: NVRAM not read, skipping init\n");
>  }
>
>  static void fsp_nvram_send_read(void)
> @@ -387,27 +391,3 @@ int fsp_nvram_write(uint32_t offset, void *src, uint32_t size)
>
>         return 0;
>  }
> -
> -/* This is called right before starting the payload (Linux) to
> - * ensure the initial open & read of nvram has happened before
> - * we transfer control as the guest OS. This is necessary as
> - * Linux will not handle a OPAL_BUSY return properly and treat
> - * it as an error
> - */
> -void fsp_nvram_wait_open(void)
> -{
> -       if (!fsp_present())
> -               return;
> -
> -       while(fsp_nvram_state == NVRAM_STATE_OPENING)
> -               opal_run_pollers();
> -
> -       if (!fsp_nvram_was_read) {
> -               log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
> -                       "FSP: NVRAM not read, skipping init\n");
> -               nvram_read_complete(false);
> -               return;
> -       }
> -
> -       nvram_read_complete(true);
> -}
> diff --git a/include/fsp.h b/include/fsp.h
> index 1d877d81..e4bcae02 100644
> --- a/include/fsp.h
> +++ b/include/fsp.h
> @@ -776,7 +776,6 @@ extern void fsp_used_by_console(void);
>  extern int fsp_nvram_info(uint32_t *total_size);
>  extern int fsp_nvram_start_read(void *dst, uint32_t src, uint32_t len);
>  extern int fsp_nvram_write(uint32_t offset, void *src, uint32_t size);
> -extern void fsp_nvram_wait_open(void);
>
>  /* RTC */
>  extern void fsp_rtc_init(void);
> diff --git a/include/nvram.h b/include/nvram.h
> index 288b5368..012c107f 100644
> --- a/include/nvram.h
> +++ b/include/nvram.h
> @@ -21,6 +21,8 @@ int nvram_format(void *nvram_image, uint32_t nvram_size);
>  int nvram_check(void *nvram_image, uint32_t nvram_size);
>  void nvram_reinit(void);
>  bool nvram_validate(void);
> +bool nvram_has_loaded(void);
> +bool nvram_wait_for_load(void);
>
>  const char *nvram_query(const char *name);
>  bool nvram_query_eq(const char *key, const char *value);
> --
> 2.15.0
>
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Cyril Bur Nov. 16, 2017, 11:50 p.m. UTC | #5
On Tue, 2017-11-14 at 11:39 +1100, Oliver wrote:
> On Tue, Nov 7, 2017 at 5:07 PM, Cyril Bur <cyril.bur@au1.ibm.com> wrote:
> > commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
> > from NVRAM rather early. This works fine on BMC based systems as
> > nvram_init() is actually synchronous. This is not true for FSP systems
> > and it turns out that the query for the console log level simply
> > queries blank nvram.
> > 
> > The simple fix is to wait for the NVRAM read to complete before
> > performing any query. Unfortunately it turns out that the fsp-nvram
> > code does not inform the generic NVRAM layer when the read is complete,
> > rather, it must be prompted to do so.
> > 
> > This patch addresses both these problems. This patch adds a check before
> > the first read of the NVRAM (for the console log level) that the read
> > has completed. The fsp-nvram code has been updated to inform the generic
> > layer as soon as the read completes.
> > 
> > The old prompt to the fsp-nvram code has been removed but a check to
> > ensure that the NVRAM has been loaded remains. It is conservative but
> > if the NVRAM is not done loading before the host is booted it will not
> > have an nvram device-tree node which means it won't be able to access
> > the NVRAM at all, ever, even after the NVRAM has loaded.
> > 
> > Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
> > ---
> >  core/init.c                  | 11 +++++++++--
> >  core/nvram-format.c          |  5 +++++
> >  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
> >  core/test/run-nvram-format.c |  5 +++++
> >  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
> >  include/fsp.h                |  1 -
> >  include/nvram.h              |  2 ++
> >  7 files changed, 60 insertions(+), 27 deletions(-)
> > 
> > diff --git a/core/init.c b/core/init.c
> > index 8951e17b..17de9851 100644
> > --- a/core/init.c
> > +++ b/core/init.c
> > @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
> >                 /* We wait for the nvram read to complete here so we can
> >                  * grab stuff from there such as the kernel arguments
> >                  */
> > -               fsp_nvram_wait_open();
> > +               nvram_wait_for_load();
> > 
> >                 /* Wait for FW VPD data read to complete */
> >                 fsp_code_update_wait_vpd(true);
> > @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
> >         /* preload the IMC catalog dtb */
> >         imc_catalog_preload();
> > 
> > -       /* Set the console level */
> > +       /*
> > +        * Set the console level
> > +        * We need NVRAM to have been setup by now otherwise we'll
> > +        * just read a zeroed buffer.
> > +        * Some FSP systems can actually be quite slow to return the
> > +        * NVRAM.
> > +        */
> > +       nvram_wait_for_load();
> >         console_log_level();
> > 
> >         /* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
> > diff --git a/core/nvram-format.c b/core/nvram-format.c
> > index 923098af..655cb8ee 100644
> > --- a/core/nvram-format.c
> > +++ b/core/nvram-format.c
> > @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
> >         const char *part_end, *start;
> >         int key_len = strlen(key);
> > 
> > +       if (!nvram_has_loaded()) {
> > +               prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
> > +               return NULL;
> > +       }
> 
> I think this should just call nvram_wait_for_load() rather than
> failing here. The problem
> with this behaviour is that can end up with nvram_query() failing
> intermittently on FSP
> systems since loading the nvram is asynchronous there.
> 
> Given that we use nvram_query() for switching debug stuff on and off
> I'd rather it was
> as reliable as possible even if it slows booting slightly.
> 

I think this goes back to - when do we call nvram_wait_for_load()...
and great... looks like my entire summary email never made it to the
list... nor can I find it anywhere...

It was a rather long email I don't feel like retyping...

Basically, either we do what Oliver suggests but this hides the (now)
blocking nature of nvram_query() which I was hesitant to do. I feel
like in skiboot it should be fairly obvious that something might block.

Another options is we leave what I have, which makes the blocking quite
obvious but does require the function to be explicitly called before
nvram_query() and more importantly before launching the bootkernel.

Finally, we put nvram_wait_for_load() at the end of nvram_init(), which
is a nicer place to have blocking since its obviously an init function.
The problem being that we might be able to nvram_init() long enough
before we actually need the nvram and we lose the benefit of the
asynchronous load. Having said that, as Oliver points out - we use
nvram_query() to switch debug stuff. We want to do that as early as
possible, so if there is going to be a blocking nvram_query() right
after nvram_init(), then it might as well be in nvram_init().

All this only applies to FSP, BMC nvram_init() is actually synchronous
at the moment (although it doesn't have to be).


Stewart, do you have any strong feelings?

> > +
> >         /*
> >          * The running OS can modify the NVRAM as it pleases so we need to be
> >          * a little paranoid and check that it's ok before we try parse it.
> > diff --git a/core/nvram.c b/core/nvram.c
> > index 2140706a..de6cbddf 100644
> > --- a/core/nvram.c
> > +++ b/core/nvram.c
> > @@ -122,6 +122,41 @@ void nvram_read_complete(bool success)
> >         nvram_ready = true;
> >  }
> > 
> > +bool nvram_wait_for_load(void)
> > +{
> > +       /* Short cut */
> > +       if (nvram_ready)
> > +               return true;
> > +
> > +       /* Tell the caller it will never happen */
> > +       if (!platform.nvram_info)
> > +               return false;
> > +
> > +       /*
> > +        * One of two things has happened here.
> > +        * 1. nvram_wait_for_load() was called before nvram_init()
> > +        * 2. The read of NVRAM failed.
> > +        * Either way, this is quite a bad event.
> > +        */
> > +       if (!nvram_image && !nvram_size) {
> > +               prlog(PR_CRIT, "NVRAM: Possible wait before nvram_init()!\n");
> > +               return false;
> > +       }
> > +
> > +       while (!nvram_ready) {
> > +               opal_run_pollers();
> > +               /* If the read fails, tell the caller */
> > +               if (!nvram_image && !nvram_size)
> > +                       return false;
> > +       }
> > +       return true;
> > +}
> > +
> > +bool nvram_has_loaded(void)
> > +{
> > +       return nvram_ready;
> > +}
> > +
> >  void nvram_init(void)
> >  {
> >         int rc;
> > diff --git a/core/test/run-nvram-format.c b/core/test/run-nvram-format.c
> > index 5bd8ea2a..e59fdfb6 100644
> > --- a/core/test/run-nvram-format.c
> > +++ b/core/test/run-nvram-format.c
> > @@ -23,6 +23,11 @@ bool nvram_validate(void)
> >         return true;
> >  }
> > 
> > +bool nvram_has_loaded(void)
> > +{
> > +       return true;
> > +}
> > +
> >  static char *nvram_reset(void *nvram_image, int size)
> >  {
> >         struct chrp_nvram_hdr *h = nvram_image;
> > diff --git a/hw/fsp/fsp-nvram.c b/hw/fsp/fsp-nvram.c
> > index 85b7e815..52342fcf 100644
> > --- a/hw/fsp/fsp-nvram.c
> > +++ b/hw/fsp/fsp-nvram.c
> > @@ -203,6 +203,10 @@ static void fsp_nvram_rd_complete(struct fsp_msg *msg)
> >                  */
> >         }
> >         unlock(&fsp_nvram_lock);
> > +       nvram_read_complete(fsp_nvram_state == NVRAM_STATE_OPEN);
> > +       if (fsp_nvram_state != NVRAM_STATE_OPEN)
> > +               log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
> > +               "FSP: NVRAM not read, skipping init\n");
> >  }
> > 
> >  static void fsp_nvram_send_read(void)
> > @@ -387,27 +391,3 @@ int fsp_nvram_write(uint32_t offset, void *src, uint32_t size)
> > 
> >         return 0;
> >  }
> > -
> > -/* This is called right before starting the payload (Linux) to
> > - * ensure the initial open & read of nvram has happened before
> > - * we transfer control as the guest OS. This is necessary as
> > - * Linux will not handle a OPAL_BUSY return properly and treat
> > - * it as an error
> > - */
> > -void fsp_nvram_wait_open(void)
> > -{
> > -       if (!fsp_present())
> > -               return;
> > -
> > -       while(fsp_nvram_state == NVRAM_STATE_OPENING)
> > -               opal_run_pollers();
> > -
> > -       if (!fsp_nvram_was_read) {
> > -               log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
> > -                       "FSP: NVRAM not read, skipping init\n");
> > -               nvram_read_complete(false);
> > -               return;
> > -       }
> > -
> > -       nvram_read_complete(true);
> > -}
> > diff --git a/include/fsp.h b/include/fsp.h
> > index 1d877d81..e4bcae02 100644
> > --- a/include/fsp.h
> > +++ b/include/fsp.h
> > @@ -776,7 +776,6 @@ extern void fsp_used_by_console(void);
> >  extern int fsp_nvram_info(uint32_t *total_size);
> >  extern int fsp_nvram_start_read(void *dst, uint32_t src, uint32_t len);
> >  extern int fsp_nvram_write(uint32_t offset, void *src, uint32_t size);
> > -extern void fsp_nvram_wait_open(void);
> > 
> >  /* RTC */
> >  extern void fsp_rtc_init(void);
> > diff --git a/include/nvram.h b/include/nvram.h
> > index 288b5368..012c107f 100644
> > --- a/include/nvram.h
> > +++ b/include/nvram.h
> > @@ -21,6 +21,8 @@ int nvram_format(void *nvram_image, uint32_t nvram_size);
> >  int nvram_check(void *nvram_image, uint32_t nvram_size);
> >  void nvram_reinit(void);
> >  bool nvram_validate(void);
> > +bool nvram_has_loaded(void);
> > +bool nvram_wait_for_load(void);
> > 
> >  const char *nvram_query(const char *name);
> >  bool nvram_query_eq(const char *key, const char *value);
> > --
> > 2.15.0
> > 
> > _______________________________________________
> > Skiboot mailing list
> > Skiboot@lists.ozlabs.org
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.ozlabs.org_listinfo_skiboot&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=enAa2wJuZFP2PsCNSamrs-fnBSp0sgKE8NVKBN2MoOk&m=T66gB1K3D15DxA1C_88UWikIJnnFzHR2tstA-B9Q7DE&s=gM3LwS7hmgKQV0R9oTgr22KZaNJ6woQPOaFpqGqGF-s&e=
> 
>
Stewart Smith Nov. 30, 2017, 4:50 a.m. UTC | #6
Cyril Bur <cyril.bur@au1.ibm.com> writes:
> On Tue, 2017-11-14 at 11:39 +1100, Oliver wrote:
>> On Tue, Nov 7, 2017 at 5:07 PM, Cyril Bur <cyril.bur@au1.ibm.com> wrote:
>> > commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
>> > from NVRAM rather early. This works fine on BMC based systems as
>> > nvram_init() is actually synchronous. This is not true for FSP systems
>> > and it turns out that the query for the console log level simply
>> > queries blank nvram.
>> > 
>> > The simple fix is to wait for the NVRAM read to complete before
>> > performing any query. Unfortunately it turns out that the fsp-nvram
>> > code does not inform the generic NVRAM layer when the read is complete,
>> > rather, it must be prompted to do so.
>> > 
>> > This patch addresses both these problems. This patch adds a check before
>> > the first read of the NVRAM (for the console log level) that the read
>> > has completed. The fsp-nvram code has been updated to inform the generic
>> > layer as soon as the read completes.
>> > 
>> > The old prompt to the fsp-nvram code has been removed but a check to
>> > ensure that the NVRAM has been loaded remains. It is conservative but
>> > if the NVRAM is not done loading before the host is booted it will not
>> > have an nvram device-tree node which means it won't be able to access
>> > the NVRAM at all, ever, even after the NVRAM has loaded.
>> > 
>> > Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
>> > ---
>> >  core/init.c                  | 11 +++++++++--
>> >  core/nvram-format.c          |  5 +++++
>> >  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
>> >  core/test/run-nvram-format.c |  5 +++++
>> >  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
>> >  include/fsp.h                |  1 -
>> >  include/nvram.h              |  2 ++
>> >  7 files changed, 60 insertions(+), 27 deletions(-)
>> > 
>> > diff --git a/core/init.c b/core/init.c
>> > index 8951e17b..17de9851 100644
>> > --- a/core/init.c
>> > +++ b/core/init.c
>> > @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
>> >                 /* We wait for the nvram read to complete here so we can
>> >                  * grab stuff from there such as the kernel arguments
>> >                  */
>> > -               fsp_nvram_wait_open();
>> > +               nvram_wait_for_load();
>> > 
>> >                 /* Wait for FW VPD data read to complete */
>> >                 fsp_code_update_wait_vpd(true);
>> > @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>> >         /* preload the IMC catalog dtb */
>> >         imc_catalog_preload();
>> > 
>> > -       /* Set the console level */
>> > +       /*
>> > +        * Set the console level
>> > +        * We need NVRAM to have been setup by now otherwise we'll
>> > +        * just read a zeroed buffer.
>> > +        * Some FSP systems can actually be quite slow to return the
>> > +        * NVRAM.
>> > +        */
>> > +       nvram_wait_for_load();
>> >         console_log_level();
>> > 
>> >         /* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
>> > diff --git a/core/nvram-format.c b/core/nvram-format.c
>> > index 923098af..655cb8ee 100644
>> > --- a/core/nvram-format.c
>> > +++ b/core/nvram-format.c
>> > @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
>> >         const char *part_end, *start;
>> >         int key_len = strlen(key);
>> > 
>> > +       if (!nvram_has_loaded()) {
>> > +               prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
>> > +               return NULL;
>> > +       }
>> 
>> I think this should just call nvram_wait_for_load() rather than
>> failing here. The problem
>> with this behaviour is that can end up with nvram_query() failing
>> intermittently on FSP
>> systems since loading the nvram is asynchronous there.
>> 
>> Given that we use nvram_query() for switching debug stuff on and off
>> I'd rather it was
>> as reliable as possible even if it slows booting slightly.
>> 
>
> I think this goes back to - when do we call nvram_wait_for_load()...
> and great... looks like my entire summary email never made it to the
> list... nor can I find it anywhere...
>
> It was a rather long email I don't feel like retyping...
>
> Basically, either we do what Oliver suggests but this hides the (now)
> blocking nature of nvram_query() which I was hesitant to do. I feel
> like in skiboot it should be fairly obvious that something might block.
>
> Another options is we leave what I have, which makes the blocking quite
> obvious but does require the function to be explicitly called before
> nvram_query() and more importantly before launching the bootkernel.
>
> Finally, we put nvram_wait_for_load() at the end of nvram_init(), which
> is a nicer place to have blocking since its obviously an init function.
> The problem being that we might be able to nvram_init() long enough
> before we actually need the nvram and we lose the benefit of the
> asynchronous load. Having said that, as Oliver points out - we use
> nvram_query() to switch debug stuff. We want to do that as early as
> possible, so if there is going to be a blocking nvram_query() right
> after nvram_init(), then it might as well be in nvram_init().
>
> All this only applies to FSP, BMC nvram_init() is actually synchronous
> at the moment (although it doesn't have to be).
>
>
> Stewart, do you have any strong feelings?

I've gone back and forth a bit.... I think the best way would be to
print the warning and then wait for it to be loaded anyway. THat way,
things still work if somebody has to put in early debug and not sort
through the how-we-load-nvram-mess, but we'll know that we need to fix
something before shipping if we do.
Cyril Bur Dec. 1, 2017, 12:10 a.m. UTC | #7
On Thu, 2017-11-30 at 15:50 +1100, Stewart Smith wrote:
> Cyril Bur <cyril.bur@au1.ibm.com> writes:
> > On Tue, 2017-11-14 at 11:39 +1100, Oliver wrote:
> > > On Tue, Nov 7, 2017 at 5:07 PM, Cyril Bur <cyril.bur@au1.ibm.com> wrote:
> > > > commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
> > > > from NVRAM rather early. This works fine on BMC based systems as
> > > > nvram_init() is actually synchronous. This is not true for FSP systems
> > > > and it turns out that the query for the console log level simply
> > > > queries blank nvram.
> > > > 
> > > > The simple fix is to wait for the NVRAM read to complete before
> > > > performing any query. Unfortunately it turns out that the fsp-nvram
> > > > code does not inform the generic NVRAM layer when the read is complete,
> > > > rather, it must be prompted to do so.
> > > > 
> > > > This patch addresses both these problems. This patch adds a check before
> > > > the first read of the NVRAM (for the console log level) that the read
> > > > has completed. The fsp-nvram code has been updated to inform the generic
> > > > layer as soon as the read completes.
> > > > 
> > > > The old prompt to the fsp-nvram code has been removed but a check to
> > > > ensure that the NVRAM has been loaded remains. It is conservative but
> > > > if the NVRAM is not done loading before the host is booted it will not
> > > > have an nvram device-tree node which means it won't be able to access
> > > > the NVRAM at all, ever, even after the NVRAM has loaded.
> > > > 
> > > > Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
> > > > ---
> > > >  core/init.c                  | 11 +++++++++--
> > > >  core/nvram-format.c          |  5 +++++
> > > >  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
> > > >  core/test/run-nvram-format.c |  5 +++++
> > > >  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
> > > >  include/fsp.h                |  1 -
> > > >  include/nvram.h              |  2 ++
> > > >  7 files changed, 60 insertions(+), 27 deletions(-)
> > > > 
> > > > diff --git a/core/init.c b/core/init.c
> > > > index 8951e17b..17de9851 100644
> > > > --- a/core/init.c
> > > > +++ b/core/init.c
> > > > @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
> > > >                 /* We wait for the nvram read to complete here so we can
> > > >                  * grab stuff from there such as the kernel arguments
> > > >                  */
> > > > -               fsp_nvram_wait_open();
> > > > +               nvram_wait_for_load();
> > > > 
> > > >                 /* Wait for FW VPD data read to complete */
> > > >                 fsp_code_update_wait_vpd(true);
> > > > @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
> > > >         /* preload the IMC catalog dtb */
> > > >         imc_catalog_preload();
> > > > 
> > > > -       /* Set the console level */
> > > > +       /*
> > > > +        * Set the console level
> > > > +        * We need NVRAM to have been setup by now otherwise we'll
> > > > +        * just read a zeroed buffer.
> > > > +        * Some FSP systems can actually be quite slow to return the
> > > > +        * NVRAM.
> > > > +        */
> > > > +       nvram_wait_for_load();
> > > >         console_log_level();
> > > > 
> > > >         /* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
> > > > diff --git a/core/nvram-format.c b/core/nvram-format.c
> > > > index 923098af..655cb8ee 100644
> > > > --- a/core/nvram-format.c
> > > > +++ b/core/nvram-format.c
> > > > @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
> > > >         const char *part_end, *start;
> > > >         int key_len = strlen(key);
> > > > 
> > > > +       if (!nvram_has_loaded()) {
> > > > +               prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
> > > > +               return NULL;
> > > > +       }
> > > 
> > > I think this should just call nvram_wait_for_load() rather than
> > > failing here. The problem
> > > with this behaviour is that can end up with nvram_query() failing
> > > intermittently on FSP
> > > systems since loading the nvram is asynchronous there.
> > > 
> > > Given that we use nvram_query() for switching debug stuff on and off
> > > I'd rather it was
> > > as reliable as possible even if it slows booting slightly.
> > > 
> > 
> > I think this goes back to - when do we call nvram_wait_for_load()...
> > and great... looks like my entire summary email never made it to the
> > list... nor can I find it anywhere...
> > 
> > It was a rather long email I don't feel like retyping...
> > 
> > Basically, either we do what Oliver suggests but this hides the (now)
> > blocking nature of nvram_query() which I was hesitant to do. I feel
> > like in skiboot it should be fairly obvious that something might block.
> > 
> > Another options is we leave what I have, which makes the blocking quite
> > obvious but does require the function to be explicitly called before
> > nvram_query() and more importantly before launching the bootkernel.
> > 
> > Finally, we put nvram_wait_for_load() at the end of nvram_init(), which
> > is a nicer place to have blocking since its obviously an init function.
> > The problem being that we might be able to nvram_init() long enough
> > before we actually need the nvram and we lose the benefit of the
> > asynchronous load. Having said that, as Oliver points out - we use
> > nvram_query() to switch debug stuff. We want to do that as early as
> > possible, so if there is going to be a blocking nvram_query() right
> > after nvram_init(), then it might as well be in nvram_init().
> > 
> > All this only applies to FSP, BMC nvram_init() is actually synchronous
> > at the moment (although it doesn't have to be).
> > 
> > 
> > Stewart, do you have any strong feelings?
> 
> I've gone back and forth a bit.... I think the best way would be to
> print the warning and then wait for it to be loaded anyway. THat way,
> things still work if somebody has to put in early debug and not sort
> through the how-we-load-nvram-mess, but we'll know that we need to fix
> something before shipping if we do.
> 

So the in nvram_query() wait with a warning. Probably easier to speak
in code - I'll send a patch.

Cyril

>
diff mbox series

Patch

diff --git a/core/init.c b/core/init.c
index 8951e17b..17de9851 100644
--- a/core/init.c
+++ b/core/init.c
@@ -497,7 +497,7 @@  void __noreturn load_and_boot_kernel(bool is_reboot)
 		/* We wait for the nvram read to complete here so we can
 		 * grab stuff from there such as the kernel arguments
 		 */
-		fsp_nvram_wait_open();
+		nvram_wait_for_load();
 
 		/* Wait for FW VPD data read to complete */
 		fsp_code_update_wait_vpd(true);
@@ -987,7 +987,14 @@  void __noreturn __nomcount main_cpu_entry(const void *fdt)
 	/* preload the IMC catalog dtb */
 	imc_catalog_preload();
 
-	/* Set the console level */
+	/*
+	 * Set the console level
+	 * We need NVRAM to have been setup by now otherwise we'll
+	 * just read a zeroed buffer.
+	 * Some FSP systems can actually be quite slow to return the
+	 * NVRAM.
+	 */
+	nvram_wait_for_load();
 	console_log_level();
 
 	/* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
diff --git a/core/nvram-format.c b/core/nvram-format.c
index 923098af..655cb8ee 100644
--- a/core/nvram-format.c
+++ b/core/nvram-format.c
@@ -216,6 +216,11 @@  const char *nvram_query(const char *key)
 	const char *part_end, *start;
 	int key_len = strlen(key);
 
+	if (!nvram_has_loaded()) {
+		prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
+		return NULL;
+	}
+
 	/*
 	 * The running OS can modify the NVRAM as it pleases so we need to be
 	 * a little paranoid and check that it's ok before we try parse it.
diff --git a/core/nvram.c b/core/nvram.c
index 2140706a..de6cbddf 100644
--- a/core/nvram.c
+++ b/core/nvram.c
@@ -122,6 +122,41 @@  void nvram_read_complete(bool success)
 	nvram_ready = true;
 }
 
+bool nvram_wait_for_load(void)
+{
+	/* Short cut */
+	if (nvram_ready)
+		return true;
+
+	/* Tell the caller it will never happen */
+	if (!platform.nvram_info)
+		return false;
+
+	/*
+	 * One of two things has happened here.
+	 * 1. nvram_wait_for_load() was called before nvram_init()
+	 * 2. The read of NVRAM failed.
+	 * Either way, this is quite a bad event.
+	 */
+	if (!nvram_image && !nvram_size) {
+		prlog(PR_CRIT, "NVRAM: Possible wait before nvram_init()!\n");
+		return false;
+	}
+
+	while (!nvram_ready) {
+		opal_run_pollers();
+		/* If the read fails, tell the caller */
+		if (!nvram_image && !nvram_size)
+			return false;
+	}
+	return true;
+}
+
+bool nvram_has_loaded(void)
+{
+	return nvram_ready;
+}
+
 void nvram_init(void)
 {
 	int rc;
diff --git a/core/test/run-nvram-format.c b/core/test/run-nvram-format.c
index 5bd8ea2a..e59fdfb6 100644
--- a/core/test/run-nvram-format.c
+++ b/core/test/run-nvram-format.c
@@ -23,6 +23,11 @@  bool nvram_validate(void)
 	return true;
 }
 
+bool nvram_has_loaded(void)
+{
+	return true;
+}
+
 static char *nvram_reset(void *nvram_image, int size)
 {
 	struct chrp_nvram_hdr *h = nvram_image;
diff --git a/hw/fsp/fsp-nvram.c b/hw/fsp/fsp-nvram.c
index 85b7e815..52342fcf 100644
--- a/hw/fsp/fsp-nvram.c
+++ b/hw/fsp/fsp-nvram.c
@@ -203,6 +203,10 @@  static void fsp_nvram_rd_complete(struct fsp_msg *msg)
 		 */
 	}
 	unlock(&fsp_nvram_lock);
+	nvram_read_complete(fsp_nvram_state == NVRAM_STATE_OPEN);
+	if (fsp_nvram_state != NVRAM_STATE_OPEN)
+		log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
+		"FSP: NVRAM not read, skipping init\n");
 }
 
 static void fsp_nvram_send_read(void)
@@ -387,27 +391,3 @@  int fsp_nvram_write(uint32_t offset, void *src, uint32_t size)
 
 	return 0;
 }
-
-/* This is called right before starting the payload (Linux) to
- * ensure the initial open & read of nvram has happened before
- * we transfer control as the guest OS. This is necessary as
- * Linux will not handle a OPAL_BUSY return properly and treat
- * it as an error
- */
-void fsp_nvram_wait_open(void)
-{
-	if (!fsp_present())
-		return;
-
-	while(fsp_nvram_state == NVRAM_STATE_OPENING)
-		opal_run_pollers();
-
-	if (!fsp_nvram_was_read) {
-		log_simple_error(&e_info(OPAL_RC_NVRAM_INIT),
-			"FSP: NVRAM not read, skipping init\n");
-		nvram_read_complete(false);
-		return;
-	}
-
-	nvram_read_complete(true);
-}
diff --git a/include/fsp.h b/include/fsp.h
index 1d877d81..e4bcae02 100644
--- a/include/fsp.h
+++ b/include/fsp.h
@@ -776,7 +776,6 @@  extern void fsp_used_by_console(void);
 extern int fsp_nvram_info(uint32_t *total_size);
 extern int fsp_nvram_start_read(void *dst, uint32_t src, uint32_t len);
 extern int fsp_nvram_write(uint32_t offset, void *src, uint32_t size);
-extern void fsp_nvram_wait_open(void);
 
 /* RTC */
 extern void fsp_rtc_init(void);
diff --git a/include/nvram.h b/include/nvram.h
index 288b5368..012c107f 100644
--- a/include/nvram.h
+++ b/include/nvram.h
@@ -21,6 +21,8 @@  int nvram_format(void *nvram_image, uint32_t nvram_size);
 int nvram_check(void *nvram_image, uint32_t nvram_size);
 void nvram_reinit(void);
 bool nvram_validate(void);
+bool nvram_has_loaded(void);
+bool nvram_wait_for_load(void);
 
 const char *nvram_query(const char *name);
 bool nvram_query_eq(const char *key, const char *value);