diff mbox

[U-Boot,RFC] Pre-console buffer for ARM

Message ID 1314638517-20178-1-git-send-email-sjg@chromium.org
State New, archived
Headers show

Commit Message

Simon Glass Aug. 29, 2011, 5:21 p.m. UTC
This patch is for Graeme to take a look at. I found that have_console is
not a flag yet. Also a few tidy-ups to handle buffer overflow and to avoid
printing a 'dumping buffer' message when nothing was outputted.

Also I tried to simplify the maths for the location of the pre-console buffer.
IMO this should be done in board.c though.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 arch/arm/include/asm/global_data.h |    1 +
 common/console.c                   |   36 ++++++++++++++++++++++++++++++++++++
 include/configs/tegra2-common.h    |    8 ++++++--
 3 files changed, 43 insertions(+), 2 deletions(-)

Comments

Mike Frysinger Aug. 29, 2011, 7:20 p.m. UTC | #1
On Monday, August 29, 2011 13:21:57 Simon Glass wrote:
> +void pre_console_putc(const char c)
> +{
> +	char *buffer =  (char *)CONFIG_SYS_TMP_CON_BUF_ADDR;

excess space after the "="

> +	if (gd->con_buf_idx < CONFIG_SYS_TMP_CON_BUF_SZ)
> +		buffer[gd->con_buf_idx++] = c;

seems like a circular buffer would make more sense ... usually the part of the 
log you want is the last chunk and not the first
-mike
Simon Glass Aug. 29, 2011, 7:42 p.m. UTC | #2
Hi Mike,

On Mon, Aug 29, 2011 at 12:20 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, August 29, 2011 13:21:57 Simon Glass wrote:
>> +void pre_console_putc(const char c)
>> +{
>> +     char *buffer =  (char *)CONFIG_SYS_TMP_CON_BUF_ADDR;
>
> excess space after the "="

Thanks - I will leave this to Graeme to tidy as our patches overlap
(and most of the patch is just his).

>
>> +     if (gd->con_buf_idx < CONFIG_SYS_TMP_CON_BUF_SZ)
>> +             buffer[gd->con_buf_idx++] = c;
>
> seems like a circular buffer would make more sense ... usually the part of the
> log you want is the last chunk and not the first

Yes I agree, although if you have more than 1KB of data it might be a bug.

Regards,
SImon

> -mike
>
Mike Frysinger Aug. 29, 2011, 8:10 p.m. UTC | #3
On Monday, August 29, 2011 15:42:23 Simon Glass wrote:
> On Mon, Aug 29, 2011 at 12:20 PM, Mike Frysinger wrote:
> > On Monday, August 29, 2011 13:21:57 Simon Glass wrote:
> >> +     if (gd->con_buf_idx < CONFIG_SYS_TMP_CON_BUF_SZ)
> >> +             buffer[gd->con_buf_idx++] = c;
> > 
> > seems like a circular buffer would make more sense ... usually the part
> > of the log you want is the last chunk and not the first
> 
> Yes I agree, although if you have more than 1KB of data it might be a bug.

give people a foot and they'll take 1MiB :p

it's fairly easy as well:
#define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;
-mike
Graeme Russ Aug. 29, 2011, 10:49 p.m. UTC | #4
Hi Simon

On Tue, Aug 30, 2011 at 3:21 AM, Simon Glass <sjg@chromium.org> wrote:
> This patch is for Graeme to take a look at. I found that have_console is
> not a flag yet. Also a few tidy-ups to handle buffer overflow and to avoid
> printing a 'dumping buffer' message when nothing was outputted.
>
> Also I tried to simplify the maths for the location of the pre-console buffer.
> IMO this should be done in board.c though.

[snip]

Have you seen my latest pre-console buffer series?:

http://lists.denx.de/pipermail/u-boot/2011-August/099607.html

I think that is a better base to work from as I think we can also fold in
your 'Early panic support' series as well - I'll comment on that series
seperately

Regards,

Graeme
Graeme Russ Aug. 29, 2011, 11 p.m. UTC | #5
Hi Mike,

On Tue, Aug 30, 2011 at 6:10 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, August 29, 2011 15:42:23 Simon Glass wrote:
>> On Mon, Aug 29, 2011 at 12:20 PM, Mike Frysinger wrote:
>> > On Monday, August 29, 2011 13:21:57 Simon Glass wrote:
>> >> +     if (gd->con_buf_idx < CONFIG_SYS_TMP_CON_BUF_SZ)
>> >> +             buffer[gd->con_buf_idx++] = c;
>> >
>> > seems like a circular buffer would make more sense ... usually the part
>> > of the log you want is the last chunk and not the first
>>

Well you would need an 'overflow' flag and then based on that you would need
to do two printf()'s when dumping the buffer - One from the 'index to end'
which is the 'top' of the buffer and one from 'start to index' which is the
bottom.

>> Yes I agree, although if you have more than 1KB of data it might be a bug.

I personally don't see the need - I expect the amount of pre-console output
would be faily limited considering that the board should do everything it
can to initialise console as early as possible.

>
> give people a foot and they'll take 1MiB :p
>
> it's fairly easy as well:
> #define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
> buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;

But does that work for non power-of-two buffer sizes...

Buffer size = 100 = 1100100
(size - 1)        = 1100011
idx = 100         = 1100100
idx & (size - 1)  = 1100000

Nope :(

Regards,

Graeme
Simon Glass Aug. 29, 2011, 11:01 p.m. UTC | #6
Hi Graeme,

On Mon, Aug 29, 2011 at 3:49 PM, Graeme Russ <graeme.russ@gmail.com> wrote:
> Hi Simon
>
> On Tue, Aug 30, 2011 at 3:21 AM, Simon Glass <sjg@chromium.org> wrote:
>> This patch is for Graeme to take a look at. I found that have_console is
>> not a flag yet. Also a few tidy-ups to handle buffer overflow and to avoid
>> printing a 'dumping buffer' message when nothing was outputted.
>>
>> Also I tried to simplify the maths for the location of the pre-console buffer.
>> IMO this should be done in board.c though.
>
> [snip]
>
> Have you seen my latest pre-console buffer series?:
>
> http://lists.denx.de/pipermail/u-boot/2011-August/099607.html

No - I foolishly didn't look at the mailing list in those 4 hours :-)

Will take a look, thanks.

Regards,
Simon

>
> I think that is a better base to work from as I think we can also fold in
> your 'Early panic support' series as well - I'll comment on that series
> seperately
>
> Regards,
>
> Graeme
>
Mike Frysinger Aug. 30, 2011, 1:32 a.m. UTC | #7
On Monday, August 29, 2011 19:00:46 Graeme Russ wrote:
> On Tue, Aug 30, 2011 at 6:10 AM, Mike Frysinger wrote:
> > On Monday, August 29, 2011 15:42:23 Simon Glass wrote:
> >> On Mon, Aug 29, 2011 at 12:20 PM, Mike Frysinger wrote:
> >> > On Monday, August 29, 2011 13:21:57 Simon Glass wrote:
> >> >> +     if (gd->con_buf_idx < CONFIG_SYS_TMP_CON_BUF_SZ)
> >> >> +             buffer[gd->con_buf_idx++] = c;
> >> > 
> >> > seems like a circular buffer would make more sense ... usually the
> >> > part of the log you want is the last chunk and not the first
> 
> Well you would need an 'overflow' flag and then based on that you would
> need to do two printf()'s when dumping the buffer - One from the 'index to
> end' which is the 'top' of the buffer and one from 'start to index' which
> is the bottom.

you wouldnt need an overflow flag unless you wrote out more than 2^32 bytes.  
look at the logic again ... it isnt masking the write, it's masking the read.  
so by virtual of the con_buf_idx being larger than the max, you know you've 
wrapped around.

otherwise yes, you'd need to split up the writes.  not that big of a deal i 
dont think ...

> >> Yes I agree, although if you have more than 1KB of data it might be a
> >> bug.
> 
> I personally don't see the need - I expect the amount of pre-console output
> would be faily limited considering that the board should do everything it
> can to initialise console as early as possible.

until people hit an early fail and add a lot of debug printf's and then the 
output gets silently clipped.  it's confusing imo, and i say this having 
implemented early debug output in other systems (including linux) and seeing 
how people used/reacted to it.

> > give people a foot and they'll take 1MiB :p
> > 
> > it's fairly easy as well:
> > #define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
> > buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;
> 
> But does that work for non power-of-two buffer sizes...

no, but not that big of a deal.  so you get limited to the last 1KiB, 4KiB, 
8KiB, 16KiB, etc... amount of data.
-mike
Graeme Russ Aug. 30, 2011, 1:49 a.m. UTC | #8
Hi Mike,

On Tue, Aug 30, 2011 at 11:32 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, August 29, 2011 19:00:46 Graeme Russ wrote:
>> On Tue, Aug 30, 2011 at 6:10 AM, Mike Frysinger wrote:
>> > On Monday, August 29, 2011 15:42:23 Simon Glass wrote:
>> >> On Mon, Aug 29, 2011 at 12:20 PM, Mike Frysinger wrote:
>> >> > On Monday, August 29, 2011 13:21:57 Simon Glass wrote:
>> >> >> +     if (gd->con_buf_idx < CONFIG_SYS_TMP_CON_BUF_SZ)
>> >> >> +             buffer[gd->con_buf_idx++] = c;
>> >> >
>> >> > seems like a circular buffer would make more sense ... usually the
>> >> > part of the log you want is the last chunk and not the first
>>
>> Well you would need an 'overflow' flag and then based on that you would
>> need to do two printf()'s when dumping the buffer - One from the 'index to
>> end' which is the 'top' of the buffer and one from 'start to index' which
>> is the bottom.
>
> you wouldnt need an overflow flag unless you wrote out more than 2^32 bytes.
> look at the logic again ... it isnt masking the write, it's masking the read.
> so by virtual of the con_buf_idx being larger than the max, you know you've
> wrapped around.
>
> otherwise yes, you'd need to split up the writes.  not that big of a deal i
> dont think ...

OK, so you let gd->con_buf_idx increment always and modulo to get the array
index - fine if CONFIG_SYS_TMP_CON_BUF_SZ is ^2 - see below

>> >> Yes I agree, although if you have more than 1KB of data it might be a
>> >> bug.
>>
>> I personally don't see the need - I expect the amount of pre-console output
>> would be faily limited considering that the board should do everything it
>> can to initialise console as early as possible.
>
> until people hit an early fail and add a lot of debug printf's and then the
> output gets silently clipped.  it's confusing imo, and i say this having
> implemented early debug output in other systems (including linux) and seeing
> how people used/reacted to it.

We can easily have a 'output clipped' so people know what happened and can
trim their debugging messages accordingly

>> > give people a foot and they'll take 1MiB :p
>> >
>> > it's fairly easy as well:
>> > #define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
>> > buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;
>>
>> But does that work for non power-of-two buffer sizes...
>
> no, but not that big of a deal.  so you get limited to the last 1KiB, 4KiB,
> 8KiB, 16KiB, etc... amount of data.

Until someone doesn't read the documentation and figures they can only
squeeze 200 bytes out of their L1 cache after making room for gd and stack
and then tries to print 201 bytes of debug info and trashes either the
stack or gd and then things start to get a lot weirder than simply having
their early debug messgaes clipped...

To be safe, CONFIG_SYS_TMP_CON_BUF_SZ would need to be checked for ^2 size
and now we only get 128 bytes rather than 200 :( - Better to add another
long in gd and get 196

Regards,

Graeme
Mike Frysinger Aug. 30, 2011, 2:50 a.m. UTC | #9
On Monday, August 29, 2011 21:49:33 Graeme Russ wrote:
> On Tue, Aug 30, 2011 at 11:32 AM, Mike Frysinger wrote:
> > On Monday, August 29, 2011 19:00:46 Graeme Russ wrote:
> >> On Tue, Aug 30, 2011 at 6:10 AM, Mike Frysinger wrote:
> >> > On Monday, August 29, 2011 15:42:23 Simon Glass wrote:
> >> >> Yes I agree, although if you have more than 1KB of data it might be a
> >> >> bug.
> >> 
> >> I personally don't see the need - I expect the amount of pre-console
> >> output would be faily limited considering that the board should do
> >> everything it can to initialise console as early as possible.
> > 
> > until people hit an early fail and add a lot of debug printf's and then
> > the output gets silently clipped.  it's confusing imo, and i say this
> > having implemented early debug output in other systems (including linux)
> > and seeing how people used/reacted to it.
> 
> We can easily have a 'output clipped' so people know what happened and can
> trim their debugging messages accordingly

much easier said than done.  clipping the head rather than the tail is much 
more often the right default.

> >> > give people a foot and they'll take 1MiB :p
> >> > 
> >> > it's fairly easy as well:
> >> > #define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
> >> > buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;
> >> 
> >> But does that work for non power-of-two buffer sizes...
> > 
> > no, but not that big of a deal.  so you get limited to the last 1KiB,
> > 4KiB, 8KiB, 16KiB, etc... amount of data.
> 
> Until someone doesn't read the documentation and figures they can only
> squeeze 200 bytes out of their L1 cache after making room for gd and stack
> and then tries to print 201 bytes of debug info and trashes either the
> stack or gd and then things start to get a lot weirder than simply having
> their early debug messgaes clipped...

this is really a non-issue as it's trivial to catch at CPP time:
#if (CONFIG_SYS_TMP_CON_BUF_SZ & (CONFIG_SYS_TMP_CON_BUF_SZ - 1))
# error "CONFIG_SYS_TMP_CON_BUF_SZ must be a power-of-two"
#endif

> To be safe, CONFIG_SYS_TMP_CON_BUF_SZ would need to be checked for ^2 size
> and now we only get 128 bytes rather than 200 :( - Better to add another
> long in gd and get 196

my systems dont have nearly the size restrictions yours do, so if you think 
that's better, that's fine by me.  i dont care so much about the internal 
implementation details of the circular buffer ... i only care that it is a 
circular buffer and not one that stops too soon.
-mike
Graeme Russ Aug. 30, 2011, 3:12 a.m. UTC | #10
Hi Mike,

On Tue, Aug 30, 2011 at 12:50 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, August 29, 2011 21:49:33 Graeme Russ wrote:
>> On Tue, Aug 30, 2011 at 11:32 AM, Mike Frysinger wrote:
>> > On Monday, August 29, 2011 19:00:46 Graeme Russ wrote:
>> >> On Tue, Aug 30, 2011 at 6:10 AM, Mike Frysinger wrote:
>> >> > On Monday, August 29, 2011 15:42:23 Simon Glass wrote:
>> >> >> Yes I agree, although if you have more than 1KB of data it might be a
>> >> >> bug.
>> >>
>> >> I personally don't see the need - I expect the amount of pre-console
>> >> output would be faily limited considering that the board should do
>> >> everything it can to initialise console as early as possible.
>> >
>> > until people hit an early fail and add a lot of debug printf's and then
>> > the output gets silently clipped.  it's confusing imo, and i say this
>> > having implemented early debug output in other systems (including linux)
>> > and seeing how people used/reacted to it.
>>
>> We can easily have a 'output clipped' so people know what happened and can
>> trim their debugging messages accordingly
>
> much easier said than done.  clipping the head rather than the tail is much
> more often the right default.

One could argue that not clipping at all is the _right_ default ;)

But I can easily add to print_pre_console_buffer()...

	if (gd->conf_buf_idx >= CONFIG_SYS_TMP_CON_BUF_SZ)
		puts("Pre-console buffer clipped")

>> >> > give people a foot and they'll take 1MiB :p
>> >> >
>> >> > it's fairly easy as well:
>> >> > #define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
>> >> > buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;
>> >>
>> >> But does that work for non power-of-two buffer sizes...
>> >
>> > no, but not that big of a deal.  so you get limited to the last 1KiB,
>> > 4KiB, 8KiB, 16KiB, etc... amount of data.
>>
>> Until someone doesn't read the documentation and figures they can only
>> squeeze 200 bytes out of their L1 cache after making room for gd and stack
>> and then tries to print 201 bytes of debug info and trashes either the
>> stack or gd and then things start to get a lot weirder than simply having
>> their early debug messgaes clipped...
>
> this is really a non-issue as it's trivial to catch at CPP time:
> #if (CONFIG_SYS_TMP_CON_BUF_SZ & (CONFIG_SYS_TMP_CON_BUF_SZ - 1))
> # error "CONFIG_SYS_TMP_CON_BUF_SZ must be a power-of-two"
> #endif

Nice - hadn't thought of that

>> To be safe, CONFIG_SYS_TMP_CON_BUF_SZ would need to be checked for ^2 size
>> and now we only get 128 bytes rather than 200 :( - Better to add another
>> long in gd and get 196
>
> my systems dont have nearly the size restrictions yours do, so if you think
> that's better, that's fine by me.  i dont care so much about the internal
> implementation details of the circular buffer ... i only care that it is a
> circular buffer and not one that stops too soon.

Well actually, I have a 12-year old CPU with 16kB of cache - More than
enough for initial stack, gd and pre-console buffer. It was actually
Wolgang who raised the size issue, but that was probably more aimed at
those that do not have the space for the buffer at all

I'll do another rev of the two-patch series implementing the 'power of
two' trap and circular buffer.

P.S. Can we move this discussion over to that thread now:

[PATCH 0/2] console: Squelch and pre-console buffer
http://lists.denx.de/pipermail/u-boot/2011-August/099607.html

Regards,

Graeme
Wolfgang Denk Aug. 30, 2011, 5:17 a.m. UTC | #11
Dear Graeme Russ,

In message <CALButCJF1p+60WOLtiJqhcAed-71VxmP2mBqY2wdNURPk5=Q+g@mail.gmail.com> you wrote:
> 
> >> > it's fairly easy as well:
> >> > #define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
> >> > buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;
> >>
> >> But does that work for non power-of-two buffer sizes...
> >
> > no, but not that big of a deal.  so you get limited to the last 1KiB, 4KiB,
> > 8KiB, 16KiB, etc... amount of data.
> 
> Until someone doesn't read the documentation and figures they can only
> squeeze 200 bytes out of their L1 cache after making room for gd and stack
> and then tries to print 201 bytes of debug info and trashes either the
> stack or gd and then things start to get a lot weirder than simply having
> their early debug messgaes clipped...
> 
> To be safe, CONFIG_SYS_TMP_CON_BUF_SZ would need to be checked for ^2 size
> and now we only get 128 bytes rather than 200 :( - Better to add another
> long in gd and get 196

Grrrgh.

If you want to support arbitrary buffer sizes, then just use

	#define CIRC_BUF_IDX(idx) ((idx) % CONFIG_SYS_TMP_CON_BUF_SZ)

Best regards,

Wolfgang Denk
Graeme Russ Aug. 30, 2011, 5:28 a.m. UTC | #12
Hi Wolfgang

On Tue, Aug 30, 2011 at 3:17 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Graeme Russ,
>
> In message <CALButCJF1p+60WOLtiJqhcAed-71VxmP2mBqY2wdNURPk5=Q+g@mail.gmail.com> you wrote:
>>
>> >> > it's fairly easy as well:
>> >> > #define CIRC_BUF_IDX(idx) ((idx) & (CONFIG_SYS_TMP_CON_BUF_SZ-1))
>> >> > buffer[CIRC_BUF_IDX(gd->conf_buf_idx++)] = c;
>> >>
>> >> But does that work for non power-of-two buffer sizes...
>> >
>> > no, but not that big of a deal.  so you get limited to the last 1KiB, 4KiB,
>> > 8KiB, 16KiB, etc... amount of data.
>>
>> Until someone doesn't read the documentation and figures they can only
>> squeeze 200 bytes out of their L1 cache after making room for gd and stack
>> and then tries to print 201 bytes of debug info and trashes either the
>> stack or gd and then things start to get a lot weirder than simply having
>> their early debug messgaes clipped...
>>
>> To be safe, CONFIG_SYS_TMP_CON_BUF_SZ would need to be checked for ^2 size
>> and now we only get 128 bytes rather than 200 :( - Better to add another
>> long in gd and get 196
>
> Grrrgh.
>
> If you want to support arbitrary buffer sizes, then just use
>
>        #define CIRC_BUF_IDX(idx) ((idx) % CONFIG_SYS_TMP_CON_BUF_SZ)

I know, but I was concerned that you wouldn't like the use of modulo
arithmetic for every putc() - But I suppose thats cheaper than a compare
plus branch...

If you prefer modulo over the 'must be a power of two' restriction, then
I am happy to do it that way instead

Regards,

Graeme
Wolfgang Denk Aug. 30, 2011, 9:29 a.m. UTC | #13
Dear Graeme Russ,

In message <CALButC+t=OUL9XgQnfyAdZmgPJRp7mdRwEE9LKogYHEg1D+m4g@mail.gmail.com> you wrote:
> 
> > If you want to support arbitrary buffer sizes, then just use
> >
> >        #define CIRC_BUF_IDX(idx) ((idx) % CONFIG_SYS_TMP_CON_BUF_SZ)
> 
> I know, but I was concerned that you wouldn't like the use of modulo
> arithmetic for every putc() - But I suppose thats cheaper than a compare
> plus branch...

Well, the difference of "i & (CONFIG_SYS_TMP_CON_BUF_SZ-1)" versus
"i % CONFIG_SYS_TMP_CON_BUF_SZ" is 1 versus 4 assembler instructions
on Power, and 2 versus 5 assembler instructions on ARM.  I think we
can tolerate this.

Best regards,

Wolfgang Denk
Graeme Russ Aug. 30, 2011, 9:37 a.m. UTC | #14
Hi Wolfgang,

On 30/08/11 19:29, Wolfgang Denk wrote:
> Dear Graeme Russ,
> 
> In message <CALButC+t=OUL9XgQnfyAdZmgPJRp7mdRwEE9LKogYHEg1D+m4g@mail.gmail.com> you wrote:
>>
>>> If you want to support arbitrary buffer sizes, then just use
>>>
>>>        #define CIRC_BUF_IDX(idx) ((idx) % CONFIG_SYS_TMP_CON_BUF_SZ)
>>
>> I know, but I was concerned that you wouldn't like the use of modulo
>> arithmetic for every putc() - But I suppose thats cheaper than a compare
>> plus branch...
> 
> Well, the difference of "i & (CONFIG_SYS_TMP_CON_BUF_SZ-1)" versus
> "i % CONFIG_SYS_TMP_CON_BUF_SZ" is 1 versus 4 assembler instructions
> on Power, and 2 versus 5 assembler instructions on ARM.  I think we
> can tolerate this.

And it's only hit during printf() before console is initialised :)

Respinning now

Regards,

Graeme
diff mbox

Patch

diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h
index 4fc51fd..8b028cc 100644
--- a/arch/arm/include/asm/global_data.h
+++ b/arch/arm/include/asm/global_data.h
@@ -71,6 +71,7 @@  typedef	struct	global_data {
 	unsigned long	start_addr_sp;	/* start_addr_stackpointer */
 	unsigned long	reloc_off;
 #if !(defined(CONFIG_SYS_ICACHE_OFF) && defined(CONFIG_SYS_DCACHE_OFF))
+	unsigned long	con_buf_idx;	/* Console buffer index */
 	unsigned long	tlb_addr;
 #endif
 	void		**jt;		/* jump table */
diff --git a/common/console.c b/common/console.c
index 8c650e0..2a7d2c8 100644
--- a/common/console.c
+++ b/common/console.c
@@ -323,6 +323,36 @@  int tstc(void)
 	return serial_tstc();
 }
 
+void pre_console_putc(const char c)
+{
+	char *buffer =  (char *)CONFIG_SYS_TMP_CON_BUF_ADDR;
+
+	if (gd->con_buf_idx < CONFIG_SYS_TMP_CON_BUF_SZ)
+		buffer[gd->con_buf_idx++] = c;
+}
+
+void pre_console_puts(const char *s)
+{
+	while (*s)
+		pre_console_putc(*s++);
+}
+
+void print_pre_console_buffer(void)
+{
+	char *buffer =  (char *)CONFIG_SYS_TMP_CON_BUF_ADDR;
+	int index = gd->con_buf_idx;
+
+	if (index) {
+		printf("console initialised - dumping pre-console buffer:\n");
+		index = min(index, CONFIG_SYS_TMP_CON_BUF_SZ - 1);
+		buffer[index] = '\0';
+		puts(buffer);
+		if (gd->con_buf_idx == CONFIG_SYS_TMP_CON_BUF_SZ)
+			puts("... (buffer overflowed)\n");
+		printf("buffer dumped\n");
+	}
+}
+
 void putc(const char c)
 {
 #ifdef CONFIG_SILENT_CONSOLE
@@ -341,6 +371,8 @@  void putc(const char c)
 	} else {
 		/* Send directly to the handler */
 		serial_putc(c);
+	} else {
+		pre_console_putc(c);
 	}
 }
 
@@ -362,6 +394,8 @@  void puts(const char *s)
 	} else {
 		/* Send directly to the handler */
 		serial_puts(s);
+	} else {
+		pre_console_puts(s);
 	}
 }
 
@@ -529,6 +563,8 @@  int console_init_f(void)
 		gd->flags |= GD_FLG_SILENT;
 #endif
 
+	print_pre_console_buffer();
+
 	return 0;
 }
 
diff --git a/include/configs/tegra2-common.h b/include/configs/tegra2-common.h
index 73e0f05..a42cce1 100644
--- a/include/configs/tegra2-common.h
+++ b/include/configs/tegra2-common.h
@@ -156,8 +156,12 @@ 
 
 #define CONFIG_SYS_INIT_RAM_ADDR	CONFIG_STACKBASE
 #define CONFIG_SYS_INIT_RAM_SIZE	CONFIG_SYS_MALLOC_LEN
-#define CONFIG_SYS_INIT_SP_ADDR		(CONFIG_SYS_INIT_RAM_ADDR + \
-						CONFIG_SYS_INIT_RAM_SIZE - \
+#define CONFIG_SYS_TMP_CON_BUF_SZ	(1 * 1024)
+#define CONFIG_SYS_INIT_RAM_TOP		(CONFIG_SYS_INIT_RAM_ADDR  + \
+						CONFIG_SYS_INIT_RAM_SIZE)
+#define CONFIG_SYS_TMP_CON_BUF_ADDR	(CONFIG_SYS_INIT_RAM_TOP - \
+					CONFIG_SYS_TMP_CON_BUF_SZ)
+#define CONFIG_SYS_INIT_SP_ADDR		(CONFIG_SYS_TMP_CON_BUF_ADDR - \
 						GENERATED_GBL_DATA_SIZE)
 
 #define CONFIG_TEGRA2_GPIO