Patchwork exec-all.h: Make MAX_OP_PER_INSTR large enough for target-arm's uses

login
register
mail settings
Submitter Peter Maydell
Date June 22, 2011, 2:16 p.m.
Message ID <1308752192-17849-1-git-send-email-peter.maydell@linaro.org>
Download mbox | patch
Permalink /patch/101500/
State New
Headers show

Comments

Peter Maydell - June 22, 2011, 2:16 p.m.
The target-arm frontend's worst-case TCG ops per instr is 194 (and in
general many of the "load multiple registers" ARM instructions generate
more than 100 TCG ops). Raise MAX_OP_PER_INSTR accordingly to avoid
possible buffer overruns.

Since it doesn't make any sense for the "64 bit guest on 32 bit host"
case to have a smaller limit than the normal case, we collapse the
two cases back into each other again.

(This increase costs us about 14K in extra static buffer space and
21K of extra margin at the end of a 32MB codegen buffer.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
You might recall the patchset which moves the Neon load/store multiple
instructions to helper functions, and which turns out to slow them
down rather a lot. This is the other approach, which is just to
raise the limit so that the existing implementations don't risk
buffer overruns. The extra memory costs are tiny IMHO.
(The Neon instructions are the worst offenders but the VFP load/store
multiple insns also breach the previous limit. I think we should
consider an implementation of an instruction that's been basically
the same since VFP support was added to QEMU in 2005 to be an
acceptable one, and make sure our buffer sizes cope with it :-))

 exec-all.h |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)
Peter Maydell - July 6, 2011, 11:15 a.m.
Ping?

On 22 June 2011 15:16, Peter Maydell <peter.maydell@linaro.org> wrote:
> The target-arm frontend's worst-case TCG ops per instr is 194 (and in
> general many of the "load multiple registers" ARM instructions generate
> more than 100 TCG ops). Raise MAX_OP_PER_INSTR accordingly to avoid
> possible buffer overruns.
>
> Since it doesn't make any sense for the "64 bit guest on 32 bit host"
> case to have a smaller limit than the normal case, we collapse the
> two cases back into each other again.
>
> (This increase costs us about 14K in extra static buffer space and
> 21K of extra margin at the end of a 32MB codegen buffer.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> You might recall the patchset which moves the Neon load/store multiple
> instructions to helper functions, and which turns out to slow them
> down rather a lot. This is the other approach, which is just to
> raise the limit so that the existing implementations don't risk
> buffer overruns. The extra memory costs are tiny IMHO.
> (The Neon instructions are the worst offenders but the VFP load/store
> multiple insns also breach the previous limit. I think we should
> consider an implementation of an instruction that's been basically
> the same since VFP support was added to QEMU in 2005 to be an
> acceptable one, and make sure our buffer sizes cope with it :-))
>
>  exec-all.h |    6 +-----
>  1 files changed, 1 insertions(+), 5 deletions(-)
>
> diff --git a/exec-all.h b/exec-all.h
> index 2a13a95..ef5f5b6 100644
> --- a/exec-all.h
> +++ b/exec-all.h
> @@ -43,11 +43,7 @@ typedef ram_addr_t tb_page_addr_t;
>  typedef struct TranslationBlock TranslationBlock;
>
>  /* XXX: make safe guess about sizes */
> -#if (HOST_LONG_BITS == 32) && (TARGET_LONG_BITS == 64)
> -#define MAX_OP_PER_INSTR 128
> -#else
> -#define MAX_OP_PER_INSTR 96
> -#endif
> +#define MAX_OP_PER_INSTR 208
>
>  #if HOST_LONG_BITS == 32
>  #define MAX_OPC_PARAM_PER_ARG 2
> --
> 1.7.1
Blue Swirl - July 12, 2011, 8:52 p.m.
Thanks, applied.

On Wed, Jul 6, 2011 at 2:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> Ping?
>
> On 22 June 2011 15:16, Peter Maydell <peter.maydell@linaro.org> wrote:
>> The target-arm frontend's worst-case TCG ops per instr is 194 (and in
>> general many of the "load multiple registers" ARM instructions generate
>> more than 100 TCG ops). Raise MAX_OP_PER_INSTR accordingly to avoid
>> possible buffer overruns.
>>
>> Since it doesn't make any sense for the "64 bit guest on 32 bit host"
>> case to have a smaller limit than the normal case, we collapse the
>> two cases back into each other again.
>>
>> (This increase costs us about 14K in extra static buffer space and
>> 21K of extra margin at the end of a 32MB codegen buffer.)
>>
>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>> ---
>> You might recall the patchset which moves the Neon load/store multiple
>> instructions to helper functions, and which turns out to slow them
>> down rather a lot. This is the other approach, which is just to
>> raise the limit so that the existing implementations don't risk
>> buffer overruns. The extra memory costs are tiny IMHO.
>> (The Neon instructions are the worst offenders but the VFP load/store
>> multiple insns also breach the previous limit. I think we should
>> consider an implementation of an instruction that's been basically
>> the same since VFP support was added to QEMU in 2005 to be an
>> acceptable one, and make sure our buffer sizes cope with it :-))
>>
>>  exec-all.h |    6 +-----
>>  1 files changed, 1 insertions(+), 5 deletions(-)
>>
>> diff --git a/exec-all.h b/exec-all.h
>> index 2a13a95..ef5f5b6 100644
>> --- a/exec-all.h
>> +++ b/exec-all.h
>> @@ -43,11 +43,7 @@ typedef ram_addr_t tb_page_addr_t;
>>  typedef struct TranslationBlock TranslationBlock;
>>
>>  /* XXX: make safe guess about sizes */
>> -#if (HOST_LONG_BITS == 32) && (TARGET_LONG_BITS == 64)
>> -#define MAX_OP_PER_INSTR 128
>> -#else
>> -#define MAX_OP_PER_INSTR 96
>> -#endif
>> +#define MAX_OP_PER_INSTR 208
>>
>>  #if HOST_LONG_BITS == 32
>>  #define MAX_OPC_PARAM_PER_ARG 2
>> --
>> 1.7.1
>
>

Patch

diff --git a/exec-all.h b/exec-all.h
index 2a13a95..ef5f5b6 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -43,11 +43,7 @@  typedef ram_addr_t tb_page_addr_t;
 typedef struct TranslationBlock TranslationBlock;
 
 /* XXX: make safe guess about sizes */
-#if (HOST_LONG_BITS == 32) && (TARGET_LONG_BITS == 64)
-#define MAX_OP_PER_INSTR 128
-#else
-#define MAX_OP_PER_INSTR 96
-#endif
+#define MAX_OP_PER_INSTR 208
 
 #if HOST_LONG_BITS == 32
 #define MAX_OPC_PARAM_PER_ARG 2