diff mbox

[01/12] TCG "sync" op

Message ID 1256133153-3121-2-git-send-email-uli@suse.de
State New
Headers show

Commit Message

Ulrich Hecht Oct. 21, 2009, 1:52 p.m. UTC
sync allows concurrent accesses to locations in memory through different TCG
variables. This comes in handy when you are emulating CPU registers that can
be used as either 32 or 64 bit, as TCG doesn't know anything about aliases.
See the s390x target for an example.

Fixed sync_i64 build failure on 32-bit targets.

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 tcg/tcg-op.h  |   12 ++++++++++++
 tcg/tcg-opc.h |    2 ++
 tcg/tcg.c     |    6 ++++++
 3 files changed, 20 insertions(+), 0 deletions(-)

Comments

Aurelien Jarno Oct. 22, 2009, 9:03 p.m. UTC | #1
On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
> sync allows concurrent accesses to locations in memory through different TCG
> variables. This comes in handy when you are emulating CPU registers that can
> be used as either 32 or 64 bit, as TCG doesn't know anything about aliases.
> See the s390x target for an example.
> 
> Fixed sync_i64 build failure on 32-bit targets.

Looking more in details to the use case of this patch, I think it can be
useful in QEMU. However I don't feel very comfortable in merging it
without having the opinion of more persons. Paul, Malc Blue Swirl or
others, any opinion?

Anyway this patch should come with (simple) documentation to drop into
tcg/README.

Otherwise, please find my comments inline.

> Signed-off-by: Ulrich Hecht <uli@suse.de>
> ---
>  tcg/tcg-op.h  |   12 ++++++++++++
>  tcg/tcg-opc.h |    2 ++
>  tcg/tcg.c     |    6 ++++++
>  3 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index faf2e8b..c1b4710 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -316,6 +316,18 @@ static inline void tcg_gen_br(int label)
>      tcg_gen_op1i(INDEX_op_br, label);
>  }
>  
> +static inline void tcg_gen_sync_i32(TCGv_i32 arg)
> +{
> +    tcg_gen_op1_i32(INDEX_op_sync_i32, arg);
> +}
> +
> +#if TCG_TARGET_REG_BITS == 64
> +static inline void tcg_gen_sync_i64(TCGv_i64 arg)
> +{
> +    tcg_gen_op1_i64(INDEX_op_sync_i64, arg);
> +}
> +#endif
> +
>  static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
>  {
>      if (!TCGV_EQUAL_I32(ret, arg))

You may also want to add a sync_tl as a #define at the end of the file.

> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index b7f3fd7..5dcdeba 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -40,6 +40,7 @@ DEF2(call, 0, 1, 2, TCG_OPF_SIDE_EFFECTS) /* variable number of parameters */
>  DEF2(jmp, 0, 1, 0, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  DEF2(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  
> +DEF2(sync_i32, 0, 1, 0, 0)
>  DEF2(mov_i32, 1, 1, 0, 0)
>  DEF2(movi_i32, 1, 0, 1, 0)
>  /* load/store */
> @@ -109,6 +110,7 @@ DEF2(neg_i32, 1, 1, 0, 0)
>  #endif
>  
>  #if TCG_TARGET_REG_BITS == 64
> +DEF2(sync_i64, 0, 1, 0, 0)
>  DEF2(mov_i64, 1, 1, 0, 0)
>  DEF2(movi_i64, 1, 0, 1, 0)
>  /* load/store */
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 3c0e296..8eb60f8 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1930,6 +1930,12 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
>          //        dump_regs(s);
>  #endif
>          switch(opc) {
> +        case INDEX_op_sync_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_sync_i64:
> +#endif
> +            temp_save(s, args[0], s->reserved_regs);
> +            break;

This won't work for fixed register. You should probably add something
like:
    if (s->temps[args[0]].fixed_reg) {
        tcg_abort();
    }

>          case INDEX_op_mov_i32:
>  #if TCG_TARGET_REG_BITS == 64
>          case INDEX_op_mov_i64:
> -- 
> 1.6.2.1
> 
>
malc Oct. 22, 2009, 9:27 p.m. UTC | #2
On Thu, 22 Oct 2009, Aurelien Jarno wrote:

> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
> > sync allows concurrent accesses to locations in memory through different TCG
> > variables. This comes in handy when you are emulating CPU registers that can
> > be used as either 32 or 64 bit, as TCG doesn't know anything about aliases.
> > See the s390x target for an example.
> > 
> > Fixed sync_i64 build failure on 32-bit targets.
> 
> Looking more in details to the use case of this patch, I think it can be
> useful in QEMU. However I don't feel very comfortable in merging it
> without having the opinion of more persons. Paul, Malc Blue Swirl or
> others, any opinion?

I don't really understand the problem/solution so no - no opinion.

[..snip..]
Paul Brook Nov. 11, 2009, 12:56 a.m. UTC | #3
On Thursday 22 October 2009, Aurelien Jarno wrote:
> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
> > sync allows concurrent accesses to locations in memory through different
> > TCG variables. This comes in handy when you are emulating CPU registers
> > that can be used as either 32 or 64 bit, as TCG doesn't know anything
> > about aliases. See the s390x target for an example.
> >
> > Fixed sync_i64 build failure on 32-bit targets.
> 
> Looking more in details to the use case of this patch, I think it can be
> useful in QEMU. However I don't feel very comfortable in merging it
> without having the opinion of more persons. Paul, Malc Blue Swirl or
> others, any opinion?

I don't think this is the right solution.

IIUC the basic problem is that we have a register file where adjacent pairs of 
32-bit registers are also accessed as a 64-bit value.  This is something many 
other targets need to do (at least ARM, PPC, MIPS and SPARC).

While sync appears attractive as a quick hack to achieve this, I think it is 
liable to be abused, and cause us serious pain long-term. If you need an easy 
solution then use ld/st (as with ARM VFP registers). If you want a good 
solution then fix whichever bit of TCG makes accessing a pair of registers 
horribly slow. We already have some support for this (concat_i32_i64).

Paul
Alexander Graf Nov. 16, 2009, 1:54 p.m. UTC | #4
Paul Brook wrote:
> On Thursday 22 October 2009, Aurelien Jarno wrote:
>   
>> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
>>     
>>> sync allows concurrent accesses to locations in memory through different
>>> TCG variables. This comes in handy when you are emulating CPU registers
>>> that can be used as either 32 or 64 bit, as TCG doesn't know anything
>>> about aliases. See the s390x target for an example.
>>>
>>> Fixed sync_i64 build failure on 32-bit targets.
>>>       
>> Looking more in details to the use case of this patch, I think it can be
>> useful in QEMU. However I don't feel very comfortable in merging it
>> without having the opinion of more persons. Paul, Malc Blue Swirl or
>> others, any opinion?
>>     
>
> I don't think this is the right solution.
>
> IIUC the basic problem is that we have a register file where adjacent pairs of 
> 32-bit registers are also accessed as a 64-bit value.  This is something many 
> other targets need to do (at least ARM, PPC, MIPS and SPARC).
>
> While sync appears attractive as a quick hack to achieve this, I think it is 
> liable to be abused, and cause us serious pain long-term. If you need an easy 
> solution then use ld/st (as with ARM VFP registers). If you want a good 
> solution then fix whichever bit of TCG makes accessing a pair of registers 
> horribly slow. We already have some support for this (concat_i32_i64).
>   

Could we please include it nevertheless? I don't want to see S390 TCG
and KVM targets not included because of this sync operation.

If you like, add some big fat warning around it and maybe break
compilation if anyone but the s390 target uses that sync op, but let's
not keep a whole target from inclusion because of a single feature that
_might_ one day affect others.


Alex
Paul Brook Nov. 16, 2009, 2:37 p.m. UTC | #5
> > While sync appears attractive as a quick hack to achieve this, I think it
> > is liable to be abused, and cause us serious pain long-term. If you need
> > an easy solution then use ld/st (as with ARM VFP registers). If you want
> > a good solution then fix whichever bit of TCG makes accessing a pair of
> > registers horribly slow. We already have some support for this
> > (concat_i32_i64).
> 
> Could we please include it nevertheless? I don't want to see S390 TCG
> and KVM targets not included because of this sync operation.
> 
> If you like, add some big fat warning around it and maybe break
> compilation if anyone but the s390 target uses that sync op, but let's
> not keep a whole target from inclusion because of a single feature that
> _might_ one day affect others.

I'd rather not include sync, and instead use the explicit ld/st code you 
already wrote.

Paul
Alexander Graf Nov. 16, 2009, 3:14 p.m. UTC | #6
Paul Brook wrote:
>>> While sync appears attractive as a quick hack to achieve this, I think it
>>> is liable to be abused, and cause us serious pain long-term. If you need
>>> an easy solution then use ld/st (as with ARM VFP registers). If you want
>>> a good solution then fix whichever bit of TCG makes accessing a pair of
>>> registers horribly slow. We already have some support for this
>>> (concat_i32_i64).
>>>       
>> Could we please include it nevertheless? I don't want to see S390 TCG
>> and KVM targets not included because of this sync operation.
>>
>> If you like, add some big fat warning around it and maybe break
>> compilation if anyone but the s390 target uses that sync op, but let's
>> not keep a whole target from inclusion because of a single feature that
>> _might_ one day affect others.
>>     
>
> I'd rather not include sync, and instead use the explicit ld/st code you 
> already wrote.
>   

As long as the KVM code comes in I'm good. I don't really care that much
about the TCG parts and only need them to have the build system
surroundings set up.

So yes, I don't care about sync. As long as we finally get any solution
here I'm good, but patches lying around for weeks surely doesn't help qemu.


Alex
Aurelien Jarno Nov. 17, 2009, 11:40 p.m. UTC | #7
On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote:
> On Thursday 22 October 2009, Aurelien Jarno wrote:
> > On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
> > > sync allows concurrent accesses to locations in memory through different
> > > TCG variables. This comes in handy when you are emulating CPU registers
> > > that can be used as either 32 or 64 bit, as TCG doesn't know anything
> > > about aliases. See the s390x target for an example.
> > >
> > > Fixed sync_i64 build failure on 32-bit targets.
> > 
> > Looking more in details to the use case of this patch, I think it can be
> > useful in QEMU. However I don't feel very comfortable in merging it
> > without having the opinion of more persons. Paul, Malc Blue Swirl or
> > others, any opinion?
> 
> I don't think this is the right solution.
> 
> IIUC the basic problem is that we have a register file where adjacent pairs of 
> 32-bit registers are also accessed as a 64-bit value.  This is something many 
> other targets need to do (at least ARM, PPC, MIPS and SPARC).
> 
> While sync appears attractive as a quick hack to achieve this, I think it is 
> liable to be abused, and cause us serious pain long-term. If you need an easy 
> solution then use ld/st (as with ARM VFP registers). If you want a good 
> solution then fix whichever bit of TCG makes accessing a pair of registers 
> horribly slow. We already have some support for this (concat_i32_i64).
> 

What is probably needed here are merge_low and merge_high ops, merging a
32-bit value into the low or high part of a 64-bit value, leaving the
other part unchanged. Not sure we can really optimize that on x86/x86_64
compared to standard TCG code though.
Alexander Graf Nov. 18, 2009, 12:01 a.m. UTC | #8
On 18.11.2009, at 00:40, Aurelien Jarno wrote:

> On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote:
>> On Thursday 22 October 2009, Aurelien Jarno wrote:
>>> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
>>>> sync allows concurrent accesses to locations in memory through  
>>>> different
>>>> TCG variables. This comes in handy when you are emulating CPU  
>>>> registers
>>>> that can be used as either 32 or 64 bit, as TCG doesn't know  
>>>> anything
>>>> about aliases. See the s390x target for an example.
>>>>
>>>> Fixed sync_i64 build failure on 32-bit targets.
>>>
>>> Looking more in details to the use case of this patch, I think it  
>>> can be
>>> useful in QEMU. However I don't feel very comfortable in merging it
>>> without having the opinion of more persons. Paul, Malc Blue Swirl or
>>> others, any opinion?
>>
>> I don't think this is the right solution.
>>
>> IIUC the basic problem is that we have a register file where  
>> adjacent pairs of
>> 32-bit registers are also accessed as a 64-bit value.  This is  
>> something many
>> other targets need to do (at least ARM, PPC, MIPS and SPARC).
>>
>> While sync appears attractive as a quick hack to achieve this, I  
>> think it is
>> liable to be abused, and cause us serious pain long-term. If you  
>> need an easy
>> solution then use ld/st (as with ARM VFP registers). If you want a  
>> good
>> solution then fix whichever bit of TCG makes accessing a pair of  
>> registers
>> horribly slow. We already have some support for this  
>> (concat_i32_i64).
>>
>
> What is probably needed here are merge_low and merge_high ops,  
> merging a
> 32-bit value into the low or high part of a 64-bit value, leaving the
> other part unchanged. Not sure we can really optimize that on x86/ 
> x86_64
> compared to standard TCG code though.

Maybe I got the whole point wrong but isn't this about having 2  
virtual register sets for the same target registers? So you'd have:

  TCGvar my_reg_32;
  TCGvar my_reg_64;

and whenever you work with either of them you want to have the correct  
value present in both (cut off for 32bit, extended for 64 bit).

So what's really necessary is some internal coupling and dirty log of  
variables within TCG so it knows that an access to the 64 bit of a  
coupled variable means it needs to sync the 32 bit value over and vice  
versa. Then magically everything would just work as expected.

Or am I totally wrong here?

Alex
Aurelien Jarno Nov. 18, 2009, 3:12 p.m. UTC | #9
On Wed, Nov 18, 2009 at 01:01:13AM +0100, Alexander Graf wrote:
>
> On 18.11.2009, at 00:40, Aurelien Jarno wrote:
>
>> On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote:
>>> On Thursday 22 October 2009, Aurelien Jarno wrote:
>>>> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
>>>>> sync allows concurrent accesses to locations in memory through  
>>>>> different
>>>>> TCG variables. This comes in handy when you are emulating CPU  
>>>>> registers
>>>>> that can be used as either 32 or 64 bit, as TCG doesn't know  
>>>>> anything
>>>>> about aliases. See the s390x target for an example.
>>>>>
>>>>> Fixed sync_i64 build failure on 32-bit targets.
>>>>
>>>> Looking more in details to the use case of this patch, I think it  
>>>> can be
>>>> useful in QEMU. However I don't feel very comfortable in merging it
>>>> without having the opinion of more persons. Paul, Malc Blue Swirl or
>>>> others, any opinion?
>>>
>>> I don't think this is the right solution.
>>>
>>> IIUC the basic problem is that we have a register file where  
>>> adjacent pairs of
>>> 32-bit registers are also accessed as a 64-bit value.  This is  
>>> something many
>>> other targets need to do (at least ARM, PPC, MIPS and SPARC).
>>>
>>> While sync appears attractive as a quick hack to achieve this, I  
>>> think it is
>>> liable to be abused, and cause us serious pain long-term. If you  
>>> need an easy
>>> solution then use ld/st (as with ARM VFP registers). If you want a  
>>> good
>>> solution then fix whichever bit of TCG makes accessing a pair of  
>>> registers
>>> horribly slow. We already have some support for this  
>>> (concat_i32_i64).
>>>
>>
>> What is probably needed here are merge_low and merge_high ops, merging 
>> a
>> 32-bit value into the low or high part of a 64-bit value, leaving the
>> other part unchanged. Not sure we can really optimize that on x86/ 
>> x86_64
>> compared to standard TCG code though.
>
> Maybe I got the whole point wrong but isn't this about having 2 virtual 
> register sets for the same target registers? So you'd have:
>
>  TCGvar my_reg_32;
>  TCGvar my_reg_64;
>
> and whenever you work with either of them you want to have the correct  
> value present in both (cut off for 32bit, extended for 64 bit).
>
> So what's really necessary is some internal coupling and dirty log of  
> variables within TCG so it knows that an access to the 64 bit of a  
> coupled variable means it needs to sync the 32 bit value over and vice  
> versa. Then magically everything would just work as expected.
>

That's an option, basically keeping the list (or only one ?) of aliased 
TCG variables for each of them, and freeing the others before using one.
Alexander Graf Nov. 18, 2009, 3:21 p.m. UTC | #10
Aurelien Jarno wrote:
> On Wed, Nov 18, 2009 at 01:01:13AM +0100, Alexander Graf wrote:
>   
>> On 18.11.2009, at 00:40, Aurelien Jarno wrote:
>>
>>     
>>> On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote:
>>>       
>>>> On Thursday 22 October 2009, Aurelien Jarno wrote:
>>>>         
>>>>> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote:
>>>>>           
>>>>>> sync allows concurrent accesses to locations in memory through  
>>>>>> different
>>>>>> TCG variables. This comes in handy when you are emulating CPU  
>>>>>> registers
>>>>>> that can be used as either 32 or 64 bit, as TCG doesn't know  
>>>>>> anything
>>>>>> about aliases. See the s390x target for an example.
>>>>>>
>>>>>> Fixed sync_i64 build failure on 32-bit targets.
>>>>>>             
>>>>> Looking more in details to the use case of this patch, I think it  
>>>>> can be
>>>>> useful in QEMU. However I don't feel very comfortable in merging it
>>>>> without having the opinion of more persons. Paul, Malc Blue Swirl or
>>>>> others, any opinion?
>>>>>           
>>>> I don't think this is the right solution.
>>>>
>>>> IIUC the basic problem is that we have a register file where  
>>>> adjacent pairs of
>>>> 32-bit registers are also accessed as a 64-bit value.  This is  
>>>> something many
>>>> other targets need to do (at least ARM, PPC, MIPS and SPARC).
>>>>
>>>> While sync appears attractive as a quick hack to achieve this, I  
>>>> think it is
>>>> liable to be abused, and cause us serious pain long-term. If you  
>>>> need an easy
>>>> solution then use ld/st (as with ARM VFP registers). If you want a  
>>>> good
>>>> solution then fix whichever bit of TCG makes accessing a pair of  
>>>> registers
>>>> horribly slow. We already have some support for this  
>>>> (concat_i32_i64).
>>>>
>>>>         
>>> What is probably needed here are merge_low and merge_high ops, merging 
>>> a
>>> 32-bit value into the low or high part of a 64-bit value, leaving the
>>> other part unchanged. Not sure we can really optimize that on x86/ 
>>> x86_64
>>> compared to standard TCG code though.
>>>       
>> Maybe I got the whole point wrong but isn't this about having 2 virtual 
>> register sets for the same target registers? So you'd have:
>>
>>  TCGvar my_reg_32;
>>  TCGvar my_reg_64;
>>
>> and whenever you work with either of them you want to have the correct  
>> value present in both (cut off for 32bit, extended for 64 bit).
>>
>> So what's really necessary is some internal coupling and dirty log of  
>> variables within TCG so it knows that an access to the 64 bit of a  
>> coupled variable means it needs to sync the 32 bit value over and vice  
>> versa. Then magically everything would just work as expected.
>>
>>     
>
> That's an option, basically keeping the list (or only one ?) of aliased 
> TCG variables for each of them, and freeing the others before using one.
>   

Yeah, only one. I don't think this should be getting overengineered (and
thus slow) :-).
Doesn't really sound hard, does it? Any TCG magicians around? :)

Alex
Paul Brook Nov. 18, 2009, 3:33 p.m. UTC | #11
> > That's an option, basically keeping the list (or only one ?) of aliased
> > TCG variables for each of them, and freeing the others before using one.
> 
> Yeah, only one. I don't think this should be getting overengineered (and
> thus slow) :-).
> Doesn't really sound hard, does it? Any TCG magicians around? :)

Maybe. This is the sort of thing that tends to get harder the more you think 
about it.

Paul
diff mbox

Patch

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index faf2e8b..c1b4710 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -316,6 +316,18 @@  static inline void tcg_gen_br(int label)
     tcg_gen_op1i(INDEX_op_br, label);
 }
 
+static inline void tcg_gen_sync_i32(TCGv_i32 arg)
+{
+    tcg_gen_op1_i32(INDEX_op_sync_i32, arg);
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static inline void tcg_gen_sync_i64(TCGv_i64 arg)
+{
+    tcg_gen_op1_i64(INDEX_op_sync_i64, arg);
+}
+#endif
+
 static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
 {
     if (!TCGV_EQUAL_I32(ret, arg))
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index b7f3fd7..5dcdeba 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -40,6 +40,7 @@  DEF2(call, 0, 1, 2, TCG_OPF_SIDE_EFFECTS) /* variable number of parameters */
 DEF2(jmp, 0, 1, 0, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 DEF2(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 
+DEF2(sync_i32, 0, 1, 0, 0)
 DEF2(mov_i32, 1, 1, 0, 0)
 DEF2(movi_i32, 1, 0, 1, 0)
 /* load/store */
@@ -109,6 +110,7 @@  DEF2(neg_i32, 1, 1, 0, 0)
 #endif
 
 #if TCG_TARGET_REG_BITS == 64
+DEF2(sync_i64, 0, 1, 0, 0)
 DEF2(mov_i64, 1, 1, 0, 0)
 DEF2(movi_i64, 1, 0, 1, 0)
 /* load/store */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3c0e296..8eb60f8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1930,6 +1930,12 @@  static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
         //        dump_regs(s);
 #endif
         switch(opc) {
+        case INDEX_op_sync_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_sync_i64:
+#endif
+            temp_save(s, args[0], s->reserved_regs);
+            break;
         case INDEX_op_mov_i32:
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_mov_i64: