diff mbox

[01/12] target-s390x: Fix wrong argument in call of tcg_gen_shl_i64()

Message ID 1306355129-23447-2-git-send-email-weil@mail.berlios.de
State Superseded
Headers show

Commit Message

Stefan Weil May 25, 2011, 8:25 p.m. UTC
tcg_gen_shl_i64 needs an argument of type TCGv_i64.
Using tmp4 needs some additional changes.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 target-s390x/translate.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

Comments

Alexander Graf May 25, 2011, 10:15 p.m. UTC | #1
On 25.05.2011, at 22:25, Stefan Weil wrote:

> tcg_gen_shl_i64 needs an argument of type TCGv_i64.
> Using tmp4 needs some additional changes.
> 
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
> target-s390x/translate.c |    8 +++++---
> 1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
> index 8e71df3..3614516 100644
> --- a/target-s390x/translate.c
> +++ b/target-s390x/translate.c
> @@ -2056,7 +2056,6 @@ do_mh:
>            even for very long ones... */
>         tmp = get_address(s, 0, b2, d2);
>         tmp3 = tcg_const_i64(stm_len);
> -        tmp4 = tcg_const_i64(32);
>         for (i = r1;; i = (i + 1) % 16) {
>             switch (op) {
>             case 0x4:
> @@ -2070,7 +2069,9 @@ do_mh:
> #else
>                 tmp2 = tcg_temp_new_i64();
>                 tcg_gen_qemu_ld32u(tmp2, tmp, get_mem_index(s));
> -                tcg_gen_shl_i64(tmp2, tmp2, 4);
> +                tmp4 = tcg_const_i64(4);
> +                tcg_gen_shl_i64(tmp2, tmp2, tmp4);
> +                tcg_temp_free_i64(tmp4);
>                 tcg_gen_ext32u_i64(regs[i], regs[i]);
>                 tcg_gen_or_i64(regs[i], regs[i], tmp2);
> #endif
> @@ -2081,7 +2082,9 @@ do_mh:
>                 break;
>             case 0x26:
>                 tmp2 = tcg_temp_new_i64();
> +                tmp4 = tcg_const_i64(32);

This moves the const inside the loop, which is exactly what I was trying to avoid here. The problem is that every new const generated here issues 1 additional tcg op, which really sums up when there's too many of them. I've had the buffer exceed here plenty of times.


Alex
Stefan Weil May 26, 2011, 4:32 a.m. UTC | #2
Am 26.05.2011 00:15, schrieb Alexander Graf:
> On 25.05.2011, at 22:25, Stefan Weil wrote:
>
>> tcg_gen_shl_i64 needs an argument of type TCGv_i64.
>> Using tmp4 needs some additional changes.
>>
>> Signed-off-by: Stefan Weil<weil@mail.berlios.de>
>> ---
>> target-s390x/translate.c |    8 +++++---
>> 1 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
>> index 8e71df3..3614516 100644
>> --- a/target-s390x/translate.c
>> +++ b/target-s390x/translate.c
>> @@ -2056,7 +2056,6 @@ do_mh:
>>             even for very long ones... */
>>          tmp = get_address(s, 0, b2, d2);
>>          tmp3 = tcg_const_i64(stm_len);
>> -        tmp4 = tcg_const_i64(32);
>>          for (i = r1;; i = (i + 1) % 16) {
>>              switch (op) {
>>              case 0x4:
>> @@ -2070,7 +2069,9 @@ do_mh:
>> #else
>>                  tmp2 = tcg_temp_new_i64();
>>                  tcg_gen_qemu_ld32u(tmp2, tmp, get_mem_index(s));
>> -                tcg_gen_shl_i64(tmp2, tmp2, 4);
>> +                tmp4 = tcg_const_i64(4);
>> +                tcg_gen_shl_i64(tmp2, tmp2, tmp4);
>> +                tcg_temp_free_i64(tmp4);
>>                  tcg_gen_ext32u_i64(regs[i], regs[i]);
>>                  tcg_gen_or_i64(regs[i], regs[i], tmp2);
>> #endif
>> @@ -2081,7 +2082,9 @@ do_mh:
>>                  break;
>>              case 0x26:
>>                  tmp2 = tcg_temp_new_i64();
>> +                tmp4 = tcg_const_i64(32);
> This moves the const inside the loop, which is exactly what I was trying to avoid here. The problem is that every new const generated here issues 1 additional tcg op, which really sums up when there's too many of them. I've had the buffer exceed here plenty of times.
>
> Alex

I noticed that, too. But adding a tmp5 and generating two const outside 
the loop
of whom only one or even none is used is also a bad solution.

What about moving the loop inside the switch statement (one loop for 
every case)?
This does not look nice but looks like the best solution here.

If you prefer a different solution, just omit this part of my patch series.

Grüße, Stefan
Stefan Weil May 26, 2011, 4:56 a.m. UTC | #3
Am 26.05.2011 06:32, schrieb Stefan Weil:
> Am 26.05.2011 00:15, schrieb Alexander Graf:
>> On 25.05.2011, at 22:25, Stefan Weil wrote:
>>
>>> tcg_gen_shl_i64 needs an argument of type TCGv_i64.
>>> Using tmp4 needs some additional changes.
>>>
>>> Signed-off-by: Stefan Weil<weil@mail.berlios.de>
>>> ---
>>> target-s390x/translate.c |    8 +++++---
>>> 1 files changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
>>> index 8e71df3..3614516 100644
>>> --- a/target-s390x/translate.c
>>> +++ b/target-s390x/translate.c
>>> @@ -2056,7 +2056,6 @@ do_mh:
>>>             even for very long ones... */
>>>          tmp = get_address(s, 0, b2, d2);
>>>          tmp3 = tcg_const_i64(stm_len);
>>> -        tmp4 = tcg_const_i64(32);
>>>          for (i = r1;; i = (i + 1) % 16) {
>>>              switch (op) {
>>>              case 0x4:
>>> @@ -2070,7 +2069,9 @@ do_mh:
>>> #else
>>>                  tmp2 = tcg_temp_new_i64();
>>>                  tcg_gen_qemu_ld32u(tmp2, tmp, get_mem_index(s));
>>> -                tcg_gen_shl_i64(tmp2, tmp2, 4);
>>> +                tmp4 = tcg_const_i64(4);
>>> +                tcg_gen_shl_i64(tmp2, tmp2, tmp4);
>>> +                tcg_temp_free_i64(tmp4);
>>>                  tcg_gen_ext32u_i64(regs[i], regs[i]);
>>>                  tcg_gen_or_i64(regs[i], regs[i], tmp2);
>>> #endif
>>> @@ -2081,7 +2082,9 @@ do_mh:
>>>                  break;
>>>              case 0x26:
>>>                  tmp2 = tcg_temp_new_i64();
>>> +                tmp4 = tcg_const_i64(32);
>> This moves the const inside the loop, which is exactly what I was 
>> trying to avoid here. The problem is that every new const generated 
>> here issues 1 additional tcg op, which really sums up when there's 
>> too many of them. I've had the buffer exceed here plenty of times.
>>
>> Alex
>
> I noticed that, too. But adding a tmp5 and generating two const 
> outside the loop
> of whom only one or even none is used is also a bad solution.
>
> What about moving the loop inside the switch statement (one loop for 
> every case)?
> This does not look nice but looks like the best solution here.
>
> If you prefer a different solution, just omit this part of my patch 
> series.
>
> Grüße, Stefan

Would it be possible to use tcg_gen_shli_64 / tcg_gen_shri_64 and remove
tmp4 completely?

Stefan
Alexander Graf May 26, 2011, 7:26 a.m. UTC | #4
On 26.05.2011, at 06:56, Stefan Weil wrote:

> Am 26.05.2011 06:32, schrieb Stefan Weil:
>> Am 26.05.2011 00:15, schrieb Alexander Graf:
>>> On 25.05.2011, at 22:25, Stefan Weil wrote:
>>> 
>>>> tcg_gen_shl_i64 needs an argument of type TCGv_i64.
>>>> Using tmp4 needs some additional changes.
>>>> 
>>>> Signed-off-by: Stefan Weil<weil@mail.berlios.de>
>>>> ---
>>>> target-s390x/translate.c |    8 +++++---
>>>> 1 files changed, 5 insertions(+), 3 deletions(-)
>>>> 
>>>> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
>>>> index 8e71df3..3614516 100644
>>>> --- a/target-s390x/translate.c
>>>> +++ b/target-s390x/translate.c
>>>> @@ -2056,7 +2056,6 @@ do_mh:
>>>>            even for very long ones... */
>>>>         tmp = get_address(s, 0, b2, d2);
>>>>         tmp3 = tcg_const_i64(stm_len);
>>>> -        tmp4 = tcg_const_i64(32);
>>>>         for (i = r1;; i = (i + 1) % 16) {
>>>>             switch (op) {
>>>>             case 0x4:
>>>> @@ -2070,7 +2069,9 @@ do_mh:
>>>> #else
>>>>                 tmp2 = tcg_temp_new_i64();
>>>>                 tcg_gen_qemu_ld32u(tmp2, tmp, get_mem_index(s));
>>>> -                tcg_gen_shl_i64(tmp2, tmp2, 4);
>>>> +                tmp4 = tcg_const_i64(4);
>>>> +                tcg_gen_shl_i64(tmp2, tmp2, tmp4);
>>>> +                tcg_temp_free_i64(tmp4);
>>>>                 tcg_gen_ext32u_i64(regs[i], regs[i]);
>>>>                 tcg_gen_or_i64(regs[i], regs[i], tmp2);
>>>> #endif
>>>> @@ -2081,7 +2082,9 @@ do_mh:
>>>>                 break;
>>>>             case 0x26:
>>>>                 tmp2 = tcg_temp_new_i64();
>>>> +                tmp4 = tcg_const_i64(32);
>>> This moves the const inside the loop, which is exactly what I was trying to avoid here. The problem is that every new const generated here issues 1 additional tcg op, which really sums up when there's too many of them. I've had the buffer exceed here plenty of times.
>>> 
>>> Alex
>> 
>> I noticed that, too. But adding a tmp5 and generating two const outside the loop
>> of whom only one or even none is used is also a bad solution.
>> 
>> What about moving the loop inside the switch statement (one loop for every case)?
>> This does not look nice but looks like the best solution here.
>> 
>> If you prefer a different solution, just omit this part of my patch series.
>> 
>> Grüße, Stefan
> 
> Would it be possible to use tcg_gen_shli_64 / tcg_gen_shri_64 and remove
> tmp4 completely?

That's what I had originally. It basically does exactly the same as the code you have patched it to :). Any immediate tcg op internally just generates a const tcg var and calls the generic one.


Alex
diff mbox

Patch

diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index 8e71df3..3614516 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -2056,7 +2056,6 @@  do_mh:
            even for very long ones... */
         tmp = get_address(s, 0, b2, d2);
         tmp3 = tcg_const_i64(stm_len);
-        tmp4 = tcg_const_i64(32);
         for (i = r1;; i = (i + 1) % 16) {
             switch (op) {
             case 0x4:
@@ -2070,7 +2069,9 @@  do_mh:
 #else
                 tmp2 = tcg_temp_new_i64();
                 tcg_gen_qemu_ld32u(tmp2, tmp, get_mem_index(s));
-                tcg_gen_shl_i64(tmp2, tmp2, 4);
+                tmp4 = tcg_const_i64(4);
+                tcg_gen_shl_i64(tmp2, tmp2, tmp4);
+                tcg_temp_free_i64(tmp4);
                 tcg_gen_ext32u_i64(regs[i], regs[i]);
                 tcg_gen_or_i64(regs[i], regs[i], tmp2);
 #endif
@@ -2081,7 +2082,9 @@  do_mh:
                 break;
             case 0x26:
                 tmp2 = tcg_temp_new_i64();
+                tmp4 = tcg_const_i64(32);
                 tcg_gen_shr_i64(tmp2, regs[i], tmp4);
+                tcg_temp_free_i64(tmp4);
                 tcg_gen_qemu_st32(tmp2, tmp, get_mem_index(s));
                 tcg_temp_free_i64(tmp2);
                 break;
@@ -2094,7 +2097,6 @@  do_mh:
             tcg_gen_add_i64(tmp, tmp, tmp3);
         }
         tcg_temp_free_i64(tmp);
-        tcg_temp_free_i64(tmp4);
         break;
     case 0x2c: /* STCMH R1,M3,D2(B2) [RSY] */
         tmp = get_address(s, 0, b2, d2);