diff mbox

[v14,30/33] target-tilegx: Handle atomic instructions

Message ID 1440433079-14458-31-git-send-email-rth@twiddle.net
State New
Headers show

Commit Message

Richard Henderson Aug. 24, 2015, 4:17 p.m. UTC
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tilegx/translate.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

Comments

Richard Henderson Aug. 25, 2015, 4:15 a.m. UTC | #1
On 08/24/2015 09:17 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>   target-tilegx/translate.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
> index 210e912..2a0798a 100644
> --- a/target-tilegx/translate.c
> +++ b/target-tilegx/translate.c
> @@ -180,6 +180,19 @@ static void gen_saturate_op(TCGv tdest, TCGv tsrca, TCGv tsrcb,
>       tcg_temp_free(t0);
>   }
>
> +static void gen_atomic_excp(DisasContext *dc, unsigned dest, unsigned srca,
> +                            unsigned srcb, TileExcp excp)
> +{
> +#ifdef CONFIG_USER_ONLY
> +    TCGv_i32 t = tcg_const_i32((dest << 16) | (srca << 8) | srcb);
> +    tcg_gen_st_i32(t, cpu_env, offsetof(CPUTLGState, excparam));
> +    tcg_temp_free_i32(t);
> +    gen_exception(dc, excp);
> +#else
> +    gen_exception(dc, TILEGX_EXCP_OPCODE_UNIMPLEMENTED);
> +#endif
> +}


This is broken.  While it does work well enough for Hello World, implementing a 
non-trap instruction with an exception is extremely dicey for TileGX.  The 
issue is that TileGX bundles operate atomically, with no RAW issues between the 
instructions of the bundle.

Consider a bundle like

    { add r0, r0, r1 ; exch r2, r0, r3 }

In Chen's implementation, the writeback to r0 would occur before the exception, 
and so the exch would happen to the wrong address.  In my implementation here, 
the exception would occur before the writeback, and so the result of the add 
would be discarded.

While retaining the current start_exclusive scheme, it would appear that we 
need to store more data here -- to wit, the contents of the registers not their 
numbers -- and delay the exception until after writeback.

Even better, of course, would be to not exit the TB at all, and use host atomic 
instructions...  I suppose that multi-threading patch set is still in limbo?


r~
Chen Gang Aug. 25, 2015, 1:11 p.m. UTC | #2
On 8/25/15 12:15, Richard Henderson wrote:
> On 08/24/2015 09:17 AM, Richard Henderson wrote:
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>> target-tilegx/translate.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
>> index 210e912..2a0798a 100644
>> --- a/target-tilegx/translate.c
>> +++ b/target-tilegx/translate.c
>> @@ -180,6 +180,19 @@ static void gen_saturate_op(TCGv tdest, TCGv tsrca, TCGv tsrcb,
>> tcg_temp_free(t0);
>> }
>>
>> +static void gen_atomic_excp(DisasContext *dc, unsigned dest, unsigned srca,
>> + unsigned srcb, TileExcp excp)
>> +{
>> +#ifdef CONFIG_USER_ONLY
>> + TCGv_i32 t = tcg_const_i32((dest << 16) | (srca << 8) | srcb);
>> + tcg_gen_st_i32(t, cpu_env, offsetof(CPUTLGState, excparam));
>> + tcg_temp_free_i32(t);
>> + gen_exception(dc, excp);
>> +#else
>> + gen_exception(dc, TILEGX_EXCP_OPCODE_UNIMPLEMENTED);
>> +#endif
>> +}

Originally, I used set_exception(), not gen_exception().

>
>
> This is broken. While it does work well enough for Hello World, implementing a non-trap instruction with an exception is extremely dicey for TileGX. The issue is that TileGX bundles operate atomically, with no RAW issues between the instructions of the bundle.
>
> Consider a bundle like
>
> { add r0, r0, r1 ; exch r2, r0, r3 }
>
> In Chen's implementation, the writeback to r0 would occur before the exception, and so the exch would happen to the wrong address. In my implementation here, the exception would occur before the writeback, and so the result of the add would be discarded.

We use tmp regs for buffering the r0.

- calculate x1 pipe, and save result to r0 tmp reg.

- exch the original r0 and r3 to r2 tmp reg.

- set exception flag (which will cause exception, later).

- save the result tmp regs to r0 or r2.

- gen exception.

>
> While retaining the current start_exclusive scheme, it would appear that we need to store more data here -- to wit, the contents of the registers not their numbers -- and delay the exception until after writeback.
>
> Even better, of course, would be to not exit the TB at all, and use host atomic instructions... I suppose that multi-threading patch set is still in limbo?
>

I guess, we need not.


Thanks.
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
Chen Gang Aug. 25, 2015, 1:12 p.m. UTC | #3
----------------------------------------
> From: xili_gchen_5257@hotmail.com
> To: rth@twiddle.net; qemu-devel@nongnu.org
> CC: walt@tilera.com; cmetcalf@ezchip.com; peter.maydell@linaro.org
> Subject: Re: [Qemu-devel] [PATCH v14 30/33] target-tilegx: Handle atomic instructions
> Date: Tue, 25 Aug 2015 21:11:11 +0800
>
> On 8/25/15 12:15, Richard Henderson wrote:
>> On 08/24/2015 09:17 AM, Richard Henderson wrote:
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>> ---
>>> target-tilegx/translate.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 49 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
>>> index 210e912..2a0798a 100644
>>> --- a/target-tilegx/translate.c
>>> +++ b/target-tilegx/translate.c
>>> @@ -180,6 +180,19 @@ static void gen_saturate_op(TCGv tdest, TCGv tsrca, TCGv tsrcb,
>>> tcg_temp_free(t0);
>>> }
>>>
>>> +static void gen_atomic_excp(DisasContext *dc, unsigned dest, unsigned srca,
>>> + unsigned srcb, TileExcp excp)
>>> +{
>>> +#ifdef CONFIG_USER_ONLY
>>> + TCGv_i32 t = tcg_const_i32((dest << 16) | (srca << 8) | srcb);
>>> + tcg_gen_st_i32(t, cpu_env, offsetof(CPUTLGState, excparam));
>>> + tcg_temp_free_i32(t);
>>> + gen_exception(dc, excp);
>>> +#else
>>> + gen_exception(dc, TILEGX_EXCP_OPCODE_UNIMPLEMENTED);
>>> +#endif
>>> +}
>
> Originally, I used set_exception(), not gen_exception().
>
>>
>>
>> This is broken. While it does work well enough for Hello World, implementing a non-trap instruction with an exception is extremely dicey for TileGX. The issue is that TileGX bundles operate atomically, with no RAW issues between the instructions of the bundle.
>>
>> Consider a bundle like
>>
>> { add r0, r0, r1 ; exch r2, r0, r3 }
>>
>> In Chen's implementation, the writeback to r0 would occur before the exception, and so the exch would happen to the wrong address. In my implementation here, the exception would occur before the writeback, and so the result of the add would be discarded.
>
> We use tmp regs for buffering the r0.
>
> - calculate x1 pipe, and save result to r0 tmp reg.
>

Oh, typo, calculate x0 pipe, and save result to r0 tmp reg.

> - exch the original r0 and r3 to r2 tmp reg.
>
> - set exception flag (which will cause exception, later).
>
> - save the result tmp regs to r0 or r2.
>
> - gen exception.
>
>>
>> While retaining the current start_exclusive scheme, it would appear that we need to store more data here -- to wit, the contents of the registers not their numbers -- and delay the exception until after writeback.
>>
>> Even better, of course, would be to not exit the TB at all, and use host atomic instructions... I suppose that multi-threading patch set is still in limbo?
>>
>
> I guess, we need not.
>
>
> Thanks.
> --
> Chen Gang
>
> Open, share, and attitude like air, water, and life which God blessed
>
Richard Henderson Aug. 25, 2015, 2:28 p.m. UTC | #4
On 08/25/2015 06:12 AM, Chen Gang wrote:
>
>
> ----------------------------------------
>> From: xili_gchen_5257@hotmail.com
>> To: rth@twiddle.net; qemu-devel@nongnu.org
>> CC: walt@tilera.com; cmetcalf@ezchip.com; peter.maydell@linaro.org
>> Subject: Re: [Qemu-devel] [PATCH v14 30/33] target-tilegx: Handle atomic instructions
>> Date: Tue, 25 Aug 2015 21:11:11 +0800
>>
>> On 8/25/15 12:15, Richard Henderson wrote:
>>> On 08/24/2015 09:17 AM, Richard Henderson wrote:
>>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>>> ---
>>>> target-tilegx/translate.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 49 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
>>>> index 210e912..2a0798a 100644
>>>> --- a/target-tilegx/translate.c
>>>> +++ b/target-tilegx/translate.c
>>>> @@ -180,6 +180,19 @@ static void gen_saturate_op(TCGv tdest, TCGv tsrca, TCGv tsrcb,
>>>> tcg_temp_free(t0);
>>>> }
>>>>
>>>> +static void gen_atomic_excp(DisasContext *dc, unsigned dest, unsigned srca,
>>>> + unsigned srcb, TileExcp excp)
>>>> +{
>>>> +#ifdef CONFIG_USER_ONLY
>>>> + TCGv_i32 t = tcg_const_i32((dest << 16) | (srca << 8) | srcb);
>>>> + tcg_gen_st_i32(t, cpu_env, offsetof(CPUTLGState, excparam));
>>>> + tcg_temp_free_i32(t);
>>>> + gen_exception(dc, excp);
>>>> +#else
>>>> + gen_exception(dc, TILEGX_EXCP_OPCODE_UNIMPLEMENTED);
>>>> +#endif
>>>> +}
>>
>> Originally, I used set_exception(), not gen_exception().
>>
>>>
>>>
>>> This is broken. While it does work well enough for Hello World, implementing a non-trap instruction with an exception is extremely dicey for TileGX. The issue is that TileGX bundles operate atomically, with no RAW issues between the instructions of the bundle.
>>>
>>> Consider a bundle like
>>>
>>> { add r0, r0, r1 ; exch r2, r0, r3 }
>>>
>>> In Chen's implementation, the writeback to r0 would occur before the exception, and so the exch would happen to the wrong address. In my implementation here, the exception would occur before the writeback, and so the result of the add would be discarded.
>>
>> We use tmp regs for buffering the r0.
>>
>> - calculate x1 pipe, and save result to r0 tmp reg.
>>
>
> Oh, typo, calculate x0 pipe, and save result to r0 tmp reg.
>
>> - exch the original r0 and r3 to r2 tmp reg.
>>
>> - set exception flag (which will cause exception, later).
>>
>> - save the result tmp regs to r0 or r2.
>>
>> - gen exception.

Exactly.  Now re-read what I wrote and see if you can spot the problem with this.


r~
Chen Gang Aug. 25, 2015, 9:45 p.m. UTC | #5
On 8/25/15 22:28, Richard Henderson wrote:
> On 08/25/2015 06:12 AM, Chen Gang wrote:
>>>>
>>>> Consider a bundle like
>>>>
>>>> { add r0, r0, r1 ; exch r2, r0, r3 }
>>>>
>>>> In Chen's implementation, the writeback to r0 would occur before the exception, and so the exch would happen to the wrong address. In my implementation here, the exception would occur before the writeback, and so the result of the add would be discarded.
>>>
>>> We use tmp regs for buffering the r0.
>>>
>>> - calculate x1 pipe, and save result to r0 tmp reg.
>>>
>>
>> Oh, typo, calculate x0 pipe, and save result to r0 tmp reg.
>>
>>> - exch the original r0 and r3 to r2 tmp reg.
>>>
>>> - set exception flag (which will cause exception, later).
>>>
>>> - save the result tmp regs to r0 or r2.
>>>
>>> - gen exception.
>
> Exactly. Now re-read what I wrote and see if you can spot the problem with this.
>

OK, thanks. In my memory, originally, we discussed about related things,
you provided several good ideas (temp regs, and gen exceptions ...).
Based on them, I implemented it in this way, hope it is correct.

At present, for gcc testsuite, I met several (10+) pending issues, which
need to be fixed, next. So I guess, there must be still any issues which
we did not find in the current whole tilegx code.


Thanks.
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
diff mbox

Patch

diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
index 210e912..2a0798a 100644
--- a/target-tilegx/translate.c
+++ b/target-tilegx/translate.c
@@ -180,6 +180,19 @@  static void gen_saturate_op(TCGv tdest, TCGv tsrca, TCGv tsrcb,
     tcg_temp_free(t0);
 }
 
+static void gen_atomic_excp(DisasContext *dc, unsigned dest, unsigned srca,
+                            unsigned srcb, TileExcp excp)
+{
+#ifdef CONFIG_USER_ONLY
+    TCGv_i32 t = tcg_const_i32((dest << 16) | (srca << 8) | srcb);
+    tcg_gen_st_i32(t, cpu_env, offsetof(CPUTLGState, excparam));
+    tcg_temp_free_i32(t);
+    gen_exception(dc, excp);
+#else
+    gen_exception(dc, TILEGX_EXCP_OPCODE_UNIMPLEMENTED);
+#endif
+}
+
 static void gen_dblaligni(TCGv tdest, TCGv tsrca, TCGv tsrcb, int shr)
 {
     TCGv t0 = tcg_temp_new();
@@ -595,8 +608,13 @@  static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
         mnemonic = "cmpeq";
         break;
     case OE_RRR(CMPEXCH4, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_CMPEXCH4);
+        mnemonic = "cmpexch4";
+        break;
     case OE_RRR(CMPEXCH, 0, X1):
-        return TILEGX_EXCP_OPCODE_UNIMPLEMENTED;
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_CMPEXCH);
+        mnemonic = "cmpexch";
+        break;
     case OE_RRR(CMPLES, 0, X0):
     case OE_RRR(CMPLES, 0, X1):
     case OE_RRR(CMPLES, 2, Y0):
@@ -662,7 +680,13 @@  static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
         mnemonic = "dblalign";
         break;
     case OE_RRR(EXCH4, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_EXCH4);
+        mnemonic = "exch4";
+        break;
     case OE_RRR(EXCH, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_EXCH);
+        mnemonic = "exch";
+        break;
     case OE_RRR(FDOUBLE_ADDSUB, 0, X0):
     case OE_RRR(FDOUBLE_ADD_FLAGS, 0, X0):
     case OE_RRR(FDOUBLE_MUL_FLAGS, 0, X0):
@@ -672,13 +696,37 @@  static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
     case OE_RRR(FDOUBLE_UNPACK_MAX, 0, X0):
     case OE_RRR(FDOUBLE_UNPACK_MIN, 0, X0):
     case OE_RRR(FETCHADD4, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHADD4);
+        mnemonic = "fetchadd4";
+        break;
     case OE_RRR(FETCHADDGEZ4, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHADDGEZ4);
+        mnemonic = "fetchaddgez4";
+        break;
     case OE_RRR(FETCHADDGEZ, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHADDGEZ);
+        mnemonic = "fetchaddgez";
+        break;
     case OE_RRR(FETCHADD, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHADD);
+        mnemonic = "fetchadd";
+        break;
     case OE_RRR(FETCHAND4, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHAND4);
+        mnemonic = "fetchand4";
+        break;
     case OE_RRR(FETCHAND, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHAND);
+        mnemonic = "fetchand";
+        break;
     case OE_RRR(FETCHOR4, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHOR4);
+        mnemonic = "fetchor4";
+        break;
     case OE_RRR(FETCHOR, 0, X1):
+        gen_atomic_excp(dc, dest, srca, srcb, TILEGX_EXCP_OPCODE_FETCHOR);
+        mnemonic = "fetchor";
+        break;
     case OE_RRR(FSINGLE_ADD1, 0, X0):
     case OE_RRR(FSINGLE_ADDSUB2, 0, X0):
     case OE_RRR(FSINGLE_MUL1, 0, X0):