diff mbox

Counting barrier instructions in ARM

Message ID 543F232A.6080104@gmail.com
State New
Headers show

Commit Message

Pranith Kumar Oct. 16, 2014, 1:45 a.m. UTC
Hello,

I am trying to count the number of barrier instructions (dmb) which are being
executed in an multi-threaded ARM executable. I am running the executable using
qemu user mode with the following patch applied.

Basically I created two counters in the ARM cpu state and incrementing them by
generating a TCG instruction whenever a barrier instruction is translated. I am
doing something similar even for counting the total instructions executed.

The problem I am facing is that this seems to be crashing when run with a
multi-threaded executable. Also the statistics gathered are not really accurate.

Is there something obviously wrong with what I am trying to do? Any help is
highly appreciated.

Thanks!


---
 linux-user/main.c      | 10 +++++++++-
 target-arm/cpu.h       |  2 ++
 target-arm/translate.c |  9 +++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

Comments

Max Filippov Oct. 16, 2014, 1:54 a.m. UTC | #1
Hi,

On Thu, Oct 16, 2014 at 5:45 AM, Pranith Kumar <bobby.prani@gmail.com> wrote:
> Is there something obviously wrong with what I am trying to do? Any help is
> highly appreciated.
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -7568,6 +7574,7 @@ static void disas_arm_insn(CPUARMState * env, DisasContext *s)
>              case 5: /* dmb */
>              case 6: /* isb */
>                  ARCH(7);
> +        tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);

tcg_gen_addi_i32, as you're adding an immediate.
Enabling TCG debug would catch such errors.
Not sure if that will fix crashing, but it should improve accuracy (:

>                  /* We don't emulate caches so these are a no-op.  */
>                  return;
>              default:
> @@ -9740,6 +9747,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
>                          case 4: /* dsb */
>                          case 5: /* dmb */
>                          case 6: /* isb */
> +                tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);

Here as well.

>                              /* These execute as NOPs.  */
>                              break;
>                          default:
> @@ -11022,6 +11030,7 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>              tcg_gen_debug_insn_start(dc->pc);
>          }
>
> +    tcg_gen_add_i32(cpu_insn_count, cpu_insn_count, 1);

And here.
Pranith Kumar Oct. 16, 2014, 2 a.m. UTC | #2
Hi Max,

On Wed, Oct 15, 2014 at 9:54 PM, Max Filippov <jcmvbkbc@gmail.com> wrote:
> Hi,
>
> On Thu, Oct 16, 2014 at 5:45 AM, Pranith Kumar <bobby.prani@gmail.com> wrote:
>> Is there something obviously wrong with what I am trying to do? Any help is
>> highly appreciated.
>> --- a/target-arm/translate.c
>> +++ b/target-arm/translate.c
>> @@ -7568,6 +7574,7 @@ static void disas_arm_insn(CPUARMState * env, DisasContext *s)
>>              case 5: /* dmb */
>>              case 6: /* isb */
>>                  ARCH(7);
>> +        tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);
>
> tcg_gen_addi_i32, as you're adding an immediate.
> Enabling TCG debug would catch such errors.
> Not sure if that will fix crashing, but it should improve accuracy (:

I had that before the change to tcg_gen_add_i32 and the crashes were
still there. But yes, this needs to be changed to use addi.

How do I enable the TCG debug build?

Thanks!

>
>>                  /* We don't emulate caches so these are a no-op.  */
>>                  return;
>>              default:
>> @@ -9740,6 +9747,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
>>                          case 4: /* dsb */
>>                          case 5: /* dmb */
>>                          case 6: /* isb */
>> +                tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);
>
> Here as well.
>
>>                              /* These execute as NOPs.  */
>>                              break;
>>                          default:
>> @@ -11022,6 +11030,7 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>>              tcg_gen_debug_insn_start(dc->pc);
>>          }
>>
>> +    tcg_gen_add_i32(cpu_insn_count, cpu_insn_count, 1);
>
> And here.
>
> --
> Thanks.
> -- Max
Max Filippov Oct. 16, 2014, 2:12 a.m. UTC | #3
On Thu, Oct 16, 2014 at 6:00 AM, Pranith Kumar <bobby.prani@gmail.com> wrote:
>>> +        tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);
>>
>> tcg_gen_addi_i32, as you're adding an immediate.
>> Enabling TCG debug would catch such errors.
>> Not sure if that will fix crashing, but it should improve accuracy (:
>
> I had that before the change to tcg_gen_add_i32 and the crashes were
> still there. But yes, this needs to be changed to use addi.

Got backtraces of your crashes?

> How do I enable the TCG debug build?

configure with --enable-debug-tcg swithch.
Peter Maydell Oct. 16, 2014, 8:05 a.m. UTC | #4
On 16 October 2014 03:45, Pranith Kumar <bobby.prani@gmail.com> wrote:
> The problem I am facing is that this seems to be crashing when run with a
> multi-threaded executable.

This is nothing to do with your changes -- user-mode QEMU does not
support multi-threaded guest executables. QEMU may crash, hang,
or stop with an assertion failure, fairly randomly. Don't try
to run multithreaded guests :-)

> Also the statistics gathered are not really accurate.

This will be because you're using add rather than addi,
so you're adding effectively a random number to the count
every time (whatever TCG's "value in temporary 1" happens
to be, I expect).

thanks
-- PMM
Pranith Kumar Oct. 16, 2014, 4:01 p.m. UTC | #5
Hi Peter,

On Thu, Oct 16, 2014 at 4:05 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 16 October 2014 03:45, Pranith Kumar <bobby.prani@gmail.com> wrote:
>> The problem I am facing is that this seems to be crashing when run with a
>> multi-threaded executable.
>
> This is nothing to do with your changes -- user-mode QEMU does not
> support multi-threaded guest executables. QEMU may crash, hang,
> or stop with an assertion failure, fairly randomly. Don't try
> to run multithreaded guests :-)

OK, I will try to gather the statistics in system mode. Is there any
way to indicate from within the system to  qemu to start collecting the stats?

I dont want to collect the stats for bootup and other unrelated code paths.

>
>> Also the statistics gathered are not really accurate.
>
> This will be because you're using add rather than addi,
> so you're adding effectively a random number to the count
> every time (whatever TCG's "value in temporary 1" happens
> to be, I expect).

OK, I updated this when Max Filippov  pointed it out. Thanks!
Pranith Kumar Oct. 16, 2014, 4:05 p.m. UTC | #6
On Wed, Oct 15, 2014 at 10:12 PM, Max Filippov <jcmvbkbc@gmail.com> wrote:
> On Thu, Oct 16, 2014 at 6:00 AM, Pranith Kumar <bobby.prani@gmail.com> wrote:
>>>> +        tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);
>>>
>>> tcg_gen_addi_i32, as you're adding an immediate.
>>> Enabling TCG debug would catch such errors.
>>> Not sure if that will fix crashing, but it should improve accuracy (:
>>
>> I had that before the change to tcg_gen_add_i32 and the crashes were
>> still there. But yes, this needs to be changed to use addi.
>
> Got backtraces of your crashes?
>

These bugs seem to be Heisenbugs. As Peter Maydell pointed out,
apparently multi-threaded user mode execution is not supported.

I will try to use system mode.

Thanks!
Alex Bennée Oct. 17, 2014, 10:43 a.m. UTC | #7
Pranith Kumar <bobby.prani@gmail.com> writes:

> Hi Peter,
>
> On Thu, Oct 16, 2014 at 4:05 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 16 October 2014 03:45, Pranith Kumar <bobby.prani@gmail.com> wrote:
>>> The problem I am facing is that this seems to be crashing when run with a
>>> multi-threaded executable.
>>
>> This is nothing to do with your changes -- user-mode QEMU does not
>> support multi-threaded guest executables. QEMU may crash, hang,
>> or stop with an assertion failure, fairly randomly. Don't try
>> to run multithreaded guests :-)
>
> OK, I will try to gather the statistics in system mode. Is there any
> way to indicate from within the system to  qemu to start collecting the stats?
>
> I dont want to collect the stats for bootup and other unrelated code
> paths.

You could enable CONTEXTIDR in the kernel and then make your counts
against an array indexed by it. I've used that technique before to
profile guest hot-blocks and work out which guest PID was responsible.
diff mbox

Patch

diff --git a/linux-user/main.c b/linux-user/main.c
index b453a39..7984027 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -816,7 +816,15 @@  void cpu_loop(CPUARMState *env)
                             break;
                         }
                     } else {
-                        env->regs[0] = do_syscall(env,
+            // print stats on exit
+            if (n == 248) {
+                FILE *icount_file = fopen("inscount.out", "w");
+                unsigned long total  = (unsigned long)env->fence_count;
+                unsigned long icount = (unsigned long)env->insn_count;
+                fprintf(icount_file, "%lu, %lu, %f\n", icount, total, total * 1000.0/icount);
+                fclose(icount_file);
+            }
+            env->regs[0] = do_syscall(env,
                                                   n,
                                                   env->regs[0],
                                                   env->regs[1],
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 369d472..be38574 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -304,6 +304,8 @@  typedef struct CPUARMState {
     uint64_t exclusive_test;
     uint32_t exclusive_info;
 #endif
+    uint32_t fence_count;
+    uint32_t insn_count;
 
     /* iwMMXt coprocessor state.  */
     struct {
diff --git a/target-arm/translate.c b/target-arm/translate.c
index cf4e767..4d4ceb1 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -68,6 +68,8 @@  static TCGv_i64 cpu_exclusive_val;
 static TCGv_i64 cpu_exclusive_test;
 static TCGv_i32 cpu_exclusive_info;
 #endif
+static TCGv_i32 cpu_fence_count;
+static TCGv_i32 cpu_insn_count;
 
 /* FIXME:  These should be removed.  */
 static TCGv_i32 cpu_F0s, cpu_F1s;
@@ -106,6 +108,10 @@  void arm_translate_init(void)
     cpu_exclusive_info = tcg_global_mem_new_i32(TCG_AREG0,
         offsetof(CPUARMState, exclusive_info), "exclusive_info");
 #endif
+    cpu_fence_count = tcg_global_mem_new_i32(TCG_AREG0,
+            offsetof(CPUARMState, fence_count), "fence_count");
+    cpu_insn_count = tcg_global_mem_new_i32(TCG_AREG0,
+            offsetof(CPUARMState, insn_count), "insn_count");
 
     a64_translate_init();
 }
@@ -7568,6 +7574,7 @@  static void disas_arm_insn(CPUARMState * env, DisasContext *s)
             case 5: /* dmb */
             case 6: /* isb */
                 ARCH(7);
+        tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);
                 /* We don't emulate caches so these are a no-op.  */
                 return;
             default:
@@ -9740,6 +9747,7 @@  static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                         case 4: /* dsb */
                         case 5: /* dmb */
                         case 6: /* isb */
+                tcg_gen_add_i32(cpu_fence_count, cpu_fence_count, 1);
                             /* These execute as NOPs.  */
                             break;
                         default:
@@ -11022,6 +11030,7 @@  static inline void gen_intermediate_code_internal(ARMCPU *cpu,
             tcg_gen_debug_insn_start(dc->pc);
         }
 
+    tcg_gen_add_i32(cpu_insn_count, cpu_insn_count, 1);
         if (dc->thumb) {
             disas_thumb_insn(env, dc);
             if (dc->condexec_mask) {