
[1/5] target/arm: Remove stale comment

Message ID 20170829063313.10237-1-bobby.prani@gmail.com
State New
Series [1/5] target/arm: Remove stale comment

Commit Message

Pranith Kumar Aug. 29, 2017, 6:33 a.m. UTC
Update the comment, which has not been true since MTTCG.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 target/arm/translate-a64.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Richard Henderson Aug. 29, 2017, 2:51 p.m. UTC | #1
On 08/28/2017 11:33 PM, Pranith Kumar wrote:
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/aarch64/tcg-target.h | 2 ++
>  tcg/arm/tcg-target.h     | 2 ++
>  tcg/ia64/tcg-target.h    | 2 ++
>  tcg/mips/tcg-target.h    | 2 ++
>  tcg/ppc/tcg-target.h     | 2 ++
>  tcg/s390/tcg-target.h    | 2 ++
>  tcg/sparc/tcg-target.h   | 2 ++
>  7 files changed, 14 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
Richard Henderson Aug. 29, 2017, 3:01 p.m. UTC | #2
On 08/28/2017 11:33 PM, Pranith Kumar wrote:
> + * TODO: rewrite this comment
>   */
> -#define CPU_TLB_BITS                                             \
> -    MIN(8,                                                       \
> -        TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -  \
> -        (NB_MMU_MODES <= 1 ? 0 :                                 \
> -         NB_MMU_MODES <= 2 ? 1 :                                 \
> -         NB_MMU_MODES <= 4 ? 2 :                                 \
> -         NB_MMU_MODES <= 8 ? 3 : 4))
> +#define CPU_TLB_BITS MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS)
>  

Ah, no.  This will cause several builds to fail.
You still need to restrict the *total* size of
the TLB to TCG_TARGET_TLB_DISPLACEMENT_BITS.

(That's not a 100% accurate statement, but is close.
See the QEMU_BUILD_BUG_ON in tcg/*/*.c for specifics.)

The upshot is that if a target has 2 MMU modes,
we can allow them to be bigger.  But if it has 8,
we have to make them smaller.

I was expecting you to write

  MIN(MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS),
      TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -
      ...)

r~
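
Written out in full, the clamp being described combines the deleted
NB_MMU_MODES ladder with the new per-backend limit, roughly as in the
sketch below (assembled from the two hunks quoted above; not the final
committed form):

  #define CPU_TLB_BITS                                             \
      MIN(MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS),                  \
          TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -  \
          (NB_MMU_MODES <= 1 ? 0 :                                 \
           NB_MMU_MODES <= 2 ? 1 :                                 \
           NB_MMU_MODES <= 4 ? 2 :                                 \
           NB_MMU_MODES <= 8 ? 3 : 4))

The inner MIN caps the index width at what the backend can address;
the outer term still shrinks the per-mode tables as NB_MMU_MODES grows,
keeping the total TLB footprint roughly within
TCG_TARGET_TLB_DISPLACEMENT_BITS (see Richard's caveat above).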
Richard Henderson Aug. 29, 2017, 3:03 p.m. UTC | #3
On 08/28/2017 11:33 PM, Pranith Kumar wrote:
> +#if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
> +#define TCG_TARGET_TLB_MAX_INDEX_BITS 28
> +#else
> +#define TCG_TARGET_TLB_MAX_INDEX_BITS 27
> +#endif
> +

For the record, did it not work to actually write (32 - CPU_TLB_BITS)?  I'm not
fond of repeating the conditions that go into computing CPU_TLB_BITS.


r~
Pranith Kumar Aug. 29, 2017, 4:23 p.m. UTC | #4
On Tue, Aug 29, 2017 at 11:01 AM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 08/28/2017 11:33 PM, Pranith Kumar wrote:
>> + * TODO: rewrite this comment
>>   */
>> -#define CPU_TLB_BITS                                             \
>> -    MIN(8,                                                       \
>> -        TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -  \
>> -        (NB_MMU_MODES <= 1 ? 0 :                                 \
>> -         NB_MMU_MODES <= 2 ? 1 :                                 \
>> -         NB_MMU_MODES <= 4 ? 2 :                                 \
>> -         NB_MMU_MODES <= 8 ? 3 : 4))
>> +#define CPU_TLB_BITS MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS)
>>
>
> Ah, no.  This will cause several builds to fail.
> You still need to restrict the *total* size of
> the TLB to TCG_TARGET_TLB_DISPLACEMENT_BITS.
>
> (That's not a 100% accurate statement, but is close.
> See the QEMU_BUILD_BUG_ON in tcg/*/*.c for specifics.)
>
> The upshot is that if a target has 2 MMU modes,
> we can allow them to be bigger.  But if it has 8,
> we have to make them smaller.
>
> I was expecting you to write
>
>   MIN(MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS),
>       TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -
>       ...)

I see what you mean. I will fix the blunder and send an updated patch.

Thanks!
Alex Bennée Sept. 5, 2017, 12:02 p.m. UTC | #5
Pranith Kumar <bobby.prani@gmail.com> writes:

> Update the comment, which has not been true since MTTCG.

What happened to the cover letter? We seem to have a mix of patches but
no summary of the overall outcome.

>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  target/arm/translate-a64.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 2200e25be0..f42b155d7d 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -2012,10 +2012,6 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
>      }
>      tcg_addr = read_cpu_reg_sp(s, rn, 1);
>
> -    /* Note that since TCG is single threaded load-acquire/store-release
> -     * semantics require no extra if (is_lasr) { ... } handling.
> -     */
> -
>      if (is_excl) {
>          if (!is_store) {
>              s->is_ldex = true;


--
Alex Bennée
Alex Bennée Sept. 5, 2017, 12:28 p.m. UTC | #6
Pranith Kumar <bobby.prani@gmail.com> writes:

> Using heaptrack, I found that quite a few of our temporary allocations
> are coming from allocating work items. Instead of doing this
> continuously, we can cache the allocated items and reuse them instead
> of freeing them.
>
> Stats from an ARM64 guest (boot+shutdown):
>
> heaptrack stats(before):
>         allocations:            1471317
>         leaked allocations:     73824
>         temporary allocations:  651293
>
> heaptrack stats(after):
>         allocations:            1143130
>         leaked allocations:     73693
>         temporary allocations:  487342
>
> The speedup is minor and within error margins; however, I think the
> patch is still worthwhile. We could also explore atomics instead of
> taking a lock for the work item pool.

When we were doing the original MTTCG work I looked at using GArray for
the work queue, see:

  http://lists.gnu.org/archive/html/qemu-devel/2016-08/msg00367.html

specifically:

  Subject: [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray
  Date: Tue,  2 Aug 2016 18:27:44 +0100
  Message-Id: <1470158864-17651-14-git-send-email-alex.bennee@linaro.org>

which I personally think might yield better results than messing around
with custom allocators, GSlice and the like. You still get the dynamic
sizing of a malloc-based array, but operations like insertion and
iteration through the work queue should be cache friendly.

Once the array has (transparently) reached a reasonable size to service
all allocations in the usual servicing period, the same memory can be
used over and over again ;-)
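
Something like the following, as a rough sketch only (the queued_work
GArray field on CPUState is hypothetical, the tree at the time used a
queued_work_first/last linked list, and the free/exclusive/done
handling is elided):

  static void queue_work_on_cpu(CPUState *cpu, qemu_work_item wi)
  {
      qemu_mutex_lock(&cpu->work_mutex);
      if (cpu->queued_work == NULL) {
          cpu->queued_work = g_array_new(FALSE, TRUE,
                                         sizeof(qemu_work_item));
      }
      /* Element storage lives inside the array: no per-item malloc,
       * and items sit contiguously for cache-friendly iteration. */
      g_array_append_val(cpu->queued_work, wi);
      qemu_mutex_unlock(&cpu->work_mutex);
      qemu_cpu_kick(cpu);
  }

  static void process_queued_cpu_work(CPUState *cpu)
  {
      qemu_mutex_lock(&cpu->work_mutex);
      for (guint i = 0; cpu->queued_work && i < cpu->queued_work->len;
           i++) {
          /* Copy out by value: a concurrent append may realloc the
           * array while we run the callback unlocked. */
          qemu_work_item wi = g_array_index(cpu->queued_work,
                                            qemu_work_item, i);
          qemu_mutex_unlock(&cpu->work_mutex);
          wi.func(cpu, wi.data);
          qemu_mutex_lock(&cpu->work_mutex);
      }
      if (cpu->queued_work) {
          /* Reset the length but keep the capacity, so a warmed-up
           * array never allocates again. */
          g_array_set_size(cpu->queued_work, 0);
      }
      qemu_mutex_unlock(&cpu->work_mutex);
  }

Compared with the free-pool patch, the pool here is implicit in the
array's spare capacity, so there is no separate freelist or extra lock
to maintain.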

My fondness for arrays is informed by comments by Bjarne Stroustrup:

  https://www.youtube.com/watch?v=YQs6IC-vgmo

Obviously this patch would need to be re-worked given how much the code
has changed since it was written.

>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpus-common.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 60 insertions(+), 15 deletions(-)
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 59f751ecf9..ccf5f50e4e 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -24,6 +24,7 @@
>  #include "sysemu/cpus.h"
>
>  static QemuMutex qemu_cpu_list_lock;
> +static QemuMutex qemu_wi_pool_lock;
>  static QemuCond exclusive_cond;
>  static QemuCond exclusive_resume;
>  static QemuCond qemu_work_cond;
> @@ -33,6 +34,49 @@ static QemuCond qemu_work_cond;
>   */
>  static int pending_cpus;
>
> +typedef struct qemu_work_item {
> +    struct qemu_work_item *next;
> +    run_on_cpu_func func;
> +    run_on_cpu_data data;
> +    bool free, exclusive, done;
> +} qemu_work_item;
> +
> +typedef struct qemu_wi_pool {
> +    qemu_work_item *head;
> +    int num_items;
> +} qemu_wi_pool;
> +
> +qemu_wi_pool *wi_free_pool;
> +
> +static void qemu_init_workitem_pool(void)
> +{
> +    wi_free_pool = g_malloc0(sizeof(qemu_wi_pool));
> +}
> +
> +static void qemu_wi_pool_insert(qemu_work_item *item)
> +{
> +    qemu_mutex_lock(&qemu_wi_pool_lock);
> +    qemu_work_item *curr = atomic_read(&wi_free_pool->head);
> +    item->next = curr;
> +    wi_free_pool->head = item;
> +    qemu_mutex_unlock(&qemu_wi_pool_lock);
> +}
> +
> +static qemu_work_item *qemu_wi_pool_remove(void)
> +{
> +    qemu_mutex_lock(&qemu_wi_pool_lock);
> +    qemu_work_item *curr = atomic_read(&wi_free_pool->head);
> +    if (curr == NULL) {
> +        goto out;
> +    }
> +    wi_free_pool->head = curr->next;
> +    curr->next = NULL;
> +
> + out:
> +    qemu_mutex_unlock(&qemu_wi_pool_lock);
> +    return curr;
> +}
> +
>  void qemu_init_cpu_list(void)
>  {
>      /* This is needed because qemu_init_cpu_list is also called by the
> @@ -43,6 +87,9 @@ void qemu_init_cpu_list(void)
>      qemu_cond_init(&exclusive_cond);
>      qemu_cond_init(&exclusive_resume);
>      qemu_cond_init(&qemu_work_cond);
> +
> +    qemu_init_workitem_pool();
> +    qemu_mutex_init(&qemu_wi_pool_lock);
>  }
>
>  void cpu_list_lock(void)
> @@ -106,14 +153,7 @@ void cpu_list_remove(CPUState *cpu)
>      qemu_mutex_unlock(&qemu_cpu_list_lock);
>  }
>
> -struct qemu_work_item {
> -    struct qemu_work_item *next;
> -    run_on_cpu_func func;
> -    run_on_cpu_data data;
> -    bool free, exclusive, done;
> -};
> -
> -static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> +static void queue_work_on_cpu(CPUState *cpu, qemu_work_item *wi)
>  {
>      qemu_mutex_lock(&cpu->work_mutex);
>      if (cpu->queued_work_first == NULL) {
> @@ -132,7 +172,7 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>  void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
>                     QemuMutex *mutex)
>  {
> -    struct qemu_work_item wi;
> +    qemu_work_item wi;
>
>      if (qemu_cpu_is_self(cpu)) {
>          func(cpu, data);
> @@ -156,9 +196,11 @@ void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
>
>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
>  {
> -    struct qemu_work_item *wi;
> +    qemu_work_item *wi = qemu_wi_pool_remove();
>
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    if (!wi) {
> +        wi = g_malloc0(sizeof(qemu_work_item));
> +    }
>      wi->func = func;
>      wi->data = data;
>      wi->free = true;
> @@ -299,9 +341,11 @@ void cpu_exec_end(CPUState *cpu)
>  void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
>                             run_on_cpu_data data)
>  {
> -    struct qemu_work_item *wi;
> +    qemu_work_item *wi = qemu_wi_pool_remove();
>
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    if (!wi) {
> +        wi = g_malloc0(sizeof(qemu_work_item));
> +    }
>      wi->func = func;
>      wi->data = data;
>      wi->free = true;
> @@ -312,7 +356,7 @@ void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
>
>  void process_queued_cpu_work(CPUState *cpu)
>  {
> -    struct qemu_work_item *wi;
> +    qemu_work_item *wi;
>
>      if (cpu->queued_work_first == NULL) {
>          return;
> @@ -343,7 +387,8 @@ void process_queued_cpu_work(CPUState *cpu)
>          }
>          qemu_mutex_lock(&cpu->work_mutex);
>          if (wi->free) {
> -            g_free(wi);
> +            memset(wi, 0, sizeof(qemu_work_item));
> +            qemu_wi_pool_insert(wi);
>          } else {
>              atomic_mb_set(&wi->done, true);
>          }


--
Alex Bennée
Pranith Kumar Sept. 6, 2017, 12:35 a.m. UTC | #7
Hi Alex,

On Tue, Sep 5, 2017 at 8:02 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Pranith Kumar <bobby.prani@gmail.com> writes:
>
>> Update the comment, which has not been true since MTTCG.
>
> What happened to the cover letter? We seem to have a mix of patches but
> no summary of the overall outcome.
>

These are a bunch of unrelated patches, so there is no theme. I will
include a cover letter saying so from now on.

Thanks,

Patch

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 2200e25be0..f42b155d7d 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2012,10 +2012,6 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
     }
     tcg_addr = read_cpu_reg_sp(s, rn, 1);
 
-    /* Note that since TCG is single threaded load-acquire/store-release
-     * semantics require no extra if (is_lasr) { ... } handling.
-     */
-
     if (is_excl) {
         if (!is_store) {
             s->is_ldex = true;