Message ID: 20170829063313.10237-1-bobby.prani@gmail.com
State: New
Series: [1/5] target/arm: Remove stale comment
On 08/28/2017 11:33 PM, Pranith Kumar wrote:
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/aarch64/tcg-target.h | 2 ++
>  tcg/arm/tcg-target.h     | 2 ++
>  tcg/ia64/tcg-target.h    | 2 ++
>  tcg/mips/tcg-target.h    | 2 ++
>  tcg/ppc/tcg-target.h     | 2 ++
>  tcg/s390/tcg-target.h    | 2 ++
>  tcg/sparc/tcg-target.h   | 2 ++
>  7 files changed, 14 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
On 08/28/2017 11:33 PM, Pranith Kumar wrote:
> + * TODO: rewrite this comment
>  */
> -#define CPU_TLB_BITS \
> -    MIN(8, \
> -        TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS - \
> -        (NB_MMU_MODES <= 1 ? 0 : \
> -         NB_MMU_MODES <= 2 ? 1 : \
> -         NB_MMU_MODES <= 4 ? 2 : \
> -         NB_MMU_MODES <= 8 ? 3 : 4))
> +#define CPU_TLB_BITS MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS)

Ah, no.  This will cause several builds to fail.  You still need to
restrict the *total* size of the TLB to TCG_TARGET_TLB_DISPLACEMENT_BITS.

(That's not a 100% accurate statement, but is close.  See the
QEMU_BUILD_BUG_ON in tcg/*/*.c for specifics.)

The upshot is that if a target has 2 MMU modes, we can allow them to be
bigger.  But if it has 8, we have to make them smaller.

I was expecting you to write

  MIN(MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS),
      TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -
      ...)

r~
On 08/28/2017 11:33 PM, Pranith Kumar wrote:
> +#if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
> +#define TCG_TARGET_TLB_MAX_INDEX_BITS 28
> +#else
> +#define TCG_TARGET_TLB_MAX_INDEX_BITS 27
> +#endif
> +

For the record, did it not work to actually write (32 - CPU_TLB_BITS)?
I'm not fond of repeating the conditions that go into computing
CPU_TLB_BITS.

r~
On Tue, Aug 29, 2017 at 11:01 AM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 08/28/2017 11:33 PM, Pranith Kumar wrote:
>> + * TODO: rewrite this comment
>>  */
>> -#define CPU_TLB_BITS \
>> -    MIN(8, \
>> -        TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS - \
>> -        (NB_MMU_MODES <= 1 ? 0 : \
>> -         NB_MMU_MODES <= 2 ? 1 : \
>> -         NB_MMU_MODES <= 4 ? 2 : \
>> -         NB_MMU_MODES <= 8 ? 3 : 4))
>> +#define CPU_TLB_BITS MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS)
>>
>
> Ah, no.  This will cause several builds to fail.
> You still need to restrict the *total* size of
> the TLB to TCG_TARGET_TLB_DISPLACEMENT_BITS.
>
> (That's not a 100% accurate statement, but is close.
> See the QEMU_BUILD_BUG_ON in tcg/*/*.c for specifics.)
>
> The upshot is that if a target has 2 MMU modes,
> we can allow them to be bigger.  But if it has 8,
> we have to make them smaller.
>
> I was expecting you to write
>
>   MIN(MIN(12, TCG_TARGET_TLB_MAX_INDEX_BITS),
>       TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS -
>       ...)

I see what you mean. I will fix the blunder and send an updated patch.

Thanks!
Pranith Kumar <bobby.prani@gmail.com> writes:

> Update the comment which is not true since MTTCG.

What happened to the cover letter? We seem to have a mix of patches but
no summary of the overall outcome.

>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  target/arm/translate-a64.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 2200e25be0..f42b155d7d 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -2012,10 +2012,6 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
>      }
>      tcg_addr = read_cpu_reg_sp(s, rn, 1);
>
> -    /* Note that since TCG is single threaded load-acquire/store-release
> -     * semantics require no extra if (is_lasr) { ... } handling.
> -     */
> -
>      if (is_excl) {
>          if (!is_store) {
>              s->is_ldex = true;

--
Alex Bennée
Pranith Kumar <bobby.prani@gmail.com> writes:

> Using heaptrack, I found that quite a few of our temporary allocations
> are coming from allocating work items. Instead of doing this
> continuously, we can cache the allocated items and reuse them instead
> of freeing them.
>
> Stats from an ARM64 guest (boot+shutdown):
>
> heaptrack stats (before):
>         allocations:            1471317
>         leaked allocations:     73824
>         temporary allocations:  651293
>
> heaptrack stats (after):
>         allocations:            1143130
>         leaked allocations:     73693
>         temporary allocations:  487342
>
> The improvement in speedup is minor and within error margins, however I
> think the patch is still worthwhile. We can also explore atomics
> instead of taking a lock for the work item pool.

When we were doing the original MTTCG work I looked at using GArray for
the work queue, see:

  http://lists.gnu.org/archive/html/qemu-devel/2016-08/msg00367.html

specifically:

  Subject: [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray
  Date: Tue, 2 Aug 2016 18:27:44 +0100
  Message-Id: <1470158864-17651-14-git-send-email-alex.bennee@linaro.org>

which I personally think might yield better results than messing around
with custom allocators and GSlice and the like. You still get the
dynamic sizing of a malloc-based array, but operations like insertion
and iteration over the work queue should be cache friendly. Once the
array has (transparently) reached a reasonable size to service all
allocations in the usual servicing period, the same memory can be used
over and over again ;-)

My fondness for arrays is informed by comments by Bjarne Stroustrup:

  https://www.youtube.com/watch?v=YQs6IC-vgmo

Obviously this patch would need to be re-worked given how much the code
has changed since it was merged.
>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpus-common.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 60 insertions(+), 15 deletions(-)
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 59f751ecf9..ccf5f50e4e 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -24,6 +24,7 @@
>  #include "sysemu/cpus.h"
>
>  static QemuMutex qemu_cpu_list_lock;
> +static QemuMutex qemu_wi_pool_lock;
>  static QemuCond exclusive_cond;
>  static QemuCond exclusive_resume;
>  static QemuCond qemu_work_cond;
> @@ -33,6 +34,49 @@
>   */
>  static int pending_cpus;
>
> +typedef struct qemu_work_item {
> +    struct qemu_work_item *next;
> +    run_on_cpu_func func;
> +    run_on_cpu_data data;
> +    bool free, exclusive, done;
> +} qemu_work_item;
> +
> +typedef struct qemu_wi_pool {
> +    qemu_work_item *head;
> +    int num_items;
> +} qemu_wi_pool;
> +
> +qemu_wi_pool *wi_free_pool;
> +
> +static void qemu_init_workitem_pool(void)
> +{
> +    wi_free_pool = g_malloc0(sizeof(qemu_wi_pool));
> +}
> +
> +static void qemu_wi_pool_insert(qemu_work_item *item)
> +{
> +    qemu_mutex_lock(&qemu_wi_pool_lock);
> +    qemu_work_item *curr = atomic_read(&wi_free_pool->head);
> +    item->next = curr;
> +    wi_free_pool->head = item;
> +    qemu_mutex_unlock(&qemu_wi_pool_lock);
> +}
> +
> +static qemu_work_item *qemu_wi_pool_remove(void)
> +{
> +    qemu_mutex_lock(&qemu_wi_pool_lock);
> +    qemu_work_item *curr = atomic_read(&wi_free_pool->head);
> +    if (curr == NULL) {
> +        goto out;
> +    }
> +    wi_free_pool->head = curr->next;
> +    curr->next = NULL;
> +
> + out:
> +    qemu_mutex_unlock(&qemu_wi_pool_lock);
> +    return curr;
> +}
> +
>  void qemu_init_cpu_list(void)
>  {
>      /* This is needed because qemu_init_cpu_list is also called by the
> @@ -43,6 +87,9 @@ void qemu_init_cpu_list(void)
>      qemu_cond_init(&exclusive_cond);
>      qemu_cond_init(&exclusive_resume);
>      qemu_cond_init(&qemu_work_cond);
> +
> +    qemu_init_workitem_pool();
> +    qemu_mutex_init(&qemu_wi_pool_lock);
>  }
>
>  void cpu_list_lock(void)
> @@ -106,14 +153,7 @@ void cpu_list_remove(CPUState *cpu)
>      qemu_mutex_unlock(&qemu_cpu_list_lock);
>  }
>
> -struct qemu_work_item {
> -    struct qemu_work_item *next;
> -    run_on_cpu_func func;
> -    run_on_cpu_data data;
> -    bool free, exclusive, done;
> -};
> -
> -static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> +static void queue_work_on_cpu(CPUState *cpu, qemu_work_item *wi)
>  {
>      qemu_mutex_lock(&cpu->work_mutex);
>      if (cpu->queued_work_first == NULL) {
> @@ -132,7 +172,7 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>  void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
>                     QemuMutex *mutex)
>  {
> -    struct qemu_work_item wi;
> +    qemu_work_item wi;
>
>      if (qemu_cpu_is_self(cpu)) {
>          func(cpu, data);
> @@ -156,9 +196,11 @@ void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
>
>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
>  {
> -    struct qemu_work_item *wi;
> +    qemu_work_item *wi = qemu_wi_pool_remove();
>
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    if (!wi) {
> +        wi = g_malloc0(sizeof(qemu_work_item));
> +    }
>      wi->func = func;
>      wi->data = data;
>      wi->free = true;
> @@ -299,9 +341,11 @@ void cpu_exec_end(CPUState *cpu)
>  void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
>                             run_on_cpu_data data)
>  {
> -    struct qemu_work_item *wi;
> +    qemu_work_item *wi = qemu_wi_pool_remove();
>
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    if (!wi) {
> +        wi = g_malloc0(sizeof(qemu_work_item));
> +    }
>      wi->func = func;
>      wi->data = data;
>      wi->free = true;
> @@ -312,7 +356,7 @@ void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
>
>  void process_queued_cpu_work(CPUState *cpu)
>  {
> -    struct qemu_work_item *wi;
> +    qemu_work_item *wi;
>
>      if (cpu->queued_work_first == NULL) {
>          return;
> @@ -343,7 +387,8 @@ void process_queued_cpu_work(CPUState *cpu)
>      }
>      qemu_mutex_lock(&cpu->work_mutex);
>      if (wi->free) {
> -        g_free(wi);
> +        memset(wi, 0, sizeof(qemu_work_item));
> +        qemu_wi_pool_insert(wi);
>      } else {
>          atomic_mb_set(&wi->done, true);
>      }

--
Alex Bennée
Hi Alex,

On Tue, Sep 5, 2017 at 8:02 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Pranith Kumar <bobby.prani@gmail.com> writes:
>
>> Update the comment which is not true since MTTCG.
>
> What happened to the cover letter? We seem to have a mix of patches but
> no summary of the overall outcome.
>

These are a bunch of unrelated patches, so there is no theme. I will
include a cover letter saying so from now on.

Thanks,
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 2200e25be0..f42b155d7d 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2012,10 +2012,6 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
     }
     tcg_addr = read_cpu_reg_sp(s, rn, 1);
 
-    /* Note that since TCG is single threaded load-acquire/store-release
-     * semantics require no extra if (is_lasr) { ... } handling.
-     */
-
     if (is_excl) {
         if (!is_store) {
             s->is_ldex = true;