Message ID | 1440375847-17603-21-git-send-email-cota@braap.org |
---|---|
State | New |
On 23/08/2015 17:23, Emilio G. Cota wrote:
> +    case INDEX_op_fence_load:
> +        tcg_out_fence(s, 0xe8);
> +        break;
> +    case INDEX_op_fence_full:
> +        tcg_out_fence(s, 0xf0);
> +        break;
> +    case INDEX_op_fence_store:
> +        tcg_out_fence(s, 0xf8);
> +        break;
> +

lfence and sfence are not needed in generated code; all loads are
acquires and all stores are release on x86.

Also, on targets that do not have MFENCE you want to generate something
like "lock addl $0, (%esp)".

Paolo
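To make Paolo's suggestion concrete, here is a minimal sketch of how such a barrier could be emitted, assuming the same tcg_out8() helper the backend already uses; the function name tcg_out_lock_add0() is made up for illustration and is not part of the patch:

```c
/* Sketch only: a full barrier for hosts without MFENCE, using a locked
 * read-modify-write of the top of the stack.
 * "lock addl $0, (%esp)" encodes as 0xf0 0x83 0x04 0x24 0x00. */
static inline void tcg_out_lock_add0(TCGContext *s)
{
    tcg_out8(s, 0xf0);  /* LOCK prefix */
    tcg_out8(s, 0x83);  /* ADD r/m32, imm8 */
    tcg_out8(s, 0x04);  /* ModRM: mod=00, reg=/0 (ADD), rm=100 -> SIB follows */
    tcg_out8(s, 0x24);  /* SIB: base = %esp, no index */
    tcg_out8(s, 0x00);  /* immediate 0 */
}
```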
On Sun, Aug 23, 2015 at 18:32:51 -0700, Paolo Bonzini wrote:
> On 23/08/2015 17:23, Emilio G. Cota wrote:
> > +    case INDEX_op_fence_load:
> > +        tcg_out_fence(s, 0xe8);
> > +        break;
> > +    case INDEX_op_fence_full:
> > +        tcg_out_fence(s, 0xf0);
> > +        break;
> > +    case INDEX_op_fence_store:
> > +        tcg_out_fence(s, 0xf8);
> > +        break;
> > +
>
> lfence and sfence are not needed in generated code; all loads are
> acquires and all stores are release on x86.

lfence and sfence here serve two purposes:

1) Template for other architectures

2) x86 code does sometimes have lfence/sfence (e.g. movntq+sfence),
   so I guessed they should remain in the translated code.
   If on x86 we always ignore the Write-Combining from the guest,
   maybe we could claim the l/sfence pair here is really unnecessary.

I'm no x86 expert so for a first stab I decided to put them there.
I didn't intend to translate say *all* PPC/ARM load barriers
into lfences when generating x86, which is I think your point.

> Also, on targets that do not have MFENCE you want to generate something
> like "lock addl $0, (%esp)".

Good point, will fix.

Thanks,

		Emilio
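The movntq+sfence pattern Emilio refers to looks roughly like this when written with compiler intrinsics; this is only an illustrative sketch of the kind of guest code in question, not something taken from the thread or the patch:

```c
#include <xmmintrin.h>

/* Non-temporal store followed by sfence: the one common case where
 * ordinary x86 user code genuinely needs a store fence, because movntq
 * bypasses the usual TSO store ordering. */
static void nt_store_then_fence(__m64 *dst, __m64 val)
{
    _mm_stream_pi(dst, val);  /* movntq */
    _mm_sfence();             /* sfence */
}
```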
> lfence and sfence here serve two purposes:
>
> 1) Template for other architectures

Ok, this makes sense.

> 2) x86 code does sometimes have lfence/sfence (e.g. movntq+sfence),
>    so I guessed they should remain in the translated code.
>    If on x86 we always ignore the Write-Combining from the guest,
>    maybe we could claim the l/sfence pair here is really unnecessary.

Yeah, I think it's fair enough to ignore WC and nontemporal stores.

> I didn't intend to translate say *all* PPC/ARM load barriers
> into lfences when generating x86, which is I think your point.

Yeah, it's just that the only gen_op_smp_rmb() you had in the RFC also
did not need an lfence.  But it seems like we're on the same page.

Paolo
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 887f22f..6600c45 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1123,6 +1123,13 @@ static void tcg_out_jmp(TCGContext *s, tcg_insn_unit *dest)
     tcg_out_branch(s, 0, dest);
 }
 
+static inline void tcg_out_fence(TCGContext *s, uint8_t op)
+{
+    tcg_out8(s, 0x0f);
+    tcg_out8(s, 0xae);
+    tcg_out8(s, op);
+}
+
 #if defined(CONFIG_SOFTMMU)
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -2088,6 +2095,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_fence_load:
+        tcg_out_fence(s, 0xe8);
+        break;
+    case INDEX_op_fence_full:
+        tcg_out_fence(s, 0xf0);
+        break;
+    case INDEX_op_fence_store:
+        tcg_out_fence(s, 0xf8);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2226,6 +2243,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_qemu_ld_i64, { "r", "r", "L", "L" } },
     { INDEX_op_qemu_st_i64, { "L", "L", "L", "L" } },
 #endif
+    { INDEX_op_fence_load, { } },
+    { INDEX_op_fence_store, { } },
+    { INDEX_op_fence_full, { } },
 
     { -1 },
 };
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/i386/tcg-target.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
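For reference, these are the byte sequences that the tcg_out_fence() helper in the diff above emits: 0x0f 0xae is the two-byte opcode, and the final byte is the ModRM that selects the fence variant (reg field /5, /6 or /7). The array names below are purely illustrative:

```c
static const uint8_t x86_lfence[] = { 0x0f, 0xae, 0xe8 };  /* lfence: load fence  */
static const uint8_t x86_mfence[] = { 0x0f, 0xae, 0xf0 };  /* mfence: full fence  */
static const uint8_t x86_sfence[] = { 0x0f, 0xae, 0xf8 };  /* sfence: store fence */
```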