From patchwork Sat May 8 06:41:22 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Igor V. Kovalenko" X-Patchwork-Id: 51962 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [199.232.76.165]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 84136B7D1A for ; Sat, 8 May 2010 16:43:31 +1000 (EST) Received: from localhost ([127.0.0.1]:33366 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1OAdkz-0007sX-T9 for incoming@patchwork.ozlabs.org; Sat, 08 May 2010 02:43:25 -0400 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1OAdjA-0007Rt-Ud for qemu-devel@nongnu.org; Sat, 08 May 2010 02:41:32 -0400 Received: from [140.186.70.92] (port=54765 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1OAdj7-0007RD-Rn for qemu-devel@nongnu.org; Sat, 08 May 2010 02:41:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1OAdj4-0001Jr-IK for qemu-devel@nongnu.org; Sat, 08 May 2010 02:41:29 -0400 Received: from mail-fx0-f45.google.com ([209.85.161.45]:43741) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1OAdj4-0001Jg-6H for qemu-devel@nongnu.org; Sat, 08 May 2010 02:41:26 -0400 Received: by fxm12 with SMTP id 12so1211929fxm.4 for ; Fri, 07 May 2010 23:41:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=jxnbuNf4I/As8FkyUs30msOeuKEnR82PiZuU57jhIdk=; b=uGGJyahFdzyPoDQrnZpcgM+dmzhpKMesYJ6ePPozC4wpl5jd3MPcYPUxx6B2O8u++M 1bCJHCpw0LUemofZPTUZd2xZfe/7SY6p2MoJKHK6OdRrqPFimiLlUIf/iwl8wySahJ4v Z/sDazVmgSzRis7hMAo/synFzqv2e6Wm0v5ek= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=BCPg3Yygq/XkHHlO0GvJ7oUKcQeIP4z8qkM+peaiecgHflylsxgH5BIPUk4cSZolqF gLg5gySgNJ2CQ69f3tVQvpjYqO0ZpZB/U8pFln8qsJk6C+HY8iFNkDe+vZnDFkTI1YXe aPlN6q+FwzpkhBYdfZgtsp9VPmiAipC2owbJc= MIME-Version: 1.0 Received: by 10.102.254.24 with SMTP id b24mr532773mui.5.1273300882681; Fri, 07 May 2010 23:41:22 -0700 (PDT) Received: by 10.103.248.10 with HTTP; Fri, 7 May 2010 23:41:22 -0700 (PDT) In-Reply-To: References: Date: Sat, 8 May 2010 10:41:22 +0400 Message-ID: From: Igor Kovalenko To: Blue Swirl X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) Cc: qemu-devel Subject: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org On Thu, May 6, 2010 at 10:51 PM, Blue Swirl wrote: > On 5/5/10, Igor Kovalenko wrote: >> On Wed, May 5, 2010 at 12:21 AM, Blue Swirl wrote: >>  > On 5/3/10, Igor Kovalenko wrote: >>  >> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl wrote: >>  >>  > On 5/3/10, Igor Kovalenko wrote: >>  >>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl wrote: >>  >>  >>  > On 5/3/10, Igor Kovalenko wrote: >>  >>  >>  >> Hi! >>  >>  >>  >> >>  >>  >>  >>  There is an issue with lazy conditional codes evaluation where >>  >>  >>  >>  we return from trap handler with mismatching conditionals. >>  >>  >>  >> >>  >>  >>  >>  I seldom reproduce it here when dragging qemu window while >>  >>  >>  >>  machine is working through silo initialization. I use gentoo minimal cd >>  >>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot >>  >>  >>  >>  would experience the same. Once in a while it would report crc error, >>  >>  >>  >>  unable to open cd partition or it would fail to decompress image. >>  >>  >>  > >>  >>  >>  > I think I've also seen this. >>  >>  >>  > >>  >>  >>  >>  Pattern that fails appears to require a sequence of compare insn >>  >>  >>  >>  possibly followed by a few instructions which do not touch conditionals, >>  >>  >>  >>  then conditional branch insn. If it happens that we trap while processing >>  >>  >>  >>  conditional branch insn so it is restarted after return from trap then >>  >>  >>  >>  seldom conditional codes are calculated incorrectly. >>  >>  >>  >> >>  >>  >>  >>  I cannot point to exact cause but it appears that after trap return >>  >>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere, >>  >>  >>  >>  since adding more cond evaluation flushes over the code helps. >>  >>  >>  >> >>  >>  >>  >>  We already tried doing flush more frequently and it is still not >>  >>  >>  >>  complete, so the question is how to finally do this once and right :) >>  >>  >>  >> >>  >>  >>  >>  Obviously I do not get the design of lazy evaluation right, but >>  >>  >>  >>  the following list appears to be good start. Plan is to prepare >>  >>  >>  >>  a change to qemu and find a way to test it. >>  >>  >>  >> >>  >>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to >>  >>  >>  >>    use DisasContext->cc_op to predict if flags should be not evaluated >>  >>  >>  >>    due to overriding insn. Instead we can drop cc_op from disassembler >>  >>  >>  >>    context and simplify code to only use cc_op from env. >>  >>  >>  > >>  >>  >>  > Not currently, but in the future we may use that to do even lazier >>  >>  >>  > flags computation. For example the sequence 'cmp x, y; bne target' >>  >>  >>  > could be much more optimal by changing the branch to do the >>  >>  >>  > comparison. Here's an old unfinished patch to do some of this. >> >> >> I wonder if it buys anything. Sparc RISC architecture means optimizing compiler >>  would prevent any extra flags computation, right? So it is basically 1-to-1 >>  conditional computation and use. Or even worse, if we delay computation >>  until there are two or more consumers, correct? > > For x86 target, that is the other part of savings from using lazy > condition computation. It's true that it will not benefit RISC targets > much. > > But I think you are missing the other part, where the actual flags are > not calculated but instead the original comparison can be used. > > For example, consider this Sparc64 code: > IN: > 0x000001fff000be58:  cmp  %g2, 0x51 > > OUT: [size=82] > 0x40df16b0:  mov    0x10(%r14),%rbp > 0x40df16b4:  mov    %rbp,%rbx > 0x40df16b7:  sub    $0x51,%rbx > 0x40df16bb:  mov    %rbx,%r12 > 0x40df16be:  mov    %r12,0x10a60(%r14) > 0x40df16c5:  mov    %rbp,0x60(%r14) > 0x40df16c9:  mov    $0x51,%ebp > 0x40df16ce:  mov    %rbp,0x68(%r14) > 0x40df16d2:  mov    %rbx,0x70(%r14) > 0x40df16d6:  mov    $0x7,%ebp > 0x40df16db:  mov    %ebp,0x78(%r14) > 0x40df16df:  mov    $0x1fff000be5c,%rbp > 0x40df16e9:  mov    %rbp,0x48(%r14) > 0x40df16ed:  mov    $0x1fff000be60,%rbp > 0x40df16f7:  mov    %rbp,0x50(%r14) > 0x40df16fb:  xor    %eax,%eax > 0x40df16fd:  jmpq   0xbfface > > -------------- > IN: > 0x000001fff000be5c:  bne  0x1fff000c268 > > OUT: [size=95] > 0x40df1710:  callq  0x518260 > 0x40df1715:  mov    0x98(%r14),%ebp > 0x40df171c:  mov    %ebp,%ebx > 0x40df171e:  shr    $0x16,%rbx > 0x40df1722:  and    $0x1,%ebx > 0x40df1725:  xor    $0x1,%rbx > 0x40df1729:  mov    %rbx,0x90(%r14) > 0x40df1730:  mov    $0x1fff000be60,%rbp > 0x40df173a:  mov    %rbp,0x48(%r14) > 0x40df173e:  test   %rbx,%rbx > 0x40df1741:  je     0x40df175a > 0x40df1747:  mov    $0x1fff000c268,%rbp > 0x40df1751:  mov    %rbp,0x50(%r14) > 0x40df1755:  jmpq   0x40df1768 > 0x40df175a:  mov    $0x1fff000be64,%rbp > 0x40df1764:  mov    %rbp,0x50(%r14) > 0x40df1768:  xor    %eax,%eax > 0x40df176a:  jmpq   0xbfface > > Instead of calculating any flags, we should generate for combined > 'cmp/btest + branch' sequences something like: > mov    0x10(%r14),%rbp > mov    %rbp,%rbx > cmp    $0x51,%rbx > je     0x40ac275a > mov    $0x1fff000c268,%rbp > mov    %rbp,0x50(%r14) > jmpq   0x40ac2768 > mov    $0x1fff000be64,%rbp > mov    %rbp,0x50(%r14) > xor    %eax,%eax > jmpq   0xbfface > > But I fully agree that correct code generation is the most important issue. > >>  >>  >>  > >>  >>  >>  >>    Another point is that we always write to env->cc_op when >>  >>  >>  >>  translating *cc insns >>  >>  >>  >>    This should solve any issue with dc->cc_op prediction going >>  >>  >>  >>    out of sync with env->cc_op and cpu_cc_src* >>  >>  >>  > >>  >>  >>  > I think this is what is happening now. >>  >>  >>  > >>  >>  >>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when >>  >>  >>  >>    a. conditional code is required by insn (like addc, cond branch etc.) >>  >>  >>  >>       - here we can optimize by evaluating specific bits (carry?) >>  >>  >>  >>       - not sure if it works in case we have two cond consuming insns, >>  >>  >>  >>         where first needs carry another needs the rest of flags >>  >>  >>  > >>  >>  >>  > Here's another patch to optimize C flag handling. It doesn't pass my >>  >>  >>  > tests though. >>  >>  >>  > >>  >>  >>  >>    b. CCR is read by rdccr (helper_rdccr) >>  >>  >>  >>       - have to compute all flags >>  >>  >>  >>    c. trap occurs and we prepare trap level context (saving pstate) >>  >>  >>  >>       - have to compute all flags >>  >>  >>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env) >>  >>  >>  >>       - have to compute all flags >>  >>  >>  > >>  >>  >>  > Fully agree. >>  >>  >> >>  >>  >> >>  >>  >> Cool >>  >>  >> >>  >>  >>  Still I'd propose to kill dc->cc_op, find a reliable way to test it >>  >>  >>  and then add it back possibly with more optimizations. >>  >>  >>  I'm lost in the code up to the point where I believe we need to >>  >>  >>  save/restore cc_op and cpu_cc* while switching trap levels. >>  >>  > >>  >>  > I'd think this should do the trick: >>  >>  > >>  >>  > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c >>  >>  > index b27778b..94921cd 100644 >>  >>  > --- a/target-sparc/op_helper.c >>  >>  > +++ b/target-sparc/op_helper.c >>  >>  > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env) >>  >>  >     } >>  >>  >     tsptr = cpu_tsptr(env); >>  >>  > >>  >>  > +    helper_compute_psr(); >>  >>  > + >>  >>  >     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) | >>  >>  >         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) | >>  >>  >         GET_CWP64(env); >>  >>  > >>  >> >>  >> >>  >> Thanks, this change seems to work here for silo issue. >>  >> >>  >>  Another change would be to flush for gdbstub use of GET_CCR and for >>  >>  helper_rdccr. >>  >>  I tried to embed flush into GET_CCR but the code looks ugly since we >>  >>  need to proxy a call to helper_compute_psr from gdbstub passing >>  >>  available env pointer. >>  >> >>  >>  Not really tested with your changes, but still what is the breakage you see? >>  > >>  > Aurora 2.0 (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/) >>  > breaks. >>  > >>  > This is what I get with git HEAD, having pressed enter key twice: >>  > Welcome to Aurora SPARC Linux >>  > >>  > >>  > >>  > >>  >                   +--------------+ CD Found +--------------+ >>  >                   |                                        | >>  >                   | To begin testing the CD media before   | >>  >                   | installation press OK.                 | >>  >                   |                                        | >>  >                   | Choose Skip to skip the media test     | >>  >                   | and start the installation.            | >>  >                   |                                        | >>  >                   |      +----+             +------+       | >>  >                   |      | OK |             | Skip |       | >>  >                   |      +----+             +------+       | >>  >                   |                                        | >>  >                   |                                        | >>  >                   +----------------------------------------+ >>  > >>  > >>  > >>  > >>  >  / between elements  | selects | next screen >>  > >>  > This is what I get with the C flag patch applied: >>  > Welcome to Aurora SPARC Linux >>  > >>  > >>  > >>  > >>  > >>  >                    +--------------+ Error +---------------+ >>  >                    |                                      | >>  >                    | failed to read keymap information:   | >>  >                    | Success                              | >>  >                    |                                      | >>  >                    |               +----+                 | >>  >                    |               | OK |                 | >>  >                    |               +----+                 | >>  >                    |                                      | >>  >                    |                                      | >>  >                    +--------------------------------------+ >>  > >>  > >>  > >>  > >>  > >>  > >>  >  / between elements  | selects | next screen >>  > >> >> >> I do reproduce the issue here with 0001-Convert-C-flag-input-BROKEN.patch > > The patch seems harmless, so maybe it uncovers some hideous bug. > Right, the bug is inside the gen_op_*cc helpers - we need to compute C using current flag source, whereas current implementation clobbers source then computes C. I think I now get the lazy evaluation design right :) Something along these changes to your 0001-Convert-C-flag-input-BROKEN.patch appears to be good for aurora 2.0 test: tcg_gen_mov_tl(dst, cpu_cc_dst); @@ -344,9 +344,9 @@ static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2) static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2) { + gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_mov_tl(cpu_cc_src, src1); tcg_gen_mov_tl(cpu_cc_src2, src2); - gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0); tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2); tcg_gen_mov_tl(dst, cpu_cc_dst); @@ -417,9 +417,9 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv src1, TCGv src2) static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2) { + gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_mov_tl(cpu_cc_src, src1); tcg_gen_movi_tl(cpu_cc_src2, src2); - gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0); tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2); tcg_gen_mov_tl(dst, cpu_cc_dst); @@ -427,9 +427,9 @@ static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2) static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2) { + gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_mov_tl(cpu_cc_src, src1); tcg_gen_mov_tl(cpu_cc_src2, src2); - gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0); tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2); tcg_gen_mov_tl(dst, cpu_cc_dst); diff --git a/target-sparc/translate.c b/target-sparc/translate.c index 94c343d..ea7c71b 100644 --- a/target-sparc/translate.c +++ b/target-sparc/translate.c @@ -334,9 +334,9 @@ static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2) static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2) { + gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_mov_tl(cpu_cc_src, src1); tcg_gen_movi_tl(cpu_cc_src2, src2); - gen_helper_compute_C_icc(cpu_tmp0); tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0); tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);