From patchwork Mon Jul 20 01:54:47 2015
X-Patchwork-Submitter: Benjamin Herrenschmidt
X-Patchwork-Id: 497547
Message-ID: <1437357287.28088.132.camel@kernel.crashing.org>
From: Benjamin Herrenschmidt
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, qemu-ppc@nongnu.org, Alexander Graf, Aurelien Jarno,
    Richard Henderson
Date: Mon, 20 Jul 2015 11:54:47 +1000
Subject: [Qemu-devel] [RFC PATCH] tcg/ppc: Improve unaligned load/store handling on 64-bit backend

Currently, we get to the slow path for any unaligned access in the
backend, because we effectively preserve the bottom address bits below
the alignment requirement when comparing with the TLB entry, so any
non-zero bit there will cause the compare to fail.

For the same number of instructions, we can instead add the access
size - 1 to the address and stick to clearing all the bottom bits. That
means that normal unaligned accesses will not fall back to the slow
path (the HW will handle them fine). Only when crossing a page boundary
will we end up with a mismatch, because we will then be pointing to the
next page, which cannot possibly be in the same TLB entry.

Signed-off-by: Benjamin Herrenschmidt
---
Note: I have verified things still work by booting an x86_64 ubuntu
installer on ppc64. I haven't noticed a large performance difference:
getting to the full xubuntu installer took 5:45 instead of 5:51 on the
test machine I used, but I felt this is still worthwhile in case one
hits a worst-case scenario with a lot of unaligned accesses.
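For illustration, a minimal standalone C sketch of the two comparator
formulas described in the changelog above, assuming 4 KiB pages; the
names and constants here are hypothetical and not taken from the TCG
sources:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12                          /* assume 4 KiB pages */
    #define PAGE_MASK (~((1ULL << PAGE_BITS) - 1))

    /* Old comparator: keep the page bits plus the low bits below the
     * alignment, so any unaligned address fails the TLB compare. */
    static uint64_t cmp_old(uint64_t addr, unsigned s_bits)
    {
        return (addr & PAGE_MASK) | (addr & ((1ULL << s_bits) - 1));
    }

    /* New comparator: add access size - 1, then clear all page-offset
     * bits, so only an access crossing a page boundary fails. */
    static uint64_t cmp_new(uint64_t addr, unsigned s_bits)
    {
        return (addr + (1ULL << s_bits) - 1) & PAGE_MASK;
    }

    int main(void)
    {
        const uint64_t tlb_tag = 0x1000;  /* page address in the TLB entry */

        /* Unaligned 4-byte access inside the page: the old scheme takes
         * the slow path, the new scheme stays on the fast path. */
        printf("0x1002: old=%s new=%s\n",
               cmp_old(0x1002, 2) == tlb_tag ? "hit" : "miss",
               cmp_new(0x1002, 2) == tlb_tag ? "hit" : "miss");

        /* Unaligned 4-byte access crossing into the next page: both
         * schemes miss, so the helper still handles it. */
        printf("0x1ffe: old=%s new=%s\n",
               cmp_old(0x1ffe, 2) == tlb_tag ? "hit" : "miss",
               cmp_new(0x1ffe, 2) == tlb_tag ? "hit" : "miss");
        return 0;
    }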
Note2: It would be nice to be able to pass larger load/stores to the
backend... it means we would need to use a higher bit in the TLB entry
for "invalid" and a bunch more macros in the front-end, but it could be
quite helpful for speeding up things like memcpy, which on ppc64 uses
vector load/stores, or for speeding up the new ppc lq/stq instructions.
Is anybody already working on that?

Note3: Hacking TCG is very new to me, so I apologize in advance for any
stupid oversight. I also assume other backends can probably use the
same trick if they aren't doing so already...

diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 2b6eafa..59864bf 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -1426,13 +1426,18 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
     if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
         tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
                     (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
-    } else if (!s_bits) {
-        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo,
-                    0, 63 - TARGET_PAGE_BITS);
     } else {
-        tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
-                    64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - s_bits);
-        tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
+        /* Alignment check trick: We add the access_size-1 to the address
+         * before masking the low bits. That will make the address overflow
+         * to the next page if we cross a page boundary which will then
+         * force a mismatch of the TLB compare since the next page cannot
+         * possibly be in the same TLB index.
+         */
+        if (s_bits) {
+            tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
+        }
+        tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
+                    0, 63 - TARGET_PAGE_BITS);
     }
 
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
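For reference, a C sketch of what the newly emitted instruction pair in
the hunk above works out to on the 64-bit path; the function and
parameter names are hypothetical, and the reading of the RLDICR mask is
an interpretation rather than text from the patch:

    #include <stdint.h>

    /* Rough model of the two instructions added by the hunk above:
     *
     *   addi   r0, addrlo, (1 << s_bits) - 1     -- add access size - 1
     *   rldicr r0, r0, 0, 63 - TARGET_PAGE_BITS  -- keep the upper bits,
     *                                                clear the page offset
     *
     * (For s_bits == 0 the ADDI is skipped, which is the same as adding
     * zero.) The result is the page-aligned address compared against the
     * TLB entry; an access that crosses a page boundary lands on the next
     * page and therefore misses.
     */
    static uint64_t tlb_compare_value(uint64_t addrlo, unsigned s_bits,
                                      unsigned target_page_bits)
    {
        uint64_t r0 = addrlo + ((1ULL << s_bits) - 1);    /* ADDI   */
        return r0 & ~((1ULL << target_page_bits) - 1);    /* RLDICR */
    }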