From patchwork Tue Nov 10 09:00:44 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joakim Tjernlund X-Patchwork-Id: 38047 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from bilbo.ozlabs.org (localhost [127.0.0.1]) by ozlabs.org (Postfix) with ESMTP id 6CDE0B7E09 for ; Tue, 10 Nov 2009 20:00:56 +1100 (EST) Received: by ozlabs.org (Postfix) id D95DEB7B73; Tue, 10 Nov 2009 20:00:49 +1100 (EST) Delivered-To: linuxppc-dev@ozlabs.org Received: from gw1.transmode.se (gw1.transmode.se [213.115.205.20]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 0F692B7B68 for ; Tue, 10 Nov 2009 20:00:48 +1100 (EST) Received: from sesr04.transmode.se (sesr04.transmode.se [192.168.201.15]) by gw1.transmode.se (Postfix) with ESMTP id B173F650002 for ; Tue, 10 Nov 2009 10:00:44 +0100 (CET) Received: from gentoo-jocke.transmode.se ([192.168.1.15]) by sesr04.transmode.se (Lotus Domino Release 8.5 HF964) with ESMTP id 2009111010004459-22988 ; Tue, 10 Nov 2009 10:00:44 +0100 Received: from gentoo-jocke.transmode.se (gentoo-jocke.transmode.se [127.0.0.1]) by gentoo-jocke.transmode.se (8.14.0/8.14.0) with ESMTP id nAA90iIV008547; Tue, 10 Nov 2009 10:00:44 +0100 Received: (from jocke@localhost) by gentoo-jocke.transmode.se (8.14.0/8.14.0/Submit) id nAA90iAC008546; Tue, 10 Nov 2009 10:00:44 +0100 From: Joakim Tjernlund To: "linuxppc-dev@ozlabs.org" Subject: [PATCH] zlib: Optimize inffast when copying direct from output Date: Tue, 10 Nov 2009 10:00:44 +0100 Message-Id: <1257843644-8496-1-git-send-email-Joakim.Tjernlund@transmode.se> X-Mailer: git-send-email 1.6.4.4 In-Reply-To: References: X-MIMETrack: Itemize by SMTP Server on sesr04/Transmode(Release 8.5 HF964|October 21, 2009) at 2009-11-10 10:00:44, Serialize by Router on sesr04/Transmode(Release 8.5 HF964|October 21, 2009) at 2009-11-10 10:00:44, Serialize complete at 2009-11-10 10:00:44 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org JFFS2 uses lesser compression ratio and inflate always ends up in "copy direct from output" case. This patch tries to optimize the direct copy procedure. Uses get_unaligned() but only in one place. The copy loop just above this one can also use this optimization, but I havn't done so as I have not tested if it is a win there too. On my MPC8321 this is about 17% faster on my JFFS2 root FS than the original. --- Would like some testing of the PowerPC boot wrapper and a LE target before sending it upstream. arch/powerpc/boot/Makefile | 4 ++- lib/zlib_inflate/inffast.c | 48 +++++++++++++++++++++++++++++++++---------- 2 files changed, 40 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 9ae7b7e..98e4c4f 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -20,7 +20,7 @@ all: $(obj)/zImage BOOTCFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ - -fno-strict-aliasing -Os -msoft-float -pipe \ + -fno-strict-aliasing -Os -msoft-float -pipe -D__KERNEL__\ -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \ -isystem $(shell $(CROSS32CC) -print-file-name=include) BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc @@ -34,6 +34,8 @@ BOOTCFLAGS += -fno-stack-protector endif BOOTCFLAGS += -I$(obj) -I$(srctree)/$(obj) +BOOTCFLAGS += -include include/linux/autoconf.h -Iarch/powerpc/include +BOOTCFLAGS += -Iinclude DTS_FLAGS ?= -p 1024 diff --git a/lib/zlib_inflate/inffast.c b/lib/zlib_inflate/inffast.c index 8550b0c..0c7fa3d 100644 --- a/lib/zlib_inflate/inffast.c +++ b/lib/zlib_inflate/inffast.c @@ -4,6 +4,7 @@ */ #include +#include #include "inftrees.h" #include "inflate.h" #include "inffast.h" @@ -24,9 +25,11 @@ #ifdef POSTINC # define OFF 0 # define PUP(a) *(a)++ +# define UP_UNALIGNED(a) get_unaligned((a)++) #else # define OFF 1 # define PUP(a) *++(a) +# define UP_UNALIGNED(a) get_unaligned(++(a)) #endif /* @@ -239,18 +242,41 @@ void inflate_fast(z_streamp strm, unsigned start) } } else { + unsigned short *sout; + unsigned long loops; + from = out - dist; /* copy direct from output */ - do { /* minimum length is three */ - PUP(out) = PUP(from); - PUP(out) = PUP(from); - PUP(out) = PUP(from); - len -= 3; - } while (len > 2); - if (len) { - PUP(out) = PUP(from); - if (len > 1) - PUP(out) = PUP(from); - } + /* minimum length is three */ + /* Align out addr */ + if (!((long)(out - 1 + OFF)) & 1) { + PUP(out) = PUP(from); + len--; + } + sout = (unsigned short *)(out - OFF); + if (dist > 2 ) { + unsigned short *sfrom; + + sfrom = (unsigned short *)(from - OFF); + loops = len >> 1; + do + PUP(sout) = UP_UNALIGNED(sfrom); + while (--loops); + out = (unsigned char *)sout + OFF; + from = (unsigned char *)sfrom + OFF; + } else { /* dist == 1 or dist == 2 */ + unsigned short pat16; + + pat16 = *(sout-2+2*OFF); + if (dist == 1) + pat16 = (pat16 & 0xff) | ((pat16 & 0xff ) << 8); + loops = len >> 1; + do + PUP(sout) = pat16; + while (--loops); + out = (unsigned char *)sout + OFF; + } + if (len & 1) + PUP(out) = PUP(from); } } else if ((op & 64) == 0) { /* 2nd level distance code */