[LEDE-DEV] sdk: automatically use all CPU cores for xz

Message ID 1502975110-3753-1-git-send-email-karlp@etactica.com
State Changes Requested

Commit Message

Karl Palsson Aug. 17, 2017, 1:05 p.m.
xz has supported multithreaded compression since 5.2 in 2014.  Enable
it's automatic support for this via the "-T 0" flag.

Previously: (xz -7e)
real	3m13.631s

Now: (xz -T 0 -7e)
real	1m23.051s

Signed-off-by: Karl Palsson <karlp@etactica.com>
---
 target/sdk/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jonas Gorski Aug. 18, 2017, 10:13 a.m. | #1
Hi,

On 17 August 2017 at 15:05, Karl Palsson <karlp@etactica.com> wrote:
> xz has supported multithreaded compression since 5.2 in 2014.  Enable
> it's automatic support for this via the "-T 0" flag.

its ;p

>
> Previously: (xz -7e)
> real    3m13.631s
>
> Now: (xz -T 0 -7e)
> real    1m23.051s

After playing around with it, it seems enabling multiple threads makes
it compress slightly worse, at least in case of the linux kernel
sources (86.8 MiB single thread vs. 87.7 MiB multithread).

This means that the archive will have a different checksum, and thus
affects reproducibility. [1] suggests this is because of a different
block size default, but unfortunately there seems to be no way to set
the block size in an attempt to make it use the same one regardless of
threads.

If we still do this, I would suggest taking over the value from -j, so
we don't use more cores than allowed, and you can force single
threaded mode on a multi core system. Also ImageBuilder and Toolchain
might profit from it as well.


Regards
Jonas

[1] https://lists.debian.org/debian-dpkg/2016/10/msg00008.html

Karl Palsson Aug. 18, 2017, 11:17 a.m. | #2
Jonas Gorski <jonas.gorski@gmail.com> wrote:
> >
> > Previously: (xz -7e)
> > real    3m13.631s
> >
> > Now: (xz -T 0 -7e)
> > real    1m23.051s
> 
> After playing around with it, it seems enabling multiple
> threads makes it compress slightly worse, at least in case of
> the linux kernel sources (86.8 MiB single thread vs. 87.7 MiB
> multithread).

I consider that to be lost in the noise when compared to bz2.
 
> This means that the archive will have a different checksum, and
> thus affects reproducibility. [1] suggests this is because of a
> different block size default, but unfortunately there seems to
> be no way to set the block size in an attempt to make it use
> the same one regardless of threads.

Further in that same thread, it seems to suggest that for _any_
build with >1 core, it will always use the same block size, so
I'm not _entirely_ convinced this is actually a real problem with
today's computers.

I've tried asking in both #debian-reproducible and
#reproducible-builds but not gotten any meaningful feedback.
(beyond: "it might")

Further...
https://tests.reproducible-builds.org/lede/lede_ar71xx.html isn't
even building the sdk or image builder or toolchain.... _and_
they don't change the cpu either. Is this really an issue that
matters?
 
> If we still do this, I would suggest taking over the value from
> -j, so we don't use more cores than allowed, and you can force
> single threaded mode on a multi core system.

If you know a nice way of getting the actual number in make,
please, go right ahead. "-T 0" was _vastly_ simpler than the sort
of jiggerypokery to get a sane number out of this optional make
argument.
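
For illustration, here is a minimal sketch of what taking the value from -j might look like in shell. It assumes GNU make 4.2 or later, where (as far as I know) the job count is visible in MAKEFLAGS even with the jobserver active; older versions may only expose a bare "-j". The function name jobs_from_makeflags is hypothetical and is not part of the build system, and mapping a bare "-j" (unlimited jobs) to 0 is a choice made for this sketch only.

```shell
# jobs_from_makeflags FLAGS
# Print the job count found in a MAKEFLAGS-style string.
# Falls back to 1 when no -j option is present, and to 0 (let xz decide)
# when -j is given without a number. Hypothetical helper, for illustration.
jobs_from_makeflags() {
    jobs=1
    expect_value=no
    for word in $1; do
        if [ "$expect_value" = yes ]; then
            expect_value=no
            case $word in
                *[!0-9]*) ;;                  # not a count: parse as a flag below
                *) jobs=$word; continue ;;    # "-j 8" or "--jobs 8" form
            esac
        fi
        case $word in
            -j|--jobs) jobs=0; expect_value=yes ;;
            -j[0-9]*) jobs=${word#-j} ;;
            --jobs=[0-9]*) jobs=${word#--jobs=} ;;
        esac
    done
    # Reject anything that is not a plain non-negative integer.
    case $jobs in
        ''|*[!0-9]*) jobs=1 ;;
    esac
    echo "$jobs"
}
```

In a recipe this could be used as `xz -T "$(jobs_from_makeflags "$MAKEFLAGS")" -7e`, which shows the kind of parsing that "-T 0" avoids.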

> Also ImageBuilder and Toolchain might profit from it as well.

Indeed, I don't have those enabled in my build, so only noticed
the SDK being the massive timesuck. If we can get something like
this in, it could definitely be added to any place that uses xz.
(related, but not, we build a host 'xz' but then this place just
uses whatever xz the shell provides.....)

> 
> [1] https://lists.debian.org/debian-dpkg/2016/10/msg00008.html
> 

Cheers,
Karl P

Baptiste Jonglez Aug. 18, 2017, 11:49 a.m. | #3
On 18-08-17, Karl Palsson wrote:
> > This means that the archive will have a different checksum, and
> > thus affects reproducibility. [1] suggests this is because of a
> > different block size default, but unfortunately there seems to
> > be no way to set the block size in an attempt to make it use
> > the same one regardless of threads.
> 
> Further in that same thread, it seems to suggest that for _any_
> build with >1 core, it will always use the same block size, so
> I'm not _entirely_ convinced this is actually a real problem with
> today's computers.

Indeed, the result is reproducible for any thread number >1.  On a
many-core machine:

55395e07f991cb8057c73d19bbe2030b716f4a90  linux-T0.tar.xz
c5c7259f2781cddfaaae92b6b7fd7c17a8b1012a  linux-T1.tar.xz
55395e07f991cb8057c73d19bbe2030b716f4a90  linux-T2.tar.xz
55395e07f991cb8057c73d19bbe2030b716f4a90  linux-T4.tar.xz
55395e07f991cb8057c73d19bbe2030b716f4a90  linux-T7.tar.xz
55395e07f991cb8057c73d19bbe2030b716f4a90  linux-T8.tar.xz
55395e07f991cb8057c73d19bbe2030b716f4a90  linux-T11.tar.xz

On a single-core machine:

c5c7259f2781cddfaaae92b6b7fd7c17a8b1012a  linux-T0.tar.xz
c5c7259f2781cddfaaae92b6b7fd7c17a8b1012a  linux-T1.tar.xz
55395e07f991cb8057c73d19bbe2030b716f4a90  linux-T2.tar.xz

If we really wanted a reproducible output whatever -j option is passed, we
could enforce a minimum of -T2.  But this is a bad idea for single core
machines, because it uses a lot more RAM and is twice as slow:

$ time xz -T1 < linux.tar > linux-T1.tar.xz
real           5m48.682s
user           5m37.036s
sys            0m0.932s

$ time xz -T2 < linux.tar > linux-T2.tar.xz
real           11m20.137s
user           5m46.528s
sys            0m0.472s

I think the significant reduction in build time outweighs the slight
decrease in reproducibility (especially given the broad availability of
multi-core machines).

My 2 cents,
Baptiste

Bjørn Mork Aug. 18, 2017, 12:07 p.m. | #4
This looks yucky.  Experimenting a bit, I see that the result with

 a) -T 0 depends on multi-core vs single-core
 b) -T 1 is always different from the output of -T x where x > 1
 c) -T x where x > 1 is independent of both x and the actual number of
    cores

So you will not get reproducible results with either "-T 0" or "-T <value from make -j>".
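
The three observations above can be restated as a small shell model. This is only a sketch of the behaviour reported in this thread for the xz version tested here, not a description of xz internals, and other xz releases may behave differently. The function name xz_output_class and the labels "single" and "multi" are made up for illustration.

```shell
# xz_output_class REQUESTED_THREADS CORES
# Print which of the two distinct archives is expected, per the observations
# reported in this thread: "single" (same bytes as -T 1) or "multi" (same
# bytes as any -T x with x > 1). Hypothetical model, not xz's actual logic.
xz_output_class() {
    requested=$1
    cores=$2
    if [ "$requested" -eq 0 ]; then
        effective=$cores        # a) -T 0 follows the number of cores
    else
        effective=$requested    # b), c) an explicit value ignores the core count
    fi
    if [ "$effective" -gt 1 ]; then
        echo multi
    else
        echo single
    fi
}
```

Under this model, `xz_output_class 0 1` and `xz_output_class 0 8` give different answers, which is the reproducibility concern with "-T 0", while any explicit value above 1 gives the same answer on every machine.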


Bjørn

Karl Palsson Aug. 18, 2017, 1:06 p.m. | #5
Bjørn Mork  <bjorn@mork.no> wrote:
> This looks yucky. Experimenting a bit, I see that the result
> with
> 
>  a) -T 0 depends on multi-core vs single-core
>  b) -T 1 is always different from the output of -T x where x > 1
>  c) -T x where x > 1 is independent of both x and the actual number of
>     cores
> 
> So you will not get reproducible results with either "-T 0" or
> "-T <value from make -j>".

Sure, but -T 0 will be ok for _any_ computer with more than one
core. Which is enough of them now, and enough for people doing
builds that care about reproducibility, and certainly enough of
them to appreciate the dramatically sped up builds.

Cheers,
Karl P

Jonas Gorski Aug. 26, 2017, 10:36 a.m. | #6
On 18 August 2017 at 15:06, Karl Palsson <karlp@tweak.net.au> wrote:
>
> Bjørn Mork  <bjorn@mork.no> wrote:
>> This looks yucky. Experimenting a bit, I see that the result
>> with
>>
>>  a) -T 0 depends on multi-core vs single-core
>>  b) -T 1 is always different from the output of -T x where x > 1
>>  c) -T x where x > 1 is independent of both x and the actual number of
>>     cores
>>
>> So you will not get reproducible results with either "-T 0" or
>> "-T <value from make -j>".
>
> Sure, but -T 0 will be ok for _any_ computer with more than one
> core. Which is enough of them now, and enough for people doing
> builds that care about reproducibility, and certainly enough of
> them to appreciate the dramatically sped up builds.

To move this discussion forward: if getting the -j value is far from
easy, especially in a portable way (as googling suggests), I suggest
adding a config option for enabling parallel XZ compression, either as
a bool or directly as an integer for the number of threads to use,
defaulting to n or 1 respectively. The int option is probably the best
as it gives the most control over it.

Then those that care about speed can enable it/set it to 0, and those
that want to use their PC for other things at the same time can leave
it at 1 or an appropriate value so it doesn't take over the whole
system during compression.
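
As a rough sketch of the integer variant, the configured value could be turned into an xz option along these lines. No such config option exists in the tree at the time of this thread. The function name xz_thread_args is hypothetical, and falling back to 1 for a missing or malformed value is an assumption chosen to match the suggested default.

```shell
# xz_thread_args [THREADS]
# Print the xz thread option for a configured thread count.
# An empty, missing or non-numeric value falls back to 1 (single-threaded),
# matching the default proposed above. Hypothetical helper, for illustration.
xz_thread_args() {
    threads=${1:-1}
    case $threads in
        *[!0-9]*) threads=1 ;;    # anything but digits: use the safe default
    esac
    echo "-T $threads"
}
```

The SDK recipe could then call something like `tar -I "xz $(xz_thread_args "$threads") -7e" -cf ...`, where `$threads` stands for whatever the config option ends up being called; 0 keeps the "use all cores" behaviour of the patch and 1 keeps the current single-threaded output.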


Regards
Jonas

Patch

diff --git a/target/sdk/Makefile b/target/sdk/Makefile
index ae65fd1c8f4a..07616a4d7aa2 100644
--- a/target/sdk/Makefile
+++ b/target/sdk/Makefile
@@ -140,7 +140,7 @@  $(BIN_DIR)/$(SDK_NAME).tar.xz: clean
 	find $(SDK_BUILD_DIR) -name CVS | $(XARGS) rm -rf
 	-make -C $(SDK_BUILD_DIR)/scripts/config clean
 	(cd $(BUILD_DIR); \
-		tar -I 'xz -7e' -cf $@ $(SDK_NAME); \
+		tar -I 'xz -T 0 -7e' -cf $@ $(SDK_NAME); \
 	)
 
 download: