diff mbox series

[v2,1/1] ppd-merge: speed up per-package-rsync

Message ID 20231201170552.18210-1-brandon.maier@collins.com
State New
Headers show
Series [v2,1/1] ppd-merge: speed up per-package-rsync | expand

Commit Message

Brandon Maier Dec. 1, 2023, 5:05 p.m. UTC
The per-package-rsync stage can add a significant amount of time to
builds. They can also be annoying as the target-finalize and
host-finalize targets are the slowest and run on every `make all`, which
is used frequently for partial rebuilds.

The per-package-rsync is slow because it launches a new rsync for each
source tree, and each rsync must rescan the destination directory and
potentially overwrite files multiple times. We can instead merge all the
rsync calls down into one call, and rsync is smarter about scanning all
the source directories and only copying over the files it needs to.

We feed the source trees to rsync in reverse-order, as this preserves
the original behaviour. I.e. when using multiple rsyncs, the last source
tree would overwrite anything in the destination. Now when using a
single rsync, we put the last tree first as rsync will select the first
file it finds.

This only supports the 'copy' mode, which is used in the finalize step.
The 'hardlink' mode requires specifying each source tree with the
--link-dest flag but enforces a maximum of 20 trees.

Below is a benchmark running the host-finalize target for a build with
200 packages.

Benchmark 1: before copy
  Time (mean ± σ):     27.171 s ±  0.777 s    [User: 6.170 s, System: 14.830 s]
  Range (min … max):   26.343 s … 28.566 s    10 runs

Benchmark 2: after copy
  Time (mean ± σ):      6.296 s ±  0.196 s    [User: 2.874 s, System: 5.600 s]
  Range (min … max):    6.094 s …  6.709 s    10 runs

Summary
  after copy ran
    4.32 ± 0.18 times faster than before copy

Cc: Herve Codina <herve.codina@bootlin.com>
Cc: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Brandon Maier <brandon.maier@collins.com>
---
v1: https://patchwork.ozlabs.org/project/buildroot/patch/20231127224139.35969-1-brandon.maier@collins.com/

v2:
- Simplify logic for 'copy' mode
- Drop support for 'hardlink' mode

Signed-off-by: Brandon Maier <brandon.maier@collins.com>
---
 package/pkg-utils.mk | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

Comments

Herve Codina Dec. 4, 2023, 12:46 p.m. UTC | #1
Hi Brandon,

On Fri,  1 Dec 2023 17:05:52 +0000
Brandon Maier <brandon.maier@collins.com> wrote:

> The per-package-rsync stage can add a significant amount of time to
> builds. They can also be annoying as the target-finalize and
> host-finalize targets are the slowest and run on every `make all`, which
> is used frequently for partial rebuilds.
> 
> The per-package-rsync is slow because it launches a new rsync for each
> source tree, and each rsync must rescan the destination directory and
> potentially overwrite files multiple times. We can instead merge all the
> rsync calls down into one call, and rsync is smarter about scanning all
> the source directories and only copying over the files it needs to.
> 
> We feed the source trees to rsync in reverse-order, as this preserves
> the original behaviour. I.e. when using multiple rsyncs, the last source
> tree would overwrite anything in the destination. Now when using a
> single rsync, we put the last tree first as rsync will select the first
> file it finds.
> 
> This only supports the 'copy' mode, which is used in the finalize step.
> The 'hardlink' mode requires specifying each source tree with the
> --link-dest flag but enforces a maximum of 20 trees.
> 
> Below is a benchmark running the host-finalize target for a build with
> 200 packages.
> 
> Benchmark 1: before copy
>   Time (mean ± σ):     27.171 s ±  0.777 s    [User: 6.170 s, System: 14.830 s]
>   Range (min … max):   26.343 s … 28.566 s    10 runs
> 
> Benchmark 2: after copy
>   Time (mean ± σ):      6.296 s ±  0.196 s    [User: 2.874 s, System: 5.600 s]
>   Range (min … max):    6.094 s …  6.709 s    10 runs
> 
> Summary
>   after copy ran
>     4.32 ± 0.18 times faster than before copy
> 
> Cc: Herve Codina <herve.codina@bootlin.com>
> Cc: Yann E. MORIN <yann.morin.1998@free.fr>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> Signed-off-by: Brandon Maier <brandon.maier@collins.com>
> ---
> v1: https://patchwork.ozlabs.org/project/buildroot/patch/20231127224139.35969-1-brandon.maier@collins.com/
> 
> v2:
> - Simplify logic for 'copy' mode
> - Drop support for 'hardlink' mode
> 
> Signed-off-by: Brandon Maier <brandon.maier@collins.com>
> ---
>  package/pkg-utils.mk | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/package/pkg-utils.mk b/package/pkg-utils.mk
> index 059e86ae0a..bada328829 100644
> --- a/package/pkg-utils.mk
> +++ b/package/pkg-utils.mk
> @@ -217,18 +217,12 @@ ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
>  # $4: literal "copy" or "hardlink" to copy or hardlink files from src to dest
>  define per-package-rsync
>  	mkdir -p $(3)
> -	$(foreach pkg,$(1),\
> -		rsync -a \
> -			--hard-links \
> -			$(if $(filter hardlink,$(4)), \
> -				--link-dest=$(PER_PACKAGE_DIR)/$(pkg)/$(2)/, \
> -				$(if $(filter copy,$(4)), \
> -					$(empty), \
> -					$(error per-package-rsync can only "copy" or "hardlink", not "$(4)") \
> -				) \
> -			) \
> -			$(PER_PACKAGE_DIR)/$(pkg)/$(2)/ \
> -			$(3)$(sep))
> +	$(if $(filter hardlink,$(4)), \
> +		$(foreach pkg,$(1),\
> +			rsync -a --hard-links --link-dest=$(PER_PACKAGE_DIR)/$(pkg)/$(2)/ \
> +				$(PER_PACKAGE_DIR)/$(pkg)/$(2)/ $(3)$(sep)), \
> +		printf "%s/$(2)/\n" $(1) | tac \
> +			| rsync -a --hard-links --files-from=- --no-R -r $(PER_PACKAGE_DIR) $(3))
>  endef
>  
>  # prepares the per-package HOST_DIR and TARGET_DIR of the current

On my side, it looks good.

Reviewed-by: Herve Codina <herve.codina@bootlin.com>

Regards,
Hervé
diff mbox series

Patch

diff --git a/package/pkg-utils.mk b/package/pkg-utils.mk
index 059e86ae0a..bada328829 100644
--- a/package/pkg-utils.mk
+++ b/package/pkg-utils.mk
@@ -217,18 +217,12 @@  ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
 # $4: literal "copy" or "hardlink" to copy or hardlink files from src to dest
 define per-package-rsync
 	mkdir -p $(3)
-	$(foreach pkg,$(1),\
-		rsync -a \
-			--hard-links \
-			$(if $(filter hardlink,$(4)), \
-				--link-dest=$(PER_PACKAGE_DIR)/$(pkg)/$(2)/, \
-				$(if $(filter copy,$(4)), \
-					$(empty), \
-					$(error per-package-rsync can only "copy" or "hardlink", not "$(4)") \
-				) \
-			) \
-			$(PER_PACKAGE_DIR)/$(pkg)/$(2)/ \
-			$(3)$(sep))
+	$(if $(filter hardlink,$(4)), \
+		$(foreach pkg,$(1),\
+			rsync -a --hard-links --link-dest=$(PER_PACKAGE_DIR)/$(pkg)/$(2)/ \
+				$(PER_PACKAGE_DIR)/$(pkg)/$(2)/ $(3)$(sep)), \
+		printf "%s/$(2)/\n" $(1) | tac \
+			| rsync -a --hard-links --files-from=- --no-R -r $(PER_PACKAGE_DIR) $(3))
 endef
 
 # prepares the per-package HOST_DIR and TARGET_DIR of the current