From patchwork Mon Nov 27 22:41:39 2023
X-Patchwork-Submitter: Brandon Maier
X-Patchwork-Id: 1869036
From: Brandon Maier
To: buildroot@buildroot.org
Cc: Herve Codina, "Yann E. MORIN", Brandon Maier
Date: Mon, 27 Nov 2023 22:41:39 +0000
Message-ID: <20231127224139.35969-1-brandon.maier@collins.com>
Subject: [Buildroot] [PATCH 1/1] ppd-merge: speed up per-package-rsync

The per-package-rsync stage can add a significant amount of time to
builds. It is also annoying during development: the slowest merges,
those of the final target-finalize and host-finalize stages, run on
every `make all`, which we use frequently.

per-package-rsync is slow because it launches a separate rsync for each
source tree. Each rsync must rescan the destination directory, and a
file installed by several packages may be overwritten several times.

Instead, walk the source trees ourselves to find only the files that
should end up in the destination, then feed that list to a single rsync
instance. In hardlink mode, rsync accepts at most 20 --link-dest
directories, so the list is split into batches of at most 20 packages,
with one rsync call per batch.

Below is a benchmark of just the host-finalize merge on a system with
200 host packages, in both hardlink and full-copy mode. The ppd-merge-*
runs use the new script; the rsync-* runs use the previous per-package
rsync loop.
Benchmark 1: ppd-merge-copy
  Time (mean ± σ):      5.332 s ±  0.098 s    [User: 5.300 s, System: 3.926 s]
  Range (min … max):    5.099 s …  5.468 s    10 runs

Benchmark 2: ppd-merge-hardlink
  Time (mean ± σ):      2.067 s ±  0.027 s    [User: 1.218 s, System: 1.614 s]
  Range (min … max):    2.037 s …  2.114 s    10 runs

Benchmark 3: rsync-copy
  Time (mean ± σ):     25.539 s ±  0.233 s    [User: 11.705 s, System: 10.862 s]
  Range (min … max):   25.215 s … 25.966 s    10 runs

Benchmark 4: rsync-hardlink
  Time (mean ± σ):     24.074 s ±  0.271 s    [User: 3.451 s, System: 10.802 s]
  Range (min … max):   23.672 s … 24.522 s    10 runs

Summary
  'ppd-merge-hardlink' ran
    2.58 ± 0.06 times faster than 'ppd-merge-copy'
   11.65 ± 0.20 times faster than 'rsync-hardlink'
   12.36 ± 0.20 times faster than 'rsync-copy'

Signed-off-by: Brandon Maier
---
 package/pkg-utils.mk         |  16 +---
 support/scripts/ppd-merge.sh | 154 +++++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+), 13 deletions(-)
 create mode 100755 support/scripts/ppd-merge.sh

diff --git a/package/pkg-utils.mk b/package/pkg-utils.mk
index 059e86ae0a..3968cabebd 100644
--- a/package/pkg-utils.mk
+++ b/package/pkg-utils.mk
@@ -216,19 +216,9 @@ ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
 # $3: destination directory
 # $4: literal "copy" or "hardlink" to copy or hardlink files from src to dest
 define per-package-rsync
-	mkdir -p $(3)
-	$(foreach pkg,$(1),\
-		rsync -a \
-			--hard-links \
-			$(if $(filter hardlink,$(4)), \
-				--link-dest=$(PER_PACKAGE_DIR)/$(pkg)/$(2)/, \
-				$(if $(filter copy,$(4)), \
-					$(empty), \
-					$(error per-package-rsync can only "copy" or "hardlink", not "$(4)") \
-				) \
-			) \
-			$(PER_PACKAGE_DIR)/$(pkg)/$(2)/ \
-			$(3)$(sep))
+	PER_PACKAGE_DIR=$(PER_PACKAGE_DIR) \
+		$(TOPDIR)/support/scripts/ppd-merge.sh \
+		$(2) $(3) $(4) $(1)
 endef
 
 # prepares the per-package HOST_DIR and TARGET_DIR of the current
diff --git a/support/scripts/ppd-merge.sh b/support/scripts/ppd-merge.sh
new file mode 100755
index 0000000000..eb1caf9e52
--- /dev/null
+++ b/support/scripts/ppd-merge.sh
@@ -0,0 +1,154 @@
+#!/bin/bash
+# Merge a set of Buildroot per-package-directories (PPD) into a single
+# destination directory.
+#
+# ppd-merge scans through all the PPD and builds a table of which files from
+# each source directory will be written to the destination directory. It feeds
+# that table into rsync to tell it exactly which files to copy.
+#
+# Example:
+# ppd-merge replaces the original method of merging PPD that ran many rsync
+# commands like below.
+#
+# > rsync -a --link-dest=$PER_PACKAGE_DIR/pkg-1/host/ \
+# >     $PER_PACKAGE_DIR/pkg-1/host/ dest/
+# > rsync -a --link-dest=$PER_PACKAGE_DIR/pkg-2/host/ \
+# >     $PER_PACKAGE_DIR/pkg-2/host/ dest/
+# ...
+# > rsync -a --link-dest=$PER_PACKAGE_DIR/pkg-n/host/ \
+# >     $PER_PACKAGE_DIR/pkg-n/host/ dest/
+#
+# The equivalent command with ppd-merge is
+#
+# > PER_PACKAGE_DIR=$PER_PACKAGE_DIR \
+# >     ppd-merge.sh host dest/ hardlink \
+# >     pkg-1 pkg-2 ... pkg-n
+
+set -euo pipefail
+
+usage() {
+	cat <<EOF >&2
+Usage:
+	ppd-merge.sh <type> <dest> <link> <pkg>...
+
+	PER_PACKAGE_DIR must be defined in the environment.
+
+	Merges each <type> directory for <pkg> at
+	"\$PER_PACKAGE_DIR/<pkg>/<type>/" into <dest>.
+EOF
+	echo "Error: $*" >&2
+	exit 1
+}
+
+search_subtrees() {
+	# Use `find` to walk all subtrees and print their contents in `rsync`
+	# '--relative' path syntax. Filter paths through awk so we discard any
+	# duplicate files found while walking subtrees.
+	(cd "$PER_PACKAGE_DIR" && find "${subtrees[@]}" -mindepth 1 -printf "%H/./%P\0") \
+		| awk '
+		BEGIN {
+			RS="\0";
+			ORS="\0";
+			FS="/\\./";
+		}
+		{
+			if (!($2 in found)) {
+				found[$2]=1;
+				print;
+			}
+		}'
+}
+
+rsync_hardlink() {
+	# rsync hardlinking with --link-dest hard caps at 20 subtrees. Batch up the
+	# input files in $tmpdir until we have 20 subtrees, then flush them with an
+	# rsync call.
+	tmpdir=$(mktemp -d)
+	awk -v tmpdir="$tmpdir" '
+	function mkfiles() {
+		out_file=tmpdir "/" i ".files";
+		subtrees_file=tmpdir "/" i ".subtrees";
+	}
+	function batch_out() {
+		close(out_file);
+		close(subtrees_file);
+		delete subtrees;
+		print i;
+		i+=1;
+		mkfiles();
+	}
+	BEGIN {
+		RS="\0";
+		ORS="\0";
+		FS="/\\./";
+		i=0;
+		mkfiles();
+	}
+	!($1 in subtrees) && (length(subtrees) >= 20) {
+		batch_out();
+	}
+	{
+		print > out_file;
+		if (!($1 in subtrees)) {
+			subtrees[$1]=1;
+			print $1 > subtrees_file;
+		}
+	}
+	END {
+		if (length(subtrees) > 0) {
+			batch_out();
+		}
+	}' | while IFS= read -r -d $'\0' i; do
+		rsync_cmd_batch=( "${rsync_cmd[@]}" )
+		while IFS= read -r -d $'\0' subtree; do
+			rsync_cmd_batch+=( --link-dest="$PER_PACKAGE_DIR/$subtree/" )
+		done <"$tmpdir/$i.subtrees"
+		"${rsync_cmd_batch[@]}" "$PER_PACKAGE_DIR/" "$dest/" <"$tmpdir/$i.files"
+	done
+	rm -rf "$tmpdir"
+}
+
+if [ "$#" -lt 3 ]; then
+	usage "Invalid number of arguments"
+fi
+type=$1
+dest=$2
+link=$3
+shift 3
+
+case "$type" in
+host|target)
+	;;
+*)
+	usage "Unknown type '$type'"
+	;;
+esac
+
+subtrees=()
+for ((i = $#; i > 0; i--)); do
+	subtrees+=( "${!i}/$type" )
+done
+
+if [ -z "${PER_PACKAGE_DIR:-}" ]; then
+	usage "PER_PACKAGE_DIR must be defined"
+fi
+
+rsync_cmd=( rsync -a --hard-links --files-from=- --from0 )
+
+mkdir -p "$dest"
+
+if [ "${#subtrees[@]}" -eq 0 ]; then
+	exit 0
+fi
+
+case "$link" in
+hardlink)
+	search_subtrees | rsync_hardlink
+	;;
+copy)
+	search_subtrees | "${rsync_cmd[@]}" "$PER_PACKAGE_DIR/" "$dest/"
+	;;
+*)
+	usage "Unknown link command '$link'"
+	;;
+esac
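A minimal standalone sketch of the selection rule described in the commit message: walk the packages last-to-first and keep only the first entry seen for each destination-relative path, printed in rsync's --relative "/./" syntax. This is an illustration under stated assumptions, not the script itself. It assumes bash 4+ (associative arrays) and a find that supports -print0, and it prints newline-separated output for readability where ppd-merge.sh uses NUL separators, `find -printf` and awk. The function select_files and the packages pkg-a and pkg-b are hypothetical.

```shell
#!/bin/bash
# Hypothetical standalone sketch of the "last package wins" file selection
# that ppd-merge.sh performs before calling rsync. Requires bash 4+.
set -euo pipefail

# select_files PPD TYPE PKG...
# Prints one "PKG/TYPE/./RELPATH" line per destination path. Packages are
# walked last-to-first, so on a conflict the last package listed wins.
select_files() {
	local ppd=$1 type=$2
	shift 2
	local -A seen=()
	local i sub rel
	for ((i = $#; i > 0; i--)); do
		sub="${!i}/$type"
		while IFS= read -r -d '' rel; do
			rel=${rel#./}
			if [[ -z ${seen[$rel]+x} ]]; then
				seen[$rel]=1
				# The "/./" marks where rsync --relative starts the
				# destination-relative part of the path.
				printf '%s/./%s\n' "$sub" "$rel"
			fi
		done < <(cd "$ppd/$sub" && find . -mindepth 1 -print0)
	done
}

# Throwaway per-package layout where both packages install usr/bin/tool.
ppd=$(mktemp -d)
trap 'rm -rf "$ppd"' EXIT
mkdir -p "$ppd/pkg-a/host/usr/bin" "$ppd/pkg-a/host/usr/lib" "$ppd/pkg-b/host/usr/bin"
echo a >"$ppd/pkg-a/host/usr/bin/tool"
echo a >"$ppd/pkg-a/host/usr/lib/liba.so"
echo b >"$ppd/pkg-b/host/usr/bin/tool"

result=$(select_files "$ppd" host pkg-a pkg-b)
printf '%s\n' "$result"
```

With pkg-b listed last, its usr/bin/tool is the copy selected, matching the old loop where the last rsync overwrote earlier copies; usr/lib/liba.so only exists in pkg-a, so it is taken from there. The order of entries from the same package follows find's directory traversal and is not sorted.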