diff mbox series

[V3,1/3] base-files: sysupgrade: add tar.sh with helpers for building archives

Message ID 20240228105102.3399-1-zajec5@gmail.com
State Accepted
Delegated to: Rafał Miłecki
Headers show
Series [V3,1/3] base-files: sysupgrade: add tar.sh with helpers for building archives | expand

Commit Message

Rafał Miłecki Feb. 28, 2024, 10:51 a.m. UTC
From: Jo-Philipp Wich <jo@mein.io>

This allows building uncompressed tar archives from shell scripts (and
compressing them later if needed)

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
---
V2: Simplify dd in __tar_print_padding (I still think helper is useful)
    Hardcode 0/0/ root/root for now as most likely it'll be enough
    Simplify name validation (leasing slash)
    Reorder some variables
V3: Fix dd in __tar_print_padding
    Rename functions
    Drop unused functions
    Document usage

 package/base-files/files/lib/upgrade/tar.sh | 71 +++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 package/base-files/files/lib/upgrade/tar.sh

Comments

Jo-Philipp Wich Feb. 29, 2024, 1:40 p.m. UTC | #1
Hi,

> [...] 
> This allows building uncompressed tar archives from shell scripts (and
> compressing them later if needed)
> 
> Signed-off-by: Rafał Miłecki <rafal@milecki.pl>

Signed-off-by: Jo-Philipp Wich <jo@mein.io>


~ Jo
Luiz Angelo Daros de Luca March 1, 2024, 4:54 a.m. UTC | #2
Hello Rafał,

Sorry for the late review.

I know this will be used for a limited context but it can be
constructed in such a way it could be used for more cases without
increasing its complexity.

> From: Jo-Philipp Wich <jo@mein.io>
>
> This allows building uncompressed tar archives from shell scripts (and
> compressing them later if needed)
>
> Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
> ---
> V2: Simplify dd in __tar_print_padding (I still think helper is useful)
>     Hardcode 0/0/ root/root for now as most likely it'll be enough
>     Simplify name validation (leasing slash)
>     Reorder some variables
> V3: Fix dd in __tar_print_padding
>     Rename functions
>     Drop unused functions
>     Document usage
>
>  package/base-files/files/lib/upgrade/tar.sh | 71 +++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100644 package/base-files/files/lib/upgrade/tar.sh
>
> diff --git a/package/base-files/files/lib/upgrade/tar.sh b/package/base-files/files/lib/upgrade/tar.sh
> new file mode 100644
> index 0000000000..a9d1d559e6
> --- /dev/null
> +++ b/package/base-files/files/lib/upgrade/tar.sh
> @@ -0,0 +1,71 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later OR MIT
> +
> +# Example usage:
> +#
> +# {
> +#         tar_print_member "date.txt" "It's $(date +"%Y")"
> +#         tar_print_trailer
> +# } > test.tar
> +
> +__tar_print_padding() {
> +       dd if=/dev/zero bs=1 count=$1 2>/dev/null
> +}
> +
> +tar_print_member() {
> +       local name="$1"
> +       local content="$2"
> +       local mtime="${3:-$(date +%s)}"
> +       local mode=644
> +       local uid=0
> +       local gid=0
> +       local size=${#content}
> +       local type=0
> +       local link=""
> +       local username="root"
> +       local groupname="root"
> +
> +       # 100 byte of padding bytes, using 0x01 since the shell does not tolerate null bytes in strings
> +       local pad=$'\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1'

\1 is still a valid filename character. Shell cannot save a string
with \0 but printf can handle it nicely (as well as pipes). If you
build the header using a function at the moment you use it, you could
avoid the \1 hack.

print_header() {
   local filename=$1
   local mode=$2
   ...
   local checksum=$7
   ...
   printf '%s' "$filename"; printf '\x00%.0s' $(seq $((100 - ${#filename})))
   ...
}

print_header "$filename" "$mode" \
   "$uid" "$gid" .. | hexdump -ve '1/1 "%u

If you don't like the printf trick to repeat a char, you could use
__tar_print_padding (or use printf for both).

You could call it twice, first without a checksum to calculate it and
then a new one to generate the final output.

> +
> +       # validate name (strip leading slash if present)
> +       name=${name#/}
> +
> +       # truncate string header values to their maximum length
> +       name=${name:0:100}

It feels strange to simply truncate the name. If you need to truncate
it, you are implying it might be larger. In that case, it should fail
if it cannot fit a specific filename. Otherwise, you could end up
overwriting two filenames with the same first 100 bytes. And BTW,
shouldn't ustar support 255 or 256 filenames?

> +       link=${link:0:100}
> +       username=${username:0:32}
> +       groupname=${groupname:0:32}
> +
> +       # construct header part before checksum field
> +       local header1="${name}${pad:0:$((100 - ${#name}))}"
> +       header1="${header1}$(printf '%07d\1' $mode)"
> +       header1="${header1}$(printf '%07o\1' $uid)"
> +       header1="${header1}$(printf '%07o\1' $gid)"
> +       header1="${header1}$(printf '%011o\1' $size)"
> +       header1="${header1}$(printf '%011o\1' $mtime)"
> +
> +       # construct header part after checksum field
> +       local header2="$(printf '%d' $type)"
> +       header2="${header2}${link}${pad:0:$((100 - ${#link}))}"
> +       header2="${header2}ustar  ${pad:0:1}"
> +       header2="${header2}${username}${pad:0:$((32 - ${#username}))}"
> +       header2="${header2}${groupname}${pad:0:$((32 - ${#groupname}))}"

It seems to be a strange ustar format. Shouldn't it have an ustar
version before the username? And Device major/minor after groupname,
followed by more 150 filename prefix? It might still work if the
reader interprets it as a v7.

> +       # calculate checksum over header fields
> +       local checksum=0
> +       for byte in $(printf '%s%8s%s' "$header1" "" "$header2" | tr '\1' '\0' | hexdump -ve '1/1 "%u "'); do
> +               checksum=$((checksum + byte))
> +       done
> +

Maybe use a single line like:

checksum=$(print header... | hexdump -ve '1/1 "%u\n"' | awk '{s+=$1}
END {print s}')

?

> +       # print member header, padded to 512 byte
> +       printf '%s%06o\0 %s' "$header1" $checksum "$header2" | tr '\1' '\0'
> +       __tar_print_padding 183
> +
> +       # print content data, padded to multiple of 512 byte
> +       printf "%s" "$content"
> +       __tar_print_padding $((512 - (size % 512)))
> +}
> +
> +tar_print_trailer() {
> +       __tar_print_padding 1024
> +}
> --
> 2.35.3
>
diff mbox series

Patch

diff --git a/package/base-files/files/lib/upgrade/tar.sh b/package/base-files/files/lib/upgrade/tar.sh
new file mode 100644
index 0000000000..a9d1d559e6
--- /dev/null
+++ b/package/base-files/files/lib/upgrade/tar.sh
@@ -0,0 +1,71 @@ 
+# SPDX-License-Identifier: GPL-2.0-or-later OR MIT
+
+# Example usage:
+#
+# {
+#         tar_print_member "date.txt" "It's $(date +"%Y")"
+#         tar_print_trailer
+# } > test.tar
+
+__tar_print_padding() {
+	dd if=/dev/zero bs=1 count=$1 2>/dev/null
+}
+
+tar_print_member() {
+	local name="$1"
+	local content="$2"
+	local mtime="${3:-$(date +%s)}"
+	local mode=644
+	local uid=0
+	local gid=0
+	local size=${#content}
+	local type=0
+	local link=""
+	local username="root"
+	local groupname="root"
+
+	# 100 byte of padding bytes, using 0x01 since the shell does not tolerate null bytes in strings
+	local pad=$'\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1'
+
+	# validate name (strip leading slash if present)
+	name=${name#/}
+
+	# truncate string header values to their maximum length
+	name=${name:0:100}
+	link=${link:0:100}
+	username=${username:0:32}
+	groupname=${groupname:0:32}
+
+	# construct header part before checksum field
+	local header1="${name}${pad:0:$((100 - ${#name}))}"
+	header1="${header1}$(printf '%07d\1' $mode)"
+	header1="${header1}$(printf '%07o\1' $uid)"
+	header1="${header1}$(printf '%07o\1' $gid)"
+	header1="${header1}$(printf '%011o\1' $size)"
+	header1="${header1}$(printf '%011o\1' $mtime)"
+
+	# construct header part after checksum field
+	local header2="$(printf '%d' $type)"
+	header2="${header2}${link}${pad:0:$((100 - ${#link}))}"
+	header2="${header2}ustar  ${pad:0:1}"
+	header2="${header2}${username}${pad:0:$((32 - ${#username}))}"
+	header2="${header2}${groupname}${pad:0:$((32 - ${#groupname}))}"
+
+	# calculate checksum over header fields
+	local checksum=0
+	for byte in $(printf '%s%8s%s' "$header1" "" "$header2" | tr '\1' '\0' | hexdump -ve '1/1 "%u "'); do
+		checksum=$((checksum + byte))
+	done
+
+	# print member header, padded to 512 byte
+	printf '%s%06o\0 %s' "$header1" $checksum "$header2" | tr '\1' '\0'
+	__tar_print_padding 183
+
+	# print content data, padded to multiple of 512 byte
+	printf "%s" "$content"
+	__tar_print_padding $((512 - (size % 512)))
+}
+
+tar_print_trailer() {
+	__tar_print_padding 1024
+}