Message ID | 20230727125123.927568-1-herve.codina@bootlin.com |
---|---|
State | Rejected |
Headers | show |
Series | [RFC,1/1] support/download: allow to use part of file in checksum checking | expand |
Hervé, All, On 2023-07-27 14:51 +0200, Herve Codina spake thusly: > The checksum checking is done on whole files only using the <pkg>.hash > checksum references. > Among the files checked, files related to licenses are checked. > > Some packages do not contain any specific license files and, for > them, some source files are used. These source files contain license > information (usually comments at the beginning of the file). > > Using the whole source file for checksum checking in this case can lead > to issues if a br2-external is present and applies some patches to this s/br2-external/global patch-dir/ > source file. > Indeed, patching a package from a br2-external is allowed but in that s/br2-external/global patch-dir/ > case the whole file checksum change and all checksum verification done > on that file fails. In particular 'make legal-info' fails. > > Using only the license related part of a source file for checksum > checking solve this issue. > 'make legal-info' will fail only if the license part is modify. modified > Introduce the possibility to have a lines range in <pkg>.hash and, if > present, compute checksum on the part defined by this lines range. I'm afraid this is fraught with limitations. For example, I've seen source files where the license texts (yes, plural) are spread out out in the source file, intermixed between parts of the code. If you patch such a file, you are certain that: 1. its hash as a whole changes, as you noticed 2. the license will not be on the lines you expect them to be, and will defeat your proposed scheme. > For instance, in <pkg>.hash: > sha256 xxxxxx foo.c <-- sha256 on the whole foo.c file > sha256 xxxxxx foo.c 1,15 <-- sha256 on extraction from line 1 to 15 That's hackish. Instead, I would argue for either or both of the following: 1. considering that the source file is used as a license file, it is not non-redistributable, so it is already redistributed as part of the legal-info artefacts, so I would suggest that we simply stop using source file as a license file; 2. for such packages, add a post-extract hook that prepares the license file(s), and register those as _FOO_LICENSE_FILES instead. By using code, you can search for patterns rather than depend on constant line numbers. Both are trivial enough, I believe. Regards, Yann E. MORIN. > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > --- > support/download/check-hash | 24 +++++++++++++++++++----- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/support/download/check-hash b/support/download/check-hash > index 5a47f49bc3..a90c9ca58a 100755 > --- a/support/download/check-hash > +++ b/support/download/check-hash > @@ -45,10 +45,20 @@ fi > # $1: algo hash > # $2: known hash > # $3: file (full path) > +# $4: lines ranges in the form s,e. If present, the checksum is compute on the > +# file extracted part from line number s to line number e included. > +# The first line in the file is the line number 1. > +# If not present, the whole file is used. > check_one_hash() { > _h="${1}" > _known="${2}" > _file="${3}" > + _r="${4}" > + > + base_with_range=${base} > + if [ ${_r} ]; then > + base_with_range="${base_with_range}:${_r}" > + fi > > # Note: md5 is supported, but undocumented on purpose. > # Note: sha3 is not supported, since there is currently no implementation > @@ -64,13 +74,17 @@ check_one_hash() { > esac > > # Do the hashes match? > - _hash=$( ${_h}sum "${_file}" |cut -d ' ' -f 1 ) > + if [ ${_r} ]; then > + _hash=$( sed -n "${r} p" ${file} | ${_h}sum |cut -d ' ' -f 1 ) > + else > + _hash=$( ${_h}sum "${_file}" |cut -d ' ' -f 1 ) > + fi > if [ "${_hash}" = "${_known}" ]; then > - printf "%s: OK (%s: %s)\n" "${base}" "${_h}" "${_hash}" > + printf "%s: OK (%s: %s)\n" "${base_with_range}" "${_h}" "${_hash}" > return 0 > fi > > - printf "ERROR: %s has wrong %s hash:\n" "${base}" "${_h}" >&2 > + printf "ERROR: %s has wrong %s hash:\n" "${base_with_range}" "${_h}" >&2 > printf "ERROR: expected: %s\n" "${_known}" >&2 > printf "ERROR: got : %s\n" "${_hash}" >&2 > printf "ERROR: Incomplete download, or man-in-the-middle (MITM) attack\n" >&2 > @@ -80,7 +94,7 @@ check_one_hash() { > > # Do we know one or more hashes for that file? > nb_checks=0 > -while read t h f; do > +while read t h f r; do > case "${t}" in > ''|'#'*) > # Skip comments and empty lines > @@ -88,7 +102,7 @@ while read t h f; do > ;; > *) > if [ "${f}" = "${base}" ]; then > - check_one_hash "${t}" "${h}" "${file}" > + check_one_hash "${t}" "${h}" "${file}" "${r}" > : $((nb_checks++)) > fi > ;; > -- > 2.41.0 >
On Thu, 27 Jul 2023 17:59:44 +0200 "Yann E. MORIN" <yann.morin.1998@free.fr> wrote: > 2. for such packages, add a post-extract hook that prepares the license > file(s), and register those as _FOO_LICENSE_FILES instead. By using > code, you can search for patterns rather than depend on constant > line numbers. Like: # The Github repository doesn't contain the source code as-is: it # contains a tarball with the kernel driver source code, and a # self-extractible binary for the user-space parts. So we extract both # below, and also extract the EULA text from the self-extractible binary define GCNANO_BINARIES_EXTRACT_HELPER tar --strip-components=1 -xJf $(@D)/gcnano-driver-$(GCNANO_BINARIES_DRIVER_VERSION).tar.xz -C $(@D) awk 'BEGIN { start = 0; } \ /^EOEULA/ { start = 0; } \ { if (start) print; } \ /<<EOEULA/ { start = 1; }' \ $(@D)/gcnano-userland-multi-$(GCNANO_BINARIES_USERLAND_VERSION).bin > $(@D)/EULA cd $(@D) && sh gcnano-userland-multi-$(GCNANO_BINARIES_USERLAND_VERSION).bin --auto-accept endef GCNANO_BINARIES_POST_EXTRACT_HOOKS += GCNANO_BINARIES_EXTRACT_HELPER (which among other things extracts the EULA file, that is used as a license file) Best regards, Thomas
Hi Yann, Thomas, On Thu, 27 Jul 2023 22:59:50 +0200 Thomas Petazzoni <thomas.petazzoni@bootlin.com> wrote: > On Thu, 27 Jul 2023 17:59:44 +0200 > "Yann E. MORIN" <yann.morin.1998@free.fr> wrote: > > > 2. for such packages, add a post-extract hook that prepares the license > > file(s), and register those as _FOO_LICENSE_FILES instead. By using > > code, you can search for patterns rather than depend on constant > > line numbers. > > Like: > > # The Github repository doesn't contain the source code as-is: it > # contains a tarball with the kernel driver source code, and a > # self-extractible binary for the user-space parts. So we extract both > # below, and also extract the EULA text from the self-extractible binary > define GCNANO_BINARIES_EXTRACT_HELPER > tar --strip-components=1 -xJf $(@D)/gcnano-driver-$(GCNANO_BINARIES_DRIVER_VERSION).tar.xz -C $(@D) > awk 'BEGIN { start = 0; } \ > /^EOEULA/ { start = 0; } \ > { if (start) print; } \ > /<<EOEULA/ { start = 1; }' \ > $(@D)/gcnano-userland-multi-$(GCNANO_BINARIES_USERLAND_VERSION).bin > $(@D)/EULA > cd $(@D) && sh gcnano-userland-multi-$(GCNANO_BINARIES_USERLAND_VERSION).bin --auto-accept > endef > > GCNANO_BINARIES_POST_EXTRACT_HOOKS += GCNANO_BINARIES_EXTRACT_HELPER > > (which among other things extracts the EULA file, that is used as a > license file) > > Best regards, > > Thomas Thanks for your feedback. I am going to look at the POST_EXTRACT_HOOKS to generate a license file. Best regards, Hervé
diff --git a/support/download/check-hash b/support/download/check-hash index 5a47f49bc3..a90c9ca58a 100755 --- a/support/download/check-hash +++ b/support/download/check-hash @@ -45,10 +45,20 @@ fi # $1: algo hash # $2: known hash # $3: file (full path) +# $4: lines ranges in the form s,e. If present, the checksum is compute on the +# file extracted part from line number s to line number e included. +# The first line in the file is the line number 1. +# If not present, the whole file is used. check_one_hash() { _h="${1}" _known="${2}" _file="${3}" + _r="${4}" + + base_with_range=${base} + if [ ${_r} ]; then + base_with_range="${base_with_range}:${_r}" + fi # Note: md5 is supported, but undocumented on purpose. # Note: sha3 is not supported, since there is currently no implementation @@ -64,13 +74,17 @@ check_one_hash() { esac # Do the hashes match? - _hash=$( ${_h}sum "${_file}" |cut -d ' ' -f 1 ) + if [ ${_r} ]; then + _hash=$( sed -n "${r} p" ${file} | ${_h}sum |cut -d ' ' -f 1 ) + else + _hash=$( ${_h}sum "${_file}" |cut -d ' ' -f 1 ) + fi if [ "${_hash}" = "${_known}" ]; then - printf "%s: OK (%s: %s)\n" "${base}" "${_h}" "${_hash}" + printf "%s: OK (%s: %s)\n" "${base_with_range}" "${_h}" "${_hash}" return 0 fi - printf "ERROR: %s has wrong %s hash:\n" "${base}" "${_h}" >&2 + printf "ERROR: %s has wrong %s hash:\n" "${base_with_range}" "${_h}" >&2 printf "ERROR: expected: %s\n" "${_known}" >&2 printf "ERROR: got : %s\n" "${_hash}" >&2 printf "ERROR: Incomplete download, or man-in-the-middle (MITM) attack\n" >&2 @@ -80,7 +94,7 @@ check_one_hash() { # Do we know one or more hashes for that file? nb_checks=0 -while read t h f; do +while read t h f r; do case "${t}" in ''|'#'*) # Skip comments and empty lines @@ -88,7 +102,7 @@ while read t h f; do ;; *) if [ "${f}" = "${base}" ]; then - check_one_hash "${t}" "${h}" "${file}" + check_one_hash "${t}" "${h}" "${file}" "${r}" : $((nb_checks++)) fi ;;
The checksum checking is done on whole files only using the <pkg>.hash checksum references. Among the files checked, files related to licenses are checked. Some packages do not contain any specific license files and, for them, some source files are used. These source files contain license information (usually comments at the beginning of the file). Using the whole source file for checksum checking in this case can lead to issues if a br2-external is present and applies some patches to this source file. Indeed, patching a package from a br2-external is allowed but in that case the whole file checksum change and all checksum verification done on that file fails. In particular 'make legal-info' fails. Using only the license related part of a source file for checksum checking solve this issue. 'make legal-info' will fail only if the license part is modify. Introduce the possibility to have a lines range in <pkg>.hash and, if present, compute checksum on the part defined by this lines range. For instance, in <pkg>.hash: sha256 xxxxxx foo.c <-- sha256 on the whole foo.c file sha256 xxxxxx foo.c 1,15 <-- sha256 on extraction from line 1 to 15 Signed-off-by: Herve Codina <herve.codina@bootlin.com> --- support/download/check-hash | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-)