[12/19] infra/pkg-generic: store md5 of just-installed files

Message ID 4ccf7565614332a1435f3729936dea4a73d5c4e9.1546898693.git.yann.morin.1998@free.fr
State Changes Requested
Series [01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS

Commit Message

Yann E. MORIN Jan. 7, 2019, 10:05 p.m. UTC
Now that we can accurately list files touched during a package
installation, we can restore the md5sum for those files with a minor
impact on the build time, generally on the order of a few tens of
milliseconds per package at worst. That cost is regained later by
providing even more accurate accounting of files to packages, and
it will also allow ignoring packages that install the same file with
the same content.

Update the parser accordingly.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: John Keeping <john@metanate.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk       | 10 +++++++---
 support/scripts/brpkgutil.py |  2 ++
 2 files changed, 9 insertions(+), 3 deletions(-)
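
For readers following along, here is a minimal Python sketch of the
record layout described above (file path, package name, md5sum, each
followed by a NUL). It only mirrors what the sed expression in the patch
appears to do with one "checksum  path" line. The function name
to_record and the sample package name are hypothetical, and md5sum's
escaping of unusual file names is not handled.

```python
import re

# One line of input looks like 'checksum<two spaces>path', which is what
# md5sum prints; the patch makes find print 'directory<two spaces>path'
# for directories so both kinds of lines share the same shape.
LINE_RE = re.compile(r'^(\S+)  (.+)$')


def to_record(line, pkg):
    # Reorder the two input fields into: path NUL pkg NUL checksum NUL.
    # Hypothetical helper, not part of Buildroot.
    m = LINE_RE.match(line)
    if m is None:
        # Like sed, leave a line that does not match untouched.
        return line
    return '{}\0{}\0{}\0'.format(m.group(2), pkg, m.group(1))


if __name__ == '__main__':
    rec = to_record('d41d8cd98f00b204e9800998ecf8427e  ./etc/foo', 'busybox')
    print(rec.split('\0'))
```

Directories end up with the literal string "directory" in the checksum
field, so a consumer has to treat that value specially.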

Comments

Thomas De Schampheleire Jan. 8, 2019, 3:13 p.m. UTC | #1
On Mon, 7 Jan 2019 at 23:06, Yann E. MORIN
(<yann.morin.1998@free.fr>) wrote:
>
> Now that we can accurately list files touched during a package
> installation, we can restore the md5sum to those files with a minor
> impact on the build time, generally in the order of a few tens of
> milliseconds per package at worst, but that his regained later by

that is

> providing even more accurate accountability of files to packages, and
> will even also allows ignoring packages that install the same file with
> the same content.
>
> Update parser accordingly.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> Cc: John Keeping <john@metanate.com>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> ---
>  package/pkg-generic.mk       | 10 +++++++---
>  support/scripts/brpkgutil.py |  2 ++
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index d261b5bf76..dd6650db7f 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -63,14 +63,18 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  # field separator. A record is made of these fields:
>  #  - file path
>  #  - package name
> +#  - md5sum of file content, as installed by this package
>  # $(1): package name
>  # $(2): base directory to search in
>  # $(3): suffix of file  (optional)
>  define step_pkg_size_inner
>         cd $(2); \
> -       find . \( -type f -o -type l \) \
> -               -newer $@_before \
> -       |sed -r -e 's/$$/\x00$(1)\x00/' \
> +       { \
> +               find . -type d -newer $@_before -printf 'directory  %p\n'; \
> +               find . -xtype f -newer $@_before -print0 \

Note that the original implementation also used '-xtype' for the
directory case; I think that is important.


/Thomas

Yann E. MORIN Jan. 8, 2019, 7:31 p.m. UTC | #2
Thomas DS, All,

On 2019-01-08 16:13 +0100, Thomas De Schampheleire spake thusly:
> On Mon, 7 Jan 2019 at 23:06, Yann E. MORIN
> (<yann.morin.1998@free.fr>) wrote:
[--SNIP--]
> >  define step_pkg_size_inner
> >         cd $(2); \
> > -       find . \( -type f -o -type l \) \
> > -               -newer $@_before \
> > -       |sed -r -e 's/$$/\x00$(1)\x00/' \
> > +       { \
> > +               find . -type d -newer $@_before -printf 'directory  %p\n'; \
> > +               find . -xtype f -newer $@_before -print0 \
> Note that the original implementation also used '-xtype' for the
> directory case; I think that is important.

I pondered whether we wanted to restore it for directories too, but I
completely mixed up the conditions in my head and missed the whole
picture. If I can recall why I did not do so, I shall write it in the
commit log.

Otherwise, yes, we want to use -xtype for directories too.

Regards,
Yann E. MORIN.
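
To make the difference discussed above concrete, here is a small Python
sketch of how '-type d' and '-xtype d' treat symbolic links when find
runs with its default -P behaviour. The two function names are
hypothetical, and this is only an approximation of find's tests, not a
replacement for them.

```python
import os
import stat


def is_type_d(path):
    # Roughly find's '-type d': look at the entry itself, so a symlink
    # that points to a directory does not match.
    return stat.S_ISDIR(os.lstat(path).st_mode)


def is_xtype_d(path):
    # Roughly find's '-xtype d' under -P: follow symlinks, so a symlink
    # that points to a directory matches as well.
    try:
        return stat.S_ISDIR(os.stat(path).st_mode)
    except OSError:
        # A dangling symlink cannot be followed: not a directory.
        return False
```

With '-type d' alone, a package that installs a symlink to a directory
would not get a record for that symlink, which seems to be the case the
review comment is worried about.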

Patch

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index d261b5bf76..dd6650db7f 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -63,14 +63,18 @@  GLOBAL_INSTRUMENTATION_HOOKS += step_time
 # field separator. A record is made of these fields:
 #  - file path
 #  - package name
+#  - md5sum of file content, as installed by this package
 # $(1): package name
 # $(2): base directory to search in
 # $(3): suffix of file  (optional)
 define step_pkg_size_inner
 	cd $(2); \
-	find . \( -type f -o -type l \) \
-		-newer $@_before \
-	|sed -r -e 's/$$/\x00$(1)\x00/' \
+	{ \
+		find . -type d -newer $@_before -printf 'directory  %p\n'; \
+		find . -xtype f -newer $@_before -print0 \
+		|xargs -0 -r md5sum; \
+	} \
+	|sed -r -e 's/^([^[:space:]]+)  (.+)$$/\2\x00$(1)\x00\1\x00/' \
 	>> $(BUILD_DIR)/packages-file-list$(3).txt
 endef
 
diff --git a/support/scripts/brpkgutil.py b/support/scripts/brpkgutil.py
index f6ef4b3dca..a1114681ea 100644
--- a/support/scripts/brpkgutil.py
+++ b/support/scripts/brpkgutil.py
@@ -30,6 +30,7 @@  def _readlines0n(f):
 # these keys (others maybe added in the future):
 # 'file': the path of the file,
 # 'pkg':  the last package that installed that file
+# 'md5':  the md5sum of the file content
 def parse_pkg_file_list(path):
     with open(path, 'rb') as f:
         for rec in _readlines0n(f):
@@ -37,6 +38,7 @@  def parse_pkg_file_list(path):
             d = {
                   'file': srec[0],
                   'pkg':  srec[1],
+                  'md5':  srec[2],
                 }
             yield d
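
As an illustration of what the extra 'md5' field could enable, the
sketch below reads records in the three-field layout and reports only
the files that several packages installed with different content. It is
written under assumptions: records are taken to be terminated by a NUL
followed by a newline, and parse_records and conflicting_files are
hypothetical names that do not exist in brpkgutil.py.

```python
from collections import defaultdict


def parse_records(data):
    # Split raw bytes into records of the form:
    #   path NUL pkg NUL md5 NUL LF
    # Assumption: no path contains the NUL-LF sequence.
    for rec in data.split(b'\0\n'):
        if not rec:
            continue
        path, pkg, md5 = rec.split(b'\0')
        yield {'file': path, 'pkg': pkg, 'md5': md5}


def conflicting_files(records):
    # For each path, remember which package installed which checksum,
    # then keep only the paths seen with more than one distinct checksum.
    seen = defaultdict(dict)
    for r in records:
        seen[r['file']][r['pkg']] = r['md5']
    return {f: pkgs for f, pkgs in seen.items()
            if len(set(pkgs.values())) > 1}


if __name__ == '__main__':
    sample = (b'./a\0p1\0aaa\0\n'
              b'./a\0p2\0aaa\0\n'
              b'./b\0p1\0bbb\0\n'
              b'./b\0p3\0ccc\0\n')
    print(conflicting_files(parse_records(sample)))
```

In the sample, './a' is installed twice with identical content and is
ignored, while './b' is reported because its two installers disagree.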