support/scripts/check-uniq-files: ignore reinstalled packages

Message ID 20180426162731.4710-1-john@metanate.com
State Changes Requested
Series support/scripts/check-uniq-files: ignore reinstalled packages

Commit Message

John Keeping April 26, 2018, 4:27 p.m. UTC
If a package is rebuilt, then any files it installs will be listed
multiple times in the file list and check-uniq-files will report that
these files are touched by more than one package even though it is the
same package listed multiple times.

Switch to storing the package names in a set so that each package can
only appear once.

Signed-off-by: John Keeping <john@metanate.com>
---
 support/scripts/check-uniq-files | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
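
To illustrate the commit message, here is a minimal sketch of the difference between collecting owners in a list and in a set. The sample entries and the helper name shared_files are hypothetical; only the parsing lines follow the script's own code.

```python
from collections import defaultdict

# Hypothetical file list: one "package,path" entry per line.
# busybox is listed twice for ./bin/busybox, as it would be after a rebuild;
# ./etc/inittab is listed by two different (hypothetical) owners.
ENTRIES = [
    b"busybox,./bin/busybox\n",
    b"busybox,./bin/busybox\n",
    b"busybox,./etc/inittab\n",
    b"sysvinit,./etc/inittab\n",
]


def shared_files(entries, container):
    """Return the files that have more than one recorded owner."""
    file_to_pkg = defaultdict(container)
    for line in entries:
        pkg, _, path = line.rstrip(b"\n").partition(b",")
        if container is set:
            file_to_pkg[path].add(pkg)      # duplicates collapse
        else:
            file_to_pkg[path].append(pkg)   # duplicates accumulate
    return sorted(path for path, pkgs in file_to_pkg.items() if len(pkgs) > 1)


print(shared_files(ENTRIES, list))  # list: the rebuilt package is reported too
print(shared_files(ENTRIES, set))   # set: only the genuine conflict remains
```

With a list, ./bin/busybox is flagged because busybox is counted twice; with a set, only ./etc/inittab is flagged.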

Comments

Henrique Marks April 26, 2018, 6:22 p.m. UTC | #1
Hello All,

----- Original message -----
> From: "John Keeping" <john@metanate.com>
> To: buildroot@buildroot.org
> Cc: "John Keeping" <john@metanate.com>
> Sent: Thursday, April 26, 2018 13:27:31
> Subject: [Buildroot] [PATCH] support/scripts/check-uniq-files: ignore reinstalled packages

> If a package is rebuilt, then any files it installs will be listed
> multiple times in the file list and check-uniq-files will report that
> these files are touched by more than one package even though it is the
> same package listed multiple times.
> 
> Switch to storing the package names in a set so that each package can
> only appear once.
> 
> Signed-off-by: John Keeping <john@metanate.com>
> ---
> support/scripts/check-uniq-files | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
> index fbc6b5d6e7..eb92724e42 100755
> --- a/support/scripts/check-uniq-files
> +++ b/support/scripts/check-uniq-files
> @@ -24,11 +24,11 @@ def main():
>         sys.stderr.write('No type was provided\n')
>         return False
> 
> -    file_to_pkg = defaultdict(list)
> +    file_to_pkg = defaultdict(set)
>     with open(args.packages_file_list[0], 'rb') as pkg_file_list:
>         for line in pkg_file_list.readlines():
>             pkg, _, file = line.rstrip(b'\n').partition(b',')
> -            file_to_pkg[file].append(pkg)
> +            file_to_pkg[file].add(pkg)
> 
>     for file in file_to_pkg:
>         if len(file_to_pkg[file]) > 1:
> --
> 2.17.0
> 
> _______________________________________________
> buildroot mailing list
> buildroot@busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

I can confirm we have problems rebuilding packages with Buildroot 2018.02.1. The three "package-file-list" files grow every time we rebuild a package (make <pkg>-rebuild or make <pkg>-reinstall), and the build time grows with them. The growth is linear in the number of rebuilds, but it is noticeable.

A build that takes one hour gets 40 minutes longer every time we rebuild the entire platform (make <pkg>-rebuild for all packages). The slowdown is directly tied to the instrumentation hooks, mainly "check-bin-arch" parsing the ever-growing "package-file-list".

Thanks
John Keeping April 27, 2018, 10:31 a.m. UTC | #2
On Thu, 26 Apr 2018 15:22:35 -0300 (BRT)
Henrique Marks <henrique.marks@datacom.ind.br> wrote:

> ----- Original message -----
> > From: "John Keeping" <john@metanate.com>
> > To: buildroot@buildroot.org
> > Cc: "John Keeping" <john@metanate.com>
> > Sent: Thursday, April 26, 2018 13:27:31
> > Subject: [Buildroot] [PATCH] support/scripts/check-uniq-files:
> > ignore reinstalled packages
> 
> > If a package is rebuilt, then any files it installs will be listed
> > multiple times in the file list and check-uniq-files will report
> > that these files are touched by more than one package even though
> > it is the same package listed multiple times.
> > 
> > Switch to storing the package names in a set so that each package
> > can only appear once.
> 
> I can confirm we have problems rebuilding packages with Buildroot
> 2018.02.1. The three "package-file-list" files grow every time we
> rebuild a package (make <pkg>-rebuild or make <pkg>-reinstall), and
> the build time grows with them. The growth is linear in the number
> of rebuilds, but it is noticeable.

I hadn't considered this aspect of the problem.  Maybe we are better
fixing the root cause of the problem with a patch like this:

-- >8 --
diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 1c9dd1d734..edc2c9349c 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -64,6 +64,7 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
 # $(3): suffix of file  (optional)
 define step_pkg_size_inner
 	cd $(2); \
+	$(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt; \
 	find . \( -type f -o -type l \) \
 		-newer $($(PKG)_DIR)/.stamp_built \
 		-exec printf '$(1),%s\n' {} + \
-- 8< --

But I don't know enough about the build process to know if this is safe
in the presence of top-level parallel build.


Regards,
John
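
As a rough model of what the proposed sed line changes, the sketch below keeps the file list as a Python list of "pkg,path" strings. The function names, package names and paths are all hypothetical. The prefix match is literal, whereas the sed expression treats the package name as a regular expression, so this is only an approximation.

```python
def record_install(file_list, pkg, installed, prune):
    """Return the file list after one install step of `pkg`."""
    prefix = pkg + ","
    if prune:
        # Roughly what sed '/^pkg,/d' does: drop the package's old entries.
        file_list = [line for line in file_list if not line.startswith(prefix)]
    # Roughly what the find/printf step does: append the new entries.
    return file_list + [f"{pkg},{path}" for path in installed]


def simulate(rebuilds, prune):
    """Rebuild the hypothetical package 'foo' `rebuilds` times."""
    file_list = ["zlib,./usr/lib/libz.so"]  # entry owned by another package
    for _ in range(rebuilds):
        file_list = record_install(
            file_list, "foo", ["./usr/bin/foo", "./usr/lib/libfoo.so"], prune
        )
    return file_list


print(len(simulate(3, prune=False)))  # append-only: grows with every rebuild
print(len(simulate(3, prune=True)))   # pruned first: size stays constant
```

Without pruning, the list gains two lines per rebuild; with pruning, it stays at three lines and the zlib entry is left alone. The sketch says nothing about the parallel-build question raised above.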
Yann E. MORIN April 28, 2018, 9:29 p.m. UTC | #3
John, Henrique, All,

On 2018-04-27 11:31 +0100, John Keeping spake thusly:
> On Thu, 26 Apr 2018 15:22:35 -0300 (BRT)
> Henrique Marks <henrique.marks@datacom.ind.br> wrote:
> > ----- Original message -----
> > > From: "John Keeping" <john@metanate.com>
> > > To: buildroot@buildroot.org
> > > Cc: "John Keeping" <john@metanate.com>
> > > Sent: Thursday, April 26, 2018 13:27:31
> > > Subject: [Buildroot] [PATCH] support/scripts/check-uniq-files:
> > > ignore reinstalled packages
> > 
> > > If a package is rebuilt, then any files it installs will be listed
> > > multiple times in the file list and check-uniq-files will report
> > > that these files are touched by more than one package even though
> > > it is the same package listed multiple times.
> > > 
> > > Switch to storing the package names in a set so that each package
> > > can only appear once.
> > 
> > I can confirm we have problems rebuilding packages with Buildroot
> > 2018.02.1. The three "package-file-list" files grow every time we
> > rebuild a package (make <pkg>-rebuild or make <pkg>-reinstall), and
> > the build time grows with them. The growth is linear in the number
> > of rebuilds, but it is noticeable.
> 
> I hadn't considered this aspect of the problem.  Maybe we are better
> fixing the root cause of the problem with a patch like this:
> 
> -- >8 --
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 1c9dd1d734..edc2c9349c 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -64,6 +64,7 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  # $(3): suffix of file  (optional)
>  define step_pkg_size_inner
>  	cd $(2); \
> +	$(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt; \

This means that a file that was previously installed by that package,
but no longer is (e.g. because the install commands were "fixed" to no
longer install it), will now be orphaned, and belong to no package.

But I don't care about that situation, since only a clean build from
scratch is known to provide good results anyway, and that is the only
thing we want to ensure.

So, please resend an updated patch to use this new solution.

Please provide an extensive commit log that explains why we remove the
existing entries, and the limitations of doing so (e.g. orphaned files).

Regards,
Yann E. MORIN.

>  	find . \( -type f -o -type l \) \
>  		-newer $($(PKG)_DIR)/.stamp_built \
>  		-exec printf '$(1),%s\n' {} + \
> -- 8< --
> 
> But I don't know enough about the build process to know if this is safe
> in the presence of top-level parallel build.
> 
> 
> Regards,
> John
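
The orphaned-file limitation described in this reply can be sketched as follows. This is a simplified model, not the real hook: the target directory is a plain set of paths, and the function names, the package "foo" and its files are hypothetical.

```python
def reinstall(file_list, target, pkg, installed):
    """Drop the old entries of `pkg`, then record what it installs now."""
    prefix = pkg + ","
    kept = [line for line in file_list if not line.startswith(prefix)]
    # Installing adds files to the target; nothing removes the old ones.
    new_target = target | set(installed)
    return kept + [f"{pkg},{path}" for path in sorted(installed)], new_target


def orphans(file_list, target):
    """Return the files present in the target that no entry accounts for."""
    owned = {line.partition(",")[2] for line in file_list}
    return sorted(target - owned)


# First install: foo ships a binary and a configuration file.
file_list, target = reinstall([], set(), "foo", ["./usr/bin/foo", "./etc/foo.conf"])
print(orphans(file_list, target))

# The install commands are "fixed" so foo no longer ships ./etc/foo.conf.
file_list, target = reinstall(file_list, target, "foo", ["./usr/bin/foo"])
print(orphans(file_list, target))
```

After the second install, ./etc/foo.conf is still in the target but belongs to no package. A clean build from scratch does not have this problem, which is why the limitation is accepted here.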
Angelo Compagnucci May 3, 2018, 4:27 p.m. UTC | #4
John,

2018-04-28 23:29 GMT+02:00 Yann E. MORIN <yann.morin.1998@free.fr>:
> John, Henrique, All,
>
> On 2018-04-27 11:31 +0100, John Keeping spake thusly:
>> On Thu, 26 Apr 2018 15:22:35 -0300 (BRT)
>> Henrique Marks <henrique.marks@datacom.ind.br> wrote:
>> > ----- Original message -----
>> > > From: "John Keeping" <john@metanate.com>
>> > > To: buildroot@buildroot.org
>> > > Cc: "John Keeping" <john@metanate.com>
>> > > Sent: Thursday, April 26, 2018 13:27:31
>> > > Subject: [Buildroot] [PATCH] support/scripts/check-uniq-files:
>> > > ignore reinstalled packages
>> >
>> > > If a package is rebuilt, then any files it installs will be listed
>> > > multiple times in the file list and check-uniq-files will report
>> > > that these files are touched by more than one package even though
>> > > it is the same package listed multiple times.
>> > >
>> > > Switch to storing the package names in a set so that each package
>> > > can only appear once.
>> >
>> > I can confirm we have problems rebuilding packages with Buildroot
>> > 2018.02.1. The three "package-file-list" files grow every time we
>> > rebuild a package (make <pkg>-rebuild or make <pkg>-reinstall), and
>> > the build time grows with them. The growth is linear in the number
>> > of rebuilds, but it is noticeable.
>>
>> I hadn't considered this aspect of the problem.  Maybe we are better
>> fixing the root cause of the problem with a patch like this:
>>
>> -- >8 --
>> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
>> index 1c9dd1d734..edc2c9349c 100644
>> --- a/package/pkg-generic.mk
>> +++ b/package/pkg-generic.mk
>> @@ -64,6 +64,7 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
>>  # $(3): suffix of file  (optional)
>>  define step_pkg_size_inner
>>       cd $(2); \
>> +     $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt; \
>
> This means that a file that was previously installed by that package,
> but no longer is (e.g. because the install commands were "fixed" to no
> longer install it), will now be orphaned, and belong to no package.
>
> But I don't care about that situation, since only a clean build from
> scratch is known to provide good results anyway, and that is the only
> thing we want to ensure.
>
> So, please resend an updated patch to use this new solution.
>
> Please provide an extensive commit log that explains why we remove the
> existing entries, and the limitations of doing so (e.g. orphaned files).

Could you respin this patch?

Thanks!

>
> Regards,
> Yann E. MORIN.
>
>>       find . \( -type f -o -type l \) \
>>               -newer $($(PKG)_DIR)/.stamp_built \
>>               -exec printf '$(1),%s\n' {} + \
>> -- 8< --
>>
>> But I don't know enough about the build process to know if this is safe
>> in the presence of top-level parallel build.
>>
>>
>> Regards,
>> John
John Keeping May 3, 2018, 4:36 p.m. UTC | #5
Hi Angelo,

On Thu, 3 May 2018 18:27:53 +0200
Angelo Compagnucci <angelo.compagnucci@gmail.com> wrote:

> 2018-04-28 23:29 GMT+02:00 Yann E. MORIN <yann.morin.1998@free.fr>:
> > John, Henrique, All,
> >
> > On 2018-04-27 11:31 +0100, John Keeping spake thusly:  
> >> On Thu, 26 Apr 2018 15:22:35 -0300 (BRT)
> >> Henrique Marks <henrique.marks@datacom.ind.br> wrote:  
> >> > ----- Original message -----
> >> > > From: "John Keeping" <john@metanate.com>
> >> > > To: buildroot@buildroot.org
> >> > > Cc: "John Keeping" <john@metanate.com>
> >> > > Sent: Thursday, April 26, 2018 13:27:31
> >> > > Subject: [Buildroot] [PATCH] support/scripts/check-uniq-files:
> >> > > ignore reinstalled packages
> >> >  
> >> > > If a package is rebuilt, then any files it installs will be
> >> > > listed multiple times in the file list and check-uniq-files
> >> > > will report that these files are touched by more than one
> >> > > package even though it is the same package listed multiple
> >> > > times.
> >> > >
> >> > > Switch to storing the package names in a set so that each
> >> > > package can only appear once.  
> >> >
> >> > I can confirm we have problems rebuilding packages with
> >> > Buildroot 2018.02.1. The three "package-file-list" files grow
> >> > every time we rebuild a package (make <pkg>-rebuild or make
> >> > <pkg>-reinstall), and the build time grows with them. The growth
> >> > is linear in the number of rebuilds, but it is noticeable.
> >>
> >> I hadn't considered this aspect of the problem.  Maybe we are
> >> better fixing the root cause of the problem with a patch like this:
> >>  
> >> -- >8 --  
> >> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> >> index 1c9dd1d734..edc2c9349c 100644
> >> --- a/package/pkg-generic.mk
> >> +++ b/package/pkg-generic.mk
> >> @@ -64,6 +64,7 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
> >>  # $(3): suffix of file  (optional)
> >>  define step_pkg_size_inner
> >>       cd $(2); \
> >> +     $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt; \
> >
> > This means that a file that was previously installed by that
> > package, but no longer is (e.g. because the install commands were
> > "fixed" to no longer install it), will now be orphaned, and belong
> > to no package.
> >
> > But I don't care about that situation, since only a clean build from
> > scratch is known to provide good results anyway, and that is the
> > only thing we want to ensure.
> >
> > So, please resend an updated patch to use this new solution.
> >
> > Please provide an extensive commit log that explains why we
> > remove the existing entries, and the limitations of doing so (e.g.
> > orphaned files).
> 
> Could you respin this patch?

This got respun a few times on a separate thread, resulting in:

https://git.buildroot.org/buildroot/commit/package/pkg-generic.mk?id=d3dca1e9936bcaa0eed226a5bcb8c6a4d1fd1472


Regards,
John
Patch

diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index fbc6b5d6e7..eb92724e42 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -24,11 +24,11 @@  def main():
         sys.stderr.write('No type was provided\n')
         return False
 
-    file_to_pkg = defaultdict(list)
+    file_to_pkg = defaultdict(set)
     with open(args.packages_file_list[0], 'rb') as pkg_file_list:
         for line in pkg_file_list.readlines():
             pkg, _, file = line.rstrip(b'\n').partition(b',')
-            file_to_pkg[file].append(pkg)
+            file_to_pkg[file].add(pkg)
 
     for file in file_to_pkg:
         if len(file_to_pkg[file]) > 1: