diff mbox

[RFCv1,2/4] pkg-generic: add step_pkg_size global instrumentation hook

Message ID 1402177567-8021-3-git-send-email-thomas.petazzoni@free-electrons.com
State Deferred
Headers show

Commit Message

Thomas Petazzoni June 7, 2014, 9:46 p.m. UTC
This patch adds a global instrumentation hook that collects the list
of files installed in $(TARGET_DIR) by each package, and stores this
list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
be used to determine the size contribution of each package to the
target root filesystem.

The only limitation is that if a file is installed by a package A, and
then overriden by a file from package B, the file will only be listed
in $(BUILD_DIR)/A.filelist as it is the first time we will see the
file.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 package/pkg-generic.mk | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Baruch Siach June 8, 2014, 2:56 a.m. UTC | #1
Hi Thomas,

On Sat, Jun 07, 2014 at 11:46:05PM +0200, Thomas Petazzoni wrote:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> be used to determine the size contribution of each package to the
> target root filesystem.

How does this play with parallel build? Is install-target guaranteed to run 
sequentially for each package?

baruch

> The only limitation is that if a file is installed by a package A, and
> then overriden by a file from package B, the file will only be listed
> in $(BUILD_DIR)/A.filelist as it is the first time we will see the
> file.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  package/pkg-generic.mk | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 5116ed9..069653e 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,30 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  
> +# Package size steps
> +define step_pkg_size_start
> +	echo "PKG SIZE START $(1)"
> +	(cd $(TARGET_DIR) ; find . -type f) | sort > \
> +		$(BUILD_DIR)/$(1).tmp_filelist_before
> +endef
> +
> +define step_pkg_size_end
> +	echo "PKG SIZE END $(1)"
> +	(cd $(TARGET_DIR); find . -type f) | sort > \
> +		$(BUILD_DIR)/$(1).tmp_filelist_after
> +	diff -u $(BUILD_DIR)/$(1).tmp_filelist_before $(BUILD_DIR)/$(1).tmp_filelist_after | \
> +		grep '^\+\./' | sed 's%^\+%%' > $(BUILD_DIR)/$(1).filelist
> +	$(RM) -f $(BUILD_DIR)/$(1).tmp_filelist_before \
> +		$(BUILD_DIR)/$(1).tmp_filelist_after
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef
> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +
>  # User-supplied script
>  define step_user
>  	@$(foreach user_hook, $(BR2_INSTRUMENTATION_SCRIPTS), \
Thomas Petazzoni June 8, 2014, 8:19 a.m. UTC | #2
Dear Baruch Siach,

On Sun, 8 Jun 2014 05:56:29 +0300, Baruch Siach wrote:

> On Sat, Jun 07, 2014 at 11:46:05PM +0200, Thomas Petazzoni wrote:
> > This patch adds a global instrumentation hook that collects the list
> > of files installed in $(TARGET_DIR) by each package, and stores this
> > list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> > be used to determine the size contribution of each package to the
> > target root filesystem.
> 
> How does this play with parallel build? Is install-target guaranteed to run 
> sequentially for each package?

It obviously clearly doesn't work with top-level parallel build. The
mechanism assumes that between the beginning of an install-target step
and its end, nothing else runs and installs stuff in $(TARGET_DIR).
Which is true for sequential builds, but false for top-level parallel
builds.

Thomas
Yann E. MORIN June 9, 2014, 10:02 p.m. UTC | #3
Thomas, All,

On 2014-06-07 23:46 +0200, Thomas Petazzoni spake thusly:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> be used to determine the size contribution of each package to the
> target root filesystem.
> 
> The only limitation is that if a file is installed by a package A, and
> then overriden by a file from package B, the file will only be listed
> in $(BUILD_DIR)/A.filelist as it is the first time we will see the
> file.

If we really wanted to account for the realy package, we'd have to
somehow notice that a pacakge did change the content of a file.

So, we would need to run sha1sum on all the files in the pre-step and
the post step. Any differing line would mean a new file, or a changed
file.

See below for a proposed storage solution.

> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  package/pkg-generic.mk | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 5116ed9..069653e 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,30 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  
> +# Package size steps
> +define step_pkg_size_start
> +	echo "PKG SIZE START $(1)"
> +	(cd $(TARGET_DIR) ; find . -type f) | sort > \
> +		$(BUILD_DIR)/$(1).tmp_filelist_before

At first, I wondered if we should not store those files in the packages'
own build directory (along with the .stamp files.)

But then I went back to thinking about the second-package-to-install-a-file
issue raised in the commit log.

So, say we are able to determine what files a pacakge installs or modify
(using the sah1, for example.) Then we could just store that list in a
single file, that gets appended to package after package, and which
format would be:

package-name <TAB> path/to/file
package-name <TAB> path/to/other/file
other-package <TAB> path/to/third/file
pther-package <TAB> path/to/file             <-- override

That way, the python script has only one file to scan, which is sorted
by build-order, and the script can detect overwritten files, and even
report that, while still accounting the size to the real pacakge that
installed the file that will end up in the target.

Of course, using sha1 would slow the build quite a bit.

Thoughts?

Regards,
Yann E. MORIN.

> +endef
> +
> +define step_pkg_size_end
> +	echo "PKG SIZE END $(1)"
> +	(cd $(TARGET_DIR); find . -type f) | sort > \
> +		$(BUILD_DIR)/$(1).tmp_filelist_after
> +	diff -u $(BUILD_DIR)/$(1).tmp_filelist_before $(BUILD_DIR)/$(1).tmp_filelist_after | \
> +		grep '^\+\./' | sed 's%^\+%%' > $(BUILD_DIR)/$(1).filelist
> +	$(RM) -f $(BUILD_DIR)/$(1).tmp_filelist_before \
> +		$(BUILD_DIR)/$(1).tmp_filelist_after
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef
> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +
>  # User-supplied script
>  define step_user
>  	@$(foreach user_hook, $(BR2_INSTRUMENTATION_SCRIPTS), \
> -- 
> 2.0.0
> 
> _______________________________________________
> buildroot mailing list
> buildroot@busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot
Jérôme Pouiller June 10, 2014, 4:42 p.m. UTC | #4
Hello Yann,

On Tuesday 10 June 2014 00:02:41 Yann E. MORIN wrote:
> Thomas, All,
> 
> On 2014-06-07 23:46 +0200, Thomas Petazzoni spake thusly:
> > This patch adds a global instrumentation hook that collects the list
> > of files installed in $(TARGET_DIR) by each package, and stores this
> > list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> > be used to determine the size contribution of each package to the
> > target root filesystem.
> > 
> > The only limitation is that if a file is installed by a package A, and
> > then overriden by a file from package B, the file will only be listed
> > in $(BUILD_DIR)/A.filelist as it is the first time we will see the
> > file.
> 
> If we really wanted to account for the realy package, we'd have to
> somehow notice that a pacakge did change the content of a file.
> 
> So, we would need to run sha1sum on all the files in the pre-step and
> the post step. Any differing line would mean a new file, or a changed
> file.
I did something similar in the past. I used inotify to follow modification 
done on TARGET_DIR. It was fast and detect overwritten files.

Stopping inotify process when build was interrupted was a little tricky, but 
results was corrects.
Yann E. MORIN June 10, 2014, 4:58 p.m. UTC | #5
Jérôme, All,

On 2014-06-10 18:20 +0200, Jérôme Pouiller spake thusly:
> On Tuesday 10 June 2014 00:02:41 Yann E. MORIN wrote:
> > Thomas, All,
> > 
> > On 2014-06-07 23:46 +0200, Thomas Petazzoni spake thusly:
> > > This patch adds a global instrumentation hook that collects the list
> > > of files installed in $(TARGET_DIR) by each package, and stores this
> > > list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> > > be used to determine the size contribution of each package to the
> > > target root filesystem.
> > > 
> > > The only limitation is that if a file is installed by a package A, and
> > > then overriden by a file from package B, the file will only be listed
> > > in $(BUILD_DIR)/A.filelist as it is the first time we will see the
> > > file.
> > 
> > If we really wanted to account for the realy package, we'd have to
> > somehow notice that a pacakge did change the content of a file.
> > 
> > So, we would need to run sha1sum on all the files in the pre-step and
> > the post step. Any differing line would mean a new file, or a changed
> > file.
> I did something similar in the past. I used inotify to follow modification 
> done on TARGET_DIR. It was fast and detect overwritten files.

I've been playing with inotify too, and it is not really up to the task.
I was able to overload the "wait queue" very easily, and missed some
events (sometimes a lot of them), when the build was creating/changing a
lot of files in rapid-fire.

I was able to overcome the "wait-queue" limitation by increasing the
number of queueable events, but that requires root access, but not
everyone can be root on their machine (think shared build farms), and
the defaults are quite low.

> Handling build interruption was a little tricky, but results was corrects.

That's also an issue I see, and I don't think we want to go into too
complex a setting.

Regards,
Yann E. MORIN.
Jérôme Pouiller June 10, 2014, 5:37 p.m. UTC | #6
On Tuesday 10 June 2014 18:58:40 Yann E. MORIN wrote:
> Jérôme, All,
> 
> On 2014-06-10 18:20 +0200, Jérôme Pouiller spake thusly:
> > On Tuesday 10 June 2014 00:02:41 Yann E. MORIN wrote:
> > > Thomas, All,
> > > 
> > > On 2014-06-07 23:46 +0200, Thomas Petazzoni spake thusly:
> > > > This patch adds a global instrumentation hook that collects the list
> > > > of files installed in $(TARGET_DIR) by each package, and stores this
> > > > list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> > > > be used to determine the size contribution of each package to the
> > > > target root filesystem.
> > > > 
> > > > The only limitation is that if a file is installed by a package A, and
> > > > then overriden by a file from package B, the file will only be listed
> > > > in $(BUILD_DIR)/A.filelist as it is the first time we will see the
> > > > file.
> > > 
> > > If we really wanted to account for the realy package, we'd have to
> > > somehow notice that a pacakge did change the content of a file.
> > > 
> > > So, we would need to run sha1sum on all the files in the pre-step and
> > > the post step. Any differing line would mean a new file, or a changed
> > > file.
> > 
> > I did something similar in the past. I used inotify to follow modification
> > done on TARGET_DIR. It was fast and detect overwritten files.
> 
> I've been playing with inotify too, and it is not really up to the task.
> I was able to overload the "wait queue" very easily, and missed some
> events (sometimes a lot of them), when the build was creating/changing a
> lot of files in rapid-fire.
> 
> I was able to overcome the "wait-queue" limitation by increasing the
> number of queueable events, but that requires root access, but not
> everyone can be root on their machine (think shared build farms), and
> the defaults are quite low.
Strange, I watched read access to staging directory and I didn't notice this 
issue. However, I did only one build at time.


> > Handling build interruption was a little tricky, but results was corrects.
> 
> That's also an issue I see, and I don't think we want to go into too
> complex a setting.
From some point of view, it is not so much complex than computing file 
checksums.
Arnout Vandecappelle June 24, 2014, 4:36 p.m. UTC | #7
On 07/06/14 23:46, Thomas Petazzoni wrote:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> be used to determine the size contribution of each package to the
> target root filesystem.
> 
> The only limitation is that if a file is installed by a package A, and
> then overriden by a file from package B, the file will only be listed
> in $(BUILD_DIR)/A.filelist as it is the first time we will see the
> file.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  package/pkg-generic.mk | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 5116ed9..069653e 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,30 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  
> +# Package size steps
> +define step_pkg_size_start
> +	echo "PKG SIZE START $(1)"
> +	(cd $(TARGET_DIR) ; find . -type f) | sort > \
> +		$(BUILD_DIR)/$(1).tmp_filelist_before
> +endef
> +
> +define step_pkg_size_end
> +	echo "PKG SIZE END $(1)"
> +	(cd $(TARGET_DIR); find . -type f) | sort > \
> +		$(BUILD_DIR)/$(1).tmp_filelist_after
> +	diff -u $(BUILD_DIR)/$(1).tmp_filelist_before $(BUILD_DIR)/$(1).tmp_filelist_after | \
> +		grep '^\+\./' | sed 's%^\+%%' > $(BUILD_DIR)/$(1).filelist
> +	$(RM) -f $(BUILD_DIR)/$(1).tmp_filelist_before \
> +		$(BUILD_DIR)/$(1).tmp_filelist_after
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef
> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size

 Since these instrumentation steps are relatively expensive (especially for
large builds with many small packages), I would prefer to only enable this after
setting some environment variable or config option or similar.

 Regards,
 Arnout

> +
>  # User-supplied script
>  define step_user
>  	@$(foreach user_hook, $(BR2_INSTRUMENTATION_SCRIPTS), \
>
Thomas Petazzoni June 24, 2014, 4:41 p.m. UTC | #8
Dear Arnout Vandecappelle,

On Tue, 24 Jun 2014 18:36:58 +0200, Arnout Vandecappelle wrote:

> > +# Package size steps
> > +define step_pkg_size_start
> > +	echo "PKG SIZE START $(1)"
> > +	(cd $(TARGET_DIR) ; find . -type f) | sort > \
> > +		$(BUILD_DIR)/$(1).tmp_filelist_before
> > +endef
> > +
> > +define step_pkg_size_end
> > +	echo "PKG SIZE END $(1)"
> > +	(cd $(TARGET_DIR); find . -type f) | sort > \
> > +		$(BUILD_DIR)/$(1).tmp_filelist_after
> > +	diff -u $(BUILD_DIR)/$(1).tmp_filelist_before $(BUILD_DIR)/$(1).tmp_filelist_after | \
> > +		grep '^\+\./' | sed 's%^\+%%' > $(BUILD_DIR)/$(1).filelist
> > +	$(RM) -f $(BUILD_DIR)/$(1).tmp_filelist_before \
> > +		$(BUILD_DIR)/$(1).tmp_filelist_after
> > +endef
> > +
> > +define step_pkg_size
> > +	$(if $(filter install-target,$(2)),\
> > +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> > +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> > +endef
> > +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> 
>  Since these instrumentation steps are relatively expensive (especially for
> large builds with many small packages), I would prefer to only enable this after
> setting some environment variable or config option or similar.

Agreed, I'll add this in the next iteration of the patch series.

Thomas
Yann E. MORIN June 24, 2014, 4:53 p.m. UTC | #9
Arnout, All,

On 2014-06-24 18:36 +0200, Arnout Vandecappelle spake thusly:
> On 07/06/14 23:46, Thomas Petazzoni wrote:
> > This patch adds a global instrumentation hook that collects the list
> > of files installed in $(TARGET_DIR) by each package, and stores this
> > list into a file called $(BUILD_DIR)/<pkgname>.filelist. It can later
> > be used to determine the size contribution of each package to the
> > target root filesystem.
> > 
> > The only limitation is that if a file is installed by a package A, and
> > then overriden by a file from package B, the file will only be listed
> > in $(BUILD_DIR)/A.filelist as it is the first time we will see the
> > file.
> > 
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> > ---
> >  package/pkg-generic.mk | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> > 
> > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > index 5116ed9..069653e 100644
> > --- a/package/pkg-generic.mk
> > +++ b/package/pkg-generic.mk
> > @@ -55,6 +55,30 @@ define step_time
> >  endef
> >  GLOBAL_INSTRUMENTATION_HOOKS += step_time
> >  
> > +# Package size steps
> > +define step_pkg_size_start
> > +	echo "PKG SIZE START $(1)"
> > +	(cd $(TARGET_DIR) ; find . -type f) | sort > \
> > +		$(BUILD_DIR)/$(1).tmp_filelist_before
> > +endef
> > +
> > +define step_pkg_size_end
> > +	echo "PKG SIZE END $(1)"
> > +	(cd $(TARGET_DIR); find . -type f) | sort > \
> > +		$(BUILD_DIR)/$(1).tmp_filelist_after
> > +	diff -u $(BUILD_DIR)/$(1).tmp_filelist_before $(BUILD_DIR)/$(1).tmp_filelist_after | \
> > +		grep '^\+\./' | sed 's%^\+%%' > $(BUILD_DIR)/$(1).filelist
> > +	$(RM) -f $(BUILD_DIR)/$(1).tmp_filelist_before \
> > +		$(BUILD_DIR)/$(1).tmp_filelist_after
> > +endef
> > +
> > +define step_pkg_size
> > +	$(if $(filter install-target,$(2)),\
> > +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> > +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> > +endef
> > +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> 
>  Since these instrumentation steps are relatively expensive (especially for
> large builds with many small packages), I would prefer to only enable this after
> setting some environment variable or config option or similar.

Ideally, I would like an entry in the menuconfig (probably in the "build
options" sub-menu):

    [*] Graphs and intrumentation  --->
        *** Graphs ***
        [ ] Packages' installed size
        [ ] Packages' build time
        [ ] Packages' dependencies
        [ ] Global dependencies
        *** Instrumentation ***
        [ ] Check RPATH
        [ ] Check for overwritten files

Thus, all relevant information would be automatically generated. Of
course, some can be generated on-demand (eg. the dependencies graphs),
while others 

(Note: the rpath check I'm currently working on, and the overwriting
check would be relatively easy to implement I guess; but that's just an
example.)

Regards,
Yann E. MORIN.
diff mbox

Patch

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 5116ed9..069653e 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -55,6 +55,30 @@  define step_time
 endef
 GLOBAL_INSTRUMENTATION_HOOKS += step_time
 
+# Package size steps
+define step_pkg_size_start
+	echo "PKG SIZE START $(1)"
+	(cd $(TARGET_DIR) ; find . -type f) | sort > \
+		$(BUILD_DIR)/$(1).tmp_filelist_before
+endef
+
+define step_pkg_size_end
+	echo "PKG SIZE END $(1)"
+	(cd $(TARGET_DIR); find . -type f) | sort > \
+		$(BUILD_DIR)/$(1).tmp_filelist_after
+	diff -u $(BUILD_DIR)/$(1).tmp_filelist_before $(BUILD_DIR)/$(1).tmp_filelist_after | \
+		grep '^\+\./' | sed 's%^\+%%' > $(BUILD_DIR)/$(1).filelist
+	$(RM) -f $(BUILD_DIR)/$(1).tmp_filelist_before \
+		$(BUILD_DIR)/$(1).tmp_filelist_after
+endef
+
+define step_pkg_size
+	$(if $(filter install-target,$(2)),\
+		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
+		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
+endef
+GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
+
 # User-supplied script
 define step_user
 	@$(foreach user_hook, $(BR2_INSTRUMENTATION_SCRIPTS), \