Message ID | 1417470100-32657-3-git-send-email-thomas.petazzoni@free-electrons.com |
---|---|
State | Changes Requested |
Hello Thomas,

On Monday 01 December 2014 22:41:38 Thomas Petazzoni wrote:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
> later be used to determine the size contribution of each package to
> the target root filesystem.
>
> Note that in order to detect if a file installed by one package is
> later overridden by another package, we calculate the md5 of installed
> files and compare them at each installation of a new package.
>
> This commit also adds a Config.in option to enable the collection of
> this data, as calculating the md5 of all installed files at the
> beginning and end of the installation of each package can be
> considered a time-consuming process which maybe some users will not be
> willing to suffer from.
>
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  Config.in              |  9 +++++++++
>  package/pkg-generic.mk | 36 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
>
> diff --git a/Config.in b/Config.in
> index 1aa1080..328654c 100644
> --- a/Config.in
> +++ b/Config.in
> @@ -569,6 +569,15 @@ config BR2_GLOBAL_PATCH_DIR
>  	  Otherwise, if the directory <global-patch-dir>/<packagename> exists,
>  	  then all *.patch files in the directory will be applied.
>
> +config BR2_COLLECT_FILE_SIZE_STATS
> +	bool "collect statistics about installed file size"
> +	help
> +	  Enable this option to let Buildroot collect data about the
> +	  installed files. When this option is enabled, you will be
> +	  able to use the 'size-stats' make target, which will
> +	  generate a graph and CSV files giving statistics about the
> +	  installed size of each file and each package.
> +
>  endmenu
>
>  source "toolchain/Config.in"
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 9643a30..82f8ff8 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,42 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>
> +# Hooks to collect statistics about installed files
> +ifeq ($(BR2_COLLECT_FILE_SIZE_STATS),y)
> +
> +# This hook will be called before the target installation of a
> +# package. We store in a file named $(1).filelist_before the list of
> +# files currently installed in the target. Note that the MD5 is also
> +# stored, in order to identify if the files are overwritten.
> +define step_pkg_size_start
> +	(cd $(TARGET_DIR) ; find . -type f | xargs md5sum) | sort > \
> +		$(BUILD_DIR)/$(1).filelist_before
> +endef

I think this does not work if a filename contains spaces.

> +# This hook will be called after the target installation of a
> +# package. We store in a file named $(1).filelist_after the list
> +# of files (and their MD5) currently installed in the target. We then
> +# do a diff with the $(1).filelist_before to compute the list of
> +# files installed by this package.
> +define step_pkg_size_end
> +	(cd $(TARGET_DIR); find . -type f | xargs md5sum) | sort > \
> +		$(BUILD_DIR)/$(1).filelist_after
> +	comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
> +		while read hash file ; do \
> +			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> +		done

Would it make sense to also record removed lines? We could then write
another script that detects whether a file is in conflict between two
packages.

> +	$(RM) -f $(BUILD_DIR)/$(1).filelist_before \
> +		$(BUILD_DIR)/$(1).filelist_after
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef
> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +endif
> +
>  # User-supplied script
>  ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
>  define step_user
>
Dear Jérôme Pouiller,

On Tue, 02 Dec 2014 12:00:51 +0100, Jérôme Pouiller wrote:

> > +# This hook will be called before the target installation of a
> > +# package. We store in a file named $(1).filelist_before the list of
> > +# files currently installed in the target. Note that the MD5 is also
> > +# stored, in order to identify if the files are overwritten.
> > +define step_pkg_size_start
> > +	(cd $(TARGET_DIR) ; find . -type f | xargs md5sum) | sort > \
> > +		$(BUILD_DIR)/$(1).filelist_before
> > +endef
> I think this does not work if a filename contains spaces.

Hm, yes, very possible. But does Buildroot as a whole really work if
some file in the target filesystem has spaces in its name?

> > +# This hook will be called after the target installation of a
> > +# package. We store in a file named $(1).filelist_after the list
> > +# of files (and their MD5) currently installed in the target. We then
> > +# do a diff with the $(1).filelist_before to compute the list of
> > +# files installed by this package.
> > +define step_pkg_size_end
> > +	(cd $(TARGET_DIR); find . -type f | xargs md5sum) | sort > \
> > +		$(BUILD_DIR)/$(1).filelist_after
> > +	comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
> > +		while read hash file ; do \
> > +			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> > +		done
> Would it make sense to also record removed lines? We could then write
> another script that detects whether a file is in conflict between two
> packages.

I'm not sure I follow you here. We already take care of packages
installing the same file; that's the whole point of storing the MD5 of
each file. By using comm -13, we keep only the lines that are unique to
the second file (compared to the first file). So we keep lines for
either new files added by this package, or files already installed but
overwritten by the package (detected using the MD5).

Thanks,

Thomas
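The comm -13 behaviour described above can be checked with a small standalone sketch. The file names and the shortened "hashes" below are invented for illustration and do not come from the patch; only the comm invocation mirrors it.

```shell
#!/bin/sh
# Illustration only: paths and 4-digit "hashes" are made up for the demo.
LC_ALL=C; export LC_ALL          # comm expects input sorted in the current locale
tmp=$(mktemp -d)

# Snapshot taken before the package is installed ("<hash>  <path>", sorted).
cat > "$tmp/before" <<'EOF'
1111  ./bin/busybox
2222  ./etc/inittab
EOF

# Snapshot taken after: ./etc/inittab was overwritten (hash changed)
# and ./usr/bin/foo is new.
cat > "$tmp/after" <<'EOF'
1111  ./bin/busybox
3333  ./etc/inittab
4444  ./usr/bin/foo
EOF

# -1 drops lines unique to the first file, -3 drops lines common to both,
# so only lines unique to the second file remain.
result=$(comm -13 "$tmp/before" "$tmp/after")
printf '%s\n' "$result"
# prints:
# 3333  ./etc/inittab
# 4444  ./usr/bin/foo

rm -rf "$tmp"
```

Both the overwritten file and the new file show up, while the untouched ./bin/busybox does not. A file that was present before and deleted by the package would appear in neither output line, which is the case raised in the review.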
On Tuesday 02 December 2014 13:23:49 Thomas Petazzoni wrote:
> Dear Jérôme Pouiller,
>
> On Tue, 02 Dec 2014 12:00:51 +0100, Jérôme Pouiller wrote:
>
> > > +# This hook will be called before the target installation of a
> > > +# package. We store in a file named $(1).filelist_before the list of
> > > +# files currently installed in the target. Note that the MD5 is also
> > > +# stored, in order to identify if the files are overwritten.
> > > +define step_pkg_size_start
> > > +	(cd $(TARGET_DIR) ; find . -type f | xargs md5sum) | sort > \
> > > +		$(BUILD_DIR)/$(1).filelist_before
> > > +endef
> > I think this does not work if a filename contains spaces.
>
> Hm, yes, very possible. But does Buildroot as a whole really work if
> some file in the target filesystem has spaces in its name?

I don't know, but adding -print0/--null is cheap.

> > > +# This hook will be called after the target installation of a
> > > +# package. We store in a file named $(1).filelist_after the list
> > > +# of files (and their MD5) currently installed in the target. We then
> > > +# do a diff with the $(1).filelist_before to compute the list of
> > > +# files installed by this package.
> > > +define step_pkg_size_end
> > > +	(cd $(TARGET_DIR); find . -type f | xargs md5sum) | sort > \
> > > +		$(BUILD_DIR)/$(1).filelist_after
> > > +	comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
> > > +		while read hash file ; do \
> > > +			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> > > +		done
> > Would it make sense to also record removed lines? We could then write
> > another script that detects whether a file is in conflict between two
> > packages.
>
> I'm not sure I follow you here. We already take care of packages
> installing the same file; that's the whole point of storing the MD5 of
> each file. By using comm -13, we keep only the lines that are unique to
> the second file (compared to the first file). So we keep lines for
> either new files added by this package, or files already installed but
> overwritten by the package (detected using the MD5).

Recording deleted files is of no interest for the current purpose.
However, I was thinking of using packages-file-list.txt for other
scripts, especially to detect suspicious file modifications. I agree
the current format is enough to give information about overwritten
files, but it would be handier to exploit with file removal
information. (In addition, when a file is removed, it is currently not
possible to find the guilty package.)

I just noticed another thing. To make this feature compatible with
BR2_JLEVEL, we just need to manage a mutex in the step_pkg_size hook.
Do you plan to add one?
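The two suggestions in this message (null-delimited file names, and recording removed files) can be sketched outside of Buildroot as follows. This is only an illustration under assumptions: the snapshot helper, the temporary directory layout and the file names are hypothetical, and md5sum, find -print0 and xargs -0 are assumed to be the GNU versions found on a typical Linux build host.

```shell
#!/bin/sh
# Sketch only: "snapshot" and the demo tree are hypothetical, not Buildroot code.
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
mkdir -p "$tmp/target/etc"
printf 'a\n' > "$tmp/target/etc/my file.conf"   # name contains a space
printf 'b\n' > "$tmp/target/etc/old.conf"

# List "<md5>  <path>" for every regular file. -print0 / -0 pass names
# NUL-separated, so a space in a name no longer splits it in two.
snapshot() {
    (cd "$1" && find . -type f -print0 | xargs -0 md5sum) | sort
}

snapshot "$tmp/target" > "$tmp/before"

# Simulate a package installation that deletes one file and adds another.
rm "$tmp/target/etc/old.conf"
printf 'c\n' > "$tmp/target/etc/new.conf"

snapshot "$tmp/target" > "$tmp/after"

# md5sum prints 32 hex digits and two separator characters, so the path
# starts at column 35. Comparing paths only (not hashes) isolates removals.
cut -c35- "$tmp/before" | sort > "$tmp/before.paths"
cut -c35- "$tmp/after"  | sort > "$tmp/after.paths"
removed=$(comm -23 "$tmp/before.paths" "$tmp/after.paths")

# Number of snapshot lines that hold the full spaced name.
spaced=$(grep -c 'my file.conf$' "$tmp/before")

printf 'removed: %s\n' "$removed"
rm -rf "$tmp"
```

Comparing bare paths rather than full lines is a design choice of this sketch: running comm -23 on the hashed lines would also report overwritten files under their old hash, which is a different question from removal.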
On Tuesday 02 December 2014 14:22:11 Jérôme Pouiller wrote:
> On Tuesday 02 December 2014 13:23:49 Thomas Petazzoni wrote:
[...]
> I just noticed another thing. To make this feature compatible with
> BR2_JLEVEL, we just need to manage a mutex in the step_pkg_size hook. Do
  ^^^^^^^^^^
Sure, I meant "top-level parallel make".

> you plan to add one?
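As a rough idea of what such a mutex could look like, here is a minimal sketch that serializes appends to a shared list with flock(1) from util-linux. It is an assumption, not something proposed in the thread: the record helper, the lock file name and the package names are hypothetical, and flock is assumed to be available on the build host.

```shell
#!/bin/sh
# Sketch only: "record", the lock file and the package names are hypothetical.
tmp=$(mktemp -d)
list="$tmp/packages-file-list.txt"
lock="$tmp/packages-file-list.lock"

# Append "<package>,<file>" lines while holding an exclusive lock on fd 9,
# so entries written by concurrent jobs cannot interleave.
record() {
    pkg=$1
    shift
    (
        flock 9
        for f in "$@"; do
            printf '%s,%s\n' "$pkg" "$f" >> "$list"
        done
    ) 9> "$lock"
}

# Two "packages" finishing their target installation at the same time.
record pkg-a ./bin/a ./lib/liba.so &
record pkg-b ./bin/b &
wait

count=$(wc -l < "$list" | tr -d ' ')
printf 'recorded %s entries\n' "$count"
rm -rf "$tmp"
```

This only protects the shared output file. If two packages install into the target directory at overlapping times, each before/after snapshot would also see the other package's files, so the lock would presumably have to cover the whole installation step rather than just the append.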
diff --git a/Config.in b/Config.in
index 1aa1080..328654c 100644
--- a/Config.in
+++ b/Config.in
@@ -569,6 +569,15 @@ config BR2_GLOBAL_PATCH_DIR
 	  Otherwise, if the directory <global-patch-dir>/<packagename> exists,
 	  then all *.patch files in the directory will be applied.
 
+config BR2_COLLECT_FILE_SIZE_STATS
+	bool "collect statistics about installed file size"
+	help
+	  Enable this option to let Buildroot collect data about the
+	  installed files. When this option is enabled, you will be
+	  able to use the 'size-stats' make target, which will
+	  generate a graph and CSV files giving statistics about the
+	  installed size of each file and each package.
+
 endmenu
 
 source "toolchain/Config.in"
diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 9643a30..82f8ff8 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -55,6 +55,42 @@ define step_time
 endef
 GLOBAL_INSTRUMENTATION_HOOKS += step_time
 
+# Hooks to collect statistics about installed files
+ifeq ($(BR2_COLLECT_FILE_SIZE_STATS),y)
+
+# This hook will be called before the target installation of a
+# package. We store in a file named $(1).filelist_before the list of
+# files currently installed in the target. Note that the MD5 is also
+# stored, in order to identify if the files are overwritten.
+define step_pkg_size_start
+	(cd $(TARGET_DIR) ; find . -type f | xargs md5sum) | sort > \
+		$(BUILD_DIR)/$(1).filelist_before
+endef
+
+# This hook will be called after the target installation of a
+# package. We store in a file named $(1).filelist_after the list
+# of files (and their MD5) currently installed in the target. We then
+# do a diff with the $(1).filelist_before to compute the list of
+# files installed by this package.
+define step_pkg_size_end
+	(cd $(TARGET_DIR); find . -type f | xargs md5sum) | sort > \
+		$(BUILD_DIR)/$(1).filelist_after
+	comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
+		while read hash file ; do \
+			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
+		done
+	$(RM) -f $(BUILD_DIR)/$(1).filelist_before \
+		$(BUILD_DIR)/$(1).filelist_after
+endef
+
+define step_pkg_size
+	$(if $(filter install-target,$(2)),\
+		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
+		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
+endef
+GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
+endif
+
 # User-supplied script
 ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
 define step_user
This patch adds a global instrumentation hook that collects the list
of files installed in $(TARGET_DIR) by each package, and stores this
list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
later be used to determine the size contribution of each package to
the target root filesystem.

Note that in order to detect if a file installed by one package is
later overridden by another package, we calculate the md5 of installed
files and compare them at each installation of a new package.

This commit also adds a Config.in option to enable the collection of
this data, as calculating the md5 of all installed files at the
beginning and end of the installation of each package can be
considered a time-consuming process which maybe some users will not be
willing to suffer from.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 Config.in              |  9 +++++++++
 package/pkg-generic.mk | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)