[09/34] reproducibility/libglib2: allow removing codegen

Message ID 1462002570-14706-9-git-send-email-gilles.chanteperdrix@xenomai.org
State Changes Requested

Commit Message

Gilles Chanteperdrix April 30, 2016, 7:49 a.m. UTC
glib2 installs compiled python bytecode in /usr/share/glib-2.0/codegen,
likely for the gdbus-codegen program.

This bytecode is not compiled with the buildroot host python, which
is a first source of non-reproducibility. Fix that by using the
buildroot host python.

But this is not sufficient: compiling the python bytecode with the
same interpreter in different environments yields different binaries.
Since buildroot users are unlikely to need the gdbus-codegen program
on target, add an option to remove it. Removal is disabled by
default: the new BR2_PACKAGE_LIBGLIB2_CODEGEN option defaults to y.
---
 package/libglib2/Config.in   | 10 ++++++++++
 package/libglib2/libglib2.mk | 18 ++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

Comments

Thomas Petazzoni May 7, 2016, 1:28 p.m. UTC | #1
Hello,

Cc'ing Gustavo on this, since he has done quite a bit of libglib
packaging work.

On Sat, 30 Apr 2016 09:49:05 +0200, Gilles Chanteperdrix wrote:
> glib2 installs compiled python bytecode in /usr/share/glib-2.0/codegen,
> likely for the gdbus-codegen program.
> 
> This bytecode is not compiled with the buildroot host python, which
> causes a first source of non reproducibility, fix that by using
> buildroot host python.

Which has the annoying side effect of increasing even more the set of
mandatory dependencies to build libglib :-/

So I wonder if the dependency on host-python should be made conditional
on BR2_REPRODUCIBLE.

> But this is not sufficient, compiling the python bytecode with the
> same interpreter in different environments yields different binaries,
> so, since buildroot users are unlikely to need the gdbus-codegen
> program on target, add an option to remove it. The option is disabled
> by default.

Since then, Gustavo has already modified the libglib2 package to
remove the /usr/share/glib-2.0/codegen directory. However,
gdbus-codegen is not removed.

See commit 1f1f16e9e5367048faa1cf17237e0c0d422e98d6.

Thomas
Arnout Vandecappelle May 7, 2016, 9:04 p.m. UTC | #2
On 04/30/16 09:49, Gilles Chanteperdrix wrote:
> But this is not sufficient, compiling the python bytecode with the
> same interpreter in different environments yields different binaries,

  Er, this is worrisome... You are saying that we don't have a chance in hell of 
generating reproducible python bytecode?

  Regards,
  Arnout
Gustavo Zacarias May 8, 2016, 12:51 p.m. UTC | #3
On 07/05/16 10:28, Thomas Petazzoni wrote:

> Which has the annoying side effect of increasing even more the set of
> mandatory dependencies to build libglib :-/
>
> So I wonder if the dependency on host-python should be made conditional
> on BR2_REPRODUCIBLE.

Hi.
Generally, there's no use case of having gdbus-codegen in the target 
since it's used at build time.

> Since then, Gustavo has already modified the libglib2 package to
> remove the /usr/share/glib-2.0/codegen directory. However,
> gdbus-codegen is not removed.
>
> See commit 1f1f16e9e5367048faa1cf17237e0c0d422e98d6.

Exactly, gdbus-codegen is a leftover that I missed; it can be safely
removed.
Regards.
Thomas Petazzoni May 8, 2016, 12:56 p.m. UTC | #4
Hello,

On Sun, 8 May 2016 09:51:32 -0300, Gustavo Zacarias wrote:

> Exactly, gdbus-codegen is a leftover that I missed; it can be safely
> removed.

Thanks for confirming. Could you submit a patch doing this?

Thanks!

Thomas
Gilles Chanteperdrix May 8, 2016, 8:25 p.m. UTC | #5
On Sat, May 07, 2016 at 11:04:25PM +0200, Arnout Vandecappelle wrote:
> On 04/30/16 09:49, Gilles Chanteperdrix wrote:
> > But this is not sufficient, compiling the python bytecode with the
> > same interpreter in different environments yields different binaries,
> 
>   Er, this is worrisome... You are saying that we don't have a chance in hell of 
> generating reproducible python bytecode?

I am not a python specialist, but it seems to me that four bytes in
the generated python bytecode are the build timestamp, so unless there
is a way to override it with SOURCE_DATE_EPOCH, I do not see how that
is possible.
Arnout Vandecappelle May 9, 2016, 11:40 p.m. UTC | #6
On 05/08/16 22:25, Gilles Chanteperdrix wrote:
> On Sat, May 07, 2016 at 11:04:25PM +0200, Arnout Vandecappelle wrote:
>> On 04/30/16 09:49, Gilles Chanteperdrix wrote:
>>> But this is not sufficient, compiling the python bytecode with the
>>> same interpreter in different environments yields different binaries,
>>
>>   Er, this is worrisome... You are saying that we don't have a chance in hell of
>> generating reproducible python bytecode?
>
> I am not a python specialist, but it seems to me four bytes in the
> python generated bytecode are the build timestamp, so unless there
> is a way to override it with SOURCE_DATE_EPOCH, I do not see that
> possible.

  I've checked the docs. What is saved is the timestamp of the .py file, not the 
build time.
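
  For illustration only (this is not part of the original exchange), the 
claim can be checked with a minimal Python sketch. It assumes Python 3.3 or 
later; the helper names, the throwaway module and the epoch value are 
arbitrary. It forces a known mtime on a .py file, byte-compiles it, and reads 
the 32-bit timestamp field back from the .pyc header.

```python
import os
import py_compile
import struct
import sys
import tempfile


def pyc_recorded_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Python 3.7+ (PEP 552): magic(4) + flags(4) + mtime(4) + size(4).
    # Python 3.3 to 3.6:     magic(4) + mtime(4) + size(4).
    offset = 8 if sys.version_info >= (3, 7) else 4
    return struct.unpack_from("<I", header, offset)[0]


def compile_with_mtime(source_mtime):
    """Compile a throwaway module whose mtime is forced to source_mtime."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        pyc = os.path.join(tmp, "mod.pyc")
        with open(src, "w") as f:
            f.write("x = 1\n")
        os.utime(src, (source_mtime, source_mtime))
        kwargs = {}
        if sys.version_info >= (3, 7):
            # Keep the classic timestamp header even if SOURCE_DATE_EPOCH
            # happens to be set in the environment.
            kwargs["invalidation_mode"] = py_compile.PycInvalidationMode.TIMESTAMP
        py_compile.compile(src, cfile=pyc, doraise=True, **kwargs)
        return pyc_recorded_mtime(pyc)


if __name__ == "__main__":
    # Prints the value passed in, showing the header tracks the file mtime.
    print(compile_with_mtime(1462002570))
```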

  I think it would make sense to run 'touch -d @$(SOURCE_DATE_EPOCH)' on all 
files after patching to capture this aspect. This was also proposed for 
Fedora[1] (though there it was only for the .py files). Not sure what happened 
with that proposal in the end.
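
  As a rough sketch of what such a pass could look like (an assumption for 
illustration, not what Buildroot or Fedora actually implemented), the 
following Python walks a tree and pins every matching file to one epoch. 
touch_tree and demo are hypothetical names; the suffix argument models the 
".py files only" variant.

```python
import os
import tempfile


def touch_tree(root, epoch, suffix=""):
    """Set atime and mtime of every file under root ending in suffix to epoch."""
    touched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):  # an empty suffix matches every file
                path = os.path.join(dirpath, name)
                os.utime(path, (epoch, epoch))
                touched.append(path)
    return sorted(touched)


def demo(epoch=1462002570):
    """Build a tiny tree, touch only its .py files, and report the mtimes."""
    files = ("a.py", "sub/b.py", "configure")
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for rel in files:
            path = os.path.join(root, *rel.split("/"))
            with open(path, "w"):
                pass
            os.utime(path, (5, 5))  # pretend every file is very old
        touch_tree(root, epoch, suffix=".py")
        return {
            rel: int(os.stat(os.path.join(root, *rel.split("/"))).st_mtime)
            for rel in files
        }


if __name__ == "__main__":
    print(demo())
```

  Calling touch_tree(root, epoch) without a suffix corresponds to touching 
all files; passing suffix=".py" leaves everything else alone.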

  Regards,
  Arnout

[1] https://lists.fedoraproject.org/pipermail/python-devel/2014-August/000617.html
Gilles Chanteperdrix May 10, 2016, 7:42 p.m. UTC | #7
On Tue, May 10, 2016 at 01:40:30AM +0200, Arnout Vandecappelle wrote:
> On 05/08/16 22:25, Gilles Chanteperdrix wrote:
> > On Sat, May 07, 2016 at 11:04:25PM +0200, Arnout Vandecappelle wrote:
> >> On 04/30/16 09:49, Gilles Chanteperdrix wrote:
> >>> But this is not sufficient, compiling the python bytecode with the
> >>> same interpreter in different environments yields different binaries,
> >>
> >>   Er, this is worrisome... You are saying that we don't have a chance in hell of
> >> generating reproducible python bytecode?
> >
> > I am not a python specialist, but it seems to me four bytes in the
> > python generated bytecode are the build timestamp, so unless there
> > is a way to override it with SOURCE_DATE_EPOCH, I do not see that
> > possible.
> 
>   I've checked the docs. What is saved is the timestamp of the .py file, not the 
> build time.

Mmmm I don't remember. I would run a compilation manually, twice at
a different time, to make sure that the problem is only the file
date.

> 
>   I think it would make sense to run 'touch -d @$(SOURCE_DATE_EPOCH)' on all 
> files after patching to capture this aspect. This was also proposed for 
> Fedora[1] (though there it was only for the .py files). Not sure what happened 
> with that proposal in the end.

I think I tried running "touch" before compiling. Unfortunately,
playing with file dates before running "make" is a bit like playing
with fire. For instance, with autotools-based projects for which
autoreconf is not run, the project may use versions of the autotools
not installed on the user machine, and because of file dates make may
want to rerun autoconf or automake and whine because the right
versions are not available. Doing this only for .py files is much
more reasonable.
Arnout Vandecappelle May 12, 2016, 8:05 p.m. UTC | #8
On 05/10/16 21:42, Gilles Chanteperdrix wrote:
> On Tue, May 10, 2016 at 01:40:30AM +0200, Arnout Vandecappelle wrote:
>> On 05/08/16 22:25, Gilles Chanteperdrix wrote:
>>> On Sat, May 07, 2016 at 11:04:25PM +0200, Arnout Vandecappelle wrote:
>>>> On 04/30/16 09:49, Gilles Chanteperdrix wrote:
>>>>> But this is not sufficient, compiling the python bytecode with the
>>>>> same interpreter in different environments yields different binaries,
>>>>
>>>>   Er, this is worrisome... You are saying that we don't have a chance in hell of
>>>> generating reproducible python bytecode?
>>>
>>> I am not a python specialist, but it seems to me four bytes in the
>>> python generated bytecode are the build timestamp, so unless there
>>> is a way to override it with SOURCE_DATE_EPOCH, I do not see that
>>> possible.
>>
>>   I've checked the docs. What is saved is the timestamp of the .py file, not the
>> build time.
>
> Mmmm I don't remember. I would run a compilation manually, twice at
> a different time, to make sure that the problem is only the file
> date.

  I did that - admittedly with just a few seconds difference. Both in 
python2.7.11 and python3.5, they were identical when compiling a second time, 
and different after touch'ing.
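
  A sketch of that kind of experiment, for readers who want to repeat it 
(hypothetical helper names; it assumes Python 3.7 or later for the 
invalidation_mode argument, so it does not cover the python2.7 case): 
compile the same file twice, then change only its mtime and compile again.

```python
import os
import py_compile
import tempfile


def compile_bytes(src):
    """Byte-compile src next to itself and return the raw .pyc contents."""
    pyc = src + "c"
    py_compile.compile(
        src,
        cfile=pyc,
        doraise=True,
        # Timestamp-based header, regardless of SOURCE_DATE_EPOCH.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc, "rb") as f:
        return f.read()


def run_experiment():
    """Return (same_after_recompile, same_after_touch)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        os.utime(src, (1000000000, 1000000000))
        first = compile_bytes(src)
        second = compile_bytes(src)  # same source, same mtime
        os.utime(src, (1000000060, 1000000060))  # "touch" the source
        third = compile_bytes(src)
        return first == second, first == third


if __name__ == "__main__":
    print(run_experiment())
```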

>
>>
>>   I think it would make sense to run 'touch -d @$(SOURCE_DATE_EPOCH)' on all
>> files after patching to capture this aspect. This was also proposed for
>> Fedora[1] (though there it was only for the .py files). Not sure what happened
>> with that proposal in the end.
>
> I think I tried running "touch" before compiling, unfortunately
> playing with file dates before running "make" is a bit like playing
> with fire. For instance with autotools based projects for which
> autoreconf is not run, the project may use versions of the autotools
> not installed on the user machine, and because of file dates may
> want to rerun autoconf or automake and whine because the right
> versions are not available. Doing this only for .py files is much
> more reasonable.

  I tested this as well, with make-3.81 and make-4.04. Both of them do _not_ 
rebuild if the timestamps are identical. And since the idea is to use touch -d 
@$(SOURCE_DATE_EPOCH), all timestamps will be identical.

  But it does indeed mean that if a package has a generated file with an earlier 
date than the source files, it will now suddenly no longer be rebuilt.
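
  The effect can be modelled with a deliberately simplified version of the 
rule make applies (a sketch, not make's actual implementation; 
needs_rebuild and demo are hypothetical names): a target is rebuilt when it 
is missing or when a prerequisite is strictly newer than it.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Simplified model of make: rebuild if the target is missing or any
    prerequisite is strictly newer than the target."""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime_ns
    return any(os.stat(p).st_mtime_ns > target_mtime for p in prerequisites)


def demo(epoch=1462002570):
    """Return the rebuild decision before and after flattening timestamps."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "configure.ac")
        generated = os.path.join(tmp, "configure")
        for path in (source, generated):
            with open(path, "w"):
                pass
        # Generated file older than its source: it would be regenerated.
        os.utime(source, (200, 200))
        os.utime(generated, (100, 100))
        before = needs_rebuild(generated, [source])
        # Flatten both files to the same instant, as the touch pass would.
        for path in (source, generated):
            os.utime(path, (epoch, epoch))
        after = needs_rebuild(generated, [source])
        return before, after


if __name__ == "__main__":
    print(demo())
```

  With identical timestamps the strict comparison never fires, which is why 
a stale generated file is silently kept.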

  I don't know how the various other build systems (waf, scons) will behave. 
There is indeed a risk, but I'd say we deal with that when it happens. Actually 
I think we would prefer to detect such issues.

  Regards,
  Arnout
Gilles Chanteperdrix May 14, 2016, 1:34 p.m. UTC | #9
On Thu, May 12, 2016 at 10:05:36PM +0200, Arnout Vandecappelle wrote:
> On 05/10/16 21:42, Gilles Chanteperdrix wrote:
> > On Tue, May 10, 2016 at 01:40:30AM +0200, Arnout Vandecappelle wrote:
> >> On 05/08/16 22:25, Gilles Chanteperdrix wrote:
> >>> On Sat, May 07, 2016 at 11:04:25PM +0200, Arnout Vandecappelle wrote:
> >>>> On 04/30/16 09:49, Gilles Chanteperdrix wrote:
> >>>>> But this is not sufficient, compiling the python bytecode with the
> >>>>> same interpreter in different environments yields different binaries,
> >>>>
> >>>>   Er, this is worrisome... You are saying that we don't have a chance in hell of
> >>>> generating reproducible python bytecode?
> >>>
> >>> I am not a python specialist, but it seems to me four bytes in the
> >>> python generated bytecode are the build timestamp, so unless there
> >>> is a way to override it with SOURCE_DATE_EPOCH, I do not see that
> >>> possible.
> >>
> >>   I've checked the docs. What is saved is the timestamp of the .py file, not the
> >> build time.
> >
> > Mmmm I don't remember. I would run a compilation manually, twice at
> > a different time, to make sure that the problem is only the file
> > date.
> 
>   I did that - admittedly with just a few seconds difference. Both in 
> python2.7.11 and python3.5, they were identical when compiling a second time, 
> and different after touch'ing.
> 
> >
> >>
> >>   I think it would make sense to run 'touch -d @$(SOURCE_DATE_EPOCH)' on all
> >> files after patching to capture this aspect. This was also proposed for
> >> Fedora[1] (though there it was only for the .py files). Not sure what happened
> >> with that proposal in the end.
> >
> > I think I tried running "touch" before compiling, unfortunately
> > playing with file dates before running "make" is a bit like playing
> > with fire. For instance with autotools based projects for which
> > autoreconf is not run, the project may use versions of the autotools
> > not installed on the user machine, and because of file dates may
> > want to rerun autoconf or automake and whine because the right
> > versions are not available. Doing this only for .py files is much
> > more reasonable.
> 
>   I tested this as well, with make-3.81 and make-4.04. Both of them do _not_ 
> rebuild if the timestamps are identical. And since the idea is to use touch -d 
> @$(SOURCE_DATE_EPOCH), all timestamps will be identical.

You checked both the pyo and the pyc?

> 
>   But it does indeed mean that if a package has a generated file with an earlier 
> date than the source files, it will now suddenly no longer be
> rebuilt.

Yes, that is another problem. But I tried it, and this is not the
one I had. The one I had was the contrary: the dates made make want
to rebuild some files (the autotools/automake files), whereas the
right versions of autoconf and automake were not installed; this was
a package that did not run autoreconf. The "make" tool is completely
based on file dates, so again, I think messing with file dates
before running make is a bad idea.
Arnout Vandecappelle May 14, 2016, 11:48 p.m. UTC | #10
On 05/14/16 15:34, Gilles Chanteperdrix wrote:
> On Thu, May 12, 2016 at 10:05:36PM +0200, Arnout Vandecappelle wrote:
>> On 05/10/16 21:42, Gilles Chanteperdrix wrote:
>>> On Tue, May 10, 2016 at 01:40:30AM +0200, Arnout Vandecappelle wrote:
[snip]
>>>>   I think it would make sense to run 'touch -d @$(SOURCE_DATE_EPOCH)' on all
>>>> files after patching to capture this aspect. This was also proposed for
>>>> Fedora[1] (though there it was only for the .py files). Not sure what happened
>>>> with that proposal in the end.
>>>
>>> I think I tried running "touch" before compiling, unfortunately
>>> playing with file dates before running "make" is a bit like playing
>>> with fire. For instance with autotools based projects for which
>>> autoreconf is not run, the project may use versions of the autotools
>>> not installed on the user machine, and because of file dates may
>>> want to rerun autoconf or automake and whine because the right
>>> versions are not available. Doing this only for .py files is much
>>> more reasonable.
>>
>>   I tested this as well, with make-3.81 and make-4.04. Both of them do _not_
>> rebuild if the timestamps are identical. And since the idea is to use touch -d
>> @$(SOURCE_DATE_EPOCH), all timestamps will be identical.
>
> You checked both the pyo and the pyc?

  Yep.


>>   But it does indeed mean that if a package has a generated file with an earlier
>> date than the source files, it will now suddenly no longer be
>> rebuilt.
>
> Yes, that is another problem. But I tried it, and this is not the
> one I had, the one I had was the contrary: the dates made make want
> to rebuild some files (the autotools/automake files), whereas the
> right versions of autoconf and automake were not installed, this was
> a package that did not run autoreconf. The "make" tool is completely
> based on file dates, so again, I think messing with file dates
> before running make is a bad idea.

  Hm, I don't see how this could happen... My test showed that 'make' would not 
try to rebuild the autotools/automake files when they have the same timestamp as 
the corresponding source file. So what triggers the reconf then?

  I've tried with a random package (bash) and it's not reconfiguring after 
touching all sources. Do you remember for which package you saw this?


  Regards,
  Arnout
Gilles Chanteperdrix June 14, 2016, 2:18 p.m. UTC | #11
On Sun, May 15, 2016 at 01:48:29AM +0200, Arnout Vandecappelle wrote:
> On 05/14/16 15:34, Gilles Chanteperdrix wrote:
> > On Thu, May 12, 2016 at 10:05:36PM +0200, Arnout Vandecappelle wrote:
> >> On 05/10/16 21:42, Gilles Chanteperdrix wrote:
> >>> On Tue, May 10, 2016 at 01:40:30AM +0200, Arnout Vandecappelle wrote:
> [snip]
> >>>>   I think it would make sense to run 'touch -d @$(SOURCE_DATE_EPOCH)' on all
> >>>> files after patching to capture this aspect. This was also proposed for
> >>>> Fedora[1] (though there it was only for the .py files). Not sure what happened
> >>>> with that proposal in the end.
> >>>
> >>> I think I tried running "touch" before compiling, unfortunately
> >>> playing with file dates before running "make" is a bit like playing
> >>> with fire. For instance with autotools based projects for which
> >>> autoreconf is not run, the project may use versions of the autotools
> >>> not installed on the user machine, and because of file dates may
> >>> want to rerun autoconf or automake and whine because the right
> >>> versions are not available. Doing this only for .py files is much
> >>> more reasonable.
> >>
> >>   I tested this as well, with make-3.81 and make-4.04. Both of them do _not_
> >> rebuild if the timestamps are identical. And since the idea is to use touch -d
> >> @$(SOURCE_DATE_EPOCH), all timestamps will be identical.
> >
> > You checked both the pyo and the pyc?
> 
>   Yep.
> 
> 
> >>   But it does indeed mean that if a package has a generated file with an earlier
> >> date than the source files, it will now suddenly no longer be
> >> rebuilt.
> >
> > Yes, that is another problem. But I tried it, and this is not the
> > one I had, the one I had was the contrary: the dates made make want
> > to rebuild some files (the autotools/automake files), whereas the
> > right versions of autoconf and automake were not installed, this was
> > a package that did not run autoreconf. The "make" tool is completely
> > based on file dates, so again, I think messing with file dates
> > before running make is a bad idea.
> 
>   Hm, I don't see how this could happen... My test showed that 'make' would not 
> try to rebuild the autotools/automake files when they have the same timestamp as 
> the corresponding source file. So what triggers the reconf then?
> 
>   I've tried with a random package (bash) and it's not reconfiguring after 
> touching all sources. Do you remember for which package you saw this?

Actually, I cannot remember what I did. I may not have used
SOURCE_DATE_EPOCH. Anyway, I still find that touching all the source
files can have unintended effects, and it is not sufficient either:
- some py files may be generated by configure
- some py files may be generated by Makefiles

So, I would suggest adding a python compiler wrapper which touches
the files with SOURCE_DATE_EPOCH before compiling them. That would
cover all the cases and limit the unintended effects. The downside
is that you have to get all packages compiling python files to use
that wrapper instead of the python compiler.
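
A minimal sketch of what such a wrapper might look like, under stated
assumptions: it is written for Python 3.7 or later, compile_reproducibly
and demo are hypothetical names, and nothing here is an existing Buildroot
helper. It pins the source mtime to SOURCE_DATE_EPOCH immediately before
byte-compiling, so files generated by configure or by Makefiles at
arbitrary times still yield the same .pyc.

```python
import os
import py_compile
import sys
import tempfile


def compile_reproducibly(src, epoch=None):
    """Pin src's mtime to epoch, then byte-compile it with a timestamp header.

    Returns the path of the .pyc that was written.
    """
    if epoch is None:
        epoch = int(os.environ["SOURCE_DATE_EPOCH"])
    os.utime(src, (epoch, epoch))
    return py_compile.compile(
        src,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )


def demo(epoch=1462002570):
    """Generate the same file at two different times and compare the .pyc."""
    outputs = []
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "generated.py")
        for build_time in (1000000000, 1000003600):
            with open(src, "w") as f:
                f.write("x = 1\n")
            # Simulate the file being generated at different build times.
            os.utime(src, (build_time, build_time))
            pyc = compile_reproducibly(src, epoch)
            with open(pyc, "rb") as f:
                outputs.append(f.read())
    return outputs[0] == outputs[1]


if __name__ == "__main__":
    # Wrapper usage: pass the .py files to compile as arguments, with
    # SOURCE_DATE_EPOCH set in the environment.
    for path in sys.argv[1:]:
        print(compile_reproducibly(path))
```

Because only the files handed to the wrapper are touched, make never sees
altered dates on configure.ac, Makefile.in and similar files.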

Patch

diff --git a/package/libglib2/Config.in b/package/libglib2/Config.in
index 7cbfea5..8a39805 100644
--- a/package/libglib2/Config.in
+++ b/package/libglib2/Config.in
@@ -13,6 +13,16 @@  config BR2_PACKAGE_LIBGLIB2
 
 	  http://www.gtk.org/
 
+if BR2_PACKAGE_LIBGLIB2
+
+config BR2_PACKAGE_LIBGLIB2_CODEGEN
+	bool "install Glib D-Bus code and documentation generator"
+	default y
+	help
+	  Install Glib D-Bus code and documentation generator.
+
+endif
+
 comment "libglib2 needs a toolchain w/ wchar, threads"
 	depends on BR2_USE_MMU
 	depends on !BR2_USE_WCHAR || !BR2_TOOLCHAIN_HAS_THREADS
diff --git a/package/libglib2/libglib2.mk b/package/libglib2/libglib2.mk
index 8cf055d..7feba90 100644
--- a/package/libglib2/libglib2.mk
+++ b/package/libglib2/libglib2.mk
@@ -99,9 +99,9 @@  HOST_LIBGLIB2_CONF_OPTS = \
 	--disable-systemtap \
 	--disable-xattr
 
-LIBGLIB2_DEPENDENCIES = host-pkgconf host-libglib2 libffi zlib $(if $(BR2_NEEDS_GETTEXT),gettext) host-gettext
+LIBGLIB2_DEPENDENCIES = host-pkgconf host-libglib2 libffi zlib $(if $(BR2_NEEDS_GETTEXT),gettext) host-gettext host-python
 
-HOST_LIBGLIB2_DEPENDENCIES = host-pkgconf host-libffi host-zlib host-gettext
+HOST_LIBGLIB2_DEPENDENCIES = host-pkgconf host-libffi host-zlib host-gettext host-python
 
 ifneq ($(BR2_ENABLE_LOCALE),y)
 LIBGLIB2_DEPENDENCIES += libiconv
@@ -137,6 +137,20 @@  ifneq ($(BR2_PACKAGE_GDB),y)
 LIBGLIB2_POST_INSTALL_TARGET_HOOKS += LIBGLIB2_REMOVE_GDB_FILES
 endif
 
+ifeq ($(BR2_PACKAGE_LIBGLIB2_CODEGEN),)
+define LIBGLIB2_REMOVE_CODEGEN
+	rm $(TARGET_DIR)/usr/bin/gdbus-codegen
+	rm -rf $(TARGET_DIR)/usr/share/glib-2.0/codegen
+	rmdir --ignore-fail-on-non-empty $(TARGET_DIR)/usr/share/glib-2.0
+endef
+
+LIBGLIB2_POST_INSTALL_TARGET_HOOKS += LIBGLIB2_REMOVE_CODEGEN
+endif
+
+LIBGLIB2_CONF_OPTS += --with-python=$(HOST_DIR)/usr/bin/python
+
+HOST_LIBGLIB2_CONF_OPTS += --with-python=$(HOST_DIR)/usr/bin/python
+
 $(eval $(autotools-package))
 $(eval $(host-autotools-package))