diff mbox

Allow embedded timestamps by C/C++ macros to be set externally

Message ID 55929721.4020400@openmailbox.org
State New
Headers show

Commit Message

Dhole June 30, 2015, 1:18 p.m. UTC
Hi!

We are working in Debian —and I know other free software projects
care— in providing our users with a way to reproduce bit-for-bit
identical binary packages from the source and build environment.
See <https://wiki.debian.org/ReproducibleBuilds/About> for some
rationale and further explanations.

In order to do this, we need to make the build processes deterministic.
As you can imagine, gcc is quite involved in producing Debian packages.
One issue we encounter in many packages that fail to build reproducibly
is the use of the __DATE__, __TIME__ C macros [1], right now we have 456
affected packages that would need patching (either removing the macros,
or passing a known date externally).

A solution for toolchain packages that embed timestamps during the build
process has been proposed for anyone interested and it consists of the
following:
The build environment can export an environment variable called
SOURCE_DATE_EPOCH with a known timestamp in Unix epoch format (In our
case, we use the last date of the package's debian changelog). The
toolchain package running during the build can check if the exported
variable is set and if so, instead of embedding the local date/time,
embed the date/time from SOURCE_DATE_EPOCH.

It would be very beneficial to our project (and other free software
projects working on reproducible builds) if gcc supported this feature.
I'm attaching a patch for gcc-5.1.0 that enables this feature: it
modifies the behavior of the macros __DATE__ and __TIME__ when
SOURCE_DATE_EPOCH is exported.

What do you think? Any suggestions or other ideas that help getting
reproducible builds are welcomed.

I'm willing to extend the documentation if the patch feels appropriate.

Thanks for your attention!

[1] https://wiki.debian.org/ReproducibleBuilds/TimestampsFromCPPMacros

Best regards,
Dhole

Comments

Manuel López-Ibáñez June 30, 2015, 2:43 p.m. UTC | #1
On 30/06/15 15:18, Dhole wrote:
> A solution for toolchain packages that embed timestamps during the build
> process has been proposed for anyone interested and it consists of the
> following:

Perhaps this has been discussed and discarded before (if so I would appreciate 
if you could point me to the relevant discussion), why not simply redefine 
__DATE__ and __TIME__ to an appropriate string via the command-line or a dummy 
include?

That probably triggers some warnings (or it may not be supported at all, I 
haven't tried myself), but fixing those issues leads to a more general solution 
than GCC reacting to an arbitrary variable name and changing its behaviour 
quite silently.

Cheers,

Manuel.
Manuel López-Ibáñez June 30, 2015, 2:48 p.m. UTC | #2
On 30/06/15 16:43, Manuel López-Ibáñez wrote:
> On 30/06/15 15:18, Dhole wrote:
>> A solution for toolchain packages that embed timestamps during the build
>> process has been proposed for anyone interested and it consists of the
>> following:
>
> Perhaps this has been discussed and discarded before (if so I would appreciate
> if you could point me to the relevant discussion), why not simply redefine
> __DATE__ and __TIME__ to an appropriate string via the command-line or a dummy
> include?
>
> That probably triggers some warnings (or it may not be supported at all, I
> haven't tried myself), but fixing those issues leads to a more general solution
> than GCC reacting to an arbitrary variable name and changing its behaviour
> quite silently.

In any case, you should be aware of point 10 here: 
https://gcc.gnu.org/wiki/Community (You only need to convince the 
decision-makers). I'm not one of them ;)

Cheers,

	Manuel.
Dhole June 30, 2015, 3:18 p.m. UTC | #3
On 06/30/2015 04:48 PM, Manuel López-Ibáñez wrote:
> On 30/06/15 16:43, Manuel López-Ibáñez wrote:
>> Perhaps this has been discussed and discarded before (if so I would
>> appreciate
>> if you could point me to the relevant discussion), why not simply
>> redefine
>> __DATE__ and __TIME__ to an appropriate string via the command-line or
>> a dummy
>> include?

I'm not aware of any previous discussion on the subject, but I'm also
interested in reading it in case it exists :)

In the debian reproducible builds project we have considered several
options to address this issue. We considered redefining the __DATE__ and
__TIME__ defines by command line flags passed to gcc, but as you say,
that triggers warnings, which could become errors when building with
-Werror and thus may require manual intervention on many packages.

We are trying to find a solution that can make as much packages build
reproducible as possible minimizing the amount of specific patches for
affected packages, and we believe such solution will benefit other
projects working on reproducible builds as well.

We propose to extend the env variable SOURCE_DATE_EPOCH to anyone
interested for this purpose. For instance, this feature has been
implemented upstream in help2man (1.47.1) [1], quoting the latest
changelog entry:
"""
  * Add support for reproducible builds by using $SOURCE_DATE_EPOCH as
    the date for the generated pages (closes: #787444).
"""

>> That probably triggers some warnings (or it may not be supported at
>> all, I
>> haven't tried myself), but fixing those issues leads to a more general
>> solution
>> than GCC reacting to an arbitrary variable name and changing its
>> behaviour
>> quite silently.

In case the fact that GCC changing its behavior silently is a concern,
we also discussed the possibility of enabling this feature with a flag
such as `--use-date-from-env`. Again, we are open to comments and other
ideas about this approach :)

> In any case, you should be aware of point 10 here:
> https://gcc.gnu.org/wiki/Community (You only need to convince the
> decision-makers). I'm not one of them ;)

Thanks for the tip!

[1] https://www.gnu.org/software/help2man/

Best regards,
Dhole
Manuel López-Ibáñez June 30, 2015, 4:23 p.m. UTC | #4
On 30 June 2015 at 17:18, Dhole <dhole@openmailbox.org> wrote:
> In the debian reproducible builds project we have considered several
> options to address this issue. We considered redefining the __DATE__ and
> __TIME__ defines by command line flags passed to gcc, but as you say,
> that triggers warnings, which could become errors when building with
> -Werror and thus may require manual intervention on many packages.

Well, it would require adding -Wno-something (-Wno-reproducible?
-Wno-unreproducible? or perhaps simply -freproducible? ) to some
CFLAGS/CXXFLAGS. Is that too much manual intervention? (I'm asking
sincerely, perhaps indeed it is).

This could be a big hammer option that simply disables any warning
that is not relevant for reproducible builds (the default being
-Wsomething), for example avoid emitting --Wbuiltin-macro-redefined
warnings in the specific cases of __TIME__ and __DATE. Just an idea,
the maintainers would need to say if they would accept such an option.

Cheers,

Manuel.
Martin Sebor June 30, 2015, 5:38 p.m. UTC | #5
> In the debian reproducible builds project we have considered several
> options to address this issue. We considered redefining the __DATE__ and
> __TIME__ defines by command line flags passed to gcc, but as you say,
> that triggers warnings, which could become errors when building with
> -Werror and thus may require manual intervention on many packages.

Would replacing the localtime function with one of your own
in a DSO and preloading the DSO when invoking GCC be a viable
solution? E.g., like so:

$ cat time.c && gcc -Wall -fpic -o libtime.so -shared time.c && echo 
"__DATE__ __TIME__" | LD_PRELOAD=./libtime.so gcc -E -xc -
#include <time.h>
static struct tm t;
struct tm *localtime (const time_t *timer) { return &t; }
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "<stdin>"
"Jan  0 1900" "00:00:00"

Martin
Mike Stump June 30, 2015, 6:07 p.m. UTC | #6
On Jun 30, 2015, at 10:38 AM, Martin Sebor <msebor@gmail.com> wrote:
>> In the debian reproducible builds project we have considered several
>> options to address this issue. We considered redefining the __DATE__ and
>> __TIME__ defines by command line flags passed to gcc, but as you say,
>> that triggers warnings, which could become errors when building with
>> -Werror and thus may require manual intervention on many packages.
> 
> Would replacing the localtime function with one of your own
> in a DSO and preloading the DSO when invoking GCC be a viable
> solution?

No please.  Not all systems have shared libraries, preloading and so on.  Indeed, I like building on my target system using the simulator, no shared libraries around.
Joseph Myers June 30, 2015, 10:26 p.m. UTC | #7
Since cpplib is a library and doesn't have any existing getenv calls, I 
wonder if it would be better for the cpplib client (i.e. something in the 
gcc/ directory) to be what calls getenv and then informs cpplib of the 
timestamp it should treat as being the time of compilation.
Dhole July 3, 2015, 1:58 p.m. UTC | #8
On 06/30/2015 06:23 PM, Manuel López-Ibáñez wrote:
> On 30 June 2015 at 17:18, Dhole <dhole@openmailbox.org> wrote:
>> In the debian reproducible builds project we have considered several
>> options to address this issue. We considered redefining the __DATE__ and
>> __TIME__ defines by command line flags passed to gcc, but as you say,
>> that triggers warnings, which could become errors when building with
>> -Werror and thus may require manual intervention on many packages.
> 
> Well, it would require adding -Wno-something (-Wno-reproducible?
> -Wno-unreproducible? or perhaps simply -freproducible? ) to some
> CFLAGS/CXXFLAGS. Is that too much manual intervention? (I'm asking
> sincerely, perhaps indeed it is).

Our idea with the SOURCE_DATE_EPOCH env var was to find a general
solution for all toolchain packages involved in the build process that
embed timestamps. We already have a patched version of a package used
during Debian builds (debhelper) which sets the SOURCE_DATE_EPOCH in the
build environment. With the submitted patch to GCC nothing else would be
needed, and we believe it would be useful to other projects working on
reproducible builds, as they would only need to set the
SOURCE_DATE_EPOCH env var during their build process. Modifying the
CFLAGS/CXXFLAGS would need more intervention during the build process,
and this would be a solution only useful for GCC and not other toolchain
packages. It could be done, but we'd prefer the general approach.

As mentioned before, we are trying to create a standard way of modifying
timestamp embedding behavior for any package with the SOURCE_DATE_EPOCH.

> This could be a big hammer option that simply disables any warning
> that is not relevant for reproducible builds (the default being
> -Wsomething), for example avoid emitting --Wbuiltin-macro-redefined
> warnings in the specific cases of __TIME__ and __DATE. Just an idea,
> the maintainers would need to say if they would accept such an option.
> 
> Cheers,
> 
> Manuel.
> 

I'm looking forward to hear opinions from the maintainers :)

Regards,
diff mbox

Patch

diff --git a/libcpp/macro.c b/libcpp/macro.c
index 1e0a0b5..a52e3cb 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -349,14 +349,38 @@  _cpp_builtin_macro_text (cpp_reader *pfile, cpp_hashnode *node)
 	     slow on some systems.  */
 	  time_t tt;
 	  struct tm *tb = NULL;
+	  char *source_date_epoch;
 
-	  /* (time_t) -1 is a legitimate value for "number of seconds
-	     since the Epoch", so we have to do a little dance to
-	     distinguish that from a genuine error.  */
-	  errno = 0;
-	  tt = time(NULL);
-	  if (tt != (time_t)-1 || errno == 0)
-	    tb = localtime (&tt);
+	  /* Allow the date and time to be set externally by an exported
+	     environment variable to enable reproducible builds. */
+	  source_date_epoch = getenv ("SOURCE_DATE_EPOCH");
+	  if (source_date_epoch)
+	    {
+	      errno = 0;
+	      tt = (time_t) strtol (source_date_epoch, NULL, 10);
+	      if (errno == 0)
+	        {
+		  tb = gmtime (&tt);
+		  if (tb == NULL)
+		    cpp_error (pfile, CPP_DL_ERROR,
+			       "SOURCE_DATE_EPOCH=\"%s\" is not a valid date",
+			       source_date_epoch);
+	        }
+	      else
+		cpp_error (pfile, CPP_DL_ERROR,
+			   "SOURCE_DATE_EPOCH=\"%s\" is not a valid number",
+			   source_date_epoch);
+	    }
+	  else
+	    {
+	    /* (time_t) -1 is a legitimate value for "number of seconds
+	       since the Epoch", so we have to do a little dance to
+	       distinguish that from a genuine error.  */
+	    errno = 0;
+	    tt = time(NULL);
+	    if (tt != (time_t)-1 || errno == 0)
+	      tb = localtime (&tt);
+	  }
 
 	  if (tb)
 	    {