Warn about using _ino_t without -D_FILE_OFFSET_BITS=64.
diff mbox

Message ID 20140317174452.GA10644@domone
State New
Headers show

Commit Message

Ondřej Bílka March 17, 2014, 5:44 p.m. UTC
On Tue, Mar 11, 2014 at 06:51:55AM -0400, Mike Frysinger wrote:
> On Wed 05 Mar 2014 10:13:31 Ondřej Bílka wrote:
> > On Thu, Feb 27, 2014 at 09:45:37AM -0800, Paul Eggert wrote:
> > > Squashing an inode that way has a small chance of introducing what
> > > could be a serious bug.  If glibc is going to squash them, it should
> > > do so reliably, by maintaining a table of all the inodes it's ever
> > > seen and making sure there are no collisions.
> > > 
> > > Why bother to squash them at all, though?  Programs that care about
> > > files should be compiled with _FILE_OFFSET_BITS defined to 64.  If
> > > we're worried about programs that don't define _FILE_OFFSET_BITS, we
> > > could change glibc to default to _FILE_OFFSET_BITS=64; that's a
> > > better long-term solution anyway.
> > 
> > Changing default would be better. I dig while how to do it and can not
> > find a nonugly solution. How should we do this?
> 
> isn't it simply:
> 
> --- a/include/features.h
> +++ b/include/features.h
> @@ -303,7 +303,7 @@
>  # define __USE_LARGEFILE64	1
>  #endif
>  
> -#if defined _FILE_OFFSET_BITS && _FILE_OFFSET_BITS == 64
> +#if !defined _FILE_OFFSET_BITS || _FILE_OFFSET_BITS == 64
>  # define __USE_FILE_OFFSET64	1
>  #endif
>  
> 
> the glibc source files themselves seem to expect default of 
> _FILE_OFFSET_BITS=32 though, so we might have to add that to default CPPFLAGS 
> when building.
> 
> i kind of feel like deploying this on Gentoo systems to see what breaks ;).
> -mike

Yes, it is. Could we in this release at least add this warning?
I tried to restrict warning only when _ino_t is used but warning
attribute does not apply to types and using __attribute__ ((deprecated))
would confuse users.

	* dirent/dirent.h [defined __USE_XOPEN && defined __ino_t_defined 
	&& ! defined __USE_FILE_OFFSET64]: Add warning.

Comments

Ondřej Bílka March 21, 2014, 11:29 a.m. UTC | #1
ping
On Mon, Mar 17, 2014 at 06:44:52PM +0100, Ondřej Bílka wrote:
> On Tue, Mar 11, 2014 at 06:51:55AM -0400, Mike Frysinger wrote:
> > On Wed 05 Mar 2014 10:13:31 Ondřej Bílka wrote:
> > > On Thu, Feb 27, 2014 at 09:45:37AM -0800, Paul Eggert wrote:
> > > > Squashing an inode that way has a small chance of introducing what
> > > > could be a serious bug.  If glibc is going to squash them, it should
> > > > do so reliably, by maintaining a table of all the inodes it's ever
> > > > seen and making sure there are no collisions.
> > > > 
> > > > Why bother to squash them at all, though?  Programs that care about
> > > > files should be compiled with _FILE_OFFSET_BITS defined to 64.  If
> > > > we're worried about programs that don't define _FILE_OFFSET_BITS, we
> > > > could change glibc to default to _FILE_OFFSET_BITS=64; that's a
> > > > better long-term solution anyway.
> > > 
> > > Changing default would be better. I dig while how to do it and can not
> > > find a nonugly solution. How should we do this?
> > 
> > isn't it simply:
> > 
> > --- a/include/features.h
> > +++ b/include/features.h
> > @@ -303,7 +303,7 @@
> >  # define __USE_LARGEFILE64	1
> >  #endif
> >  
> > -#if defined _FILE_OFFSET_BITS && _FILE_OFFSET_BITS == 64
> > +#if !defined _FILE_OFFSET_BITS || _FILE_OFFSET_BITS == 64
> >  # define __USE_FILE_OFFSET64	1
> >  #endif
> >  
> > 
> > the glibc source files themselves seem to expect default of 
> > _FILE_OFFSET_BITS=32 though, so we might have to add that to default CPPFLAGS 
> > when building.
> > 
> > i kind of feel like deploying this on Gentoo systems to see what breaks ;).
> > -mike
> 
> Yes, it is. Could we in this release at least add this warning?
> I tried to restrict warning only when _ino_t is used but warning
> attribute does not apply to types and using __attribute__ ((deprecated))
> would confuse users.
> 
> 	* dirent/dirent.h [defined __USE_XOPEN && defined __ino_t_defined 
> 	&& ! defined __USE_FILE_OFFSET64]: Add warning.
> 
> diff --git a/dirent/dirent.h b/dirent/dirent.h
> index f6db6c6..35c0a93 100644
> --- a/dirent/dirent.h
> +++ b/dirent/dirent.h
> @@ -31,6 +31,7 @@ __BEGIN_DECLS
>  #ifdef __USE_XOPEN
>  # ifndef __ino_t_defined
>  #  ifndef __USE_FILE_OFFSET64
> +#   warning Please use -D_FILE_OFFSET_BITS=64 to support long files.
>  typedef __ino_t ino_t;
>  #  else
>  typedef __ino64_t ino_t;
Joseph Myers March 21, 2014, 1:11 p.m. UTC | #2
On Fri, 21 Mar 2014, Ondrej Bilka wrote:

> ping

Scattergun patches changing a random header with a random deprecation (not 
even the header most likely to be used to get the type in question, which 
is sys/types.h not dirent.h) make no sense here.

The following kinds of patches make sense here as incremental improvements 
(they are not necessarily needed for a change in the default, just 
somewhat vaguely related, but they also have self-contained justification 
independent of any such change or deprecation): (this is not an exhaustive 
list)

* Adding *64 versions of interfaces without them (at least fts).

* Refining conditions used in headers (at least fts.h) to reject use in a 
particular _FILE_OFFSET_BITS mode.  The conditions need to be 
conservatively safe.  The relevant conditions for fts.h are, I think, that 
ino_t and struct stat match between the two interfaces.  I don't think any 
form of static assertion is needed there; bits/typesizes.h has a macro 
__INO_T_MATCHES_INO64_T, and a header could be added that defines a macro 
for whether stat and stat64 match (probably with a generic version 
defining it if __LP64__, and a MIPS GNU/Linux version that never defines 
it, but anyone implementing this should check all systems carefully to 
confirm this).  Of course, if the previous point were implemented, this 
would become irrelevant.

* Fixing the ISO C / POSIX namespace issues for _FILE_OFFSET_BITS=64, 
using existing __*64 symbols where already exported at a public symbol 
version or adding new ones at a new symbol version if not (bug 14106).

* Stopping _FILE_OFFSET_BITS=64 affecting the C++ ABI on 64-bit 
architectures (bug 15766) - although inevitably that could break things 
for some existing C++ binaries (but sometimes such mangling breakage seems 
unavoidable, e.g. the removal of the "siginfo" tag from siginfo_t).

* Where code internally in glibc and the miscellaneous programs it 
installs uses non-large-files interfaces such as fopen or __open, making 
it use the versions supporting large files instead.

But for any change to the default or deprecation, we need an in-depth 
analysis showing what the effects would be in practice.  For all the 
library headers in a large GNU/Linux distribution, how many use any 
affected type in an ABI, and what settings are those libraries built with?  
For all the program and library binaries in a large distribution, how many 
use the _FILE_OFFSET_BITS=32 interfaces, how many use the 
_FILE_OFFSET_BITS=64 interfaces, how many use interfaces such as fseek and 
ftell that are implicitly unsafe in the _FILE_OFFSET_BITS=64 context and 
need changing to fseeko / ftello to work reliably in that context?  Post 
the scripts used so other distributions can adapt them to tell what's 
affected in their distributions.  If you rebuild a distribution with a 
changed default, what builds break?

In principle I think a change to the default is likely to be a good idea, 
but first we need a clear understanding of the extent of its impact on 
ABIs of non-glibc libraries, and tools that distributors can use to work 
out what libraries in their distributions are affected and plan ABI 
transitions as needed.

(I'm not asking for a detailed analysis of each affected library, but 
lists like "these 50 library packages are affected out of 1000 in 
distribution X" would help indicate the scale of the issue.  And some 
examples for individual cases from distributors, "this is how we'd handle 
the transition for library Y so that binaries users built with the old 
version of library Y continue to work correctly", would help as well - 
such a change means a large whole-system transition that distributors need 
to be on-board with to avoid breaking things for their users, or breaking 
cross-distribution binary compatibility as far as possible.  Details 
agreed between distributors of how to handle particular libraries might 
then go on a wiki page somewhere in the interests of cross-distribution 
binary compatibility.)
Paul Eggert March 21, 2014, 9:24 p.m. UTC | #3
On 03/21/2014 06:11 AM, Joseph S. Myers wrote:
> * Adding *64 versions of interfaces without them (at least fts).

This is a messy situation.  GNU applications typically use Gnulib's 
version of fts, and Gnulib's interface has diverged from libc's.  The 
Gnulib interface and implementation support features that glibc doesn't, 
e.g., checking for directory loops, and these are used by programs like 
'rm -r' and 'chmod -R'.

One possible way forward would be to deprecate the current fts API, and 
switch to the gnulib API (with room for future extensions), retaining 
binary support for old applications that dynamically link to the new 
glibc.  This would be some work, though.

> * Refining conditions used in headers (at least fts.h) to reject use in a
> particular _FILE_OFFSET_BITS mode.

I looked into this a bit for fts.h, and expect that it's more trouble 
than it's worth due to glitches similar to what you mentioned for MIPS 
n64.  For example, on 64-bit hosts sysdeps/mach/hurd/bits/stat.h appends 
slightly different padding at struct stat's end when _FILE_OFFSET_BITS 
is 64 (why, I don't know), making struct stat incompatible with struct 
stat64.  In practice, people building 64-bit applications rarely define 
_FILE_OFFSET_BITS to be 64 (because Autoconf and Gnulib don't do that), 
so it's probably not worth our while to improve support for that 
combination.

It is odd that libc supports two different 64-bit ABIs for 'stat' etc. 
on all 64-bit platforms, one hardly ever used.  It might make sense to 
deprecate the rarely-used ABI, i.e., have _FILE_OFFSET_BITS=64 be a 
no-op on 64-bit platforms.  This should fix the C++ ABI problem you 
mentioned (bug #15766).

> we need an in-depth
> analysis showing what the effects would be in practice.  For all the
> library headers in a large GNU/Linux distribution, how many use any
> affected type in an ABI, and what settings are those libraries built with?

It'd take quite some time to generate such an analysis.  I grabbed all 
the Fedora 20 -devel packages that I could (i.e., all those that didn't 
conflict with something already installed), and ran the attached script 
while in /usr/include, and this generated 2847 lines of output.  Not 
every line needs to be looked at in detail, but enough do.  I can tell 
you the results of looking at the first 45 lines of output of 
"file_offset *", though. I found four categories of libraries:

1. These libraries are compiled with -D_FILE_OFFSET_BITS=64, so 
theycurrently should have problems unless applications are also built 
with-D_FILE_OFFSET_BITS=64.  Changing the default will fix these problems.

flac
fltk
GraphicsMagick
hdf5
ImageMagick

2. These libraries use a type like off_t in macros or static or 
inlinefunctions, but it's not part of the binary API.  Changing the 
defaultto 64-bit will therefore not affect the library's API.  Programs 
recompiled with the new default will have larger off_t etc., just asthey 
will with glibc.

libeplplot
InsightToolkit

3. These libraries use a type like off_t in defining a data type that 
isnot used anywhere (either as part of the API, or internally), so 
changingthe default shouldn't affect them.

GeoIP

4. These libraries use a type like off_t in redeclaring 
standardfunctions like 'stat', which should continue to work if the 
defaultis changed.

ice


 From looking at the above, it appears that changing the default 
_FILE_OFFSET_BITS to 64 will fix more bugs than it'll cause.  Of course 
this is just one small sample.  I stopped at 45 lines of grep output 
because the next lines were big batches from Qt and from X11; these will 
be a pain to analyze, and it's not clear to me what the results will be.

One other data point: on 32-bit platforms Perl and Python both assume 
_FILE_OFFSET_BITS=64, so I suspect that libraries linked with these 
programs can have problems if they are compiled without 
_FILE_OFFSET_BITS=64.
#! /bin/sh

LC_ALL=C
export LC_ALL

pat='RLIM_INFINITY
blkcnt_t
ino_t
off_t
rlim_t
struct rlimit
struct dirent
struct statfs
struct stat
struct statvfs
fsblkcnt_t
fsfilcnt_t
fpos_t
struct flock
struct aiocb
fts.h'

exec find "${@-.}" -type f -exec grep -nwHF "$pat" {} +
Roland McGrath March 21, 2014, 10:04 p.m. UTC | #4
> On 03/21/2014 06:11 AM, Joseph S. Myers wrote:
> > * Adding *64 versions of interfaces without them (at least fts).
> 
> This is a messy situation.  GNU applications typically use Gnulib's 
> version of fts, and Gnulib's interface has diverged from libc's.  The 
> Gnulib interface and implementation support features that glibc doesn't, 
> e.g., checking for directory loops, and these are used by programs like 
> 'rm -r' and 'chmod -R'.
> 
> One possible way forward would be to deprecate the current fts API, and 
> switch to the gnulib API (with room for future extensions), retaining 
> binary support for old applications that dynamically link to the new 
> glibc.  This would be some work, though.

I don't think it's so hard.  But the first question is if perhaps we
shouldn't just deprecate fts entirely and not support a new one in libc.
If there is no great benefit to having this in libc vs just everyone
building their own from gnulib, that might be simpler.  Before deciding
where to go, we should see what the current versions of 4.4BSD-derived
systems are doing for fts.  Compatibility with 4.4BSD libc was the original
reason we (I) added fts.

> It is odd that libc supports two different 64-bit ABIs for 'stat' etc. 
> on all 64-bit platforms, one hardly ever used.  It might make sense to 
> deprecate the rarely-used ABI, i.e., have _FILE_OFFSET_BITS=64 be a 
> no-op on 64-bit platforms.  This should fix the C++ ABI problem you 
> mentioned (bug #15766).

I think that was always the intent.  Perhaps we should also consider what
we ought to be doing better in API/ABI checking that could be made to
ensure that we maintain the desired invariant in the future.

> 1. These libraries are compiled with -D_FILE_OFFSET_BITS=64, so 
> theycurrently should have problems unless applications are also built 
> with-D_FILE_OFFSET_BITS=64.  Changing the default will fix these problems.
> 
> flac
> fltk
> GraphicsMagick
> hdf5
> ImageMagick

You didn't list elfutils ({-libelf,{-devel}}).  That is built with
-D_FILE_OFFSET_BITS=64 but uses off64_t et al in its public API headers,
so it should not have a problem.  I don't know if your method of analysis
groks those cases or not.


Thanks,
Roland
Paul Eggert March 21, 2014, 11:04 p.m. UTC | #5
Roland McGrath wrote:

> the first question is if perhaps we shouldn't just deprecate fts entirely and not support a new one in libc.

That would be easier, yes.

> we should see what the current versions of 4.4BSD-derived
> systems are doing for fts.

It's in the latest versions of FreeBSD (10.0), NetBSD (6.1.3), and 
OpenBSD (5.4).  FreeBSD has some minor extensions that the others 
(including glibc) lack.

> You didn't list elfutils ({-libelf,{-devel}}).

That package is free of the problem; my script gave its include files a 
clean bill of health.  In fact I'm puzzled why you mentioned it, as its 
include files don't use off_t, off64_t, or any of the other types (with 
or without "64") affected by the problem.

Here's a list of elfutils-devel's include files on Fedora 20:

dwarf.h
elfutils/elf-knowledge.h
elfutils/libasm.h
elfutils/libdwfl.h
elfutils/libdw.h
elfutils/libebl.h
elfutils/version.h
Roland McGrath March 24, 2014, 10:01 p.m. UTC | #6
> That package is free of the problem; my script gave its include files a 
> clean bill of health.  In fact I'm puzzled why you mentioned it, as its 
> include files don't use off_t, off64_t, or any of the other types (with 
> or without "64") affected by the problem.

I brought it up because it's a case of a library (libelf) that did have an
API whose ABI was (accidentally) dependent on the _FILE_OFFSET_BITS
setting.  The library itself was compiled with -D_FILE_OFFSET_BITS=64 and
so on 32-bit platforms it required that its users be compiled that way too.
We fixed it back in 2007.  I misremembered and thought we'd used off64_t.
In fact it uses loff_t.  The log entry I wrote at the time said that only
loff_t (not off64_t) was defined by <sys/types.h> in all circumstances.


Thanks,
Roland

Patch
diff mbox

diff --git a/dirent/dirent.h b/dirent/dirent.h
index f6db6c6..35c0a93 100644
--- a/dirent/dirent.h
+++ b/dirent/dirent.h
@@ -31,6 +31,7 @@  __BEGIN_DECLS
 #ifdef __USE_XOPEN
 # ifndef __ino_t_defined
 #  ifndef __USE_FILE_OFFSET64
+#   warning Please use -D_FILE_OFFSET_BITS=64 to support long files.
 typedef __ino_t ino_t;
 #  else
 typedef __ino64_t ino_t;