[v4] Use a proper C tokenizer to implement the obsolete typedefs test.

Message ID 20190311145927.20399-1-zackw@panix.com
State New

Commit Message

Zack Weinberg March 11, 2019, 2:59 p.m. UTC
The test for obsolete typedefs in installed headers was implemented
using grep, and could therefore get false positives on e.g. “ulong”
in a comment.  It was also scanning all of the headers included by
our headers, and therefore testing headers we don’t control, e.g.
Linux kernel headers.
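
The false positive described above can be reproduced with a short sketch. The pattern below is a simplified stand-in for the old grep expression (it lists only a few of the names), and the header text is made up for illustration:

```python
import re

# Simplified stand-in for the old line-oriented check; the real grep
# pattern covered more typedef names than this.
OBSOLETE_RE = re.compile(r"\b(?:ulong|ushort|uint|u_char|quad_t)\b")

# Hypothetical header fragment: the obsolete name appears only in a comment.
header_text = """\
/* Historically this field was a ulong.  */
unsigned long int counter;
"""

def grep_like_hits(text):
    """Return (line_number, line) pairs that a line-oriented search flags."""
    return [(n, line)
            for n, line in enumerate(text.splitlines(), start=1)
            if OBSOLETE_RE.search(line)]

# The comment line is flagged even though no obsolete typedef is used.
hits = grep_like_hits(header_text)
```

A line-oriented search has no notion of comments, so the only way to silence this hit is to exempt the whole file.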

This patch splits the obsolete-typedef test from
scripts/check-installed-headers.sh to a separate program,
scripts/check-obsolete-constructs.py.  Because it is implemented in Python,
it is feasible to make it tokenize C accurately enough to avoid false
positives on the contents of comments and strings.  It also only
examines $(headers) in each subdirectory--all the headers we install,
but not any external dependencies of those headers.  Headers whose
installed name starts with finclude/ are ignored, on the assumption
that they contain Fortran.
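
As a rough illustration of the approach (not the patch's actual lexer, which recognizes many more token kinds and reports malformed input), a tokenizer only needs to match comments and strings as whole tokens for identifiers inside them to disappear from the identifier stream. The names TOKEN_RE, identifiers and should_scan are hypothetical:

```python
import re

# Much smaller than the patch's PP_TOKEN_RE_: just enough token kinds to
# show that comments and strings are consumed as single tokens.
TOKEN_RE = re.compile(r"""
    (?P<STRING>        "(?:[^"\\\n]|\\.)*")
   |(?P<BLOCK_COMMENT> /\*.*?\*/)
   |(?P<LINE_COMMENT>  //[^\n]*)
   |(?P<IDENT>         [_a-zA-Z][_a-zA-Z0-9]*)
   |(?P<OTHER>         .)
""", re.DOTALL | re.VERBOSE)

def identifiers(text):
    """Return the identifiers in TEXT that lie outside comments and strings."""
    return [mo.group() for mo in TOKEN_RE.finditer(text)
            if mo.lastgroup == "IDENT"]

def should_scan(installed_name):
    """Skip headers installed under finclude/, assumed to be Fortran."""
    return not installed_name.startswith("finclude/")

sample = ('typedef unsigned long int counter_t; /* was ulong */\n'
          'const char *s = "u_int";  // uint\n')
names = identifiers(sample)
```

Here none of ulong, u_int or uint shows up in names, because each occurs only inside a comment or a string literal.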

It is also feasible to make the new test understand the difference
between _defining_ the obsolete typedefs and _using_ the obsolete
typedefs, which means posix/{bits,sys}/types.h no longer need to be
exempted.  This uncovered an actual bug in bits/types.h: __quad_t and
__u_quad_t were being used to define __S64_TYPE, __U64_TYPE,
__SQUAD_TYPE and __UQUAD_TYPE.  These are changed to __int64_t and
__uint64_t respectively.  This is a safe change, despite the comments
in bits/types.h claiming a difference between __quad_t and __int64_t,
because those comments are incorrect.  In all current ABIs, both
__quad_t and __int64_t are ‘long’ when ‘long’ is a 64-bit type, and
‘long long’ when ‘long’ is a 32-bit type, and similarly for __u_quad_t
and __uint64_t.  (Changing the types to be what the comments say they
are would be an ABI break, as it affects C++ name mangling.)  This
patch includes a minimal change to make the comments not completely
wrong.  I plan to remove __SQUAD_TYPE and __UQUAD_TYPE altogether in
subsequent patches, but that would be inappropriate for backporting to
release branches.
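
The define-versus-use distinction can be sketched as follows, assuming the identifiers of a single 'typedef ... ;' statement have already been collected. The function name, the reduced pattern and the second example typedef are hypothetical and only illustrate the idea:

```python
import re

# Reduced pattern: only the quad types, with or without the __ prefix.
OBSOLETE_RE = re.compile(r"\A(?:__)?(?:u_)?quad_t\Z")

def classify(typedef_idents):
    """Split one typedef statement into the name it defines and the
    obsolete names it uses.  TYPEDEF_IDENTS is the list of identifiers
    in the statement, starting with 'typedef'; it must contain at least
    one identifier after 'typedef'."""
    *used, defined = typedef_idents[1:]
    return {
        "defines": defined if OBSOLETE_RE.match(defined) else None,
        "uses": [name for name in used if OBSOLETE_RE.match(name)],
    }

# 'typedef long int __quad_t;' defines an obsolete name but uses none.
definition = classify(["typedef", "long", "int", "__quad_t"])
# 'typedef __quad_t __example64_t;' (made-up name) uses one.
usage = classify(["typedef", "__quad_t", "__example64_t"])
```

Only the second form is what the test should reject in bits/types.h; the first is the definition that the header is required to provide.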

sys/types.h was defining the legacy BSD u_intN_t typedefs using a
construct that was not necessarily consistent with how the C99 uintN_t
typedefs are defined, and is also too complicated for the new script to
understand (it lexes C relatively accurately, but it does not attempt
to expand preprocessor macros, nor does it do any actual parsing).
This patch cuts all of that out and uses bits/types.h's __uintN_t typedefs
to define u_intN_t instead.  This is verified to not change the ABI on
any supported architecture, via the c++-types test, which means u_intN_t
and uintN_t were, in fact, consistent on all supported architectures.
I plan to restrict u_intN_t and some other legacy typedefs (but not
intN_t) to __USE_MISC in subsequent patches, but again that would be
inappropriate for backporting to release branches.
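
With the simplified definitions, a checker can accept 'typedef __uintN_t u_intN_t;' by comparing names alone. A minimal sketch of that comparison, using hypothetical helper names rather than the code in the patch:

```python
import re

PUBLIC_RE = re.compile(r"\Au_int([0-9]+)_t\Z")     # e.g. u_int32_t
PRIVATE_RE = re.compile(r"\A__uint([0-9]+)_t\Z")   # e.g. __uint32_t

def widths_agree(defn, name):
    """True if NAME is u_intN_t, DEFN is __uintN_t, and both N are equal."""
    private = PRIVATE_RE.match(defn)
    public = PUBLIC_RE.match(name)
    return bool(private and public and private.group(1) == public.group(1))
```

For example, widths_agree("__uint32_t", "u_int32_t") holds, while pairing __uint64_t with u_int32_t does not.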

	* scripts/check-obsolete-constructs.py: New test script.
        * scripts/check-installed-headers.sh: Remove tests for
        obsolete typedefs, superseded by check-obsolete-constructs.py.
        * Rules: Run scripts/check-obsolete-constructs.py over $(headers)
        as a special test.  Update commentary.
        * posix/bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
        (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
        Update commentary.
        * posix/sys/types.h (__u_intN_t): Remove.
        (u_int8_t): Typedef using __uint8_t.
        (u_int16_t): Typedef using __uint16_t.
        (u_int32_t): Typedef using __uint32_t.
        (u_int64_t): Typedef using __uint64_t.
---

Changes since v3: Fortran headers are now detected by path
(finclude/*) instead of looking for Emacs mode annotations within the
file.  tokenize_c is now responsible for issuing errors for the BAD_*
and OTHER lexical productions, and no longer returns BAD_* tokens.
It is also responsible for tracking whether or not each token belongs
to a preprocessing directive line, and accurately tokenizes #include
arguments.  (Accurate tokenization of #include arguments will be
required by future patches I have planned.)
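
To illustrate why #include arguments need their own lexical rule: under the ordinary token rules, <sys/types.h> would come out as '<', 'sys', '/', 'types', '.', 'h', '>'. The sketch below works on a single line and is much cruder than the directive tracking in tokenize_c; the header-name pattern follows HEADER_NAME_RE_ in the patch, while INCLUDE_RE and include_argument are hypothetical names:

```python
import re

# Same shape as HEADER_NAME_RE_ in the patch.
HEADER_NAME_RE = re.compile(r'<[^>\r\n]+>|"[^"\r\n]+"')
# Line-based shortcut for "we are just after '#include'" (an assumption
# of this sketch; the patch tracks directive state token by token).
INCLUDE_RE = re.compile(r'[ \t]*#[ \t]*include[ \t]*')

def include_argument(line):
    """Return the header-name of an #include line as one token, or None
    if LINE is not an #include with a literal header-name."""
    mo = INCLUDE_RE.match(line)
    if not mo:
        return None
    arg = HEADER_NAME_RE.match(line, mo.end())
    return arg.group() if arg else None
```

An #include whose argument is a macro yields None here, since this sketch, like the script, does no macro expansion.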

---
 Rules                                |  17 +-
 posix/bits/types.h                   |  10 +-
 posix/sys/types.h                    |  33 +-
 scripts/check-installed-headers.sh   |  37 +--
 scripts/check-obsolete-constructs.py | 466 +++++++++++++++++++++++++++
 5 files changed, 500 insertions(+), 63 deletions(-)
 create mode 100755 scripts/check-obsolete-constructs.py

2.20.1

Comments

Carlos O'Donell March 11, 2019, 6:57 p.m. UTC | #1
On 3/11/19 10:59 AM, Zack Weinberg wrote:
> The test for obsolete typedefs in installed headers was implemented
> using grep, and could therefore get false positives on e.g. “ulong”
> in a comment.  It was also scanning all of the headers included by
> our headers, and therefore testing headers we don’t control, e.g.
> Linux kernel headers.

Correct.

> This patch splits the obsolete-typedef test from
> scripts/check-installed-headers.sh to a separate program,
> scripts/check-obsolete-constructs.py.  Being implemented in Python,
> it is feasible to make it tokenize C accurately enough to avoid false
> positives on the contents of comments and strings.  It also only
> examines $(headers) in each subdirectory--all the headers we install,
> but not any external dependencies of those headers.  Headers whose
> installed name starts with finclude/ are ignored, on the assumption
> that they contain Fortran.

OK.

> It is also feasible to make the new test understand the difference
> between _defining_ the obsolete typedefs and _using_ the obsolete
> typedefs, which means posix/{bits,sys}/types.h no longer need to be
> exempted.  This uncovered an actual bug in bits/types.h: __quad_t and
> __u_quad_t were being used to define __S64_TYPE, __U64_TYPE,
> __SQUAD_TYPE and __UQUAD_TYPE.  These are changed to __int64_t and
> __uint64_t respectively.  This is a safe change, despite the comments
> in bits/types.h claiming a difference between __quad_t and __int64_t,
> because those comments are incorrect.  In all current ABIs, both
> __quad_t and __int64_t are ‘long’ when ‘long’ is a 64-bit type, and
> ‘long long’ when ‘long’ is a 32-bit type, and similarly for __u_quad_t
> and __uint64_t.  (Changing the types to be what the comments say they
> are would be an ABI break, as it affects C++ name mangling.)  This
> patch includes a minimal change to make the comments not completely
> wrong.  I plan to remove __SQUAD_TYPE and __UQUAD_TYPE altogether in
> subsequent patches, but that would be inappropriate for backporting to
> release branches.

OK.

> sys/types.h was defining the legacy BSD u_intN_t typedefs using a
> construct that was not necessarily consistent with how the C99 uintN_t
> typedefs are defined, and is also too complicated for the new script to
> understand (it lexes C relatively accurately, but it does not attempt
> to expand preprocessor macros, nor does it do any actual parsing).
> This patch cuts all of that out and uses bits/types.h's __uintN_t typedefs
> to define u_intN_t instead.  This is verified to not change the ABI on
> any supported architecture, via the c++-types test, which means u_intN_t
> and uintN_t were, in fact, consistent on all supported architectures.
> I plan to restrict u_intN_t and some other legacy typedefs (but not
> intN_t) to __USE_MISC in subsequent patches, but again that would be
> inappropriate for backporting to release branches.

OK.

> 	* scripts/check-obsolete-constructs.py: New test script.
>         * scripts/check-installed-headers.sh: Remove tests for
>         obsolete typedefs, superseded by check-obsolete-constructs.py.
>         * Rules: Run scripts/check-obsolete-constructs.py over $(headers)
>         as a special test.  Update commentary.
>         * posix/bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
>         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
>         Update commentary.
>         * posix/sys/types.h (__u_intN_t): Remove.
>         (u_int8_t): Typedef using __uint8_t.
>         (u_int16_t): Typedef using __uint16_t.
>         (u_int32_t): Typedef using __uint32_t.
>         (u_int64_t): Typedef using __uint64_t.

OK for master if you:
- Fix the conditional in check-installed-headers.sh.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
> 
> Changes since v3: Fortran headers are now detected by path
> (finclude/*) instead of looking for Emacs mode annotations within the
> file.  tokenize_c is now responsible for issuing errors for the BAD_*
> and OTHER lexical productions, and no longer returns BAD_* tokens.
> It is also responsible for tracking whether or not each token belongs
> to a preprocessing directive line, and accurately tokenizes #include
> arguments.  (Accurate tokenization of #include arguments will be
> required by future patches I have planned.)

OK.

> ---
>  Rules                                |  17 +-
>  posix/bits/types.h                   |  10 +-
>  posix/sys/types.h                    |  33 +-
>  scripts/check-installed-headers.sh   |  37 +--
>  scripts/check-obsolete-constructs.py | 466 +++++++++++++++++++++++++++
>  5 files changed, 500 insertions(+), 63 deletions(-)
>  create mode 100755 scripts/check-obsolete-constructs.py
> 
> diff --git a/Rules b/Rules
> index e08a28d9f3..222dba6dcb 100644
> --- a/Rules
> +++ b/Rules
> @@ -82,7 +82,8 @@ $(common-objpfx)dummy.c:
>  common-generated += dummy.o dummy.c
>  
>  ifneq "$(headers)" ""
> -# Special test of all the installed headers in this directory.
> +# Test that all of the headers installed by this directory can be compiled
> +# in isolation.

OK.

>  tests-special += $(objpfx)check-installed-headers-c.out
>  libof-check-installed-headers-c := testsuite
>  $(objpfx)check-installed-headers-c.out: \
> @@ -93,6 +94,8 @@ $(objpfx)check-installed-headers-c.out: \
>  	$(evaluate-test)
>  
>  ifneq "$(CXX)" ""
> +# If a C++ compiler is available, also test that they can be compiled
> +# in isolation as C++.

OK.

>  tests-special += $(objpfx)check-installed-headers-cxx.out
>  libof-check-installed-headers-cxx := testsuite
>  $(objpfx)check-installed-headers-cxx.out: \
> @@ -103,12 +106,24 @@ $(objpfx)check-installed-headers-cxx.out: \
>  	$(evaluate-test)
>  endif # $(CXX)
>  
> +# Test that a wrapper header exists in include/ for each non-sysdeps header.
> +# This script does not need $(py-env).

OK.

>  tests-special += $(objpfx)check-wrapper-headers.out
>  $(objpfx)check-wrapper-headers.out: \
>    $(..)scripts/check-wrapper-headers.py $(headers)
>  	$(PYTHON) $< --root=$(..) --subdir=$(subdir) $(headers) > $@; \
>  	  $(evaluate-test)
>  
> +# Test that none of the headers installed by this directory use certain
> +# obsolete constructs (e.g. legacy BSD typedefs superseded by stdint.h).
> +# This script does not need $(py-env).
> +tests-special += $(objpfx)check-obsolete-constructs.out
> +libof-check-obsolete-constructs := testsuite
> +$(objpfx)check-obsolete-constructs.out: \
> +    $(..)scripts/check-obsolete-constructs.py $(headers)
> +	$(PYTHON) $^ > $@ 2>&1; \
> +	$(evaluate-test)

OK.

> +
>  endif # $(headers)
>  
>  # This makes all the auxiliary and test programs.
> diff --git a/posix/bits/types.h b/posix/bits/types.h
> index 27e065c3be..0de6c59bb4 100644
> --- a/posix/bits/types.h
> +++ b/posix/bits/types.h
> @@ -87,7 +87,7 @@ __extension__ typedef unsigned long long int __uintmax_t;
>  	32		-- "natural" 32-bit type (always int)
>  	64		-- "natural" 64-bit type (long or long long)
>  	LONG32		-- 32-bit type, traditionally long
> -	QUAD		-- 64-bit type, always long long
> +	QUAD		-- 64-bit type, traditionally long long

OK.

>  	WORD		-- natural type of __WORDSIZE bits (int or long)
>  	LONGWORD	-- type of __WORDSIZE bits, traditionally long
>  
> @@ -113,14 +113,14 @@ __extension__ typedef unsigned long long int __uintmax_t;
>  #define __SLONGWORD_TYPE	long int
>  #define __ULONGWORD_TYPE	unsigned long int
>  #if __WORDSIZE == 32
> -# define __SQUAD_TYPE		__quad_t
> -# define __UQUAD_TYPE		__u_quad_t
> +# define __SQUAD_TYPE		__int64_t
> +# define __UQUAD_TYPE		__uint64_t

OK.

>  # define __SWORD_TYPE		int
>  # define __UWORD_TYPE		unsigned int
>  # define __SLONG32_TYPE		long int
>  # define __ULONG32_TYPE		unsigned long int
> -# define __S64_TYPE		__quad_t
> -# define __U64_TYPE		__u_quad_t
> +# define __S64_TYPE		__int64_t
> +# define __U64_TYPE		__uint64_t

OK.

>  /* We want __extension__ before typedef's that use nonstandard base types
>     such as `long long' in C89 mode.  */
>  # define __STD_TYPE		__extension__ typedef
> diff --git a/posix/sys/types.h b/posix/sys/types.h
> index 27129c5c23..0e37b1ce6a 100644
> --- a/posix/sys/types.h
> +++ b/posix/sys/types.h
> @@ -154,37 +154,20 @@ typedef unsigned int uint;
>  
>  #include <bits/stdint-intn.h>
>  
> -#if !__GNUC_PREREQ (2, 7)
> -
>  /* These were defined by ISO C without the first `_'.  */
> -typedef	unsigned char u_int8_t;
> -typedef	unsigned short int u_int16_t;
> -typedef	unsigned int u_int32_t;
> -# if __WORDSIZE == 64
> -typedef unsigned long int u_int64_t;
> -# else
> -__extension__ typedef unsigned long long int u_int64_t;
> -# endif
> -
> -typedef int register_t;

OK, this removed bug covered below.

> -
> -#else
> -
> -/* For GCC 2.7 and later, we can use specific type-size attributes.  */
> -# define __u_intN_t(N, MODE) \
> -  typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE)))
> -
> -__u_intN_t (8, __QI__);
> -__u_intN_t (16, __HI__);
> -__u_intN_t (32, __SI__);
> -__u_intN_t (64, __DI__);
> +typedef __uint8_t u_int8_t;
> +typedef __uint16_t u_int16_t;
> +typedef __uint32_t u_int32_t;
> +typedef __uint64_t u_int64_t;

OK, switch to 4 new typedefs that don't change meaning.

>  
> +#if __GNUC_PREREQ (2, 7)

OK, retain old check here.

>  typedef int register_t __attribute__ ((__mode__ (__word__)));
> -
> +#else
> +typedef int register_t;
> +#endif

OK.

>  
>  /* Some code from BIND tests this macro to see if the types above are
>     defined.  */
> -#endif

OK.

>  #define __BIT_TYPES_DEFINED__	1
>  
>  
> diff --git a/scripts/check-installed-headers.sh b/scripts/check-installed-headers.sh
> index 1f4496446c..e4f37d33f8 100644
> --- a/scripts/check-installed-headers.sh
> +++ b/scripts/check-installed-headers.sh
> @@ -16,11 +16,9 @@
>  # License along with the GNU C Library; if not, see
>  # <http://www.gnu.org/licenses/>.
>  
> -# Check installed headers for cleanliness.  For each header, confirm
> -# that it's possible to compile a file that includes that header and
> -# does nothing else, in several different compilation modes.  Also,
> -# scan the header for a set of obsolete typedefs that should no longer
> -# appear.
> +# For each installed header, confirm that it's possible to compile a
> +# file that includes that header and does nothing else, in several
> +# different compilation modes.

OK.

>  
>  # These compilation switches assume GCC or compatible, which is probably
>  # fine since we also assume that when _building_ glibc.
> @@ -31,13 +29,6 @@ cxx_modes="-std=c++98 -std=gnu++98 -std=c++11 -std=gnu++11"
>  # These are probably the most commonly used three.
>  lib_modes="-D_DEFAULT_SOURCE=1 -D_GNU_SOURCE=1 -D_XOPEN_SOURCE=700"
>  
> -# sys/types.h+bits/types.h have to define the obsolete types.
> -# rpc(svc)/* have the obsolete types too deeply embedded in their API
> -# to remove.
> -skip_obsolete_type_check='*/sys/types.h|*/bits/types.h|*/rpc/*|*/rpcsvc/*'
> -obsolete_type_re=\
> -'\<((__)?(quad_t|u(short|int|long|_(char|short|int([0-9]+_t)?|long|quad_t))))\>'
> -

OK.

>  if [ $# -lt 3 ]; then
>      echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
>      exit 2
> @@ -46,14 +37,10 @@ case "$1" in
>      (c)
>          lang_modes="$c_modes"
>          cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.c)
> -        already="$skip_obsolete_type_check"

OK.

>      ;;
>      (c++)
>          lang_modes="$cxx_modes"
>          cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.cc)
> -        # The obsolete-type check can be skipped for C++; it is
> -        # sufficient to do it for C.
> -        already="*"

OK.

>      ;;
>      (*)
>          echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
> @@ -155,22 +142,8 @@ $expanded_lib_mode
>  int avoid_empty_translation_unit;
>  EOF
>              if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1
> -            then
> -                includes=$($cc_cmd -fsyntax-only -H $lang_mode \
> -                              "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')
> -                for h in $includes; do
> -                    # Don't repeat work.
> -                    eval 'case "$h" in ('"$already"') continue;; esac'
> -
> -                    if grep -qE "$obsolete_type_re" "$h"; then
> -                        echo "*** Obsolete types detected:"
> -                        grep -HE "$obsolete_type_re" "$h"
> -                        failed=1
> -                    fi
> -                    already="$already|$h"
> -                done
> -            else
> -                failed=1
> +            then :
> +            else failed=1

Why not 'if ! $cc_cmd ...'?  That would avoid the odd empty 'then' branch (the ":").

>              fi
>          done
>      done
> diff --git a/scripts/check-obsolete-constructs.py b/scripts/check-obsolete-constructs.py
> new file mode 100755
> index 0000000000..46535afcac
> --- /dev/null
> +++ b/scripts/check-obsolete-constructs.py
> @@ -0,0 +1,466 @@
> +#! /usr/bin/python3

OK.

> +# Copyright (C) 2019 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +"""Verifies that installed headers do not use any obsolete constructs:
> + * legacy BSD typedefs superseded by <stdint.h>:
> +   ushort uint ulong u_char u_short u_int u_long u_intNN_t quad_t u_quad_t
> +   (sys/types.h is allowed to _define_ these types, but not to use them
> +    to define anything else).
> +"""
> +
> +import argparse
> +import collections
> +import re
> +import sys
> +
> +# Simplified lexical analyzer for C preprocessing tokens.
> +# Does not implement trigraphs.
> +# Does not implement backslash-newline in the middle of any lexical
> +#   item other than a string literal.
> +# Does not implement universal-character-names in identifiers.
> +# Treats prefixed strings (e.g. L"...") as two tokens (L and "...")
> +# Accepts non-ASCII characters only within comments and strings.
> +
> +# Caution: The order of the outermost alternation matters.
> +# STRING must be before BAD_STRING, CHARCONST before BAD_CHARCONST,
> +# BLOCK_COMMENT before BAD_BLOCK_COM before PUNCTUATOR, and OTHER must
> +# be last.
> +# Caution: There should be no capturing groups other than the named
> +# captures in the outermost alternation.
> +
> +# For reference, these are all of the C punctuators as of C11:
> +#   [ ] ( ) { } , ; ? ~
> +#   ! != * *= / /= ^ ^= = ==
> +#   # ##
> +#   % %= %> %: %:%:
> +#   & &= &&
> +#   | |= ||
> +#   + += ++
> +#   - -= -- ->
> +#   . ...
> +#   : :>
> +#   < <% <: << <<= <=
> +#   > >= >> >>=
> +
> +# The BAD_* tokens are not part of the official definition of pp-tokens;
> +# they match unclosed strings, character constants, and block comments,
> +# so that the regex engine doesn't have to backtrack all the way to the
> +# beginning of a broken construct and then emit dozens of junk tokens.
> +
> +PP_TOKEN_RE_ = re.compile(r"""
> +    (?P<STRING>        \"(?:[^\"\\\r\n]|\\(?:[\r\n -~]|\r\n))*\")
> +   |(?P<BAD_STRING>    \"(?:[^\"\\\r\n]|\\[ -~])*)
> +   |(?P<CHARCONST>     \'(?:[^\'\\\r\n]|\\(?:[\r\n -~]|\r\n))*\')
> +   |(?P<BAD_CHARCONST> \'(?:[^\'\\\r\n]|\\[ -~])*)
> +   |(?P<BLOCK_COMMENT> /\*(?:\*(?!/)|[^*])*\*/)
> +   |(?P<BAD_BLOCK_COM> /\*(?:\*(?!/)|[^*])*\*?)
> +   |(?P<LINE_COMMENT>  //[^\r\n]*)
> +   |(?P<IDENT>         [_a-zA-Z][_a-zA-Z0-9]*)
> +   |(?P<PP_NUMBER>     \.?[0-9](?:[0-9a-df-oq-zA-DF-OQ-Z_.]|[eEpP][+-]?)*)
> +   |(?P<PUNCTUATOR>
> +       [,;?~(){}\[\]]
> +     | [!*/^=]=?
> +     | \#\#?
> +     | %(?:[=>]|:(?:%:)?)?
> +     | &[=&]?
> +     |\|[=|]?
> +     |\+[=+]?
> +     | -[=->]?
> +     |\.(?:\.\.)?
> +     | :>?
> +     | <(?:[%:]|<(?:=|<=?)?)?
> +     | >(?:=|>=?)?)
> +   |(?P<ESCNL>         \\(?:\r\n|\r|\n))
> +   |(?P<WHITESPACE>    [ \t\n\r\v\f]+)
> +   |(?P<OTHER>         .)
> +""", re.DOTALL | re.VERBOSE)
> +
> +HEADER_NAME_RE_ = re.compile(r"""
> +    < [^>\r\n]+ >
> +  | " [^"\r\n]+ "
> +""", re.DOTALL | re.VERBOSE)
> +
> +ENDLINE_RE_ = re.compile(r"""\r\n|\r|\n""")
> +
> +# based on the sample code in the Python re documentation
> +Token_ = collections.namedtuple("Token", (
> +    "kind", "text", "line", "column", "context"))
> +Token_.__doc__ = """
> +   One C preprocessing token, comment, or chunk of whitespace.
> +   'kind' identifies the token type, which will be one of:
> +       STRING, CHARCONST, BLOCK_COMMENT, LINE_COMMENT, IDENT,
> +       PP_NUMBER, PUNCTUATOR, ESCNL, WHITESPACE, HEADER_NAME,
> +       or OTHER.  The BAD_* alternatives in PP_TOKEN_RE_ are
> +       handled within tokenize_c, below.
> +
> +   'text' is the sequence of source characters making up the token;
> +       no decoding whatsoever is performed.
> +
> +   'line' and 'column' give the position of the first character of the
> +      token within the source file.  They are both 1-based.
> +
> +   'context' indicates whether or not this token occurred within a
> +      preprocessing directive; it will be None for running text,
> +      '<null>' for the leading '#' of a directive line (because '#'
> +      all by itself on a line is a "null directive"), or the name of
> +      the directive for tokens within a directive line, starting with
> +      the IDENT for the name itself.
> +"""
> +
> +def tokenize_c(file_contents, reporter):
> +    """Yield a series of Token objects, one for each preprocessing
> +       token, comment, or chunk of whitespace within FILE_CONTENTS.
> +       The REPORTER object is expected to have one method,
> +       reporter.error(token, message), which will be called to
> +       indicate a lexical error at the position of TOKEN.
> +       If MESSAGE contains the four-character sequence '{!r}', that
> +       is expected to be replaced by repr(token.text).
> +    """
> +
> +    Token = Token_
> +    PP_TOKEN_RE = PP_TOKEN_RE_
> +    ENDLINE_RE = ENDLINE_RE_
> +    HEADER_NAME_RE = HEADER_NAME_RE_
> +
> +    line_num = 1
> +    line_start = 0
> +    pos = 0
> +    limit = len(file_contents)
> +    directive = None
> +    at_bol = True
> +    while pos < limit:
> +        if directive == "include":
> +            mo = HEADER_NAME_RE.match(file_contents, pos)
> +            if mo:
> +                kind = "HEADER_NAME"
> +                directive = "after_include"
> +            else:
> +                mo = PP_TOKEN_RE.match(file_contents, pos)
> +                kind = mo.lastgroup
> +                if kind != "WHITESPACE":
> +                    directive = "after_include"
> +        else:
> +            mo = PP_TOKEN_RE.match(file_contents, pos)
> +            kind = mo.lastgroup
> +
> +        text = mo.group()
> +        line = line_num
> +        column = mo.start() - line_start
> +        adj_line_start = 0
> +        # only these kinds can contain a newline
> +        if kind in ("WHITESPACE", "BLOCK_COMMENT", "LINE_COMMENT",
> +                    "STRING", "CHARCONST", "BAD_BLOCK_COM", "ESCNL"):
> +            for tmo in ENDLINE_RE.finditer(text):
> +                line_num += 1
> +                adj_line_start = tmo.end()
> +            if adj_line_start:
> +                line_start = mo.start() + adj_line_start
> +
> +        # Track whether or not we are scanning a preprocessing directive.
> +        if kind == "LINE_COMMENT" or (kind == "WHITESPACE" and adj_line_start):
> +            at_bol = True
> +            directive = None
> +        else:
> +            if kind == "PUNCTUATOR" and text == "#" and at_bol:
> +                directive = "<null>"
> +            elif kind == "IDENT" and directive == "<null>":
> +                directive = text
> +            at_bol = False
> +
> +        # Report ill-formed tokens and rewrite them as their well-formed
> +        # equivalents, so downstream processing doesn't have to know about them.
> +        # (Rewriting instead of discarding provides better error recovery.)
> +        if kind == "BAD_BLOCK_COM":
> +            reporter.error(Token("BAD_BLOCK_COM", "", line, column+1, ""),
> +                           "unclosed block comment")
> +            text += "*/"
> +            kind = "BLOCK_COMMENT"
> +        elif kind == "BAD_STRING":
> +            reporter.error(Token("BAD_STRING", "", line, column+1, ""),
> +                           "unclosed string")
> +            text += "\""
> +            kind = "STRING"
> +        elif kind == "BAD_CHARCONST":
> +            reporter.error(Token("BAD_CHARCONST", "", line, column+1, ""),
> +                           "unclosed char constant")
> +            text += "'"
> +            kind = "CHARCONST"
> +
> +        tok = Token(kind, text, line, column+1,
> +                    "include" if directive == "after_include" else directive)
> +        # Do not complain about OTHER tokens inside macro definitions.
> +        # $ and @ appear in macros defined by headers intended to be
> +        # included from assembly language, e.g. sysdeps/mips/sys/asm.h.
> +        if kind == "OTHER" and directive != "define":
> +            reporter.error(tok, "stray {!r} in program")
> +
> +        yield tok
> +        pos = mo.end()
> +
> +#
> +# Base and generic classes for individual checks.
> +#
> +
> +class ConstructChecker:
> +    """Scan a stream of C preprocessing tokens and possibly report
> +       problems with them.  The REPORTER object passed to __init__ has
> +       one method, reporter.error(token, message), which should be
> +       called to indicate a problem detected at the position of TOKEN.
> +       If MESSAGE contains the four-character sequence '{!r}' then that
> +       will be replaced with a textual representation of TOKEN.
> +    """
> +    def __init__(self, reporter):
> +        self.reporter = reporter
> +
> +    def examine(self, tok):
> +        """Called once for each token in a header file.
> +           Call self.reporter.error if a problem is detected.
> +        """
> +        raise NotImplementedError
> +
> +    def eof(self):
> +        """Called once at the end of the stream.  Subclasses need only
> +           override this if it might have something to do."""
> +        pass
> +
> +class NoCheck(ConstructChecker):
> +    """Generic checker class which doesn't do anything.  Substitute this
> +       class for a real checker when a particular check should be skipped
> +       for some file."""
> +
> +    def examine(self, tok):
> +        pass
> +
> +#
> +# Check for obsolete type names.
> +#
> +
> +# The obsolete type names we're looking for:
> +OBSOLETE_TYPE_RE_ = re.compile(r"""\A
> +  (__)?
> +  (   quad_t
> +    | u(?: short | int | long
> +         | _(?: char | short | int(?:[0-9]+_t)? | long | quad_t )))
> +\Z""", re.VERBOSE)
> +
> +class ObsoleteNotAllowed(ConstructChecker):
> +    """Don't allow any use of the obsolete typedefs."""
> +    def examine(self, tok):
> +        if OBSOLETE_TYPE_RE_.match(tok.text):
> +            self.reporter.error(tok, "use of {!r}")
> +
> +class ObsoletePrivateDefinitionsAllowed(ConstructChecker):
> +    """Allow definitions of the private versions of the
> +       obsolete typedefs; that is, 'typedef [anything] __obsolete;'
> +    """
> +    def __init__(self, reporter):
> +        super().__init__(reporter)
> +        self.in_typedef = False
> +        self.prev_token = None
> +
> +    def examine(self, tok):
> +        # bits/types.h hides 'typedef' in a macro sometimes.
> +        if (tok.kind == "IDENT"
> +            and tok.text in ("typedef", "__STD_TYPE")
> +            and tok.context is None):
> +            self.in_typedef = True
> +        elif tok.kind == "PUNCTUATOR" and tok.text == ";" and self.in_typedef:
> +            self.in_typedef = False
> +            if self.prev_token.kind == "IDENT":
> +                m = OBSOLETE_TYPE_RE_.match(self.prev_token.text)
> +                if m and m.group(1) != "__":
> +                    self.reporter.error(self.prev_token, "use of {!r}")
> +            self.prev_token = None
> +        else:
> +            self._check_prev()
> +
> +        self.prev_token = tok
> +
> +    def eof(self):
> +        self._check_prev()
> +
> +    def _check_prev(self):
> +        if (self.prev_token is not None
> +            and self.prev_token.kind == "IDENT"
> +            and OBSOLETE_TYPE_RE_.match(self.prev_token.text)):
> +            self.reporter.error(self.prev_token, "use of {!r}")
> +
> +class ObsoletePublicDefinitionsAllowed(ConstructChecker):
> +    """Allow definitions of the public versions of the obsolete
> +       typedefs.  Only specific forms of definition are allowed:
> +
> +           typedef __obsolete obsolete;  // identifiers must agree
> +           typedef __uintN_t u_intN_t;   // N must agree
> +           typedef unsigned long int ulong;
> +           typedef unsigned short int ushort;
> +           typedef unsigned int uint;
> +    """
> +    def __init__(self, reporter):
> +        super().__init__(reporter)
> +        self.typedef_tokens = []
> +
> +    def examine(self, tok):
> +        if tok.kind in ("WHITESPACE", "BLOCK_COMMENT",
> +                        "LINE_COMMENT", "NL", "ESCNL"):
> +            pass
> +
> +        elif (tok.kind == "IDENT" and tok.text == "typedef"
> +              and tok.context is None):
> +            if self.typedef_tokens:
> +                self.reporter.error(tok, "typedef inside typedef")
> +                self._reset()
> +            self.typedef_tokens.append(tok)
> +
> +        elif tok.kind == "PUNCTUATOR" and tok.text == ";":
> +            self._finish()
> +
> +        elif self.typedef_tokens:
> +            self.typedef_tokens.append(tok)
> +
> +    def eof(self):
> +        self._reset()
> +
> +    def _reset(self):
> +        while self.typedef_tokens:
> +            tok = self.typedef_tokens.pop(0)
> +            if tok.kind == "IDENT" and OBSOLETE_TYPE_RE_.match(tok.text):
> +                self.reporter.error(tok, "use of {!r}")
> +
> +    def _finish(self):
> +        if not self.typedef_tokens: return
> +        if self.typedef_tokens[-1].kind == "IDENT":
> +            m = OBSOLETE_TYPE_RE_.match(self.typedef_tokens[-1].text)
> +            if m:
> +                if self._permissible_public_definition(m):
> +                    self.typedef_tokens.clear()
> +        self._reset()
> +
> +    def _permissible_public_definition(self, m):
> +        if m.group(1) == "__": return False
> +        name = m.group(2)
> +        toks = self.typedef_tokens
> +        ntok = len(toks)
> +        if ntok == 3 and toks[1].kind == "IDENT":
> +            defn = toks[1].text
> +            n = OBSOLETE_TYPE_RE_.match(defn)
> +            if n and n.group(1) == "__" and n.group(2) == name:
> +                return True
> +
> +            if (name[:5] == "u_int" and name[-2:] == "_t"
> +                and defn[:6] == "__uint" and defn[-2:] == "_t"
> +                and name[5:-2] == defn[6:-2]):
> +                return True
> +
> +            return False
> +
> +        if (name == "ulong" and ntok == 5
> +            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
> +            and toks[2].kind == "IDENT" and toks[2].text == "long"
> +            and toks[3].kind == "IDENT" and toks[3].text == "int"):
> +            return True
> +
> +        if (name == "ushort" and ntok == 5
> +            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
> +            and toks[2].kind == "IDENT" and toks[2].text == "short"
> +            and toks[3].kind == "IDENT" and toks[3].text == "int"):
> +            return True
> +
> +        if (name == "uint" and ntok == 4
> +            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
> +            and toks[2].kind == "IDENT" and toks[2].text == "int"):
> +            return True
> +
> +        return False
> +
> +def ObsoleteTypedefChecker(reporter, fname):
> +    """Factory: produce an instance of the appropriate
> +       obsolete-typedef checker for FNAME."""
> +
> +    # The obsolete rpc/ and rpcsvc/ headers are allowed to use the
> +    # obsolete types, because it would be more trouble than it's
> +    # worth to remove them from headers that we intend to stop
> +    # installing eventually anyway.
> +    if (fname.startswith("rpc/")
> +        or fname.startswith("rpcsvc/")
> +        or "/rpc/" in fname
> +        or "/rpcsvc/" in fname):
> +        return NoCheck(reporter)
> +
> +    # bits/types.h is allowed to define the __-versions of the
> +    # obsolete types.
> +    if (fname == "bits/types.h"
> +        or fname.endswith("/bits/types.h")):
> +        return ObsoletePrivateDefinitionsAllowed(reporter)
> +
> +    # sys/types.h is allowed to use the __-versions of the
> +    # obsolete types, but only to define the unprefixed versions.
> +    if (fname == "sys/types.h"
> +        or fname.endswith("/sys/types.h")):
> +        return ObsoletePublicDefinitionsAllowed(reporter)
> +
> +    return ObsoleteNotAllowed(reporter)
> +
> +#
> +# Master control
> +#
> +
> +class HeaderChecker:
> +    """Perform all of the checks on each header.  This is also the
> +       "reporter" object expected by tokenize_c and ConstructChecker.
> +    """
> +    def __init__(self):
> +        self.fname = None
> +        self.status = 0
> +
> +    def error(self, tok, message):
> +        self.status = 1
> +        if '{!r}' in message:
> +            message = message.format(tok.text)
> +        sys.stderr.write("{}:{}:{}: error: {}\n".format(
> +            self.fname, tok.line, tok.column, message))
> +
> +    def check(self, fname):
> +        self.fname = fname
> +        try:
> +            with open(fname, "rt") as fp:
> +                contents = fp.read()
> +        except OSError as e:
> +            sys.stderr.write("{}: {}\n".format(fname, e.strerror))
> +            self.status = 1
> +            return
> +
> +        typedef_checker = ObsoleteTypedefChecker(self, self.fname)
> +
> +        for tok in tokenize_c(contents, self):
> +            typedef_checker.examine(tok)
> +
> +def main():
> +    ap = argparse.ArgumentParser(description=__doc__)
> +    ap.add_argument("headers", metavar="header", nargs="+",
> +                    help="one or more headers to scan for obsolete constructs")
> +    args = ap.parse_args()
> +
> +    checker = HeaderChecker()
> +    for fname in args.headers:
> +        # Headers whose installed name begins with "finclude/" contain
> +        # Fortran, not C, and this program should completely ignore them.
> +        if not (fname.startswith("finclude/") or "/finclude/" in fname):
> +            checker.check(fname)
> +    sys.exit(checker.status)
> +
> +main()
> 

OK.
Zack Weinberg March 12, 2019, 12:59 a.m. UTC | #2
On Mon, Mar 11, 2019 at 2:57 PM Carlos O'Donell <carlos@redhat.com> wrote:
> >       * scripts/check-obsolete-constructs.py: New test script.
> >         * scripts/check-installed-headers.sh: Remove tests for
> >         obsolete typedefs, superseded by check-obsolete-constructs.py.
> >         * Rules: Run scripts/check-obsolete-constructs.py over $(headers)
> >         as a special test.  Update commentary.
> >         * posix/bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
> >         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
> >         Update commentary.
> >         * posix/sys/types.h (__u_intN_t): Remove.
> >         (u_int8_t): Typedef using __uint8_t.
> >         (u_int16_t): Typedef using __uint16_t.
> >         (u_int32_t): Typedef using __uint32_t.
> >         (u_int64_t): Typedef using __uint64_t.
>
> OK for master if you:
> - Fix the conditional in check-installed-headers.sh.
>
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>

Is it now the convention that we put Reviewed-by: lines into the final
commit message, or just that we say that on the mailing list?

> >              if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1
> > -            then
> > -                includes=$($cc_cmd -fsyntax-only -H $lang_mode \
> > -                              "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')
> > -                for h in $includes; do
> > -                    # Don't repeat work.
> > -                    eval 'case "$h" in ('"$already"') continue;; esac'
> > -
> > -                    if grep -qE "$obsolete_type_re" "$h"; then
> > -                        echo "*** Obsolete types detected:"
> > -                        grep -HE "$obsolete_type_re" "$h"
> > -                        failed=1
> > -                    fi
> > -                    already="$already|$h"
> > -                done
> > -            else
> > -                failed=1
> > +            then :
> > +            else failed=1
>
> Why not 'if ! $cc_cmd ...' ? Which avoids the odd empty if block e.g. ":".

`if ! ...` is not portable shell, despite being included in the POSIX
shell language.  See
https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Limitations-of-Builtins.html
(under `!`).  `if command_1; then :; else command_2; fi` is the
alternative idiom I learned back in the days of SunOS 4.

Possibly `if ! ...` is portable enough for this script, but I try not
to think about whether semi-portable shell constructs are portable
enough for the current script.

Do you still want me to change it?

zw
Carlos O'Donell March 12, 2019, 3:47 a.m. UTC | #3
On 3/11/19 8:59 PM, Zack Weinberg wrote:
> On Mon, Mar 11, 2019 at 2:57 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>>       * scripts/check-obsolete-constructs.py: New test script.
>>>         * scripts/check-installed-headers.sh: Remove tests for
>>>         obsolete typedefs, superseded by check-obsolete-constructs.py.
>>>         * Rules: Run scripts/check-obsolete-constructs.py over $(headers)
>>>         as a special test.  Update commentary.
>>>         * posix/bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
>>>         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
>>>         Update commentary.
>>>         * posix/sys/types.h (__u_intN_t): Remove.
>>>         (u_int8_t): Typedef using __uint8_t.
>>>         (u_int16_t): Typedef using __uint16_t.
>>>         (u_int32_t): Typedef using __uint32_t.
>>>         (u_int64_t): Typedef using __uint64_t.
>>
>> OK for master if you:
>> - Fix the conditional in check-installed-headers.sh.
>>
>> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> 
> Is it now the convention that we put Reviewed-by: lines into the final
> commit message, or just that we say that on the mailing list?

Please add the Reviewed-by lines into the final commit message.

I would like to see this become the way in which we more accurately
track reviewer participation and highlight the value of reviewers
to everyone investing in glibc.

>>>              if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1
>>> -            then
>>> -                includes=$($cc_cmd -fsyntax-only -H $lang_mode \
>>> -                              "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')
>>> -                for h in $includes; do
>>> -                    # Don't repeat work.
>>> -                    eval 'case "$h" in ('"$already"') continue;; esac'
>>> -
>>> -                    if grep -qE "$obsolete_type_re" "$h"; then
>>> -                        echo "*** Obsolete types detected:"
>>> -                        grep -HE "$obsolete_type_re" "$h"
>>> -                        failed=1
>>> -                    fi
>>> -                    already="$already|$h"
>>> -                done
>>> -            else
>>> -                failed=1
>>> +            then :
>>> +            else failed=1
>>
>> Why not 'if ! $cc_cmd ...' ? Which avoids the odd empty if block e.g. ":".
> 
> `if ! ...` is not portable shell, despite being included in the POSIX
> shell language.  See
> https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Limitations-of-Builtins.html
> (under `!`) `if command_1; then :; else command_2; fi` is the
> alternative idiom I learned back in the days of SunOS 4.

Holy snicker doodles. #TIL.

> Possibly `if ! ...` is portable enough for this script, but I try not
> to think about whether semi-portable shell constructs are portable
> enough for the current script.
> 
> Do you still want me to change it?

No.
Zack Weinberg March 13, 2019, 1:47 p.m. UTC | #4
On Mon, Mar 11, 2019 at 11:47 PM Carlos O'Donell <carlos@redhat.com> wrote:
> On 3/11/19 8:59 PM, Zack Weinberg wrote:
> >> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> >
> > Is it now the convention that we put Reviewed-by: lines into the final
> > commit message, or just that we say that on the mailing list?
>
> Please add the Reviewed-by lines into the final commit message.

OK, done.

> I would like to see this become the way in which we more accurately
> track reviewer participation and highlight the value of reviewers
> to everyone investing in glibc.

*nod*

> > Do you still want me to change it?
>
> No.

OK.  Patch now committed.

zw
Joseph Myers March 13, 2019, 10:16 p.m. UTC | #5
I'm seeing failures from build-many-glibcs.py for 
resource/check-obsolete-constructs:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3198: ordinal not in range(128)

This is with LC_ALL=C (and bits/resource.h headers containing UTF-8 µ in a 
comment).  Code opening text files that might not be pure ASCII needs to 
specify an encoding explicitly to avoid depending on the locale tests are 
run with.  There is also a case that the encoding specified should be 
ASCII - that installed headers should be required to be pure ASCII so they 
can be included in source files with any ASCII-compatible character set if 
compiling with -finput-charset= (which affects included headers as well as 
the main source file, so compiling "#include <sys/resource.h>" with 
-finput-charset=ascii currently fails).
Carlos O'Donell March 14, 2019, 1 p.m. UTC | #6
On 3/13/19 6:16 PM, Joseph Myers wrote:
> I'm seeing failures from build-many-glibcs.py for 
> resource/check-obsolete-constructs:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3198: ordinal not in range(128)
> 
> This is with LC_ALL=C (and bits/resource.h headers containing UTF-8 µ in a 
> comment).  Code opening text files that might not be pure ASCII needs to 
> specify an encoding explicitly to avoid depending on the locale tests are 
> run with.  There is also a case that the encoding specified should be 
> ASCII - that installed headers should be required to be pure ASCII so they 
> can be included in source files with any ASCII-compatible character set if 
> compiling with -finput-charset= (which affects included headers as well as 
> the main source file, so compiling "#include <sys/resource.h>" with 
> -finput-charset=ascii currently fails).

Do we have a requirement that #include <sys/resource.h> be compilable with
-finput-charset=ascii?

Or to put it another way, who decides which source files have to be ASCII
compatible?

Is the fix to change bits/resource.h, or to have the Python script open
the file as UTF-8?
Zack Weinberg March 14, 2019, 1:21 p.m. UTC | #7
On Thu, Mar 14, 2019 at 9:00 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 3/13/19 6:16 PM, Joseph Myers wrote:
> > I'm seeing failures from build-many-glibcs.py for
> > resource/check-obsolete-constructs:
> >
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3198: ordinal not in range(128)
> >
> > This is with LC_ALL=C (and bits/resource.h headers containing UTF-8 µ in a
> > comment).

This did not happen in my build-many-glibcs run, possibly because I’m
running it in a UTF-8 locale.  Should build-many-glibcs perhaps be
setting LC_ALL=C for all subprocesses?

As an immediate fix, I am going to commit a patch to
check-obsolete-constructs that specifies encoding="utf-8" since that’s
what we have in header files right now.
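
A minimal sketch of what that change amounts to, assuming only the
open() call in HeaderChecker.check needs to differ.  The helper name
read_header is hypothetical and does not appear in the script:

```python
import os
import tempfile

def read_header(fname):
    """Read FNAME as UTF-8, independent of the locale the test runs in.
       (Hypothetical helper; the script does this inline in check().)"""
    with open(fname, "rt", encoding="utf-8") as fp:
        return fp.read()

# Usage: a header with a UTF-8 micro sign in a comment, like the one
# in bits/resource.h, is read without consulting LC_ALL / LC_CTYPE.
with tempfile.NamedTemporaryFile("wb", suffix=".h", delete=False) as tmp:
    tmp.write("/* time in \u00b5s */\n".encode("utf-8"))
try:
    contents = read_header(tmp.name)
finally:
    os.unlink(tmp.name)
```

Without the explicit encoding, open() falls back to the locale's
preferred encoding, which is why the failure only shows up under
LC_ALL=C.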

> > There is also a case that the encoding specified should be
> > ASCII - that installed headers should be required to be pure ASCII so they
> > can be included in source files with any ASCII-compatible character set if
> > compiling with -finput-charset= (which affects included headers as well as
> > the main source file, so compiling "#include <sys/resource.h>" with
> > -finput-charset=ascii currently fails).
>
> Do we have a requirement that #include <sys/resource.h> be compilable with
> -finput-charset=ascii?

I think a requirement that our installed header files be compilable
with *any* valid setting of -finput-charset= by application Makefiles
is reasonable (or, in other words, all installed header files should
use only the basic source character set).  This is technically a
stronger constraint than requiring -finput-charset=ascii to work, but
in practice I think testing against -finput-charset=ascii would be
sufficient.

I think it’s a bug in GCC that -finput-charset=ascii causes an error
for non-ASCII characters inside comments, but there have been so many
releases with that bug that we have to cope.

A counterargument is that clang apparently only implements
-finput-charset=utf-8; *any other value* is rejected.  That this was
considered adequate Makefile compatibility for the feature strongly
suggests that nobody is using any other extended source character set
and we should be OK to continue using UTF-8 in installed headers, at
least in comments.

Whatever we do should be enforced by some test or other.  It might be
more appropriate to add it to check-installed-headers.sh than
check-obsolete-constructs.py, though.
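
For concreteness, a rough sketch of what such a check could look like
if it were written in Python.  It tests for pure ASCII, which is
slightly weaker than "basic source character set only" (ASCII also
contains $, @ and `), and the function name first_non_ascii is made
up for illustration:

```python
def first_non_ascii(data):
    """Return the 1-based (line, column) of the first non-ASCII byte
       in DATA (a bytes object), or None if DATA is pure ASCII.
       Columns count bytes, not characters."""
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for colno, byte in enumerate(line, start=1):
            if byte > 0x7F:
                return (lineno, colno)
    return None

# Usage: a clean header passes; one with a UTF-8 micro sign is flagged
# at the first byte of the multi-byte sequence.
clean = first_non_ascii(b"typedef unsigned int uint;\n")
dirty = first_non_ascii("/* \u00b5s */\n".encode("utf-8"))
```

Working on bytes rather than decoded text sidesteps the locale
question entirely, since nothing is decoded.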

zw
Joseph Myers March 14, 2019, 6:06 p.m. UTC | #8
On Thu, 14 Mar 2019, Zack Weinberg wrote:

> On Thu, Mar 14, 2019 at 9:00 AM Carlos O'Donell <carlos@redhat.com> wrote:
> >
> > On 3/13/19 6:16 PM, Joseph Myers wrote:
> > > I'm seeing failures from build-many-glibcs.py for
> > > resource/check-obsolete-constructs:
> > >
> > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3198: ordinal not in range(128)
> > >
> > > This is with LC_ALL=C (and bits/resource.h headers containing UTF-8 µ in a
> > > comment).
> 
> This did not happen in my build-many-glibcs run, possibly because I’m
> running it in a UTF-8 locale.  Should build-many-glibcs perhaps be
> setting LC_ALL=C for all subprocesses?

The argument that it shouldn't is that it can be run in any valid build 
environment, and glibc should build and test independent of the user's 
locale - individual parts of the build should set the locale where needed 
(whether to build or pass tests at all, or to achieve results that are 
independent of the user's locale, e.g. when sorting data).

(So someone could reasonably run build-many-glibcs.py in different locales 
and compare the results, to test whether the build is properly 
locale-independent, for example.)
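
As an illustration of "set the locale where needed", here is a sketch
of a build step that pins the locale for one child process instead of
relying on the caller's environment.  The wrapper name run_in_c_locale
is hypothetical and is not taken from build-many-glibcs.py:

```python
import os
import subprocess
import sys

def run_in_c_locale(argv):
    """Run ARGV with LC_ALL=C, leaving the rest of the caller's
       environment (and the caller's own locale) untouched."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    return subprocess.run(argv, env=env,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True)

# Usage: the child sees LC_ALL=C whatever locale the parent runs in.
result = run_in_c_locale(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"])
```

The point is that the override is local to the step that needs it, so
the driver itself can still be run under different locales to compare
results.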

Patch
diff mbox series

diff --git a/Rules b/Rules
index e08a28d9f3..222dba6dcb 100644
--- a/Rules
+++ b/Rules
@@ -82,7 +82,8 @@  $(common-objpfx)dummy.c:
 common-generated += dummy.o dummy.c
 
 ifneq "$(headers)" ""
-# Special test of all the installed headers in this directory.
+# Test that all of the headers installed by this directory can be compiled
+# in isolation.
 tests-special += $(objpfx)check-installed-headers-c.out
 libof-check-installed-headers-c := testsuite
 $(objpfx)check-installed-headers-c.out: \
@@ -93,6 +94,8 @@  $(objpfx)check-installed-headers-c.out: \
 	$(evaluate-test)
 
 ifneq "$(CXX)" ""
+# If a C++ compiler is available, also test that they can be compiled
+# in isolation as C++.
 tests-special += $(objpfx)check-installed-headers-cxx.out
 libof-check-installed-headers-cxx := testsuite
 $(objpfx)check-installed-headers-cxx.out: \
@@ -103,12 +106,24 @@  $(objpfx)check-installed-headers-cxx.out: \
 	$(evaluate-test)
 endif # $(CXX)
 
+# Test that a wrapper header exists in include/ for each non-sysdeps header.
+# This script does not need $(py-env).
 tests-special += $(objpfx)check-wrapper-headers.out
 $(objpfx)check-wrapper-headers.out: \
   $(..)scripts/check-wrapper-headers.py $(headers)
 	$(PYTHON) $< --root=$(..) --subdir=$(subdir) $(headers) > $@; \
 	  $(evaluate-test)
 
+# Test that none of the headers installed by this directory use certain
+# obsolete constructs (e.g. legacy BSD typedefs superseded by stdint.h).
+# This script does not need $(py-env).
+tests-special += $(objpfx)check-obsolete-constructs.out
+libof-check-obsolete-constructs := testsuite
+$(objpfx)check-obsolete-constructs.out: \
+    $(..)scripts/check-obsolete-constructs.py $(headers)
+	$(PYTHON) $^ > $@ 2>&1; \
+	$(evaluate-test)
+
 endif # $(headers)
 
 # This makes all the auxiliary and test programs.
diff --git a/posix/bits/types.h b/posix/bits/types.h
index 27e065c3be..0de6c59bb4 100644
--- a/posix/bits/types.h
+++ b/posix/bits/types.h
@@ -87,7 +87,7 @@  __extension__ typedef unsigned long long int __uintmax_t;
 	32		-- "natural" 32-bit type (always int)
 	64		-- "natural" 64-bit type (long or long long)
 	LONG32		-- 32-bit type, traditionally long
-	QUAD		-- 64-bit type, always long long
+	QUAD		-- 64-bit type, traditionally long long
 	WORD		-- natural type of __WORDSIZE bits (int or long)
 	LONGWORD	-- type of __WORDSIZE bits, traditionally long
 
@@ -113,14 +113,14 @@  __extension__ typedef unsigned long long int __uintmax_t;
 #define __SLONGWORD_TYPE	long int
 #define __ULONGWORD_TYPE	unsigned long int
 #if __WORDSIZE == 32
-# define __SQUAD_TYPE		__quad_t
-# define __UQUAD_TYPE		__u_quad_t
+# define __SQUAD_TYPE		__int64_t
+# define __UQUAD_TYPE		__uint64_t
 # define __SWORD_TYPE		int
 # define __UWORD_TYPE		unsigned int
 # define __SLONG32_TYPE		long int
 # define __ULONG32_TYPE		unsigned long int
-# define __S64_TYPE		__quad_t
-# define __U64_TYPE		__u_quad_t
+# define __S64_TYPE		__int64_t
+# define __U64_TYPE		__uint64_t
 /* We want __extension__ before typedef's that use nonstandard base types
    such as `long long' in C89 mode.  */
 # define __STD_TYPE		__extension__ typedef
diff --git a/posix/sys/types.h b/posix/sys/types.h
index 27129c5c23..0e37b1ce6a 100644
--- a/posix/sys/types.h
+++ b/posix/sys/types.h
@@ -154,37 +154,20 @@  typedef unsigned int uint;
 
 #include <bits/stdint-intn.h>
 
-#if !__GNUC_PREREQ (2, 7)
-
 /* These were defined by ISO C without the first `_'.  */
-typedef	unsigned char u_int8_t;
-typedef	unsigned short int u_int16_t;
-typedef	unsigned int u_int32_t;
-# if __WORDSIZE == 64
-typedef unsigned long int u_int64_t;
-# else
-__extension__ typedef unsigned long long int u_int64_t;
-# endif
-
-typedef int register_t;
-
-#else
-
-/* For GCC 2.7 and later, we can use specific type-size attributes.  */
-# define __u_intN_t(N, MODE) \
-  typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE)))
-
-__u_intN_t (8, __QI__);
-__u_intN_t (16, __HI__);
-__u_intN_t (32, __SI__);
-__u_intN_t (64, __DI__);
+typedef __uint8_t u_int8_t;
+typedef __uint16_t u_int16_t;
+typedef __uint32_t u_int32_t;
+typedef __uint64_t u_int64_t;
 
+#if __GNUC_PREREQ (2, 7)
 typedef int register_t __attribute__ ((__mode__ (__word__)));
-
+#else
+typedef int register_t;
+#endif
 
 /* Some code from BIND tests this macro to see if the types above are
    defined.  */
-#endif
 #define __BIT_TYPES_DEFINED__	1
 
 
diff --git a/scripts/check-installed-headers.sh b/scripts/check-installed-headers.sh
index 1f4496446c..e4f37d33f8 100644
--- a/scripts/check-installed-headers.sh
+++ b/scripts/check-installed-headers.sh
@@ -16,11 +16,9 @@ 
 # License along with the GNU C Library; if not, see
 # <http://www.gnu.org/licenses/>.
 
-# Check installed headers for cleanliness.  For each header, confirm
-# that it's possible to compile a file that includes that header and
-# does nothing else, in several different compilation modes.  Also,
-# scan the header for a set of obsolete typedefs that should no longer
-# appear.
+# For each installed header, confirm that it's possible to compile a
+# file that includes that header and does nothing else, in several
+# different compilation modes.
 
 # These compilation switches assume GCC or compatible, which is probably
 # fine since we also assume that when _building_ glibc.
@@ -31,13 +29,6 @@  cxx_modes="-std=c++98 -std=gnu++98 -std=c++11 -std=gnu++11"
 # These are probably the most commonly used three.
 lib_modes="-D_DEFAULT_SOURCE=1 -D_GNU_SOURCE=1 -D_XOPEN_SOURCE=700"
 
-# sys/types.h+bits/types.h have to define the obsolete types.
-# rpc(svc)/* have the obsolete types too deeply embedded in their API
-# to remove.
-skip_obsolete_type_check='*/sys/types.h|*/bits/types.h|*/rpc/*|*/rpcsvc/*'
-obsolete_type_re=\
-'\<((__)?(quad_t|u(short|int|long|_(char|short|int([0-9]+_t)?|long|quad_t))))\>'
-
 if [ $# -lt 3 ]; then
     echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
     exit 2
@@ -46,14 +37,10 @@  case "$1" in
     (c)
         lang_modes="$c_modes"
         cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.c)
-        already="$skip_obsolete_type_check"
     ;;
     (c++)
         lang_modes="$cxx_modes"
         cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.cc)
-        # The obsolete-type check can be skipped for C++; it is
-        # sufficient to do it for C.
-        already="*"
     ;;
     (*)
         echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
@@ -155,22 +142,8 @@  $expanded_lib_mode
 int avoid_empty_translation_unit;
 EOF
             if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1
-            then
-                includes=$($cc_cmd -fsyntax-only -H $lang_mode \
-                              "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')
-                for h in $includes; do
-                    # Don't repeat work.
-                    eval 'case "$h" in ('"$already"') continue;; esac'
-
-                    if grep -qE "$obsolete_type_re" "$h"; then
-                        echo "*** Obsolete types detected:"
-                        grep -HE "$obsolete_type_re" "$h"
-                        failed=1
-                    fi
-                    already="$already|$h"
-                done
-            else
-                failed=1
+            then :
+            else failed=1
             fi
         done
     done
diff --git a/scripts/check-obsolete-constructs.py b/scripts/check-obsolete-constructs.py
new file mode 100755
index 0000000000..46535afcac
--- /dev/null
+++ b/scripts/check-obsolete-constructs.py
@@ -0,1 +1,466 @@ 
+#! /usr/bin/python3
+# Copyright (C) 2019 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+"""Verifies that installed headers do not use any obsolete constructs:
+ * legacy BSD typedefs superseded by <stdint.h>:
+   ushort uint ulong u_char u_short u_int u_long u_intNN_t quad_t u_quad_t
+   (sys/types.h is allowed to _define_ these types, but not to use them
+    to define anything else).
+"""
+
+import argparse
+import collections
+import re
+import sys
+
+# Simplified lexical analyzer for C preprocessing tokens.
+# Does not implement trigraphs.
+# Does not implement backslash-newline in the middle of any lexical
+#   item other than a string literal.
+# Does not implement universal-character-names in identifiers.
+# Treats prefixed strings (e.g. L"...") as two tokens (L and "...")
+# Accepts non-ASCII characters only within comments and strings.
+
+# Caution: The order of the outermost alternation matters.
+# STRING must be before BAD_STRING, CHARCONST before BAD_CHARCONST,
+# BLOCK_COMMENT before BAD_BLOCK_COM before PUNCTUATOR, and OTHER must
+# be last.
+# Caution: There should be no capturing groups other than the named
+# captures in the outermost alternation.
+
+# For reference, these are all of the C punctuators as of C11:
+#   [ ] ( ) { } , ; ? ~
+#   ! != * *= / /= ^ ^= = ==
+#   # ##
+#   % %= %> %: %:%:
+#   & &= &&
+#   | |= ||
+#   + += ++
+#   - -= -- ->
+#   . ...
+#   : :>
+#   < <% <: << <<= <=
+#   > >= >> >>=
+
+# The BAD_* tokens are not part of the official definition of pp-tokens;
+# they match unclosed strings, character constants, and block comments,
+# so that the regex engine doesn't have to backtrack all the way to the
+# beginning of a broken construct and then emit dozens of junk tokens.
+
+PP_TOKEN_RE_ = re.compile(r"""
+    (?P<STRING>        \"(?:[^\"\\\r\n]|\\(?:[\r\n -~]|\r\n))*\")
+   |(?P<BAD_STRING>    \"(?:[^\"\\\r\n]|\\[ -~])*)
+   |(?P<CHARCONST>     \'(?:[^\'\\\r\n]|\\(?:[\r\n -~]|\r\n))*\')
+   |(?P<BAD_CHARCONST> \'(?:[^\'\\\r\n]|\\[ -~])*)
+   |(?P<BLOCK_COMMENT> /\*(?:\*(?!/)|[^*])*\*/)
+   |(?P<BAD_BLOCK_COM> /\*(?:\*(?!/)|[^*])*\*?)
+   |(?P<LINE_COMMENT>  //[^\r\n]*)
+   |(?P<IDENT>         [_a-zA-Z][_a-zA-Z0-9]*)
+   |(?P<PP_NUMBER>     \.?[0-9](?:[0-9a-df-oq-zA-DF-OQ-Z_.]|[eEpP][+-]?)*)
+   |(?P<PUNCTUATOR>
+       [,;?~(){}\[\]]
+     | [!*/^=]=?
+     | \#\#?
+     | %(?:[=>]|:(?:%:)?)?
+     | &[=&]?
+     |\|[=|]?
+     |\+[=+]?
+     | -[=->]?
+     |\.(?:\.\.)?
+     | :>?
+     | <(?:[%:]|<(?:=|<=?)?)?
+     | >(?:=|>=?)?)
+   |(?P<ESCNL>         \\(?:\r\n|\r|\n))
+   |(?P<WHITESPACE>    [ \t\n\r\v\f]+)
+   |(?P<OTHER>         .)
+""", re.DOTALL | re.VERBOSE)
+
+HEADER_NAME_RE_ = re.compile(r"""
+    < [^>\r\n]+ >
+  | " [^"\r\n]+ "
+""", re.DOTALL | re.VERBOSE)
+
+ENDLINE_RE_ = re.compile(r"""\r\n|\r|\n""")
+
+# based on the sample code in the Python re documentation
+Token_ = collections.namedtuple("Token", (
+    "kind", "text", "line", "column", "context"))
+Token_.__doc__ = """
+   One C preprocessing token, comment, or chunk of whitespace.
+   'kind' identifies the token type, which will be one of:
+       STRING, CHARCONST, BLOCK_COMMENT, LINE_COMMENT, IDENT,
+       PP_NUMBER, PUNCTUATOR, ESCNL, WHITESPACE, HEADER_NAME,
+       or OTHER.  The BAD_* alternatives in PP_TOKEN_RE_ are
+       handled within tokenize_c, below.
+
+   'text' is the sequence of source characters making up the token;
+       no decoding whatsoever is performed.
+
+   'line' and 'column' give the position of the first character of the
+      token within the source file.  They are both 1-based.
+
+   'context' indicates whether or not this token occurred within a
+      preprocessing directive; it will be None for running text,
+      '<null>' for the leading '#' of a directive line (because '#'
+      all by itself on a line is a "null directive"), or the name of
+      the directive for tokens within a directive line, starting with
+      the IDENT for the name itself.
+"""
+
+def tokenize_c(file_contents, reporter):
+    """Yield a series of Token objects, one for each preprocessing
+       token, comment, or chunk of whitespace within FILE_CONTENTS.
+       The REPORTER object is expected to have one method,
+       reporter.error(token, message), which will be called to
+       indicate a lexical error at the position of TOKEN.
+       If MESSAGE contains the four-character sequence '{!r}', that
+       is expected to be replaced by repr(token.text).
+    """
+
+    Token = Token_
+    PP_TOKEN_RE = PP_TOKEN_RE_
+    ENDLINE_RE = ENDLINE_RE_
+    HEADER_NAME_RE = HEADER_NAME_RE_
+
+    line_num = 1
+    line_start = 0
+    pos = 0
+    limit = len(file_contents)
+    directive = None
+    at_bol = True
+    while pos < limit:
+        if directive == "include":
+            mo = HEADER_NAME_RE.match(file_contents, pos)
+            if mo:
+                kind = "HEADER_NAME"
+                directive = "after_include"
+            else:
+                mo = PP_TOKEN_RE.match(file_contents, pos)
+                kind = mo.lastgroup
+                if kind != "WHITESPACE":
+                    directive = "after_include"
+        else:
+            mo = PP_TOKEN_RE.match(file_contents, pos)
+            kind = mo.lastgroup
+
+        text = mo.group()
+        line = line_num
+        column = mo.start() - line_start
+        adj_line_start = 0
+        # only these kinds can contain a newline
+        if kind in ("WHITESPACE", "BLOCK_COMMENT", "LINE_COMMENT",
+                    "STRING", "CHARCONST", "BAD_BLOCK_COM", "ESCNL"):
+            for tmo in ENDLINE_RE.finditer(text):
+                line_num += 1
+                adj_line_start = tmo.end()
+            if adj_line_start:
+                line_start = mo.start() + adj_line_start
+
+        # Track whether or not we are scanning a preprocessing directive.
+        if kind == "LINE_COMMENT" or (kind == "WHITESPACE" and adj_line_start):
+            at_bol = True
+            directive = None
+        else:
+            if kind == "PUNCTUATOR" and text == "#" and at_bol:
+                directive = "<null>"
+            elif kind == "IDENT" and directive == "<null>":
+                directive = text
+            at_bol = False
+
+        # Report ill-formed tokens and rewrite them as their well-formed
+        # equivalents, so downstream processing doesn't have to know about them.
+        # (Rewriting instead of discarding provides better error recovery.)
+        if kind == "BAD_BLOCK_COM":
+            reporter.error(Token("BAD_BLOCK_COM", "", line, column+1, ""),
+                           "unclosed block comment")
+            text += "*/"
+            kind = "BLOCK_COMMENT"
+        elif kind == "BAD_STRING":
+            reporter.error(Token("BAD_STRING", "", line, column+1, ""),
+                           "unclosed string")
+            text += "\""
+            kind = "STRING"
+        elif kind == "BAD_CHARCONST":
+            reporter.error(Token("BAD_CHARCONST", "", line, column+1, ""),
+                           "unclosed char constant")
+            text += "'"
+            kind = "CHARCONST"
+
+        tok = Token(kind, text, line, column+1,
+                    "include" if directive == "after_include" else directive)
+        # Do not complain about OTHER tokens inside macro definitions.
+        # $ and @ appear in macros defined by headers intended to be
+        # included from assembly language, e.g. sysdeps/mips/sys/asm.h.
+        if kind == "OTHER" and directive != "define":
+            reporter.error(tok, "stray {!r} in program")
+
+        yield tok
+        pos = mo.end()
+
+#
+# Base and generic classes for individual checks.
+#
+
+class ConstructChecker:
+    """Scan a stream of C preprocessing tokens and possibly report
+       problems with them.  The REPORTER object passed to __init__ has
+       one method, reporter.error(token, message), which should be
+       called to indicate a problem detected at the position of TOKEN.
+       If MESSAGE contains the four-character sequence '{!r}' then that
+       will be replaced with a textual representation of TOKEN.
+    """
+    def __init__(self, reporter):
+        self.reporter = reporter
+
+    def examine(self, tok):
+        """Called once for each token in a header file.
+           Call self.reporter.error if a problem is detected.
+        """
+        raise NotImplementedError
+
+    def eof(self):
+        """Called once at the end of the stream.  Subclasses need only
+           override this if it might have something to do."""
+        pass
+
+class NoCheck(ConstructChecker):
+    """Generic checker class which doesn't do anything.  Substitute this
+       class for a real checker when a particular check should be skipped
+       for some file."""
+
+    def examine(self, tok):
+        pass
+
+#
+# Check for obsolete type names.
+#
+
+# The obsolete type names we're looking for:
+OBSOLETE_TYPE_RE_ = re.compile(r"""\A
+  (__)?
+  (   quad_t
+    | u(?: short | int | long
+         | _(?: char | short | int(?:[0-9]+_t)? | long | quad_t )))
+\Z""", re.VERBOSE)
+
+class ObsoleteNotAllowed(ConstructChecker):
+    """Don't allow any use of the obsolete typedefs."""
+    def examine(self, tok):
+        if OBSOLETE_TYPE_RE_.match(tok.text):
+            self.reporter.error(tok, "use of {!r}")
+
+class ObsoletePrivateDefinitionsAllowed(ConstructChecker):
+    """Allow definitions of the private versions of the
+       obsolete typedefs; that is, 'typedef [anything] __obsolete;'
+    """
+    def __init__(self, reporter):
+        super().__init__(reporter)
+        self.in_typedef = False
+        self.prev_token = None
+
+    def examine(self, tok):
+        # bits/types.h hides 'typedef' in a macro sometimes.
+        if (tok.kind == "IDENT"
+            and tok.text in ("typedef", "__STD_TYPE")
+            and tok.context is None):
+            self.in_typedef = True
+        elif tok.kind == "PUNCTUATOR" and tok.text == ";" and self.in_typedef:
+            self.in_typedef = False
+            if self.prev_token.kind == "IDENT":
+                m = OBSOLETE_TYPE_RE_.match(self.prev_token.text)
+                if m and m.group(1) != "__":
+                    self.reporter.error(self.prev_token, "use of {!r}")
+            self.prev_token = None
+        else:
+            self._check_prev()
+
+        self.prev_token = tok
+
+    def eof(self):
+        self._check_prev()
+
+    def _check_prev(self):
+        if (self.prev_token is not None
+            and self.prev_token.kind == "IDENT"
+            and OBSOLETE_TYPE_RE_.match(self.prev_token.text)):
+            self.reporter.error(self.prev_token, "use of {!r}")
+
+class ObsoletePublicDefinitionsAllowed(ConstructChecker):
+    """Allow definitions of the public versions of the obsolete
+       typedefs.  Only specific forms of definition are allowed:
+
+           typedef __obsolete obsolete;  // identifiers must agree
+           typedef __uintN_t u_intN_t;   // N must agree
+           typedef unsigned long int ulong;
+           typedef unsigned short int ushort;
+           typedef unsigned int uint;
+    """
+    def __init__(self, reporter):
+        super().__init__(reporter)
+        self.typedef_tokens = []
+
+    def examine(self, tok):
+        if tok.kind in ("WHITESPACE", "BLOCK_COMMENT",
+                        "LINE_COMMENT", "NL", "ESCNL"):
+            pass
+
+        elif (tok.kind == "IDENT" and tok.text == "typedef"
+              and tok.context is None):
+            if self.typedef_tokens:
+                self.reporter.error(tok, "typedef inside typedef")
+                self._reset()
+            self.typedef_tokens.append(tok)
+
+        elif tok.kind == "PUNCTUATOR" and tok.text == ";":
+            self._finish()
+
+        elif self.typedef_tokens:
+            self.typedef_tokens.append(tok)
+
+    def eof(self):
+        self._reset()
+
+    def _reset(self):
+        while self.typedef_tokens:
+            tok = self.typedef_tokens.pop(0)
+            if tok.kind == "IDENT" and OBSOLETE_TYPE_RE_.match(tok.text):
+                self.reporter.error(tok, "use of {!r}")
+
+    def _finish(self):
+        if not self.typedef_tokens: return
+        if self.typedef_tokens[-1].kind == "IDENT":
+            m = OBSOLETE_TYPE_RE_.match(self.typedef_tokens[-1].text)
+            if m:
+                if self._permissible_public_definition(m):
+                    self.typedef_tokens.clear()
+        self._reset()
+
+    def _permissible_public_definition(self, m):
+        if m.group(1) == "__": return False
+        name = m.group(2)
+        toks = self.typedef_tokens
+        ntok = len(toks)
+        if ntok == 3 and toks[1].kind == "IDENT":
+            defn = toks[1].text
+            n = OBSOLETE_TYPE_RE_.match(defn)
+            if n and n.group(1) == "__" and n.group(2) == name:
+                return True
+
+            if (name[:5] == "u_int" and name[-2:] == "_t"
+                and defn[:6] == "__uint" and defn[-2:] == "_t"
+                and name[5:-2] == defn[6:-2]):
+                return True
+
+            return False
+
+        if (name == "ulong" and ntok == 5
+            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
+            and toks[2].kind == "IDENT" and toks[2].text == "long"
+            and toks[3].kind == "IDENT" and toks[3].text == "int"):
+            return True
+
+        if (name == "ushort" and ntok == 5
+            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
+            and toks[2].kind == "IDENT" and toks[2].text == "short"
+            and toks[3].kind == "IDENT" and toks[3].text == "int"):
+            return True
+
+        if (name == "uint" and ntok == 4
+            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
+            and toks[2].kind == "IDENT" and toks[2].text == "int"):
+            return True
+
+        return False
+
+def ObsoleteTypedefChecker(reporter, fname):
+    """Factory: produce an instance of the appropriate
+       obsolete-typedef checker for FNAME."""
+
+    # The obsolete rpc/ and rpcsvc/ headers are allowed to use the
+    # obsolete types, because it would be more trouble than it's
+    # worth to remove them from headers that we intend to stop
+    # installing eventually anyway.
+    if (fname.startswith("rpc/")
+        or fname.startswith("rpcsvc/")
+        or "/rpc/" in fname
+        or "/rpcsvc/" in fname):
+        return NoCheck(reporter)
+
+    # bits/types.h is allowed to define the __-versions of the
+    # obsolete types.
+    if (fname == "bits/types.h"
+        or fname.endswith("/bits/types.h")):
+        return ObsoletePrivateDefinitionsAllowed(reporter)
+
+    # sys/types.h is allowed to use the __-versions of the
+    # obsolete types, but only to define the unprefixed versions.
+    if (fname == "sys/types.h"
+        or fname.endswith("/sys/types.h")):
+        return ObsoletePublicDefinitionsAllowed(reporter)
+
+    return ObsoleteNotAllowed(reporter)
+
+#
+# Master control
+#
+
+class HeaderChecker:
+    """Perform all of the checks on each header.  This is also the
+       "reporter" object expected by tokenize_c and ConstructChecker.
+    """
+    def __init__(self):
+        self.fname = None
+        self.status = 0
+
+    def error(self, tok, message):
+        self.status = 1
+        if '{!r}' in message:
+            message = message.format(tok.text)
+        sys.stderr.write("{}:{}:{}: error: {}\n".format(
+            self.fname, tok.line, tok.column, message))
+
+    def check(self, fname):
+        self.fname = fname
+        try:
+            with open(fname, "rt") as fp:
+                contents = fp.read()
+        except OSError as e:
+            sys.stderr.write("{}: {}\n".format(fname, e.strerror))
+            self.status = 1
+            return
+
+        typedef_checker = ObsoleteTypedefChecker(self, self.fname)
+
+        for tok in tokenize_c(contents, self):
+            typedef_checker.examine(tok)
+        typedef_checker.eof()
+
+def main():
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("headers", metavar="header", nargs="+",
+                    help="one or more headers to scan for obsolete constructs")
+    args = ap.parse_args()
+
+    checker = HeaderChecker()
+    for fname in args.headers:
+        # Headers whose installed name begins with "finclude/" contain
+        # Fortran, not C, and this program should completely ignore them.
+        if not (fname.startswith("finclude/") or "/finclude/" in fname):
+            checker.check(fname)
+    sys.exit(checker.status)
+
+main()
--