diff mbox series

Proposal to patch libiberty.a for controlling whether pathnames on Windows are converted to lower case

Message ID a0af4ffc-7e67-3d69-6c00-3e55ec139e9b@codesourcery.com
State New
Headers show
Series Proposal to patch libiberty.a for controlling whether pathnames on Windows are converted to lower case | expand

Commit Message

Carroll, Paul Aug. 16, 2019, 12:42 a.m. UTC
This is a proposed patch to libiberty, with an accompanying patch to GCC.
The purpose of this patch is to make it possible for Windows-hosted 
toolchains to have the ability to control whether Canonicalized 
filenames are converted to all lower-case.
Most Windows users are not affected by this behavior.
However, we have at least one customer who does use Windows systems 
where their hard drives are case-sensitive.  Thus, they desire this ability.

The first implementation of Windows support in libiberty/lrealpath.c was 
added back in 2004.  The new code included a call to GetFullPathName(), 
to Canonicalize the filename.  Next,  if there was sufficient room in 
the buffer, the following code was run:

         /* The file system is case-preserving but case-insensitive,
            Canonicalize to lowercase, using the codepage associated
            with the process locale.  */
         CharLowerBuff (buf, len);
         return strdup (buf);

In effect, the assumption of the code is that all Windows file systems 
will be case-preserving but case-insensitive, so converting a filename 
to lowercase is not a problem.  And tools would always find the 
resulting file on the file system.  That turns out not to be true, but 
lrealpath() was mostly used just for system header files, so no one noticed.

However, in April 2014, libcpp was patched, to cause even non-system 
headers on Windows systems to be Canonicalized.  This patch has caused 
problems for users that have case-sensitive file systems on their 
Windows systems.  A patch to libcpp was proposed to do additional 
Canonicalize non-system headers on Windows systems.
The discussion on the patch starts at 
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00009.html
As is noted in the comments:
       For DOS based file system, we always try to shorten non-system 
headers, as DOS has a tighter constraint on max path length.
The okay to add the patch was given May 9, 2014 at 
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00557.html , with the note:
       I've spoke with Kai a bit about this and he thinks it's 
appropriate and desirable to shorten paths on these kinds of filesystems.

The libcpp change meant that lrealpath() began to get called for 
non-system header files.  There is other code in both bfd and libiberty 
that can reach the lrealpath() function, but apparently those functions 
are not as much of a problem, as in not really being used in most tools 
(obviously since our customer had no complaints before 2014).

What I am proposing is to modify lrealpath() to only call CharLowerBuff 
when the user has a filesystem where changing the case of the filename 
is not an issue.
That is actually most users, so the default should be to call CharLowerBuff.
But I want to give the user the ability to control that behavior.

My proposal is to change that section of code in libiberty/lrealpath.c 
as follows:

      else
        {
-       /* The file system is case-preserving but case-insensitive,
-          Canonicalize to lowercase, using the codepage associated
+       /* The file system is normally case-preserving but case-insensitive,
+          Canonicalize to lowercase if desired, using the codepage 
associated
            with the process locale.  */
-        CharLowerBuff (buf, len);
+       if (libiberty_lowerpath)
+          CharLowerBuff (buf, len);
          return strdup (buf);
        }

In effect, this just adds a control to let the user decide whether or 
not a pathname is converted to lowercase on Windows systems.

I also added a global definition for that control at the start of the 
libiberty/lrealpath.c source, setting it so the default behavior on 
Windows is to convert pathnames to lowercase:

+/* External option to control whether a pathname is converted
+   to all lower-case on Windows systems.  The default behavior
+   is to convert.
+*/
+#if defined (_WIN32)
+#ifdef __cplusplus
+extern "C" {
+#endif
+unsigned char libiberty_lowerpath = 1;
+#ifdef __cplusplus
+}
+#endif
+#endif

And, for use by tools that link to libiberty.a, I added an external 
reference to that control in include/libiberty.h:

+#if defined (_WIN32)
+/* Determine if filenames should be converted to lower case */
+extern unsigned char libiberty_lowerpath;
+#endif


Adding the above code to include/libiberty.h and libiberty/lrealpath.c 
results in a libiberty.a that behaves exactly the same as it does today 
for most users.
It also makes available a method of affecting whether or not a given 
tool affects canonicalized pathnames on Windows by also converting those 
pathnames to lowercase.

The questions to discuss:
1.    Is this a reasonable solution to this problem, by adding such a 
controlling symbol?
2.    Is it reasonable to use ‘libiberty_lowerpath’ as the symbol’s 
name?  There are other global symbols defined in libiberty that use a 
similar name, so this seems reasonable.
3.    Should the symbol be called ‘libiberty_lowerpath’ or something 
else, such as ‘libiberty_lowerfilename’?  Does that matter?
4.    While the ‘CharLowerBuff()’ code is within a _WIN32 section of 
code, should the global symbol also be defined so it only exists in 
libiberty.a for Windows hosts?  Or should that global symbol always be 
defined in libiberty.a, regardless of whether it is for Windows or 
Linux?  Obviously that symbol is only meaningful for Windows hosts.  But 
the existence (or non-existence)  of the symbol obviously affects how 
tools write their code to modify that symbol.  As presently written, 
tools would only modify this symbol when being built for running on 
Windows.

Depending on the answers to the above, a followup can show the proposed 
changes to tools, to add options to allow control of changing filenames 
to lower case on Windows.

If the above change (or something similar) is done to libiberty, then 
GCC should also be changed to add an option for accessing this 
variable.  I would propose adding the '-flowerpath' option, which would 
be the default.  That also adds '-fno-lowerpath', which is what users 
would use to keep their filenames from being converted to lower-case.  
Most users, of course, could just ignore this option.  And it would only 
be meaningful for Windows-hosted tools.

Here are the patches that would implement the above:

                         loc);

Comments

Jeff Law Sept. 4, 2019, 3:46 p.m. UTC | #1
On 8/15/19 6:42 PM, Carroll, Paul wrote:
> This is a proposed patch to libiberty, with an accompanying patch to GCC.
> The purpose of this patch is to make it possible for Windows-hosted
> toolchains to have the ability to control whether Canonicalized
> filenames are converted to all lower-case.
> Most Windows users are not affected by this behavior.
> However, we have at least one customer who does use Windows systems
> where their hard drives are case-sensitive.  Thus, they desire this
> ability.
> 
> The first implementation of Windows support in libiberty/lrealpath.c was
> added back in 2004.  The new code included a call to GetFullPathName(),
> to Canonicalize the filename.  Next,  if there was sufficient room in
> the buffer, the following code was run:
> 
>         /* The file system is case-preserving but case-insensitive,
>            Canonicalize to lowercase, using the codepage associated
>            with the process locale.  */
>         CharLowerBuff (buf, len);
>         return strdup (buf);
> 
> In effect, the assumption of the code is that all Windows file systems
> will be case-preserving but case-insensitive, so converting a filename
> to lowercase is not a problem.  And tools would always find the
> resulting file on the file system.  That turns out not to be true, but
> lrealpath() was mostly used just for system header files, so no one
> noticed.
> 
> However, in April 2014, libcpp was patched, to cause even non-system
> headers on Windows systems to be Canonicalized.  This patch has caused
> problems for users that have case-sensitive file systems on their
> Windows systems.  A patch to libcpp was proposed to do additional
> Canonicalize non-system headers on Windows systems.
> The discussion on the patch starts at
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00009.html
> As is noted in the comments:
>       For DOS based file system, we always try to shorten non-system
> headers, as DOS has a tighter constraint on max path length.
> The okay to add the patch was given May 9, 2014 at
> https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00557.html , with the note:
>       I've spoke with Kai a bit about this and he thinks it's
> appropriate and desirable to shorten paths on these kinds of filesystems.
> 
> The libcpp change meant that lrealpath() began to get called for
> non-system header files.  There is other code in both bfd and libiberty
> that can reach the lrealpath() function, but apparently those functions
> are not as much of a problem, as in not really being used in most tools
> (obviously since our customer had no complaints before 2014).
> 
> What I am proposing is to modify lrealpath() to only call CharLowerBuff
> when the user has a filesystem where changing the case of the filename
> is not an issue.
> That is actually most users, so the default should be to call
> CharLowerBuff.
> But I want to give the user the ability to control that behavior.
> 
> My proposal is to change that section of code in libiberty/lrealpath.c
> as follows:
> 
>      else
>        {
> -       /* The file system is case-preserving but case-insensitive,
> -          Canonicalize to lowercase, using the codepage associated
> +       /* The file system is normally case-preserving but
> case-insensitive,
> +          Canonicalize to lowercase if desired, using the codepage
> associated
>            with the process locale.  */
> -        CharLowerBuff (buf, len);
> +       if (libiberty_lowerpath)
> +          CharLowerBuff (buf, len);
>          return strdup (buf);
>        }
> 
> In effect, this just adds a control to let the user decide whether or
> not a pathname is converted to lowercase on Windows systems.
> 
> I also added a global definition for that control at the start of the
> libiberty/lrealpath.c source, setting it so the default behavior on
> Windows is to convert pathnames to lowercase:
> 
> +/* External option to control whether a pathname is converted
> +   to all lower-case on Windows systems.  The default behavior
> +   is to convert.
> +*/
> +#if defined (_WIN32)
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +unsigned char libiberty_lowerpath = 1;
> +#ifdef __cplusplus
> +}
> +#endif
> +#endif
> 
> And, for use by tools that link to libiberty.a, I added an external
> reference to that control in include/libiberty.h:
> 
> +#if defined (_WIN32)
> +/* Determine if filenames should be converted to lower case */
> +extern unsigned char libiberty_lowerpath;
> +#endif
> 
> 
> Adding the above code to include/libiberty.h and libiberty/lrealpath.c
> results in a libiberty.a that behaves exactly the same as it does today
> for most users.
> It also makes available a method of affecting whether or not a given
> tool affects canonicalized pathnames on Windows by also converting those
> pathnames to lowercase.
> 
> The questions to discuss:
> 1.    Is this a reasonable solution to this problem, by adding such a
> controlling symbol?
> 2.    Is it reasonable to use ‘libiberty_lowerpath’ as the symbol’s
> name?  There are other global symbols defined in libiberty that use a
> similar name, so this seems reasonable.
> 3.    Should the symbol be called ‘libiberty_lowerpath’ or something
> else, such as ‘libiberty_lowerfilename’?  Does that matter?
> 4.    While the ‘CharLowerBuff()’ code is within a _WIN32 section of
> code, should the global symbol also be defined so it only exists in
> libiberty.a for Windows hosts?  Or should that global symbol always be
> defined in libiberty.a, regardless of whether it is for Windows or
> Linux?  Obviously that symbol is only meaningful for Windows hosts.  But
> the existence (or non-existence)  of the symbol obviously affects how
> tools write their code to modify that symbol.  As presently written,
> tools would only modify this symbol when being built for running on
> Windows.
> 
> Depending on the answers to the above, a followup can show the proposed
> changes to tools, to add options to allow control of changing filenames
> to lower case on Windows.
> 
> If the above change (or something similar) is done to libiberty, then
> GCC should also be changed to add an option for accessing this
> variable.  I would propose adding the '-flowerpath' option, which would
> be the default.  That also adds '-fno-lowerpath', which is what users
> would use to keep their filenames from being converted to lower-case. 
> Most users, of course, could just ignore this option.  And it would only
> be meaningful for Windows-hosted tools.
> 
> Here are the patches that would implement the above:
> 
> diff -raup a/include/libiberty.h b/include/libiberty.h
> --- a/include/libiberty.h    2019-08-07 08:16:05.269694038 -0700
> +++ b/include/libiberty.h    2019-08-07 09:02:05.771381502 -0700
> @@ -750,6 +750,11 @@ extern unsigned long libiberty_len;
>     (char *) memcpy (libiberty_nptr, libiberty_optr, libiberty_len))
>  #endif
> 
> +#if defined (_WIN32)
> +/* Determine if filenames should be converted to lower case */
> +extern unsigned char libiberty_lowerpath;
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff -raup a/libiberty/lrealpath.c b/libiberty/lrealpath.c
> --- a/libiberty/lrealpath.c    2019-08-07 08:16:05.957694774 -0700
> +++ b/libiberty/lrealpath.c    2019-08-07 09:03:03.271408306 -0700
> @@ -72,6 +72,20 @@ extern char *canonicalize_file_name (con
>  # endif
>  #endif
> 
> +/* External option to control whether a pathname is converted
> +   to all lower-case on Windows systems.  The default behavior
> +   is to convert.
> +*/
> +#if defined (_WIN32)
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +unsigned char libiberty_lowerpath = 1;
> +#ifdef __cplusplus
> +}
> +#endif
> +#endif
> +
>  char *
>  lrealpath (const char *filename)
>  {
> @@ -159,10 +173,11 @@ lrealpath (const char *filename)
>        return strdup (filename);
>      else
>        {
> -    /* The file system is case-preserving but case-insensitive,
> -       Canonicalize to lowercase, using the codepage associated
> +    /* The file system is normally case-preserving but case-insensitive,
> +       Canonicalize to lowercase if desired, using the codepage associated
>         with the process locale.  */
> -        CharLowerBuff (buf, len);
> +    if (libiberty_lowerpath)
> +          CharLowerBuff (buf, len);
>          return strdup (buf);
>        }
>    }
> diff -raup a/gcc/common.opt b/gcc/common.opt
> --- a/gcc/common.opt    2019-08-05 09:42:54.889054582 -0700
> +++ b/gcc/common.opt    2019-08-05 10:09:01.708902089 -0700
> @@ -1543,6 +1543,10 @@ floop-nest-optimize
>  Common Report Var(flag_loop_nest_optimize) Optimization
>  Enable the loop nest optimizer.
> 
> +flowerpath
> +Common Report
> +Convert pathnames to all lowercase on Windows (ignore elsewhere).
> +
>  fstrict-volatile-bitfields
>  Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization
>  Force bitfield accesses to match their type width.
> diff -raup a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> --- a/gcc/doc/invoke.texi    2019-08-05 09:42:55.213053535 -0700
> +++ b/gcc/doc/invoke.texi    2019-08-05 10:27:53.749415626 -0700
> @@ -174,7 +174,8 @@ in the following sections.
>  -pass-exit-codes  -pipe  -specs=@var{file}  -wrapper  @gol
>  @@@var{file}  -ffile-prefix-map=@var{old}=@var{new}  @gol
>  -fplugin=@var{file}  -fplugin-arg-@var{name}=@var{arg}  @gol
> --fdump-ada-spec@r{[}-slim@r{]}  -fada-spec-parent=@var{unit}
> -fdump-go-spec=@var{file}}
> +-fdump-ada-spec@r{[}-slim@r{]}  -fada-spec-parent=@var{unit} @gol
> +-fdump-go-spec=@var{file} -fno-lowerpath}
> 
>  @item C Language Options
>  @xref{C Dialect Options,,Options Controlling C Dialect}.
> diff -raup a/gcc/opts.c b/gcc/opts.c
> --- a/gcc/opts.c    2019-08-05 09:42:55.385052980 -0700
> +++ b/gcc/opts.c    2019-08-05 10:09:01.724902054 -0700
> @@ -2412,6 +2412,12 @@ common_handle_option (struct gcc_options
>        opts->x_flag_stack_usage_info = value != 0;
>        break;
> 
> +    case OPT_flowerpath:
> +#ifdef __WIN32
> +      libiberty_lowerpath = value;
> +#endif
> +      break;
> +
>      case OPT_g:
>        set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, arg, opts,
> opts_set,
>                         loc);
So what I dislike here is using global variables to pass around state
between gcc and libiberty.  ISTM that providing a function based API
that libiberty consumers can use would be better.

THoughts?

Jeff
diff mbox series

Patch

diff -raup a/include/libiberty.h b/include/libiberty.h
--- a/include/libiberty.h    2019-08-07 08:16:05.269694038 -0700
+++ b/include/libiberty.h    2019-08-07 09:02:05.771381502 -0700
@@ -750,6 +750,11 @@  extern unsigned long libiberty_len;
     (char *) memcpy (libiberty_nptr, libiberty_optr, libiberty_len))
  #endif

+#if defined (_WIN32)
+/* Determine if filenames should be converted to lower case */
+extern unsigned char libiberty_lowerpath;
+#endif
+
  #ifdef __cplusplus
  }
  #endif
diff -raup a/libiberty/lrealpath.c b/libiberty/lrealpath.c
--- a/libiberty/lrealpath.c    2019-08-07 08:16:05.957694774 -0700
+++ b/libiberty/lrealpath.c    2019-08-07 09:03:03.271408306 -0700
@@ -72,6 +72,20 @@  extern char *canonicalize_file_name (con
  # endif
  #endif

+/* External option to control whether a pathname is converted
+   to all lower-case on Windows systems.  The default behavior
+   is to convert.
+*/
+#if defined (_WIN32)
+#ifdef __cplusplus
+extern "C" {
+#endif
+unsigned char libiberty_lowerpath = 1;
+#ifdef __cplusplus
+}
+#endif
+#endif
+
  char *
  lrealpath (const char *filename)
  {
@@ -159,10 +173,11 @@  lrealpath (const char *filename)
        return strdup (filename);
      else
        {
-    /* The file system is case-preserving but case-insensitive,
-       Canonicalize to lowercase, using the codepage associated
+    /* The file system is normally case-preserving but case-insensitive,
+       Canonicalize to lowercase if desired, using the codepage associated
         with the process locale.  */
-        CharLowerBuff (buf, len);
+    if (libiberty_lowerpath)
+          CharLowerBuff (buf, len);
          return strdup (buf);
        }
    }
diff -raup a/gcc/common.opt b/gcc/common.opt
--- a/gcc/common.opt    2019-08-05 09:42:54.889054582 -0700
+++ b/gcc/common.opt    2019-08-05 10:09:01.708902089 -0700
@@ -1543,6 +1543,10 @@  floop-nest-optimize
  Common Report Var(flag_loop_nest_optimize) Optimization
  Enable the loop nest optimizer.

+flowerpath
+Common Report
+Convert pathnames to all lowercase on Windows (ignore elsewhere).
+
  fstrict-volatile-bitfields
  Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization
  Force bitfield accesses to match their type width.
diff -raup a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
--- a/gcc/doc/invoke.texi    2019-08-05 09:42:55.213053535 -0700
+++ b/gcc/doc/invoke.texi    2019-08-05 10:27:53.749415626 -0700
@@ -174,7 +174,8 @@  in the following sections.
  -pass-exit-codes  -pipe  -specs=@var{file}  -wrapper  @gol
  @@@var{file}  -ffile-prefix-map=@var{old}=@var{new}  @gol
  -fplugin=@var{file}  -fplugin-arg-@var{name}=@var{arg}  @gol
--fdump-ada-spec@r{[}-slim@r{]}  -fada-spec-parent=@var{unit} 
-fdump-go-spec=@var{file}}
+-fdump-ada-spec@r{[}-slim@r{]}  -fada-spec-parent=@var{unit} @gol
+-fdump-go-spec=@var{file} -fno-lowerpath}

  @item C Language Options
  @xref{C Dialect Options,,Options Controlling C Dialect}.
diff -raup a/gcc/opts.c b/gcc/opts.c
--- a/gcc/opts.c    2019-08-05 09:42:55.385052980 -0700
+++ b/gcc/opts.c    2019-08-05 10:09:01.724902054 -0700
@@ -2412,6 +2412,12 @@  common_handle_option (struct gcc_options
        opts->x_flag_stack_usage_info = value != 0;
        break;

+    case OPT_flowerpath:
+#ifdef __WIN32
+      libiberty_lowerpath = value;
+#endif
+      break;
+
      case OPT_g:
        set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, arg, opts, 
opts_set,