diff mbox

Fix genmultilib reuse rule checks for large sets of option combinations

Message ID alpine.DEB.2.20.1706082027370.28164@digraph.polyomino.org.uk
State New
Headers show

Commit Message

Joseph Myers June 8, 2017, 8:28 p.m. UTC
genmultilib computes combination_space, a list of all combinations of
options in MULTILIB_OPTIONS that might have multilibs built for them
(some of which may end up not having multilibs built for them, and
some of those may end up being mapped to other multilibs with
MULTILIB_REUSE).  It is then used to validate the right hand part of
MULTILIB_REUSE rules, checking with expr that combination_space
matches a basic regular expression derived from that right hand part.

There are two problems with this approach to validation:

* It requires that right hand part to have options in the same order
  as in MULTILIB_OPTIONS, in contradiction to the documentation of
  MULTILIB_REUSE saying that order does not matter there.

* combination_space can be so large that the expr call fails with an
  E2BIG error.  I have a local ARM configuration with 40 multilibs but
  3840 combinations of options from MULTILIB_OPTIONS (so 3839 listed
  in combination_space, since it doesn't list the default multilib)
  and 996 MULTILIB_REUSE rules.  This generates a combination_space
  string longer than the Linux kernel's MAX_ARG_STRLEN (PAGE_SIZE *
  32, the limit on the length of a single argv string), so that expr
  cannot be run.

This patch changes the validation approach to generate a much shorter
extended regular expression for any sequence of multilib options in
any order, and uses that for the validation instead.

Tested with a built for arm-none-eabi --with-multilib-list=aprofile
(as a configuration that uses MULTILIB_REUSE).

2017-06-08  Joseph Myers  <joseph@codesourcery.com>

	* genmultilib (combination_space): Remove variable.
	Validate reuse rules against regular expression for any sequence
	of multilib options in any order.

Comments

Joseph Myers June 14, 2017, 3:09 p.m. UTC | #1
Ping.  This patch 
<https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00562.html> is pending 
review.
Joseph Myers June 27, 2017, 3:47 p.m. UTC | #2
Ping^2.  This patch 
<https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00562.html> is still 
pending review.
Jeff Law June 27, 2017, 4:14 p.m. UTC | #3
On 06/08/2017 02:28 PM, Joseph Myers wrote:
> genmultilib computes combination_space, a list of all combinations of
> options in MULTILIB_OPTIONS that might have multilibs built for them
> (some of which may end up not having multilibs built for them, and
> some of those may end up being mapped to other multilibs with
> MULTILIB_REUSE).  It is then used to validate the right hand part of
> MULTILIB_REUSE rules, checking with expr that combination_space
> matches a basic regular expression derived from that right hand part.
> 
> There are two problems with this approach to validation:
> 
> * It requires that right hand part to have options in the same order
>   as in MULTILIB_OPTIONS, in contradiction to the documentation of
>   MULTILIB_REUSE saying that order does not matter there.
> 
> * combination_space can be so large that the expr call fails with an
>   E2BIG error.  I have a local ARM configuration with 40 multilibs but
>   3840 combinations of options from MULTILIB_OPTIONS (so 3839 listed
>   in combination_space, since it doesn't list the default multilib)
>   and 996 MULTILIB_REUSE rules.  This generates a combination_space
>   string longer than the Linux kernel's MAX_ARG_STRLEN (PAGE_SIZE *
>   32, the limit on the length of a single argv string), so that expr
>   cannot be run.
> 
> This patch changes the validation approach to generate a much shorter
> extended regular expression for any sequence of multilib options in
> any order, and uses that for the validation instead.
> 
> Tested with a built for arm-none-eabi --with-multilib-list=aprofile
> (as a configuration that uses MULTILIB_REUSE).
> 
> 2017-06-08  Joseph Myers  <joseph@codesourcery.com>
> 
> 	* genmultilib (combination_space): Remove variable.
> 	Validate reuse rules against regular expression for any sequence
> 	of multilib options in any order.
Going to trust you on this :-)  regexps are far from my sweet spot.


jeff
Christophe Lyon June 28, 2017, 8:03 a.m. UTC | #4
Hi Joseph,

On 8 June 2017 at 22:28, Joseph Myers <joseph@codesourcery.com> wrote:
> genmultilib computes combination_space, a list of all combinations of
> options in MULTILIB_OPTIONS that might have multilibs built for them
> (some of which may end up not having multilibs built for them, and
> some of those may end up being mapped to other multilibs with
> MULTILIB_REUSE).  It is then used to validate the right hand part of
> MULTILIB_REUSE rules, checking with expr that combination_space
> matches a basic regular expression derived from that right hand part.
>
> There are two problems with this approach to validation:
>
> * It requires that right hand part to have options in the same order
>   as in MULTILIB_OPTIONS, in contradiction to the documentation of
>   MULTILIB_REUSE saying that order does not matter there.
>
> * combination_space can be so large that the expr call fails with an
>   E2BIG error.  I have a local ARM configuration with 40 multilibs but
>   3840 combinations of options from MULTILIB_OPTIONS (so 3839 listed
>   in combination_space, since it doesn't list the default multilib)
>   and 996 MULTILIB_REUSE rules.  This generates a combination_space
>   string longer than the Linux kernel's MAX_ARG_STRLEN (PAGE_SIZE *
>   32, the limit on the length of a single argv string), so that expr
>   cannot be run.
>
> This patch changes the validation approach to generate a much shorter
> extended regular expression for any sequence of multilib options in
> any order, and uses that for the validation instead.
>
> Tested with a built for arm-none-eabi --with-multilib-list=aprofile
> (as a configuration that uses MULTILIB_REUSE).
>
> 2017-06-08  Joseph Myers  <joseph@codesourcery.com>
>
>         * genmultilib (combination_space): Remove variable.
>         Validate reuse rules against regular expression for any sequence
>         of multilib options in any order.
>
> Index: gcc/genmultilib
> ===================================================================
> --- gcc/genmultilib     (revision 249028)
> +++ gcc/genmultilib     (working copy)
> @@ -186,8 +186,7 @@
>  EOF
>  chmod +x tmpmultilib
>
> -combination_space=`initial=/ ./tmpmultilib ${options}`
> -combinations="$combination_space"
> +combinations=`initial=/ ./tmpmultilib ${options}`
>
>  # If there exceptions, weed them out now
>  if [ -n "${exceptions}" ]; then
> @@ -460,6 +459,15 @@
>  echo "NULL"
>  echo "};"
>
> +# Generate a regular expression to validate option combinations.
> +options_re=
> +for set in ${options}; do
> +  for opt in `echo ${set} | sed -e 's_[/|]_ _g'`; do
> +    options_re="${options_re}${options_re:+|}${opt}"
> +  done
> +done
> +options_re="^/((${options_re})/)*\$"
> +
>  # Output rules used for multilib reuse.
>  echo ""
>  echo "static const char *const multilib_reuse_raw[] = {"
> @@ -473,7 +481,7 @@
>    # in this variable, it means no multilib will be built for current reuse
>    # rule.  Thus the reuse purpose specified by current rule is meaningless.
>    if expr "${combinations} " : ".*/${combo}/.*" > /dev/null; then
> -    if expr "${combination_space} " : ".*/${copts}/.*" > /dev/null; then
> +    if echo "/${copts}/" | grep -E "${options_re}" > /dev/null; then
>        combo="/${combo}/"
>        dirout=`./tmpmultilib3 "${combo}" "${todirnames}" "${toosdirnames}" "${enable_multilib}"`
>        copts="/${copts}/"
>

This broke my builds, where I do not use
--with-multilib-list=aprofile, and uses the default.
I suspect it would also fail for the aprofile multilibs, though.

I think there's a problem with options that have a '+' which confuses
the regexp in options_re.
For instance, I get this error message:
The rule mthumb=mthumb/mfpu.auto/march.armv5te+fp contains an option
absent from MULTILIB_OPTIONS.

A bit of manual debugging led me to:
$ echo /mthumb/mfpu=auto/march=armv5te+fp/ | grep -E
'^/((marm|mthumb|mfpu=auto|march=armv5te+fp|march=armv7+fp|mfloat-abi=hard)/)*$'
[empty result]

Replacing the '+' in armv5te+fp with either '\+' or '.' allows the
pattern to match:
/mthumb/mfpu=auto/march=armv5te+fp/

Is it a matter of adding sed -e 's/\+/./g' when building options_re ?
Or would this break something else?

Thanks,

Christophe

> --
> Joseph S. Myers
> joseph@codesourcery.com
diff mbox

Patch

Index: gcc/genmultilib
===================================================================
--- gcc/genmultilib	(revision 249028)
+++ gcc/genmultilib	(working copy)
@@ -186,8 +186,7 @@ 
 EOF
 chmod +x tmpmultilib
 
-combination_space=`initial=/ ./tmpmultilib ${options}`
-combinations="$combination_space"
+combinations=`initial=/ ./tmpmultilib ${options}`
 
 # If there exceptions, weed them out now
 if [ -n "${exceptions}" ]; then
@@ -460,6 +459,15 @@ 
 echo "NULL"
 echo "};"
 
+# Generate a regular expression to validate option combinations.
+options_re=
+for set in ${options}; do
+  for opt in `echo ${set} | sed -e 's_[/|]_ _g'`; do
+    options_re="${options_re}${options_re:+|}${opt}"
+  done
+done
+options_re="^/((${options_re})/)*\$"
+
 # Output rules used for multilib reuse.
 echo ""
 echo "static const char *const multilib_reuse_raw[] = {"
@@ -473,7 +481,7 @@ 
   # in this variable, it means no multilib will be built for current reuse
   # rule.  Thus the reuse purpose specified by current rule is meaningless.
   if expr "${combinations} " : ".*/${combo}/.*" > /dev/null; then
-    if expr "${combination_space} " : ".*/${copts}/.*" > /dev/null; then
+    if echo "/${copts}/" | grep -E "${options_re}" > /dev/null; then
       combo="/${combo}/"
       dirout=`./tmpmultilib3 "${combo}" "${todirnames}" "${toosdirnames}" "${enable_multilib}"`
       copts="/${copts}/"