[v3,1/2] scripts: Add sort-makefile-lines.py to sort Makefile variables.

Message ID 20230510115815.2464940-2-carlos@redhat.com
State New
Series Standardize the sorting of Makefile variables

Commit Message

Carlos O'Donell May 10, 2023, 11:58 a.m. UTC
The scripts/sort-makefile-lines.py script sorts Makefile variables
according to the project's expected order.

The script can be used like this:

$ scripts/sort-makefile-lines.py < elf/Makefile > elf/Makefile.tmp
$ mv elf/Makefile.tmp elf/Makefile
---
v2->v3: Use stdin/stdout.

 scripts/sort-makefile-lines.py | 160 +++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)
 create mode 100755 scripts/sort-makefile-lines.py

Comments

Alejandro Colomar May 10, 2023, 12:42 p.m. UTC | #1
Hi Carlos!


On 5/10/23 13:59, Carlos O'Donell via Libc-alpha wrote:
> v3 sent with stdin/stdout. Thanks for the review! 😄

You're welcome :-)


On 5/10/23 13:58, Carlos O'Donell via Libc-alpha wrote:
> The scripts/sort-makefile-lines.py script sorts Makefile variables
> according to project expected order.
> 
> The script can be used like this:
> 
> $ scripts/sort-makefile-lines.py < elf/Makefile > elf/Makefile.tmp
> $ mv elf/Makefile.tmp elf/Makefile
> ---
> v2->v3: Use stdin/stdout.
> 
>  scripts/sort-makefile-lines.py | 160 +++++++++++++++++++++++++++++++++
>  1 file changed, 160 insertions(+)
>  create mode 100755 scripts/sort-makefile-lines.py
> 
> diff --git a/scripts/sort-makefile-lines.py b/scripts/sort-makefile-lines.py
> new file mode 100755
> index 0000000000..fd657df970
> --- /dev/null
> +++ b/scripts/sort-makefile-lines.py
> @@ -0,0 +1,160 @@
> +#!/usr/bin/python3
> +# Sort Makefile lines as expected by project policy.
> +# Copyright (C) 2023 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +# The project consensus is to split Makefile variable assignment
> +# across multiple lines with one value per line.  The values are
> +# then sorted as described below, and terminated with a special
> +# list termination marker.  This splitting makes it much easier
> +# to add new tests to the list since they become just a single
> +# line insertion.  It also makes backports and merges easier
> +# since the new test may not conflict due to the ordering.
> +#
> +# Consensus discussion:
> +# https://inbox.sourceware.org/libc-alpha/f6406204-84f5-adb1-d00e-979ebeebbbde@redhat.com/
> +#
> +# To support cleaning up Makefiles we created this program to
> +# help sort existing lists converted to the new format.
> +#
> +# The program reads the Makefile to sort from standard input
> +# and writes the correctly sorted output to standard output
> +# (redirect to a temporary file, then move it into place).
> +#
> +# Sorting is only carried out between two special markers:
> +# (a) Marker start is '<variable> += \' (or '= \', or ':= \')
> +# (b) Marker end is '  # <variable>' (whitespace matters)
> +# With everything between (a) and (b) being sorted accordingly.
> +#
> +# You can use it like this:
> +# $ scripts/sort-makefile-lines.py < elf/Makefile > elf/Makefile.tmp
> +# $ mv elf/Makefile.tmp elf/Makefile
> +#
> +# The Makefile lines in the project are sorted using the
> +# following rules:
> +# - All lines are sorted as-if `LC_COLLATE=C sort`
> +# - Lines that have a numeric suffix and whose leading prefix
> +#   matches exactly are sorted according to the numeric suffix
> +#   in increasing numerical order.
> +#
> +# For example:
> +# ~~~
> +# tests += \
> +#   test-a \
> +#   test-b \
> +#   test-b1 \
> +#   test-b2 \
> +#   test-b10 \
> +#   test-b20 \
> +#   test-b100 \
> +#   # tests
> +# ~~~
> +# This example shows tests sorted alphabetically, followed
> +# by a numeric suffix sort in increasing numeric order.
> +#
> +# Cleanups:
> +# - Tests that end in "a" or "b" variants should be renamed to
> +#   end in just the numerical value. For example 'tst-mutex7robust'
> +#   should be renamed to 'tst-mutex12' (the highest numbered test)
> +#   or 'tst-robust11' (the highest numbered test) in order to get
> +#   reasonable ordering.
> +# - Modules that end in "mod" or "mod1" should be renamed. For
> +#   example 'tst-atfork2mod' should be renamed to 'tst-mod-atfork2'
> +#   (test module for atfork2). If there is more than one module
> +#   then they should be named with a suffix that uses [0-9] first
> +#   then [A-Z] next for a total of 36 possible modules per test.
> +#   No manually listed test currently uses more than that (though
> +#   automatically generated tests may; they don't need sorting).
> +# - Avoid including another test and instead refactor into common
> +#   code with all tests including the common code, then give the
> +#   tests unique names.
> +#
> +# If you have a Makefile that needs converting, then you can
> +# quickly split the values into one-per-line, ensure the start
> +# and end markers are in place, and then run the script to
> +# sort the values.
> +
> +import sys
> +import locale
> +import re
> +import functools
> +
> +def glibc_makefile_numeric(string1, string2):
> +    # Check if both strings have a numeric suffix.
> +    var1 = re.search(r'([0-9]+) \\$', string1)
> +    var2 = re.search(r'([0-9]+) \\$', string2)
> +    if var1 and var2:
> +        if string1[0:var1.span()[0]] == string2[0:var2.span()[0]]:
> +            # string1 and string2 both share a prefix and
> +            # have a numeric suffix that can be compared.
> +            # Sort order is based on the numeric suffix.
> +            return int(var1.group(1)) - int(var2.group(1))
> +    # Default to strcoll.
> +    return locale.strcoll(string1, string2)
> +
> +def sort_lines(lines):
> +
> +    # Use the C locale for language independent collation.
> +    locale.setlocale (locale.LC_ALL, "C")
> +
> +    # Sort using a glibc-specific sorting function.
> +    lines = sorted(lines, key=functools.cmp_to_key(glibc_makefile_numeric))

I believe you're looking for a version sort(1).

$ cat s
bar20x
foo1
bar200
bar19
foo2
bar20
bar2

$ sort -V s
bar2
bar19
bar20
bar20x
bar200
foo1
foo2


I don't know how easy it is to call sort(1) in this script, but it
would probably simplify a big part of it.  And it would also make
unnecessary the renaming of things like 'tst-mutex7robust'.

If using `sort -V` here is complex, maybe it's easier to write a
shell script.

Cheers,
Alex


> +
> +    return lines
> +
> +def sort_makefile_lines():
> +
> +    # Read the whole Makefile.
> +    lines = sys.stdin.readlines()
> +
> +    # Build a list of all start markers (tuple includes name).
> +    startmarks = []
> +    for i in range(len(lines)):
> +        # Look for things like "var = \", "var := \" or "var += \"
> +        # to start the sorted list.
> +        var = re.search(r'^([a-zA-Z0-9-]*) [\+:]?\= \\$', lines[i])
> +        if var:
> +            # Remember the index and the name.
> +            startmarks.append((i, var.group(1)))
> +
> +    # For each start marker try to find a matching end mark
> +    # and build a block that needs sorting.  The end marker
> +    # must have the matching comment name for it to be valid.
> +    rangemarks = []
> +    for sm in startmarks:
> +        # Look for things like "  # var" to end the sorted list.
> +        reg = r'^  # ' + sm[1] + r'$'
> +        for j in range(sm[0] + 1, len(lines)):
> +            if re.search(reg, lines[j]):
> +                # Remember the block to sort (inclusive).
> +                rangemarks.append((sm[0] + 1, j))
> +                break
> +
> +    # We now have a list of all ranges that need sorting.
> +    # Sort those ranges (inclusive).
> +    for r in rangemarks:
> +        lines[r[0]:r[1]] = sort_lines(lines[r[0]:r[1]])
> +
> +    # Output the whole list with sorted lines to stdout.
> +    sys.stdout.writelines(lines)
> +
> +
> +def main(argv):
> +    sort_makefile_lines ()
> +
> +if __name__ == '__main__':
> +    main(sys.argv[1:])
Carlos O'Donell May 10, 2023, 3:07 p.m. UTC | #2
On 5/10/23 08:42, Alejandro Colomar wrote:
> I believe you're looking for a version sort(1).

Yes, sort -V does work, the downside is the integration to handle the blocks
of text in the Makefile.
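
The block handling mentioned here can be sketched in a few lines of Python. This is only an illustration, not the patch's code: `find_blocks` and `sort_blocks` are hypothetical names, the variable name is passed through `re.escape` (the patch does not do this), and plain `sorted()` stands in for whatever ordering is eventually chosen.

```python
import re

# Start marker: "var = \", "var := \" or "var += \" at the start of a line.
START = re.compile(r'^([a-zA-Z0-9-]+) [+:]?= \\$')

def find_blocks(lines):
    """Return (first, end) index pairs for each marked block; end is exclusive."""
    blocks = []
    for i, line in enumerate(lines):
        m = START.search(line)
        if not m:
            continue
        # End marker: "  # var" with exactly two leading spaces.
        end_marker = re.compile(r'^  # ' + re.escape(m.group(1)) + r'$')
        for j in range(i + 1, len(lines)):
            if end_marker.search(lines[j]):
                blocks.append((i + 1, j))
                break
    return blocks

def sort_blocks(lines, key=None):
    """Sort only the lines between each start marker and its end marker."""
    out = list(lines)
    for first, end in find_blocks(out):
        out[first:end] = sorted(out[first:end], key=key)
    return out

makefile = [
    "tests += \\\n",
    "  tst-b \\\n",
    "  tst-a \\\n",
    "  # tests\n",
    "other = value\n",
]
print("".join(sort_blocks(makefile)), end="")
```

Everything outside a marked block passes through untouched, which is the part that is awkward to express as a shell pipeline around sort(1).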

> I don't know how easy it is to call sort(1) in this script, but it
> would probably simplify a big part of it.  And it would also make
> unnecessary the renaming of things like 'tst-mutex7robust'.

You don't *need* to rename any tests in the new version of the code, I took
out the error case and just sort as the new function sorts (like sort -V does).

> If using `sort -V` here is complex, maybe it's easier to write a
> shell script.

I would want to use bash with arrays to handle lines and processing the start
and end blocks. It could be possible to do it in an equivalent number of lines,
but I think the standalone python is easier to read and maintain (less shell
quoting issues with lines of text).

Any objection to the proposed python?
Alejandro Colomar May 10, 2023, 3:12 p.m. UTC | #3
Hi Carlos,

On 5/10/23 17:07, Carlos O'Donell wrote:
> On 5/10/23 08:42, Alejandro Colomar wrote:
>> I believe you're looking for a version sort(1).
> 
> Yes, sort -V does work, the downside is the integration to handle the blocks
> of text in the Makefile.

Yup, I've been thinking about it, and a pipeline seems non-obvious.
It would need some bash magic, or maybe some non-trivial awk(1) or perl(1).

> 
>> I don't know how easy it is to call sort(1) in this script, but it
>> would probably simplify a big part of it.  And it would also make
>> unnecessary the renaming of things like 'tst-mutex7robust'.
> 
> You don't *need* to rename any tests in the new version of the code, I took
> out the error case and just sort as the new function sorts (like sort -V does).
> 
>> If using `sort -V` here is complex, maybe it's easier to write a
>> shell script.
> 
> I would want to use bash with arrays to handle lines and processing the start
> and end blocks. It could be possible to do it in an equivalent number of lines,
> but I think the standalone python is easier to read and maintain (less shell
> quoting issues with lines of text).
> 
> Any objection to the proposed python?

Not really.  I was just suggesting it in case it didn't occur to you.
Maybe a small objection would be in the naming of the functions.
If it sorts as if `sort -V`, how about calling it something like
version_sort?  And maybe a comment stating it behaves as if `sort -V`?

Other than that, I'm fine with it :)

Cheers,
Alex

>
Siddhesh Poyarekar May 10, 2023, 3:49 p.m. UTC | #4
On 2023-05-10 07:58, Carlos O'Donell wrote:
> The scripts/sort-makefile-lines.py script sorts Makefile variables
> according to project expected order.
> 
> The script can be used like this:
> 
> $ scripts/sort-makefile-lines.py < elf/Makefile > elf/Makefile.tmp
> $ mv elf/Makefile.tmp elf/Makefile
> ---
> v2->v3: Use stdin/stdout.
> 
>   scripts/sort-makefile-lines.py | 160 +++++++++++++++++++++++++++++++++
>   1 file changed, 160 insertions(+)
>   create mode 100755 scripts/sort-makefile-lines.py

LGTM.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

Carlos O'Donell May 10, 2023, 5:17 p.m. UTC | #5
On 5/10/23 11:49, Siddhesh Poyarekar wrote:
> On 2023-05-10 07:58, Carlos O'Donell wrote:
>> The scripts/sort-makefile-lines.py script sorts Makefile variables
>> according to project expected order.
>>
>> The script can be used like this:
>>
>> $ scripts/sort-makefile-lines.py < elf/Makefile > elf/Makefile.tmp
>> $ mv elf/Makefile.tmp elf/Makefile
>> ---
>> v2->v3: Use stdin/stdout.
>>
>>   scripts/sort-makefile-lines.py | 160 +++++++++++++++++++++++++++++++++
>>   1 file changed, 160 insertions(+)
>>   create mode 100755 scripts/sort-makefile-lines.py
> 
> LGTM.
> 
> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

Thanks! Pushed.
 
Carlos O'Donell May 10, 2023, 5:30 p.m. UTC | #6
On 5/10/23 11:12, Alejandro Colomar wrote:
> Hi Carlos,
> 
> On 5/10/23 17:07, Carlos O'Donell wrote:
>> On 5/10/23 08:42, Alejandro Colomar wrote:
>>> I believe you're looking for a version sort(1).
>>
>> Yes, sort -V does work, the downside is the integration to handle the blocks
>> of text in the Makefile.
> 
> Yup, I've been thinking about it, and a pipeline seems non-obvious.
> It would need some bash magic, or maybe some non-trivial awk(1) or perl(1).

Exactly, I'd use bash arrays.

>>
>>> I don't know how easy it is to call sort(1) in this script, but it
>>> would probably simplify a big part of it.  And it would also make
>>> unnecessary the renaming of things like 'tst-mutex7robust'.
>>
>> You don't *need* to rename any tests in the new version of the code, I took
>> out the error case and just sort as the new function sorts (like sort -V does).
>>
>>> If using `sort -V` here is complex, maybe it's easier to write a
>>> shell script.
>>
>> I would want to use bash with arrays to handle lines and processing the start
>> and end blocks. It could be possible to do it in an equivalent number of lines,
>> but I think the standalone python is easier to read and maintain (less shell
>> quoting issues with lines of text).
>>
>> Any objection to the proposed python?
> 
> Not really.  I was just suggesting it in case it didn't occur to you.
> Maybe a small objection would be in the naming of the functions.
> If it sorts as if `sort -V`, how about calling it something like
> version_sort?  And maybe a comment stating it behaves as if `sort -V`?

It doesn't sort exactly like `sort -V` because of the dashes.

$ cat sort-dash.txt 
0
-
1
2
$ LC_COLLATE=C sort sort-dash.txt 
-
0
1
2
$ sort -V sort-dash.txt 
0
1
2
-

Consensus was to use something *like* `LC_COLLATE=C sort` (code point sort order) but
with trailing numeric sort.

The above shows that `sort -V` treats U+002D ('-') differently from the code-point sort
order and sorts it after U+0030 ('0').

Coreutils itself identifies there is no standard version sort:
https://www.gnu.org/software/coreutils/manual/html_node/Variations-in-version-sort-order.html

With the python implementation we get a stable sort that the project agrees is the way to
sort variables.

Lastly I want to avoid dependencies on things like Python's natsort, which are outside the
minimum python we use in glibc to facilitate a bootstrap.
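
The ordering described above (code-point order, with a numeric sort applied only to trailing digits that follow an identical prefix) can be sketched using only the standard library. This is an illustrative sketch, not the script's code: `suffix_cmp` and `NUM_SUFFIX` are hypothetical names, it works on bare entry names instead of Makefile lines, and ordinary string comparison stands in for `locale.strcoll` in the C locale.

```python
import functools
import re

# Hypothetical helper: the trailing digits of a bare entry name.
NUM_SUFFIX = re.compile(r'([0-9]+)$')

def suffix_cmp(a, b):
    """Return a negative, zero or positive int, as functools.cmp_to_key expects."""
    ma, mb = NUM_SUFFIX.search(a), NUM_SUFFIX.search(b)
    if ma and mb and a[:ma.start()] == b[:mb.start()]:
        # Same prefix, both numeric: compare the suffixes as integers.
        return int(ma.group(1)) - int(mb.group(1))
    # Otherwise fall back to code-point order (assumed equivalent to
    # strcoll in the C locale for ASCII names).
    return (a > b) - (a < b)

key = functools.cmp_to_key(suffix_cmp)

names = ["test-b10", "test-b2", "test-b", "test-b100",
         "test-a", "test-b1", "test-b20"]
print(sorted(names, key=key))
print(sorted(["0", "-", "1", "2"], key=key))
```

In this sketch '-' (U+002D) sorts before '0' (U+0030), matching the `LC_COLLATE=C sort` output shown above and not the `sort -V` output.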

> Other than that, I'm fine with it :)

Thank you again for looking it over. Your suggestions helped me delete code and make it
simpler overall!
Alejandro Colomar May 10, 2023, 6:33 p.m. UTC | #7
Hi Carlos,

On 5/10/23 19:30, Carlos O'Donell wrote:
[...]

>> Yup, I've been thinking about it, and a pipeline seems non-obvious.
>> It would need some bash magic, or maybe some non-trivial awk(1) or perl(1).
> 
> Exactly, I'd use bash arrays.
> 

[...]

>>> Any objection to the proposed python?
>>
>> Not really.  I was just suggesting it in case it didn't occur to you.
>> Maybe a small objection would be in the naming of the functions.
>> If it sorts as if `sort -V`, how about calling it something like
>> version_sort?  And maybe a comment stating it behaves as if `sort -V`?
> 
> It doesn't sort exactly like `sort -V` because of the dashes.
> 
> $ cat sort-dash.txt 
> 0
> -
> 1
> 2
> $ LC_COLLATE=C sort sort-dash.txt 
> -
> 0
> 1
> 2
> $ sort -V sort-dash.txt 
> 0
> 1
> 2
> -
> 
> Consensus was to use something *like* `LC_COLLATE=C sort` (code point sort order) but
> with trailing numeric sort.
> 
> The above shows that `sort -V` treats U+002D ('-') differently from the code-point sort
> order and sorts it after U+0030 ('0').
> 
> Coreutils itself identifies there is no standard version sort:
> https://www.gnu.org/software/coreutils/manual/html_node/Variations-in-version-sort-order.html
> 
> With the python implementation we get a stable sort that the project agrees is the way to
> sort variables.
> 
> Lastly I want to avoid dependencies on things like Python's natsort, which are outside the
> minimum python we use in glibc to facilitate a bootstrap.

Sounds reasonable.

> 
>> Other than that, I'm fine with it :)
> 
> Thank you again for looking it over. Your suggestions helped me delete code and make it
> simpler overall!

I'm very glad to read that.  Thank you!  :-)

Cheers,
Alex

>
Patch

diff --git a/scripts/sort-makefile-lines.py b/scripts/sort-makefile-lines.py
new file mode 100755
index 0000000000..fd657df970
--- /dev/null
+++ b/scripts/sort-makefile-lines.py
@@ -0,0 +1,160 @@ 
+#!/usr/bin/python3
+# Sort Makefile lines as expected by project policy.
+# Copyright (C) 2023 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+
+# The project consensus is to split Makefile variable assignment
+# across multiple lines with one value per line.  The values are
+# then sorted as described below, and terminated with a special
+# list termination marker.  This splitting makes it much easier
+# to add new tests to the list since they become just a single
+# line insertion.  It also makes backports and merges easier
+# since the new test may not conflict due to the ordering.
+#
+# Consensus discussion:
+# https://inbox.sourceware.org/libc-alpha/f6406204-84f5-adb1-d00e-979ebeebbbde@redhat.com/
+#
+# To support cleaning up Makefiles we created this program to
+# help sort existing lists converted to the new format.
+#
+# The program reads the Makefile to sort from stdin and writes
+# the correctly sorted output to stdout, so the result can be
+# redirected to a temporary file and then moved into place.
+#
+# Sorting is only carried out between two special markers:
+# (a) Marker start is '<variable> += \' (or '= \', or ':= \')
+# (b) Marker end is '  # <variable>' (whitespace matters)
+# With everything between (a) and (b) being sorted accordingly.
+#
+# You can use it like this:
+# $ scripts/sort-makefile-lines.py < elf/Makefile > elf/Makefile.tmp
+# $ mv elf/Makefile.tmp elf/Makefile
+#
+# The Makefile lines in the project are sorted using the
+# following rules:
+# - All lines are sorted as-if `LC_COLLATE=C sort`
+# - Lines that have a numeric suffix and whose leading prefix
+#   matches exactly are sorted according to the numeric suffix
+#   in increasing numerical order.
+#
+# For example:
+# ~~~
+# tests += \
+#   test-a \
+#   test-b \
+#   test-b1 \
+#   test-b2 \
+#   test-b10 \
+#   test-b20 \
+#   test-b100 \
+#   # tests
+# ~~~
+# This example shows tests sorted alphabetically, followed
+# by a numeric suffix sort in increasing numeric order.
+#
+# Cleanups:
+# - Tests that end in "a" or "b" variants should be renamed to
+#   end in just the numerical value. For example 'tst-mutex7robust'
+#   should be renamed to 'tst-mutex12' (the highest numbered test)
+#   or 'tst-robust11' (the highest numbered test) in order to get
+#   reasonable ordering.
+# - Modules that end in "mod" or "mod1" should be renamed. For
+#   example 'tst-atfork2mod' should be renamed to 'tst-mod-atfork2'
+#   (test module for atfork2). If there is more than one module
+#   then they should be named with a suffix that uses [0-9] first
+#   then [A-Z] next for a total of 36 possible modules per test.
+#   No manually listed test currently uses more than that (though
+#   automatically generated tests may; they don't need sorting).
+# - Avoid including another test and instead refactor into common
+#   code with all tests including the common code, then give the
+#   tests unique names.
+#
+# If you have a Makefile that needs converting, then you can
+# quickly split the values into one-per-line, ensure the start
+# and end markers are in place, and then run the script to
+# sort the values.
+
+import sys
+import locale
+import re
+import functools
+
+def glibc_makefile_numeric(string1, string2):
+    # Check whether both strings have a numeric suffix.
+    var1 = re.search(r'([0-9]+) \\$', string1)
+    var2 = re.search(r'([0-9]+) \\$', string2)
+    if var1 and var2:
+        if string1[0:var1.span()[0]] == string2[0:var2.span()[0]]:
+            # string1 and string2 both share a prefix and
+            # have a numeric suffix that can be compared.
+            # Sort order is based on the numeric suffix.
+            return int(var1.group(1)) - int(var2.group(1))
+    # Default to strcoll.
+    return locale.strcoll(string1, string2)
+
+def sort_lines(lines):
+
+    # Use the C locale for language independent collation.
+    locale.setlocale (locale.LC_ALL, "C")
+
+    # Sort using a glibc-specific sorting function.
+    lines = sorted(lines, key=functools.cmp_to_key(glibc_makefile_numeric))
+
+    return lines
+
+def sort_makefile_lines():
+
+    # Read the whole Makefile.
+    lines = sys.stdin.readlines()
+
+    # Build a list of all start markers (tuple includes name).
+    startmarks = []
+    for i in range(len(lines)):
+        # Look for things like "var = \", "var := \" or "var += \"
+        # to start the sorted list.
+        var = re.search(r'^([a-zA-Z0-9-]*) [\+:]?\= \\$', lines[i])
+        if var:
+            # Remember the index and the name.
+            startmarks.append((i, var.group(1)))
+
+    # For each start marker try to find a matching end mark
+    # and build a block that needs sorting.  The end marker
+    # must have the matching comment name for it to be valid.
+    rangemarks = []
+    for sm in startmarks:
+        # Look for things like "  # var" to end the sorted list.
+        reg = r'^  # ' + sm[1] + r'$'
+        for j in range(sm[0] + 1, len(lines)):
+            if re.search(reg, lines[j]):
+                # Remember the block to sort (inclusive).
+                rangemarks.append((sm[0] + 1, j))
+                break
+
+    # We now have a list of all ranges that need sorting.
+    # Sort those ranges (inclusive).
+    for r in rangemarks:
+        lines[r[0]:r[1]] = sort_lines(lines[r[0]:r[1]])
+
+    # Output the whole list with sorted lines to stdout.
+    sys.stdout.writelines(lines)
+
+
+def main(argv):
+    sort_makefile_lines ()
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
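
For readers who want to try the ordering rule from the header comment outside the patch, here is a minimal standalone sketch. It is not part of the submitted script. The names compare_entries and sort_block are hypothetical, and plain string comparison stands in for locale.strcoll on the assumption that the input is ASCII, where C-locale collation and code point order agree. The comparator returns a negative, zero, or positive integer, which is what functools.cmp_to_key expects from a comparison function.

```python
import functools
import re

# Matches a numeric suffix just before the trailing " \" continuation.
# The pattern is the same one used in the patch.
_SUFFIX = re.compile(r'([0-9]+) \\$')


def compare_entries(a, b):
    """Three-way compare: negative if a < b, zero if equal, positive if a > b.

    Hypothetical helper, shown only to illustrate the sorting rule.
    """
    ma = _SUFFIX.search(a)
    mb = _SUFFIX.search(b)
    if ma and mb and a[:ma.start()] == b[:mb.start()]:
        # Same prefix and both have a numeric suffix:
        # order by the integer value of the suffix.
        return int(ma.group(1)) - int(mb.group(1))
    # Otherwise fall back to code point order (assumed to be
    # equivalent to C-locale collation for ASCII input).
    return (a > b) - (a < b)


def sort_block(lines):
    """Return the lines of one variable block in project order."""
    return sorted(lines, key=functools.cmp_to_key(compare_entries))


# The example block from the header comment, deliberately shuffled.
block = [
    "  test-b100 \\\n",
    "  test-b2 \\\n",
    "  test-b \\\n",
    "  test-b20 \\\n",
    "  test-a \\\n",
    "  test-b10 \\\n",
    "  test-b1 \\\n",
]
result = sort_block(block)
names = [line.split()[0] for line in result]
```

With the shuffled input above, names ends up in the same order as the example in the script's header comment. A comparator that returns only True or False cannot report "less than" to cmp_to_key, which is why the sketch uses a subtraction instead.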