diff mbox series

[v3,1/9] tests: introduce tree-wide code style checking

Message ID 20220707163720.1421716-2-berrange@redhat.com
State New
Headers show
Series tests: introduce a tree-wide code style checking facility | expand

Commit Message

Daniel P. Berrangé July 7, 2022, 4:37 p.m. UTC
Historically QEMU has used the 'scripts/checkpatch.pl' script to
validate various style rules but there are a number of issues:

 - Contributors / maintainers are reluctant to add new
   tests to it, nor fix existint rules, because the Perl
   code is much too hard to understand for most people.

 - It requires the contributor to remember to run it as it
   is not wired into 'make check'

 - While it can be told to check whole files, in practice
   it is usually only used to check patch diffs, because the
   former would flag up pages of pre-existing violations that
   have never been fixed

 - It is said to be OK to ignore some things reported by the
   script, but don't record these exceptional cases anywere.
   Thus contributors looking at existing violations in tree
   are never sure whether they are intentional or historical
   unfixed technical debt.

 - There is no distinct reporting for each type of check
   performed and as a consequence there is also no way to
   mark particular files to be skipped for particular checks

This commit aims to give us a better approach to checking many
types of code style problems by introducing a flexible and simple
way to define whole tree style checks.

The logic provide is inspired by the GNULIB 'top/maint.mk' file,
but has been re-implemented in a simple Python script, using a
YAML config file, in an attempt to make it easier to understand
than the make rules.

This commit does the bare minimum introducing the basic infra:

 - tests/style.py - the script for evaluating coding style rules
 - tests/style.yml - the config defining the coding style rules

The concept behind the style checking is to perform simple regular
expression matches across the source file content.

The key benefit of regular expression matching is that it is very
fast, and can match against all types of source file in the repository,
regardless of whether it is used in the current build, or the language
the source is written in.

The downside, compared to a compiler based approach (eg libclang) is
that it does not have semantic understanding of the code, which makes
some checks difficult to implement.

As such this style matching framework is not proposed as a solution for
all possible coding style rules. It is general enough that it can
accomplish many useful checks, and is intended to be complimentary to
any style checkers with semantic knowledge of the code like libclang,
or pylint/flake8.

It would be possible to use Python's regular expression engine to
perform this matching directly, it instead calls out to 'grep' (for
single line matches) and 'perl' (for multiline matches). These are
highly optimized regex engines, so should maximise performance. They
also avoid problems with python's charset encoding causing it to
throw exceptions when encountering invalid utf8, rather than continue
on a best effort basis.

In terms of defining checks, a short bit of yaml is all that is
required. For example, consider we want to stop people using the
'bool' type entirely in C source files. A rule in tests/style.yml
would say

  prohibit_bool:
	files: \.c$
	prohibit: \bbool\b
	message: do not use the bool type

The 'prohibit' rule is matched line-wise across every .c source
file. If any violation is found, the contents of that line are
printed, and 'message' is shown as a error message.

There are many more advanced options, which are documented in
comments in the style.yml file in this commit.

The tool can be invoked directly

   ./tests/style.py --config test/style.yml check

Or for individual checks

   ./tests/style.py --config test/style.yml check --rule prohibit_bool

If a file is known to intentionally violate a style check rule
this can be recorded in the style.yml and will result in it being
ignored.  The '--ignored' flag can be used to see ignored failures.

This is all wired up into meson, such that a 'style' test suite is
defined and each individual style check is exposed as a test case.

This results in creation of a 'make check-style' target that is
triggerd by 'make check' by default.

Note that the checks require the use of 'git' to detect the list of
source files to search. Thus the check is skipped when not running
from a git repository.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 meson.build            |   2 +
 tests/Makefile.include |   3 +-
 tests/meson.build      |  17 ++++
 tests/style.py         | 218 +++++++++++++++++++++++++++++++++++++++++
 tests/style.yml        |  88 +++++++++++++++++
 5 files changed, 327 insertions(+), 1 deletion(-)
 create mode 100755 tests/style.py
 create mode 100644 tests/style.yml

Comments

Peter Maydell July 20, 2022, 4:25 p.m. UTC | #1
On Thu, 7 Jul 2022 at 17:37, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> Historically QEMU has used the 'scripts/checkpatch.pl' script to
> validate various style rules but there are a number of issues:
>
>  - Contributors / maintainers are reluctant to add new
>    tests to it, nor fix existint rules, because the Perl
>    code is much too hard to understand for most people.
>
>  - It requires the contributor to remember to run it as it
>    is not wired into 'make check'
>
>  - While it can be told to check whole files, in practice
>    it is usually only used to check patch diffs, because the
>    former would flag up pages of pre-existing violations that
>    have never been fixed
>
>  - It is said to be OK to ignore some things reported by the
>    script, but don't record these exceptional cases anywere.
>    Thus contributors looking at existing violations in tree
>    are never sure whether they are intentional or historical
>    unfixed technical debt.
>
>  - There is no distinct reporting for each type of check
>    performed and as a consequence there is also no way to
>    mark particular files to be skipped for particular checks
>
> This commit aims to give us a better approach to checking many
> types of code style problems by introducing a flexible and simple
> way to define whole tree style checks.

Hi; thanks for doing this rewrite into Python. I think it
is definitely easier to understand.

> The logic provide is inspired by the GNULIB 'top/maint.mk' file,
> but has been re-implemented in a simple Python script, using a
> YAML config file, in an attempt to make it easier to understand
> than the make rules.
>
> This commit does the bare minimum introducing the basic infra:
>
>  - tests/style.py - the script for evaluating coding style rules
>  - tests/style.yml - the config defining the coding style rules
>
> The concept behind the style checking is to perform simple regular
> expression matches across the source file content.

> As such this style matching framework is not proposed as a solution for
> all possible coding style rules. It is general enough that it can
> accomplish many useful checks, and is intended to be complimentary to
> any style checkers with semantic knowledge of the code like libclang,
> or pylint/flake8.

So would the intention be that we try to obsolete checkpatch,
or will we still have checkpatch because it can find some
style issues that this framework cannot handle?

I think that on balance I'm in favour of this patchseries if
it is part of a path where we say "we are going to drop
checkpatch and replace it with X, Y and Z" (and we actually
implement that path and don't just end up with another
half-completed transition :-)). I'm much less in favour if
it's just "we added yet another thing to the pile"...

> If a file is known to intentionally violate a style check rule
> this can be recorded in the style.yml and will result in it being
> ignored.  The '--ignored' flag can be used to see ignored failures.

Is it possible to have an individual "suppress this style check
in this one place" mechanism? Dropping an entire file from the
style check is certainly going to be useful for some situations,
but very often I would expect there might be one place in a
multi-thousand line C file where we want to violate a style
rule and it would be nice not to lose the coverage on all the
rest of the file as a result. Plus a whole-file suppression that
lives somewhere other than the source file is going to tend to
hang around for ages after we refactor/delete whatever bit of
source code it was that meant we needed the suppression, whereas
if the suppression is in the source file itself then you see it
when you're working on that bit of code.

(All comments below here are just nits.)

>  meson.build            |   2 +
>  tests/Makefile.include |   3 +-
>  tests/meson.build      |  17 ++++
>  tests/style.py         | 218 +++++++++++++++++++++++++++++++++++++++++
>  tests/style.yml        |  88 +++++++++++++++++

I think this should live in scripts/, same as checkpatch.

>  5 files changed, 327 insertions(+), 1 deletion(-)
>  create mode 100755 tests/style.py
>  create mode 100644 tests/style.yml
>
> diff --git a/meson.build b/meson.build
> index 65a885ea69..d8ef24bacb 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -18,6 +18,8 @@ config_host = keyval.load(meson.current_build_dir() / 'config-host.mak')
>  enable_modules = 'CONFIG_MODULES' in config_host
>  enable_static = 'CONFIG_STATIC' in config_host
>
> +in_gitrepo = run_command('test', '-d', '.git', check: false).returncode() == 0

Should we use Meson's fs.is_dir() rather than running a shell?
(https://mesonbuild.com/Fs-module.html)


I spotted a couple of typos just while scrolling through, but I
have not attempted to actually review the Python.

> +# Expand a regular expression from the config file which
> +# can be in several formats
> +#
> +#  - a plain string - used as-is as a regular expression
> +#  - a list of strings - each element is joined with '|'
> +#  - a dict containing
> +#      - 'terms' - interpreted as a string / list of strings
> +#      - 'prefix' - added to the front of the regular
> +#      - 'prefix' - added to the end of the regular
> +#
> +# Returns: a regulare expression string

"regular"

> +# Take a list of source files and filter it returning a subset
> +#
> +# If @match is non-NULL, it is expanded as a regular expression
> +# and the source file name is included if-and-only-if it matches
> +# the regex.
> +#
> +# If @nonmatch is non-NULL, it is expanded as a regular expression
> +# and the source file name is excluded if-and-only-if it matches
> +# the regex.
> +#
> +# Returns: the filtered list of soruces

"sources"

> diff --git a/tests/style.yml b/tests/style.yml
> new file mode 100644
> index 0000000000..b4e7c6111f
> --- /dev/null
> +++ b/tests/style.yml
> @@ -0,0 +1,88 @@
> +# Source code style checking rules
> +#
> +# Each top level key defines a new check, that is
> +# exposed as a test case in the meson 'style' test
> +# suite.
> +#

You should say somewhere here which of the half a dozen
possible regular expression syntaxes is used.

> +# Within each check, the following keys are valid

Missing trailing colon:

> +#
> +#  * files
> +#
> +#    A regular expression matching filenames that
> +#    are to be checked. Typically used to filter
> +#    based on file extension. If omitted all files
> +#    managed by git will be checked.
> +#
> +#  * prohibit
> +#
> +#    A regular expression matching content that is
> +#    not allowed to be present in source files. Matches
> +#    against single lines of text, unless 'multiline'
> +#    option overrides. Either this option or 'require'
> +#    must be present

Missing trailing '.' (here and in various others below)

> +#  * enabled
> +#
> +#    A boolean providing a way to temporarily disable
> +#    a check. Defaults to 'true' if omitted.
> +#
> +# For all the keys above which accept a regular expression,
> +# one of three syntaxes are permitted

trailing colon

> +#
> +#  * string
> +#
> +#    The full regular expression to match
> +#
> +#  * list of strings
> +#
> +#    Each element of the list will be combined with '|'
> +#    to form the final regular expression. This is typically
> +#    useful to keep line length short when specifying matches
> +#    across many filenames
> +#
> +#  * dict
> +#
> +#    Contains the keys:
> +#
> +#      * terms
> +#
> +#        Either a string or list of strings interpreted as above
> +#
> +#      * prefix
> +#
> +#        A match added to the front of the regex. Useful when
> +#        'terms' is a list of strings and a common prefix is
> +#        desired
> +#
> +#      * suffix
> +#
> +#        A match added to the front of the regex. Useful when
> +#        'terms' is a list of strings and a common prefix is
> +#        desired

thanks
-- PMM
Daniel P. Berrangé July 20, 2022, 4:31 p.m. UTC | #2
On Wed, Jul 20, 2022 at 05:25:00PM +0100, Peter Maydell wrote:
> On Thu, 7 Jul 2022 at 17:37, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > The logic provide is inspired by the GNULIB 'top/maint.mk' file,
> > but has been re-implemented in a simple Python script, using a
> > YAML config file, in an attempt to make it easier to understand
> > than the make rules.
> >
> > This commit does the bare minimum introducing the basic infra:
> >
> >  - tests/style.py - the script for evaluating coding style rules
> >  - tests/style.yml - the config defining the coding style rules
> >
> > The concept behind the style checking is to perform simple regular
> > expression matches across the source file content.
> 
> > As such this style matching framework is not proposed as a solution for
> > all possible coding style rules. It is general enough that it can
> > accomplish many useful checks, and is intended to be complimentary to
> > any style checkers with semantic knowledge of the code like libclang,
> > or pylint/flake8.
> 
> So would the intention be that we try to obsolete checkpatch,
> or will we still have checkpatch because it can find some
> style issues that this framework cannot handle?
> 
> I think that on balance I'm in favour of this patchseries if
> it is part of a path where we say "we are going to drop
> checkpatch and replace it with X, Y and Z" (and we actually
> implement that path and don't just end up with another
> half-completed transition :-)). I'm much less in favour if
> it's just "we added yet another thing to the pile"...

I would certainly like to see us eventually remove
checkpatch.pl because of the various downsides it has.

The caveat is that I've not actually looked in any detail
at what things checkpatch.pl actually checks for. Without
looking my guess-timate is that we could probably replace
90% of it without much trouble. Perhaps we'll just decide
some of the checkjs in checkpatch aren't worth the burden
of maintaining its usage.

> > If a file is known to intentionally violate a style check rule
> > this can be recorded in the style.yml and will result in it being
> > ignored.  The '--ignored' flag can be used to see ignored failures.
> 
> Is it possible to have an individual "suppress this style check
> in this one place" mechanism? Dropping an entire file from the
> style check is certainly going to be useful for some situations,
> but very often I would expect there might be one place in a
> multi-thousand line C file where we want to violate a style
> rule and it would be nice not to lose the coverage on all the
> rest of the file as a result. Plus a whole-file suppression that
> lives somewhere other than the source file is going to tend to
> hang around for ages after we refactor/delete whatever bit of
> source code it was that meant we needed the suppression, whereas
> if the suppression is in the source file itself then you see it
> when you're working on that bit of code.

We could possibly come up with a way to put a magic comment
on the end of a line (eg '// style:ignore' ), and have it applied
automatically as an exclusion. A magic comment the line before
is hard though, given that we're aiming to match things linewise
for speed.

> >  meson.build            |   2 +
> >  tests/Makefile.include |   3 +-
> >  tests/meson.build      |  17 ++++
> >  tests/style.py         | 218 +++++++++++++++++++++++++++++++++++++++++
> >  tests/style.yml        |  88 +++++++++++++++++
> 
> I think this should live in scripts/, same as checkpatch.

Sure, I don't mind.

> > diff --git a/meson.build b/meson.build
> > index 65a885ea69..d8ef24bacb 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -18,6 +18,8 @@ config_host = keyval.load(meson.current_build_dir() / 'config-host.mak')
> >  enable_modules = 'CONFIG_MODULES' in config_host
> >  enable_static = 'CONFIG_STATIC' in config_host
> >
> > +in_gitrepo = run_command('test', '-d', '.git', check: false).returncode() == 0
> 
> Should we use Meson's fs.is_dir() rather than running a shell?
> (https://mesonbuild.com/Fs-module.html)

Will investigate 


> > diff --git a/tests/style.yml b/tests/style.yml
> > new file mode 100644
> > index 0000000000..b4e7c6111f
> > --- /dev/null
> > +++ b/tests/style.yml
> > @@ -0,0 +1,88 @@
> > +# Source code style checking rules
> > +#
> > +# Each top level key defines a new check, that is
> > +# exposed as a test case in the meson 'style' test
> > +# suite.
> > +#
> 
> You should say somewhere here which of the half a dozen
> possible regular expression syntaxes is used.

ok


With regards,
Daniel
Eric Blake July 20, 2022, 5:08 p.m. UTC | #3
On Wed, Jul 20, 2022 at 05:31:52PM +0100, Daniel P. Berrangé wrote:
> > > diff --git a/meson.build b/meson.build
> > > index 65a885ea69..d8ef24bacb 100644
> > > --- a/meson.build
> > > +++ b/meson.build
> > > @@ -18,6 +18,8 @@ config_host = keyval.load(meson.current_build_dir() / 'config-host.mak')
> > >  enable_modules = 'CONFIG_MODULES' in config_host
> > >  enable_static = 'CONFIG_STATIC' in config_host
> > >
> > > +in_gitrepo = run_command('test', '-d', '.git', check: false).returncode() == 0
> > 
> > Should we use Meson's fs.is_dir() rather than running a shell?
> > (https://mesonbuild.com/Fs-module.html)
> 
> Will investigate

Probably not a good idea as-is; .git need not be a directory, but can
also be a symlink.  So 'test -e .git' is the better check (see
scripts/qemu-version.sh); but if you can write an existence check in
meson (instead of a directory check), then go for it.
Peter Maydell July 20, 2022, 5:38 p.m. UTC | #4
On Wed, 20 Jul 2022 at 18:08, Eric Blake <eblake@redhat.com> wrote:
>
> On Wed, Jul 20, 2022 at 05:31:52PM +0100, Daniel P. Berrangé wrote:
> > > > diff --git a/meson.build b/meson.build
> > > > index 65a885ea69..d8ef24bacb 100644
> > > > --- a/meson.build
> > > > +++ b/meson.build
> > > > @@ -18,6 +18,8 @@ config_host = keyval.load(meson.current_build_dir() / 'config-host.mak')
> > > >  enable_modules = 'CONFIG_MODULES' in config_host
> > > >  enable_static = 'CONFIG_STATIC' in config_host
> > > >
> > > > +in_gitrepo = run_command('test', '-d', '.git', check: false).returncode() == 0
> > >
> > > Should we use Meson's fs.is_dir() rather than running a shell?
> > > (https://mesonbuild.com/Fs-module.html)
> >
> > Will investigate
>
> Probably not a good idea as-is; .git need not be a directory, but can
> also be a symlink.  So 'test -e .git' is the better check (see
> scripts/qemu-version.sh); but if you can write an existence check in
> meson (instead of a directory check), then go for it.

There is a fs.exists(), yes. I just suggested .is_dir() because
the code as written here is doing a -d check, not a -e check.

-- PMM
Eric Blake July 21, 2022, 2:54 p.m. UTC | #5
On Thu, Jul 07, 2022 at 05:37:12PM +0100, Daniel P. Berrangé wrote:
> Historically QEMU has used the 'scripts/checkpatch.pl' script to
> validate various style rules but there are a number of issues:
> 
>  - Contributors / maintainers are reluctant to add new
>    tests to it, nor fix existint rules, because the Perl

existing

> 
> The 'prohibit' rule is matched line-wise across every .c source
> file. If any violation is found, the contents of that line are
> printed, and 'message' is shown as a error message.

Can we add an exception regex as well: the prohibit rule is ignored on
lines that also match the exception rule, allowing us to write a rule
that would recognize magic comments on lines where we intentionally
break the normal rule?

> +++ b/tests/style.py
> @@ -0,0 +1,218 @@

> +
> +# Expand a regular expression from the config file which
> +# can be in several formats
> +#
> +#  - a plain string - used as-is as a regular expression
> +#  - a list of strings - each element is joined with '|'
> +#  - a dict containing
> +#      - 'terms' - interpreted as a string / list of strings
> +#      - 'prefix' - added to the front of the regular
> +#      - 'prefix' - added to the end of the regular

'suffix'

> +
> +# Evalate the rule against the designated sources
> +#
> +# Returns: 1 if the rule failed against one or more sources, 0 otherwise
> +def evaluate(sources, name, rule, ignored=False):

Rather large, but looks like a nice translation of much of gnulib's
maint.mk rule engine.

> +
> +    if len(proc.stdout) > 0:
> +        print("\033[31;1mFAIL\033[0m ❌ (%0.2f secs)" % delta)
> +        print(proc.stdout.strip())
> +        print("\033[31;1mERROR\033[0m: %s: %s ❌" % (name, rule["message"]))
> +        return 1
> +    else:
> +        print("\033[32;1mPASS\033[0m ✅ (%0.2f secs)" % delta)
> +        return 0

Do we need to make the colorization dependent on whether output is a
terminal or a specific flag is in use?

> +++ b/tests/style.yml
> @@ -0,0 +1,88 @@
> +# Source code style checking rules
> +#
> +# Each top level key defines a new check, that is
> +# exposed as a test case in the meson 'style' test
> +# suite.
> +#
> +# Within each check, the following keys are valid
> +#
> +#  * files
> +#
> +#    A regular expression matching filenames that
> +#    are to be checked. Typically used to filter
> +#    based on file extension. If omitted all files
> +#    managed by git will be checked.
> +#
> +#  * prohibit
> +#
> +#    A regular expression matching content that is
> +#    not allowed to be present in source files. Matches
> +#    against single lines of text, unless 'multiline'
> +#    option overrides. Either this option or 'require'
> +#    must be present
> +#
> +#  * require
> +#
> +#    A regular expression matching content that must
> +#    always be present in source files. Matches against
> +#    single lines of text, unless 'multiline' option
> +#    overrides. Either this option of 'prohibit' must
> +#    be present
> +#
> +#  * multiline
> +#
> +#    A boolean controlling whether 'prohibit' and 'require'
> +#    regular expressions match single lines or the entire
> +#    file contents. Defaults to 'false', matching single
> +#    lines at a time.
> +#
> +#  * ignore
> +#
> +#    A regular expression matching files to exclude from
> +#    the check. This is typically used when certain files
> +#    otherwise checked have known acceptable violations

s/have/that have/

> +#    of the test.
> +#
> +#  * message
> +#
> +#    A string providing a message to emit when the test
> +#    condition fails. Must be present
> +#
> +#  * enabled
> +#
> +#    A boolean providing a way to temporarily disable
> +#    a check. Defaults to 'true' if omitted.
> +#
> +# For all the keys above which accept a regular expression,
> +# one of three syntaxes are permitted
> +#
> +#  * string
> +#
> +#    The full regular expression to match
> +#
> +#  * list of strings
> +#
> +#    Each element of the list will be combined with '|'
> +#    to form the final regular expression. This is typically
> +#    useful to keep line length short when specifying matches
> +#    across many filenames
> +#
> +#  * dict
> +#
> +#    Contains the keys:
> +#
> +#      * terms
> +#
> +#        Either a string or list of strings interpreted as above
> +#
> +#      * prefix
> +#
> +#        A match added to the front of the regex. Useful when
> +#        'terms' is a list of strings and a common prefix is
> +#        desired
> +#
> +#      * suffix
> +#
> +#        A match added to the front of the regex. Useful when

s/front/end/

> +#        'terms' is a list of strings and a common prefix is

s/prefix/suffix/

> +#        desired
> -- 
> 2.36.1
>
Peter Maydell July 25, 2022, 3:25 p.m. UTC | #6
On Wed, 20 Jul 2022 at 17:32, Daniel P. Berrangé <berrange@redhat.com> wrote:
> I would certainly like to see us eventually remove
> checkpatch.pl because of the various downsides it has.
>
> The caveat is that I've not actually looked in any detail
> at what things checkpatch.pl actually checks for. Without
> looking my guess-timate is that we could probably replace
> 90% of it without much trouble. Perhaps we'll just decide
> some of the checkjs in checkpatch aren't worth the burden
> of maintaining its usage.

I went through checkpatch, and here are the warnings I think
are worth making sure we still have. I've included all the
ones that look like we've added them specifically for QEMU
on the basis that if we cared enough to edit checkpatch to
add them then we ought to care enough to retain them.

* "Do not add expected files together with tests,
   follow instructions in tests/qtest/bios-tables-test.c"
* "do not set execute permissions for source files"
* "please use python3 interpreter"
* "Author email address is mangled by the mailing list"
* syntax checks on Signed-off-by lines
* "does MAINTAINERS need updating?"
* "Invalid UTF-8"
* "trailing whitespace"
* "DOS line endings" (maybe)
* "Don't use '#' flag of printf format ('%#') in trace-events"
* "Hex numbers must be prefixed with '0x' [in trace-events]"
* line-length checks (though for a "whole codebase must pass"
  test we would want to set the length longer than the current
  "author should consider whether to wrap" value
* hard coded tabs
* the various dodgy-indentation checks
* the various space-required checks eg around operators
* misformatted block comments
* "do not use C99 // comments"
* "do not initialise globals/statics to 0 or NULL"
* "do not use assignment in if condition"
* "braces {} are necessary for all arms of this statement"
* "braces {} are necessary even for single statement blocks"
* "Use of volatile is usually wrong, please add a comment"
* "g_free(NULL) is safe this check is probably not required"
* "memory barrier without comment"
* "unnecessary cast may hide bugs, use g_new/g_renew instead"
* "consider using g_path_get_$1() in preference to g_strdup($1())"
* "use g_memdup2() instead of unsafe g_memdup()"
* "consider using qemu_$1 in preference to $1" (strto* etc)
* "use sigaction to establish signal handlers; signal is not portable"
* "please use block_init(), type_init() etc. instead of module_init()"
* "initializer for struct $1 should normally be const"
* "Error messages should not contain newlines"
* "use ctz32() instead of ffs()"
* ditto, ffsl, ffsll, bzero, getpagesize, _SC_PAGESIZE
* "Use g_assert or g_assert_not_reached" [instead of non-exiting glib asserts]

These seem to me to fall into three categories:

(1) many are easy enough to do with grep
(2) there are some checks we really do want to do on the patch,
not on the codebase (most obviously things like Signed-off-by:
format checks)
(3) there are coding style checks that do need to have at least some
idea of C syntax, to check brace placement, whitespace, etc

thanks
-- PMM
diff mbox series

Patch

diff --git a/meson.build b/meson.build
index 65a885ea69..d8ef24bacb 100644
--- a/meson.build
+++ b/meson.build
@@ -18,6 +18,8 @@  config_host = keyval.load(meson.current_build_dir() / 'config-host.mak')
 enable_modules = 'CONFIG_MODULES' in config_host
 enable_static = 'CONFIG_STATIC' in config_host
 
+in_gitrepo = run_command('test', '-d', '.git', check: false).returncode() == 0
+
 # Allow both shared and static libraries unless --enable-static
 static_kwargs = enable_static ? {'static': true} : {}
 
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 3accb83b13..f7c1d2644e 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -3,12 +3,13 @@ 
 .PHONY: check-help
 check-help:
 	@echo "Regression testing targets:"
-	@echo " $(MAKE) check                  Run block, qapi-schema, unit, softfloat, qtest and decodetree tests"
+	@echo " $(MAKE) check                  Run block, qapi-schema, unit, style, softfloat, qtest and decodetree tests"
 	@echo " $(MAKE) bench                  Run speed tests"
 	@echo
 	@echo "Individual test suites:"
 	@echo " $(MAKE) check-qtest-TARGET     Run qtest tests for given target"
 	@echo " $(MAKE) check-qtest            Run qtest tests"
+	@echo " $(MAKE) check-style            Run style checks"
 	@echo " $(MAKE) check-unit             Run qobject tests"
 	@echo " $(MAKE) check-qapi-schema      Run QAPI schema tests"
 	@echo " $(MAKE) check-block            Run block tests"
diff --git a/tests/meson.build b/tests/meson.build
index 8e318ec513..f3140428c3 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -89,6 +89,23 @@  if get_option('tcg').allowed()
   endif
 endif
 
+if in_gitrepo
+  stylecmd = files('style.py')
+  stylecfg = files('style.yml')
+
+  checks = run_command(
+      stylecmd, '--config', stylecfg, 'list',
+      check: true)
+
+  foreach check: checks.stdout().strip().split()
+     test(check,
+          stylecmd,
+          args: [ '--config', stylecfg, 'check', '--rule', check ],
+          workdir: meson.project_source_root(),
+          suite: 'style')
+  endforeach
+endif
+
 subdir('unit')
 subdir('qapi-schema')
 subdir('qtest')
diff --git a/tests/style.py b/tests/style.py
new file mode 100755
index 0000000000..a6c05bbb32
--- /dev/null
+++ b/tests/style.py
@@ -0,0 +1,218 @@ 
+#!/usr/bin/python
+
+import argparse
+import re
+import subprocess
+import sys
+import time
+import yaml
+
+
+def source_files():
+    src = subprocess.check_output(
+        ["git", "ls-tree", "--name-only", "-r", "HEAD:"])
+
+    return src.decode("utf8").strip().split("\n")
+
+
+# Expand a regular expression from the config file which
+# can be in several formats
+#
+#  - a plain string - used as-is as a regular expression
+#  - a list of strings - each element is joined with '|'
+#  - a dict containing
+#      - 'terms' - interpreted as a string / list of strings
+#      - 'prefix' - added to the front of the regular
+#      - 'prefix' - added to the end of the regular
+#
+# Returns: a regulare expression string
+def expand_re(restr):
+    if restr is None:
+        return None
+
+    if type(restr) == list:
+        return "|".join(restr)
+
+    if type(restr) == dict:
+        terms = "(?:" + expand_re(restr["terms"]) + ")"
+
+        return restr.get("prefix", "") + terms + restr.get("suffix", "")
+
+    return restr
+
+
+# Expand the regular expression and then compile it
+#
+# Returns: a compiled regular expresison object for matching
+def compile_re(restr):
+    if restr is None:
+        return None
+
+    return re.compile(expand_re(restr))
+
+
+# Take a list of source files and filter it returning a subset
+#
+# If @match is non-NULL, it is expanded as a regular expression
+# and the source file name is included if-and-only-if it matches
+# the regex.
+#
+# If @nonmatch is non-NULL, it is expanded as a regular expression
+# and the source file name is excluded if-and-only-if it matches
+# the regex.
+#
+# Returns: the filtered list of soruces
+def filtered_sources(sources, match, nonmatch):
+    matchre = compile_re(match)
+    nonmatchre = compile_re(nonmatch)
+
+    filtered = []
+    for name in sources:
+        if ((matchre is None or matchre.search(name)) and
+            (nonmatchre is None or not nonmatchre.search(name))):
+            filtered.append(name)
+    return filtered
+
+
+# Sanity check the configuration of a rule
+#
+# Returns: true if the rule is valid
+def validate(name, rule):
+    if "prohibit" not in rule and "require" not in rule:
+        raise Exception("Either 'prohibit' or 'require' regex is needed")
+
+    if "prohibit" in rule and "require" in rule:
+        raise Exception("Only one of 'prohibit' or 'require' regex is needed")
+
+
+# Evalate the rule against the designated sources
+#
+# Returns: 1 if the rule failed against one or more sources, 0 otherwise
+def evaluate(sources, name, rule, ignored=False):
+    if not rule.get("enabled", True):
+        return
+
+    validate(name, rule)
+
+    ignorere = None
+    if not ignored:
+        ignorere = rule.get("ignore")
+
+    print("CHECK: %s: " % name, end='')
+    sources = filtered_sources(sources,
+                               rule.get("files"),
+                               ignorere)
+
+    input = "\n".join(sources)
+
+    then = time.time()
+
+    # For single line matching, 'grep' is most efficient,
+    # but it can't do the required multi-line matching
+    # so for the latter we turn to 'perl'
+    if not rule.get("multiline", False):
+        if "prohibit" in rule:
+            # The output is the list of lines that have invalid content
+            proc = subprocess.run(["xargs", "grep", "-nE",
+                                   expand_re(rule["prohibit"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        elif "require" in rule:
+            # The output is the list of filenames which don't have
+            # the required content
+            proc = subprocess.run(["xargs", "grep", "-LE",
+                                   expand_re(rule["require"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        else:
+            raise Exception("Unexpected rule config")
+    else:
+        if "prohibit" in rule:
+            # The output is the list of lines that have invalid content
+            proc = subprocess.run(["xargs", "perl", "-0777", "-ne",
+                                   (r'while (m,%s,gs) {' + \
+                                    r'    $n = ($` =~ tr/\n/\n/ + 1);' + \
+                                    r'    ($v = $&) =~ s/\n/\\n/g;' + \
+                                    r'    print "$ARGV:$n:$v\n";' +\
+                                    r'}') % expand_re(rule["prohibit"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        elif "require" in rule:
+            # The output is the list of filenames which don't have
+            # the required content
+            proc = subprocess.run(["xargs", "perl", "-0777", "-ne",
+                                   ("unless (m,%s,s) {" + \
+                                    "    print \"$ARGV\n\";" + \
+                                    "}") % expand_re(rule["require"])],
+                                  input=input, capture_output=True,
+                                  encoding='utf8')
+        else:
+            raise Exception("Unexpected rule config")
+
+        if proc.returncode != 0:
+            raise Exception(proc.stderr)
+
+    now = time.time()
+    delta = now - then
+
+    if len(proc.stdout) > 0:
+        print("\033[31;1mFAIL\033[0m ❌ (%0.2f secs)" % delta)
+        print(proc.stdout.strip())
+        print("\033[31;1mERROR\033[0m: %s: %s ❌" % (name, rule["message"]))
+        return 1
+    else:
+        print("\033[32;1mPASS\033[0m ✅ (%0.2f secs)" % delta)
+        return 0
+
+
+def parse_args():
+    parser = argparse.ArgumentParser("Code style checker")
+    parser.add_argument("--config",
+                        default="tests/style.yml",
+                        help="Path to style rules file")
+
+    subparsers = parser.add_subparsers(dest="command")
+    subparsers.required = True
+
+    list = subparsers.add_parser("list", help="list rules")
+
+    check = subparsers.add_parser("check", help="check rules")
+    check.add_argument("--rule",
+                       help="Name of rule to check")
+    check.add_argument("--ignored",
+                       action="store_true",
+                       help="Show intentionally ignored violations")
+
+    return parser.parse_args()
+
+
+def main():
+    args = parse_args()
+
+    sources = source_files()
+
+    with open(args.config, "r") as fh:
+        rules = yaml.safe_load(fh)
+        if rules is None:
+            rules = {}
+
+    if args.command == "list":
+        for name, rule in rules.items():
+            if rule.get("enabled", True):
+                print(name)
+    elif args.command == "check":
+        errs = 0
+        for name, rule in rules.items():
+            if args.rule == None or args.rule == name:
+                errs += evaluate(sources, name, rule, args.ignored)
+        if errs:
+            return 1
+    else:
+        raise Exception("unknown command '%s'" % args.command)
+    return 0
+
+try:
+    sys.exit(main())
+except Exception as e:
+    print("ERROR: %s: %s" % (sys.argv[0], str(e)))
+    sys.exit(2)
diff --git a/tests/style.yml b/tests/style.yml
new file mode 100644
index 0000000000..b4e7c6111f
--- /dev/null
+++ b/tests/style.yml
@@ -0,0 +1,88 @@ 
+# Source code style checking rules
+#
+# Each top level key defines a new check, that is
+# exposed as a test case in the meson 'style' test
+# suite.
+#
+# Within each check, the following keys are valid
+#
+#  * files
+#
+#    A regular expression matching filenames that
+#    are to be checked. Typically used to filter
+#    based on file extension. If omitted all files
+#    managed by git will be checked.
+#
+#  * prohibit
+#
+#    A regular expression matching content that is
+#    not allowed to be present in source files. Matches
+#    against single lines of text, unless 'multiline'
+#    option overrides. Either this option or 'require'
+#    must be present
+#
+#  * require
+#
+#    A regular expression matching content that must
+#    always be present in source files. Matches against
+#    single lines of text, unless 'multiline' option
+#    overrides. Either this option of 'prohibit' must
+#    be present
+#
+#  * multiline
+#
+#    A boolean controlling whether 'prohibit' and 'require'
+#    regular expressions match single lines or the entire
+#    file contents. Defaults to 'false', matching single
+#    lines at a time.
+#
+#  * ignore
+#
+#    A regular expression matching files to exclude from
+#    the check. This is typically used when certain files
+#    otherwise checked have known acceptable violations
+#    of the test.
+#
+#  * message
+#
+#    A string providing a message to emit when the test
+#    condition fails. Must be present
+#
+#  * enabled
+#
+#    A boolean providing a way to temporarily disable
+#    a check. Defaults to 'true' if omitted.
+#
+# For all the keys above which accept a regular expression,
+# one of three syntaxes are permitted
+#
+#  * string
+#
+#    The full regular expression to match
+#
+#  * list of strings
+#
+#    Each element of the list will be combined with '|'
+#    to form the final regular expression. This is typically
+#    useful to keep line length short when specifying matches
+#    across many filenames
+#
+#  * dict
+#
+#    Contains the keys:
+#
+#      * terms
+#
+#        Either a string or list of strings interpreted as above
+#
+#      * prefix
+#
+#        A match added to the front of the regex. Useful when
+#        'terms' is a list of strings and a common prefix is
+#        desired
+#
+#      * suffix
+#
+#        A match added to the front of the regex. Useful when
+#        'terms' is a list of strings and a common prefix is
+#        desired