mbox series

[0/12] detect quoting and punctuation problems in diagnostics

Message ID 8ac62fe2-e4bf-0922-4947-fca9567a0561@gmail.com
Headers show
Series detect quoting and punctuation problems in diagnostics | expand

Message

Martin Sebor May 14, 2019, 9:31 p.m. UTC
Near the end of every release a bunch of problem reports are
raised for various punctuation, quoting, and spelling issues
in GCC.  In GCC 9 a total 28 such issues were submitted.
A fair number of them are discovered as new or changed
diagnostics are being translated.  A checker was developed
to help find some of these as well many others by scanning
the gcc.pot file: contrib/check-internal-format-escaping.py.
The checker automates the process of finding these issues but
it doesn't prevent them.  Being external to GCC the checker
cannot easily distinguish between message strings that are
expected to be translated and those that aren't.

To help avoid introducing the most glaring subset of these
problems as early as possible while making it possible to
differentiate messages that are expected to conform from those
that need not, the attached patch adds a simple checker to GCC:
-Wformat-diag.  My goal is to improve consistency of messages
and relieve the burden at the end of each release both on
translators and those who then have to fix the problems.

The warning detects some of the following:

  *  unquoted operators consisting of two or more characters
     (e.g., == or ?:)
  *  unquoted keywords, types, GCC built-ins and command line
     options, or identifiers with underscores
  *  unquoted apostrophes and contractions like in can't
  *  informal abbreviations (arg or reg)
  *  unbalanced paired tokens (brackets, parentheses, or quotes)
  *  duplicate or trailing spaces (it allows multiple spaces at
     the beginning of a diagnostic since that's used by the C++
     front-end as an "indentation")
  *  unintended control characters
  *  inconsistent sentence capitalization

To avoid diagnosing direct calls to pretty-printer functions
that often format parts of the same diagnostic in multiple steps,
the solution introduces the "_raw__" suffix to the __gcc_diag__
attributes that the warning then uses to avoid checking those
calls.  Functions with the raw attributes aren't checked.

I tested the patch kit as a whole in an x86_64-linux build and
fixed all the issues it pointed out except for 43 instances of
the new warning in the Go front-end that I wasn't sure how best
to handle (I will follow up on that with Ian).  I also adjusted
the affected tests.

There were just a few issues specific to the i386 back-end but they
had an impact on 9 tests.  The impact of fixing the same kinds of
issues in other back-ends is likely to be similar.  Rather than
trying to do the clean up across all supported targets I think it's
better to handle the rest of the issues over time and with the help
of back end maintainers who can more easily verify that tests still
pass.  With than in mind, I have prevented the warning from causing
bootstrap failures. Once the cleanup is complete I expect to remove
this and include the new warning in -Werror.

Since a few files in GCC "misuse" the diagnostic machinery for
debugging output leading up to internal errors (instead of
calling a pp_xxx function) I suppressed the new warning in those
files via #pragma GCC diagnostic ignored "-Wformat-diag".  If we
agree that calling pp_format or some such is the way to go I can
work replacing those calls as a subsequent step.

The subset should cover the following bug reports (some already
resolved):

90176 - diagnostics should generally contain underscore only
         inside quotes
90162 - exclamation mark in diagnostic!!!!!1111!!!!
90158 - aarch64: wrong quotation in diagnostics
90157 - aarch64: unnecessary abbreviation in diagnostic
90156 - add linter check suggesting to replace %<%s%> with %qs
90149 - diagnostics containing BIT_FIELD_REF don't conform to
         diagnostics guideline
90121 - extra space in error message
90117 - Replace %<%s%> with %qs
90011 - [9 Regression] trailing space in diagnostic
79477 - Please write code more translator-friendly (unquoted options)
89936 - wrong punctuation in tree-profile.c (%<%s%>)

Although the changes are mechanical, since I made them all by hand
I broke up the patch into a series to make it easier to review:

[PATCH 1/12] implement -Wformat-diag to detect quoting and spelling
              issues in GCC diagnostics
[PATCH 2/12] fix diagnostic quoting/spelling in ada
[PATCH 3/12] fix diagnostic quoting/spelling in Brig
[PATCH 4/12] fix diagnostic quoting/spelling in the C front-end
[PATCH 5/12] fix diagnostic quoting/spelling in c-family
[PATCH 6/12] fix diagnostic quoting/spelling in C++
[PATCH 7/12] fix diagnostic quoting/spelling in libgcc
[PATCH 8/12] fix diagnostic quoting/spelling in the middle-end
[PATCH 9/12] adjust tests to quoting/spelling diagnostics fixes
[PATCH 10/12] fix diagnostic quoting/spelling in D
[PATCH 11/12] fix diagnostic quoting/spelling issues in i386 back-end
[PATCH 12/12] fix diagnostic quoting/spelling issues in ObjC

Martin

Comments

Martin Liška May 15, 2019, 11:40 a.m. UTC | #1
On 5/14/19 11:31 PM, Martin Sebor wrote:
> Near the end of every release a bunch of problem reports are
> raised for various punctuation, quoting, and spelling issues
> in GCC.  In GCC 9 a total 28 such issues were submitted.

Hi Martin.

Great that you prepared the patchset.

> A fair number of them are discovered as new or changed
> diagnostics are being translated.  A checker was developed
> to help find some of these as well many others by scanning
> the gcc.pot file: contrib/check-internal-format-escaping.py.
> The checker automates the process of finding these issues but
> it doesn't prevent them.  Being external to GCC the checker
> cannot easily distinguish between message strings that are
> expected to be translated and those that aren't.
> 
> To help avoid introducing the most glaring subset of these
> problems as early as possible while making it possible to
> differentiate messages that are expected to conform from those
> that need not, the attached patch adds a simple checker to GCC:
> -Wformat-diag.  My goal is to improve consistency of messages
> and relieve the burden at the end of each release both on
> translators and those who then have to fix the problems.

I like the idea! One limitation compared to the linter is that
one needs to have a test-suite coverage for errors and warnings.
Moreover, one needs to have that for all target supported by GCC.

> 
> The warning detects some of the following:
> 
>  *  unquoted operators consisting of two or more characters
>     (e.g., == or ?:)
>  *  unquoted keywords, types, GCC built-ins and command line
>     options, or identifiers with underscores
>  *  unquoted apostrophes and contractions like in can't
>  *  informal abbreviations (arg or reg)
>  *  unbalanced paired tokens (brackets, parentheses, or quotes)
>  *  duplicate or trailing spaces (it allows multiple spaces at
>     the beginning of a diagnostic since that's used by the C++
>     front-end as an "indentation")
>  *  unintended control characters
>  *  inconsistent sentence capitalization

You implemented very complex checks! Nice.

> 
> To avoid diagnosing direct calls to pretty-printer functions
> that often format parts of the same diagnostic in multiple steps,
> the solution introduces the "_raw__" suffix to the __gcc_diag__
> attributes that the warning then uses to avoid checking those
> calls.  Functions with the raw attributes aren't checked.
> 
> I tested the patch kit as a whole in an x86_64-linux build and
> fixed all the issues it pointed out except for 43 instances of
> the new warning in the Go front-end that I wasn't sure how best
> to handle (I will follow up on that with Ian).  I also adjusted
> the affected tests.
> 
> There were just a few issues specific to the i386 back-end but they
> had an impact on 9 tests.  The impact of fixing the same kinds of
> issues in other back-ends is likely to be similar.  Rather than
> trying to do the clean up across all supported targets I think it's
> better to handle the rest of the issues over time and with the help
> of back end maintainers who can more easily verify that tests still
> pass.  With than in mind, I have prevented the warning from causing
> bootstrap failures. Once the cleanup is complete I expect to remove
> this and include the new warning in -Werror.

The approach works form me.

Martin

> 
> Since a few files in GCC "misuse" the diagnostic machinery for
> debugging output leading up to internal errors (instead of
> calling a pp_xxx function) I suppressed the new warning in those
> files via #pragma GCC diagnostic ignored "-Wformat-diag".  If we
> agree that calling pp_format or some such is the way to go I can
> work replacing those calls as a subsequent step.
> 
> The subset should cover the following bug reports (some already
> resolved):
> 
> 90176 - diagnostics should generally contain underscore only
>         inside quotes
> 90162 - exclamation mark in diagnostic!!!!!1111!!!!
> 90158 - aarch64: wrong quotation in diagnostics
> 90157 - aarch64: unnecessary abbreviation in diagnostic
> 90156 - add linter check suggesting to replace %<%s%> with %qs
> 90149 - diagnostics containing BIT_FIELD_REF don't conform to
>         diagnostics guideline
> 90121 - extra space in error message
> 90117 - Replace %<%s%> with %qs
> 90011 - [9 Regression] trailing space in diagnostic
> 79477 - Please write code more translator-friendly (unquoted options)
> 89936 - wrong punctuation in tree-profile.c (%<%s%>)
> 
> Although the changes are mechanical, since I made them all by hand
> I broke up the patch into a series to make it easier to review:
> 
> [PATCH 1/12] implement -Wformat-diag to detect quoting and spelling
>              issues in GCC diagnostics
> [PATCH 2/12] fix diagnostic quoting/spelling in ada
> [PATCH 3/12] fix diagnostic quoting/spelling in Brig
> [PATCH 4/12] fix diagnostic quoting/spelling in the C front-end
> [PATCH 5/12] fix diagnostic quoting/spelling in c-family
> [PATCH 6/12] fix diagnostic quoting/spelling in C++
> [PATCH 7/12] fix diagnostic quoting/spelling in libgcc
> [PATCH 8/12] fix diagnostic quoting/spelling in the middle-end
> [PATCH 9/12] adjust tests to quoting/spelling diagnostics fixes
> [PATCH 10/12] fix diagnostic quoting/spelling in D
> [PATCH 11/12] fix diagnostic quoting/spelling issues in i386 back-end
> [PATCH 12/12] fix diagnostic quoting/spelling issues in ObjC
> 
> Martin
Martin Liška May 16, 2019, 10:02 a.m. UTC | #2
Hi.

Maybe I've install the patches wrongly, but I see following error on ppc64le
during bootstrap in stage2:

/home/marxin/Programming/gcc/objdir/./prev-gcc/xg++ -B/home/marxin/Programming/gcc/objdir/./prev-gcc/ -B/usr/local/powerpc64le-unknown-linux-gnu/bin/ -nostdinc++ -B/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknn-linux-gnu/libstdc++-v3/src/.libs -B/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs  -I/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu  -I/homearxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/include  -I/home/marxin/Programming/gcc/libstdc++-v3/libsupc++ -L/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs -L/home/marxin/Programming/gcc/objr/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c   -g -O2 -fno-checking -gtoggle -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-tribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/. -I../../gcc/../include -I../../gcc/../libcpp/include  -I../../gcc/../libdecnumber -I../../gcc/../libcnumber/dpd -I../libdecnumber -I../../gcc/../libbacktrace   -o tree.o -MT tree.o -MMD -MP -MF ./.deps/tree.TPo ../../gcc/tree.c
../../gcc/tree.c: In function ‘void omp_clause_check_failed(const_tree, const char*, int, const char*, omp_clause_code)’:
../../gcc/tree.c:10012:67: error: unknown conversion type character ‘k’ in format [-Werror=format=]
10012 |   internal_error ("tree check: expected %<omp_clause %s%>, have %qk "
      |                                                                   ^
../../gcc/tree.c:10013:10: error: format ‘%s’ expects argument of type ‘char*’, but argument 3 has type ‘int’ [-Werror=format=]
10013 |     "in %s, at %s:%d",
      |         ~^
      |          |
      |          char*
      |         %d
../../gcc/tree.c:10013:20: error: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘const char*’ [-Werror=format=]
10013 |     "in %s, at %s:%d",
      |                   ~^
      |                    |
      |                    int
      |                   %s
10014 |     omp_clause_code_name[code], TREE_CODE (node),
10015 |     function, trim_filename (file), line);
      |               ~~~~~~~~~~~~~~~~~~~~
      |                             |
      |                             const char*
../../gcc/tree.c:10012:19: error: too many arguments for format [-Werror=format-extra-args]
10012 |   internal_error ("tree check: expected %<omp_clause %s%>, have %qk "
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10013 |     "in %s, at %s:%d",
      |     ~~~~~~~~~~~~~~~~~

Martin
Martin Sebor May 16, 2019, 3:01 p.m. UTC | #3
On 5/16/19 4:02 AM, Martin Liška wrote:
> Hi.
> 
> Maybe I've install the patches wrongly, but I see following error on ppc64le
> during bootstrap in stage2:

I also noticed it yesterday on x86_64.  The %qk was vestige of
an earlier attempt to use the pretty-printer to format TREE_CODEs.
I have this in my tree that fixes it but let me post an updated
patch.

--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -10009,8 +10009,10 @@ void
  omp_clause_check_failed (const_tree node, const char *file, int line,
                           const char *function, enum omp_clause_code code)
  {
-  internal_error ("tree check: expected omp_clause %s, have %s in %s, 
at %s:%d",
-                 omp_clause_code_name[code], get_tree_code_name 
(TREE_CODE (node)),
+  internal_error ("tree check: expected %<omp_clause %s%>, have %qs "
+                 "in %s, at %s:%d",
+                 omp_clause_code_name[code],
+                 get_tree_code_name (TREE_CODE (node)),
                   function, trim_filename (file), line);
  }


As a heads up, my latest log still shows a few testsuite failures
that I need to clean up. Those I've looked at are all missing
adjustments to expected dg-warning output.

!  FAIL: 20_util/any/misc/any_cast_neg.cc (3: +3)
!  FAIL: gcc.dg/gcc_diag-11.c (1: +1)
!  FAIL: g++.dg/ubsan/pr63956.C (21: +21)
!  FAIL: gnat.dg/inline3.adb (2: +2)
!  FAIL: gnat.dg/inline5.adb (2: +2)
!  FAIL: gnat.dg/inline7.adb (2: +2)
!  FAIL: gnat.dg/inline9.adb (2: +2)
!  FAIL: objc.dg/method-19.m (2: +2)
!  FAIL: objc.dg/protocol-qualifier-2.m (2: +2)
!  FAIL: obj-c++.dg/protocol-qualifier-2.mm (2: +2)

Martin

> 
> /home/marxin/Programming/gcc/objdir/./prev-gcc/xg++ -B/home/marxin/Programming/gcc/objdir/./prev-gcc/ -B/usr/local/powerpc64le-unknown-linux-gnu/bin/ -nostdinc++ -B/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknn-linux-gnu/libstdc++-v3/src/.libs -B/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs  -I/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu  -I/homearxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/include  -I/home/marxin/Programming/gcc/libstdc++-v3/libsupc++ -L/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs -L/home/marxin/Programming/gcc/objr/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c   -g -O2 -fno-checking -gtoggle -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-tribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/. -I../../gcc/../include -I../../gcc/../libcpp/include  -I../../gcc/../libdecnumber -I../../gcc/../libcnumber/dpd -I../libdecnumber -I../../gcc/../libbacktrace   -o tree.o -MT tree.o -MMD -MP -MF ./.deps/tree.TPo ../../gcc/tree.c
> ../../gcc/tree.c: In function ‘void omp_clause_check_failed(const_tree, const char*, int, const char*, omp_clause_code)’:
> ../../gcc/tree.c:10012:67: error: unknown conversion type character ‘k’ in format [-Werror=format=]
> 10012 |   internal_error ("tree check: expected %<omp_clause %s%>, have %qk "
>        |                                                                   ^
> ../../gcc/tree.c:10013:10: error: format ‘%s’ expects argument of type ‘char*’, but argument 3 has type ‘int’ [-Werror=format=]
> 10013 |     "in %s, at %s:%d",
>        |         ~^
>        |          |
>        |          char*
>        |         %d
> ../../gcc/tree.c:10013:20: error: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘const char*’ [-Werror=format=]
> 10013 |     "in %s, at %s:%d",
>        |                   ~^
>        |                    |
>        |                    int
>        |                   %s
> 10014 |     omp_clause_code_name[code], TREE_CODE (node),
> 10015 |     function, trim_filename (file), line);
>        |               ~~~~~~~~~~~~~~~~~~~~
>        |                             |
>        |                             const char*
> ../../gcc/tree.c:10012:19: error: too many arguments for format [-Werror=format-extra-args]
> 10012 |   internal_error ("tree check: expected %<omp_clause %s%>, have %qk "
>        |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 10013 |     "in %s, at %s:%d",
>        |     ~~~~~~~~~~~~~~~~~
> 
> Martin
>