[00/49] RFC: Add a static analysis framework to GCC

Message ID 1573867416-55618-1-git-send-email-dmalcolm@redhat.com

David Malcolm Nov. 16, 2019, 1:22 a.m. UTC
This patch kit introduces a static analysis pass for GCC that can diagnose
various kinds of problems in C code at compile-time (e.g. double-free and
use-after-free).

The analyzer runs as an IPA pass on the gimple SSA representation.
It associates state machines with data, with transitions at certain
statements and edges.  It finds "interesting" interprocedural paths
through the user's code, in which bogus state transitions happen.

For example, given:

   free (ptr);
   free (ptr);

at the first call, "ptr" transitions to the "freed" state, and
at the second call the analyzer complains, since "ptr" is already in
the "freed" state (unless "ptr" is NULL, in which case it stays in
the NULL state for both calls).
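
The transitions described above can be sketched as a toy state machine in
C.  This is only an illustration: the state names, the "free_result" struct
and the "on_free" helper are hypothetical and are not taken from
sm-malloc.cc, which tracks more states than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for one pointer value; the real checker in
   sm-malloc.cc has its own (larger) set of states.  */
enum ptr_state { PTR_START, PTR_NULL, PTR_FREED, PTR_NUM_STATES };

/* Outcome of seeing one "free (ptr)" call.  */
struct free_result
{
  enum ptr_state next; /* state of the pointer after the call */
  int complain;        /* nonzero if a double-free should be reported */
};

/* Apply the transition for "free (ptr)" when ptr is in state CUR.  */
static struct free_result
on_free (enum ptr_state cur)
{
  struct free_result r;
  assert (cur < PTR_NUM_STATES);
  switch (cur)
    {
    case PTR_NULL:
      /* free (NULL) is a no-op: stay in the NULL state, no warning.  */
      r.next = PTR_NULL;
      r.complain = 0;
      break;
    case PTR_FREED:
      /* Already freed: this is the bogus transition.  */
      r.next = PTR_FREED;
      r.complain = 1;
      break;
    default:
      /* First free of a live pointer.  */
      r.next = PTR_FREED;
      r.complain = 0;
      break;
    }
  return r;
}
```

Feeding the result of the first call back into "on_free" models the second
"free (ptr)": the first call moves the pointer to the freed state without a
complaint, and the second call sets "complain".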

Specific state machines include:
- a checker for malloc/free, for detecting double-free, resource leaks,
  use-after-free, etc (sm-malloc.cc), and
- a checker for stdio's FILE stream API (sm-file.cc)

There are also two state-machine-based checkers that are just
proof-of-concept at this stage:
- a checker for tracking exposure of sensitive data (e.g.
  writing passwords to log files aka CWE-532), and
- a checker for tracking "taint", where data potentially under an
  attacker's control is used without sanitization for things like
  array indices (CWE-129).
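
As an illustration of the CWE-129 pattern the taint checker is aimed at, the
following sketch shows an index being sanitized before it is used.  The
table, its size and the "lookup_checked" function are hypothetical; this is
not code from the testsuite, and it makes no claim about what the
proof-of-concept checker reports for it.

```c
#include <stddef.h>

#define TABLE_SIZE 8

static const int table[TABLE_SIZE] = { 0, 10, 20, 30, 40, 50, 60, 70 };

/* IDX is assumed to come from an untrusted source (e.g. parsed from
   input).  Using it directly as "table[idx]" would be the unsanitized
   case; here it is range-checked first.
   Returns 0 and stores the entry in *OUT on success, -1 if IDX is out
   of range.  */
int
lookup_checked (long idx, int *out)
{
  if (idx < 0 || idx >= TABLE_SIZE)
    return -1;
  *out = table[idx];
  return 0;
}
```

Both bounds matter: a signed index needs the lower-bound check as well as
the upper one.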

There's a separation between the state machines and the analysis
engine, so it ought to be relatively easy to add new warnings.

For any given diagnostic emitted by a state machine, the analysis engine
generates the simplest feasible interprocedural path of control flow for
triggering the diagnostic.


Diagnostic paths
================

The patch kit adds support to GCC's diagnostic subsystem for optionally
associating a "diagnostic_path" with a diagnostic.  A diagnostic path
describes a sequence of events predicted by the compiler that leads to the
problem occurring, with their locations in the user's source, and text
descriptions.

For example, the following warning has a 6-event interprocedural path:

malloc-ipa-8-unchecked.c: In function 'make_boxed_int':
malloc-ipa-8-unchecked.c:21:13: warning: dereference of possibly-NULL 'result' [CWE-690] [-Wanalyzer-possible-null-dereference]
   21 |   result->i = i;
      |   ~~~~~~~~~~^~~
  'make_boxed_int': events 1-2
    |
    |   18 | make_boxed_int (int i)
    |      | ^~~~~~~~~~~~~~
    |      | |
    |      | (1) entry to 'make_boxed_int'
    |   19 | {
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (2) calling 'wrapped_malloc' from 'make_boxed_int'
    |
    +--> 'wrapped_malloc': events 3-4
           |
           |    7 | void *wrapped_malloc (size_t size)
           |      |       ^~~~~~~~~~~~~~
           |      |       |
           |      |       (3) entry to 'wrapped_malloc'
           |    8 | {
           |    9 |   return malloc (size);
           |      |          ~~~~~~~~~~~~~
           |      |          |
           |      |          (4) this call could return NULL
           |
    <------+
    |
  'make_boxed_int': events 5-6
    |
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (5) possible return of NULL to 'make_boxed_int' from 'wrapped_malloc'
    |   21 |   result->i = i;
    |      |   ~~~~~~~~~~~~~
    |      |             |
    |      |             (6) 'result' could be NULL: unchecked value from (4)
    |

The diagnostic-printing code has consolidated the path into 3 runs of events
(where the events are near each other and within the same function), using
ASCII art to show the interprocedural call and return.
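
A rough sketch of how such consolidation could work is shown below.  It is
an assumption-laden simplification, not the logic in
tree-diagnostic-path.cc: "struct path_event", "MAX_LINE_GAP" and
"count_event_runs" are hypothetical names, and "near each other" is modelled
as a fixed line-distance threshold.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one event in a diagnostic path.  */
struct path_event
{
  const char *function; /* name of the enclosing function */
  int line;             /* source line of the event */
};

/* Hypothetical threshold for "near each other".  */
#define MAX_LINE_GAP 3

/* Count the runs in EVENTS: an event extends the current run if it is in
   the same function as the previous event and within MAX_LINE_GAP lines
   of it; otherwise it starts a new run.  */
static size_t
count_event_runs (const struct path_event *events, size_t n)
{
  size_t runs = 0;
  for (size_t i = 0; i < n; i++)
    {
      int starts_run = 1;
      if (i > 0)
        {
          int gap = events[i].line - events[i - 1].line;
          if (gap < 0)
            gap = -gap;
          if (strcmp (events[i].function, events[i - 1].function) == 0
              && gap <= MAX_LINE_GAP)
            starts_run = 0;
        }
      runs += starts_run;
    }
  return runs;
}
```

With the six events from the warning above (lines 18 and 20 in
'make_boxed_int', lines 7 and 9 in 'wrapped_malloc', then lines 20 and 21
in 'make_boxed_int'), this grouping rule gives three runs.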

A colorized version of the above can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/test.html

Other examples can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/malloc-1.c.html
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/setjmp-4.c.html

An example of detecting a historical double-free CVE can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/CVE-2005-1689.html
(there are also some false positives in this report)


Diagnostic metadata
===================

The patch kit also adds the ability to associate additional metadata with
a diagnostic. The only such metadata added by the patch kit are CWE
classifications (for the new warnings), such as the CWE-690 in the warning
above, or CWE-401 in this example:

malloc-1.c: In function 'test_42a':
malloc-1.c:466:1: warning: leak of 'p' [CWE-401] [-Wanalyzer-malloc-leak]
  466 | }
      | ^
  'test_42a': events 1-2
       |
       |  463 |   void *p = malloc (1024);
       |      |             ^~~~~~~~~~~~~
       |      |             |
       |      |             (1) allocated here
       |......
       |  466 | }
       |      | ~
       |      | |
       |      | (2) 'p' leaks here; was allocated at (1)
       |

If the terminal supports it, the above "CWE-401" is a clickable hyperlink
to:
  https://cwe.mitre.org/data/definitions/401.html
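
The mapping from a CWE number to that URL is simple enough to sketch.  The
helper below is hypothetical ("format_cwe_url" is not a function from the
patch kit); it only assumes the URL scheme shown above.

```c
#include <assert.h>
#include <stdio.h>

/* Write the cwe.mitre.org definition URL for CWE number CWE into BUF,
   which has room for LEN bytes.  Returns the length of the URL, or -1
   if BUF is too small.  */
static int
format_cwe_url (int cwe, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  int n = snprintf (buf, len,
                    "https://cwe.mitre.org/data/definitions/%d.html", cwe);
  if (n < 0 || (size_t) n >= len)
    return -1;
  return n;
}
```

Calling "format_cwe_url (401, buf, sizeof buf)" with a large enough buffer
fills it with the URL shown above for the leak warning.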


Scope
=====

The checker is implemented as a GCC plugin.

The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
would live in the GCC source tree and be shipped as part of the GCC tarball,
with a new:
  --enable-plugins=[LIST OF PLUGIN NAMES]
configure option, analogous to --enable-languages (the Makefile/configure
machinery for handling in-tree GCC plugins is adapted from how we support
frontends).

The default is for no such plugins to be enabled, so by default the
checker isn't built; you'd have to opt in to building it with
--enable-plugins=analyzer.

To mitigate feature creep, I've been focusing on implementing double-free
detection, albeit with an eye to building something that can be developed
into a more fully-featured static analyzer.  For example, I haven't yet
attempted to track buffer overflows in this version, but I believe that
could be added on top of this foundation.

Many projects implement some kind of wrapper around malloc and free, so
there is enough interprocedural support to cope with that, but only very
primitive support for summarizing larger functions and planning/performing
an efficient interprocedural analysis on non-trivial functions that
have state-machine effects.
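
To make the wrapper case concrete, here is a small sketch of the kind of
code in question.  "wrapped_malloc" matches the name in the earlier example
output; "wrapped_free" and "use_buffer" are hypothetical, and no particular
analyzer output is claimed for this code.

```c
#include <stdlib.h>

/* Thin project-specific wrappers, as many codebases have.  */
void *
wrapped_malloc (size_t size)
{
  return malloc (size);
}

void
wrapped_free (void *ptr)
{
  free (ptr);
}

/* Allocate and release a small buffer through the wrappers.
   Returns 0 on success, -1 if the allocation failed.
   When BUGGY is nonzero the buffer is released twice; both calls to
   free happen inside wrapped_free, so spotting the double-free needs
   the state of "buf" to be tracked across the calls.  */
int
use_buffer (int buggy)
{
  char *buf = wrapped_malloc (16);
  if (!buf)
    return -1;
  buf[0] = '\0';
  wrapped_free (buf);
  if (buggy)
    wrapped_free (buf); /* second release of the same pointer */
  return 0;
}
```

Only "use_buffer (0)" is safe to run; the nonzero case exists purely to show
a double-free that is only visible when calls and returns are followed.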

In theory the analyzer can work with LTO, and perform cross-TU analysis.
There's a bare-bones prototype of this in the testsuite, which finds a
double-free spanning two TUs; see:

  https://dmalcolm.fedorapeople.org/gcc/2019-11-15/double-free-lto-1.STAR.c.html

However this is just a proof-of-concept at this stage (see the internal docs
for more notes on its limitations).


User interface
==============

--analyzer turns on all the analyzer warnings (it also enables the
expensive traversal that they rely on); individual warnings are all
prefixed "-Wanalyzer-" and can be turned off in the usual way
e.g. -Wno-analyzer-use-after-free.


Rationale
=========

There's benefit in integrating a checker directly into the compiler, so
that the programmer can see the diagnostics as he or she works on the code,
rather than at some later point.  I think that if the analyzer can be
made sufficiently fast, many people would opt in to deeper but more
expensive warnings.  (I'm aiming for 2x compile time as my rough estimate
of what's reasonable in exchange for being told up-front about various
kinds of pointer snafu.)


Correctness
===========

The analyzer is neither sound nor complete, but does attempt to explore
"interesting" paths through the code and generate meaningful diagnostics.
There are no known ICEs, but there are bugs... (see the xfails and TODOs
in the testsuite, and the "Limitations" section of the internal docs).


Performance
===========

Using --analyzer roughly doubles the compile time on various testcases
I've tried (krb5, zlib), but also sometimes takes a lot longer
(again, see the "Limitations" section of the internal docs; there are
bugs...).


Overview of patch kit
=====================

Patch 01 contains user-facing documentation for the analyzer

Patch 02 documents the implementation internals (to avoid having to repeat
it here)

Patches 03-15 are preliminary work.
  Patch 11 adds metadata support to GCC's diagnostic subsystem, so that
  the analyzer can associate CWE identifiers with diagnostics.
  Patch 12 adds diagnostic_path support to GCC's diagnostic subsystem

Patches 16-17 add support for in-tree plugins

Patches 18-24 add the basics of the analyzer plugin itself

Patches 25-48 add the analysis "machinery"
  Patches 33-38 add the state machines (somewhat abstracted from the
  rest of the analyzer)

Patch 49 adds the test suite for the analyzer

The patches can also be seen on the git mirror as branch "dmalcolm/analyzer-v1"
  https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/dmalcolm/analyzer-v1

This is relative to r276961 (which predates the recent update to how
params work).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

The analyzer works on toy examples, and on some moderately-sized real
world codebases (krb5 and zlib).  I'd hoped to have tested it more deeply
at this point (and dogfooded it), but given that GCC stage 1 is closing
shortly I thought I ought to post what I have.

It's not clear to me whether I should focus on:

(a) pruning the scope of the checker so that it works well on
*intra*procedural C examples (and bails on anything more complex), perhaps
targeting GCC 10 as an optional extra hidden behind
--enable-plugins=analyzer, or

(b) working on deeper interprocedural analysis (and fixing efficiency issues
with this).

See also: https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
(which currently duplicates much of the above).

Thoughts?
David

David Malcolm (49):
  analyzer: user-facing documentation
  analyzer: internal documentation
  diagnostic_show_locus: move initial newline to callers
  sbitmap.h: add operator const sbitmap & to auto_sbitmap
  vec.h: add auto_delete_vec
  timevar.h: add auto_client_timevar class
  Replace label_text ctor with "borrow" and "take"
  Introduce pretty_printer::clone vfunc
  gimple const-correctness fixes
  Add -fdiagnostics-nn-line-numbers
  Add diagnostic_metadata and CWE support
  Add diagnostic paths
  function-tests.c: expose selftest::make_fndecl for use elsewhere
  hash-map-tests.c: add a selftest involving int_hash
  Add ordered_hash_map
  Add support for in-tree plugins
  Support for adding selftests via a plugin
  Add in-tree plugin: "analyzer"
  analyzer: new files: analyzer-selftests.{cc|h}
  analyzer: new builtins
  analyzer: command-line options
  analyzer: params.def: new parameters
  analyzer: logging support
  analyzer: new file: analyzer-pass.cc
  analyzer: new files: graphviz.{cc|h}
  analyzer: new files: digraph.{cc|h} and shortest-paths.h
  analyzer: new files: supergraph.{cc|h}
  analyzer: new files: analyzer.{cc|h}
  analyzer: new files: tristate.{cc|h}
  analyzer: new files: constraint-manager.{cc|h}
  analyzer: new files: region-model.{cc|h}
  analyzer: new files: pending-diagnostic.{cc|h}
  analyzer: new files: sm.{cc|h}
  analyzer: new file: sm-malloc.cc
  analyzer: new file: sm-file.cc
  analyzer: new file: sm-pattern-test.cc
  analyzer: new file: sm-sensitive.cc
  analyzer: new file: sm-taint.cc
  analyzer: new files: analysis-plan.{cc|h}
  analyzer: new files: call-string.{cc|h}
  analyzer: new files: program-point.{cc|h}
  analyzer: new files: program-state.{cc|h}
  analyzer: new file: exploded-graph.h
  analyzer: new files: state-purge.{cc|h}
  analyzer: new files: engine.{cc|h}
  analyzer: new files: checker-path.{cc|h}
  analyzer: new files: diagnostic-manager.{cc|h}
  gdbinit.in: add break-on-saved-diagnostic
  analyzer: test suite

 configure.ac                                       |    6 +
 gcc/Makefile.in                                    |  102 +-
 gcc/analyzer/Make-plugin.in                        |  181 +
 gcc/analyzer/analysis-plan.cc                      |  115 +
 gcc/analyzer/analysis-plan.h                       |   56 +
 gcc/analyzer/analyzer-logging.cc                   |  220 +
 gcc/analyzer/analyzer-logging.h                    |  256 +
 gcc/analyzer/analyzer-pass.cc                      |  103 +
 gcc/analyzer/analyzer-plugin.cc                    |   63 +
 gcc/analyzer/analyzer-selftests.cc                 |   61 +
 gcc/analyzer/analyzer-selftests.h                  |   46 +
 gcc/analyzer/analyzer.cc                           |  125 +
 gcc/analyzer/analyzer.h                            |  126 +
 gcc/analyzer/call-string.cc                        |  201 +
 gcc/analyzer/call-string.h                         |   74 +
 gcc/analyzer/checker-path.cc                       |  899 +++
 gcc/analyzer/checker-path.h                        |  563 ++
 gcc/analyzer/config-plugin.in                      |   34 +
 gcc/analyzer/constraint-manager.cc                 | 2263 ++++++
 gcc/analyzer/constraint-manager.h                  |  248 +
 gcc/analyzer/diagnostic-manager.cc                 | 1117 +++
 gcc/analyzer/diagnostic-manager.h                  |  116 +
 gcc/analyzer/digraph.cc                            |  189 +
 gcc/analyzer/digraph.h                             |  248 +
 gcc/analyzer/engine.cc                             | 3416 +++++++++
 gcc/analyzer/engine.h                              |   26 +
 gcc/analyzer/exploded-graph.h                      |  754 ++
 gcc/analyzer/graphviz.cc                           |   81 +
 gcc/analyzer/graphviz.h                            |   50 +
 gcc/analyzer/pending-diagnostic.cc                 |   61 +
 gcc/analyzer/pending-diagnostic.h                  |  265 +
 gcc/analyzer/plugin.opt                            |  161 +
 gcc/analyzer/program-point.cc                      |  490 ++
 gcc/analyzer/program-point.h                       |  316 +
 gcc/analyzer/program-state.cc                      | 1284 ++++
 gcc/analyzer/program-state.h                       |  360 +
 gcc/analyzer/region-model.cc                       | 7686 ++++++++++++++++++++
 gcc/analyzer/region-model.h                        | 2076 ++++++
 gcc/analyzer/shortest-paths.h                      |  147 +
 gcc/analyzer/sm-file.cc                            |  338 +
 gcc/analyzer/sm-malloc.cc                          |  799 ++
 gcc/analyzer/sm-pattern-test.cc                    |  165 +
 gcc/analyzer/sm-sensitive.cc                       |  209 +
 gcc/analyzer/sm-taint.cc                           |  338 +
 gcc/analyzer/sm.cc                                 |  135 +
 gcc/analyzer/sm.h                                  |  160 +
 gcc/analyzer/state-purge.cc                        |  516 ++
 gcc/analyzer/state-purge.h                         |  170 +
 gcc/analyzer/supergraph.cc                         |  936 +++
 gcc/analyzer/supergraph.h                          |  560 ++
 gcc/analyzer/tristate.cc                           |  222 +
 gcc/analyzer/tristate.h                            |   82 +
 gcc/builtins.def                                   |   33 +
 gcc/c-family/c-format.c                            |   15 +-
 gcc/c-family/c-format.h                            |    1 +
 gcc/c-family/c-opts.c                              |    1 +
 gcc/c-family/c-pretty-print.c                      |    7 +
 gcc/c-family/c-pretty-print.h                      |    1 +
 gcc/c/c-objc-common.c                              |    4 +-
 gcc/common.opt                                     |   31 +
 gcc/configure.ac                                   |  172 +-
 gcc/coretypes.h                                    |    1 +
 gcc/cp/cxx-pretty-print.c                          |    8 +
 gcc/cp/cxx-pretty-print.h                          |    2 +
 gcc/cp/error.c                                     |    9 +-
 gcc/diagnostic-color.c                             |    3 +-
 gcc/diagnostic-core.h                              |   10 +
 gcc/diagnostic-event-id.h                          |   61 +
 gcc/diagnostic-format-json.cc                      |   33 +-
 gcc/diagnostic-metadata.h                          |   42 +
 gcc/diagnostic-path.h                              |  149 +
 gcc/diagnostic-show-locus.c                        |  219 +-
 gcc/diagnostic.c                                   |  283 +-
 gcc/diagnostic.def                                 |    5 +
 gcc/diagnostic.h                                   |   43 +-
 gcc/doc/analyzer.texi                              |  470 ++
 gcc/doc/gccint.texi                                |    2 +
 gcc/doc/install.texi                               |    9 +
 gcc/doc/invoke.texi                                |  600 +-
 gcc/dwarf2out.c                                    |    1 +
 gcc/fortran/error.c                                |    1 +
 gcc/function-tests.c                               |    4 +-
 gcc/gcc-rich-location.c                            |    2 +-
 gcc/gcc-rich-location.h                            |    6 +-
 gcc/gcc.c                                          |   13 +
 gcc/gdbinit.in                                     |   10 +
 gcc/gimple-predict.h                               |    4 +-
 gcc/gimple-pretty-print.c                          |  159 +-
 gcc/gimple-pretty-print.h                          |    3 +-
 gcc/gimple.h                                       |  156 +-
 gcc/hash-map-tests.c                               |   41 +
 gcc/lto-wrapper.c                                  |    3 +
 gcc/opts.c                                         |   16 +
 gcc/ordered-hash-map-tests.cc                      |  247 +
 gcc/ordered-hash-map.h                             |  184 +
 gcc/params.def                                     |   25 +
 gcc/plugin.c                                       |    2 +
 gcc/plugin.def                                     |    3 +
 gcc/pretty-print.c                                 |   66 +
 gcc/pretty-print.h                                 |    4 +
 gcc/sbitmap.h                                      |    1 +
 gcc/selftest-run-tests.c                           |    6 +
 gcc/selftest.h                                     |    9 +
 .../gcc.dg/analyzer/CVE-2005-1689-minimal.c        |   30 +
 gcc/testsuite/gcc.dg/analyzer/abort.c              |   71 +
 gcc/testsuite/gcc.dg/analyzer/alloca-leak.c        |    8 +
 .../gcc.dg/analyzer/analyzer-verbosity-0.c         |  133 +
 .../gcc.dg/analyzer/analyzer-verbosity-1.c         |  160 +
 .../gcc.dg/analyzer/analyzer-verbosity-2.c         |  191 +
 gcc/testsuite/gcc.dg/analyzer/analyzer.exp         |   49 +
 gcc/testsuite/gcc.dg/analyzer/attribute-nonnull.c  |   57 +
 gcc/testsuite/gcc.dg/analyzer/call-summaries-1.c   |   14 +
 gcc/testsuite/gcc.dg/analyzer/conditionals-2.c     |   44 +
 gcc/testsuite/gcc.dg/analyzer/conditionals-3.c     |   45 +
 .../gcc.dg/analyzer/conditionals-notrans.c         |  158 +
 gcc/testsuite/gcc.dg/analyzer/conditionals-trans.c |  143 +
 gcc/testsuite/gcc.dg/analyzer/data-model-1.c       | 1078 +++
 gcc/testsuite/gcc.dg/analyzer/data-model-10.c      |   17 +
 gcc/testsuite/gcc.dg/analyzer/data-model-11.c      |    6 +
 gcc/testsuite/gcc.dg/analyzer/data-model-12.c      |   13 +
 gcc/testsuite/gcc.dg/analyzer/data-model-13.c      |   21 +
 gcc/testsuite/gcc.dg/analyzer/data-model-14.c      |   24 +
 gcc/testsuite/gcc.dg/analyzer/data-model-15.c      |   34 +
 gcc/testsuite/gcc.dg/analyzer/data-model-16.c      |   50 +
 gcc/testsuite/gcc.dg/analyzer/data-model-17.c      |   20 +
 gcc/testsuite/gcc.dg/analyzer/data-model-18.c      |   20 +
 gcc/testsuite/gcc.dg/analyzer/data-model-19.c      |   31 +
 gcc/testsuite/gcc.dg/analyzer/data-model-2.c       |   13 +
 gcc/testsuite/gcc.dg/analyzer/data-model-3.c       |   15 +
 gcc/testsuite/gcc.dg/analyzer/data-model-4.c       |   16 +
 gcc/testsuite/gcc.dg/analyzer/data-model-5.c       |  100 +
 gcc/testsuite/gcc.dg/analyzer/data-model-5b.c      |   91 +
 gcc/testsuite/gcc.dg/analyzer/data-model-5c.c      |   84 +
 gcc/testsuite/gcc.dg/analyzer/data-model-5d.c      |   63 +
 gcc/testsuite/gcc.dg/analyzer/data-model-6.c       |   13 +
 gcc/testsuite/gcc.dg/analyzer/data-model-7.c       |   19 +
 gcc/testsuite/gcc.dg/analyzer/data-model-8.c       |   24 +
 gcc/testsuite/gcc.dg/analyzer/data-model-9.c       |   32 +
 gcc/testsuite/gcc.dg/analyzer/data-model-path-1.c  |   13 +
 .../gcc.dg/analyzer/double-free-lto-1-a.c          |   16 +
 .../gcc.dg/analyzer/double-free-lto-1-b.c          |    8 +
 gcc/testsuite/gcc.dg/analyzer/double-free-lto-1.h  |    1 +
 gcc/testsuite/gcc.dg/analyzer/equivalence.c        |   29 +
 gcc/testsuite/gcc.dg/analyzer/explode-1.c          |   60 +
 gcc/testsuite/gcc.dg/analyzer/explode-2.c          |   50 +
 gcc/testsuite/gcc.dg/analyzer/factorial.c          |    7 +
 gcc/testsuite/gcc.dg/analyzer/fibonacci.c          |    9 +
 gcc/testsuite/gcc.dg/analyzer/fields.c             |   41 +
 gcc/testsuite/gcc.dg/analyzer/file-1.c             |   37 +
 gcc/testsuite/gcc.dg/analyzer/file-2.c             |   18 +
 gcc/testsuite/gcc.dg/analyzer/function-ptr-1.c     |    8 +
 gcc/testsuite/gcc.dg/analyzer/function-ptr-2.c     |   43 +
 gcc/testsuite/gcc.dg/analyzer/function-ptr-3.c     |   17 +
 gcc/testsuite/gcc.dg/analyzer/gzio-2.c             |   11 +
 gcc/testsuite/gcc.dg/analyzer/gzio-3.c             |   31 +
 gcc/testsuite/gcc.dg/analyzer/gzio-3a.c            |   27 +
 gcc/testsuite/gcc.dg/analyzer/gzio.c               |   17 +
 gcc/testsuite/gcc.dg/analyzer/infinite-recursion.c |   55 +
 gcc/testsuite/gcc.dg/analyzer/loop-2.c             |   36 +
 gcc/testsuite/gcc.dg/analyzer/loop-2a.c            |   39 +
 gcc/testsuite/gcc.dg/analyzer/loop-3.c             |   17 +
 gcc/testsuite/gcc.dg/analyzer/loop-4.c             |   41 +
 gcc/testsuite/gcc.dg/analyzer/loop.c               |   33 +
 gcc/testsuite/gcc.dg/analyzer/malloc-1.c           |  565 ++
 gcc/testsuite/gcc.dg/analyzer/malloc-2.c           |   23 +
 gcc/testsuite/gcc.dg/analyzer/malloc-3.c           |    8 +
 gcc/testsuite/gcc.dg/analyzer/malloc-dce.c         |   12 +
 gcc/testsuite/gcc.dg/analyzer/malloc-dedupe-1.c    |   46 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-1.c       |   24 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-10.c      |   32 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-11.c      |   95 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-12.c      |    7 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13.c      |   30 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-2.c       |   34 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-3.c       |   23 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-4.c       |   13 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-5.c       |   13 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-6.c       |   22 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-7.c       |   29 +
 .../gcc.dg/analyzer/malloc-ipa-8-double-free.c     |  172 +
 .../gcc.dg/analyzer/malloc-ipa-8-unchecked.c       |   66 +
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-9.c       |   18 +
 .../gcc.dg/analyzer/malloc-macro-inline-events.c   |   45 +
 .../gcc.dg/analyzer/malloc-macro-separate-events.c |   15 +
 gcc/testsuite/gcc.dg/analyzer/malloc-macro.h       |    2 +
 .../gcc.dg/analyzer/malloc-many-paths-1.c          |   14 +
 .../gcc.dg/analyzer/malloc-many-paths-2.c          |   30 +
 .../gcc.dg/analyzer/malloc-many-paths-3.c          |   36 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-1.c     |   15 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-10.c    |   19 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-2.c     |   13 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-3.c     |   14 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-4.c     |   20 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-5.c     |   43 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-6.c     |   11 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-7.c     |   21 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-8.c     |   54 +
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-9.c     |  298 +
 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1a.c |  180 +
 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1b.c |  175 +
 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-2.c  |  178 +
 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-3.c  |   65 +
 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-4.c  |   40 +
 gcc/testsuite/gcc.dg/analyzer/operations.c         |   42 +
 gcc/testsuite/gcc.dg/analyzer/params-2.c           |   16 +
 gcc/testsuite/gcc.dg/analyzer/params.c             |   32 +
 gcc/testsuite/gcc.dg/analyzer/paths-1.c            |   16 +
 gcc/testsuite/gcc.dg/analyzer/paths-1a.c           |   16 +
 gcc/testsuite/gcc.dg/analyzer/paths-2.c            |   25 +
 gcc/testsuite/gcc.dg/analyzer/paths-3.c            |   48 +
 gcc/testsuite/gcc.dg/analyzer/paths-4.c            |   49 +
 gcc/testsuite/gcc.dg/analyzer/paths-5.c            |   10 +
 gcc/testsuite/gcc.dg/analyzer/paths-6.c            |  118 +
 gcc/testsuite/gcc.dg/analyzer/paths-7.c            |   58 +
 gcc/testsuite/gcc.dg/analyzer/pattern-test-1.c     |   28 +
 gcc/testsuite/gcc.dg/analyzer/pattern-test-2.c     |   29 +
 gcc/testsuite/gcc.dg/analyzer/pointer-merging.c    |   16 +
 gcc/testsuite/gcc.dg/analyzer/pr61861.c            |    2 +
 gcc/testsuite/gcc.dg/analyzer/pragma-1.c           |   26 +
 gcc/testsuite/gcc.dg/analyzer/scope-1.c            |   23 +
 gcc/testsuite/gcc.dg/analyzer/sensitive-1.c        |   33 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-1.c           |    1 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-2.c           |   97 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-3.c           |  106 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-4.c           |  107 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-5.c           |   65 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-6.c           |   31 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-7.c           |   36 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-8.c           |  107 +
 gcc/testsuite/gcc.dg/analyzer/setjmp-9.c           |  109 +
 gcc/testsuite/gcc.dg/analyzer/switch.c             |   28 +
 gcc/testsuite/gcc.dg/analyzer/taint-1.c            |  128 +
 gcc/testsuite/gcc.dg/analyzer/zlib-1.c             |   67 +
 gcc/testsuite/gcc.dg/analyzer/zlib-2.c             |   51 +
 gcc/testsuite/gcc.dg/analyzer/zlib-3.c             |  214 +
 gcc/testsuite/gcc.dg/analyzer/zlib-4.c             |   20 +
 gcc/testsuite/gcc.dg/analyzer/zlib-5.c             |   49 +
 gcc/testsuite/gcc.dg/analyzer/zlib-6.c             |   47 +
 gcc/testsuite/gcc.dg/format/gcc_diag-10.c          |    6 +-
 .../gcc.dg/plugin/diagnostic-path-format-default.c |  142 +
 .../diagnostic-path-format-inline-events-1.c       |  142 +
 .../diagnostic-path-format-inline-events-2.c       |  154 +
 .../diagnostic-path-format-inline-events-3.c       |  153 +
 .../gcc.dg/plugin/diagnostic-path-format-none.c    |   43 +
 .../diagnostic-path-format-separate-events.c       |   44 +
 .../gcc.dg/plugin/diagnostic-test-paths-1.c        |   38 +
 .../gcc.dg/plugin/diagnostic-test-paths-2.c        |   56 +
 .../gcc.dg/plugin/diagnostic-test-paths-3.c        |   38 +
 .../gcc.dg/plugin/diagnostic_plugin_test_paths.c   |  379 +
 .../plugin/diagnostic_plugin_test_show_locus.c     |    1 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp             |   10 +
 gcc/testsuite/lib/target-supports.exp              |    8 +
 gcc/timevar.h                                      |   33 +
 gcc/toplev.c                                       |    8 +
 gcc/tree-diagnostic-path.cc                        |  809 ++
 gcc/tree-diagnostic.c                              |   12 +-
 gcc/tree-diagnostic.h                              |    8 +
 gcc/tree-eh.c                                      |    6 +-
 gcc/tree-eh.h                                      |    4 +-
 gcc/tree-ssa-alias.h                               |    2 +-
 gcc/tree-ssa-structalias.c                         |    2 +-
 gcc/vec.c                                          |   27 +
 gcc/vec.h                                          |   35 +
 libcpp/include/line-map.h                          |   38 +-
 libcpp/line-map.c                                  |    3 +-
 265 files changed, 42181 insertions(+), 336 deletions(-)
 create mode 100644 gcc/analyzer/Make-plugin.in
 create mode 100644 gcc/analyzer/analysis-plan.cc
 create mode 100644 gcc/analyzer/analysis-plan.h
 create mode 100644 gcc/analyzer/analyzer-logging.cc
 create mode 100644 gcc/analyzer/analyzer-logging.h
 create mode 100644 gcc/analyzer/analyzer-pass.cc
 create mode 100644 gcc/analyzer/analyzer-plugin.cc
 create mode 100644 gcc/analyzer/analyzer-selftests.cc
 create mode 100644 gcc/analyzer/analyzer-selftests.h
 create mode 100644 gcc/analyzer/analyzer.cc
 create mode 100644 gcc/analyzer/analyzer.h
 create mode 100644 gcc/analyzer/call-string.cc
 create mode 100644 gcc/analyzer/call-string.h
 create mode 100644 gcc/analyzer/checker-path.cc
 create mode 100644 gcc/analyzer/checker-path.h
 create mode 100644 gcc/analyzer/config-plugin.in
 create mode 100644 gcc/analyzer/constraint-manager.cc
 create mode 100644 gcc/analyzer/constraint-manager.h
 create mode 100644 gcc/analyzer/diagnostic-manager.cc
 create mode 100644 gcc/analyzer/diagnostic-manager.h
 create mode 100644 gcc/analyzer/digraph.cc
 create mode 100644 gcc/analyzer/digraph.h
 create mode 100644 gcc/analyzer/engine.cc
 create mode 100644 gcc/analyzer/engine.h
 create mode 100644 gcc/analyzer/exploded-graph.h
 create mode 100644 gcc/analyzer/graphviz.cc
 create mode 100644 gcc/analyzer/graphviz.h
 create mode 100644 gcc/analyzer/pending-diagnostic.cc
 create mode 100644 gcc/analyzer/pending-diagnostic.h
 create mode 100644 gcc/analyzer/plugin.opt
 create mode 100644 gcc/analyzer/program-point.cc
 create mode 100644 gcc/analyzer/program-point.h
 create mode 100644 gcc/analyzer/program-state.cc
 create mode 100644 gcc/analyzer/program-state.h
 create mode 100644 gcc/analyzer/region-model.cc
 create mode 100644 gcc/analyzer/region-model.h
 create mode 100644 gcc/analyzer/shortest-paths.h
 create mode 100644 gcc/analyzer/sm-file.cc
 create mode 100644 gcc/analyzer/sm-malloc.cc
 create mode 100644 gcc/analyzer/sm-pattern-test.cc
 create mode 100644 gcc/analyzer/sm-sensitive.cc
 create mode 100644 gcc/analyzer/sm-taint.cc
 create mode 100644 gcc/analyzer/sm.cc
 create mode 100644 gcc/analyzer/sm.h
 create mode 100644 gcc/analyzer/state-purge.cc
 create mode 100644 gcc/analyzer/state-purge.h
 create mode 100644 gcc/analyzer/supergraph.cc
 create mode 100644 gcc/analyzer/supergraph.h
 create mode 100644 gcc/analyzer/tristate.cc
 create mode 100644 gcc/analyzer/tristate.h
 create mode 100644 gcc/diagnostic-event-id.h
 create mode 100644 gcc/diagnostic-metadata.h
 create mode 100644 gcc/diagnostic-path.h
 create mode 100644 gcc/doc/analyzer.texi
 create mode 100644 gcc/ordered-hash-map-tests.cc
 create mode 100644 gcc/ordered-hash-map.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/CVE-2005-1689-minimal.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/abort.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/alloca-leak.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer-verbosity-0.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer-verbosity-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer-verbosity-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer.exp
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attribute-nonnull.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/call-summaries-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-notrans.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-trans.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-10.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-11.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-12.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-13.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-14.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-15.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-16.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-17.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-18.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-19.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5b.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5c.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5d.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-7.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-8.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-9.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-path-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/double-free-lto-1-a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/double-free-lto-1-b.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/double-free-lto-1.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/equivalence.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/explode-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/explode-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/factorial.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fibonacci.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fields.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/file-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/file-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio-3a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infinite-recursion.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-2a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-dce.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-dedupe-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-10.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-11.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-12.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-7.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-8-double-free.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-8-unchecked.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-9.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-macro-inline-events.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-macro-separate-events.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-macro.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-many-paths-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-many-paths-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-many-paths-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-10.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-7.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-8.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-9.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1b.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/operations.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/params-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/params.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-7.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pattern-test-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pattern-test-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pointer-merging.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr61861.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pragma-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/scope-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/sensitive-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-7.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-8.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-9.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-6.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-default.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-inline-events-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-inline-events-2.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-inline-events-3.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-none.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-separate-events.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-2.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-3.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_paths.c
 create mode 100644 gcc/tree-diagnostic-path.cc

Comments

Thomas Schwinge Nov. 16, 2019, 8:42 p.m. UTC | #1
Hi David!

On 2019-11-15T20:22:47-0500, David Malcolm <dmalcolm@redhat.com> wrote:
> This patch kit

(I have not looked at the patches.)  ;-)

> introduces a static analysis pass for GCC that can diagnose
> various kinds of problems in C code at compile-time (e.g. double-free,
> use-after-free, etc).

Sounds great from the description!


Would it make sense to add to the wiki page
<https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer> a (high-level)
comparison to other static analyzers (Coverity, cppcheck,
clang-static-analyzer, others?), in terms of how they work, what their
respective benefits are, what their design goals are, etc.  (Of course
understanding that yours is much less mature at this point; talking about
high-level design rather than current implementation status.)

For example, why do we want that in GCC instead of an external tool -- in
part covered in your Rationale.  Can a compiler-side implementation
benefit from having more information available than an external tool?
GCC-side implementation is readily available (modulo GCC plugin
installation?) vs. external ones need to be installed/set up first.
GCC-side one only works with GCC-supported languages.  GCC-side one
analyzes actual code being compiled -- thinking about preprocessor-level
'#if' etc., which surely are problematic for external tools that are not
actually replicating a real build.  And so on.  (If you don't want to
spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
compare yours to external tools.)

Just an idea, because I wondered about these things.


> The analyzer runs as an IPA pass on the gimple SSA representation.
> It associates state machines with data, with transitions at certain
> statements and edges.  It finds "interesting" interprocedural paths
> through the user's code, in which bogus state transitions happen.
>
> For example, given:
>
>    free (ptr);
>    free (ptr);
>
> at the first call, "ptr" transitions to the "freed" state, and
> at the second call the analyzer complains, since "ptr" is already in
> the "freed" state (unless "ptr" is NULL, in which case it stays in
> the NULL state for both calls).
>
> Specific state machines include:
> - a checker for malloc/free, for detecting double-free, resource leaks,
>   use-after-free, etc (sm-malloc.cc), and

I can immediately see how this can be useful for a bunch of
'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as source
code directives.  ..., and this would've flagged existing code in the
libgomp OpenACC tests, which recently has given me some grief. Short
summary/examples:

In addition to host-side 'malloc'/'free', there is device-side (separate
memory space) 'acc_malloc'/'acc_free'.  Static checking example: don't
mix up host-side and device-side pointers.  (Both are normal C/C++
pointers.  Hmm, maybe such checking could easily be implemented even
outside of your checker by annotating the respective function
declarations with an attribute describing which in/out arguments are
host-side vs. device-side pointers.)

Then, there are functions to "map" host-side and device-side memory:
'acc_map_data'/'acc_unmap_data'.  Static checking example: you must not
'acc_free' memory spaces that are still mapped.
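
To make the "must not 'acc_free' while still mapped" rule concrete, here is a minimal sketch of how it might look as a per-buffer state machine.  The state and event names are hypothetical (they are not taken from the patch kit or from libgomp), and reference counting is deliberately ignored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for one device-side buffer; illustrative only.  */
enum dev_state { DEV_START, DEV_ALLOCATED, DEV_MAPPED, DEV_FREED };

/* The calls a checker would watch for.  */
enum dev_event
{
  EV_ACC_MALLOC,
  EV_ACC_MAP_DATA,
  EV_ACC_UNMAP_DATA,
  EV_ACC_FREE
};

/* Apply EVENT to *STATE.  Returns true if the transition is allowed;
   returns false (leaving *STATE unchanged) where a checker would warn.  */
bool
dev_transition (enum dev_state *state, enum dev_event event)
{
  assert (state != NULL);
  switch (event)
    {
    case EV_ACC_MALLOC:
      if (*state != DEV_START)
        return false;
      *state = DEV_ALLOCATED;
      return true;
    case EV_ACC_MAP_DATA:
      if (*state != DEV_ALLOCATED)
        return false;
      *state = DEV_MAPPED;
      return true;
    case EV_ACC_UNMAP_DATA:
      if (*state != DEV_MAPPED)
        return false;
      *state = DEV_ALLOCATED;
      return true;
    case EV_ACC_FREE:
      /* Freeing while still mapped, or freeing twice, is rejected.  */
      if (*state != DEV_ALLOCATED)
        return false;
      *state = DEV_FREED;
      return true;
    }
  return false;
}
```

In this model, malloc, map, free is rejected at the free, whereas malloc, map, unmap, free is accepted.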

Then, there are functions like 'acc_create' (or equivalent directives
like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data'
(plus/depending on internal reference counting).  Static checking
example: for a pointer returned by 'acc_create' etc., you must use
'acc_delete' etc. instead of 'acc_free', which first does
'acc_unmap_data' before internal 'acc_free' (..., and again all that
depending on reference counting).  (Might be "interesting" to teach your
checker about the reference counting -- if that is actually necessary;
needs further thought.)


> The checker is implemented as a GCC plugin.
>
> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
> would live in the GCC source tree and be shipped as part of the GCC tarball,
> with a new:
>   --enable-plugins=[LIST OF PLUGIN NAMES]
> configure option, analogous to --enable-languages (the Makefile/configure
> machinery for handling in-tree GCC plugins is adapted from how we support
> frontends).

I like that.  Implementing this as a plugin surely must help to either
document the GCC plugin interface as powerful/mature for such a task.  Or
make it so, if it isn't yet.  ;-)

> The default is for no such plugins to be enabled, so the default would
> be that the checker isn't built - you'd have to opt-in to building it,
> with --enable-plugins=analyzer

I'd favor a default of '--enable-plugins=default' which enables the
"usable" plugins.


> It's not clear to me whether I should focus on:
>
> (a) pruning the scope of the checker so that it works well on
> *intra*procedural C examples (and bail on anything more complex), perhaps
> targeting GCC 10 as an optional extra hidden behind
> --enable-plugins=analyzer, or
>
> (b) work on deeper interprocedural analysis (and fixing efficiency issues
> with this).

I personally would be happy to see (a) happen now (without "optional
extra hidden behind" configure-time flag but rather hidden behind GCC
command-line flag, '-fanalyze'?), and then (b) later on.  As
always, doing the incremental thing, (a) first, then (b) later, would
give it more exposure in the wild, which should help to identify design
issues etc. now instead of much later, for example.


> Thoughts?

One very high-level item: you're using the very generic name "analyzer"
('-Wanalyzer', 'gcc/analyzer/' filenames, for example, or the '-fanalyze'
I just proposed).  Might there be potential for confusion (now, or in the
future) what kind of analyzer that is, or is it safe to assume that in
context of a compiler, an analyzer is always a compile-time, static code
analyzer?  After all, the existing run-time ones are known as "checkers":
'-fstack-check', or "sanitizers": '-fsanitize', or "verifiers":
'-fvtable-verify'.


Grüße
 Thomas
David Malcolm Nov. 19, 2019, 10:01 p.m. UTC | #2
On Sat, 2019-11-16 at 21:42 +0100, Thomas Schwinge wrote:
> Hi David!
> 
> On 2019-11-15T20:22:47-0500, David Malcolm <dmalcolm@redhat.com>
> wrote:
> > This patch kit
> 
> (I have not looked at the patches.)  ;-)
> 
> > introduces a static analysis pass for GCC that can diagnose
> > various kinds of problems in C code at compile-time (e.g.
> > double-free, use-after-free, etc).
> 
> Sounds great from the description!

Thanks.

> Would it make sense to add to the wiki page
> <https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer> a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what
> their
> respective benefits are, what their design goals are, etc.  (Of
> course
> understanding that yours is much less mature at this point; talking
> about
> high-level design rather than current implementation status.)
> 
> For example, why do we want that in GCC instead of an external tool
> -- in
> part covered in your Rationale.  Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about
> preprocessor-level '#if' etc., which surely are problematic for
> external tools that are not actually replicating a real build.  And so
> on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
> 
> Just an idea, because I wondered about these things.

Thanks; I've added some notes to the "Rationale" section of the wiki
page.

A lot of the information you're after is hidden in patch 2 of the kit,
in analyzer.texi (though admittedly that's hard to read in "patch
that adds a .texi file" form).

For now, I've uploaded a prebuilt version of the HTML to:

https://dmalcolm.fedorapeople.org/gcc/2019-11-19/gccint/Static-Analyzer.html


> > The analyzer runs as an IPA pass on the gimple SSA representation.
> > It associates state machines with data, with transitions at certain
> > statements and edges.  It finds "interesting" interprocedural paths
> > through the user's code, in which bogus state transitions happen.
> > 
> > For example, given:
> > 
> >    free (ptr);
> >    free (ptr);
> > 
> > at the first call, "ptr" transitions to the "freed" state, and
> > at the second call the analyzer complains, since "ptr" is already
> > in
> > the "freed" state (unless "ptr" is NULL, in which case it stays in
> > the NULL state for both calls).
> > 
> > Specific state machines include:
> > - a checker for malloc/free, for detecting double-free, resource
> > leaks,
> >   use-after-free, etc (sm-malloc.cc), and
> 
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as
> source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
> 
> In addition to host-side 'malloc'/'free', there is device-side
> (separate
> memory space) 'acc_malloc'/'acc_free'. 

I've been thinking about generalizing the malloc/free checker to cover
resource acquisition/release pairs, adding a "domain" for the
allocation, where we'd complain if the resource release function isn't
of the same domain as the resource acquisition function.

Allocation domains might be:
  malloc/free
  C++ scalar new/delete
  C++ array new/delete
  FILE * (fopen/fclose)
  "foo_alloc"/"foo_release" for libfoo (i.e. user-extensible, via
attributes)

and thus catch things like deleting with scalar delete when the buffer
was allocated using new[], and various kinds of layering violations.
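
As a rough illustration of the "domain" idea, here is a minimal sketch of the per-resource state such a checker might track.  All the names are hypothetical; this is not how sm-malloc.cc is written, just the shape of the check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocation domains, following the list above.  */
enum alloc_domain
{
  DOM_NONE,            /* nothing acquired yet */
  DOM_MALLOC,          /* malloc/free */
  DOM_CXX_SCALAR_NEW,  /* new/delete */
  DOM_CXX_ARRAY_NEW,   /* new[]/delete[] */
  DOM_STDIO_FILE       /* fopen/fclose */
};

/* Outcome of checking one release call.  */
enum release_result
{
  REL_OK,
  REL_NOT_ACQUIRED,
  REL_DOUBLE_RELEASE,
  REL_DOMAIN_MISMATCH
};

/* What the checker would track per resource.  */
struct resource_state
{
  enum alloc_domain domain;
  bool released;
};

/* Record that the resource was acquired via a function of domain D.  */
void
resource_acquire (struct resource_state *r, enum alloc_domain d)
{
  assert (r != NULL);
  r->domain = d;
  r->released = false;
}

/* Check a release via a function of domain D.  The state only changes
   when the release is valid.  */
enum release_result
resource_release (struct resource_state *r, enum alloc_domain d)
{
  assert (r != NULL);
  if (r->domain == DOM_NONE)
    return REL_NOT_ACQUIRED;
  if (r->released)
    return REL_DOUBLE_RELEASE;
  if (r->domain != d)
    return REL_DOMAIN_MISMATCH;
  r->released = true;
  return REL_OK;
}
```

With this shape, acquiring in DOM_CXX_ARRAY_NEW and releasing in DOM_CXX_SCALAR_NEW gives REL_DOMAIN_MISMATCH, and a user-extensible domain would just be another enum value (or an ID derived from an attribute).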

I'm afraid that I'm not very familiar with OpenACC.  Would
acc_malloc/acc_free fit into that pattern, or would more be needed?
For example, can you dereference a device-side pointer in host code,
or would we ideally issue a diagnostic about that?

>  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
> 
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must
> not
> 'acc_free' memory spaces that are still mapped.

It sounds like this state machine is somewhat more complicated.

Is there a state transition diagram for this somewhere?  I don't have
that for my state machines, but there are at least lists of states; see
e.g. the various state_t within malloc_state_machine
near the top of:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01539.html

> Then, there are functions like 'acc_create' (or equivalent directives
> like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data'
> (plus/depending on internal reference counting).  Static checking
> example: for a pointer returned by 'acc_create' etc., you must use
> 'acc_delete' etc. instead of 'acc_free', which first does
> 'acc_unmap_data' before internal 'acc_free' (..., and again all that
> depending on reference counting).  (Might be "interesting" to teach
> your
> checker about the reference counting -- if that is actually
> necessary;
> needs further thought.)

Perhaps acc_create/acc_delete could simply be handled as a different
allocation "domain" in the approach I suggested above.

Reference counting is actually how I got into GCC development, by
writing a static analysis pass for verifying reference counts in
CPython extension modules (as part of the GCC Python plugin):
https://gcc-python-plugin.readthedocs.io/en/latest/cpychecker.html
(sadly, that code is slowly bit-rotting).

The approach taken in that Python plugin code has various issues.
Plus, doing it in Python isn't fast, and it's really helpful having
compile-time type checking while writing an analyzer; doing this one in
C++ has been a big improvement.

> > The checker is implemented as a GCC plugin.
> > 
> > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
> > that
> > would live in the GCC source tree and be shipped as part of the GCC
> > tarball,
> > with a new:
> >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > configure option, analogous to --enable-languages (the
> > Makefile/configure
> > machinery for handling in-tree GCC plugins is adapted from how we
> > support
> > frontends).
> 
> I like that.  Implementing this as a plugin surely must help to
> either
> document the GCC plugin interface as powerful/mature for such a
> task.  Or
> make it so, if it isn't yet.  ;-)

Our plugin "interface" as such is very broad.

> > The default is for no such plugins to be enabled, so the default
> > would
> > be that the checker isn't built - you'd have to opt-in to building
> > it,
> > with --enable-plugins=analyzer
> 
> I'd favor a default of '--enable-plugins=default' which enables the
> "usable" plugins.

> > It's not clear to me whether I should focus on:
> > 
> > (a) pruning the scope of the checker so that it works well on
> > *intra*procedural C examples (and bail on anything more complex),
> > perhaps
> > targeting GCC 10 as an optional extra hidden behind
> > --enable-plugins=analyzer, or
> > 
> > (b) work on deeper interprocedural analysis (and fixing efficiency
> > issues
> > with this).
> 
> I personally would be happy to see (a) happen now (without "optional
> extra hidden behind" configure-time flag but rather hidden behind GCC
> command-line flag, '-fanalyze'?), and then (b) later on.  As
> always, doing the incremental thing, (a) first, then (b) later, would
> give it more exposure in the wild, which should help to identify
> design
> issues etc. now instead of much later, for example.

I'm continuing to work on this code; I've already fixed the biggest
issue with LTO support (in my local copy).

I wonder if I ought to hide the interprocedural stuff behind an option
for the initial release, or only handle the simplest wrappers around
malloc/free.  Though obviously I'd prefer to get it working.  I guess
the thing to do for now is to fix bugs, and test it on increasingly
large codebases.

> > Thoughts?
> 
> One very high-level item: you're using the very generic name
> "analyzer" ('-Wanalyzer', 'gcc/analyzer/' filenames, for example, or
> the '-fanalyze' I just proposed).

I like "-fanalyze".

> Might there be potential for confusion (now, or in the
> future) what kind of analyzer that is, or is it safe to assume that
> in
> context of a compiler, an analyzer is always a compile-time, static
> code
> analyzer?  After all, the existing run-time ones are known as
> "checkers":
> '-fstack-check', or "sanitizers": '-fsanitize', or "verifiers":
> '-fvtable-verify'.

My hope is that long-term the analyzer will be extensible.
clang-static-analyzer works this way: there's a general framework, with
various "checkers" within it, potentially themselves as plugins.

For an initial version it's simplest to hardcode the checkers, and only
add generality and extension points once we have something concrete and
useful working.

For naming, I've used "analyzer", but I now realize I've conflated two
things:
  (a) the subdirectory we use internally, and the name for the
configure-time enablement option
  (b) what the user types in command-line options (warning names etc)

Potentially (a) could be "sa" (for static analyzer) rather than
"analyzer", but that might be an abbreviation too far.

Hopefully "analyzer" is OK as a name for now.  I plan to create an
ongoing git branch, probably "dmalcolm/analyzer" (I've got some fixes
queued up in my local copy).

> 
> Grüße
>  Thomas

Thanks for the feedback
Dave
Richard Biener Nov. 20, 2019, 10:18 a.m. UTC | #3
On Tue, Nov 19, 2019 at 11:02 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> > > The checker is implemented as a GCC plugin.
> > >
> > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
> > > that
> > > would live in the GCC source tree and be shipped as part of the GCC
> > > tarball,
> > > with a new:
> > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > configure option, analogous to --enable-languages (the
> > > Makefile/configure
> > > machinery for handling in-tree GCC plugins is adapted from how we
> > > support
> > > frontends).
> >
> > I like that.  Implementing this as a plugin surely must help to
> > either
> > document the GCC plugin interface as powerful/mature for such a
> > task.  Or
> > make it so, if it isn't yet.  ;-)
>
> Our plugin "interface" as such is very broad.

Just to sneak in here I don't like exposing our current plugin "non-API"
more.  In fact I'd just build the analyzer into GCC with maybe an
option to disable its build (in case it is very fat?).

From what I read it seems the analyzer could do with a proper
plugin API that just exposes introspection - and I really hope
somebody finds the time to complete (or rewrite...) the
proposed introspection API that ideally is even cross-compiler
(proven by implementing said API on top of both GCC and clang/llvm).
That way the Analyzer would work with both GCC and clang [and golang
and rustc...].

So it would be interesting if you could try to sketch the kind of API
the Analyzer needs?  That is, merely the detail on which it inspects
statements, the CFG and the callgraph.
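
To illustrate what "just exposes introspection" could mean, here is a purely hypothetical sketch: an analysis written only against a small table of read-only callbacks, with a toy backend standing in for a compiler.  None of these names exist in GCC or clang, and the CFG and non-call statements are left out for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only view of a program; a real API would need far
   more detail (statements, CFG edges, types, locations).  */
struct introspect_ops
{
  size_t (*num_functions) (const void *prog);
  /* Number of call statements (callgraph edges) leaving function FN.  */
  size_t (*num_calls) (const void *prog, size_t fn);
  const char *(*callee_name) (const void *prog, size_t fn, size_t call);
};

/* An "analysis" that only uses the ops table: count the calls to NAME
   across the whole program.  */
size_t
count_calls_to (const struct introspect_ops *ops, const void *prog,
                const char *name)
{
  assert (ops != NULL && name != NULL);
  size_t total = 0;
  for (size_t fn = 0; fn < ops->num_functions (prog); fn++)
    for (size_t c = 0; c < ops->num_calls (prog, fn); c++)
      if (strcmp (ops->callee_name (prog, fn, c), name) == 0)
        total++;
  return total;
}

/* Toy backend: each function is a NULL-terminated list of callee names.  */
#define TOY_MAX_FUNCS 2
#define TOY_MAX_CALLS 4

struct toy_program
{
  size_t nfuncs;
  const char *callees[TOY_MAX_FUNCS][TOY_MAX_CALLS];
};

static size_t
toy_num_functions (const void *p)
{
  return ((const struct toy_program *) p)->nfuncs;
}

static size_t
toy_num_calls (const void *p, size_t fn)
{
  const struct toy_program *tp = p;
  size_t n = 0;
  while (n < TOY_MAX_CALLS && tp->callees[fn][n] != NULL)
    n++;
  return n;
}

static const char *
toy_callee_name (const void *p, size_t fn, size_t call)
{
  return ((const struct toy_program *) p)->callees[fn][call];
}

const struct introspect_ops toy_ops
  = { toy_num_functions, toy_num_calls, toy_callee_name };
```

The point of the ops table is that count_calls_to never sees a compiler-internal type, so in principle a second backend could be supplied by another compiler without touching the analysis.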

Richard.
Eric Gallager Dec. 2, 2019, 3:03 a.m. UTC | #4
On 11/20/19, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm <dmalcolm@redhat.com> wrote:
>>
>> > > The checker is implemented as a GCC plugin.
>> > >
>> > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
>> > > that
>> > > would live in the GCC source tree and be shipped as part of the GCC
>> > > tarball,
>> > > with a new:
>> > >   --enable-plugins=[LIST OF PLUGIN NAMES]
>> > > configure option, analogous to --enable-languages (the
>> > > Makefile/configure
>> > > machinery for handling in-tree GCC plugins is adapted from how we
>> > > support
>> > > frontends).
>> >
>> > I like that.  Implementing this as a plugin surely must help to
>> > either
>> > document the GCC plugin interface as powerful/mature for such a
>> > task.  Or
>> > make it so, if it isn't yet.  ;-)
>>
>> Our plugin "interface" as such is very broad.
>
> Just to sneak in here I don't like exposing our current plugin "non-API"
> more.  In fact I'd just build the analyzer into GCC with maybe an
> option to disable its build (in case it is very fat?).
>
> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API on top of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

That might be a good idea for a long-term goal, but I just hope it
doesn't get in the way too much of the analyzer getting into GCC in
the short-term. The analyzer seems like it could do some really cool
analysis, and I'd like to use it sooner rather than later. Rewriting
the plugin API sounds like it could take a really long time...

>
> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.
>
> Richard.
>
Eric Gallager Dec. 2, 2019, 3:20 a.m. UTC | #5
On 11/16/19, Thomas Schwinge <thomas@codesourcery.com> wrote:
> Hi David!
>
> On 2019-11-15T20:22:47-0500, David Malcolm <dmalcolm@redhat.com> wrote:
>> This patch kit
>
> (I have not looked at the patches.)  ;-)
>
>> introduces a static analysis pass for GCC that can diagnose
>> various kinds of problems in C code at compile-time (e.g. double-free,
>> use-after-free, etc).
>
> Sounds great from the description!
>
>
> Would it make sense to add to the wiki page
> <https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer> a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what their
> respective benefits are, what their design goals are, etc.  (Of course
> understanding that yours is much less mature at this point; talking about
> high-level design rather than current implementation status.)
>
> For example, why do we want that in GCC instead of an external tool -- in
> part covered in your Rationale.

There are a lot of open bug reports requesting warnings that this
analyzer could provide. Users clearly want this in GCC, or else
they wouldn't keep making these requests.

> Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about preprocessor-level
> '#if' etc., which surely are problematic for external tools that are not
> actually replicating a real build.  And so on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
>
> Just an idea, because I wondered about these things.
>
>
>> The analyzer runs as an IPA pass on the gimple SSA representation.
>> It associates state machines with data, with transitions at certain
>> statements and edges.  It finds "interesting" interprocedural paths
>> through the user's code, in which bogus state transitions happen.
>>
>> For example, given:
>>
>>    free (ptr);
>>    free (ptr);
>>
>> at the first call, "ptr" transitions to the "freed" state, and
>> at the second call the analyzer complains, since "ptr" is already in
>> the "freed" state (unless "ptr" is NULL, in which case it stays in
>> the NULL state for both calls).
>>
>> Specific state machines include:
>> - a checker for malloc/free, for detecting double-free, resource leaks,
>>   use-after-free, etc (sm-malloc.cc), and
>
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
>
> In addition to host-side 'malloc'/'free', there is device-side (separate
> memory space) 'acc_malloc'/'acc_free'.  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
>
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must not
> 'acc_free' memory spaces that are still mapped.
>
> Then, there are functions like 'acc_create' (or equivalent directives
> like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data'
> (plus/depending on internal reference counting).  Static checking
> example: for a pointer returned by 'acc_create' etc., you must use
> 'acc_delete' etc. instead of 'acc_free', which first does
> 'acc_unmap_data' before internal 'acc_free' (..., and again all that
> depending on reference counting).  (Might be "interesting" to teach your
> checker about the reference counting -- if that is actually necessary;
> needs further thought.)
>
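To make the OpenACC mapping rule above concrete, here is a small
table-driven sketch in C.  It models only the events named in the message
(acc_malloc, acc_map_data, acc_unmap_data, acc_free) as enum values and does
not call the OpenACC runtime.  The only rule taken from the message is that
freeing still-mapped memory is an error; every other DEV_ERROR entry, and
all of the identifiers, are assumptions made for illustration.  Reference
counting is ignored.

```c
#include <assert.h>

/* Hypothetical per-pointer states for device-side memory.  */
enum dev_state
{
  DEV_UNALLOCATED, DEV_ALLOCATED, DEV_MAPPED, DEV_FREED, DEV_ERROR,
  DEV_NUM_STATES
};

/* One event per OpenACC call of interest.  */
enum dev_event { EV_MALLOC, EV_MAP, EV_UNMAP, EV_FREE, DEV_NUM_EVENTS };

/* Transition table, indexed by [state][event].  */
static const enum dev_state dev_next[DEV_NUM_STATES][DEV_NUM_EVENTS] = {
  /*                     malloc         map         unmap          free  */
  [DEV_UNALLOCATED] = { DEV_ALLOCATED, DEV_ERROR,  DEV_ERROR,     DEV_ERROR },
  [DEV_ALLOCATED]   = { DEV_ERROR,     DEV_MAPPED, DEV_ERROR,     DEV_FREED },
  /* acc_free on memory that is still mapped is the rule being checked.  */
  [DEV_MAPPED]      = { DEV_ERROR,     DEV_ERROR,  DEV_ALLOCATED, DEV_ERROR },
  [DEV_FREED]       = { DEV_ERROR,     DEV_ERROR,  DEV_ERROR,     DEV_ERROR },
  [DEV_ERROR]       = { DEV_ERROR,     DEV_ERROR,  DEV_ERROR,     DEV_ERROR },
};

/* Return the state reached from S on event E.  */
static enum dev_state
dev_step (enum dev_state s, enum dev_event e)
{
  assert (s < DEV_NUM_STATES && e < DEV_NUM_EVENTS);
  return dev_next[s][e];
}
```

Feeding it the sequence malloc, map, free ends in DEV_ERROR, whereas malloc,
map, unmap, free ends in DEV_FREED.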
>
>> The checker is implemented as a GCC plugin.
>>
>> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
>> would live in the GCC source tree and be shipped as part of the GCC
>> tarball,
>> with a new:
>>   --enable-plugins=[LIST OF PLUGIN NAMES]
>> configure option, analogous to --enable-languages (the Makefile/configure
>> machinery for handling in-tree GCC plugins is adapted from how we support
>> frontends).
>
> I like that.  Implementing this as a plugin surely must help to either
> document the GCC plugin interface as powerful/mature for such a task.  Or
> make it so, if it isn't yet.  ;-)

Nick Clifton was bringing this up as a point in his talk on his
annobin plugin at Cauldron; this should make him happy.

>
>> The default is for no such plugins to be enabled, so the default would
>> be that the checker isn't built - you'd have to opt-in to building it,
>> with --enable-plugins=analyzer
>
> I'd favor a default of '--enable-plugins=default' which enables the
> "usable" plugins.

Agreed.

>
>
>> It's not clear to me whether I should focus on:
>>
>> (a) pruning the scope of the checker so that it works well on
>> *intra*procedural C examples (and bail on anything more complex), perhaps
>> targeting GCC 10 as an optional extra hidden behind
>> --enable-plugins=analyzer, or
>>
>> (b) work on deeper interprocedural analysis (and fixing efficiency issues
>> with this).
>
> I personally would be happy to see (a) happen now (without "optional
> extra hidden behind" configure-time flag but rather hidden behind GCC
> command-line flag, '-fanalyze'?), and then (b) later on.  As
> always, doing the incremental thing, (a) first, then (b) later, would
> give it more exposure in the wild, which should help to identify design
> issues etc. now instead of much later, for example.
>
>
>> Thoughts?
>
> One very high-level item: you're using the very generic name "analyzer"
> ('-Wanalyzer', 'gcc/analyzer/' filenames, for example, or the '-fanalyze'
> I just proposed).  Might there be potential for confusion (now, or in the
> future) what kind of analyzer that is, or is it safe to assume that in
> context of a compiler, an analyzer is always a compile-time, static code
> analyzer?  After all, the existing run-time ones are known as "checkers":
> '-fstack-check', or "sanitizers": '-fsanitize', or "verifiers":
> '-fvtable-verify'.
>
>
> Grüße
>  Thomas
>
David Malcolm Dec. 3, 2019, 4:52 p.m. UTC | #6
On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> > > > The checker is implemented as a GCC plugin.
> > > > 
> > > > > The patch kit adds support for "in-tree" plugins i.e. GCC
> > > > > plugins that would live in the GCC source tree and be shipped
> > > > > as part of the GCC tarball, with a new:
> > > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > > configure option, analogous to --enable-languages (the
> > > > > Makefile/configure machinery for handling in-tree GCC plugins
> > > > > is adapted from how we support frontends).
> > > 
> > > > I like that.  Implementing this as a plugin surely must help to
> > > > either document the GCC plugin interface as powerful/mature for
> > > > such a task.  Or make it so, if it isn't yet.  ;-)
> > 
> > Our plugin "interface" as such is very broad.
> 
> Just to sneak in here I don't like exposing our current plugin
> "non-API" more.  In fact I'd just build the analyzer into GCC with
> maybe an option to disable its build (in case it is very fat?).

My aim here is to provide a way for distributors to be able to disable
its build - indeed, for now, for it to be disabled by default,
requiring opting-in.

My reasoning here is that the analyzer is middle-end code, but isn't as
mature as the rest of the middle-end (but I'm working on getting it
more mature).

I want some way to label the code as a "technology preview", that
people may want to experiment with, but to set expectations that this
is a lot of new code and there will be bugs - but to make it available
to make it easier for adventurous users to try it out.

I hope that makes sense.

I went down the "in-tree plugin" path by seeing the analogy with
frontends, but yes, it would probably be simpler to just build it into
GCC, guarded with a configure-time variable.  It's many thousand lines
of non-trivial C++ code, and associated selftests and DejaGnu tests.

Building with --enable-checking=release, and stripping the binaries and
the plugin, I see:

$ ls -al cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david 25921600 Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david 27473568 Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david   645256 Dec  3 11:22 plugin/analyzer_plugin.so

$ ls -alh cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david  25M Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david  27M Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david 631K Dec  3 11:22 plugin/analyzer_plugin.so

so the plugin is about 2.5% of the size of the existing compiler.
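
The arithmetic behind that figure can be checked with a few lines of C.  The
byte counts are copied from the "ls -al" output above; the identifier names
and the percent_of helper are made up for this sketch.

```c
#include <assert.h>

/* File sizes in bytes, copied from the "ls -al" output above.  */
static const long cc1_bytes = 25921600L;
static const long cc1plus_bytes = 27473568L;
static const long analyzer_plugin_bytes = 645256L;

/* Return PART as a percentage of WHOLE.  */
static double
percent_of (long part, long whole)
{
  assert (whole > 0);
  return 100.0 * (double) part / (double) whole;
}
```

percent_of (analyzer_plugin_bytes, cc1_bytes) comes out at roughly 2.5, and
the same calculation against cc1plus_bytes gives roughly 2.3.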

The analysis pass is very time-consuming when enabled via -fanalyzer. 
I'm aiming for "x2 compile-time in exchange for finding lots of bugs"
as a tradeoff that users will be happy to make (by supplying
-fanalyzer) - that's faster than comparable static analyzers I've been
playing with.

> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API on top of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

We've gone back and forth about what a GCC plugin API should look like;
I'm not sure what the objectives are.

For example, are we hoping to offer some kind of ABI guarantee to
plugins so that we can patch GCC without plugins needing to be rebuilt?
If so, how strong is the ABI guarantee?  For example, do we directly
expose the tree code enums and the gimple code enums?

Or is it more ambitious, and hoping to be cross-compiler, in which case
are these enums themselves hidden?

This feels like opening a massive can of worms, and orthogonal to my
goal of giving GCC a static analysis framework.

> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.

FWIW the symbols consumed by the plugin can be seen at:
 https://dmalcolm.fedorapeople.org/gcc/2019-11-27/symbols-used.txt

This is the result of:
  eu-readelf -s plugin/analyzer_plugin.so |c++filt|grep UNDEF

Surveying that, the plugin:
- creates a pass
- views the callgraph and the functions (e.g. ipa_reverse_postorder)
- views CFGs and SSA representation (including statements)
- uses the diagnostic subsystem (which parts of the patch kit extend,
adding e.g. control flow paths), e.g. creating and subclassing
rich_locations, subclassing diagnostic_path and diagnostic_event
- calls into middle-end support functions like
useless_type_conversion_p
- uses GCC types such as bitmap, inchash, wideint
- creates temporary trees
- has selftests
...etc.

But there are inline uses of various functions that don't show up in
that list (e.g. the various gimple_* accessor functions - grepping the
source shows over a hundred uses of these, but they're all inlined and
so don't show up in the above view).

My gut feeling is that writing a plugin API and then rewriting the
analyzer to use it would be a huge amount of work: I'd strongly prefer
not to do so (and to use the existing API, either as a plugin, or
directly, dropping the plugin machinery from the analyzer).

Perhaps the best way forward is to build this directly into the
compiler, but guard it by a configure-time option?

Dave
Jakub Jelinek Dec. 3, 2019, 5:17 p.m. UTC | #7
On Tue, Dec 03, 2019 at 11:52:13AM -0500, David Malcolm wrote:
> > > Our plugin "interface" as such is very broad.
> > 
> > Just to sneak in here I don't like exposing our current plugin
> > "non-API" more.  In fact I'd just build the analyzer into GCC with
> > maybe an option to disable its build (in case it is very fat?).
> 
> My aim here is to provide a way for distributors to be able to disable
> its build - indeed, for now, for it to be disabled by default,
> requiring opting-in.
> 
> My reasoning here is that the analyzer is middle-end code, but isn't as
> mature as the rest of the middle-end (but I'm working on getting it
> more mature).
> 
> I want some way to label the code as a "technology preview", that
> people may want to experiment with, but to set expectations that this
> is a lot of new code and there will be bugs - but to make it available
> to make it easier for adventurous users to try it out.
> 
> I hope that makes sense.
> 
> I went down the "in-tree plugin" path by seeing the analogy with
> frontends, but yes, it would probably be simpler to just build it into
> GCC, guarded with a configure-time variable.  It's many thousand lines
> of non-trivial C++ code, and associated selftests and DejaGnu tests.

I think it is enough to document it as tech preview in the documentation,
no need to have it as an in-tree plugin.  We have lots of options that had
such a state (perhaps undeclared) over the years, I'd consider
-fvtable-verify= to be such an option, or in the past e.g.
-fipa-matrix-reorg or -fipa-struct-reorg.  And 2.5% code growth isn't that
bad.  So, as long as the option isn't enabled by default, I think we'd be
fine.

	Jakub
Richard Biener Dec. 4, 2019, 12:40 p.m. UTC | #8
On Tue, Dec 3, 2019 at 5:52 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> > On Tue, Nov 19, 2019 at 11:02 PM David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > > > > The checker is implemented as a GCC plugin.
> > > > >
> > > > > > The patch kit adds support for "in-tree" plugins i.e. GCC
> > > > > > plugins that would live in the GCC source tree and be shipped
> > > > > > as part of the GCC tarball, with a new:
> > > > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > > > configure option, analogous to --enable-languages (the
> > > > > > Makefile/configure machinery for handling in-tree GCC plugins
> > > > > > is adapted from how we support frontends).
> > > >
> > > > > I like that.  Implementing this as a plugin surely must help to
> > > > > either document the GCC plugin interface as powerful/mature for
> > > > > such a task.  Or make it so, if it isn't yet.  ;-)
> > >
> > > Our plugin "interface" as such is very broad.
> >
> > > Just to sneak in here I don't like exposing our current plugin
> > > "non-API" more.  In fact I'd just build the analyzer into GCC with
> > > maybe an option to disable its build (in case it is very fat?).
>
> My aim here is to provide a way for distributors to be able to disable
> its build - indeed, for now, for it to be disabled by default,
> requiring opting-in.
>
> My reasoning here is that the analyzer is middle-end code, but isn't as
> mature as the rest of the middle-end (but I'm working on getting it
> more mature).
>
> I want some way to label the code as a "technology preview", that
> people may want to experiment with, but to set expectations that this
> is a lot of new code and there will be bugs - but to make it available
> to make it easier for adventurous users to try it out.
>
> I hope that makes sense.
>
> I went down the "in-tree plugin" path by seeing the analogy with
> frontends, but yes, it would probably be simpler to just build it into
> GCC, guarded with a configure-time variable.  It's many thousand lines
> of non-trivial C++ code, and associated selftests and DejaGnu tests.
>
> Building with --enable-checking=release, and stripping the binaries and
> the plugin, I see:
>
> $ ls -al cc1 cc1plus plugin/analyzer_plugin.so
> -rwxrwxr-x. 1 david david 25921600 Dec  3 11:22 cc1
> -rwxrwxr-x. 1 david david 27473568 Dec  3 11:22 cc1plus
> -rwxrwxr-x. 1 david david   645256 Dec  3 11:22 plugin/analyzer_plugin.so
>
> $ ls -alh cc1 cc1plus plugin/analyzer_plugin.so
> -rwxrwxr-x. 1 david david  25M Dec  3 11:22 cc1
> -rwxrwxr-x. 1 david david  27M Dec  3 11:22 cc1plus
> -rwxrwxr-x. 1 david david 631K Dec  3 11:22 plugin/analyzer_plugin.so
>
> so the plugin is about 2.5% of the size of the existing compiler.
>
> The analysis pass is very time-consuming when enabled via -fanalyzer.
> I'm aiming for "x2 compile-time in exchange for finding lots of bugs"
> as a tradeoff that users will be happy to make (by supplying
> -fanalyzer) - that's faster than comparable static analyzers I've been
> playing with.
>
> > From what I read it seems the analyzer could do with a proper
> > plugin API that just exposes introspection - and I really hope
> > somebody finds the time to complete (or rewrite...) the
> > proposed introspection API that ideally is even cross-compiler
> > (proven by implementing said API on top of both GCC and clang/llvm).
> > That way the Analyzer would work with both GCC and clang [and golang
> > and rustc...].
>
> We've gone back and forth about what a GCC plugin API should look like;
> I'm not sure what the objectives are.
>
> For example, are we hoping to offer some kind of ABI guarantee to
> plugins so that we can patch GCC without plugins needing to be rebuilt?

Yes, I think that's desirable.

> If so, how strong is the ABI guarantee?  For example, do we directly
> expose the tree code enums and the gimple code enums?

No, we'd remap them semantically.

> Or is it more ambitious, and hoping to be cross-compiler, in which case
> are these enums themselves hidden?

Well, my original idea was to see what people really would use
(when just considering introspection and maybe very simple instrumentation).
And then sketch something independent of the underlying compiler.
And then have that API [or even ABI] implemented by more than one
compiler to see if that's viable.

> This feels like opening a massive can of worms, and orthogonal to my
> goal of giving GCC a static analysis framework.

Sure it is orthogonal.  The only reason it comes up here is that you
propose a "plugin" ;)

I'd rather have the current plugin non-API go away so having it
"fixed" by introducing in-tree plugins looks backwards to me in
that regard.

> > So it would be interesting if you could try to sketch the kind of API
> > the Analyzer needs?  That is, merely the detail on which it inspects
> > statements, the CFG and the callgraph.
>
> FWIW the symbols consumed by the plugin can be seen at:
>  https://dmalcolm.fedorapeople.org/gcc/2019-11-27/symbols-used.txt
>
> This is the result of:
>   eu-readelf -s plugin/analyzer_plugin.so |c++filt|grep UNDEF
>
> Surveying that, the plugin:
> - creates a pass
> - views the callgraph and the functions (e.g. ipa_reverse_postorder)
> - views CFGs and SSA representation (including statements)
> - uses the diagnostic subsystem (which parts of the patch kit extend,
> adding e.g. control flow paths), e.g. creating and subclassing
> rich_locations, subclassing diagnostic_path and diagnostic_event
> - calls into middle-end support functions like
> useless_type_conversion_p
> - uses GCC types such as bitmap, inchash, wideint
> - creates temporary trees
> - has selftests
> ...etc.

Ok. So if one rewrote it from scratch on a hypothetical plugin API
then it would need introspection only.  But it would need to re-implement
a lot of stuff like diagnostics (unless that is easy to expose via
a generic compiler plugin API).

> But there are inline uses of various functions that don't show up in
> that list (e.g. the various gimple_* accessor functions - grepping the
> source shows over a hundred uses of these, but they're all inlined and
> so don't show up in the above view).
>
> My gut feeling is that writing a plugin API and then rewriting the
> analyzer to use it would be a huge amount of work: I'd strongly prefer
> not to do so (and to use the existing API, either as a plugin, or
> directly, dropping the plugin machinery from the analyzer).

Understood ;)

> Perhaps the best way forward is to build this directly into the
> compiler, but guard it by a configure-time option?

Yeah, that makes most sense.

> Dave
>
Martin Sebor Dec. 4, 2019, 7:55 p.m. UTC | #9
On 11/15/19 6:22 PM, David Malcolm wrote:
> This patch kit introduces a static analysis pass for GCC that can diagnose
> various kinds of problems in C code at compile-time (e.g. double-free,
> use-after-free, etc).

I haven't looked at the analyzer bits in any detail yet so I have
just some very high-level questions.  But first let me say I'm
excited to see this project! :)

It looks like the analyzer detects some of the same problems as
some existing middle-end warnings (e.g., -Wnonnull, -Wuninitialized),
and some that I have been working toward implementing (invalid uses
of freed pointers such as returning them from functions or passing
them to others), and others still that I have been thinking about
as possible future projects (e.g., detecting uses of uninitialized
arrays in string functions).

What are your thoughts about this sort of overlap?  Do you expect
us to enhance both sets of warnings in parallel, or do you see us
moving away from issuing warnings in the middle-end and toward
making the analyzer the main source of these kinds of diagnostics?
Maybe even replace some of the problematic middle-end warnings
with the analyzer?  What (if anything) should we do about warnings
issued for the same problems by both the middle-end and the analyzer?
Or about false negatives?  E.g., a bug detected by the middle-end
but not the analyzer or vice versa.

What do you see as the biggest pros and cons of either approach?
(Middle-end vs analyzer.)  What limitations is the analyzer
approach inherently subject to that the middle-end warnings aren't,
and vice versa?

How do we prioritize between the two approaches (e.g., choose
where to add a new warning)?

Martin

> The analyzer runs as an IPA pass on the gimple SSA representation.
> It associates state machines with data, with transitions at certain
> statements and edges.  It finds "interesting" interprocedural paths
> through the user's code, in which bogus state transitions happen.
> 
> For example, given:
> 
>     free (ptr);
>     free (ptr);
> 
> at the first call, "ptr" transitions to the "freed" state, and
> at the second call the analyzer complains, since "ptr" is already in
> the "freed" state (unless "ptr" is NULL, in which case it stays in
> the NULL state for both calls).
> 
> Specific state machines include:
> - a checker for malloc/free, for detecting double-free, resource leaks,
>    use-after-free, etc (sm-malloc.cc), and
> - a checker for stdio's FILE stream API (sm-file.cc)
> 
> There are also two state-machine-based checkers that are just
> proof-of-concept at this stage:
> - a checker for tracking exposure of sensitive data (e.g.
>    writing passwords to log files aka CWE-532), and
> - a checker for tracking "taint", where data potentially under an
>    attacker's control is used without sanitization for things like
>    array indices (CWE-129).
> 
> There's a separation between the state machines and the analysis
> engine, so it ought to be relatively easy to add new warnings.
> 
> For any given diagnostic emitted by a state machine, the analysis engine
> generates the simplest feasible interprocedural path of control flow for
> triggering the diagnostic.
> 
> 
> Diagnostic paths
> ================
> 
> The patch kit adds support to GCC's diagnostic subsystem for optionally
> associating a "diagnostic_path" with a diagnostic.  A diagnostic path
> describes a sequence of events predicted by the compiler that leads to the
> problem occurring, with their locations in the user's source, and text
> descriptions.
> 
> For example, the following warning has a 6-event interprocedural path:
> 
> malloc-ipa-8-unchecked.c: In function 'make_boxed_int':
> malloc-ipa-8-unchecked.c:21:13: warning: dereference of possibly-NULL 'result' [CWE-690] [-Wanalyzer-possible-null-dereference]
>     21 |   result->i = i;
>        |   ~~~~~~~~~~^~~
>    'make_boxed_int': events 1-2
>      |
>      |   18 | make_boxed_int (int i)
>      |      | ^~~~~~~~~~~~~~
>      |      | |
>      |      | (1) entry to 'make_boxed_int'
>      |   19 | {
>      |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
>      |      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      |      |                                    |
>      |      |                                    (2) calling 'wrapped_malloc' from 'make_boxed_int'
>      |
>      +--> 'wrapped_malloc': events 3-4
>             |
>             |    7 | void *wrapped_malloc (size_t size)
>             |      |       ^~~~~~~~~~~~~~
>             |      |       |
>             |      |       (3) entry to 'wrapped_malloc'
>             |    8 | {
>             |    9 |   return malloc (size);
>             |      |          ~~~~~~~~~~~~~
>             |      |          |
>             |      |          (4) this call could return NULL
>             |
>      <------+
>      |
>    'make_boxed_int': events 5-6
>      |
>      |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
>      |      |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      |      |                                    |
>      |      |                                    (5) possible return of NULL to 'make_boxed_int' from 'wrapped_malloc'
>      |   21 |   result->i = i;
>      |      |   ~~~~~~~~~~~~~
>      |      |             |
>      |      |             (6) 'result' could be NULL: unchecked value from (4)
>      |
> 
> The diagnostic-printing code has consolidated the path into 3 runs of events
> (where the events are near each other and within the same function), using
> ASCII art to show the interprocedural call and return.
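
For readers without the testsuite to hand, the source behind that path can
be reconstructed roughly as follows.  This is a sketch pieced together from
the fragments visible in the diagnostic, not the actual contents of
malloc-ipa-8-unchecked.c: the definition of boxed_int and the final return
statement are assumptions, and the line numbers will not match.

```c
#include <stdlib.h>

/* Assumed definition: only the type name and the field "i" appear in the
   diagnostic.  */
typedef struct boxed_int
{
  int i;
} boxed_int;

/* Thin wrapper around malloc, as at events (3) and (4).  */
void *wrapped_malloc (size_t size)
{
  return malloc (size);  /* (4) this call could return NULL */
}

boxed_int *
make_boxed_int (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  /* No NULL check here, hence the possible-NULL dereference at (6).  */
  result->i = i;
  return result;  /* assumed; not shown in the diagnostic */
}
```

The point of the example is that the unchecked result only becomes visible
once the analysis follows the call into wrapped_malloc and back out again.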
> 
> A colorized version of the above can be seen at:
>    https://dmalcolm.fedorapeople.org/gcc/2019-11-13/test.html
> 
> Other examples can be seen at:
>    https://dmalcolm.fedorapeople.org/gcc/2019-11-13/malloc-1.c.html
>    https://dmalcolm.fedorapeople.org/gcc/2019-11-13/setjmp-4.c.html
> 
> An example of detecting a historical double-free CVE can be seen at:
>    https://dmalcolm.fedorapeople.org/gcc/2019-11-13/CVE-2005-1689.html
> (there are also some false positives in this report)
> 
> 
> Diagnostic metadata
> ===================
> 
> The patch kit also adds the ability to associate additional metadata with
> a diagnostic. The only such metadata added by the patch kit are CWE
> classifications (for the new warnings), such as the CWE-690 in the warning
> above, or CWE-401 in this example:
> 
> malloc-1.c: In function 'test_42a':
> malloc-1.c:466:1: warning: leak of 'p' [CWE-401] [-Wanalyzer-malloc-leak]
>    466 | }
>        | ^
>    'test_42a': events 1-2
>         |
>         |  463 |   void *p = malloc (1024);
>         |      |             ^~~~~~~~~~~~~
>         |      |             |
>         |      |             (1) allocated here
>         |......
>         |  466 | }
>         |      | ~
>         |      | |
>         |      | (2) 'p' leaks here; was allocated at (1)
>         |
> 
> If the terminal supports it, the above "CWE-401" is a clickable hyperlink
> to:
>    https://cwe.mitre.org/data/definitions/401.html
> 
> 
> Scope
> =====
> 
> The checker is implemented as a GCC plugin.
> 
> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
> would live in the GCC source tree and be shipped as part of the GCC tarball,
> with a new:
>    --enable-plugins=[LIST OF PLUGIN NAMES]
> configure option, analogous to --enable-languages (the Makefile/configure
> machinery for handling in-tree GCC plugins is adapted from how we support
> frontends).
> 
> The default is for no such plugins to be enabled, so the default would
> be that the checker isn't built - you'd have to opt-in to building it,
> with --enable-plugins=analyzer
> 
> To mitigate feature creep, I've been focusing on implementing double-free
> detection, albeit with an eye to building something that can be developed
> into a more fully-featured static analyzer.  For example, I haven't yet
> attempted to track buffer overflows in this version, but I believe that
> that could be added on top of this foundation.
> 
> Many projects implement some kind of wrapper around malloc and free, so
> there is enough interprocedural support to cope with that, but only very
> primitive support for summarizing larger functions and planning/performing
> an efficient interprocedural analysis on non-trivial functions that
> have state-machine effects.
> 
> In theory the analyzer can work with LTO, and perform cross-TU analysis.
> There's a bare-bones prototype of this in the testsuite, which finds a
> double-free spanning two TUs; see:
> 
>    https://dmalcolm.fedorapeople.org/gcc/2019-11-15/double-free-lto-1.STAR.c.html
> 
> However this is just a proof-of-concept at this stage (see the internal docs
> for more notes on its limitations).
> 
> 
> User interface
> ==============
> 
> --analyzer turns on all the analyzer warnings (it also enables the
> expensive traversal that they rely on); individual warnings are all
> prefixed "-Wanalyzer-" and can be turned off in the usual way
> e.g. -Wno-analyzer-use-after-free.
> 
> 
> Rationale
> =========
> 
> There's benefit in integrating a checker directly into the compiler, so
> that the programmer can see the diagnostics as he or she works on the code,
> rather than at some later point.  I think that if the analyzer can be
> made sufficiently fast that many people would opt-in to deeper but more
> expensive warnings.  (I'm aiming for 2x compile time as my rough estimate
> of what's reasonable in exchange for being told up-front about various
> kinds of pointer snafu).
> 
> 
> Correctness
> ===========
> 
> The analyzer is neither sound nor complete, but does attempt to explore
> "interesting" paths through the code and generate meaningful diagnostics.
> There are no known ICEs, but there are bugs... (see the xfails and TODOs
> in the testsuite, and the "Limitations" section of the internal docs).
> 
> 
> Performance
> ===========
> 
> Using --analyzer roughly doubles the compile time on various testcases
> I've tried (krb5, zlib), but also sometimes takes a lot longer
> (again, see the "Limitations" section of the internal docs; there are
> bugs...).
> 
> 
> Overview of patch kit
> =====================
> 
> Patch 01 contains user-facing documentation for the analyzer
> 
> Patch 02 documents the implementation internals (to avoid having to repeat
> it here)
> 
> Patches 03-15 are preliminary work.
>    Patch 11 adds metadata support to GCC's diagnostic subsystem, so that
>    the analyzer can associate CWE identifiers with diagnostics.
>    Patch 12 adds diagnostic_path support to GCC's diagnostic subsystem
> 
> Patches 16-17 add support for in-tree plugins
> 
> Patches 18-24 add the basics of the analyzer plugin itself
> 
> Patches 25-48 add the analysis "machinery"
>    Patches 33-38 add the state machines (somewhat abstracted from the
>    rest of the analyzer)
> 
> Patch 49 adds the test suite for the analyzer
> 
> The patches can also be seen on the git mirror as branch "dmalcolm/analyzer-v1"
>    https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/dmalcolm/analyzer-v1
> 
> This is relative to r276961 (which predates the recent update to how
> params work).
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> The analyzer works on toy examples, and on some moderately-sized real
> world codebases (krb5 and zlib).  I'd hoped to have tested it more deeply
> at this point (and dogfooded it), but given that GCC stage 1 is closing
> shortly I thought I ought to post what I have.
> 
> It's not clear to me whether I should focus on:
> 
> (a) pruning the scope of the checker so that it works well on
> *intra*procedural C examples (and bail on anything more complex), perhaps
> targeting GCC 10 as an optional extra hidden behind
> --enable-plugins=analyzer, or
> 
> (b) work on deeper interprocedural analysis (and fixing efficiency issues
> with this).
> 
> See also: https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> (which currently duplicates much of the above).
> 
> Thoughts?
> David
> 
> David Malcolm (49):
>    analyzer: user-facing documentation
>    analyzer: internal documentation
>    diagnostic_show_locus: move initial newline to callers
>    sbitmap.h: add operator const sbitmap & to auto_sbitmap
>    vec.h: add auto_delete_vec
>    timevar.h: add auto_client_timevar class
>    Replace label_text ctor with "borrow" and "take"
>    Introduce pretty_printer::clone vfunc
>    gimple const-correctness fixes
>    Add -fdiagnostics-nn-line-numbers
>    Add diagnostic_metadata and CWE support
>    Add diagnostic paths
>    function-tests.c: expose selftest::make_fndecl for use elsewhere
>    hash-map-tests.c: add a selftest involving int_hash
>    Add ordered_hash_map
>    Add support for in-tree plugins
>    Support for adding selftests via a plugin
>    Add in-tree plugin: "analyzer"
>    analyzer: new files: analyzer-selftests.{cc|h}
>    analyzer: new builtins
>    analyzer: command-line options
>    analyzer: params.def: new parameters
>    analyzer: logging support
>    analyzer: new file: analyzer-pass.cc
>    analyzer: new files: graphviz.{cc|h}
>    analyzer: new files: digraph.{cc|h} and shortest-paths.h
>    analyzer: new files: supergraph.{cc|h}
>    analyzer: new files: analyzer.{cc|h}
>    analyzer: new files: tristate.{cc|h}
>    analyzer: new files: constraint-manager.{cc|h}
>    analyzer: new files: region-model.{cc|h}
>    analyzer: new files: pending-diagnostic.{cc|h}
>    analyzer: new files: sm.{cc|h}
>    analyzer: new file: sm-malloc.cc
>    analyzer: new file: sm-file.cc
>    analyzer: new file: sm-pattern-test.cc
>    analyzer: new file: sm-sensitive.cc
>    analyzer: new file: sm-taint.cc
>    analyzer: new files: analysis-plan.{cc|h}
>    analyzer: new files: call-string.{cc|h}
>    analyzer: new files: program-point.{cc|h}
>    analyzer: new files: program-state.{cc|h}
>    analyzer: new file: exploded-graph.h
>    analyzer: new files: state-purge.{cc|h}
>    analyzer: new files: engine.{cc|h}
>    analyzer: new files: checker-path.{cc|h}
>    analyzer: new files: diagnostic-manager.{cc|h}
>    gdbinit.in: add break-on-saved-diagnostic
>    analyzer: test suite
> 
>   configure.ac                                       |    6 +
>   gcc/Makefile.in                                    |  102 +-
>   gcc/analyzer/Make-plugin.in                        |  181 +
>   gcc/analyzer/analysis-plan.cc                      |  115 +
>   gcc/analyzer/analysis-plan.h                       |   56 +
>   gcc/analyzer/analyzer-logging.cc                   |  220 +
>   gcc/analyzer/analyzer-logging.h                    |  256 +
>   gcc/analyzer/analyzer-pass.cc                      |  103 +
>   gcc/analyzer/analyzer-plugin.cc                    |   63 +
>   gcc/analyzer/analyzer-selftests.cc                 |   61 +
>   gcc/analyzer/analyzer-selftests.h                  |   46 +
>   gcc/analyzer/analyzer.cc                           |  125 +
>   gcc/analyzer/analyzer.h                            |  126 +
>   gcc/analyzer/call-string.cc                        |  201 +
>   gcc/analyzer/call-string.h                         |   74 +
>   gcc/analyzer/checker-path.cc                       |  899 +++
>   gcc/analyzer/checker-path.h                        |  563 ++
>   gcc/analyzer/config-plugin.in                      |   34 +
>   gcc/analyzer/constraint-manager.cc                 | 2263 ++++++
>   gcc/analyzer/constraint-manager.h                  |  248 +
>   gcc/analyzer/diagnostic-manager.cc                 | 1117 +++
>   gcc/analyzer/diagnostic-manager.h                  |  116 +
>   gcc/analyzer/digraph.cc                            |  189 +
>   gcc/analyzer/digraph.h                             |  248 +
>   gcc/analyzer/engine.cc                             | 3416 +++++++++
>   gcc/analyzer/engine.h                              |   26 +
>   gcc/analyzer/exploded-graph.h                      |  754 ++
>   gcc/analyzer/graphviz.cc                           |   81 +
>   gcc/analyzer/graphviz.h                            |   50 +
>   gcc/analyzer/pending-diagnostic.cc                 |   61 +
>   gcc/analyzer/pending-diagnostic.h                  |  265 +
>   gcc/analyzer/plugin.opt                            |  161 +
>   gcc/analyzer/program-point.cc                      |  490 ++
>   gcc/analyzer/program-point.h                       |  316 +
>   gcc/analyzer/program-state.cc                      | 1284 ++++
>   gcc/analyzer/program-state.h                       |  360 +
>   gcc/analyzer/region-model.cc                       | 7686 ++++++++++++++++++++
>   gcc/analyzer/region-model.h                        | 2076 ++++++
>   gcc/analyzer/shortest-paths.h                      |  147 +
>   gcc/analyzer/sm-file.cc                            |  338 +
>   gcc/analyzer/sm-malloc.cc                          |  799 ++
>   gcc/analyzer/sm-pattern-test.cc                    |  165 +
>   gcc/analyzer/sm-sensitive.cc                       |  209 +
>   gcc/analyzer/sm-taint.cc                           |  338 +
>   gcc/analyzer/sm.cc                                 |  135 +
>   gcc/analyzer/sm.h                                  |  160 +
>   gcc/analyzer/state-purge.cc                        |  516 ++
>   gcc/analyzer/state-purge.h                         |  170 +
>   gcc/analyzer/supergraph.cc                         |  936 +++
>   gcc/analyzer/supergraph.h                          |  560 ++
>   gcc/analyzer/tristate.cc                           |  222 +
>   gcc/analyzer/tristate.h                            |   82 +
>   gcc/builtins.def                                   |   33 +
>   gcc/c-family/c-format.c                            |   15 +-
>   gcc/c-family/c-format.h                            |    1 +
>   gcc/c-family/c-opts.c                              |    1 +
>   gcc/c-family/c-pretty-print.c                      |    7 +
>   gcc/c-family/c-pretty-print.h                      |    1 +
>   gcc/c/c-objc-common.c                              |    4 +-
>   gcc/common.opt                                     |   31 +
>   gcc/configure.ac                                   |  172 +-
>   gcc/coretypes.h                                    |    1 +
>   gcc/cp/cxx-pretty-print.c                          |    8 +
>   gcc/cp/cxx-pretty-print.h                          |    2 +
>   gcc/cp/error.c                                     |    9 +-
>   gcc/diagnostic-color.c                             |    3 +-
>   gcc/diagnostic-core.h                              |   10 +
>   gcc/diagnostic-event-id.h                          |   61 +
>   gcc/diagnostic-format-json.cc                      |   33 +-
>   gcc/diagnostic-metadata.h                          |   42 +
>   gcc/diagnostic-path.h                              |  149 +
>   gcc/diagnostic-show-locus.c                        |  219 +-
>   gcc/diagnostic.c                                   |  283 +-
>   gcc/diagnostic.def                                 |    5 +
>   gcc/diagnostic.h                                   |   43 +-
>   gcc/doc/analyzer.texi                              |  470 ++
>   gcc/doc/gccint.texi                                |    2 +
>   gcc/doc/install.texi                               |    9 +
>   gcc/doc/invoke.texi                                |  600 +-
>   gcc/dwarf2out.c                                    |    1 +
>   gcc/fortran/error.c                                |    1 +
>   gcc/function-tests.c                               |    4 +-
>   gcc/gcc-rich-location.c                            |    2 +-
>   gcc/gcc-rich-location.h                            |    6 +-
>   gcc/gcc.c                                          |   13 +
>   gcc/gdbinit.in                                     |   10 +
>   gcc/gimple-predict.h                               |    4 +-
>   gcc/gimple-pretty-print.c                          |  159 +-
>   gcc/gimple-pretty-print.h                          |    3 +-
>   gcc/gimple.h                                       |  156 +-
>   gcc/hash-map-tests.c                               |   41 +
>   gcc/lto-wrapper.c                                  |    3 +
>   gcc/opts.c                                         |   16 +
>   gcc/ordered-hash-map-tests.cc                      |  247 +
>   gcc/ordered-hash-map.h                             |  184 +
>   gcc/params.def                                     |   25 +
>   gcc/plugin.c                                       |    2 +
>   gcc/plugin.def                                     |    3 +
>   gcc/pretty-print.c                                 |   66 +
>   gcc/pretty-print.h                                 |    4 +
>   gcc/sbitmap.h                                      |    1 +
>   gcc/selftest-run-tests.c                           |    6 +
>   gcc/selftest.h                                     |    9 +
>   .../gcc.dg/analyzer/CVE-2005-1689-minimal.c        |   30 +
>   gcc/testsuite/gcc.dg/analyzer/abort.c              |   71 +
>   gcc/testsuite/gcc.dg/analyzer/alloca-leak.c        |    8 +
>   .../gcc.dg/analyzer/analyzer-verbosity-0.c         |  133 +
>   .../gcc.dg/analyzer/analyzer-verbosity-1.c         |  160 +
>   .../gcc.dg/analyzer/analyzer-verbosity-2.c         |  191 +
>   gcc/testsuite/gcc.dg/analyzer/analyzer.exp         |   49 +
>   gcc/testsuite/gcc.dg/analyzer/attribute-nonnull.c  |   57 +
>   gcc/testsuite/gcc.dg/analyzer/call-summaries-1.c   |   14 +
>   gcc/testsuite/gcc.dg/analyzer/conditionals-2.c     |   44 +
>   gcc/testsuite/gcc.dg/analyzer/conditionals-3.c     |   45 +
>   .../gcc.dg/analyzer/conditionals-notrans.c         |  158 +
>   gcc/testsuite/gcc.dg/analyzer/conditionals-trans.c |  143 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-1.c       | 1078 +++
>   gcc/testsuite/gcc.dg/analyzer/data-model-10.c      |   17 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-11.c      |    6 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-12.c      |   13 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-13.c      |   21 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-14.c      |   24 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-15.c      |   34 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-16.c      |   50 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-17.c      |   20 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-18.c      |   20 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-19.c      |   31 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-2.c       |   13 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-3.c       |   15 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-4.c       |   16 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-5.c       |  100 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-5b.c      |   91 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-5c.c      |   84 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-5d.c      |   63 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-6.c       |   13 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-7.c       |   19 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-8.c       |   24 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-9.c       |   32 +
>   gcc/testsuite/gcc.dg/analyzer/data-model-path-1.c  |   13 +
>   .../gcc.dg/analyzer/double-free-lto-1-a.c          |   16 +
>   .../gcc.dg/analyzer/double-free-lto-1-b.c          |    8 +
>   gcc/testsuite/gcc.dg/analyzer/double-free-lto-1.h  |    1 +
>   gcc/testsuite/gcc.dg/analyzer/equivalence.c        |   29 +
>   gcc/testsuite/gcc.dg/analyzer/explode-1.c          |   60 +
>   gcc/testsuite/gcc.dg/analyzer/explode-2.c          |   50 +
>   gcc/testsuite/gcc.dg/analyzer/factorial.c          |    7 +
>   gcc/testsuite/gcc.dg/analyzer/fibonacci.c          |    9 +
>   gcc/testsuite/gcc.dg/analyzer/fields.c             |   41 +
>   gcc/testsuite/gcc.dg/analyzer/file-1.c             |   37 +
>   gcc/testsuite/gcc.dg/analyzer/file-2.c             |   18 +
>   gcc/testsuite/gcc.dg/analyzer/function-ptr-1.c     |    8 +
>   gcc/testsuite/gcc.dg/analyzer/function-ptr-2.c     |   43 +
>   gcc/testsuite/gcc.dg/analyzer/function-ptr-3.c     |   17 +
>   gcc/testsuite/gcc.dg/analyzer/gzio-2.c             |   11 +
>   gcc/testsuite/gcc.dg/analyzer/gzio-3.c             |   31 +
>   gcc/testsuite/gcc.dg/analyzer/gzio-3a.c            |   27 +
>   gcc/testsuite/gcc.dg/analyzer/gzio.c               |   17 +
>   gcc/testsuite/gcc.dg/analyzer/infinite-recursion.c |   55 +
>   gcc/testsuite/gcc.dg/analyzer/loop-2.c             |   36 +
>   gcc/testsuite/gcc.dg/analyzer/loop-2a.c            |   39 +
>   gcc/testsuite/gcc.dg/analyzer/loop-3.c             |   17 +
>   gcc/testsuite/gcc.dg/analyzer/loop-4.c             |   41 +
>   gcc/testsuite/gcc.dg/analyzer/loop.c               |   33 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-1.c           |  565 ++
>   gcc/testsuite/gcc.dg/analyzer/malloc-2.c           |   23 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-3.c           |    8 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-dce.c         |   12 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-dedupe-1.c    |   46 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-1.c       |   24 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-10.c      |   32 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-11.c      |   95 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-12.c      |    7 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13.c      |   30 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-2.c       |   34 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-3.c       |   23 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-4.c       |   13 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-5.c       |   13 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-6.c       |   22 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-7.c       |   29 +
>   .../gcc.dg/analyzer/malloc-ipa-8-double-free.c     |  172 +
>   .../gcc.dg/analyzer/malloc-ipa-8-unchecked.c       |   66 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-ipa-9.c       |   18 +
>   .../gcc.dg/analyzer/malloc-macro-inline-events.c   |   45 +
>   .../gcc.dg/analyzer/malloc-macro-separate-events.c |   15 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-macro.h       |    2 +
>   .../gcc.dg/analyzer/malloc-many-paths-1.c          |   14 +
>   .../gcc.dg/analyzer/malloc-many-paths-2.c          |   30 +
>   .../gcc.dg/analyzer/malloc-many-paths-3.c          |   36 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-1.c     |   15 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-10.c    |   19 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-2.c     |   13 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-3.c     |   14 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-4.c     |   20 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-5.c     |   43 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-6.c     |   11 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-7.c     |   21 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-8.c     |   54 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-paths-9.c     |  298 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1a.c |  180 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1b.c |  175 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-2.c  |  178 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-3.c  |   65 +
>   gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-4.c  |   40 +
>   gcc/testsuite/gcc.dg/analyzer/operations.c         |   42 +
>   gcc/testsuite/gcc.dg/analyzer/params-2.c           |   16 +
>   gcc/testsuite/gcc.dg/analyzer/params.c             |   32 +
>   gcc/testsuite/gcc.dg/analyzer/paths-1.c            |   16 +
>   gcc/testsuite/gcc.dg/analyzer/paths-1a.c           |   16 +
>   gcc/testsuite/gcc.dg/analyzer/paths-2.c            |   25 +
>   gcc/testsuite/gcc.dg/analyzer/paths-3.c            |   48 +
>   gcc/testsuite/gcc.dg/analyzer/paths-4.c            |   49 +
>   gcc/testsuite/gcc.dg/analyzer/paths-5.c            |   10 +
>   gcc/testsuite/gcc.dg/analyzer/paths-6.c            |  118 +
>   gcc/testsuite/gcc.dg/analyzer/paths-7.c            |   58 +
>   gcc/testsuite/gcc.dg/analyzer/pattern-test-1.c     |   28 +
>   gcc/testsuite/gcc.dg/analyzer/pattern-test-2.c     |   29 +
>   gcc/testsuite/gcc.dg/analyzer/pointer-merging.c    |   16 +
>   gcc/testsuite/gcc.dg/analyzer/pr61861.c            |    2 +
>   gcc/testsuite/gcc.dg/analyzer/pragma-1.c           |   26 +
>   gcc/testsuite/gcc.dg/analyzer/scope-1.c            |   23 +
>   gcc/testsuite/gcc.dg/analyzer/sensitive-1.c        |   33 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-1.c           |    1 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-2.c           |   97 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-3.c           |  106 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-4.c           |  107 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-5.c           |   65 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-6.c           |   31 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-7.c           |   36 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-8.c           |  107 +
>   gcc/testsuite/gcc.dg/analyzer/setjmp-9.c           |  109 +
>   gcc/testsuite/gcc.dg/analyzer/switch.c             |   28 +
>   gcc/testsuite/gcc.dg/analyzer/taint-1.c            |  128 +
>   gcc/testsuite/gcc.dg/analyzer/zlib-1.c             |   67 +
>   gcc/testsuite/gcc.dg/analyzer/zlib-2.c             |   51 +
>   gcc/testsuite/gcc.dg/analyzer/zlib-3.c             |  214 +
>   gcc/testsuite/gcc.dg/analyzer/zlib-4.c             |   20 +
>   gcc/testsuite/gcc.dg/analyzer/zlib-5.c             |   49 +
>   gcc/testsuite/gcc.dg/analyzer/zlib-6.c             |   47 +
>   gcc/testsuite/gcc.dg/format/gcc_diag-10.c          |    6 +-
>   .../gcc.dg/plugin/diagnostic-path-format-default.c |  142 +
>   .../diagnostic-path-format-inline-events-1.c       |  142 +
>   .../diagnostic-path-format-inline-events-2.c       |  154 +
>   .../diagnostic-path-format-inline-events-3.c       |  153 +
>   .../gcc.dg/plugin/diagnostic-path-format-none.c    |   43 +
>   .../diagnostic-path-format-separate-events.c       |   44 +
>   .../gcc.dg/plugin/diagnostic-test-paths-1.c        |   38 +
>   .../gcc.dg/plugin/diagnostic-test-paths-2.c        |   56 +
>   .../gcc.dg/plugin/diagnostic-test-paths-3.c        |   38 +
>   .../gcc.dg/plugin/diagnostic_plugin_test_paths.c   |  379 +
>   .../plugin/diagnostic_plugin_test_show_locus.c     |    1 +
>   gcc/testsuite/gcc.dg/plugin/plugin.exp             |   10 +
>   gcc/testsuite/lib/target-supports.exp              |    8 +
>   gcc/timevar.h                                      |   33 +
>   gcc/toplev.c                                       |    8 +
>   gcc/tree-diagnostic-path.cc                        |  809 ++
>   gcc/tree-diagnostic.c                              |   12 +-
>   gcc/tree-diagnostic.h                              |    8 +
>   gcc/tree-eh.c                                      |    6 +-
>   gcc/tree-eh.h                                      |    4 +-
>   gcc/tree-ssa-alias.h                               |    2 +-
>   gcc/tree-ssa-structalias.c                         |    2 +-
>   gcc/vec.c                                          |   27 +
>   gcc/vec.h                                          |   35 +
>   libcpp/include/line-map.h                          |   38 +-
>   libcpp/line-map.c                                  |    3 +-
>   265 files changed, 42181 insertions(+), 336 deletions(-)
>   create mode 100644 gcc/analyzer/Make-plugin.in
>   create mode 100644 gcc/analyzer/analysis-plan.cc
>   create mode 100644 gcc/analyzer/analysis-plan.h
>   create mode 100644 gcc/analyzer/analyzer-logging.cc
>   create mode 100644 gcc/analyzer/analyzer-logging.h
>   create mode 100644 gcc/analyzer/analyzer-pass.cc
>   create mode 100644 gcc/analyzer/analyzer-plugin.cc
>   create mode 100644 gcc/analyzer/analyzer-selftests.cc
>   create mode 100644 gcc/analyzer/analyzer-selftests.h
>   create mode 100644 gcc/analyzer/analyzer.cc
>   create mode 100644 gcc/analyzer/analyzer.h
>   create mode 100644 gcc/analyzer/call-string.cc
>   create mode 100644 gcc/analyzer/call-string.h
>   create mode 100644 gcc/analyzer/checker-path.cc
>   create mode 100644 gcc/analyzer/checker-path.h
>   create mode 100644 gcc/analyzer/config-plugin.in
>   create mode 100644 gcc/analyzer/constraint-manager.cc
>   create mode 100644 gcc/analyzer/constraint-manager.h
>   create mode 100644 gcc/analyzer/diagnostic-manager.cc
>   create mode 100644 gcc/analyzer/diagnostic-manager.h
>   create mode 100644 gcc/analyzer/digraph.cc
>   create mode 100644 gcc/analyzer/digraph.h
>   create mode 100644 gcc/analyzer/engine.cc
>   create mode 100644 gcc/analyzer/engine.h
>   create mode 100644 gcc/analyzer/exploded-graph.h
>   create mode 100644 gcc/analyzer/graphviz.cc
>   create mode 100644 gcc/analyzer/graphviz.h
>   create mode 100644 gcc/analyzer/pending-diagnostic.cc
>   create mode 100644 gcc/analyzer/pending-diagnostic.h
>   create mode 100644 gcc/analyzer/plugin.opt
>   create mode 100644 gcc/analyzer/program-point.cc
>   create mode 100644 gcc/analyzer/program-point.h
>   create mode 100644 gcc/analyzer/program-state.cc
>   create mode 100644 gcc/analyzer/program-state.h
>   create mode 100644 gcc/analyzer/region-model.cc
>   create mode 100644 gcc/analyzer/region-model.h
>   create mode 100644 gcc/analyzer/shortest-paths.h
>   create mode 100644 gcc/analyzer/sm-file.cc
>   create mode 100644 gcc/analyzer/sm-malloc.cc
>   create mode 100644 gcc/analyzer/sm-pattern-test.cc
>   create mode 100644 gcc/analyzer/sm-sensitive.cc
>   create mode 100644 gcc/analyzer/sm-taint.cc
>   create mode 100644 gcc/analyzer/sm.cc
>   create mode 100644 gcc/analyzer/sm.h
>   create mode 100644 gcc/analyzer/state-purge.cc
>   create mode 100644 gcc/analyzer/state-purge.h
>   create mode 100644 gcc/analyzer/supergraph.cc
>   create mode 100644 gcc/analyzer/supergraph.h
>   create mode 100644 gcc/analyzer/tristate.cc
>   create mode 100644 gcc/analyzer/tristate.h
>   create mode 100644 gcc/diagnostic-event-id.h
>   create mode 100644 gcc/diagnostic-metadata.h
>   create mode 100644 gcc/diagnostic-path.h
>   create mode 100644 gcc/doc/analyzer.texi
>   create mode 100644 gcc/ordered-hash-map-tests.cc
>   create mode 100644 gcc/ordered-hash-map.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/CVE-2005-1689-minimal.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/abort.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/alloca-leak.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer-verbosity-0.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer-verbosity-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer-verbosity-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/analyzer.exp
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attribute-nonnull.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/call-summaries-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-notrans.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/conditionals-trans.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-10.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-11.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-12.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-13.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-14.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-15.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-16.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-17.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-18.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-19.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5b.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5c.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-5d.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-7.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-8.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-9.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/data-model-path-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/double-free-lto-1-a.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/double-free-lto-1-b.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/double-free-lto-1.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/equivalence.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/explode-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/explode-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/factorial.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/fibonacci.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/fields.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/file-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/file-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio-3a.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/gzio.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infinite-recursion.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-2a.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/loop.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-dce.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-dedupe-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-10.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-11.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-12.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-7.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-8-double-free.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-8-unchecked.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-9.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-macro-inline-events.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-macro-separate-events.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-macro.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-many-paths-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-many-paths-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-many-paths-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-10.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-7.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-8.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-paths-9.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1a.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-1b.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-vs-local-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/operations.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/params-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/params.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-1a.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-7.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/pattern-test-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/pattern-test-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/pointer-merging.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr61861.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/pragma-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/scope-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/sensitive-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-7.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-8.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/setjmp-9.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/zlib-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-default.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-inline-events-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-inline-events-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-inline-events-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-none.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-path-format-separate-events.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_paths.c
>   create mode 100644 gcc/tree-diagnostic-path.cc
>
Jeff Law Dec. 6, 2019, 10:25 p.m. UTC | #10
On Tue, 2019-12-03 at 11:52 -0500, David Malcolm wrote:
> On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> > On Tue, Nov 19, 2019 at 11:02 PM David Malcolm <dmalcolm@redhat.com> wrote:
> > > > > The checker is implemented as a GCC plugin.
> > > > > 
> > > > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
> > > > > that would live in the GCC source tree and be shipped as part of
> > > > > the GCC tarball, with a new:
> > > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > > configure option, analogous to --enable-languages (the
> > > > > Makefile/configure machinery for handling in-tree GCC plugins is
> > > > > adapted from how we support frontends).
> > > > 
> > > > I like that.  Implementing this as a plugin surely must help to
> > > > either document the GCC plugin interface as powerful/mature for
> > > > such a task.  Or make it so, if it isn't yet.  ;-)
> > > 
> > > Our plugin "interface" as such is very broad.
> > 
> > Just to sneak in here I don't like exposing our current plugin
> > "non-API" more.  In fact I'd just build the analyzer into GCC with
> > maybe an option to disable its build (in case it is very fat?).
> 
> My aim here is to provide a way for distributors to be able to disable
> its build - indeed, for now, for it to be disabled by default,
> requiring opting-in.
It seems like there's some move to have this as part of the core
compiler rather than as a plug-in.  That's a bit of a surprise, but a
good one.


> I want some way to label the code as a "technology preview", that
> people may want to experiment with, but to set expectations that this
> is a lot of new code and there will be bugs - but to make it available
> to make it easier for adventurous users to try it out.
> 
> I hope that makes sense.
> 
> I went down the "in-tree plugin" path by seeing the analogy with
> frontends, but yes, it would probably be simpler to just build it into
> GCC, guarded with a configure-time variable.  It's many thousand lines
> of non-trivial C++ code, and associated selftests and DejaGnu tests.
Given the overall feedback, core component with an opt-out seems like
it'd be best.

jeff
>
Jeff Law Dec. 6, 2019, 10:31 p.m. UTC | #11
On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> On 11/15/19 6:22 PM, David Malcolm wrote:
> > This patch kit introduces a static analysis pass for GCC that can
> > diagnose
> > various kinds of problems in C code at compile-time (e.g. double-
> > free,
> > use-after-free, etc).
> 
> I haven't looked at the analyzer bits in any detail yet so I have
> just some very high-level questions.  But first let me say I'm
> excited to see this project! :)
> 
> It looks like the analyzer detects some of the same problems as
> some existing middle-end warnings (e.g., -Wnonnull, -Wuninitialized),
> and some that I have been working toward implementing (invalid uses
> of freed pointers such as returning them from functions or passing
> them to others), and others still that I have been thinking about
> as possible future projects (e.g., detecting uses of uninitialized
> arrays in string functions).
> 
> What are your thoughts about this sort of overlap?  Do you expect
> us to enhance both sets of warnings in parallel, or do you see us
> moving away from issuing warnings in the middle-end and toward
> making the analyzer the main source of these kinds of diagnostics?
> Maybe even replace some of the problematic middle-end warnings
> with the analyzer?  What (if anything) should we do about warnings
> issued for the same problems by both the middle-end and the analyzer?
> Or about false negatives?  E.g., a bug detected by the middle-end
> but not the analyzer or vice versa.
> 
> What do you see as the biggest pros and cons of either approach?
> (Middle-end vs analyzer.)  What limitations is the analyzer
> approach inherently subject to that the middle-end warnings aren't,
> and vice versa?
> 
> How do we prioritize between the two approaches (e.g., choose
> where to add a new warning)?
Given the cost of David's analyzer, I would tend to prioritize the more
localized analysis.  Also note that because of the compile-time
complexities we end up pruning paths from the search space and lose
precision when we have to merge nodes.   These issues are inherent in
the depth of analysis we're looking to do.

So the way to think about things is that David's work is a slower,
deeper analysis than what we usually do.  So things that are reasonable
candidates for -Wall would need to use the traditional mechanisms.
Things that require deeper analysis would be done in David's framework.

Also note that part of David's work is to bring a fairly generic engine
that we can expand with different domain specific analyzers.  It just
happens to be the case that the first place he's focused is on double-
free and use-after-free.  But (IMHO) the gem is really the generic
engine.
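
As a rough illustration of that split between a generic engine and
domain-specific analyzers, here is a minimal C++ sketch.  Every name in
it (Stmt, StateMachine, DoubleFreeChecker, run_engine) is hypothetical
and is not taken from the patch kit; it only assumes the behaviour
described in the cover letter, where a pointer moves to a "freed" state
at the first free and a second free is reported.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the analyzer's real API.
enum class PtrState { Start, Freed };

struct Stmt {
  std::string callee;  // name of the called function, e.g. "free"
  std::string arg;     // name of the pointer passed to it
};

// Domain-specific part: a checker supplies only the transition rule.
class StateMachine {
public:
  virtual ~StateMachine() = default;
  // Returns a warning message, or an empty string if there is nothing
  // to report for this statement.
  virtual std::string on_stmt(const Stmt &stmt,
                              std::map<std::string, PtrState> &state) const = 0;
};

class DoubleFreeChecker : public StateMachine {
public:
  std::string on_stmt(const Stmt &stmt,
                      std::map<std::string, PtrState> &state) const override {
    if (stmt.callee != "free")
      return "";
    assert(!stmt.arg.empty());
    PtrState &s = state[stmt.arg];  // new entries start as PtrState::Start
    if (s == PtrState::Freed)
      return "double-free of '" + stmt.arg + "'";
    s = PtrState::Freed;
    return "";
  }
};

// Generic part: walks one path of statements and knows nothing about
// malloc/free; it only feeds statements to whichever checker it is given.
std::vector<std::string> run_engine(const StateMachine &sm,
                                    const std::vector<Stmt> &path) {
  std::map<std::string, PtrState> state;
  std::vector<std::string> warnings;
  for (const Stmt &stmt : path) {
    std::string w = sm.on_stmt(stmt, state);
    if (!w.empty())
      warnings.push_back(w);
  }
  return warnings;
}
```

The sketch walks a single straight-line path; the real engine explores
interprocedural paths and has to prune and merge them, which is where
the cost and loss of precision mentioned above come from.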

jeff
Richard Biener Dec. 9, 2019, 8:10 a.m. UTC | #12
On Fri, Dec 6, 2019 at 11:31 PM Jeff Law <law@redhat.com> wrote:
>
> On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> > On 11/15/19 6:22 PM, David Malcolm wrote:
> > > This patch kit introduces a static analysis pass for GCC that can
> > > diagnose
> > > various kinds of problems in C code at compile-time (e.g. double-
> > > free,
> > > use-after-free, etc).
> >
> > I haven't looked at the analyzer bits in any detail yet so I have
> > just some very high-level questions.  But first let me say I'm
> > excited to see this project! :)
> >
> > It looks like the analyzer detects some of the same problems as
> > some existing middle-end warnings (e.g., -Wnonnull, -Wuninitialized),
> > and some that I have been working toward implementing (invalid uses
> > of freed pointers such as returning them from functions or passing
> > them to others), and others still that I have been thinking about
> > as possible future projects (e.g., detecting uses of uninitialized
> > arrays in string functions).
> >
> > What are your thoughts about this sort of overlap?  Do you expect
> > us to enhance both sets of warnings in parallel, or do you see us
> > moving away from issuing warnings in the middle-end and toward
> > making the analyzer the main source of these kinds of diagnostics?
> > Maybe even replace some of the problematic middle-end warnings
> > with the analyzer?  What (if anything) should we do about warnings
> > issued for the same problems by both the middle-end and the analyzer?
> > Or about false negatives?  E.g., a bug detected by the middle-end
> > but not the analyzer or vice versa.
> >
> > What do you see as the biggest pros and cons of either approach?
> > (Middle-end vs analyzer.)  What limitations is the analyzer
> > approach inherently subject to that the middle-end warnings aren't,
> > and vice versa?
> >
> > How do we prioritize between the two approaches (e.g., choose
> > where to add a new warning)?
> Given the cost of David's analyzer, I would tend to prioritize the more
> localized analysis.  Also note that because of the compile-time
> complexities we end up pruning paths from the search space and lose
> precision when we have to merge nodes.   These issues are inherent in
> the depth of analysis we're looking to do.
>
> So the way to think about things is David's work is a slower, deeper
> analysis than what we usually do.  So things that are reasonable
> candidates for -Wall would need to use the traditional mechanisms.
> Things that require deeper analysis would be done in David's framework.
>
> Also note that part of David's work is to bring a fairly generic engine
> that we can expand with different domain specific analyzers.  It just
> happens to be the case that the first place he's focused is on double-
> free and use-after-free.  But (IMHO) the gem is really the generic
> engine.

So if the "generic engine" lives inside GCC can the actual analyzers
be plugins on a (stable) "analyzer plugin API"?

Does the analyzer work with LTO at whole-program scope btw?

Richard.

> jeff
>
David Malcolm Dec. 11, 2019, 8:11 p.m. UTC | #13
On Mon, 2019-12-09 at 09:10 +0100, Richard Biener wrote:
> On Fri, Dec 6, 2019 at 11:31 PM Jeff Law <law@redhat.com> wrote:
> > On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> > > On 11/15/19 6:22 PM, David Malcolm wrote:
> > > > This patch kit introduces a static analysis pass for GCC that
> > > > can
> > > > diagnose
> > > > various kinds of problems in C code at compile-time (e.g.
> > > > double-
> > > > free,
> > > > use-after-free, etc).
> > > 
> > > I haven't looked at the analyzer bits in any detail yet so I have
> > > just some very high-level questions.  But first let me say I'm
> > > excited to see this project! :)
> > > 
> > > It looks like the analyzer detects some of the same problems as
> > > some existing middle-end warnings (e.g., -Wnonnull,
> > > -Wuninitialized),
> > > and some that I have been working toward implementing (invalid
> > > uses
> > > of freed pointers such as returning them from functions or
> > > passing
> > > them to others), and others still that I have been thinking about
> > > as possible future projects (e.g., detecting uses of
> > > uninitialized
> > > arrays in string functions).
> > > 
> > > What are your thoughts about this sort of overlap?  Do you expect
> > > us to enhance both sets of warnings in parallel, or do you see us
> > > moving away from issuing warnings in the middle-end and toward
> > > making the analyzer the main source of these kinds of
> > > diagnostics?
> > > Maybe even replace some of the problematic middle-end warnings
> > > with the analyzer?  What (if anything) should we do about
> > > warnings
> > > issued for the same problems by both the middle-end and the
> > > analyzer?
> > > Or about false negatives?  E.g., a bug detected by the middle-end
> > > but not the analyzer or vice versa.
> > > 
> > > What do you see as the biggest pros and cons of either approach?
> > > (Middle-end vs analyzer.)  What limitations is the analyzer
> > > approach inherently subject to that the middle-end warnings
> > > aren't,
> > > and vice versa?
> > > 
> > > How do we prioritize between the two approaches (e.g., choose
> > > where to add a new warning)?
> > Given the cost of David's analyzer, I would tend to prioritize the
> > more
> > localized analysis.  Also note that because of the compile-time
> > complexities we end up pruning paths from the search space and lose
> > precision when we have to merge nodes.   These issues are inherent
> > in
> > the depth of analysis we're looking to do.
> > 
> > So the way to think about things is David's work is a slower,
> > deeper
> > analysis than what we usually do.  So things that are reasonable
> > candidates for -Wall would need to use the traditional mechanisms.
> > Things that require deeper analysis would be done in David's
> > framework.
> > 
> > Also note that part of David's work is to bring a fairly generic
> > engine
> > that we can expand with different domain specific analyzers.  It
> > just
> > happens to be the case that the first place he's focused is on
> > double-
> > free and use-after-free.  But (IMHO) the gem is really the generic
> > engine.
> 
> So if the "generic engine" lives inside GCC can the actual analyzers
> be plugins on a (stable) "analyzer plugin API"?

I like the idea of having plugins be able to support the analyzer
itself, so that new checkers can be registered by a plugin, analogous
to plugins that register new passes.  AIUI the clang static analyzer
works in such a fashion.

However, speaking to the "(stable)" part of your question: to do
anything useful, the checkers have to query GCC's IR (as well as
interact with the state of the analyzer), and so this reopens the
question of what the plugin API to GCC's IR is.

I'm focusing on building a concrete example of a checker (double-free)
and a few other examples; trying to generalize it into something
pluggable feels very much like something not to attempt in the initial
version.
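
To make the idea concrete anyway, here is a purely hypothetical C++
sketch of what "a plugin registers a checker" could look like, by
analogy with registering a pass.  None of these names (Checker,
CheckerRegistry, example_plugin_init) exist in the patch kit or in
GCC's plugin headers, and the sketch deliberately sidesteps the hard
part, which is how a checker would query the IR.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; the patch kit does not define a checker plugin API.
class Checker {
public:
  virtual ~Checker() = default;
  virtual std::string name() const = 0;
};

class CheckerRegistry {
public:
  using Factory = std::function<std::unique_ptr<Checker>()>;

  // Called by built-in code or, in this sketch, by a plugin's init hook.
  void register_checker(Factory make) {
    assert(make && "factory must be callable");
    m_factories.push_back(std::move(make));
  }

  // The engine would instantiate every registered checker before analysis.
  std::vector<std::unique_ptr<Checker>> instantiate_all() const {
    std::vector<std::unique_ptr<Checker>> out;
    for (const Factory &f : m_factories)
      out.push_back(f());
    return out;
  }

private:
  std::vector<Factory> m_factories;
};

// What a plugin might contribute (hypothetical).
class ExamplePluginChecker : public Checker {
public:
  std::string name() const override { return "double-free"; }
};

// Stand-in for a plugin's initialization entry point.
void example_plugin_init(CheckerRegistry &registry) {
  registry.register_checker([]() -> std::unique_ptr<Checker> {
    return std::unique_ptr<Checker>(new ExamplePluginChecker());
  });
}
```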

> Does the analyzer work with LTO at whole-program scope btw?

My understanding of LTO is a little hazy, but yes, I think so.

The first thing the analyzer does (in engine.cc) is:

  /* If using LTO, ensure that the cgraph nodes have function bodies.  */
  cgraph_node *node;
  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
    node->get_untransformed_body ();

before then building a "supergraph" that combines CFGs and the callgraph.
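
For readers unfamiliar with the term, a toy model of such a supergraph
might look like the following.  This is a sketch under my own
assumptions, with hypothetical names (SuperNode, SuperEdge, Supergraph,
build_example); it is not the data structure used in the patch kit.
Nodes are (function, basic block) pairs, and edges are either ordinary
CFG edges within one function or call/return edges derived from the
callgraph.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical data model, not the structures used in the patch kit.
enum class EdgeKind { CfgEdge, Call, Return };

struct SuperNode {
  std::string function;  // function that owns the basic block
  int block;             // index of the basic block within that function
};

struct SuperEdge {
  int src;
  int dst;
  EdgeKind kind;
};

struct Supergraph {
  std::vector<SuperNode> nodes;
  std::vector<SuperEdge> edges;

  // Adds a node and returns its index.
  int add_node(const std::string &function, int block) {
    nodes.push_back(SuperNode{function, block});
    return static_cast<int>(nodes.size()) - 1;
  }

  void add_edge(int src, int dst, EdgeKind kind) {
    assert(src >= 0 && src < static_cast<int>(nodes.size()));
    assert(dst >= 0 && dst < static_cast<int>(nodes.size()));
    edges.push_back(SuperEdge{src, dst, kind});
  }

  int count_edges(EdgeKind kind) const {
    int n = 0;
    for (const SuperEdge &e : edges)
      if (e.kind == kind)
        ++n;
    return n;
  }
};

// Builds the supergraph for a caller() whose first block calls callee()
// and then continues in a second block.
Supergraph build_example() {
  Supergraph g;
  int caller_entry = g.add_node("caller", 0);
  int caller_after = g.add_node("caller", 1);
  int callee_entry = g.add_node("callee", 0);
  int callee_exit = g.add_node("callee", 1);
  g.add_edge(callee_entry, callee_exit, EdgeKind::CfgEdge);  // callee's own CFG
  g.add_edge(caller_entry, callee_entry, EdgeKind::Call);    // from the callgraph
  g.add_edge(callee_exit, caller_after, EdgeKind::Return);   // back to the caller
  return g;
}
```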

BTW, for more on implementation details, a prebuilt HTML version of the
internal docs is at:
https://dmalcolm.fedorapeople.org/gcc/static-analyzer/gccint/Static-Analyzer.html

Dave
David Malcolm Dec. 16, 2019, 2:09 p.m. UTC | #14
On Tue, 2019-12-03 at 18:17 +0100, Jakub Jelinek wrote:
> On Tue, Dec 03, 2019 at 11:52:13AM -0500, David Malcolm wrote:
> > > > Our plugin "interface" as such is very broad.
> > > 
> > > Just to sneak in here I don't like exposing our current plugin
> > > "non-
> > > API"
> > > more.  In fact I'd just build the analyzer into GCC with maybe an
> > > option to disable its build (in case it is very fat?).
> > 
> > My aim here is to provide a way for distributors to be able to
> > disable
> > its build - indeed, for now, for it to be disabled by default,
> > requiring opting-in.
> > 
> > My reasoning here is that the analyzer is middle-end code, but
> > isn't as
> > mature as the rest of the middle-end (but I'm working on getting it
> > more mature).
> > 
> > I want some way to label the code as a "technology preview", that
> > people may want to experiment with, but to set expectations that
> > this
> > is a lot of new code and there will be bugs - but to make it
> > available
> > to make it easier for adventurous users to try it out.
> > 
> > I hope that makes sense.
> > 
> > I went down the "in-tree plugin" path by seeing the analogy with
> > frontends, but yes, it would probably be simpler to just build it
> > into
> > GCC, guarded with a configure-time variable.  It's many thousand
> > lines
> > of non-trivial C++ code, and associated selftests and DejaGnu
> > tests.
> 
> I think it is enough to document it as tech preview in the
> documentation,
> no need to have it as an in-tree plugin.  We have lots of options
> that had
> such a state (perhaps undeclared) over the years, I'd consider
> -fvtable-verify= to be such an option, or in the past e.g.
> -fipa-matrix-reorg or -fipa-struct-reorg.  And 2.5% code growth isn't
> that
> bad.  So, as long as the option isn't enabled by default, I think
> we'd be
> fine.

FWIW I did some testing of v4 of the patch kit [1], which drops the in-
tree plugin idea in favor of simply building the analyzer into the
compiler as a regular IPA pass.  The pass is disabled by default
(enabled by -fanalyzer).  There is also a configure-time option to
disable building it (it's built by default).

I did 3 bootstraps of a release build of x86_64-pc-linux-gnu:
- unpatched,
- with the kit but with --disable-analyzer, and
- with the kit, with the analyzer enabled.

Here are the sizes of cc1 and cc1plus in bytes in each build, after
stripping debuginfo (and showing the change relative to the unpatched
build):

          Unpatched    With kit, analyzer disabled   With kit, analyzer enabled
          size         size      change              size      change
cc1       25778720     25815672  +36952 (+0.1%)      26270328  +491608 (+1.9%)
cc1plus   27355296     27388152  +32856 (+0.1%)      27842808  +487512 (+1.8%)


So it's a little less than 2% code growth.
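
For anyone wanting to reproduce the percentages, the arithmetic is just
(patched - unpatched) / unpatched.  A small C++ helper, with a function
name of my own choosing (percent_growth), could look like this:

```cpp
#include <cassert>
#include <cstdint>

// Relative size change of a patched binary versus the unpatched one,
// expressed as a percentage of the unpatched size.
double percent_growth(std::int64_t unpatched, std::int64_t patched) {
  assert(unpatched > 0);
  return 100.0 * static_cast<double>(patched - unpatched) /
         static_cast<double>(unpatched);
}
```

Feeding it the cc1 sizes from the table, percent_growth(25778720,
26270328) gives roughly 1.91, and the cc1plus sizes give roughly 1.78.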

Dave

[1] see https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer for the
various links