diff mbox series

[RFC,5/8] static-analyzer: Enforce coroutine_fn restrictions on function pointers

Message ID 20220702113331.2003820-6-afaria@redhat.com
State New
Headers show
Series Introduce an extensible static analyzer | expand

Commit Message

Alberto Faria July 2, 2022, 11:33 a.m. UTC
Extend static-analyzer.py to enforce coroutine_fn restrictions on
function pointer operations.

Invalid operations include assigning a coroutine_fn value to a
non-coroutine_fn function pointer, and invoking a coroutine_fn function
pointer from a non-coroutine_fn function.

Signed-off-by: Alberto Faria <afaria@redhat.com>
---
 static-analyzer.py | 147 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 143 insertions(+), 4 deletions(-)

Comments

Víctor Colombo July 4, 2022, 2:16 p.m. UTC | #1
On 02/07/2022 08:33, Alberto Faria wrote:

Alberto, hello. I was testing this patch as follows:

./static-analyzer.py build target/ppc/mmu-hash64.c

<snip>

> @@ -627,9 +744,31 @@ def is_coroutine_fn(node: Cursor) -> bool:
>           else:
>               break
> 
> -    return node.kind == CursorKind.FUNCTION_DECL and is_annotated_with(
> -        node, "coroutine_fn"
> -    )
> +    if node.kind in [CursorKind.DECL_REF_EXPR, CursorKind.MEMBER_REF_EXPR]:
> +        node = node.referenced
> +
> +    # ---
> +
> +    if node.kind == CursorKind.FUNCTION_DECL:


And I receive an exception on the line above saying that node is of type
NoneType. Seems that `node = node.referenced` is setting `node` to None
in this case.

I was unable to understand the root cause of it. Is this an incorrect
usage of the tool from my part? Full error message below

Traceback (most recent call last):
   File "./static-analyzer.py", line 327, in analyze_translation_unit
     checker(tu, context.absolute_path, log)
   File "./static-analyzer.py", line 613, in check_coroutine_pointers
     and is_coroutine_fn(right)
   File "./static-analyzer.py", line 781, in is_coroutine_fn
     if node.kind == CursorKind.FUNCTION_DECL:
AttributeError: 'NoneType' object has no attribute 'kind'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
   File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
     result = (True, func(*args, **kwds))
   File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
     return list(map(*args))
   File "./static-analyzer.py", line 329, in analyze_translation_unit
     raise RuntimeError(f"Error analyzing {relative_path}") from e
RuntimeError: Error analyzing target/ppc/mmu-hash64.c
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
   File "./static-analyzer.py", line 893, in <module>
     main()
   File "./static-analyzer.py", line 123, in main
     analyze_translation_units(args, contexts)
   File "./static-analyzer.py", line 240, in analyze_translation_units
     results = pool.map(analyze_translation_unit, contexts)
   File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
     return self._map_async(func, iterable, mapstar, chunksize).get()
   File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
     raise self._value
RuntimeError: Error analyzing target/ppc/mmu-hash64.c

 > +        return is_annotated_with(node, "coroutine_fn")
> +
> +    if node.kind in [
> +        CursorKind.FIELD_DECL,
> +        CursorKind.VAR_DECL,
> +        CursorKind.PARM_DECL,
> +    ]:
> +
> +        if is_annotated_with(node, "coroutine_fn"):
> +            return True
> +
> +        # TODO: If type is typedef or pointer to typedef, follow typedef.
> +
> +        return False
> +
> +    if node.kind == CursorKind.TYPEDEF_DECL:
> +        return is_annotated_with(node, "coroutine_fn")
> +
> +    return False
Best regards,
Alberto Faria July 4, 2022, 4:57 p.m. UTC | #2
Hi Víctor,

On Mon, Jul 4, 2022 at 3:18 PM Víctor Colombo
<victor.colombo@eldorado.org.br> wrote:
> And I receive an exception on the line above saying that node is of type
> NoneType. Seems that `node = node.referenced` is setting `node` to None
> in this case.
>
> I was unable to understand the root cause of it. Is this an incorrect
> usage of the tool from my part? Full error message below

Unfortunately there seem to be a lot of corner cases that libclang can
throw at us. I hadn't come across this one before. I expected that
DECL_REF_EXPR/MEMBER_REF_EXPR would always reference something.

This may be due to some build error -- libclang tries to continue
processing a translation unit by dropping subtrees or nodes that have
problems. Is there a "too many errors emitted, stopping now; this may
lead to false positives and negatives" line at the top of the script's
output?

Thanks,
Alberto
Víctor Colombo July 4, 2022, 5:46 p.m. UTC | #3
On 04/07/2022 13:57, Alberto Faria wrote:
> Hi Víctor,
> 
> On Mon, Jul 4, 2022 at 3:18 PM Víctor Colombo
> <victor.colombo@eldorado.org.br> wrote:
>> And I receive an exception on the line above saying that node is of type
>> NoneType. Seems that `node = node.referenced` is setting `node` to None
>> in this case.
>>
>> I was unable to understand the root cause of it. Is this an incorrect
>> usage of the tool from my part? Full error message below
> 
> Unfortunately there seem to be a lot of corner cases that libclang can
> throw at us. I hadn't come across this one before. I expected that
> DECL_REF_EXPR/MEMBER_REF_EXPR would always reference something.
> 
> This may be due to some build error -- libclang tries to continue
> processing a translation unit by dropping subtrees or nodes that have
> problems. Is there a "too many errors emitted, stopping now; this may
> lead to false positives and negatives" line at the top of the script's
> output?
> 

Yes, this line is present at the beginning of the output
Is this caused by problems with the code being analyzed or is it because
libclang is getting confused with something that is outside of our
control?
Alberto Faria July 4, 2022, 6:04 p.m. UTC | #4
On Mon, Jul 4, 2022 at 6:46 PM Víctor Colombo
<victor.colombo@eldorado.org.br> wrote:
> Yes, this line is present at the beginning of the output
> Is this caused by problems with the code being analyzed or is it because
> libclang is getting confused with something that is outside of our
> control?

I think I found the problem: the commands in the compilation database
weren't being parsed properly. I switched to shlex.split() and it
seems to be working now. The WIP v2 is available at [1], if you want
to give it a try.

Thanks for reporting this!

Alberto

[1] https://gitlab.com/albertofaria/qemu/-/tree/static-analysis
Víctor Colombo July 4, 2022, 7:06 p.m. UTC | #5
On 04/07/2022 15:04, Alberto Faria wrote:
> On Mon, Jul 4, 2022 at 6:46 PM Víctor Colombo
> <victor.colombo@eldorado.org.br> wrote:
>> Yes, this line is present at the beginning of the output
>> Is this caused by problems with the code being analyzed or is it because
>> libclang is getting confused with something that is outside of our
>> control?
> 
> I think I found the problem: the commands in the compilation database
> weren't being parsed properly. I switched to shlex.split() and it
> seems to be working now. The WIP v2 is available at [1], if you want
> to give it a try.
> 
> Thanks for reporting this!
> 
> Alberto
> 
> [1] https://gitlab.com/albertofaria/qemu/-/tree/static-analysis
> 

I tested the version from the WIP v2 and seems to be working now.
Thanks!
diff mbox series

Patch

diff --git a/static-analyzer.py b/static-analyzer.py
index 8abc552b82..485d054052 100755
--- a/static-analyzer.py
+++ b/static-analyzer.py
@@ -36,6 +36,8 @@ 
     TranslationUnit,
     TranslationUnitLoadError,
     TypeKind,
+    SourceRange,
+    TokenGroup,
 )
 
 Cursor.__hash__ = lambda self: self.hash  # so `Cursor`s can be dict keys
@@ -515,6 +517,90 @@  def check_coroutine_calls(
                 log(call, "non-coroutine_fn function calls coroutine_fn")
 
 
+@check("coroutine-pointers")
+def check_coroutine_pointers(
+    translation_unit: TranslationUnit,
+    translation_unit_path: str,
+    log: Callable[[Cursor, str], None],
+) -> None:
+
+    # What we would really like is to associate annotation attributes with types
+    # directly, but that doesn't seem possible. Instead, we have to look at the
+    # relevant variable/field/parameter declarations, and follow typedefs.
+
+    # This doesn't check all possible ways of assigning to a coroutine_fn
+    # field/variable/parameter. That would probably be too much work.
+
+    # TODO: Check struct/union/array initialization.
+    # TODO: Check assignment to struct/union/array fields.
+
+    for [*_, node] in all_paths(translation_unit.cursor):
+
+        # check initialization of variables using coroutine_fn values
+
+        if node.kind == CursorKind.VAR_DECL:
+
+            children = [
+                c
+                for c in node.get_children()
+                if c.kind
+                not in [
+                    CursorKind.ANNOTATE_ATTR,
+                    CursorKind.INIT_LIST_EXPR,
+                    CursorKind.TYPE_REF,
+                    CursorKind.UNEXPOSED_ATTR,
+                ]
+            ]
+
+            if (
+                len(children) == 1
+                and not is_coroutine_fn(node)
+                and is_coroutine_fn(children[0])
+            ):
+                log(node, "assigning coroutine_fn to non-coroutine_fn")
+
+        # check initialization of fields using coroutine_fn values
+
+        # TODO: This only checks designator initializers.
+
+        if node.kind == CursorKind.INIT_LIST_EXPR:
+
+            for initializer in filter(
+                lambda n: n.kind == CursorKind.UNEXPOSED_EXPR,
+                node.get_children(),
+            ):
+
+                children = list(initializer.get_children())
+
+                if (
+                    len(children) == 2
+                    and children[0].kind == CursorKind.MEMBER_REF
+                    and not is_coroutine_fn(children[0].referenced)
+                    and is_coroutine_fn(children[1])
+                ):
+                    log(
+                        initializer,
+                        "assigning coroutine_fn to non-coroutine_fn",
+                    )
+
+        # check assignments of coroutine_fn values to variables or fields
+
+        if is_binary_operator(node, "="):
+
+            [left, right] = node.get_children()
+
+            if (
+                left.kind
+                in [
+                    CursorKind.DECL_REF_EXPR,
+                    CursorKind.MEMBER_REF_EXPR,
+                ]
+                and not is_coroutine_fn(left.referenced)
+                and is_coroutine_fn(right)
+            ):
+                log(node, "assigning coroutine_fn to non-coroutine_fn")
+
+
 # ---------------------------------------------------------------------------- #
 # Traversal
 
@@ -555,6 +641,31 @@  def subtrees_matching(
 # Node predicates
 
 
+def is_binary_operator(node: Cursor, operator: str) -> bool:
+    return (
+        node.kind == CursorKind.BINARY_OPERATOR
+        and get_binary_operator_spelling(node) == operator
+    )
+
+
+def get_binary_operator_spelling(node: Cursor) -> Optional[str]:
+
+    assert node.kind == CursorKind.BINARY_OPERATOR
+
+    [left, right] = node.get_children()
+
+    op_range = SourceRange.from_locations(left.extent.end, right.extent.start)
+
+    tokens = list(TokenGroup.get_tokens(node.translation_unit, op_range))
+    if not tokens:
+        # Can occur when left and right children extents overlap due to
+        # misparsing.
+        return None
+
+    [op_token, *_] = tokens
+    return op_token.spelling
+
+
 def is_valid_coroutine_fn_usage(parent: Cursor) -> bool:
     """
     Check if an occurrence of `coroutine_fn` represented by a node with parent
@@ -599,7 +710,13 @@  def is_valid_allow_coroutine_fn_call_usage(node: Cursor) -> bool:
     `node` appears at a valid point in the AST. This is the case if its right
     operand is a call to:
 
-      - A function declared with the `coroutine_fn` annotation.
+      - A function declared with the `coroutine_fn` annotation, OR
+      - A field/variable/parameter whose declaration has the `coroutine_fn`
+        annotation, and of function pointer type, OR
+      - [TODO] A field/variable/parameter of a typedef function pointer type,
+        and the typedef has the `coroutine_fn` annotation, OR
+      - [TODO] A field/variable/parameter of a pointer to typedef function type,
+        and the typedef has the `coroutine_fn` annotation.
 
     TODO: Ensure that `__allow_coroutine_fn_call()` is in the body of a
     non-`coroutine_fn` function.
@@ -627,9 +744,31 @@  def is_coroutine_fn(node: Cursor) -> bool:
         else:
             break
 
-    return node.kind == CursorKind.FUNCTION_DECL and is_annotated_with(
-        node, "coroutine_fn"
-    )
+    if node.kind in [CursorKind.DECL_REF_EXPR, CursorKind.MEMBER_REF_EXPR]:
+        node = node.referenced
+
+    # ---
+
+    if node.kind == CursorKind.FUNCTION_DECL:
+        return is_annotated_with(node, "coroutine_fn")
+
+    if node.kind in [
+        CursorKind.FIELD_DECL,
+        CursorKind.VAR_DECL,
+        CursorKind.PARM_DECL,
+    ]:
+
+        if is_annotated_with(node, "coroutine_fn"):
+            return True
+
+        # TODO: If type is typedef or pointer to typedef, follow typedef.
+
+        return False
+
+    if node.kind == CursorKind.TYPEDEF_DECL:
+        return is_annotated_with(node, "coroutine_fn")
+
+    return False
 
 
 def is_annotated_with(node: Cursor, annotation: str) -> bool: