Fix comparison of trees via tree_cmp
diff mbox series

Message ID 20200122170231.GA10992@dyn-9-152-222-55.boeblingen.de.ibm.com
State New
Headers show
Series
  • Fix comparison of trees via tree_cmp
Related show

Commit Message

Stefan Schulze Frielinghaus Jan. 22, 2020, 5:02 p.m. UTC
Hi David,

In function `tree_cmp` an invariant [1] is assumed which does not necessarily
exist. In case both input trees are finally compared via `strcmp`, then

  tree_cmp (t1, t2) == -tree_cmp (t2, t1)

does not hold in general, since function `strcmp (x, y)` guarantees only that a
negative integer is returned in case x<y or a positive integer in case x>y.
Currently this breaks s390x where, for example, for certain inputs x,y
`tree_cmp (x, y) == 1` and `tree_cmp (y, x) == -2` hold.  The attached patch
normalizes the output from `strcmp` to -1, 0, or 1 while using an auxiliary
function `sign` (stolen from the Hacker's Delight book ;-)).

Bootstrapped and tested on s390x. Any thoughts?

Cheers,
Stefan

[1] https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/analyzer/region-model.cc;h=9474c6737d54d68f5b36893903cfa6d19df0efed;hb=HEAD#l1849

Comments

Alexander Monakov Jan. 22, 2020, 5:08 p.m. UTC | #1
On Wed, 22 Jan 2020, Stefan Schulze Frielinghaus wrote:

> Hi David,
> 
> In function `tree_cmp` an invariant [1] is assumed which does not necessarily
> exist. In case both input trees are finally compared via `strcmp`, then
> 
>   tree_cmp (t1, t2) == -tree_cmp (t2, t1)
> 
> does not hold in general, since function `strcmp (x, y)` guarantees only that a
> negative integer is returned in case x<y or a positive integer in case x>y.
> Currently this breaks s390x where, for example, for certain inputs x,y
> `tree_cmp (x, y) == 1` and `tree_cmp (y, x) == -2` hold.  The attached patch
> normalizes the output from `strcmp` to -1, 0, or 1 while using an auxiliary
> function `sign` (stolen from the Hacker's Delight book ;-)).
> 
> Bootstrapped and tested on s390x. Any thoughts?

It's more appropriate to fix the assert rather than the comparator, like

  gcc_assert (sign (reversed) == -sign (result));

But qsort_chk already checks that, and more, so why is the assert there?
Shouldn't it be simply removed?

Alexander
Jakub Jelinek Jan. 22, 2020, 6:02 p.m. UTC | #2
On Wed, Jan 22, 2020 at 08:08:32PM +0300, Alexander Monakov wrote:
> 
> 
> On Wed, 22 Jan 2020, Stefan Schulze Frielinghaus wrote:
> 
> > Hi David,
> > 
> > In function `tree_cmp` an invariant [1] is assumed which does not necessarily
> > exist. In case both input trees are finally compared via `strcmp`, then
> > 
> >   tree_cmp (t1, t2) == -tree_cmp (t2, t1)
> > 
> > does not hold in general, since function `strcmp (x, y)` guarantees only that a
> > negative integer is returned in case x<y or a positive integer in case x>y.
> > Currently this breaks s390x where, for example, for certain inputs x,y
> > `tree_cmp (x, y) == 1` and `tree_cmp (y, x) == -2` hold.  The attached patch
> > normalizes the output from `strcmp` to -1, 0, or 1 while using an auxiliary
> > function `sign` (stolen from the Hacker's Delight book ;-)).
> > 
> > Bootstrapped and tested on s390x. Any thoughts?
> 
> It's more appropriate to fix the assert rather than the comparator, like
> 
>   gcc_assert (sign (reversed) == -sign (result));
> 
> But qsort_chk already checks that, and more, so why is the assert there?
> Shouldn't it be simply removed?

Yeah.  Note there is also return DECL_UID (t1) - DECL_UID (t2); that also
doesn't guarantee -1/0/1 return values, so the patch as posted isn't
sufficient.

	Jakub
David Malcolm Jan. 23, 2020, 3:08 p.m. UTC | #3
On Wed, 2020-01-22 at 19:02 +0100, Jakub Jelinek wrote:
> On Wed, Jan 22, 2020 at 08:08:32PM +0300, Alexander Monakov wrote:
> > 
> > On Wed, 22 Jan 2020, Stefan Schulze Frielinghaus wrote:
> > 
> > > Hi David,
> > > 
> > > In function `tree_cmp` an invariant [1] is assumed which does not
> > > necessarily
> > > exist. In case both input trees are finally compared via
> > > `strcmp`, then
> > > 
> > >   tree_cmp (t1, t2) == -tree_cmp (t2, t1)
> > > 
> > > does not hold in general, since function `strcmp (x, y)`
> > > guarantees only that a
> > > negative integer is returned in case x<y or a positive integer in
> > > case x>y.
> > > Currently this breaks s390x where, for example, for certain
> > > inputs x,y
> > > `tree_cmp (x, y) == 1` and `tree_cmp (y, x) == -2` hold.  The
> > > attached patch
> > > normalizes the output from `strcmp` to -1, 0, or 1 while using an
> > > auxiliary
> > > function `sign` (stolen from the Hacker's Delight book ;-)).
> > > 
> > > Bootstrapped and tested on s390x. Any thoughts?
> > 
> > It's more appropriate to fix the assert rather than the comparator,
> > like
> > 
> >   gcc_assert (sign (reversed) == -sign (result));
> > 
> > But qsort_chk already checks that, and more, so why is the assert
> > there?
> > Shouldn't it be simply removed?
> 
> Yeah.  Note there is also return DECL_UID (t1) - DECL_UID (t2); that
> also
> doesn't guarantee -1/0/1 return values, so the patch as posted isn't
> sufficient.

Sorry about the breakage; to be specific, I've reproduced this on
s390x-ibm-linux-gnu, where it fails the selftests in stage 1, breaking
the build (unless configured with --disable-analyzer).

Removing the assertions fixes it for me (a stage1 build, at least, and
it then passes the testsuite).

I've made this blunder in four places in the analyzer:

  call-string.cc:162:  call_string::cmp
  engine.cc:1820:  worklist::key_t::cmp
  program-point.cc:461:  function_point::cmp_within_supernode
  region-model.cc:1878:  tree_cmp

IIRC, I added these checks as I was finding it difficult to debug
things when qsort_chk failed - the first three of the comparators in
the list above build on each other:  worklist::key_t::cmp uses both
function_point::cmp_within_supernode and call_string::cmp (and various
other things), so if the worklist qsort_chk fails, it sometimes
required a fair bit of digging into which of the nested comparators had
failed (also, all the void * don't help; I'd love to have a template
that does a typesafe qsort without needing casts in the comparator).

Some approaches for fixing this:
(a) I can simply remove these checks on all the comparators
(b) I could fix the checks so that they merely look at the signedness
of the results
(c) Remove the checks from tree_cmp, but retain them in the other
places, fixing them to just look at signedness, for ease of debugging.

I think I prefer (c) as it makes debugging failures easier, though I
appreciate it's rather redundant.

Thoughts?

Dave
Alexander Monakov Jan. 23, 2020, 3:46 p.m. UTC | #4
On Thu, 23 Jan 2020, David Malcolm wrote:

> Removing the assertions fixes it for me (a stage1 build, at least, and
> it then passes the testsuite).
> 
> I've made this blunder in four places in the analyzer:
> 
>   call-string.cc:162:  call_string::cmp
>   program-point.cc:461:  function_point::cmp_within_supernode

These two don't raise concerns on brief inspection.

>   engine.cc:1820:  worklist::key_t::cmp

This one tries to use signed difference of hashes as return value, this is
not going to work in general while passing your assert almost always (except
when difference of hashes is exactly -2^31). The problem is, difference of
hashes leads to non-transitive comparison and qsort_chk can catch it on a
suitable testcase.

>   region-model.cc:1878:  tree_cmp

This is the one under discussion here.

> IIRC, I added these checks as I was finding it difficult to debug
> things when qsort_chk failed - the first three of the comparators in
> the list above build on each other:  worklist::key_t::cmp uses both
> function_point::cmp_within_supernode and call_string::cmp (and various
> other things), so if the worklist qsort_chk fails, it sometimes
> required a fair bit of digging into which of the nested comparators had
> failed (also, all the void * don't help; I'd love to have a template
> that does a typesafe qsort without needing casts in the comparator).

FWIW qsort_chk_error is a separate function to make that easier: if you place a
breakpoint on qsort_chk_error you just need to step a few times to get into a
comparator that is under suspicion, and don't need to type out casts in gdb.

Alexander

Patch
diff mbox series

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 9474c6737d5..bdd9a97e5f8 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -62,6 +62,20 @@  along with GCC; see the file COPYING3.  If not see
 
 #if ENABLE_ANALYZER
 
+/* Normalize X to either -1, 0, or 1.
+
+   Basically performs the same as
+     if (x < 0) return -1;
+     else if (x > 0) return 1;
+     else return 0;
+   but in a concise way trying to prevent branches.  */
+
+static int
+sign (int x)
+{
+  return (x > 0) - (x < 0);
+}
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
    purposes.  */
 
@@ -1768,8 +1782,8 @@  tree_cmp (const_tree t1, const_tree t2)
   if (DECL_P (t1))
     {
       if (DECL_NAME (t1) && DECL_NAME (t2))
-	return strcmp (IDENTIFIER_POINTER (DECL_NAME (t1)),
-		       IDENTIFIER_POINTER (DECL_NAME (t2)));
+	return sign (strcmp (IDENTIFIER_POINTER (DECL_NAME (t1)),
+			     IDENTIFIER_POINTER (DECL_NAME (t2))));
       else
 	{
 	  if (DECL_NAME (t1))
@@ -1819,8 +1833,8 @@  tree_cmp (const_tree t1, const_tree t2)
       }
 
     case STRING_CST:
-      return strcmp (TREE_STRING_POINTER (t1),
-		     TREE_STRING_POINTER (t2));
+      return sign (strcmp (TREE_STRING_POINTER (t1),
+			   TREE_STRING_POINTER (t2)));
 
     default:
       gcc_unreachable ();