diff mbox series

[RFC] Add dynamic edge/bb flag allocation

Message ID alpine.LSU.2.20.1805181304180.24704@zhemvz.fhfr.qr
State New
Headers show
Series [RFC] Add dynamic edge/bb flag allocation | expand

Commit Message

Richard Biener May 18, 2018, 11:11 a.m. UTC
The following adds a simple alloc/free_flag machinery allocating
bits from an int typed pool and applies that to bb->flags and edge->flags.
This should allow infrastructure pieces to use egde/bb flags temporarily
without worrying that users might already use it as for example
BB_VISITED and friends.  It converts one clever user to the new interface.

The allocation state is per CFG but we could also make it global
or merge the two pools so one allocates a flag that can be used for
bbs and edges at the same time.

Thus - any opinions welcome.  I'm mainly targeting cfganal algorithms
where I want to add a few region-based ones that to be O(region-size)
complexity may not use sbitmaps for visited sets because of the clearing
overhead and bitmaps are probably more expensive to use than a BB/edge
flag that needs to be cleared afterwards.

Built on x86_64, otherwise untested.

Any comments?

Templating the core flag allocator comes to my mind (up to max
HOST_WIDE_INT) as well as moving the implementation elsewhere (hwi.h?).

Thanks,
Richard.

From 739d8f202a161b4714cad8b19844ae5fa7924118 Mon Sep 17 00:00:00 2001
From: Richard Guenther <rguenther@suse.de>
Date: Fri, 18 May 2018 13:01:36 +0200
Subject: [PATCH 4/4] add dynamic cfg flag allocation

	* cfg.h (struct control_flow_graph): Add edge_flags_allocated and
	bb_flags_allocated members.
	(alloc_flag): Declare.
	(free_flag): Likewise.
	(alloc_bb_flag): New inline function.
	(alloc_edge_flag): Likewise.
	(free_bb_flag): Likewise.
	(free_edge_flag): Likewise.
	* cfg.c (alloc_flag): New function.
	(free_flag): Likewise.
	(init_flow): Mark static bb and edge flag allocated.
	* cfgloop.c (verify_loop_structure): Allocate temporary edge
	flag dynamically.

Comments

David Malcolm May 18, 2018, 1:15 p.m. UTC | #1
On Fri, 2018-05-18 at 13:11 +0200, Richard Biener wrote:
> The following adds a simple alloc/free_flag machinery allocating
> bits from an int typed pool and applies that to bb->flags and edge-
> >flags.
> This should allow infrastructure pieces to use egde/bb flags
> temporarily
> without worrying that users might already use it as for example
> BB_VISITED and friends.  It converts one clever user to the new
> interface.
> 
> The allocation state is per CFG but we could also make it global
> or merge the two pools so one allocates a flag that can be used for
> bbs and edges at the same time.
> 
> Thus - any opinions welcome.  I'm mainly targeting cfganal algorithms
> where I want to add a few region-based ones that to be O(region-size)
> complexity may not use sbitmaps for visited sets because of the
> clearing
> overhead and bitmaps are probably more expensive to use than a
> BB/edge
> flag that needs to be cleared afterwards.
> 
> Built on x86_64, otherwise untested.
> 
> Any comments?

Rather than putting alloc/free pairs at the usage sites, how about an
RAII class?  Something like this:

class auto_edge_flag
{
public:
   auto_edge_flag (function *fun)
   : m_flag (alloc_edge_flag (fun)), m_fun (fun)
   {}

   ~auto_edge_flag ()
   {
      free_edge_flag (m_fun, m_flag);
   }

   operator int () const { return m_flag; }

private:
  int m_flag;
  function *m_fun;
};


> Templating the core flag allocator comes to my mind (up to max
> HOST_WIDE_INT) as well as moving the implementation elsewhere
> (hwi.h?).

Maybe wrap the data type up in a class?  Passing around an "int *"
seemed a bit low-level to me.  (But maybe that's me overthinking it)

[...snip...]

> diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
> index 8af793c6015..64ad42c83ca 100644
> --- a/gcc/cfgloop.c
> +++ b/gcc/cfgloop.c
> @@ -1539,6 +1539,7 @@ verify_loop_structure (void)
>    /* Check irreducible loops.  */
>    if (loops_state_satisfies_p
> (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
>      {
> +      int saved_irr_mask = alloc_edge_flag (cfun);

         auto_edge_flag saved_irr_mask (cfun);

[...snip...]


Dave
Jeff Law May 21, 2018, 7:04 p.m. UTC | #2
On 05/18/2018 07:15 AM, David Malcolm wrote:
> On Fri, 2018-05-18 at 13:11 +0200, Richard Biener wrote:
>> The following adds a simple alloc/free_flag machinery allocating
>> bits from an int typed pool and applies that to bb->flags and edge-
>>> flags.
>> This should allow infrastructure pieces to use egde/bb flags
>> temporarily
>> without worrying that users might already use it as for example
>> BB_VISITED and friends.  It converts one clever user to the new
>> interface.
>>
>> The allocation state is per CFG but we could also make it global
>> or merge the two pools so one allocates a flag that can be used for
>> bbs and edges at the same time.
>>
>> Thus - any opinions welcome.  I'm mainly targeting cfganal algorithms
>> where I want to add a few region-based ones that to be O(region-size)
>> complexity may not use sbitmaps for visited sets because of the
>> clearing
>> overhead and bitmaps are probably more expensive to use than a
>> BB/edge
>> flag that needs to be cleared afterwards.
>>
>> Built on x86_64, otherwise untested.
>>
>> Any comments?
> 
> Rather than putting alloc/free pairs at the usage sites, how about an
> RAII class?  Something like this:
Yes, please if at all possible we should be using RAII.

jeff
Richard Biener May 22, 2018, 8:43 a.m. UTC | #3
On Mon, 21 May 2018, Jeff Law wrote:

> On 05/18/2018 07:15 AM, David Malcolm wrote:
> > On Fri, 2018-05-18 at 13:11 +0200, Richard Biener wrote:
> >> The following adds a simple alloc/free_flag machinery allocating
> >> bits from an int typed pool and applies that to bb->flags and edge-
> >>> flags.
> >> This should allow infrastructure pieces to use egde/bb flags
> >> temporarily
> >> without worrying that users might already use it as for example
> >> BB_VISITED and friends.  It converts one clever user to the new
> >> interface.
> >>
> >> The allocation state is per CFG but we could also make it global
> >> or merge the two pools so one allocates a flag that can be used for
> >> bbs and edges at the same time.
> >>
> >> Thus - any opinions welcome.  I'm mainly targeting cfganal algorithms
> >> where I want to add a few region-based ones that to be O(region-size)
> >> complexity may not use sbitmaps for visited sets because of the
> >> clearing
> >> overhead and bitmaps are probably more expensive to use than a
> >> BB/edge
> >> flag that needs to be cleared afterwards.
> >>
> >> Built on x86_64, otherwise untested.
> >>
> >> Any comments?
> > 
> > Rather than putting alloc/free pairs at the usage sites, how about an
> > RAII class?  Something like this:
> Yes, please if at all possible we should be using RAII.

So like the following?  (see comments in the hwint.h hunk for
extra C++ questions...)

I dropped the non-RAII interface - it's very likely never needed.

Better suggestions for placement of auto_flag welcome.

Thanks,
Richard.

From 8ae07eb0aa6c430605a16f043ec08726f81b2442 Mon Sep 17 00:00:00 2001
From: Richard Guenther <rguenther@suse.de>
Date: Fri, 18 May 2018 13:01:36 +0200
Subject: [PATCH 2/2] add dynamic cfg flag allocation

	* cfg.h (struct control_flow_graph): Add edge_flags_allocated and
	bb_flags_allocated members.
	(auto_edge_flag): New RAII class for allocating edge flags.
	(auto_bb_flag): New RAII class for allocating bb flags.
	* hwint.h (auto_flag): New RAII class for allocating flags.
	* cfgloop.c (verify_loop_structure): Allocate temporary edge
	flag dynamically.

cfg flag

diff --git a/gcc/cfg.c b/gcc/cfg.c
index 11026e7209a..f8b217d39ca 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -79,6 +79,8 @@ init_flow (struct function *the_fun)
     = EXIT_BLOCK_PTR_FOR_FN (the_fun);
   EXIT_BLOCK_PTR_FOR_FN (the_fun)->prev_bb
     = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
+  the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
+  the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
 }
 
 /* Helper function for remove_edge and clear_edges.  Frees edge structure
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 0953456782b..f9f762a520b 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -74,6 +74,10 @@ struct GTY(()) control_flow_graph {
 
   /* Maximal count of BB in function.  */
   profile_count count_max;
+
+  /* Dynamically allocated edge/bb flags.  */
+  int edge_flags_allocated;
+  int bb_flags_allocated;
 };
 
 
@@ -121,4 +125,18 @@ extern basic_block get_bb_copy (basic_block);
 void set_loop_copy (struct loop *, struct loop *);
 struct loop *get_loop_copy (struct loop *);
 
+class auto_edge_flag : public auto_flag<int>
+{
+public:
+  auto_edge_flag (function *fun)
+    : auto_flag (&fun->cfg->edge_flags_allocated) {}
+};
+
+class auto_bb_flag : public auto_flag<int>
+{
+public:
+  auto_bb_flag (function *fun)
+    : auto_flag (&fun->cfg->bb_flags_allocated) {}
+};
+
 #endif /* GCC_CFG_H */
diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 8af793c6015..fb5ebad1dfd 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1539,6 +1539,7 @@ verify_loop_structure (void)
   /* Check irreducible loops.  */
   if (loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
     {
+      auto_edge_flag saved_irr_mask (cfun);
       /* Record old info.  */
       auto_sbitmap irreds (last_basic_block_for_fn (cfun));
       FOR_EACH_BB_FN (bb, cfun)
@@ -1550,7 +1551,7 @@ verify_loop_structure (void)
 	    bitmap_clear_bit (irreds, bb->index);
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    if (e->flags & EDGE_IRREDUCIBLE_LOOP)
-	      e->flags |= EDGE_ALL_FLAGS + 1;
+	      e->flags |= saved_irr_mask;
 	}
 
       /* Recount it.  */
@@ -1576,20 +1577,20 @@ verify_loop_structure (void)
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    {
 	      if ((e->flags & EDGE_IRREDUCIBLE_LOOP)
-		  && !(e->flags & (EDGE_ALL_FLAGS + 1)))
+		  && !(e->flags & saved_irr_mask))
 		{
 		  error ("edge from %d to %d should be marked irreducible",
 			 e->src->index, e->dest->index);
 		  err = 1;
 		}
 	      else if (!(e->flags & EDGE_IRREDUCIBLE_LOOP)
-		       && (e->flags & (EDGE_ALL_FLAGS + 1)))
+		       && (e->flags & saved_irr_mask))
 		{
 		  error ("edge from %d to %d should not be marked irreducible",
 			 e->src->index, e->dest->index);
 		  err = 1;
 		}
-	      e->flags &= ~(EDGE_ALL_FLAGS + 1);
+	      e->flags &= ~saved_irr_mask;
 	    }
 	}
     }
diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index d3efff40453..ca066118ebd 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "output.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "fold-const.h"
 #include "stringpool.h"
 #include "gimple-pretty-print.h"
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index 4ee53c81277..77fef5ee5d8 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -27,8 +27,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "cgraph.h"
diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
index f402587408d..819f680d1bc 100644
--- a/gcc/hsa-regalloc.c
+++ b/gcc/hsa-regalloc.c
@@ -27,9 +27,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "dominance.h"
 #include "basic-block.h"
-#include "cfg.h"
-#include "cfganal.h"
 #include "function.h"
+#include "cfganal.h"
+#include "cfg.h"
 #include "bitmap.h"
 #include "dumpfile.h"
 #include "cgraph.h"
diff --git a/gcc/hwint.h b/gcc/hwint.h
index 5054d8e3883..fdbcdeafdcd 100644
--- a/gcc/hwint.h
+++ b/gcc/hwint.h
@@ -333,4 +333,34 @@ absu_hwi (HOST_WIDE_INT x)
   return x >= 0 ? (unsigned HOST_WIDE_INT)x : -(unsigned HOST_WIDE_INT)x;
 }
 
+template <class T>
+class auto_flag
+{
+public:
+  /* static assert T is integer type of max HOST_WIDE_INT precision.  */
+  auto_flag (T *sptr)
+    {
+      m_sptr = sptr;
+      /* Check if we are out of bits.
+         ???  With extra complexity we can use the clz_hwi result if
+	 we make sure to convert the argument to its signed type variant.
+	 With C++11 we can use std::make_unsigned<T>::type (but likely
+	 not in this file).  */
+      if (*sptr & (1 << (CHAR_BIT * sizeof (T) - 1)))
+	gcc_unreachable ();
+      m_flag = 1 << ((CHAR_BIT * sizeof (T)) - clz_hwi (*sptr));
+      gcc_checking_assert ((*sptr & m_flag) == 0);
+      *sptr |= m_flag;
+    }
+  ~auto_flag ()
+    {
+      gcc_checking_assert ((*m_sptr & m_flag) == m_flag);
+      *m_sptr &= ~m_flag;
+    }
+  operator T () const { return m_flag; }
+private:
+  T *m_sptr;
+  T m_flag;
+};
+
 #endif /* ! GCC_HWINT_H */
diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 37c0d53fae2..ba7aa260194 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -36,11 +36,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "alias.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "print-tree.h"
 #include "flags.h"
 #include "predict.h"
 #include "function.h"
+#include "cfg.h"
 #include "basic-block.h"
 #include "diagnostic.h"
 #include "tree-pretty-print.h"
diff --git a/gcc/profile-count.c b/gcc/profile-count.c
index 3d411cfbfb3..3745ae073c8 100644
--- a/gcc/profile-count.c
+++ b/gcc/profile-count.c
@@ -25,8 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "options.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "gimple.h"
 #include "data-streamer.h"
 #include "cgraph.h"
David Malcolm May 22, 2018, 1:30 p.m. UTC | #4
On Tue, 2018-05-22 at 10:43 +0200, Richard Biener wrote:
> On Mon, 21 May 2018, Jeff Law wrote:
> 
> > On 05/18/2018 07:15 AM, David Malcolm wrote:
> > > On Fri, 2018-05-18 at 13:11 +0200, Richard Biener wrote:
> > > > The following adds a simple alloc/free_flag machinery
> > > > allocating
> > > > bits from an int typed pool and applies that to bb->flags and
> > > > edge-
> > > > > flags.
> > > > 
> > > > This should allow infrastructure pieces to use egde/bb flags
> > > > temporarily
> > > > without worrying that users might already use it as for example
> > > > BB_VISITED and friends.  It converts one clever user to the new
> > > > interface.
> > > > 
> > > > The allocation state is per CFG but we could also make it
> > > > global
> > > > or merge the two pools so one allocates a flag that can be used
> > > > for
> > > > bbs and edges at the same time.
> > > > 
> > > > Thus - any opinions welcome.  I'm mainly targeting cfganal
> > > > algorithms
> > > > where I want to add a few region-based ones that to be
> > > > O(region-size)
> > > > complexity may not use sbitmaps for visited sets because of the
> > > > clearing
> > > > overhead and bitmaps are probably more expensive to use than a
> > > > BB/edge
> > > > flag that needs to be cleared afterwards.
> > > > 
> > > > Built on x86_64, otherwise untested.
> > > > 
> > > > Any comments?
> > > 
> > > Rather than putting alloc/free pairs at the usage sites, how
> > > about an
> > > RAII class?  Something like this:
> > 
> > Yes, please if at all possible we should be using RAII.
> 
> So like the following?  (see comments in the hwint.h hunk for
> extra C++ questions...)
> 
> I dropped the non-RAII interface - it's very likely never needed.
> 
> Better suggestions for placement of auto_flag welcome.

Do you have ideas for other uses?  If not, maybe just put it in cfg.h
right in front of auto_edge_flag and auto_bb_flag, for simplicity?

> Thanks,
> Richard.
[...snip...]

The new classes are missing leading comments.  I think it's worth
noting that the auto_flag (and thus their subclasses) hold a pointer
into a control_flow_graph instance, but they don't interact with the
garbage collector, so there's an implicit assumption that the auto_flag
instances are short-lived and that the underlying storage is kept alive
some other way (e.g. as cfun is kept alive by cfun being a GC root).


> +class auto_edge_flag : public auto_flag<int>
> +{
> +public:
> +  auto_edge_flag (function *fun)
> +    : auto_flag (&fun->cfg->edge_flags_allocated) {}
> +};
> +
> +class auto_bb_flag : public auto_flag<int>
> +{
> +public:
> +  auto_bb_flag (function *fun)
> +    : auto_flag (&fun->cfg->bb_flags_allocated) {}
> +};
> +
>  #endif /* GCC_CFG_H */

[...snip...]

Hope this is constructive
Dave
Richard Biener May 22, 2018, 1:51 p.m. UTC | #5
On Tue, 22 May 2018, David Malcolm wrote:

> On Tue, 2018-05-22 at 10:43 +0200, Richard Biener wrote:
> > On Mon, 21 May 2018, Jeff Law wrote:
> > 
> > > On 05/18/2018 07:15 AM, David Malcolm wrote:
> > > > On Fri, 2018-05-18 at 13:11 +0200, Richard Biener wrote:
> > > > > The following adds a simple alloc/free_flag machinery
> > > > > allocating
> > > > > bits from an int typed pool and applies that to bb->flags and
> > > > > edge-
> > > > > > flags.
> > > > > 
> > > > > This should allow infrastructure pieces to use egde/bb flags
> > > > > temporarily
> > > > > without worrying that users might already use it as for example
> > > > > BB_VISITED and friends.  It converts one clever user to the new
> > > > > interface.
> > > > > 
> > > > > The allocation state is per CFG but we could also make it
> > > > > global
> > > > > or merge the two pools so one allocates a flag that can be used
> > > > > for
> > > > > bbs and edges at the same time.
> > > > > 
> > > > > Thus - any opinions welcome.  I'm mainly targeting cfganal
> > > > > algorithms
> > > > > where I want to add a few region-based ones that to be
> > > > > O(region-size)
> > > > > complexity may not use sbitmaps for visited sets because of the
> > > > > clearing
> > > > > overhead and bitmaps are probably more expensive to use than a
> > > > > BB/edge
> > > > > flag that needs to be cleared afterwards.
> > > > > 
> > > > > Built on x86_64, otherwise untested.
> > > > > 
> > > > > Any comments?
> > > > 
> > > > Rather than putting alloc/free pairs at the usage sites, how
> > > > about an
> > > > RAII class?  Something like this:
> > > 
> > > Yes, please if at all possible we should be using RAII.
> > 
> > So like the following?  (see comments in the hwint.h hunk for
> > extra C++ questions...)
> > 
> > I dropped the non-RAII interface - it's very likely never needed.
> > 
> > Better suggestions for placement of auto_flag welcome.
> 
> Do you have ideas for other uses?  If not, maybe just put it in cfg.h
> right in front of auto_edge_flag and auto_bb_flag, for simplicity?

I don't have more users but of course gimple stmts and tree nodes
would come to my mind ;)  Basically nodes in any data structure we walk
and that we can (cheaply) re-walk to clear flags in the end.  Cost
comparison would always be to a simple pointer-set or bitmap.

But sure, I'll stick it to cfg.h for the moment.  As said, my
main use case didn't materialize on trunk yet but is in a patchset
I have to bring up-to-date.

> > Thanks,
> > Richard.
> [...snip...]
> 
> The new classes are missing leading comments.  I think it's worth
> noting that the auto_flag (and thus their subclasses) hold a pointer
> into a control_flow_graph instance, but they don't interact with the
> garbage collector, so there's an implicit assumption that the auto_flag
> instances are short-lived and that the underlying storage is kept alive
> some other way (e.g. as cfun is kept alive by cfun being a GC root).

Ah, yes - missed a comment.

> 
> > +class auto_edge_flag : public auto_flag<int>
> > +{
> > +public:
> > +  auto_edge_flag (function *fun)
> > +    : auto_flag (&fun->cfg->edge_flags_allocated) {}
> > +};
> > +
> > +class auto_bb_flag : public auto_flag<int>
> > +{
> > +public:
> > +  auto_bb_flag (function *fun)
> > +    : auto_flag (&fun->cfg->bb_flags_allocated) {}
> > +};
> > +
> >  #endif /* GCC_CFG_H */
> 
> [...snip...]
> 
> Hope this is constructive

Sure!

Thanks,
Richard.
Joseph Myers May 22, 2018, 4:53 p.m. UTC | #6
On Tue, 22 May 2018, Richard Biener wrote:

> +      if (*sptr & (1 << (CHAR_BIT * sizeof (T) - 1)))
> +	gcc_unreachable ();
> +      m_flag = 1 << ((CHAR_BIT * sizeof (T)) - clz_hwi (*sptr));

I don't see how the use of clz_hwi works with a type T that may be 
narrower than HOST_WIDE_INT.  Surely this logic requires a count of 
leading zeros in something of type T, not a possibly larger number of 
leading zeros after conversion to HOST_WIDE_INT?  Also, if T is wider than 
int, shifting plain 1 won't work here.
Richard Biener May 22, 2018, 6:36 p.m. UTC | #7
On May 22, 2018 6:53:57 PM GMT+02:00, Joseph Myers <joseph@codesourcery.com> wrote:
>On Tue, 22 May 2018, Richard Biener wrote:
>
>> +      if (*sptr & (1 << (CHAR_BIT * sizeof (T) - 1)))
>> +	gcc_unreachable ();
>> +      m_flag = 1 << ((CHAR_BIT * sizeof (T)) - clz_hwi (*sptr));
>
>I don't see how the use of clz_hwi works with a type T that may be 
>narrower than HOST_WIDE_INT.  Surely this logic requires a count of 
>leading zeros in something of type T, not a possibly larger number of 
>leading zeros after conversion to HOST_WIDE_INT?  Also, if T is wider
>than 
>int, shifting plain 1 won't work here.

I messed up the conversion to a template. The bitnum should be subtracted from HOST_BITS_PER_WIDE_INT and yes, 1 in unsigned hwi should be shifted. 

Richard.
Richard Biener May 23, 2018, 11:02 a.m. UTC | #8
On Tue, 22 May 2018, Richard Biener wrote:

> On May 22, 2018 6:53:57 PM GMT+02:00, Joseph Myers <joseph@codesourcery.com> wrote:
> >On Tue, 22 May 2018, Richard Biener wrote:
> >
> >> +      if (*sptr & (1 << (CHAR_BIT * sizeof (T) - 1)))
> >> +	gcc_unreachable ();
> >> +      m_flag = 1 << ((CHAR_BIT * sizeof (T)) - clz_hwi (*sptr));
> >
> >I don't see how the use of clz_hwi works with a type T that may be 
> >narrower than HOST_WIDE_INT.  Surely this logic requires a count of 
> >leading zeros in something of type T, not a possibly larger number of 
> >leading zeros after conversion to HOST_WIDE_INT?  Also, if T is wider
> >than 
> >int, shifting plain 1 won't work here.
> 
> I messed up the conversion to a template. The bitnum should be subtracted from HOST_BITS_PER_WIDE_INT and yes, 1 in unsigned hwi should be shifted. 

So this is the final patch, I've changed the flags compute to use ffs
which is better suited to find a "random" unset bit.  For types
smaller than HOST_WIDE_INT we'll find bits outside of the range but
the truncated mask will be zero.  I guess the
ffsl ((unsigned long)~intvar) cannot be easily pattern-matched to
ffs so a way to do that unsigned conversion would be nice (or
mass-change all signed flag ints to unsigned...).

I took the opportunity to change dfs_enumerate_from to use
an auto_bb_flag rather than a static sbitmap.  That should be
profitable given we currently have the cacheline with BB flags
loaded anyways because we access BB index.  So using a BB flag
will avoid pulling in the sbitmap cachelines.  But it trades
possibly less memory stores for it in case the sbitmap modifications
were adjacent.  OTOH applying a mask should be cheaper than
variable shifts involved in sbitmap bit access.  It's definitely
less code ;)

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK for trunk?

Thanks,
Richard.

From 0091e95a133454da62973ad570c97e7b61bfd0ec Mon Sep 17 00:00:00 2001
From: Richard Guenther <rguenther@suse.de>
Date: Fri, 18 May 2018 13:01:36 +0200
Subject: [PATCH] add dynamic cfg flag allocation

	* cfg.h (struct control_flow_graph): Add edge_flags_allocated and
	bb_flags_allocated members.
	(auto_flag): New RAII class for allocating flags.
	(auto_edge_flag): New RAII class for allocating edge flags.
	(auto_bb_flag): New RAII class for allocating bb flags.
	* cfgloop.c (verify_loop_structure): Allocate temporary edge
	flag dynamically.
	* cfganal.c (dfs_enumerate_from): Remove use of visited sbitmap
	in favor of temporarily allocated BB flag.
	* hsa-brig.c: Re-order includes.
	* hsa-dump.c: Likewise.
	* hsa-regalloc.c: Likewise.
	* print-rtl.c: Likewise.
	* profile-count.c: Likewise.

diff --git a/gcc/cfg.c b/gcc/cfg.c
index 11026e7209a..f8b217d39ca 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -79,6 +79,8 @@ init_flow (struct function *the_fun)
     = EXIT_BLOCK_PTR_FOR_FN (the_fun);
   EXIT_BLOCK_PTR_FOR_FN (the_fun)->prev_bb
     = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
+  the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
+  the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
 }
 
 /* Helper function for remove_edge and clear_edges.  Frees edge structure
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 0953456782b..9fff135d11f 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -74,6 +74,10 @@ struct GTY(()) control_flow_graph {
 
   /* Maximal count of BB in function.  */
   profile_count count_max;
+
+  /* Dynamically allocated edge/bb flags.  */
+  int edge_flags_allocated;
+  int bb_flags_allocated;
 };
 
 
@@ -121,4 +125,60 @@ extern basic_block get_bb_copy (basic_block);
 void set_loop_copy (struct loop *, struct loop *);
 struct loop *get_loop_copy (struct loop *);
 
+/* Generic RAII class to allocate a bit from storage of integer type T.
+   The allocated bit is accessible as mask with the single bit set
+   via the conversion operator to T.  */
+
+template <class T>
+class auto_flag
+{
+public:
+  /* static assert T is integer type of max HOST_WIDE_INT precision.  */
+  auto_flag (T *sptr)
+    {
+      m_sptr = sptr;
+      int free_bit = ffs_hwi (~*sptr);
+      /* If there are no unset bits... */
+      if (free_bit == 0)
+	gcc_unreachable ();
+      m_flag = HOST_WIDE_INT_1U << (free_bit - 1);
+      /* ...or if T is signed and thus the complement is sign-extended,
+         check if we ran out of bits.  We could spare us this bit
+	 if we could use C++11 std::make_unsigned<T>::type to pass
+	 ~*sptr to ffs_hwi.  */
+      if (m_flag == 0)
+	gcc_unreachable ();
+      gcc_checking_assert ((*sptr & m_flag) == 0);
+      *sptr |= m_flag;
+    }
+  ~auto_flag ()
+    {
+      gcc_checking_assert ((*m_sptr & m_flag) == m_flag);
+      *m_sptr &= ~m_flag;
+    }
+  operator T () const { return m_flag; }
+private:
+  T *m_sptr;
+  T m_flag;
+};
+
+/* RAII class to allocate an edge flag for temporary use.  You have
+   to clear the flag from all edges when you are finished using it.  */
+
+class auto_edge_flag : public auto_flag<int>
+{
+public:
+  auto_edge_flag (function *fun)
+    : auto_flag (&fun->cfg->edge_flags_allocated) {}
+};
+
+/* RAII class to allocate a bb flag for temporary use.  You have
+   to clear the flag from all edges when you are finished using it.  */
+class auto_bb_flag : public auto_flag<int>
+{
+public:
+  auto_bb_flag (function *fun)
+    : auto_flag (&fun->cfg->bb_flags_allocated) {}
+};
+
 #endif /* GCC_CFG_H */
diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index a901b3f3f2c..b9944c6ef98 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -1145,41 +1145,12 @@ dfs_enumerate_from (basic_block bb, int reverse,
 {
   basic_block *st, lbb;
   int sp = 0, tv = 0;
-  unsigned size;
 
-  /* A bitmap to keep track of visited blocks.  Allocating it each time
-     this function is called is not possible, since dfs_enumerate_from
-     is often used on small (almost) disjoint parts of cfg (bodies of
-     loops), and allocating a large sbitmap would lead to quadratic
-     behavior.  */
-  static sbitmap visited;
-  static unsigned v_size;
+  auto_bb_flag visited (cfun);
 
-#define MARK_VISITED(BB) (bitmap_set_bit (visited, (BB)->index))
-#define UNMARK_VISITED(BB) (bitmap_clear_bit (visited, (BB)->index))
-#define VISITED_P(BB) (bitmap_bit_p (visited, (BB)->index))
-
-  /* Resize the VISITED sbitmap if necessary.  */
-  size = last_basic_block_for_fn (cfun);
-  if (size < 10)
-    size = 10;
-
-  if (!visited)
-    {
-
-      visited = sbitmap_alloc (size);
-      bitmap_clear (visited);
-      v_size = size;
-    }
-  else if (v_size < size)
-    {
-      /* Ensure that we increase the size of the sbitmap exponentially.  */
-      if (2 * v_size > size)
-	size = 2 * v_size;
-
-      visited = sbitmap_resize (visited, size, 0);
-      v_size = size;
-    }
+#define MARK_VISITED(BB) ((BB)->flags |= visited)
+#define UNMARK_VISITED(BB) ((BB)->flags &= ~visited)
+#define VISITED_P(BB) (((BB)->flags & visited) != 0)
 
   st = XNEWVEC (basic_block, rslt_max);
   rslt[tv++] = st[sp++] = bb;
diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 8af793c6015..fb5ebad1dfd 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1539,6 +1539,7 @@ verify_loop_structure (void)
   /* Check irreducible loops.  */
   if (loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
     {
+      auto_edge_flag saved_irr_mask (cfun);
       /* Record old info.  */
       auto_sbitmap irreds (last_basic_block_for_fn (cfun));
       FOR_EACH_BB_FN (bb, cfun)
@@ -1550,7 +1551,7 @@ verify_loop_structure (void)
 	    bitmap_clear_bit (irreds, bb->index);
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    if (e->flags & EDGE_IRREDUCIBLE_LOOP)
-	      e->flags |= EDGE_ALL_FLAGS + 1;
+	      e->flags |= saved_irr_mask;
 	}
 
       /* Recount it.  */
@@ -1576,20 +1577,20 @@ verify_loop_structure (void)
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    {
 	      if ((e->flags & EDGE_IRREDUCIBLE_LOOP)
-		  && !(e->flags & (EDGE_ALL_FLAGS + 1)))
+		  && !(e->flags & saved_irr_mask))
 		{
 		  error ("edge from %d to %d should be marked irreducible",
 			 e->src->index, e->dest->index);
 		  err = 1;
 		}
 	      else if (!(e->flags & EDGE_IRREDUCIBLE_LOOP)
-		       && (e->flags & (EDGE_ALL_FLAGS + 1)))
+		       && (e->flags & saved_irr_mask))
 		{
 		  error ("edge from %d to %d should not be marked irreducible",
 			 e->src->index, e->dest->index);
 		  err = 1;
 		}
-	      e->flags &= ~(EDGE_ALL_FLAGS + 1);
+	      e->flags &= ~saved_irr_mask;
 	    }
 	}
     }
diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index d3efff40453..ca066118ebd 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "output.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "fold-const.h"
 #include "stringpool.h"
 #include "gimple-pretty-print.h"
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index 4ee53c81277..77fef5ee5d8 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -27,8 +27,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "cgraph.h"
diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
index f402587408d..819f680d1bc 100644
--- a/gcc/hsa-regalloc.c
+++ b/gcc/hsa-regalloc.c
@@ -27,9 +27,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "dominance.h"
 #include "basic-block.h"
-#include "cfg.h"
-#include "cfganal.h"
 #include "function.h"
+#include "cfganal.h"
+#include "cfg.h"
 #include "bitmap.h"
 #include "dumpfile.h"
 #include "cgraph.h"
diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 37c0d53fae2..ba7aa260194 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -36,11 +36,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "alias.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "print-tree.h"
 #include "flags.h"
 #include "predict.h"
 #include "function.h"
+#include "cfg.h"
 #include "basic-block.h"
 #include "diagnostic.h"
 #include "tree-pretty-print.h"
diff --git a/gcc/profile-count.c b/gcc/profile-count.c
index 3d411cfbfb3..3745ae073c8 100644
--- a/gcc/profile-count.c
+++ b/gcc/profile-count.c
@@ -25,8 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "options.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "gimple.h"
 #include "data-streamer.h"
 #include "cgraph.h"
Michael Matz May 23, 2018, 12:47 p.m. UTC | #9
Hi,

On Wed, 23 May 2018, Richard Biener wrote:

> > I messed up the conversion to a template. The bitnum should be 
> > subtracted from HOST_BITS_PER_WIDE_INT and yes, 1 in unsigned hwi 
> > should be shifted.

Maybe you should convert the thing to a template when the need arises 
instead of before?  You have now added 54 lines of code for wrapping an 
int!


Ciao,
Michael.
Eric Botcazou May 23, 2018, 12:54 p.m. UTC | #10
> Maybe you should convert the thing to a template when the need arises
> instead of before?  You have now added 54 lines of code for wrapping an
> int!

Yeah, it took me 5 minutes to understand what all this fluff is about!
Michael Matz May 23, 2018, 1:45 p.m. UTC | #11
Hi,

On Wed, 23 May 2018, Eric Botcazou wrote:

> > Maybe you should convert the thing to a template when the need arises
> > instead of before?  You have now added 54 lines of code for wrapping an
> > int!
> 
> Yeah, it took me 5 minutes to understand what all this fluff is about!

So, what I think this should look like: only one non-templated class for 
RAII purposes, which get's the pool to allocate from as a parameter in the 
ctor.

Use:

    alloc_flags (&cfun->cfg->bb_flag_pool);
    alloc_flags (&cfun->cfg->edge_flag_pool);

I don't see the sense in creating two classes for determining the pool 
(and then adding a third class when another pool is invented somewhere 
else) just for going from cfun to cfun->cfg->foopool.  Also Richi asked if 
the flag pools (sigh, a large word for an int) should be merged.  I think 
at this time they should be, but that the class ctor should still take the 
pool param (instead of the function), even if right now there'd only be 
one.

So much for bike shedding :)


Ciao,
Michael.
Richard Biener May 23, 2018, 1:57 p.m. UTC | #12
On Wed, 23 May 2018, Michael Matz wrote:

> Hi,
> 
> On Wed, 23 May 2018, Eric Botcazou wrote:
> 
> > > Maybe you should convert the thing to a template when the need arises
> > > instead of before?  You have now added 54 lines of code for wrapping an
> > > int!
> > 
> > Yeah, it took me 5 minutes to understand what all this fluff is about!
> 
> So, what I think this should look like: only one non-templated class for 
> RAII purposes, which get's the pool to allocate from as a parameter in the 
> ctor.
> 
> Use:
> 
>     alloc_flags (&cfun->cfg->bb_flag_pool);
>     alloc_flags (&cfun->cfg->edge_flag_pool);

You'll end up with sth like

   alloc_flags flag (BB_FLAG_POOL_FOR_FN (cfun));

then, mixing C++ RAII and macros! (eh)  Note you missed to name the
variable you declare.  And yes, template deduction should make this
work w/o writing alloc_flags<int> flag (...).

> I don't see the sense in creating two classes for determining the pool 
> (and then adding a third class when another pool is invented somewhere 
> else) just for going from cfun to cfun->cfg->foopool.  Also Richi asked if 
> the flag pools (sigh, a large word for an int) should be merged.  I think 
> at this time they should be, but that the class ctor should still take the 
> pool param (instead of the function), even if right now there'd only be 
> one.
> 
> So much for bike shedding :)

:/

Richard.
Michael Matz May 23, 2018, 2:02 p.m. UTC | #13
Hi,

On Wed, 23 May 2018, Richard Biener wrote:

> >     alloc_flags (&cfun->cfg->bb_flag_pool);
> >     alloc_flags (&cfun->cfg->edge_flag_pool);
> 
> You'll end up with sth like
> 
>    alloc_flags flag (BB_FLAG_POOL_FOR_FN (cfun));
> 
> then, mixing C++ RAII and macros! (eh)

First: I don't see the problem.  Second: if the above needs to use a macro 
you should have used a macro as well in your two classes.

> Note you missed to name the variable you declare.

Sure.

> And yes, template deduction should make this work w/o writing 
> alloc_flags<int> flag (...).

If you insist on an template, maybe.  But why would you?


Ciao,
Michael.
diff mbox series

Patch

diff --git a/gcc/cfg.c b/gcc/cfg.c
index 11026e7209a..30fc417d28f 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -79,6 +79,8 @@  init_flow (struct function *the_fun)
     = EXIT_BLOCK_PTR_FOR_FN (the_fun);
   EXIT_BLOCK_PTR_FOR_FN (the_fun)->prev_bb
     = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
+  the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
+  the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
 }
 
 /* Helper function for remove_edge and clear_edges.  Frees edge structure
@@ -1167,3 +1169,26 @@  get_loop_copy (struct loop *loop)
   else
     return NULL;
 }
+
+/* Allocate a bit from *POOL and return a corresponding mask.  */
+
+int
+alloc_flag (int *pool)
+{
+  int bitnum = clz_hwi (*pool);
+  /* out of bits */
+  gcc_assert (bitnum != 0);
+  int mask = 1 << (HOST_BITS_PER_INT - bitnum);
+  gcc_assert ((*pool & mask) == 0);
+  *pool |= mask;
+  return mask;
+}
+
+/* Release a bit with MASK to *POOL.  */
+
+void
+free_flag (int *pool, int mask)
+{
+  gcc_assert (popcount_hwi (mask) == 1 && (*pool & mask) == mask);
+  *pool &= ~mask;
+}
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 0953456782b..23b02346751 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -74,6 +74,10 @@  struct GTY(()) control_flow_graph {
 
   /* Maximal count of BB in function.  */
   profile_count count_max;
+
+  /* Dynamically allocated edge/bb flags.  */
+  int edge_flags_allocated;
+  int bb_flags_allocated;
 };
 
 
@@ -121,4 +125,28 @@  extern basic_block get_bb_copy (basic_block);
 void set_loop_copy (struct loop *, struct loop *);
 struct loop *get_loop_copy (struct loop *);
 
+extern int alloc_flag (int *);
+extern void free_flag (int *, int);
+
+inline int
+alloc_edge_flag (struct function *fn)
+{
+  return alloc_flag (&fn->cfg->edge_flags_allocated);
+}
+inline int
+alloc_bb_flag (struct function *fn)
+{
+  return alloc_flag (&fn->cfg->bb_flags_allocated);
+}
+inline void
+free_edge_flag (struct function *fn, int mask)
+{
+  free_flag (&fn->cfg->edge_flags_allocated, mask);
+}
+inline void
+free_bb_flag (struct function *fn, int mask)
+{
+  free_flag (&fn->cfg->bb_flags_allocated, mask);
+}
+
 #endif /* GCC_CFG_H */
diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 8af793c6015..64ad42c83ca 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1539,6 +1539,7 @@  verify_loop_structure (void)
   /* Check irreducible loops.  */
   if (loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
     {
+      int saved_irr_mask = alloc_edge_flag (cfun);
       /* Record old info.  */
       auto_sbitmap irreds (last_basic_block_for_fn (cfun));
       FOR_EACH_BB_FN (bb, cfun)
@@ -1550,7 +1551,7 @@  verify_loop_structure (void)
 	    bitmap_clear_bit (irreds, bb->index);
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    if (e->flags & EDGE_IRREDUCIBLE_LOOP)
-	      e->flags |= EDGE_ALL_FLAGS + 1;
+	      e->flags |= saved_irr_mask;
 	}
 
       /* Recount it.  */
@@ -1576,22 +1577,23 @@  verify_loop_structure (void)
 	  FOR_EACH_EDGE (e, ei, bb->succs)
 	    {
 	      if ((e->flags & EDGE_IRREDUCIBLE_LOOP)
-		  && !(e->flags & (EDGE_ALL_FLAGS + 1)))
+		  && !(e->flags & saved_irr_mask))
 		{
 		  error ("edge from %d to %d should be marked irreducible",
 			 e->src->index, e->dest->index);
 		  err = 1;
 		}
 	      else if (!(e->flags & EDGE_IRREDUCIBLE_LOOP)
-		       && (e->flags & (EDGE_ALL_FLAGS + 1)))
+		       && (e->flags & saved_irr_mask))
 		{
 		  error ("edge from %d to %d should not be marked irreducible",
 			 e->src->index, e->dest->index);
 		  err = 1;
 		}
-	      e->flags &= ~(EDGE_ALL_FLAGS + 1);
+	      e->flags &= ~saved_irr_mask;
 	    }
 	}
+      free_edge_flag (cfun, saved_irr_mask);
     }
 
   /* Check the recorded loop exits.  */
diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index d3efff40453..ca066118ebd 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -35,8 +35,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "output.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "fold-const.h"
 #include "stringpool.h"
 #include "gimple-pretty-print.h"
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index 4ee53c81277..77fef5ee5d8 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -27,8 +27,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "vec.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "cgraph.h"
diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
index f402587408d..819f680d1bc 100644
--- a/gcc/hsa-regalloc.c
+++ b/gcc/hsa-regalloc.c
@@ -27,9 +27,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "dominance.h"
 #include "basic-block.h"
-#include "cfg.h"
-#include "cfganal.h"
 #include "function.h"
+#include "cfganal.h"
+#include "cfg.h"
 #include "bitmap.h"
 #include "dumpfile.h"
 #include "cgraph.h"
diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 37c0d53fae2..ba7aa260194 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -36,11 +36,11 @@  along with GCC; see the file COPYING3.  If not see
 #include "alias.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "print-tree.h"
 #include "flags.h"
 #include "predict.h"
 #include "function.h"
+#include "cfg.h"
 #include "basic-block.h"
 #include "diagnostic.h"
 #include "tree-pretty-print.h"
diff --git a/gcc/profile-count.c b/gcc/profile-count.c
index 3d411cfbfb3..3745ae073c8 100644
--- a/gcc/profile-count.c
+++ b/gcc/profile-count.c
@@ -25,8 +25,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "options.h"
 #include "tree.h"
 #include "basic-block.h"
-#include "cfg.h"
 #include "function.h"
+#include "cfg.h"
 #include "gimple.h"
 #include "data-streamer.h"
 #include "cgraph.h"