diff mbox

[U-Boot,RFC/PATCH] tools: New 'spl-stackusage' script

Message ID 1422748072-32405-1-git-send-email-siarhei.siamashka@gmail.com
State RFC
Delegated to: Tom Rini
Headers show

Commit Message

Siarhei Siamashka Jan. 31, 2015, 11:47 p.m. UTC
This is a script, which tries to provide a pessimistic estimate
of the stack usage in the SPL binary.

A more detailed description about how it works and some example
pictures are available in an earlier e-mail:
    http://lists.denx.de/pipermail/u-boot/2015-January/201713.html

In fact, I have done nothing with it since that old status update
and the code is still very much just at the proof of concept
stage and can be only treated as a somewhat working prototype.
It only supports 32-bit ARM so far.

An example of trying it:

1. First compile u-boot for some ARM board. For example:

   make CROSS_COMPILE=arm-none-linux-gnueabi- Cubieboard_defconfig
   make CROSS_COMPILE=arm-none-linux-gnueabi- -j8

2. Run the script:

   tools/spl-stackusage/spl-stackusage

3. Look at the (slightly reformatted) output:

   Execution path with maximized stack usage (3152 bytes):
      entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
      mmc_init > mmc_startup > mmc_switch.isra.1 > mmc_send_status >
      printf > puts > console_puts ? mmc_bread > mmc_set_blocklen >
      mmc_send_cmd ? sunxi_mmc_getcd > gpio_get_value >
      axp_gpio_get_value > axp209_read > i2c_read ? twsi_i2c_read >
      i2c_begin > twsi_wait > udelay > __udelay

   The '>' symbols indicate normal calls. The '?' symbols
   indicate a suspected indirect call. Now we can see that the
   'console_puts ? mmc_bread' indirect call does not make any sense.
   So we can run the script again, now indicating that a call
   from 'console_puts' to 'mmc_bread' is impossible:

4. Run the script again as:

   tools/spl-stackusage/spl-stackusage \
       remove_arrow[console_puts,mmc_bread]

5. Look at the new output:

   Execution path with maximized stack usage (3152 bytes):
      entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
      mmc_init > mmc_startup > mmc_switch.isra.1 > mmc_send_status >
      printf > puts > console_puts ? sunxi_mmc_getcd > gpio_get_value >
      axp_gpio_get_value > axp209_read > i2c_read ? mmc_bread >
      mmc_set_blocklen > mmc_send_cmd ? twsi_i2c_read > i2c_begin >
      twsi_wait > udelay > __udelay

   Now it tries to guess that there is an indirect call from
   'console_puts' to 'sunxi_mmc_getcd'. This also does not make sense.
   We add it to the black list too. And repreat the whole process
   iteratively. Eventually we end up with something like

6. Run the script as:

   tools/spl-stackusage/spl-stackusage \
       remove_arrow[console_puts,mmc_bread] \
       remove_arrow[console_puts,sunxi_mmc_getcd] \
       remove_arrow[i2c_read,sunxi_mmc_set_ios] \
       remove_arrow[console_puts,twsi_i2c_read] \
       remove_arrow[i2c_read,mmc_bread] \
       remove_arrow[serial_puts,mmc_bread] \
       remove_arrow[serial_puts,sunxi_mmc_getcd] \
       remove_arrow[console_puts,twsi_i2c_write] \
       remove_arrow[console_puts,sunxi_mmc_send_cmd] \
       remove_arrow[console_puts,twsi_i2c_probe] \
       remove_arrow[serial_puts,twsi_i2c_read] \
       remove_arrow[console_puts,twsi_i2c_init] \
       remove_arrow[serial_puts,twsi_i2c_write] \
       remove_arrow[serial_puts,sunxi_mmc_send_cmd] \
       remove_arrow[serial_puts,twsi_i2c_probe] \
       remove_arrow[console_puts,sunxi_mmc_set_ios] \
       remove_arrow[serial_puts,twsi_i2c_init]

7. And look at the final report:

   Execution path with maximized stack usage (3064 bytes):
      entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
      mmc_init > mmc_startup > mmc_set_clock > mmc_set_ios ? mmc_bread >
      mmc_set_blocklen > mmc_send_cmd ? sunxi_mmc_getcd > gpio_get_value >
      axp_gpio_get_value > axp209_read > i2c_read > i2c_get_adapter >
      printf > vsprintf > vsnprintf_internal.isra.6 > number.isra.4 >
      put_dec > __div64_32 > __aeabi_uidiv > __aeabi_idiv0

The process of blacklisting impossible caller->callee pairs made
the stack usage report less pessimistic (from 3152 to 3064 bytes)
and the execution path makes a little bit more sense. But that's
the whole point. Even the initial stack usage number is supposed
to be usable, because it is just overestimating the stack usage
and should never underestimate it (unless confused by tricks in
the assembly code).

Also there is a 'spl-stackusage.png' file with the whole callgraph
generated by the graphviz 'dot' tool. It may looks like this (a
different picture, not generated by following the steps above):
    http://people.freedesktop.org/~siamashka/files/20150116-spl-stackgraph/spl-stackgraph-v2015.01-cubieboard2-fel.png

This is surely not intended to be pushed as-is yet. And still needs
some cleanups. However I would like to get some feedback, just to get
a better sense of direction where it should be moving.

Also, formally, the merge window is still open. So, I guess, this tool
might have a very slim theoretical chance to get into v2015.04 :-)

---
 tools/spl-stackusage/spl-stackusage    |   1 +
 tools/spl-stackusage/spl-stackusage.rb | 471 +++++++++++++++++++++++++++++++++
 2 files changed, 472 insertions(+)
 create mode 120000 tools/spl-stackusage/spl-stackusage
 create mode 100755 tools/spl-stackusage/spl-stackusage.rb

Comments

Andreas Bießmann Feb. 2, 2015, 1:15 p.m. UTC | #1
Dear Siarhei Siamashka,

On 02/01/2015 12:47 AM, Siarhei Siamashka wrote:
> This is a script, which tries to provide a pessimistic estimate
> of the stack usage in the SPL binary.
> 
> A more detailed description about how it works and some example
> pictures are available in an earlier e-mail:
>     http://lists.denx.de/pipermail/u-boot/2015-January/201713.html
> 
> In fact, I have done nothing with it since that old status update
> and the code is still very much just at the proof of concept
> stage and can be only treated as a somewhat working prototype.
> It only supports 32-bit ARM so far.
> 
> An example of trying it:
> 
> 1. First compile u-boot for some ARM board. For example:
> 
>    make CROSS_COMPILE=arm-none-linux-gnueabi- Cubieboard_defconfig
>    make CROSS_COMPILE=arm-none-linux-gnueabi- -j8

unfortunately out-of-tree build is not directly supported with this
script. I mean some environment (CROSS_COMPILE) is re-used/required by
the script but others not (O). It would be fine to just replace the make
in cmdline with this tool to run it.

> 2. Run the script:
> 
>    tools/spl-stackusage/spl-stackusage
> 
> 3. Look at the (slightly reformatted) output:
> 
>    Execution path with maximized stack usage (3152 bytes):
>       entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
>       mmc_init > mmc_startup > mmc_switch.isra.1 > mmc_send_status >
>       printf > puts > console_puts ? mmc_bread > mmc_set_blocklen >
>       mmc_send_cmd ? sunxi_mmc_getcd > gpio_get_value >
>       axp_gpio_get_value > axp209_read > i2c_read ? twsi_i2c_read >
>       i2c_begin > twsi_wait > udelay > __udelay
> 
>    The '>' symbols indicate normal calls. The '?' symbols
>    indicate a suspected indirect call. Now we can see that the
>    'console_puts ? mmc_bread' indirect call does not make any sense.
>    So we can run the script again, now indicating that a call
>    from 'console_puts' to 'mmc_bread' is impossible:
> 
> 4. Run the script again as:
> 
>    tools/spl-stackusage/spl-stackusage \
>        remove_arrow[console_puts,mmc_bread]

Unfortunately dynamic stack allocation is not considered. So adding FAT
to SPL will result in an empty list here. I only get 'dynamic stack size
in get_cluster'. I think the most important thing is to always print out
the summary and the caller stack, mark the dynamic parts of the stack
and mention that there is some more required.
Some interface to declare the maximum amount of dynamic stack usage for
a specific symbol by command line would be really great. So one can
guess the relevant parts and calculate the maximum size.

Another point ... I really appreciate your efforts to bring this tool
in. So don't get me wrong, but ... why ruby?
A while ago some guys started to introduce python for some complex
scripts, before that u-boot only had shell scripts.
I personally think one interpreter in u-boot world is enough. I'm
curious how others think about that.

Best regards

Andreas Bießmann
Siarhei Siamashka Feb. 11, 2015, 4:15 a.m. UTC | #2
On Mon, 02 Feb 2015 14:15:33 +0100
"Andreas Bießmann" <andreas.devel@googlemail.com> wrote:

> Dear Siarhei Siamashka,

Hi Andreas,

Thanks for your feedback.

> On 02/01/2015 12:47 AM, Siarhei Siamashka wrote:
> > This is a script, which tries to provide a pessimistic estimate
> > of the stack usage in the SPL binary.
> > 
> > A more detailed description about how it works and some example
> > pictures are available in an earlier e-mail:
> >     http://lists.denx.de/pipermail/u-boot/2015-January/201713.html
> > 
> > In fact, I have done nothing with it since that old status update
> > and the code is still very much just at the proof of concept
> > stage and can be only treated as a somewhat working prototype.
> > It only supports 32-bit ARM so far.
> > 
> > An example of trying it:
> > 
> > 1. First compile u-boot for some ARM board. For example:
> > 
> >    make CROSS_COMPILE=arm-none-linux-gnueabi- Cubieboard_defconfig
> >    make CROSS_COMPILE=arm-none-linux-gnueabi- -j8
> 
> unfortunately out-of-tree build is not directly supported with this
> script. I mean some environment (CROSS_COMPILE) is re-used/required by
> the script but others not (O). It would be fine to just replace the make
> in cmdline with this tool to run it.

I'm not so sure that replacing make is a very good idea because that's
outside of the scope of this script. The script only needs the 'spl'
directory with the SPL binary itself and also the *.su files generated
by GCC.

The CROSS_COMPILE environment variable is only used to find the
arm variant of the objdump tool. This variable is even only used as
a fallback method, because the script contains the list of possible arm
toolchain triplets and probes them. In many cases it can be run even
without the CROSS_COMPILE variable defined.

> > 2. Run the script:
> > 
> >    tools/spl-stackusage/spl-stackusage
> > 
> > 3. Look at the (slightly reformatted) output:
> > 
> >    Execution path with maximized stack usage (3152 bytes):
> >       entry > _main > board_init_f > board_init_r > spl_mmc_load_image >
> >       mmc_init > mmc_startup > mmc_switch.isra.1 > mmc_send_status >
> >       printf > puts > console_puts ? mmc_bread > mmc_set_blocklen >
> >       mmc_send_cmd ? sunxi_mmc_getcd > gpio_get_value >
> >       axp_gpio_get_value > axp209_read > i2c_read ? twsi_i2c_read >
> >       i2c_begin > twsi_wait > udelay > __udelay
> > 
> >    The '>' symbols indicate normal calls. The '?' symbols
> >    indicate a suspected indirect call. Now we can see that the
> >    'console_puts ? mmc_bread' indirect call does not make any sense.
> >    So we can run the script again, now indicating that a call
> >    from 'console_puts' to 'mmc_bread' is impossible:
> > 
> > 4. Run the script again as:
> > 
> >    tools/spl-stackusage/spl-stackusage \
> >        remove_arrow[console_puts,mmc_bread]
> 
> Unfortunately dynamic stack allocation is not considered.

The use of dynamic stack size is at least detected and the script
refuses to work. I wanted to get some feedback before deciding
how to best handle it.

> So adding FAT to SPL will result in an empty list here. I only get 'dynamic stack size
> in get_cluster'. I think the most important thing is to always print out
> the summary and the caller stack, mark the dynamic parts of the stack
> and mention that there is some more required.

Thanks for the suggestion. For now you can just find the relevant .su
file and edit it fake some some arbitrary stack usage estimate there.

> Some interface to declare the maximum amount of dynamic stack usage for
> a specific symbol by command line would be really great. So one can
> guess the relevant parts and calculate the maximum size.

Yes. Or alternatively it is perhaps possible to just fix the FAT code
and use malloc instead of dynamic allocations on stack.

The large buffers in .bss in the FAT code is already responsible for
the .bss section placement in DRAM. And these dynamic allocations on
stack are yet another wacky thing in this code.

> Another point ... I really appreciate your efforts to bring this tool
> in. So don't get me wrong, but ... why ruby?

That's because I know ruby, but don't know python at the moment.

Why I have picked ruby over python some years ago is another story.
But I don't want to start a programming language holy war here :)

> A while ago some guys started to introduce python for some complex
> scripts, before that u-boot only had shell scripts.
> I personally think one interpreter in u-boot world is enough. I'm
> curious how others think about that.

If python is a requirement, then I will have to learn it first. Which
is fair enough. But this requirement first needs to be explicitly
communicated by somebody.
Andreas Bießmann Feb. 11, 2015, 9:12 a.m. UTC | #3
Hi Siarhei,

On 02/11/2015 05:15 AM, Siarhei Siamashka wrote:
> On Mon, 02 Feb 2015 14:15:33 +0100
> "Andreas Bießmann" <andreas.devel@googlemail.com> wrote:

>> Another point ... I really appreciate your efforts to bring this tool
>> in. So don't get me wrong, but ... why ruby?
> 
> That's because I know ruby, but don't know python at the moment.

That's a good reason.

> Why I have picked ruby over python some years ago is another story.
> But I don't want to start a programming language holy war here :)
> 
>> A while ago some guys started to introduce python for some complex
>> scripts, before that u-boot only had shell scripts.
>> I personally think one interpreter in u-boot world is enough. I'm
>> curious how others think about that.
> 
> If python is a requirement, then I will have to learn it first. Which
> is fair enough. But this requirement first needs to be explicitly
> communicated by somebody.

For me it is not a real requirement, I'm fine with ruby when it works as
expected. The dynamic stack allocation should be considered in the tool
even though FAT would be converted to malloc(2) (some days ;).

Best regards

Andreas Bießmann
diff mbox

Patch

diff --git a/tools/spl-stackusage/spl-stackusage b/tools/spl-stackusage/spl-stackusage
new file mode 120000
index 0000000..eb10b9e
--- /dev/null
+++ b/tools/spl-stackusage/spl-stackusage
@@ -0,0 +1 @@ 
+spl-stackusage.rb
\ No newline at end of file
diff --git a/tools/spl-stackusage/spl-stackusage.rb b/tools/spl-stackusage/spl-stackusage.rb
new file mode 100755
index 0000000..a135d6c
--- /dev/null
+++ b/tools/spl-stackusage/spl-stackusage.rb
@@ -0,0 +1,471 @@ 
+#!/usr/bin/env ruby
+#
+# Copyright © 2015 Siarhei Siamashka <siarhei.siamashka@gmail.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+
+###############################################################################
+# This is a u-boot SPL stack usage analysis script. Needs to be run           #
+# from the root of the u-boot source tree after a successful SPL build.       #
+#                                                                             #
+# Produces 'spl-stackusage.png' file as the output.                           #
+###############################################################################
+
+require 'find'
+
+###############################################################################
+# Process command line arguments
+#
+# add_node[node_name, stack_usage]
+# remove_node[node_name, stack_usage]
+# add_arrow[node_name1, node_name2]
+# remove_arrow[node_name1, node_name2]
+#
+###############################################################################
+
+$removed_arrows = {}
+
+ARGV.each {|a|
+  a.split(";").each {|a|
+    if a =~ /remove_arrow[\(\[]\s*(\S+)\s*\,\s*(\S+)\s*[\)\]]/
+      $removed_arrows[[$1, $2]] = 1
+    end
+  }
+}
+
+###############################################################################
+
+if !File.directory?("spl")
+  abort("Error: can't find the 'spl' directory.\n" +
+        "Please ensure that you are in the root of the u-boot source tree.\n")
+end
+
+def tool_exists(tool_name)
+    `which #{tool_name} > /dev/null 2>&1`
+    return $?.to_i == 0
+end
+
+if !tool_exists("dot")
+  abort("Please install the graphviz dot tool.\n")
+end
+
+###############################################################################
+# Get objdump log                                                             #
+###############################################################################
+
+# Get the information from the environment
+$toolchain = ENV["CROSS_COMPILE"]
+
+# Try to guess the ARM toolchain automatically
+if !$toolchain && `file spl/u-boot-spl` =~ / ARM, /
+  toolchain_list = [
+    "arm-none-linux-gnueabi-",
+    "armv7a-hardfloat-linux-gnueabi-",
+    "arm-linux-gnueabihf-"
+  ]
+  toolchain_list.each {|toolchain|
+    if tool_exists("#{toolchain}objdump")
+      $toolchain = toolchain
+      break
+    end
+  }
+end
+
+if !$toolchain
+  abort("Error: can't find a usable toolchain.\n" +
+        "Try to set the CROSS_COMPILE environment variable.\n")
+end
+
+$objdump_dis_log = `#{$toolchain}objdump -d -M reg-names-raw spl/u-boot-spl`
+$objdump_sym_log = `#{$toolchain}objdump -t spl/u-boot-spl`
+$size_log = `#{$toolchain}size spl/u-boot-spl`
+
+File.open("spl-stackusage.dis", "w").write($objdump_dis_log)
+File.open("spl-stackusage.sym", "w").write($objdump_sym_log)
+
+###############################################################################
+# Read the *.su files (generated by GCC with '-fstack-usage' option)          #
+###############################################################################
+
+$dynamic_stack_size = {}
+$self_stack_size = {}
+$total_stack_size  = {}
+
+Find.find("spl") { |filename|
+  next unless filename =~ /.*\.su$/
+  File.open(filename).each_line {|l|
+    if l =~ /\:([^\:\s]+)\s+(\d+)\s+(\S+)$/
+      case $3
+      when "dynamic"
+        $dynamic_stack_size[$1] = true
+      when "static"
+        $self_stack_size[$1] = [$2.to_i, ($self_stack_size[$1] || 0)].max
+      else
+        abort("Unsupported stack usage type #{$3} in #{filename}\n")
+      end
+    end
+  }
+}
+
+if $self_stack_size.size == 0
+  abort("Error: could not load anything from *.su files in 'spl' directory.\n" +
+        "Please ensure that you are using '-fstack-usage' GCC option.\n")
+end
+
+###############################################################################
+# Parse the objdump log                                                       #
+###############################################################################
+
+$detected_cycles = {}
+$labels = {}
+$arrows = {}
+$functions = {}
+$func_code = {}
+$datastructs = {}
+$refs = {}
+$xrefs = {}
+$max_stack_size = 0
+$critical_path = {}
+$shortest_cycle = nil
+$indirect_targets = {}
+$indirect_sources = {}
+$blue_arrows = {}
+
+# Stack usage, based on inspecting disassembly listing in objdump log
+def guess_func_stack_usage(func_name)
+  func_code = $func_code[func_name]
+  if !func_code
+    printf("No function code for '%s'\n", func_name)
+    return nil
+  end
+  stack_usage = 0
+  func_code.each_line { |l|
+    # 'push {reglist}' instruction
+    if l =~ /push\s+\{\s*(.*?)\s*\}/
+      reglist = $1.split(",")
+      abort("Register ranges are not supported") if $1 =~ /\-/
+      stack_usage += reglist.size * 4
+    end
+    # 'sub sp, sp, #imm' instruction
+    if l =~ /sub\s+r13\s*\,\s*(r13\s*)\,\s*\#(\d+)/
+      stack_usage += $2.to_i
+    end
+  }
+  return stack_usage
+end
+
+# Stack usage, based on parsing *.su files
+def get_func_stack_usage(func_name)
+  stack_usage = $self_stack_size[func_name]
+  stack_usage = guess_func_stack_usage(func_name) if !stack_usage
+  return stack_usage
+end
+
+$objdump_sym_log.each_line {|l|
+  # 000034c8 l     F .text	000000c8 twsi_i2c_read
+  next unless l =~ /^(\h+)\s+\S+\s+(\S+)\s+(\.\S+)\s+\h+.*?\s+(\S+)\s*$/
+  addr = $1
+  type = $2
+  sect = $3
+  name = $4
+  if type == "F" && sect == ".text"
+    $functions[name] = addr.to_i(16)
+  else
+    $datastructs[name] = addr.to_i(16)
+  end
+}
+
+$functions.each { |func, _|
+  if $dynamic_stack_size.has_key?(func)
+    abort("dynamic stack size in #{func}")
+  end
+}
+
+Find.find("spl") { |filename|
+  next unless filename =~ /.*\.o$/
+  skipflag = false
+  current_section = ""
+  `#{$toolchain}objdump -r #{filename}`.each_line {|l|
+    if l =~ /RELOCATION RECORDS FOR/
+      if l =~ /\[\.text\.(.*?)\]/
+        skipflag = !$functions.has_key?($1)
+      elsif l =~ /\[\.(data|rodata)\.(.*?)\]/
+        skipflag = !$datastructs.has_key?($2)
+      else
+        skipflag = false
+      end
+    end
+
+    next if skipflag
+
+    if l =~ /(\S+)\s+(\S+)\s+(\S+)/
+      reltype = $2
+      name    = $3
+      next if !$functions.has_key?(name)
+      next if reltype == "R_ARM_CALL" || reltype == "R_ARM_JUMP24"
+      next if reltype == "R_ARM_THM_CALL" || reltype == "R_ARM_THM_JUMP24"
+      $indirect_targets[name] = 1
+    end
+  }
+}
+
+def add_ref(current_func, func_name)
+  return if $removed_arrows.has_key?([current_func, func_name])
+
+  $arrows[[current_func, func_name]] = 1
+  $refs[current_func] = {} if !$refs.has_key?(current_func)
+  $refs[func_name] = {} if !$refs.has_key?(func_name)
+  $refs[current_func][func_name] = true
+
+  $xrefs[current_func] = {} if !$xrefs.has_key?(current_func)
+  $xrefs[func_name] = {} if !$xrefs.has_key?(func_name)
+  $xrefs[func_name][current_func] = true
+end
+
+# First pass, collect local aliases
+current_func = "entry"
+current_label = "entry"
+$label_to_func_mapping = {}
+$objdump_dis_log.each_line {|l|
+  current_label = $1 if l =~ /^\h+\s+\<(.*?)>\:$/
+  current_func = $1 if l =~ /^\h+\s+\<(.*?)>\:$/ && $functions.has_key?($1)
+  $label_to_func_mapping[current_label] = current_func
+}
+
+current_func = "entry"
+$objdump_dis_log.each_line {|l|
+  current_func = $1 if l =~ /^\h+\s+\<(.*?)>\:$/ && $functions.has_key?($1)
+  if l =~ /(.*?)\<(.*?)(\+0x\h+)?\>/
+    target_func = $label_to_func_mapping[$2]
+    add_ref(current_func, target_func) if target_func != current_func
+  end
+  if l =~ /\s+blx?\s+r/
+    $indirect_sources[current_func] = 1
+  end
+  if l =~ /\s+bx?\s+r/ && !(l =~ /\s+bx?\s+r14/)
+    $indirect_sources[current_func] = 1
+  end
+  $func_code[current_func] = "" unless $func_code.has_key?(current_func)
+  $func_code[current_func] << sprintf("%s\n", l.strip)
+}
+
+###############################################################################
+# Handle indirect calls                                                       #
+###############################################################################
+
+$functions["<indirect-call>"] = 1
+$self_stack_size["<indirect-call>"] = 0
+
+$multiplexed_indirect_sources = $indirect_sources.clone
+$multiplexed_indirect_targets = $indirect_targets.clone
+
+# A function can't indirectly call itself (that would be a recursion)
+$indirect_sources.each {|src, _|
+  $removed_arrows[[src, src]] = 1
+}
+# A function can't indirectly call its caller (that would be a recursion)
+$refs.each {|src, dst_list|
+  dst_list.each {|dst, _|
+    $removed_arrows[[dst, src]] = 1
+  }
+}
+
+def impossible_indirect_call(bad_src, bad_dst)
+  return unless $multiplexed_indirect_sources.has_key?(bad_src)
+  return unless $multiplexed_indirect_targets.has_key?(bad_dst)
+
+  if $multiplexed_indirect_sources.size < $multiplexed_indirect_targets.size
+    $multiplexed_indirect_sources.delete(bad_src)
+    $multiplexed_indirect_targets.each {|dst, _|
+      next if dst == bad_dst
+      newarrow = [bad_src, dst]
+      if !($removed_arrows.has_key?(newarrow))
+        add_ref(newarrow[0], newarrow[1])
+        $blue_arrows[newarrow] = 1
+      end
+    }
+  else
+    $multiplexed_indirect_targets.delete(bad_dst)
+    $multiplexed_indirect_sources.each {|src, _|
+      next if src == bad_src
+      newarrow = [src, bad_dst]
+      if !($removed_arrows.has_key?(newarrow))
+        add_ref(newarrow[0], newarrow[1])
+        $blue_arrows[newarrow] = 1
+      end
+    }
+  end
+end
+
+$removed_arrows.each {|k, _|
+  impossible_indirect_call(k[0], k[1])
+}
+
+$multiplexed_indirect_sources.each {|src, _|
+  newarrow = [src, "<indirect-call>"]
+  add_ref(newarrow[0], newarrow[1])
+  $blue_arrows[newarrow] = 1
+}
+
+$multiplexed_indirect_targets.each {|dst, _|
+  newarrow = ["<indirect-call>", dst]
+  add_ref(newarrow[0], newarrow[1])
+  $blue_arrows[newarrow] = 1
+}
+
+###############################################################################
+
+def recursive_walk(node, visited_nodes, depth, total_stack)
+  $total_stack_size[node] = [total_stack, ($total_stack_size[node] || 0)].max
+  if $total_stack_size[node] > $max_stack_size
+    $critical_path = visited_nodes.clone
+    $max_stack_size = $total_stack_size[node]
+  end
+  $refs[node].each {|next_node, _|
+    func_stack = get_func_stack_usage(next_node)
+    if visited_nodes.has_key?(next_node) then
+      # Find shortest cycle
+      tmp = visited_nodes.sort {|a, b| a[1] <=> b[1]}.map {|a| a[0]}
+      idx = tmp.find_index(next_node)
+      tmp = tmp.slice(idx, tmp.size - idx)
+      $shortest_cycle = tmp if !$shortest_cycle
+      $shortest_cycle = tmp if tmp.size < $shortest_cycle.size
+      $detected_cycles[[node, next_node]] = 1
+      next
+    end
+    visited_nodes[next_node] = depth if next_node != "<indirect-call>"
+    recursive_walk(next_node, visited_nodes, depth + 1,
+                   total_stack + func_stack)
+    visited_nodes.delete(next_node)
+  }
+end
+
+# Find the 'root' nodes and recursively traverse the graph starting from them
+$xrefs.each {|func_name, nodes|
+  next if nodes.size != 0
+  visited_nodes = {}
+  depth = 0
+  visited_nodes[func_name] = depth
+  recursive_walk(func_name, visited_nodes, depth + 1,
+                 get_func_stack_usage(func_name))
+}
+
+###############################################################################
+# Generate .dot file for graphviz                                             #
+###############################################################################
+
+$legend = ""
+$max_stack_size = $total_stack_size.map { |k, v| v }.max
+
+def format_path_arr(path_arr)
+  result = ""
+  (path_arr + [path_arr[0]]).each_cons(2) { |a|
+    normal_call   = $refs[a[0]].has_key?(a[1]) &&
+                    !$blue_arrows.has_key?([a[0], a[1]])
+    indirect_call = ($refs[a[0]].has_key?("<indirect-call>") &&
+                     $refs["<indirect-call>"].has_key?(a[1])) ||
+                     $blue_arrows.has_key?([a[0], a[1]])
+    if normal_call
+      result << sprintf("%s > ", a[0])
+    elsif indirect_call
+      result << sprintf("%s ? ", a[0])
+    else
+      result << sprintf("%s ! ", a[0])
+    end
+  }
+  return result + path_arr[0]
+end
+
+if $size_log =~ /(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\h+)\s+spl\/u\-boot\-spl/
+    text_plus_data_size = $1.to_i + $2.to_i
+    bss_size = $3.to_i
+
+    $legend << sprintf("------------ u-boot SPL ------------\\l")
+    $legend << sprintf("text+data   = %5d bytes\\l", text_plus_data_size)
+    $legend << sprintf("bss         = %5d bytes\\l", bss_size)
+    $legend << sprintf("stack usage = %5d bytes\\l", $max_stack_size)
+end
+
+if $detected_cycles.size != 0
+  $legend << sprintf("--------------------------------------\\l")
+  $legend << sprintf("Warning, recursion suspected:\\l\\\"%s\\\"\\l",
+                     format_path_arr($shortest_cycle))
+  $legend << sprintf("\\lWhere:")
+  $legend << sprintf("\\l  '>' - means normal function call")
+  $legend << sprintf("\\l  '?' - indirect call (via a pointer)\\l")
+  $legend << sprintf("\\lBlue arrows highlight possible indirect calls")
+  $legend << sprintf("\\lgoing through a fake multiplexer node. Red")
+  $legend << sprintf("\\larrows highlight possible cases of recursion")
+  $legend << sprintf("\\lon the callgraph. Recursion is bad because it")
+  $legend << sprintf("\\lmakes stack usage estimation unpredictable.\\l")
+  $legend << sprintf("\\lThe recursion warnings can be eliminated by:")
+  $legend << sprintf("\\l  1) Fixing the SPL code if it is really broken.")
+  $legend << sprintf("\\l  2) Incorrectly guessed indirect calls can")
+  $legend << sprintf("\\l     be suppressed by using one or more")
+  $legend << sprintf("\\l     'remove_arrow[caller,callee]' command")
+  $legend << sprintf("\\l     line arguments passed to the script.\\l")
+  $legend << sprintf("\\lThe nodes on the execution path with maximized")
+  $legend << sprintf("\\lstack usage are orange colored.\\l")
+end
+
+fh = File.open("spl-stackusage.dot", "w")
+
+$refs.each {|func_name, nodes|
+  $labels[func_name] = sprintf("%s\\n[total=%d, self=%d]",
+                               func_name,
+                               $total_stack_size[func_name],
+                               get_func_stack_usage(func_name))
+}
+
+fh.printf("digraph graphname {\n")
+fh.printf("rankdir=LR;\n")
+fh.printf("node [shape=box,fontname=helvetica];\n")
+
+$total_stack_size.each { |k, v|
+  extra = ""
+  shape = "box"
+  shape = "octagon" if $indirect_sources.has_key?(k)
+  shape = "doubleoctagon" if $indirect_targets.has_key?(k)
+  if $critical_path.has_key?(k)
+    extra = ",style=filled,fillcolor=orange,group=main"
+  end
+  if k == "<indirect-call>"
+    shape = "oval"
+    extra = ",style=filled,fillcolor=lightblue"
+  end
+  fh.printf("\"%s\" [shape=#{shape}#{extra}];\n", $labels[k])
+}
+
+$arrows.each {|k, v|
+  if $detected_cycles.has_key?(k) then
+    fh.printf("\"%s\" -> \"%s\" [color=\"red\",penwidth=3];\n",
+              $labels[k[0]], $labels[k[1]])
+  elsif $blue_arrows.has_key?(k) then
+    fh.printf("\"%s\" -> \"%s\" [color=blue];\n",
+              $labels[k[0]], $labels[k[1]])
+  else
+    fh.printf("\"%s\" -> \"%s\";\n", $labels[k[0]], $labels[k[1]])
+  end
+}
+fh.printf("legend [shape=box,style=filled,fontname=monospace,fontsize=24," +
+          "color=\"lightgray\",label=\"#{$legend}\"];\n")
+fh.printf("}\n")
+fh.close
+
+printf("Execution path with maximized stack usage (%d bytes):\n%s\n",
+       $max_stack_size,
+       format_path_arr($critical_path.invert.sort.map { |a| a[1] }))
+
+`dot -Tpng -o spl-stackusage.png spl-stackusage.dot`