diff mbox

support/graph-depends: detect circular dependencies

Message ID 1453586685-24190-1-git-send-email-yann.morin.1998@free.fr
State Changes Requested
Headers show

Commit Message

Yann E. MORIN Jan. 23, 2016, 10:04 p.m. UTC
Currently, if there is a circular dependency in the packages, the
graph-depends script just errors out with a Python RunteimError which is
not caught, resulting in a very-long backtrace which does not provide
any hint as what the real issue is (even if "RuntimeError: maximum
recursion depth exceeded" is a pretty good hint at it).

We fix that by recusrsing the dependency chain of each package, until we
either end up with a package with no dependency, or with a package
already seen along the current dependency chain.

We need to introduce a new function, check_circular_deps(), because we
can't re-use the existing ones:

  - remove_mandatory_deps() does not iterate,

  - remove_transitive_deps() does iterate, but we do not call it for the
    top-level package if it is not 'all'

  - it does not make sense to use those functions anyway, as they were
    not designed to _check_ but to _at_ on the dependency chain.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Samuel Martin <s.martin49@gmail.com>

---
Note: I'm not completely happy with the way the code detects the end of
the dependency chain, but at least it works and is a starting point for
further discussion. Python experts will happily point me in the right
direction! ;-)
---
 support/scripts/graph-depends | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Thomas Petazzoni Jan. 23, 2016, 10:21 p.m. UTC | #1
Yann,

On Sat, 23 Jan 2016 23:04:45 +0100, Yann E. MORIN wrote:
> Currently, if there is a circular dependency in the packages, the
> graph-depends script just errors out with a Python RunteimError which is

RuntimeError

> We fix that by recusrsing the dependency chain of each package, until we

recursing

> We need to introduce a new function, check_circular_deps(), because we
> can't re-use the existing ones:
> 
>   - remove_mandatory_deps() does not iterate,
> 
>   - remove_transitive_deps() does iterate, but we do not call it for the
>     top-level package if it is not 'all'
> 
>   - it does not make sense to use those functions anyway, as they were
>     not designed to _check_ but to _at_ on the dependency chain.

at -> act

Isn't kconfig already detecting such situations ? It normally spits out
a warning when you have a circular dep, no ?

> +# This function will check that there is no loop in the dependency chain
> +# As a side effect, it builds up the dependency cache.
> +def check_circular_deps(deps):
> +    def recurse(pkg):
> +        if not pkg in list(deps.keys()):
> +            return
> +        chain.append(pkg)
> +        for p in deps[pkg]:
> +            if p in chain:
> +                sys.stderr.write("\nRecursion detected for  : %s\n" % (p))
> +                while True:
> +                    _p = chain.pop()
> +                    sys.stderr.write("which is a dependency of: %s\n" % (_p))
> +                    if p == _p:
> +                        sys.exit(1)
> +            recurse(p)
> +        chain.pop()
> +
> +    chain = []
> +    for pkg in list(deps.keys()):
> +        recurse(pkg)

I am a bit worried about the algorithmic complexity of this new
function. As you know, we had issues with other parts of graph-depends
having a too high algorithmic complexity to handle large
configurations, or configurations having specific patterns of
dependencies.

Have you measured the time impact of this new check on a very large
configuration (like allyespackageconfig) ?

I'm Cc'ing also Gustavo, who has access to a configuration that was
even worse than allyespackageconfig for the previous speed problem
which we fixed.

Best regards,

Thomas
Yann E. MORIN Jan. 23, 2016, 10:31 p.m. UTC | #2
Thomas, All,

On 2016-01-23 23:21 +0100, Thomas Petazzoni spake thusly:
> On Sat, 23 Jan 2016 23:04:45 +0100, Yann E. MORIN wrote:
[--SNIP--]
> Isn't kconfig already detecting such situations ? It normally spits out
> a warning when you have a circular dep, no ?

No, because the ones I'm concerned with are optional dependencies:
libgtk2 and libgtk3 have an optional dependency on cups, like so;

    ifeq ($(BR2_PKG_CUPS),y)
    FOO_DEPENDENCIES += cups
    endif

And this is not caught at the Kconfig level, because there is actually
no circular deps there.

> > +# This function will check that there is no loop in the dependency chain
> > +# As a side effect, it builds up the dependency cache.
> > +def check_circular_deps(deps):
> > +    def recurse(pkg):
> > +        if not pkg in list(deps.keys()):
> > +            return
> > +        chain.append(pkg)
> > +        for p in deps[pkg]:
> > +            if p in chain:
> > +                sys.stderr.write("\nRecursion detected for  : %s\n" % (p))
> > +                while True:
> > +                    _p = chain.pop()
> > +                    sys.stderr.write("which is a dependency of: %s\n" % (_p))
> > +                    if p == _p:
> > +                        sys.exit(1)
> > +            recurse(p)
> > +        chain.pop()
> > +
> > +    chain = []
> > +    for pkg in list(deps.keys()):
> > +        recurse(pkg)
> 
> I am a bit worried about the algorithmic complexity of this new
> function. As you know, we had issues with other parts of graph-depends
> having a too high algorithmic complexity to handle large
> configurations, or configurations having specific patterns of
> dependencies.
> 
> Have you measured the time impact of this new check on a very large
> configuration (like allyespackageconfig) ?

I have an allyespackageconfig with an recent toolchain so I get a lot
of packages, and I tweaked the config to disable a few to enable others.

And no, the speed impact is not measurable for me. I'll come up with
numbers (of course, when there's no loop!) a bit later.

Regards,
Yann E. MORIN.
Yann E. MORIN Jan. 23, 2016, 11 p.m. UTC | #3
Thomas, All,

On 2016-01-23 23:31 +0100, Yann E. MORIN spake thusly:
> On 2016-01-23 23:21 +0100, Thomas Petazzoni spake thusly:
> > On Sat, 23 Jan 2016 23:04:45 +0100, Yann E. MORIN wrote:
> [--SNIP--]
> > I am a bit worried about the algorithmic complexity of this new
> > function. As you know, we had issues with other parts of graph-depends
> > having a too high algorithmic complexity to handle large
> > configurations, or configurations having specific patterns of
> > dependencies.
> > 
> > Have you measured the time impact of this new check on a very large
> > configuration (like allyespackageconfig) ?
> 
> I have an allyespackageconfig with an recent toolchain so I get a lot
> of packages, and I tweaked the config to disable a few to enable others.
> 
> And no, the speed impact is not measurable for me. I'll come up with
> numbers (of course, when there's no loop!) a bit later.

Damn, I spoke too fast. The speed was totally fine as long as there were
those circular dependencies I was hunting for.

But without any circular deps, the speed is awfull and totally
inacceptable.

I'll see what I can do to speed this up.

One option is to not do the check in graph-depends, but to off-load that
into the package infrastructure, so we detect them even earlier.

Regards,
Yann E. MORIN.
Arnout Vandecappelle Jan. 24, 2016, 1:04 a.m. UTC | #4
On 24-01-16 00:00, Yann E. MORIN wrote:
> Thomas, All,
> 
> On 2016-01-23 23:31 +0100, Yann E. MORIN spake thusly:
>> On 2016-01-23 23:21 +0100, Thomas Petazzoni spake thusly:
>>> On Sat, 23 Jan 2016 23:04:45 +0100, Yann E. MORIN wrote:
>> [--SNIP--]
>>> I am a bit worried about the algorithmic complexity of this new
>>> function. As you know, we had issues with other parts of graph-depends
>>> having a too high algorithmic complexity to handle large
>>> configurations, or configurations having specific patterns of
>>> dependencies.
>>>
>>> Have you measured the time impact of this new check on a very large
>>> configuration (like allyespackageconfig) ?
>>
>> I have an allyespackageconfig with an recent toolchain so I get a lot
>> of packages, and I tweaked the config to disable a few to enable others.
>>
>> And no, the speed impact is not measurable for me. I'll come up with
>> numbers (of course, when there's no loop!) a bit later.
> 
> Damn, I spoke too fast. The speed was totally fine as long as there were
> those circular dependencies I was hunting for.
> 
> But without any circular deps, the speed is awfull and totally
> inacceptable.
>
> I'll see what I can do to speed this up.


 Naive cycle detection is exponential. See [1] for a pythonese implementation of
single-visit cycle detection. Note that you have to do it on the whole graph at
once, not once per node, for this to work.


> One option is to not do the check in graph-depends, but to off-load that
> into the package infrastructure, so we detect them even earlier.

 It's going to stay equally exponential when you do it in make... And since make
already has cycle detection, I don't think it makes much sense to add a custom
one there. What more is your custom cycle detection going to tell you? Ah, yes,
you want it to show the entire cycle instead of just the place where it
arbitrarily cut it. Hm, I think graph-depends is still a better place for it -
the make logic is going to look _very_ ugly.


 Regards,
 Arnout

[1]
http://codereview.stackexchange.com/questions/86021/check-if-a-directed-graph-contains-a-cycle
Yann E. MORIN Jan. 24, 2016, 11:56 a.m. UTC | #5
Arnout, All,

On 2016-01-24 02:04 +0100, Arnout Vandecappelle spake thusly:
> On 24-01-16 00:00, Yann E. MORIN wrote:
> > Thomas, All,
> > 
> > On 2016-01-23 23:31 +0100, Yann E. MORIN spake thusly:
> >> On 2016-01-23 23:21 +0100, Thomas Petazzoni spake thusly:
> >>> On Sat, 23 Jan 2016 23:04:45 +0100, Yann E. MORIN wrote:
> >> [--SNIP--]
> >>> I am a bit worried about the algorithmic complexity of this new
> >>> function. As you know, we had issues with other parts of graph-depends
> >>> having a too high algorithmic complexity to handle large
> >>> configurations, or configurations having specific patterns of
> >>> dependencies.
> >>>
> >>> Have you measured the time impact of this new check on a very large
> >>> configuration (like allyespackageconfig) ?
> >>
> >> I have an allyespackageconfig with an recent toolchain so I get a lot
> >> of packages, and I tweaked the config to disable a few to enable others.
> >>
> >> And no, the speed impact is not measurable for me. I'll come up with
> >> numbers (of course, when there's no loop!) a bit later.
> > 
> > Damn, I spoke too fast. The speed was totally fine as long as there were
> > those circular dependencies I was hunting for.
> > 
> > But without any circular deps, the speed is awfull and totally
> > inacceptable.
> >
> > I'll see what I can do to speed this up.
> 
> 
>  Naive cycle detection is exponential. See [1] for a pythonese implementation of
> single-visit cycle detection. Note that you have to do it on the whole graph at
> once, not once per node, for this to work.

Yeah, I have already implemented such a cut:
    https://git.buildroot.org/~ymorin/git/buildroot/commit/?h=yem/fixes&id=451599b

This new circular dependency check just adds less than 0.5 seconds now,
which is IMHO totally acceptable. I'll respin the patch shortly.

> > One option is to not do the check in graph-depends, but to off-load that
> > into the package infrastructure, so we detect them even earlier.
> 
>  It's going to stay equally exponential when you do it in make... And since make
> already has cycle detection, I don't think it makes much sense to add a custom
> one there. What more is your custom cycle detection going to tell you? Ah, yes,
> you want it to show the entire cycle instead of just the place where it
> arbitrarily cut it.

Yes, that's what I wanted to get, the list of packages that makes the
loop, so it is easier to understand.

> Hm, I think graph-depends is still a better place for it -
> the make logic is going to look _very_ ugly.

Well, I managed to have a relatively simple code to detect circular
dependencies in generic-package, but all it was able to do is detect it,
not dump the loop.

And it was really expensive, but not quite as expensive as the Python
code, adding 'just' 2 minutes to the parsing of the Makefiles, instead
of the 10+ minutes for the python code (which I really don't know hiw
long it runs, as I always Ctrl-C-ed it after ~10 minutes).

Thanks! :-)
diff mbox

Patch

diff --git a/support/scripts/graph-depends b/support/scripts/graph-depends
index fd8ad2f..2a357d8 100755
--- a/support/scripts/graph-depends
+++ b/support/scripts/graph-depends
@@ -306,9 +306,32 @@  def remove_transitive_deps(pkg,deps):
 def remove_mandatory_deps(pkg,deps):
     return [p for p in deps[pkg] if p not in ['toolchain', 'skeleton']]
 
+# This function will check that there is no loop in the dependency chain
+# As a side effect, it builds up the dependency cache.
+def check_circular_deps(deps):
+    def recurse(pkg):
+        if not pkg in list(deps.keys()):
+            return
+        chain.append(pkg)
+        for p in deps[pkg]:
+            if p in chain:
+                sys.stderr.write("\nRecursion detected for  : %s\n" % (p))
+                while True:
+                    _p = chain.pop()
+                    sys.stderr.write("which is a dependency of: %s\n" % (_p))
+                    if p == _p:
+                        sys.exit(1)
+            recurse(p)
+        chain.pop()
+
+    chain = []
+    for pkg in list(deps.keys()):
+        recurse(pkg)
+
 # This functions trims down the dependency list of all packages.
 # It applies in sequence all the dependency-elimination methods.
 def remove_extra_deps(deps):
+    check_circular_deps(dict_deps)
     for pkg in list(deps.keys()):
         if not pkg == 'all':
             deps[pkg] = remove_mandatory_deps(pkg,deps)