[1/1] Refine the dependencies so that packages can be compiled in parallel.

Message ID 5DBA847110A28B46BBA17BE675F0C9B21818F2B0@cnshjmbx01
State Rejected
Series [1/1] Refine the dependencies so that packages can be compiled in parallel.

Commit Message

Chu, Zhuliang (NSB - CN/Shanghai) Sept. 20, 2017, 2:31 p.m. UTC
support/scripts/parallel-build: Try to provide support for parallel compilation

As we know, Buildroot does not support top-level parallel compilation.
My colleagues and I rebuild Buildroot many times in the course of our work,
and a lot of time is spent waiting for those builds, so I am trying to
provide support for parallel building.
I added a 'parallel-build' target to the Makefile that calls a Python script.

In the parallel-build script:
The dependencies of all packages are parsed first and stored in a dictionary.
Packages with no remaining dependencies are then extracted from the dictionary
and built; each successful build releases the other packages that depend on it.
This runs until the dictionary is empty.

I also wrote detailed comments in the script.

Signed-off-by: Zhuliang Chu <zhuliang.chu@nokia-sbell.com>
---
 Makefile                       |   4 ++
 support/scripts/parallel-build | 150 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+)
 create mode 100755 support/scripts/parallel-build

--
1.8.3.1
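The scheduling described in the commit message is essentially a Kahn-style topological sort driven by a worker pool. A minimal, self-contained sketch of the idea (package names and the build step are illustrative, not Buildroot's actual interface):

import multiprocessing

deps = {"pkg0": [], "pkg1": ["pkg0"], "pkg2": ["pkg0"],
        "pkg3": ["pkg1", "pkg2"]}

def build(pkg):
    return pkg  # stand-in for: subprocess.call(["make", pkg])

if __name__ == "__main__":
    done = multiprocessing.Queue()
    pool = multiprocessing.Pool(processes=4)
    pending = set()
    while deps:
        # schedule every package whose dependency list has drained
        for pkg in [p for p, d in deps.items() if not d and p not in pending]:
            pool.apply_async(build, (pkg,), callback=done.put)
            pending.add(pkg)
        finished = done.get()   # block until some build completes
        del deps[finished]      # drop it, releasing its dependents
        for d in deps.values():
            if finished in d:
                d.remove(finished)
    pool.close()
    pool.join()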

Comments

Arnout Vandecappelle Sept. 20, 2017, 2:51 p.m. UTC | #1
Hi Zhuliang,

On 20-09-17 16:31, Chu, Zhuliang (NSB - CN/Shanghai) wrote:
> support/scripts/parallel-build: Try to provide support for parallel compilation
> 
> As we know, Buildroot does not support top-level parallel compilation.
> My colleagues and I rebuild Buildroot many times in the course of our work,
> and a lot of time is spent waiting for those builds, so I am trying to
> provide support for parallel building.
> I added a 'parallel-build' target to the Makefile that calls a Python script.
> 
> In the parallel-build script:
> The dependencies of all packages are parsed first and stored in a dictionary.
> Packages with no remaining dependencies are then extracted from the dictionary
> and built; each successful build releases the other packages that depend on it.
> This runs until the dictionary is empty.

 As far as I can see after a cursory look, this will do exactly the same as
'make -j' since it just takes into account the dependencies that are encoded in
Buildroot. Did you read lines 103-116 of the top-level Makefile?

# Parallel execution of this Makefile is disabled because it changes
# the packages building order, that can be a problem for two reasons:
# - If a package has an unspecified optional dependency and that
#   dependency is present when the package is built, it is used,
#   otherwise it isn't (but compilation happily proceeds) so the end
#   result will differ if the order is swapped due to parallel
#   building.
# - Also changing the building order can be a problem if two packages
#   manipulate the same file in the target directory.
#
# Taking into account the above considerations, if you still want to execute
# this top-level Makefile in parallel comment the ".NOTPARALLEL" line and
# use the -j<jobs> option when building, e.g:
#      make -j$((`getconf _NPROCESSORS_ONLN`+1))

 This script is not addressing either of the problems mentioned there. And if
this script works for you, it is vastly simpler and faster to instead rely on
'make -j'.

 See also commit 1668e1da390c3320ed7bcf0377ba57ed2280b38d for details.

 Regards,
 Arnout

> 
> In this script I also wrote a detailed note.
> 
> Signed-off-by: Zhuliang Chu <zhuliang.chu@nokia-sbell.com>
[snip]
Yann E. MORIN Sept. 26, 2017, 12:18 p.m. UTC | #2
Zhuliang Chu, All,

On 2017-09-20 14:31 +0000, Chu, Zhuliang (NSB - CN/Shanghai) spake thusly:
> support/scripts/parallel-build: Try to provide support for parallel
> compilation
> 
> As we know, Buildroot does not support top-level parallel compilation.
> My colleagues and I rebuild Buildroot many times in the course of our
> work, and a lot of time is spent waiting for those builds, so I am
> trying to provide support for parallel building.
> I added a 'parallel-build' target to the Makefile that calls a Python
> script.

Although I see the reason you decided to go with an external python
script, I would prefer that we go with a solution that is entirely
implemented in the existing infrastructures.

Some people have started such an endeavour in the past (you can find
their work on the mailing list; search for "top-level parallel build"):
simply building two or more packages at the same time is not so trivial.

First and foremost, we strive for reproducibility. Given a configuration
and the same build machine, two builds will give the same result (or so
we strive for it).

Second, the most dreaded cause of non-reproducibility is optional,
hidden dependencies. In an ideal world, all dependencies would be
expressed in the .mk files. But in practice, this is not true. Currently,
this issue is side-stepped thanks to the build ordering: two packages
will always be built in the same order, so the optional dependency is
always met, or it never is. Top-level parallel build, by its very
nature, no longer guarantees the build ordering, and thus breaks
reproducibility.
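As a toy illustration (not Buildroot code) of such an optional, undeclared dependency: package B silently enables a feature only if package A's artifact already exists when B builds, so the outcome depends on ordering:

import os
import tempfile

staging = tempfile.mkdtemp()

def build_a():
    # A installs a library into the shared staging directory
    open(os.path.join(staging, "liba.so"), "w").close()

def build_b():
    # optional dependency: used if present, silently skipped otherwise
    if os.path.exists(os.path.join(staging, "liba.so")):
        return "B built with liba"
    return "B built without liba"

build_a()
print(build_b())  # serial order: always "B built with liba"
# Under top-level parallelism, build_b() may run before build_a()
# completes and yield "B built without liba": same configuration,
# different result.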

Such hidden dependencies are typically headers, libraries, or host
tools.

The way to solve this is to guarantee that a package will only ever
"see" the staging and host directories for its explicitly specified
dependencies. This is what we call "per-package staging" (where staging
implies host as well).

Unless we can do that, we have killed reproducibility.
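A rough sketch of the idea (hypothetical layout and helper, not Buildroot's actual implementation): each package gets a private staging tree populated only from its explicitly declared dependencies, so an undeclared package can never be "seen":

import os
import shutil

def populate_pkg_staging(pkg, declared_deps, per_pkg_root):
    # private staging tree, the only one visible while 'pkg' builds
    staging = os.path.join(per_pkg_root, pkg, "staging")
    os.makedirs(staging, exist_ok=True)
    for dep in declared_deps:
        # import only what the *declared* dependencies installed
        src = os.path.join(per_pkg_root, dep, "staging")
        if os.path.isdir(src):
            shutil.copytree(src, staging, dirs_exist_ok=True)
    return staging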

Third, we want to maximise the CPU usage, while still keeping the total
job-level to an acceptable amount. It has to be noted that not all
packages support building in parallel. Those are using $(MAKE1) instead
of $(MAKE) in Buildroot.

So, if you want to maximise the CPU usage on (say) an 8-core system, you
will want to use up to 9 jobs; you don't want to use more, or you'd kill
the usability of the system.

So if you decorrelate (like your script does) the top-level and
per-package number of jobs, then either you do not make full use of your
system, or you overwhelm it with build jobs. You want to use a high
top-level number of jobs, to cover the case where only MAKE1 packages
get built (worst case), but you also want a high per-package number of
jobs, in case a single package gets built (worst case).

But in doing so, you will happen to build 9 packages in parallel, with
each package building up to 9 files in parallel, which is 81 jobs in
parallel (worst case). This is definitely no good.

So you want to have a single number of jobs, that is spread evenly
across all ready-to-build packages.
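In numbers, for the 8-core example above (a sketch of the arithmetic only, not of any existing jobserver code):

top_level_jobs = 9      # worst case: only MAKE1 packages are building
per_package_jobs = 9    # worst case: a single package is building
print(top_level_jobs * per_package_jobs)  # 81 concurrent jobs on 8 cores

# What is wanted instead is one shared budget of ~9 jobs, spread across
# however many packages are ready -- the way GNU make's jobserver shares
# -j tokens within a single make invocation, but applied globally.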

So, the only solution is to push for top-level parallel build to be
natively supported in Buildroot.

Yes, talk is cheap, show-me-the-code and what-not. Don't hold your
breath...

Regards,
Yann E. MORIN.

> In the parallel-build script:
> The dependencies of all packages are parsed first and stored in a
> dictionary. Packages with no remaining dependencies are then extracted
> from the dictionary and built; each successful build releases the other
> packages that depend on it. This runs until the dictionary is empty.
> 
> I also wrote detailed comments in the script.
> 
> Signed-off-by: Zhuliang Chu <zhuliang.chu@nokia-sbell.com>
[snip]
Chu, Zhuliang (NSB - CN/Shanghai) Oct. 9, 2017, 1:15 p.m. UTC | #3
Hi Arnout, Yann, all,
I am sorry that I did not respond to your mail during the Chinese National Day holiday.
Thank you for your professional advice.

I will now reply to a few of the comments from the mailing list.

---Q1. Why did I write an extra script that does the same thing as make -j$((`getconf _NPROCESSORS_ONLN` + 1))?
I am a newbie in Buildroot and I didn't have a thorough understanding of the background of parallel compilation. But I do know that 'make -j' causes the problems described in lines 103-116 of the top-level Makefile, so 'make -j' is not a real solution for parallel compilation of Buildroot. If the community cannot avoid the risks of parallel compilation, then for safety and stability, customers will not adopt this solution. I am interested in trying to implement a proper parallel compilation mechanism.

At that time, my idea was to introduce a script and gradually improve it in the following order:
First, implement the basic parallel compilation and push the patch to the community for discussion.
Second, solve the problem of hidden dependencies.
(How did I conceive this step at that time?
The Python script would compile Buildroot in two passes. The first pass is a normal parallel compilation, which builds all the packages; some packages that have hidden dependencies may fail to build. That does not matter at this point: we just record the detected configuration of every package. The second pass re-derives each package's configuration and compares it with the recorded one, somewhat like a try/catch exception-handling mechanism in C. If the two results differ, the package is recompiled; since all of its dependencies (including the hidden ones) were built during the first pass, the package should now build successfully.)
Third, fix the bug of two packages manipulating the same file. I envisioned using file locks for this.
Fourth, optimize.

I have read your replies, and I think some of my ideas may be difficult to achieve, but I will continue to try (a rough sketch of the second step appears below). If the approach turns out to be valid, I will send a test report to the community.
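A rough sketch of that second step, assuming some probe_config() helper can snapshot whatever configuration a package detected (the helpers are hypothetical, for illustration only):

class BuildError(Exception):
    pass

def two_pass_build(packages, build, probe_config):
    recorded = {}
    for pkg in packages:     # pass 1: run in parallel in the real plan
        try:
            build(pkg)       # may fail because of a hidden dependency
        except BuildError:
            pass             # tolerated: everything else keeps building
        recorded[pkg] = probe_config(pkg)
    for pkg in packages:     # pass 2: every dependency now exists
        if probe_config(pkg) != recorded[pkg]:
            build(pkg)       # rebuild; it now sees the hidden dependency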

----Q2. Is real parallel compilation necessary?
If I am compiling Buildroot on my own computer, then it just takes a little more time, which I can tolerate.
But keeping Buildroot's compile time as low as possible matters when people in different regions use a common Buildroot repository and each person needs a different version of Buildroot. In this scenario, Buildroot is compiled over and over again as each engineer builds his own project on the build server, so even a small reduction in build time brings a great improvement in efficiency.

@Yann E. MORIN
'But in doing so, you will happen to build 9 packages in parallel, with
each package building up to 9 files in parallel, which is 81 jobs in
parallel (worst case). This is definitely no good.'
I read the comments you gave me carefully.
Advice like this is very valuable to me. Thank you very much.

@all
I will spend some time studying the background of Buildroot parallel compilation.
The last version that I can find of this proposed solution is here:
http://lists.busybox.net/pipermail/buildroot/2015-June/131051.html
- Is that indeed the latest version available, or is there newer work?
- Is someone currently working on this topic already?
- Could anyone point me to this information?

Thanks,
Zhuliang Chu


Patch

diff --git a/Makefile b/Makefile
index 9b09589..a854760 100644
--- a/Makefile
+++ b/Makefile
@@ -785,6 +785,10 @@  show-targets:
 .PHONY: show-build-order
 show-build-order: $(patsubst %,%-show-build-order,$(PACKAGES))
 
+.PHONY: parallel-build
+parallel-build: dependencies
+	$(TOPDIR)/support/scripts/parallel-build --jobs $(PARALLEL_JOBS) --packages $(PACKAGES)
+
 .PHONY: graph-build
 graph-build: $(O)/build/build-time.log
 	@install -d $(GRAPHS_DIR)
diff --git a/support/scripts/parallel-build b/support/scripts/parallel-build
new file mode 100755
index 0000000..562d114
--- /dev/null
+++ b/support/scripts/parallel-build
@@ -0,0 +1,150 @@ 
+#!/usr/bin/python
+import sys
+import subprocess
+import argparse
+from copy import deepcopy
+import multiprocessing
+import brpkgutil
+
+done_queue=multiprocessing.Queue()
+extras=""
+get_depends_func = brpkgutil.get_depends
+get_rdepends_func = brpkgutil.get_rdepends
+
+# get all dependencies of packages
+def get_all_depends(pkgs):
+  filtered_pkgs = []
+  for pkg in pkgs:
+    if pkg in filtered_pkgs:
+      continue
+    filtered_pkgs.append(pkg)
+  if len(filtered_pkgs) == 0:
+    return []
+  return get_depends_func(filtered_pkgs)
+
+# select the entries that appear in the dictionary's values but not in its keys
+def pickup_nokey_pkg(depends):
+  nokey_deps = []
+  alldeps=[]
+  for deps in depends.values():
+    alldeps.extend(deps)
+  alldeps=list(set(alldeps))
+  for dep in alldeps:
+    if not depends.has_key(dep):
+      nokey_deps.append(dep)
+  return nokey_deps
+
+# select some packages that have no dependencies
+def pickup_nodepends_pkgs(depends):
+  no_deps_pkgs = []
+  for pkg,deps in depends.items():
+    if deps == []:
+      no_deps_pkgs.append(pkg)
+  return no_deps_pkgs
+
+# when a package has been compiled successfully, then remove it from dictionary 'dependencies'
+def remove_pkg_from_depends(package,depends):
+  for pkg,deps in depends.items():
+    if package == pkg:
+      del depends[package]
+    if package in deps:
+      depends[pkg].remove(package)
+  return depends
+
+# real build process
+def make_build_pkg(package):
+  cmd = "make %s %s"%(extras,package)
+  p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
+  (stdoutput,erroutput) = p.communicate()
+  if stdoutput:
+    sys.stdout.write(stdoutput)
+  if erroutput:
+    sys.stderr.write(erroutput)
+  if p.returncode == 0:
+    return package
+  else:
+    sys.stderr.write("make %s have a error,so the parallel build must exit %d\n"%(package,p.returncode))
+    return '__error__'
+
+def callback(x):
+  done_queue.put(x)
+
+if __name__ == '__main__':
+
+  # Running Scenario
+  #
+  #  step1:             step2:                  step3:                  step4:
+  # 'packages'      'dependencies'             'processPool'           'Distribute'
+  #                                              ____                 1) 'pkg0' has no dependencies in dictionary 'dependencies'
+  #  pkg0                 pkg0                  |    |                2) Distribute 'pkg0' to child process from Pool
+  #    |                 /    \                 |____|                3) 'pkg0' is successfully completed 
+  #  pkg1              pkg1   pkg2              |    |                4) the function callback resume main process
+  #    |              / |  \   |  \    ---------------------->        5) the main process remove 'pkg0' from dictionary 'dependencies'  
+  #  pkg2            /  |   \  |   \            |    |                   and now the 'pkg1' and 'pkg2' have no dependencies 
+  #    |          pkg3 pkg4  pkg5  pkg6         |____|                6) query the dictionary 'dependencies' and select 'pkg1' and 'pkg2'. goto 1)
+  #  pkg3          /|\  /\     |    / \         |    | 
+  #  .....       ...  ...  ... ... ... ....    ...  ...
+  #
+  
+  # step1: Get all packages that will be compiled
+  parser = argparse.ArgumentParser(description="Parallel build")
+  parser.add_argument("--packages", '-p', dest="packages",nargs='+',metavar="PACKAGE",
+      help="all the packages to be compiled in parallel")
+  parser.add_argument("--jobs", '-j', dest="jobs",metavar="JOB",
+      help="the number of parallel jobs")
+  args = parser.parse_args()
+  packages=args.packages
+  cur_jobs=int(args.jobs)
+  max_jobs = len(packages)/2
+  if not packages:
+    sys.stderr.write("parallel build must have targets\n")
+    sys.exit(1)
+  
+  # step2: Create the frame of all packages and dependencies which will be built
+  dependencies = get_all_depends(packages)
+  while packages:
+    packages = pickup_nokey_pkg(dependencies)
+    if packages:
+      depends = get_all_depends(packages)
+      dependencies.update(depends)
+  
+  # step3: Create a process pool for parallel compilation
+  jobs=min(cur_jobs,max_jobs)
+  pool = multiprocessing.Pool(processes=jobs)
+  
+  # step4: 
+  #   1) Pick up some packages that have no dependencies from the dictionary 'dependencies'
+  #   2) Distribute the packages that have been selected to the child process to compile
+  #   3) When a child process completes successfully, the function callback will be invoked; otherwise the main process will exit.
+  #   4) The callback will trigger the main process to resume.
+  #   5) The main process will remove the package that has been compiled by the child process from the dictionary 'dependencies',
+  #      and then some other packages that depend on this package will be released.
+  #   6) Continue to query the dictionary and pick up the packages that have no dependencies, until the dictionary 'dependencies' is empty.
+  allpending=[]
+  while dependencies:
+    # 1) pick up some packages  
+    no_deps_pkgs = pickup_nodepends_pkgs(dependencies)
+
+    if not no_deps_pkgs:
+      sys.stderr.write("parallel build must have targets\n")
+      sys.exit(1)
+    for pkg in no_deps_pkgs:
+      if pkg in allpending:
+        continue
+      # 2) if a package has no dependencies, or its dependencies have all been built successfully, we can build it here
+      pool.apply_async(make_build_pkg, (pkg, ),callback=callback)
+      allpending.append(pkg)
+    while True:
+      # 3,4) Wait for a child process to finish; if the child process had an error, the main process will exit
+      pkg = done_queue.get()
+      if pkg == '__error__':
+        sys.stderr.write("An error occurred during compilation, so must exit\n")
+        sys.exit(1)
+      # 5) remove the package that has been compiled successfully and release some other packages
+      dependencies=remove_pkg_from_depends(pkg,dependencies)
+      if done_queue.empty():
+        break
+    # 6) Continue
+
+make_build_pkg("")
+print "all builds is done"