[qemu-web,v2] add post about plans for Python venvs

Message ID 20230323084005.1032305-1-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini March 23, 2023, 8:40 a.m. UTC
This post details the design that John Snow and I are planning for QEMU 8.1.
The purpose is to detect possible inconsistencies in the build environment
that could happen on enterprise distros once Python 3.6 support is dropped.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
v1->v2: add CSS for asciicast
	note that sphinx is already checked for new-enough Python
	some more copy-editing

 _posts/2023-03-22-python.md | 223 ++++++++++++++++++++++++++++++++++++
 assets/css/style.css        |   4 +
 2 files changed, 227 insertions(+)
 create mode 100644 _posts/2023-03-22-python.md

Comments

Thomas Huth March 24, 2023, 8:58 a.m. UTC | #1
On 23/03/2023 09.40, Paolo Bonzini wrote:
> This post details the design that John Snow and I are planning for QEMU 8.1.
> The purpose is to detect possible inconsistencies in the build environment
> that could happen on enterprise distros once Python 3.6 support is dropped.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> v1->v2: add CSS for asciicast
> 	note that sphinx is already checked for new-enough Python
> 	some more copy-editing
> 
>   _posts/2023-03-22-python.md | 223 ++++++++++++++++++++++++++++++++++++
>   assets/css/style.css        |   4 +
>   2 files changed, 227 insertions(+)
>   create mode 100644 _posts/2023-03-22-python.md
> 
> diff --git a/_posts/2023-03-22-python.md b/_posts/2023-03-22-python.md
> new file mode 100644
> index 0000000..d463847
> --- /dev/null
> +++ b/_posts/2023-03-22-python.md
> @@ -0,0 +1,222 @@
> +---
> +layout: post
> +title:  "Preparing a consistent Python environment"
> +date:   2023-03-22 13:30:00 +0000
> +categories: [build, python, developers]
> +---
> +Building QEMU is a complex task, split across several programs.
> +configure finds the host and cross compilers that are needed to build

s/configure/The `configure` script/ ?

> +emulators and firmware; Meson prepares the build environment for the
> +emulators; finally, Make and ninja actually perform the build, and

I'd either capitalize both, Make and Ninja, or use quotes: "make" and "ninja".

The remaining parts look fine to me.

Acked-by: Thomas Huth <thuth@redhat.com>

Patch

diff --git a/_posts/2023-03-22-python.md b/_posts/2023-03-22-python.md
new file mode 100644
index 0000000..d463847
--- /dev/null
+++ b/_posts/2023-03-22-python.md
@@ -0,0 +1,222 @@ 
+---
+layout: post
+title:  "Preparing a consistent Python environment"
+date:   2023-03-22 13:30:00 +0000
+categories: [build, python, developers]
+---
+Building QEMU is a complex task, split across several programs.
+The `configure` script finds the host and cross compilers needed to build
+emulators and firmware; Meson prepares the build environment for the
+emulators; finally, Make and Ninja actually perform the build, and
+in some cases they run tests as well.
+
+In addition to compiling C code, many build steps run tools and
+scripts which are mostly written in the Python language.  These include
+processing the emulator configuration, code generators for tracepoints
+and QAPI, extensions for the Sphinx documentation tool, and the Avocado
+testing framework.  The Meson build system itself is written in Python, too.
+
+Some of these tools are run through the `python3` executable, while others
+are invoked directly as `sphinx-build` or `meson`, and this can create
+inconsistencies.  For example, QEMU's `configure` script checks for a
+minimum version of Python and rejects too-old interpreters.  However,
+what would happen if code run by Sphinx used a different version?
+
+This situation has been largely hypothetical until recently; QEMU's
+Python code is already tested with a wide range of versions of the
+interpreter, and it would not be a huge issue if Sphinx used a different
+version of Python as long as both of them were supported.  This will
+change in version 8.1 of QEMU, which will bump the minimum supported
+version of Python from 3.6 to 3.8.  While all the distros that QEMU
+supports have a recent-enough interpreter, the default on RHEL8 and
+SLES15 is still version 3.6, and that is what all binaries in `/usr/bin`
+use unconditionally.
+
+As of QEMU 8.0, even if `configure` is told to use `/usr/bin/python3.8`
+for the build, QEMU's custom Sphinx extensions would still run under
+Python 3.6.  `configure` does separately check that Sphinx is executing
+with a new enough Python version, but it would be nice if there were
+a more generic way to prepare a consistent Python environment.
+
+This post will explain how QEMU 8.1 will ensure that a single interpreter
+is used for the whole of the build process.  Getting there will require
+some familiarity with Python packaging, so let's start with virtual
+environments.
+
+## Virtual environments
+
+It is surprisingly hard to find out which Python interpreter a given script
+will use.  You can try to parse the first line of the script, which will
+be something like `#! /usr/bin/python3`, but there is no guarantee of
+success.  For example, on some versions of Homebrew `/usr/bin/meson`
+will be a wrapper script like:
+
+```bash
+#!/bin/bash
+PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" \
+  exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@"
+```
+
+The file with the Python shebang line will be hidden somewhere in
+`/usr/local/Cellar`.  Therefore, performing some kind of check on the
+files in `/usr/bin` is ruled out.  QEMU needs to set up a consistent
+environment on its own.
+
+If a user who is building QEMU wanted to do so, the simplest way would
+be to use Python virtual environments.  A virtual environment takes an
+existing Python installation but gives it a local set of Python packages.
+It also has its own `bin` directory; place it at the beginning of your
+`PATH` and you will be able to control the Python interpreter for scripts
+that begin with `#! /usr/bin/env python3`.
+
+Furthermore, when packages are installed into the virtual environment
+with `pip`, they always refer to the Python interpreter that was used to
+create the environment.  Virtual environments mostly solve the consistency
+problem at the cost of an extra `pip install` step to put QEMU's build
+dependencies into the environment.
+
+Unfortunately, this extra step has a substantial downside.  Even though
+the virtual environment can optionally refer to the base installation's
+installed packages, `pip` will always install packages from scratch
+into the virtual environment. For all Linux distributions except RHEL8
+and SLES15 this is unnecessary, and users would be happy to build QEMU
+using the versions of Meson and Sphinx included in the distribution.
+
+Even worse, `pip install` will access the Python package index (PyPI)
+over the Internet, which is often impossible on build machines that
+are sealed from the outside world.  Automated installation of PyPI
+dependencies may actually be a welcome feature, but it must also remain
+a strictly optional feature.
+
+In other words, the ideal solution would use a non-isolated virtual
+environment, to be able to use system packages provided by Linux
+distributions; but it would also ensure that scripts (`sphinx-build`,
+`meson`, `avocado`) are placed into `bin` just like `pip install` does.
+
+## Distribution packages
+
+When it comes to packages, Python surely makes an effort to be confusing.
+The fundamental unit for _importing_ code into a Python program is called
+a package; for example, `os` and `sys` are both packages.
+However, a program or library that is distributed on PyPI consists
+of _many_ such "import packages": that's because while `pip` is usually
+said to be a "package installer" for Python, more precisely it installs
+"distribution packages".
+
+To add to the confusion, the term "distribution package" is often
+shortened to _either_ "package" or "distribution".  And finally,
+the metadata of the distribution package remains available even after
+installation, so "distributions" include things that are already
+installed (and are not being distributed anywhere).
+
+All this matters because distribution metadata will be the key to
+building the perfect virtual environment.  If you look at the content
+of `bin/meson` in a virtual environment, after installing the package
+with `pip`, this is what you find:
+
+```python
+#!/home/pbonzini/my-venv/bin/python3
+# -*- coding: utf-8 -*-
+import re
+import sys
+from mesonbuild.mesonmain import main
+if __name__ == '__main__':
+    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+    sys.exit(main())
+```
+
+This looks a lot like automatically generated code, and in fact it is;
+the only parts that vary are the `from mesonbuild.mesonmain import main`
+import, and the invocation of the `main()` function on the last line.
+`pip` creates this invocation script based on the `setup.cfg` file
+in Meson's source code, more specifically based on the following stanza:
+
+```
+[options.entry_points]
+console_scripts =
+  meson = mesonbuild.mesonmain:main
+```
+
+Similar declarations exist in Sphinx, Avocado and so on, and accessing their
+content is easy via `importlib.metadata` (available in Python 3.8+):
+
+```
+$ python3
+>>> from importlib.metadata import distribution
+>>> distribution('meson').entry_points
+[EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')]
+```
+
+`importlib.metadata` looks up the metadata in the running Python interpreter's
+search path; if Meson is installed under another interpreter's `site-packages`
+directory, it will not be found:
+
+```
+$ python3.8
+>>> from importlib.metadata import distribution
+>>> distribution('meson').entry_points
+Traceback (most recent call last):
+...
+importlib.metadata.PackageNotFoundError: meson
+```
+
+So finally we have a plan!  `configure` can build a non-isolated virtual
+environment, use `importlib` to check that the required packages exist
+in the base installation, and create scripts in `bin` that point to the
+right Python interpreter.  Then, it can optionally use `pip install` to
+install the missing packages.
+
+While this process includes a certain amount of
+specialized logic, Python provides a customizable [`venv`
+module](https://docs.python.org/3/library/venv.html) to create virtual
+environments.  The custom steps can be performed by subclassing
+`venv.EnvBuilder`.
+
+This will provide the same experience as QEMU 8.0, except that there will
+be no need for the `--meson` and `--sphinx-build` options to the
+`configure` script.  The path to the Python interpreter is enough to
+set up all Python programs used during the build.
+
+There is only one thing left to fix...
+
+## Nesting virtual environments
+
+Remember how we started with a user that creates her own virtual
+environment before building QEMU?  Well, this would not work
+anymore, because virtual environments cannot be nested.  As soon
+as `configure` creates its own virtual environment, the packages
+installed by the user are not available anymore.
+
+Fortunately, the "appearance" of a nested virtual environment is easy
+to emulate.  Detecting whether `python3` runs in a virtual environment
+is as easy as checking `sys.prefix != sys.base_prefix`; if it does,
+we need to retrieve the parent virtual environment's `site-packages`
+directory:
+
+```
+>>> import sysconfig
+>>> sysconfig.get_path('purelib')
+'/home/pbonzini/my-venv/lib/python3.11/site-packages'
+```
+
+and write it to a `.pth` file in the `lib` directory of the new virtual
+environment.  The following demo shows how a distribution package in the
+parent virtual environment will be available in the child as well:
+
+<script async id="asciicast-31xjLsR4KjsU9HuhOUpU08tvb" src="https://asciinema.org/a/31xjLsR4KjsU9HuhOUpU08tvb.js"></script>
+
+A small detail is that `configure`'s new virtual environment should
+mirror the isolation setting of the parent.  An isolated venv can be
+detected because `sys.base_prefix in site.PREFIXES` is false.
+
+## Conclusion
+
+Right now, QEMU only makes a minimal attempt at ensuring consistency
+of the Python environment; Meson is always run using the interpreter
+that was passed to the configure script with `--python` or `$PYTHON`,
+but that's it.  Once the above technique is implemented in QEMU 8.1,
+there will be no difference in the build experience, but configuration
+will be easier and a wider set of invalid build environments will
+be detected.  We will merge these checks before dropping support for
+Python 3.6, so that users on older enterprise distributions will have
+a smooth transition.
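
The launcher-script step described in the "Distribution packages" section of the post can be sketched roughly as follows. This is an illustration only, not the code that QEMU's `configure` will run. It reads a distribution's `console_scripts` entry points through `importlib.metadata` and renders a launcher shaped like the `bin/meson` file quoted in the post. The names `SHIM_TEMPLATE`, `console_scripts` and `make_shim` are hypothetical, and the sketch assumes entry point values of the plain `module:function` form.

```python
from importlib.metadata import PackageNotFoundError, distribution

# Hypothetical template, modelled on the pip-generated bin/meson shown in the post.
SHIM_TEMPLATE = r"""#!{python}
# -*- coding: utf-8 -*-
import re
import sys
from {module} import {func}
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit({func}())
"""


def console_scripts(dist_name):
    """Return the console_scripts entry points of an installed distribution.

    An empty list means the distribution is not visible to this interpreter.
    """
    try:
        entry_points = distribution(dist_name).entry_points
    except PackageNotFoundError:
        return []
    return [ep for ep in entry_points if ep.group == "console_scripts"]


def make_shim(python, entry_point_value):
    """Render launcher text for a 'module:function' entry point value."""
    module, _, func = entry_point_value.partition(":")
    return SHIM_TEMPLATE.format(python=python, module=module, func=func)


if __name__ == "__main__":
    import sys

    # Prints nothing if Meson is not installed for this interpreter.
    for ep in console_scripts("meson"):
        print(make_shim(sys.executable, ep.value))
```

Returning an empty list for a missing distribution keeps the "check the base installation, then optionally fall back to `pip install`" decision in the caller, which is where the post places it.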
diff --git a/assets/css/style.css b/assets/css/style.css
index 2705787..983fb67 100644
--- a/assets/css/style.css
+++ b/assets/css/style.css
@@ -184,6 +184,10 @@ 
 		color: #999999;
 	}
 
+	.asciicast {
+		width: 45em;
+	}
+
 	/* Sections/Articles */
 
 		section,
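
Returning to the "Nesting virtual environments" section of the post, the checks it describes can be sketched as below. This is a minimal illustration under stated assumptions, not QEMU's implementation. The helper names and the `nested.pth` file name are hypothetical, and `target_dir` stands for whichever directory of the new environment is meant to hold the `.pth` file.

```python
import site
import sys
import sysconfig
from pathlib import Path


def in_venv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def is_isolated():
    """Isolation test quoted in the post: base prefix absent from site.PREFIXES."""
    return sys.base_prefix not in site.PREFIXES


def parent_site_packages():
    """Return the enclosing venv's site-packages path, or None outside a venv."""
    return sysconfig.get_path("purelib") if in_venv() else None


def write_pth(target_dir, parent_site, name="nested.pth"):
    """Write a .pth file (hypothetical name) listing the parent's site-packages."""
    pth = Path(target_dir) / name
    pth.write_text(parent_site + "\n")
    return pth


if __name__ == "__main__":
    print("in venv:", in_venv())
    print("isolated:", is_isolated())
    print("parent site-packages:", parent_site_packages())
```

Each line of a `.pth` file names a directory that the `site` module appends to `sys.path`, which is how the parent environment's packages become importable from the child.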