KVM: Add wrapper script around QEMU to test kernels

Message ID 1320543320-32728-1-git-send-email-agraf@suse.de
State New

Commit Message

Alexander Graf Nov. 6, 2011, 1:35 a.m. UTC
At LinuxCon I had a nice chat with Linus on what he thinks kvm-tool
would be doing and what he expects from it. Basically he wants a
small and simple tool he and other developers can run to try out and
see if the kernel they just built actually works.

Fortunately, QEMU can do that today already! The only piece that was
missing was the "simple" piece of the equation, so here is a script
that wraps around QEMU and executes a kernel you just built.

If you do have KVM around and are not cross-compiling, it will use
KVM. But if you don't, you can still fall back to emulation mode and
at least check if your kernel still does what you expect. I only
implemented support for s390x and ppc there, but it's easily extensible
to more platforms, as QEMU can emulate (and virtualize) pretty much
any platform out there.

If you don't have qemu installed, please do so before using this script. Your
distro should provide a package for it (might even call it "kvm"). If not,
just compile it from source - it's not hard!

To quickly get going, just execute the following as user:

    $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash

This will drop you into a shell on your rootfs.

Happy hacking!

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - fix naming of QEMU
  - use grep -q for has_config
  - support multiple -a args
  - spawn gdb on execution
  - pass through qemu options
  - don't use qemu-system-x86_64 on i386
  - add funny sentence to startup text
  - more helpful error messages

v2 -> v3:

  - move to tools/testing
  - fix running: message

  ( sorry for sending this version so late - I got caught up in
    random other stuff )

---
 tools/testing/run-qemu/run-qemu.sh |  338 ++++++++++++++++++++++++++++++++++++
 1 files changed, 338 insertions(+), 0 deletions(-)
 create mode 100755 tools/testing/run-qemu/run-qemu.sh
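
The KVM-or-emulation fallback described in the commit message can be sketched roughly as follows. This is only an illustration, not code taken from run-qemu.sh: the helper name can_use_kvm and the variable ACCEL_OPTS are made up for this example, and the plain string comparison of architectures is a simplification (it would, for instance, reject an i686 kernel on an x86_64 host, which KVM can in fact run). The only QEMU option assumed is -enable-kvm.

```shell
#!/bin/sh
# Hypothetical helper (not from run-qemu.sh): decide whether KVM
# acceleration looks usable for the given target architecture.
can_use_kvm() {
	target_arch=$1
	host_arch=$(uname -m)

	# A cross-compiled kernel cannot run under KVM on this host.
	[ "$target_arch" = "$host_arch" ] || return 1

	# KVM is exposed through /dev/kvm; the user needs read/write access.
	[ -r /dev/kvm ] && [ -w /dev/kvm ]
}

# Honour a caller-supplied ARCH, otherwise use the host architecture.
ARCH=${ARCH:-$(uname -m)}

if can_use_kvm "$ARCH"; then
	ACCEL_OPTS="-enable-kvm"
else
	# Leave the option out and let QEMU fall back to pure emulation.
	ACCEL_OPTS=""
fi

echo "accelerator options: '$ACCEL_OPTS'"
```

Running it with ARCH set to a foreign architecture (for example ARCH=s390x on an x86 host) should always take the emulation branch.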

Comments

Andreas Färber Nov. 6, 2011, 1:14 a.m. UTC | #1
Am 06.11.2011 02:35, schrieb Alexander Graf:
> At LinuxCon I had a nice chat with Linus on what he thinks kvm-tool
> would be doing and what he expects from it. Basically he wants a
> small and simple tool he and other developers can run to try out and
> see if the kernel they just built actually works.
> 
> Fortunately, QEMU can do that today already! The only piece that was
> missing was the "simple" piece of the equation, so here is a script
> that wraps around QEMU and executes a kernel you just built.
> 
> If you do have KVM around and are not cross-compiling, it will use
> KVM. But if you don't, you can still fall back to emulation mode and
> at least check if your kernel still does what you expect. I only
> implemented support for s390x and ppc there, but it's easily extensible
> to more platforms, as QEMU can emulate (and virtualize) pretty much
> any platform out there.
> 
> If you don't have qemu installed, please do so before using this script. Your
> distro should provide a package for it (might even call it "kvm"). If not,
> just compile it from source - it's not hard!
> 
> To quickly get going, just execute the following as user:
> 
>     $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash

Path needs updating.

> 
> This will drop you into a shell on your rootfs.
> 
> Happy hacking!
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> 
> ---

> diff --git a/tools/testing/run-qemu/run-qemu.sh b/tools/testing/run-qemu/run-qemu.sh
> new file mode 100755
> index 0000000..70f194f
> --- /dev/null
> +++ b/tools/testing/run-qemu/run-qemu.sh

> +# Try to find the KVM accelerated QEMU binary
> +
> +[ "$ARCH" ] || ARCH=$(uname -m)
> +case $ARCH in
> +x86_64)
> +	KERNEL_BIN=arch/x86/boot/bzImage
> +	# SUSE and Red Hat call the binary qemu-kvm
> +	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-kvm 2>/dev/null)
> +
> +	# Debian and Gentoo call it kvm
> +	[ "$QEMU_BIN" ] || QEMU_BIN=$(which kvm 2>/dev/null)
> +
> +	# QEMU's own build system calls it qemu-system-x86_64
> +	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-system-x86_64 2>/dev/null)
> +	;;
> +i*86)
> +	KERNEL_BIN=arch/x86/boot/bzImage
> +	# SUSE and Red Hat call the binary qemu-kvm
> +	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-kvm 2>/dev/null)
> +
> +	# Debian and Gentoo call it kvm
> +	[ "$QEMU_BIN" ] || QEMU_BIN=$(which kvm 2>/dev/null)
> +
> +	KERNEL_BIN=arch/x86/boot/bzImage

Copy&paste?

> +	# i386 version of QEMU

QEMU's own build system calls it qemu-system-i386 now. :)

> +	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu 2>/dev/null)

We should first test for qemu-system-i386, then fall back to old qemu.

Andreas

P.S. You're still ahead of time...
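
For illustration, the lookup order suggested in this review for i386 (distro KVM wrappers first, then qemu-system-i386, then the legacy qemu binary) could be written as a small loop. This is a sketch under assumptions: find_qemu_bin is a hypothetical helper that does not appear in the patch, and it uses the POSIX `command -v` where the patch itself calls `which`.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): print the path of the first
# candidate binary found in PATH and return 0, or return 1 if none exists.
find_qemu_bin() {
	for candidate in "$@"; do
		if command -v "$candidate" >/dev/null 2>&1; then
			command -v "$candidate"
			return 0
		fi
	done
	return 1
}

# i386 lookup order: qemu-kvm (SUSE, Red Hat), kvm (Debian, Gentoo),
# qemu-system-i386 (current upstream name), qemu (legacy upstream name).
# A QEMU_BIN supplied by the caller still takes precedence.
if [ -z "${QEMU_BIN:-}" ]; then
	QEMU_BIN=$(find_qemu_bin qemu-kvm kvm qemu-system-i386 qemu) ||
		echo "no QEMU binary found in PATH" >&2
fi
```

Keeping the candidates in one argument list means a new binary name only needs to be added in a single place.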
Pekka Enberg Nov. 6, 2011, 10:04 a.m. UTC | #2
Hi Alexander,

On Sun, Nov 6, 2011 at 3:35 AM, Alexander Graf <agraf@suse.de> wrote:
> At LinuxCon I had a nice chat with Linus on what he thinks kvm-tool
> would be doing and what he expects from it. Basically he wants a
> small and simple tool he and other developers can run to try out and
> see if the kernel they just built actually works.
>
> Fortunately, QEMU can do that today already! The only piece that was
> missing was the "simple" piece of the equation, so here is a script
> that wraps around QEMU and executes a kernel you just built.

I'm happy to see some real competition for the KVM tool in usability. ;-)

That said, while the script looks really useful for developers,
wouldn't it make more sense to put it in QEMU to make sure it's kept
up-to-date and distributions can pick it up too? (And yes, I realize
the irony here.)

                        Pekka
Avi Kivity Nov. 6, 2011, 10:07 a.m. UTC | #3
On 11/06/2011 12:04 PM, Pekka Enberg wrote:
> Hi Alexander,
>
> On Sun, Nov 6, 2011 at 3:35 AM, Alexander Graf <agraf@suse.de> wrote:
> > At LinuxCon I had a nice chat with Linus on what he thinks kvm-tool
> > would be doing and what he expects from it. Basically he wants a
> > small and simple tool he and other developers can run to try out and
> > see if the kernel they just built actually works.
> >
> > Fortunately, QEMU can do that today already! The only piece that was
> > missing was the "simple" piece of the equation, so here is a script
> > that wraps around QEMU and executes a kernel you just built.
>
> I'm happy to see some real competition for the KVM tool in usability. ;-)
>
> That said, while the script looks really useful for developers,
> wouldn't it make more sense to put it in QEMU to make sure it's kept
> up-to-date and distributions can pick it up too? (And yes, I realize
> the irony here.)

Why would distributions want it?  It's only useful for kernel developers.
Pekka Enberg Nov. 6, 2011, 10:12 a.m. UTC | #4
On Sun, Nov 6, 2011 at 12:07 PM, Avi Kivity <avi@redhat.com> wrote:
>> I'm happy to see some real competition for the KVM tool in usability. ;-)
>>
>> That said, while the script looks really useful for developers,
>> wouldn't it make more sense to put it in QEMU to make sure it's kept
>> up-to-date and distributions can pick it up too? (And yes, I realize
>> the irony here.)
>
> Why would distributions want it?  It's only useful for kernel developers.

It's useful for kernel testers too.

If this is a serious attempt in making QEMU command line suck less on
Linux, I think it makes sense to do this properly instead of adding a
niche script to the kernel tree that's simply going to bit rot over
time.

                        Pekka
Avi Kivity Nov. 6, 2011, 10:23 a.m. UTC | #5
On 11/06/2011 12:12 PM, Pekka Enberg wrote:
> On Sun, Nov 6, 2011 at 12:07 PM, Avi Kivity <avi@redhat.com> wrote:
> >> I'm happy to see some real competition for the KVM tool in usability. ;-)
> >>
> >> That said, while the script looks really useful for developers,
> >> wouldn't it make more sense to put it in QEMU to make sure it's kept
> >> up-to-date and distributions can pick it up too? (And yes, I realize
> >> the irony here.)
> >
> > Why would distributions want it?  It's only useful for kernel developers.
>
> It's useful for kernel testers too.

Well, they usually have a kernel with them.

> If this is a serious attempt in making QEMU command line suck less on
> Linux, I think it makes sense to do this properly instead of adding a
> niche script to the kernel tree that's simply going to bit rot over
> time.

You misunderstand.  This is an attempt to address the requirements of a
niche population, kernel developers and testers, not to improve the qemu
command line.  For the majority of qemu installations, this script is
useless.

In most installations, qemu is driven by other programs, so any changes
to the command line would be invisible, except insofar as they break things.

For the occasional direct user of qemu, something like 'qemu-kvm -m 1G
/images/blah.img' is enough to boot an image.  This script doesn't help
in any way.

This script is for kernel developers who don't want to bother with
setting up a disk image (which, btw, many are still required to do - I'm
guessing most kernel developers who use qemu are cross-arch).  It has
limited scope and works mostly by hiding qemu features.  As such it
doesn't belong in qemu.
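
To make the contrast above concrete, here is a hedged sketch of the two invocation styles. The paths and the console=ttyS0 argument are placeholders chosen for this example; -kernel, -append, -m and -nographic are standard QEMU options, but the exact command line that run-qemu.sh builds is not shown in this thread and is not reproduced here. The script only prints the commands rather than launching a guest.

```shell
#!/bin/sh
# Placeholder paths for illustration only.
KERNEL_BIN=arch/x86/boot/bzImage
DISK_IMAGE=/images/blah.img

# Booting an installed disk image: the guest's own bootloader and kernel run.
image_cmd="qemu-kvm -m 1G $DISK_IMAGE"

# Booting a freshly built kernel directly, bypassing the guest bootloader.
# Without a root filesystem this kernel would stop early in boot; supplying
# one is exactly the part a wrapper script takes care of.
kernel_cmd="qemu-kvm -m 1G -kernel $KERNEL_BIN -append console=ttyS0 -nographic"

echo "$image_cmd"
echo "$kernel_cmd"
```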
Pekka Enberg Nov. 6, 2011, 11:08 a.m. UTC | #6
Hi Avi,

On Sun, 2011-11-06 at 12:23 +0200, Avi Kivity wrote:
> > If this is a serious attempt in making QEMU command line suck less on
> > Linux, I think it makes sense to do this properly instead of adding a
> > niche script to the kernel tree that's simply going to bit rot over
> > time.
> 
> You misunderstand.  This is an attempt to address the requirements of a
> niche population, kernel developers and testers, not to improve the qemu
> command line.  For the majority of qemu installations, this script is
> useless.

Right.

On Sun, 2011-11-06 at 12:23 +0200, Avi Kivity wrote:
> In most installations, qemu is driven by other programs, so any changes
> to the command line would be invisible, except insofar as they break things.
> 
> For the occasional direct user of qemu, something like 'qemu-kvm -m 1G
> /images/blah.img' is enough to boot an image.  This script doesn't help
> in any way.
> 
> This script is for kernel developers who don't want to bother with
> setting up a disk image (which, btw, many are still required to do - I'm
> guessing most kernel developers who use qemu are cross-arch).  It has
> limited scope and works mostly by hiding qemu features.  As such it
> doesn't belong in qemu.

I'm certainly not against merging the script if people are actually
using it and it solves their problem.

I personally find the whole exercise pointless because it's not
attempting to solve any of the fundamental issues QEMU command line
interface has nor does it try to make Linux on Linux virtualization
simpler and more integrated.

People seem to think the KVM tool is only about solving a specific
problem to kernel developers. That's certainly never been my goal as I
do lots of userspace programming as well. The end game for me is to
replace QEMU/VirtualBox for Linux on Linux virtualization for my day to
day purposes.

			Pekka
Avi Kivity Nov. 6, 2011, 11:50 a.m. UTC | #7
On 11/06/2011 01:08 PM, Pekka Enberg wrote:
> On Sun, 2011-11-06 at 12:23 +0200, Avi Kivity wrote:
> > In most installations, qemu is driven by other programs, so any changes
> > to the command line would be invisible, except insofar as they break things.
> > 
> > For the occasional direct user of qemu, something like 'qemu-kvm -m 1G
> > /images/blah.img' is enough to boot an image.  This script doesn't help
> > in any way.
> > 
> > This script is for kernel developers who don't want to bother with
> > setting up a disk image (which, btw, many are still required to do - I'm
> > guessing most kernel developers who use qemu are cross-arch).  It has
> > limited scope and works mostly by hiding qemu features.  As such it
> > doesn't belong in qemu.
>
> I'm certainly not against merging the script if people are actually
> using it and it solves their problem.
>
> I personally find the whole exercise pointless because it's not
> attempting to solve any of the fundamental issues QEMU command line
> interface

There are no "fundamental qemu command line issues".  It's hairy, yes,
and verbose, but using "fundamental" to describe a choice between one
arcane set of command line options and another is a bit of an overstatement.
Most users will use a GUI anyway.

>  has nor does it try to make Linux on Linux virtualization
> simpler and more integrated.

So far, kvm-tool capabilities are a subset of qemu's.  Does it add
anything beyond a different command-line?

> People seem to think the KVM tool is only about solving a specific
> problem to kernel developers. That's certainly never been my goal as I
> do lots of userspace programming as well. The end game for me is to
> replace QEMU/VirtualBox for Linux on Linux virtualization for my day to
> day purposes.

Maybe it should be in tools/pekka then.  Usually subsystems that want to
be merged into Linux have broader audiences though.
Pekka Enberg Nov. 6, 2011, 12:14 p.m. UTC | #8
On Sun, Nov 6, 2011 at 1:50 PM, Avi Kivity <avi@redhat.com> wrote:
>> People seem to think the KVM tool is only about solving a specific
>> problem to kernel developers. That's certainly never been my goal as I
>> do lots of userspace programming as well. The end game for me is to
>> replace QEMU/VirtualBox for Linux on Linux virtualization for my day to
>> day purposes.
>
> Maybe it should be in tools/pekka then.  Usually subsystems that want to
> be merged into Linux have broader audiences though.

I think you completely missed my point.

I'm simply saying that KVM tool was never about solving a narrow
problem Alexander's script is trying to solve. That's why I feel it's
such a pointless exercise.

                        Pekka
Avi Kivity Nov. 6, 2011, 12:27 p.m. UTC | #9
On 11/06/2011 02:14 PM, Pekka Enberg wrote:
> On Sun, Nov 6, 2011 at 1:50 PM, Avi Kivity <avi@redhat.com> wrote:
> >> People seem to think the KVM tool is only about solving a specific
> >> problem to kernel developers. That's certainly never been my goal as I
> >> do lots of userspace programming as well. The end game for me is to
> >> replace QEMU/VirtualBox for Linux on Linux virtualization for my day to
> >> day purposes.
> >
> > Maybe it should be in tools/pekka then.  Usually subsystems that want to
> > be merged into Linux have broader audiences though.
>
> I think you completely missed my point.
>
> I'm simply saying that KVM tool was never about solving a narrow
> problem Alexander's script is trying to solve. That's why I feel it's
> such a pointless exercise.

But from your description, you're trying to solve just another narrow
problem:

"The end game for me is to replace QEMU/VirtualBox for Linux on Linux
virtualization for my day to day purposes. "

We rarely merge a subsystem to solve one person's problem (esp. when it
is defined as "replace another freely available project", even if you
dislike its command line syntax).
Pekka Enberg Nov. 6, 2011, 12:27 p.m. UTC | #10
On Sun, Nov 6, 2011 at 1:50 PM, Avi Kivity <avi@redhat.com> wrote:
> So far, kvm-tool capabilities are a subset of qemu's.  Does it add
> anything beyond a different command-line?

I think "different command line" is a big thing which is why we've
spent so much time on it. But if you mean other end user features, no,
we don't add anything new on the table right now. I think our
userspace networking implementation is better than QEMU's slirp but
that's purely technical thing.

I also don't think we should add new features for their own sake.
Linux virtualization isn't a terribly difficult thing to do thanks to
KVM and virtio drivers. I think most of the big ticket items will be
doing things like improving guest isolation and making guests more
accessible to the host.

                                Pekka
Pekka Enberg Nov. 6, 2011, 12:32 p.m. UTC | #11
On Sun, Nov 6, 2011 at 2:27 PM, Avi Kivity <avi@redhat.com> wrote:
> But from your description, you're trying to solve just another narrow
> problem:
>
> "The end game for me is to replace QEMU/VirtualBox for Linux on Linux
> virtualization for my day to day purposes. "
>
> We rarely merge a subsystem to solve one person's problem (esp. when it
> is defined as "replace another freely available project", even if you
> dislike its command line syntax).

I really don't understand your point. Other people are using the KVM
tool for other purposes. For example, the (crazy) simulation guys are
using the tool to launch even more guests on a single host and Ingo
seems to be using the tool to test kernels.

I'm not suggesting we should merge the tool because of my particular
use case. I'm simply saying the problem I personally want to solve
with the KVM tool is broader than what Alexander's script is doing.
That's why I feel it's a pointless project.

                        Pekka
Avi Kivity Nov. 6, 2011, 12:43 p.m. UTC | #12
On 11/06/2011 02:32 PM, Pekka Enberg wrote:
> On Sun, Nov 6, 2011 at 2:27 PM, Avi Kivity <avi@redhat.com> wrote:
> > But from your description, you're trying to solve just another narrow
> > problem:
> >
> > "The end game for me is to replace QEMU/VirtualBox for Linux on Linux
> > virtualization for my day to day purposes. "
> >
> > We rarely merge a subsystem to solve one person's problem (esp. when it
> > is defined as "replace another freely available project", even if you
> > dislike its command line syntax).
>
> I really don't understand your point. Other people are using the KVM
> tool for other purposes. For example, the (crazy) simulation guys are
> using the tool to launch even more guests on a single host and Ingo
> seems to be using the tool to test kernels.
>
> I'm not suggesting we should merge the tool because of my particular
> use case. I'm simply saying the problem I personally want to solve
> with the KVM tool is broader than what Alexander's script is doing.
> That's why I feel it's a pointless project.

We're going in circles, but I'll try again.

You say that kvm-tool's scope is broader than Alex's script, therefore
the latter is pointless.
You accept that qemu's scope is broader than kvm-tool (and is a
superset).  That is why many people think kvm-tool is pointless.

Alex's script, though, is just a few dozen lines.  kvm-tool is a 20K
patch - in fact 2X as large as kvm when it was first merged.  And its
main feature seems to be that "it is not qemu".
Pekka Enberg Nov. 6, 2011, 1:06 p.m. UTC | #13
On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
> You say that kvm-tool's scope is broader than Alex's script, therefore
> the latter is pointless.

I'm saying that Alex's script is pointless because it's not attempting
to fix the real issues. For example, we're trying to make it as
easy as possible to set up a guest and to be able to access guest data
from the host. Alex's script is essentially just a simplified QEMU
"front end" for kernel developers.

That's why I feel it's a pointless thing to do.

On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
> You accept that qemu's scope is broader than kvm-tool (and is a
> superset).  That is why many people think kvm-tool is pointless.

Sure. I think it's mostly people that are interested in non-Linux
virtualization that think the KVM tool is a pointless project.
However, some people (including myself) think the KVM tool is a more
usable and hackable tool than QEMU for Linux virtualization.

The difference here is that although I feel Alex's script is a
pointless project, I'm in no way opposed to merging it in the tree if
people use it and it solves their problem. Some people seem to be
violently opposed to merging the KVM tool and I'm having a difficult
time understanding why that is.

                        Pekka
Pekka Enberg Nov. 6, 2011, 1:11 p.m. UTC | #14
On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
> Alex's script, though, is just a few dozen lines.  kvm-tool is a 20K
> patch - in fact 2X as large as kvm when it was first merged.  And its
> main feature seems to be that "it is not qemu".

I think I've mentioned many times that I find the QEMU source terribly
difficult to read and hack on. So if you mean "not qemu" from that
point of view, sure, I think it's a very important point. The command
line interface is also "not qemu" for a very good reason too.

As for virtio drivers and such, we're actually following QEMU's
example very closely. I guess we're going to diverge a bit for better
guest isolation but fundamentally I don't see why we'd want to be
totally different from QEMU on that level.

                        Pekka
Avi Kivity Nov. 6, 2011, 3:56 p.m. UTC | #15
On 11/06/2011 03:06 PM, Pekka Enberg wrote:
> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
> > You say that kvm-tool's scope is broader than Alex's script, therefore
> > the latter is pointless.
>
> I'm saying that Alex's script is pointless because it's not attempting
> to fix the real issues. For example, we're trying to make it as
> easy as possible to set up a guest and to be able to access guest data
> from the host.

Have you tried virt-install/virt-manager?

> Alex's script is essentially just a simplified QEMU
> "front end" for kernel developers.

AFAIR it was based off a random Linus remark.

> That's why I feel it's a pointless thing to do.
>
> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
> > You accept that qemu's scope is broader than kvm-tool (and is a
> > superset).  That is why many people think kvm-tool is pointless.
>
> Sure. I think it's mostly people that are interested in non-Linux
> virtualization that think the KVM tool is a pointless project.
> However, some people (including myself) think the KVM tool is a more
> usable and hackable tool than QEMU for Linux virtualization.

More hackable, certainly, as any 20kloc project will be compared to a
700+kloc project with a long history.  More usable, I really doubt
this.  You take it for granted that people want to run their /boot
kernels in a guest, but in fact only kernel developers (and testers)
want this.  The majority want the real guest kernel.

> The difference here is that although I feel Alex's script is a
> pointless project, I'm in no way opposed to merging it in the tree if
> people use it and it solves their problem. Some people seem to be
> violently opposed to merging the KVM tool and I'm having a difficult
> time understanding why that is.

One of the reasons is that if it is merged, anyone with a #include
<linux/foo.h> will line up for the next merge window, wanting in.  The
other is that anything in the Linux source tree might gain an unfair
advantage over out-of-tree projects (at least that's how I read Jan's
comment).
Jan Kiszka Nov. 6, 2011, 4:19 p.m. UTC | #16
On 2011-11-06 14:06, Pekka Enberg wrote:
> Sure. I think it's mostly people that are interested in non-Linux
> virtualization that think the KVM tool is a pointless project.
> However, some people (including myself) think the KVM tool is a more
> usable and hackable tool than QEMU for Linux virtualization.

"Hackable" is relative. I'm surely not saying QEMU has nicer code than
kvm-tool, rather the contrary. But if it were that bad, we would not
have hundreds of contributors, just in the very recent history.

"Usable" - I've tried kvm-tool several times and still (today) fail to
get a standard SUSE image (with a kernel I have to compile and provide
separately...) up and running *). Likely a user mistake, but none that
is very obvious. At least to me.

In contrast, you can throw arbitrary Linux distros in various forms at
QEMU, and it will catch and run them. For me, already this is more usable.

Jan

*) kvm run -m 1000 -d OpenSuse11-4_64.img arch/x86/boot/bzImage \
	-p root=/dev/vda2
...
[    1.772791] mousedev: PS/2 mouse device common for all mice
[    1.774603] cpuidle: using governor ladder
[    1.775490] cpuidle: using governor menu
[    1.776865] input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    1.778609] TCP cubic registered
[    1.779456] Installing 9P2000 support
[    1.782390] Registering the dns_resolver key type
[    1.794323] registered taskstats version 1

...and here the boot just stops, guest apparently waits for something
Pekka Enberg Nov. 6, 2011, 4:30 p.m. UTC | #17
Hi Jan,

On Sun, Nov 6, 2011 at 6:19 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> "Usable" - I've tried kvm-tool several times and still (today) fail to
> get a standard SUSE image (with a kernel I have to compile and provide
> separately...) up and running *). Likely a user mistake, but none that
> is very obvious. At least to me.
>
> In contrast, you can throw arbitrary Linux distros in various forms at
> QEMU, and it will catch and run them. For me, already this is more usable.
>
> *) kvm run -m 1000 -d OpenSuse11-4_64.img arch/x86/boot/bzImage \
>        -p root=/dev/vda2
> ...
> [    1.772791] mousedev: PS/2 mouse device common for all mice
> [    1.774603] cpuidle: using governor ladder
> [    1.775490] cpuidle: using governor menu
> [    1.776865] input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> [    1.778609] TCP cubic registered
> [    1.779456] Installing 9P2000 support
> [    1.782390] Registering the dns_resolver key type
> [    1.794323] registered taskstats version 1
>
> ...and here the boot just stops, guest apparently waits for something

Can you please share your kernel .config with me and I'll take a look
at it. We now have a "make kvmconfig" makefile target for enabling all
the necessary config options for guest kernels. I don't think any of
us developers are using SUSE so it can surely be a KVM tool bug as
well.

                                Pekka
Pekka Enberg Nov. 6, 2011, 4:35 p.m. UTC | #18
Hi Avi,

On Sun, Nov 6, 2011 at 5:56 PM, Avi Kivity <avi@redhat.com> wrote:
> On 11/06/2011 03:06 PM, Pekka Enberg wrote:
>> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
>> > You say that kvm-tool's scope is broader than Alex's script, therefore
>> > the latter is pointless.
>>
>> I'm saying that Alex's script is pointless because it's not attempting
>> to fix the real issues. For example, we're trying to make it as
>> easy as possible to set up a guest and to be able to access guest data
>> from the host.
>
> Have you tried virt-install/virt-manager?

No, I don't use virt-manager. I know a lot of people do, which is why
someone is working on KVM tool libvirt integration.

>> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
>> > You accept that qemu's scope is broader than kvm-tool (and is a
>> > superset).  That is why many people think kvm-tool is pointless.
>>
>> Sure. I think it's mostly people that are interested in non-Linux
>> virtualization that think the KVM tool is a pointless project.
>> However, some people (including myself) think the KVM tool is a more
>> usable and hackable tool than QEMU for Linux virtualization.
>
> More hackable, certainly, as any 20kloc project will be compared to a
> 700+kloc project with a long history.  More usable, I really doubt
> this.  You take it for granted that people want to run their /boot
> kernels in a guest, but in fact only kernel developers (and testers)
> want this.  The majority want the real guest kernel.

Our inability to boot ISO images, for example, is a usability
limitation, sure. I'm hoping to fix that at some point.

>> The difference here is that although I feel Alex's script is a
>> pointless project, I'm in no way opposed to merging it in the tree if
>> people use it and it solves their problem. Some people seem to be
>> violently opposed to merging the KVM tool and I'm having a difficult
>> time understanding why that is.
>
> One of the reasons is that if it is merged, anyone with a #include
> <linux/foo.h> will line up for the next merge window, wanting in.  The
> other is that anything in the Linux source tree might gain an unfair
> advantage over out-of-tree projects (at least that's how I read Jan's
> comment).

Well, having gone through the process of getting something included so
far, I'm not at all worried that there's going to be a huge queue of
"#include <linux/foo.h>" projects if we get in...

What kind of unfair advantage are you referring to? I've specifically
said that the only way for KVM tool to become a reference
implementation would be that the KVM maintainers take the tool through
their tree. As that's not going to happen, I don't see what the
problem would be.

                                 Pekka
Pekka Enberg Nov. 6, 2011, 4:39 p.m. UTC | #19
On Sun, Nov 6, 2011 at 6:19 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> In contrast, you can throw arbitrary Linux distros in various forms at
> QEMU, and it will catch and run them. For me, already this is more usable.

Yes, I completely agree that this is an unfortunate limitation in the
KVM tool. We definitely need to support booting to images which have
virtio drivers enabled.

                         Pekka
Jan Kiszka Nov. 6, 2011, 4:39 p.m. UTC | #20
On 2011-11-06 17:30, Pekka Enberg wrote:
> Hi Jan,
> 
> On Sun, Nov 6, 2011 at 6:19 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
>> "Usable" - I've tried kvm-tool several times and still (today) fail to
>> get a standard SUSE image (with a kernel I have to compile and provide
>> separately...) up and running *). Likely a user mistake, but none that
>> is very obvious. At least to me.
>>
>> In contrast, you can throw arbitrary Linux distros in various forms at
>> QEMU, and it will catch and run them. For me, already this is more usable.
>>
>> *) kvm run -m 1000 -d OpenSuse11-4_64.img arch/x86/boot/bzImage \
>>        -p root=/dev/vda2
>> ...
>> [    1.772791] mousedev: PS/2 mouse device common for all mice
>> [    1.774603] cpuidle: using governor ladder
>> [    1.775490] cpuidle: using governor menu
>> [    1.776865] input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
>> [    1.778609] TCP cubic registered
>> [    1.779456] Installing 9P2000 support
>> [    1.782390] Registering the dns_resolver key type
>> [    1.794323] registered taskstats version 1
>>
>> ...and here the boot just stops, guest apparently waits for something
> 
> Can you please share your kernel .config with me and I'll take a look
> at it. We now have a "make kvmconfig" makefile target for enabling all
> the necessary config options for guest kernels. I don't think any of
> us developers are using SUSE so it can surely be a KVM tool bug as
> well.

Attached.

Jan
Avi Kivity Nov. 6, 2011, 4:50 p.m. UTC | #21
On 11/06/2011 06:35 PM, Pekka Enberg wrote:
> >> The difference here is that although I feel Alex's script is a
> >> pointless project, I'm in no way opposed to merging it in the tree if
> >> people use it and it solves their problem. Some people seem to be
> >> violently opposed to merging the KVM tool and I'm having a difficult
> >> time understanding why that is.
> >
> > One of the reasons is that if it is merged, anyone with a #include
> > <linux/foo.h> will line up for the next merge window, wanting in.  The
> > other is that anything in the Linux source tree might gain an unfair
> > advantage over out-of-tree projects (at least that's how I read Jan's
> > comment).
>
> Well, having gone through the process of getting something included so
> far, I'm not at all worried that there's going to be a huge queue of
> "#include <linux/foo.h>" projects if we get in...
>
> What kind of unfair advantage are you referring to? I've specifically
> said that the only way for KVM tool to become a reference
> implementation would be that the KVM maintainers take the tool through
> their tree. As that's not going to happen, I don't see what the
> problem would be.

I'm not personally worried about it either (though in fact a *minimal*
reference implementation might not be a bad idea).  There's the risk of
getting informed in-depth press reviews ("Linux KVM Takes A Step Back
From Running Windows Guests"), or of unfairly drawing developers away
from competing projects.
Anthony Liguori Nov. 6, 2011, 5:08 p.m. UTC | #22
On 11/06/2011 10:50 AM, Avi Kivity wrote:
> On 11/06/2011 06:35 PM, Pekka Enberg wrote:
>>>> The difference here is that although I feel Alex's script is a
>>>> pointless project, I'm in no way opposed to merging it in the tree if
>>>> people use it and it solves their problem. Some people seem to be
>>>> violently opposed to merging the KVM tool and I'm having difficult
>>>> time understanding why that is.
>>>
>>> One of the reasons is that if it is merge, anyone with a #include
>>> <linux/foo.h> will line up for the next merge window, wanting in.  The
>>> other is that anything in the Linux source tree might gain an unfair
>>> advantage over out-of-tree projects (at least that's how I read Jan's
>>> comment).
>>
>> Well, having gone through the process of getting something included so
>> far, I'm not at all worried that there's going to be a huge queue of
>> "#include <linux/foo.h>" projects if we get in...
>>
>> What kind of unfair advantage are you referring to? I've specifically
>> said that the only way for KVM tool to become a reference
>> implementation would be that the KVM maintainers take the tool through
>> their tree. As that's not going to happen, I don't see what the
>> problem would be.
>
> I'm not personally worried about it either (though in fact a *minimal*
> reference implementation might not be a bad idea).  There's the risk of
> getting informed in-depth press reviews ("Linux KVM Takes A Step Back
> From Running Windows Guests"), or of unfairly drawing developers away
> from competing projects.

I don't think that's really a concern.  Competition is a good thing.  QEMU is a 
large code base that a lot of people rely upon.  It's hard to take big risks in 
a project like QEMU because the consequences are too high.

OTOH, a project like KVM tool can take a lot of risks.  They've attempted a very 
different command line syntax and they've put a lot of work into making 
virtio-9p a main part of the interface.

If it turns out that these things end up working out well for them, then it 
becomes something we can copy in QEMU.  If not, then we didn't go through the 
train wreck of totally changing CLI syntax only to find it was the wrong syntax.

I'm quite happy with KVM tool and hope they continue working on it.  My only 
real wish is that they wouldn't copy QEMU so much and would try bolder things 
that are fundamentally different from QEMU.

Regards,

Anthony Liguori

>
Alexander Graf Nov. 6, 2011, 5:09 p.m. UTC | #23
On 06.11.2011, at 05:11, Pekka Enberg wrote:

> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
>> Alex's script, though, is just a few dozen lines.  kvm-tool is a 20K
>> patch - in fact 2X as large as kvm when it was first merged.  And it's
>> main feature seems to be that "it is not qemu".
> 
> I think I've mentioned many times that I find the QEMU source terribly
> difficult to read and hack on. So if you mean "not qemu" from that
> point of view, sure, I think it's a very important point. The command
> line interface is also "not qemu" for a very good reason too.

That's a matter of taste. In fact, I like the QEMU source code for the most part, and there was a whole talk about it at LinuxCon where people agreed that it was really easy to hack on to prototype new hardware:

  https://events.linuxfoundation.org/events/linuxcon-europe/waskiewicz

As for all matters concerning taste, I don't think we would ever get to a common ground here :).


Alex
Anthony Liguori Nov. 6, 2011, 5:10 p.m. UTC | #24
On 11/06/2011 07:06 AM, Pekka Enberg wrote:
> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity<avi@redhat.com>  wrote:
>> You say that kvm-tool's scope is broader than Alex's script, therefore
>> the latter is pointless.
>
> I'm saying that Alex's script is pointless because it's not attempting
> to fix the real issues. For example, we're trying to make make it as
> easy as possible to setup a guest and to be able to access guest data
> from the host. Alex's script is essentially just a simplified QEMU
> "front end" for kernel developers.
>
> That's why I feel it's a pointless thing to do.
>
> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity<avi@redhat.com>  wrote:
>> You accept that qemu's scope is broader than kvm-tool (and is a
>> superset).  That is why many people think kvm-tool is pointless.
>
> Sure. I think it's mostly people that are interested in non-Linux
> virtualization that think the KVM tool is a pointless project.
> However, some people (including myself) think the KVM tool is a more
> usable and hackable tool than QEMU for Linux virtualization.

There are literally dozens of mini operating systems that exist for exactly the 
same reason that you describe above.  They are smaller and easier to hack on 
than something like Linux.

Regards,

Anthony Liguori

>
> The difference here is that although I feel Alex's script is a
> pointless project, I'm in no way opposed to merging it in the tree if
> people use it and it solves their problem. Some people seem to be
> violently opposed to merging the KVM tool and I'm having difficult
> time understanding why that is.
>
>                          Pekka
>
Pekka Enberg Nov. 6, 2011, 5:11 p.m. UTC | #25
On Sun, 6 Nov 2011, Jan Kiszka wrote:
>> Can you please share your kernel .config with me and I'll take a look
>> at it. We now have a "make kvmconfig" makefile target for enabling all
>> the necessary config options for guest kernels. I don't think any of
>> us developers are using SUSE so it can surely be a KVM tool bug as
>> well.
>
> Attached.

It hangs here as well. I ran

   make kvmconfig

on your .config and it works. It's basically these two:

@@ -1478,7 +1478,7 @@
  CONFIG_NETPOLL=y
  # CONFIG_NETPOLL_TRAP is not set
  CONFIG_NET_POLL_CONTROLLER=y
-CONFIG_VIRTIO_NET=m
+CONFIG_VIRTIO_NET=y
  # CONFIG_VMXNET3 is not set
  # CONFIG_ISDN is not set
  # CONFIG_PHONE is not set
@@ -1690,7 +1690,7 @@
  # CONFIG_SERIAL_PCH_UART is not set
  # CONFIG_SERIAL_XILINX_PS_UART is not set
  CONFIG_HVC_DRIVER=y
-CONFIG_VIRTIO_CONSOLE=m
+CONFIG_VIRTIO_CONSOLE=y
  CONFIG_IPMI_HANDLER=m
  # CONFIG_IPMI_PANIC_EVENT is not set
  CONFIG_IPMI_DEVICE_INTERFACE=m
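
The check behind the diff above can be sketched as a small shell helper that reports whether an option in a .config is built in (=y), a module (=m), or unset. This is only an illustration: the function names config_state and check_builtin are made up here, and the list of options to pass is an assumption taken from the two lines of the diff, not from what "make kvmconfig" actually enforces.

```shell
#!/bin/sh
# Hypothetical helpers (not part of run-qemu.sh or the KVM tool).

# Report how an option is set in a kernel .config: "y", "m", or "unset".
config_state() {
    # $1 = path to .config, $2 = option name without the CONFIG_ prefix
    if grep -q "^CONFIG_$2=y\$" "$1"; then
        echo y
    elif grep -q "^CONFIG_$2=m\$" "$1"; then
        echo m
    else
        echo unset
    fi
}

# Print every listed option that is not built in; return non-zero if any.
check_builtin() {
    cfg=$1; shift
    rc=0
    for opt in "$@"; do
        state=$(config_state "$cfg" "$opt")
        if [ "$state" != y ]; then
            echo "CONFIG_$opt is $state, expected y"
            rc=1
        fi
    done
    return $rc
}
```

With the functions loaded, "check_builtin .config VIRTIO_NET VIRTIO_CONSOLE" would flag both options on a config like the one attached, since both are set to =m there.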

 			Pekka
Alexander Graf Nov. 6, 2011, 5:15 p.m. UTC | #26
On 06.11.2011, at 05:06, Pekka Enberg wrote:

> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
>> You say that kvm-tool's scope is broader than Alex's script, therefore
>> the latter is pointless.
> 
> I'm saying that Alex's script is pointless because it's not attempting
> to fix the real issues. For example, we're trying to make make it as
> easy as possible to setup a guest and to be able to access guest data
> from the host. Alex's script is essentially just a simplified QEMU
> "front end" for kernel developers.
> 
> That's why I feel it's a pointless thing to do.

It's a script tailored to what Linus told me he wanted to see. I merely wanted to prove the point that what he wanted can be achieved without thousands and thousands of lines of code by reusing what is already there. IMHO less code is usually a good thing.

In fact, why don't you just provide a script in tools/testing/ that fetches KVM Tool from a git tree somewhere else and compiles it? It could easily live outside the kernel tree - you can even grab our awesome "fetch all Linux headers" script from QEMU so you can keep in sync with KVM header files.

At that point, both front ends would live in separate trees, could evolve however they like, and everyone's happy, because KVM Tool would still be easy to use for people who want it by executing said shell script.

> 
> On Sun, Nov 6, 2011 at 2:43 PM, Avi Kivity <avi@redhat.com> wrote:
>> You accept that qemu's scope is broader than kvm-tool (and is a
>> superset).  That is why many people think kvm-tool is pointless.
> 
> Sure. I think it's mostly people that are interested in non-Linux
> virtualization that think the KVM tool is a pointless project.
> However, some people (including myself) think the KVM tool is a more
> usable and hackable tool than QEMU for Linux virtualization.

Sure. That's taste. If I think that tcsh is a better shell than bash do I pull it into the kernel tree just so "it lies there"? It definitely does use kernel interfaces too, so I can make up just as many reasons as you to pull it in.

> The difference here is that although I feel Alex's script is a
> pointless project, I'm in no way opposed to merging it in the tree if
> people use it and it solves their problem. Some people seem to be
> violently opposed to merging the KVM tool and I'm having difficult
> time understanding why that is.

It's a matter of size and scope. Write a shell script that clones, builds and executes KVM Tool and throw it in tools/testing/ and I'll happily ack it!
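
A minimal sketch of what such a fetch-and-build script could look like follows. Everything specific in it is an assumption: KVMTOOL_GIT has no default because no repository URL is given in this thread, KVMTOOL_DIR is an arbitrary checkout directory, and a plain "make" in the top-level directory is assumed to be enough to build the tool. Launching the resulting binary is left out because its command line is not described here.

```shell
#!/bin/sh
# Hypothetical sketch: KVMTOOL_GIT, KVMTOOL_DIR and the plain "make" step
# are assumptions, not details of any real KVM tool repository.
KVMTOOL_GIT=${KVMTOOL_GIT:-}            # must be set by the caller
KVMTOOL_DIR=${KVMTOOL_DIR:-kvmtool-src}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone the tree on first use, fast-forward it afterwards, then build it.
fetch_and_build() {
    if [ -z "$KVMTOOL_GIT" ]; then
        echo "set KVMTOOL_GIT to the repository URL" >&2
        return 1
    fi
    if [ -d "$KVMTOOL_DIR/.git" ]; then
        run git -C "$KVMTOOL_DIR" pull --ff-only
    else
        run git clone "$KVMTOOL_GIT" "$KVMTOOL_DIR"
    fi
    run make -C "$KVMTOOL_DIR"
}
```

Setting DRY_RUN=1 prints the git and make commands instead of running them, which is a cheap way to review what the script would do before letting it touch the network.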


Alex
Jan Kiszka Nov. 6, 2011, 5:23 p.m. UTC | #27
On 2011-11-06 18:11, Pekka Enberg wrote:
> On Sun, 6 Nov 2011, Jan Kiszka wrote:
>>> Can you please share your kernel .config with me and I'll take a look
>>> at it. We now have a "make kvmconfig" makefile target for enabling all
>>> the necessary config options for guest kernels. I don't think any of
>>> us developers are using SUSE so it can surely be a KVM tool bug as
>>> well.
>>
>> Attached.
> 
> It hang here as well. I ran
> 
>   make kvmconfig
> 
> on your .config and it works. It's basically these two:
> 
> @@ -1478,7 +1478,7 @@
>  CONFIG_NETPOLL=y
>  # CONFIG_NETPOLL_TRAP is not set
>  CONFIG_NET_POLL_CONTROLLER=y
> -CONFIG_VIRTIO_NET=m
> +CONFIG_VIRTIO_NET=y
>  # CONFIG_VMXNET3 is not set
>  # CONFIG_ISDN is not set
>  # CONFIG_PHONE is not set
> @@ -1690,7 +1690,7 @@
>  # CONFIG_SERIAL_PCH_UART is not set
>  # CONFIG_SERIAL_XILINX_PS_UART is not set
>  CONFIG_HVC_DRIVER=y
> -CONFIG_VIRTIO_CONSOLE=m
> +CONFIG_VIRTIO_CONSOLE=y
>  CONFIG_IPMI_HANDLER=m
>  # CONFIG_IPMI_PANIC_EVENT is not set
>  CONFIG_IPMI_DEVICE_INTERFACE=m
> 
>             Pekka

Doesn't help here (with a disk image).

Also, both dependencies make no sense to me as we boot from disk, not
from net, and the console is on ttyS0.

Jan
Pekka Enberg Nov. 6, 2011, 5:28 p.m. UTC | #28
On Sun, Nov 6, 2011 at 7:15 PM, Alexander Graf <agraf@suse.de> wrote:
>> The difference here is that although I feel Alex's script is a
>> pointless project, I'm in no way opposed to merging it in the tree if
>> people use it and it solves their problem. Some people seem to be
>> violently opposed to merging the KVM tool and I'm having difficult
>> time understanding why that is.
>
> It's a matter of size and scope. Write a shell script that clones, builds and
> executes KVM Tool and throw it in testing/tools/ and I'll happily ack it!

That's pretty much what git submodule would do, isn't it?

I really don't see the point in doing that. We want to be part of
regular kernel history and release cycle. We want people to be able to
see what's going on in our tree to keep us honest and we want to make
the barrier of entry as low as possible.

It's not just about code, it's as much about culture and development process.

                                Pekka
Alexander Graf Nov. 6, 2011, 5:30 p.m. UTC | #29
On 06.11.2011, at 09:28, Pekka Enberg wrote:

> On Sun, Nov 6, 2011 at 7:15 PM, Alexander Graf <agraf@suse.de> wrote:
>>> The difference here is that although I feel Alex's script is a
>>> pointless project, I'm in no way opposed to merging it in the tree if
>>> people use it and it solves their problem. Some people seem to be
>>> violently opposed to merging the KVM tool and I'm having difficult
>>> time understanding why that is.
>> 
>> It's a matter of size and scope. Write a shell script that clones, builds and
>> executes KVM Tool and throw it in testing/tools/ and I'll happily ack it!
> 
> That's pretty much what git submodule would do, isn't it?
> 
> I really don't see the point in doing that. We want to be part of
> regular kernel history and release cycle. We want people to be able to
> see what's going on in our tree to keep us honest and we want to make
> the barrier of entry as low as possible.
> 
> It's not just about code, it's as much about culture and development process.

So you're saying that projects that are not living in the kernel tree aren't worthwhile? Or are you only trying to bump your ohloh stats?

I mean, seriously, git makes it so easy to have a separate tree that it almost doesn't make sense not to have one. You're constantly working in separate trees yourself because every one of your branches is separate. Keeping in sync with the kernel release cycles (which I don't think makes any sense for you) should be easy enough too by merely releasing in sync with the kernel tree...


Alex
Pekka Enberg Nov. 6, 2011, 5:55 p.m. UTC | #30
On Sun, 6 Nov 2011, Jan Kiszka wrote:
> Doesn't help here (with a disk image).
>
> Also, both dependencies make no sense to me as we boot from disk, not
> from net, and the console is on ttyS0.

It's only VIRTIO_NET and the guest is not actually stuck, it just takes a 
while to boot:

[    1.866614] Installing 9P2000 support
[    1.868991] Registering the dns_resolver key type
[    1.878084] registered taskstats version 1
[   13.927367] Root-NFS: no NFS server address
[   13.929500] VFS: Unable to mount root fs via NFS, trying floppy.
[   13.939177] VFS: Mounted root (9p filesystem) on device 0:12.
[   13.941522] devtmpfs: mounted
[   13.943317] Freeing unused kernel memory: 684k freed
Mounting...
Starting '/bin/sh'...
sh-4.2#

I'm CC'ing Sasha and Asias.
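
A stall like this is easy to miss when the console simply stops scrolling. As a rough sketch (the function name largest_gap is made up for illustration), the largest jump between consecutive kernel log timestamps can be found by feeding a saved boot log through awk:

```shell
#!/bin/sh
# Print the largest gap between consecutive "[ seconds]" timestamps in a
# kernel log read from stdin, followed by the line that ended the gap.
largest_gap() {
    awk '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Drop the brackets; "+ 0" turns the padded text into a number.
            t = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && t - prev > gap) { gap = t - prev; line = $0 }
            prev = t; seen = 1
        }
        END { if (seen) printf "%.3f %s\n", gap, line }
    '
}
```

Run on the log above (for example "largest_gap < boot.log", where boot.log is a hypothetical capture of the guest console), it points at the roughly 12 second wait that ends with the "Root-NFS: no NFS server address" line.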
Pekka Enberg Nov. 6, 2011, 6:05 p.m. UTC | #31
On Sun, Nov 6, 2011 at 7:30 PM, Alexander Graf <agraf@suse.de> wrote:
>> That's pretty much what git submodule would do, isn't it?
>>
>> I really don't see the point in doing that. We want to be part of
>> regular kernel history and release cycle. We want people to be able to
>> see what's going on in our tree to keep us honest and we want to make
>> the barrier of entry as low as possible.
>>
>> It's not just about code, it's as much about culture and development process.
>
> So you're saying that projects that are not living in the kernel tree aren't worthwhile?

Yeah, that's exactly what I'm saying...

> Or are you only trying to bump your oloh stats?

That too!

On Sun, Nov 6, 2011 at 7:30 PM, Alexander Graf <agraf@suse.de> wrote:
> I mean, seriously, git makes it so easy to have a separate tree that
> it almost doesn't make sense not to have one. You're constantly
> working in separate trees yourself because every one of your
> branches is separate. Keeping in sync with the kernel release cycles
> (which I don't think makes any sense for you) should be easy enough
> too by merely releasing in sync with the kernel tree...

We'd be the only subsystem doing that! Why on earth do you think we
want to be the first ones to do that? We don't want to be different,
we want to make the barrier of entry low.

                        Pekka
Pekka Enberg Nov. 6, 2011, 6:09 p.m. UTC | #32
On Sun, Nov 6, 2011 at 7:08 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> I'm quite happy with KVM tool and hope they continue working on it.  My only
> real wish is that they wouldn't copy QEMU so much and would try bolder
> things that are fundamentally different from QEMU.

Hey, right now our only source of crazy ideas is Ingo and I think he's
actually a pretty conservative guy when it comes to technology. Avi
has expressed some crazy ideas in the past but they require switching
away from C and that's not something we're interested in doing. ;-)

                        Pekka
Theodore Ts'o Nov. 6, 2011, 6:31 p.m. UTC | #33
On Sun, Nov 06, 2011 at 11:08:10AM -0600, Anthony Liguori wrote:
> I'm quite happy with KVM tool and hope they continue working on it.
> My only real wish is that they wouldn't copy QEMU so much and would
> try bolder things that are fundamentally different from QEMU.

My big wish is that they don't try to merge the KVM tool into the
kernel code.  It's a separate userspace project, and there's no reason
for it to be bundled with kernel code.  It just makes the kernel
sources larger.  The mere fact that qemu-kvm exists means that the KVM
interface has to remain backward compatible; it *is* an ABI.

So integrating kvm-tool into the kernel isn't going to work as a free
pass to make non-backwards compatible changes to the KVM user/kernel
interface.  Given that, why bloat the kernel source tree size?

Please, keep the kvm-tool sources as a separate git tree.

	     	 	  	       		- Ted
Pekka Enberg Nov. 6, 2011, 6:54 p.m. UTC | #34
On Sun, Nov 06, 2011 at 11:08:10AM -0600, Anthony Liguori wrote:
>> I'm quite happy with KVM tool and hope they continue working on it.
>> My only real wish is that they wouldn't copy QEMU so much and would
>> try bolder things that are fundamentally different from QEMU.

On Sun, Nov 6, 2011 at 8:31 PM, Ted Ts'o <tytso@mit.edu> wrote:
> My big wish is that they don't try to merge the KVM tool into the
> kernel code.  It's a separate userspace project, and there's no reason
> for it to be bundled with kernel code.  It just makes the kernel
> sources larger.  The mere fact that qemu-kvm exists means that the KVM
> interface has to remain backward compatible; it *is* an ABI.
>
> So integrating kvm-tool into the kernel isn't going to work as a free
> pass to make non-backwards compatible changes to the KVM user/kernel
> interface.  Given that, why bloat the kernel source tree size?

Ted, I'm confused. Making backwards incompatible ABI changes has never
been on the table. Why are you bringing it up?

                        Pekka
Pekka Enberg Nov. 6, 2011, 6:58 p.m. UTC | #35
On Sun, Nov 6, 2011 at 8:54 PM, Pekka Enberg <penberg@kernel.org> wrote:
>> So integrating kvm-tool into the kernel isn't going to work as a free
>> pass to make non-backwards compatible changes to the KVM user/kernel
>> interface.  Given that, why bloat the kernel source tree size?
>
> Ted, I'm confused. Making backwards incompatible ABI changes has never
> been on the table. Why are you bringing it up?

And btw, KVM tool is not a random userspace project - it was designed
to live in tools/kvm from the beginning. I've explained the technical
rationale for sharing kernel code here:

https://lkml.org/lkml/2011/11/4/150

Please also see Ingo's original rant that started the project:

http://thread.gmane.org/gmane.linux.kernel/962051/focus=962620

                        Pekka
Paolo Bonzini Nov. 6, 2011, 7:11 p.m. UTC | #36
On 11/06/2011 06:28 PM, Pekka Enberg wrote:
> On Sun, Nov 6, 2011 at 7:15 PM, Alexander Graf<agraf@suse.de>  wrote:
>>> The difference here is that although I feel Alex's script is a
>>> pointless project, I'm in no way opposed to merging it in the tree if
>>> people use it and it solves their problem. Some people seem to be
>>> violently opposed to merging the KVM tool and I'm having difficult
>>> time understanding why that is.
>>
>> It's a matter of size and scope. Write a shell script that clones, builds and
>> executes KVM Tool and throw it in testing/tools/ and I'll happily ack it!
>
> That's pretty much what git submodule would do, isn't it?

Absolutely not.  It would always fetch HEAD from the KVM tool repo.  A 
submodule ties each supermodule commit to a particular submodule commit.

> I really don't see the point in doing that. We want to be part of
> regular kernel history and release cycle.

But I'm pretty certain that, when testing 3.2 with KVM tool in a couple 
of years, I want all the shining new features you added in this time; I 
don't want the old end-2011 code.  Same if I'm bisecting kernels, I 
don't want to build KVM tool once per bisection cycle, do I?

Paolo
Paolo Bonzini Nov. 6, 2011, 7:14 p.m. UTC | #37
On 11/06/2011 07:05 PM, Pekka Enberg wrote:
>> I mean, seriously, git makes it so easy to have a separate tree that
>> it almost doesn't make sense not to have one. You're constantly
>> working in separate trees yourself because every one of your
>> branches is separate. Keeping in sync with the kernel release cycles
>> (which I don't think makes any sense for you) should be easy enough
>> too by merely releasing in sync with the kernel tree...
> We'd be the only subsystem doing that!

GStreamer (V4L), RTSAdmin (LIO target), sg3_utils, trousers all are out
of tree, and none of their authors is even thinking of doing all this
brouhaha to get merged into Linus's tree.

Paolo
Pekka Enberg Nov. 6, 2011, 7:17 p.m. UTC | #38
On Sun, Nov 6, 2011 at 9:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> I really don't see the point in doing that. We want to be part of
>> regular kernel history and release cycle.
>
> But I'm pretty certain that, when testing 3.2 with KVM tool in a couple of
> years, I want all the shining new features you added in this time; I don't
> want the old end-2011 code.  Same if I'm bisecting kernels, I don't want to
> build KVM tool once per bisection cycle, do I?

If you're bisecting breakage that can be in the guest kernel or the
KVM tool, you'd want to build both.

What would prevent you from using a newer KVM tool with an older kernel?
Pekka Enberg Nov. 6, 2011, 7:19 p.m. UTC | #39
On Sun, Nov 6, 2011 at 9:14 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> GStreamer (V4L), RTSAdmin (LIO target), sg3_utils, trousers all are out of
> tree, and nobody of their authors is even thinking of doing all this
> brouhaha to get merged into Linus's tree.

We'd be the first subsystem to use the download script thing Alex suggested.
Paolo Bonzini Nov. 6, 2011, 8:01 p.m. UTC | #40
On 11/06/2011 08:17 PM, Pekka Enberg wrote:
>> But I'm pretty certain that, when testing 3.2 with KVM tool in a couple of
>> years, I want all the shining new features you added in this time; I don't
>> want the old end-2011 code.  Same if I'm bisecting kernels, I don't want to
>> build KVM tool once per bisection cycle, do I?
>
> If you're bisecting breakage that can be in the guest kernel or the
> KVM tool, you'd want to build both.

No.  I want to try new tool/old kernel and old tool/new kernel (kernel 
can be either guest or host, depending on the nature of the bug), and 
then bisect just one.  (*) And that's the exceptional case, and only KVM 
tool developers really should have the need to do that.

   (*) Not coincidentally, that's what git bisect does when HEAD is
       a merge of two unrelated histories.

> What would prevent you from using a newer KVM tool with an older kernel?

Nothing, but I'm just giving you *strong* hints that a submodule or a 
merged tool is the wrong solution, and the histories of kernel and tool 
should be kept separate.

More clearly: for its supposedly intended usage, namely testing 
development kernels in a *guest*, KVM tool will generally not run on the 
exact *host* kernel that is in the tree it lives with.  Almost never, in 
fact.  Unlike perf, if you want to test multiple guest kernels you 
should never need to rebuild KVM tool!

This is the main argument as to whether or not to merge the tool.  Would 
the integration of the *build* make sense or not?  Assume you adapt the 
ktest script to make both the KVM tool and the kernel, and test the 
latter using the former.  Your host kernel never changes, and yet you 
introduce a new variable in your testing.  That complicates things, it 
doesn't simplify them.

Paolo
Pekka Enberg Nov. 6, 2011, 8:17 p.m. UTC | #41
On Sun, Nov 6, 2011 at 10:01 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> If you're bisecting breakage that can be in the guest kernel or the
>> KVM tool, you'd want to build both.
>
> No.  I want to try new tool/old kernel and old tool/new kernel (kernel can
> be either guest or host, depending on the nature of the bug), and then
> bisect just one.  (*) And that's the exceptional case, and only KVM tool
> developers really should have the need to do that.

Exactly - having the source code in Linux kernel tree covers the
"exceptional case" where we're unsure which part of the equation broke
things (which are btw the nastiest issues we've had so far). I have no
idea why you're trying to convince me that it doesn't matter. You can
bisect only one of the components in isolation just fine.

On Sun, Nov 6, 2011 at 10:01 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> What would prevent you from using a newer KVM tool with an older kernel?
>
> Nothing, but I'm just giving you *strong* hints that a submodule or a merged
> tool is the wrong solution, and the histories of kernel and tool should be
> kept separate.
>
> More clearly: for its supposedly intended usage, namely testing development
> kernels in a *guest*, KVM tool will generally not run on the exact *host*
> kernel that is in the tree it lives with.  Almost never, in fact.  Unlike
> perf, if you want to test multiple guest kernels you should never need to
> rebuild KVM tool!
>
> This is the main argument as to whether or not to merge the tool.  Would the
> integration of the *build* make sense or not?  Assume you adapt the ktest
> script to make both the KVM tool and the kernel, and test the latter using
> the former.  Your host kernel never changes, and yet you introduce a new
> variable in your testing.  That complicates things, it doesn't simplify
> them.

I don't understand what you're trying to say. There's no requirement to build
the KVM tool if you're bisecting a guest kernel.

                        Pekka
Pekka Enberg Nov. 6, 2011, 8:31 p.m. UTC | #42
On Sun, Nov 6, 2011 at 10:01 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Nothing, but I'm just giving you *strong* hints that a submodule or a merged
> tool is the wrong solution, and the histories of kernel and tool should be
> kept separate.

And btw, I don't really understand what you're trying to accomplish
with this line of reasoning. We've tried both separate and shared
repository and the latter is much better from a development point of
view.

This is not some random userspace project that uses the kernel system
calls. It's a hypervisor that implements virtio drivers, serial
emulation, and mini-BIOS. It's very close to the kernel which is why
it's such a good fit with the kernel tree.

I'd actually be willing to argue that from a purely technical point of
view, KVM tool makes much more sense to have in the kernel tree than
perf does.

                                Pekka
Frank Ch. Eigler Nov. 6, 2011, 10:08 p.m. UTC | #43
Pekka Enberg <penberg@kernel.org> writes:

> [...]  We don't want to be different, we want to make the barrier of
> entry low.

When has the barrier of entry into the kernel ever been "low"
for anyone not already working in the kernel?

- FChE
Theodore Ts'o Nov. 6, 2011, 11:19 p.m. UTC | #44
On Sun, Nov 06, 2011 at 08:58:20PM +0200, Pekka Enberg wrote:
> > Ted, I'm confused. Making backwards incompatible ABI changes has never
> > been on the table. Why are you bringing it up?
> 
> And btw, KVM tool is not a random userspace project - it was designed
> to live in tools/kvm from the beginning. I've explained the technical
> rationale for sharing kernel code here:
> 
> https://lkml.org/lkml/2011/11/4/150
> 
> Please also see Ingo's original rant that started the project:
> 
> http://thread.gmane.org/gmane.linux.kernel/962051/focus=962620

Because I don't buy any of these arguments.  We have the same kernel
developers working on xfs and xfsprogs, ext4 and e2fsprogs, btrfs and
btrfsprogs, and we don't have those userspace projects in the kernel
source tree.

The only excuse I can see is a hope to make random changes to the
kernel and userspace tools without having to worry about compatibility
problems, which is an argument I've seen with perf (that you have to
use the same version of perf as the kernel version, which to me is bad
software engineering).  And that's why I pointed out that you can't do
that with KVM, since we have out-of-tree userspace users, namely
qemu-kvm.

The rest of the arguments are arguments for a new effort, which is
fine --- but not an excuse for putting in the kernel source tree.

     	     	    	       	       - Ted
Anthony Liguori Nov. 7, 2011, 1:38 a.m. UTC | #45
On 11/06/2011 12:09 PM, Pekka Enberg wrote:
> On Sun, Nov 6, 2011 at 7:08 PM, Anthony Liguori<anthony@codemonkey.ws>  wrote:
>> I'm quite happy with KVM tool and hope they continue working on it.  My only
>> real wish is that they wouldn't copy QEMU so much and would try bolder
>> things that are fundamentally different from QEMU.
>
> Hey, right now our only source of crazy ideas is Ingo and I think he's
> actually a pretty conservative guy when it comes to technology. Avi
> has expressed some crazy ideas in the past but they require switching
> away from C and that's not something we're interested in doing. ;-)

Just a couple random suggestions:

- Drop SDL/VNC.  Make a proper Cairo GUI with a full blown GTK interface.  Don't 
rely on virt-manager for this.  Not that I have anything against virt-manager 
but there are many layers between you and the end GUI if you go that route.

- Sandbox the device model from day #1.  The size of the Linux kernel interface 
is pretty huge and as a hypervisor, it's the biggest place for improvement from 
a security perspective.  We're going to do sandboxing in QEMU, but it's going to 
be difficult.  It would be much easier for you given where you're at.

Regards,

Anthony Liguori

>
>                          Pekka
>
Pekka Enberg Nov. 7, 2011, 6:42 a.m. UTC | #46
On Sun, 6 Nov 2011, Ted Ts'o wrote:
> The only excuse I can see is a hope to make random changes to the
> kernel and userspace tools without having to worry about compatibility
> problems, which is an argument I've seen with perf (that you have to
> use the same version of perf as the kernel version, which to me is bad
> software engineering).  And that's why I pointed out that you can't do
> that with KVM, since we have out-of-tree userspace users, namely
> qemu-kvm.

I've never heard ABI incompatibility used as an argument for perf. Ingo?

As for the KVM tool, merging has never been about being able to do ABI 
incompatible changes and never will be. I'm still surprised you even 
brought this up because I've always been one to _complain_ about people 
breaking the ABI - not actually breaking it (at least on purpose).

 			Pekka
Pekka Enberg Nov. 7, 2011, 6:45 a.m. UTC | #47
Hi Anthony,

On Sun, 6 Nov 2011, Anthony Liguori wrote:
> - Drop SDL/VNC.  Make a proper Cairo GUI with a full blown GTK interface. 
> Don't rely on virt-manager for this.  Not that I have anything against 
> virt-manager but there are many layers between you and the end GUI if you go 
> that route.

Funny that you should mention this. It was actually what I started out 
with. I went for SDL because it was a low-hanging fruit after the VNC 
patches which I didn't do myself.

However, it was never figured out if there was going to be a virtio 
transport for GPU commands:

http://lwn.net/Articles/408831/

On Sun, 6 Nov 2011, Anthony Liguori wrote:
> - Sandbox the device model from day #1.  The size of the Linux kernel 
> interface is pretty huge and as a hypervisor, it's the biggest place for 
> improvement from a security perspective.  We're going to do sandboxing in 
> QEMU, but it's going to be difficult.  It would be much easier for you given 
> where you're at.

Completely agreed. I think Sasha is actually starting to work on this. See 
the "Secure KVM" thread on kvm@.

 			Pekka
Pekka Enberg Nov. 7, 2011, 6:58 a.m. UTC | #48
On Mon, Nov 7, 2011 at 12:08 AM, Frank Ch. Eigler <fche@redhat.com> wrote:
>> [...]  We don't want to be different, we want to make the barrier of
>> entry low.
>
> When has the barrier of entry into the kernel ever been "low"
> for anyone not already working in the kernel?

What's your point? Working on the KVM tool requires knowledge of the
Linux kernel.

                                Pekka
Paolo Bonzini Nov. 7, 2011, 8 a.m. UTC | #49
On 11/06/2011 09:17 PM, Pekka Enberg wrote:
> >  No.  I want to try new tool/old kernel and old tool/new kernel (kernel can
> >  be either guest or host, depending on the nature of the bug), and then
> >  bisect just one.  (*) And that's the exceptional case, and only KVM tool
> >  developers really should have the need to do that.
>
> Exactly - having the source code in Linux kernel tree covers the
> "exceptional case" where we're unsure which part of the equation broke
> things (which are btw the nastiest issues we've had so far).

No, having the source code in Linux kernel tree is perfectly useless for 
the exceptional case, and forces you to go through extra hoops to build 
only one component.  Small hoops such as adding "-- tools/kvm" to "git 
bisect start" perhaps, but still hoops that aren't traded for a 
practical advantage.  You keep saying "oh things have been so much 
better" because "it's so close to the kernel" and "it worked so great 
for perf", but you haven't brought any practical example that we can 
stare at in admiration.

(BTW, I'm also convinced like Ted that not having a defined perf ABI 
might have made sense in the beginning, but it has now devolved into bad 
software engineering practice).

> I have no idea why you're trying to convince me that it doesn't matter.

I'm not trying to convince you that it doesn't matter, I'm trying to 
convince you that it doesn't *make sense*.

> It's a hypervisor that implements virtio drivers, serial
> emulation, and mini-BIOS.

... all of which have a spec against which you should be working.  Save 
perhaps for the mini-BIOS, if you develop against the kernel source 
rather than the spec you're doing it *wrong*.  Very wrong.  But you've 
been told this many times already.

Paolo
Pekka Enberg Nov. 7, 2011, 8:09 a.m. UTC | #50
On Mon, Nov 7, 2011 at 10:00 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> No, having the source code in Linux kernel tree is perfectly useless for the
> exceptional case, and forces you to go through extra hoops to build only one
> component.  Small hoops such as adding "-- tools/kvm" to "git bisect start"
> perhaps, but still hoops that aren't traded for a practical advantage.  You
> keep saying "oh things have been so much better" because "it's so close to
> the kernel" and "it worked so great for perf", but you haven't brought any
> practical example that we can stare at in admiration.

The _practical example_ is the working software in tools/kvm!

>> I have no idea why you're trying to convince me that it doesn't matter.
>
> I'm not trying to convince you that it doesn't matter, I'm trying to
> convince you that it doesn't *make sense*.
>
>> It's a hypervisor that implements virtio drivers, serial
>> emulation, and mini-BIOS.
>
> ... all of which have a spec against which you should be working.  Save
> perhaps for the mini-BIOS, if you develop against the kernel source rather
> than the spec you're doing it *wrong*.  Very wrong.  But you've been told
> this many times already.

I have zero interest in arguing with you about something you have no
practical experience on. I've tried both out-of-tree and in-tree
development for the KVM tool and I can tell you the latter is a much
more productive environment.

We are obviously also using specifications but as you damn well should
know, specifications don't matter nearly as much as working code.
That's why it's important to have easy access to both.

                        Pekka
Pekka Enberg Nov. 7, 2011, 8:13 a.m. UTC | #51
On Mon, Nov 7, 2011 at 10:00 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> (BTW, I'm also convinced like Ted that not having a defined perf ABI might
> have made sense in the beginning, but it has now devolved into bad software
> engineering practice).

I'm not a perf maintainer so I don't know what the situation wrt.
ABI breakage is. Your or Ted's comments don't match my assumptions or
experience, though.

                        Pekka
Paolo Bonzini Nov. 7, 2011, 8:20 a.m. UTC | #52
On 11/07/2011 09:09 AM, Pekka Enberg wrote:
> We are obviously also using specifications but as you damn well should
> know, specifications don't matter nearly as much as working code.

Specifications matter much more than working code.  Quirks are a fact of 
life but should always come second.

To bring you an example from the kernel, there is a very boring list of 
"PCI quirks" and a lot of code for "PCI specs", not the other way round.

Paolo
Pekka Enberg Nov. 7, 2011, 8:45 a.m. UTC | #53
On 11/07/2011 09:09 AM, Pekka Enberg wrote:
>> We are obviously also using specifications but as you damn well should
>> know, specifications don't matter nearly as much as working code.

On Mon, 7 Nov 2011, Paolo Bonzini wrote:
> Specifications matter much more than working code.  Quirks are a fact of life 
> but should always come second.

To quote Linus:

   And I have seen _lots_ of total crap work that was based on specs. It's
   _the_ single worst way to write software, because it by definition means
   that the software was written to match theory, not reality.

[ http://kerneltrap.org/node/5725 ]

So no, I don't agree with you at all.

 			Pekka
Paolo Bonzini Nov. 7, 2011, 8:52 a.m. UTC | #54
On 11/07/2011 09:45 AM, Pekka Enberg wrote:
>
>> Specifications matter much more than working code.  Quirks are a fact
>> of life but should always come second.
>
> To quote Linus:
>
>    And I have seen _lots_ of total crap work that was based on specs. It's
>    _the_ single worst way to write software, because it by definition means
>    that the software was written to match theory, not reality.

All generalizations are false.

Paolo
Pekka Enberg Nov. 7, 2011, 8:57 a.m. UTC | #55
On 11/07/2011 09:45 AM, Pekka Enberg wrote:
>>> Specifications matter much more than working code.  Quirks are a fact
>>> of life but should always come second.
>>
>> To quote Linus:
>>
>>   And I have seen _lots_ of total crap work that was based on specs. It's
>>   _the_ single worst way to write software, because it by definition means
>>   that the software was written to match theory, not reality.

On Mon, Nov 7, 2011 at 10:52 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> All generalizations are false.

What is that supposed to mean? You claimed we're "doing it wrong" and
I explained to you why we are doing it the way we are.

Really, the way we do things in the KVM tool is not a bug, it's a feature.

                        Pekka
Gerd Hoffmann Nov. 7, 2011, 10:11 a.m. UTC | #56
Hi,

> "Usable" - I've tried kvm-tool several times and still (today) fail to
> get a standard SUSE image (with a kernel I have to compile and provide
> separately...) up and running *). Likely a user mistake, but none that
> is very obvious. At least to me.

Same here.

No support for booting from CDROM.
No support for booting from Network.
Thus no way to install a new guest image.

Booting an existing qcow2 guest image failed, the guest started throwing
I/O errors.  And even to try that I had to manually extract the kernel
and initrd images from the guest.  Maybe you should check with the Xen
guys, they have a funky 'pygrub' which sort-of automates the
copy-kernel-from-guest-image process.

Booting the host kernel failed too.  Standard distro kernel.  The virtio
bits are modular, not statically compiled into the kernel.  kvm tool
can't handle that.

You have to build your own kernel and make sure you flip the correct
config bits, then you can boot it to a shell prompt.  Trying anything
else just doesn't work today ...

cheers,
  Gerd
Pekka Enberg Nov. 7, 2011, 10:18 a.m. UTC | #57
On Mon, Nov 7, 2011 at 12:11 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> No support for booting from CDROM.
> No support for booting from Network.
> Thus no way to install a new guest image.

Sure. It's a pain point which we need to fix.

On Mon, Nov 7, 2011 at 12:11 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> Booting an existing qcow2 guest image failed, the guest started throwing
> I/O errors.  And even to try that I had to manually extract the kernel
> and initrd images from the guest.  Maybe you should check with the Xen
> guys, they have a funky 'pygrub' which sort-of automates the
> copy-kernel-from-guest-image process.

QCOW2 support is experimental. The I/O errors are caused by forced
read-only mode.

On Mon, Nov 7, 2011 at 12:11 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> Booting the host kernel failed too.  Standard distro kernel.  The virtio
> bits are modular, not statically compiled into the kernel.  kvm tool
> can't handle that.

I think we have some support for booting modular distro kernels too if
you tell KVM tool where to find initrd. It sucks out-of-the-box though
because nobody seems to be using it.

On Mon, Nov 7, 2011 at 12:11 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> You have to build your own kernel and make sure you flip the correct
> config bits, then you can boot it to a shell prompt.  Trying anything
> else just doesn't work today ...

What can I say? Patches welcome? :-)

                        Pekka
Gerd Hoffmann Nov. 7, 2011, 10:23 a.m. UTC | #58
Hi,

> It's not just about code, it's as much about culture and development process.

Indeed.  The BSDs have both kernel and the base system in a single
repository.  There are probably good reasons for (and against) it.

In Linux we don't have that culture.  No tool (except perf) lives in the
kernel repo.  I fail to see why kvm-tool is that much different from
udev, util-linux, iproute, filesystem tools, that it should be included.

cheers,
  Gerd
Sasha Levin Nov. 7, 2011, 10:30 a.m. UTC | #59
On Mon, Nov 7, 2011 at 12:23 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>  Hi,
>
>> It's not just about code, it's as much about culture and development process.
>
> Indeed.  The BSDs have both kernel and the base system in a single
> repository.  There are probably good reasons for (and against) it.
>
> In Linux we don't have that culture.  No tool (except perf) lives in the
> kernel repo.  I fail to see why kvm-tool is that much different from
> udev, util-linux, iproute, filesystem tools, that it should be included.

tools/power was merged in just 2 versions ago, do you think that
merging that was a mistake?
Kevin Wolf Nov. 7, 2011, 10:31 a.m. UTC | #60
Am 06.11.2011 19:31, schrieb Ted Ts'o:
> On Sun, Nov 06, 2011 at 11:08:10AM -0600, Anthony Liguori wrote:
>> I'm quite happy with KVM tool and hope they continue working on it.
>> My only real wish is that they wouldn't copy QEMU so much and would
>> try bolder things that are fundamentally different from QEMU.
> 
> My big wish is that they don't try to merge the KVM tool into the
> kernel code.  It's a separate userspace project, and there's no reason
> for it to be bundled with kernel code.  It just makes the kernel
> sources larger. 

In fact, the reverse is true as well: It makes kvm-tool's sources
larger. Instead of just cloning a small repository I need to clone the
whole kernel repository, even though I'm not a kernel developer and
don't intend to touch anything but tools/kvm.

Not too bad for me as I have a kernel repository lying around anyway and
I can share most of the content, but there are people who don't. Still,
having an additional 1.2 GB repository just for ~1 MB in which I'm
really interested doesn't make me too happy. And dealing with a huge
repository also means that even git becomes slower (which means, I had
to turn off some functionality for my shell prompt in this repo, as I
didn't like waiting for much more than a second or two)

Makes it a lot less hackable for me unless you want to restrict the set
of potential developers to Linux kernel developers...

Kevin
Paolo Bonzini Nov. 7, 2011, 11:02 a.m. UTC | #61
On 11/07/2011 11:30 AM, Sasha Levin wrote:
> >  In Linux we don't have that culture.  No tool (except perf) lives in the
> >  kernel repo.  I fail to see why kvm-tool is that much different from
> >  udev, util-linux, iproute, filesystem tools, that it should be included.
>
> tools/power was merged in just 2 versions ago, do you think that
> merging that was a mistake?

Indeed I do not see any advantage, since all the interfaces they use are 
stable anyway (sysfs, msr.ko).

If they had gone in x86info, for example, my distro (F16, not exactly 
conservative) would have likely picked those tools up already, but it 
didn't.

Paolo
Pekka Enberg Nov. 7, 2011, 11:34 a.m. UTC | #62
On Mon, 7 Nov 2011, Gerd Hoffmann wrote:
>> It's not just about code, it's as much about culture and development process.
>
> Indeed.  The BSDs have both kernel and the base system in a single
> repository.  There are probably good reasons for (and against) it.
>
> In Linux we don't have that culture.  No tool (except perf) lives in the
> kernel repo.  I fail to see why kvm-tool is that much different from
> udev, util-linux, iproute, filesystem tools, that it should be included.

You seem to think perf is an exception - I think it's going to be the 
future norm for userspace components that are very close to the kernel. 
That's in fact what Ingo was arguing for when he suggested QEMU to be 
merged to the kernel tree.

 			Pekka
Pekka Enberg Nov. 7, 2011, 11:38 a.m. UTC | #63
On Mon, 7 Nov 2011, Kevin Wolf wrote:
> Makes it a lot less hackable for me unless you want to restrict the set
> of potential developers to Linux kernel developers...

We're not restricting potential developers to Linux kernel folks. We're 
making it easy for them because we believe that the KVM tool is a 
userspace component that requires the kind of low-level knowledge Linux 
kernel developers have.

I think you're looking at the KVM tool with your QEMU glasses on without 
realizing that there's no point in comparing the two: we only support 
Linux on Linux and we avoid hardware emulation as much as possible. So 
what makes sense for QEMU, doesn't necessarily translate to the KVM tool 
project.

 			Pekka
Pekka Enberg Nov. 7, 2011, 11:44 a.m. UTC | #64
On Mon, Nov 7, 2011 at 1:02 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Indeed I do not see any advantage, since all the interfaces they use are
> stable anyway (sysfs, msr.ko).
>
> If they had gone in x86info, for example, my distro (F16, not exactly
> conservative) would have likely picked those tools up already, but it
> didn't.

Distributing userspace tools in the kernel tree is a relatively new
concept so it's not at all surprising distributions don't pick them up
as quickly. That doesn't mean it's a fundamentally flawed approach,
though.

Also, I'm mostly interested in defending the KVM tool, so I'd prefer
not to argue whether or not carrying userspace code in the kernel tree
makes sense. The fact is that Linux is already doing it and I
think the only relevant question is whether or not the KVM tool
qualifies. I obviously think the answer is yes.

                               Pekka
Ingo Molnar Nov. 7, 2011, 11:57 a.m. UTC | #65
* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On Mon, 7 Nov 2011, Gerd Hoffmann wrote:
> >>It's not just about code, it's as much about culture and development process.
> >
> >Indeed.  The BSDs have both kernel and the base system in a single
> >repository.  There are probably good reasons for (and against) it.
> >
> > In Linux we don't have that culture.  No tool (except perf) lives 
> > in the kernel repo.  I fail to see why kvm-tool is that much 
> > different from udev, util-linux, iproute, filesystem tools, that 
> > it should be included.
> 
> You seem to think perf is an exception - I think it's going to be 
> the future norm for userspace components that are very close to the 
> kernel. That's in fact what Ingo was arguing for when he suggested 
> QEMU to be merged to the kernel tree.

Yep, and the answer i got from the Qemu folks when i suggested that 
merge was a polite "buzz off", along the lines of: "We don't want to 
do that, but feel free to write your own tool, leave Qemu alone."

Now that people have done exactly that some Qemu folks not only have 
changed their objection from "write your own tool" to "erm, write 
your own tool but do it the way *we* prefer you to do it" - they also 
started contributing *against* the KVM tool with predictable, once 
every 3 months objections against its upstream merge...

That's not very nice and not very constructive.

The only valid technical objection against tools/kvm/ that i can see 
would be that it's not useful enough yet for the upstream kernel 
versus other tools such as Qemu. In all fairness i think we might 
still be at that early stage of the project but it's clearly 
progressing very rapidly and i'm already using it on a daily basis 
for my own kernel testing purposes. During the Kernel Summit that's 
how i tested contemporary kernels on contemporary user-space 
remotely, without having to risk a physical reboot.

Thanks,

	Ingo
Kevin Wolf Nov. 7, 2011, 11:59 a.m. UTC | #66
Am 07.11.2011 12:38, schrieb Pekka Enberg:
> On Mon, 7 Nov 2011, Kevin Wolf wrote:
>> Makes it a lot less hackable for me unless you want to restrict the set
>> of potential developers to Linux kernel developers...
> 
> We're not restricting potential developers to Linux kernel folks. We're 
> making it easy for them because we believe that the KVM tool is a 
> userspace component that requires the kind of low-level knowledge Linux 
> kernel developers have.
> 
> I think you're looking at the KVM tool with your QEMU glasses on without 
> realizing that there's no point in comparing the two: we only support 
> Linux on Linux and we avoid hardware emulation as much as possible. So 
> what makes sense for QEMU, doesn't necessarily translate to the KVM tool 
> project.

I'm not comparing anything. I'm not even referring to the virtualization
functionality of it. It could be doing anything else and it wouldn't
make a difference.

For KVM tool I am not much more than a mere user. Trying it out was
tedious for me, as it is for anyone else who isn't a kernel developer.
That's all I'm saying.

Making things easier for some kernel developers but ignoring that at the
same time it makes things harder for users I consider a not so clever
move. Just wanted to point that out; feel free to ignore it, your
priorities are probably different.

Kevin
Gerd Hoffmann Nov. 7, 2011, 12:08 p.m. UTC | #67
On 11/07/11 12:34, Pekka Enberg wrote:
> On Mon, 7 Nov 2011, Gerd Hoffmann wrote:
>>> It's not just about code, it's as much about culture and development
>>> process.
>>
>> Indeed.  The BSDs have both kernel and the base system in a single
>> repository.  There are probably good reasons for (and against) it.
>>
>> In Linux we don't have that culture.  No tool (except perf) lives in the
>> kernel repo.  I fail to see why kvm-tool is that much different from
>> udev, util-linux, iproute, filesystem tools, that it should be included.
> 
> You seem to think perf is an exception - I think it's going to be the
> future norm for userspace components that are very close to the kernel.

perf *is* an exception today.

It might make sense to change that.  But IMHO it only makes sense if
there is a really broad agreement on it and other core stuff moves into
the kernel too.  Then you'll be able to get advantages out of it.  For
example standardizing the process to create an initramfs (using the
userspace tools shipped with the kernel) instead of having each distro
creating its own way.

I somehow doubt we'll see such a broad agreement though.  Most people
seem to be happy with the current model.  There is a reason why the
klibc + early-userspace-in-kernel-tree project died in the end ...

cheers,
  Gerd
Gerd Hoffmann Nov. 7, 2011, 12:18 p.m. UTC | #68
On 11/07/11 12:44, Pekka Enberg wrote:
> On Mon, Nov 7, 2011 at 1:02 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Indeed I do not see any advantage, since all the interfaces they use are
>> stable anyway (sysfs, msr.ko).
>>
>> If they had gone in x86info, for example, my distro (F16, not exactly
>> conservative) would have likely picked those tools up already, but it
>> didn't.
> 
> Distributing userspace tools in the kernel tree is a relatively new
> concept so it's not at all surprising distributions don't pick them up
> as quickly. That doesn't mean it's a fundamentally flawed approach,
> though.

tools/ lacks a separation into "kernel hacker's testing+debugging
toolbox" and "userspace tools".  It lacks proper buildsystem integration
for the userspace tools, there is no "make tools" and also no "make
tools_install".  Silently dropping new stuff into tools/ and expecting
the world magically noticing isn't going to work.

cheers,
  Gerd
Pekka Enberg Nov. 7, 2011, 12:21 p.m. UTC | #69
On Mon, Nov 7, 2011 at 2:18 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> tools/ lacks a separation into "kernel hacker's testing+debugging
> toolbox" and "userspace tools".  It lacks proper buildsystem integration
> for the userspace tools, there is no "make tools" and also no "make
> tools_install".  Silently dropping new stuff into tools/ and expecting
> the world magically noticing isn't going to work.

No disagreement here.

                        Pekka
Avi Kivity Nov. 7, 2011, 12:26 p.m. UTC | #70
On 11/07/2011 12:30 PM, Sasha Levin wrote:
> On Mon, Nov 7, 2011 at 12:23 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> >  Hi,
> >
> >> It's not just about code, it's as much about culture and development process.
> >
> > Indeed.  The BSDs have both kernel and the base system in a single
> > repository.  There are probably good reasons for (and against) it.
> >
> > In Linux we don't have that culture.  No tool (except perf) lives in the
> > kernel repo.  I fail to see why kvm-tool is that much different from
> > udev, util-linux, iproute, filesystem tools, that it should be included.
>
> tools/power was merged in just 2 versions ago, do you think that
> merging that was a mistake?

Things like tools/power may make sense, most of the code is tied to the
kernel interfaces.  tools/kvm is 20k lines and is likely to be 40k+
lines or more before it is generally usable.  The proportion of the code
that talks to the kernel is quite small.
Theodore Ts'o Nov. 7, 2011, 12:29 p.m. UTC | #71
On Mon, Nov 07, 2011 at 01:08:50PM +0100, Gerd Hoffmann wrote:
> 
> perf *is* an exception today.
> 
> It might make sense to change that.  But IMHO it only makes sense if
> there is a really broad agreement on it and other core stuff moves into
> the kernel too.  Then you'll be able to get advantages out of it.  For
> example standardizing the process to create an initramfs (using the
> userspace tools shipped with the kernel) instead of having each distro
> creating its own way.

I wish distributions had standardized on a single initramfs, sure.
But that doesn't mean that the only way to do this is to merge
userspace code into the kernel source tree.  Everybody uses fsck,
originally from the e2fsprogs source tree, and now from util-linux-ng,
and that isn't merged into the kernel sources.

And I think it would be actively *harmful* to merge util-linux-ng into
the kernel sources.  For a variety of reasons, you may want to upgrade
util-linux-ng, and not the kernel, or the kernel, and not
util-linux-ng.  If you package the two sources together, it becomes
unclear what versions of the kernel will work with which versions of
util-linux-ng, and vice versa.  Suppose you need to fix a security bug
in some program that lives in util-linux-ng.  If it was bundled inside
the kernel, a distribution would now have to release a kernel source
package.  Does that mean that it will have to ship a new set of
kernel binaries?  Or does the distribution have to ship multiple
binary packages that derive from the differently versioned source
packages?

And the same problems will exist with kvm-tool.  What if you need to
release a new version of kvm-tool?  Does that mean that you have to
release a new set of kernel binaries?  It's a mess, and there's a
reason why we don't have glibc, e2fsprogs, xfsprogs, util-linux-ng,
etc., all packaged into the kernel sources.

Because it's a stupid, idiotic thing to do.

						- Ted
Pekka Enberg Nov. 7, 2011, 12:29 p.m. UTC | #72
Hi Avi,

On Mon, Nov 7, 2011 at 2:26 PM, Avi Kivity <avi@redhat.com> wrote:
>> tools/power was merged in just 2 versions ago, do you think that
>> merging that was a mistake?
>
> Things like tools/power may make sense, most of the code is tied to the
> kernel interfaces.  tools/kvm is 20k lines and is likely to be 40k+
> lines or more before it is generally usable.  The proportion of the code
> that talks to the kernel is quite small.

So what do you think about perf then? The amount of code that talks to
the kernel is much smaller than that of the KVM tool.

                        Pekka
Pekka Enberg Nov. 7, 2011, 12:42 p.m. UTC | #73
Hi Ted,

On Mon, Nov 7, 2011 at 2:29 PM, Ted Ts'o <tytso@mit.edu> wrote:
> And the same problems will exist with kvm-tool.  What if you need to
> release a new version of kvm-tool?  Does that mean that you have to
> release a new set of kernel binaries?  It's a mess, and there's a
> reason why we don't have glibc, e2fsprogs, xfsprogs, util-linux-ng,
> etc., all packaged into the kernel sources.

If we need to release a new version, patches would go through the
-stable tree just like with any other subsystem.

On Mon, Nov 7, 2011 at 2:29 PM, Ted Ts'o <tytso@mit.edu> wrote:
> Because it's a stupid, idiotic thing to do.

The discussion is turning into whether or not linux/tools makes sense.
I wish you guys would have had it before perf was merged to
the tree.

                        Pekka
Theodore Ts'o Nov. 7, 2011, 12:43 p.m. UTC | #74
On Mon, Nov 07, 2011 at 02:29:45PM +0200, Pekka Enberg wrote:
> So what do you think about perf then? The amount of code that talks to
> the kernel is much smaller than that of the KVM tool.

I think it's a mess, because it's never clear whether perf needs to be
upgraded when I upgrade the kernel, or vice versa.  This is why I keep
harping on the interface issues.

Fortunately it seems less likely (since perf doesn't run with
privileges) that security fixes will need to be released for perf, but
if it did, given the typical regression testing requirements that many
distributions have, and given that most distro packaging tools assume
that all binaries from a single source package come from a single
version of that source package, I predict you will hear screams from
the distro release engineers.

And by the way, there are use cases, where the guest OS kernel and
root on the guest OS are not available to the untrusted users, where
the userspace KVM program would be part of the security perimeter, and
where security releases to the KVM part of the tool might very well be
necessary, and it would be unfortunate if that forced the release of
new kernel packages each time security fixes are needed to the
kvm-tool userspace.  Might kvm-tool be more secure than qemu?  Quite
possibly, given that it's going to do less than qemu.  But please note
that I've not been arguing that kvm-tool shouldn't be done; just that
it not be included in the kernel sources.

Just as sparse is not bundled into the kernel sources, for crying out
loud!

						- Ted
Avi Kivity Nov. 7, 2011, 12:44 p.m. UTC | #75
On 11/07/2011 02:29 PM, Pekka Enberg wrote:
> Hi Avi,
>
> On Mon, Nov 7, 2011 at 2:26 PM, Avi Kivity <avi@redhat.com> wrote:
> >> tools/power was merged in just 2 versions ago, do you think that
> >> merging that was a mistake?
> >
> > Things like tools/power may make sense, most of the code is tied to the
> > kernel interfaces.  tools/kvm is 20k lines and is likely to be 40k+
> > lines or more before it is generally usable.  The proportion of the code
> > that talks to the kernel is quite small.
>
> So what do you think about perf then? The amount of code that talks to
> the kernel is much smaller than that of the KVM tool.

Maybe it's outgrown the kernel repo too.  Certainly something that has
perl and python integration, a TUI, and one day hopefully a GUI, doesn't
really need the kernel sources.
Theodore Ts'o Nov. 7, 2011, 12:47 p.m. UTC | #76
On Mon, Nov 07, 2011 at 02:42:57PM +0200, Pekka Enberg wrote:
> On Mon, Nov 7, 2011 at 2:29 PM, Ted Ts'o <tytso@mit.edu> wrote:
> > Because it's a stupid, idiotic thing to do.
> 
> The discussion is turning into whether or not linux/tools makes sense.
> I wish you guys would have had it before perf was merged to
> the tree.

Perf was IMHO an overreaction caused by the fact that systemtap and
oprofile people packaged and released the sources in a way that kernel
developers didn't like.

I don't think perf should be used as a precedent that now argues that
any new kernel utility should be moved into the kernel sources.  Does
it make sense to move all of mount, fsck, login, etc., into the kernel
sources?  There are far more kernel tools outside of the kernel
sources than inside the kernel sources.

    	       	       	      	       	    - Ted
Pekka Enberg Nov. 7, 2011, 12:59 p.m. UTC | #77
On Mon, Nov 7, 2011 at 2:47 PM, Ted Ts'o <tytso@mit.edu> wrote:
> Perf was IMHO an overreaction caused by the fact that systemtap and
> oprofile people packaged and released the sources in a way that kernel
> developers didn't like.
>
> I don't think perf should be used as a precedent that now argues that
> any new kernel utility should be moved into the kernel sources.  Does
> it make sense to move all of mount, fsck, login, etc., into the kernel
> sources?  There are far more kernel tools outside of the kernel
> sources than inside the kernel sources.

There are two overlapping questions here:

  (1) Does it make sense to merge the KVM tool to Linux kernel tree?

  (2) Does it make sense to merge userspace tools to the kernel tree?

I'm not trying to use perf to justify merging the KVM tool. However, you
seem to be arguing that it shouldn't be merged because merging
userspace tools in general doesn't make sense. That's why I brought up
the situation with perf.

                        Pekka
Pekka Enberg Nov. 7, 2011, 1:12 p.m. UTC | #78
On Mon, Nov 7, 2011 at 2:47 PM, Ted Ts'o <tytso@mit.edu> wrote:
> I don't think perf should be used as a precedent that now argues that
> any new kernel utility should be moved into the kernel sources.  Does
> it make sense to move all of mount, fsck, login, etc., into the kernel
> sources?  There are far more kernel tools outside of the kernel
> sources than inside the kernel sources.

You seem to think that the KVM tool was developed in isolation and we
simply copied the code to tools/kvm for the pull request. That's simply
not true. We've done a lot of work to make the code feel like kernel code
from locking primitive APIs to serial console emulation register names.
We really consider KVM tool to be a new Linux subsystem. It's the long
lost cousin or bastard child of KVM, depending on who you ask.

I don't know if it makes sense to merge the tools you've mentioned above.
My gut feeling is that it's probably not reasonable - there's already a
community working on it with their own development process and coding
style. I don't think there's a simple answer to this but I don't agree with
your rather extreme position that all userspace tools should be kept out
of the kernel tree.

                        Pekka
Anthony Liguori Nov. 7, 2011, 1:17 p.m. UTC | #79
On 11/07/2011 05:57 AM, Ingo Molnar wrote:
>
> * Pekka Enberg<penberg@cs.helsinki.fi>  wrote:
>
>> On Mon, 7 Nov 2011, Gerd Hoffmann wrote:
>>>> It's not just about code, it's as much about culture and development process.
>>>
>>> Indeed.  The BSDs have both kernel and the base system in a single
>>> repository.  There are probably good reasons for (and against) it.
>>>
>>> In Linux we don't have that culture.  No tool (except perf) lives
>>> in the kernel repo.  I fail to see why kvm-tool is that much
>>> different from udev, util-linux, iproute, filesystem tools, that
>>> it should be included.
>>
>> You seem to think perf is an exception - I think it's going to be
>> the future norm for userspace components that are very close to the
>> kernel. That's in fact what Ingo was arguing for when he suggested
>> QEMU to be merged to the kernel tree.
>
> Yep, and the answer i got from the Qemu folks when i suggested that
> merge was a polite "buzz off", along the lines of: "We don't want to
> do that, but feel free to write your own tool, leave Qemu alone."

At least it was polite :-)

>
> Now that people have done exactly that some Qemu folks not only have
> changed their objection from "write your own tool" to "erm, write
> your own tool but do it the way *we* prefer you to do it" - they also
> started contributing *against* the KVM tool with predictable, once
> every 3 months objections against its upstream merge...
>
> That's not very nice and not very constructive.

I think it's fair to have an objection to upstream merge but I think these 
threads are not terribly constructive right now as it's just rehashing the same 
arguments.

I've been thinking about the idea of merging more userspace tools into the 
kernel.  I understand the basic reasoning.  The kernel has a strong, established 
development process.  It has good infrastructure and a robust hierarchy of 
maintainers.

Good infrastructure can make a big difference to the success of a project. 
Expanding the kernel infrastructure to more projects does seem like an obvious 
thing to do when you think about it in that way.

The approach other projects have taken to this is to form a formal incubator. 
Apache is a good example of this.  There are clear (written) rules about what it 
takes for a project to join.  Once a project joins, there's a clear governance 
structure.  The project gets to consume all of the Apache infrastructure resources.

Other foundations have a release cadence to ensure that multiple components form 
a cohesive individual release (oVirt).

I think you are trying to do this in a more organic way by just merging things 
into the main git tree.  Have you thought about creating a more formal kernel 
incubator program?

Regards,

Anthony Liguori
Vince Weaver Nov. 7, 2011, 5:03 p.m. UTC | #80
On Mon, 7 Nov 2011, Pekka Enberg wrote:

> I've never heard ABI incompatibility used as an argument for perf. Ingo?

Never overtly.  They're too clever for that.

In any case, as a primary developer of a library (PAPI) that uses the 
perf_events ABI I have to say that having perf in the kernel has been a 
*major* pain for us.

Unlike the perf developers, we *do* have to maintain backwards 
compatibility.  And we have a lot of nasty code in PAPI to handle this.
Entirely because the perf_events ABI is not stable.  It's mostly stable, 
but there are enough regressions to be a pain.

It's problem enough that there's no way to know what version of the 
perf_event abi you are running against and we have to guess based on 
kernel version.  This gets "fun" because all of the vendors have 
backported seemingly random chunks of perf_event code to their older 
kernels.
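
The version guessing described above can be sketched roughly as follows. This is a minimal Python illustration, not PAPI's actual code: the helper names are hypothetical, and the only threshold used is the 2.6.31 baseline mentioned elsewhere in this thread. It also shows why the approach is fragile, since a vendor kernel that backports perf_event changes keeps its old release number.

```python
import re

# Baseline taken from the thread ("2.6.31+ kernels"); anything finer-grained
# would need a per-feature table, which is not given here.
PERF_EVENTS_FIRST_RELEASE = (2, 6, 31)


def parse_kernel_release(release):
    """Turn a release string such as '2.6.32-431.el6.x86_64' into (2, 6, 32)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    # A missing third component (e.g. '3.1') is treated as patch level 0.
    return (int(major), int(minor), int(patch or 0))


def kernel_at_least(release, minimum):
    """Guess feature availability from the version number alone.

    This is only a guess: a distro kernel with backported code reports an
    old version while behaving like a newer one.
    """
    return parse_kernel_release(release) >= minimum
```

On a running system the release string would typically come from `os.uname().release`.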

And it often does seem as if the perf developers don't care when something 
breaks in perf_events if it doesn't affect perf users.

For example, the new NMI watchdog severely breaks perf_event event 
allocation if you are using FORMAT_GROUP.  perf doesn't use this though, 
so none of the kernel developers seem to care.  And unless I can quickly 
come up with a patch as an outsider, a few kernel versions will go by and 
the kernel devs will declare "well it was broken so long, now we don't 
have to fix it".  Fun.

Vince
Ingo Molnar Nov. 7, 2011, 5:59 p.m. UTC | #81
* Vince Weaver <vince@deater.net> wrote:

> On Mon, 7 Nov 2011, Pekka Enberg wrote:
> 
> > I've never heard ABI incompatibility used as an argument for 
> > perf. Ingo?

Correct, the ABI has been designed in a way to make it really hard to 
break the ABI via either directed backports or other mess-ups.

The ABI is both backwards *and* forwards ABI compatible, which is 
very rare amongst Linux ABIs.

For frequently used tools, such as perf, there's no ABI compatibility 
problem in practice: using newer perf on older kernels is pretty 
common. Using older perf on new kernels is rarer, but that generally 
works too.

In hindsight being in the kernel repo made it *easier* for perf to 
implement a good, stable ABI while also keeping a very high rate of 
change of the subsystem: changes are more 'concentrated' and people 
can stay focused on the ball to extend the ABI in sensible ways 
instead of struggling with project boundary artifacts.

I think we needed to do only one revert along the way in the past two 
years, to fix an unintended ABI breakage in PowerTop. Considering the 
total complexity of the perf ABI our compatibility track record is 
*very* good.

> Never overtly.  They're too clever for that.

Pekka, Vince has meanwhile become the resident perf critic on lkml, 
always in it when it comes to some perf-bashing:

> In any case, as a primary developer of a library (PAPI) that uses 
> the perf_events ABI I have to say that having perf in the kernel 
> has been a *major* pain for us.

... and you have argued against perf from the very first day on, when 
you were one of the perfmon developers - and IMO in hindsight you've 
been repeatedly wrong about most of your design arguments.

> Unlike the perf developers, we *do* have to maintain backwards 
> compatibility. [...]

We do too, i use new perf on older distro kernels all the time. If 
you see a breakage of functionality that tools use then please report 
it in a timely fashion.

> [...] And we have a lot of nasty code in PAPI to handle this. 
> Entirely because the perf_events ABI is not stable.  It's mostly 
> stable, but there are enough regressions to be a pain.

You are blaming the wrong guys really.

The PAPI project has the (fundamental) problem that you are still 
doing it in the old-style sw design fashion, with many months long 
delays in testing, and then you are blaming the problems you 
inevitably meet with that model on *us*.
 
There was one PAPI incident i remember where it took you several 
*months* to report a regression in a regular PAPI test-case (no 
actual app affected as far as i know). No other tester ever ran the 
PAPI testcases so nobody else reported it.

Moving perf out of the kernel would make that particular situation 
*worse*, by further increasing the latency of fixes and by further 
increasing the risk of breakages.

Sorry, but you are trying to "fix" perf by dragging it down to your 
bad level of design and we will understandably resist that ...

> It's problem enough that there's no way to know what version of the 
> perf_event abi you are running against and we have to guess based 
> on kernel version.  This gets "fun" because all of the vendors have 
> backported seemingly random chunks of perf_event code to their 
> older kernels.

The ABI design allows for that kind of flexible extensibility, and 
it's one of its major advantages.

What we *cannot* protect against is you relying on obscure details of 
the ABI without adding it to 'perf test' and then not testing the 
upstream kernel in a timely enough fashion either ...

Nobody but you tests PAPI so you need to become *part* of the 
upstream development process, which releases a new upstream kernel 
every 3 months.

> And it often does seem as if the perf developers don't care when 
> something breaks in perf_events if it doesn't affect perf users.

I have to reject your slander: Peter, Arnaldo and i all care deeply 
about fixing regressions and i've personally applied fixes out of 
order that addressed some sort of PAPI problem - whenever you chose 
to report them.

Vince, you are wrong and you have also become somewhat malicious in 
your arguments - please stop it.

> For example, the new NMI watchdog severely breaks perf_event event 
> allocation if you are using FORMAT_GROUP.  perf doesn't use this 
> though, so none of the kernel developers seem to care.  And unless 
> I can quickly come up with a patch as an outsider, a few kernel 
> versions will go by and the kernel devs will declare "well it was 
> broken so long, now we don't have to fix it".  Fun.

Face it, the *real* problem is that beyond yourself very few people 
who use a new kernel use PAPI and your long latency of testing 
exposes you to breakages in a much more agile subsystem such as perf. 
Please fix that instead of blaming it on others.

Also, as i mentioned several times before, you are free to add an 
arbitrary number of ABI test-cases to 'perf test' and we can promise 
that we run that. Right now it consists of a few tests:

 $ perf test
 1: vmlinux symtab matches kallsyms: Ok

 2: detect open syscall event: Ok
 3: detect open syscall event on all cpus: Ok
 4: read samples using the mmap interface: Ok

... but we do not object to adding testcases for functionality used 
by PAPI.

The usual ABI rules also apply: we'll revert everything that breaks 
the ABI - but for that you need to report it *in time*, not timed one 
day before the next -stable release like you did it last time around 
...

So there are several ways in which you could help push your own interests 
into the kernel project.

Thanks,

	Ingo
Pekka Enberg Nov. 7, 2011, 7:53 p.m. UTC | #82
On Mon, 7 Nov 2011, Pekka Enberg wrote:
>> I've never heard ABI incompatibility used as an argument for perf. Ingo?

On Mon, Nov 7, 2011 at 7:03 PM, Vince Weaver <vince@deater.net> wrote:
> Never overtly.  They're too clever for that.

If you want me to take you seriously, spare me from the conspiracy theories, OK?

I'm sure perf developers break the ABI sometimes - that happens
elsewhere in the kernel as well. However, Ted claimed that perf
developers use tools/perf as an excuse to break the ABI _on purpose_
which is something I have a hard time believing.

Your snarky remarks don't really help this discussion either. It's
apparent from the LKML discussions that you're more interested in
arguing with the perf developers rather than helping them.

                        Pekka
Frank Ch. Eigler Nov. 7, 2011, 8:03 p.m. UTC | #83
Ingo Molnar <mingo@elte.hu> writes:

> [...]
>> It's problem enough that there's no way to know what version of the 
>> perf_event abi you are running against and we have to guess based 
>> on kernel version.  This gets "fun" because all of the vendors have 
>> backported seemingly random chunks of perf_event code to their 
>> older kernels.
>
> The ABI design allows for that kind of flexible extensibility, and 
> it's one of its major advantages.
>
> What we *cannot* protect against is you relying on obscure details of 
> the ABI [...]

Is there some documentation that clearly spells out which parts of the
perf syscall userspace ABI are "obscure" and thus presumably
changeable?

> [...]  The usual ABI rules also apply: we'll revert everything that
> breaks the ABI - but for that you need to report it *in time* [...]

If the ABI is so great in its flexible extensibility, how come it
can't be flexibly extended without having to pass the burden of
compatibility testing & reversion-yawping to someone else?


- FChE
Pekka Enberg Nov. 7, 2011, 8:09 p.m. UTC | #84
On Mon, 7 Nov 2011, Frank Ch. Eigler wrote:
>> The ABI design allows for that kind of flexible extensibility, and
>> it's one of its major advantages.
>>
>> What we *cannot* protect against is you relying on obscure details of
>> the ABI [...]
>
> Is there some documentation that clearly spells out which parts of the
> perf syscall userspace ABI are "obscure" and thus presumably
> changeable?

That's actually something the KVM and virtio folks have done a great job 
with IMHO. Both ABIs are documented pretty extensively and the specs are 
kept up to date.

I guess for perf ABI, "perf test" is the closest thing to a specification 
so if your application is using something that's not covered by it, you 
might be in trouble.

 			Pekka
Theodore Ts'o Nov. 7, 2011, 8:32 p.m. UTC | #85
On Mon, Nov 07, 2011 at 09:53:28PM +0200, Pekka Enberg wrote:
> 
> I'm sure perf developers break the ABI sometimes - that happens
> elsewhere in the kernel as well. However, Ted claimed that perf
> developers use tools/perf as an excuse to break the ABI _on purpose_
> which is something I have a hard time believing.

I remember an assertion, probably a year or two ago, probably at the
previous year's kernel summit, that one of the reasons for having the
perf code inline in the kernel was so that synchronized changes could
be made to both the kernel and userspace tool together.  So it's not a
matter of breaking the ABI _on_ _purpose_, it's an assertion that
there is no ABI at all.  Since the perf tool and the kernel tool have
to be built together, so long as a user does that, no harm, no foul.
Recall that Linus has said that he doesn't care about whether or not
something is an ABI; he only cares whether users perceive breakage.
If they don't perceive breakage, then it doesn't matter if an
interface is changed.

So the real question isn't whether or not this was an excuse to break
the ABI, but whether or not the perf developers acknowledge there is an
ABI at all, and whether it's OK for other developers to depend on the
syscall interface or not.  Actually, though, it shouldn't matter,
because intentions don't matter.

Recall the powertop/ftrace case.  If you expose an interface, and
people start using that interface, then you can't break them, period.
So as far as Vince is concerned, if you have a userspace library which
depends on the perf interface, then you should try out the kernel
after each merge window, and if your library breaks, you should
complain to Ingo and Linus directly, and request that the commit which
broke your tool to be reverted --- because that's the rule; no
breakage is allowed.

As far as kvm-tool being in the kernel, I still don't see particularly
valid arguments for why it should be in the kernel.  It can't be the
perf argument of "we can make simultaneous changes in the userspace
and kernel code", because if those changes break qemu-kvm, then a
complaint to Linus will cause the problem code to be reverted.

As far as the code using the same coding conventions and naming
conventions as the kernel, that to me isn't a particularly strong
argument either.  E2fsprogs uses the Signed-off-by lines, and the same
coding conventions of the kernel, and it even has slightly modified
versions of two kernel source files in e2fsprogs (e2fsck/recovery.c and
e2fsck/revoke.c), plus a header file with data structures that have to
be kept in sync with the kernel header file.  But that doesn't make it
"part of the kernel", and it's not a justification for it to be
bundled with the kernel.

Personally, I consider code that runs in userspace as a pretty bright
line, as being "not kernel code", and while perhaps things like
initramfs and the crazy ideas people have had in the past of moving
stuff out of kernel/init.c into userspace might have qualified as
stuff really close to the kernel, something like kvm-tool that runs
way after boot, doesn't even come close.  Wine is another example of
another package that has lots of close kernel ties, but was also not
bundled into the kernel.

The precedent has all mainly been on the "keep the kernel separate"
side of things, and the arguments for bundling it with the kernel are
much weaker, especially since the interface is well-developed, and
there are external users of the interface which means you can't make
changes to the interface willy-nilly.

Indeed, when the perf interface was changing all the time, maybe there
was some convenience to have it be bundled with the kernel, so there
was no need to negotiate interface version numbers, et al.  But given
how it has to link in so many user space libraries, I personally think
it's fair to ask whether, now that it has matured, it's time to move
it out of the kernel source tree.

Regards,

							- Ted
Theodore Ts'o Nov. 7, 2011, 8:35 p.m. UTC | #86
On Mon, Nov 07, 2011 at 10:09:34PM +0200, Pekka Enberg wrote:
> 
> I guess for perf ABI, "perf test" is the closest thing to a
> specification so if your application is using something that's not
> covered by it, you might be in trouble.

I don't believe there's ever been any guarantee that "perf test" from
version N of the kernel will always work on a version N+M of the
kernel.  Perhaps I am wrong, though. If that is a guarantee that the
perf developers are willing to stand behind, or have already made, I
would love to be corrected and would be delighted to hear that in fact
there is a stable, backwards compatible perf ABI.

Regards,

						- Ted
Pekka Enberg Nov. 7, 2011, 9:36 p.m. UTC | #87
Hi Ted,

On Mon, Nov 7, 2011 at 10:32 PM, Ted Ts'o <tytso@mit.edu> wrote:
> Personally, I consider code that runs in userspace as a pretty bright
> line, as being "not kernel code", and while perhaps things like
> initramfs and the crazy ideas people have had in the past of moving
> stuff out of kernel/init.c into userspace might have qualified as
> stuff really close to the kernel, something like kvm-tool that runs
> way after boot, doesn't even come close.  Wine is another example of
> another package that has lots of close kernel ties, but was also not
> bundled into the kernel.

It's not as clear a line as you make it out to be.

KVM tool also has mini-BIOS code that runs in guest space. It has
code that runs in userspace but is effectively a simple bootloader. So
it definitely doesn't fit the simple definition of "running way after
boot" (we're _booting_ the kernel too).

Linsched fits your definition but is clearly worth integrating to the
kernel tree. While you are suggesting that maybe we should move Perf
out of the tree now that it's mature, I'm pretty sure you'd agree that
it probably would not have happened if the userspace parts were
developed out of tree.

There are also spectacular failures in the kernel history where the
userspace split was enforced. For example, userspace suspend didn't
turn out the way people envisioned it at the time. We don't know how
it would have worked out if the userspace components had been
in the tree but it certainly would have solved many of the early ABI
issues.

I guess I'm trying to argue here that there's a middle ground. I'm
willing to bet projects like klibc and unified initramfs will
eventually make it to the kernel tree because they simply make so much
sense. I'm also willing to bet that the costs of moving Perf out of the
tree are simply too high to make it worthwhile.

Does that mean KVM tool should get a free pass in merging? Absolutely
not. But I do think your position is too extreme and ignores the
benefits of developing userspace tools in the kernel ecosystem which
was summed up by Anthony rather well in this thread:

https://lkml.org/lkml/2011/11/7/169

                                Pekka
Anthony Liguori Nov. 7, 2011, 10:19 p.m. UTC | #88
On 11/07/2011 03:36 PM, Pekka Enberg wrote:
> Hi Ted,
>
> On Mon, Nov 7, 2011 at 10:32 PM, Ted Ts'o<tytso@mit.edu>  wrote:
>> Personally, I consider code that runs in userspace as a pretty bright
>> line, as being "not kernel code", and while perhaps things like
>> initramfs and the crazy ideas people have had in the past of moving
>> stuff out of kernel/init.c into userspace might have qualified as
>> stuff really close to the kernel, something like kvm-tool that runs
>> way after boot, doesn't even come close.  Wine is another example of
>> another package that has lots of close kernel ties, but was also not
>> bundled into the kernel.
>
> It's not as clear a line as you make it out to be.
>
> KVM tool also has mini-BIOS code that runs in guest space. It has
> code that runs in userspace but is effectively a simple bootloader. So
> it definitely doesn't fit the simple definition of "running way after
> boot" (we're _booting_ the kernel too).
>
> Linsched fits your definition but is clearly worth integrating to the
> kernel tree. While you are suggesting that maybe we should move Perf
> out of the tree now that it's mature, I'm pretty sure you'd agree that
> it probably would not have happened if the userspace parts were
> developed out of tree.
>
> There are also spectacular failures in the kernel history where the
> userspace split was enforced. For example, userspace suspend didn't
> turn out the way people envisioned it at the time. We don't know how
> it would have worked out if the userspace components had been
> in the tree but it certainly would have solved many of the early ABI
> issues.
>
> I guess I'm trying to argue here that there's a middle ground. I'm
> willing to bet projects like klibc and unified initramfs will
> eventually make it to the kernel tree because they simply make so much
> sense. I'm also willing to bet that the costs of moving Perf out of the
> tree are simply too high to make it worthwhile.
>
> Does that mean KVM tool should get a free pass in merging? Absolutely
> not. But I do think your position is too extreme and ignores the
> benefits of developing userspace tools in the kernel ecosystem which
> was summed up by Anthony rather well in this thread:
>
> https://lkml.org/lkml/2011/11/7/169

The kernel ecosystem does not have to be limited to linux.git.  There could be a 
process to be a "kernel.org project" for projects that fit a certain set of 
criteria.  These projects could all share the Linux kernel release cadence and 
have a kernel maintainer as a sponsor or something like that.

That is something that could potentially benefit things like e2fs-tools and all 
of the other tools that are tied closely to the kernel.

In fact, having a single place where users could find all of the various kernel 
related tools and helpers would probably be extremely useful.  There's no reason 
this needs to be linux.git though, this could just be a web page on kernel.org.

Regards,

Anthony Liguori

>
>                                  Pekka
Theodore Ts'o Nov. 7, 2011, 11:42 p.m. UTC | #89
On Nov 7, 2011, at 5:19 PM, Anthony Liguori wrote:

> 
> The kernel ecosystem does not have to be limited to linux.git.  There could be a process to be a "kernel.org project" for projects that fit a certain set of criteria.  These projects could all share the Linux kernel release cadence and have a kernel maintainer as a sponsor or something like that.
> 
> That is something that could potentially benefit things like e2fs-tools and all of the other tools that are tied closely to the kernel.

We have that already.   Packages such as e2fsprogs, xfsprogs, xfstests, sparse, git, etc., have git trees under git.kernel.org.  And I agree that's the perfect place for kvm-tool and perf.   :-)

-- Ted
Vince Weaver Nov. 8, 2011, 5:29 a.m. UTC | #90
On Mon, 7 Nov 2011, Ingo Molnar wrote:
> I think we needed to do only one revert along the way in the past two 
> years, to fix an unintended ABI breakage in PowerTop. Considering the 
> total complexity of the perf ABI our compatibility track record is 
> *very* good.

There have been more breakages, as you know.  It's just they weren't 
caught in time so they were declared to be grandfathered in rather
than fixed.

> Pekka, Vince has meanwhile become the resident perf critic on lkml, 
> always in it when it comes to some perf-bashing:

For what it's worth you'll find commits from me in the qemu tree, and I
also oppose the merge of kvm-tool into the Linux tree.

> ... and you have argued against perf from the very first day on, when 
> you were one of the perfmon developers - and IMO in hindsight you've 
> been repeatedly wrong about most of your design arguments.

I can't find an exact e-mail, but I seem to recall my arguments were that
Pentium 4 support would be hard (it was), that in-kernel generalized 
events were a bad idea (I still think that, try talking to the ARM guys 
sometime about that) and that making access to raw events hard (by not 
using a naming library) was silly.  I'm sure I probably said other things
that were eventually addressed.

> The PAPI project has the (fundamental) problem that you are still 
> doing it in the old-style sw design fashion, with many months long 
> delays in testing, and then you are blaming the problems you 
> inevitably meet with that model on *us*.

The fundamental problem with the PAPI project is that we only have 3 
full-time developers, and we have to make sure PAPI runs on about 10 
different platforms, of which perf_events/Linux is only one.

Time I waste tracking down perf_event ABI regressions and DoS bugs
takes away from actual useful userspace PAPI development.

> There was one PAPI incident i remember where it took you several 
> *months* to report a regression in a regular PAPI test-case (no 
> actual app affected as far as i know). No other tester ever ran the 
> PAPI testcases so nobody else reported it.

We have a huge userbase.  They run on some pretty amazing machines and 
do some tests that strain perf libraries to the limit.
They also tend to use distro kernels, assuming they even have moved to 
2.6.31+ kernels yet.  When these power users report problems, they aren't 
going to be against the -tip tree.

> Nobody but you tests PAPI so you need to become *part* of the 
> upstream development process, which releases a new upstream kernel 
> every 3 months.

PAPI is a free software project, with the devel tree available from CVS.
It takes maybe 15 minutes to run the full PAPI regression suite.
I encourage you or any perf developer to try it and report any issues.

I can only be so comprehensive.  I didn't find the current NMI-watchdog 
regression right away because my git tree builds didn't have it enabled.  
It wasn't until there started being 3.0 distro kernels that people started 
reporting the problem to us.

> Also, as i mentioned several times before, you are free to add an 
> arbitrary number of ABI test-cases to 'perf test' and we can promise 
> that we run that. Right now it consists of a few tests:

As mentioned before, I have my own perf_event test suite with 20+ tests.
  http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html

I do run it often.  It tends to be reactionary though, as I can only add a 
test for a bug once I know about it.

I also have more up-to-date perf documentation than the kernel does:
  http://web.eecs.utk.edu/~vweaver1/projects/perf-events/programming.html

and a cpu compatibility matrix:
  http://web.eecs.utk.edu/~vweaver1/projects/perf-events/support.html

I didn't really want to turn this into yet another perf flamewar.  I just 
didn't want the implication that perf being in kernel is all rainbows
and unicorns to go unchallenged.

Vince
Ingo Molnar Nov. 8, 2011, 9:32 a.m. UTC | #91
* Theodore Tso <tytso@MIT.EDU> wrote:

> On Nov 7, 2011, at 5:19 PM, Anthony Liguori wrote:
> 
> > The kernel ecosystem does not have to be limited to linux.git.  
> > There could be a process to be a "kernel.org project" for 
> > projects that fit a certain set of criteria.  These projects 
> > could all share the Linux kernel release cadence and have a 
> > kernel maintainer as a sponsor or something like that.
> > 
> > That is something that could potentially benefit things like 
> > e2fs-tools and all of the other tools that are tied closely to 
> > the kernel.
> 
> We have that already.  Packages such as e2fsprogs, xfsprogs, 
> xfstests, sparse, git, etc., have git trees under git.kernel.org.  
> And I agree that's the perfect place for kvm-tool and perf.  :-)

I guess this should be a F.A.Q., but it's worth repeating that from 
the perf tooling project perspective, being integrated into the 
kernel tree in the past 2-3 years had *numerous* *massive* advantages 
that improved the project's quality.

The shared repo brought countless advantages that a simple kernel.org 
hosting in a split external tool repo would not have brought.

No ifs and buts about it, these are the plain facts:

 - Better features, better ABIs: perf maintainers can enforce clean, 
   functional and usable tooling support *before* committing to an 
   ABI on the kernel side. This is a *huge* deal to improve the 
   quality of the kernel, the ABI and the tooling side and we made 
   use of it a number of times.

   A perf kernel feature has to come with working, high-quality and
   usable tooling support - or it won't go upstream. (I could think
   of numerous other subsystems which would see improvements if they
   enforced this too.)

 - We have a shared Git tree with unified, visible version control. I
   can see kernel feature commits followed by tooling support, in a
   single flow of related commits:

      perf probe: Update perf-probe document
      perf probe: Support --del option
      trace-kprobe: Support delete probe syntax

   With two separate Git repositories this kind of connection between
   the tool and the kernel is inevitably weakened or lost.

 - Easier development, easier testing: if you work on a kernel 
   feature and on matching tooling support then it's *much* easier to
   work in a single tree than working in two or more trees in 
   parallel. I have worked on multi-tree features before, and apart
   from special cases they are generally a big pain to develop.

   It's not just a developer convenience factor: "big pain" 
   inevitably transforms into "lower quality" as well.

 - There's a predictable 3 month release cycle of the perf tool,
   enforced *externally*, by the kernel project. This allowed much
   easier synchronization of kernel and user-space features and
   removes version friction. It also guarantees and simplifies the
   version frequency to packagers and users.

 - We are using and enforcing established quality control and coding
   principles of the kernel project. If we mess up then Linus pushes
   back on us at the last line of defense - and has pushed back on us
   in the past. I think many of the currently external kernel
   utilities could benefit from the resulting rise in quality.
   I've seen separate tool projects degrade into barely usable
   tinkerware - that i think cannot happen to perf, regardless of who
   maintains it in the future.

 - Better debuggability: sometimes a perf change in combination with
   a kernel change causes a breakage. I have bisected the shared tree
   a couple of times already, instead of having to bisect a
   (100,000 commits x 10,000 commits) combined space which is much
   harder to debug ...

 - Code reuse: we can and do share source code between the kernel and
   the tool where it makes sense. Both the tooling and the kernel
   side code improves from this. (Often explicit librarization makes
   little sense due to the additional maintenance overhead of a split
   library project and the impossibly long latency of how the kernel
   can rely on the ready existence of such a newly created library
   project.)

 - [ etc: there's half a dozen other, smaller positive effects as 
     well. ]
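
The bisection point in the list above can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: the commit counts are the round numbers quoted in the list, the shared tree is assumed to hold both histories in one linear sequence, and `min_yes_no_tests` is a hypothetical helper. In a single ordered history a binary search needs about log2(N) build-and-test steps. For two separate repositories the figure shown is only an information-theoretic lower bound on the number of pass/fail tests, since (kernel commit, tool commit) pairs have no single linear order for a plain bisection to walk.

```python
def min_yes_no_tests(candidates):
    """Smallest number of pass/fail tests that can single out one of
    `candidates` possibilities, i.e. ceil(log2(candidates))."""
    if candidates < 1:
        raise ValueError("need at least one candidate")
    # Integer-exact ceil(log2(n)), avoiding floating point rounding.
    return (candidates - 1).bit_length()


# Round numbers quoted in the list above.
KERNEL_COMMITS = 100_000
TOOL_COMMITS = 10_000

# Shared tree: one linear history containing both sets of commits (assumed).
shared_tree_steps = min_yes_no_tests(KERNEL_COMMITS + TOOL_COMMITS)

# Separate trees: every (kernel commit, tool commit) pair is a candidate.
pair_space_bound = min_yes_no_tests(KERNEL_COMMITS * TOOL_COMMITS)

print(shared_tree_steps, pair_space_bound)  # prints "17 30"
```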

Also, while i'm generally pretty good at being the devil's advocate 
as well, i've yet to see a *single* serious disadvantage of the 
shared repo:

 - Yes, in principle sharing code could be messy - in practice it is
   not, in fact it cleans things up where we share code and triggers 
   fixes on both sides. Sharing code *works*, as long as there's no 
   artificial project boundary.

 - Yes, in principle we could end up only testing new-kernel+new-tool 
   and regress older ABI or tool versions. In practice it does not 
   happen disproportionately: people (us developers included) do test 
   the other combinations as well and the ABI has been designed in a 
   way to make it backwards and forwards compatible by default. I 
   think we have messed up a surprisingly small number of times so 
   far, considering the complexity and growth rate of the ABI.

 - Yes, in principle we could end up being too kernel centric. In 
   practice people are using perf to measure user-space code far more 
   often - and we ourselves use perf to develop perf tooling, which 
   gives an indirect guarantee as well.

In our experience, the almost 3 years track record of perf gives a 
strong validation to the idea that tools that are closely related to 
the kernel can (and quite likely *should*) prosper in the kernel repo 
itself.

While it was somewhat of an unknowable experiment when we started it 
3 years ago, in hindsight it was a no-brainer decision with *many* 
documented advantages both to the kernel and to tools/perf/.

So we definitely see correlation between tool quality and the shared 
repo maintenance set-up, and i think the list above gives plenty of 
reason to suspect causation as well ...

Finally, i find it rather weird that the people pushing perf to move 
out of the kernel have not actually *worked* in such a shared repo 
scheme yet...

None of the perf developers with whom i'm working complained about 
the shared repo so far - publicly or privately. By all means they are 
enjoying it and if you look at the stats and results you'll agree 
that they are highly productive working in that environment.

If you look at tools/kvm/ contributors you'll find a very similar 
mind-set and similar experiences - albeit the project is much younger 
and smaller.

*That is what matters*.

So i think you should seriously consider moving your projects *into* 
tools/ instead of trying to get other projects to move out ...

You should at least *try* the unified model before criticising it - 
because currently you guys are preaching about sex while having sworn 
a life long celibacy ;-)

Thanks,

	Ingo
Theodore Ts'o Nov. 8, 2011, 10:21 a.m. UTC | #92
On Nov 8, 2011, at 4:32 AM, Ingo Molnar wrote:
> 
> No ifs and when about it, these are the plain facts:
> 
> - Better features, better ABIs: perf maintainers can enforce clean, 
>   functional and usable tooling support *before* committing to an 
>   ABI on the kernel side.

"We don't have to be careful about breaking interface compatibility while we are developing new features".

The flip side of this is that it's not obvious when an interface is stable, and when it is still subject to change.  It makes life much harder for any userspace code that doesn't live in the kernel.   And I think we do agree that moving all of userspace into a single git tree makes no sense, right?

> - We have a shared Git tree with unified, visible version control. I
>   can see kernel feature commits followed by tooling support, in a
>   single flow of related commits:
> 
>      perf probe: Update perf-probe document
>      perf probe: Support --del option
>      trace-kprobe: Support delete probe syntax
> 
>   With two separate Git repositories this kind of connection between
>   the tool and the kernel is inevitably weakened or lost.

"We don't have to clearly document new interfaces between kernel and userspace, and instead rely on git commit order for people to figure out what's going on with some new interface"

> - Easier development, easier testing: if you work on a kernel 
>   feature and on matching tooling support then it's *much* easier to
>   work in a single tree than working in two or more trees in 
>   parallel. I have worked on multi-tree features before, and except
>   special exceptions they are generally a big pain to develop.

I've developed in the split tree systems, and it's really not that hard.  It does mean you have to be explicit about designing interfaces up front, and then you have to have a good, robust way of negotiating what features are in the kernel, and what features are supported by the userspace --- but if you don't do that then good backwards and forwards compatibility between different versions of the tool simply doesn't exist.
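
One common shape for that kind of negotiation is a set of feature bitmasks, in the style of the compat / ro_compat / incompat feature words in the ext2/3/4 superblock. The sketch below is only an illustration of the idea: the mask values and the negotiate helper are hypothetical, not taken from e2fsprogs or the kernel.

```shell
# Hypothetical masks of the feature bits this tool version understands.
SUPPORTED_RO_COMPAT=$((0x03))
SUPPORTED_INCOMPAT=$((0x01))

# negotiate COMPAT RO_COMPAT INCOMPAT
# Decide how to treat on-disk data carrying the given feature bits:
#   unknown incompat bits  -> refuse to touch it
#   unknown ro_compat bits -> safe to read, not to write
#   unknown compat bits    -> harmless, so COMPAT never blocks anything
negotiate() {
    ro_compat=$(($2))
    incompat=$(($3))
    if [ $((incompat & ~SUPPORTED_INCOMPAT)) -ne 0 ]; then
        echo refuse
    elif [ $((ro_compat & ~SUPPORTED_RO_COMPAT)) -ne 0 ]; then
        echo read-only
    else
        echo read-write
    fi
}

negotiate 8 0 0    # unknown compat bit only
negotiate 0 4 0    # unknown ro_compat bit
negotiate 0 0 2    # unknown incompat bit
```

The point of splitting the bits into three classes is that an older tool can tell, without knowing what a new feature does, whether it is safe to write, safe only to read, or not safe to touch at all.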

So at the end of the day the question is whether you want to be able to (for example) update e2fsck to get better ability to fix more file system corruptions, without needing to upgrade the kernel.   If you want to be able to use a newer, better e2fsck with an older, enterprise kernel, then you have to use certain programming disciplines.   That's where the work is, not in whether you have to maintain two git trees or a single git tree.

> - We are using and enforcing established quality control and coding
>   principles of the kernel project. If we mess up then Linus pushes
>   back on us at the last line of defense - and has pushed back on us
>   in the past. I think many of the currently external kernel
>   utilities could benefit from the resulting rise in quality.
>   I've seen separate tool projects degrade into barely usable
>   tinkerware - that i think cannot happen to perf, regardless of who
>   maintains it in the future.

That's basically saying that if you don't have someone competent managing the git tree and providing quality assurance, life gets hard.   Sure.   But at the same time, does it scale to move all of userspace under one git tree and depending on Linus to push back? 

I mean, it would have been nice to move all of GNOME 3 under the Linux kernel, so Linus could have pushed back on behalf of all of us power users, but as much as many of us would have appreciated someone being able to push back against the insanity which is the GNOME design process, is that really a good enough excuse to move all of GNOME 3 into the kernel source tree?   :-)

> - Better debuggability: sometimes a perf change in combination
>   with a kernel change causes a breakage. I have bisected the
>   shared tree a couple of times already, instead of having to
>   bisect a (100,000 commits x 10,000 commits) combined space,
>   which is much harder to debug …

What you are describing happens when someone hasn't been careful about their kernel/userspace interfaces.

If you have been rigorous with your interfaces, this isn't really an issue.   When's the last time we had to do exhaustive NxM testing to find a broken sys call ABI between (for example) the kernel and MySQL?

> - Code reuse: we can and do share source code between the kernel and
>   the tool where it makes sense. Both the tooling and the kernel
>   side code improves from this. (Often explicit librarization makes
>   little sense due to the additional maintenance overhead of a split
>   library project and the impossibly long latency of how the kernel
>   can rely on the ready existence of such a newly created library
>   project.)

How much significant code really can get shared?   Memory allocation is different between kernel and userspace code, how you do I/O is different, error reporting conventions are generally different, etc.   You might have some serialization and deserialization code which is in common, but (surprise!) that's generally part of your interface, which is hopefully relatively stable especially once the tool and the interface has matured.

-- Ted
Ingo Molnar Nov. 8, 2011, 10:22 a.m. UTC | #93
* Ted Ts'o <tytso@mit.edu> wrote:

> I don't believe there's ever been any guarantee that "perf test" 
> from version N of the kernel will always work on a version N+M of 
> the kernel.  Perhaps I am wrong, though. If that is a guarantee 
> that the perf developers are willing to stand behind, or have 
> already made, I would love to be corrected and would be delighted 
> to hear that in fact there is a stable, backwards compatible perf 
> ABI.

We do even more than that, the perf ABI is fully backwards *and* 
forwards compatible: you can run older perf on newer ABIs and newer 
perf on older ABIs.

To show you how it works in practice, here's a random 
cross-compatibility experiment: going back to the perf ABI of 2 years 
ago. I used v2.6.32 which was just the second upstream kernel with 
perf released in it.

So i took a fresh perf tool version and booted a vanilla v2.6.32 
(x86, defconfig, PERF_COUNTERS=y) kernel:

  $ uname -a
  Linux mercury 2.6.32 #162137 SMP Tue Nov 8 10:55:37 CET 2011 x86_64 x86_64 x86_64 GNU/Linux

  $ perf --version
  perf version 3.1.1927.gceec2

  $ perf top

  Events: 2K cycles
 61.68%  [kernel]             [k] sha_transform
 16.09%  [kernel]             [k] mix_pool_bytes_extract
  4.70%  [kernel]             [k] extract_buf
  4.17%  [kernel]             [k] _spin_lock_irqsave
  1.44%  [kernel]             [k] copy_user_generic_string
  0.75%  [kernel]             [k] extract_entropy_user
  0.37%  [kernel]             [k] acpi_pm_read

[the box is running a /dev/urandom stress-test as you can see.]

 $ perf stat sleep 1

 Performance counter stats for 'sleep 1':

          0.766698 task-clock                #    0.001 CPUs utilized          
                 1 context-switches          #    0.001 M/sec                  
                 0 CPU-migrations            #    0.000 M/sec                  
               177 page-faults               #    0.231 M/sec                  
         1,513,332 cycles                    #    1.974 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
           522,609 instructions              #    0.35  insns per cycle        
            65,812 branches                  #   85.838 M/sec                  
             7,762 branch-misses             #   11.79% of all branches        

       1.076211168 seconds time elapsed

The two <not supported> events are not supported by the old kernel - 
but the other events were and the tool picked them up without bailing 
out.
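
As a small illustration of that graceful degradation, the events the
running kernel could not provide can be pulled out of saved perf stat
output. This is only a sketch: the function name is made up here, and
it assumes the output layout shown above, where an unsupported counter
prints <not supported> in place of a value.

```shell
# list_unsupported_events: read `perf stat` output on stdin and print the
# name of every event reported as <not supported>, one per line.
# Assumes the layout "<not supported> EVENT-NAME", so the event name is the
# third whitespace-separated field.
list_unsupported_events() {
    awk '/<not supported>/ { print $3 }'
}

# Possible use (perf stat writes its counter table to stderr):
#   perf stat sleep 1 2>&1 | list_unsupported_events
```

With the output above this would list stalled-cycles-frontend and
stalled-cycles-backend, and nothing else.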

Regular profiling:

 $ perf record -a sleep 1
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.075 MB perf.data (~3279 samples) ]

perf report output:

 $ perf report

 Events: 1K cycles
  64.45%          dd  [kernel.kallsyms]    [k] sha_transform
  19.39%          dd  [kernel.kallsyms]    [k] mix_pool_bytes_extract
   4.11%          dd  [kernel.kallsyms]    [k] _spin_lock_irqsave
   2.98%          dd  [kernel.kallsyms]    [k] extract_buf
   0.84%          dd  [kernel.kallsyms]    [k] copy_user_generic_string
   0.38%         ssh  libcrypto.so.0.9.8b  [.] lh_insert
   0.28%   flush-8:0  [kernel.kallsyms]    [k] block_write_full_page_endio
   0.28%   flush-8:0  [kernel.kallsyms]    [k] generic_make_request

These examples show *PICTURE PERFECT* backwards ABI compatibility, 
when using the bleeding edge perf tool on an ancient perf kernel (when 
it wasn't even called 'perf events' but 'perf counters').

[ Note, i didn't go back to v2.6.31, the oldest upstream perf kernel, 
  because it's such a pain to build with recent binutils and recent 
  GCC ... v2.6.32 already needed a workaround and a couple of .config 
  tweaks to build and boot at all. ]

Then i built the ancient v2.6.32 perf tool from 2 years ago:

 $ perf --version
 perf version 0.0.2.PERF

and booted a fresh v3.1+ kernel:

 $ uname -a
 Linux mercury 3.1.0-tip+ #162138 SMP Tue Nov 8 11:14:26 CET 2011 x86_64 x86_64 x86_64 GNU/Linux

 $ perf stat ls

 Performance counter stats for 'ls':

       1.739193  task-clock-msecs         #      0.069 CPUs 
              0  context-switches         #      0.000 M/sec
              0  CPU-migrations           #      0.000 M/sec
            250  page-faults              #      0.144 M/sec
        3477562  cycles                   #   1999.526 M/sec
        1661460  instructions             #      0.478 IPC  
         839826  cache-references         #    482.883 M/sec
          15742  cache-misses             #      9.051 M/sec

    0.025231139  seconds time elapsed

 $ perf top

 ------------------------------------------------------------------------------
   PerfTop:   38916 irqs/sec  kernel:99.6% [100000 cycles],  (all, 2 CPUs)
 ------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            41191.00 - 53.1% : sha_transform
            20818.00 - 26.8% : mix_pool_bytes_extract
             5481.00 -  7.1% : _raw_spin_lock_irqsave
             2132.00 -  2.7% : extract_buf
             1788.00 -  2.3% : copy_user_generic_string
              801.00 -  1.0% : acpi_pm_read
              446.00 -  0.6% : _raw_spin_unlock_irqrestore
              284.00 -  0.4% : __memset
              259.00 -  0.3% : extract_entropy_user

 $ perf record -a -f sleep 1
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.034 MB perf.data (~1467 samples) ]

 $ perf report

 # Samples: 1023
 #
 # Overhead        Command                     Shared Object  Symbol
 # ........  .............  ................................  ......
 #
     4.50%        swapper  [kernel]                          [k] acpi_pm_read
     4.01%        swapper  [kernel]                          [k] delay_tsc
     2.05%           sudo  /lib64/libcrypto.so.0.9.8b        [.] 0x000000000a0549
     1.96%           perf  [kernel]                          [k] vsnprintf
     1.86%        swapper  [kernel]                          [k] test_clear_page_writeback
     1.66%           perf  [kernel]                          [k] format_decode
     1.56%           sudo  /lib64/ld-2.7.so                  [.] do_lookup_x

These examples show *PICTURE PERFECT* forwards ABI compatibility, 
using the ancient perf tool on a bleeding edge kernel.
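
The manual runs above could be wrapped into a small smoke test that is
repeated for each tool/kernel pairing. This is a rough sketch rather
than anything that exists in the tree: compat_smoke is a made-up
helper, it only checks exit codes (not the correctness of the numbers),
and system-wide recording with -a normally needs sufficient privileges.

```shell
# compat_smoke PERF: run the stat / record / report steps shown above with
# the given perf binary (absolute path, or a name found via PATH) inside a
# scratch directory, and report which steps exit successfully.
compat_smoke() {
    perf_bin=$1
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1
        rc=0
        for step in "stat sleep 1" "record -a sleep 1" "report"; do
            # $step is deliberately unquoted so it splits into arguments.
            if "$perf_bin" $step >/dev/null 2>&1; then
                echo "ok:   perf $step"
            else
                echo "FAIL: perf $step"
                rc=1
            fi
        done
        exit "$rc"
    )
    rc=$?
    rm -rf "$workdir"
    return "$rc"
}

# Possible use, with a hypothetical path to an old build of the tool:
#   compat_smoke /opt/perf-2.6.32/perf
```

Running each step in a fresh directory lets record and report use the
default perf.data file, so no version-specific output options are
needed.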

During the years we migrated across various transformations of the 
subsystem and added tons of features, while maintaining the perf ABI.

I don't know where the whole ABI argument comes from - perf has 
arguably one of the best and most compatible tooling ABIs within 
Linux. I suspect back in the original perf flamewars people made up 
their mind prematurely that it 'cannot' possibly work and never 
changed their mind about it, regardless of reality proving them
wrong ;-)

And yes, the quality of the ABI and tooling cross-compatibility is 
not accidental at all, it is fully intentional and we take great care 
that it stays so. More than that we'll gladly take more 'perf test' 
testcases, for obscure corner-cases that other tools might rely on. 
I.e. we are willing to help external tooling to get their testcases 
built into the kernel repo.

Note that such a level of ABI support is arguably overkill for 
instrumentation, which by its very nature tends to migrate to the 
newer versions - still, we maintain it because in our opinion good, 
usable tooling should have a good, extensible ABI.

Thanks,

	Ingo
Peter Zijlstra Nov. 8, 2011, 10:32 a.m. UTC | #94
On Tue, 2011-11-08 at 11:22 +0100, Ingo Molnar wrote:
> 
> We do even more than that, the perf ABI is fully backwards *and* 
> forwards compatible: you can run older perf on newer ABIs and newer 
> perf on older ABIs. 

The ABI yes, the tool no, the tool very much relies on some newer ABI
parts. Supporting fallbacks isn't always possible/wanted.
Theodore Ts'o Nov. 8, 2011, 10:41 a.m. UTC | #95
On Nov 8, 2011, at 5:22 AM, Ingo Molnar wrote:

> We do even more than that, the perf ABI is fully backwards *and* 
> forwards compatible: you can run older perf on newer ABIs and newer 
> perf on older ABIs.

It's great to hear that!   But in that case, there's an experiment we can't really run, which is if perf had been developed in a separate tree, would it have been just as successful?

My belief is that perf was successful because *you* and the other perf developers were competent developers who got things right.   Not because it was inside the kernel tree.   You've argued that things were much better because it was inside the tree, but that's not actually something we can put to a scientific repeatable experiment.

I will observe that some of the things that caused me to become enraged by Systemtap (such as the fact that I simply couldn't even build the damned thing on a non-Red Hat compilation environment) would not have been solved by moving Systemtap into the kernel git tree --- at least not without moving a large number of its external dependencies into the kernel tree as well, such as the elf library, et al.   So there is a whole class of problems that were seen in previous tooling systems that were not caused by the fact that they were separate from the kernel, but by the fact that they weren't being developed by the kernel developers, so they didn't understand how to make the tool work well for kernel developers.

If we had gone back in time, and had the same set of perf developers working in an external tree, and Systemtap and/or Oprofile had been developed in the kernel tree, would it really have made that much difference?   Sure, Linus and other kernel developers would have yelled at the Systemtap and Oprofile folks more, but I haven't seen that much evidence that they listened to us when they were outside of the kernel tree, and it's not obvious they would have listened with the code being inside the kernel tree.

My claim is that the outcome wouldn't have been all that different, and that's because the difference was *you*, Ingo Molnar: as a good engineer, you would have designed a good backwards compatible ABI whether the code was inside or outside of the kernel, and you would have insisted on good taste and usefulness to kernel programmers whether perf was in or out of the kernel, and you would have insisted on kernel coding guidelines and regular release cycles, even if perf was outside of the kernel.   As Linus sometimes likes to say, in many cases it's more about the _people_.

Regards,

-- Ted
Pekka Enberg Nov. 8, 2011, 11:20 a.m. UTC | #96
On Tue, 8 Nov 2011, Theodore Tso wrote:
> It's great to hear that!   But in that case, there's an experiment we 
> can't really run, which is if perf had been developed in a separate 
> tree, would it have been just as successful?

Experiment, eh?

We have the staging tree because it's a widely acknowledged belief that 
kernel code in the tree tends to improve over time compared to code that's 
sitting out of the tree. Are you disputing that belief?

If you don't dispute that, what makes you think the same effect 
doesn't apply to code that looks like Linux code and is developed the same 
way but runs in userspace?

 			Pekka
Theodore Ts'o Nov. 8, 2011, 11:25 a.m. UTC | #97
On Nov 8, 2011, at 6:20 AM, Pekka Enberg wrote:

> We have the staging tree because it's a widely acknowledged belief that kernel code in the tree tends to improve over time compared to code that's sitting out of the tree. Are you disputing that belief?

Kernel code in the kernel source tree improves; because that's where it will eventually end up --- linked against the kernel.

There are all sorts of dynamics in play that don't necessarily apply to userspace code.

Otherwise we could just link in all of the userspace code in a Linux distribution and magically expect it will get better, eh?   Not!

-- Ted
Pekka Enberg Nov. 8, 2011, 11:29 a.m. UTC | #98
On Tue, 8 Nov 2011, Theodore Tso wrote:
>> We have the staging tree because it's a widely acknowledged belief that 
>> kernel code in the tree tends to improve over time compared to code 
>> that's sitting out of the tree. Are you disputing that belief?
>
> Kernel code in the kernel source tree improves; because that's where it 
> will eventually end up --- linked against the kernel.
>
> There are all sorts of dynamics in play that don't necessarily apply to 
> userspace code.
>
> Otherwise we could just link in all of the userspace code in a Linux 
> distribution and magically expect it will get better, eh?   Not!

You just yourself said it's about the people. Why do you now think it's 
about linking against the kernel? I know I have hacked on various parts of 
the kernel that I have never linked to my kernel.

 			Pekka
Frank Ch. Eigler Nov. 8, 2011, 11:31 a.m. UTC | #99
Hi -

On Tue, Nov 08, 2011 at 11:22:35AM +0100, Ingo Molnar wrote:

> [...]  These examples show *PICTURE PERFECT* forwards ABI
> compatibility, using the ancient perf tool on a bleeding edge
> kernel. [...]

Almost: they demonstrate that those parts of the ABI that these
particular perf commands rely on have been impressively compatible.
Do you have any sort of ABI coverage measurement, to see what
parts of the ABI these perf commands do not use?

- FChE
Ingo Molnar Nov. 8, 2011, 11:34 a.m. UTC | #100
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> The ABI yes, the tool no, the tool very much relies on some newer 
> ABI parts. Supporting fallbacks isn't always possible/wanted.

Yeah, sure - and an older tool cannot possibly support newer features 
either.

Thanks,

	Ingo
Pekka Enberg Nov. 8, 2011, 11:39 a.m. UTC | #101
On Tue, 8 Nov 2011, Frank Ch. Eigler wrote:
> Almost: they demonstrate that those parts of the ABI that these
> particular perf commands rely on have been impressively compatible.
> Do you have any sort of ABI coverage measurement, to see what
> parts of the ABI these perf commands do not use?

It's pretty obvious that perf ABI is lacking in that department based on 
Vince's comments, isn't it?  There's an easy fix for this too: improve 
"perf test" to cover the cases you're interested in.  While ABI spec would 
be a nice addition, it's not going to make compatibility problems 
magically go away.

 			Pekka
Ingo Molnar Nov. 8, 2011, 12:07 p.m. UTC | #102
* Vince Weaver <vince@deater.net> wrote:

> On Mon, 7 Nov 2011, Ingo Molnar wrote:

> > I think we needed to do only one revert along the way in the past 
> > two years, to fix an unintended ABI breakage in PowerTop. 
> > Considering the total complexity of the perf ABI our 
> > compatibility track record is *very* good.
> 
> There have been more breakages, as you know.  It's just they 
> weren't caught in time so they were declared to be grandfathered in 
> rather than fixed.

I remember one such instance where you reported a 'regression' that 
spanned several -stable kernel releases - and unless the fix is easy 
and obvious that's the regular upstream treatment.

As Linus also said at the recent Kernel Summit, an ABI is only an 
ABI if it's actually *used*.

But there's more, you've repeatedly rejected our offer to extend 
'perf test' to cover the functionality that your library relies on. 
If you refuse to test newer upstream kernels in a timely manner while 
you rely on obscure details that nobody else uses, and if you refuse 
to make your testcases more prominent, it becomes *your* problem.

There's not much we can do if you refuse to test and refuse to push 
your testcases upstream ...

> > ... and you have argued against perf from the very first day on, 
> > when you were one of the perfmon developers - and IMO in 
> > hindsight you've been repeatedly wrong about most of your design 
> > arguments.
> 
> I can't find an exact e-mail, but I seem to recall my arguments 
> were that Pentium 4 support would be hard (it was), [...]

To the contrary, a single person implemented most of it, out of 
curiosity.

> [...] that in-kernel generalized events were a bad idea (I still 
> think that, try talking to the ARM guys sometime about that) [...]

To the contrary, generalized events work very well and they are one 
of the reasons why the perf tooling is so usable.

> [...] and that making access to raw events hard (by not using a 
> naming library) was silly. [...]

To the contrary, by 'making it easy' you mean 'translate hex codes 
to vendor specific gibberish' which is hardly any better for actual 
users of the tool and gives the false appearance of being a solution.

All in all you advocated all the oprofile design mistakes and you 
have been proven thoroughly wrong by reality.

> > The PAPI project has the (fundamental) problem that you are still 
> > doing it in the old-style sw design fashion, with many months 
> > long delays in testing, and then you are blaming the problems you 
> > inevitably meet with that model on *us*.
> 
> The fundamental problem with the PAPI project is that we only have 
> 3 full-time developers, and we have to make sure PAPI runs on about 
> 10 different platforms, of which perf_events/Linux is only one.
> 
> Time I waste tracking down perf_event ABI regressions and DoS bugs 
> takes away from actual useful userspace PAPI development.

If people are not interested in even testing the basic test-suite of 
PAPI on a recent kernel then i'm afraid there must be something very 
wrong with the PAPI project structure.

Somehow that testing is not missing from the perf tool, despite it 
being a much younger and smaller project. Did you ever stop to think 
why that is so?

> > There was one PAPI incident i remember where it took you several 
> > *months* to report a regression in a regular PAPI test-case (no 
> > actual app affected as far as i know). No other tester ever ran 
> > the PAPI testcases so nobody else reported it.
> 
> We have a huge userbase.  They run on some pretty amazing machines 
> and do some tests that strain perf libraries to the limit. They 
> also tend to use distro kernels, assuming they even have moved to 
> 2.6.31+ kernels yet.  When these power users report problems, they 
> aren't going to be against the -tip tree.

Nobody expects you to test the -tip tree if you don't want to (it 
would certainly be useful to you if you are interested in PMU 
development), but there's a 2.5 months stabilization window after the 
upstream merge.

> > Nobody but you tests PAPI so you need to become *part* of the 
> > upstream development process, which releases a new upstream 
> > kernel every 3 months.
> 
> PAPI is a free software project, with the devel tree available from 
> CVS. It takes maybe 15 minutes to run the full PAPI regression 
> suite. I encourage you or any perf developer to try it and report 
> any issues.

I will fix what gets reported, but neither i nor other regular kernel 
testers actually use it.

You really need to do more testing to fill that gap, expecting others 
to volunteer time into a project they don't actually use is extremely 
backwards...

> I can only be so comprehensive.  I didn't find the current 
> NMI-watchdog regression right away because my git tree builds 
> didn't have it enabled.  It wasn't until there started being 3.0 
> distro kernels that people started reporting the problem to us.
>
> > Also, as i mentioned it several times before, you are free to add 
> > an arbitrary number of ABI test-cases to 'perf test' and we can 
> > promise that we run that. Right now it consists of a few tests:
> 
> as mentioned before I have my own perf_event test suite with 20+ tests.
>   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html

That should probably be moved into perf test. Arnaldo, any 
objections?

> I do run it often.  It tends to be reactionary though, as I can 
> only add a test for a bug once I know about it.
> 
> I also have more up-to date perf documentation than the kernel does:
>   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/programming.html
> 
> and a cpu compatibility matrix:
>   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/support.html
> 
> I didn't really want to turn this into yet another perf flamewar.  

So why then did you launch several malicious, unprovoked, 
passive-aggressive ad hominem attacks against perf developers, like:

  "Never overtly.  They're too clever for that."

and:

  "Unlike the perf developers, we *do* have to maintain backwards
   compatibility."

? They were untrue, uncalled for, unfair and outright mean-spirited.

Thanks,

	Ingo
Ingo Molnar Nov. 8, 2011, 12:15 p.m. UTC | #103
* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> [...] There's an easy fix for this too: improve "perf test" to 
> cover the cases you're interested in. While ABI spec would be a nice 
> addition, it's not going to make compatibility problems magically 
> go away.

Yes, exactly - 'perf test' has been written with that exact purpose. 
In practice 'perf' will cover almost all parts of the ABI.

The one notable thing that isn't being tested in a natural way is the 
'group of events' abstraction - which, ironically, has been added on 
the perfmon guys' insistence. No app beyond the PAPI self-test makes 
actual use of it though, which results in an obvious lack of testing.

Vince: the code is in tools/perf/builtin-test.c and our offer still 
stands, feel free to extend it. Maybe there's some other volunteer 
willing to do that?

Thanks,

	Ingo
Peter Zijlstra Nov. 8, 2011, 12:20 p.m. UTC | #104
On Tue, 2011-11-08 at 13:15 +0100, Ingo Molnar wrote:
> 
> The one notable thing that isn't being tested in a natural way is the 
> 'group of events' abstraction - which, ironically, has been added on 
> the perfmon guys' insistence. No app beyond the PAPI self-test makes 
> actual use of it though, which results in an obvious lack of testing. 

Also the self monitor stuff, perf-tool doesn't use that for obvious
reasons.
Ingo Molnar Nov. 8, 2011, 12:55 p.m. UTC | #105
* Theodore Tso <tytso@MIT.EDU> wrote:

> 
> On Nov 8, 2011, at 4:32 AM, Ingo Molnar wrote:
> > 
> > No ifs and when about it, these are the plain facts:
> > 
> > - Better features, better ABIs: perf maintainers can enforce clean, 
> >   functional and usable tooling support *before* committing to an 
> >   ABI on the kernel side.
> 
> "We don't have to be careful about breaking interface compatibility 
> while we are developing new features".

See my other mail titled:

	[F.A.Q.] perf ABI backwards and forwards compatibility

the compatibility process works surprisingly well, given the 
complexity and the flux of changes.

From the experience i have with other ABI and feature extension 
efforts, perf ABI compatibility works comparably better, because the 
changes always go together so people can review and notice any ABI 
problems a lot easier than with an artificially fragmented 
tooling/kernel maintenance setup.

I guess you can do well with a split project as well - my main claim 
is that good compatibility comes *naturally* with integration.

Btw., this might explain why iOS and Android are surprisingly 
compatible as well, despite the huge complexity and the huge flux of 
changes on both platforms - versus modular approaches like Windows or 
Linux distros.

> The flip side of this is that it's not obvious when an interface is 
> stable, and when it is still subject to change. [...]

... actual results seem to belie that expectation, right?

> [...]  It makes life much harder for any userspace code that 
> doesn't live in the kernel. [...]

So *that* is the real argument? As long as compatibility is good, i 
don't see why that should be the case.

Did you consider it a possibility that out of tree projects that have 
deep ties to the kernel technically seem to be at a relative 
disadvantage to in-kernel projects because separation is technically 
costly with the costs of separation being larger than the advantages 
of separation?

> [...] And I think we do agree that moving all of userspace into a 
> single git tree makes no sense, right?

I'm inclined to agree that applications that have no connection and 
affinity to the kernel (technically or socially) should not live in 
the kernel repo. (In fact i argue that they should be sandboxed but 
that's another topic.)

But note that there are several OS projects that succeeded doing the 
equivalent of a 'whole world' single Git repo, so i don't think we 
have the basis to claim that it *cannot* work.

> > - We have a shared Git tree with unified, visible version control. I
> >   can see kernel feature commits followed by tooling support, in a
> >   single flow of related commits:
> > 
> >      perf probe: Update perf-probe document
> >      perf probe: Support --del option
> >      trace-kprobe: Support delete probe syntax
> > 
> >   With two separate Git repositories this kind of connection between
> >   the tool and the kernel is inevitably weakened or lost.
> 
> "We don't have to clearly document new interfaces between kernel 
> and userspace, and instead rely on git commit order for people to 
> figure out what's going on with some new interface"

It does not prevent the creation of documentation at all - but i 
argue that the actual *working commits* are more valuable information 
than the documentation.

That inevitably leads to the conclusion that you cannot destroy the 
more valuable information just to artificially promote the creation 
of the less valuable piece of information, right?

> > - Easier development, easier testing: if you work on a kernel 
> >   feature and on matching tooling support then it's *much* easier to
> >   work in a single tree than working in two or more trees in 
> >   parallel. I have worked on multi-tree features before, and except
> >   special exceptions they are generally a big pain to develop.
> 
> I've developed in the split tree systems, and it's really not that 
> hard.  It does mean you have to be explicit about designing 
> interfaces up front, and then you have to have a good, robust way 
> of negotiating what features are in the kernel, and what features 
> are supposed by the userspace --- but if you don't do that then 
> having good backwards and forwards compatibility between different 
> versions of the tool simply doesn't exist.

I actually think that ext4 is a good example of ABI design - and we 
borrowed heavily from that positive experience in the perf.data 
handling code.

But i also worked in other projects where the split design worked a 
lot less smoothly, and arguably ext4 would be *dead* if it had a 
messy interface design: a persistent filesystem cannot under any 
circumstance be messy to survive in the long run.

Other ABIs, not so much, and we are hurting from that.

> So at the end of the day it question is whether you want to be able 
> to (for example) update e2fsck to get better ability to fix more 
> file system corruptions, without needing to upgrade the kernel.  If 
> you want to be able to use a newer, better e2fsck with an older, 
> enterprise kernel, then you have use certain programming 
> disciplines.  That's where the work is, not in whether you have to 
> maintain two git trees or a single git tree.

I demonstrated how this actually works with perf (albeit the 
compatibility requirements are a lot less severe on perf than with a 
persistent, on-disk filesystem), do you accept that example as proof?


> > - We are using and enforcing established quality control and 
> >   coding principles of the kernel project. If we mess up then 
> >   Linus pushes back on us at the last line of defense - and has 
> >   pushed back on us in the past. I think many of the currently 
> >   external kernel utilities could benefit from the resulting rise 
> >   in quality. I've seen separate tool projects degrade into 
> >   barely usable tinkerware - that i think cannot happen to perf, 
> >   regardless of who maintains it in the future.
>
> That's basically saying that if you don't have someone competent 
> managing the git tree and providing quality assurance, life gets 
> hard. [...]

No, it says that we want to *guarantee* that someone competent is 
maintaining it. If me, Peter and Arnaldo get hit by the same bus or 
crash with the same airplane then i'm pretty confident that life 
will go on just fine and capable people will pick it up.

With an external project i wouldn't be nearly as sure about that - it 
could be abandonware or could degrade into tinkerware.

Working in groups and structuring that way and relying on the 
infrastructure of a large project is an *advantage* of Linux, why 
should this surprise *you* of all people, hm? :-)


> [...] Sure.  But at the same time, does it scale to move all of 
> userspace under one git tree and depending on Linus to push back?

We don't depend on Linus for every single commit, that would be silly 
and it would not scale.

We depend on Linus depending on someone who depends on someone else 
who depends on someone else. 3 people along that chain would have to 
make the same bad mistake for crap to get to Linus and while it 
happens, we try to keep it as rare as humanly possible.

> I mean, it would have been nice to move all of GNOME 3 under the 
> Linux kernel, so Linus could have pushed back on behalf of all of 
> us power users, [...]

You are starting to make sense ;-)

> [...] but as much as many of us would have appreciated someone 
> being able to push back against the insanity which is the GNOME 
> design process, is that really a good enough excuse to move all of 
> GNOME 3 into the kernel source tree?  :-)

Why not? </joking>

Seriously, if someone gave me a tools/term/ tool that has rudimentary 
xterm functionality with tabbing support, written in pure libdri and 
starting off a basic fbcon console and taking over the full screen, 
i'd switch to it within about 0.5 nanoseconds and would do most of my 
daily coding there and would help out with extending it to more apps 
(starting with a sane mail client perhaps).

I'd not expect the Gnome people to move there against their own good 
judgement - i have no right to do that. (Nor do i think it would be 
possible technically and socially: the culture friction between those 
projects is way too large IMO so it's one of the clear
'HELL NO!' cases for integration.)

But why do you have to think in absolutes and extremes all the time? 
Why not exercise some good case by case judgement about the merits 
of integration versus separation?

> > - Better debuggability: sometimes a combination of a perf
> >   change in combination with a kernel change causes a breakage. I
> >   have bisected the shared tree a couple of times already, instead
> >   of having to bisect a (100,000 commits x 10,000 commits) combined
> >   space which much harder to debug …
> 
> What you are describing happens when someone hasn't been careful 
> about their kernel/userspace interfaces.

What i'm describing is what happens when there are complex bugs that 
interact in unforeseen ways.

> If you have been rigorous with your interfaces, this isn't really 
> an issue.  When's the last time we've had to do a NxM exhaustive 
> testing to find a broken sys call ABI between (for example) the 
> kernel and MySQL?

MySQL relies very little on complex kernel facilities.

perf on the other hand uses a very complex interface to the kernel 
and extracts way more structured information from the kernel than 
MySQL does.

That's where the whole "is a tool deeply related to the kernel or 
not" judgement call starts mattering.

Also, i think we have a very clear example of split projects *NOT* 
working very well when it comes to the NxMxO testing matrix: the whole 
graphics stack ...

You *really* need to acknowledge those very real complications and 
uglies as well when you argue in favor of separation ...

> > - Code reuse: we can and do share source code between the kernel 
> >   and the tool where it makes sense. Both the tooling and the 
> >   kernel side code improves from this. (Often explicit 
> >   librarization makes little sense due to the additional 
> >   maintenance overhead of a split library project and the 
> >   impossibly long latency of how the kernel can rely on the ready 
> >   existence of such a newly created library project.)
> 
> How much significant code really can get shared? [...]

It's relatively minor right now, but there are possibilities:

> [...] Memory allocation is different between kernel and userspace 
> code, how you do I/O is different, error reporting conventions are 
> generally different, etc.  You might have some serialization and 
> deserialization code which is in common, but (surprise!) that's 
> generally part of your interface, which is hopefully relatively 
> stable especially once the tool and the interface has matured.

The KVM tool would like to utilize lockdep for example, to cover 
user-space locks as well. It already uses the semantics of the kernel 
locking primitives:

disk/qcow.c:    mutex_lock(&q->mutex);
disk/qcow.c:            mutex_unlock(&q->mutex);
disk/qcow.c:            mutex_unlock(&q->mutex);
disk/qcow.c:    mutex_unlock(&q->mutex);
disk/qcow.c:    mutex_unlock(&q->mutex);
disk/qcow.c:    mutex_lock(&q->mutex);
disk/qcow.c:            mutex_unlock(&q->mutex);
disk/qcow.c:            mutex_unlock(&q->mutex);
disk/qcow.c:    mutex_unlock(&q->mutex);
disk/qcow.c:    mutex_unlock(&q->mutex);
disk/qcow.c:    mutex_lock(&q->mutex);

... and lockdep would certainly make sense for this type of 
"user-space that emulates hardware" while i don't think we'd ever 
want to go to the overhead of outright librarizing lockdep in an 
external way.
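
To illustrate what "kernel locking semantics in user space" can look 
like, here is a minimal sketch, assuming pthreads underneath. The 
struct layout, the abort-on-error policy and the locked_increment() 
helper are assumptions made for this example, not a copy of the KVM 
tool's actual code:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the kernel's struct mutex. */
struct mutex {
	pthread_mutex_t lock;
};

#define MUTEX_INITIALIZER { .lock = PTHREAD_MUTEX_INITIALIZER }

/* As in the kernel API, locking returns void: failure is treated as fatal. */
static void mutex_lock(struct mutex *m)
{
	if (pthread_mutex_lock(&m->lock) != 0) {
		fprintf(stderr, "mutex_lock failed\n");
		abort();
	}
}

static void mutex_unlock(struct mutex *m)
{
	if (pthread_mutex_unlock(&m->lock) != 0) {
		fprintf(stderr, "mutex_unlock failed\n");
		abort();
	}
}

/* Hypothetical call site: a counter protected by the mutex. */
struct counter {
	struct mutex mutex;
	long value;
};

static long locked_increment(struct counter *c)
{
	long v;

	mutex_lock(&c->mutex);
	v = ++c->value;
	mutex_unlock(&c->mutex);
	return v;
}
```

With wrappers of this shape the call sites keep the kernel idiom, and 
a lock validator would only have to hook the two wrapper functions.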

Thanks,

	Ingo
Arnaldo Carvalho de Melo Nov. 8, 2011, 12:56 p.m. UTC | #106
Em Tue, Nov 08, 2011 at 05:21:50AM -0500, Theodore Tso escreveu:
> 
> On Nov 8, 2011, at 4:32 AM, Ingo Molnar wrote:
> > 
> > No ifs and when about it, these are the plain facts:
> > 
> > - Better features, better ABIs: perf maintainers can enforce clean, 
> >   functional and usable tooling support *before* committing to an 
> >   ABI on the kernel side.

> "We don't have to be careful about breaking interface compatibility
> while we are developing new features".

My normal working environment is an MRG PREEMPT_RT kernel (2.6.33.9,
test kernels based on 3.0+) running on enterprise distros while I
develop the userspace part.

So no, at least for me, I don't keep updating the kernel part while
developing userspace.
 
> The flip side of this is that it's not obvious when an interface is
> stable, and when it is still subject to change.  It makes life much
> harder for any userspace code that doesn't live in the kernel.   And I
> think we do agree that moving all of userspace into a single git tree
> makes no sense, right?

Right, but that is the extreme as well, right?
 
> > - We have a shared Git tree with unified, visible version control. I
> >   can see kernel feature commits followed by tooling support, in a
> >   single flow of related commits:
> > 
> >      perf probe: Update perf-probe document
> >      perf probe: Support --del option
> >      trace-kprobe: Support delete probe syntax
> > 
> >   With two separate Git repositories this kind of connection between
> >   the tool and the kernel is inevitably weakened or lost.
 
> "We don't have to clearly document new interfaces between kernel and
> userspace, and instead rely on git commit order for people to figure
> out what's going on with some new interface"

Indeed, documentation is lacking, I think coming from a kernel
standpoint I relied too much on the "documentation is source code"
mantra of old days.

But I realize it's a necessity and also that regression testing is
another necessity as well.

I introduced 'perf test' for this latter need and rejoice every time
people submit new test cases, like Jiri and Han did in the past, it's
just that we need more of both, documentation and regression testing.

Unfortunately that is not so sexy and I have my hands full not just with
perf :-\
 
> > - Easier development, easier testing: if you work on a kernel 
> >   feature and on matching tooling support then it's *much* easier to
> >   work in a single tree than working in two or more trees in 
> >   parallel. I have worked on multi-tree features before, and except
> >   special exceptions they are generally a big pain to develop.
 
> I've developed in the split tree systems, and it's really not that
> hard.  It does mean you have to be explicit about designing interfaces
> up front, and then you have to have a good, robust way of negotiating
> what features are in the kernel, and what features are supposed by the
> userspace --- but if you don't do that then having good backwards and
> forwards compatibility between different versions of the tool simply
> doesn't exist.
 
> So at the end of the day it question is whether you want to be able to
> (for example) update e2fsck to get better ability to fix more file
> system corruptions, without needing to upgrade the kernel.   If you
> want to be able to use a newer, better e2fsck with an older,
> enterprise kernel, then you have use certain programming disciplines.
> That's where the work is, not in whether you have to maintain two git
> trees or a single git tree.

But it can as well be achieved with a single tree, or do you think
having a single tree makes that impossible to achieve? As I said I do
development basically using the split model at least for testing new
tools on older kernels.

People using the tools while developing mostly the kernel or both
kperf/uperf components do the test on the combined kernel + perf
sources.
 
> > - We are using and enforcing established quality control and coding
> >   principles of the kernel project. If we mess up then Linus pushes
> >   back on us at the last line of defense - and has pushed back on us
> >   in the past. I think many of the currently external kernel
> >   utilities could benefit from the resulting rise in quality.
> >   I've seen separate tool projects degrade into barely usable
> >   tinkerware - that i think cannot happen to perf, regardless of who
> >   maintains it in the future.
 
> That's basically saying that if you don't have someone competent
> managing the git tree and providing quality assurance, life gets hard.
> Sure.   But at the same time, does it scale to move all of userspace
> under one git tree and depending on Linus to push back? 

8 or 80 again :-\
 
> I mean, it would have been nice to move all of GNOME 3 under the Linux
> kernel, so Linus could have pushed back on behalf of all of us power

Sheesh, all of gnome? How closely related and used in kernel development
is gnome? gnome 3?

> users, but as much as many of us would have appreciated someone being
> able to push back against the insanity which is the GNOME design
> process, is that really a good enough excuse to move all of GNOME 3
> into the kernel source tree?   :-)

No, but again, you're taking it to the extreme.

- Arnaldo
Ingo Molnar Nov. 8, 2011, 12:59 p.m. UTC | #107
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2011-11-08 at 13:15 +0100, Ingo Molnar wrote:
> > 
> > The one notable thing that isnt being tested in a natural way is 
> > the 'group of events' abstraction - which, ironically, has been 
> > added on the perfmon guys' insistence. No app beyond the PAPI 
> > self-test makes actual use of it though, which results in an 
> > obvious lack of testing.
> 
> Also the self monitor stuff, perf-tool doesn't use that for obvious 
> reasons.

Indeed, and that's PAPI's strong point.

We could try to utilize it via some clever LD_PRELOAD trickery?

Adding a testcase for every bug that can be triggered via tooling 
would definitely be an improvement as well - those kinds of testcases 
generally tend to map out the really important bits faster than an 
attempt at exhaustive testing.

Thanks,

	Ingo
Arnaldo Carvalho de Melo Nov. 8, 2011, 1:08 p.m. UTC | #108
Em Tue, Nov 08, 2011 at 01:07:55PM +0100, Ingo Molnar escreveu:
> * Vince Weaver <vince@deater.net> wrote:
> > as mentioned before I have my own perf_event test suite with 20+ tests.
> >   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html
 
> That should probably be moved into perf test. Arnaldo, any 
> objections?

I'd gladly take patches, I even have it on my TODO list to volunteer
time to do that at some point.

If somebody other than me or Vince wants to do that... Assuming there is
no licensing problem and Vince doesn't object to that being done.

I know that at least the QE team at Red Hat uses it and I hope other QE
teams do too.

- Arnaldo
Karel Zak Nov. 8, 2011, 1:29 p.m. UTC | #109
On Mon, Nov 07, 2011 at 03:12:28PM +0200, Pekka Enberg wrote:
> On Mon, Nov 7, 2011 at 2:47 PM, Ted Ts'o <tytso@mit.edu> wrote:
> > I don't think perf should be used as a precendent that now argues that
> > any new kernel utility should be moved into the kernel sources.  Does
> > it make sense to move all of mount, fsck, login, etc., into the kernel
> > sources?  There are far more kernel tools outside of the kernel
> > sources than inside the kernel sources.

[...]

> I don't know if it makes sense to merge the tools you've mentioned above.
> My gut feeling is that it's probably not reasonable - there's already a
> community working on it with their own development process and coding
> style. I don't think there's a simple answer to this but I don't agree with
> your rather extreme position that all userspace tools should be kept out
> of the kernel tree.

Ted's position is not extreme. He follows the simple and exactly defined
border between userspace and kernel. The native userspace feature is
variability and substitutability.

The util-linux package is a really nice example:

  - you don't have to use it, you can use busybox

  - we currently have three implementations of login(1), many getty 
    implementations, etc.

  - it's normal that people use the latest util-linux releases with very 
    old kernels (in 2008 I had a report from a person with kernel 2.4 :-)

  - userspace is very often about portability -- it's crazy, but some people
    use some utils from util-linux on Hurd, Solaris and BSD (including very
    Linux specific things like mkswap and hwclock)


Anyway, I agree that small one-man projects are ineffective for
important system tools -- it's usually better to merge things into
large projects with reliable infrastructure and an active community (here
I agree with Lennart's idea to have 3-5 projects for whole low-level
userspace). 

    Karel
Gerd Hoffmann Nov. 8, 2011, 1:40 p.m. UTC | #110
Hi,

> Indeed, documentation is lacking, I think coming from a kernel
> standpoint I relied too much in the "documentation is source code"
> mantra of old days.

Sorry for the shameless plug, but as you are speaking of lacking
documentation:  Where the heck is the perf config file documented, other
than source code?  Reading the parser to figure how the config file is
supposed to look like really isn't fun :(

I'm looking for a way to disable the colors in the perf report tui.  Or
configure them into something readable.  No, light green on light gray
which is used by default isn't readable.

thanks,
  Gerd
Pekka Enberg Nov. 8, 2011, 2:30 p.m. UTC | #111
On Tue, Nov 8, 2011 at 3:29 PM, Karel Zak <kzak@redhat.com> wrote:
>> I don't know if it makes sense to merge the tools you've mentioned above.
>> My gut feeling is that it's probably not reasonable - there's already a
>> community working on it with their own development process and coding
>> style. I don't think there's a simple answer to this but I don't agree with
>> your rather extreme position that all userspace tools should be kept out
>> of the kernel tree.
>
> Ted's position is not extreme. He follows the simple and exactly defined
> border between userspace and kernel. The native userspace feature is
> variability and substitutability.

It's an extreme position because he's arguing that we should only have
kernel code in the tree or we need to open up to all userspace code.

                        Pekka
Arnaldo Carvalho de Melo Nov. 8, 2011, 2:32 p.m. UTC | #112
Em Tue, Nov 08, 2011 at 02:40:42PM +0100, Gerd Hoffmann escreveu:
> > Indeed, documentation is lacking, I think coming from a kernel
> > standpoint I relied too much in the "documentation is source code"
> > mantra of old days.
 
> Sorry for the shameless plug, but as you are speaking of lacking

Thank you! It's easier when I get the questions for specific problems in
the documentation :-)

> documentation:  Where the heck is the perf config file documented, other
> than source code?  Reading the parser to figure how the config file is
> supposed to look like really isn't fun :(
 
> I'm looking for a way to disable the colors in the perf report tui.  Or
> configure them into something readable.  No, light green on light gray
> which is used by default isn't readable.

That was fixed in 3.2-rc1, where we also have:

[acme@felicio linux]$ cat tools/perf/Documentation/perfconfig.example
[colors]

	# These were the old defaults
	top = red, lightgray
	medium = green, lightgray
	normal = black, lightgray
	selected = lightgray, magenta
	code = blue, lightgray

[tui]

	# Defaults if linked with libslang
	report = on
	annotate = on
	top = on

[buildid]

	# Default, disable using /dev/null
	dir = /root/.debug
[acme@felicio linux]$

So you can use:

[tui]

	report = off

To disable the TUI altogether or use:

$ perf report --stdio

Or tweak the colors to your liking.

By default the TUI now uses whatever color is configured for your xterm,
not something fixed as in the past, which was a common source of
complaints that, unfortunately, I only heard indirectly :-\

Ah, if you still need to configure the colors, use "default" so that it
will use whatever is the color configured in your
xterm/gnome-terminal/whatever profile.

For reference, the default set of colors now is (from
tools/perf/util/ui/browser.c):

static struct ui_browser__colorset {
        const char *name, *fg, *bg;
        int colorset;
} ui_browser__colorsets[] = {
        {
                .colorset = HE_COLORSET_TOP,
                .name     = "top",
                .fg       = "red",
                .bg       = "default",
        },
        {
                .colorset = HE_COLORSET_MEDIUM,
                .name     = "medium",
                .fg       = "green",
                .bg       = "default",
        },
        {
                .colorset = HE_COLORSET_NORMAL,
                .name     = "normal",
                .fg       = "default",
                .bg       = "default",
        },
        {
                .colorset = HE_COLORSET_SELECTED,
                .name     = "selected",
                .fg       = "black",
                .bg       = "lightgray",
        },
        {
                .colorset = HE_COLORSET_CODE,
                .name     = "code",
                .fg       = "blue",
                .bg       = "default",
        },

It should all be fixed up now, together with many other improvements
that should make the TUI and stdio default user experience similar up
till you start using the navigation keys to do things that only are
possible with a TUI, like folding/unfolding callchains, etc.

Please let me know about any other problem you may find with it!

- Arnaldo
Avi Kivity Nov. 8, 2011, 2:41 p.m. UTC | #113
On 11/06/2011 03:35 AM, Alexander Graf wrote:
> To quickly get going, just execute the following as user:
>
>     $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash
>
> This will drop you into a shell on your rootfs.
>

Doesn't work on Fedora 15.  F15's qemu-kvm doesn't have -machine or
-virtfs.  Even qemu.git on F15 won't build virtfs since xattr.h
detection is broken (patch posted).
Christoph Hellwig Nov. 8, 2011, 2:52 p.m. UTC | #114
On Tue, Nov 08, 2011 at 04:41:40PM +0200, Avi Kivity wrote:
> On 11/06/2011 03:35 AM, Alexander Graf wrote:
> > To quickly get going, just execute the following as user:
> >
> >     $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash
> >
> > This will drop you into a shell on your rootfs.
> >
> 
> Doesn't work on Fedora 15.  F15's qemu-kvm doesn't have -machine or
> -virtfs.  Even qemu.git on F15 won't build virtfs since xattr.h
> detection is broken (patch posted).

Nevermind that running virtfs as a rootfs is a really dumb idea.  You
do not want to run a VM that has a rootfs that gets changed all the
time behind your back.

Running qemu -snapshot on the actual root block device is the only
safe way to reuse the host installation, although it gets a bit
complicated if people have multiple devices mounted into the namespace.
Sasha Levin Nov. 8, 2011, 2:55 p.m. UTC | #115
On Tue, Nov 8, 2011 at 4:52 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Nov 08, 2011 at 04:41:40PM +0200, Avi Kivity wrote:
>> On 11/06/2011 03:35 AM, Alexander Graf wrote:
>> > To quickly get going, just execute the following as user:
>> >
>> >     $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash
>> >
>> > This will drop you into a shell on your rootfs.
>> >
>>
>> Doesn't work on Fedora 15.  F15's qemu-kvm doesn't have -machine or
>> -virtfs.  Even qemu.git on F15 won't build virtfs since xattr.h
>> detection is broken (patch posted).
>
> Nevermind that running virtfs as a rootfs is a really dumb idea.  You
> do now want to run a VM that has a rootfs that gets changed all the
> time behind your back.
>
> Running qemu -snapshot on the actual root block device is the only
> safe way to reuse the host installation, although it gets a bit
> complicated if people have multiple devices mounted into the namespace.

Using block devices also requires root.
Avi Kivity Nov. 8, 2011, 2:57 p.m. UTC | #116
On 11/08/2011 04:52 PM, Christoph Hellwig wrote:
> On Tue, Nov 08, 2011 at 04:41:40PM +0200, Avi Kivity wrote:
> > On 11/06/2011 03:35 AM, Alexander Graf wrote:
> > > To quickly get going, just execute the following as user:
> > >
> > >     $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash
> > >
> > > This will drop you into a shell on your rootfs.
> > >
> > 
> > Doesn't work on Fedora 15.  F15's qemu-kvm doesn't have -machine or
> > -virtfs.  Even qemu.git on F15 won't build virtfs since xattr.h
> > detection is broken (patch posted).
>
> Nevermind that running virtfs as a rootfs is a really dumb idea.  You
> do now want to run a VM that has a rootfs that gets changed all the
> time behind your back.

True.

> Running qemu -snapshot on the actual root block device is the only
> safe way to reuse the host installation, although it gets a bit
> complicated if people have multiple devices mounted into the namespace.

How is -snapshot any different?  If the host writes a block after the
guest has been launched, but before that block was cowed, then the guest
will see the new block.

It could work with a btrfs snapshot, but not everyone uses that.
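
To make the race concrete, here is a toy model of a copy-on-write
overlay, assuming that reads of blocks the guest has not written yet
fall through to the live backing device. All names are made up for
illustration; this is not QEMU code:

```c
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4

/* Toy model: 'backing' is the host's live block device, one byte per block. */
struct cow_disk {
	char *backing;          /* shared with the host, still being written */
	char  overlay[NBLOCKS]; /* guest-private copies */
	bool  cowed[NBLOCKS];   /* true once the guest has written the block */
};

static void cow_init(struct cow_disk *d, char *backing)
{
	memset(d, 0, sizeof(*d));
	d->backing = backing;
}

/* Guest write: goes to the overlay only, the backing device is untouched. */
static void guest_write(struct cow_disk *d, int blk, char data)
{
	d->overlay[blk] = data;
	d->cowed[blk] = true;
}

/* Guest read: overlay if cowed, otherwise whatever the host has *now*. */
static char guest_read(const struct cow_disk *d, int blk)
{
	return d->cowed[blk] ? d->overlay[blk] : d->backing[blk];
}
```

In this model, a host write to a block the guest has already written
stays invisible to the guest, but a host write to any other block
shows up on the guest's next read of it, so the guest does not get a
stable launch-time view.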
Christoph Hellwig Nov. 8, 2011, 2:59 p.m. UTC | #117
On Tue, Nov 08, 2011 at 04:57:04PM +0200, Avi Kivity wrote:
> > Running qemu -snapshot on the actual root block device is the only
> > safe way to reuse the host installation, although it gets a bit
> > complicated if people have multiple devices mounted into the namespace.
> 
> How is -snapshot any different?  If the host writes a block after the
> guest has been launched, but before that block was cowed, then the guest
> will see the new block.

Right, thinko - qemu's snapshots are fairly useless due to sitting
on top of the file to be modified.

> It could work with a btrfs snapshot, but not everyone uses that.

Or LVM snapshot.  Either way, just reusing the root fs without care
is a dumb idea, and I really don't want any tool or script that
encourages such braindead behaviour in the kernel tree.
Jan Kiszka Nov. 8, 2011, 3:04 p.m. UTC | #118
On 2011-11-08 15:52, Christoph Hellwig wrote:
> On Tue, Nov 08, 2011 at 04:41:40PM +0200, Avi Kivity wrote:
>> On 11/06/2011 03:35 AM, Alexander Graf wrote:
>>> To quickly get going, just execute the following as user:
>>>
>>>     $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash
>>>
>>> This will drop you into a shell on your rootfs.
>>>
>>
>> Doesn't work on Fedora 15.  F15's qemu-kvm doesn't have -machine or
>> -virtfs.  Even qemu.git on F15 won't build virtfs since xattr.h
>> detection is broken (patch posted).
> 
> Nevermind that running virtfs as a rootfs is a really dumb idea.  You
> do now want to run a VM that has a rootfs that gets changed all the
> time behind your back.
> 
> Running qemu -snapshot on the actual root block device is the only
> safe way to reuse the host installation, although it gets a bit
> complicated if people have multiple devices mounted into the namespace.

I thought about this while hacking a slide on this topic: It's clumsy
(compared to -snapshot - my favorite one as well), but you could use
some snapshot on the host fs. Or a union fs (if we had an official one)
with the write layer directed to some tmpfs area.

But what we likely rather want (as it would work without privileges) is
built-in write redirection for virtfs. Not an expert on this, but I
guess that will have to solve the same problems an in-kernel union fs
solution faces, no?

Jan
Pekka Enberg Nov. 8, 2011, 3:26 p.m. UTC | #119
On Tue, Nov 8, 2011 at 4:52 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Nevermind that running virtfs as a rootfs is a really dumb idea.  You
> do now want to run a VM that has a rootfs that gets changed all the
> time behind your back.

It's rootfs binaries that are shared, not configuration. It's
unfortunate but works OK for the single user use case it's meant for.
It's obviously not a proper solution for the generic case. We were
hoping that we could use something like overlayfs to hide the issue
under the rug. Do you think that's also a really dumb thing to do?

Using block device snapshotting would be interesting and we should
definitely look into that.

                                Pekka
Christoph Hellwig Nov. 8, 2011, 3:28 p.m. UTC | #120
On Tue, Nov 08, 2011 at 05:26:03PM +0200, Pekka Enberg wrote:
> On Tue, Nov 8, 2011 at 4:52 PM, Christoph Hellwig <hch@infradead.org> wrote:
> > Nevermind that running virtfs as a rootfs is a really dumb idea.  You
> > do now want to run a VM that has a rootfs that gets changed all the
> > time behind your back.
> 
> It's rootfs binaries that are shared, not configuration. It's
> unfortunate but works OK for the single user use case it's meant for.
> It's obviously not a proper solution for the generic case. We were
> hoping that we could use something like overlayfs to hide the issue
> under the rug. Do you think that's also a really dumb thing to do?

It doesn't hide your issues.  Any kind of unioning will have massive
consistency issues (as in will corrupt your fs if you do stupid things)
if the underlying layer is allowed to be written to.  Thus all the
fuss about making sure the underlying fs can never be mounted writeable
in the union mount patches.
Gerd Hoffmann Nov. 8, 2011, 3:38 p.m. UTC | #121
Hi,

>> documentation:  Where the heck is the perf config file documented, other
>> than source code?  Reading the parser to figure how the config file is
>> supposed to look like really isn't fun :(
>  
>> I'm looking for a way to disable the colors in the perf report tui.  Or
>> configure them into something readable.  No, light green on light gray
>> which is used by default isn't readable.
> 
> That was fixed in 3.2-rc1, where also we have:

Very cutting edge.  /me pulls.

> [acme@felicio linux]$ cat tools/perf/Documentation/perfconfig.example

Present now, thanks.

> [colors]
> 
> 	# These were the old defaults
> 	top = red, lightgray
> 	medium = green, lightgray
> 	normal = black, lightgray
> 	selected = lightgray, magenta
> 	code = blue, lightgray

Seems to have no effect, guess the distro perf binary is too old for
that (RHEL-6).

> [tui]
> 
> 	report = off

That works.  I don't want to turn off the tui altogether though, I actually
like the interactive expanding+collapsing of the call graphs.  I just
want to turn off the colors.

perf_color_default_config() in util/color.c seems to look up a "color.ui"
config variable.  Can I set that somehow?  Tried ui= in a [color]
section -- no effect.

> By default the TUI now uses whatever color is configured for your xterm,
> not something fixed as in the past, which was a common source of
> complaints, that, unfortunately I only heard indirectly :-\
> 
> Ah, if you still need to configure the colors, use "default" so that it
> will use whatever is the color configured in your
> xterm/gnome-terminal/whatever profile.
> 
> For reference, the default set of colors now is (from
> tools/perf/util/ui/browser.c):
> 
> static struct ui_browser__colorset {
>         const char *name, *fg, *bg;
>         int colorset;
> } ui_browser__colorsets[] = {
>         {
>                 .colorset = HE_COLORSET_TOP,
>                 .name     = "top",
>                 .fg       = "red",
>                 .bg       = "default",

Bad idea IMO.  Setting only one of foreground+background gives pretty
much unpredictable results.  My xterms have different background colors,
the ones with a root shell happen to have a (dark) red background.
Which results in red-on-dark-red text.  Not good.

I'd strongly suggest either setting both background and foreground to
default or setting both to a specific color.  When doing the latter make
sure the colors have enough contrast so they are readable.

cheers,
  Gerd
Steven Rostedt Nov. 8, 2011, 3:43 p.m. UTC | #122
On Tue, Nov 08, 2011 at 10:32:25AM +0100, Ingo Molnar wrote:
> 
> None of the perf developers with whom i'm working complained about 
> the shared repo so far - publicly or privately. By all means they are 
> enjoying it and if you look at the stats and results you'll agree 
> that they are highly productive working in that environment.

Just because you brought it up.

I personally find it awkward to work in the linux tools directory. Maybe
this is the reason that I haven't been such a big contributor of perf. I
only pushed ktest into the kernel tools directory because people
convinced me to do so. Having it there didn't seem to bring in many
other developers. Only one other person has contributed to me, and that
was just some minor changes. I still find it awkward to work on ktest
inside the kernel. I have a separate tree just for ktest, and that means
I have all the kernel files sitting there doing nothing just to be able
to work on 2 files.

Then there's the issue of waiting for Linus to pull from me. I posted my
patch set on Oct 28th, and it didn't make it into the merge window. I
don't know if Linus had an issue with it, or it just got lost in the
noise, as Linus has a lot of other things to worry about. This brings up
another question. Does Linus scale? Having more tools in the kernel
repo requires Linus to pull from more sources. Or are we just going to
have to have a "tools" maintainer? This will give a lot of control to
that person who is the gate keeper of the tools directory.

Now I've kept trace-cmd and kernelshark outside the kernel tree. I've
received lots of patches from other developers for it and some nice new
features. It requires me to think hard to keep a nice ABI, and it has
been working nicely. The event parsing is working well and there's even
a library. But I haven't pushed it too hard because I want this to apply
to perf as well. But due to disagreements of where in the kernel tree it
belongs, it has been over a year with no progress. Now we waste 4 bytes
for every event recording a non existent big kernel lock counter. For
recording a million events (which is actually low) that's 4Megs of
wasted kernel memory. New tracepoints are going into the kernel all the
time, and without a library, we are increasing the chance that more
tools will break on changes, and tracepoints will lock down kernel
innovation soon if something is not done.

Anyway, I'm having surgery tomorrow and have other things to work on.

-- Steve
Arnaldo Carvalho de Melo Nov. 8, 2011, 4:13 p.m. UTC | #123
Em Tue, Nov 08, 2011 at 04:38:48PM +0100, Gerd Hoffmann escreveu:
> Seems to have no effect, guess the distro perf is too old (RHEL-6).

> > [tui]
> > 	report = off

> That works.  I don't want to turn off the tui altogether though, I actually
> like the interactive expanding+collapsing of the call graphs.  I just
> want to turn off the colors.

> perf_color_default_config() in util/color.c seems to look up a "color.ui"
> config variable.  Can I set that somehow?  Tried ui= in a [color]
> section -- no effect.

Ouch, that came from the code initially stolen^Wcopied from git :-\

I don't think that will have any effect :-\
 
> > Ah, if you still need to configure the colors, use "default" so that it
> > will use whatever is the color configured in your
> > xterm/gnome-terminal/whatever profile.

> > For reference, the default set of colors now is:

> >                 .colorset = HE_COLORSET_TOP,
> >                 .name     = "top",
> >                 .fg       = "red",
> >                 .bg       = "default",

> Bad idea IMO.  Setting only one of foreground+background gives pretty
> much unpredictable results.  My xterms have different background colors,
> the ones with a root shell happen to have a (dark) red background.
> Which results in red-on-dark-red text.  Not good.

> I'd strongly suggest to either set both background and foreground to
> default or to set both to a specific color.  When doing the latter make

That is the case for the normal one, two colorsets below the
HE_COLORSET_TOP one.

Humm, certainly there could be logic to figure it out if background ==
foreground and do something about it.

> sure the colors have enough contrast so they are readable.

Problem is figuring out something that is considered a good default :-\
There will always be somebody that will complain.

When doing the coding to allow using the default xterm colors I tried
several of the gnome-terminal xterm profiles and all looked kinda sane
for the "top" (hottest functions, with most hits) and "medium" lines,
where we combine some chosen foreground color ("red" and "green").

Laziest solution would be: If the user customizes that much, could the
user please customize this as well? :-)

- Arnaldo
Theodore Ts'o Nov. 8, 2011, 4:33 p.m. UTC | #124
On Tue, Nov 08, 2011 at 01:55:09PM +0100, Ingo Molnar wrote:
> I guess you can do well with a split project as well - my main claim 
> is that good compatibility comes *naturally* with integration.

Here I have to disagree; my main worry is that integration makes it
*naturally* easy for people to skip the hard work needed to keep a
stable kernel/userspace interface.

The other worry which I've mentioned, but which I haven't seen
addressed, is that even if you can use a perf from a newer kernel
with an older kernel, this causes distributions a huge amount of pain,
since they have to package two different kernel source packages, and
only compile perf from the newer kernel source package.  This leads to
all sorts of confusion from a distribution packaging point of view.

For example, assume that RHEL 5, which is using 2.6.32 or something
like that, wants to use a newer e2fsck that does a better job fixing
file system corruptions.  If it were bundled with the kernel, then
they would have to package up the v3.1 kernel sources, and have a
source RPM that isn't used for building kernel sources, but just to
build a newer version of e2fsck.  Fortunately, they don't have to do
that.  They just pull down a newer version of e2fsprogs, and package,
build, test, and ship that.

In addition, suppose Red Hat ships a security bug fix which means a
new kernel-image RPM has to be shipped.  Does that mean that Red Hat
has to ship new binary RPM's for any and all tools/* programs that
they have packaged as separate RPM's?  Or should installing a new
kernel RPM also imply dropping new binaries in /usr/bin/perf, et al.?
There are all sorts of packaging questions that are raised by
integration, and from where I sit I don't think they've been
adequately solved yet.


> Did you consider it a possibility that out of tree projects that have 
> deep ties to the kernel technically seem to be at a relative 
> disadvantage to in-kernel projects because separation is technically 
> costly with the costs of separation being larger than the advantages 
> of separation?

As the e2fsprogs developer, I live with the costs all the time; I can
testify to the fact that they are very slight.  Occasionally I have to
make parallel changes to fs/ext4/ext4.h in the kernel and
lib/ext2fs/ext2fs.h in e2fsprogs, and we use various different
techniques to detect whether the ext4 kernel code supports a
particular feature (we use the presence or absence of some sysfs
files), but it's really not been hard for us.
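
The sysfs-based detection mentioned above can be sketched roughly as
follows.  This is only a minimal illustration, not the actual e2fsprogs
code: the helper name kernel_has_feature() is hypothetical, and the path
/sys/fs/ext4/features/lazy_itable_init is used purely as an assumed
example of such a file.

```c
#include <unistd.h>

/*
 * Hypothetical helper: report whether the running kernel advertises a
 * feature by checking for the presence of a sysfs file.
 * Returns 1 if the file exists, 0 otherwise.
 */
static int kernel_has_feature(const char *sysfs_path)
{
	/* F_OK only tests for existence, not for read permission. */
	return access(sysfs_path, F_OK) == 0;
}
```

A userspace tool could call
kernel_has_feature("/sys/fs/ext4/features/lazy_itable_init") once at
startup and fall back to the older behaviour when it returns 0, so the
same binary keeps working on both old and new kernels.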

> But note that there are several OS projects that succeeded doing the 
> equivalent of a 'whole world' single Git repo, so i don't think we 
> have the basis to claim that it *cannot* work.

There have indeed, and there has been speculation that this was one of many
contributions to why they lost out in the popularity and adoption
competition with Linux.  (Specifically, the reasoning goes that the
need to package up the kernel plus userspace meant that we had
distributions in the Linux ecosystem, and the competition kept
everyone honest.  If one distribution started making insane decisions,
whether it's forcing Unity on everyone, or forcing GNOME 3 on
everyone, it's always possible to switch to another distribution.  The
*BSD systems didn't have that safety valve....)


> But why do you have to think in absolutes and extremes all the time? 
> Why not exercise some good case by case judgement about the merits 
> of integration versus separation?

I agree that there are tradeoffs to both approaches, and I agree that
case by case judgement is something that should be done.  One of the
reasons why I've spent a lot of time pointing out the downsides of
integration and the shortcomings in the integration position is that
I've seen advocates claiming that the fact that perf was
integrated was a precedent that meant that choice for kvm-tool was
something that should not be questioned since tools/perf justified
anything they wanted to do, and that if we wanted to argue about
whether kvm-tool should have been bundled into the kernel, we should
have made different decisions about perf.

Regards,

						- Ted
Anca Emanuel Nov. 8, 2011, 5:14 p.m. UTC | #125
@Ted Ts'o: you are sponsored by something like microsoft (joking) ?
Stop trolling. If you are not familiar with perf, or other tools, save
your time and do some useful things.
Alexander Graf Nov. 8, 2011, 5:34 p.m. UTC | #126
On 11/08/2011 03:59 PM, Christoph Hellwig wrote:
> On Tue, Nov 08, 2011 at 04:57:04PM +0200, Avi Kivity wrote:
>>> Running qemu -snapshot on the actual root block device is the only
>>> safe way to reuse the host installation, although it gets a bit
>>> complicated if people have multiple devices mounted into the namespace.
>> How is -snapshot any different?  If the host writes a block after the
>> guest has been launched, but before that block was cowed, then the guest
>> will see the new block.
> Right, thinko - qemu's snapshots are fairly useless due to sitting
> on top of the file to be modified.
>
>> It could work with a btrfs snapshot, but not everyone uses that.
> Or LVM snapshot.  Either way, just reusing the root fs without care
> is a dumb idea, and I really don't want any tool or script that
> encourages such braindead behaviour in the kernel tree.

Heh, yeah, the intent was obviously to have a separate rootfs tree 
somewhere in a directory. But that's not available at first when running 
this, so I figured for a simple "get me rolling" FAQ directing the 
guest's rootfs to / at least gets you somewhere (especially when run as 
user with init=/bin/bash).

Alex
Avi Kivity Nov. 8, 2011, 5:36 p.m. UTC | #127
On 11/08/2011 07:34 PM, Alexander Graf wrote:
>>
>>> It could work with a btrfs snapshot, but not everyone uses that.
>> Or LVM snapshot.  Either way, just reusing the root fs without care
>> is a dumb idea, and I really don't want any tool or script that
>> encourages such braindead behaviour in the kernel tree.
>
>
> Heh, yeah, the intent was obviously to have a separate rootfs tree
> somewhere in a directory. But that's not available at first when
> running this, so I figured for a simple "get me rolling" FAQ directing
> the guest's rootfs to / at least gets you somewhere (especially when
> run as user with init=/bin/bash).
>

Right, init=/bin/bash is not too insane for rootfs passthrough.

/proc will be completely broken though, need to mount the guest's.
Theodore Ts'o Nov. 8, 2011, 7:24 p.m. UTC | #128
On Tue, Nov 08, 2011 at 07:14:57PM +0200, Anca Emanuel wrote:
> @Ted Ts'o: you are sponsored by something like microsoft (joking) ?
> Stop trolling. If you are not familiar with perf, or other tools, save
> your time and do some useful things.

I am quite familiar with perf.  A disagreement with how things are
done is not trolling.

					- Ted
John Kacur Nov. 8, 2011, 9:15 p.m. UTC | #129
On Tue, 8 Nov 2011, Ted Ts'o wrote:

> On Tue, Nov 08, 2011 at 01:55:09PM +0100, Ingo Molnar wrote:
> > I guess you can do well with a split project as well - my main claim 
> > is that good compatibility comes *naturally* with integration.
> 
> Here I have to disagree; my main worry is that integration makes it
> *naturally* easy for people to skip the hard work needed to keep a
> stable kernel/userspace interface.
> 
> The other worry which I've mentioned, but which I haven't seen
> addressed, is that even if you can use a perf from a newer kernel
> with an older kernel, this causes distributions a huge amount of pain,
> since they have to package two different kernel source packages, and
> only compile perf from the newer kernel source package.  This leads to
> all sorts of confusion from a distribution packaging point of view.
> 
> For example, assume that RHEL 5, which is using 2.6.32 or something
> like that, wants to use a newer e2fsck that does a better job fixing
> file system corruptions.  If it were bundled with the kernel, then
> they would have to package up the v3.1 kernel sources, and have a
> source RPM that isn't used for building kernel sources, but just to
> build a newer version of e2fsck.  Fortunately, they don't have to do
> that.  They just pull down a newer version of e2fsprogs, and package,
> build, test, and ship that.
> 
> In addition, suppose Red Hat ships a security bug fix which means a
> new kernel-image RPM has to be shipped.  Does that mean that Red Hat
> has to ship new binary RPM's for any and all tools/* programs that
> they have packaged as separate RPM's?  Or should installing a new
> kernel RPM also imply dropping new binaries in /usr/bin/perf, et al.?
> There are all sorts of packaging questions that are raised by
> integration, and from where I sit I don't think they've been
> adequately solved yet.
>
 
This in practice is not a big deal.

There are many approaches for how the RPM can be built, but basically
getting the perf source is just a matter of
make perf-tar-src-pkg or friends such as
make perf-tarbz2-src-pkg
which will create perf-3.2.0-rc1.tar, and perf-3.2.0-rc1.tar.bz2
respectively which can be used for the src rpms. This tar ball can be used
as a separate package or subpackage.

Thanks
Vince Weaver Nov. 9, 2011, 6:04 a.m. UTC | #130
On Tue, 8 Nov 2011, Arnaldo Carvalho de Melo wrote:

> Em Tue, Nov 08, 2011 at 01:07:55PM +0100, Ingo Molnar escreveu:
> > * Vince Weaver <vince@deater.net> wrote:
> > > as mentioned before I have my own perf_event test suite with 20+ tests.
> > >   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html
>  
> > That should probably be moved into perf test. Arnaldo, any 
> > objections?
> 
> I'd gladly take patches, I even have in my TODO list for me to volunteer
> time to do that at some point.
> 
> If somebody other than me or Vince wants to do that... Assuming there is
> no licensing problem and Vince doesn't object to that being done.

I have no objections, though I don't really have time right now to do the 
work myself.

The test code is licensed dual GPLv2/BSD.  I should stick that in the 
package somewhere if I haven't already.

My testcases mostly are testing things necessary for proper PAPI 
functionality and are by no means complete.  There are huge
areas of perf_event functionality that are not well tested, especially
the overflow code.

Vince
Ingo Molnar Nov. 9, 2011, 8:23 a.m. UTC | #131
* Ted Ts'o <tytso@mit.edu> wrote:

> On Tue, Nov 08, 2011 at 01:55:09PM +0100, Ingo Molnar wrote:
>
> > I guess you can do well with a split project as well - my main 
> > claim is that good compatibility comes *naturally* with 
> > integration.
> 
> Here I have to disagree; my main worry is that integration makes it 
> *naturally* easy for people to skip the hard work needed to keep a 
> stable kernel/userspace interface.

There's two observations i have:

Firstly, how come that this has not actually happened in practice in 
the case of perf? Looks like the (random) version compatibility 
experiment i conducted yesterday should have failed spectacularly.

Secondly, within the kernel we don't have a stable ABI - we don't 
even have stable APIs, and still it's a 15 MLOC project that is 
thriving.

I argue that it is thriving in large part *BECAUSE* we don't have a 
stable API of any sort: if stuff is broken and the whole world needs 
to be fixed then we fix the whole world.

One could even make the argument that in the special case of deeply 
kernel integrated tools a stable kernel/userspace interface for those 
special, Linux-specific ABIs is *too expensive* and results in an 
inferior end result.

I'd really love it if people started thinking outside the box a bit. 
Why do people assume that *all* of the kernel project's code *has* to 
run in kernel mode? It's not a valid technical restriction *at all*.

"It has been done like this for 30 years" is not a valid technical 
restriction. Splitting deeply kernel related tools away from the 
kernel was a valid decision 15 years ago due to kernel image size and 
similar resource considerations. Today it's less and less true and we 
are *actively hurting* from tools being split away from the kernel 
proper.

Graphics, storage and user-space suspend are good examples i think of 
separation gone bad: and the resulting mess has cost Linux distros 
*the desktop market*. Think about it, the price we pay for this 
inferior end result is huge.

ext4tools is an example of separation gone good. I think it's the 
exception that strengthens the rule.

Why was the 2.4 to 2.6 migration so difficult? I can tell you the 
distro side story: mainly because the release took too long and tools 
broke left and right which created stop-ship situations. We had a 
much larger ABI cross section than we could sanely handle with the 
testing power we had. So we got into a negative feedback loop: the 
reduction in 2.3 testers further delayed the release, which moved the 
(independently evolving ...) tools further away from the to-be-2.6 
kernel, which further reduced the effective testing. It was not 
sustainable. We addressed many of the problems by shortening the 
release cycle to 3 months, but IMHO we have not addressed the 
underlying problem of lack of integration.

Responsible release engineering is actually *easier* if you don't 
have a moving target and if you have the ability to fix stuff that 
breaks without being bound to an external project.

Deeply kernel integrated tools could come in the initrd and could be 
offered by the kernel, statically linked images made available via 
/proc/sbin or such. We could even swap them out on demand so there's 
no RAM overhead. There's no technical barrier.

I'd even argue that the C library is obviously something the kernel 
should offer as well - so klibc is the way to go and would help us 
further streamline this and keep Linux quality high.

We could actually keep the kernel and such tools tightly integrated, 
reducing the compatibility matrix. The kernel would upgrade with 
these tools but it *already* upgrades with some user-space components 
like the vdso so it's not a true technical barrier.

> The other worry which I've mentioned, but which I haven't seen 
> addressed, is that even if you can use a perf from a newer 
> kernel with an older kernel, this causes distributions a huge 
> amount of pain, since they have to package two different kernel 
> source packages, and only compile perf from the newer kernel source 
> package.  This leads to all sorts of confusion from a distribution 
> packaging point of view.
> 
> For example, assume that RHEL 5, which is using 2.6.32 or something 
> like that, wants to use a newer e2fsck that does a better job 
> fixing file system corruptions. [...]

Firstly, it's not a big issue: if a tool comes with the kernel 
package then it's part of the regular backporting flow: if you 
backport a new tool to an old kernel then you do the same as if you 
backported a new kernel feature to an older enterprise kernel. 
Happens all the time, it's a technological problem with technological 
solutions. Enterprise distros explicitly do not support 
cross-distro-version package installs, so backporting will be done 
anyway.

Secondly, i actually think that the obsession with using obsolete 
kernel versions is silly technologically - and it has evolved that 
way partly *BECAUSE* we are not integrated enough and distros fear 
kernel upgrades because it had the bad habit of *breaking tools*.

The answer to that problem is to reduce the external cross section of 
the kernel and make sure that tools upgrade nicely together with the 
kernel - and integrating tools is a valid way to achieve that.

> > Did you consider it a possibility that out of tree projects that 
> > have deep ties to the kernel technically seem to be at a relative 
> > disadvantage to in-kernel projects because separation is 
> > technically costly with the costs of separation being larger than 
> > the advantages of separation?
> 
> As the e2fsprogs developer, I live with the costs all the time; I 
> can testify to the fact that they are very slight. [...]

Seriously, how can you tell that: you've never tried the integrated 
approach. I testified to the fact from the first hand experience of 
having tried both models of development.

> > But note that there are several OS projects that succeeded doing 
> > the equivalent of a 'whole world' single Git repo, so i don't 
> > think we have the basis to claim that it *cannot* work.
> 
> There have indeed, and there has been speculation that this was one of 
> many contributions to why they lost out in the popularity and 
> adoption competition with Linux. [...]

I don't see Android having "lost out" in any way, do you? I actually 
see Android as being an obviously more successful approach to Linux 
on the desktop than anything else seen so far. We should at minimum 
stop and think about that fact, observe it, learn and adapt.

iOS also has not 'lost out' to Linux in any way.

> > But why do you have to think in absolutes and extremes all the 
> > time? Why not exercise some good case by case judgement about 
> > the merits of integration versus separation?
> 
> I agree that there are tradeoffs to both approaches, and I agree 
> that case by case judgement is something that should be done.  One 
> of the reasons why I've spent a lot of time pointing out the 
> downsides of integration and the shortcomings in the integration 
> position is that I've seen advocates claiming that the fact that 
> perf was integrated was a precedent that meant that choice for 
> kvm-tool was something that should not be questioned since 
> tools/perf justified anything they wanted to do, and that if we 
> wanted to argue about whether kvm-tool should have been bundled 
> into the kernel, we should have made different decisions about perf.

I don't think Pekka claimed 'anything goes' at all when he asked 
tools/kvm to be merged upstream - why are you using that strawman 
argument? He listed numerous valid technological reasons why they 
decided to work in the tools/kvm/ space and the results speak for 
themselves.

> [...] (Specifically, the reasoning goes that the need to package up 
> the kernel plus userspace meant that we had distributions in the 
> Linux ecosystem, and the competition kept everyone honest.  If one 
> distribution started making insane decisions, whether it's forcing 
> Unity on everyone, or forcing GNOME 3 on everyone, it's always 
> possible to switch to another distribution.  The *BSD systems 
> didn't have that safety valve....)

I don't think your argument makes much sense: how come Linux, a 15 
MLOC monster project running for 20 years has not been destroyed by 
the "lack of the safety valve" problem? Why would adding the at most 
1 MLOC deeply kernel related Linux tool and library space to the 
kernel repo affect the dynamics negatively? We added more code to the 
kernel last year alone.

Fact is, competition thrives within the Linux kernel as well. Why is 
a coherent, unified, focused project management an impediment to a 
good technological result? Especially when it comes to desktop 
computers / tablets / smartphones, where having a unified project is 
a *must*, so extreme are the requirements of users to get a coherent 
experience.

Think about this plain fact: there's not a single successful 
smartphone OS on the market that does not have unified project 
management. Yes, correlation is not causation and such, but still, 
think about it for a moment.

Thanks,

	Ingo
Ingo Molnar Nov. 9, 2011, 8:28 a.m. UTC | #132
* Ted Ts'o <tytso@mit.edu> wrote:

> On Tue, Nov 08, 2011 at 07:14:57PM +0200, Anca Emanuel wrote:

> > @Ted Ts'o: you are sponsored by something like microsoft (joking) 
> > ? Stop trolling. If you are not familiar with perf, or other 
> > tools, save your time and do some useful things.
> 
> I am quite familiar with perf.  A disagreement with how things are 
> done is not trolling.

Anca, Ted is not trolling me in any fashion. He is a (very 
successful) tool space and kernel developer and his opinion and 
experience about how tools should interact with the kernel project is 
of utmost importance.

Clearly Ted thinks that filesystem tools should stay separate from 
the kernel repo. I agree with him that the case for filesystem tool 
integration is weaker than for deeply kernel integrated tools such as 
perf or kvm and calling him a troll is not a way to settle that 
honest disagreement in any case.

Thanks,

	Ingo
Ingo Molnar Nov. 9, 2011, 8:38 a.m. UTC | #133
* John Kacur <jkacur@redhat.com> wrote:

> On Tue, 8 Nov 2011, Ted Ts'o wrote:
> 
> > On Tue, Nov 08, 2011 at 01:55:09PM +0100, Ingo Molnar wrote:

> > > I guess you can do well with a split project as well - my main 
> > > claim is that good compatibility comes *naturally* with 
> > > integration.
> > 
> > Here I have to disagree; my main worry is that integration makes 
> > it *naturally* easy for people to skip the hard work needed to 
> > keep a stable kernel/userspace interface.
> > 
> > The other worry which I've mentioned, but which I haven't seen 
> > addressed, is that even if you can use a perf from a newer 
> > kernel with an older kernel, this causes distributions a huge 
> > amount of pain, since they have to package two different kernel 
> > source packages, and only compile perf from the newer kernel 
> > source package.  This leads to all sorts of confusion from a 
> > distribution packaging point of view.
> > 
> > For example, assume that RHEL 5, which is using 2.6.32 or 
> > something like that, wants to use a newer e2fsck that does a 
> > better job fixing file system corruptions.  If it were bundled 
> > with the kernel, then they would have to package up the v3.1 
> > kernel sources, and have a source RPM that isn't used for 
> > building kernel sources, but just to build a newer version of 
> > e2fsck.  Fortunately, they don't have to do that.  They just pull 
> > down a newer version of e2fsprogs, and package, build, test, and 
> > ship that.
> > 
> > In addition, suppose Red Hat ships a security bug fix which means 
> > a new kernel-image RPM has to be shipped.  Does that mean that 
> > Red Hat has to ship new binary RPM's for any and all tools/* 
> > programs that they have packaged as separate RPM's?  Or should 
> > installing a new kernel RPM also imply dropping new binaries in 
> > /usr/bin/perf, et al.? There are all sorts of packaging questions 
> > that are raised by integration, and from where I sit I don't think 
> > they've been adequately solved yet.
> >
>  
> This in practice is not a big deal.
> 
> There are many approaches for how the RPM can be built, but basically
> getting the perf source is just a matter of
> make perf-tar-src-pkg or friends such as
> make perf-tarbz2-src-pkg
> which will create perf-3.2.0-rc1.tar, and perf-3.2.0-rc1.tar.bz2
> respectively which can be used for the src rpms. This tar ball can be used
> as a separate package or subpackage.

Great - the 'perf is impossible for distros' was a common counter 
argument early in the perf project's lifetime - i'm glad it turned 
out to be bogus in practice.

Would it further simplify distro side life if all utilities deeply 
related to the kernel got built together and came in a single well 
working package? kutils-3.2.0-rc1.rpm or such.

They would always upgrade together with the kernel so there would 
never be any forced backporting or separate errata pressure, beyond 
the existing flow of -stable fixes.

We do -stable fixes for tools/perf/ as well, for stability/security 
fixes, naturally - other tools would have to follow the regular 
kernel maintenance process to manage high priority fixes.

Basically distros could rely on the kernel and its utilities being a 
coherent whole, which is expected to work together, which is 
maintained and built together and which, if it regresses, is handled 
by the regular -stable kernel regressions process with high priority.

I expect it would grow one by one - it's not like we can or want to 
force utilities to go into the kernel proper. I'd also expect that 
new tools would be added initially - not existing ones moved. My 
question to you would rather be, would it make the life of distro 
release engineers gradually easier if this space grew gradually over 
the years, adding more and more critical tool functionality?

Thanks,

	Ingo
Ingo Molnar Nov. 9, 2011, 8:51 a.m. UTC | #134
* Gerd Hoffmann <kraxel@redhat.com> wrote:

> > For reference, the default set of colors now is (from
> > tools/perf/util/ui/browser.c):
> > 
> > static struct ui_browser__colorset {
> >         const char *name, *fg, *bg;
> >         int colorset;
> > } ui_browser__colorsets[] = {
> >         {
> >                 .colorset = HE_COLORSET_TOP,
> >                 .name     = "top",
> >                 .fg       = "red",
> >                 .bg       = "default",
> 
> Bad idea IMO.  Setting only one of foreground+background gives 
> pretty much unpredictable results.  My xterms have different 
> background colors, the ones with a root shell happen to have a 
> (dark) red background. Which results in red-on-dark-red text.  Not 
> good.
> 
> I'd strongly suggest to either set both background and foreground 
> to default or to set both to a specific color.  When doing the 
> latter make sure the colors have enough contrast so they are 
> readable.

Indeed.

What we want is a set of distinctive colors - just 
two (background, foreground) colors are not enough - we also need 
colors to highlight certain information - we need 5-6 colors for the 
output to be maximally expressive. Is there a canonical way to handle 
that while still adapting to user preferences automatically by taking 
background/foreground color scheme of the xterm into account?

I suspect to fix the worst of the fallout we could add some logic to 
detect low contrast combinations (too low color distance) and fall 
back to the foreground/background colors in that case.
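
A low-contrast check of this kind could be sketched along the following
lines.  This is only an illustration under stated assumptions, not code
from perf: struct rgb, low_contrast(), pick_foreground() and the
threshold are all hypothetical, and the sketch assumes the terminal's
actual background color is known as an RGB value, which a TUI generally
cannot query portably.

```c
#include <stdbool.h>

/* Hypothetical RGB triple, each channel in the range 0..255. */
struct rgb {
	int r, g, b;
};

/* Squared Euclidean distance between two colors in RGB space. */
static int color_distance_sq(struct rgb a, struct rgb b)
{
	int dr = a.r - b.r;
	int dg = a.g - b.g;
	int db = a.b - b.b;

	return dr * dr + dg * dg + db * db;
}

/* Assumed threshold: colors closer than 100 units count as low contrast. */
#define MIN_DISTANCE_SQ (100 * 100)

static bool low_contrast(struct rgb fg, struct rgb bg)
{
	return color_distance_sq(fg, bg) < MIN_DISTANCE_SQ;
}

/* Keep the wanted foreground unless it clashes with the background. */
static struct rgb pick_foreground(struct rgb wanted, struct rgb bg,
				  struct rgb fallback)
{
	return low_contrast(wanted, bg) ? fallback : wanted;
}
```

With these assumed values, red (205,0,0) on a dark red (139,0,0)
background is rejected and the fallback is used, while red on light gray
is kept.  Plain RGB distance is a crude stand-in for perceived contrast;
a luminance-based measure would likely be a better fit.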

Plus allowing full .perfconfig configurability of all the relevant 
colors, for those with special taste.

Thanks,

	Ingo
Ingo Molnar Nov. 9, 2011, 8:55 a.m. UTC | #135
* Arnaldo Carvalho de Melo <acme@redhat.com> wrote:

> > sure the colors have enough contrast so they are readable.
> 
> Problem is figuring out something that is considered a good default 
> :-\ There will always be somebody that will complain.
> 
> When doing the coding to allow using the default xterm colors I 
> tried several of the gnome-terminal xterm profiles and all looked 
> kinda sane for the "top" (hottest functions, with most hits) and 
> "medium" lines, where we combine some chosen foreground color 
> ("red" and "green").
> 
> Laziest solution would be: If the user customizes that much, could 
> the user please customize this as well? :-)

I don't think it's acceptable to output unreadable color combinations 
(red on dark red, etc.) in any case, so we should add some safety 
mechanism that detects bad color combinations and a fallback, static 
color scheme.

I like the current way how perf top/report adapts to the xterm color 
scheme. I use it both on dark and white backgrounds and it's easy to 
mistake it for --stdio output - which is good, a good TUI should 
blend into the console's color scheme. So i think we should keep that 
and just detect the few cases where it results in something 
unreadable.

Thanks,

	Ingo
Ingo Molnar Nov. 9, 2011, 9:21 a.m. UTC | #136
* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, Nov 08, 2011 at 10:32:25AM +0100, Ingo Molnar wrote:
> > 
> > None of the perf developers with whom i'm working complained 
> > about the shared repo so far - publicly or privately. By all 
> > means they are enjoying it and if you look at the stats and 
> > results you'll agree that they are highly productive working in 
> > that environment.
> 
> Just because you brought it up.
> 
> I personally find it awkward to work in the linux tools directory. 
> Maybe this is the reason that I haven't been such a big contributor 
> of perf. [...]

Well, this is an argument with a long history we've had from the 
moment we started perf - i think the main underlying reason for that 
is that you still see perf as competition to ftrace instead of seeing 
perf the child of ftrace, the next version of ftrace, the next 
iterative step of evolution :-/

Unfortunately there's not much that i can do about that beyond 
telling you that you are IMHO wrong - you as the main ftrace 
developer thinking that it's competition is a self-fulfilling 
expectation.

Eventually someone will do the right thing and implement 'perf trace' 
(there's still the tip:tmp.perf/trace2 prototype branch) and users 
will flock to that workflow because it's so much more intuitive in 
practice. From what i've seen from the short prototype experiments 
i've conducted it's a no-brainer superior workflow and design.

> [...] I only pushed ktest into the kernel tools directory because 
> people convinced me to do so. Having it there didn't seem to bring 
> in many other developers. [...]

It was somewhat similar with perf - contributors only arrived after 
it went upstream, and even then with a delay of a few releases.

Also, and it pains me to have to mention it, but putting a .pl script 
into the kernel repo is not necessarily a recipe for attracting a 
lot of developers. We went to great lengths to kill the .cc perf 
report file in perf, to keep the programming environment familiar to 
kernel developers and other low level utility folks.

Also, obviously a tool has to be important, interesting and has to 
offer a distinct edge over other tools to attract contributors. Maybe 
tools/testing/ktest/ does not sound that interesting? Naming also 
matters: i sure would have moved it to tools/ktest/, its name already 
suggests that it's about testing, why repeat that twice? Sounds 
weird.

In that sense tools/kvm/ is better than perf: it has already 
attracted a core group of good, productive contributors despite still 
being an out of tree fork.

The point here was that Pekka & co not just clearly enjoys working on 
tools/kvm/ and has no trouble attracting contributors, but also 
*relies* on it being in the kernel tree.

Thanks,

	Ingo
Peter Zijlstra Nov. 9, 2011, 10:05 a.m. UTC | #137
On Tue, 2011-11-08 at 13:59 +0100, Ingo Molnar wrote:
> 
> > Also the self monitor stuff, perf-tool doesn't use that for obvious 
> > reasons.
> 
> Indeed, and that's PAPI's strong point.
> 
> We could try to utilize it via some clever LD_PRELOAD trickery?

Wouldn't be really meaningful, a perf-test case that covers it would be
much saner.
Gerd Hoffmann Nov. 9, 2011, 10:40 a.m. UTC | #138
Hi,

> What we want to have is to have a set of distinctive colors - just 
> two (background, foreground) colors are not enough - we also need 
> colors to highlight certain information - we need 5-6 colors for the 
> output to be maximally expressive. Is there a canonical way to handle 
> that while still adapting to user preferences automatically by taking 
> background/foreground color scheme of the xterm into account?

> I suspect to fix the worst of the fallout we could add some logic to 
> detect low contrast combinations (too low color distance) and fall 
> back to the foreground/background colors in that case.

As far I know it is pretty much impossible to figure the
foreground/background colors of the terminal you are running on.  You
can try some guesswork based on $TERM (linux console usually has black
background, xterm is white by default), but there will always be cases
where it fails.

You can run without colors.  You can use bold to highlight things and
reverse for the cursor.  Surely a bit limited and not as pretty as
colored, but works for sure everywhere.
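
The no-color variant can be sketched with plain SGR attributes (ESC [ 1 m
for bold, ESC [ 7 m for reverse video, ESC [ 0 m to reset). hot_line() and
cursor_line() are made-up helper names for illustration only:

```shell
#!/bin/bash
# SGR attribute sequences: 1 = bold, 7 = reverse video, 0 = reset.
BOLD=$(printf '\033[1m')
REVERSE=$(printf '\033[7m')
RESET=$(printf '\033[0m')

# Hypothetical helper: emphasize a "hot" line with bold instead of a color.
hot_line() {
	printf '%s%s%s\n' "$BOLD" "$1" "$RESET"
}

# Hypothetical helper: mark the cursor line with reverse video.
cursor_line() {
	printf '%s%s%s\n' "$REVERSE" "$1" "$RESET"
}

# Example: hot_line "42.10%  schedule"; cursor_line "10.03%  do_page_fault"
```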

You can go for a linux-console style black background.  Pretty much any
color is readable here, so you should have no problems at all to find
the 5-6 colors you want.

You can go for a xterm-like light background, for example the lightgray
used by older perf versions.  I like that background color, problem is
with most colors the contrast is pretty low.  IMHO only red, blue and
violet are readable on lightgray.  And black of course.

> Plus allowing full .perfconfig configurability of all the relevant 
> colors, for those with special taste.

Sure.  Maybe also allow multiple color sections and pick them by $TERM
or --colors switch, i.e. [colors "xterm"].

cheers,
  Gerd
Hagen Paul Pfeifer Nov. 9, 2011, 10:50 a.m. UTC | #139
On Wed, 09 Nov 2011 11:40:01 +0100, Gerd Hoffmann wrote:

> As far I know it is pretty much impossible to figure the
> foreground/background colors of the terminal you are running on.  You
> can try some guesswork based on $TERM (linux console usually has black
> background, xterm is white by default), but there will always be cases
> where it fails.

You can make it more explicit, similar to .vimrc:

:set background=dark or :set background=light which in turn set the
appropriate foreground colors.
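
A rough sketch of that idea in shell. The BACKGROUND variable and the
palette() function are hypothetical; the color choices only follow what
was said earlier in the thread (red and green on dark backgrounds, red
and blue on light ones) and are not perf's actual defaults:

```shell
#!/bin/bash
# Hypothetical setting, analogous to vim's ":set background=dark|light".
BACKGROUND=${BACKGROUND:-dark}

# Print the foreground colors for the "top" and "medium" lines.
palette() {
	case "$1" in
	dark)
		echo "top=red medium=green"
		;;
	light)
		echo "top=red medium=blue"
		;;
	*)
		echo "unknown background: $1" >&2
		return 1
		;;
	esac
}

# Example: palette "$BACKGROUND"
```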

Hagen
Arnaldo Carvalho de Melo Nov. 9, 2011, 11:55 a.m. UTC | #140
Em Wed, Nov 09, 2011 at 11:40:01AM +0100, Gerd Hoffmann escreveu:
>   Hi,
> 
> > What we want to have is to have a set of distinctive colors - just 
> > two (background, foreground) colors are not enough - we also need 
> > colors to highlight certain information - we need 5-6 colors for the 
> > output to be maximally expressive. Is there a canonical way to handle 
> > that while still adapting to user preferences automatically by taking 
> > background/foreground color scheme of the xterm into account?
> 
> > I suspect to fix the worst of the fallout we could add some logic to 
> > detect low contrast combinations (too low color distance) and fall 
> > back to the foreground/background colors in that case.
> 
> As far I know it is pretty much impossible to figure the
> foreground/background colors of the terminal you are running on.  You

Glad to hear that, I thought I hadn't researched that much (I did). Hope
somebody appears and tell us how it is done :-)

> can try some guesswork based on $TERM (linux console usually has black
> background, xterm is white by default), but there will always be cases
> where it fails.
> 
> You can run without colors.  You can use bold to highlight things and
> reverse for the cursor.  Surely a bit limited and not as pretty as
> colored, but works for sure everywhere.
> 
> You can go for a linux-console style black background.  Pretty much any
> color is readable here, so you should have no problems at all to find
> the 5-6 colors you want.
> 
> You can go for a xterm-like light background, for example the lightgray
> used by older perf versions.  I like that background color, problem is
> with most colors the contrast is pretty low.  IMHO only red, blue and
> violet are readable on lightgray.  And black of course.
> 
> > Plus allowing full .perfconfig configurability of all the relevant 
> > colors, for those with special taste.
> 
> Sure.  Maybe also allow multiple color sections and pick them by $TERM
> or --colors switch, i.e. [colors "xterm"].

It's fully configurable as of now, what we need is a set of .perfconfigs
that show how people think it's better, we try it, set it as the default,
leave the others in tools/perf/Documentation/perfconfig/color.examples.

- Arnaldo
Arnaldo Carvalho de Melo Nov. 9, 2011, 12:03 p.m. UTC | #141
Em Wed, Nov 09, 2011 at 10:21:09AM +0100, Ingo Molnar escreveu:
> Eventually someone will do the right thing and implement 'perf trace' 
> (there's still the tip:tmp.perf/trace2 prototype branch) and users 

I'm working on it, reworking its patches into the new evlist/evsel
abstractions, etc.

- Arnaldo
Gerd Hoffmann Nov. 9, 2011, 12:26 p.m. UTC | #142
Hi,

>>> Plus allowing full .perfconfig configurability of all the relevant 
>>> colors, for those with special taste.
>>
>> Sure.  Maybe also allow multiple color sections and pick them by $TERM
>> or --colors switch, i.e. [colors "xterm"].
> 
> It's fully configurable as of now, what we need is a set of .perfconfigs
> that show how people think it's better, we try it, set it as the default,
> leave the others in tools/perf/Documentation/perfconfig/color.examples.

Yep, a set of examples works too.

The colors are not fully configurable yet though.  First, when switching
all five colorsets to "default, default" there are still things which
are colored (top bar, bottom bar, keys help display).  Second there is
no way to set terminal attributes (i.e. "top = bold" or "selected =
reverse").

cheers,
  Gerd
Arnaldo Carvalho de Melo Nov. 9, 2011, 12:30 p.m. UTC | #143
Em Wed, Nov 09, 2011 at 01:26:34PM +0100, Gerd Hoffmann escreveu:
>   Hi,
> 
> >>> Plus allowing full .perfconfig configurability of all the relevant 
> >>> colors, for those with special taste.
> >>
> >> Sure.  Maybe also allow multiple color sections and pick them by $TERM
> >> or --colors switch, i.e. [colors "xterm"].
> > 
> > It's fully configurable as of now, what we need is a set of .perfconfigs
> > that show how people think it's better, we try it, set it as the default,
> > leave the others in tools/perf/Documentation/perfconfig/color.examples.
> 
> Yep, a set of examples works too.
> 
> The colors are not fully configurable yet though.  First, when switching
> all five colorsets to "default, default" there are still things which
> are colored (top bar, bottom bar, keys help display).  Second there is
> no way to set terminal attributes (i.e. "top = bold" or "selected =
> reverse").

Ok, adding those to the TODO list.

/me goes to check if http://perf.wiki.kernel.org is back working so that
we can have a _public_ TODO list, perhaps it may attract more
contributors :)

- Arnaldo
Arnaldo Carvalho de Melo Nov. 9, 2011, 12:33 p.m. UTC | #144
Em Wed, Nov 09, 2011 at 10:30:50AM -0200, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Nov 09, 2011 at 01:26:34PM +0100, Gerd Hoffmann escreveu:
> > > It's fully configurable as of now, what we need is a set of .perfconfigs
> > > that show how people think it's better, we try it, set it as the default,
> > > leave the others in tools/perf/Documentation/perfconfig/color.examples.

> > Yep, a set of examples works too.

> > The colors are not fully configurable yet though.  First, when switching
> > all five colorsets to "default, default" there are still things which
> > are colored (top bar, bottom bar, keys help display).  Second there is
> > no way to set terminal attributes (i.e. "top = bold" or "selected =
> > reverse").

> Ok, adding those to my TODO list.

> /me goes to check if http://perf.wiki.kernel.org is back working so that
> we can have a _public_ TODO list, perhaps it may attract more
> contributors :)

Oops, there is one, utterly old tho ;-\

I tried changing that and adding this entry but:

https://perf.wiki.kernel.org/articles/u/s/e/Special~UserLogin_94cd.html

Returns:

The requested URL /articles/u/s/e/Special~UserLogin_94cd.html was not
found on this server.

Ingo, would that G+ page be useful for that?

- Arnaldo
Peter Zijlstra Nov. 9, 2011, 12:46 p.m. UTC | #145
On Wed, 2011-11-09 at 10:33 -0200, Arnaldo Carvalho de Melo wrote:
> 
> Ingo, would that G+ page be useful for that?
> 
*groan*

Can we please keep things sane?
Arnaldo Carvalho de Melo Nov. 9, 2011, 12:51 p.m. UTC | #146
Em Wed, Nov 09, 2011 at 01:46:42PM +0100, Peter Zijlstra escreveu:
> On Wed, 2011-11-09 at 10:33 -0200, Arnaldo Carvalho de Melo wrote:
> > 
> > Ingo, would that G+ page be useful for that?
> > 
> *groan*
> 
> Can we please keep things sane?

ROFL, I had to ask that :-P

- Arnaldo
Ingo Molnar Nov. 9, 2011, 1:17 p.m. UTC | #147
* Arnaldo Carvalho de Melo <acme@redhat.com> wrote:

> Em Wed, Nov 09, 2011 at 10:30:50AM -0200, Arnaldo Carvalho de Melo escreveu:
> > Em Wed, Nov 09, 2011 at 01:26:34PM +0100, Gerd Hoffmann escreveu:
> > > > It's fully configurable as of now, what we need is a set of .perfconfigs
> > > > that show how people think it's better, we try it, set it as the default,
> > > > leave the others in tools/perf/Documentation/perfconfig/color.examples.
> 
> > > Yep, a set of examples works too.
> 
> > > The colors are not fully configurable yet though.  First, when switching
> > > all five colorsets to "default, default" there are still things which
> > > are colored (top bar, bottom bar, keys help display).  Second there is
> > > no way to set terminal attributes (i.e. "top = bold" or "selected =
> > > reverse").
> 
> > Ok, adding those to my TODO list.
> 
> > /me goes to check if http://perf.wiki.kernel.org is back working so that
> > we can have a _public_ TODO list, perhaps it may attract more
> > contributors :)
> 
> Oops, there is one, utterly old tho ;-\
> 
> I tried changing that and adding this entry but:
> 
> https://perf.wiki.kernel.org/articles/u/s/e/Special~UserLogin_94cd.html
> 
> Returns:
> 
> The requested URL /articles/u/s/e/Special~UserLogin_94cd.html was not
> found on this server.
> 
> Ingo, would that G+ page be useful for that?

Not sure - i think perf.wiki.kernel.org is a good place for 
documentation kind of information. The G+ page is more like for news 
items.

Thanks,

	Ingo
Cong Wang Nov. 9, 2011, 1:40 p.m. UTC | #148
On Tue, Nov 8, 2011 at 5:32 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> So i think you should seriously consider moving your projects *into*
> tools/ instead of trying to get other projects to move out ...
>
> You should at least *try* the unified model before criticising it -
> because currently you guys are preaching about sex while having sworn
> a life long celibacy ;-)
>

Ingo, this is making Linux another BSD... manage everything in a single
tree...

Also, what is your criteria for merging a user-space project into kernel tree?

Thanks.
Jim Paris Nov. 9, 2011, 7:25 p.m. UTC | #149
Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 09, 2011 at 11:40:01AM +0100, Gerd Hoffmann escreveu:
> >   Hi,
> > 
> > > What we want to have is to have a set of distinctive colors - just 
> > > two (background, foreground) colors are not enough - we also need 
> > > colors to highlight certain information - we need 5-6 colors for the 
> > > output to be maximally expressive. Is there a canonical way to handle 
> > > that while still adapting to user preferences automatically by taking 
> > > background/foreground color scheme of the xterm into account?
> > 
> > > I suspect to fix the worst of the fallout we could add some logic to 
> > > detect low contrast combinations (too low color distance) and fall 
> > > back to the foreground/background colors in that case.
> > 
> > As far I know it is pretty much impossible to figure the
> > foreground/background colors of the terminal you are running on.  You
> 
> Glad to hear that, I thought I hadn't researched that much (I did). Hope
> somebody appears and tell us how it is done :-)

In xterm, '\e]10;?\e\\' and '\e]11;?\e\\' will report the colors, e.g.:

#!/bin/bash
  read -s -r -d \\ -p `printf '\e]10;?\e\\'` -t 1 fg
  [ $? -ne 0 ] && fg="no response"
  echo "foreground: $fg" | cat -v
  read -s -r -d \\ -p `printf '\e]11;?\e\\'` -t 1 bg
  [ $? -ne 0 ] && bg="no response"
  echo "background: $bg" | cat -v
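
The reply carries the color as rgb:RRRR/GGGG/BBBB (hexadecimal) between
the escape sequences, so it can be turned into numbers for further checks.
A sketch, assuming that reply format; parse_osc_color() and scale8() are
hypothetical helper names:

```shell
#!/bin/bash
# Scale a 1-4 digit hex component to the 0..255 range.
scale8() {
	local hex=$1
	local max=$(( (1 << (4 * ${#hex})) - 1 ))
	echo $(( 16#$hex * 255 / max ))
}

# Extract "R G B" (decimal, 0..255) from a reply such as
# "^[]11;rgb:ffff/ffff/ffff^[". Returns non-zero if no rgb: spec is found,
# e.g. for the "no response" fallback string.
parse_osc_color() {
	local re='rgb:([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})'
	[[ $1 =~ $re ]] || return 1
	echo "$(scale8 "${BASH_REMATCH[1]}")" \
	     "$(scale8 "${BASH_REMATCH[2]}")" \
	     "$(scale8 "${BASH_REMATCH[3]}")"
}

# Example: parse_osc_color "$bg" || echo "background unknown"
```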

-jim
Arnaldo Carvalho de Melo Nov. 9, 2011, 8:13 p.m. UTC | #150
Em Wed, Nov 09, 2011 at 02:25:09PM -0500, Jim Paris escreveu:
> Arnaldo Carvalho de Melo wrote:
> > Em Wed, Nov 09, 2011 at 11:40:01AM +0100, Gerd Hoffmann escreveu:
> > > As far I know it is pretty much impossible to figure the
> > > foreground/background colors of the terminal you are running on.  You

> > Glad to hear that, I thought I hadn't researched that much (I did). Hope
> > somebody appears and tell us how it is done :-)

> In xterm, '\e]10;?\e\\' and '\e]11;?\e\\' will report the colors, e.g.:

> #!/bin/bash
>   read -s -r -d \\ -p `printf '\e]10;?\e\\'` -t 1 fg
>   [ $? -ne 0 ] && fg="no response"
>   echo "foreground: $fg" | cat -v
>   read -s -r -d \\ -p `printf '\e]11;?\e\\'` -t 1 bg
>   [ $? -ne 0 ] && bg="no response"
>   echo "background: $bg" | cat -v

gnome-terminal:

[acme@felicio ~]$ ./a.sh
foreground: no response
background: no response
[acme@felicio ~]$

:-(

- Arnaldo
Anca Emanuel Nov. 9, 2011, 10:32 p.m. UTC | #151
"I'd even argue that that C library is obviously something the
kernel should offer as well - so klibc is the way to go and would help
us further streamline this and keep Linux quality high."

I think there is code to share. Why not ?
Alexander Graf Nov. 10, 2011, 1:41 a.m. UTC | #152
On 09.11.2011, at 09:23, Ingo Molnar wrote:

> 
> * Ted Ts'o <tytso@mit.edu> wrote:
> 
>> On Tue, Nov 08, 2011 at 01:55:09PM +0100, Ingo Molnar wrote:
>> 
>>> I guess you can do well with a split project as well - my main 
>>> claim is that good compatibility comes *naturally* with 
>>> integration.
>> 
>> Here I have to disagree; my main worry is that integration makes it 
>> *naturally* easy for people to skip the hard work needed to keep a 
>> stable kernel/userspace interface.
> 
> There's two observations i have:
> 

[...]

> I don't think your argument makes much sense: how come Linux, a 15 
> MLOC monster project running for 20 years has not been destroyed by 
> the "lack of the safety valve" problem? Why would adding the at most 
> 1 MLOC deeply kernel related Linux tool and library space to the 
> kernel repo affect the dynamics negatively? We added more code to the 
> kernel last year alone.
> 
> Fact is, competition thrives within the Linux kernel as well. Why is 
> a coherent, unified, focused project management an impediment to a 
> good technological result? Especially when it comes to desktop 
> computers / tablets / smartphones, where having a unified project is 
> a *must*, so extreme are the requirements of users to get a coherent 
> experience.
> 
> Think about this plain fact: there's not a single successful 
> smartphone OS on the market that does not have unified project 
> management. Yes, correlation is not causation and such, but still, 
> think about it for a moment.

I see your arguments and I think others do too. Look at the BSD or Solaris guys. Heck, even Windows and Mac OS have a lot tighter user-space and kernel bindings than we do.

However I don't see any real reason for us who already have the strong syscall ABI boundary as border defined to change that anytime soon. So far it's worked out pretty well IMHO.

But yes, if you were to push things from the bottom up, it would even make sense. If you were to push glibc into the kernel it would make sense. I maybe still wouldn't agree with it, but it'd at least be logical, because that's the next layer from the kernel's point of view.

If you were to push busybox into the kernel, it would also make sense, so that you can have a fully self-contained system that doesn't need external dependencies built inside a single tree. Again, I wouldn't agree on it because I like user space to be multi platform, but I could see the point. The same goes for udev and systemd.

For kvm tool however, I don't. It's very very high up the stack. In fact, I can't imagine too many applications being too much higher up the stack than a VM monitor. It needs to talk to the user (gtk?). It needs to talk to the network (which might be implemented using vde). It needs to talk to storage (which could be hidden behind user space libraries). It basically is a consumer of all the interfaces we provide 50 layers above the kernel.

So I find the comparison of pulling GNOME3 and KVM Tool into the kernel fair. Both depend on about the same amount of user space. And even though KVM Tool might not depend on all that much today, I'm sure you guys don't want to limit yourselves in scope just because you're "in the kernel tree".

Outside of the kernel tree, you can do your own decisions. If someone thinks it's a great idea to write device emulation in python (I would love that!), he could go in and implement it without having to worry about Linus possibly rejecting it because it's out of scope for a "Linux kernel testing tool". If you want to create the greatest GUI for virtualization the world has ever seen, you can just do it! Nothing holds you back.

You already have a very thriving development community. There are active contributors all over the place in KVM Tool. People already are interested in it. Why do you want to be in the kernel tree so badly? I honestly think it would hurt the project rather than help it.

So in all honesty, I wish for a KVM Tool outside of the kernel tree so it can thrive and evolve into something great - without artificial borders. And I'm sure most of the KVM Tool developers wish for the thriving part as well - which I believe can not happen inside the kernel tree.


Alex
Ingo Molnar Nov. 10, 2011, 7:47 a.m. UTC | #153
* Américo Wang <xiyou.wangcong@gmail.com> wrote:

> On Tue, Nov 8, 2011 at 5:32 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > So i think you should seriously consider moving your projects 
> > *into* tools/ instead of trying to get other projects to move out 
> > ...
> >
> > You should at least *try* the unified model before criticising it 
> > - because currently you guys are preaching about sex while having 
> > sworn a life long celibacy ;-)
> 
> Ingo, this is making Linux another BSD... manage everything in a 
> single tree...

It's not an all-or-nothing prospect. Linux user-space consists of 
well in excess of 200 MLOC code. The kernel is 15 MLOC.

I think the system-bound utilities that 'obviously' qualify for 
kernel inclusion are around 1 MLOC in total size, i.e. less than 0.5% 
of all user-space.

> Also, what is your criteria for merging a user-space project into 
> kernel tree?

Well, my criteria go roughly along these lines:

 1) The developers use that model and are productive that way and 
    produce a tool that has a significant upside.

 2) There's significant Linux-specific interactions between the 
    user-space project and the kernel.

 3) The code is clean, well designed and follows the various
    principles laid out in Documentation/CodingStyle and
    Documentation/ManagementStyle so that it can be merged into a
    prominent spot in the kernel tree and the project is ready to
    live with the (non-trivial!) consequences of all that:

        - the project does -stable kernel backports of serious bugs

        - the project follows a strict "no regressions" policy

        - the project follows the kernel release cycle of 'Winter', 
          'Spring', 'Summer' and 'Autumn' releases and follows the 
          merge window requirements and implements the post-rc1
          stabilization cycle.

These are not easy requirements and i can well imagine that many 
projects, even if they qualified on all other counts, would prefer to 
stay out of tree than be subject to such strict release engineering 
constraints.

Also, the requirements can be made stricter with time, based on 
positive and negative experiences. Projects can 'die' and move out of 
the kernel as well if the kernel repo did not work out for them. As 
long as it's all done gradually and on a case by case basis Linux can 
only benefit from this.

Thanks,

	Ingo
Ingo Molnar Nov. 10, 2011, 8 a.m. UTC | #154
* Anca Emanuel <anca.emanuel@gmail.com> wrote:

> "I'd even argue that that C library is obviously something the 
> kernel should offer as well - so klibc is the way to go and would 
> help us further streamline this and keep Linux quality high."
> 
> I think there is code to share. Why not ?

The biggest downside of libc integration into the kernel would be 
that the libc ABI is *vastly* larger than the kernel ABI, and i'm not 
sure the kernel community is good enough to handle that. It's roughly 
3000 ABI components compared to the 300 ABI functions the kernel has 
today - so at least an order of magnitude larger...

The biggest upside of libc integration into the kernel would be that 
we could push Linux kernel improvements into the C library - and thus 
to apps - immediately, along a much larger ABI surface. The 
'specialization' resolution of the libc ABI is an order of magnitude 
larger than that of the kernel's, giving many more opportunities for 
good, workload specific optimizations and unique solutions.

Today the latency of getting a kernel improvement to applications via 
a change in the C library is above a year, so most kernel people 
don't actually try to improve the C library but try to find 
improvements on the kernel level which gets to a distro within a 
couple of months.

If the kernel offers a /proc/libc.so.6 library then the kernel will 
always be 'in sync' with the library (there's no library to install 
on-disk - it would be offered by the kernel) and we could use 
integration techniques like the vDSO uses today.

Thanks,

	Ingo
Anca Emanuel Nov. 10, 2011, 8:12 a.m. UTC | #155
[offtopic] Any news from Mathieu Desnoyers "Generic Ring Buffer
Library" http://www.efficios.com/ringbuffer ?
Ingo Molnar Nov. 10, 2011, 8:14 a.m. UTC | #156
* Alexander Graf <agraf@suse.de> wrote:

> [...]
>
> Outside of the kernel tree, you can do your own decisions. If 
> someone thinks it's a great idea to write device emulation in 
> python (I would love that!), he could go in and implement it 
> without having to worry about Linus possibly rejecting it because 
> it's out of scope for a "Linux kernel testing tool". If you want to 
> create the greatest GUI for virtualization the world has ever seen, 
> you can just do it! Nothing holds you back.

We actually recently added Python bindings to event tracing in perf:

 earth5:~/tip> find tools/perf/ -name '*.py'
 tools/perf/python/twatch.py
 tools/perf/util/setup.py
 tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py
 tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Core.py
 tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/SchedGui.py
 tools/perf/scripts/python/syscall-counts.py
 tools/perf/scripts/python/sctop.py
 tools/perf/scripts/python/sched-migration.py
 tools/perf/scripts/python/check-perf-trace.py
 tools/perf/scripts/python/futex-contention.py
 tools/perf/scripts/python/failed-syscalls-by-pid.py
 tools/perf/scripts/python/net_dropmonitor.py
 tools/perf/scripts/python/syscall-counts-by-pid.py
 tools/perf/scripts/python/netdev-times.py

... and Linus did not object (so far ;-) - nor does he IMHO have many 
reasons to object as long as the code is sane and useful. Nor did 
Linus object when perf extended its scope from profiling to tracing, 
system monitoring, etc.

While i don't talk for Linus, the only 'hard boundary' that Linus 
enforces and expects all maintainers to enforce that i'm aware of is 
"don't do crazy crap". Everything else is possible as long as it's 
high quality and reasonable, with a good upside story that is 
relevant to the kernel - you can let your imagination run wild, 
there's no artificial barriers that i'm aware of.

Anyway, i have outlined the rough consequences of a user-space 
project being inside the kernel repo in this post:

  http://lkml.org/lkml/2011/11/10/86

... and they are definitely not trivial and easy to meet.

Thanks,

	Ingo
Gerd Hoffmann Nov. 10, 2011, 8:39 a.m. UTC | #157
Hi,

>>> As far I know it is pretty much impossible to figure the
>>> foreground/background colors of the terminal you are running on.  You
>>
>> Glad to hear that, I thought I hadn't researched that much (I did). Hope
>> somebody appears and tell us how it is done :-)
> 
> In xterm, '\e]10;?\e\\' and '\e]11;?\e\\' will report the colors, e.g.:
> 
> #!/bin/bash
>   read -s -r -d \\ -p `printf '\e]10;?\e\\'` -t 1 fg
>   [ $? -ne 0 ] && fg="no response"
>   echo "foreground: $fg" | cat -v
>   read -s -r -d \\ -p `printf '\e]11;?\e\\'` -t 1 bg
>   [ $? -ne 0 ] && bg="no response"
>   echo "background: $bg" | cat -v

Works fine in xterm.  Neither gnome-terminal (i.e. vte widget) nor
konsole support this though.

cheers,
  Gerd

Patch

diff --git a/tools/testing/run-qemu/run-qemu.sh b/tools/testing/run-qemu/run-qemu.sh
new file mode 100755
index 0000000..70f194f
--- /dev/null
+++ b/tools/testing/run-qemu/run-qemu.sh
@@ -0,0 +1,338 @@ 
+#!/bin/bash
+#
+# QEMU Launcher
+#
+# This script enables simple use of the KVM and QEMU tool stack for
+# easy kernel testing. It allows passing either a host directory or
+# a disk image to the guest. Example usage:
+#
+# Run the host root fs inside a VM:
+#
+# $ ./tools/testing/run-qemu/run-qemu.sh -r /
+#
+# Run the same with SDL:
+#
+# $ ./tools/testing/run-qemu/run-qemu.sh -r / --sdl
+#
+# Or with a PPC build:
+#
+# $ ARCH=ppc ./tools/testing/run-qemu/run-qemu.sh -r /
+#
+# PPC with a mac99 model by passing options to QEMU:
+#
+# $ ARCH=ppc ./tools/testing/run-qemu/run-qemu.sh -r / -- -M mac99
+#
+
+USE_SDL=
+USE_VNC=
+USE_GDB=1
+KERNEL_BIN=arch/x86/boot/bzImage
+MON_STDIO=
+KERNEL_APPEND2=
+SERIAL=ttyS0
+SERIAL_KCONFIG=SERIAL_8250
+BASENAME=$(basename "$0")
+
+function usage() {
+	echo "
+$BASENAME allows you to execute a virtual machine with the Linux kernel
+that you just built. To only execute a simple VM, you can just run it
+on your root fs with \"-r / -a init=/bin/bash\"
+
+	-a, --append parameters
+		Append the given parameters to the kernel command line.
+
+	-d, --disk image
+		Add the image file as disk into the VM.
+
+	-D, --no-gdb
+		Don't run an xterm with gdb attached to the guest.
+
+	-r, --root directory
+		Use the specified directory as root directory inside the guest.
+
+	-s, --sdl
+		Enable SDL graphical output.
+
+	-S, --smp cpus
+		Set number of virtual CPUs.
+
+	-v, --vnc
+		Enable VNC graphical output.
+
+Examples:
+
+	Run the host root fs inside a VM:
+	$ ./tools/testing/run-qemu/run-qemu.sh -r /
+
+	Run the same with SDL:
+	$ ./tools/testing/run-qemu/run-qemu.sh -r / --sdl
+	
+	Or with a PPC build:
+	$ ARCH=ppc ./tools/testing/run-qemu/run-qemu.sh -r /
+	
+	PPC with a mac99 model by passing options to QEMU:
+	$ ARCH=ppc ./tools/testing/run-qemu/run-qemu.sh -r / -- -M mac99
+"
+}
+
+function require_config() {
+	if grep -q "CONFIG_$1=y" .config; then
+		return
+	fi
+
+	echo "You need to enable CONFIG_$1 for run-qemu to work properly"
+	exit 1
+}
+
+function has_config() {
+	grep -q "CONFIG_$1=y" .config
+}
+
+function drive_if() {
+	if has_config VIRTIO_BLK; then
+		echo virtio
+	elif has_config ATA_PIIX; then
+		echo ide
+	else
+		echo "\
+Your kernel must have either VIRTIO_BLK or ATA_PIIX
+enabled for block device assignment" >&2
+		exit 1
+	fi
+}
+
+GETOPT=`getopt -o a:d:Dhr:sS:v --long append:,disk:,no-gdb,help,root:,sdl,smp:,vnc \
+	-n "$BASENAME" -- "$@"`
+
+if [ $? != 0 ]; then
+	echo "Terminating..." >&2
+	exit 1
+fi
+
+eval set -- "$GETOPT"
+
+while true; do
+	case "$1" in
+	-a|--append)
+		KERNEL_APPEND2="$KERNEL_APPEND2 $2"
+		shift
+		;;
+	-d|--disk)
+		QEMU_OPTIONS="$QEMU_OPTIONS -drive \
+			file=$2,if=$(drive_if),cache=unsafe"
+		USE_DISK=1
+		shift
+		;;
+	-D|--no-gdb)
+		USE_GDB=
+		;;
+	-h|--help)
+		usage
+		exit 0
+		;;
+	-r|--root)
+		ROOTFS="$2"
+		shift
+		;;
+	-s|--sdl)
+		USE_SDL=1
+		;;
+	-S|--smp)
+		SMP="$2"
+		shift
+		;;
+	-v|--vnc)
+		USE_VNC=1
+		;;
+	--)
+		shift
+		break
+		;;
+	*)
+		echo "Could not parse option: $1" >&2
+		exit 1
+		;;
+	esac
+	shift
+done
+
+if [ ! "$ROOTFS" -a ! "$USE_DISK" ]; then
+	echo "\
+Error: Please specify at least -r or -d with a target \
+FS to run off of" >&2
+	exit 1
+fi
+
+# Try to find the KVM accelerated QEMU binary
+
+[ "$ARCH" ] || ARCH=$(uname -m)
+case $ARCH in
+x86_64)
+	KERNEL_BIN=arch/x86/boot/bzImage
+	# SUSE and Red Hat call the binary qemu-kvm
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-kvm 2>/dev/null)
+
+	# Debian and Gentoo call it kvm
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which kvm 2>/dev/null)
+
+	# QEMU's own build system calls it qemu-system-x86_64
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-system-x86_64 2>/dev/null)
+	;;
+i*86)
+	KERNEL_BIN=arch/x86/boot/bzImage
+	# SUSE and Red Hat call the binary qemu-kvm
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-kvm 2>/dev/null)
+
+	# Debian and Gentoo call it kvm
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which kvm 2>/dev/null)
+
+	KERNEL_BIN=arch/x86/boot/bzImage
+	# i386 version of QEMU
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu 2>/dev/null)
+	;;
+s390*)
+	KERNEL_BIN=arch/s390/boot/image
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-system-s390x 2>/dev/null)
+	;;
+ppc*)
+	KERNEL_BIN=vmlinux
+
+	IS_64BIT=
+	has_config PPC64 && IS_64BIT=64
+	if has_config PPC_85xx; then
+		QEMU_OPTIONS="$QEMU_OPTIONS -M mpc8544ds"
+	elif has_config PPC_PSERIES; then
+		QEMU_OPTIONS="$QEMU_OPTIONS -M pseries"
+		SERIAL=hvc0
+		SERIAL_KCONFIG=HVC_CONSOLE
+	elif has_config PPC_PMAC; then
+		has_config SERIAL_PMACZILOG_TTYS || SERIAL=ttyPZ0
+		SERIAL_KCONFIG=SERIAL_PMACZILOG
+	else
+		echo "Unknown PPC board" >&2
+		exit 1
+	fi
+
+	[ "$QEMU_BIN" ] || QEMU_BIN=$(which qemu-system-ppc${IS_64BIT} 2>/dev/null)
+	;;
+esac
+
+if [ ! -e "$QEMU_BIN" ]; then
+	echo "\
+Could not find a usable QEMU binary. Please install one from \
+your distro or from source code using:
+
+  $ git clone git://git.qemu.org/qemu.git
+  $ cd qemu
+  $ ./configure
+  $ make -j
+  $ sudo make install
+" >&2
+	exit 1
+fi
+
+# The binaries without kvm in their name can be too old to support KVM, so
+# check for that before the user gets confused
+if [ ! "$(echo $QEMU_BIN | grep kvm)" -a \
+     ! "$($QEMU_BIN --help | egrep '^-machine')" ]; then
+	echo "Your QEMU binary is too old, please update to at least 0.15." >&2
+	exit 1
+fi
+QEMU_OPTIONS="$QEMU_OPTIONS -machine accel=kvm:tcg"
+
+# We need to check some .config variables to make sure we actually work
+# on the respective kernel.
+if [ ! -e .config ]; then
+	echo "\
+Please run this script on a fully compiled and configured
+Linux kernel build directory" >&2
+	exit 1
+fi
+
+if [ ! -e "$KERNEL_BIN" ]; then
+	echo "Could not find kernel binary: $KERNEL_BIN" >&2
+	exit 1
+fi
+
+QEMU_OPTIONS="$QEMU_OPTIONS -kernel $KERNEL_BIN"
+
+if [ "$USE_SDL" ]; then
+	# SDL is the default, so nothing to do
+	:
+elif [ "$USE_VNC" ]; then
+	QEMU_OPTIONS="$QEMU_OPTIONS -vnc :5"
+else
+	# When emulating a serial console, tell the kernel to use it as well
+	QEMU_OPTIONS="$QEMU_OPTIONS -nographic"
+	KERNEL_APPEND="$KERNEL_APPEND console=$SERIAL earlyprintk=serial"
+	MON_STDIO=1
+	require_config "$SERIAL_KCONFIG"
+fi
+
+if [ "$ROOTFS" ]; then
+	# Using rootfs with 9p
+	require_config "NET_9P_VIRTIO"
+	KERNEL_APPEND="$KERNEL_APPEND \
+root=/dev/root rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p"
+
+	# Usage: -virtfs fstype,path=/share_path/,security_model=[mapped|passthrough|none],mount_tag=tag
+
+
+	QEMU_OPTIONS="$QEMU_OPTIONS \
+-virtfs local,id=root,path=$ROOTFS,mount_tag=root,security_model=passthrough \
+-device virtio-9p-pci,fsdev=root,mount_tag=/dev/root"
+fi
+
+[ "$SMP" ] || SMP=1
+
+# User append args come last
+KERNEL_APPEND="$KERNEL_APPEND $KERNEL_APPEND2"
+
+############### Execution #################
+
+QEMU_OPTIONS="$QEMU_OPTIONS -smp $SMP"
+
+echo "
+	################# Linux QEMU launcher #################
+
+This script executes your currently built Linux kernel using QEMU. If KVM is
+available, it will also use KVM for fast virtualization of your guest.
+
+The intent is to make it very easy to run your kernel. If you need to do more
+advanced things, such as passing through real devices, please use QEMU command
+line options and add them to the $BASENAME command line using --.
+
+This tool is for simplicity, not world dominating functionality coverage.
+(just a hobby, won't be big and professional like libvirt)
+
+"
+
+if [ "$MON_STDIO" ]; then
+	echo "\
+### Your guest is bound to the current foreground shell. To quit the guest, ###
+### please use Ctrl-A x                                                     ###
+"
+fi
+
+GDB_PID=
+if [ "$USE_GDB" -a "$DISPLAY" -a -x "$(which xterm)" -a -e "$(which gdb)" ]; then
+	# Run a gdb console in parallel to the kernel
+
+	# XXX find out if port is in use
+	PORT=$(( $$ % 64511 + 1024 ))
+	xterm -T "$BASENAME" -e "sleep 2; gdb vmlinux -ex 'target remote localhost:$PORT' -ex c" &
+	GDB_PID=$!
+	QEMU_OPTIONS="$QEMU_OPTIONS -gdb tcp::$PORT"
+fi
+
+echo -n "  Executing: $QEMU_BIN $QEMU_OPTIONS -append \"$KERNEL_APPEND\" "
+for i in "$@"; do
+	echo -n "\"$i\" "
+done
+echo
+echo
+
+$QEMU_BIN $QEMU_OPTIONS -append "$KERNEL_APPEND" "$@"
+wait $GDB_PID &>/dev/null
+
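
The changelog lists support for multiple -a arguments. The accumulation
can be tried in isolation with the sketch below. It is not part of the
patch: collect_append is a hypothetical helper written only for
illustration, and it assumes the util-linux getopt (with --long support)
that the script itself relies on.

```shell
#!/bin/sh
# Hypothetical helper, not part of run-qemu.sh: gathers the value of every
# -a/--append option into one space-separated string.
# Assumes util-linux getopt, which supports long options.
collect_append() {
	parsed=$(getopt -o a: --long append: -n collect_append -- "$@") || return 1
	eval set -- "$parsed"

	append=
	while [ "$1" != "--" ]; do
		case "$1" in
		-a|--append)
			# $2 is the option's argument; add it to what we have so far
			append="$append $2"
			shift 2
			;;
		*)
			shift
			;;
		esac
	done

	# Strip the leading space left by the first concatenation
	echo "${append# }"
}

collect_append -a init=/bin/bash --append debug -a "console=ttyS0 quiet"
```

Each option's value is taken from $2 and appended to the running string,
so later -a arguments extend the kernel command line rather than replace
earlier ones.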