Message ID | 20111108214504.28884.61814.stgit@miche.sea.corp.google.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Tue, 2011-11-08 at 13:45 -0800, Miche Baker-Harvey wrote: > hvc_init() must only be called once, and no thread should continue with hvc_alloc() > until after initialization is complete. The original code does not enforce either > of these requirements. A new mutex limits entry to hvc_init() to a single thread, > and blocks all later comers until it has completed. > > This patch fixes multiple crash symptoms. Hi Miche, A few nit-picky comments below .. > @@ -84,6 +85,10 @@ static LIST_HEAD(hvc_structs); > * list traversal. > */ > static DEFINE_SPINLOCK(hvc_structs_lock); > +/* > + * only one task does allocation at a time. > + */ > +static DEFINE_MUTEX(hvc_ports_mutex); The comment is wrong, isn't it? Only one task does _init_ at a time. Once the driver is initialised allocs can run concurrently. So shouldn't it be called hvc_init_mutex ? > @@ -825,11 +830,15 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data, > int i; > > /* We wait until a driver actually comes along */ > + mutex_lock(&hvc_ports_mutex); > if (!hvc_driver) { > int err = hvc_init(); > - if (err) > + if (err) { > + mutex_unlock(&hvc_ports_mutex); > return ERR_PTR(err); > + } > } > + mutex_unlock(&hvc_ports_mutex); > > hp = kzalloc(ALIGN(sizeof(*hp), sizeof(long)) + outbuf_size, > GFP_KERNEL); It'd be cleaner I think to do all the locking in hvc_init(). That's safe as long as you recheck !hvc_driver in hvc_init() with the lock held. cheers
On Tue, 08 Nov 2011 13:45:04 -0800, Miche Baker-Harvey <miche@google.com> wrote: > hvc_init() must only be called once, and no thread should continue with hvc_alloc() > until after initialization is complete. The original code does not enforce either > of these requirements. A new mutex limits entry to hvc_init() to a single thread, > and blocks all later comers until it has completed. > > This patch fixes multiple crash symptoms. No, it doesn't: not now your previous double-hvc_init patch has been reverted. It's easy to show this is the case though: replace mutex_lock() with: if (!mutex_trylock()) { WARN(1, "hvc_console: mutex is stopping us!\n"); mutex_lock(); } If you get that warning, then your mutex is doing something. Thanks, Rusty.
Rusty, Michael, Stephen, et al, Thanks for your comments on these patches. For what I'm trying to do, all three patches are necessary, but maybe I'm going about it the wrong way. Your input would be appreciated. I'm in no way claiming that these patches are "right", just that it's working for me, and that what's in the current pool is not. What I'm trying to do is: On X86, under KVM, start a virtio console device, with multiple ports on the device, at least one of which is also a console (as well as ttyS0). (Eventually, we want to be able to add virtio console ports on the fly, and to have multiple virtio console ports be consoles.) When all three of the patches are in place, this works great. (By great, I mean that getty's start up on all of ttyS0, hvc0 and hvc1, and console output goes to ttyS0 and to hvc0. "who" shows three users: ttyS0, hvc0, and hvc1. "cat /proc/consoles" shows both ttyS0 and hvc0. I can use all three getty's, and console output really does appear on both the consoles. Based on Rusty's comments, I tried removing each of the patches individually. Here's what happens today. I've seen other failure modes depending on what precisely I'm passing the guest. There's three patches: 1/3 "fix locking of vtermno" 2/3 "enforce one-time initialization with hvc_init "3/3 "use separate struct console * for each console" If I remove the "fix locking of vtermno", I only get one virtio console terminal. "who" shows the ttyS0 and the hvc0, and I can log into the gettys on both. I don't get the second virtio console getty. Interestingly, hvc0 shows up in /proc/consoles twice, and in fact the console output is dumped twice to hvc0 (as you'd expect from looking at printk.c, each line appears twice, followed by the next line.) If I remove the "enforce one-time initialization with hvc_init" patch, which makes sure only a single thread is doing the hvc_init, and gates anyone from continuing until it has completed, I get different failures, including hangs, and dereferences of NULL pointers. If I remove the "use separate struct console * for each console"patch, what I'm seeing now is that while all of ttyS0, hvc0, and hvc1 are present with gettys running on them, of the three, only ttyS0 is a console. I also re-tried each patch alone: For either the "fix locking of vtermno" or "use separate struct console * for each console" patches (in other words, not the "enforce one-time initialization with hvc_init" patch), I panic during boot with a null dereference. For just the "enforce one-time initialization with hvc_init" patch, I see all of hvc0, hvc1, and ttyS0 in a "who" listing, but only one getty is available with an hvc. Also, an echo to *either* "hvc0" or"hvc1" appears on the single hvc getty. Also, no virtio console appears in the /proc/consoles list. Michael, I agree with you about the comment and naming of the mutex around hvc_init. Stephen, the duplicate messages are not something I'm seeing. It's probably the case that there are two "consoles" (registered in printk) that have the same tty as their target. I've added a call to register_console in hvc_alloc, and I'm guessing that something in your system is making your tty register as a console in hvc_instantiate, and then it's re-registered in hvc_alloc, but I really am not sure. We don't have earlyprintk support, so the register_console in hvc_instantiate is never called. Miche
On Thu, 17 Nov 2011 10:57:37 -0800, Miche Baker-Harvey <miche@google.com> wrote: > Rusty, Michael, Stephen, et al, > > Thanks for your comments on these patches. > > For what I'm trying to do, all three patches are necessary, but maybe > I'm going about it the wrong way. Your input would be appreciated. > I'm in no way claiming that these patches are "right", just that it's > working for me, and that what's in the current pool is not. We have to *understand* the code. If we don't, we need to rewrite the code so we *do* understand it, or make way for someone who does. I'm looking at the kvm man page to try to figure out how to have virtio console, and it's deeply unclear. What kvm commandline are you using? I'll try to debug it here, and see what I learn about hvc_console. Cheers, Rusty.
Thanks, Rusty. I'm not using QEMU though, just KVM. I create the device, wait for the message from the guest that the device is ready, and then add ports. Miche On Sun, Nov 20, 2011 at 9:01 PM, Rusty Russell <rusty@rustcorp.com.au> wrote: > On Thu, 17 Nov 2011 10:57:37 -0800, Miche Baker-Harvey <miche@google.com> wrote: >> Rusty, Michael, Stephen, et al, >> >> Thanks for your comments on these patches. >> >> For what I'm trying to do, all three patches are necessary, but maybe >> I'm going about it the wrong way. Your input would be appreciated. >> I'm in no way claiming that these patches are "right", just that it's >> working for me, and that what's in the current pool is not. > > We have to *understand* the code. If we don't, we need to rewrite the > code so we *do* understand it, or make way for someone who does. > > I'm looking at the kvm man page to try to figure out how to have virtio > console, and it's deeply unclear. What kvm commandline are you using? > I'll try to debug it here, and see what I learn about hvc_console. > > Cheers, > Rusty. >
On Mon, 21 Nov 2011 14:16:38 -0800, Miche Baker-Harvey <miche@google.com> wrote: > Thanks, Rusty. I'm not using QEMU though, just KVM. I create the device, wait > for the message from the guest that the device is ready, and then add ports. > > Miche OK, since Amit was the one who implemented multi-port console, I'm going to hand this to him. I'm sure he's been looking for excuses to dive back into the console code! Cheers, Rusty.
On (Thu) 17 Nov 2011 [10:57:37], Miche Baker-Harvey wrote: > Rusty, Michael, Stephen, et al, > > Thanks for your comments on these patches. > > For what I'm trying to do, all three patches are necessary, but maybe > I'm going about it the wrong way. Your input would be appreciated. > I'm in no way claiming that these patches are "right", just that it's > working for me, and that what's in the current pool is not. > > What I'm trying to do is: > On X86, > under KVM, > start a virtio console device, > with multiple ports on the device, > at least one of which is also a console (as well as ttyS0). > > (Eventually, we want to be able to add virtio console ports on the > fly, and to have multiple virtio console ports be consoles.) Are you using kvm-tool to do this? QEMU can already hot-plug ports and virtio-console (virtio-serial) devices. > When all three of the patches are in place, this works great. (By > great, I mean that getty's start up on all of ttyS0, hvc0 and hvc1, > and console output goes to ttyS0 and to hvc0. > "who" shows three users: ttyS0, hvc0, and hvc1. > "cat /proc/consoles" shows both ttyS0 and hvc0. > I can use all three getty's, and console output really does appear on > both the consoles. > > Based on Rusty's comments, I tried removing each of the patches > individually. Here's what happens today. I've seen other failure modes > depending on what precisely I'm passing the guest. > There's three patches: > 1/3 "fix locking of vtermno" > 2/3 "enforce one-time initialization with hvc_init > "3/3 "use separate struct console * for each console" > > If I remove the "fix locking of vtermno", I only get one virtio > console terminal. "who" shows the ttyS0 and the hvc0, and I can log > into the gettys on both. I don't get the second virtio console getty. > Interestingly, hvc0 shows up in /proc/consoles twice, and in fact the > console output is dumped twice to hvc0 (as you'd expect from looking > at printk.c, each line appears twice, followed by the next line.) I don't really understand why. "fix locking of vtermno" adds locks in init_port_console(), which is called from add_port(), which should be serialised due to port additions being on work items on the same workqueue. I don't see a race here. > If I remove the "enforce one-time initialization with hvc_init" patch, > which makes sure only a single thread is doing the hvc_init, and gates > anyone from continuing until it has completed, I get different > failures, including hangs, and dereferences of NULL pointers. > > If I remove the "use separate struct console * for each console"patch, > what I'm seeing now is that while all of ttyS0, hvc0, and hvc1 are > present with gettys running on them, of the three, only ttyS0 is a > console. I don't see any difference in my testing with and without these patches. This is how I tested with qemu: ./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 -chardev socket,path=/tmp/foo,server,nowait,id=foo -chardev socket,path=/tmp/bar,server,nowait,id=bar -device virtio-serial -device virtconsole,chardev=foo,nr=4 -device virtconsole,chardev=bar,nr=3 -net none -kernel /home/amit/src/linux/arch/x86/boot/bzImage -append 'root=/dev/sda1 console=tty0 console=ttyS0' -initrd /tmp/initramfs.img /guests/f14-nolvm.qcow2 -enable-kvm -snapshot With this setup, with and without patches, I can spawn two consoles via: /sbin/agetty /dev/hvc0 9600 vt100 /sbin/agetty /dev/hvc1 9600 vt100 (Strange thing is, the second one gives a 'password incorrect' error on login attempts, while the first one logs in fine. I do remember testing multiple consoles just fine a year and a half back, so no idea why this isn't behaving as expected -- but it mostly looks like a userspace issue rather than kernel one.) As mentioned earlier, I've not looked at the hvc code, but given my testing results, I'd like to first understand what you're seeing and what your environment is. Amit
On (Wed) 23 Nov 2011 [16:08:52], Amit Shah wrote: > With this setup, with and without patches, I can spawn two consoles > via: > > /sbin/agetty /dev/hvc0 9600 vt100 > /sbin/agetty /dev/hvc1 9600 vt100 > > (Strange thing is, the second one gives a 'password incorrect' error > on login attempts, while the first one logs in fine. I do remember > testing multiple consoles just fine a year and a half back, so no idea > why this isn't behaving as expected -- but it mostly looks like a > userspace issue rather than kernel one.) Right -- when I test this on the Fedora 11 VM I used back then, the two consoles work just fine without these patches. When I use something newer (F14), I get the weird password rejection, with and without your patches. Amit
On Wed, 2011-11-23 at 18:26 +0530, Amit Shah wrote: > On (Wed) 23 Nov 2011 [16:08:52], Amit Shah wrote: > > With this setup, with and without patches, I can spawn two consoles > > via: > > > > /sbin/agetty /dev/hvc0 9600 vt100 > > /sbin/agetty /dev/hvc1 9600 vt100 > > > > (Strange thing is, the second one gives a 'password incorrect' error > > on login attempts, while the first one logs in fine. I do remember > > testing multiple consoles just fine a year and a half back, so no idea > > why this isn't behaving as expected -- but it mostly looks like a > > userspace issue rather than kernel one.) > > Right -- when I test this on the Fedora 11 VM I used back then, the > two consoles work just fine without these patches. When I use > something newer (F14), I get the weird password rejection, with and > without your patches. It's not a weird password rejection, it's probably not set as a secure terminal :) Try adding hvc1 to /etc/securetty on the guest.
On (Wed) 23 Nov 2011 [15:06:04], Sasha Levin wrote: > On Wed, 2011-11-23 at 18:26 +0530, Amit Shah wrote: > > On (Wed) 23 Nov 2011 [16:08:52], Amit Shah wrote: > > > With this setup, with and without patches, I can spawn two consoles > > > via: > > > > > > /sbin/agetty /dev/hvc0 9600 vt100 > > > /sbin/agetty /dev/hvc1 9600 vt100 > > > > > > (Strange thing is, the second one gives a 'password incorrect' error > > > on login attempts, while the first one logs in fine. I do remember > > > testing multiple consoles just fine a year and a half back, so no idea > > > why this isn't behaving as expected -- but it mostly looks like a > > > userspace issue rather than kernel one.) > > > > Right -- when I test this on the Fedora 11 VM I used back then, the > > two consoles work just fine without these patches. When I use > > something newer (F14), I get the weird password rejection, with and > > without your patches. > > It's not a weird password rejection, it's probably not set as a secure > terminal :) > > Try adding hvc1 to /etc/securetty on the guest. Ah, of course :-) Amit
Amit, You said that the work would be serialized "due to port additions being on work items on the same workqueue". I'm not seeing that. I've double checked this by using a mutex_trylock in hvc_console::hvc_alloc(), and here's the relevant output from dmesg: root@myubuntu:~# dmesg | grep MBH [3307216.210274] MBH: got hvc_ports_mutex [3307216.210690] MBH: trylock of hvc_ports_mutex failed [3307216.211143] MBH: got hvc_ports_mutex This is in a system with two virtio console ports, each of which is a console. I think if the VIRTIO_CONSOLE_CONSOLE_PORT message handling were actually being serialized, the trylock should never fail. What's the source of the serialization for the workqueue items? At first reading it looks like the control_work_handler gets called for each virtio interrupt? Miche On Wed, Nov 23, 2011 at 2:38 AM, Amit Shah <amit.shah@redhat.com> wrote: > On (Thu) 17 Nov 2011 [10:57:37], Miche Baker-Harvey wrote: >> Rusty, Michael, Stephen, et al, >> >> Thanks for your comments on these patches. >> >> For what I'm trying to do, all three patches are necessary, but maybe >> I'm going about it the wrong way. Your input would be appreciated. >> I'm in no way claiming that these patches are "right", just that it's >> working for me, and that what's in the current pool is not. >> >> What I'm trying to do is: >> On X86, >> under KVM, >> start a virtio console device, >> with multiple ports on the device, >> at least one of which is also a console (as well as ttyS0). >> >> (Eventually, we want to be able to add virtio console ports on the >> fly, and to have multiple virtio console ports be consoles.) > > Are you using kvm-tool to do this? QEMU can already hot-plug ports > and virtio-console (virtio-serial) devices. > >> When all three of the patches are in place, this works great. (By >> great, I mean that getty's start up on all of ttyS0, hvc0 and hvc1, >> and console output goes to ttyS0 and to hvc0. >> "who" shows three users: ttyS0, hvc0, and hvc1. >> "cat /proc/consoles" shows both ttyS0 and hvc0. >> I can use all three getty's, and console output really does appear on >> both the consoles. >> >> Based on Rusty's comments, I tried removing each of the patches >> individually. Here's what happens today. I've seen other failure modes >> depending on what precisely I'm passing the guest. >> There's three patches: >> 1/3 "fix locking of vtermno" >> 2/3 "enforce one-time initialization with hvc_init >> "3/3 "use separate struct console * for each console" >> >> If I remove the "fix locking of vtermno", I only get one virtio >> console terminal. "who" shows the ttyS0 and the hvc0, and I can log >> into the gettys on both. I don't get the second virtio console getty. >> Interestingly, hvc0 shows up in /proc/consoles twice, and in fact the >> console output is dumped twice to hvc0 (as you'd expect from looking >> at printk.c, each line appears twice, followed by the next line.) > > I don't really understand why. "fix locking of vtermno" adds locks in > init_port_console(), which is called from add_port(), which should be > serialised due to port additions being on work items on the same > workqueue. I don't see a race here. > >> If I remove the "enforce one-time initialization with hvc_init" patch, >> which makes sure only a single thread is doing the hvc_init, and gates >> anyone from continuing until it has completed, I get different >> failures, including hangs, and dereferences of NULL pointers. >> >> If I remove the "use separate struct console * for each console"patch, >> what I'm seeing now is that while all of ttyS0, hvc0, and hvc1 are >> present with gettys running on them, of the three, only ttyS0 is a >> console. > > I don't see any difference in my testing with and without these > patches. > > This is how I tested with qemu: > > ./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 -chardev > socket,path=/tmp/foo,server,nowait,id=foo -chardev > socket,path=/tmp/bar,server,nowait,id=bar -device virtio-serial > -device virtconsole,chardev=foo,nr=4 -device > virtconsole,chardev=bar,nr=3 -net none -kernel > /home/amit/src/linux/arch/x86/boot/bzImage -append 'root=/dev/sda1 > console=tty0 console=ttyS0' -initrd /tmp/initramfs.img > /guests/f14-nolvm.qcow2 -enable-kvm -snapshot > > With this setup, with and without patches, I can spawn two consoles > via: > > /sbin/agetty /dev/hvc0 9600 vt100 > /sbin/agetty /dev/hvc1 9600 vt100 > > (Strange thing is, the second one gives a 'password incorrect' error > on login attempts, while the first one logs in fine. I do remember > testing multiple consoles just fine a year and a half back, so no idea > why this isn't behaving as expected -- but it mostly looks like a > userspace issue rather than kernel one.) > > As mentioned earlier, I've not looked at the hvc code, but given my > testing results, I'd like to first understand what you're seeing and > what your environment is. > > Amit >
Hi, On (Mon) 28 Nov 2011 [15:40:41], Miche Baker-Harvey wrote: > Amit, > > You said that the work would be serialized "due to port additions > being on work items on the same workqueue". I'm not seeing that. You leave a lot of questions unanswered. What's your environment? Are you hot-plugging ports? Are you using qemu? What's your command line? > I've double checked this by using a mutex_trylock in > hvc_console::hvc_alloc(), and here's the relevant output from dmesg: > > root@myubuntu:~# dmesg | grep MBH > [3307216.210274] MBH: got hvc_ports_mutex > [3307216.210690] MBH: trylock of hvc_ports_mutex failed > [3307216.211143] MBH: got hvc_ports_mutex > > This is in a system with two virtio console ports, each of which is a > console. I think if the VIRTIO_CONSOLE_CONSOLE_PORT message handling > were actually being serialized, the trylock should never fail. Agreed. > What's the source of the serialization for the workqueue items? At > first reading it looks like the control_work_handler gets called for > each virtio interrupt? It all depends on how you add ports. If you're using qemu, they happen via the control work handler. Amit
Amit, We aren't using either QEMU or kvmtool, but we are using KVM. All theissues we are seeing happen when we try to establish multiple virtioconsoles at boot time. The command line isn't relevant, but I cantell you the protocol that's passing between the host (kvm) and theguest (see the end of this message). We do go through the control_work_handler(), but it's not providingsynchronization. Here's a trace of the control_work_handler() andhandle_control_message() calls; note that there are two concurrentcalls to control_work_handler(). I decorated control_work_handler() with a "lifetime" marker, and passedthis value to handle_control_message(), so we can see which control messages are being handled from which instance of thecontrol_work_handler() thread. Notice that we enter control_work_handler() a second time before thehandling of the second PORT_ADD message is complete. The firstCONSOLE_PORT message is handled by the second control_work_handler()call, but the second is handled by the first control_work_handler() call. root@myubuntu:~# dmesg | grep MBH[3371055.808738] control_work_handler #1[3371055.809372] + #1 handle_control_message PORT_ADD[3371055.810169] - handle_control_message PORT_ADD[3371055.810170] + #1 handle_control_message PORT_ADD [3371055.810244] control_work_handler #2[3371055.810245] + #2 handle_control_message CONSOLE_PORT[3371055.810246] got hvc_ports_mutex[3371055.810578] - handle_control_message PORT_ADD[3371055.810579] + #1 handle_control_message CONSOLE_PORT[3371055.810580] trylock of hvc_ports_mutex failed[3371055.811352] got hvc_ports_mutex[3371055.811370] - handle_control_message CONSOLE_PORT[3371055.816609] - handle_control_message CONSOLE_PORT So, I'm guessing the bug is that there shouldn't be two instances ofcontrol_work_handler() running simultaneously? Thanks, Miche Protocol:We set up the virtio console device registers during initialization,specifying the multiport feature, and some number of ports, n, where nis greater than 1. In the guest, virtcons_probe() finds our device, and successfully sendsthe VIRTIO_CONSOLE_DEVICE_READY=1 control message. On the host, we receive the VIRTIO_CONSOLE_DEVICE_READY message, andsend one VIRTIO_CONSOLE_PORT_ADD message via the Receive Control queuefor each port in the number of ports. These messages are notserialized: they are all sent at once. The VIRTIO_CONSOLE_PORT_ADD messages are received inhandle_control_message() in virtio_console.c, and add_port() is calledfor each. After each port is added, the guest sendsVIRTIO_CONSOLE_PORT_READY to the host, and in these messages, the idof the port is included in the message. On the host, in response to each VIRTIO_CONSOLE_PORT_READY=1 message,we may send a VIRTIO_CONSOLE_CONSOLE_PORT message. On the guest, in response to each VIRTIO_CONSOLE_CONSOLE_PORT message,init_port_console() is called on the individual port. The same id isused for these messages as was used for the PORT_READY messages. After each successful init_port_console(), the guest sendsVIRTIO_CONSOLE_PORT_OPEN back to the host. On Tue, Nov 29, 2011 at 6:21 AM, Amit Shah <amit.shah@redhat.com> wrote: > Hi, > > On (Mon) 28 Nov 2011 [15:40:41], Miche Baker-Harvey wrote: >> Amit, >> >> You said that the work would be serialized "due to port additions >> being on work items on the same workqueue". I'm not seeing that. > > You leave a lot of questions unanswered. What's your environment? > Are you hot-plugging ports? Are you using qemu? What's your command > line? > >> I've double checked this by using a mutex_trylock in >> hvc_console::hvc_alloc(), and here's the relevant output from dmesg: >> >> root@myubuntu:~# dmesg | grep MBH >> [3307216.210274] MBH: got hvc_ports_mutex >> [3307216.210690] MBH: trylock of hvc_ports_mutex failed >> [3307216.211143] MBH: got hvc_ports_mutex >> >> This is in a system with two virtio console ports, each of which is a >> console. I think if the VIRTIO_CONSOLE_CONSOLE_PORT message handling >> were actually being serialized, the trylock should never fail. > > Agreed. > >> What's the source of the serialization for the workqueue items? At >> first reading it looks like the control_work_handler gets called for >> each virtio interrupt? > > It all depends on how you add ports. If you're using qemu, they > happen via the control work handler. > > Amit >
Good grief! Sorry for the spacing mess-up! Here's a resend with reformatting. Amit, We aren't using either QEMU or kvmtool, but we are using KVM. All the issues we are seeing happen when we try to establish multiple virtioconsoles at boot time. The command line isn't relevant, but I can tell you the protocol that's passing between the host (kvm) and the guest (see the end of this message). We do go through the control_work_handler(), but it's not providing synchronization. Here's a trace of the control_work_handler() and handle_control_message() calls; note that there are two concurrent calls to control_work_handler(). I decorated control_work_handler() with a "lifetime" marker, and passed this value to handle_control_message(), so we can see which control messages are being handled from which instance of the control_work_handler() thread. Notice that we enter control_work_handler() a second time before the handling of the second PORT_ADD message is complete. The first CONSOLE_PORT message is handled by the second control_work_handler() call, but the second is handled by the first control_work_handler() call. root@myubuntu:~# dmesg | grep MBH [3371055.808738] control_work_handler #1 [3371055.809372] + #1 handle_control_message PORT_ADD [3371055.810169] - handle_control_message PORT_ADD [3371055.810170] + #1 handle_control_message PORT_ADD [3371055.810244] control_work_handler #2 [3371055.810245] + #2 handle_control_message CONSOLE_PORT [3371055.810246] got hvc_ports_mutex [3371055.810578] - handle_control_message PORT_ADD [3371055.810579] + #1 handle_control_message CONSOLE_PORT [3371055.810580] trylock of hvc_ports_mutex failed [3371055.811352] got hvc_ports_mutex [3371055.811370] - handle_control_message CONSOLE_PORT [3371055.816609] - handle_control_message CONSOLE_PORT So, I'm guessing the bug is that there shouldn't be two instances of control_work_handler() running simultaneously? Thanks, Miche Protocol: We set up the virtio console device registers during initialization, specifying the multiport feature, and some number of ports, n, where nis greater than 1. In the guest, virtcons_probe() finds our device, and successfully sends the VIRTIO_CONSOLE_DEVICE_READY=1 control message. On the host, we receive the VIRTIO_CONSOLE_DEVICE_READY message, and send one VIRTIO_CONSOLE_PORT_ADD message via the Receive Control queuefor each port in the number of ports. These messages are not serialized: they are all sent at once. The VIRTIO_CONSOLE_PORT_ADD messages are received in handle_control_message() in virtio_console.c, and add_port() is called for each. After each port is added, the guest sendsVIRTIO_CONSOLE_PORT_READY to the host, and in these messages, the id of the port is included in the message. On the host, in response to each VIRTIO_CONSOLE_PORT_READY=1 message, we may send a VIRTIO_CONSOLE_CONSOLE_PORT message. On the guest, in response to each VIRTIO_CONSOLE_CONSOLE_PORT message, init_port_console() is called on the individual port. The same id is used for these messages as was used for the PORT_READY messages. After each successful init_port_console(), the guest sends VIRTIO_CONSOLE_PORT_OPEN back to the host.
On (Tue) 29 Nov 2011 [09:50:41], Miche Baker-Harvey wrote: > Good grief! Sorry for the spacing mess-up! Here's a resend with reformatting. > > Amit, > We aren't using either QEMU or kvmtool, but we are using KVM. All So it's a different userspace? Any chance this different userspace is causing these problems to appear? Esp. since I couldn't reproduce with qemu. > the issues we are seeing happen when we try to establish multiple > virtioconsoles at boot time. The command line isn't relevant, but I > can tell you the protocol that's passing between the host (kvm) and > the guest (see the end of this message). > > We do go through the control_work_handler(), but it's not > providing synchronization. Here's a trace of the > control_work_handler() and handle_control_message() calls; note that > there are two concurrent calls to control_work_handler(). Ah; how does that happen? control_work_handler() should just be invoked once, and if there are any more pending work items to be consumed, they should be done within the loop inside control_work_handler(). > I decorated control_work_handler() with a "lifetime" marker, and > passed this value to handle_control_message(), so we can see which > control messages are being handled from which instance of > the control_work_handler() thread. > > Notice that we enter control_work_handler() a second time before > the handling of the second PORT_ADD message is complete. The > first CONSOLE_PORT message is handled by the second > control_work_handler() call, but the second is handled by the first > control_work_handler() call. > > root@myubuntu:~# dmesg | grep MBH > [3371055.808738] control_work_handler #1 > [3371055.809372] + #1 handle_control_message PORT_ADD > [3371055.810169] - handle_control_message PORT_ADD > [3371055.810170] + #1 handle_control_message PORT_ADD > [3371055.810244] control_work_handler #2 > [3371055.810245] + #2 handle_control_message CONSOLE_PORT > [3371055.810246] got hvc_ports_mutex > [3371055.810578] - handle_control_message PORT_ADD > [3371055.810579] + #1 handle_control_message CONSOLE_PORT > [3371055.810580] trylock of hvc_ports_mutex failed > [3371055.811352] got hvc_ports_mutex > [3371055.811370] - handle_control_message CONSOLE_PORT > [3371055.816609] - handle_control_message CONSOLE_PORT > > So, I'm guessing the bug is that there shouldn't be two instances of > control_work_handler() running simultaneously? Yep, I assumed we did that but apparently not. Do you plan to chase this one down? Amit
Amit, Ah, indeed. I am not using MSI-X, so virtio_pci::vp_try_to_find_vqs() calls vp_request_intx() and sets up an interrupt callback. From there, when an interrupt occurs, the stack looks something like this: virtio_pci::vp_interrupt() virtio_pci::vp_vring_interrupt() virtio_ring::vring_interrupt() vq->vq.callback() <-- in this case, that's virtio_console::control_intr() workqueue::schedule_work() workqueue::queue_work() queue_work_on(get_cpu()) <-- queues the work on the current CPU. I'm not doing anything to keep multiple control message from being sent concurrently to the guest, and we will take those interrupts on any CPU. I've confirmed that the two instances of handle_control_message() are occurring on different CPUs. Should this work? I don't see anywhere that QEMU is serializing the sending of data to the control queue in the guest, and there's no serialization in the control_intr. I don't understand why you are not seeing the concurrent execution of handle_control_message(). Are you taking all your interrupts on a single CPU, maybe? Or is there some other serialization in user space? Miche On Mon, Dec 5, 2011 at 2:54 AM, Amit Shah <amit.shah@redhat.com> wrote: > On (Tue) 29 Nov 2011 [09:50:41], Miche Baker-Harvey wrote: >> Good grief! Sorry for the spacing mess-up! Here's a resend with reformatting. >> >> Amit, >> We aren't using either QEMU or kvmtool, but we are using KVM. All > > So it's a different userspace? Any chance this different userspace is > causing these problems to appear? Esp. since I couldn't reproduce > with qemu. > >> the issues we are seeing happen when we try to establish multiple >> virtioconsoles at boot time. The command line isn't relevant, but I >> can tell you the protocol that's passing between the host (kvm) and >> the guest (see the end of this message). >> >> We do go through the control_work_handler(), but it's not >> providing synchronization. Here's a trace of the >> control_work_handler() and handle_control_message() calls; note that >> there are two concurrent calls to control_work_handler(). > > Ah; how does that happen? control_work_handler() should just be > invoked once, and if there are any more pending work items to be > consumed, they should be done within the loop inside > control_work_handler(). > >> I decorated control_work_handler() with a "lifetime" marker, and >> passed this value to handle_control_message(), so we can see which >> control messages are being handled from which instance of >> the control_work_handler() thread. >> >> Notice that we enter control_work_handler() a second time before >> the handling of the second PORT_ADD message is complete. The >> first CONSOLE_PORT message is handled by the second >> control_work_handler() call, but the second is handled by the first >> control_work_handler() call. >> >> root@myubuntu:~# dmesg | grep MBH >> [3371055.808738] control_work_handler #1 >> [3371055.809372] + #1 handle_control_message PORT_ADD >> [3371055.810169] - handle_control_message PORT_ADD >> [3371055.810170] + #1 handle_control_message PORT_ADD >> [3371055.810244] control_work_handler #2 >> [3371055.810245] + #2 handle_control_message CONSOLE_PORT >> [3371055.810246] got hvc_ports_mutex >> [3371055.810578] - handle_control_message PORT_ADD >> [3371055.810579] + #1 handle_control_message CONSOLE_PORT >> [3371055.810580] trylock of hvc_ports_mutex failed >> [3371055.811352] got hvc_ports_mutex >> [3371055.811370] - handle_control_message CONSOLE_PORT >> [3371055.816609] - handle_control_message CONSOLE_PORT >> >> So, I'm guessing the bug is that there shouldn't be two instances of >> control_work_handler() running simultaneously? > > Yep, I assumed we did that but apparently not. Do you plan to chase > this one down? > > Amit >
On (Tue) 06 Dec 2011 [09:05:38], Miche Baker-Harvey wrote: > Amit, > > Ah, indeed. I am not using MSI-X, so virtio_pci::vp_try_to_find_vqs() > calls vp_request_intx() and sets up an interrupt callback. From > there, when an interrupt occurs, the stack looks something like this: > > virtio_pci::vp_interrupt() > virtio_pci::vp_vring_interrupt() > virtio_ring::vring_interrupt() > vq->vq.callback() <-- in this case, that's virtio_console::control_intr() > workqueue::schedule_work() > workqueue::queue_work() > queue_work_on(get_cpu()) <-- queues the work on the current CPU. > > I'm not doing anything to keep multiple control message from being > sent concurrently to the guest, and we will take those interrupts on > any CPU. I've confirmed that the two instances of > handle_control_message() are occurring on different CPUs. So let's have a new helper, port_lock() that takes the port-specific spinlock. There has to be a new helper, since the port lock should depend on the portdev lock being taken too. For the port addition case, just the portdev lock should be taken. For any other operations, the port lock should be taken. My assumption was that we would be able to serialise the work items, but that will be too restrictive. Taking port locks sounds like a better idea. We'd definitely need the port lock in the control work handler. We might need it in a few more places (like module removal), but we'll worry about that later. Does this sound fine? Amit
So on a CONSOLE_PORT_ADD message, we would take the (existing)ports_device::ports_lock, and for other control messages we would justtake the (new) port::port_lock? You are concerned that just takingthe ports_lock for all control messages could be too restrictive? Iwouldn't have expected these messages to be frequent occurrences, butI'll defer to your experience here. The CONSOLE_CONSOLE_PORT message calls hvc_alloc, which also needsserialization. That's in another one of these three patches; are youthinking we could leave that patch be, or that we would we use theport_lock for CONSOLE_CONSOLE_PORT? Using the port_lock wouldprovide the HVC serialization "for free" but it would be cleaner if weput HVC related synchronization in hvc_console.c. On Thu, Dec 8, 2011 at 4:08 AM, Amit Shah <amit.shah@redhat.com> wrote: > On (Tue) 06 Dec 2011 [09:05:38], Miche Baker-Harvey wrote: >> Amit, >> >> Ah, indeed. I am not using MSI-X, so virtio_pci::vp_try_to_find_vqs() >> calls vp_request_intx() and sets up an interrupt callback. From >> there, when an interrupt occurs, the stack looks something like this: >> >> virtio_pci::vp_interrupt() >> virtio_pci::vp_vring_interrupt() >> virtio_ring::vring_interrupt() >> vq->vq.callback() <-- in this case, that's virtio_console::control_intr() >> workqueue::schedule_work() >> workqueue::queue_work() >> queue_work_on(get_cpu()) <-- queues the work on the current CPU. >> >> I'm not doing anything to keep multiple control message from being >> sent concurrently to the guest, and we will take those interrupts on >> any CPU. I've confirmed that the two instances of >> handle_control_message() are occurring on different CPUs. > > So let's have a new helper, port_lock() that takes the port-specific > spinlock. There has to be a new helper, since the port lock should > depend on the portdev lock being taken too. For the port addition > case, just the portdev lock should be taken. For any other > operations, the port lock should be taken. > > My assumption was that we would be able to serialise the work items, > but that will be too restrictive. Taking port locks sounds like a > better idea. > > We'd definitely need the port lock in the control work handler. We > might need it in a few more places (like module removal), but we'll > worry about that later. > > Does this sound fine? > > Amit
On (Mon) 12 Dec 2011 [11:11:55], Miche Baker-Harvey wrote: > So on a CONSOLE_PORT_ADD message, we would take the > (existing)ports_device::ports_lock, and for other control messages we > would justtake the (new) port::port_lock? You are concerned that just > takingthe ports_lock for all control messages could be too > restrictive? Iwouldn't have expected these messages to be frequent > occurrences, butI'll defer to your experience here. No, I mean we'll have to take the new port_lock() everywhere we currently take the port lock, plus in a few more places. I only suggest using port_lock() helper since we'll need a dependency on the portdev lock as well. > The CONSOLE_CONSOLE_PORT message calls hvc_alloc, which also > needsserialization. That's in another one of these three patches; are > youthinking we could leave that patch be, or that we would we use > theport_lock for CONSOLE_CONSOLE_PORT? Using the port_lock > wouldprovide the HVC serialization "for free" but it would be cleaner > if weput HVC related synchronization in hvc_console.c. Yes, definitely, since other users of hvc_console may get bitten in similar ways. However, I'm not too familiar with the hvc code, the people at linux-ppc can be of help. > On Thu, Dec 8, 2011 at 4:08 AM, Amit Shah <amit.shah@redhat.com> wrote: > > On (Tue) 06 Dec 2011 [09:05:38], Miche Baker-Harvey wrote: > >> Amit, > >> > >> Ah, indeed. I am not using MSI-X, so virtio_pci::vp_try_to_find_vqs() > >> calls vp_request_intx() and sets up an interrupt callback. From > >> there, when an interrupt occurs, the stack looks something like this: > >> > >> virtio_pci::vp_interrupt() > >> virtio_pci::vp_vring_interrupt() > >> virtio_ring::vring_interrupt() > >> vq->vq.callback() <-- in this case, that's virtio_console::control_intr() > >> workqueue::schedule_work() > >> workqueue::queue_work() > >> queue_work_on(get_cpu()) <-- queues the work on the current CPU. > >> > >> I'm not doing anything to keep multiple control message from being > >> sent concurrently to the guest, and we will take those interrupts on > >> any CPU. I've confirmed that the two instances of > >> handle_control_message() are occurring on different CPUs. > > > > So let's have a new helper, port_lock() that takes the port-specific > > spinlock. There has to be a new helper, since the port lock should > > depend on the portdev lock being taken too. For the port addition > > case, just the portdev lock should be taken. For any other > > operations, the port lock should be taken. > > > > My assumption was that we would be able to serialise the work items, > > but that will be too restrictive. Taking port locks sounds like a > > better idea. > > > > We'd definitely need the port lock in the control work handler. We > > might need it in a few more places (like module removal), but we'll > > worry about that later. > > > > Does this sound fine? > > > > Amit Amit
diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index b6b2d18..09a6159 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -29,8 +29,9 @@ #include <linux/kernel.h> #include <linux/kthread.h> #include <linux/list.h> -#include <linux/module.h> #include <linux/major.h> +#include <linux/module.h> +#include <linux/mutex.h> #include <linux/sysrq.h> #include <linux/tty.h> #include <linux/tty_flip.h> @@ -84,6 +85,10 @@ static LIST_HEAD(hvc_structs); * list traversal. */ static DEFINE_SPINLOCK(hvc_structs_lock); +/* + * only one task does allocation at a time. + */ +static DEFINE_MUTEX(hvc_ports_mutex); /* * This value is used to assign a tty->index value to a hvc_struct based @@ -825,11 +830,15 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data, int i; /* We wait until a driver actually comes along */ + mutex_lock(&hvc_ports_mutex); if (!hvc_driver) { int err = hvc_init(); - if (err) + if (err) { + mutex_unlock(&hvc_ports_mutex); return ERR_PTR(err); + } } + mutex_unlock(&hvc_ports_mutex); hp = kzalloc(ALIGN(sizeof(*hp), sizeof(long)) + outbuf_size, GFP_KERNEL);
hvc_init() must only be called once, and no thread should continue with hvc_alloc() until after initialization is complete. The original code does not enforce either of these requirements. A new mutex limits entry to hvc_init() to a single thread, and blocks all later comers until it has completed. This patch fixes multiple crash symptoms. Signed-off-by: Miche Baker-Harvey <miche@google.com> --- drivers/tty/hvc/hvc_console.c | 13 +++++++++++-- 1 files changed, 11 insertions(+), 2 deletions(-)