diff mbox series

[v2,3/3] hvc/xen: fix console unplug

Message ID 20231020161529.355083-4-dwmw2@infradead.org (mailing list archive)
State Handled Elsewhere
Headers show
Series hvc/xen: Xen console fixes. | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.

Commit Message

David Woodhouse Oct. 20, 2023, 4:15 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

On unplug of a Xen console, xencons_disconnect_backend() unconditionally
calls free_irq() via unbind_from_irqhandler(), causing a warning of
freeing an already-free IRQ:

(qemu) device_del con1
[   32.050919] ------------[ cut here ]------------
[   32.050942] Trying to free already-free IRQ 33
[   32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 __free_irq+0x1d4/0x330

It should be using evtchn_put() to tear down the event channel binding,
and let the Linux IRQ side of it be handled by notifier_del_irq() through
the HVC code.

On which topic... xencons_disconnect_backend() should call hvc_remove()
*first*, rather than tearing down the event channel and grant mapping
while they are in use. And then the IRQ is guaranteed to be freed by
the time it's torn down by evtchn_put().

Since evtchn_put() also closes the actual event channel, avoid calling
xenbus_free_evtchn() except in the failure path where the IRQ was not
successfully set up.

However, calling hvc_remove() at the start of xencons_disconnect_backend()
still isn't early enough. An unplug request is indicated by the backend
setting its state to XenbusStateClosing, which triggers a notification
to xencons_backend_changed(), which... does nothing except set its own
frontend state directly to XenbusStateClosed without *actually* tearing
down the HVC device or, you know, making sure it isn't actively in use.

So the backend sees the guest frontend set its state to XenbusStateClosed
and stops servicing the interrupt... and the guest spins for ever in the
domU_write_console() function waiting for the ring to drain.

Fix that one by calling hvc_remove() from xencons_backend_changed() before
signalling to the backend that it's OK to proceed with the removal.

Tested with 'dd if=/dev/zero of=/dev/hvc1' while telling Qemu to remove
the console device.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
---
 drivers/tty/hvc/hvc_xen.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

Comments

Jürgen Groß Oct. 23, 2023, 8:14 a.m. UTC | #1
On 20.10.23 18:15, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> On unplug of a Xen console, xencons_disconnect_backend() unconditionally
> calls free_irq() via unbind_from_irqhandler(), causing a warning of
> freeing an already-free IRQ:
> 
> (qemu) device_del con1
> [   32.050919] ------------[ cut here ]------------
> [   32.050942] Trying to free already-free IRQ 33
> [   32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 __free_irq+0x1d4/0x330
> 
> It should be using evtchn_put() to tear down the event channel binding,
> and let the Linux IRQ side of it be handled by notifier_del_irq() through
> the HVC code.
> 
> On which topic... xencons_disconnect_backend() should call hvc_remove()
> *first*, rather than tearing down the event channel and grant mapping
> while they are in use. And then the IRQ is guaranteed to be freed by
> the time it's torn down by evtchn_put().
> 
> Since evtchn_put() also closes the actual event channel, avoid calling
> xenbus_free_evtchn() except in the failure path where the IRQ was not
> successfully set up.
> 
> However, calling hvc_remove() at the start of xencons_disconnect_backend()
> still isn't early enough. An unplug request is indicated by the backend
> setting its state to XenbusStateClosing, which triggers a notification
> to xencons_backend_changed(), which... does nothing except set its own
> frontend state directly to XenbusStateClosed without *actually* tearing
> down the HVC device or, you know, making sure it isn't actively in use.
> 
> So the backend sees the guest frontend set its state to XenbusStateClosed
> and stops servicing the interrupt... and the guest spins for ever in the
> domU_write_console() function waiting for the ring to drain.
> 
> Fix that one by calling hvc_remove() from xencons_backend_changed() before
> signalling to the backend that it's OK to proceed with the removal.
> 
> Tested with 'dd if=/dev/zero of=/dev/hvc1' while telling Qemu to remove
> the console device.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Cc: stable@vger.kernel.org

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
diff mbox series

Patch

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 4a768b504263..34c01874f45b 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -377,18 +377,21 @@  void xen_console_resume(void)
 #ifdef CONFIG_HVC_XEN_FRONTEND
 static void xencons_disconnect_backend(struct xencons_info *info)
 {
-	if (info->irq > 0)
-		unbind_from_irqhandler(info->irq, NULL);
-	info->irq = 0;
+	if (info->hvc != NULL)
+		hvc_remove(info->hvc);
+	info->hvc = NULL;
+	if (info->irq > 0) {
+		evtchn_put(info->evtchn);
+		info->irq = 0;
+		info->evtchn = 0;
+	}
+	/* evtchn_put() will also close it so this is only an error path */
 	if (info->evtchn > 0)
 		xenbus_free_evtchn(info->xbdev, info->evtchn);
 	info->evtchn = 0;
 	if (info->gntref > 0)
 		gnttab_free_grant_references(info->gntref);
 	info->gntref = 0;
-	if (info->hvc != NULL)
-		hvc_remove(info->hvc);
-	info->hvc = NULL;
 }
 
 static void xencons_free(struct xencons_info *info)
@@ -553,10 +556,23 @@  static void xencons_backend_changed(struct xenbus_device *dev,
 		if (dev->state == XenbusStateClosed)
 			break;
 		fallthrough;	/* Missed the backend's CLOSING state */
-	case XenbusStateClosing:
+	case XenbusStateClosing: {
+		struct xencons_info *info = dev_get_drvdata(&dev->dev);;
+
+		/*
+		 * Don't tear down the evtchn and grant ref before the other
+		 * end has disconnected, but do stop userspace from trying
+		 * to use the device before we allow the backend to close.
+		 */
+		if (info->hvc) {
+			hvc_remove(info->hvc);
+			info->hvc = NULL;
+		}
+
 		xenbus_frontend_closed(dev);
 		break;
 	}
+	}
 }
 
 static const struct xenbus_device_id xencons_ids[] = {
@@ -616,7 +632,7 @@  static int __init xen_hvc_init(void)
 		list_del(&info->list);
 		spin_unlock_irqrestore(&xencons_lock, flags);
 		if (info->irq)
-			unbind_from_irqhandler(info->irq, NULL);
+			evtchn_put(info->evtchn);
 		kfree(info);
 		return r;
 	}