From patchwork Wed Jul 22 13:47:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrea Righi X-Patchwork-Id: 1333893 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BBcGR3fNCz9sVB; Wed, 22 Jul 2020 23:47:27 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1jyF5g-0001uo-2Z; Wed, 22 Jul 2020 13:47:24 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jyF5d-0001uH-NZ for kernel-team@lists.ubuntu.com; Wed, 22 Jul 2020 13:47:21 +0000 Received: from mail-wm1-f70.google.com ([209.85.128.70]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jyF5d-0000Xp-ES for kernel-team@lists.ubuntu.com; Wed, 22 Jul 2020 13:47:21 +0000 Received: by mail-wm1-f70.google.com with SMTP id v8so634472wma.6 for ; Wed, 22 Jul 2020 06:47:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=IPhoQjcAqUCRYNwlCeB6L3nRoLdth3HSixLIj4XiBlo=; b=eClbTM/bpQeP550ndcet2gmzT0rLo1j5MX1hdyR5bYSssJjIiN1sFEfIhkeX1RExbm vZwT8m5s8EV2tM9N7kYpoPpc9RD2+ZE86vpY3d+tqAuWHDrJF0865gRHrEpbJawjaXiB 5YalpmoFdhwTyEU1zVIxBhUDD2cvsVr84nmaCpKMALhUo6Fp/ZGAqCjHFAWHWzRCKKM/ siGf3lH8LV3IKNiuBFeP7z2PrS/BY6Iq74BSBvBxK9n2UjwmxhjiT1O5hkLnpsnAgciW L5nLsqosoNxhynPLNKfQ2ljk4gzWpsAvpI8dIIHl7ap4eXQDkhbUpSVFlhqSTtXP0gg+ 8xvw== X-Gm-Message-State: AOAM531Nq+X4KYTDPThM4tiwjf+jJlPMmeyKJ/gDDyP0g96zZ6mPZdWA OUcDtU88KrboaJfz/p7dnhvfGgw1zIMssPGuW7onsoExOsnz+amKtbEo7ivQzFyE8fKBnx7BvMO eWfp0S7zlAQ7gp9h+pH3JeE+W8iN6OF7Zc/7Te7njjg== X-Received: by 2002:adf:b1cf:: with SMTP id r15mr33511521wra.118.1595425640409; Wed, 22 Jul 2020 06:47:20 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxCYtXeOXcF6ZLHNabmJWjgJ9VD/tNmRQJSgWZ2IeRGrwc7eo0aDrsfyIrG2ZUY1tPRMEd+lg== X-Received: by 2002:adf:b1cf:: with SMTP id r15mr33511504wra.118.1595425640131; Wed, 22 Jul 2020 06:47:20 -0700 (PDT) Received: from localhost.localdomain (host-87-11-131-192.retail.telecomitalia.it. [87.11.131.192]) by smtp.gmail.com with ESMTPSA id g14sm44882461wrw.83.2020.07.22.06.47.18 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Jul 2020 06:47:18 -0700 (PDT) From: Andrea Righi To: kernel-team@lists.ubuntu.com Subject: [F] [PATCH] UBUNTU: SAUCE: xen-netfront: fix potential deadlock in xennet_remove() Date: Wed, 22 Jul 2020 15:47:02 +0200 Message-Id: <20200722134702.899157-3-andrea.righi@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200722134702.899157-1-andrea.righi@canonical.com> References: <20200722134702.899157-1-andrea.righi@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" There's a potential race in xennet_remove(); this is what the driver is doing upon unregistering a network device: 1. state = read bus state 2. if state is not "Closed": 3. request to set state to "Closing" 4. wait for state to be set to "Closing" 5. request to set state to "Closed" 6. wait for state to be set to "Closed" If the state changes to "Closed" immediately after step 1 we are stuck forever in step 4, because the state will never go back from "Closed" to "Closing". Make sure to check also for state == "Closed" in step 4 to prevent the deadlock. Also add a 5 sec timeout any time we wait for the bus state to change, to avoid getting stuck forever in wait_event() and add a debug message to help tracking down potential similar issues. Signed-off-by: Andrea Righi --- drivers/net/xen-netfront.c | 77 +++++++++++++++++++++++++++----------- 1 file changed, 55 insertions(+), 22 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 65edcdd6e05f..9e91da5f350c 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -76,6 +76,8 @@ module_param_named(freeze_timeout_secs, MODULE_PARM_DESC(freeze_timeout_secs, "timeout when freezing netfront device in seconds"); +#define XENNET_TIMEOUT (5 * HZ) + static const struct ethtool_ops xennet_ethtool_ops; struct netfront_cb { @@ -1368,12 +1370,18 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev) netif_carrier_off(netdev); - xenbus_switch_state(dev, XenbusStateInitialising); - wait_event(module_wq, - xenbus_read_driver_state(dev->otherend) != - XenbusStateClosed && - xenbus_read_driver_state(dev->otherend) != - XenbusStateUnknown); + do { + dev_dbg(&dev->dev, + "%s: switching to XenbusStateInitialising\n", + dev->nodename); + xenbus_switch_state(dev, XenbusStateInitialising); + err = wait_event_timeout(module_wq, + xenbus_read_driver_state(dev->otherend) != + XenbusStateClosed && + xenbus_read_driver_state(dev->otherend) != + XenbusStateUnknown, XENNET_TIMEOUT); + } while (!err); + return netdev; exit: @@ -2232,28 +2240,53 @@ static const struct attribute_group xennet_dev_group = { }; #endif /* CONFIG_SYSFS */ -static int xennet_remove(struct xenbus_device *dev) +static void xennet_bus_close(struct xenbus_device *dev) { - struct netfront_info *info = dev_get_drvdata(&dev->dev); - - dev_dbg(&dev->dev, "%s\n", dev->nodename); + int ret; - if (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed) { + if (xenbus_read_driver_state(dev->otherend) == XenbusStateClosed) + return; + do { + dev_dbg(&dev->dev, "%s: switching to XenbusStateClosing\n", + dev->nodename); xenbus_switch_state(dev, XenbusStateClosing); - wait_event(module_wq, - xenbus_read_driver_state(dev->otherend) == - XenbusStateClosing || - xenbus_read_driver_state(dev->otherend) == - XenbusStateUnknown); + ret = wait_event_timeout(module_wq, + xenbus_read_driver_state(dev->otherend) == + XenbusStateClosing || + xenbus_read_driver_state(dev->otherend) == + XenbusStateClosed || + xenbus_read_driver_state(dev->otherend) == + XenbusStateUnknown, + XENNET_TIMEOUT); + dev_dbg(&dev->dev, "%s: state = %d\n", dev->nodename, + xenbus_read_driver_state(dev->otherend)); + } while (!ret); + + if (xenbus_read_driver_state(dev->otherend) == XenbusStateClosed) + return; + do { + dev_dbg(&dev->dev, "%s: switching to XenbusStateClosed\n", + dev->nodename); xenbus_switch_state(dev, XenbusStateClosed); - wait_event(module_wq, - xenbus_read_driver_state(dev->otherend) == - XenbusStateClosed || - xenbus_read_driver_state(dev->otherend) == - XenbusStateUnknown); - } + ret = wait_event_timeout(module_wq, + xenbus_read_driver_state(dev->otherend) == + XenbusStateClosed || + xenbus_read_driver_state(dev->otherend) == + XenbusStateUnknown, + XENNET_TIMEOUT); + dev_dbg(&dev->dev, "%s: state = %d\n", dev->nodename, + xenbus_read_driver_state(dev->otherend)); + } while (!ret); +} + +static int xennet_remove(struct xenbus_device *dev) +{ + struct netfront_info *info = dev_get_drvdata(&dev->dev); + + dev_dbg(&dev->dev, "%s\n", dev->nodename); + xennet_bus_close(dev); xennet_disconnect_backend(info); if (info->netdev->reg_state == NETREG_REGISTERED)