From patchwork Fri Jun 9 22:13:57 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Belgazal, Netanel"
X-Patchwork-Id: 774158
X-Patchwork-Delegate: davem@davemloft.net
From:
To: ,
CC: Netanel Belgazal , , , , , , , ,
Subject: [PATCH net-next 8/8] net: ena: bug fix in lost tx packets detection mechanism
Date: Sat, 10 Jun 2017 01:13:57 +0300
Message-ID: <1497046437-20390-9-git-send-email-netanel@amazon.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1497046437-20390-1-git-send-email-netanel@amazon.com>
References: <1497046437-20390-1-git-send-email-netanel@amazon.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Netanel Belgazal

check_for_missing_tx_completions() is called from a timer task and
looks for lost tx packets. The old implementation accumulated all the
lost tx packets and did not check whether those packets were completed
at a later stage. As a result, the driver could reset the device for
no reason.

Signed-off-by: Netanel Belgazal
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  1 -
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 66 +++++++++++++++------------
 drivers/net/ethernet/amazon/ena/ena_netdev.h  | 14 +++++-
 3 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 533b2fb..3ee55e2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -80,7 +80,6 @@ static const struct ena_stats ena_stats_tx_strings[] = {
 	ENA_STAT_TX_ENTRY(tx_poll),
 	ENA_STAT_TX_ENTRY(doorbells),
 	ENA_STAT_TX_ENTRY(prepare_ctx_err),
-	ENA_STAT_TX_ENTRY(missing_tx_comp),
 	ENA_STAT_TX_ENTRY(bad_req_id),
 };
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 3c366bf..4f16ed3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1995,6 +1995,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	tx_info->tx_descs = nb_hw_desc;
 	tx_info->last_jiffies = jiffies;
+	tx_info->print_once = 0;
 
 	tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use,
 		tx_ring->ring_size);
@@ -2564,13 +2565,44 @@ static void ena_fw_reset_device(struct work_struct *work)
 		"Reset attempt failed. Can not reset the device\n");
 }
 
-static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+static int check_missing_comp_in_queue(struct ena_adapter *adapter,
+				       struct ena_ring *tx_ring)
 {
 	struct ena_tx_buffer *tx_buf;
 	unsigned long last_jiffies;
+	u32 missed_tx = 0;
+	int i;
+
+	for (i = 0; i < tx_ring->ring_size; i++) {
+		tx_buf = &tx_ring->tx_buffer_info[i];
+		last_jiffies = tx_buf->last_jiffies;
+		if (unlikely(last_jiffies &&
+			     time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
+			if (!tx_buf->print_once)
+				netif_notice(adapter, tx_err, adapter->netdev,
+					     "Found a Tx that wasn't completed on time, qid %d, index %d.\n",
+					     tx_ring->qid, i);
+
+			tx_buf->print_once = 1;
+			missed_tx++;
+
+			if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) {
+				netif_err(adapter, tx_err, adapter->netdev,
+					  "The number of lost tx completions is above the threshold (%d > %d). Reset the device\n",
+					  missed_tx, MAX_NUM_OF_TIMEOUTED_PACKETS);
+				set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
+				return -EIO;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+{
 	struct ena_ring *tx_ring;
-	int i, j, budget;
-	u32 missed_tx;
+	int i, budget, rc;
 
 	/* Make sure the driver doesn't turn the device in other process */
 	smp_rmb();
@@ -2586,31 +2618,9 @@ static void check_for_missing_tx_completions(struct ena_adapter *adapter)
 	for (i = adapter->last_monitored_tx_qid; i < adapter->num_queues; i++) {
 		tx_ring = &adapter->tx_ring[i];
 
-		for (j = 0; j < tx_ring->ring_size; j++) {
-			tx_buf = &tx_ring->tx_buffer_info[j];
-			last_jiffies = tx_buf->last_jiffies;
-			if (unlikely(last_jiffies && time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
-				netif_notice(adapter, tx_err, adapter->netdev,
-					     "Found a Tx that wasn't completed on time, qid %d, index %d.\n",
-					     tx_ring->qid, j);
-
-				u64_stats_update_begin(&tx_ring->syncp);
-				missed_tx = tx_ring->tx_stats.missing_tx_comp++;
-				u64_stats_update_end(&tx_ring->syncp);
-
-				/* Clear last jiffies so the lost buffer won't
-				 * be counted twice.
-				 */
-				tx_buf->last_jiffies = 0;
-
-				if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) {
-					netif_err(adapter, tx_err, adapter->netdev,
-						  "The number of lost tx completion is above the threshold (%d > %d). Reset the device\n",
-						  missed_tx, MAX_NUM_OF_TIMEOUTED_PACKETS);
-					set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
-				}
-			}
-		}
+		rc = check_missing_comp_in_queue(adapter, tx_ring);
+		if (unlikely(rc))
+			return;
 
 		budget--;
 		if (!budget)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 8828f1d..88b5e56 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -146,7 +146,18 @@ struct ena_tx_buffer {
 	u32 tx_descs;
 	/* num of buffers used by this skb */
 	u32 num_of_bufs;
-	/* Save the last jiffies to detect missing tx packets */
+
+	/* Used for detect missing tx packets to limit the number of prints */
+	u32 print_once;
+	/* Save the last jiffies to detect missing tx packets
+	 *
+	 * sets to non zero value on ena_start_xmit and set to zero on
+	 * napi and timer_Service_routine.
+	 *
+	 * while this value is not protected by lock,
+	 * a given packet is not expected to be handled by ena_start_xmit
+	 * and by napi/timer_service at the same time.
+	 */
 	unsigned long last_jiffies;
 	struct ena_com_buf bufs[ENA_PKT_MAX_BUFS];
 } ____cacheline_aligned;
@@ -170,7 +181,6 @@ struct ena_stats_tx {
 	u64 napi_comp;
 	u64 tx_poll;
 	u64 doorbells;
-	u64 missing_tx_comp;
 	u64 bad_req_id;
 };
 