From patchwork Thu Dec  7 17:55:45 2017
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 845735
X-Patchwork-Delegate: davem@davemloft.net
Subject: [net-next PATCH 06/14] net: sched: explicit locking in gso_cpu fallback
From: John Fastabend
To: willemdebruijn.kernel@gmail.com, daniel@iogearbox.net,
 eric.dumazet@gmail.com, davem@davemloft.net
Cc: netdev@vger.kernel.org, jiri@resnulli.us, xiyou.wangcong@gmail.com
Date: Thu, 07 Dec 2017 09:55:45 -0800
Message-ID: <20171207175544.5771.34475.stgit@john-Precision-Tower-5810>
In-Reply-To: <20171207173500.5771.41198.stgit@john-Precision-Tower-5810>
References: <20171207173500.5771.41198.stgit@john-Precision-Tower-5810>
User-Agent: StGit/0.17.1-dirty
X-Mailing-List: netdev@vger.kernel.org

This work prepares the qdisc layer to support egress lockless qdiscs.
When the egress qdisc runs lockless and we overrun the netdev, for
whatever reason, the netdev returns a busy error code and the skb is
parked on the gso_skb pointer. With many cores all hitting this case at
once it's possible to have multiple sk_buffs here, so we turn gso_skb
into a queue. This should be an edge case; if we see it frequently then
the netdev/qdisc layer needs to back off.

Signed-off-by: John Fastabend
---
 include/net/sch_generic.h |   20 ++++++-----
 net/sched/sch_generic.c   |   85 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 84 insertions(+), 21 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7bc2826..6e329f0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -88,7 +88,7 @@ struct Qdisc {
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
 	 */
-	struct sk_buff		*gso_skb ____cacheline_aligned_in_smp;
+	struct sk_buff_head	gso_skb ____cacheline_aligned_in_smp;
 	struct qdisc_skb_head	q;
 	struct gnet_stats_basic_packed bstats;
 	seqcount_t		running;
@@ -795,26 +795,30 @@ static inline struct sk_buff *qdisc_peek_head(struct Qdisc *sch)
 /* generic pseudo peek method for non-work-conserving qdisc */
 static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
 {
+	struct sk_buff *skb = skb_peek(&sch->gso_skb);
+
 	/* we can reuse ->gso_skb because peek isn't called for root qdiscs */
-	if (!sch->gso_skb) {
-		sch->gso_skb = sch->dequeue(sch);
-		if (sch->gso_skb) {
+	if (!skb) {
+		skb = sch->dequeue(sch);
+
+		if (skb) {
+			__skb_queue_head(&sch->gso_skb, skb);
 			/* it's still part of the queue */
-			qdisc_qstats_backlog_inc(sch, sch->gso_skb);
+			qdisc_qstats_backlog_inc(sch, skb);
 			sch->q.qlen++;
 		}
 	}
 
-	return sch->gso_skb;
+	return skb;
 }
 
 /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
 static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 {
-	struct sk_buff *skb = sch->gso_skb;
+	struct sk_buff *skb = skb_peek(&sch->gso_skb);
 
 	if (skb) {
-		sch->gso_skb = NULL;
+		skb = __skb_dequeue(&sch->gso_skb);
 		qdisc_qstats_backlog_dec(sch, skb);
 		sch->q.qlen--;
 	} else {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 80e4ae3..dfeabe3 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -45,10 +45,9 @@
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
-
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
-	q->gso_skb = skb;
+	__skb_queue_head(&q->gso_skb, skb);
 	q->qstats.requeues++;
 	qdisc_qstats_backlog_inc(q, skb);
 	q->q.qlen++;	/* it's still part of the queue */
@@ -57,6 +56,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 	return 0;
 }
 
+static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
+{
+	spinlock_t *lock = qdisc_lock(q);
+
+	spin_lock(lock);
+	__skb_queue_tail(&q->gso_skb, skb);
+	spin_unlock(lock);
+
+	qdisc_qstats_cpu_requeues_inc(q);
+	qdisc_qstats_cpu_backlog_inc(q, skb);
+	qdisc_qstats_cpu_qlen_inc(q);
+	__netif_schedule(q);
+
+	return 0;
+}
+
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+	if (q->flags & TCQ_F_NOLOCK)
+		return dev_requeue_skb_locked(skb, q);
+	else
+		return __dev_requeue_skb(skb, q);
+}
+
 static void try_bulk_dequeue_skb(struct Qdisc *q,
 				 struct sk_buff *skb,
 				 const struct netdev_queue *txq,
@@ -112,23 +135,50 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 				   int *packets)
 {
-	struct sk_buff *skb = q->gso_skb;
 	const struct netdev_queue *txq = q->dev_queue;
+	struct sk_buff *skb;
 
 	*packets = 1;
-	if (unlikely(skb)) {
+	if (unlikely(!skb_queue_empty(&q->gso_skb))) {
+		spinlock_t *lock = NULL;
+
+		if (q->flags & TCQ_F_NOLOCK) {
+			lock = qdisc_lock(q);
+			spin_lock(lock);
+		}
+
+		skb = skb_peek(&q->gso_skb);
+
+		/* skb may be null if another cpu pulls gso_skb off in between
+		 * empty check and lock.
+		 */
+		if (!skb) {
+			if (lock)
+				spin_unlock(lock);
+			goto validate;
+		}
+
 		/* skb in gso_skb were already validated */
 		*validate = false;
 		/* check the reason of requeuing without tx lock first */
 		txq = skb_get_tx_queue(txq->dev, skb);
 		if (!netif_xmit_frozen_or_stopped(txq)) {
-			q->gso_skb = NULL;
-			qdisc_qstats_backlog_dec(q, skb);
-			q->q.qlen--;
-		} else
+			skb = __skb_dequeue(&q->gso_skb);
+			if (qdisc_is_percpu_stats(q)) {
+				qdisc_qstats_cpu_backlog_dec(q, skb);
+				qdisc_qstats_cpu_qlen_dec(q);
+			} else {
+				qdisc_qstats_backlog_dec(q, skb);
+				q->q.qlen--;
+			}
+		} else {
 			skb = NULL;
+		}
+		if (lock)
+			spin_unlock(lock);
 		goto trace;
 	}
+validate:
 	*validate = true;
 	skb = q->skb_bad_txq;
 	if (unlikely(skb)) {
@@ -629,6 +679,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
+	__skb_queue_head_init(&sch->gso_skb);
 	qdisc_skb_head_init(&sch->q);
 	spin_lock_init(&sch->q.lock);
 
@@ -697,6 +748,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 void qdisc_reset(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops *ops = qdisc->ops;
+	struct sk_buff *skb, *tmp;
 
 	if (ops->reset)
 		ops->reset(qdisc);
@@ -704,10 +756,11 @@ void qdisc_reset(struct Qdisc *qdisc)
 	kfree_skb(qdisc->skb_bad_txq);
 	qdisc->skb_bad_txq = NULL;
 
-	if (qdisc->gso_skb) {
-		kfree_skb_list(qdisc->gso_skb);
-		qdisc->gso_skb = NULL;
+	skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp) {
+		__skb_unlink(skb, &qdisc->gso_skb);
+		kfree_skb_list(skb);
 	}
+
 	qdisc->q.qlen = 0;
 	qdisc->qstats.backlog = 0;
 }
@@ -726,6 +779,7 @@ static void qdisc_free(struct Qdisc *qdisc)
 void qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops *ops = qdisc->ops;
+	struct sk_buff *skb, *tmp;
 
 	if (qdisc->flags & TCQ_F_BUILTIN ||
 	    !refcount_dec_and_test(&qdisc->refcnt))
@@ -745,7 +799,11 @@ void qdisc_destroy(struct Qdisc *qdisc)
 	module_put(ops->owner);
 	dev_put(qdisc_dev(qdisc));
 
-	kfree_skb_list(qdisc->gso_skb);
+	skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp) {
+		__skb_unlink(skb, &qdisc->gso_skb);
+		kfree_skb_list(skb);
+	}
+
 	kfree_skb(qdisc->skb_bad_txq);
 	qdisc_free(qdisc);
 }
@@ -973,6 +1031,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
 
 	rcu_assign_pointer(dev_queue->qdisc, qdisc);
 	dev_queue->qdisc_sleeping = qdisc;
+	__skb_queue_head_init(&qdisc->gso_skb);
 }
 
 void dev_init_scheduler(struct net_device *dev)
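
For illustration only, not part of the patch to apply: the dequeue_skb()
hunk above treats the unlocked skb_queue_empty() test as a hint and then
re-peeks the gso_skb list under qdisc_lock(), which is why the added
comment warns that the skb may already be gone once the lock is held.
Below is a minimal userspace C sketch of that same
check-then-lock-then-re-peek pattern; the names (struct pkt,
struct requeue_list, requeue, dequeue_requeued) are invented for the
example, and a pthread mutex merely stands in for the qdisc spinlock.

/* Userspace sketch (not kernel code) of the pattern used on gso_skb:
 * an unlocked emptiness check is only a hint, so after taking the lock
 * the consumer must peek again, because another CPU may have drained
 * the list in between the check and the lock.
 */
#include <pthread.h>
#include <stdio.h>

struct pkt {
	int id;
	struct pkt *next;
};

struct requeue_list {
	struct pkt *head;
	pthread_mutex_t lock;	/* stands in for qdisc_lock(q) */
};

/* Park a packet the "netdev" refused, like dev_requeue_skb_locked(). */
static void requeue(struct requeue_list *q, struct pkt *p)
{
	pthread_mutex_lock(&q->lock);
	p->next = q->head;
	q->head = p;
	pthread_mutex_unlock(&q->lock);
}

/* Pull one parked packet, tolerating the race the patch comments on. */
static struct pkt *dequeue_requeued(struct requeue_list *q)
{
	struct pkt *p;

	if (!q->head)		/* unlocked hint, like skb_queue_empty() */
		return NULL;

	pthread_mutex_lock(&q->lock);
	p = q->head;		/* re-peek: another thread may have raced us */
	if (p)
		q->head = p->next;
	pthread_mutex_unlock(&q->lock);

	return p;
}

int main(void)
{
	struct requeue_list q = { .head = NULL,
				  .lock = PTHREAD_MUTEX_INITIALIZER };
	struct pkt a = { .id = 1 }, b = { .id = 2 };
	struct pkt *p;

	requeue(&q, &a);
	requeue(&q, &b);
	while ((p = dequeue_requeued(&q)))
		printf("dequeued pkt %d\n", p->id);
	return 0;
}

The sketch only shows why the second peek under the lock is required; in
the patch itself the lock is taken only for TCQ_F_NOLOCK qdiscs, and the
per-CPU backlog/qlen statistics are updated as in dev_requeue_skb_locked().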