diff mbox series

[net-next,v2,3/6] net: sched: RED: Introduce an ECN tail-dropping mode

Message ID 20200311173356.38181-4-petrm@mellanox.com
State Changes Requested
Delegated to: David Miller
Headers show
Series RED: Introduce an ECN tail-dropping mode | expand

Commit Message

Petr Machata March 11, 2020, 5:33 p.m. UTC
When the RED Qdisc is currently configured to enable ECN, the RED algorithm
is used to decide whether a certain SKB should be marked. If that SKB is
not ECN-capable, it is early-dropped.

It is also possible to keep all traffic in the queue, and just mark the
ECN-capable subset of it, as appropriate under the RED algorithm. Some
switches support this mode, and some installations make use of it.

To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is
configured with this flag, non-ECT traffic is enqueued (and tail-dropped
when the queue size is exhausted) instead of being early-dropped.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---

Notes:
    v2:
    - Fix red_use_taildrop() condition in red_enqueue switch for
      probabilistic case.

 include/net/pkt_cls.h          |  1 +
 include/net/red.h              |  5 +++++
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_red.c            | 31 +++++++++++++++++++++++++------
 4 files changed, 32 insertions(+), 6 deletions(-)

Comments

Eric Dumazet March 11, 2020, 6:36 p.m. UTC | #1
On 3/11/20 10:33 AM, Petr Machata wrote:
> When the RED Qdisc is currently configured to enable ECN, the RED algorithm
> is used to decide whether a certain SKB should be marked. If that SKB is
> not ECN-capable, it is early-dropped.
> 
> It is also possible to keep all traffic in the queue, and just mark the
> ECN-capable subset of it, as appropriate under the RED algorithm. Some
> switches support this mode, and some installations make use of it.
> 
> To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is
> configured with this flag, non-ECT traffic is enqueued (and tail-dropped
> when the queue size is exhausted) instead of being early-dropped.
> 

I find the naming of the feature very confusing.

When enabling this new feature, we no longer drop packets
that could not be CE marked.

Tail drop is already in the packet scheduler, you want to disable it.


How this feature has been named elsewhere ???
(you mentioned in your cover letter : 
"Some switches support this mode, and some installations make use of it.")
Petr Machata March 12, 2020, 12:42 a.m. UTC | #2
Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 3/11/20 10:33 AM, Petr Machata wrote:
>> When the RED Qdisc is currently configured to enable ECN, the RED algorithm
>> is used to decide whether a certain SKB should be marked. If that SKB is
>> not ECN-capable, it is early-dropped.
>>
>> It is also possible to keep all traffic in the queue, and just mark the
>> ECN-capable subset of it, as appropriate under the RED algorithm. Some
>> switches support this mode, and some installations make use of it.
>>
>> To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is
>> configured with this flag, non-ECT traffic is enqueued (and tail-dropped
>> when the queue size is exhausted) instead of being early-dropped.
>>
>
> I find the naming of the feature very confusing.
>
> When enabling this new feature, we no longer drop packets
> that could not be CE marked.
> Tail drop is already in the packet scheduler, you want to disable it.
>
>
> How this feature has been named elsewhere ???
> (you mentioned in your cover letter :
> "Some switches support this mode, and some installations make use of it.")

The two interfaces that I know about are Juniper and Cumulus. I don't
know either from direct experience, but from the manual, Cumulus seems
to allow enablement of either ECN on its own[0], or ECN with RED[1]. (Or
RED on its own I presume, but I couldn't actually find that.)

In Juniper likewise, "on ECN-enabled queues, the switch [...] uses the
tail-drop algorithm to drop non-ECN-capable packets during periods of
congestion"[2]. You need to direct non-ECT traffic to a different queue
and configure RED on that to get the RED+ECN behavior ala Linux.

So this is unlike the RED qdisc, where RED is implied, and needs to be
turned off again by an anti-RED flag. The logic behind the chosen flag
name is that the opposite of early dropping is tail dropping. Note that
Juniper actually calls it that as well.

That said, I agree that from the perspective of the qdisc itself the
name does not make sense. We can make it "nodrop", or "nored", or maybe
"keep_non_ect"... I guess "nored" is closest to the desired effect.

[0] https://docs.cumulusnetworks.com/cumulus-linux-40/Layer-1-and-Switch-Ports/Buffer-and-Queue-Management/
[1] https://docs.cumulusnetworks.com/version/cumulus-linux-37/Network-Solutions/RDMA-over-Converged-Ethernet-RoCE/
[2] https://www.juniper.net/documentation/en_US/junos/topics/concept/cos-qfx-series-tail-drop-understanding.html
Eric Dumazet March 12, 2020, 1:01 a.m. UTC | #3
On 3/11/20 5:42 PM, Petr Machata wrote:
> 
> Eric Dumazet <eric.dumazet@gmail.com> writes:
> 
>> On 3/11/20 10:33 AM, Petr Machata wrote:
>>> When the RED Qdisc is currently configured to enable ECN, the RED algorithm
>>> is used to decide whether a certain SKB should be marked. If that SKB is
>>> not ECN-capable, it is early-dropped.
>>>
>>> It is also possible to keep all traffic in the queue, and just mark the
>>> ECN-capable subset of it, as appropriate under the RED algorithm. Some
>>> switches support this mode, and some installations make use of it.
>>>
>>> To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is
>>> configured with this flag, non-ECT traffic is enqueued (and tail-dropped
>>> when the queue size is exhausted) instead of being early-dropped.
>>>
>>
>> I find the naming of the feature very confusing.
>>
>> When enabling this new feature, we no longer drop packets
>> that could not be CE marked.
>> Tail drop is already in the packet scheduler, you want to disable it.
>>
>>
>> How this feature has been named elsewhere ???
>> (you mentioned in your cover letter :
>> "Some switches support this mode, and some installations make use of it.")
> 
> The two interfaces that I know about are Juniper and Cumulus. I don't
> know either from direct experience, but from the manual, Cumulus seems
> to allow enablement of either ECN on its own[0], or ECN with RED[1]. (Or
> RED on its own I presume, but I couldn't actually find that.)
> 
> In Juniper likewise, "on ECN-enabled queues, the switch [...] uses the
> tail-drop algorithm to drop non-ECN-capable packets during periods of
> congestion"[2]. You need to direct non-ECT traffic to a different queue
> and configure RED on that to get the RED+ECN behavior ala Linux.
> 
> So this is unlike the RED qdisc, where RED is implied, and needs to be
> turned off again by an anti-RED flag. The logic behind the chosen flag
> name is that the opposite of early dropping is tail dropping. Note that
> Juniper actually calls it that as well.
> 
> That said, I agree that from the perspective of the qdisc itself the
> name does not make sense. We can make it "nodrop", or "nored", or maybe
> "keep_non_ect"... I guess "nored" is closest to the desired effect.

Well, red algo is still used to decide if we attempt ECN marking, so "nodrop"
seems better to me :)

> 
> [0] https://docs.cumulusnetworks.com/cumulus-linux-40/Layer-1-and-Switch-Ports/Buffer-and-Queue-Management/
> [1] https://docs.cumulusnetworks.com/version/cumulus-linux-37/Network-Solutions/RDMA-over-Converged-Ethernet-RoCE/
> [2] https://www.juniper.net/documentation/en_US/junos/topics/concept/cos-qfx-series-tail-drop-understanding.html
>
Jakub Kicinski March 12, 2020, 2:25 a.m. UTC | #4
On Wed, 11 Mar 2020 18:01:38 -0700 Eric Dumazet wrote:
> > That said, I agree that from the perspective of the qdisc itself the
> > name does not make sense. We can make it "nodrop", or "nored", or maybe
> > "keep_non_ect"... I guess "nored" is closest to the desired effect.  
> 
> Well, red algo is still used to decide if we attempt ECN marking, so "nodrop"
> seems better to me :)

nodrop + harddrop is a valid combination and that'd look a little
strange. Maybe something like ect-only?
Petr Machata March 12, 2020, 10:16 a.m. UTC | #5
Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 11 Mar 2020 18:01:38 -0700 Eric Dumazet wrote:
>> > That said, I agree that from the perspective of the qdisc itself the
>> > name does not make sense. We can make it "nodrop", or "nored", or maybe
>> > "keep_non_ect"... I guess "nored" is closest to the desired effect.
>>
>> Well, red algo is still used to decide if we attempt ECN marking, so "nodrop"
>> seems better to me :)

I know that the D in RED could stand for "detection", but the one in RED
as a qdisc kind certainly stands for "drop" :)

> nodrop + harddrop is a valid combination and that'd look a little
> strange. Maybe something like ect-only?

But then "ecn ect_only" would look equally strange. Like, of course ECN
is ECT only...

I don't actually mind "nodrop harddrop". You can't guess what these
flags do from just their names anyway, so the fact they don't seem to
make sense next to each other is not an issue. And "nodrop" does
describe what the immediate behavior in context of the RED qdisc is.
Petr Machata March 12, 2020, 10:17 a.m. UTC | #6
Petr Machata <petrm@mellanox.com> writes:

> @@ -60,6 +60,11 @@ static inline int red_use_harddrop(struct red_sched_data *q)
>  	return q->flags & TC_RED_HARDDROP;
>  }
>  
> +static inline int red_use_taildrop(struct red_sched_data *q)

Forgot to take care of the static inline :-|

> +{
> +	return q->flags & TC_RED_TAILDROP;
> +}
> +
diff mbox series

Patch

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 341a66af8d59..9ad369aba678 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -727,6 +727,7 @@  struct tc_red_qopt_offload_params {
 	u32 limit;
 	bool is_ecn;
 	bool is_harddrop;
+	bool is_taildrop;
 	struct gnet_stats_queue *qstats;
 };
 
diff --git a/include/net/red.h b/include/net/red.h
index 5718d2b25637..372b4988118c 100644
--- a/include/net/red.h
+++ b/include/net/red.h
@@ -200,6 +200,11 @@  static inline bool red_get_flags(unsigned char flags,
 		return false;
 	}
 
+	if ((*p_flags & TC_RED_TAILDROP) && !(*p_flags & TC_RED_ECN)) {
+		NL_SET_ERR_MSG_MOD(extack, "taildrop mode is only meaningful with ECN");
+		return false;
+	}
+
 	*p_userbits = flags & ~historic_mask;
 	return true;
 }
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 277df546e1a9..45d1c0e6444e 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -286,6 +286,7 @@  struct tc_red_qopt {
 #define TC_RED_ECN		1
 #define TC_RED_HARDDROP		2
 #define TC_RED_ADAPTATIVE	4
+#define TC_RED_TAILDROP		8
 };
 
 #define TC_RED_HISTORIC_FLAGS (TC_RED_ECN | TC_RED_HARDDROP | TC_RED_ADAPTATIVE)
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 61d7c5a61279..1474f973ec6d 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -48,7 +48,7 @@  struct red_sched_data {
 	struct Qdisc		*qdisc;
 };
 
-#define RED_SUPPORTED_FLAGS TC_RED_HISTORIC_FLAGS
+#define RED_SUPPORTED_FLAGS (TC_RED_HISTORIC_FLAGS | TC_RED_TAILDROP)
 
 static inline int red_use_ecn(struct red_sched_data *q)
 {
@@ -60,6 +60,11 @@  static inline int red_use_harddrop(struct red_sched_data *q)
 	return q->flags & TC_RED_HARDDROP;
 }
 
+static inline int red_use_taildrop(struct red_sched_data *q)
+{
+	return q->flags & TC_RED_TAILDROP;
+}
+
 static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		       struct sk_buff **to_free)
 {
@@ -80,23 +85,36 @@  static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 	case RED_PROB_MARK:
 		qdisc_qstats_overlimit(sch);
-		if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) {
+		if (!red_use_ecn(q)) {
 			q->stats.prob_drop++;
 			goto congestion_drop;
 		}
 
-		q->stats.prob_mark++;
+		if (INET_ECN_set_ce(skb)) {
+			q->stats.prob_mark++;
+		} else if (!red_use_taildrop(q)) {
+			q->stats.prob_drop++;
+			goto congestion_drop;
+		}
+
+		/* Non-ECT packet in ECN taildrop mode: queue it. */
 		break;
 
 	case RED_HARD_MARK:
 		qdisc_qstats_overlimit(sch);
-		if (red_use_harddrop(q) || !red_use_ecn(q) ||
-		    !INET_ECN_set_ce(skb)) {
+		if (red_use_harddrop(q) || !red_use_ecn(q)) {
 			q->stats.forced_drop++;
 			goto congestion_drop;
 		}
 
-		q->stats.forced_mark++;
+		if (INET_ECN_set_ce(skb)) {
+			q->stats.forced_mark++;
+		} else if (!red_use_taildrop(q)) {
+			q->stats.forced_drop++;
+			goto congestion_drop;
+		}
+
+		/* Non-ECT packet in ECN taildrop mode: queue it. */
 		break;
 	}
 
@@ -171,6 +189,7 @@  static int red_offload(struct Qdisc *sch, bool enable)
 		opt.set.limit = q->limit;
 		opt.set.is_ecn = red_use_ecn(q);
 		opt.set.is_harddrop = red_use_harddrop(q);
+		opt.set.is_taildrop = red_use_taildrop(q);
 		opt.set.qstats = &sch->qstats;
 	} else {
 		opt.command = TC_RED_DESTROY;