diff mbox

[net,v2] net: sched: do not requeue a NULL skb

Message ID 1460355869-13539-1-git-send-email-larper@axis.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Lars Persson April 11, 2016, 6:24 a.m. UTC
A failure in validate_xmit_skb_list() triggered an unconditional call
to dev_requeue_skb with skb=NULL. This slowly grows the queue
discipline's qlen count until all traffic through the queue stops.

By introducing a NULL check in dev_requeue_skb it was also necessary
to make the __netif_schedule call conditional to avoid scheduling an
empty queue.

Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
Signed-off-by: Lars Persson <larper@axis.com>
---
 net/sched/sch_generic.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Eric Dumazet April 11, 2016, 1:23 p.m. UTC | #1
On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote:
> A failure in validate_xmit_skb_list() triggered an unconditional call
> to dev_requeue_skb with skb=NULL. This slowly grows the queue
> discipline's qlen count until all traffic through the queue stops.
> 
> By introducing a NULL check in dev_requeue_skb it was also necessary
> to make the __netif_schedule call conditional to avoid scheduling an
> empty queue.
> 
> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
> Signed-off-by: Lars Persson <larper@axis.com>
> ---
>  net/sched/sch_generic.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index f18c350..4e6a79c 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops);
>  
>  static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>  {
> -	q->gso_skb = skb;
> -	q->qstats.requeues++;
> -	q->q.qlen++;	/* it's still part of the queue */
> -	__netif_schedule(q);
> +	if (skb) {
> +		q->gso_skb = skb;
> +		q->qstats.requeues++;
> +		q->q.qlen++;	/* it's still part of the queue */
> +	}
> +	if (qdisc_qlen(q))
> +		__netif_schedule(q);
>  
>  	return 0;
>  }


Please always CC patch author when fixing a bug.

Why adding the if (qdisc_qlen(q)) extra test ?

This seems unrelated to the bug fix, and probably should be part of a
second patch targeting net-next tree.

Also please add a likely() clause

if (likely(skb)) {
        q->gso_skb = skb;
        q->qstats.requeues++;
        q->q.qlen++;    /* it's still part of the queue */
        __netif_schedule(q);
}

Thanks !
Lars Persson April 11, 2016, 1:38 p.m. UTC | #2
On 04/11/2016 03:23 PM, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote:
>> A failure in validate_xmit_skb_list() triggered an unconditional call
>> to dev_requeue_skb with skb=NULL. This slowly grows the queue
>> discipline's qlen count until all traffic through the queue stops.
>>
>> By introducing a NULL check in dev_requeue_skb it was also necessary
>> to make the __netif_schedule call conditional to avoid scheduling an
>> empty queue.
>>
>> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
>> Signed-off-by: Lars Persson <larper@axis.com>
>> ---
>>   net/sched/sch_generic.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index f18c350..4e6a79c 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops);
>>
>>   static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>>   {
>> -	q->gso_skb = skb;
>> -	q->qstats.requeues++;
>> -	q->q.qlen++;	/* it's still part of the queue */
>> -	__netif_schedule(q);
>> +	if (skb) {
>> +		q->gso_skb = skb;
>> +		q->qstats.requeues++;
>> +		q->q.qlen++;	/* it's still part of the queue */
>> +	}
>> +	if (qdisc_qlen(q))
>> +		__netif_schedule(q);
>>
>>   	return 0;
>>   }
>
>
> Please always CC patch author when fixing a bug.
>
> Why adding the if (qdisc_qlen(q)) extra test ?
>
> This seems unrelated to the bug fix, and probably should be part of a
> second patch targeting net-next tree.

I though it would be prudent because the queue can be non-empty even for 
the case of skb=NULL. So should it be there in this patch, another patch 
or not at all ?

>
> Also please add a likely() clause
>
> if (likely(skb)) {
>          q->gso_skb = skb;
>          q->qstats.requeues++;
>          q->q.qlen++;    /* it's still part of the queue */
>          __netif_schedule(q);
> }

Will fix.

> Thanks !
>
>
>
>
>
Eric Dumazet April 11, 2016, 2:22 p.m. UTC | #3
On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:

> I though it would be prudent because the queue can be non-empty even for 
> the case of skb=NULL. So should it be there in this patch, another patch 
> or not at all ?

Then maybe change return code ?

It seems strange that a validate_xmit_skb_list() failure stops the
__qdisc_run() loop but schedules another round.
Lars Persson April 11, 2016, 3:17 p.m. UTC | #4
On 04/11/2016 04:22 PM, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:
>
>> I though it would be prudent because the queue can be non-empty even for
>> the case of skb=NULL. So should it be there in this patch, another patch
>> or not at all ?
>
> Then maybe change return code ?
>
> It seems strange that a validate_xmit_skb_list() failure stops the
> __qdisc_run() loop but schedules another round.
>
>

It was suggested by Cong Wang to return 0 in order to stop the loop. Do 
you guys agree that the loop should be stopped for such failures ? Then 
I will put the schedule call inside the if as you proposed earlier.

- Lars
Eric Dumazet April 11, 2016, 3:52 p.m. UTC | #5
On Mon, 2016-04-11 at 17:17 +0200, Lars Persson wrote:
> 
> On 04/11/2016 04:22 PM, Eric Dumazet wrote:
> > On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:
> >
> >> I though it would be prudent because the queue can be non-empty even for
> >> the case of skb=NULL. So should it be there in this patch, another patch
> >> or not at all ?
> >
> > Then maybe change return code ?
> >
> > It seems strange that a validate_xmit_skb_list() failure stops the
> > __qdisc_run() loop but schedules another round.
> >
> >
> 
> It was suggested by Cong Wang to return 0 in order to stop the loop. Do 
> you guys agree that the loop should be stopped for such failures ? Then 
> I will put the schedule call inside the if as you proposed earlier.

What are the causes of validate_xmit_skb_list() failures ?

If gso segmentations fail because of memory pressure, better free more
skbs right now.

In any case, having a single test " if (skb)  " sounds better to me,
to have a fast path.

So your first patch was probably a better idea.

v2 has two tests instead of one.
Cong Wang April 11, 2016, 5:53 p.m. UTC | #6
On Mon, Apr 11, 2016 at 8:17 AM, Lars Persson <lars.persson@axis.com> wrote:
>
>
> On 04/11/2016 04:22 PM, Eric Dumazet wrote:
>>
>> On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:
>>
>>> I though it would be prudent because the queue can be non-empty even for
>>> the case of skb=NULL. So should it be there in this patch, another patch
>>> or not at all ?
>>
>>
>> Then maybe change return code ?
>>
>> It seems strange that a validate_xmit_skb_list() failure stops the
>> __qdisc_run() loop but schedules another round.
>>
>>
>
> It was suggested by Cong Wang to return 0 in order to stop the loop. Do you
> guys agree that the loop should be stopped for such failures ? Then I will
> put the schedule call inside the if as you proposed earlier.

I don't see any reason why we should continue on failure. And, I didn't
suggest you to return reschedule it either. I was suggesting to just return
0 for skb == NULL case.
Cong Wang April 11, 2016, 6:02 p.m. UTC | #7
On Mon, Apr 11, 2016 at 8:52 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-04-11 at 17:17 +0200, Lars Persson wrote:
>>
>> On 04/11/2016 04:22 PM, Eric Dumazet wrote:
>> > On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:
>> >
>> >> I though it would be prudent because the queue can be non-empty even for
>> >> the case of skb=NULL. So should it be there in this patch, another patch
>> >> or not at all ?
>> >
>> > Then maybe change return code ?
>> >
>> > It seems strange that a validate_xmit_skb_list() failure stops the
>> > __qdisc_run() loop but schedules another round.
>> >
>> >
>>
>> It was suggested by Cong Wang to return 0 in order to stop the loop. Do
>> you guys agree that the loop should be stopped for such failures ? Then
>> I will put the schedule call inside the if as you proposed earlier.
>
> What are the causes of validate_xmit_skb_list() failures ?
>
> If gso segmentations fail because of memory pressure, better free more
> skbs right now.
>
> In any case, having a single test " if (skb)  " sounds better to me,
> to have a fast path.
>
> So your first patch was probably a better idea.
>
> v2 has two tests instead of one.

I am fine with either way as long as the loop stops on failure.
Folding the test "if (skb)" into one also requires to retake the spinlock.
diff mbox

Patch

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f18c350..4e6a79c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -47,10 +47,13 @@  EXPORT_SYMBOL(default_qdisc_ops);
 
 static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
-	q->gso_skb = skb;
-	q->qstats.requeues++;
-	q->q.qlen++;	/* it's still part of the queue */
-	__netif_schedule(q);
+	if (skb) {
+		q->gso_skb = skb;
+		q->qstats.requeues++;
+		q->q.qlen++;	/* it's still part of the queue */
+	}
+	if (qdisc_qlen(q))
+		__netif_schedule(q);
 
 	return 0;
 }