Message ID | 1460355869-13539-1-git-send-email-larper@axis.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote: > A failure in validate_xmit_skb_list() triggered an unconditional call > to dev_requeue_skb with skb=NULL. This slowly grows the queue > discipline's qlen count until all traffic through the queue stops. > > By introducing a NULL check in dev_requeue_skb it was also necessary > to make the __netif_schedule call conditional to avoid scheduling an > empty queue. > > Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock") > Signed-off-by: Lars Persson <larper@axis.com> > --- > net/sched/sch_generic.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index f18c350..4e6a79c 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops); > > static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) > { > - q->gso_skb = skb; > - q->qstats.requeues++; > - q->q.qlen++; /* it's still part of the queue */ > - __netif_schedule(q); > + if (skb) { > + q->gso_skb = skb; > + q->qstats.requeues++; > + q->q.qlen++; /* it's still part of the queue */ > + } > + if (qdisc_qlen(q)) > + __netif_schedule(q); > > return 0; > } Please always CC patch author when fixing a bug. Why adding the if (qdisc_qlen(q)) extra test ? This seems unrelated to the bug fix, and probably should be part of a second patch targeting net-next tree. Also please add a likely() clause if (likely(skb)) { q->gso_skb = skb; q->qstats.requeues++; q->q.qlen++; /* it's still part of the queue */ __netif_schedule(q); } Thanks !
On 04/11/2016 03:23 PM, Eric Dumazet wrote: > On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote: >> A failure in validate_xmit_skb_list() triggered an unconditional call >> to dev_requeue_skb with skb=NULL. This slowly grows the queue >> discipline's qlen count until all traffic through the queue stops. >> >> By introducing a NULL check in dev_requeue_skb it was also necessary >> to make the __netif_schedule call conditional to avoid scheduling an >> empty queue. >> >> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock") >> Signed-off-by: Lars Persson <larper@axis.com> >> --- >> net/sched/sch_generic.c | 11 +++++++---- >> 1 file changed, 7 insertions(+), 4 deletions(-) >> >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c >> index f18c350..4e6a79c 100644 >> --- a/net/sched/sch_generic.c >> +++ b/net/sched/sch_generic.c >> @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops); >> >> static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) >> { >> - q->gso_skb = skb; >> - q->qstats.requeues++; >> - q->q.qlen++; /* it's still part of the queue */ >> - __netif_schedule(q); >> + if (skb) { >> + q->gso_skb = skb; >> + q->qstats.requeues++; >> + q->q.qlen++; /* it's still part of the queue */ >> + } >> + if (qdisc_qlen(q)) >> + __netif_schedule(q); >> >> return 0; >> } > > > Please always CC patch author when fixing a bug. > > Why adding the if (qdisc_qlen(q)) extra test ? > > This seems unrelated to the bug fix, and probably should be part of a > second patch targeting net-next tree. I though it would be prudent because the queue can be non-empty even for the case of skb=NULL. So should it be there in this patch, another patch or not at all ? > > Also please add a likely() clause > > if (likely(skb)) { > q->gso_skb = skb; > q->qstats.requeues++; > q->q.qlen++; /* it's still part of the queue */ > __netif_schedule(q); > } Will fix. > Thanks ! > > > > >
On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote: > I though it would be prudent because the queue can be non-empty even for > the case of skb=NULL. So should it be there in this patch, another patch > or not at all ? Then maybe change return code ? It seems strange that a validate_xmit_skb_list() failure stops the __qdisc_run() loop but schedules another round.
On 04/11/2016 04:22 PM, Eric Dumazet wrote: > On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote: > >> I though it would be prudent because the queue can be non-empty even for >> the case of skb=NULL. So should it be there in this patch, another patch >> or not at all ? > > Then maybe change return code ? > > It seems strange that a validate_xmit_skb_list() failure stops the > __qdisc_run() loop but schedules another round. > > It was suggested by Cong Wang to return 0 in order to stop the loop. Do you guys agree that the loop should be stopped for such failures ? Then I will put the schedule call inside the if as you proposed earlier. - Lars
On Mon, 2016-04-11 at 17:17 +0200, Lars Persson wrote: > > On 04/11/2016 04:22 PM, Eric Dumazet wrote: > > On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote: > > > >> I though it would be prudent because the queue can be non-empty even for > >> the case of skb=NULL. So should it be there in this patch, another patch > >> or not at all ? > > > > Then maybe change return code ? > > > > It seems strange that a validate_xmit_skb_list() failure stops the > > __qdisc_run() loop but schedules another round. > > > > > > It was suggested by Cong Wang to return 0 in order to stop the loop. Do > you guys agree that the loop should be stopped for such failures ? Then > I will put the schedule call inside the if as you proposed earlier. What are the causes of validate_xmit_skb_list() failures ? If gso segmentations fail because of memory pressure, better free more skbs right now. In any case, having a single test " if (skb) " sounds better to me, to have a fast path. So your first patch was probably a better idea. v2 has two tests instead of one.
On Mon, Apr 11, 2016 at 8:17 AM, Lars Persson <lars.persson@axis.com> wrote: > > > On 04/11/2016 04:22 PM, Eric Dumazet wrote: >> >> On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote: >> >>> I though it would be prudent because the queue can be non-empty even for >>> the case of skb=NULL. So should it be there in this patch, another patch >>> or not at all ? >> >> >> Then maybe change return code ? >> >> It seems strange that a validate_xmit_skb_list() failure stops the >> __qdisc_run() loop but schedules another round. >> >> > > It was suggested by Cong Wang to return 0 in order to stop the loop. Do you > guys agree that the loop should be stopped for such failures ? Then I will > put the schedule call inside the if as you proposed earlier. I don't see any reason why we should continue on failure. And, I didn't suggest you to return reschedule it either. I was suggesting to just return 0 for skb == NULL case.
On Mon, Apr 11, 2016 at 8:52 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2016-04-11 at 17:17 +0200, Lars Persson wrote: >> >> On 04/11/2016 04:22 PM, Eric Dumazet wrote: >> > On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote: >> > >> >> I though it would be prudent because the queue can be non-empty even for >> >> the case of skb=NULL. So should it be there in this patch, another patch >> >> or not at all ? >> > >> > Then maybe change return code ? >> > >> > It seems strange that a validate_xmit_skb_list() failure stops the >> > __qdisc_run() loop but schedules another round. >> > >> > >> >> It was suggested by Cong Wang to return 0 in order to stop the loop. Do >> you guys agree that the loop should be stopped for such failures ? Then >> I will put the schedule call inside the if as you proposed earlier. > > What are the causes of validate_xmit_skb_list() failures ? > > If gso segmentations fail because of memory pressure, better free more > skbs right now. > > In any case, having a single test " if (skb) " sounds better to me, > to have a fast path. > > So your first patch was probably a better idea. > > v2 has two tests instead of one. I am fine with either way as long as the loop stops on failure. Folding the test "if (skb)" into one also requires to retake the spinlock.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index f18c350..4e6a79c 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops); static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) { - q->gso_skb = skb; - q->qstats.requeues++; - q->q.qlen++; /* it's still part of the queue */ - __netif_schedule(q); + if (skb) { + q->gso_skb = skb; + q->qstats.requeues++; + q->q.qlen++; /* it's still part of the queue */ + } + if (qdisc_qlen(q)) + __netif_schedule(q); return 0; }
A failure in validate_xmit_skb_list() triggered an unconditional call to dev_requeue_skb with skb=NULL. This slowly grows the queue discipline's qlen count until all traffic through the queue stops. By introducing a NULL check in dev_requeue_skb it was also necessary to make the __netif_schedule call conditional to avoid scheduling an empty queue. Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock") Signed-off-by: Lars Persson <larper@axis.com> --- net/sched/sch_generic.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)