
[net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

Message ID 1384848307-7217-1-git-send-email-jasowang@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jason Wang Nov. 19, 2013, 8:05 a.m. UTC
We need to drop the refcnt of page when we fail to allocate an skb for frag
list, otherwise it will be leaked. The bug was introduced by commit
2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
buffers to page frag allocators").

Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
The patch was needed for 3.12 stable.
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Eric Dumazet Nov. 19, 2013, 2:03 p.m. UTC | #1
On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> We need to drop the refcnt of page when we fail to allocate an skb for frag
> list, otherwise it will be leaked. The bug was introduced by commit
> 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> buffers to page frag allocators").
> 
> Cc: Michael Dalton <mwdalton@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> The patch was needed for 3.12 stable.

Good catch, but if we return from receive_mergeable() in the 'middle'
of the frags we would need for the current skb, who will
call the virtqueue_get_buf() to flush the remaining frags ?

Don't we also need to call virtqueue_get_buf() like 

while (--num_buf) {
    buf = virtqueue_get_buf(rq->vq, &len);
    if (!buf)
        break;
    put_page(virt_to_head_page(buf));
}

?




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Nov. 19, 2013, 6:44 p.m. UTC | #2
On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > list, otherwise it will be leaked. The bug was introduced by commit
> > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > buffers to page frag allocators").
> > 
> > Cc: Michael Dalton <mwdalton@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > The patch was needed for 3.12 stable.
> 
> Good catch, but if we return from receive_mergeable() in the 'middle'
> of the frags we would need for the current skb, who will
> call the virtqueue_get_buf() to flush the remaining frags ?
> 
> Don't we also need to call virtqueue_get_buf() like 
> 
> while (--num_buf) {
>     buf = virtqueue_get_buf(rq->vq, &len);
>     if (!buf)
>         break;
>     put_page(virt_to_head_page(buf));
> }
> 
> ?
> 
> 
> 


virtqueue_get_buf only gives you back a buffer that the hardware has
DMA-ed into. ATM there's no way to get a buffer back, once you've given
it to the hardware, without doing a NIC reset.

Michael S. Tsirkin Nov. 19, 2013, 8:49 p.m. UTC | #3
On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > list, otherwise it will be leaked. The bug was introduced by commit
> > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > buffers to page frag allocators").
> > 
> > Cc: Michael Dalton <mwdalton@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > The patch was needed for 3.12 stable.
> 
> Good catch, but if we return from receive_mergeable() in the 'middle'
> of the frags we would need for the current skb, who will
> call the virtqueue_get_buf() to flush the remaining frags ?
> 
> Don't we also need to call virtqueue_get_buf() like 
> 
> while (--num_buf) {
>     buf = virtqueue_get_buf(rq->vq, &len);
>     if (!buf)
>         break;
>     put_page(virt_to_head_page(buf));
> }
> 
> ?
> 
> 


Let me explain what worries me in your suggestion:

                        struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
                        if (unlikely(!nskb)) {
                                head_skb->dev->stats.rx_dropped++;
                                return -ENOMEM;
                        }

is this the failure case we are talking about?

I think this is a symptom of a larger problem
introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
namely that we now need to allocate memory in the
middle of processing a packet.


I think discarding a completely valid and well-formed
packet from the receive queue because we are unable
to allocate new memory with GFP_ATOMIC
for future packets is not a good idea.

It certainly violates the principle of least surprise:
when one sees the host pass a packet to the guest, one expects
the packet to get into the networking stack, not to be
dropped by the driver internally.
The guest stack can do with the packet what it sees fit.

We actually wake up a thread if we can't fill up the queue;
it will fill it up in GFP_KERNEL context.

So I think we should find a way to pre-allocate if necessary and avoid
error paths where allocating new memory is required to avoid drops.
Eric Dumazet Nov. 19, 2013, 9:36 p.m. UTC | #4
On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > list, otherwise it will be leaked. The bug was introduced by commit
> > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > buffers to page frag allocators").
> > > 
> > > Cc: Michael Dalton <mwdalton@google.com>
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > > The patch was needed for 3.12 stable.
> > 
> > Good catch, but if we return from receive_mergeable() in the 'middle'
> > of the frags we would need for the current skb, who will
> > call the virtqueue_get_buf() to flush the remaining frags ?
> > 
> > Don't we also need to call virtqueue_get_buf() like 
> > 
> > while (--num_buf) {
> >     buf = virtqueue_get_buf(rq->vq, &len);
> >     if (!buf)
> >         break;
> >     put_page(virt_to_head_page(buf));
> > }
> > 
> > ?
> > 
> > 
> 
> 
> Let me explain what worries me in your suggestion:
> 
>                         struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
>                         if (unlikely(!nskb)) {
>                                 head_skb->dev->stats.rx_dropped++;
>                                 return -ENOMEM;
>                         }
> 
> is this the failure case we are talking about?

I thought Jason patch was about this, no ?

> 
> I think this is a symprom of a larger problem
> introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> namely that we now need to allocate memory in the
> middle of processing a packet.
> 
> 
> I think discarding a completely valid and well-formed
> packet from the receive queue because we are unable
> to allocate new memory with GFP_ATOMIC
> for future packets is not a good idea.

How is it different with NIC processing in RX path ?

> 
> It certainly violates the principle of least surprize:
> when one sees host pass packet to guest, one expects
> the packet to get into the networking stack, not get
> dropped by the driver internally.
> Guest stack can do with the packet what it sees fit.
> 
> We actually wake up a thread if we can't fill up the queue,
> that will fill it up in GFP_KERNEL context.
> 
> So I think we should find a way to pre-allocate if necessary and avoid
> error paths where allocating new memory is a required to avoid drops.
> 

Really, under ATOMIC context, there is no way you can avoid dropping
packets if you cannot allocate memory. If you cannot allocate an sk_buff
(256 bytes!), you won't be able to allocate the 1500+ bytes to hold the
payload of the next packets anyway.

Same problem on a real NIC.

Under memory pressure we _do_ packet drops.
Nobody really complained.

Sure, you can add yet another cache of pre-allocated skbs and pay the
price of managing yet another cache layer, but you still need to drop
packets under stress.

Pre-allocating skbs on a real NIC has a performance cost, because we
clear the sk_buff way ahead of time. By the time the skb is finally
received, the CPU has to bring those cache lines back into its cache.



Michael Dalton Nov. 19, 2013, 9:38 p.m. UTC | #5
Great catch, Jason. I agree this now raises the larger issue of how to
handle a memory allocation failure in the middle of receive. As Eric
mentioned, we can drop the packet and free the remaining (num_buf) frags.

Michael, perhaps I'm missing something, but why would you prefer
pre-allocating buffers in this case? If the guest kernel is OOM'ing,
dropping packets should provide backpressure.

Also, we could just as easily fail the initial skb alloc in page_to_skb,
and I think that case also needs to be handled now in the same fashion as
a memory allocation failure in receive_mergeable.

Best,

Mike
Michael S. Tsirkin Nov. 19, 2013, 9:53 p.m. UTC | #6
On Tue, Nov 19, 2013 at 01:36:36PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > > list, otherwise it will be leaked. The bug was introduced by commit
> > > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > > buffers to page frag allocators").
> > > > 
> > > > Cc: Michael Dalton <mwdalton@google.com>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > The patch was needed for 3.12 stable.
> > > 
> > > Good catch, but if we return from receive_mergeable() in the 'middle'
> > > of the frags we would need for the current skb, who will
> > > call the virtqueue_get_buf() to flush the remaining frags ?
> > > 
> > > Don't we also need to call virtqueue_get_buf() like 
> > > 
> > > while (--num_buf) {
> > >     buf = virtqueue_get_buf(rq->vq, &len);
> > >     if (!buf)
> > >         break;
> > >     put_page(virt_to_head_page(buf));
> > > }
> > > 
> > > ?
> > > 
> > > 
> > 
> > 
> > Let me explain what worries me in your suggestion:
> > 
> >                         struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
> >                         if (unlikely(!nskb)) {
> >                                 head_skb->dev->stats.rx_dropped++;
> >                                 return -ENOMEM;
> >                         }
> > 
> > is this the failure case we are talking about?
> 
> I thought Jason patch was about this, no ?
> 
> > 
> > I think this is a symprom of a larger problem
> > introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> > namely that we now need to allocate memory in the
> > middle of processing a packet.
> > 
> > 
> > I think discarding a completely valid and well-formed
> > packet from the receive queue because we are unable
> > to allocate new memory with GFP_ATOMIC
> > for future packets is not a good idea.
> 
> How is it different with NIC processing in RX path ?


Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.

> > 
> > It certainly violates the principle of least surprize:
> > when one sees host pass packet to guest, one expects
> > the packet to get into the networking stack, not get
> > dropped by the driver internally.
> > Guest stack can do with the packet what it sees fit.
> > 
> > We actually wake up a thread if we can't fill up the queue,
> > that will fill it up in GFP_KERNEL context.
> > 
> > So I think we should find a way to pre-allocate if necessary and avoid
> > error paths where allocating new memory is a required to avoid drops.
> > 
> 
> Really, under ATOMIC context, there is no way you can avoid dropping
> packets if you cannot allocate memory. If you cannot allocate sk_buff
> (256 bytes !!), you wont be able to allocate the 1500+ bytes to hold the
> payload of next packets anyway. 

that's why we do:

                if (!try_fill_recv(rq, GFP_ATOMIC))
                        schedule_delayed_work(&vi->refill, 0);


the queues are large enough for a single failure not to be
an immediate problem.


> Same problem on a real NIC.
> 
> Under memory pressure we _do_ packet drops.
> Nobody really complained.
>
> Sure, you can add yet another cache of pre-allocated skbs and pay the
> price of managing yet another cache layer, but still need to trop
> packets under stress.

We don't even need a cache. Just enough to avoid dropping packets
when allocation fails in the middle, so we don't dequeue a buffer and
then drop it.

Once we use this reserved skb, we stop processing the queue until
the refill gives it back.

> Pre-allocating skb on real NIC has a performance cost, because we clear
> sk_buff way ahead of time. By the time skb is finally received, cpu has
> to bring back into its cache memory cache lines.
> 

Alternatively we can pre-allocate the memory but avoid clearing it maybe?
Eric Dumazet Nov. 19, 2013, 10 p.m. UTC | #7
On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:

> Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
> it didn't drop packets received from host as far as I can tell.
> virtio is more like a pipe than a real NIC in this respect.

Both before and after this patch, you were not posting buffers, so if
packets were received on a physical NIC, you were dropping them anyway.

It makes no difference at all; adding a cushion might make you feel
better, but it's really not worth it.

Under memory stress, it makes better sense to drop a super big GRO
packet (the one needing the frag_list extension...).

It gives a better signal to the sender to reduce its pressure, and gives
opportunity to free more of your memory.



Michael Dalton Nov. 20, 2013, 1:34 a.m. UTC | #8
Hi,

After further reflection I think we're looking at two related issues:
(a) a memory leak that Jason has identified that occurs when a memory
allocation fails in receive_mergeable. Jasons commit solves this issue.
(b) virtio-net does not dequeue all buffers for a packet in the
case that an error occurs on receive and mergeable receive buffers is
enabled.

For (a), this bug is new and due to changes in 2613af0ed18a, and the
net impact is a leak of the physical page. However, I believe (b)
has always been possible in some form, because if page_to_skb() returns
NULL (e.g., due to SKB allocation failure), receive_mergeable is never
called. AFAICT this was also the behavior prior to 2613af0ed18a.

The net impact of (b) would be that virtio-net would interpret a packet
buffer that is in the middle of a mergeable packet as the start of a
new packet, which is definitely also a bug (and the buffer contents
could contain bytes that resembled a valid virtio-net header).

A solution for (b) will require handling both the page_to_skb memory
allocation failures and the memory allocation failures in
receive_mergeable introduced by 2613af0ed18a.

Best,

Mike
Jason Wang Nov. 20, 2013, 3 a.m. UTC | #9
On 11/19/2013 10:03 PM, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
>> > We need to drop the refcnt of page when we fail to allocate an skb for frag
>> > list, otherwise it will be leaked. The bug was introduced by commit
>> > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
>> > buffers to page frag allocators").
>> > 
>> > Cc: Michael Dalton <mwdalton@google.com>
>> > Cc: Eric Dumazet <edumazet@google.com>
>> > Cc: Rusty Russell <rusty@rustcorp.com.au>
>> > Cc: Michael S. Tsirkin <mst@redhat.com>
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---
>> > The patch was needed for 3.12 stable.
> Good catch, but if we return from receive_mergeable() in the 'middle'
> of the frags we would need for the current skb, who will
> call the virtqueue_get_buf() to flush the remaining frags ?
>
> Don't we also need to call virtqueue_get_buf() like 
>
> while (--num_buf) {
>     buf = virtqueue_get_buf(rq->vq, &len);
>     if (!buf)
>         break;
>     put_page(virt_to_head_page(buf));
> }
>
> ?

Yes we need this, will send V2.

Thanks
Jason Wang Nov. 20, 2013, 3:05 a.m. UTC | #10
On 11/20/2013 04:49 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
>> On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
>>> We need to drop the refcnt of page when we fail to allocate an skb for frag
>>> list, otherwise it will be leaked. The bug was introduced by commit
>>> 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
>>> buffers to page frag allocators").
>>>
>>> Cc: Michael Dalton <mwdalton@google.com>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>> The patch was needed for 3.12 stable.
>> Good catch, but if we return from receive_mergeable() in the 'middle'
>> of the frags we would need for the current skb, who will
>> call the virtqueue_get_buf() to flush the remaining frags ?
>>
>> Don't we also need to call virtqueue_get_buf() like 
>>
>> while (--num_buf) {
>>     buf = virtqueue_get_buf(rq->vq, &len);
>>     if (!buf)
>>         break;
>>     put_page(virt_to_head_page(buf));
>> }
>>
>> ?
>>
>>
>
> Let me explain what worries me in your suggestion:
>
>                         struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
>                         if (unlikely(!nskb)) {
>                                 head_skb->dev->stats.rx_dropped++;
>                                 return -ENOMEM;
>                         }
>
> is this the failure case we are talking about?
>
> I think this is a symprom of a larger problem
> introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> namely that we now need to allocate memory in the
> middle of processing a packet.
>
>
> I think discarding a completely valid and well-formed
> packet from the receive queue because we are unable
> to allocate new memory with GFP_ATOMIC
> for future packets is not a good idea.
>
> It certainly violates the principle of least surprize:
> when one sees host pass packet to guest, one expects
> the packet to get into the networking stack, not get
> dropped by the driver internally.
> Guest stack can do with the packet what it sees fit.
>
> We actually wake up a thread if we can't fill up the queue,
> that will fill it up in GFP_KERNEL context.
>
> So I think we should find a way to pre-allocate if necessary and avoid
> error paths where allocating new memory is a required to avoid drops.
>

The problem happens only under memory pressure, and this pre-allocation
may add more stress.
Jason Wang Nov. 20, 2013, 3:17 a.m. UTC | #11
On 11/20/2013 09:34 AM, Michael Dalton wrote:
> Hi,
>
> After further reflection I think we're looking at two related issues:
> (a) a memory leak that Jason has identified that occurs when a memory
> allocation fails in receive_mergeable. Jasons commit solves this issue.
> (b) virtio-net does not dequeue all buffers for a packet in the
> case that an error occurs on receive and mergeable receive buffers is
> enabled.
>
> For (a), this bug is new and due to changes in 2613af0ed18a, and the
> net impact is memory leak on the physical page. However, I believe (b)
> has always been possible in some form because if page_to_skb() returns
> NULL (e.g., due to SKB allocation failure), receive_mergeable is never
> called. AFAICT this is also the behavior prior to 2613af0ed18a.
>
> The net impact of (b) would be that virtio-net would interpret a packet
> buffer that is in the middle of a mergeable packet as the start of a
> new packet, which is definitely also a bug (and the buffer contents
> could contain bytes that resembled a valid virtio-net header).
>
> A solution for (b) will require handling both the page_to_skb memory
> allocation failures and the memory allocation failures in
> receive_mergeable introduced by 2613af0ed18a.

True, so we first need a patch to solve the page_to_skb() failure, which
could be used for stable trees prior to 2613af0ed18a. Then another patch
to solve the issue introduced by 2613af0ed18a, which would apply only to
3.12 stable. Will draft patches for them.

Thanks
>
> Best,
>
> Mike

Michael S. Tsirkin Nov. 20, 2013, 8:58 a.m. UTC | #12
On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
> 
> > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
> > it didn't drop packets received from host as far as I can tell.
> > virtio is more like a pipe than a real NIC in this respect.
> 
> Prior/after to this patch, you were not posting buffers, so if packets
> were received on a physical NIC, you were dropping the packets anyway.
>
> It makes no difference at all, adding a cushion might make you feel
> better, but its really not worth it.
> 
> Under memory stress, it makes better sense to drop a super big GRO
> packet (The one needing frag_list extension ...)
> 
> It gives a better signal to the sender to reduce its pressure, and gives
> opportunity to free more of your memory.
> 

OK, but in that case one wonders whether we should do more to free memory?

E.g. imagine that we dropped a packet of a specific TCP flow
because we couldn't allocate a new packet.

What happens now is that the old packet is freed as well.

So quite likely the next packet in queue will get processed
since it will reuse the memory we have just freed.

The next packet, and the ones after it, will all have to go through
the net stack until they reach the socket and are dropped there
because we missed a segment.  Even worse, GRO gets disabled, so the load
on the receiver goes up instead of down.

Sounds like a problem doesn't it?

GRO actually detects it's the same flow and can see packet is
out of sequence. Why doesn't it drop the packet then?
Alternatively, we could (for example using the pre-allocated skb
like I suggested) notify GRO that it should start dropping packets
of this flow.

What do you think?
Michael S. Tsirkin Nov. 20, 2013, 9 a.m. UTC | #13
On Tue, Nov 19, 2013 at 05:34:16PM -0800, Michael Dalton wrote:
> Hi,
> 
> After further reflection I think we're looking at two related issues:
> (a) a memory leak that Jason has identified that occurs when a memory
> allocation fails in receive_mergeable. Jasons commit solves this issue.
> (b) virtio-net does not dequeue all buffers for a packet in the
> case that an error occurs on receive and mergeable receive buffers is
> enabled.
> 
> For (a), this bug is new and due to changes in 2613af0ed18a, and the
> net impact is memory leak on the physical page. However, I believe (b)
> has always been possible in some form because if page_to_skb() returns
> NULL (e.g., due to SKB allocation failure), receive_mergeable is never
> called. AFAICT this is also the behavior prior to 2613af0ed18a.
> 
> The net impact of (b) would be that virtio-net would interpret a packet
> buffer that is in the middle of a mergeable packet as the start of a
> new packet, which is definitely also a bug (and the buffer contents
> could contain bytes that resembled a valid virtio-net header).
> 
> A solution for (b) will require handling both the page_to_skb memory
> allocation failures and the memory allocation failures in
> receive_mergeable introduced by 2613af0ed18a.
> 
> Best,
> 
> Mike


Absolutely. I missed this fact last night, but I can see it clearly
in the morning.

Michael S. Tsirkin Nov. 20, 2013, 9:06 a.m. UTC | #14
On Tue, Nov 19, 2013 at 01:38:16PM -0800, Michael Dalton wrote:
> Great catch Jason. I agree this now raises the larger issue of how to
> handle a memory alloc failure in the middle of receive. As Eric mentioned,
> we can drop the packet and free the remaining (num_buf) frags.
> 
> Michael, perhaps I'm missing something, but why would you prefer
> pre-allocating buffers in this case? If the guest kernel is OOM'ing,
> dropping packets should provide backpressure.
> 
> Also, we could just as easily fail the initial skb alloc in page_to_skb,
> and I think that case also needs to be handled now in the same fashion as
> a memory allocation failure in receive_mergeable.
> 
> Best,
> 
> Mike

Yes I missed this last night. Thanks a lot Eric and Michael for pointing
this out.
Eric Dumazet Nov. 20, 2013, 3:16 p.m. UTC | #15
On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
> > On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
> > 
> > > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
> > > it didn't drop packets received from host as far as I can tell.
> > > virtio is more like a pipe than a real NIC in this respect.
> > 
> > Prior/after to this patch, you were not posting buffers, so if packets
> > were received on a physical NIC, you were dropping the packets anyway.
> >
> > It makes no difference at all, adding a cushion might make you feel
> > better, but its really not worth it.
> > 
> > Under memory stress, it makes better sense to drop a super big GRO
> > packet (The one needing frag_list extension ...)
> > 
> > It gives a better signal to the sender to reduce its pressure, and gives
> > opportunity to free more of your memory.
> > 
> 
> OK, but in that case one wonders whether we should do more to free memory?
> 
> E.g. imagine that we dropped a packet of a specific TCP flow
> because we couldn't allocate a new packet.
> 
> What happens now is that the old packet is freed as well.
> 
> So quite likely the next packet in queue will get processed
> since it will reuse the memory we have just freed.
> 
> The next packet and the next after it etc all will have to go through
> the net stack until they get at the socket and are dropped then
> because we missed a segment.  Even worse, GRO gets disabled so the load
> on receiver goes up instead of down.
> 
> Sounds like a problem doesn't it?

I see no problem at all. GRO is a hint for high rates (and obviously
when there is enough memory)

> 
> GRO actually detects it's the same flow and can see packet is
> out of sequence. Why doesn't it drop the packet then?
> Alternatively, we could (for example using the pre-allocated skb
> like I suggested) notify GRO that it should start dropping packets
> of this flow.
> 
> What do you think?
> 

I think we disagree a lot on memory management in networking stacks.

We did a lot of work in the TCP stack and Qdisc layers to lower memory
pressure (and bufferbloat), and you seem to try hard to introduce yet
another layer of bufferbloat in virtio_net.

So add whatever you want to proudly state to your management:

"Look how smart we are: we drop no packets in our layer"



Michael S. Tsirkin Nov. 20, 2013, 4:06 p.m. UTC | #16
On Wed, Nov 20, 2013 at 07:16:33AM -0800, Eric Dumazet wrote:
> On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
> > > 
> > > > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
> > > > it didn't drop packets received from host as far as I can tell.
> > > > virtio is more like a pipe than a real NIC in this respect.
> > > 
> > > Prior/after to this patch, you were not posting buffers, so if packets
> > > were received on a physical NIC, you were dropping the packets anyway.
> > >
> > > It makes no difference at all, adding a cushion might make you feel
> > > better, but its really not worth it.
> > > 
> > > Under memory stress, it makes better sense to drop a super big GRO
> > > packet (The one needing frag_list extension ...)
> > > 
> > > It gives a better signal to the sender to reduce its pressure, and gives
> > > opportunity to free more of your memory.
> > > 
> > 
> > OK, but in that case one wonders whether we should do more to free memory?
> > 
> > E.g. imagine that we dropped a packet of a specific TCP flow
> > because we couldn't allocate a new packet.
> > 
> > What happens now is that the old packet is freed as well.
> > 
> > So quite likely the next packet in queue will get processed
> > since it will reuse the memory we have just freed.
> > 
> > The next packet and the next after it etc all will have to go through
> > the net stack until they get at the socket and are dropped then
> > because we missed a segment.  Even worse, GRO gets disabled so the load
> > on receiver goes up instead of down.
> > 
> > Sounds like a problem doesn't it?
> 
> I see no problem at all. GRO is a hint for high rates (and obviously
> when there is enough memory)
> 
> > 
> > GRO actually detects it's the same flow and can see packet is
> > out of sequence. Why doesn't it drop the packet then?
> > Alternatively, we could (for example using the pre-allocated skb
> > like I suggested) notify GRO that it should start dropping packets
> > of this flow.
> > 
> > What do you think?
> > 
> 
> I think we disagree a lot on memory management on networking stacks.
> 
> We did a lot of work in TCP stack and Qdisc layers to lower memory
> pressure (and bufferbloat), an you seem to try hard to introduce yet
> another layer of buffer bloat in virtio_net.
> 
> So add whatever you want to proudly state to your management :
> 
> "Look how smart we are : we drop no packets in our layer"
> 

Hmm, some kind of disconnect here.
I got your point about bufferbloat.

What I am saying is that maybe we should drop packets more
aggressively: when we drop one packet of a flow, why not
drop everything that's queued for the same flow?

Eric Dumazet Nov. 20, 2013, 4:14 p.m. UTC | #17
On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:

> Hmm some kind of disconnect here.
> I got you rmanagement about bufferbloat.
> 
> What I am saying is that maybe we should drop packets more
> aggressively: when we drop one packet of a flow, why not
> drop everything that's queued and is for the same flow?

I really hope your TCP flows use SACK ;)

Please read the rfc 2018 introduction for details.

If a packet is dropped, it doesn't mean following packets _have_ to be
dropped.


Michael S. Tsirkin Nov. 20, 2013, 5:03 p.m. UTC | #18
On Wed, Nov 20, 2013 at 08:14:21AM -0800, Eric Dumazet wrote:
> On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:
> 
> > Hmm some kind of disconnect here.
> > I got you rmanagement about bufferbloat.
> > 
> > What I am saying is that maybe we should drop packets more
> > aggressively: when we drop one packet of a flow, why not
> > drop everything that's queued and is for the same flow?
> 
> I really hope your TCP flows use SACK ;)
> 
> Please read the rfc 2018 introduction for details.
> 
> If a packet is dropped, it doesn't mean following packets _have_ to be
> dropped.


Got it, thanks.

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 69ad42b..3798517 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -322,9 +322,11 @@  static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb)
 				 head_skb->dev->name);
 			len = MERGE_BUFFER_LEN;
 		}
+		page = virt_to_head_page(buf);
 		if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
 			struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
 			if (unlikely(!nskb)) {
+				put_page(page);
 				head_skb->dev->stats.rx_dropped++;
 				return -ENOMEM;
 			}
@@ -341,7 +343,6 @@  static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb)
 			head_skb->len += len;
 			head_skb->truesize += MERGE_BUFFER_LEN;
 		}
-		page = virt_to_head_page(buf);
 		offset = buf - (char *)page_address(page);
 		if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
 			put_page(page);