diff mbox

[net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs

Message ID 20170814055259.31078-1-matthew@mjdsystems.ca
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Matthew Dawson Aug. 14, 2017, 5:52 a.m. UTC
Due to commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac ("udp: remove
headers from UDP packets before queueing"), when udp packets are being
peeked the requested extra offset is always 0 as there is no need to skip
the udp header.  However, when the offset is 0 and the next skb is
of length 0, it is only returned once.  The behaviour can be seen with
the following python script:

#!/usr/bin/env python3
from socket import *;
f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
f.bind(('::', 0));
addr=('::1', f.getsockname()[1]);
g.sendto(b'', addr)
g.sendto(b'b', addr)
print(f.recvfrom(10, MSG_PEEK));
print(f.recvfrom(10, MSG_PEEK));

Where the expected output should be the empty string twice.

Instead, make sk_peek_offset return negative values, and pass those values
to __skb_try_recv_datagram/__skb_try_recv_from_queue.  If the passed offset
to __skb_try_recv_from_queue is negative, the checked skb is never skipped.
After the call, the offset is set to 0 if negative to ensure all further
calculations are correct.

Also simplify the if condition in __skb_try_recv_from_queue.  If _off is
greater then 0, and off is greater then or equal to skb->len, then (_off ||
skb->len) must always be true assuming skb->len >= 0 is always true.

Also remove a redundant check around a call to sk_peek_offset in af_unix.c,
as it double checked if MSG_PEEK was set in the flags.

Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
---
 include/net/sock.h  | 4 +---
 net/core/datagram.c | 4 ++--
 net/ipv4/udp.c      | 4 ++++
 net/ipv6/udp.c      | 4 ++++
 net/unix/af_unix.c  | 8 +++++---
 5 files changed, 16 insertions(+), 8 deletions(-)

Comments

Paolo Abeni Aug. 14, 2017, 9:27 a.m. UTC | #1
On Mon, 2017-08-14 at 01:52 -0400, Matthew Dawson wrote:
> Due to commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac ("udp: remove
> headers from UDP packets before queueing"), when udp packets are being
> peeked the requested extra offset is always 0 as there is no need to skip
> the udp header.  However, when the offset is 0 and the next skb is
> of length 0, it is only returned once.  The behaviour can be seen with
> the following python script:
> 
> #!/usr/bin/env python3
> from socket import *;
> f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
> g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
> f.bind(('::', 0));
> addr=('::1', f.getsockname()[1]);
> g.sendto(b'', addr)
> g.sendto(b'b', addr)
> print(f.recvfrom(10, MSG_PEEK));
> print(f.recvfrom(10, MSG_PEEK));
> 
> Where the expected output should be the empty string twice.
> 
> Instead, make sk_peek_offset return negative values, and pass those values
> to __skb_try_recv_datagram/__skb_try_recv_from_queue.  If the passed offset
> to __skb_try_recv_from_queue is negative, the checked skb is never skipped.
> After the call, the offset is set to 0 if negative to ensure all further
> calculations are correct.
> 
> Also simplify the if condition in __skb_try_recv_from_queue.  If _off is
> greater then 0, and off is greater then or equal to skb->len, then (_off ||
> skb->len) must always be true assuming skb->len >= 0 is always true.
> 
> Also remove a redundant check around a call to sk_peek_offset in af_unix.c,
> as it double checked if MSG_PEEK was set in the flags.
> 
> Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
> ---
>  include/net/sock.h  | 4 +---
>  net/core/datagram.c | 4 ++--
>  net/ipv4/udp.c      | 4 ++++
>  net/ipv6/udp.c      | 4 ++++
>  net/unix/af_unix.c  | 8 +++++---
>  5 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 7c0632c7e870..aeeec62992ca 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -507,9 +507,7 @@ int sk_set_peek_off(struct sock *sk, int val);
>  static inline int sk_peek_offset(struct sock *sk, int flags)
>  {
>  	if (unlikely(flags & MSG_PEEK)) {
> -		s32 off = READ_ONCE(sk->sk_peek_off);
> -		if (off >= 0)
> -			return off;
> +		return READ_ONCE(sk->sk_peek_off);
>  	}
>  
>  	return 0;

You probably want/must also update sk_set_peek_off() to allow negative
values, elsewhere this will break as soon as the user will do
SOL_SOCKET/SO_PEEK_OFF.

I'm wondering adding an explicit SOCK_PEEK_OFF/MSG_PEEK_OFF socket flag
would help simplyifing the code: no need for negative offset; set such
flag when SOL_SOCKET/SO_PEEK_OFF with a non negative value is called
(and clear it when a negative value is used), forward such flag to
__skb_try_recv_datagram/__skb_try_recv_from_queue and use it to select
the proper peek behaviour.

Paolo
Matthew Dawson Aug. 14, 2017, 2:05 p.m. UTC | #2
On Monday, August 14, 2017 5:27:26 AM EDT Paolo Abeni wrote:
> On Mon, 2017-08-14 at 01:52 -0400, Matthew Dawson wrote:
> > Due to commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac ("udp: remove
> > headers from UDP packets before queueing"), when udp packets are being
> > peeked the requested extra offset is always 0 as there is no need to skip
> > the udp header.  However, when the offset is 0 and the next skb is
> > of length 0, it is only returned once.  The behaviour can be seen with
> > the following python script:
> > 
> > #!/usr/bin/env python3
> > from socket import *;
> > f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
> > g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
> > f.bind(('::', 0));
> > addr=('::1', f.getsockname()[1]);
> > g.sendto(b'', addr)
> > g.sendto(b'b', addr)
> > print(f.recvfrom(10, MSG_PEEK));
> > print(f.recvfrom(10, MSG_PEEK));
> > 
> > Where the expected output should be the empty string twice.
> > 
> > Instead, make sk_peek_offset return negative values, and pass those values
> > to __skb_try_recv_datagram/__skb_try_recv_from_queue.  If the passed
> > offset
> > to __skb_try_recv_from_queue is negative, the checked skb is never
> > skipped.
> > After the call, the offset is set to 0 if negative to ensure all further
> > calculations are correct.
> > 
> > Also simplify the if condition in __skb_try_recv_from_queue.  If _off is
> > greater then 0, and off is greater then or equal to skb->len, then (_off
> > ||
> > skb->len) must always be true assuming skb->len >= 0 is always true.
> > 
> > Also remove a redundant check around a call to sk_peek_offset in
> > af_unix.c,
> > as it double checked if MSG_PEEK was set in the flags.
> > 
> > Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
> > ---
> > 
> >  include/net/sock.h  | 4 +---
> >  net/core/datagram.c | 4 ++--
> >  net/ipv4/udp.c      | 4 ++++
> >  net/ipv6/udp.c      | 4 ++++
> >  net/unix/af_unix.c  | 8 +++++---
> >  5 files changed, 16 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 7c0632c7e870..aeeec62992ca 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -507,9 +507,7 @@ int sk_set_peek_off(struct sock *sk, int val);
> > 
> >  static inline int sk_peek_offset(struct sock *sk, int flags)
> >  {
> >  
> >  	if (unlikely(flags & MSG_PEEK)) {
> > 
> > -		s32 off = READ_ONCE(sk->sk_peek_off);
> > -		if (off >= 0)
> > -			return off;
> > +		return READ_ONCE(sk->sk_peek_off);
> > 
> >  	}
> >  	
> >  	return 0;
> 
> You probably want/must also update sk_set_peek_off() to allow negative
> values, elsewhere this will break as soon as the user will do
> SOL_SOCKET/SO_PEEK_OFF.
I'm happy to do this either in this patch or as a second patch and make this a 
series, whatever the networking community prefers.

I'm actually surprised that only unix sockets can have negative values.  Is 
there a reason for that?  I had assumed that sk_set_peek_off would allow 
negative values as the code already has to support negative values due to what 
the initial value is.

> I'm wondering adding an explicit SOCK_PEEK_OFF/MSG_PEEK_OFF socket flag
> would help simplyifing the code: no need for negative offset; set such
> flag when SOL_SOCKET/SO_PEEK_OFF with a non negative value is called
> (and clear it when a negative value is used), forward such flag to
> __skb_try_recv_datagram/__skb_try_recv_from_queue and use it to select
> the proper peek behaviour.
The negative value is documented and supported for unix sockets, so I don't 
think we can just reject all negative values.  However, I do see a way to 
implement that while supporting user space sending negative values.  If that 
is preferred let me know and I'll see what I can make.

I assume that would make this patch target net-next?  Would it be possible to 
pull this into net so it can fix this regression for the next kernel release, 
while I work on getting the better solution finished?  I'm happy to to make 
__skb_try_recv_datagram/__skb_try_recv_from_queue accept the extra parameter 
so they don't see an offset less then 0 in this patch.
Willem de Bruijn Aug. 14, 2017, 3:03 p.m. UTC | #3
> I'm actually surprised that only unix sockets can have negative values.  Is
> there a reason for that?  I had assumed that sk_set_peek_off would allow
> negative values as the code already has to support negative values due to what
> the initial value is.

A negative initial value indicates that PEEK_OFF is disabled. It only
makes sense to peek from a positive offset from the start of the data.

>> I'm wondering adding an explicit SOCK_PEEK_OFF/MSG_PEEK_OFF socket flag
>> would help simplyifing the code:

The behavior needs to be bifurcated between peeking with
offset and without offset.

When peeking with offset is enabled by setting SO_PEEK_OFF,
subsequent reads do move the offset, so the observed behavior
is correct.

When sk->sk_peek_offset is negative, offset mode is disabled
and the same packet must be read twice.

An explicit boolean flag to discern between the two may make
the code simpler to understand, not sure whether that is logically
required.

>> no need for negative offset; set such
>> flag when SOL_SOCKET/SO_PEEK_OFF with a non negative value is called
>> (and clear it when a negative value is used), forward such flag to
>> __skb_try_recv_datagram/__skb_try_recv_from_queue and use it to select
>> the proper peek behaviour.
> The negative value is documented and supported for unix sockets, so I don't
> think we can just reject all negative values.

What is the documented behavior on those sockets? The original
patch that introduced it explicitly mentions peek mode as disabled
when the value is negative:

  https://www.spinics.net/lists/netdev/msg189589.html

> However, I do see a way to
> implement that while supporting user space sending negative values.  If that
> is preferred let me know and I'll see what I can make.
>
> I assume that would make this patch target net-next?

I suspect that we can fix this with a small enough patch to be able
to send to net.

> Would it be possible to
> pull this into net so it can fix this regression for the next kernel release,
> while I work on getting the better solution finished?  I'm happy to to make
> __skb_try_recv_datagram/__skb_try_recv_from_queue accept the extra parameter
> so they don't see an offset less then 0 in this patch.
>
> --
> Matthew
Paolo Abeni Aug. 14, 2017, 3:31 p.m. UTC | #4
On Mon, 2017-08-14 at 11:03 -0400, Willem de Bruijn wrote:
> > I'm actually surprised that only unix sockets can have negative values.  Is
> > there a reason for that?  I had assumed that sk_set_peek_off would allow
> > negative values as the code already has to support negative values due to what
> > the initial value is.
> 
> A negative initial value indicates that PEEK_OFF is disabled. It only
> makes sense to peek from a positive offset from the start of the data.

With the current code, the user space can disable peeking with offset
setting a negative offset value, after that peeking with offset has
been enabled. But only for UNIX sockets. I think the same should be
allowed for UDP sockets.

> > > I'm wondering adding an explicit SOCK_PEEK_OFF/MSG_PEEK_OFF socket flag
> > > would help simplyifing the code:
> 
> The behavior needs to be bifurcated between peeking with
> offset and without offset.
> 
> When peeking with offset is enabled by setting SO_PEEK_OFF,
> subsequent reads do move the offset, so the observed behavior
> is correct.
> 
> When sk->sk_peek_offset is negative, offset mode is disabled
> and the same packet must be read twice.
> 
> An explicit boolean flag to discern between the two may make
> the code simpler to understand, not sure whether that is logically
> required.

Yes, an explicit PEEK_OFF flag is just to keep the code simpler, so
that there is no need to add checks at every sk_peek_offset() call site
and the relevant logic can be fully encapsulated under the MSG_PEEK
branch in __skb_try_recv_from_queue(), I think/hope.
It's not a functional requirement.

Cheers,

Paolo
Thiago Macieira Aug. 14, 2017, 4:06 p.m. UTC | #5
On Monday, 14 August 2017 08:03:50 PDT Willem de Bruijn wrote:
> > I'm actually surprised that only unix sockets can have negative values. 
> > Is
> > there a reason for that?  I had assumed that sk_set_peek_off would allow
> > negative values as the code already has to support negative values due to
> > what the initial value is.
> 
> A negative initial value indicates that PEEK_OFF is disabled. It only
> makes sense to peek from a positive offset from the start of the data.

But here's a question: if the peek offset is equal to the length, should the 
reading return an empty datagram? This would indicate to the caller that there 
was a datagram there, which was skipped over.

That's how we deal with empty datagrams anyway.
Willem de Bruijn Aug. 14, 2017, 4:33 p.m. UTC | #6
On Mon, Aug 14, 2017 at 12:06 PM, Thiago Macieira
<thiago.macieira@intel.com> wrote:
> On Monday, 14 August 2017 08:03:50 PDT Willem de Bruijn wrote:
>> > I'm actually surprised that only unix sockets can have negative values.
>> > Is
>> > there a reason for that?  I had assumed that sk_set_peek_off would allow
>> > negative values as the code already has to support negative values due to
>> > what the initial value is.
>>
>> A negative initial value indicates that PEEK_OFF is disabled. It only
>> makes sense to peek from a positive offset from the start of the data.
>
> But here's a question: if the peek offset is equal to the length, should the
> reading return an empty datagram? This would indicate to the caller that there
> was a datagram there, which was skipped over.

In the general case, no, it should read at the offset, which is the next skb.

Zero length packets are a special case. This did come up before and
we chose to signal their existence in the queue by returning 0 for each
once, even in the offset-enabled mode.

Since we only need to change no-offset semantics to fix this bug,
I would not change this behavior, which is also expected by some
applications by now.

>
> That's how we deal with empty datagrams anyway.

What is? With no-offset and a zero payload skb at the head, peek
or recv returns 0, right?
Thiago Macieira Aug. 14, 2017, 5:02 p.m. UTC | #7
On Monday, 14 August 2017 09:33:48 PDT Willem de Bruijn wrote:
> > But here's a question: if the peek offset is equal to the length, should
> > the reading return an empty datagram? This would indicate to the caller
> > that there was a datagram there, which was skipped over.
> 
> In the general case, no, it should read at the offset, which is the next
> skb.

I beg to differ. In this particular case, we are talking about datagrams. If 
it were stream sockets, I would agree with you: just skip to the next. But in 
datagrams, the same way you do return zero-sized ones, I would return an empty 
one if you peeked at or past the end.

> Since we only need to change no-offset semantics to fix this bug,
> I would not change this behavior, which is also expected by some
> applications by now.

Do applications using SOCK_DGRAM rely on the behaviour of skipping over 
datagrams that are too short?

> > That's how we deal with empty datagrams anyway.
> 
> What is? With no-offset and a zero payload skb at the head, peek
> or recv returns 0, right?

Right.
Willem de Bruijn Aug. 14, 2017, 6:25 p.m. UTC | #8
On Mon, Aug 14, 2017 at 1:02 PM, Thiago Macieira
<thiago.macieira@intel.com> wrote:
> On Monday, 14 August 2017 09:33:48 PDT Willem de Bruijn wrote:
>> > But here's a question: if the peek offset is equal to the length, should
>> > the reading return an empty datagram? This would indicate to the caller
>> > that there was a datagram there, which was skipped over.
>>
>> In the general case, no, it should read at the offset, which is the next
>> skb.
>
> I beg to differ. In this particular case, we are talking about datagrams. If
> it were stream sockets, I would agree with you: just skip to the next. But in
> datagrams, the same way you do return zero-sized ones, I would return an empty
> one if you peeked at or past the end.

It can be argued either way. I would not change it in the scope of
this bug.

>
>> Since we only need to change no-offset semantics to fix this bug,
>> I would not change this behavior, which is also expected by some
>> applications by now.
>
> Do applications using SOCK_DGRAM rely on the behaviour of skipping over
> datagrams that are too short?

It is established behavior. It cannot be ruled out that an application
somewhere depends on it.
Thiago Macieira Aug. 14, 2017, 6:33 p.m. UTC | #9
On Monday, 14 August 2017 11:25:23 PDT Willem de Bruijn wrote:
> > Do applications using SOCK_DGRAM rely on the behaviour of skipping over
> > datagrams that are too short?
> 
> It is established behavior. It cannot be ruled out that an application
> somewhere depends on it.

Understood.

By the way, what were the usecases for the peek offset feature?

Also, do they apply to non-peeking recv?
Willem de Bruijn Aug. 14, 2017, 6:46 p.m. UTC | #10
On Mon, Aug 14, 2017 at 2:33 PM, Thiago Macieira
<thiago.macieira@intel.com> wrote:
> On Monday, 14 August 2017 11:25:23 PDT Willem de Bruijn wrote:
>> > Do applications using SOCK_DGRAM rely on the behaviour of skipping over
>> > datagrams that are too short?
>>
>> It is established behavior. It cannot be ruled out that an application
>> somewhere depends on it.
>
> Understood.
>
> By the way, what were the usecases for the peek offset feature?

The idea was to be able to peek at application headers of upper
layer protocols and multiplex messages among threads. It proved
so complex even for UDP that we did not attempt the same feature
for TCP. Also, KCM implements demultiplexing using eBPF today.

> Also, do they apply to non-peeking recv?

Reading at an offset is not implemented. Especially for RPC over
TCP, reading at an offset could make sense. Say, if a message
is received completely, but it is head-of-line blocked behind
another that has a hole. Here, too, KCM is an alternative.
Thiago Macieira Aug. 14, 2017, 6:58 p.m. UTC | #11
On Monday, 14 August 2017 11:46:42 PDT Willem de Bruijn wrote:
> > By the way, what were the usecases for the peek offset feature?
> 
> The idea was to be able to peek at application headers of upper
> layer protocols and multiplex messages among threads. It proved
> so complex even for UDP that we did not attempt the same feature
> for TCP. Also, KCM implements demultiplexing using eBPF today.

Interesting, but how would userspace coordinate like that? Suppose multiple 
threads are woken up by a datagram being received, they peek at a certain 
offset shared among them all to see which one reads. Suppose that thread is 
slow or blocked and, while it's getting its act together, another datagram 
arrives.

Because of that, the other threads can't disable their polling. They will 
continually be woken up by the kernel if they go back to poll/select. Even 
with epoll, there's no new edge trigger since event is already at level.

How will they avoid busy-waiting? And won't this secondary coordination 
obviate the need for offset peeking?
Willem de Bruijn Aug. 14, 2017, 7:03 p.m. UTC | #12
On Mon, Aug 14, 2017 at 2:58 PM, Thiago Macieira
<thiago.macieira@intel.com> wrote:
> On Monday, 14 August 2017 11:46:42 PDT Willem de Bruijn wrote:
>> > By the way, what were the usecases for the peek offset feature?
>>
>> The idea was to be able to peek at application headers of upper
>> layer protocols and multiplex messages among threads. It proved
>> so complex even for UDP that we did not attempt the same feature
>> for TCP. Also, KCM implements demultiplexing using eBPF today.
>
> Interesting, but how would userspace coordinate like that? Suppose multiple
> threads are woken up by a datagram being received

This assumes a separate listener thread and worker threadpool.
Thiago Macieira Aug. 14, 2017, 7:15 p.m. UTC | #13
On Monday, 14 August 2017 12:03:16 PDT Willem de Bruijn wrote:
> On Mon, Aug 14, 2017 at 2:58 PM, Thiago Macieira
> 
> <thiago.macieira@intel.com> wrote:
> > On Monday, 14 August 2017 11:46:42 PDT Willem de Bruijn wrote:
> >> > By the way, what were the usecases for the peek offset feature?
> >> 
> >> The idea was to be able to peek at application headers of upper
> >> layer protocols and multiplex messages among threads. It proved
> >> so complex even for UDP that we did not attempt the same feature
> >> for TCP. Also, KCM implements demultiplexing using eBPF today.
> > 
> > Interesting, but how would userspace coordinate like that? Suppose
> > multiple
> > threads are woken up by a datagram being received
> 
> This assumes a separate listener thread and worker threadpool.

The listener thread still needs to synchronise with the worker that got 
activated and wait for it to recv from the socket before the listener thread 
can go back to poll().

If we are really talking about threads in the same process, it might be easier 
for the listener to just read the datagram anyway and pass it on to the 
worker. That way, it can proceed immediately to the next datagram and not have 
to wait for the possibly slow worker.

If it is a separate process, then I don't see another way and this might be 
necessary.

By the way, what does recv with MSG_PEEK | MSG_TRUNC return? Is it the full 
datagram's size or is it the size minus the peek offset?
Willem de Bruijn Aug. 14, 2017, 7:39 p.m. UTC | #14
On Mon, Aug 14, 2017 at 3:15 PM, Thiago Macieira
<thiago.macieira@intel.com> wrote:
> On Monday, 14 August 2017 12:03:16 PDT Willem de Bruijn wrote:
>> On Mon, Aug 14, 2017 at 2:58 PM, Thiago Macieira
>>
>> <thiago.macieira@intel.com> wrote:
>> > On Monday, 14 August 2017 11:46:42 PDT Willem de Bruijn wrote:
>> >> > By the way, what were the usecases for the peek offset feature?
>> >>
>> >> The idea was to be able to peek at application headers of upper
>> >> layer protocols and multiplex messages among threads. It proved
>> >> so complex even for UDP that we did not attempt the same feature
>> >> for TCP. Also, KCM implements demultiplexing using eBPF today.
>> >
>> > Interesting, but how would userspace coordinate like that? Suppose
>> > multiple
>> > threads are woken up by a datagram being received
>>
>> This assumes a separate listener thread and worker threadpool.
>
> The listener thread still needs to synchronise with the worker that got
> activated and wait for it to recv from the socket before the listener thread
> can go back to poll().
>
> If we are really talking about threads in the same process, it might be easier
> for the listener to just read the datagram anyway and pass it on to the
> worker. That way, it can proceed immediately to the next datagram and not have
> to wait for the possibly slow worker.
>
> If it is a separate process, then I don't see another way and this might be
> necessary.
>
> By the way, what does recv with MSG_PEEK | MSG_TRUNC return? Is it the full
> datagram's size or is it the size minus the peek offset?

udp_recvmsg returns ulen if the flag is passed.

        if (flags & MSG_TRUNC)
                err = ulen;

This is computed earlier in the function as udp payload length and not
modified after.

        ulen = udp_skb_len(skb);
diff mbox

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 7c0632c7e870..aeeec62992ca 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -507,9 +507,7 @@  int sk_set_peek_off(struct sock *sk, int val);
 static inline int sk_peek_offset(struct sock *sk, int flags)
 {
 	if (unlikely(flags & MSG_PEEK)) {
-		s32 off = READ_ONCE(sk->sk_peek_off);
-		if (off >= 0)
-			return off;
+		return READ_ONCE(sk->sk_peek_off);
 	}
 
 	return 0;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index ee5647bd91b3..a91cd8aa244b 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -175,8 +175,8 @@  struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
 	*last = queue->prev;
 	skb_queue_walk(queue, skb) {
 		if (flags & MSG_PEEK) {
-			if (_off >= skb->len && (skb->len || _off ||
-						 skb->peeked)) {
+			if (_off >= 0 && _off >= skb->len &&
+			    (_off || skb->peeked)) {
 				_off -= skb->len;
 				continue;
 			}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index a7c804f73990..a0d8d6d285f4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1576,6 +1576,10 @@  int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 try_again:
 	peeking = off = sk_peek_offset(sk, flags);
 	skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
+	if (off < 0) {
+		off = 0;
+		peeking = 0;
+	}
 	if (!skb)
 		return err;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 578142b7ca3e..34d8f9f8d87d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -364,6 +364,10 @@  int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 try_again:
 	peeking = off = sk_peek_offset(sk, flags);
 	skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
+	if (peeking < 0) {
+		off = 0;
+		peeking = 0;
+	}
 	if (!skb)
 		return err;
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7b52a380d710..390e0627dc05 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2127,6 +2127,8 @@  static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 		skip = sk_peek_offset(sk, flags);
 		skb = __skb_try_recv_datagram(sk, flags, NULL, &peeked, &skip,
 					      &err, &last);
+		if (skip < 0)
+			skip = 0;
 		if (skb)
 			break;
 
@@ -2304,10 +2306,10 @@  static int unix_stream_read_generic(struct unix_stream_read_state *state,
 	 */
 	mutex_lock(&u->iolock);
 
-	if (flags & MSG_PEEK)
-		skip = sk_peek_offset(sk, flags);
-	else
+	skip = sk_peek_offset(sk, flags);
+	if (skip < 0) {
 		skip = 0;
+	}
 
 	do {
 		int chunk;