bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg

Message ID 20200605084625.9783-1-anny.hu@linux.alibaba.com
State Accepted
Delegated to: BPF Maintainers
Series bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg

Commit Message

dihu June 5, 2020, 8:46 a.m. UTC
When a user application calls recv() or recvfrom() with the MSG_PEEK
flag on a bpf sockmap socket, the kernel panics at
__tcp_bpf_recvmsg+0x12c/0x350. With MSG_PEEK set, an sk_msg is not
removed from the ingress_msg queue after its data has been copied out.
The loop then advances to the next entry without checking whether the
current sk_msg is the last one on the queue. When it is, the "next"
sk_msg is really the head of the ingress_msg queue, which is not an
sk_msg, so the memory address of its sg page is invalid. Add a check
that stops the peek loop at the last sk_msg.

[20759.125457] BUG: kernel NULL pointer dereference, address: 0000000000000008
[20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G            E 5.4.32 #1
[20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS 4.1.12 06/18/2017
[20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
[20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
[20759.276099] tcp_bpf_recvmsg+0x113/0x370
[20759.281137] inet_recvmsg+0x55/0xc0
[20759.285734] __sys_recvfrom+0xc8/0x130
[20759.290566] ? __audit_syscall_entry+0x103/0x130
[20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
[20759.301700] ? __audit_syscall_exit+0x1e4/0x290
[20759.307235] __x64_sys_recvfrom+0x24/0x30
[20759.312226] do_syscall_64+0x55/0x1b0
[20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: dihu <anny.hu@linux.alibaba.com>
---
 net/ipv4/tcp_bpf.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

John Fastabend June 8, 2020, 4:06 p.m. UTC | #1

Thanks, looks good to me.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Jakub Sitnicki June 9, 2020, 9:03 a.m. UTC | #2

Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Alexei Starovoitov June 9, 2020, 5:58 p.m. UTC | #3

Applied. Thanks

Patch

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 5a05327..b82e4c3 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -64,6 +64,9 @@  int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 		} while (i != msg_rx->sg.end);
 
 		if (unlikely(peek)) {
+			if (msg_rx == list_last_entry(&psock->ingress_msg,
+						      struct sk_msg, list))
+				break;
 			msg_rx = list_next_entry(msg_rx, list);
 			continue;
 		}