diff mbox series

[RFC,net] net/tls: clear SG markings on encryption error

Message ID 20191122214553.20982-1-jakub.kicinski@netronome.com
State RFC
Headers show
Series [RFC,net] net/tls: clear SG markings on encryption error | expand

Commit Message

Jakub Kicinski Nov. 22, 2019, 9:45 p.m. UTC
When tls_do_encryption() fails the SG lists are left with the
SG_END and SG_CHAIN marks in place. One could hope that once
encryption fails we will never see the record again, but that
is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support
to sk_msg handling") added special handling to ENOMEM and ENOSPC
errors which mean we may see the same record re-submitted.

In all honesty I don't understand why we need the ENOMEM handling.
Waiting for socket memory without setting SOCK_NOSPACE on any
random memory allocation failure seems slightly ill advised.

Having said that, undoing the SG markings seems wise regardless.

Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com
Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
John, I'm sending this mostly to ask if we can safely remove
the ENOMEM handling? :)

I was going to try the sockmap tests myself, but looks like the current
LLVM 10 build I get from their debs just segfaults when trying to build
selftest :/

Also there's at least one more bug in this piece of code, TLS 1.3
can't assume there's at least one free SG entry.

 net/tls/tls_sw.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Jakub Kicinski Nov. 22, 2019, 10:36 p.m. UTC | #1
On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote:
> Also there's at least one more bug in this piece of code, TLS 1.3
> can't assume there's at least one free SG entry.

And I don't see any place where the front and back of the SG circular
buffer are actually chained :( This:

static inline void sk_msg_init(struct sk_msg *msg)
{
	BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
	memset(msg, 0, sizeof(*msg));
	sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
}

looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end,
we don't know where the end is going to be..

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 6cb077b646a5..6c6ce6f90e7d 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg)
 
 static inline void sk_msg_init(struct sk_msg *msg)
 {
-       BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
        memset(msg, 0, sizeof(*msg));
-       sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
+       sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data);
 }
 
 static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src,

Hm?
John Fastabend Nov. 23, 2019, 3:25 a.m. UTC | #2
Jakub Kicinski wrote:
> When tls_do_encryption() fails the SG lists are left with the
> SG_END and SG_CHAIN marks in place. One could hope that once
> encryption fails we will never see the record again, but that
> is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support
> to sk_msg handling") added special handling to ENOMEM and ENOSPC
> errors which mean we may see the same record re-submitted.
> 
> In all honesty I don't understand why we need the ENOMEM handling.
> Waiting for socket memory without setting SOCK_NOSPACE on any
> random memory allocation failure seems slightly ill advised.
> 
> Having said that, undoing the SG markings seems wise regardless.
> 
> Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com
> Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
> Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> John, I'm sending this mostly to ask if we can safely remove
> the ENOMEM handling? :)
> 
What ENOMEM are you asking about here? The return code handling
from bpf_exec_tx_verdict?


	ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
				  record_type, &copied,
				  msg->msg_flags);
	if (ret) {
		if (ret == -EINPROGRESS)
			num_async++;
		else if (ret == -ENOMEM)
			goto wait_for_memory;
		else if (ret == -ENOSPC)
			goto rollback_iter;
		else if (ret != -EAGAIN)
			goto send_end;
	}

I would want to run it through some of our tests but I don't think
there is anything specific about BPF that needs it to be handled.
I was just trying to handle the error case gracefully and
wait_for_memory seems like the right behavior to me. What is ill
advised here?

> I was going to try the sockmap tests myself, but looks like the current
> LLVM 10 build I get from their debs just segfaults when trying to build
> selftest :/
> 
> Also there's at least one more bug in this piece of code, TLS 1.3
> can't assume there's at least one free SG entry.

There should always be one free SG entry at the end of the ring
that is used for chaining.

From sk_msg_sg{}

	/* The extra element is used for chaining the front and sections when
	 * the list becomes partitioned (e.g. end < start). The crypto APIs
	 * require the chaining.
	 */
	struct scatterlist		data[MAX_MSG_FRAGS + 1];

Can we use that element in that case? Otherwise probably can
add an extra element there if needed, data[MAX_MSG_FRAGS + 2].

> 
>  net/tls/tls_sw.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 24161750a737..4a0ea87b20cf 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags,
>  	if (rc < 0) {
>  		if (rc != -EINPROGRESS) {
>  			tls_err_abort(sk, EBADMSG);
> +
> +			i = msg_pl->sg.end;
> +			if (prot->version == TLS_1_3_VERSION) {
> +				sg_mark_end(sk_msg_elem(msg_pl, i));
> +				sg_unmark_end(sk_msg_elem(msg_pl, i));
> +			}
> +			sk_msg_iter_var_prev(i);
> +			sg_unmark_end(sk_msg_elem(msg_pl, i));
> +
> +			i = msg_en->sg.end;
> +			sk_msg_iter_var_prev(i);
> +			sg_unmark_end(sk_msg_elem(msg_en, i));
> +
>  			if (split) {
>  				tls_ctx->pending_open_record_frags = true;
>  				tls_merge_open_record(sk, rec, tmp, orig_end);
> -- 
> 2.23.0
> 

Can you copy the tls_push_record() error handling from BPF side instead of
embedding more into tls_push_record itself?

	err = tls_push_record(sk, flags, record_type);
	if (err < 0) {
		*copied -= sk_msg_free(sk, msg);
		tls_free_open_rec(sk);
		goto out_err;
	}

If the BPF program is not installed I guess you can skip the copied part
because you wont have the 'more_data' case.

So something like (untested/compiled/etc)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 141da093ff04..0469eb73bc88 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -771,8 +771,14 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
 
 	policy = !(flags & MSG_SENDPAGE_NOPOLICY);
 	psock = sk_psock_get(sk);
-	if (!psock || !policy)
-		return tls_push_record(sk, flags, record_type);
+	if (!psock || !policy) {
+		err = tls_push_record(sk, flags, record_type);
+		if (err) {
+			sk_msg_free(sk, msg);
+			tls_free_open_rec(sk); // might not be needed in noBPF
+		}
+		return err;
+	}
 more_data:
 	enospc = sk_msg_full(msg);
 	if (psock->eval == __SK_NONE) {
John Fastabend Nov. 23, 2019, 6:56 a.m. UTC | #3
Jakub Kicinski wrote:
> On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote:
> > Also there's at least one more bug in this piece of code, TLS 1.3
> > can't assume there's at least one free SG entry.
> 
> And I don't see any place where the front and back of the SG circular
> buffer are actually chained :( This:

The easiest way to generate a message that needs to be chained is to
use cork but we haven't yet enabled cork. However, there is one case
with the use of apply, pass, and drop that I think this case could
also be generated. I'll add a test for it and a fix. This case should
only be hit when using with BPF and programs using apply/cork.

I have the patches for cork support on a branch as well so we should
probably just send those out.

> 
> static inline void sk_msg_init(struct sk_msg *msg)
> {
> 	BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
> 	memset(msg, 0, sizeof(*msg));
> 	sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> }
> 
> looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end,
> we don't know where the end is going to be..

We use end->MAX_MSG_FRAGS and size==0 to indicate an fresh sk_msg. This
should only ever be called to initialize a msg never afterwards.

> 
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 6cb077b646a5..6c6ce6f90e7d 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg)
>  
>  static inline void sk_msg_init(struct sk_msg *msg)
>  {
> -       BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
>         memset(msg, 0, sizeof(*msg));
> -       sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> +       sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data);
>  }
>  

I don't think we want to chain these here. We could drop the init
marker part but its handy when reading sg values.

>  static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src,
> 
> Hm?

Nice catch on the missing chaining we dropped across various revisions
and rebases of the code. Without cork support our test cases don't hit
it now.
Jakub Kicinski Nov. 25, 2019, 7:52 p.m. UTC | #4
Sorry for the delay, this code is too tricky for me to handle while
applying patches :) Lemme quickly answer best I can and go back into
digging deeper.

On Fri, 22 Nov 2019 19:25:13 -0800, John Fastabend wrote:
> Jakub Kicinski wrote:
> > When tls_do_encryption() fails the SG lists are left with the
> > SG_END and SG_CHAIN marks in place. One could hope that once
> > encryption fails we will never see the record again, but that
> > is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support
> > to sk_msg handling") added special handling to ENOMEM and ENOSPC
> > errors which mean we may see the same record re-submitted.
> > 
> > In all honesty I don't understand why we need the ENOMEM handling.
> > Waiting for socket memory without setting SOCK_NOSPACE on any
> > random memory allocation failure seems slightly ill advised.
> > 
> > Having said that, undoing the SG markings seems wise regardless.
> > 
> > Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com
> > Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
> > Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> > John, I'm sending this mostly to ask if we can safely remove
> > the ENOMEM handling? :)
> >   
> What ENOMEM are you asking about here? The return code handling
> from bpf_exec_tx_verdict?
> 
> 
> 	ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
> 				  record_type, &copied,
> 				  msg->msg_flags);
> 	if (ret) {
> 		if (ret == -EINPROGRESS)
> 			num_async++;
> 		else if (ret == -ENOMEM)
> 			goto wait_for_memory;
> 		else if (ret == -ENOSPC)
> 			goto rollback_iter;
> 		else if (ret != -EAGAIN)
> 			goto send_end;
> 	}
> 
> I would want to run it through some of our tests but I don't think
> there is anything specific about BPF that needs it to be handled.
> I was just trying to handle the error case gracefully and
> wait_for_memory seems like the right behavior to me. What is ill
> advised here?

It's probably too strong of an expression :)

I was not seeing a clear relationship between ENOMEM and socket being
out of memory. The ENOMEM is likely because of a slab allocation
failure (quick look doesn't really reveal send_pages returning ENOMEM
if socket is full). So the wait is equivalent to a msleep, no?

I was a little worried about this error handling, it seems to make
assumptions about the reasons for errors. I heard a report of user
seeing ENOSPC coming out of a crypto accelerator, which didn't end well.

> > I was going to try the sockmap tests myself, but looks like the current
> > LLVM 10 build I get from their debs just segfaults when trying to build
> > selftest :/
> > 
> > Also there's at least one more bug in this piece of code, TLS 1.3
> > can't assume there's at least one free SG entry.  
> 
> There should always be one free SG entry at the end of the ring
> that is used for chaining.
> 
> From sk_msg_sg{}
> 
> 	/* The extra element is used for chaining the front and sections when
> 	 * the list becomes partitioned (e.g. end < start). The crypto APIs
> 	 * require the chaining.
> 	 */
> 	struct scatterlist		data[MAX_MSG_FRAGS + 1];
> 
> Can we use that element in that case? Otherwise probably can
> add an extra element there if needed, data[MAX_MSG_FRAGS + 2].

If sg really is a circular buffer we'd need to shift all entries to
make sure the hole is in the right place. Then we have a chain entry 
in the middle of the ring, and that's not good, right now the range
of [0, MAX_MSG_FRAGS) is assumed to contain data pages :S

> >  net/tls/tls_sw.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index 24161750a737..4a0ea87b20cf 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags,
> >  	if (rc < 0) {
> >  		if (rc != -EINPROGRESS) {
> >  			tls_err_abort(sk, EBADMSG);
> > +
> > +			i = msg_pl->sg.end;
> > +			if (prot->version == TLS_1_3_VERSION) {
> > +				sg_mark_end(sk_msg_elem(msg_pl, i));
> > +				sg_unmark_end(sk_msg_elem(msg_pl, i));
> > +			}
> > +			sk_msg_iter_var_prev(i);
> > +			sg_unmark_end(sk_msg_elem(msg_pl, i));
> > +
> > +			i = msg_en->sg.end;
> > +			sk_msg_iter_var_prev(i);
> > +			sg_unmark_end(sk_msg_elem(msg_en, i));
> > +
> >  			if (split) {
> >  				tls_ctx->pending_open_record_frags = true;
> >  				tls_merge_open_record(sk, rec, tmp, orig_end);
>
> Can you copy the tls_push_record() error handling from BPF side instead of
> embedding more into tls_push_record itself?
> 
> 	err = tls_push_record(sk, flags, record_type);
> 	if (err < 0) {
> 		*copied -= sk_msg_free(sk, msg);
> 		tls_free_open_rec(sk);
> 		goto out_err;
> 	}
> 
> If the BPF program is not installed I guess you can skip the copied part
> because you wont have the 'more_data' case.
> 
> So something like (untested/compiled/etc)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 141da093ff04..0469eb73bc88 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -771,8 +771,14 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
>  
>  	policy = !(flags & MSG_SENDPAGE_NOPOLICY);
>  	psock = sk_psock_get(sk);
> -	if (!psock || !policy)
> -		return tls_push_record(sk, flags, record_type);
> +	if (!psock || !policy) {
> +		err = tls_push_record(sk, flags, record_type);
> +		if (err) {
> +			sk_msg_free(sk, msg);
> +			tls_free_open_rec(sk); // might not be needed in noBPF
> +		}
> +		return err;
> +	}
>  more_data:
>  	enospc = sk_msg_full(msg);
>  	if (psock->eval == __SK_NONE) {

Ah, that's a great suggestion, I missed the copied count is passed by
reference into bpf_exec_tx_verdict() and thought more surgery would be
required to handle it like that.
Jakub Kicinski Nov. 25, 2019, 7:58 p.m. UTC | #5
On Fri, 22 Nov 2019 22:56:05 -0800, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote:  
> > > Also there's at least one more bug in this piece of code, TLS 1.3
> > > can't assume there's at least one free SG entry.  
> > 
> > And I don't see any place where the front and back of the SG circular
> > buffer are actually chained :( This:  
> 
> The easiest way to generate a message that needs to be chained is to
> use cork but we haven't yet enabled cork. However, there is one case
> with the use of apply, pass, and drop that I think this case could
> also be generated. I'll add a test for it and a fix. This case should
> only be hit when using with BPF and programs using apply/cork.
> 
> I have the patches for cork support on a branch as well so we should
> probably just send those out.
> 
> > 
> > static inline void sk_msg_init(struct sk_msg *msg)
> > {
> > 	BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
> > 	memset(msg, 0, sizeof(*msg));
> > 	sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> > }
> > 
> > looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end,
> > we don't know where the end is going to be..  
> 
> We use end->MAX_MSG_FRAGS and size==0 to indicate an fresh sk_msg. This
> should only ever be called to initialize a msg never afterwards.
> 
> > 
> > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> > index 6cb077b646a5..6c6ce6f90e7d 100644
> > --- a/include/linux/skmsg.h
> > +++ b/include/linux/skmsg.h
> > @@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg)
> >  
> >  static inline void sk_msg_init(struct sk_msg *msg)
> >  {
> > -       BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
> >         memset(msg, 0, sizeof(*msg));
> > -       sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> > +       sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data);
> >  }
> >    
> 
> I don't think we want to chain these here. We could drop the init
> marker part but its handy when reading sg values.
> 
> >  static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src,
> > 
> > Hm?  
> 
> Nice catch on the missing chaining we dropped across various revisions
> and rebases of the code. Without cork support our test cases don't hit
> it now.

I see, thanks for the explanation! I'll leave fixing the chaining to
you, then.
diff mbox series

Patch

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 24161750a737..4a0ea87b20cf 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -737,6 +737,19 @@  static int tls_push_record(struct sock *sk, int flags,
 	if (rc < 0) {
 		if (rc != -EINPROGRESS) {
 			tls_err_abort(sk, EBADMSG);
+
+			i = msg_pl->sg.end;
+			if (prot->version == TLS_1_3_VERSION) {
+				sg_mark_end(sk_msg_elem(msg_pl, i));
+				sg_unmark_end(sk_msg_elem(msg_pl, i));
+			}
+			sk_msg_iter_var_prev(i);
+			sg_unmark_end(sk_msg_elem(msg_pl, i));
+
+			i = msg_en->sg.end;
+			sk_msg_iter_var_prev(i);
+			sg_unmark_end(sk_msg_elem(msg_en, i));
+
 			if (split) {
 				tls_ctx->pending_open_record_frags = true;
 				tls_merge_open_record(sk, rec, tmp, orig_end);