[v2] kcm: remove any offset before parsing messages

Message ID 1536657703-27577-1-git-send-email-asmadeus@codewreck.org
State Changes Requested
Delegated to: David Miller
Headers show
Series
  • [v2] kcm: remove any offset before parsing messages
Related show

Commit Message

Dominique Martinet Sept. 11, 2018, 9:21 a.m.
The current code assumes kcm users know they need to look for the
strparser offset within their bpf program, which is not documented
anywhere and examples laying around do not do.

The actual recv function does handle the offset well, so we can create a
temporary clone of the skb and pull that one up as required for parsing.

The pull itself has a cost if we are pulling beyond the head data,
measured to 2-3% latency in a noisy VM with a local client stressing
that path. The clone's impact seemed too small to measure.

This bug can be exhibited easily by implementing a "trivial" kcm parser
taking the first bytes as size, and on the client sending at least two
such packets in a single write().

Note that bpf sockmap has the same problem, both for parse and for recv,
so it would pulling twice or a real pull within the strparser logic if
anyone cares about that.

Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
---

Hmm, while trying to benchmark this, I sometimes got hangs in
kcm_wait_data() for the last packet somehow?
The sender program was done (exited (zombie) so I assumed the sender
socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
indicating it parsed a header but there was no skb to peek at?
But the sock is locked so this shouldn't be racy...

I can get it fairly often with this patch and small messages with an
offset, but I think it's just because the pull changes some timing - I
can't hit it with just the clone, and I can hit it with a pull without
clone as well.... And I don't see how pulling a cloned skb can impact
the original socket, but I'm a bit fuzzy on this.

(it's interesting that I didn't seem to hit this race when pulling in
strparser, that shouldn't be any different)


I'll look at that a bit more, but there have been no activity here for
a while and I don't have the energy to keep pushing in the strparser
direction, so take this patch more as a change of direction and a bit
as a 'bump' on the subject - I think it's important but I have too much
on my plate right now to keep pushing if there is no interest from the
authors.

 net/kcm/kcmsock.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

Comments

Dominique Martinet Sept. 12, 2018, 5:36 a.m. | #1
Dominique Martinet wrote on Tue, Sep 11, 2018:
> Hmm, while trying to benchmark this, I sometimes got hangs in
> kcm_wait_data() for the last packet somehow?
> The sender program was done (exited (zombie) so I assumed the sender
> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
> indicating it parsed a header but there was no skb to peek at?
> But the sock is locked so this shouldn't be racy...
> 
> I can get it fairly often with this patch and small messages with an
> offset, but I think it's just because the pull changes some timing - I
> can't hit it with just the clone, and I can hit it with a pull without
> clone as well.... And I don't see how pulling a cloned skb can impact
> the original socket, but I'm a bit fuzzy on this.

This is weird, I cannot reproduce at all without that pull, even if I
add another delay there instead of the pull, so it's not just timing...

I was confusing kcm_recvmsg and kcm_rcv_strparser but I still think
there is a race in kcm_wait_data between the peek and the wait. I have
confirmed on a hang that the sock's sk_receive_queue is not empty so
skb_peek would return something at this point, but it is waiting in
kcm_wait_data's sk_wait_data.. And no event would be coming at this
point since the sender is done.

kcm_wait_data does have the socket lock but the enqueuing counterpart
(kcm_queue_rcv_skb) is apparently called without lock most of the time
(at least, sk->sk_lock.owned is not set most of the times) ; but if I
add a lock_sock() here it can deadlock when a kfree_skb triggers
kcm_rcv_ready if that kfree_skb was done with the lock... ugh.

I thought the mux rx lock would be taken all the time but that appear
not to be the case either, and even if it were I don't see how to make
sk_wait_data with another lock in the first place.


So meh, I'm not sure why the pull messes up with this, I'm tempted to
say this is an old problem but I'm quite annoyed I do not seem to be
able to reproduce it.
I'm really out of time now so unless someone cares it'll wait for a few
more weeks.


(As an unrelated note, wrt to the patch, it might be nice to add a new
kcm socket option so users could say "my bpf program handles offsets,
don't bother with this")
David Miller Sept. 18, 2018, 1:45 a.m. | #2
From: Dominique Martinet <asmadeus@codewreck.org>
Date: Wed, 12 Sep 2018 07:36:42 +0200

> Dominique Martinet wrote on Tue, Sep 11, 2018:
>> Hmm, while trying to benchmark this, I sometimes got hangs in
>> kcm_wait_data() for the last packet somehow?
>> The sender program was done (exited (zombie) so I assumed the sender
>> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
>> indicating it parsed a header but there was no skb to peek at?
>> But the sock is locked so this shouldn't be racy...
>> 
>> I can get it fairly often with this patch and small messages with an
>> offset, but I think it's just because the pull changes some timing - I
>> can't hit it with just the clone, and I can hit it with a pull without
>> clone as well.... And I don't see how pulling a cloned skb can impact
>> the original socket, but I'm a bit fuzzy on this.
> 
> This is weird, I cannot reproduce at all without that pull, even if I
> add another delay there instead of the pull, so it's not just timing...

I really can't apply this patch until you resolve this.

It is weird, given your description, though...
Dominique Martinet Sept. 18, 2018, 1:57 a.m. | #3
David Miller wrote on Mon, Sep 17, 2018:
> From: Dominique Martinet <asmadeus@codewreck.org>
> Date: Wed, 12 Sep 2018 07:36:42 +0200
> 
> > Dominique Martinet wrote on Tue, Sep 11, 2018:
> >> Hmm, while trying to benchmark this, I sometimes got hangs in
> >> kcm_wait_data() for the last packet somehow?
> >> The sender program was done (exited (zombie) so I assumed the sender
> >> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
> >> indicating it parsed a header but there was no skb to peek at?
> >> But the sock is locked so this shouldn't be racy...
> >> 
> >> I can get it fairly often with this patch and small messages with an
> >> offset, but I think it's just because the pull changes some timing - I
> >> can't hit it with just the clone, and I can hit it with a pull without
> >> clone as well.... And I don't see how pulling a cloned skb can impact
> >> the original socket, but I'm a bit fuzzy on this.
> > 
> > This is weird, I cannot reproduce at all without that pull, even if I
> > add another delay there instead of the pull, so it's not just timing...
> 
> I really can't apply this patch until you resolve this.
> 
> It is weird, given your description, though...

Thanks for the reminder! I totally agree with you here and did not
expect this to be merged as it is (in retrospect, I probably should have
written something to that extent in the subject, "RFC"?)

I really don't have much time to give to that right now as I'm doing
this on my freetime, and the lack of reply has been rather demotivating
so it got pushed back a few times...
Given you did reply now I'll try to spend some time to figure that out
in the next couple of weeks but it might not make it for this cycle
depending on the number of rc we'll get and time you want this to soak
it -next.


(I can start by putting the pull back in netparser and try to reproduce,
it's really weird that I never got it to happen at the time...)
David Miller Sept. 18, 2018, 2:40 a.m. | #4
From: Dominique Martinet <asmadeus@codewreck.org>
Date: Tue, 18 Sep 2018 03:57:23 +0200

> Given you did reply now I'll try to spend some time to figure that out
> in the next couple of weeks but it might not make it for this cycle
> depending on the number of rc we'll get and time you want this to soak
> it -next.

Great.

Remind me, is there actually any way for the bpf programs run in this
situation to even _see_ strp_msg(skb)->offset at all?

There isn't right?  And the alternate proposal was to add such a
facility, right?

Just trying to remember all of the context, maybe it's good
information to add to the commit message?
Dominique Martinet Sept. 18, 2018, 2:45 a.m. | #5
David Miller wrote on Mon, Sep 17, 2018:
> Remind me, is there actually any way for the bpf programs run in this
> situation to even _see_ strp_msg(skb)->offset at all?

No, they can see it, so it's possible to make a KCM program that works
right now if you are careful (I'm not sure why the offset within bpf is
different from the offset in the kernel though, it looks like the bpf
program skips the qos part of the control buffer)

> There isn't right?  And the alternate proposal was to add such a
> facility, right?

The problem is that this isn't documented at all, and I could not find
any example doing that until Dave gave me one (I couldn't get it to work
because of the different offset).

The alternate proposal was to just document it, yes.

> Just trying to remember all of the context, maybe it's good
> information to add to the commit message?

Good idea, I'll add some more explanation there.
David Miller Sept. 18, 2018, 2:51 a.m. | #6
From: Dominique Martinet <asmadeus@codewreck.org>
Date: Tue, 18 Sep 2018 04:45:36 +0200

> David Miller wrote on Mon, Sep 17, 2018:
>> Remind me, is there actually any way for the bpf programs run in this
>> situation to even _see_ strp_msg(skb)->offset at all?
> 
> No, they can see it, so it's possible to make a KCM program that works
> right now if you are careful (I'm not sure why the offset within bpf is
> different from the offset in the kernel though, it looks like the bpf
> program skips the qos part of the control buffer)

What helper is used in the BPF program to get this offset value?

(also good info to add to the commit message)
Dominique Martinet Sept. 18, 2018, 2:58 a.m. | #7
David Miller wrote on Mon, Sep 17, 2018:
> > No, they can see it, so it's possible to make a KCM program that works
> > right now if you are careful (I'm not sure why the offset within bpf is
> > different from the offset in the kernel though, it looks like the bpf
> > program skips the qos part of the control buffer)
> 
> What helper is used in the BPF program to get this offset value?
> 
> (also good info to add to the commit message)

Dave defined one himself ; for a simple protocol where the offset is in
the first four bytes of the message.

The whole bpf program could look like this:

------
struct kcm_rx_msg { int full_len; int offset; };
static inline struct kcm_rx_msg *kcm_rx_msg(struct __sk_buff *skb) {
	return (struct kcm_rx_msg *)skb->cb;
}
int decode_framing(struct __sk_buff *skb) {
	return load_word(skb, kcm_rx_msg(skb)->offset);
}
------

If we go towards documenting it, adding a helper would be useful yes;
buf if we pull that becomes unnecessary.
(I'll add the example program in the commit message anyway at your
suggestion)

Patch

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 571d824e4e24..36c438b95955 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -381,8 +381,32 @@  static int kcm_parse_func_strparser(struct strparser *strp, struct sk_buff *skb)
 {
 	struct kcm_psock *psock = container_of(strp, struct kcm_psock, strp);
 	struct bpf_prog *prog = psock->bpf_prog;
+	struct sk_buff *clone_skb = NULL;
+	struct strp_msg *stm;
+	int rc;
+
+	stm = strp_msg(skb);
+	if (stm->offset) {
+		skb = clone_skb = skb_clone(skb, GFP_ATOMIC);
+		if (!clone_skb)
+			return -ENOMEM;
+
+		if (!pskb_pull(clone_skb, stm->offset)) {
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		/* reset cloned skb's offset for bpf programs using it */
+		stm = strp_msg(clone_skb);
+		stm->offset = 0;
+	}
+
+	rc = (*prog->bpf_func)(skb, prog->insnsi);
+out:
+	if (clone_skb)
+		kfree_skb(clone_skb);
 
-	return (*prog->bpf_func)(skb, prog->insnsi);
+	return rc;
 }
 
 static int kcm_read_sock_done(struct strparser *strp, int err)