diff mbox

net: guard against coalescing packets from buggy network drivers

Message ID 535A78ED.3020309@secomea.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Svenning Sørensen April 25, 2014, 3:02 p.m. UTC
This patch adds a guard against coalescing packets received by buggy network
drivers that don't initialize skb->tail properly.

Coalescing such packets will corrupt data in a way that makes it hard to
track down the real cause of the problem. Observed symptoms are a flood of
WARNs by tcp_recvmsg at random times, leaving no clear indication as to why
the packet buffers were corrupted. See for example this thread:
https://groups.google.com/forum/#!topic/linux.kernel/A8ZI9aXooSU

Obviously the correct thing to do is fixing the drivers in question (eg. by
using skb_put() instead of just setting skb->len) - this patch will 
hopefully
help to track them down with less effort than I had to put into it :)

NB: in my case it was an out-of-tree Bluegiga WiFi driver; a quick look 
indicates
that there are in-tree drivers (for hardware that I don't own) with the 
same bug.


Signed-off-by: Svenning Soerensen <sss@secomea.com>

          return true;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet April 25, 2014, 3:43 p.m. UTC | #1
On Fri, 2014-04-25 at 17:02 +0200, Svenning Sørensen wrote:
> This patch adds a guard against coalescing packets received by buggy network
> drivers that don't initialize skb->tail properly.
> 
> Coalescing such packets will corrupt data in a way that makes it hard to
> track down the real cause of the problem. Observed symptoms are a flood of
> WARNs by tcp_recvmsg at random times, leaving no clear indication as to why
> the packet buffers were corrupted. See for example this thread:
> https://groups.google.com/forum/#!topic/linux.kernel/A8ZI9aXooSU
> 
> Obviously the correct thing to do is fixing the drivers in question (eg. by
> using skb_put() instead of just setting skb->len) - this patch will 
> hopefully
> help to track them down with less effort than I had to put into it :)
> 
> NB: in my case it was an out-of-tree Bluegiga WiFi driver; a quick look 
> indicates
> that there are in-tree drivers (for hardware that I don't own) with the 
> same bug.
> 
> 
> Signed-off-by: Svenning Soerensen <sss@secomea.com>

If you want a debugging patch, this should happen way before TCP stack
or even IP stack.

Say you want to use this buggy driver on a router, this code path will
never bit hit.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Svenning Sørensen April 25, 2014, 4:01 p.m. UTC | #2
On 25-04-2014 17:43, Eric Dumazet wrote:
>
> If you want a debugging patch, this should happen way before TCP stack
> or even IP stack.
>
> Say you want to use this buggy driver on a router, this code path will
> never bit hit.
>
You are right, of course, there are more effective ways to catch buggy 
drivers.
But they will probably also be much more expensive.
This one is very cheap, being in a relatively cold path, especially 
compared to the memcpy in the same path.

Svenning
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 25, 2014, 4:49 p.m. UTC | #3
On Fri, 2014-04-25 at 18:01 +0200, Svenning Sørensen wrote:

> You are right, of course, there are more effective ways to catch buggy 
> drivers.
> But they will probably also be much more expensive.
> This one is very cheap, being in a relatively cold path, especially 
> compared to the memcpy in the same path.

It seems you missed the recent tipc thread then.

They are many ways this code path can be abused, your patch only takes
care of one case.

In tipc case, they added skb to shinfo->frag_list without changing
skb->data_len.

So skb_tailroom(to) was not returning 0 as it should.

The invariant should be more strict.

BUG_ON(to->data + to->len != skb_tail_pointer(to));

I still think a BUG() to catch this is better.

There is no guarantee the developer/user will catch a WARN(),
the bug could sit there a long time.

I have no pity for serious bugs like that.

Since you spot buggy drivers, please provide their fixes or at least
name them so that we can take care of them.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Svenning Sørensen April 25, 2014, 7:32 p.m. UTC | #4
On 25-04-2014 18:49, Eric Dumazet wrote:
> On Fri, 2014-04-25 at 18:01 +0200, Svenning Sørensen wrote:
>
>> You are right, of course, there are more effective ways to catch buggy
>> drivers.
>> But they will probably also be much more expensive.
>> This one is very cheap, being in a relatively cold path, especially
>> compared to the memcpy in the same path.
> It seems you missed the recent tipc thread then.
Yes, I'm not subscribed to the mailing list at the moment - too much 
traffic and too little time..

>
> They are many ways this code path can be abused, your patch only takes
> care of one case.
>
> In tipc case, they added skb to shinfo->frag_list without changing
> skb->data_len.
>
> So skb_tailroom(to) was not returning 0 as it should.
>
> The invariant should be more strict.
>
> BUG_ON(to->data + to->len != skb_tail_pointer(to));
>
> I still think a BUG() to catch this is better.
>
> There is no guarantee the developer/user will catch a WARN(),
> the bug could sit there a long time.
>
> I have no pity for serious bugs like that.
Neither do I.
Either BUG or WARN would have worked for me and would have saved me a 
lot of time in chasing this bug in a monster size (almost 800 files!) 
third party driver for this unfamiliar chip on newly developed embedded 
hardware (so I couldn't be sure it wasn't a hardware bug) with nothing 
else than a serial port and a few MB flash to store log files on.


>
> Since you spot buggy drivers, please provide their fixes or at least
> name them so that we can take care of them.
>
> Thanks
>
>
Strange, I was almost certain that I saw at least one with the same bug, 
but I can't seem to find it any longer.
Maybe I looked too quick, or maybe it's been fixed by the git pull I've 
made in the meantime.
I'll take a closer look if I get some spare time some day..
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1b62343..188a60a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3833,6 +3833,19 @@  bool skb_try_coalesce(struct sk_buff *to, struct 
sk_buff *from,
          return false;

      if (len <= skb_tailroom(to)) {
+        /* Guard against buggy network drivers that don't initialize
+         * skb properly: if they manipulate skb->len directly without
+         * setting skb->tail, the skb_put below may return a pointer to
+         * live data (such as TCP headers) that (without this guard)
+         * would get overwritten by coalescing.
+         * The correct thing to do is fixing the network drivers - but
+         * this guard should give us a chance to spot them before they
+         * cause real hard-to-debug problems :)
+         */
+        if (WARN_ONCE(skb_tail_pointer(to) < to->data,
+                  "skb inconsistency: tail below data\n"))
+            return false;
+
          BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
          *delta_truesize = 0;