Message ID | 535A78ED.3020309@secomea.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, 2014-04-25 at 17:02 +0200, Svenning Sørensen wrote: > This patch adds a guard against coalescing packets received by buggy network > drivers that don't initialize skb->tail properly. > > Coalescing such packets will corrupt data in a way that makes it hard to > track down the real cause of the problem. Observed symptoms are a flood of > WARNs by tcp_recvmsg at random times, leaving no clear indication as to why > the packet buffers were corrupted. See for example this thread: > https://groups.google.com/forum/#!topic/linux.kernel/A8ZI9aXooSU > > Obviously the correct thing to do is fixing the drivers in question (eg. by > using skb_put() instead of just setting skb->len) - this patch will > hopefully > help to track them down with less effort than I had to put into it :) > > NB: in my case it was an out-of-tree Bluegiga WiFi driver; a quick look > indicates > that there are in-tree drivers (for hardware that I don't own) with the > same bug. > > > Signed-off-by: Svenning Soerensen <sss@secomea.com> If you want a debugging patch, this should happen way before TCP stack or even IP stack. Say you want to use this buggy driver on a router, this code path will never bit hit. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 25-04-2014 17:43, Eric Dumazet wrote: > > If you want a debugging patch, this should happen way before TCP stack > or even IP stack. > > Say you want to use this buggy driver on a router, this code path will > never bit hit. > You are right, of course, there are more effective ways to catch buggy drivers. But they will probably also be much more expensive. This one is very cheap, being in a relatively cold path, especially compared to the memcpy in the same path. Svenning -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2014-04-25 at 18:01 +0200, Svenning Sørensen wrote: > You are right, of course, there are more effective ways to catch buggy > drivers. > But they will probably also be much more expensive. > This one is very cheap, being in a relatively cold path, especially > compared to the memcpy in the same path. It seems you missed the recent tipc thread then. They are many ways this code path can be abused, your patch only takes care of one case. In tipc case, they added skb to shinfo->frag_list without changing skb->data_len. So skb_tailroom(to) was not returning 0 as it should. The invariant should be more strict. BUG_ON(to->data + to->len != skb_tail_pointer(to)); I still think a BUG() to catch this is better. There is no guarantee the developer/user will catch a WARN(), the bug could sit there a long time. I have no pity for serious bugs like that. Since you spot buggy drivers, please provide their fixes or at least name them so that we can take care of them. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 25-04-2014 18:49, Eric Dumazet wrote: > On Fri, 2014-04-25 at 18:01 +0200, Svenning Sørensen wrote: > >> You are right, of course, there are more effective ways to catch buggy >> drivers. >> But they will probably also be much more expensive. >> This one is very cheap, being in a relatively cold path, especially >> compared to the memcpy in the same path. > It seems you missed the recent tipc thread then. Yes, I'm not subscribed to the mailing list at the moment - too much traffic and too little time.. > > They are many ways this code path can be abused, your patch only takes > care of one case. > > In tipc case, they added skb to shinfo->frag_list without changing > skb->data_len. > > So skb_tailroom(to) was not returning 0 as it should. > > The invariant should be more strict. > > BUG_ON(to->data + to->len != skb_tail_pointer(to)); > > I still think a BUG() to catch this is better. > > There is no guarantee the developer/user will catch a WARN(), > the bug could sit there a long time. > > I have no pity for serious bugs like that. Neither do I. Either BUG or WARN would have worked for me and would have saved me a lot of time in chasing this bug in a monster size (almost 800 files!) third party driver for this unfamiliar chip on newly developed embedded hardware (so I couldn't be sure it wasn't a hardware bug) with nothing else than a serial port and a few MB flash to store log files on. > > Since you spot buggy drivers, please provide their fixes or at least > name them so that we can take care of them. > > Thanks > > Strange, I was almost certain that I saw at least one with the same bug, but I can't seem to find it any longer. Maybe I looked too quick, or maybe it's been fixed by the git pull I've made in the meantime. I'll take a closer look if I get some spare time some day.. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 1b62343..188a60a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3833,6 +3833,19 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, return false; if (len <= skb_tailroom(to)) { + /* Guard against buggy network drivers that don't initialize + * skb properly: if they manipulate skb->len directly without + * setting skb->tail, the skb_put below may return a pointer to + * live data (such as TCP headers) that (without this guard) + * would get overwritten by coalescing. + * The correct thing to do is fixing the network drivers - but + * this guard should give us a chance to spot them before they + * cause real hard-to-debug problems :) + */ + if (WARN_ONCE(skb_tail_pointer(to) < to->data, + "skb inconsistency: tail below data\n")) + return false; + BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len)); *delta_truesize = 0;
This patch adds a guard against coalescing packets received by buggy network drivers that don't initialize skb->tail properly. Coalescing such packets will corrupt data in a way that makes it hard to track down the real cause of the problem. Observed symptoms are a flood of WARNs by tcp_recvmsg at random times, leaving no clear indication as to why the packet buffers were corrupted. See for example this thread: https://groups.google.com/forum/#!topic/linux.kernel/A8ZI9aXooSU Obviously the correct thing to do is fixing the drivers in question (eg. by using skb_put() instead of just setting skb->len) - this patch will hopefully help to track them down with less effort than I had to put into it :) NB: in my case it was an out-of-tree Bluegiga WiFi driver; a quick look indicates that there are in-tree drivers (for hardware that I don't own) with the same bug. Signed-off-by: Svenning Soerensen <sss@secomea.com> return true; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html