Message ID | 4B4357A9.5010507@gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 05 Jan 2010 16:15:53 +0100 > If hardware is buggy, driver should focus on security first, > performance doesnt matter in this case. Copying every packet is way too much of a performance hit to accept. Things are not so black and white, there are shades of grey. We can fix this without making things so slow, try harder... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 05, 2010 at 12:40:40PM -0800, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 05 Jan 2010 16:15:53 +0100 > > > If hardware is buggy, driver should focus on security first, > > performance doesnt matter in this case. > > Copying every packet is way too much of a performance hit to > accept. > > Things are not so black and white, there are shades of grey. > > We can fix this without making things so slow, try harder... I think thats what I've done in my third version of the patch here. By raising the copybreak value, and setting an initial allocation size of 16k, we: 1) close the security hole by default 2) allow for smaller allocations during run time (allocations of the recevied frame size, rather than 16k every time) 3) Give the user an out if they need the added performance. They have a module option to lower the copybreak thrshold, and changing the mtu to a different value will reset the rx allocation size, and frame filter value (and issue a big warning that doing so may be dangerous) As we determine which hardware id's are buggy, we can start blacklisting them specifically, ejoining them from making mtu changes at all, and when we are confident we have identified all the buggy hardware, we can flip the default behavior to allocate a resonable frame size and set a good copybreak threshold for anything not on the blacklist. Regards Neil > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Neil Horman <nhorman@tuxdriver.com> Date: Tue, 5 Jan 2010 16:38:56 -0500 > I think thats what I've done in my third version of the patch here. > By raising the copybreak value, and setting an initial allocation > size of 16k, we: Right, I'll take an even closer look at your patch later this afternoon, thanks Neil. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 05, 2010 at 01:45:37PM -0800, David Miller wrote: > From: Neil Horman <nhorman@tuxdriver.com> > Date: Tue, 5 Jan 2010 16:38:56 -0500 > > > I think thats what I've done in my third version of the patch here. > > By raising the copybreak value, and setting an initial allocation > > size of 16k, we: > > Right, I'll take an even closer look at your patch later this > afternoon, thanks Neil. > No problem, thank you Dave! Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 05, 2010 at 12:40:40PM -0800, David Miller wrote:
[...]
> We can fix this without making things so slow, try harder...
I have reproduced something which seems rather close to the behavior
described in the ccc paper (I have a couple of old PCI rev 10 8169
chipset based network cards).
The setup includes two r8169 face-to-face. The target has an usual
MTU. I tweaked its skb allocation function to include 4000 bytes
of space instead of the usual 1500 something. It is grossly poisoned
to estimate the size of the actual DMA. The RX descriptor has no
knowledge of the extra buffer space above 1500 bytes.
The sender has a bigger MTU (say 6000) and it sends increasingly
sized ping. No rocket science so far.
The result looks like this (so far I checked only one descriptor at
a time when a packet is received):
entry 000 opts1 0x3281c040 opts2 0x00000000 size 066
entry 001 opts1 0x3486c5ec opts2 0x00000000 size 1518
entry 002 opts1 0x3481c040 opts2 0x00000000 size 066
entry 003 opts1 0x3486c5ed opts2 0x00000000 size 1518
entry 004 opts1 0x3486c5ee opts2 0x00000000 size 1518
entry 005 opts1 0x3481c040 opts2 0x00000000 size 066
entry 006 opts1 0x3486c5ef opts2 0x00000000 size 1522
entry 007 opts1 0x3481c040 opts2 0x00000000 size 066
entry 008 opts1 0x3486c5f0 opts2 0x00000000 size 1522 <- first test packet
entry 009 opts1 0x3486c5f1 opts2 0x00000000 size 1522
entry 010 opts1 0x3486c5f2 opts2 0x00000000 size 1522
entry 011 opts1 0x3486c5f3 opts2 0x00000000 size 1526
entry 012 opts1 0x3486c5f4 opts2 0x00000000 size 1526
entry 013 opts1 0x3486c5f5 opts2 0x00000000 size 1526 <- ... so far so good
entry 014 opts1 0x3486c5f6 opts2 0x00000000 size 1526
entry 015 opts1 0x3486c5f7 opts2 0x00000000 size 1530
entry 016 opts1 0x3486c5f8 opts2 0x00000000 size 1530
entry 017 opts1 0x3486c5f9 opts2 0x00000000 size 1530
entry 018 opts1 0x3486c5fa opts2 0x00000000 size 1530
entry 019 opts1 0x3486c5fb opts2 0x00000000 size 1534
entry 020 opts1 0x3486c5fc opts2 0x00000000 size 1534
entry 021 opts1 0x3486c5fd opts2 0x00000000 size 1534
entry 022 opts1 0x3486c5fe opts2 0x00000000 size 1534
entry 023 opts1 0x3486c5ff opts2 0x00000000 size 1536
entry 024 opts1 0x20803ff0 opts2 0x00000000 size 1536 <- problem starts here.
entry 025 opts1 0x00803ff0 opts2 0x00000000 size 1536 The lines below are
entry 026 opts1 0x00803ff0 opts2 0x00000000 size 1536 not meaningful.
entry 027 opts1 0x00803ff0 opts2 0x00000000 size 1536
The driver will mess things up because it needs the desc->opts1.RxRes bit
to be set before it claims that the packet is broken. Since this condition
is not fulfilled when opts1 = 0x20803ff0, the problem goes unnoticed and
it can be further exploited.
AFAIUI, if the driver does not lose sync (to use the paper's words), i.e.
if it notices the condition above and stops / resets the chipset, there
is not much to exploit.
From: Francois Romieu <romieu@fr.zoreil.com> Date: Thu, 7 Jan 2010 02:01:22 +0100 > entry 024 opts1 0x20803ff0 opts2 0x00000000 size 1536 <- problem starts here. > entry 025 opts1 0x00803ff0 opts2 0x00000000 size 1536 The lines below are > entry 026 opts1 0x00803ff0 opts2 0x00000000 size 1536 not meaningful. > entry 027 opts1 0x00803ff0 opts2 0x00000000 size 1536 > > The driver will mess things up because it needs the desc->opts1.RxRes bit > to be set before it claims that the packet is broken. Since this condition > is not fulfilled when opts1 = 0x20803ff0, the problem goes unnoticed and > it can be further exploited. > > AFAIUI, if the driver does not lose sync (to use the paper's words), i.e. > if it notices the condition above and stops / resets the chipset, there > is not much to exploit. Thanks for doing this test. Also note the FirstFrag and LastFrag bits. Here we see FirstFrag set, and this indicates a fragmented frame. The fix seems to be to simply check the error bits unconditionally, or alternatively validate the FirstFrag and LastFrag bits unconditionally. I don't even think we need to reset the chip, just do what the code does now and recycle the buffer back to the chip as we currently do when RxRes is set. We can keep a state bit around, showing that we saw the beginning of a fragmented frame, and we recycle buffers back to the chip when in state state until we see LastFrag. That should work, right? In your trace, we're merely seeing the >1536 frame being chopped into a fragmented frame by the device. The first one has FirstFrag set and the remaining (that you've shown) have neither bit set. Did you eventually get a descriptor with LastFrag (bit 28) set? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2010 at 05:15:14PM -0800, David Miller wrote: [...] > I don't even think we need to reset the chip, just do what the code > does now and recycle the buffer back to the chip as we currently do > when RxRes is set. We can keep a state bit around, showing that we > saw the beginning of a fragmented frame, and we recycle buffers back > to the chip when in state state until we see LastFrag. > > That should work, right ? > > In your trace, we're merely seeing the >1536 frame being chopped > into a fragmented frame by the device. The first one has FirstFrag > set and the remaining (that you've shown) have neither bit set. > > Did you eventually get a descriptor with LastFrag (bit 28) set? Yes. For a single packet, at the time the first interruption is received, 4 Rx descriptors have been written. The FirstFrag bit is set on the first descriptor only and the LastFrag bit is not set anywhere : entry 019 opts1 0x20803ff0 opts2 0x00000000 used 1536 <- FirstFrag entry 020 opts1 0x00803ff0 opts2 0x00000000 used 1536 entry 021 opts1 0x00803ff0 opts2 0x00000000 used 1536 entry 022 opts1 0x00803ff0 opts2 0x00000000 used 1536 entry 023 opts1 0x80000600 opts2 0x00000000 used 1524 ^^^^ (apparently we race with the DMA transfer) More events pop up, starting at 023. They are all identical to 20 / 21 /22. It then ends as : entry 028 opts1 0x00803ff0 opts2 0x00000000 used 1536 entry 029 opts1 0x10803ff0 opts2 0x00000000 used 1010 <- LastFrag bit set entry 030 opts1 0x80000600 opts2 0x00000000 used 000 <- 0 used byte. Entry entry 031 opts1 0x80000600 opts2 0x00000000 used 000 is available. entry 032 opts1 0x80000600 opts2 0x00000000 used 000 The total size of the DMA is not far from 16384 bytes (1536 * 10 + 1010). Simply recycling the buffer may work. I'll do a few tests with it : I am still unable to corrupt the descriptor ring itself.
From: Francois Romieu <romieu@fr.zoreil.com> Date: Sat, 9 Jan 2010 00:48:00 +0100 > For a single packet, at the time the first interruption is received, > 4 Rx descriptors have been written. The FirstFrag bit is set on the > first descriptor only and the LastFrag bit is not set anywhere : > > entry 019 opts1 0x20803ff0 opts2 0x00000000 used 1536 <- FirstFrag > entry 020 opts1 0x00803ff0 opts2 0x00000000 used 1536 > entry 021 opts1 0x00803ff0 opts2 0x00000000 used 1536 > entry 022 opts1 0x00803ff0 opts2 0x00000000 used 1536 > entry 023 opts1 0x80000600 opts2 0x00000000 used 1524 > ^^^^ > (apparently we race with the DMA transfer) > > More events pop up, starting at 023. They are all identical to 20 / 21 /22. > It then ends as : > > entry 028 opts1 0x00803ff0 opts2 0x00000000 used 1536 > entry 029 opts1 0x10803ff0 opts2 0x00000000 used 1010 <- LastFrag bit set > entry 030 opts1 0x80000600 opts2 0x00000000 used 000 <- 0 used byte. Entry > entry 031 opts1 0x80000600 opts2 0x00000000 used 000 is available. > entry 032 opts1 0x80000600 opts2 0x00000000 used 000 > > The total size of the DMA is not far from 16384 bytes (1536 * 10 + 1010). My guess is that once the chip starts filling out frags like this, the signal to trigger what should be the LastFrag entry never propagates to the DMA engine and thus the descriptor table. Thus the device just keeps filling packet contents with subsequent data, with FirstFrag and LastFrag clear, until that ~16K limit is reached. At which point it finally sets LastFrag. > Simply recycling the buffer may work. I'll do a few tests with it : I am > still unable to corrupt the descriptor ring itself. Whilst the above will end up gobbling up to (16K - BIG_PACKET_SZ) worth of innocent frames, the DMA engine seems to keep things at least self-consistent. The only bug seems to be the omission of the LastFrag trigger at the proper place. About recycling, as far as I can tell that's exactly what the driver already does in these exact cases, right? If RxRes is not set, and we don't see both FirstFrag and LastFrag set, we'll recycle the frame. if (unlikely(status & RxRES)) { ... } else { ... if (unlikely(rtl8169_fragmented_frame(status))) { dev->stats.rx_dropped++; dev->stats.rx_length_errors++; rtl8169_mark_to_asic(desc, tp->rx_buf_sz); continue; } Therefore we shouldn't need to change anything and there is actually no "bug" or "exploit" at all. We just end up dropping some RX frames because the chip didn't DMA them properly. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2010-01-08 at 16:02 -0800, David Miller wrote: [...] > Whilst the above will end up gobbling up to (16K - BIG_PACKET_SZ) > worth of innocent frames, the DMA engine seems to keep things at least > self-consistent. > > The only bug seems to be the omission of the LastFrag trigger at the > proper place. No, the attacker controls the completion status by writing it in previous valid frames. Please read the slides (<http://events.ccc.de/congress/2009/Fahrplan/attachments/1483_26c3_ipv4_fuckups.pdf> pages 80-87). I suspect that: 1. There is an internal ring buffer for RX DMA containing both frame payload and completion status 2. When a frame is (slightly?) over-length, the ingress and egress logic can disagree about the length of payload in the buffer 3. This results in stale data (usually frame payload) being written to memory as the completion status It is conceivable that the bug can be avoided simply by rounding the RxMaxSize. [...] > Therefore we shouldn't need to change anything and there is actually > no "bug" or "exploit" at all. We just end up dropping some RX frames > because the chip didn't DMA them properly. The intent of the exploit is precisely to cause other packets to be dropped! Ben.
On Sun, Jan 10, 2010 at 01:57:18AM +0000, Ben Hutchings wrote: > On Fri, 2010-01-08 at 16:02 -0800, David Miller wrote: > [...] > > Whilst the above will end up gobbling up to (16K - BIG_PACKET_SZ) > > worth of innocent frames, the DMA engine seems to keep things at least > > self-consistent. > > > > The only bug seems to be the omission of the LastFrag trigger at the > > proper place. > > No, the attacker controls the completion status by writing it in > previous valid frames. Please read the slides > (<http://events.ccc.de/congress/2009/Fahrplan/attachments/1483_26c3_ipv4_fuckups.pdf> pages 80-87). Yes. The paper suggests that it should be possible to control opts1 either completely or enough to break things badly. The first packet does not hurt too much. Things look different in the descriptor of the subsequent packets. I am more or less able to control bits 8-23 from 0-31 in opts1, enough to be unpleasant. Iff the FirstFrag and LastFrag bits can not be set on these packets, it should be enough to (1) do the fragmented_frame test sooner and return the descriptor to the chipset. Otherwise we can (2) take a complete reset on the first suspect packet (whose pattern is more specific) to stop the challenger here. FWIW, a netratelimited printk of the source mac address may help too. I am biased in favor of (2) as: - it will not inhibit multi-desc packets - the challenger is supposed to be in the LAN and can already hurt quite a lot anyway Comments ?
From: Francois Romieu <romieu@fr.zoreil.com> Date: Mon, 11 Jan 2010 00:50:17 +0100 > Iff the FirstFrag and LastFrag bits can not be set on these packets, > it should be enough to (1) do the fragmented_frame test sooner and > return the descriptor to the chipset. Otherwise we can (2) take a > complete reset on the first suspect packet (whose pattern is more > specific) to stop the challenger here. > > FWIW, a netratelimited printk of the source mac address may help too. > > I am biased in favor of (2) as: > - it will not inhibit multi-desc packets > - the challenger is supposed to be in the LAN and can already hurt > quite a lot anyway > > Comments ? In my opinion if you reset, you give them more power. Instead of just dropping the next few frames, you allow them to cause a drop of how ever many RX frames can arrive during the reset period _PLUS_ the amount of other RX frames which were in the receive ring at the point of detection. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jan 10, 2010 at 10:45:04PM -0800, David Miller wrote: [...] > In my opinion if you reset, you give them more power. > > Instead of just dropping the next few frames, you allow them > to cause a drop of how ever many RX frames can arrive during > the reset period _PLUS_ the amount of other RX frames which > were in the receive ring at the point of detection. Sure. The fragmented frame test has been moved a few lines up, before checking for the status bits. Exceedingly small frames are detected and dropped as well. I have spent the evening disrupting the communication. As a challenger one does not even need complete control : send a few random packets and the card goes to neverland. :o/ I'll do some tests limiting the crap at the first packet.
From: Francois Romieu <romieu@fr.zoreil.com> Date: Tue, 12 Jan 2010 01:16:46 +0100 > I have spent the evening disrupting the communication. As a challenger > one does not even need complete control : send a few random packets and > the card goes to neverland. :o/ > > I'll do some tests limiting the crap at the first packet. Thanks for doing this work Francois. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 22:24 Mon 11 Jan 2010, David Miller wrote: > From: Francois Romieu <romieu@fr.zoreil.com> > Date: Tue, 12 Jan 2010 01:16:46 +0100 > > > I have spent the evening disrupting the communication. As a challenger > > one does not even need complete control : send a few random packets and > > the card goes to neverland. :o/ > > > > I'll do some tests limiting the crap at the first packet. > > Thanks for doing this work Francois. How is the testing going? Thanks again, Brandon -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 12, 2010 at 01:16:46AM +0100, Francois Romieu wrote: > On Sun, Jan 10, 2010 at 10:45:04PM -0800, David Miller wrote: > [...] > > In my opinion if you reset, you give them more power. > > > > Instead of just dropping the next few frames, you allow them > > to cause a drop of how ever many RX frames can arrive during > > the reset period _PLUS_ the amount of other RX frames which > > were in the receive ring at the point of detection. > > Sure. > > The fragmented frame test has been moved a few lines up, before checking > for the status bits. Exceedingly small frames are detected and dropped > as well. > > I have spent the evening disrupting the communication. As a challenger > one does not even need complete control : send a few random packets and > the card goes to neverland. :o/ > > I'll do some tests limiting the crap at the first packet. > Any further results you can share with us Francois? Thanks Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 16:50 Sat 30 Jan 2010, Neil Horman wrote: > On Tue, Jan 12, 2010 at 01:16:46AM +0100, Francois Romieu wrote: > > On Sun, Jan 10, 2010 at 10:45:04PM -0800, David Miller wrote: > > [...] > > > In my opinion if you reset, you give them more power. > > > > > > Instead of just dropping the next few frames, you allow them > > > to cause a drop of how ever many RX frames can arrive during > > > the reset period _PLUS_ the amount of other RX frames which > > > were in the receive ring at the point of detection. > > > > Sure. > > > > The fragmented frame test has been moved a few lines up, before checking > > for the status bits. Exceedingly small frames are detected and dropped > > as well. > > > > I have spent the evening disrupting the communication. As a challenger > > one does not even need complete control : send a few random packets and > > the card goes to neverland. :o/ > > > > I'll do some tests limiting the crap at the first packet. > > > Any further results you can share with us Francois? Can I offer any help? I found that I have access to a Rev 0x10 card. Cheers, Brandon -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index 60f96c4..c2bbf59 100644 --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -4500,7 +4500,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev, } else { struct sk_buff *skb = tp->Rx_skbuff[entry]; dma_addr_t addr = le64_to_cpu(desc->addr); - int pkt_size = (status & 0x00001FFF) - 4; + int pkt_size = (status & 0x00001FFF); struct pci_dev *pdev = tp->pci_dev; /*