Patchwork qlcnic: dont assume NET_IP_ALIGN is 2

login
register
mail settings
Submitter Eric Dumazet
Date Sept. 17, 2010, 9:57 a.m.
Message ID <1284717448.3391.75.camel@edumazet-laptop>
Download mbox | patch
Permalink /patch/65047/
State Accepted
Delegated to: David Miller
Headers show

Comments

Eric Dumazet - Sept. 17, 2010, 9:57 a.m.
Amit, I noticed following bug in qlnic driver.

Also, skb->truesize should not be changed by drivers, unless dealing
with fragments.

When you have :
	skb->truesize = skb->len + sizeof(struct sk_buff);

you are lying to stack that can queue many "small" UDP packets,
accouting for small packets in socket rcvbuf, while the truesize was
really 1532 + sizeof(struct sk_buff)

Could you take a look ?

Thanks


[PATCH] qlcnic: dont assume NET_IP_ALIGN is 2

qlcnic driver allocates rx skbs and gives to hardware too bytes of extra
storage, allowing for corruption of kernel data.

NET_IP_ALIGN being 0 on some platforms (including x86), drivers should
not assume it's 2.

rds_ring->skb_size = rds_ring->dma_size + NET_IP_ALIGN;
...
skb = dev_alloc_skb(rds_ring->skb_size);
skb_reserve(skb, 2);
pci_map_single(pdev, skb->data, rds_ring->dma_size, PCI_DMA_FROMDEVICE);

(and rds_ring->skb_size == rds_ring->dma_size) -> bug


Because of extra alignment (1500 + 32) -> four extra bytes are available
before the struct skb_shared_info, so corruption is not noticed.

Note: this driver could use netdev_alloc_skb_ip_align()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Sept. 18, 2010, 5:58 a.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 17 Sep 2010 11:57:28 +0200

> [PATCH] qlcnic: dont assume NET_IP_ALIGN is 2
> 
> qlcnic driver allocates rx skbs and gives to hardware too bytes of extra
> storage, allowing for corruption of kernel data.
> 
> NET_IP_ALIGN being 0 on some platforms (including x86), drivers should
> not assume it's 2.
> 
> rds_ring->skb_size = rds_ring->dma_size + NET_IP_ALIGN;
> ...
> skb = dev_alloc_skb(rds_ring->skb_size);
> skb_reserve(skb, 2);
> pci_map_single(pdev, skb->data, rds_ring->dma_size, PCI_DMA_FROMDEVICE);
> 
> (and rds_ring->skb_size == rds_ring->dma_size) -> bug
> 
> 
> Because of extra alignment (1500 + 32) -> four extra bytes are available
> before the struct skb_shared_info, so corruption is not noticed.
> 
> Note: this driver could use netdev_alloc_skb_ip_align()
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied to net-2.6, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 20, 2010, 12:18 p.m.
Le lundi 20 septembre 2010 à 06:16 -0500, Amit Salecha a écrit :

> This can be cleaned up. 
> Though I have one doubt. We are allocating larger packets than the actual data used.
> Doesn't it will break accounting ?

truesize accounts for the real size of buffers, not the used part in
them.

IMHO, a driver not dealing with fragments should not touch skb->truesize

# grep truesize drivers/net/tg3.c
<nothing>

If a driver deals with fragments, it probably should use "+=" operator
only, not hardcoding sizeof(struct sk_buff) thing that only core network
has to deal with.

$ find drivers/net/bnx2x|xargs grep truesize
drivers/net/bnx2x/bnx2x_cmn.c:		skb->truesize += frag_len;

Almost all drivers are fine, they are some of them that should be
changed.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Sept. 20, 2010, 3:58 p.m.
From: Amit Salecha <amit.salecha@qlogic.com>
Date: Mon, 20 Sep 2010 06:16:31 -0500

> Though I have one doubt. We are allocating larger packets than the actual data used.
> Doesn't it will break accounting ?

No, it will "fix" accounting.

You must charge to the SKB all of the non-shared memory that was
allocated to the SKB.

This means even if the packet only uses 128 bytes of the SKB
data area, you must still account for the full blob of linear
memory that was allocated for the SKB data area in skb->truesize.

Otherwise remote attackers could consume enormous amounts of memory by
tricking our socket accounting via carefully sized packets.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
amit salecha - Sept. 21, 2010, 8:19 a.m.
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, September 20, 2010 9:29 PM
> To: Amit Salecha
> Cc: eric.dumazet@gmail.com; netdev@vger.kernel.org; Ameen Rahman;
> Anirban Chakraborty
> Subject: Re: [PATCH] qlcnic: dont assume NET_IP_ALIGN is 2
> 
> From: Amit Salecha <amit.salecha@qlogic.com>
> Date: Mon, 20 Sep 2010 06:16:31 -0500
> 
> > Though I have one doubt. We are allocating larger packets than the
> actual data used.
> > Doesn't it will break accounting ?
> 
> No, it will "fix" accounting.
> 
> You must charge to the SKB all of the non-shared memory that was
> allocated to the SKB.
> 
> This means even if the packet only uses 128 bytes of the SKB
> data area, you must still account for the full blob of linear
> memory that was allocated for the SKB data area in skb->truesize.
> 
> Otherwise remote attackers could consume enormous amounts of memory by
> tricking our socket accounting via carefully sized packets.

Wont this affect throughput ?
As problem discuss in this thread http://www.mail-archive.com/netdev@vger.kernel.org/msg06848.html, it can affect tcp window scaling.

-Amit
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 21, 2010, 8:34 a.m.
Le mardi 21 septembre 2010 à 03:19 -0500, Amit Salecha a écrit :
> 
> > -----Original Message-----
> > From: David Miller [mailto:davem@davemloft.net]
> > Sent: Monday, September 20, 2010 9:29 PM
> > To: Amit Salecha
> > Cc: eric.dumazet@gmail.com; netdev@vger.kernel.org; Ameen Rahman;
> > Anirban Chakraborty
> > Subject: Re: [PATCH] qlcnic: dont assume NET_IP_ALIGN is 2
> > 
> > From: Amit Salecha <amit.salecha@qlogic.com>
> > Date: Mon, 20 Sep 2010 06:16:31 -0500
> > 
> > > Though I have one doubt. We are allocating larger packets than the
> > actual data used.
> > > Doesn't it will break accounting ?
> > 
> > No, it will "fix" accounting.
> > 
> > You must charge to the SKB all of the non-shared memory that was
> > allocated to the SKB.
> > 
> > This means even if the packet only uses 128 bytes of the SKB
> > data area, you must still account for the full blob of linear
> > memory that was allocated for the SKB data area in skb->truesize.
> > 
> > Otherwise remote attackers could consume enormous amounts of memory by
> > tricking our socket accounting via carefully sized packets.
> 
> Wont this affect throughput ?
> As problem discuss in this thread http://www.mail-archive.com/netdev@vger.kernel.org/msg06848.html, it can affect tcp window scaling.
> 

Amit, if you believe this is a problem, you should address it for all
NICS, not only qlcnic.

Qlcnic was lying to stack, because it consumed 2Kbytes blocs and
pretended they were consuming skb->len bytes.
(assuming MTU=1500, problem is worse if MTU is bigger)

So in order to improve "throughput", you were allowing for memory
exhaust and freeze of the _machine_ ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Sept. 21, 2010, 7:33 p.m.
From: Amit Salecha <amit.salecha@qlogic.com>
Date: Tue, 21 Sep 2010 03:41:42 -0500

>> So in order to improve "throughput", you were allowing for memory
>> exhaust and freeze of the _machine_ ?
>>
> This won't lead to such problem. truesize is used for accounting only.

Yes, it will.

Do you understand that we enforce both socket-level and system-wide
networking buffer usage in the stack?  And this limiting is based
upon skb->truesize and therefore only works if skb->truesize is
accurate?

It's meant to keep people from attacking a server and consuming large
percentages of system memory with networking buffer memory such that
other tasks cannot complete successfully.

And by mis-reporting the truesize you are subverting that entirely.

This qlcnic truesize bug is a huge security hole, can't you see this
now?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 75ba744..60ab753 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -1316,7 +1316,7 @@  qlcnic_alloc_rx_skb(struct qlcnic_adapter *adapter,
 		return -ENOMEM;
 	}
 
-	skb_reserve(skb, 2);
+	skb_reserve(skb, NET_IP_ALIGN);
 
 	dma = pci_map_single(pdev, skb->data,
 			rds_ring->dma_size, PCI_DMA_FROMDEVICE);