Patchwork bnx2x: avoid atomic allocations during initialization

Submitter Michal Schmidt
Date Sept. 5, 2013, 8:13 p.m.
Message ID <1378411989-19775-1-git-send-email-mschmidt@redhat.com>
Permalink /patch/272944/
State Accepted
Delegated to: David Miller

Comments

Michal Schmidt - Sept. 5, 2013, 8:13 p.m.
During initialization bnx2x allocates significant amounts of memory
(for rx data, rx SGEs, TPA pool) using atomic allocations.

I received a report where bnx2x failed to allocate SGEs and it had
to fall back to TPA-less operation.

Let's use GFP_KERNEL allocations during initialization, which runs
in process context. Add gfp_t parameters to functions that are used
both in initialization and in the receive path.

Use an unlikely branch in bnx2x_frag_alloc() to avoid atomic allocation
by netdev_alloc_frag(). The branch is taken several thousands of times
during initialization, but then never more. Note that fp->rx_frag_size
is never greater than PAGE_SIZE, so __get_free_page() can be used here.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 38 +++++++++++++++----------
 1 file changed, 23 insertions(+), 15 deletions(-)
David Miller - Sept. 6, 2013, 6:27 p.m.
From: Michal Schmidt <mschmidt@redhat.com>
Date: Thu,  5 Sep 2013 22:13:09 +0200

> During initialization bnx2x allocates significant amounts of memory
> (for rx data, rx SGEs, TPA pool) using atomic allocations.
> 
> I received a report where bnx2x failed to allocate SGEs and it had
> to fall back to TPA-less operation.
> 
> Let's use GFP_KERNEL allocations during initialization, which runs
> in process context. Add gfp_t parameters to functions that are used
> both in initialization and in the receive path.
> 
> Use an unlikely branch in bnx2x_frag_alloc() to avoid atomic allocation
> by netdev_alloc_frag(). The branch is taken several thousands of times
> during initialization, but then never more. Note that fp->rx_frag_size
> is never greater than PAGE_SIZE, so __get_free_page() can be used here.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

This change looks good to me, if some Broadcom folks could take a look
and ACK/NACK that would be great.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Kravkov - Sept. 7, 2013, 1:45 a.m.
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: Friday, September 06, 2013 9:28 PM
> To: mschmidt@redhat.com
> Cc: netdev@vger.kernel.org; Ariel Elior; Eilon Greenstein
> Subject: Re: [PATCH] bnx2x: avoid atomic allocations during initialization
> 
> From: Michal Schmidt <mschmidt@redhat.com>
> Date: Thu,  5 Sep 2013 22:13:09 +0200
> 
> > During initialization bnx2x allocates significant amounts of memory
> > (for rx data, rx SGEs, TPA pool) using atomic allocations.
> >
> > I received a report where bnx2x failed to allocate SGEs and it had to
> > fall back to TPA-less operation.
> >
> > Let's use GFP_KERNEL allocations during initialization, which runs in
> > process context. Add gfp_t parameters to functions that are used both
> > in initialization and in the receive path.
> >
> > Use an unlikely branch in bnx2x_frag_alloc() to avoid atomic
> > allocation by netdev_alloc_frag(). The branch is taken several
> > thousands of times during initialization, but then never more. Note
> > that fp->rx_frag_size is never greater than PAGE_SIZE, so
> __get_free_page() can be used here.
> >
> > Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> 
> This change looks good to me, if some Broadcom folks could take a look and
> ACK/NACK that would be great.
>

Michal,

Once you have allocated the memory during initialization, you will most probably fail to allocate its replacement during RX handling (on this machine).
This will cause packets to be dropped continuously (buffers which were allocated at init time will be returned to the ring).

I would prefer the driver to allow the user to receive traffic in non-TPA mode, rather than not receive traffic at all.

Thanks
Dmitry


David Miller - Sept. 7, 2013, 4:29 a.m.
From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Sat, 7 Sep 2013 01:45:10 +0000

> Once you allocated the memory during initialization , you will most
> probably fail to allocate its replacement during RX handling (on
> this machine).

Not true, these two events can exist in totally different timeframes
and totally different memory pressure situations.

Probe should never fail because of an atomic memory allocation if at
all possible.

You should use sleeping allocations in every possible situation for
maximum stability, and this absolutely matches the approach taken
by every other major driver.
Dmitry Kravkov - Sept. 7, 2013, 9:07 a.m.
On Sat, Sep 7, 2013 at 7:29 AM, David Miller <davem@davemloft.net> wrote:
> From: "Dmitry Kravkov" <dmitry@broadcom.com>
> Date: Sat, 7 Sep 2013 01:45:10 +0000
>
>> Once you allocated the memory during initialization , you will most
>> probably fail to allocate its replacement during RX handling (on
>> this machine).
>
> Not true, these two events can exist in totally different timeframes
> and totally different memory pressure situations.
As seen from customers, these allocations fail when the system is
constantly under memory pressure, as in VM environments or on 32-bit
platforms.

>
> Probe should never fail because of an atomic memory allocation if at
> all possible.
We are here in the open() flow, not in probe().

>
> You should use sleeping allocations in every possible situation for
> maximum stability, and this absolutely matches the approach taken
> by every other major driver.
A sleeping allocation will only help until the ring wraps once - it just
delays the issue rather than resolving it; after the ring wraps, ALL of
the buffers will be replaced by allocations from interrupt context. And
again, if we discover the issue earlier, we can provide the system with
fully working networking at very acceptable performance using the SW GRO
stack. Instead we get a device which drops every TPA/FW_GRO aggregation -
the user will not be able to ssh :(

David Miller - Sept. 8, 2013, 1:48 a.m.
From: Dmitry Kravkov <dkravkov@gmail.com>
Date: Sat, 7 Sep 2013 12:07:50 +0300

> On Sat, Sep 7, 2013 at 7:29 AM, David Miller <davem@davemloft.net> wrote:
>> Probe should never fail because of an atomic memory allocation if at
>> all possible.
> We here are in open() flow, not probe()

It's the same from my perspective.

Better to drop some packets than have _COMPLETELY INOPERATIVE INTERFACE_.
Michal Schmidt - Sept. 9, 2013, 12:20 p.m.
On 09/07/2013 03:45 AM, Dmitry Kravkov wrote:
> Once you allocated the memory during initialization , you will most
> probably fail to allocate its replacement during RX handling (on this
> machine).

Why do you think this would happen "most probably"? I would instead
expect the VM subsystem to respond to the memory pressure created by the
initial GFP_KERNEL allocations by freeing some memory to allow the
future atomic allocations to succeed.

Michal

David Miller - Sept. 11, 2013, 7:44 p.m.
From: Michal Schmidt <mschmidt@redhat.com>
Date: Thu,  5 Sep 2013 22:13:09 +0200

> During initialization bnx2x allocates significant amounts of memory
> (for rx data, rx SGEs, TPA pool) using atomic allocations.
> 
> I received a report where bnx2x failed to allocate SGEs and it had
> to fall back to TPA-less operation.
> 
> Let's use GFP_KERNEL allocations during initialization, which runs
> in process context. Add gfp_t parameters to functions that are used
> both in initialization and in the receive path.
> 
> Use an unlikely branch in bnx2x_frag_alloc() to avoid atomic allocation
> by netdev_alloc_frag(). The branch is taken several thousands of times
> during initialization, but then never more. Note that fp->rx_frag_size
> is never greater than PAGE_SIZE, so __get_free_page() can be used here.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

I've applied this, the voiced objections were completely unreasonable
and absolutely do not match the established basic approaches to memory
allocation taken in every other major networking driver.

Thanks Michal.

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 8d726f6..884e8ad 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -484,10 +484,10 @@  static void bnx2x_set_gro_params(struct sk_buff *skb, u16 parsing_flags,
 	NAPI_GRO_CB(skb)->count = num_of_coalesced_segs;
 }
 
-static int bnx2x_alloc_rx_sge(struct bnx2x *bp,
-			      struct bnx2x_fastpath *fp, u16 index)
+static int bnx2x_alloc_rx_sge(struct bnx2x *bp, struct bnx2x_fastpath *fp,
+			      u16 index, gfp_t gfp_mask)
 {
-	struct page *page = alloc_pages(GFP_ATOMIC, PAGES_PER_SGE_SHIFT);
+	struct page *page = alloc_pages(gfp_mask, PAGES_PER_SGE_SHIFT);
 	struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
 	struct eth_rx_sge *sge = &fp->rx_sge_ring[index];
 	dma_addr_t mapping;
@@ -566,7 +566,7 @@  static int bnx2x_fill_frag_skb(struct bnx2x *bp, struct bnx2x_fastpath *fp,
 
 		/* If we fail to allocate a substitute page, we simply stop
 		   where we are and drop the whole packet */
-		err = bnx2x_alloc_rx_sge(bp, fp, sge_idx);
+		err = bnx2x_alloc_rx_sge(bp, fp, sge_idx, GFP_ATOMIC);
 		if (unlikely(err)) {
 			bnx2x_fp_qstats(bp, fp)->rx_skb_alloc_failed++;
 			return err;
@@ -610,12 +610,17 @@  static void bnx2x_frag_free(const struct bnx2x_fastpath *fp, void *data)
 		kfree(data);
 }
 
-static void *bnx2x_frag_alloc(const struct bnx2x_fastpath *fp)
+static void *bnx2x_frag_alloc(const struct bnx2x_fastpath *fp, gfp_t gfp_mask)
 {
-	if (fp->rx_frag_size)
+	if (fp->rx_frag_size) {
+		/* GFP_KERNEL allocations are used only during initialization */
+		if (unlikely(gfp_mask & __GFP_WAIT))
+			return (void *)__get_free_page(gfp_mask);
+
 		return netdev_alloc_frag(fp->rx_frag_size);
+	}
 
-	return kmalloc(fp->rx_buf_size + NET_SKB_PAD, GFP_ATOMIC);
+	return kmalloc(fp->rx_buf_size + NET_SKB_PAD, gfp_mask);
 }
 
 #ifdef CONFIG_INET
@@ -695,7 +700,7 @@  static void bnx2x_tpa_stop(struct bnx2x *bp, struct bnx2x_fastpath *fp,
 		goto drop;
 
 	/* Try to allocate the new data */
-	new_data = bnx2x_frag_alloc(fp);
+	new_data = bnx2x_frag_alloc(fp, GFP_ATOMIC);
 	/* Unmap skb in the pool anyway, as we are going to change
 	   pool entry status to BNX2X_TPA_STOP even if new skb allocation
 	   fails. */
@@ -746,15 +751,15 @@  drop:
 	bnx2x_fp_stats(bp, fp)->eth_q_stats.rx_skb_alloc_failed++;
 }
 
-static int bnx2x_alloc_rx_data(struct bnx2x *bp,
-			       struct bnx2x_fastpath *fp, u16 index)
+static int bnx2x_alloc_rx_data(struct bnx2x *bp, struct bnx2x_fastpath *fp,
+			       u16 index, gfp_t gfp_mask)
 {
 	u8 *data;
 	struct sw_rx_bd *rx_buf = &fp->rx_buf_ring[index];
 	struct eth_rx_bd *rx_bd = &fp->rx_desc_ring[index];
 	dma_addr_t mapping;
 
-	data = bnx2x_frag_alloc(fp);
+	data = bnx2x_frag_alloc(fp, gfp_mask);
 	if (unlikely(data == NULL))
 		return -ENOMEM;
 
@@ -947,7 +952,8 @@  int bnx2x_rx_int(struct bnx2x_fastpath *fp, int budget)
 			memcpy(skb->data, data + pad, len);
 			bnx2x_reuse_rx_data(fp, bd_cons, bd_prod);
 		} else {
-			if (likely(bnx2x_alloc_rx_data(bp, fp, bd_prod) == 0)) {
+			if (likely(bnx2x_alloc_rx_data(bp, fp, bd_prod,
+						       GFP_ATOMIC) == 0)) {
 				dma_unmap_single(&bp->pdev->dev,
 						 dma_unmap_addr(rx_buf, mapping),
 						 fp->rx_buf_size,
@@ -1307,7 +1313,8 @@  void bnx2x_init_rx_rings(struct bnx2x *bp)
 				struct sw_rx_bd *first_buf =
 					&tpa_info->first_buf;
 
-				first_buf->data = bnx2x_frag_alloc(fp);
+				first_buf->data =
+					bnx2x_frag_alloc(fp, GFP_KERNEL);
 				if (!first_buf->data) {
 					BNX2X_ERR("Failed to allocate TPA skb pool for queue[%d] - disabling TPA on this queue!\n",
 						  j);
@@ -1329,7 +1336,8 @@  void bnx2x_init_rx_rings(struct bnx2x *bp)
 			for (i = 0, ring_prod = 0;
 			     i < MAX_RX_SGE_CNT*NUM_RX_SGE_PAGES; i++) {
 
-				if (bnx2x_alloc_rx_sge(bp, fp, ring_prod) < 0) {
+				if (bnx2x_alloc_rx_sge(bp, fp, ring_prod,
+						       GFP_KERNEL) < 0) {
 					BNX2X_ERR("was only able to allocate %d rx sges\n",
 						  i);
 					BNX2X_ERR("disabling TPA for queue[%d]\n",
@@ -4214,7 +4222,7 @@  static int bnx2x_alloc_rx_bds(struct bnx2x_fastpath *fp,
 	 * fp->eth_q_stats.rx_skb_alloc_failed = 0
 	 */
 	for (i = 0; i < rx_ring_size; i++) {
-		if (bnx2x_alloc_rx_data(bp, fp, ring_prod) < 0) {
+		if (bnx2x_alloc_rx_data(bp, fp, ring_prod, GFP_KERNEL) < 0) {
 			failure_cnt++;
 			continue;
 		}